Last active
January 3, 2026 21:17
Thinking Backwards: The "Reversal Blessing" in LLM Multiple-Choice Reasoning
{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "583c59cd",
      "metadata": {},
      "source": [
        "Given a sequence of tokens $x = (x_1, x_2, \\dots, x_T)$, the model expresses the joint probability as a product of conditional probabilities:\n",
        "\n",
        "$$\n",
        "p_{L2R}(x) = \\prod_{t=1}^{T} p_{L2R}(x_{t}\\mid x_{<t}),\n",
        "$$\n",
        "\n",
        "where $x_{<t}$ denotes the preceding tokens $(x_{1}, x_{2}, \\dots, x_{t-1})$. This factorization imposes an inductive bias, and the approximation error made at each step compounds over the sequence, so it is not necessarily optimal for every task.\n",
        "\n",
        "For training, models typically optimize the log-likelihood of the sequence, which is numerically more stable and easier to differentiate than the raw product:\n",
        "\n",
        "$$\n",
        "\\log p_{L2R}(x) = \\sum_{t=1}^{T} \\log p_{L2R}(x_{t}\\mid x_{<t}).\n",
        "$$"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "99bb236b",
      "metadata": {},
      "source": []
    }
  ],
  "metadata": {
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
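The factorization and log-likelihood above can be sketched with a toy example. This is a minimal illustration, not the paper's method: the conditional probabilities below come from a made-up bigram table rather than an actual LLM, and the `<s>` start symbol and `log_p_l2r` helper are hypothetical names introduced here.

```python
import math

# Toy left-to-right (L2R) model: p(next token | previous token) over a
# tiny vocabulary. Each row sums to 1. A real LLM would condition on the
# full prefix x_{<t}; a bigram keeps the example short.
cond = {
    "<s>": {"the": 0.6, "cat": 0.3, "sat": 0.1},
    "the": {"the": 0.1, "cat": 0.7, "sat": 0.2},
    "cat": {"the": 0.2, "cat": 0.1, "sat": 0.7},
    "sat": {"the": 0.5, "cat": 0.3, "sat": 0.2},
}

def log_p_l2r(tokens):
    """log p_L2R(x) = sum over t of log p_L2R(x_t | x_{<t})."""
    logp = 0.0
    prev = "<s>"
    for tok in tokens:
        logp += math.log(cond[prev][tok])
        prev = tok
    return logp

seq = ["the", "cat", "sat"]
# The sum-of-logs form agrees with the direct product p(x), up to
# floating-point rounding: 0.6 * 0.7 * 0.7
print(math.isclose(math.exp(log_p_l2r(seq)), 0.6 * 0.7 * 0.7))  # True
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow: for a realistic sequence length, the product of hundreds of values below 1 quickly falls outside floating-point range, while the sum of their logs stays well-behaved.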