Thinking Backwards: The "Reversal Blessing" in LLM Multiple-Choice Reasoning
{
"cells": [
{
"cell_type": "markdown",
"id": "583c59cd",
"metadata": {},
"source": [
"The joint probability is factored in the reverse order\n",
"\n",
"$$\n",
"p_{R2L}(x) \\;=\\; \\prod_{t=1}^{T} p_{R2L}(x_{t}\\mid x_{>t}),\n",
"$$\n",
"\n",
"where $x_t$ is the sequence of subsequent tokens $(x_{t+1},\\dots,x_T)$. To train an R2L model, each training instance is tokenized, and then the order of all tokens is reversed before being fed to the model. Although L2R and R2L factorizations represent the same distribution in theory, their neural network approximations learn different behaviors and exhibit different performance characteristics."
]
},
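{
"cell_type": "markdown",
"id": "7f2a9c1e",
"metadata": {},
"source": [
"Below is a minimal sketch of this setup, not the actual training pipeline: `tokenize` is a toy whitespace tokenizer standing in for a real subword tokenizer, and `cond_log_prob` is a hypothetical interface to a trained R2L model's conditional $p_{R2L}(x_t \\mid x_{>t})$. It illustrates the two points above: training data is reversed at the token level, and a sequence's log-probability is the sum of per-token conditionals in the reverse factorization."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b8d3e6f",
"metadata": {},
"outputs": [],
"source": [
"def tokenize(text: str) -> list[str]:\n",
"    # Toy whitespace tokenizer; a real pipeline would use BPE/SentencePiece.\n",
"    return text.split()\n",
"\n",
"\n",
"def to_r2l(tokens: list[str]) -> list[str]:\n",
"    # Reversing the token order turns next-token prediction into\n",
"    # previous-token prediction: the model sees x_{>t} and predicts x_t.\n",
"    return tokens[::-1]\n",
"\n",
"\n",
"def r2l_log_prob(tokens: list[str], cond_log_prob) -> float:\n",
"    # Sum of log p_R2L(x_t | x_{>t}) for t = 1..T, i.e. the log of the\n",
"    # product factorization above. `cond_log_prob(token, future_tokens)`\n",
"    # is an assumed model interface, not a real library call.\n",
"    return sum(\n",
"        cond_log_prob(tokens[t], tokens[t + 1:]) for t in range(len(tokens))\n",
"    )\n",
"\n",
"\n",
"l2r_tokens = tokenize(\"the cat sat on the mat\")\n",
"print(to_r2l(l2r_tokens))  # ['mat', 'the', 'on', 'sat', 'cat', 'the']"
]
},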
{
"cell_type": "markdown",
"id": "99bb236b",
"metadata": {},
"source": []
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}