Thinking Backwards: The "Reversal Blessing" in LLM Multiple-Choice Reasoning
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "583c59cd",
   "metadata": {},
   "source": [
    "The joint probability is factorized in the reverse order\n",
    "\n",
    "$$\n",
    "p_{R2L}(x) \\;=\\; \\prod_{t=1}^{T} p_{R2L}(x_{t}\\mid x_{>t}),\n",
    "$$\n",
    "\n",
    "where $x_{>t}$ denotes the sequence of subsequent tokens $(x_{t+1},\\dots,x_T)$. To train an R2L model, each training instance is tokenized, and the order of all tokens is then reversed before being fed to the model. Although the L2R and R2L factorizations represent the same distribution in theory, their neural-network approximations learn different behaviors and exhibit different performance characteristics."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99bb236b",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
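The token-reversal step described in the cell above can be sketched in a few lines. This is a minimal illustration, not the notebook's actual training pipeline: the integer token ids and the helper name `reverse_for_r2l` are assumptions introduced here for clarity.

```python
def reverse_for_r2l(token_ids):
    """Reverse a tokenized training instance so that a standard
    left-to-right transformer trained on it effectively learns the
    R2L factorization p(x_t | x_{>t})."""
    return list(reversed(token_ids))


# Toy example: integer ids stand in for a real tokenizer's output.
l2r_instance = [101, 2023, 2003, 1037, 3231, 102]
r2l_instance = reverse_for_r2l(l2r_instance)
print(r2l_instance)  # [102, 3231, 1037, 2003, 2023, 101]
```

Because only the training data is reversed, the model architecture and training loop are unchanged; the same next-token objective now predicts each token from the tokens that follow it in the original order.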