Thinking Backwards: The "Reversal Blessing" in LLM Multiple-Choice Reasoning
{
"cells": [
{
"cell_type": "markdown",
"id": "583c59cd",
"metadata": {},
"source": [
"The joint probability is factored in the reverse order\n",
"\n",
"$$\n",
"p_{R2L}(x) \\;=\\; \\prod_{t=1}^{T} p_{R2L}(x_{t}\\mid x_{>t}),\n",
"$$\n",
"\n",
"where $x_t$ is the sequence of subsequent tokens $(x_{t+1},\\dots,x_T)$. To train an R2L model, each training instance is tokenized, and then the order of all tokens is reversed before being fed to the model. Although L2R and R2L factorizations represent the same distribution in theory, their neural network approximations learn different behaviors and exhibit different performance characteristics."
]
},
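{
"cell_type": "markdown",
"id": "7f2a9c1e",
"metadata": {},
"source": [
"Below is a minimal sketch of this setup, not the actual training pipeline: `tokenize` is a toy whitespace tokenizer standing in for a real subword tokenizer, and `cond_log_prob` is a hypothetical interface to a trained R2L model's conditional $p_{R2L}(x_t \\mid x_{>t})$. It illustrates the two points above: training data is reversed at the token level, and a sequence's log-probability is the sum of per-token conditionals in the reverse factorization."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b8d3e6f",
"metadata": {},
"outputs": [],
"source": [
"def tokenize(text: str) -> list[str]:\n",
"    # Toy whitespace tokenizer; a real pipeline would use BPE/SentencePiece.\n",
"    return text.split()\n",
"\n",
"\n",
"def to_r2l(tokens: list[str]) -> list[str]:\n",
"    # Reversing the token order turns next-token prediction into\n",
"    # previous-token prediction: the model sees x_{>t} and predicts x_t.\n",
"    return tokens[::-1]\n",
"\n",
"\n",
"def r2l_log_prob(tokens: list[str], cond_log_prob) -> float:\n",
"    # Sum of log p_R2L(x_t | x_{>t}) for t = 1..T, i.e. the log of the\n",
"    # product factorization above. `cond_log_prob(token, future_tokens)`\n",
"    # is an assumed model interface, not a real library call.\n",
"    return sum(\n",
"        cond_log_prob(tokens[t], tokens[t + 1:]) for t in range(len(tokens))\n",
"    )\n",
"\n",
"\n",
"l2r_tokens = tokenize(\"the cat sat on the mat\")\n",
"print(to_r2l(l2r_tokens))  # ['mat', 'the', 'on', 'sat', 'cat', 'the']"
]
},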
{
"cell_type": "markdown",
"id": "99bb236b",
"metadata": {},
"source": []
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}