--Abstract
Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving infinite-length sequences from a recurrent language model when using common decoding algorithms. To analyze this issue, we first define inconsistency of a decoding algorithm, meaning that the algorithm can yield an infinite-length sequence that has zero probability under the model. We prove that commonly used incomplete decoding algorithms - greedy search, beam search, top-k sampling, and nucleus sampling - are inconsistent, despite the fact that recurrent language models are trained to produce sequences of finite length. Based on these insights, we propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model. Empirical results show that inconsistency occurs in practice, and that the proposed methods prevent inconsistency.
----------Summary----------
In this work, we investigate the issue of receiving infinite-length sequences from a recurrent language model.
----------Expanded----------
In this work, we investigate the issue of receiving infinite-length sequences from a recurrent language model. The underlying assumption is that there is a finite number of languages, and that all languages are mutually nonoverlapping. We will prove that this assumption does not lead to a large number of possible infinite-length sequences, which can be used for any language
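As a reading aid (not part of the original file), here is a minimal sketch of what a "consistent" top-k sampling variant could look like, assuming the idea from the abstract is to keep the end-of-sequence token available at every decoding step so the sequence always retains a nonzero chance of terminating. The function name, the `eos_id` parameter, and the toy distribution are illustrative assumptions, not code from the paper.

```python
import numpy as np

def consistent_top_k_sample(probs, k, eos_id, rng=None):
    """Sample from the top-k candidates while always keeping EOS available.

    probs:  1-D array of next-token probabilities over the vocabulary
    eos_id: index of the end-of-sequence token

    Keeping EOS in the candidate set at every step is one way to ensure the
    decoder can always terminate, which is the intuition behind the
    consistent sampling variants described in the abstract.
    """
    rng = rng or np.random.default_rng()
    top_k = np.argsort(probs)[-k:]            # indices of the k most probable tokens
    candidates = np.union1d(top_k, [eos_id])  # force EOS into the candidate set
    weights = probs[candidates]
    weights = weights / weights.sum()         # renormalize over the candidates
    return int(rng.choice(candidates, p=weights))

# Toy usage: a 5-token vocabulary where token 4 is EOS.
probs = np.array([0.4, 0.3, 0.2, 0.08, 0.02])
token = consistent_top_k_sample(probs, k=2, eos_id=4)
```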
--Abstract
Neural text generation is a key tool in natural language applications, but it is well known there are major problems at its core. In particular, standard likelihood training and decoding leads to dull and repetitive outputs (Holtzman et al., 2019). While some post-hoc fixes have been proposed, in particular top-k and nucleus sampling, they do not address the fact that the token-level probabilities predicted by the model are poor. In this paper we show that the likelihood objective itself is at fault, resulting in a model that assigns too much probability to sequences containing repeats and frequent words, unlike those from the human training distribution. We propose a new objective, unlikelihood training, which forces unlikely generations to be assigned lower probability by the model. We show that both token and sequence level unlikelihood training give less repetitive, less dull text while maintaining perplexity, giving superior generations using standard greedy or beam search. According to human evaluations, our approach with standard beam search also outperforms the currently popular decoding methods of nucleus sampling or beam blocking, thus providing a strong alternative to existing techniques.
----------Summary----------
We propose a new approach to training and decoding neural text.
----------Expanded----------
We propose a new approach to training and decoding neural text. In our approach, we first define the problem as the following: Given a sentence, what are the best representations of the sentence in terms of the original sentences? In the end, we propose a neural network with a single hidden layer that is able to recognize
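As an illustration (again, not from the original file), here is a rough sketch of the token-level unlikelihood idea described in the abstract: alongside the usual likelihood term, probability mass placed on a set of negative candidates (e.g., tokens that already appeared in the preceding context) is pushed down. The tensor shapes, the candidate-selection rule, and the weighting parameter `alpha` are simplifying assumptions for the example.

```python
import torch
import torch.nn.functional as F

def token_unlikelihood_loss(logits, targets, neg_candidates, alpha=1.0):
    """Sketch of a token-level unlikelihood objective.

    logits:         (T, V) next-token logits for one sequence
    targets:        (T,)   gold next tokens (standard likelihood term)
    neg_candidates: (T, V) boolean mask of "unlikely" candidates per step,
                    e.g. tokens already generated earlier in the context
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Standard maximum-likelihood (negative log-likelihood) term.
    mle = F.nll_loss(log_probs, targets)
    # Unlikelihood term: minimize -log(1 - p(c)) for each negative candidate c,
    # i.e. penalize probability assigned to the masked tokens.
    probs = log_probs.exp()
    one_minus_p = (1.0 - probs).clamp(min=1e-6)
    ul = -(one_minus_p.log() * neg_candidates.float()).sum(dim=-1).mean()
    return mle + alpha * ul

# Toy usage with random values.
T, V = 8, 20
logits = torch.randn(T, V)
targets = torch.randint(0, V, (T,))
neg = torch.zeros(T, V, dtype=torch.bool)
neg[:, 3] = True  # pretend token 3 is a repeated context token at every step
loss = token_unlikelihood_loss(logits, targets, neg)
```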