--Abstract
Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving infinite-length sequences from a recurrent language model when using common decoding algorithms. To analyze this issue, we first define inconsistency of a decoding algorithm, meaning that the algorithm can yield an infinite-length sequence that has zero probability under the model. We prove that commonly used incomplete decoding algorithms - greedy search, beam search, top-k sampling, and nucleus sampling - are inconsistent, despite the fact that recurrent language models are trained to produce sequences of finite length. Based on these insights, we propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model. Empirical results show that inconsistency occurs in practice, and that the proposed methods prevent inconsistency.
----------Summary----------
In this work, we investigate the issue of receiving infinite-length sequences from a recurrent language model.
----------Expanded----------
In this work, we investigate the issue of receiving infinite-length sequences from a recurrent language model. The underlying assumption is that there is a finite number of languages, and that all languages are mutually nonoverlapping. We will prove that this assumption does not lead to a large number of possible infinite-length sequences, which can be used for any language
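As a reading aid (not part of the original file), here is a minimal sketch of what a "consistent" top-k sampling variant could look like, assuming the idea from the abstract is to keep the end-of-sequence token available at every decoding step so the sequence always retains a nonzero chance of terminating. The function name, the `eos_id` parameter, and the toy distribution are illustrative assumptions, not code from the paper.

```python
import numpy as np

def consistent_top_k_sample(probs, k, eos_id, rng=None):
    """Sample from the top-k candidates while always keeping EOS available.

    probs:  1-D array of next-token probabilities over the vocabulary
    eos_id: index of the end-of-sequence token

    Keeping EOS in the candidate set at every step is one way to ensure the
    decoder can always terminate, which is the intuition behind the
    consistent sampling variants described in the abstract.
    """
    rng = rng or np.random.default_rng()
    top_k = np.argsort(probs)[-k:]            # indices of the k most probable tokens
    candidates = np.union1d(top_k, [eos_id])  # force EOS into the candidate set
    weights = probs[candidates]
    weights = weights / weights.sum()         # renormalize over the candidates
    return int(rng.choice(candidates, p=weights))

# Toy usage: a 5-token vocabulary where token 4 is EOS.
probs = np.array([0.4, 0.3, 0.2, 0.08, 0.02])
token = consistent_top_k_sample(probs, k=2, eos_id=4)
```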
--Abstract
Neural text generation is a key tool in natural language applications, but it is well known there are major problems at its core. In particular, standard likelihood training and decoding leads to dull and repetitive outputs (Holtzman et al., 2019). While some post-hoc fixes have been proposed, in particular top-k and nucleus sampling, they do not address the fact that the token-level probabilities predicted by the model are poor. In this paper we show that the likelihood objective itself is at fault, resulting in a model that assigns too much probability to sequences containing repeats and frequent words, unlike those from the human training distribution. We propose a new objective, unlikelihood training, which forces unlikely generations to be assigned lower probability by the model. We show that both token and sequence level unlikelihood training give less repetitive, less dull text while maintaining perplexity, giving superior generations using standard greedy or beam search. According to human evaluations, our approach with standard beam search also outperforms the currently popular decoding methods of nucleus sampling or beam blocking, thus providing a strong alternative to existing techniques.
----------Summary----------
We propose a new approach to training and decoding neural text.
----------Expanded----------
We propose a new approach to training and decoding neural text. In our approach, we first define the problem as the following: Given a sentence, what are the best representations of the sentence in terms of the original sentences? In the end, we propose a neural network with a single hidden layer that is able to recognize
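As an illustration (again, not from the original file), here is a rough sketch of the token-level unlikelihood idea described in the abstract: alongside the usual likelihood term, probability mass placed on a set of negative candidates (e.g., tokens that already appeared in the preceding context) is pushed down. The tensor shapes, the candidate-selection rule, and the weighting parameter `alpha` are simplifying assumptions for the example.

```python
import torch
import torch.nn.functional as F

def token_unlikelihood_loss(logits, targets, neg_candidates, alpha=1.0):
    """Sketch of a token-level unlikelihood objective.

    logits:         (T, V) next-token logits for one sequence
    targets:        (T,)   gold next tokens (standard likelihood term)
    neg_candidates: (T, V) boolean mask of "unlikely" candidates per step,
                    e.g. tokens already generated earlier in the context
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Standard maximum-likelihood (negative log-likelihood) term.
    mle = F.nll_loss(log_probs, targets)
    # Unlikelihood term: minimize -log(1 - p(c)) for each negative candidate c,
    # i.e. penalize probability assigned to the masked tokens.
    probs = log_probs.exp()
    one_minus_p = (1.0 - probs).clamp(min=1e-6)
    ul = -(one_minus_p.log() * neg_candidates.float()).sum(dim=-1).mean()
    return mle + alpha * ul

# Toy usage with random values.
T, V = 8, 20
logits = torch.randn(T, V)
targets = torch.randint(0, V, (T,))
neg = torch.zeros(T, V, dtype=torch.bool)
neg[:, 3] = True  # pretend token 3 is a repeated context token at every step
loss = token_unlikelihood_loss(logits, targets, neg)
```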