An introduction to decoding methods for language generation with the Transformers library

Introduction

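Auto-regressive language generation assumes that the probability of a word sequence factorizes into a product of conditional next-word distributions, where $W_0$ is the initial context and $T$ is the length of the generated sequence:

$$P(w_{1:T}|W_0) = \prod_{t=1}^T P(w_t|w_{1:t-1},W_0), \text{ with } w_{1:0}=\emptyset$$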
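As a small worked example of the factorization above, the sketch below multiplies per-step conditional probabilities to obtain the probability of a whole sequence. The numbers in `step_probs` and both helper names are made up for illustration; they do not come from GPT2.

```python
import math

# Hypothetical conditional probabilities P(w_t | w_{1:t-1}, W_0) for three steps.
# These values are invented for illustration only.
step_probs = [0.5, 0.4, 0.9]


def sequence_probability(conditionals):
    """Multiply the per-step conditional probabilities (chain rule)."""
    prob = 1.0
    for p in conditionals:
        prob *= p
    return prob


def sequence_log_probability(conditionals):
    """Same quantity in log space, which avoids underflow for long sequences."""
    return sum(math.log(p) for p in conditionals)


p = sequence_probability(step_probs)
log_p = sequence_log_probability(step_probs)
print(p, math.exp(log_p))
```

Working in log space is the usual choice in practice, because a product of many small probabilities quickly underflows.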

pip install -q git+https://github.com/huggingface/transformers.git
pip install -q tensorflow==2.1
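import tensorflow as tf
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# use the EOS token as the PAD token to avoid warnings
model = TFGPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)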

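Greedy search simply selects the word with the highest probability as the next word at each timestep $t$: $w_t = \operatorname{argmax}_{w} P(w|w_{1:t-1})$. The figure below shows the decoding path of greedy search.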

In the following, we generate word sequences with GPT2 from the context ("I", "enjoy", "walking", "with", "my", "cute", "dog").

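# encode the context the generation is conditioned on
input_ids = tokenizer.encode('I enjoy walking with my cute dog', return_tensors='tf')

# generate text until the output length (including the context) reaches 50
greedy_output = model.generate(input_ids, max_length=50)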

print("Output:\n" + 100 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to walk with my dog. I'm not sure if I'll ever be able to walk with my dog.

I'm not sure if I'll

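The main drawback of greedy search is that it only considers the most probable word at the current step, so it misses high-probability words hidden behind a low-probability word, as the figure above illustrates.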

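Beam search keeps the num_beams most probable partial sequences at each timestep and extends all of them at the next step, which reduces the risk of missing high-probability words hidden behind a low-probability word. Take num_beams=2 as an example.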

Beam search generally finds an output sequence with a higher overall probability than greedy search, but it is not guaranteed to find the most probable sequence.
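To make the difference between greedy search and beam search concrete, here is a minimal sketch on a toy model. `TOY_MODEL` and its probabilities are invented for illustration, and `beam_search` is a hypothetical helper, not the routine used inside `model.generate`. Setting `num_beams=1` reduces it to greedy search.

```python
import math

# Toy next-word distributions keyed by the prefix (made-up numbers).
TOY_MODEL = {
    ("The",): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("The", "nice"): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("The", "dog"): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("The", "car"): {"is": 0.3, "drives": 0.5, "turns": 0.2},
}


def beam_search(prefix, steps, num_beams):
    """Keep the num_beams most probable partial sequences at every step."""
    beams = [(0.0, tuple(prefix))]  # (log-probability, sequence)
    for _ in range(steps):
        candidates = []
        for log_p, seq in beams:
            for word, p in TOY_MODEL[seq].items():
                candidates.append((log_p + math.log(p), seq + (word,)))
        # Sort by log-probability, highest first, and prune to num_beams.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:num_beams]
    return [(seq, math.exp(log_p)) for log_p, seq in beams]


greedy = beam_search(["The"], steps=2, num_beams=1)  # num_beams=1 is greedy search
beam = beam_search(["The"], steps=2, num_beams=2)
print(greedy[0])
print(beam[0])
```

In this toy setting greedy search commits to "nice" because it is the most probable first word, while the two-beam search also keeps "dog" alive and ends up with the more probable sequence overall.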

# activate beam search and early stopping
beam_output = model.generate(
input_ids,
max_length=50,
num_beams=5,
early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))
Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to walk with him again.

I'm not sure if I'll ever be able to walk with him again. I'm not sure if I'll

# set no_repeat_ngram_size to 2
beam_output = model.generate(
input_ids,
max_length=50,
num_beams=5,
no_repeat_ngram_size=2,
early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))
Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to walk with him again.

I've been thinking about this for a while now, and I think it's time for me to take a break
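The effect of no_repeat_ngram_size in the run above can be pictured as banning, at every step, each next word that would recreate an n-gram already present in the text. The sketch below illustrates that idea on plain word lists; `banned_next_tokens` is a hypothetical helper written for this explanation and is not the library's own code.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would complete an n-gram already present in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()
    # The last (n - 1) tokens are the start of the n-gram about to be created.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        # If an earlier n-gram starts with the same (n - 1) tokens, ban its last token.
        if tuple(tokens[i:i + ngram_size - 1]) == prefix:
            banned.add(tokens[i + ngram_size - 1])
    return banned


history = "I am not sure if I".split()
print(banned_next_tokens(history, 2))
```

A decoder would then give the banned words zero probability before choosing the next word, which is why the 2-gram "I am" cannot appear a second time.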

In transformers, we set the parameter num_return_sequences to return several of the highest-scoring beams. Note that num_return_sequences <= num_beams must hold!

# set num_return_sequences > 1
beam_outputs = model.generate(
input_ids,
max_length=50,
num_beams=5,
no_repeat_ngram_size=2,
num_return_sequences=5,
early_stopping=True
)

# now we have 5 output sequences
print("Output:\n" + 100 * '-')
for i, beam_output in enumerate(beam_outputs):
    print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))
Output:
----------------------------------------------------------------------------------------------------
0: I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to walk with him again.

I've been thinking about this for a while now, and I think it's time for me to take a break
1: I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to walk with him again.

I've been thinking about this for a while now, and I think it's time for me to get back to
2: I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to walk with her again.

I've been thinking about this for a while now, and I think it's time for me to take a break
3: I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to walk with her again.

I've been thinking about this for a while now, and I think it's time for me to get back to
4: I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to walk with him again.

I've been thinking about this for a while now, and I think it's time for me to take a step

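• Beam search can work reasonably well in tasks where the length of the desired output is more or less predictable, such as machine translation or summarization; see Murray et al. (2018) and Yang et al. (2018). It works much worse in open-ended generation such as dialog and story generation.

• We have seen that beam search often produces repetitive text. In story generation it is hard to decide whether to apply an n-gram penalty, because it is difficult to tell whether forbidding repetition or allowing some repetition gives the better result.

• As argued in Ari Holtzman et al. (2019), high-quality human language does not follow a pattern of always choosing the most probable next word. In other words, human text often surprises us rather than being boring and predictable at every word. The paper illustrates this by plotting the probability a model assigns to human text against the probability of beam search output.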

Sampling

$$w_t\sim{P(w|w_{1:t-1})}$$

In transformers, we set do_sample=True and deactivate Top-K sampling with top_k=0. For illustration, the random seed is fixed to 0.
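The sampling rule $w_t \sim P(w|w_{1:t-1})$ can be sketched in a few lines of plain Python. The word list, the probabilities and the helper `sample_next_word` are all made up for illustration; in the real setting the distribution comes from the model.

```python
import random

# Made-up next-word distribution P(w | w_{1:t-1}); purely illustrative.
words = ["nice", "dog", "car"]
probs = [0.5, 0.4, 0.1]


def sample_next_word(rng):
    """Draw one word at random, weighted by its conditional probability."""
    return rng.choices(words, weights=probs, k=1)[0]


rng = random.Random(0)  # fixed seed so the draws are reproducible
draws = [sample_next_word(rng) for _ in range(5)]
print(draws)
```

Unlike greedy search, repeated draws can return different words, and fixing the seed is what makes a particular run repeatable.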

# set the seed to reproduce results; change it to get different results
tf.random.set_seed(0)

# activate sampling and deactivate top_k by setting it to 0
sample_output = model.generate(
input_ids,
do_sample=True,
max_length=50,
top_k=0
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))
Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cute dog. He just gave me a whole new hand sense."

But it seems that the dogs have learned a lot from teasing at the local batte harness once they take on the outside.

"I take

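Nice, the text looks fine at first glance, but on closer inspection it is not very coherent. The 3-grams ("new", "hand", "sense") and ("local", "batte", "harness") are odd and do not read like something a human would write. This is a major problem with sampling: the model often produces incoherent gibberish, cf. Ari Holtzman et al. (2019).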

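Lowering the temperature sharpens the word distribution $P(w|w_{1:t-1})$: the probability of high-probability words increases and the probability of low-probability words decreases.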
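A minimal sketch of how temperature reshapes a distribution, assuming the usual formulation in which the logits are divided by the temperature before the softmax. The logits and the helper name `softmax_with_temperature` are invented for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate words
for t in (1.0, 0.7, 0.1):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature shrinks, almost all of the probability mass moves to the highest-scoring word, so sampling behaves more and more like greedy search.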

# set the seed to reproduce results; change it to get different results
tf.random.set_seed(0)

# use temperature to make low-probability candidate words less likely to be picked
sample_output = model.generate(
input_ids,
do_sample=True,
max_length=50,
top_k=0,
temperature=0.7
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))
Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cute dog, but I don't like to be at home too much. I also find it a bit weird when I'm out shopping. I am always away from my house a lot, but I do have a few friends

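OK, there are far fewer odd n-grams and the output is more coherent overall. Changing the temperature changes the amount of randomness; as $\text{temperature} \to 0$, sampling becomes equivalent to greedy search.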

Top-K Sampling

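Fan et al. (2018) introduced a simple but effective sampling scheme called Top-K sampling. In Top-K sampling, only the K most likely next words are kept and the probability mass is redistributed among those K words. GPT2 adopted this sampling scheme, which was one of the main reasons for its success in story generation.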
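The filtering step described above can be sketched as follows. The distribution `probs` and the helper `top_k_filter` are made up for illustration and operate on a small dictionary rather than on model logits.

```python
import random


def top_k_filter(probs, k):
    """Keep the k most probable words and renormalise their probabilities."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {word: p / total for word, p in top}


# Made-up next-word distribution; "batte" stands in for an unlikely, odd word.
probs = {"nice": 0.5, "dog": 0.3, "car": 0.15, "batte": 0.05}
filtered = top_k_filter(probs, k=2)

rng = random.Random(0)
candidates = list(filtered)
next_word = rng.choices(candidates, weights=[filtered[w] for w in candidates], k=1)[0]
print(filtered, next_word)
```

After filtering, the unlikely words can no longer be sampled at all, which is how Top-K sampling removes much of the gibberish seen with unrestricted sampling.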

# set the seed to reproduce results; change it to get different results
tf.random.set_seed(0)

# set top_k to 50
sample_output = model.generate(
input_ids,
do_sample=True,
max_length=50,
top_k=50
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))
Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cute dog. It's so good to have an environment where your dog is available to share with you and we'll be taking care of you.

We hope you'll find this story interesting!

I am from

Top-p (nucleus) Sampling

In transformers, we activate Top-p sampling by setting $0 < \text{top\_p} < 1$.
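As a rough sketch of the idea behind Top-p sampling, the code below keeps the smallest set of most probable words whose cumulative probability reaches top_p and renormalises it. The two distributions and the helper `top_p_filter` are invented for illustration, and the exact cut-off rule in the library may differ in detail.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable words whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, p in ranked:
        kept.append((word, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {word: p / total for word, p in kept}


# Made-up distributions: one sharply peaked, one nearly flat.
peaked = {"has": 0.9, "runs": 0.05, "and": 0.03, "is": 0.02}
flat = {"nice": 0.3, "dog": 0.25, "car": 0.25, "house": 0.2}
print(top_p_filter(peaked, 0.92))
print(top_p_filter(flat, 0.92))
```

The number of surviving words adapts to the shape of the distribution: few words when the model is confident, many when it is not. A fixed K cannot do this.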

# set the seed to reproduce results; change it to get different results
tf.random.set_seed(0)

# deactivate top_k sampling and set the top_p cumulative probability to 0.92
sample_output = model.generate(
input_ids,
do_sample=True,
max_length=50,
top_p=0.92,
top_k=0
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))
Output:
----------------------------------------------------------------------------------------------------
I enjoy walking with my cute dog. He will never be the same. I watch him play.

Guys, my dog needs a name. Especially if he is found with wings.

What was that? I had a lot o

# set the seed to reproduce results; change it to get different results
tf.random.set_seed(0)

# set top_k = 50, top_p = 0.95 and num_return_sequences = 3
sample_outputs = model.generate(
input_ids,
do_sample=True,
max_length=50,
top_k=50,
top_p=0.95,
num_return_sequences=3
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
Output:
----------------------------------------------------------------------------------------------------
0: I enjoy walking with my cute dog. It's so good to have the chance to walk with a dog. But I have this problem with the dog and how he's always looking at us and always trying to make me see that I can do something
1: I enjoy walking with my cute dog, she loves taking trips to different places on the planet, even in the desert! The world isn't big enough for us to travel by the bus with our beloved pup, but that's where I find my love
2: I enjoy walking with my cute dog and playing with our kids," said David J. Smith, director of the Humane Society of the US.

"So as a result, I've got more work in my time," he said.

Cool, now you can use these decoding techniques in transformers to generate your own stories.

Conclusion

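• In open-ended language generation, decoding methods such as top-p and top-k sampling produce more fluent text than greedy search and beam search. Recently, there has also been growing evidence that the flaws of greedy search and beam search, such as generating repetitive word sequences, are caused by the trained model rather than by the decoding method, cf. Welleck et al. (2019). Welleck et al. (2020) also show that top-K and top-p sampling generate repetitive word sequences as well.

• In Welleck et al. (2019), human evaluation of beam search and top-p sampling after adapting the training objective found that beam search produced the more fluent text.

• Open-ended language generation is a rapidly evolving research field. There is no universally best method, only methods that suit a particular use case better.
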

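• The good news is that you can quickly try out all of these decoding methods in transformers 😊

• That was a short introduction to the different decoding methods for open-ended language generation in transformers.

• For feedback or questions, please visit the GitHub repository.

• For more fun with story generation, check out Writing with Transformers.

• Thanks to everyone who contributed to this blog post: Alexander Rush, Julien Chaumand, Thomas Wolf, Victor Sanh, Sam Shleifer, Clément Delangue, Yacine Jernite, Oliver Åstrand and John de Wasseige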

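Appendix

• min_length can be used to force the model not to produce an EOS token before the generated text reaches min_length. This is common in summarization and is all but required when you want long outputs.
• repetition_penalty can be used to penalize repeated words. It was first introduced by Keskar et al. (2019) and is also used in the training objective in Welleck et al. (2019). It can be quite effective at preventing repetition, but its behavior varies considerably across models and use cases; see the related discussion on GitHub.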
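To give an intuition for the repetition penalty, the sketch below lowers the score of every word that has already been generated. The rule used here (divide positive scores by the penalty, multiply negative scores by it) is one common formulation and is an assumption of this example; the scores and the helper `apply_repetition_penalty` are made up.

```python
def apply_repetition_penalty(logits, generated, penalty):
    """Lower the score of every word that already appears in `generated`."""
    adjusted = dict(logits)
    for word in set(generated):
        if word in adjusted:
            score = adjusted[word]
            # Dividing a positive score or multiplying a negative one both lower it.
            adjusted[word] = score / penalty if score > 0 else score * penalty
    return adjusted


# Made-up scores for three candidate words.
logits = {"dog": 3.0, "walk": 2.5, "cat": -1.0}
print(apply_repetition_penalty(logits, ["dog", "cat"], penalty=1.2))
```

With a penalty of 1.0 the scores are unchanged, and larger values push already-used words further down. This is one reason the right setting depends so much on the model and the task.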

