UniLM

The name stands for "Unified Language Model Pre-training for Natural Language Understanding and Generation".

Original paper

https://arxiv.org/pdf/1905.03197.pdf

Why the idea came up

BERT is not well suited to NLG because of its MLM (masked language model) objective, which does not match the objective of generation tasks: apart from the masked positions, every other word is visible to the model in both directions, whereas generation must predict each word from left context only. Therefore, one way to improve BERT-style models for generation is to also pre-train them with objectives that build NLG ability directly.

Main idea of the method

The authors observe that a natural way to control the model is the way information flows into it, i.e., which context a word token can access when it is being predicted. The paper proposes three modes: bidirectional LM, unidirectional LM, and sequence-to-sequence LM.

Regarding the special tokens, SOS and EOS stand for start and end of sequence. They help both NLU and NLG: EOS marks the end of a sequence, which helps the model learn when to terminate in an NLG task.

The unidirectional LM comprises a left-to-right LM and a right-to-left LM. Taking left-to-right as an example, with input x1, x2, [MASK], x4, the representation of [MASK] uses only x1, x2, and the mask position itself.
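The left-to-right constraint above can be sketched as a lower-triangular self-attention mask (a minimal illustration in plain Python; the function name and 1/0 convention are my own, not from the paper):

```python
def left_to_right_mask(seq_len):
    # 1 = may attend, 0 = blocked; position i sees only positions 0..i
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]

# For the example x1, x2, [MASK], x4, the [MASK] position (row 2)
# can attend only to x1, x2, and itself.
mask = left_to_right_mask(4)
print(mask[2])  # [1, 1, 1, 0]
```

A right-to-left LM would simply use the transpose of this mask, and the bidirectional LM would use an all-ones mask.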

The bidirectional LM uses context from both directions to represent the masked token.

In the sequence-to-sequence LM, the input has two segments, a source segment and a target segment: SOS, t1, t2, EOS, t3, t4, t5, EOS. In the source segment, t2 can attend to the first 4 tokens; in the target segment, t5 can attend only to the first 6 tokens plus itself (counting the SOS and EOS tokens).
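This seq2seq masking rule can be sketched as follows (an illustrative helper, not the paper's code; source tokens attend bidirectionally within the source, target tokens attend left-to-right):

```python
def seq2seq_mask(src_len, tgt_len):
    """1 = may attend, 0 = blocked. Source positions see the whole source;
    target positions see the source plus target positions up to themselves."""
    n = src_len + tgt_len
    mask = []
    for i in range(n):
        if i < src_len:
            row = [1 if j < src_len else 0 for j in range(n)]
        else:
            row = [1 if j <= i else 0 for j in range(n)]
        mask.append(row)
    return mask

# [SOS] t1 t2 [EOS] | t3 t4 t5 [EOS]: source and target are 4 tokens each.
m = seq2seq_mask(4, 4)
print(m[2])  # t2 (row 2) sees only the 4 source tokens
print(m[6])  # t5 (row 6) sees the first 6 tokens plus itself
```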

The paper also describes the training schedule: the bidirectional LM objective is used 1/3 of the time, the seq2seq LM objective 1/3 of the time, and the left-to-right and right-to-left LM objectives 1/6 of the time each. With this mixing strategy, UniLM achieves its final goal: a single model with both understanding and generation ability.
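The mixing ratios above can be sketched as a simple sampler (the function name is mine; this only illustrates the 1/3, 1/3, 1/6, 1/6 split, not how the paper schedules objectives internally):

```python
import random

def sample_objective(rng=random):
    """Pick the LM objective for one training step with UniLM's ratios:
    1/3 bidirectional, 1/3 seq2seq, 1/6 left-to-right, 1/6 right-to-left."""
    r = rng.random()
    if r < 1 / 3:
        return "bidirectional"
    if r < 2 / 3:
        return "seq2seq"
    if r < 5 / 6:
        return "left-to-right"
    return "right-to-left"
```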

In more detail, UniLM is initialised from BERT-large, and the masking strategy follows BERT, except that 20% of the time a bigram or trigram is masked instead of a single token, which improves predictive performance.
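A rough sketch of this n-gram masking idea (illustrative only; the function name, rates, and span-selection details are my simplification, and the paper's exact procedure may differ):

```python
import random

def choose_mask_spans(tokens, mask_prob=0.15, ngram_prob=0.2, rng=random):
    """Return sorted indices of positions to mask. Each start position is
    chosen with mask_prob; with probability ngram_prob the mask covers a
    bigram or trigram instead of a single token."""
    masked = set()
    i = 0
    while i < len(tokens):
        if rng.random() < mask_prob:
            span = 1
            if rng.random() < ngram_prob:
                span = rng.choice([2, 3])  # mask a bigram or trigram
            for j in range(i, min(i + span, len(tokens))):
                masked.add(j)
            i += span
        else:
            i += 1
    return sorted(masked)
```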

Terms appearing in the paper

Two NLP terms appear throughout:

NLU: Natural language understanding

NLG: Natural Language Generation

The concepts are not brand new, but the abbreviations are convenient shorthand and are widely used.