
"Deep Analysis" ChatGPT2: Language Model for Unsupervised Multi-Task Learning (2019)

2024-07-12


Paper Summary

The following is my personal summary after reading the entire paper; it covers the main content of the GPT-2 article. If you are short on time, you can read only the [Paper Summary] section.

Dataset

The authors built their own web crawler to create the WebText dataset. The crawled pages are the outbound links shared on a social platform (Reddit), and those links were effectively filtered by human curation: about 45,000,000 links in total, many of which point to news articles and other web documents. As of December 2017, the cleaned and deduplicated data amounted to a little over 8,000,000 documents, roughly 40GB of text. The paper also notes that Wikipedia documents were removed from the training set to avoid overlap with common evaluation data. In this sense, millions of people around the world participated in creating and filtering the dataset used to train GPT-2.

Input representation

The paper designs an input representation that sits between word-level and byte-level modeling: Byte Pair Encoding (BPE) applied directly to raw bytes. Frequently repeated sequences of bytes are merged into larger symbols, which recovers much of the prior knowledge of a word-level vocabulary, while the byte-level base guarantees that any input string can be represented, which improves generalization.

In short, word-level representation has the advantage of strong priors, and byte-level representation has the advantage of generalization.
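To make the mechanism concrete, here is a toy byte-level BPE trainer in Python. It is only an illustrative sketch of the idea, not GPT-2's actual tokenizer (the released tokenizer also remaps bytes to printable characters and restricts which pairs may merge, ending up with 50,257 vocabulary entries); the function name and sample text below are made up for the example.

```python
from collections import Counter

def train_bpe(text: str, num_merges: int):
    # Byte-level BPE: start from raw UTF-8 bytes so that *any* string is representable,
    # then greedily merge the most frequent adjacent pair into a new, longer symbol.
    symbols = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter(zip(symbols, symbols[1:]))
        if not pair_counts:
            break
        (a, b), _ = pair_counts.most_common(1)[0]
        merges.append((a, b))
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)  # the merged pair becomes one vocabulary symbol
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return merges, symbols

merges, segmented = train_bpe("low lower lowest low low", num_merges=10)
print(merges)     # learned pair merges, e.g. (b'l', b'o'), (b'lo', b'w'), ...
print(segmented)  # the training text re-segmented with the learned symbols
```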

Model

Compared with GPT-1, several modifications were made:

1. Layer normalization is moved to the input of each sub-block (pre-norm); see the sketch after this list.

2. An additional layer normalization is added after the final self-attention block.

3. A modified initialization: the weights of residual layers are scaled by a factor of 1/√N at initialization, where N is the number of residual layers.

4. The vocabulary is expanded (to 50,257 tokens), the context size is increased from 512 to 1024 tokens, and the batch size is increased to 512.

5. GPT-1 contains 117,000,000 parameters; the largest GPT-2 contains 1,542,000,000 parameters.
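Points 1-3 above can be summarized in a short PyTorch sketch (my own illustrative code, not OpenAI's implementation; the class and argument names are invented for the example): layer normalization is applied before each sub-block, and the weights that feed the residual connections are scaled by 1/√N at initialization. In the full model, one more layer normalization is applied after the last block.

```python
import math
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """GPT-2-style Transformer block: LayerNorm before each sub-block (pre-norm)."""
    def __init__(self, d_model: int, n_heads: int, n_layers: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Initialization tweak from the paper: weights of the layers that feed the
        # residual stream are scaled by 1/sqrt(N), N = number of residual layers.
        scale = 1.0 / math.sqrt(n_layers)
        with torch.no_grad():
            self.attn.out_proj.weight.mul_(scale)
            self.mlp[2].weight.mul_(scale)

    def forward(self, x, attn_mask=None):
        # Pre-norm: normalize the *input* of each sub-block, then add the residual.
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x
```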

Experiments

Because the model is trained only once but we want to observe its performance across many different sub-fields, all of the experiments fall under zero-shot learning: the model is evaluated on each task without any task-specific fine-tuning (a small prompting sketch follows the results table below).

| Test item | What aspect of the model is tested | Result |
|---|---|---|
| Children's Book Test | Predicting different categories of words (common nouns, named entities) | Accuracy rose from 85.7 to 93.3 |
| LAMBADA | Modeling long-range dependencies in text | Perplexity dropped from 99.8 to 8.63 |
| Winograd Schema Challenge | Commonsense reasoning | Accuracy rose from 63.7% to 70.7% |
| Reading comprehension (CoQA) | Answering questions that depend on earlier context, which requires a degree of memory | Matched or exceeded 3 of the 4 baseline systems |
| Summarization | Ability to summarize news articles | Roughly on par with classic baselines |
| Translation | Translation ability the large model learned on its own | English-to-French is poor; French-to-English reaches a baseline level |
| Question answering | Ability to generate factually correct answers | Accuracy is 5.3 times that of the smallest model |
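As a concrete illustration of how these zero-shot tasks are induced purely through prompting, here is a sketch using the Hugging Face transformers library (not part of the original paper; the article text is a placeholder). Summarization, for example, is triggered by appending "TL;DR:" after the article and letting the language model continue; the paper samples with top-k = 2 for this task.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # smallest released checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")

article = "..."                      # placeholder: paste a news article here
prompt = article + "\nTL;DR:"        # the task is specified only by the prompt

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,
    top_k=2,                         # top-k = 2 sampling, as in the paper's summarization setup
    pad_token_id=tokenizer.eos_token_id,
)
# Keep only the newly generated tokens, i.e. the model's "summary".
summary = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:])
print(summary)
```

Translation is induced the same way: the prompt is a few "english sentence = french sentence" example pairs, ending with a new English sentence followed by "=", and the model's continuation is taken as the translation.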
Summary

The core of the GPT-2 paper can be summarized in one sentence:
Starting from the GPT model, the authors scaled up both the model size and the training dataset, and found that GPT-2 automatically adapts to and learns the task objectives of many different NLP sub-fields.

For example, if we train a fixed language model on a dataset containing both everyday-conversation text and news-report text, and the dataset is large enough, the model is large enough, and training runs long enough, the resulting model will be able to distinguish between the two scenarios, and it will also acquire some new capabilities automatically, such as the ability to write news summaries.

This means that large language models have strong generalization capabilities, but it also implies that large language models may carry a latent form of autonomous consciousness. The paper then presents experimental results for the several independent areas listed by the authors.

Compared with the GPT paper, which only talked about a large dataset, the GPT-2 paper begins to talk about large language models (LLMs).


Interpretation of the original paper

Original paper address: https://cdn.openai.com/better-language-models/language_models_are_uns