2024-07-12
Detailed link 1
Detailed Link 2
How multi-head attention changes the dimensions: First, q, k, and v are projected and split at the input so that each head has dimension embedding_size / nums_head. Finally, the head outputs are concatenated at the end of the attention layer. This is why embedding_size must be divisible by the number of heads.
Attention: The attention weights are computed from Q and K (in the standard formulation, a softmax over their scaled dot products) and then applied to V to produce the output.
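The head splitting and the Q/K/V computation described above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the function name `multi_head_attention`, the random projection matrices, and the toy sizes are assumptions made for the example, and `num_heads` plays the role of nums_head. It omits masking, batching, and the output projection that usually follows the concatenation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, num_heads):
    """Hypothetical helper: x is (seq_len, embedding_size)."""
    seq_len, embedding_size = x.shape
    if embedding_size % num_heads != 0:
        raise ValueError("embedding_size must be divisible by num_heads")
    head_dim = embedding_size // num_heads

    def split_heads(t):
        # (seq_len, embedding_size) -> (num_heads, seq_len, head_dim)
        return t.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

    # Project the input, then split each projection across the heads.
    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Attention weights come from Q and K: scaled dot product, then softmax.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)  # (num_heads, seq_len, seq_len)

    # Apply the weights to V, then concatenate the heads back together.
    heads = weights @ v  # (num_heads, seq_len, head_dim)
    output = heads.transpose(1, 0, 2).reshape(seq_len, embedding_size)
    return output, weights


rng = np.random.default_rng(0)
seq_len, embedding_size, num_heads = 4, 8, 2
x = rng.normal(size=(seq_len, embedding_size))
w_q, w_k, w_v = (rng.normal(size=(embedding_size, embedding_size)) for _ in range(3))

output, weights = multi_head_attention(x, w_q, w_k, w_v, num_heads)
print(output.shape)   # (4, 8)
print(weights.shape)  # (2, 4, 4)
```

With embedding_size = 8 and two heads, each head works on 4 dimensions. Passing `num_heads=3` raises the `ValueError`, because 8 cannot be split evenly across three heads.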
Detailed Links
P-tuning
LoRA
Adapter
ⅰ. Dataset is an ordinary class. Users define their own subclass of it to suit their specific needs.
https://huggingface.co/docs/datasets/loading
ⅱ. DataLoader takes an instance of the class defined by Dataset and divides it into batches for subsequent training, inference, and other operations.
ⅲ. Dataset retrieves our dataset's features and labels one sample at a time. When training a model, we usually want to pass samples in "mini-batches" and reshuffle the data at every epoch to reduce model overfitting; the shuffle option determines whether the data is reshuffled at the start of each epoch.
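The division of labour described in the list above can be sketched in plain Python. This is a simplified stand-in written for illustration, not PyTorch's implementation: `ToyDataset` and `iterate_batches` are hypothetical names, and the data is made up. In PyTorch the same idea is exposed through the `batch_size` and `shuffle` arguments of `torch.utils.data.DataLoader`.

```python
import random


class ToyDataset:
    """Hypothetical dataset: returns one (features, label) pair per index."""

    def __init__(self, n):
        self.samples = [([float(i), float(i) * 2], i % 2) for i in range(n)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


def iterate_batches(dataset, batch_size, shuffle, rng):
    """Hypothetical loader: yields (features, labels) mini-batches for one epoch."""
    indices = list(range(len(dataset)))
    if shuffle:
        # Called once per epoch, so every epoch sees a new sample order.
        rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = [dataset[i] for i in indices[start:start + batch_size]]
        features = [f for f, _ in batch]
        labels = [y for _, y in batch]
        yield features, labels


dataset = ToyDataset(10)
rng = random.Random(0)
for epoch in range(2):
    for features, labels in iterate_batches(dataset, batch_size=4, shuffle=True, rng=rng):
        print(epoch, labels)
```

The dataset only knows how to return a single sample. Batching and shuffling live entirely in the loader, which is why the same dataset class can be reused for both training and inference.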