Technology Sharing

Algorithm Interview Questions_Byte

2024-07-12


Question 1: Transformer matrix dimension analysis and a detailed explanation of multi-head attention:

Detailed link 1
Detailed link 2

Question 2: Transformer structure, process, dimension transformation, encoder, decoder:

How the dimensions change with multiple heads: the Q, K, and V projections are split at the input so that each head works on a slice of size embedding_size / num_heads, and the heads are concatenated back together at the end of the attention layer. This is why embedding_size must be divisible by the number of heads.
Attention: the attention weights are computed from Q and K and then applied to V to produce the weighted output (see the sketch after the links below).
Detailed Links
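
Below is a minimal PyTorch sketch of the dimension bookkeeping described above: splitting embedding_size into num_heads slices, computing the attention weights from Q and K, applying them to V, and concatenating the heads back. The class and variable names are illustrative, not from the original post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    def __init__(self, embedding_size: int, num_heads: int):
        super().__init__()
        # embedding_size must be divisible by num_heads so each head
        # gets an equal slice of size head_dim = embedding_size // num_heads.
        assert embedding_size % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embedding_size // num_heads
        self.w_q = nn.Linear(embedding_size, embedding_size)
        self.w_k = nn.Linear(embedding_size, embedding_size)
        self.w_v = nn.Linear(embedding_size, embedding_size)
        self.w_o = nn.Linear(embedding_size, embedding_size)

    def forward(self, x):
        # x: (batch, seq_len, embedding_size)
        b, t, _ = x.shape
        # Project, then split the embedding dimension across the heads:
        # (batch, seq_len, embedding_size) -> (batch, num_heads, seq_len, head_dim)
        q = self.w_q(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.w_k(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.w_v(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        # Attention weights come from Q and K, then weight V.
        scores = q @ k.transpose(-2, -1) / (self.head_dim ** 0.5)
        weights = F.softmax(scores, dim=-1)   # (batch, heads, seq, seq)
        out = weights @ v                     # (batch, heads, seq, head_dim)
        # Concatenate the heads back to embedding_size, then project.
        out = out.transpose(1, 2).contiguous().view(b, t, -1)
        return self.w_o(out)

x = torch.randn(2, 10, 512)                 # batch=2, seq_len=10, embedding_size=512
print(MultiHeadAttention(512, 8)(x).shape)  # torch.Size([2, 10, 512])
```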

Question 3: P-tuning, LoRA, and Adapter algorithms in detail:

P-tuning
LoRA (see the sketch after this list)
Adapter
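
As a quick illustration of the LoRA idea named above: the pretrained weight is frozen and only a low-rank update B·A is trained. This is a minimal sketch; the class name, rank, and scaling values are assumptions for illustration, not the post's or any library's implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # the pretrained weight stays frozen
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        # The low-rank path adds only r * (in_features + out_features) trainable parameters.
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(768, 768), r=8)
print(layer(torch.randn(4, 768)).shape)  # torch.Size([4, 768])
```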

Question 4: What are the tasks of the evaluation framework?

Question 5: What models were trained and what are the dataset issues?

Question 6: CV, deepsortv3, the history of YOLO, and the YOLO backbone

Question 7: What is the difference between DataLoader and Dataset?

ⅰ. Dataset is an abstract class; users subclass it to suit their specific needs.
https://huggingface.co/docs/datasets/loading
ⅱ. DataLoader takes the Dataset defined above and splits it into batches for subsequent training, inference, and other operations.
ⅲ. Dataset retrieves the features and label of one sample at a time. When training a model, we usually want to pass samples in mini-batches and reshuffle the data at every epoch to reduce overfitting; the shuffle argument controls whether the data is reshuffled between epochs. A minimal sketch follows below.
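
A minimal PyTorch sketch of the split described above: the Dataset subclass defines how to fetch one (feature, label) sample, and the DataLoader handles batching and per-epoch shuffling. The toy data and class name here are made up for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Subclass Dataset and describe how to fetch ONE (feature, label) sample."""
    def __init__(self, n: int = 100):
        self.features = torch.randn(n, 8)
        self.labels = torch.randint(0, 2, (n,))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

# DataLoader takes the Dataset, batches it, and (with shuffle=True)
# reshuffles the samples at the start of every epoch.
loader = DataLoader(ToyDataset(), batch_size=16, shuffle=True)
for x, y in loader:
    print(x.shape, y.shape)   # torch.Size([16, 8]) torch.Size([16])
    break
```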