Technology Sharing

[PyTorch] Using torch.fmod to initialize neural-network weights with a truncated normal distribution

2024-07-11


This code snippet shows how to initialize the weights of a neural network in PyTorch using a truncated normal distribution. A truncated normal distribution means the generated values are restricted to a fixed range, which prevents extreme initial weights. Here, torch.fmod is used as a workaround to achieve this effect.

Detailed explanation

1. Truncated Normal Distribution

The truncated normal distribution is a modification of the normal distribution that ensures the generated values stay within a fixed range. Specifically, torch.fmod returns the remainder when the input tensor is divided by 2, so every resulting value lies strictly between -2 and 2. Strictly speaking, wrapping remainders is not the same as resampling out-of-range values, but it achieves a similar bounding effect.
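A quick sketch of this bounding behavior (the standard deviation of 3.0 and the sample count are arbitrary choices for illustration):

```python
import torch

torch.manual_seed(0)

# Sample from a wide normal distribution, then wrap each value into
# (-2, 2) with torch.fmod. fmod keeps the sign of the dividend, so
# the remainders lie strictly between -2 and 2.
raw = torch.normal(0.0, 3.0, size=(10000,))
wrapped = torch.fmod(raw, 2)

print(raw.abs().max())      # the raw samples can exceed 2 in magnitude
print(wrapped.abs().max())  # the wrapped values never do
```

Note that values such as 3.5 wrap to 1.5 rather than being clipped or resampled, which is why this is only an approximation of a true truncated normal.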

2. Weight initialization

In the code, four weight tensors are generated from this truncated normal distribution, each with its own standard deviation (init_sd_first, init_sd_middle, init_sd_last). Their dimensions are:

  • The weight tensor of the first layer has shape (x_dim, width + n_double)
  • The two weight tensors of the middle layers have shape (width, width + n_double)
  • The weight tensor of the last layer has shape (width, 1)

These weight tensors are generated as follows:

initial_weights = [
    torch.fmod(torch.normal(0, init_sd_first, size=(x_dim, width + n_double)), 2),
    torch.fmod(torch.normal(0, init_sd_middle, size=(width, width + n_double)), 2),
    torch.fmod(torch.normal(0, init_sd_middle, size=(width, width + n_double)), 2),
    torch.fmod(torch.normal(0, init_sd_last, size=(width, 1)), 2)
]
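For comparison, PyTorch also ships a true truncated-normal initializer, torch.nn.init.trunc_normal_, which resamples values that fall outside [a, b] instead of wrapping them with a remainder. A minimal sketch (the 10x14 shape here is an arbitrary placeholder, not taken from the article):

```python
import torch
from torch import nn

torch.manual_seed(0)

# trunc_normal_ fills the tensor in place with samples from a normal
# distribution truncated to the interval [a, b].
w = torch.empty(10, 14)
nn.init.trunc_normal_(w, mean=0.0, std=1.0, a=-2.0, b=2.0)

print(w.min(), w.max())  # both guaranteed to lie within [-2, 2]
```

Unlike the fmod workaround, this preserves the shape of the normal density inside the interval rather than folding the tails back toward zero.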