Technology Sharing

PyTorch (Note 8: Neural Network nn)

2024-07-12


1. nn.Module

torch.nn is the module designed specifically for building deep learning models. Its core data structure is Module, an abstraction that can represent either a single layer or an entire network composed of many layers. In practice, the most common approach is to subclass nn.Module and write your own network or layer. Let's look at how to use nn.Module to implement a custom fully connected layer that computes y = xA + b.

import torch as t
import torch.nn as nn

class network(nn.Module):
    def __init__(self, input, output):
        super().__init__()
        # Weight matrix a: a trainable parameter of shape (input, output)
        self.a = nn.Parameter(t.randn(input, output))
        # Bias vector b: also a trainable parameter, of shape (output,)
        # Note: the bias length must match the output feature dimension
        self.b = nn.Parameter(t.randn(output))

    def forward(self, x):
        """
        Define the forward pass.

        Args:
            x (torch.Tensor): input data of shape (batch_size, input)

        Returns:
            torch.Tensor: output data of shape (batch_size, output)
        """
        # First, apply the linear transformation with the weight matrix a.
        # x @ self.a performs a matrix multiplication; each row of x is
        # multiplied by a, giving a result of shape (batch_size, output).
        x = x @ self.a
        # Then broadcast the bias vector b to the same shape as x and add it.
        # self.b.expand_as(x) expands b from (output,) to (batch_size, output),
        # so the bias is added to every sample's output.
        x = x + self.b.expand_as(x)
        # Return the transformed output
        return x


a = network(4, 3)
# Create input data of shape (6, 4): 6 samples with 4 features each
input = t.rand(6, 4)
# Run the forward pass through the network
output = a(input)
# Print the output; its shape should be (6, 3): 6 samples, each with 3 output features
print(output)
  • The custom layer network must inherit from nn.Module and call the nn.Module constructor in its own constructor, i.e. super().__init__() or nn.Module.__init__(self); the first form is recommended.
  • In the constructor __init__, you must define the learnable parameters and wrap them in Parameter. In this example, we wrap a and b in Parameter. A Parameter is a special Tensor that requires gradients by default (requires_grad=True).
  • The forward function implements the forward propagation process; its input can be one or more Tensors.
  • There is no need to write a backward function: nn.Module uses autograd to implement back-propagation automatically, which is much simpler than writing a Function.
  • The learnable parameters in a module can be retrieved as an iterator through named_parameters() or parameters(); the former attaches a name to each parameter, which makes it easier to identify, as shown in the sketch below.
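As a minimal sketch (reusing the network class defined above; the instance name net is only illustrative), parameters() and named_parameters() can be iterated like this:

# Instantiate the layer defined above: 4 input features, 3 output features
net = network(4, 3)

# named_parameters() yields (name, Parameter) pairs; the names come from the
# attribute names used in __init__ ("a" and "b" here)
for name, param in net.named_parameters():
    print(name, param.shape, param.requires_grad)

# parameters() yields the same Parameters without names; this is the form
# usually passed to an optimizer, e.g. torch.optim.SGD(net.parameters(), lr=0.01)
for param in net.parameters():
    print(param.shape)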

2. Common Neural Network Layers

2.1 Image-Related Layers

Image-related layers mainly include convolutional layers (Conv), pooling layers (Pool), and so on. In practice these layers come in one-dimensional (1D), two-dimensional (2D) and three-dimensional (3D) variants. Pooling methods include average pooling (AvgPool), max pooling (MaxPool), adaptive pooling (AdaptiveAvgPool), and others. Besides ordinary forward convolution, there is also transposed convolution (ConvTranspose). Examples are given below. Convolutional layers serve several purposes:

  • Feature extraction;
  • Preserving the spatial structure of the data;
  • After the convolution operation, an activation function (such as ReLU, Sigmoid, or Tanh) is usually applied to introduce nonlinear transformations. These activation functions increase the expressive power of a CNN, enabling it to learn more complex nonlinear relationships.
  • Improving computational efficiency: through the combination of convolution operations and pooling layers, the convolutional layer can reduce the spatial dimensions of the feature maps, which reduces the amount of computation and improves the model's efficiency. At the same time, the pooling layer enhances the translation invariance of the features, making the model more robust to small changes in the input data.
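As a quick reference for the layer names above, here is a minimal sketch of how they are instantiated in torch.nn (the channel counts and kernel sizes are arbitrary, chosen only for illustration):

import torch.nn as nn

conv1d = nn.Conv1d(16, 32, kernel_size=3)             # 1D convolution (e.g. sequences)
conv2d = nn.Conv2d(3, 64, kernel_size=3, padding=1)   # 2D convolution (e.g. images)
conv3d = nn.Conv3d(1, 8, kernel_size=3)               # 3D convolution (e.g. volumes/video)

maxpool = nn.MaxPool2d(kernel_size=2, stride=2)       # max pooling
avgpool = nn.AvgPool2d(kernel_size=2, stride=2)       # average pooling
adapool = nn.AdaptiveAvgPool2d(output_size=(1, 1))    # adaptive average pooling to a fixed output size

deconv = nn.ConvTranspose2d(64, 3, kernel_size=3, padding=1)  # transposed convolution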

Convolutional Layer

In deep learning, the most important network structure for image processing is the convolutional layer (Conv). A convolutional neural network is essentially a stack of convolutional layers, pooling layers, activation layers and other layers, so it is very important to understand how the convolutional layer works. The following example illustrates the convolution operation.

# Import PyTorch
import torch
import torch.nn as nn

# Import ToTensor and ToPILImage from torchvision.transforms, used to convert
# between image tensors and PIL images
from torchvision.transforms import ToTensor, ToPILImage

# Import the Image module from PIL (Pillow) for handling image files
from PIL import Image

# Open the image at the given path and convert it to a single-channel
# grayscale image with .convert("L")
img = Image.open(r"H:\PYTHON_Proj\handlearnpytorch\OIP-C.jpg").convert("L")

# Instantiate a ToTensor transform to convert the PIL image into a PyTorch tensor
to_tensor = ToTensor()

# Instantiate a ToPILImage transform to convert a PyTorch tensor back into a PIL image
to_PIL = ToPILImage()

# Convert the PIL image to a tensor and add a batch dimension with .unsqueeze(0),
# giving a shape of (1, 1, H, W)
img = to_tensor(img).unsqueeze(0)

# Build a 3x3 convolution kernel (filter): all elements are set to -1/9,
# then the center element is set to 1
kernel = torch.ones(3, 3) / (-9.0)
kernel[1][1] = 1

# Create a Conv2d layer with 1 input channel (grayscale), 1 output channel,
# a 3x3 kernel, stride 1, padding 1 (keeps the output the same size as the
# input), and no bias term
conv = nn.Conv2d(1, 1, 3, 1, 1, bias=False)

# Assign the kernel defined above to the Conv2d layer's weight; it must be
# reshaped to match the expected shape (out_channels, in_channels, kH, kW)
conv.weight.data = kernel.reshape(1, 1, 3, 3)

# Apply the convolution; img is a four-dimensional tensor, and the Conv2d
# layer returns a new four-dimensional tensor
img = conv(img)

# Convert the convolved tensor back into a PIL image, removing the batch
# dimension with .squeeze(0)
img = to_PIL(img.squeeze(0))

# Display the image with PIL's .show() method
img.show()
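For reference, with input size H x W, kernel size K, stride S and padding P, the spatial output size of Conv2d is floor((H + 2P - K) / S) + 1. In the example above (K=3, S=1, P=1) the output therefore keeps the input size. A minimal check (the 64x64 input is arbitrary):

import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1, bias=False)
x = torch.randn(1, 1, 64, 64)
print(conv(x).shape)  # torch.Size([1, 1, 64, 64]); (64 + 2*1 - 3) // 1 + 1 = 64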

Pooling Layer

The pooling layer can be viewed as a special kind of convolutional layer, used mainly for downsampling. Adding a pooling layer reduces the number of parameters while retaining the main features, which helps prevent overfitting to a certain extent. A pooling layer has no learnable parameters; its behavior is fixed. Various pooling layers are provided in the torch.nn toolbox, the most common being max pooling (MaxPool) and average pooling (AvgPool). The pooling layer plays a very important role in convolutional neural networks (CNNs). Its main uses can be summarized as follows:

  • Dimensionality reduction (reducing the amount of computation): The pooling layer reduces the amount of computation and the number of parameters in subsequent layers by reducing the spatial size (i.e., height and width) of the data. This is very helpful for preventing overfitting and speeding up calculations.
  • Feature invariance: The pooling layer enables the model to learn a more robust feature representation, that is, it is invariant to small changes in the input data (such as translation, rotation, etc.). This is because pooling operations (such as maximum pooling, average pooling, etc.) select representative features in the region instead of relying on specific location information.
  • Extracting main features: Through the pooling operation, the most important features in the image can be extracted, while ignoring some unimportant details. This is helpful for the subsequent convolutional layer to further extract high-level features.
  • Expanding the receptive field: As the number of network layers increases, the pooling layer can gradually expand the input area (i.e., receptive field) corresponding to each neuron in the subsequent layers. This helps the network learn more global feature information.
  • Reducing overfitting: Since the pooling layer reduces the number of parameters by reducing the spatial dimension of the data, this can reduce the complexity of the model to a certain extent, thus helping to prevent overfitting.

Common pooling operations include:

  • Max Pooling: Select the maximum value in the pooling window as the output. This method helps to preserve the edge and texture information of the image.
  • Average Pooling: Calculate the average of all values within the pooling window as the output. This method helps to preserve the background information of the image.
  • Stochastic Pooling: According to the value of each element in the pooling window, elements are randomly selected as output according to probability. This method combines the advantages of maximum pooling and average pooling, but the computational complexity is higher.

In short, the pooling layer is an indispensable part of the convolutional neural network. It provides important support for the learning ability and performance of the entire network by reducing the spatial dimension of the data, extracting the main features, expanding the receptive field, and preventing overfitting.
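As a minimal numeric sketch (values chosen only for illustration), here is how max pooling and average pooling differ on a 4x4 input with a 2x2 window and stride 2; the script that follows then applies average pooling to a real image:

import torch
import torch.nn as nn

# One sample, one channel, 4x4 values 0..15
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

print(max_pool(x))  # each 2x2 window keeps its maximum: [[ 5.,  7.], [13., 15.]]
print(avg_pool(x))  # each 2x2 window keeps its mean:    [[ 2.5,  4.5], [10.5, 12.5]]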

# Import PyTorch
import torch

# Import PyTorch's neural network module, used to build and train networks
import torch.nn as nn

# Import ToTensor and ToPILImage from torchvision.transforms for image
# pre-processing and post-processing
from torchvision.transforms import ToTensor, ToPILImage

# Import the Image module from PIL for opening and displaying images
from PIL import Image

# Create a ToTensor instance, which converts a PIL image or numpy.ndarray to a
# FloatTensor normalized to [0.0, 1.0]
to_tensor = ToTensor()

# Create a ToPILImage instance, which converts a Tensor or ndarray to a PIL image
to_pil = ToPILImage()

# Open the image at the given path and convert it to grayscale ('L' mode)
img = Image.open(r"H:\PYTHON_Proj\handlearnpytorch\OIP-C.jpg").convert('L')

# Display the original image
img.show()

# Convert the PIL image to a tensor of shape (1, H, W) and add a batch
# dimension with unsqueeze(0), giving (1, 1, H, W)
img = to_tensor(img).unsqueeze(0)

# Create an average pooling layer with a 2x2 window and stride 2
pool = nn.AvgPool2d(2, 2)

# Apply average pooling, then remove the batch dimension with squeeze(0),
# leaving a tensor of shape (1, H/2, W/2)
img = pool(img).squeeze(0)

# Convert the tensor back into a PIL image for display
img = to_pil(img)

# Show the pooled image
img.show()

Other Layers

In addition to convolutional layers and pooling layers, the following layers are also commonly used in deep learning:

  • Linear: fully connected layer;
  • BatchNorm: batch normalization layer, available in 1D, 2D and 3D variants. In addition to the standard BatchNorm, there is also the InstanceNorm layer, which is commonly used in style transfer;
  • Dropout: dropout layer, used to prevent overfitting, also available in 1D, 2D and 3D variants (a short instantiation sketch follows this list).
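A minimal instantiation sketch of these layers (feature and channel sizes are arbitrary, for illustration only):

import torch.nn as nn

fc = nn.Linear(128, 10)          # fully connected: 128 input features -> 10 outputs

bn1d = nn.BatchNorm1d(128)       # batch normalization over feature vectors
bn2d = nn.BatchNorm2d(64)        # batch normalization over (N, 64, H, W) feature maps
inorm = nn.InstanceNorm2d(64)    # instance normalization, common in style transfer

drop = nn.Dropout(p=0.5)         # randomly zeroes elements with probability 0.5 during training
drop2d = nn.Dropout2d(p=0.5)     # zeroes entire channels of 2D feature maps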

3. Initialization Strategy

Parameter initialization is very important in deep learning. A good initialization can make the model converge faster and reach a better optimum, while a bad one may cause the model to diverge quickly. The module parameters of nn.Module in PyTorch already use reasonable default initialization strategies, so we generally do not need to worry about them. Of course, we can also use a custom initialization instead of the system default. When we use Parameter directly, custom initialization is especially important: torch.Tensor() returns whatever values happen to be in memory (uninitialized data), which may contain extreme values that cause overflow or vanishing gradients when the network is trained. The nn.init module in PyTorch is designed specifically for initialization and implements the commonly used initialization strategies. If nn.init does not provide a particular strategy, users can also initialize parameters directly themselves.

import torch  
from torch.nn import init  
from torch import nn  
  
# Create a linear layer; its weight and bias are randomly initialized here
# (unaffected by torch.manual_seed, since this happens before the seed is set)
linear = nn.Linear(3, 4)

# Print the weight produced by the default initialization
print("Weight with default initialization:")
print(linear.weight)

# Set the random seed so that the following random number generation is reproducible
torch.manual_seed(2021)

# Re-initialize the weight with a Xavier normal distribution
# (this initialization is affected by torch.manual_seed(2021))
init.xavier_normal_(linear.weight)

# Print the re-initialized weight
print("Weight after Xavier normal initialization:")
print(linear.weight)
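When nn.init does not provide the strategy you need, the parameter tensors can also be filled in directly; a minimal sketch (the bounds and constants are arbitrary):

import torch
from torch import nn

linear = nn.Linear(3, 4)

# Modify the parameters in place without recording the operation in autograd
with torch.no_grad():
    linear.weight.uniform_(-0.1, 0.1)  # custom uniform initialization of the weight
    linear.bias.fill_(0.0)             # zero the bias

print(linear.weight)
print(linear.bias)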