Technology Sharing

Shengsi 25-day learning card camp day 12 | Simple deep learning ResNet50 image classification - Building a ResNet50 network

2024-07-12

한어Русский языкEnglishFrançaisIndonesianSanskrit日本語DeutschPortuguêsΕλληνικάespañolItalianoSuomalainenLatina

ResNet mainly solves the "degradation" problem of deep convolutional networks when the depth is deepened. In general convolutional neural networks, the first problem caused by increasing the depth of the network is gradient vanishing and explosion. This problem was successfully solved after Szegedy proposed the BN layer. The BN layer can normalize the output of each layer, so that the gradient can remain stable in size after being transmitted layer by layer in the reverse direction, and will not be too small or too large. However, the author found that it is still not easy to converge after adding BN and increasing the depth. He mentioned the second problem - the problem of decreased accuracy: when the layer is large to a certain extent, the accuracy will saturate and then drop rapidly. This drop is not caused by gradient vanishing or overfitting, but because the network is too complex, so it is difficult to achieve the ideal error rate by unconstrained free-range training alone.

The problem of decreased accuracy is not a problem with the network structure itself, but is caused by the existing training method being not ideal. Currently, the widely used optimizers, whether SGD, RMSProp, or Adam, cannot achieve the theoretically optimal convergence result when the network depth increases.

As long as there is a suitable network structure, a deeper network will definitely perform better than a shallower network. The proof is also very simple: suppose a few layers are added to the back of a network A to form a new network B. If the added layers only perform an identity mapping on the output of A, that is, the output of A does not change after passing through the added layers to become the output of B, then the error rates of network A and network B are equal, which proves that the effect of the deepened network will not be worse than that of the original network.


ResNet50 Image Classification

Image classification is the most basic computer vision application and belongs to the category of supervised learning. For example, given an image (cat, dog, airplane, car, etc.), determine the category of the image. This chapter will introduce the use of ResNet50 network to classify the CIFAR-10 dataset.

ResNet Network Introduction

The ResNet50 network was proposed by He Kaiming of Microsoft Lab in 2015 and won the first place in the ILSVRC2015 image classification competition. Before the ResNet network was proposed, traditional convolutional neural networks were obtained by stacking a series of convolutional layers and pooling layers, but when the network is stacked to a certain depth, degradation problems will occur. The figure below shows the training error and test error of a 56-layer network and a 20-layer network on the CIFAR-10 dataset. From the data in the figure, it can be seen that the training error and test error of the 56-layer network are larger than those of the 20-layer network. As the network deepens, its error does not decrease as expected.

resnet-1

The ResNet network proposed a residual network structure to alleviate the degradation problem. The ResNet network can be used to build a deeper network structure (over 1000 layers). The training error and test error graph of the ResNet network on the CIFAR-10 dataset in the paper is shown in the figure below. The dotted line in the figure represents the training error and the solid line represents the test error. It can be seen from the data in the figure that the deeper the ResNet network layer, the smaller its training error and test error.

resnet-4

Dataset preparation and loading

The CIFAR-10 dataset contains 60,000 32*32 color images, divided into 10 categories, with 6,000 images in each category. The dataset contains 50,000 training images and 10,000 evaluation images. First, the following example uses the download interface to download and decompress. Currently, only the binary version of the CIFAR-10 file (CIFAR-10 binary version) is supported.

%%capture captured_output
# 实验环境已经预装了mindspore==2.2.14,如需更换mindspore版本,可更改下面mindspore的版本号
!pip uninstall mindspore -y
!pip install -i https://pypi.mirrors.ustc.edu.cn/simple mindspore==2.2.14
%%capture captured_output
# 实验环境已经预装了mindspore==2.2.14,如需更换mindspore版本,可更改下面mindspore的版本号
!pip uninstall mindspore -y
!pip install -i https://pypi.mirrors.ustc.edu.cn/simple mindspore==2.2.14
# 查看当前 mindspore 版本
!pip show mindspore
# 查看当前 mindspore 版本
!pip show mindspore
from download import download

url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-binary.tar.gz"

download(url, "./datasets-cifar10-bin", kind="tar.gz", replace=True)
from download import download
​
url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-binary.tar.gz"
​
download(url, "./datasets-cifar10-bin", kind="tar.gz", replace=True)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22

‘./datasets-cifar10-bin’
The directory structure of the downloaded dataset is as follows:

datasets-cifar10-bin/cifar-10-batches-bin
├── batches.meta.text
├── data_batch_1.bin
├── data_batch_2.bin
├── data_batch_3.bin
├── data_batch_4.bin
├── data_batch_5.bin
├── readme.html
└── test_batch.bin
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9

insert image description here

Then, use the mindspore.dataset.Cifar10Dataset interface to load the dataset and perform relevant image enhancement operations.

import mindspore as ms
import mindspore.dataset as ds
import mindspore.dataset.vision as vision
import mindspore.dataset.transforms as transforms
from mindspore import dtype as mstype

data_dir = "./datasets-cifar10-bin/cifar-10-batches-bin"  # 数据集根目录
batch_size = 256  # 批量大小
image_size = 32  # 训练图像空间大小
workers = 4  # 并行线程个数
num_classes = 10  # 分类数量

def create_dataset_cifar10(dataset_dir, usage, resize, batch_size, workers):

    data_set = ds.Cifar10Dataset(dataset_dir=dataset_dir,
                                 usage=usage,
                                 num_parallel_workers=workers,
                                 shuffle=True)

    trans = []
    if usage == "train":
        trans += [
            vision.RandomCrop((32, 32), (4, 4, 4, 4)),
            vision.RandomHorizontalFlip(prob=0.5)
        ]

    trans += [
        vision.Resize(resize),
        vision.Rescale(1.0 / 255.0, 0.0),
        vision.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
        vision.HWC2CHW()
    ]

    target_trans = transforms.TypeCast(mstype.int32)

    # 数据映射操作
    data_set = data_set.map(operations=trans,
                            input_columns='image',
                            num_parallel_workers=workers)

    data_set = data_set.map(operations=target_trans,
                            input_columns='label',
                            num_parallel_workers=workers)

    # 批量操作
    data_set = data_set.batch(batch_size)

    return data_set


# 获取处理后的训练与测试数据集

dataset_train = create_dataset_cifar10(dataset_dir=data_dir,
                                       usage="train",
                                       resize=image_size,
                                       batch_size=batch_size,
                                       workers=workers)
step_size_train = dataset_train.get_dataset_size()

dataset_val = create_dataset_cifar10(dataset_dir=data_dir,
                                     usage="test",
                                     resize=image_size,
                                     batch_size=batch_size,
                                     workers=workers)
step_size_val = dataset_val.get_dataset_size()
import mindspore as ms
import mindspore.dataset as ds
import mindspore.dataset.vision as vision
import mindspore.dataset.transforms as transforms
from mindspore import dtype as mstype
​
data_dir = "./datasets-cifar10-bin/cifar-10-batches-bin"  # 数据集根目录
batch_size = 256  # 批量大小
image_size = 32  # 训练图像空间大小
workers = 4  # 并行线程个数
num_classes = 10  # 分类数量
​
​
def create_dataset_cifar10(dataset_dir, usage, resize, batch_size, workers):
​
    data_set = ds.Cifar10Dataset(dataset_dir=dataset_dir,
                                 usage=usage,
                                 num_parallel_workers=workers,
                                 shuffle=True)
​
    trans = []
    if usage == "train":
        trans += [
            vision.RandomCrop((32, 32), (4, 4, 4, 4)),
            vision.RandomHorizontalFlip(prob=0.5)
        ]
​
    trans += [
        vision.Resize(resize),
        vision.Rescale(1.0 / 255.0, 0.0),
        vision.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
        vision.HWC2CHW()
    ]
​
    target_trans = transforms.TypeCast(mstype.int32)
​
    # 数据映射操作
    data_set = data_set.map(operations=trans,
                            input_columns='image',
                            num_parallel_workers=workers)
​
    data_set = data_set.map(operations=target_trans,
                            input_columns='label',
                            num_parallel_workers=workers)
​
    # 批量操作
    data_set = data_set.batch(batch_size)
​
    return data_set
​
​
# 获取处理后的训练与测试数据集
​
dataset_train = create_dataset_cifar10(dataset_dir=data_dir,
                                       usage="train",
                                       resize=image_size,
                                       batch_size=batch_size,
                                       workers=workers)
step_size_train = dataset_train.get_dataset_size()
​
dataset_val = create_dataset_cifar10(dataset_dir=data_dir,
                                     usage="test",
                                     resize=image_size,
                                     batch_size=batch_size,
                                     workers=workers)
step_size_val = dataset_val.get_dataset_size()
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72
  • 73
  • 74
  • 75
  • 76
  • 77
  • 78
  • 79
  • 80
  • 81
  • 82
  • 83
  • 84
  • 85
  • 86
  • 87
  • 88
  • 89
  • 90
  • 91
  • 92
  • 93
  • 94
  • 95
  • 96
  • 97
  • 98
  • 99
  • 100
  • 101
  • 102
  • 103
  • 104
  • 105
  • 106
  • 107
  • 108
  • 109
  • 110
  • 111
  • 112
  • 113
  • 114
  • 115
  • 116
  • 117
  • 118
  • 119
  • 120
  • 121
  • 122
  • 123
  • 124
  • 125
  • 126
  • 127
  • 128
  • 129
  • 130
  • 131

Visualize the CIFAR-10 training dataset.


import matplotlib.pyplot as plt
import numpy as np

data_iter = next(dataset_train.create_dict_iterator())

images = data_iter["image"].asnumpy()
labels = data_iter["label"].asnumpy()
print(f"Image shape: {images.shape}, Label shape: {labels.shape}")

# 训练数据集中,前六张图片所对应的标签
print(f"Labels: {labels[:6]}")

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13

insert image description here

classes = []

with open(data_dir + "/batches.meta.txt", "r") as f:
    for line in f:
        line = line.rstrip()
        if line:
            classes.append(line)

# 训练数据集的前六张图片
plt.figure()
for i in range(6):
    plt.subplot(2, 3, i + 1)
    image_trans = np.transpose(images[i], (1, 2, 0))
    mean = np.array([0.4914, 0.4822, 0.4465])
    std = np.array([0.2023, 0.1994, 0.2010])
    image_trans = std * image_trans + mean
    image_trans = np.clip(image_trans, 0, 1)
    plt.title(f"{classes[labels[i]]}")
    plt.imshow(image_trans)
    plt.axis("off")
plt.show()
import matplotlib.pyplot as plt
import numpy as np
​
data_iter = next(dataset_train.create_dict_iterator())
​
images = data_iter["image"].asnumpy()
labels = data_iter["label"].asnumpy()
print(f"Image shape: {images.shape}, Label shape: {labels.shape}")
​
# 训练数据集中,前六张图片所对应的标签
print(f"Labels: {labels[:6]}")
​
classes = []with open(data_dir + "/batches.meta.txt", "r") as f:
    for line in f:
        line = line.rstrip()
        if line:
            classes.append(line)
​
# 训练数据集的前六张图片
plt.figure()
for i in range(6):
    plt.subplot(2, 3, i + 1)
    image_trans = np.transpose(images[i], (1, 2, 0))
    mean = np.array([0.4914, 0.4822, 0.4465])
    std = np.array([0.2023, 0.1994, 0.2010])
    image_trans = std * image_trans + mean
    image_trans = np.clip(image_trans, 0, 1)
    plt.title(f"{classes[labels[i]]}")
    plt.imshow(image_trans)
    plt.axis("off")
plt.show()
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54

Image shape: (256, 3, 32, 32), Label shape: (256,)
Labels: [3 2 7 6 0 4]

Build the network

The residual network structure is the main highlight of the ResNet network. ResNet can effectively alleviate the degradation problem, achieve a deeper network structure design, and improve the training accuracy of the network. This section first describes how to build a residual network structure, and then build a ResNet50 network by stacking residual networks.

Constructing residual network structure
残差网络结构图如下图所示,残差网络由两个分支构成:一个主分支,一个shortcuts(图中弧线表示)。主分支通过堆叠一系列的卷积操作得到,shotcuts从输入直接到输出,主分支输出的特征矩阵 𝐹(𝑥)
加上shortcuts输出的特征矩阵 𝑥 得到 𝐹(𝑥)+𝑥,通过Relu激活函数后即为残差网络最后的输出。

residual

There are mainly two types of residual network structures. One is Building Block, which is suitable for shallower ResNet networks, such as ResNet18 and ResNet34; the other is Bottleneck, which is suitable for deeper ResNet networks, such as ResNet50, ResNet101 and ResNet152.

Building Block

The Building Block structure is shown in the figure below. The main branch has a two-layer convolutional network structure:

The first layer of the main branch network takes the input channel of 64 as an example. First, a 3×3
The convolution layer, then the Batch Normalization layer, and finally the Relu activation function layer, with an output channel of 64;
The input channel of the second layer of the main branch network is 64, first through a 3×3
The convolution layer is then passed through the Batch Normalization layer, and the output channel is 64.
Finally, the feature matrix output by the main branch is added to the feature matrix output by shortcuts, and the final output of the Building Block is obtained through the Relu activation function.

building-block-5

When adding the feature matrices output by the main branch and shortcuts, it is necessary to ensure that the shape of the feature matrices output by the main branch and shortcuts are the same. If the shape of the feature matrices output by the main branch and shortcuts are different, such as when the output channel is twice the input channel, shortcuts need to use a convolution kernel of 1×1 with the same number of output channels for convolution operation; if the output image is twice the size of the input image, the stride in the convolution operation in shortcuts needs to be set to 2, and the stride of the first convolution operation of the main branch also needs to be set to 2.

The following code defines the ResidualBlockBase class to implement the Building Block structure.

from typing import Type, Union, List, Optional
import mindspore.nn as nn
from mindspore.common.initializer import Normal
​
# 初始化卷积层与BatchNorm的参数
weight_init = Normal(mean=0, sigma=0.02)
gamma_init = Normal(mean=1, sigma=0.02)
​
class ResidualBlockBase(nn.Cell):
    expansion: int = 1  # 最后一个卷积核数量与第一个卷积核数量相等
​
    def __init__(self, in_channel: int, out_channel: int,
                 stride: int = 1, norm: Optional[nn.Cell] = None,
                 down_sample: Optional[nn.Cell] = None) -> None:
        super(ResidualBlockBase, self).__init__()
        if not norm:
            self.norm = nn.BatchNorm2d(out_channel)
        else:
            self.norm = norm
​
        self.conv1 = nn.Conv2d(in_channel, out_channel,
                               kernel_size=3, stride=stride,
                               weight_init=weight_init)
        self.conv2 = nn.Conv2d(in_channel, out_channel,
                               kernel_size=3, weight_init=weight_init)
        self.relu = nn.ReLU()
        self.down_sample = down_sample
​
    def construct(self, x):
        """ResidualBlockBase construct."""
        identity = x  # shortcuts分支
​
        out = self.conv1(x)  # 主分支第一层:3*3卷积层
        out = self.norm(out)
        out = self.relu(out)
        out = self.conv2(out)  # 主分支第二层:3*3卷积层
        out = self.norm(out)
​
        if self.down_sample is not None:
            identity = self.down_sample(x)
        out += identity  # 输出为主分支与shortcuts之和
        out = self.relu(out)
​
        return out
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44

Bottleneck

The Bottleneck structure diagram is shown in the figure below. Under the same input, the Bottleneck structure has fewer parameters than the Building Block structure and is more suitable for networks with deeper layers. The residual structure used by ResNet50 is Bottleneck. The main branch of this structure has three layers of convolutional structures, namely 1×1
The convolutional layer, 3×3
Convolutional layer and 1×1
The convolutional layer, where 1×1
The convolutional layers of play the role of dimensionality reduction and dimensionality increase respectively.

The first layer of the main branch network takes the input channel of 256 as an example. First, the number of channels is 64 and the size is 1×1.
The convolution kernel is used for dimensionality reduction, then passes through the Batch Normalization layer, and finally passes through the Relu activation function layer, with an output channel of 64;
The number of main branch second layer network passes is 64, and the size is 3×3
The convolution kernel extracts features, then passes through the Batch Normalization layer, and finally passes through the Relu activation function layer, with an output channel of 64;
The number of passes in the third layer of the main branch is 256, and the size is 1×1
The convolution kernel is upgraded and then passed through the Batch Normalization layer, with an output channel of 256.
Finally, the feature matrix output by the main branch is added to the feature matrix output by shortcuts, and the final output of Bottleneck is obtained through the Relu activation function.

building-block-6

When adding the feature matrices output by the main branch and shortcuts, it is necessary to ensure that the shape of the feature matrices output by the main branch and shortcuts are the same. If the shape of the feature matrices output by the main branch and shortcuts are different, such as when the output channel is twice the input channel, the number of channels used on shortcuts must be equal to the output channel, and the size is 1×1
convolution operation is performed with the convolution kernel of ; if the output image is half the size of the input image, the stride in the convolution operation in shortcuts should be set to 2, and the stride of the second convolution operation of the main branch should also be set to 2.

The following code defines the ResidualBlock class to implement the Bottleneck structure.

class ResidualBlock(nn.Cell):
    expansion = 4  # 最后一个卷积核的数量是第一个卷积核数量的4倍
​
    def __init__(self, in_channel: int, out_channel: int,
                 stride: int = 1, down_sample: Optional[nn.Cell] = None) -> None:
        super(ResidualBlock, self).__init__()
​
        self.conv1 = nn.Conv2d(in_channel, out_channel,
                               kernel_size=1, weight_init=weight_init)
        self.norm1 = nn.BatchNorm2d(out_channel)
        self.conv2 = nn.Conv2d(out_channel, out_channel,
                               kernel_size=3, stride=stride,
                               weight_init=weight_init)
        self.norm2 = nn.BatchNorm2d(out_channel)
        self.conv3 = nn.Conv2d(out_channel, out_channel * self.expansion,
                               kernel_size=1, weight_init=weight_init)
        self.norm3 = nn.BatchNorm2d(out_channel * self.expansion)
​
        self.relu = nn.ReLU()
        self.down_sample = down_sample
​
    def construct(self, x):
​
        identity = x  # shortscuts分支
​
        out = self.conv1(x)  # 主分支第一层:1*1卷积层
        out = self.norm1(out)
        out = self.relu(out)
        out = self.conv2(out)  # 主分支第二层:3*3卷积层
        out = self.norm2(out)
        out = self.relu(out)
        out = self.conv3(out)  # 主分支第三层:1*1卷积层
        out = self.norm3(out)
​
        if self.down_sample is not None:
            identity = self.down_sample(x)
​
        out += identity  # 输出为主分支与shortcuts之和
        out = self.relu(out)
​
        return out
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41

Build ResNet50 network

The ResNet network layer structure is shown in the figure below. Taking the input color image 224×224 as an example, it first passes through a convolutional layer conv1 with 64 convolution kernels of size 7×7 and stride 2. The output image size of this layer is 112×112 and the output channel is 64; then it passes through a 3×3 maximum downsampling pooling layer, the output image size of this layer is 56×56 and the output channel is 64; then 4 residual network blocks (conv2_x, conv3_x, conv4_x and conv5_x) are stacked, the output image size is 7×7 and the output channel is 2048; finally, through an average pooling layer, a fully connected layer and softmax, the classification probability is obtained.

For each residual network block, taking conv2_x in the ResNet50 network as an example, it is composed of 3 stacked Bottleneck structures, each Bottleneck input channel is 64, and the output channel is 256.

The following example defines make_layer to build a residual block, and its parameters are as follows:

  • last_out_channel: The number of channels of the previous residual network output.
  • block: The category of the residual network, which are ResidualBlockBase and ResidualBlock.
  • channel: The number of channels of the residual network input.
  • block_nums: The number of residual network blocks stacked.
  • stride: The stride of the convolution movement.
def make_layer(last_out_channel, block: Type[Union[ResidualBlockBase, ResidualBlock]],
               channel: int, block_nums: int, stride: int = 1):
    down_sample = None  # shortcuts分支
​
    if stride != 1 or last_out_channel != channel * block.expansion:
​
        down_sample = nn.SequentialCell([
            nn.Conv2d(last_out_channel, channel * block.expansion,
                      kernel_size=1, stride=stride, weight_init=weight_init),
            nn.BatchNorm2d(channel * block.expansion, gamma_init=gamma_init)
        ])
​
    layers = []
    layers.append(block(last_out_channel, channel, stride=stride, down_sample=down_sample))
​
    in_channel = channel * block.expansion
    # 堆叠残差网络
    for _ in range(1, block_nums):
​
        layers.append(block(in_channel, channel))
​
    return nn.SequentialCell(layers)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22

The ResNet50 network has a total of 5 convolutional structures, an average pooling layer, and a fully connected layer. Take the CIFAR-10 dataset as an example:

conv1: The input image size is 32×32 and the input channel is 3. First, it passes through a convolution layer with 64 convolution kernels, a convolution kernel size of 7×7 and a stride of 2; then it passes through a Batch Normalization layer; and finally passes through a Reul activation function. The output feature map size of this layer is 16×16 and the output channel is 64.

conv2_x: The input feature map size is 16×16 and the input channel is 64. First, it undergoes a maximum downsampling pooling operation with a convolution kernel size of 3×3 and a stride of 2; then 3 bottles of [1×1, 64; 3×3, 64; 1×1, 256] structures are stacked. The output feature map size of this layer is 8×8 and the output channel is 256.

conv3_x: The input feature map size is 8×8 and the input channel is 256. This layer stacks 4 bottles of [1×1, 128; 3×3, 128; 1×1, 512] structure. The output feature map size of this layer is 4×4 and the output channel is 512.

conv4_x: The input feature map size is 4×4 and the input channel is 512. This layer stacks 6 bottles of [1×1, 256; 3×3, 256; 1×1, 1024] structure. The output feature map size of this layer is 2×2 and the output channel is 1024.

conv5_x: The input feature map size is 2×2 and the input channel is 1024. This layer stacks 3 bottles of [1×1, 512; 3×3, 512; 1×1, 2048] structure. The output feature map size of this layer is 1×1 and the output channel is 2048.

average pool & fc: The input channel is 2048, and the output channel is the number of classified categories.

The following sample code implements the construction of the ResNet50 model. The ResNet50 model can be constructed by calling the resnet50 function. The parameters of the resnet50 function are as follows:

  • num_classes: The number of categories for classification. The default number of categories is 1000.
  • pretrained: Download the corresponding training model and load the parameters in the pretrained model into the network.
from mindspore import load_checkpoint, load_param_into_net
​
​
class ResNet(nn.Cell):
    def __init__(self, block: Type[Union[ResidualBlockBase, ResidualBlock]],
                 layer_nums: List[int], num_classes: int, input_channel: int) -> None:
        super(ResNet, self).__init__()
​
        self.relu = nn.ReLU()
        # 第一个卷积层,输入channel为3(彩色图像),输出channel为64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, weight_init=weight_init)
        self.norm = nn.BatchNorm2d(64)
        # 最大池化层,缩小图片的尺寸
        self.max_pool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same')
        # 各个残差网络结构块定义
        self.layer1 = make_layer(64, block, 64, layer_nums[0])
        self.layer2 = make_layer(64 * block.expansion, block, 128, layer_nums[1], stride=2)
        self.layer3 = make_layer(128 * block.expansion, block, 256, layer_nums[2], stride=2)
        self.layer4 = make_layer(256 * block.expansion, block, 512, layer_nums[3], stride=2)
        # 平均池化层
        self.avg_pool = nn.AvgPool2d()
        # flattern层
        self.flatten = nn.Flatten()
        # 全连接层
        self.fc = nn.Dense(in_channels=input_channel, out_channels=num_classes)
​
    def construct(self, x):
​
        x = self.conv1(x)
        x = self.norm(x)
        x = self.relu(x)
        x = self.max_pool(x)
​
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
​
        x = self.avg_pool(x)
        x = self.flatten(x)
        x = self.fc(x)
​
        return x
def _resnet(model_url: str, block: Type[Union[ResidualBlockBase, ResidualBlock]],
            layers: List[int], num_classes: int, pretrained: bool, pretrained_ckpt: str,
            input_channel: int):
    model = ResNet(block, layers, num_classes, input_channel)
​
    if pretrained:
        # 加载预训练模型
        download(url=model_url, path=pretrained_ckpt, replace=True)
        param_dict = load_checkpoint(pretrained_ckpt)
        load_param_into_net(model, param_dict)
​
    return model
​
​
def resnet50(num_classes: int = 1000, pretrained: bool = False):
    """ResNet50模型"""
    resnet50_url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/models/application/resnet50_224_new.ckpt"
    resnet50_ckpt = "./LoadPretrainedModel/resnet50_224_new.ckpt"
    return _resnet(resnet50_url, ResidualBlock, [3, 4, 6, 3], num_classes,
                   pretrained, resnet50_ckpt, 2048)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63

insert image description here

Model training and evaluation

This section uses the ResNet50 pre-trained model for fine-tuning. Call resnet50 to construct the ResNet50 model and set the pretrained parameter to True. The ResNet50 pre-trained model will be automatically downloaded and the parameters in the pre-trained model will be loaded into the network. Then define the optimizer and loss function, print the training loss value and evaluation accuracy epoch by epoch, and save the ckpt file with the highest evaluation accuracy (resnet50-best.ckpt) to ./BestCheckPoint in the current path.

Since the output size of the fully connected layer (fc) of the pre-trained model (corresponding to the parameter num_classes) is 1000, in order to successfully load the pre-trained weights, we set the model's fully connected output size to the default 1000. The CIFAR10 dataset has 10 categories. When using this dataset for training, the output size of the fully connected layer of the model with the pre-trained weights loaded needs to be reset to 10.

Here we show the training process of 5 epochs. If you want to achieve the ideal training effect, it is recommended to train for 80 epochs.

# 定义ResNet50网络
network = resnet50(pretrained=True)
​
# 全连接层输入层的大小
in_channel = network.fc.in_channels
fc = nn.Dense(in_channels=in_channel, out_channels=10)
# 重置全连接层
network.fc = fc
# 设置学习率
num_epochs = 5
lr = nn.cosine_decay_lr(min_lr=0.00001, max_lr=0.001, total_step=step_size_train * num_epochs,
                        step_per_epoch=step_size_train, decay_epoch=num_epochs)
# 定义优化器和损失函数
opt = nn.Momentum(params=network.trainable_params(), learning_rate=lr, momentum=0.9)
loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
​
​
def forward_fn(inputs, targets):
    logits = network(inputs)
    loss = loss_fn(logits, targets)
    return loss
​
​
grad_fn = ms.value_and_grad(forward_fn, None, opt.parameters)
​
​
def train_step(inputs, targets):
    loss, grads = grad_fn(inputs, targets)
    opt(grads)
    return loss
import os
​
# 创建迭代器
data_loader_train = dataset_train.create_tuple_iterator(num_epochs=num_epochs)
data_loader_val = dataset_val.create_tuple_iterator(num_epochs=num_epochs)
​
# 最佳模型存储路径
best_acc = 0
best_ckpt_dir = "./BestCheckpoint"
best_ckpt_path = "./BestCheckpoint/resnet50-best.ckpt"
​
if not os.path.exists(best_ckpt_dir):
    os.mkdir(best_ckpt_dir)
import mindspore.ops as ops
​
​
def train(data_loader, epoch):
    """模型训练"""
    losses = []
    network.set_train(True)
​
    for i, (images, labels) in enumerate(data_loader):
        loss = train_step(images, labels)
        if i % 100 == 0 or i == step_size_train - 1:
            print('Epoch: [%3d/%3d], Steps: [%3d/%3d], Train Loss: [%5.3f]' %
                  (epoch + 1, num_epochs, i + 1, step_size_train, loss))
        losses.append(loss)
​
    return sum(losses) / len(losses)
​
​
def evaluate(data_loader):
    """模型验证"""
    network.set_train(False)
​
    correct_num = 0.0  # 预测正确个数
    total_num = 0.0  # 预测总数
​
    for images, labels in data_loader:
        logits = network(images)
        pred = logits.argmax(axis=1)  # 预测结果
        correct = ops.equal(pred, labels).reshape((-1, ))
        correct_num += correct.sum().asnumpy()
        total_num += correct.shape[0]
​
    acc = correct_num / total_num  # 准确率
​
    return acc
# 开始循环训练
print("Start Training Loop ...")
​
for epoch in range(num_epochs):
    curr_loss = train(data_loader_train, epoch)
    curr_acc = evaluate(data_loader_val)
​
    print("-" * 50)
    print("Epoch: [%3d/%3d], Average Train Loss: [%5.3f], Accuracy: [%5.3f]" % (
        epoch+1, num_epochs, curr_loss, curr_acc
    ))
    print("-" * 50)
​
    # 保存当前预测准确率最高的模型
    if curr_acc > best_acc:
        best_acc = curr_acc
        ms.save_checkpoint(network, best_ckpt_path)
​
print("=" * 80)
print(f"End of validation the best Accuracy is: {best_acc: 5.3f}, "
      f"save the best ckpt file in {best_ckpt_path}", flush=True)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72
  • 73
  • 74
  • 75
  • 76
  • 77
  • 78
  • 79
  • 80
  • 81
  • 82
  • 83
  • 84
  • 85
  • 86
  • 87
  • 88
  • 89
  • 90
  • 91
  • 92
  • 93
  • 94
  • 95
  • 96
  • 97
  • 98
  • 99
Start Training Loop ...
Epoch: [  1/  5], Steps: [  1/196], Train Loss: [2.389]
Epoch: [  1/  5], Steps: [101/196], Train Loss: [1.467]
Epoch: [  1/  5], Steps: [196/196], Train Loss: [1.093]
--------------------------------------------------
Epoch: [  1/  5], Average Train Loss: [1.641], Accuracy: [0.595]
--------------------------------------------------
Epoch: [  2/  5], Steps: [  1/196], Train Loss: [1.253]
Epoch: [  2/  5], Steps: [101/196], Train Loss: [0.974]
Epoch: [  2/  5], Steps: [196/196], Train Loss: [0.832]
--------------------------------------------------
Epoch: [  2/  5], Average Train Loss: [1.019], Accuracy: [0.685]
--------------------------------------------------
Epoch: [  3/  5], Steps: [  1/196], Train Loss: [0.917]
Epoch: [  3/  5], Steps: [101/196], Train Loss: [0.879]
Epoch: [  3/  5], Steps: [196/196], Train Loss: [0.743]
--------------------------------------------------
Epoch: [  3/  5], Average Train Loss: [0.852], Accuracy: [0.721]
--------------------------------------------------
Epoch: [  4/  5], Steps: [  1/196], Train Loss: [0.911]
Epoch: [  4/  5], Steps: [101/196], Train Loss: [0.703]
Epoch: [  4/  5], Steps: [196/196], Train Loss: [0.768]
--------------------------------------------------
Epoch: [  4/  5], Average Train Loss: [0.777], Accuracy: [0.737]
--------------------------------------------------
Epoch: [  5/  5], Steps: [  1/196], Train Loss: [0.793]
Epoch: [  5/  5], Steps: [101/196], Train Loss: [0.809]
Epoch: [  5/  5], Steps: [196/196], Train Loss: [0.734]
--------------------------------------------------
Epoch: [  5/  5], Average Train Loss: [0.745], Accuracy: [0.742]
--------------------------------------------------
================================================================================
End of validation the best Accuracy is:  0.742, save the best ckpt file in ./BestCheckpoint/resnet50-best.ckpt
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33

insert image description here

Visualizing Model Predictions

Define the visualize_model function, use the model with the highest verification accuracy to predict the CIFAR-10 test dataset, and visualize the prediction results. If the predicted font color is blue, it means the prediction is correct, and if the predicted font color is red, it means the prediction is wrong.

From the above results, we can see that the prediction accuracy of the model in the validation data set is about 70% under 5 epochs, that is, under normal circumstances, 2 out of 6 pictures will fail to be predicted. If you want to achieve the ideal training effect, it is recommended to train for 80 epochs.

import matplotlib.pyplot as plt


def visualize_model(best_ckpt_path, dataset_val):
    num_class = 10  # 对狼和狗图像进行二分类
    net = resnet50(num_class)
    # 加载模型参数
    param_dict = ms.load_checkpoint(best_ckpt_path)
    ms.load_param_into_net(net, param_dict)
    # 加载验证集的数据进行验证
    data = next(dataset_val.create_dict_iterator())
    images = data["image"]
    labels = data["label"]
    # 预测图像类别
    output = net(data['image'])
    pred = np.argmax(output.asnumpy(), axis=1)

    # 图像分类
    classes = []

    with open(data_dir + "/batches.meta.txt", "r") as f:
        for line in f:
            line = line.rstrip()
            if line:
                classes.append(line)

    # 显示图像及图像的预测值
    plt.figure()
    for i in range(6):
        plt.subplot(2, 3, i + 1)
        # 若预测正确,显示为蓝色;若预测错误,显示为红色
        color = 'blue' if pred[i] == labels.asnumpy()[i] else 'red'
        plt.title('predict:{}'.format(classes[pred[i]]), color=color)
        picture_show = np.transpose(images.asnumpy()[i], (1, 2, 0))
        mean = np.array([0.4914, 0.4822, 0.4465])
        std = np.array([0.2023, 0.1994, 0.2010])
        picture_show = std * picture_show + mean
        picture_show = np.clip(picture_show, 0, 1)
        plt.imshow(picture_show)
        plt.axis('off')

    plt.show()


# 使用测试数据集进行验证
visualize_model(best_ckpt_path=best_ckpt_path, dataset_val=dataset_val)
import matplotlib.pyplot as plt
​
​
def visualize_model(best_ckpt_path, dataset_val):
    num_class = 10  # 对狼和狗图像进行二分类
    net = resnet50(num_class)
    # 加载模型参数
    param_dict = ms.load_checkpoint(best_ckpt_path)
    ms.load_param_into_net(net, param_dict)
    # 加载验证集的数据进行验证
    data = next(dataset_val.create_dict_iterator())
    images = data["image"]
    labels = data["label"]
    # 预测图像类别
    output = net(data['image'])
    pred = np.argmax(output.asnumpy(), axis=1)
​
    # 图像分类
    classes = []
​
    with open(data_dir + "/batches.meta.txt", "r") as f:
        for line in f:
            line = line.rstrip()
            if line:
                classes.append(line)
​
    # 显示图像及图像的预测值
    plt.figure()
    for i in range(6):
        plt.subplot(2, 3, i + 1)
        # 若预测正确,显示为蓝色;若预测错误,显示为红色
        color = 'blue' if pred[i] == labels.asnumpy()[i] else 'red'
        plt.title('predict:{}'.format(classes[pred[i]]), color=color)
        picture_show = np.transpose(images.asnumpy()[i], (1, 2, 0))
        mean = np.array([0.4914, 0.4822, 0.4465])
        std = np.array([0.2023, 0.1994, 0.2010])
        picture_show = std * picture_show + mean
        picture_show = np.clip(picture_show, 0, 1)
        plt.imshow(picture_show)
        plt.axis('off')
​
    plt.show()
​
​
# 使用测试数据集进行验证
visualize_model(best_ckpt_path=best_ckpt_path, dataset_val=dataset_val)

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72
  • 73
  • 74
  • 75
  • 76
  • 77
  • 78
  • 79
  • 80
  • 81
  • 82
  • 83
  • 84
  • 85
  • 86
  • 87
  • 88
  • 89
  • 90
  • 91
  • 92
  • 93

insert image description here