Shengsi (MindSpore) 25-Day Learning Check-in Camp, Day 12 | Simple Deep Learning: ResNet50 Image Classification - Building the ResNet50 Network

2024-07-12

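ResNet mainly addresses the "degradation" problem that deep convolutional networks show as their depth grows. In a plain convolutional neural network, the first problem caused by increasing depth is vanishing and exploding gradients. This problem was largely resolved after Ioffe and Szegedy proposed the batch normalization (BN) layer: BN normalizes the output of each layer, so the gradient keeps a stable magnitude as it propagates backward layer by layer and becomes neither too small nor too large. However, the ResNet authors found that even with BN, a deeper network is still hard to train to convergence, and they identified a second problem: accuracy degradation. Once the depth passes a certain point, accuracy saturates and then drops. This drop is not caused by overfitting; rather, the network has become so complex that unconstrained, free-form training alone can hardly reach the ideal error rate.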

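The accuracy degradation problem is not a flaw of the network structure itself; it comes from the limits of current training methods. None of the widely used optimizers, whether SGD, RMSProp, or Adam, reaches the theoretically optimal convergence result once the network becomes very deep.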

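Given a suitable network structure, a deeper network should perform at least as well as a shallower one. The argument is simple. Suppose a few layers are appended to network A to form a new network B. If the added layers perform only an identity mapping on A's output, that is, A's output passes through the new layers unchanged and becomes B's output, then networks A and B have the same error rate. This shows that the deepened network need not be worse than the network before deepening.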

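ResNet50 Image Classification

Image classification is the most basic computer vision application and belongs to supervised learning: given an image (a cat, a dog, an airplane, a car, and so on), determine the category it belongs to. This chapter introduces how to classify the CIFAR-10 dataset with the ResNet50 network.

Introduction to the ResNet Network

ResNet, of which ResNet50 is one variant, was proposed in 2015 by Kaiming He and colleagues at Microsoft Research and won first place in the ILSVRC 2015 image classification competition. Before ResNet, traditional convolutional neural networks were built by stacking a series of convolutional and pooling layers, but once such a network is stacked to a certain depth, the degradation problem appears. The figure below plots the training error and test error of a 56-layer network and a 20-layer network on the CIFAR-10 dataset. Both the training error and the test error of the 56-layer network are higher than those of the 20-layer network: as the network gets deeper, the error does not decrease as expected.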

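[Figure: resnet-1]

ResNet proposes the residual network structure (Residual Network) to mitigate the degradation problem. With it, much deeper networks (more than 1,000 layers) can be built. The figure below shows the training error and test error of the ResNet networks used in the paper on the CIFAR-10 dataset; dashed lines are training error and solid lines are test error. The data in the figure shows that the deeper the ResNet network, the smaller its training error and test error.

[Figure: resnet-4]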

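Preparing and Loading the Dataset

The CIFAR-10 dataset contains 60,000 32×32 color images in 10 categories, with 6,000 images per category; 50,000 are training images and 10,000 are evaluation images. First, the following example uses the download interface to download and extract the dataset. Currently only the binary version of the CIFAR-10 files (CIFAR-10 binary version) is supported.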

%%capture captured_output
# MindSpore 2.2.14 is preinstalled in the lab environment; to use another version, change the version number below
!pip uninstall mindspore -y
!pip install -i https://pypi.mirrors.ustc.edu.cn/simple mindspore==2.2.14

# Show the installed MindSpore version
!pip show mindspore

from download import download

url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-binary.tar.gz"

download(url, "./datasets-cifar10-bin", kind="tar.gz", replace=True)

'./datasets-cifar10-bin'

The directory structure of the downloaded dataset is as follows:

datasets-cifar10-bin/cifar-10-batches-bin
├── batches.meta.txt
├── data_batch_1.bin
├── data_batch_2.bin
├── data_batch_3.bin
├── data_batch_4.bin
├── data_batch_5.bin
├── readme.html
└── test_batch.bin
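
For orientation, here is a minimal sketch of how a single record in the .bin files listed above is laid out. It assumes the standard CIFAR-10 binary layout described by the dataset's authors: 1 label byte followed by 3,072 pixel bytes (1,024 red, then 1,024 green, then 1,024 blue, each in row-major order). The helper name parse_cifar10_record is hypothetical and the record below is synthetic; in the tutorial itself, Cifar10Dataset does this parsing for you.

```python
RECORD_BYTES = 1 + 3 * 32 * 32  # 1 label byte + 3072 pixel bytes


def parse_cifar10_record(record: bytes):
    """Split one 3073-byte record into (label, red, green, blue) planes."""
    if len(record) != RECORD_BYTES:
        raise ValueError(f"expected {RECORD_BYTES} bytes, got {len(record)}")
    label = record[0]
    plane = 32 * 32
    red = record[1:1 + plane]
    green = record[1 + plane:1 + 2 * plane]
    blue = record[1 + 2 * plane:1 + 3 * plane]
    return label, red, green, blue


# Synthetic record for illustration only (not real CIFAR-10 data).
fake = bytes([3]) + bytes([10]) * 1024 + bytes([20]) * 1024 + bytes([30]) * 1024
label, red, green, blue = parse_cifar10_record(fake)
print(label, red[0], green[0], blue[0])
```

Under that layout, each data_batch_*.bin file is a plain concatenation of such records, so a file can be read in fixed 3,073-byte steps.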

Then use the mindspore.dataset.Cifar10Dataset interface to load the dataset and apply the relevant image augmentation operations.

import mindspore as ms
import mindspore.dataset as ds
import mindspore.dataset.vision as vision
import mindspore.dataset.transforms as transforms
from mindspore import dtype as mstype

data_dir = "./datasets-cifar10-bin/cifar-10-batches-bin"  # dataset root directory
batch_size = 256  # batch size
image_size = 32  # spatial size of the training images
workers = 4  # number of parallel workers
num_classes = 10  # number of classes

def create_dataset_cifar10(dataset_dir, usage, resize, batch_size, workers):

    data_set = ds.Cifar10Dataset(dataset_dir=dataset_dir,
                                 usage=usage,
                                 num_parallel_workers=workers,
                                 shuffle=True)

    trans = []
    if usage == "train":
        trans += [
            vision.RandomCrop((32, 32), (4, 4, 4, 4)),
            vision.RandomHorizontalFlip(prob=0.5)
        ]

    trans += [
        vision.Resize(resize),
        vision.Rescale(1.0 / 255.0, 0.0),
        vision.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
        vision.HWC2CHW()
    ]

    target_trans = transforms.TypeCast(mstype.int32)

    # Map the transforms onto the dataset columns
    data_set = data_set.map(operations=trans,
                            input_columns='image',
                            num_parallel_workers=workers)

    data_set = data_set.map(operations=target_trans,
                            input_columns='label',
                            num_parallel_workers=workers)

    # Batch the dataset
    data_set = data_set.batch(batch_size)

    return data_set


# Build the processed training and test datasets

dataset_train = create_dataset_cifar10(dataset_dir=data_dir,
                                       usage="train",
                                       resize=image_size,
                                       batch_size=batch_size,
                                       workers=workers)
step_size_train = dataset_train.get_dataset_size()

dataset_val = create_dataset_cifar10(dataset_dir=data_dir,
                                     usage="test",
                                     resize=image_size,
                                     batch_size=batch_size,
                                     workers=workers)
step_size_val = dataset_val.get_dataset_size()
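
To make the per-image transforms in create_dataset_cifar10 above more concrete, the sketch below mirrors Rescale, Normalize, and HWC2CHW in plain NumPy, together with the inverse needed to display an image again. This is an illustration under the assumption that Normalize computes (x - mean) / std per channel; the function names preprocess and deprocess are hypothetical, and the random image is synthetic.

```python
import numpy as np

MEAN = np.array([0.4914, 0.4822, 0.4465], dtype=np.float32)
STD = np.array([0.2023, 0.1994, 0.2010], dtype=np.float32)


def preprocess(image_hwc_uint8):
    """Rescale to [0, 1], normalize per channel, then move channels first."""
    x = image_hwc_uint8.astype(np.float32) / 255.0  # like Rescale(1/255, 0)
    x = (x - MEAN) / STD                            # like Normalize(mean, std)
    return np.transpose(x, (2, 0, 1))               # like HWC2CHW


def deprocess(image_chw):
    """Invert preprocess for display: back to HWC values in [0, 1]."""
    x = np.transpose(image_chw, (1, 2, 0))
    return np.clip(x * STD + MEAN, 0.0, 1.0)


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # synthetic image
out = preprocess(img)
print(out.shape)
```

The deprocess step is the same "multiply by std, add mean, clip" sequence that the visualization code in this article applies before calling imshow.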

Visualize the CIFAR-10 training dataset.


import matplotlib.pyplot as plt
import numpy as np

data_iter = next(dataset_train.create_dict_iterator())

images = data_iter["image"].asnumpy()
labels = data_iter["label"].asnumpy()
print(f"Image shape: {images.shape}, Label shape: {labels.shape}")

# Labels of the first six images in the training dataset
print(f"Labels: {labels[:6]}")

classes = []

with open(data_dir + "/batches.meta.txt", "r") as f:
    for line in f:
        line = line.rstrip()
        if line:
            classes.append(line)

# Show the first six images of the training dataset
plt.figure()
for i in range(6):
    plt.subplot(2, 3, i + 1)
    image_trans = np.transpose(images[i], (1, 2, 0))
    mean = np.array([0.4914, 0.4822, 0.4465])
    std = np.array([0.2023, 0.1994, 0.2010])
    image_trans = std * image_trans + mean
    image_trans = np.clip(image_trans, 0, 1)
    plt.title(f"{classes[labels[i]]}")
    plt.imshow(image_trans)
    plt.axis("off")
plt.show()

Image shape: (256, 3, 32, 32), Label shape: (256,)
Labels: [3 2 7 6 0 4]

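Building the Network

The residual network structure (Residual Network) is the main feature of ResNet. It effectively mitigates the degradation problem, makes deeper network designs possible, and improves training accuracy. This section first explains how to build the residual network structure and then stacks residual blocks to build the ResNet50 network.

Building the Residual Network Structure

The residual network structure is shown in the figure below. It consists of two branches: a main branch and a shortcut (the arc in the figure). The main branch is obtained by stacking a series of convolution operations, while the shortcut goes directly from the input to the output. The feature matrix F(x) output by the main branch is added to the feature matrix x carried by the shortcut to give F(x) + x, which after the ReLU activation function becomes the final output of the residual structure.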
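
The following toy sketch illustrates the F(x) + x idea in plain Python. It is not the MindSpore implementation: the main branch F is reduced to a single linear map (an assumption made for brevity), and the names relu, linear, and residual_block are hypothetical. It also shows the identity-mapping argument from the introduction: when the main branch outputs zero, a non-negative input passes through unchanged.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def linear(values, weights):
    """Multiply a vector by a square weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(x, weights):
    """Return ReLU(F(x) + x), with F(x) a single linear map for brevity."""
    fx = linear(x, weights)                      # main branch F(x)
    return relu([f + s for f, s in zip(fx, x)])  # add shortcut x, then ReLU


x = [0.5, 1.0, 2.0]
zero_weights = [[0.0] * 3 for _ in range(3)]
identity_weights = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

print(residual_block(x, zero_weights))      # F(x) = 0, so x passes through
print(residual_block(x, identity_weights))  # F(x) = x, so the output is 2x
```

Because the block only has to learn the residual F(x) rather than the full mapping, driving F toward zero is enough to recover an identity layer.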

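There are two main types of residual network structure. One is the Building Block, suited to shallower ResNet networks such as ResNet18 and ResNet34; the other is the Bottleneck, suited to deeper ResNet networks such as ResNet50, ResNet101, and ResNet152.

Building Block

The Building Block structure is shown in the figure below. Its main branch has a two-layer convolutional structure:

The first layer of the main branch, taking 64 input channels as an example, passes through a 3×3 convolutional layer, then a Batch Normalization layer, and finally a ReLU activation layer; the output has 64 channels.
The second layer of the main branch takes 64 input channels and passes through a 3×3 convolutional layer and then a Batch Normalization layer; the output has 64 channels.
Finally, the feature matrix output by the main branch is added to the feature matrix output by the shortcut, and the ReLU activation function gives the final output of the Building Block.

[Figure: building-block-5]

When the feature matrices of the main branch and the shortcut are added, their shapes must be identical. If the shapes differ, for example because the output channel count is twice the input channel count, the shortcut must apply a 1×1 convolution whose number of kernels equals the output channel count. If the output image is half the size of the input image, the stride of the shortcut convolution must be set to 2, and the stride of the first convolution in the main branch must also be set to 2.

The following code defines the ResidualBlockBase class to implement the Building Block structure.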

from typing import Type, Union, List, Optional
import mindspore.nn as nn
from mindspore.common.initializer import Normal

# Initializers for the convolution and BatchNorm parameters
weight_init = Normal(mean=0, sigma=0.02)
gamma_init = Normal(mean=1, sigma=0.02)

class ResidualBlockBase(nn.Cell):
    expansion: int = 1  # the last convolution has as many kernels as the first

    def __init__(self, in_channel: int, out_channel: int,
                 stride: int = 1, norm: Optional[nn.Cell] = None,
                 down_sample: Optional[nn.Cell] = None) -> None:
        super(ResidualBlockBase, self).__init__()
        if not norm:
            self.norm = nn.BatchNorm2d(out_channel)
        else:
            self.norm = norm

        self.conv1 = nn.Conv2d(in_channel, out_channel,
                               kernel_size=3, stride=stride,
                               weight_init=weight_init)
        # conv2 consumes the output of conv1, so its input has out_channel channels
        self.conv2 = nn.Conv2d(out_channel, out_channel,
                               kernel_size=3, weight_init=weight_init)
        self.relu = nn.ReLU()
        self.down_sample = down_sample

    def construct(self, x):
        """ResidualBlockBase construct."""
        identity = x  # shortcut branch

        out = self.conv1(x)  # main branch, first layer: 3x3 convolution
        out = self.norm(out)
        out = self.relu(out)
        out = self.conv2(out)  # main branch, second layer: 3x3 convolution
        out = self.norm(out)

        if self.down_sample is not None:
            identity = self.down_sample(x)
        out += identity  # output is the sum of the main branch and the shortcut
        out = self.relu(out)

        return out

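Bottleneck

The Bottleneck structure is shown in the figure below. For the same input, the Bottleneck structure has fewer parameters than the Building Block structure, and it is the residual structure used by ResNet50. Its main branch has three convolutional layers: a 1×1 convolutional layer, a 3×3 convolutional layer, and another 1×1 convolutional layer, where the two 1×1 convolutional layers reduce and then restore the channel dimension, respectively.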
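
As a quick check of the parameter claim above, the sketch below counts convolution weights for a 256-channel input. It assumes bias-free convolutions and ignores the BatchNorm parameters; the Building Block is taken as two 3×3 convolutions at 256 channels and the Bottleneck as the 1×1, 3×3, 1×1 sequence with a width of 64. The helper conv_params is a hypothetical name.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weight count of a kernel x kernel convolution without bias."""
    return kernel * kernel * in_ch * out_ch


channels = 256

# Building Block: two 3x3 convolutions that keep 256 channels.
building_block = 2 * conv_params(3, channels, channels)

# Bottleneck: 1x1 reduce to 64, 3x3 at 64, 1x1 restore to 256.
bottleneck = (conv_params(1, channels, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, channels))

print(building_block, bottleneck)  # 1179648 69632
```

Under these assumptions the Bottleneck needs roughly 17 times fewer convolution weights for the same 256-channel input and output.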

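The first layer of the main branch, taking 256 input channels as an example, first applies 64 convolution kernels of size 1×1 to reduce the dimension, then a Batch Normalization layer, and finally a ReLU activation layer; the output has 64 channels.
The second layer of the main branch applies 64 convolution kernels of size 3×3 to extract features, then a Batch Normalization layer, and finally a ReLU activation layer; the output has 64 channels.
The third layer of the main branch applies 256 convolution kernels of size 1×1 to restore the dimension, then a Batch Normalization layer; the output has 256 channels.
Finally, the feature matrix output by the main branch is added to the feature matrix output by the shortcut, and the ReLU activation function gives the final output of the Bottleneck.

[Figure: building-block-6]

When the feature matrices of the main branch and the shortcut are added, their shapes must be identical. If the shapes differ, for example because the output channel count is twice the input channel count, the shortcut must apply a 1×1 convolution whose number of kernels equals the output channel count. If the output image is half the size of the input image, the stride of the shortcut convolution must be set to 2, and the stride of the second convolution in the main branch must also be set to 2.

The following code defines the ResidualBlock class to implement the Bottleneck structure.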

class ResidualBlock(nn.Cell):
    expansion = 4  # the last convolution has 4 times as many kernels as the first

    def __init__(self, in_channel: int, out_channel: int,
                 stride: int = 1, down_sample: Optional[nn.Cell] = None) -> None:
        super(ResidualBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channel, out_channel,
                               kernel_size=1, weight_init=weight_init)
        self.norm1 = nn.BatchNorm2d(out_channel)
        self.conv2 = nn.Conv2d(out_channel, out_channel,
                               kernel_size=3, stride=stride,
                               weight_init=weight_init)
        self.norm2 = nn.BatchNorm2d(out_channel)
        self.conv3 = nn.Conv2d(out_channel, out_channel * self.expansion,
                               kernel_size=1, weight_init=weight_init)
        self.norm3 = nn.BatchNorm2d(out_channel * self.expansion)

        self.relu = nn.ReLU()
        self.down_sample = down_sample

    def construct(self, x):

        identity = x  # shortcut branch

        out = self.conv1(x)  # main branch, first layer: 1x1 convolution
        out = self.norm1(out)
        out = self.relu(out)
        out = self.conv2(out)  # main branch, second layer: 3x3 convolution
        out = self.norm2(out)
        out = self.relu(out)
        out = self.conv3(out)  # main branch, third layer: 1x1 convolution
        out = self.norm3(out)

        if self.down_sample is not None:
            identity = self.down_sample(x)

        out += identity  # output is the sum of the main branch and the shortcut
        out = self.relu(out)

        return out

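Building the ResNet50 Network

The layer structure of the ResNet network is shown in the figure below. Taking a 224×224 color input image as an example, the image first passes through the convolutional layer conv1, which has 64 kernels of size 7×7 and a stride of 2; its output is 112×112 with 64 channels. Next comes a 3×3 max pooling (downsampling) layer, whose output is 56×56 with 64 channels. Four residual stages (conv2_x, conv3_x, conv4_x, and conv5_x) are then stacked, after which the output is 7×7 with 2048 channels. Finally, an average pooling layer, a fully connected layer, and softmax give the classification probabilities.

Each residual stage is a stack of residual blocks. Taking conv2_x of ResNet50 as an example, it is a stack of 3 Bottleneck structures; each Bottleneck uses 64 channels in its first two convolutions and outputs 256 channels.

The following example defines make_layer to build a residual stage. Its parameters are:

last_out_channel: the number of channels output by the previous residual stage.
block: the type of residual block, either ResidualBlockBase or ResidualBlock.
channel: the base channel count of the residual block.
block_nums: the number of residual blocks to stack.
stride: the stride of the convolution.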

def make_layer(last_out_channel, block: Type[Union[ResidualBlockBase, ResidualBlock]],
               channel: int, block_nums: int, stride: int = 1):
    down_sample = None  # shortcut branch

    if stride != 1 or last_out_channel != channel * block.expansion:

        down_sample = nn.SequentialCell([
            nn.Conv2d(last_out_channel, channel * block.expansion,
                      kernel_size=1, stride=stride, weight_init=weight_init),
            nn.BatchNorm2d(channel * block.expansion, gamma_init=gamma_init)
        ])

    layers = []
    layers.append(block(last_out_channel, channel, stride=stride, down_sample=down_sample))

    in_channel = channel * block.expansion
    # Stack the remaining residual blocks
    for _ in range(1, block_nums):

        layers.append(block(in_channel, channel))

    return nn.SequentialCell(layers)
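
To see when make_layer above gives the shortcut a 1×1 projection, the sketch below replays its condition in plain Python for the four stage configurations that ResNet50 uses in this article (base channels 64, 128, 256, 512 with Bottleneck expansion 4). The helper needs_projection is a hypothetical name that only mirrors the if-condition; it builds no network.

```python
def needs_projection(last_out_channel, channel, expansion, stride):
    """Mirror the down_sample condition used in make_layer."""
    return stride != 1 or last_out_channel != channel * expansion


EXPANSION = 4  # value of ResidualBlock.expansion (Bottleneck)

# (last_out_channel, channel, stride) for the four ResNet50 stages.
stages = [
    (64, 64, 1),
    (256, 128, 2),
    (512, 256, 2),
    (1024, 512, 2),
]

results = []
for last_out, channel, stride in stages:
    # First block of the stage: may change channels and/or spatial size.
    first = needs_projection(last_out, channel, EXPANSION, stride)
    # Later blocks: input already has channel * expansion channels, stride 1.
    later = needs_projection(channel * EXPANSION, channel, EXPANSION, 1)
    results.append((channel, first, later))
    print(channel, first, later)
```

In every stage only the first block needs the projection; the remaining blocks can use the plain identity shortcut, which matches how make_layer passes down_sample only to the first block.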

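The ResNet50 network has 5 convolutional structures in total, plus one average pooling layer and one fully connected layer. Taking the CIFAR-10 dataset as an example:

conv1: the input image size is 32×32 with 3 input channels. The input first passes through a convolutional layer with 64 kernels of size 7×7 and a stride of 2, then a Batch Normalization layer, and finally a ReLU activation function. The output feature map size is 16×16 with 64 channels.

conv2_x: the input feature map size is 16×16 with 64 input channels. The input first goes through a max pooling (downsampling) operation with a 3×3 kernel and a stride of 2, and then through 3 stacked Bottlenecks with the structure [1×1, 64; 3×3, 64; 1×1, 256]. The output feature map size is 8×8 with 256 channels.

conv3_x: the input feature map size is 8×8 with 256 input channels. This stage stacks 4 Bottlenecks with the structure [1×1, 128; 3×3, 128; 1×1, 512]. The output feature map size is 4×4 with 512 channels.

conv4_x: the input feature map size is 4×4 with 512 input channels. This stage stacks 6 Bottlenecks with the structure [1×1, 256; 3×3, 256; 1×1, 1024]. The output feature map size is 2×2 with 1024 channels.

conv5_x: the input feature map size is 2×2 with 1024 input channels. This stage stacks 3 Bottlenecks with the structure [1×1, 512; 3×3, 512; 1×1, 2048]. The output feature map size is 1×1 with 2048 channels.

average pool & fc: the input has 2048 channels, and the output channel count is the number of classes.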
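
The feature map sizes listed above can be reproduced with a few lines of arithmetic. The sketch assumes that every strided layer uses 'same' padding, so the output size is ceil(input / stride); this matches the max pooling layer in the code in this article and, as far as I know, the default pad_mode of MindSpore's nn.Conv2d, but treat it as an assumption. The helper names same_out and trace are hypothetical.

```python
import math


def same_out(size, stride):
    """Output size under 'same' padding: ceil(size / stride)."""
    return math.ceil(size / stride)


def trace(size):
    """Return the spatial size after conv1, max pooling, and the four stages."""
    steps = [("conv1", 2), ("max_pool", 2), ("layer1", 1),
             ("layer2", 2), ("layer3", 2), ("layer4", 2)]
    sizes = {}
    for name, stride in steps:
        size = same_out(size, stride)
        sizes[name] = size
    return sizes


print(trace(32))   # CIFAR-10 input
print(trace(224))  # ImageNet-style input
```

For a 32×32 input this gives 16, 8, 8, 4, 2, 1, and for a 224×224 input 112, 56, 56, 28, 14, 7, in line with the sizes quoted in the text.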

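The following example code builds the ResNet50 model. The resnet50 function takes these parameters:

num_classes: the number of classes for classification; the default is 1000.
pretrained: whether to download the corresponding pretrained model and load its parameters into the network.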

from mindspore import load_checkpoint, load_param_into_net


class ResNet(nn.Cell):
    def __init__(self, block: Type[Union[ResidualBlockBase, ResidualBlock]],
                 layer_nums: List[int], num_classes: int, input_channel: int) -> None:
        super(ResNet, self).__init__()

        self.relu = nn.ReLU()
        # First convolution: 3 input channels (color image), 64 output channels
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, weight_init=weight_init)
        self.norm = nn.BatchNorm2d(64)
        # Max pooling layer, shrinks the feature map
        self.max_pool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same')
        # Residual stages
        self.layer1 = make_layer(64, block, 64, layer_nums[0])
        self.layer2 = make_layer(64 * block.expansion, block, 128, layer_nums[1], stride=2)
        self.layer3 = make_layer(128 * block.expansion, block, 256, layer_nums[2], stride=2)
        self.layer4 = make_layer(256 * block.expansion, block, 512, layer_nums[3], stride=2)
        # Average pooling layer
        self.avg_pool = nn.AvgPool2d()
        # Flatten layer
        self.flatten = nn.Flatten()
        # Fully connected layer
        self.fc = nn.Dense(in_channels=input_channel, out_channels=num_classes)

    def construct(self, x):

        x = self.conv1(x)
        x = self.norm(x)
        x = self.relu(x)
        x = self.max_pool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avg_pool(x)
        x = self.flatten(x)
        x = self.fc(x)

        return x


def _resnet(model_url: str, block: Type[Union[ResidualBlockBase, ResidualBlock]],
            layers: List[int], num_classes: int, pretrained: bool, pretrained_ckpt: str,
            input_channel: int):
    model = ResNet(block, layers, num_classes, input_channel)

    if pretrained:
        # Load the pretrained model
        download(url=model_url, path=pretrained_ckpt, replace=True)
        param_dict = load_checkpoint(pretrained_ckpt)
        load_param_into_net(model, param_dict)

    return model


def resnet50(num_classes: int = 1000, pretrained: bool = False):
    """ResNet50 model."""
    resnet50_url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/models/application/resnet50_224_new.ckpt"
    resnet50_ckpt = "./LoadPretrainedModel/resnet50_224_new.ckpt"
    return _resnet(resnet50_url, ResidualBlock, [3, 4, 6, 3], num_classes,
                   pretrained, resnet50_ckpt, 2048)

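Model Training and Evaluation

This section fine-tunes a pretrained ResNet50 model. Call resnet50 to build the ResNet50 model with the pretrained parameter set to True; the pretrained ResNet50 model is downloaded automatically and its parameters are loaded into the network. Then define the optimizer and the loss function, print the training loss and the evaluation accuracy for each epoch, and save the checkpoint with the highest evaluation accuracy (resnet50-best.ckpt) under ./BestCheckpoint in the current path.

Because the fully connected layer (fc) of the pretrained model has an output size of 1000 (the num_classes parameter), the model's fc output size is left at the default of 1000 so that the pretrained weights load successfully. The CIFAR-10 dataset has only 10 categories, so when training on it, the fc layer of the model loaded with pretrained weights must be reset to an output size of 10.

Only 5 epochs of training are shown here. To obtain ideal training results, training for 80 epochs is recommended.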

# Define the ResNet50 network
network = resnet50(pretrained=True)

# Input size of the fully connected layer
in_channel = network.fc.in_channels
fc = nn.Dense(in_channels=in_channel, out_channels=10)
# Reset the fully connected layer
network.fc = fc
# Set the learning rate
num_epochs = 5
lr = nn.cosine_decay_lr(min_lr=0.00001, max_lr=0.001, total_step=step_size_train * num_epochs,
                        step_per_epoch=step_size_train, decay_epoch=num_epochs)
# Define the optimizer and loss function
opt = nn.Momentum(params=network.trainable_params(), learning_rate=lr, momentum=0.9)
loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')


def forward_fn(inputs, targets):
    logits = network(inputs)
    loss = loss_fn(logits, targets)
    return loss


grad_fn = ms.value_and_grad(forward_fn, None, opt.parameters)


def train_step(inputs, targets):
    loss, grads = grad_fn(inputs, targets)
    opt(grads)
    return loss


import os

# Create the iterators
data_loader_train = dataset_train.create_tuple_iterator(num_epochs=num_epochs)
data_loader_val = dataset_val.create_tuple_iterator(num_epochs=num_epochs)

# Path for storing the best model
best_acc = 0
best_ckpt_dir = "./BestCheckpoint"
best_ckpt_path = "./BestCheckpoint/resnet50-best.ckpt"

if not os.path.exists(best_ckpt_dir):
    os.mkdir(best_ckpt_dir)


import mindspore.ops as ops


def train(data_loader, epoch):
    """Train the model for one epoch."""
    losses = []
    network.set_train(True)

    for i, (images, labels) in enumerate(data_loader):
        loss = train_step(images, labels)
        if i % 100 == 0 or i == step_size_train - 1:
            print('Epoch: [%3d/%3d], Steps: [%3d/%3d], Train Loss: [%5.3f]' %
                  (epoch + 1, num_epochs, i + 1, step_size_train, loss))
        losses.append(loss)

    return sum(losses) / len(losses)


def evaluate(data_loader):
    """Evaluate the model."""
    network.set_train(False)

    correct_num = 0.0  # number of correct predictions
    total_num = 0.0  # total number of predictions

    for images, labels in data_loader:
        logits = network(images)
        pred = logits.argmax(axis=1)  # predicted classes
        correct = ops.equal(pred, labels).reshape((-1, ))
        correct_num += correct.sum().asnumpy()
        total_num += correct.shape[0]

    acc = correct_num / total_num  # accuracy

    return acc


# Start the training loop
print("Start Training Loop ...")

for epoch in range(num_epochs):
    curr_loss = train(data_loader_train, epoch)
    curr_acc = evaluate(data_loader_val)

    print("-" * 50)
    print("Epoch: [%3d/%3d], Average Train Loss: [%5.3f], Accuracy: [%5.3f]" % (
        epoch+1, num_epochs, curr_loss, curr_acc
    ))
    print("-" * 50)

    # Save the model with the highest prediction accuracy so far
    if curr_acc > best_acc:
        best_acc = curr_acc
        ms.save_checkpoint(network, best_ckpt_path)

print("=" * 80)
print(f"End of validation the best Accuracy is: {best_acc: 5.3f}, "
      f"save the best ckpt file in {best_ckpt_path}", flush=True)

Start Training Loop ...
Epoch: [  1/  5], Steps: [  1/196], Train Loss: [2.389]
Epoch: [  1/  5], Steps: [101/196], Train Loss: [1.467]
Epoch: [  1/  5], Steps: [196/196], Train Loss: [1.093]
--------------------------------------------------
Epoch: [  1/  5], Average Train Loss: [1.641], Accuracy: [0.595]
--------------------------------------------------
Epoch: [  2/  5], Steps: [  1/196], Train Loss: [1.253]
Epoch: [  2/  5], Steps: [101/196], Train Loss: [0.974]
Epoch: [  2/  5], Steps: [196/196], Train Loss: [0.832]
--------------------------------------------------
Epoch: [  2/  5], Average Train Loss: [1.019], Accuracy: [0.685]
--------------------------------------------------
Epoch: [  3/  5], Steps: [  1/196], Train Loss: [0.917]
Epoch: [  3/  5], Steps: [101/196], Train Loss: [0.879]
Epoch: [  3/  5], Steps: [196/196], Train Loss: [0.743]
--------------------------------------------------
Epoch: [  3/  5], Average Train Loss: [0.852], Accuracy: [0.721]
--------------------------------------------------
Epoch: [  4/  5], Steps: [  1/196], Train Loss: [0.911]
Epoch: [  4/  5], Steps: [101/196], Train Loss: [0.703]
Epoch: [  4/  5], Steps: [196/196], Train Loss: [0.768]
--------------------------------------------------
Epoch: [  4/  5], Average Train Loss: [0.777], Accuracy: [0.737]
--------------------------------------------------
Epoch: [  5/  5], Steps: [  1/196], Train Loss: [0.793]
Epoch: [  5/  5], Steps: [101/196], Train Loss: [0.809]
Epoch: [  5/  5], Steps: [196/196], Train Loss: [0.734]
--------------------------------------------------
Epoch: [  5/  5], Average Train Loss: [0.745], Accuracy: [0.742]
--------------------------------------------------
================================================================================
End of validation the best Accuracy is:  0.742, save the best ckpt file in ./BestCheckpoint/resnet50-best.ckpt
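
The learning rate passed to the optimizer above is a per-step list produced by nn.cosine_decay_lr. As a rough illustration of what such a schedule looks like, here is a pure-Python sketch. It assumes the formula min_lr + 0.5 * (max_lr - min_lr) * (1 + cos(pi * epoch / decay_epoch)) with the epoch index held constant within an epoch, which is how I understand the MindSpore documentation; treat it as an approximation rather than the library's exact code. The function name cosine_decay is hypothetical, and 196 is the steps-per-epoch value from the log above.

```python
import math


def cosine_decay(min_lr, max_lr, total_step, step_per_epoch, decay_epoch):
    """Per-step learning rates on a cosine curve, constant within each epoch."""
    rates = []
    for step in range(total_step):
        epoch = min(step // step_per_epoch, decay_epoch)
        delta = 0.5 * (max_lr - min_lr)
        rates.append(min_lr + delta * (1 + math.cos(math.pi * epoch / decay_epoch)))
    return rates


schedule = cosine_decay(min_lr=0.00001, max_lr=0.001, total_step=196 * 5,
                        step_per_epoch=196, decay_epoch=5)
print(len(schedule), schedule[0], schedule[-1])
```

The schedule starts at max_lr and decreases epoch by epoch toward min_lr, so the later epochs make smaller, more careful updates.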

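Visualizing Model Predictions

Define the visualize_model function, which uses the model with the highest validation accuracy obtained above to predict on the CIFAR-10 test dataset and visualizes the results. A blue title means the prediction is correct; a red title means it is wrong.

From the results above, after 5 epochs the model's prediction accuracy on the validation dataset is a little over 70% (0.742 in the log above), so on average between one and two of every six images are mispredicted. To obtain ideal training results, training for 80 epochs is recommended.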

import matplotlib.pyplot as plt


def visualize_model(best_ckpt_path, dataset_val):
    num_class = 10  # CIFAR-10 has 10 classes
    net = resnet50(num_class)
    # Load the model parameters
    param_dict = ms.load_checkpoint(best_ckpt_path)
    ms.load_param_into_net(net, param_dict)
    # Load one batch of validation data
    data = next(dataset_val.create_dict_iterator())
    images = data["image"]
    labels = data["label"]
    # Predict the image classes
    output = net(data['image'])
    pred = np.argmax(output.asnumpy(), axis=1)

    # Class names
    classes = []

    with open(data_dir + "/batches.meta.txt", "r") as f:
        for line in f:
            line = line.rstrip()
            if line:
                classes.append(line)

    # Show the images with their predicted classes
    plt.figure()
    for i in range(6):
        plt.subplot(2, 3, i + 1)
        # Blue title if the prediction is correct, red if it is wrong
        color = 'blue' if pred[i] == labels.asnumpy()[i] else 'red'
        plt.title('predict:{}'.format(classes[pred[i]]), color=color)
        picture_show = np.transpose(images.asnumpy()[i], (1, 2, 0))
        mean = np.array([0.4914, 0.4822, 0.4465])
        std = np.array([0.2023, 0.1994, 0.2010])
        picture_show = std * picture_show + mean
        picture_show = np.clip(picture_show, 0, 1)
        plt.imshow(picture_show)
        plt.axis('off')

    plt.show()


# Validate with the test dataset
visualize_model(best_ckpt_path=best_ckpt_path, dataset_val=dataset_val)