Technology sharing

Implementing VGG16 image classification with PyTorch, explained step by step

2024-07-12


VGG16 image classification

Here we implement the VGG-16 network to classify the CIFAR dataset.

Introduction to the VGG16 network

Preface

Very Deep Convolutional Networks for Large-Scale Image Recognition

ICLR 2015

VGG comes from the Visual Geometry Group at Oxford (which is the origin of the name VGG). The network was presented at ILSVRC 2014. Its main contribution is to show that increasing network depth can, to a certain extent, improve final performance. VGG has two common variants, VGG16 and VGG19. There is no essential difference between them; only the network depth differs.

The VGG principle

VGG16's improvement over AlexNet is to use stacks of consecutive 3x3 convolution kernels in place of AlexNet's larger kernels (11x11, 7x7, 5x5). For a given receptive field (the region of the input image that an output value depends on), stacking small convolution kernels is better than using one large kernel: the extra nonlinear layers increase network depth, letting the network learn more complex patterns, while the cost stays relatively small (fewer parameters).

Simply put, in VGG three 3x3 convolution kernels are used to replace a 7x7 convolution kernel, and two 3x3 kernels are used to replace a 5x5 kernel. With the receptive field unchanged, the depth of the network increases and the performance of the neural network improves to a certain extent.

For example, a stack of three 3x3 convolution layers with stride 1 can be seen as having a receptive field of size 7 (meaning three consecutive 3x3 convolutions are equivalent to one 7x7 convolution), and its total parameter count is 3x(9xC^2). If a 7x7 convolution kernel is used directly, the parameter count is 49xC^2, where C is the number of input and output channels. Clearly 27xC^2 is less than 49xC^2, i.e. the number of parameters is reduced.

The same reasoning explains why two 3x3 convolution kernels can be used in place of one 5x5 convolution kernel.

A 5x5 convolution can be regarded as a small fully connected network sliding over a 5x5 area. We can first convolve with a 3x3 filter, and then apply a second 3x3 convolution to the output of the first; that second step can itself be viewed as another 3x3 convolution layer. In this way we cascade (stack) two 3x3 convolutions in place of a single 5x5 convolution.
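A quick sanity check of this arithmetic in Python (C = 256 is an arbitrary channel count chosen only for illustration):

```python
# Parameters of a square convolution with c_in input and c_out output channels
# (biases ignored, as in the comparison above).
def conv_params(kernel_size, c_in, c_out):
    return kernel_size * kernel_size * c_in * c_out

C = 256  # arbitrary channel count for illustration
three_3x3 = 3 * conv_params(3, C, C)  # 3 x (9 x C^2)
one_7x7 = conv_params(7, C, C)        # 49 x C^2
print(three_3x3, one_7x7)
```

For any C, the stacked 3x3 layers use 27/49 ≈ 55% of the parameters of a single 7x7 layer with the same receptive field.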

The details are shown in the figure below:

(image)

VGG network architecture

(image)

This is the structure of the VGG network (covering both VGG16 and VGG19):

(image)

VGG network architecture

VGG16 contains 16 weight layers (13 convolutional layers and 3 fully connected layers), as shown in column D of the figure above.

VGG19 contains 19 weight layers (16 convolutional layers and 3 fully connected layers), as shown in column E of the figure above.

The VGG network structure is very consistent, using 3x3 convolutions and 2x2 max pooling from start to finish.
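Because the 3x3 convolutions in the implementation below use padding=1 (size-preserving), only the five 2x2 max-pool stages change the spatial resolution. For a 32x32 CIFAR image the sizes are easy to trace:

```python
# Spatial size of the feature map after each of VGG16's 5 pooling stages,
# starting from a 32x32 input; 3x3 convs with padding=1 keep the size fixed.
sizes = [32]
for stage in range(5):
    sizes.append(sizes[-1] // 2)  # each 2x2 max-pool with stride 2 halves it
print(sizes)  # [32, 16, 8, 4, 2, 1]
```

The final 1x1 feature map with 512 channels is why the first fully connected layer in the implementation below takes 512 inputs.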

Advantages of VGG

VGGNet's structure is very simple. The whole network uses the same convolution kernel size (3x3) and max-pooling size (2x2).

A stack of several small-filter (3x3) convolution layers is better than a single large-filter (5x5 or 7x7) convolution layer.

It verified that performance can be improved by continually deepening the network structure.

Disadvantages of VGG

VGG consumes more computing resources and uses more parameters (this is not the fault of the 3x3 convolutions; most parameters sit in the fully connected layers), resulting in high memory usage (about 140M parameters).

Dataset processing

Dataset introduction

The CIFAR (Canadian Institute For Advanced Research) dataset is a small image dataset widely used in the field of computer vision. It is mainly used to train machine learning and computer vision algorithms, especially for tasks such as image recognition and classification. The CIFAR dataset consists of two main parts: CIFAR-10 and CIFAR-100.

The CIFAR-10 dataset contains 60,000 32x32 color images divided into 10 classes, with 6,000 images per class. The 10 classes are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck. In this dataset, 50,000 images are used for training and 10,000 for testing. Thanks to its moderate size and rich class information, CIFAR-10 has become one of the most popular datasets for research and teaching in computer vision.

Dataset characteristics
  • Moderate size: the small image size of the CIFAR dataset (32x32) makes it ideal for quickly training and testing new computer vision algorithms.
  • Varied classes: CIFAR-10 provides basic image classification tasks, while CIFAR-100 challenges algorithms with finer-grained classification.
  • Widely used: because of these characteristics, the CIFAR dataset is widely used in research and teaching in computer vision, machine learning, deep learning, and other fields.
Usage scenarios

The CIFAR dataset is commonly used for tasks such as image classification, object recognition, and the training and testing of convolutional neural networks (CNNs). Because of its moderate size and rich class information, it is ideal for beginners and researchers exploring image recognition algorithms. In addition, many computer vision and machine learning competitions use the CIFAR dataset as a benchmark to evaluate contestants' algorithms.

As for preparing the dataset, I have already obtained it.

If you need the dataset, contact me by email: [email protected]

My dataset was originally generated from the raw data files downloaded via torch. Working through the dataset processing yourself can give you a deeper understanding of deep learning.

The dataset directory structure looks like this:

(image)

Parsing all the dataset labels

The class labels of the dataset are stored in a .meta file, so we need to parse the .meta file to read all the label data. The parsing code is as follows:

# First, look at all the labels. TODO: study the unpickling process in more detail
import pickle


def unpickle(file):
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict


meta_data = unpickle('./dataset_method_1/cifar-10-batches-py/batches.meta')
label_names = meta_data[b'label_names']
# Convert the byte labels to strings
label_names = [label.decode('utf-8') for label in label_names]
print(label_names)

The parsed result is:

['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

Loading a single batch of data for a quick test

The dataset has been downloaded, so now we need to read the file contents. Since the files are binary, they must be opened in binary mode.

The loading code is as follows:

# Load a single batch of data
import numpy as np


def load_data_batch(file):
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
        X = dict[b'data']
        Y = dict[b'labels']
        X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1)  # reshape and transpose to (10000, 32, 32, 3)
        Y = np.array(Y)
    return X, Y


# Load the first data batch
data_batch_1 = './dataset_method_1/cifar-10-batches-py/data_batch_1'
X1, Y1 = load_data_batch(data_batch_1)

print(f'Data shape: {X1.shape}, label shape: {Y1.shape}')


Result:

Data shape: (10000, 32, 32, 3), label shape: (10000,)
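The reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1) step is worth unpacking: each CIFAR row stores 1024 red values, then 1024 green, then 1024 blue, so the reshape recovers channel-first (N, C, H, W) order and the transpose converts it to channel-last (N, H, W, C). A tiny sketch with fake data (2 fake images instead of 10,000):

```python
import numpy as np

# Two fake "images" whose pixel values are just running indices,
# stored the CIFAR way: one flat row of 3 * 32 * 32 = 3072 values per image.
raw = np.arange(2 * 3072).reshape(2, 3072)
imgs = raw.reshape(2, 3, 32, 32).transpose(0, 2, 3, 1)  # -> (N, H, W, C)
print(imgs.shape)     # (2, 32, 32, 3)
print(imgs[0, 0, 0])  # [0 1024 2048]: the R, G, B planes start 1024 apart
```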

Loading all the data

After the experiment above, we know how to load the data.

Load the training set:

# Combine the data from all batches
def load_all_data_batches(batch_files):
    X_list, Y_list = [], []
    for file in batch_files:
        X, Y = load_data_batch(file)
        X_list.append(X)
        Y_list.append(Y)
    X_all = np.concatenate(X_list)
    Y_all = np.concatenate(Y_list)
    return X_all, Y_all


batch_files = [
    './dataset_method_1/cifar-10-batches-py/data_batch_1',
    './dataset_method_1/cifar-10-batches-py/data_batch_2',
    './dataset_method_1/cifar-10-batches-py/data_batch_3',
    './dataset_method_1/cifar-10-batches-py/data_batch_4',
    './dataset_method_1/cifar-10-batches-py/data_batch_5'
]

X_train, Y_train = load_all_data_batches(batch_files)
print(f'Training data shape: {X_train.shape}, training label shape: {Y_train.shape}')
Y_train = Y_train.astype(np.int64)

Output:

Training data shape: (50000, 32, 32, 3), training label shape: (50000,)

Load the test set:

test_batch = './dataset_method_1/cifar-10-batches-py/test_batch'
X_test, Y_test = load_data_batch(test_batch)
Y_test = Y_test.astype(np.int64)
print(f'Test data shape: {X_test.shape}, test label shape: {Y_test.shape}')


Output:

Test data shape: (10000, 32, 32, 3), test label shape: (10000,)

Defining the Dataset class

We define a Dataset subclass to make it easy for the DataLoader to load the data in batches for training later.

A Dataset subclass must implement three methods:

  • __init__(): the class constructor
  • __len__(): returns the length of the dataset
  • __getitem__(): fetches a single sample from the dataset

Here is my implementation:

from torch.utils.data import DataLoader, Dataset


# Define the PyTorch dataset
class CIFARDataset(Dataset):
    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        label = self.labels[idx]

        if self.transform:
            image = self.transform(image)

        return image, label

Loading the dataset into a DataLoader

  1. Define a transform for data augmentation. This is the preprocessing for training: the training set is padded by 4 px, normalized, randomly flipped horizontally, randomly converted to grayscale, and finally randomly cropped back to the original 32x32 size.
from torchvision import transforms

transform_train = transforms.Compose(
    [transforms.Pad(4),
     transforms.ToTensor(),
     transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
     transforms.RandomHorizontalFlip(),
     transforms.RandomGrayscale(),
     transforms.RandomCrop(32, padding=4),
     ])
  2. Since image processing is involved, and the data read from the binary files is numpy data, we need to convert the numpy arrays to PIL Images to make image processing easier. The conversion is as follows:
# Convert the dataset into an array of PIL Images; otherwise data augmentation does not seem to work
# Transform the training data
from PIL import Image
def get_PIL_Images(origin_data):
    datas = []
    for i in range(len(origin_data)):
        data = Image.fromarray(origin_data[i])
        datas.append(data)
    return datas
  3. Build the training DataLoader:
train_data = get_PIL_Images(X_train)
train_loader = DataLoader(CIFARDataset(train_data, Y_train, transform_train), batch_size=24, shuffle=True)
  4. Build the test DataLoader; the test set does not need much preprocessing.
# Preprocessing for the test set
transform_test = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))]
)
test_loader = DataLoader(CIFARDataset(X_test, Y_test, transform_test), batch_size=24, shuffle=False)

Define the network

We build the network with PyTorch according to the VGG16 architecture described above.

It consists mainly of:

  • convolutional layers
  • pooling layers
  • classification (fully connected) layers

The implementation is as follows:

from torch import nn


class VGG16(nn.Module):
    def __init__(self):
        super(VGG16, self).__init__()
        # Convolutional part
        self.convolusion = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=3, padding=1),  # with padding=1, the spatial size is unchanged after convolution
            nn.BatchNorm2d(96),
            nn.ReLU(inplace=True),
            nn.Conv2d(96, 96, kernel_size=3, padding=1),
            nn.BatchNorm2d(96),
            nn.ReLU(inplace=True),
            nn.Conv2d(96, 96, kernel_size=3, padding=1),
            nn.BatchNorm2d(96),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.AvgPool2d(kernel_size=1, stride=1)  # 1x1 average pooling (identity here)
        )
        # Fully connected layers
        self.dense = nn.Sequential(
            nn.Linear(512, 4096),  # after 5 max-pooling layers the 32x32 image is reduced to 1x1, so 512 channels enter the fully connected layer
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
        )
        # Output layer
        self.classifier = nn.Linear(4096, 10)

    def forward(self, x):
        out = self.convolusion(x)
        out = out.view(out.size(0), -1)
        out = self.dense(out)
        out = self.classifier(out)
        return out

Training and testing

For training and testing, we only need to instantiate the model, define the optimizer, the loss function, and the learning-rate schedule, and then run the training and evaluation loops.

The code is shown below.

Hyperparameter definition:

import torch
from torch import nn, optim

# Instantiate the model for training
model = VGG16()
# model.load_state_dict(torch.load('./my-VGG16.pth'))
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-3)
loss_func = nn.CrossEntropyLoss()
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.4, last_epoch=-1)
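StepLR with step_size=5 and gamma=0.4 multiplies the learning rate by 0.4 every 5 epochs. The resulting schedule can be sketched without torch:

```python
base_lr, step_size, gamma = 0.01, 5, 0.4

def lr_at(epoch):
    # StepLR: lr = base_lr * gamma ** (epoch // step_size)
    return base_lr * gamma ** (epoch // step_size)

print([round(lr_at(e), 6) for e in (0, 4, 5, 10)])  # [0.01, 0.01, 0.004, 0.0016]
```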

The test function:

def test():
    model.eval()
    correct = 0  # number of correctly predicted images
    total = 0  # total number of images
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            images = images.to(device)
            outputs = model(images).to(device)
            outputs = outputs.cpu()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum()
    accuracy = 100 * correct / total
    accuracy_rate.append(accuracy)
    print(f'Accuracy: {accuracy}%')
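The accuracy computation in test() (torch.max over dim 1, then comparing with the labels) is just an argmax over class scores. The same logic in NumPy, with made-up scores for illustration:

```python
import numpy as np

def accuracy(logits, labels):
    # Predicted class = index of the highest score per row, as in torch.max(outputs, 1)
    predicted = logits.argmax(axis=1)
    return 100.0 * (predicted == labels).sum() / len(labels)

logits = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.0,  0.3],
                   [0.2, 0.1,  3.0]])
labels = np.array([1, 0, 1])  # the last sample is misclassified
print(accuracy(logits, labels))  # ~66.67
```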

The training loop:

# Define the training procedure
total_times = 40
total = 0
accuracy_rate = []
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

for epoch in range(total_times):
    model.train()
    model.to(device)
    running_loss = 0.0
    total_correct = 0
    total_trainset = 0
    print("epoch: ", epoch)
    for i, (data, labels) in enumerate(train_loader):
        data = data.to(device)
        outputs = model(data).to(device)
        labels = labels.to(device)
        loss = loss_func(outputs, labels).to(device)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, pred = outputs.max(1)
        correct = (pred == labels).sum().item()
        total_correct += correct
        total_trainset += data.shape[0]
        if i % 100 == 0 and i > 0:
            print(f"training iteration {i}, running_loss={running_loss}")
            running_loss = 0.0
    test()
    scheduler.step()

Save the trained model:

import matplotlib.pyplot as plt

torch.save(model.state_dict(), './my-VGG16.pth')
accuracy_rate = np.array(accuracy_rate)
times = np.linspace(1, total_times, total_times)
plt.xlabel('times')
plt.ylabel('accuracy rate')
plt.plot(times, accuracy_rate)
plt.show()
print(accuracy_rate)

Testing

  1. Define the model
model_my_vgg = VGG16()
  2. Load the trained weights
model_my_vgg.load_state_dict(torch.load('./my-VGG16-best.pth',map_location='cpu'))
  3. Preprocess and verify some images I found myself
from torchvision import transforms
from PIL import Image

# Define the image preprocessing steps
preprocess = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    # transforms.Normalize(mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225]),
])

def load_image(image_path):
    image = Image.open(image_path)
    image = preprocess(image)
    image = image.unsqueeze(0)  # add a batch dimension
    return image

image_data = load_image('./plane2.jpg')
print(image_data.shape)
output = model_my_vgg(image_data)
verify_data = X1[9]
verify_label = Y1[9]
output_verify = model_my_vgg(transform_test(verify_data).unsqueeze(0))
print(output)
print(output_verify)

Output:

torch.Size([1, 3, 32, 32])
tensor([[ 1.5990, -0.5269,  0.7254,  0.3432, -0.5036, -0.3267, -0.5302, -0.9417,
          0.4186, -0.1213]], grad_fn=<AddmmBackward0>)
tensor([[-0.6541, -2.0759,  0.6308,  1.9791,  0.8525,  1.2313,  0.1856,  0.3243,
         -1.3374, -1.0211]], grad_fn=<AddmmBackward0>)
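The numbers printed above are raw logits, not probabilities. To interpret them, one can apply a softmax before taking the argmax; a small NumPy sketch using the first output row from above:

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability, then normalize
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[1.5990, -0.5269, 0.7254, 0.3432, -0.5036,
                    -0.3267, -0.5302, -0.9417, 0.4186, -0.1213]])
probs = softmax(logits)
print(probs.argmax())         # 0 -> 'airplane'
print(round(probs.sum(), 6))  # each row sums to 1.0
```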
  4. Print the results
print(label_names[torch.argmax(output,dim=1,keepdim=False)])
print(label_names[verify_label])
print("pred:",label_names[torch.argmax(output_verify,dim=1,keepdim=False)])
airplane
cat
pred: cat

(image)

Verifying a horse

(image)

Verifying a dog

(image)