2024-07-12
Implementing the VGG-16 network to classify the CIFAR dataset
Preface
Very Deep Convolutional Networks for Large-Scale Image Recognition
ICLR 2015
VGG comes from Oxford's Visual Geometry Group (which is where the name VGG originates). The network was proposed for ILSVRC 2014. Its main contribution is to demonstrate that increasing the depth of a network can, to a certain extent, improve its final performance. VGG comes in two variants, VGG16 and VGG19; there is no essential difference between the two, only the network depth differs.
VGG principles
The improvement of VGG16 over AlexNet is to use several consecutive 3x3 convolution kernels in place of AlexNet's larger kernels (11x11, 7x7, 5x5). For a given receptive field (the size of the input region that a given output value depends on), stacking small convolution kernels is better than using a single large kernel: the additional nonlinear layers increase the depth of the network, allowing it to learn more complex patterns, while the cost stays relatively small (fewer parameters).
Simply put, in VGG three 3x3 convolution kernels are used to replace one 7x7 kernel, and two 3x3 kernels are used to replace one 5x5 kernel. For the same receptive field, this increases the depth of the network and improves its performance to a certain extent.
For example, a stack of three 3x3 convolution layers with stride 1 has a receptive field of size 7 (three consecutive 3x3 convolutions are equivalent to one 7x7 convolution), and its total parameter count is 3 x (3 x 3 x C^2) = 27C^2. If a 7x7 convolution kernel is used directly, the parameter count is 49C^2, where C is the number of input and output channels. Clearly 27C^2 is less than 49C^2, i.e. the parameters are reduced.
The same argument explains why two 3x3 convolution kernels can be used in place of one 5x5 kernel.
A 5x5 convolution can be viewed as a small fully connected network sliding over a 5x5 region. We can first convolve with a 3x3 filter, and then apply a second 3x3 convolution to the output of the first; in this way two cascaded (stacked) 3x3 convolutions replace a single 5x5 convolution.
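As a quick numerical check of the parameter counts above (a minimal sketch; the channel count C = 64 is an arbitrary choice for illustration):

import torch.nn as nn

C = 64  # arbitrary channel count, only for illustration

# Three stacked 3x3 convolutions (combined receptive field 7x7)
stacked = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, bias=False),
    nn.Conv2d(C, C, kernel_size=3, bias=False),
    nn.Conv2d(C, C, kernel_size=3, bias=False),
)
# One direct 7x7 convolution
single = nn.Conv2d(C, C, kernel_size=7, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(stacked), 27 * C ** 2)  # 110592 110592
print(count(single), 49 * C ** 2)   # 200704 200704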
The details are shown in the figure below:
VGG network architecture
This is the structure of the VGG network (both VGG16 and VGG19 are shown):
[Figure: VGG network configuration table, columns A-E]
VGG16 contains 16 hidden layers (13 convolutional layers and 3 fully connected layers), as shown in column D of the figure above.
VGG19 contains 19 hidden layers (16 convolutional layers and 3 fully connected layers), as shown in column E of the figure above.
The structure of the VGG network is very consistent, using 3x3 convolutions and 2x2 max pooling from beginning to end.
Advantages of VGG
The VGGNet structure is very simple: the entire network uses the same convolution kernel size (3x3) and max pooling size (2x2).
A combination of several convolution layers with small filters (3x3) is better than a single convolution layer with a large filter (5x5 or 7x7):
It confirms that performance can be continuously improved by continuously deepening the network structure.
Disadvantages of VGG
VGG consumes more computing resources and uses more parameters (the 3x3 convolutions are not to blame for this; most of the parameters sit in the first fully connected layer), which leads to higher memory usage (about 140M parameters).
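As a rough check (a sketch assuming torchvision is installed), counting the parameters of the stock torchvision VGG-16 gives roughly 138 million, and the first fully connected layer alone accounts for the bulk of them:

import torchvision

vgg = torchvision.models.vgg16()  # random weights; we only count parameters
total = sum(p.numel() for p in vgg.parameters())
fc1 = vgg.classifier[0]  # Linear(25088, 4096), the largest single layer
print(f"total parameters: {total:,}")       # total parameters: 138,357,544
print(f"first fc layer:   {fc1.weight.numel():,}")  # first fc layer: 102,760,448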
The CIFAR (Canadian Institute For Advanced Research) dataset is a small image dataset widely used in the field of computer vision. It is mainly used to train machine learning and computer vision algorithms, especially for tasks such as image recognition and classification. The CIFAR dataset consists of two main parts: CIFAR-10 and CIFAR-100.
The CIFAR-10 dataset contains 60,000 32x32 color images divided into 10 classes, with 6,000 images per class. The 10 classes are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Of these, 50,000 images are used for training and 10,000 for testing. Thanks to its moderate size and rich class information, CIFAR-10 has become one of the most popular datasets for research and teaching in computer vision.
The CIFAR dataset is commonly used for tasks such as image classification and object recognition, and for training and testing convolutional neural networks (CNNs). Because of its moderate size and rich class information, it is ideal for beginners and researchers exploring image recognition algorithms. In addition, many computer vision and machine learning competitions use the CIFAR dataset as a benchmark to evaluate contestants' algorithms.
Preparing the dataset: I have already obtained a copy.
If you need the dataset, please contact this email: [email protected]
My copy of the dataset was originally produced via torchvision's datasets module. I will not take that route here: working through the raw dataset yourself gives you a deeper understanding of deep learning.
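For reference, a minimal sketch of the torchvision route (assuming torchvision is available; the root directory below matches the paths used in the rest of this post):

from torchvision import datasets

# Downloads and unpacks cifar-10-batches-py under ./dataset_method_1
datasets.CIFAR10(root='./dataset_method_1', train=True, download=True)
datasets.CIFAR10(root='./dataset_method_1', train=False, download=True)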
The dataset files are organized as follows:
The class labels of the dataset are stored in a .meta file, so we need to parse the .meta file to read all of the label data. The parsing code is as follows:
# First, get to know all the labels. TODO: study this unpickling process in more detail.
import pickle

def unpickle(file):
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict

meta_data = unpickle('./dataset_method_1/cifar-10-batches-py/batches.meta')
label_names = meta_data[b'label_names']
# Convert the byte labels to strings
label_names = [label.decode('utf-8') for label in label_names]
print(label_names)
The parsed result is as follows:
['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
We have obtained the dataset, so next we need to read the contents of the data files. Since they are binary files, they must be read in binary mode.
The loading code is as follows:
# Load a single batch of data
import numpy as np

def load_data_batch(file):
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    X = dict[b'data']
    Y = dict[b'labels']
    X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1)  # reshape and transpose to (10000, 32, 32, 3)
    Y = np.array(Y)
    return X, Y

# Load the first data batch
data_batch_1 = './dataset_method_1/cifar-10-batches-py/data_batch_1'
X1, Y1 = load_data_batch(data_batch_1)
print(f'Data shape: {X1.shape}, labels shape: {Y1.shape}')
Output:
Data shape: (10000, 32, 32, 3), labels shape: (10000,)
After the experiment above, we know how to load the data.
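As an optional sanity check (a sketch assuming matplotlib is installed), we can display one of the loaded images together with its class name:

import matplotlib.pyplot as plt

# Show the first image of batch 1 with its label
plt.imshow(X1[0])
plt.title(label_names[Y1[0]])
plt.show()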
Load the training set:
# Combine the data from all batches
def load_all_data_batches(batch_files):
    X_list, Y_list = [], []
    for file in batch_files:
        X, Y = load_data_batch(file)
        X_list.append(X)
        Y_list.append(Y)
    X_all = np.concatenate(X_list)
    Y_all = np.concatenate(Y_list)
    return X_all, Y_all

batch_files = [
    './dataset_method_1/cifar-10-batches-py/data_batch_1',
    './dataset_method_1/cifar-10-batches-py/data_batch_2',
    './dataset_method_1/cifar-10-batches-py/data_batch_3',
    './dataset_method_1/cifar-10-batches-py/data_batch_4',
    './dataset_method_1/cifar-10-batches-py/data_batch_5'
]

X_train, Y_train = load_all_data_batches(batch_files)
print(f'Training data shape: {X_train.shape}, training labels shape: {Y_train.shape}')
Y_train = Y_train.astype(np.int64)
Output:
Training data shape: (50000, 32, 32, 3), training labels shape: (50000,)
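A quick way to confirm that the batches were merged correctly is to check the class counts; CIFAR-10's training split is perfectly balanced, with 5,000 images per class:

# Each of the 10 classes should appear exactly 5,000 times
print(np.bincount(Y_train))  # [5000 5000 5000 5000 5000 5000 5000 5000 5000 5000]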
Load the test set:
test_batch = './dataset_method_1/cifar-10-batches-py/test_batch'
X_test, Y_test = load_data_batch(test_batch)
Y_test = Y_test.astype(np.int64)
print(f'Test data shape: {X_test.shape}, test labels shape: {Y_test.shape}')
Output:
Test data shape: (10000, 32, 32, 3), test labels shape: (10000,)
Define a Dataset class to make it easier for the DataLoader to load data in batches for training later.
A Dataset subclass must implement three methods:
__init__(): the class constructor
__len__(): returns the size of the dataset
__getitem__(): fetches one piece of data from the dataset
My implementation is as follows:
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms

# Define the PyTorch dataset
class CIFARDataset(Dataset):
    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        label = self.labels[idx]
        if self.transform:
            image = self.transform(image)
        return image, label

transform_train = transforms.Compose(
    [transforms.Pad(4),
     transforms.ToTensor(),
     transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
     transforms.RandomHorizontalFlip(),
     transforms.RandomGrayscale(),
     transforms.RandomCrop(32, padding=4),
     ])
# Convert the dataset into an array of PIL Images; otherwise the data augmentation does not seem to work
# Transform the training data
from PIL import Image

def get_PIL_Images(origin_data):
    datas = []
    for i in range(len(origin_data)):
        data = Image.fromarray(origin_data[i])
        datas.append(data)
    return datas

train_data = get_PIL_Images(X_train)
train_loader = DataLoader(CIFARDataset(train_data, Y_train, transform_train), batch_size=24, shuffle=True)
# Preprocessing for the test set
transform_test = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))]
)

test_loader = DataLoader(CIFARDataset(X_test, Y_test, transform_test), batch_size=24, shuffle=False)
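As a quick check that the pipeline produces what the network expects (after the random crop, the transforms above should yield 32x32 tensors):

# Fetch one batch and confirm the tensor shapes
images, labels = next(iter(train_loader))
print(images.shape, labels.shape)  # torch.Size([24, 3, 32, 32]) torch.Size([24])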
We now build the VGG16 network described above on the PyTorch framework.
It is mainly divided into the convolutional layers, the fully connected layers, and the output layer.
The implementation is as follows:
import torch
import torch.nn as nn

class VGG16(nn.Module):
    def __init__(self):
        super(VGG16, self).__init__()
        # Convolutional layers
        self.convolusion = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=3, padding=1),  # with padding=1, a 3x3 convolution keeps the spatial size unchanged
            nn.BatchNorm2d(96),
            nn.ReLU(inplace=True),
            nn.Conv2d(96, 96, kernel_size=3, padding=1),
            nn.BatchNorm2d(96),
            nn.ReLU(inplace=True),
            nn.Conv2d(96, 96, kernel_size=3, padding=1),
            nn.BatchNorm2d(96),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.AvgPool2d(kernel_size=1, stride=1)
        )
        # Fully connected layers
        self.dense = nn.Sequential(
            nn.Linear(512, 4096),  # after five rounds of 2x2 max pooling, the 32x32 image is reduced to 1x1, so 512 channels feed the fully connected layer
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
        )
        # Output layer
        self.classifier = nn.Linear(4096, 10)

    def forward(self, x):
        out = self.convolusion(x)
        out = out.view(out.size(0), -1)
        out = self.dense(out)
        out = self.classifier(out)
        return out
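Before training, it is worth verifying the architecture with a dummy CIFAR-sized batch (a minimal sketch):

# A dummy batch of two 32x32 RGB images should yield two rows of 10 logits
model = VGG16()
dummy = torch.randn(2, 3, 32, 32)
print(model(dummy).shape)  # torch.Size([2, 10])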
For training and testing, we only need to instantiate the model, define the optimizer, the loss function, and the learning-rate schedule, and then run training and testing, as shown in the code below.
Hyperparameter definitions:
# Define the model for training
import torch.optim as optim

model = VGG16()
# model.load_state_dict(torch.load('./my-VGG16.pth'))
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-3)
loss_func = nn.CrossEntropyLoss()
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.4, last_epoch=-1)
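The StepLR scheduler multiplies the learning rate by gamma=0.4 every 5 epochs. A quick way to see the resulting schedule (a sketch that just computes the decay directly):

# Learning rate per epoch under StepLR(step_size=5, gamma=0.4), starting from 0.01
for epoch in [0, 5, 10, 15]:
    print(epoch, 0.01 * 0.4 ** (epoch // 5))
# 0  0.01
# 5  0.004
# 10 0.0016
# 15 0.00064   (up to floating-point rounding)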
The test function:
def test():
    model.eval()
    correct = 0  # number of correctly predicted images
    total = 0    # total number of images
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            images = images.to(device)  # uses the global `device` defined before the training loop
            outputs = model(images).to(device)
            outputs = outputs.cpu()
            outputarr = outputs.numpy()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / total
    accuracy_rate.append(accuracy)
    print(f'Accuracy: {accuracy}%')
The training loop:
# Define the training procedure
total_times = 40
total = 0
accuracy_rate = []
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

for epoch in range(total_times):
    model.train()
    model.to(device)
    running_loss = 0.0
    total_correct = 0
    total_trainset = 0
    print("epoch: ", epoch)
    for i, (data, labels) in enumerate(train_loader):
        data = data.to(device)
        outputs = model(data).to(device)
        labels = labels.to(device)
        loss = loss_func(outputs, labels).to(device)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, pred = outputs.max(1)
        correct = (pred == labels).sum().item()
        total_correct += correct
        total_trainset += data.shape[0]
        if i % 100 == 0 and i > 0:
            print(f"Training iteration {i}, running_loss={running_loss}")
            running_loss = 0.0
    test()
    scheduler.step()
Save the trained model and plot the accuracy curve:
import matplotlib.pyplot as plt

torch.save(model.state_dict(), './my-VGG16.pth')
accuracy_rate = np.array(accuracy_rate)
times = np.linspace(1, total_times, total_times)
plt.xlabel('times')
plt.ylabel('accuracy rate')
plt.plot(times, accuracy_rate)
plt.show()
print(accuracy_rate)
Finally, validate the trained model on some images:
model_my_vgg = VGG16()
model_my_vgg.load_state_dict(torch.load('./my-VGG16-best.pth',map_location='cpu'))
from torchvision import transforms
from PIL import Image
# Define the image preprocessing steps
preprocess = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    # transforms.Normalize(mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225]),
])

def load_image(image_path):
    image = Image.open(image_path)
    image = preprocess(image)
    image = image.unsqueeze(0)  # add a batch dimension
    return image
image_data = load_image('./plane2.jpg')
print(image_data.shape)
output = model_my_vgg(image_data)
verify_data = X1[9]
verify_label = Y1[9]
output_verify = model_my_vgg(transform_test(verify_data).unsqueeze(0))
print(output)
print(output_verify)
Output:
torch.Size([1, 3, 32, 32])
tensor([[ 1.5990, -0.5269, 0.7254, 0.3432, -0.5036, -0.3267, -0.5302, -0.9417,
0.4186, -0.1213]], grad_fn=<AddmmBackward0>)
tensor([[-0.6541, -2.0759, 0.6308, 1.9791, 0.8525, 1.2313, 0.1856, 0.3243,
-1.3374, -1.0211]], grad_fn=<AddmmBackward0>)
print(label_names[torch.argmax(output,dim=1,keepdim=False)])
print(label_names[verify_label])
print("pred:",label_names[torch.argmax(output_verify,dim=1,keepdim=False)])
airplane
cat
pred: cat
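The raw outputs above are unnormalized logits; to read them as class probabilities, apply a softmax (a minimal sketch, reusing the `output` tensor from above):

import torch.nn.functional as F

# Convert logits to probabilities; the argmax is unchanged
probs = F.softmax(output, dim=1)
print(probs.argmax(dim=1), probs.max().item())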
Both predictions match the images: the external picture is classified as an airplane, and the verification sample taken from the training batch is labeled cat and is also predicted as cat.