Training ResNet18 on the CIFAR-10 Dataset

This post is my first experiment while learning PyTorch: training torchvision's ResNet18 on the CIFAR-10 dataset.

Training ResNet18 on CIFAR-10

Preparing the dataset

We start with the batch files, each holding 10,000 images. Following the instructions on the official CIFAR-10 page, each file is read with the unpickle function and then wrapped in a TensorDataset.
One detail worth noting: the training set holds 50,000 images, and converting a Python list of 50,000 ndarrays directly to a tensor is very slow. If we first collapse the list into a single ndarray with numpy.array(), the conversion to a tensor becomes significantly faster.
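The difference comes from how torch.tensor handles a Python list of ndarrays versus one contiguous array. A minimal sketch, with made-up sizes rather than the real dataset:

```python
import numpy as np
import torch

# A list of flat uint8 "images" shaped like CIFAR-10 rows (sizes shrunk here).
rows = [np.zeros(3072, dtype=np.uint8) for _ in range(1000)]

# Slow path: converting the Python list walks it element by element.
t_slow = torch.tensor(rows)

# Fast path: collapse the list into one contiguous ndarray first.
t_fast = torch.tensor(np.array(rows))

print(t_slow.shape == t_fast.shape)  # True: same result, far less work
```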

```python
PATH = 'ianafp/CIFAR10/cifar-10-batches-py/data_batch_'
TEST_BATCH = 'ianafp/CIFAR10/cifar-10-batches-py/test_batch'

def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')
    return batch

x_train, y_train = [], []
for i in range(1, 6):
    dataset = unpickle(PATH + '{}'.format(i))
    # print(dataset.keys())
    x_train.extend(dataset[b'data'])
    y_train.extend(dataset[b'labels'])
dataset = unpickle(TEST_BATCH)
x_valid, y_valid = dataset[b'data'], dataset[b'labels']
```
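As a quick illustration of what unpickle returns, here is a round-trip with a stand-in batch file laid out like CIFAR-10's pickles: a dict with byte-string keys, where b'data' is an N×3072 uint8 array and b'labels' a list of ints (the real files hold 10,000 rows; this toy one holds 4):

```python
import os
import pickle
import tempfile
import numpy as np

def unpickle(file):
    with open(file, 'rb') as fo:
        return pickle.load(fo, encoding='bytes')

# Build a stand-in batch file with the same layout as CIFAR-10's pickles.
fake = {b'data': np.zeros((4, 3072), dtype=np.uint8), b'labels': [0, 1, 2, 3]}
path = os.path.join(tempfile.mkdtemp(), 'data_batch_1')
with open(path, 'wb') as f:
    pickle.dump(fake, f)

batch = unpickle(path)
print(batch[b'data'].shape, len(batch[b'labels']))  # (4, 3072) 4
```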

Sanity-checking the data

```python
print(len(x_train))
import numpy as np
from matplotlib import pyplot
img = np.reshape(x_valid[8000], (3, 32, 32))
img = np.transpose(img, (1, 2, 0))
pyplot.imshow(img)
```

Wrapping the data with the Dataset and DataLoader classes

```python
from torch.utils.data import TensorDataset, DataLoader
import torch

print(len(x_train))
batch_size = 256
x_train = np.array(x_train)
y_train = np.array(y_train)
x_valid = np.array(x_valid)
y_valid = np.array(y_valid)
x_train = np.reshape(x_train, (len(x_train), 3, 32, 32))
x_valid = np.reshape(x_valid, (len(x_valid), 3, 32, 32))
train_ds = TensorDataset(torch.tensor(x_train, dtype=torch.float),
                         torch.tensor(y_train, dtype=torch.float))
train_dl = DataLoader(train_ds, batch_size=batch_size)
valid_ds = TensorDataset(torch.tensor(x_valid, dtype=torch.float),
                         torch.tensor(y_valid, dtype=torch.float))
valid_dl = DataLoader(valid_ds, batch_size=batch_size * 2)
```
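To confirm the wrapping works as intended, one can pull a single batch and inspect its shape. A sketch with stand-in arrays of the same layout as above:

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data with the same layout as the post's arrays.
x = np.zeros((512, 3, 32, 32), dtype=np.float32)
y = np.zeros(512, dtype=np.int64)
ds = TensorDataset(torch.tensor(x, dtype=torch.float),
                   torch.tensor(y, dtype=torch.float))
dl = DataLoader(ds, batch_size=256)

xb, yb = next(iter(dl))
print(xb.shape, yb.shape)  # torch.Size([256, 3, 32, 32]) torch.Size([256])
```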

Using ResNet18 from torchvision

Here we use the ResNet model provided by torchvision.

```python
from torchvision.models import resnet18, ResNet18_Weights
model = resnet18()
```

Next, set the loss function; here we use the cross-entropy loss:

$E = -\sum_k t_k \log y_k$

where $t_k$ is the label and $y_k$ is the network's output. We use the ready-made cross-entropy function from torch.nn.functional.

```python
import torch.nn.functional as F
loss_func = F.cross_entropy
```
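F.cross_entropy combines log_softmax and negative log-likelihood in a single call, so it takes raw logits rather than probabilities. A tiny numeric check:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1]])  # one sample, three classes
target = torch.tensor([0])                # the true class index

loss = F.cross_entropy(logits, target)
# Equivalent by hand: -log(softmax(logits)) at the true class.
manual = -torch.log_softmax(logits, dim=1)[0, 0]
print(torch.isclose(loss, manual))  # tensor(True)
```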

Implementing an accuracy function

```python
def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()
```
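A quick sanity check of accuracy on a hand-made batch (three samples, two classes):

```python
import torch

def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

out = torch.tensor([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])  # model scores
yb = torch.tensor([0, 1, 1])                              # true labels
print(accuracy(out, yb))  # 2 of 3 predictions are correct -> ~0.6667
```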

Before training, check the model's loss and accuracy. One point to note: PyTorch's convolution layers work in floating point, while the data read from the image files is byte-typed, so the input tensors must be cast. Likewise, F.cross_entropy in torch.nn.functional expects labels of long type, so the labels need a cast as well.

```python
x, y = train_ds[0:batch_size]
pred = model(x)
print(pred.dtype, y.dtype)
print('loss_func = ', loss_func(pred, y.long()))
print('accuracy = ', accuracy(pred, y))
```

Gradient-descent optimization with torch.optim

```python
from torch import optim
learning_rate = 0.5
opt = optim.SGD(model.parameters(), lr=learning_rate)
epochs = 8
```

Defining the stochastic-gradient-descent step function

```python
def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb.long())
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item(), len(xb)
```
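loss_batch serves double duty: with an optimizer it takes one SGD step, without one it just evaluates the loss. A sketch of both modes using a toy linear model rather than the ResNet:

```python
import torch
import torch.nn.functional as F
from torch import nn, optim

def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb.long())
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item(), len(xb)

model = nn.Linear(4, 2)                       # toy stand-in model
opt = optim.SGD(model.parameters(), lr=0.1)
xb = torch.randn(8, 4)
yb = torch.randint(0, 2, (8,))

l1, n = loss_batch(model, F.cross_entropy, xb, yb, opt)  # training: updates weights
l2, _ = loss_batch(model, F.cross_entropy, xb, yb)       # evaluation: no update
print(n)  # 8, the batch size used for the weighted validation average
```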

Defining the fit function, which runs the training loop

```python
# Fixed-epoch version:
# def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
#     for epoch in range(epochs):
#         model.train()
#         for xb, yb in train_dl:
#             loss_batch(model, loss_func, xb, yb.long(), opt)
#         model.eval()
#         with torch.no_grad():
#             losses, nums = zip(*[loss_batch(model, loss_func, xb, yb)
#                                  for xb, yb in valid_dl])
#             val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)
#         print(epoch, val_loss)

import math

def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    epoch = 0
    pre_loss = 0
    while True:
        model.train()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb.long(), opt)
        model.eval()
        with torch.no_grad():
            losses, nums = zip(*[loss_batch(model, loss_func, xb, yb)
                                 for xb, yb in valid_dl])
            val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)
        print(epoch, val_loss)
        # Stop once the validation loss change falls below the threshold.
        if math.fabs(val_loss - pre_loss) < 1e-9:
            break
        pre_loss = val_loss
        epoch = epoch + 1
```
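The stopping rule exits only when two successive validation losses differ by less than 1e-9, a condition that noisy SGD losses rarely meet. The criterion in isolation, run on made-up loss sequences:

```python
import math

def epochs_until_converged(losses, eps=1e-9):
    """Index at which |loss - previous loss| first drops below eps, else None."""
    pre = 0.0
    for epoch, val in enumerate(losses):
        if math.fabs(val - pre) < eps:
            return epoch
        pre = val
    return None

print(epochs_until_converged([1.0, 0.6, 0.55, 0.54]))  # None: gaps never < 1e-9
print(epochs_until_converged([1.0, 0.5, 0.5]))         # 2: identical successive losses
```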

Test-running the fit function

```python
# fit(epochs, model, loss_func, opt, train_dl, valid_dl)
```

It takes far too long to run on the CPU.

Deploying the model on the GPU

Check whether CUDA is available

```python
print(torch.cuda.is_available())
dev = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
```

Moving the dataset to the GPU
The trick used here is a WrappedDataLoader class. It is iterable because it defines the __iter__ method, and since __iter__ uses the yield keyword, an instance also acts as a generator. In a later loop such as

```python
for x, y in train_dl:
    ...
```

each batch is passed through the preprocess function on the fly.

```python
def preprocess(x, y):
    return x.to(dev), y.to(dev)

class WrappedDataLoader:
    def __init__(self, dl, func):
        self.dl = dl
        self.func = func

    def __len__(self):
        return len(self.dl)

    def __iter__(self):
        for b in self.dl:
            yield (self.func(*b))

train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)
```

Moving the model to the GPU

```python
model.to(dev)
opt = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
```

Test-running the training

```python
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
```

Checking the accuracy

```python
x, y = valid_ds[0:batch_size]
x = x.to(dev)
y = y.to(dev)
pred = model(x)
print(pred.dtype, y.dtype)
print('loss_func = ', loss_func(pred, y.long()))
print('accuracy = ', accuracy(pred, y))
```

The results show that training for a fixed number of epochs yields low accuracy, while with $1\times 10^{-9}$ as the convergence threshold on the validation loss the run never converges; the exact cause is still unknown.