This article documents my first experiment getting started with PyTorch: training torchvision's resnet18 on the CIFAR dataset.
Training ResNet18 on CIFAR10
Preparing the dataset
Each data_batch file holds 10,000 samples; the code below reads all five training batches (50,000 images in total) plus the test batch.
Following the dataset's official instructions, each file is read with the unpickle function, and the arrays are later wrapped in a TensorDataset.
One detail worth noting: since the training set is 50,000 images, converting a Python list of 50,000 arrays directly into a tensor is very slow. If we first merge the list of ndarrays into a single ndarray with numpy.array(), the conversion to a tensor becomes dramatically faster.
```python
PATH = 'ianafp/CIFAR10/cifar-10-batches-py/data_batch_'
TEST_BATCH = 'ianafp/CIFAR10/cifar-10-batches-py/test_batch'

def unpickle(file):
    """Load one CIFAR-10 batch file, as described on the dataset's website."""
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict

# Concatenate the five training batches into one list of images and labels.
x_train, y_train = [], []
for i in range(1, 6):
    dataset = unpickle(PATH + '{}'.format(i))
    x_train.extend(dataset[b'data'])
    y_train.extend(dataset[b'labels'])

# The test batch serves as the validation set.
dataset = unpickle(TEST_BATCH)
x_valid, y_valid = dataset[b'data'], dataset[b'labels']
```
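As a quick illustration of the conversion tip above, here is a small timing sketch (my addition, not from the original post; exact numbers vary by machine). It compares building a tensor from a list of ndarrays against building it from one merged ndarray:

```python
import time
import numpy as np
import torch

# 10,000 fake rows to keep the demo quick; the gap only widens at 50,000.
arrays = [np.zeros(3072, dtype=np.uint8) for _ in range(10000)]

t0 = time.perf_counter()
slow = torch.tensor(arrays)            # tensor from a list of ndarrays: very slow
t1 = time.perf_counter()
fast = torch.tensor(np.array(arrays))  # merge into one ndarray first: much faster
t2 = time.perf_counter()

print('from list: %.2fs, from merged ndarray: %.4fs' % (t1 - t0, t2 - t1))
```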
Sanity-checking the data
```python
print(len(x_train))

import numpy as np
from matplotlib import pyplot

# Each CIFAR row stores 3072 bytes as three 1024-byte colour planes (R, G, B),
# so reshape to (3, 32, 32) and transpose to (32, 32, 3) for matplotlib.
img = np.reshape(x_valid[8000], (3, 32, 32))
img = np.transpose(img, (1, 2, 0))
pyplot.imshow(img)
pyplot.show()
```
Wrapping the data with the Dataset and DataLoader classes
```python
from torch.utils.data import TensorDataset, DataLoader
import torch

print(len(x_train))
batch_size = 256

# Merge the Python lists into single ndarrays first (see the note above),
# then reshape the flat 3072-byte rows into (N, 3, 32, 32).
x_train = np.array(x_train)
y_train = np.array(y_train)
x_valid = np.array(x_valid)
y_valid = np.array(y_valid)
x_train = np.reshape(x_train, (len(x_train), 3, 32, 32))
x_valid = np.reshape(x_valid, (len(x_valid), 3, 32, 32))

# Convolution layers expect float inputs; the labels are cast to long later,
# where the loss function needs them.
train_ds = TensorDataset(torch.tensor(x_train, dtype=torch.float),
                         torch.tensor(y_train, dtype=torch.float))
train_dl = DataLoader(train_ds, batch_size=batch_size)
valid_ds = TensorDataset(torch.tensor(x_valid, dtype=torch.float),
                         torch.tensor(y_valid, dtype=torch.float))
valid_dl = DataLoader(valid_ds, batch_size=batch_size * 2)
```
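A quick way to confirm the wrapping worked (my own check, not in the original) is to pull a single batch and inspect its shape:

```python
xb, yb = next(iter(train_dl))
print(xb.shape, yb.shape)  # expect torch.Size([256, 3, 32, 32]) and torch.Size([256])
```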
Using ResNet18 from torchvision
Here we use the ResNet model provided by torchvision.
```python
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18()  # no weights argument: randomly initialised, not pretrained
```
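One caveat worth flagging (my observation, not from the original post): torchvision's resnet18 ends in a 1000-way fully connected layer sized for ImageNet, while CIFAR10 has only 10 classes. Training still runs with the 1000-way head, but a common adaptation is to swap it out, roughly like this:

```python
import torch.nn as nn

# Replace the ImageNet head (512 -> 1000) with a 10-class head for CIFAR10.
model.fc = nn.Linear(model.fc.in_features, 10)
```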
Setting up the loss function. Here we use the cross-entropy loss:

$$E = -\sum_{k} t_k \log y_k$$

where $t_k$ is the (one-hot) label and $y_k$ is the network's output value.
The implementation uses the ready-made cross-entropy function from torch.nn.functional.
```python
import torch.nn.functional as F

loss_func = F.cross_entropy
```
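For intuition, F.cross_entropy applies log_softmax to the raw logits and then takes the negative log-likelihood; the small example below (my addition) shows the two computations agree:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1]])  # raw network outputs for one sample
target = torch.tensor([0])                # index of the correct class

print(F.cross_entropy(logits, target))
print(F.nll_loss(F.log_softmax(logits, dim=1), target))  # same value
```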
Implementing an accuracy function
```python
def accuracy(out, yb):
    # The predicted class is the index of the largest logit in each row.
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()
```
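A tiny sanity check of the function (my example): two samples, both predicted correctly, should give an accuracy of 1.

```python
out = torch.tensor([[0.1, 0.9], [0.8, 0.2]])  # argmax per row: classes 1 and 0
yb = torch.tensor([1, 0])
print(accuracy(out, yb))  # tensor(1.)
```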
Before training, check the model's loss and accuracy.
The point to note here is that PyTorch's convolution layers operate on floating-point inputs, while the data read from the image files is byte-typed, so the tensors must be cast.
Likewise, the cross-entropy loss F.cross_entropy in torch.nn.functional expects labels of type long, so the labels need a cast as well.
```python
x, y = train_ds[0:batch_size]
pred = model(x)
print(pred.dtype, y.dtype)
print('loss_func = ', loss_func(pred, y.long()))  # labels cast to long for cross-entropy
print('accuracy = ', accuracy(pred, y))
```
Using torch.optim for gradient-descent optimization
```python
from torch import optim

learning_rate = 0.5  # note: unusually high for SGD; see the closing remarks
opt = optim.SGD(model.parameters(), lr=learning_rate)
epochs = 8
```
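For plain SGD, opt.step() is conceptually equivalent to the hand-written update below (an illustrative sketch only; the real optimizer also handles momentum, weight decay, and other options):

```python
# What opt.step() + opt.zero_grad() amount to for vanilla SGD:
with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p -= learning_rate * p.grad  # step against the gradient
            p.grad.zero_()               # clear the gradient for the next batch
```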
Defining the per-batch function for stochastic gradient descent
```python
def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb.long())  # cast labels to long for cross-entropy
    if opt is not None:
        # Training mode: backpropagate and take one optimizer step.
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item(), len(xb)
```
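Called without an optimizer, the function only evaluates the loss and performs no update; a quick check (my example):

```python
xb, yb = next(iter(train_dl))
print(loss_batch(model, loss_func, xb, yb))  # (loss value, batch size), no weight update
```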
Defining the fit function, which carries out the training process
```python
import math

def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    epoch = 0
    pre_loss = 0
    while True:
        # One pass over the training set, updating the weights batch by batch.
        model.train()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)

        # Compute the size-weighted average validation loss.
        model.eval()
        with torch.no_grad():
            losses, nums = zip(*[loss_batch(model, loss_func, xb, yb)
                                 for xb, yb in valid_dl])
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)
        print(epoch, val_loss)

        # Stop once the validation loss changes by less than the threshold.
        # Note: the epochs argument is unused; the loop runs until convergence.
        if math.fabs(val_loss - pre_loss) < 1e-9:
            break
        pre_loss = val_loss
        epoch = epoch + 1
```
Trial-running the training function fit
On the CPU the run takes far too long.
Deploying the model on the GPU
Checking whether CUDA is available
```python
print(torch.cuda.is_available())
dev = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
```
Moving the dataset to the GPU
The trick used here is to define a WrappedDataLoader class. The class is iterable because it defines __iter__, and since __iter__ uses the yield keyword, its instances also act as generators.
In the subsequent loop
```python
for x, y in train_dl:
    ...
```
each batch is passed through the preprocess function on the fly before being returned.
```python
def preprocess(x, y):
    # Move one batch of inputs and labels to the chosen device.
    return x.to(dev), y.to(dev)

class WrappedDataLoader:
    def __init__(self, dl, func):
        self.dl = dl
        self.func = func

    def __len__(self):
        return len(self.dl)

    def __iter__(self):
        # yield makes this a generator: batches are transformed lazily.
        for b in self.dl:
            yield (self.func(*b))

train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)
```
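To confirm that batches now come out on the GPU (my check; assumes a CUDA device is actually available):

```python
xb, yb = next(iter(train_dl))
print(xb.device, yb.device)  # expect cuda:0 for both when a GPU is present
```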
Moving the model to the GPU
```python
model.to(dev)
# Re-create the optimizer after moving the model, this time with momentum.
opt = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
```
Trial-running the training
```python
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
```
Checking the accuracy
```python
x, y = valid_ds[0:batch_size]
x = x.to(dev)
y = y.to(dev)
pred = model(x)
print(pred.dtype, y.dtype)
print('loss_func = ', loss_func(pred, y.long()))
print('accuracy = ', accuracy(pred, y))
```
The results show that with a fixed number of training epochs the accuracy is low, and with $1\times 10^{-9}$ as the convergence threshold on the loss the run fails to converge; the exact cause is still unknown. (Plausible suspects, not verified here, include the very high learning rate of 0.5, the unnormalised [0, 255] pixel inputs, and the unmodified 1000-class output head.)