Learning Rate Scheduling Methods

This article collects several ways to adjust the learning rate and, following the previous article, applies these schedulers to the code that trains ResNet18 on the CIFAR10 dataset.

StepLR

When optimizing with torch.optim, we usually call the optimizer's step() method to update the parameters.

def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb.long())
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item(), len(xb)

In the previous article, the parameters were updated after each batch of training data.

In PyTorch, a scheduler can keep count of the step() calls and update the learning rate once a configured number of steps has been reached.

Official StepLR documentation: https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.StepLR.html

# Assuming optimizer uses lr = 0.05 for all groups
# lr = 0.05     if epoch < 30
# lr = 0.005    if 30 <= epoch < 60
# lr = 0.0005   if 60 <= epoch < 90
# ...
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()

Add the learning-rate update to the fit function:

import math

def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    # scheduler is assumed to be created in the enclosing scope,
    # e.g. scheduler = StepLR(opt, step_size=30, gamma=0.1)
    epoch = 0
    pre_loss = 0
    while True:
        model.train()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)
        scheduler.step()  # update the learning rate once per epoch
        model.eval()
        with torch.no_grad():
            losses, nums = zip(*[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl])
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)
        print(epoch, val_loss)
        # stop once the validation loss converges or after 100 epochs
        if math.fabs(val_loss - pre_loss) < 1e-7 or epoch > 100:
            break
        pre_loss = val_loss
        epoch = epoch + 1

MultiStepLR

https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.MultiStepLR.html

# Assuming optimizer uses lr = 0.05 for all groups
# lr = 0.05     if epoch < 30
# lr = 0.005    if 30 <= epoch < 80
# lr = 0.0005   if epoch >= 80
scheduler = MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()

MultiStepLR is straightforward: whenever the step count reaches one of the values listed in milestones, the learning rate is multiplied by gamma once.

ExponentialLR

https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ExponentialLR.html

CLASS torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1, verbose=False)

ExponentialLR decays the learning rate once per epoch: every call to scheduler.step() multiplies the learning rate of each parameter group by gamma.
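
A minimal usage sketch, assuming the same optimizer and train(...)/validate(...) placeholders as in the examples above; the gamma value of 0.9 is just an illustration:

from torch.optim.lr_scheduler import ExponentialLR

# every scheduler.step() multiplies the current lr by gamma
scheduler = ExponentialLR(optimizer, gamma=0.9)
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()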

LinearLR

https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.LinearLR.html

# Assuming optimizer uses lr = 0.05 for all groups
# lr = 0.025    if epoch == 0
# lr = 0.03125  if epoch == 1
# lr = 0.0375   if epoch == 2
# lr = 0.04375  if epoch == 3
# lr = 0.05     if epoch >= 4
scheduler = LinearLR(self.opt, start_factor=0.5, total_iters=4)
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()

LinearLR changes the learning rate linearly, one step per epoch, from start_factor times the base learning rate up to the base value, reaching it after total_iters steps.

CyclicLR

https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CyclicLR.html

CLASS torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1, verbose=False)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.01, max_lr=0.1)
data_loader = torch.utils.data.DataLoader(...)
for epoch in range(10):
    for batch in data_loader:
        train_batch(...)
        scheduler.step()

CyclicLR cycles the learning rate up and down between base_lr and max_lr, in the hope that the oscillation lets the parameters escape saddle points.

OneCycleLR

https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.OneCycleLR.html?highlight=one+cycle+lr#torch.optim.lr_scheduler.OneCycleLR

data_loader = torch.utils.data.DataLoader(...)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.01, steps_per_epoch=len(data_loader), epochs=10)
for epoch in range(10):
    for batch in data_loader:
        train_batch(...)
        scheduler.step()

The single-cycle variant of CyclicLR: the learning rate rises to max_lr once and then anneals back down over the rest of training.

CosineAnnealingLR

https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CosineAnnealingLR.html?highlight=cosineannealinglr#torch.optim.lr_scheduler.CosineAnnealingLR

The cosine annealing strategy makes the learning rate follow a cosine curve, which also helps training escape saddle points.
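
A minimal usage sketch, assuming an existing optimizer and the same train(...)/validate(...) placeholders as above; the T_max and eta_min values here are only illustrative:

from torch.optim.lr_scheduler import CosineAnnealingLR

# anneal the lr from the base value down to eta_min over T_max steps,
# then back up again, following the cosine curve
scheduler = CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-5)
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()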

CosineAnnealingWarmRestarts

https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CosineAnnealingWarmRestarts.html?highlight=cosineannealingwarmrestarts#torch.optim.lr_scheduler.CosineAnnealingWarmRestarts

A variant of cosine annealing: the length of each restart period grows (by a factor of T_mult), within each period the learning rate follows a cosine decay, and when a period ends the learning rate jumps back up to its maximum (a warm restart).

scheduler = CosineAnnealingWarmRestarts(optimizer, T_0, T_mult)
iters = len(dataloader)
for epoch in range(20):
    for i, sample in enumerate(dataloader):
        inputs, labels = sample['inputs'], sample['labels']
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        scheduler.step(epoch + i / iters)

LambdaLR

A user-defined learning-rate schedule.

https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.LambdaLR.html?highlight=lambda#torch.optim.lr_scheduler.LambdaLR

CLASS torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)
# Assuming optimizer has two groups.
lambda1 = lambda epoch: epoch // 30
lambda2 = lambda epoch: 0.95 ** epoch
scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()

Each lambda takes the epoch (an int) as its argument and returns a multiplicative factor; the learning rate of the corresponding parameter group is the initial learning rate times that factor.

SequentialLR

https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.SequentialLR.html?highlight=sequential#torch.optim.lr_scheduler.SequentialLR

# Assuming optimizer uses lr = 1. for all groups
# lr = 0.1     if epoch == 0
# lr = 0.1     if epoch == 1
# lr = 0.9     if epoch == 2
# lr = 0.81    if epoch == 3
# lr = 0.729   if epoch == 4
scheduler1 = ConstantLR(self.opt, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(self.opt, gamma=0.9)
scheduler = SequentialLR(self.opt, schedulers=[scheduler1, scheduler2], milestones=[2])
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()
CLASS torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers, milestones, last_epoch=-1, verbose=False)

The milestones mark the boundaries: the schedulers are applied one after another, switching to the next scheduler at each milestone.

ChainedScheduler

https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ChainedScheduler.html?highlight=chainedscheduler#torch.optim.lr_scheduler.ChainedScheduler

# Assuming optimizer uses lr = 1. for all groups
# lr = 0.09     if epoch == 0
# lr = 0.081    if epoch == 1
# lr = 0.729    if epoch == 2
# lr = 0.6561   if epoch == 3
# lr = 0.59049  if epoch >= 4
scheduler1 = ConstantLR(self.opt, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(self.opt, gamma=0.9)
scheduler = ChainedScheduler([scheduler1, scheduler2])
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()

Like SequentialLR, ChainedScheduler combines several schedulers, but instead of switching between them it applies every scheduler at each step, so their effects compose and the learning rate changes without sudden jumps.

ConstantLR

https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ConstantLR.html?highlight=constant

# Assuming optimizer uses lr = 0.05 for all groups
# lr = 0.025   if epoch == 0
# lr = 0.025   if epoch == 1
# lr = 0.025   if epoch == 2
# lr = 0.025   if epoch == 3
# lr = 0.05    if epoch >= 4
scheduler = ConstantLR(self.opt, factor=0.5, total_iters=4)
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()

ConstantLR multiplies the learning rate by a constant factor and holds it there for total_iters epochs, after which the original learning rate is restored.

ReduceLROnPlateau

https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ReduceLROnPlateau.html?highlight=reducelronplateau#torch.optim.lr_scheduler.ReduceLROnPlateau

ReduceLROnPlateau monitors a metric such as the validation loss and, when that metric stops improving for a number of epochs, reduces the learning rate by a given factor, adapting the schedule to the actual training progress.
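
A minimal usage sketch, assuming an existing optimizer and that validate(...) returns the validation loss (as computed in the fit function above); unlike the other schedulers, the monitored metric is passed to scheduler.step():

from torch.optim.lr_scheduler import ReduceLROnPlateau

# halve the lr when the validation loss has not improved for 10 epochs
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=10)
for epoch in range(100):
    train(...)
    val_loss = validate(...)
    scheduler.step(val_loss)  # pass the monitored metric to step()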