Optimizers

class torchelie.optim.Lookahead(base_optimizer, alpha=0.5, k=5)

Implements Lookahead from _Lookahead Optimizer: k steps forward, 1 step back_ (Zhang et al., 2019)

Parameters:
  • base_optimizer (Optimizer) – an optimizer for the inner loop
  • alpha (float) – outer loop learning rate
  • k (int) – number of steps in the inner loop
load_state_dict(state)
state_dict()
step(closure=None)

Performs a single optimization step.

Parameters: closure (callable, optional) – A closure that reevaluates the model and returns the loss.
zero_grad()
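
Example: a minimal usage sketch (the model, data and loss below are placeholders, not part of torchelie). Lookahead wraps an inner optimizer; every k inner steps the slow weights are pulled toward the fast weights with rate alpha.

    import torch
    import torch.nn.functional as F
    from torchelie.optim import Lookahead

    model = torch.nn.Linear(10, 2)                       # placeholder model
    inner = torch.optim.SGD(model.parameters(), lr=0.1)  # inner-loop optimizer
    opt = Lookahead(inner, alpha=0.5, k=5)

    x, y = torch.randn(32, 10), torch.randn(32, 2)       # placeholder data
    for _ in range(100):
        opt.zero_grad()
        loss = F.mse_loss(model(x), y)
        loss.backward()
        opt.step()  # every k inner steps, one outer (slow-weight) step is taken
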
class torchelie.optim.RAdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01)

Implements the RAdamW algorithm.

RAdam from _On the Variance of the Adaptive Learning Rate and Beyond_ https://arxiv.org/abs/1908.03265v1

Parameters:
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
  • lr (float, optional) – learning rate (default: 1e-3)
  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))
  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
  • weight_decay (float, optional) – weight decay coefficient (default: 1e-2)
step(closure=None)

Performs a single optimization step.

Parameters: closure (callable, optional) – A closure that reevaluates the model and returns the loss.
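
Example: RAdamW is used like any other PyTorch optimizer; the sketch below shows the closure form of step(). The model and data are placeholders, and zero_grad() is assumed to come from the standard torch.optim.Optimizer interface.

    import torch
    import torch.nn.functional as F
    from torchelie.optim import RAdamW

    model = torch.nn.Linear(10, 2)                  # placeholder model
    opt = RAdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

    x, y = torch.randn(32, 10), torch.randn(32, 2)  # placeholder data

    def closure():
        # re-evaluates the model and returns the loss, as step() expects
        opt.zero_grad()
        loss = F.mse_loss(model(x), y)
        loss.backward()
        return loss

    for _ in range(100):
        opt.step(closure)
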
class torchelie.optim.DeepDreamOptim(params, lr=0.001, eps=1e-08, weight_decay=0)

Optimizer used by Deep Dream. It rescales the gradient by the average of the absolute values of the gradient.

\(\theta_i := \theta_i - \text{lr} \, \frac{g_i}{\epsilon + \frac{1}{M}\sum_{j=1}^{M} |g_j|}\)

Parameters:
  • params – parameters as expected by PyTorch’s optimizers
  • lr (float) – the learning rate
  • eps (float) – epsilon value to avoid dividing by zero
  • weight_decay (float) – weight decay coefficient (default: 0)
step(closure=None)

Update the weights

Parameters: closure (callable, optional) – a function that computes gradients
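
Example: a sketch of image-space optimization in the Deep Dream style. The objective below is a placeholder (in Deep Dream it would be the activation of a chosen layer of a pretrained network), and zero_grad() is assumed to come from the standard Optimizer interface.

    import torch
    from torchelie.optim import DeepDreamOptim

    img = torch.nn.Parameter(torch.randn(1, 3, 224, 224))  # tensor being optimized
    opt = DeepDreamOptim([img], lr=1e-3)

    for _ in range(50):
        opt.zero_grad()
        # placeholder objective standing in for a network activation on img
        loss = -img.pow(2).mean()
        loss.backward()
        opt.step()
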
class torchelie.optim.AddSign(params, lr=0.001, beta=0.9, weight_decay=0)

AddSign optimizer from _Neural Optimizer Search with Reinforcement Learning_ (Bello et al., 2017)

\(\theta_i := \theta_i - \text{lr} \, (1 + \text{sign}(g_i) \cdot \text{sign}(m_i)) \, g_i\)

Parameters:
  • params – parameters as expected by PyTorch’s optimizers
  • lr (float) – the learning rate
  • beta (float) – decay rate of the gradient moving average m (default: 0.9)
  • weight_decay (float) – weight decay coefficient (default: 0)
step(closure=None)

Update the weights

Parameters: closure (callable, optional) – a function that computes gradients
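
Example: the factor (1 + sign(g) * sign(m)) doubles the step when the current gradient agrees in sign with the moving average m, and cancels it when they disagree. A small illustration with made-up values:

    import torch

    g = torch.tensor([0.3, -0.2, 0.1])   # current gradient (illustrative values)
    m = torch.tensor([0.5, -0.1, -0.4])  # moving average of past gradients
    scale = 1 + torch.sign(g) * torch.sign(m)
    print(scale)                         # tensor([2., 2., 0.])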

LR Schedulers

class torchelie.lr_scheduler.CurriculumScheduler(optimizer, schedule, last_iter=-1)

Allows pre-specifying learning rate and momentum changes

Parameters:
  • optimizer (torch.optim.Optimizer) – the optimizer to schedule. Currently works only with SGD
  • schedule (list) – a schedule. It’s a list of keypoints where each element is a 3-tuple like (iteration number, lr, mom). Values are interpolated linearly between neighboring keypoints
  • last_iter (int) – starting iteration
step(*unused)

Advances the scheduler by one iteration
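
Example: a sketch of a warmup-then-decay curriculum. The keypoint values, model and data are illustrative, and calling the scheduler once per iteration right after opt.step() is an assumption about intended usage.

    import torch
    import torch.nn.functional as F
    from torchelie.lr_scheduler import CurriculumScheduler

    model = torch.nn.Linear(10, 2)                  # placeholder model
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    schedule = [
        (0,     0.0, 0.9),   # start at lr 0, momentum 0.9
        (1000,  0.1, 0.9),   # linear warmup to lr 0.1 over the first 1000 iterations
        (10000, 0.0, 0.9),   # linear decay back to 0 by iteration 10000
    ]
    sched = CurriculumScheduler(opt, schedule)

    x, y = torch.randn(32, 10), torch.randn(32, 2)  # placeholder data
    for _ in range(10000):
        opt.zero_grad()
        loss = F.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        sched.step()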

class torchelie.lr_scheduler.OneCycle(opt, lr, num_iters, mom=(0.95, 0.9), log=False, last_iter=-1)

Implements the 1cycle policy.

Goes from:
  • lr[0] to lr[1] during the first num_iters // 3 iterations
  • lr[1] to lr[0] during the next num_iters // 3 iterations
  • lr[0] to lr[0] / 10 during the final num_iters // 3 iterations

Parameters:
  • opt (Optimizer) – the optimizer on which to modulate lr
  • lr (2-tuple) – lr range
  • num_iters (int) – total number of iterations
  • mom (2-tuple) – momentum range
  • last_iter (int) – last iteration index
step(*unused)

Advances the scheduler by one iteration
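
Example: a sketch tracing the three phases of the policy. The lr range and iteration count are illustrative; in real training, the forward/backward pass and opt.step() would run before each sched.step().

    import torch
    from torchelie.lr_scheduler import OneCycle

    model = torch.nn.Linear(10, 2)  # placeholder parameters
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.95)
    sched = OneCycle(opt, lr=(0.01, 0.1), num_iters=900, mom=(0.95, 0.9))

    for i in range(900):
        sched.step()
        if i % 150 == 0:
            # inspect the current lr through the optimizer's param groups
            print(i, opt.param_groups[0]['lr'])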