torchelie.optim¶
- class torchelie.optim.Lookahead(base_optimizer, alpha=0.5, k=5)¶ Implements Lookahead from Lookahead Optimizer: k steps forward, 1 step back (Zhang et al., 2019)
- Parameters
base_optimizer (Optimizer) – an optimizer for the inner loop
alpha (float) – outer loop learning rate
k (int) – number of steps in the inner loop
- load_state_dict(state)¶
- state_dict()¶
- step(closure=None)¶ Performs a single optimization step.
- Parameters
closure (callable, optional) – A closure that reevaluates the model and returns the loss.
- zero_grad()¶
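A minimal usage sketch (the model, data and hyper-parameters below are placeholders): Lookahead wraps an existing inner optimizer and is then used like any other optimizer:

    import torch
    import torchelie.optim as tcho

    model = torch.nn.Linear(16, 4)
    inner = torch.optim.SGD(model.parameters(), lr=0.1)
    opt = tcho.Lookahead(inner, alpha=0.5, k=5)  # inner loop of k steps, then one outer step

    x, target = torch.randn(8, 16), torch.randint(0, 4, (8,))
    for _ in range(100):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), target)
        loss.backward()
        opt.step()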
- class torchelie.optim.RAdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01)¶ Implements the RAdamW algorithm.
RAdam from On the Variance of the Adaptive Learning Rate and Beyond (Liu et al., 2019), with decoupled weight decay as in AdamW.
- Parameters
params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
lr (float, optional) – learning rate (default: 1e-3)
betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))
eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
weight_decay (float, optional) – weight decay coefficient (default: 1e-2)
- step(closure=None)¶ Performs a single optimization step.
- Parameters
closure (callable, optional) – A closure that reevaluates the model and returns the loss.
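A minimal sketch using the closure form of step() described above (model and data are placeholders):

    import torch
    import torchelie.optim as tcho

    model = torch.nn.Linear(16, 1)
    opt = tcho.RAdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
    x, y = torch.randn(32, 16), torch.randn(32, 1)

    def closure():
        # reevaluates the model and returns the loss, as expected by step()
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        return loss

    for _ in range(100):
        opt.step(closure)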
- class torchelie.optim.DeepDreamOptim(params, lr=0.001, eps=1e-08, weight_decay=0)¶ Optimizer used by Deep Dream. It rescales the gradient by the average of the absolute values of the gradient.
\(\theta_i := \theta_i - \text{lr}\frac{g_i}{\epsilon + \frac{1}{M}\sum_{j=1}^{M} |g_j|}\)
- Parameters
params – parameters as expected by PyTorch's optimizers
lr (float) – the learning rate
eps (float) – epsilon value to avoid dividing by zero
- step(closure=None)¶ Update the weights.
- Parameters
closure (optional fn) – a function that computes gradients
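A sketch of the Deep Dream use case, where the optimized parameter is the image itself; the random convolution below merely stands in for a pretrained feature extractor:

    import torch
    import torchelie.optim as tcho

    features = torch.nn.Conv2d(3, 8, 3)          # placeholder feature extractor
    canvas = torch.rand(1, 3, 64, 64, requires_grad=True)
    opt = tcho.DeepDreamOptim([canvas], lr=1e-3)

    for _ in range(50):
        opt.zero_grad()
        loss = -features(canvas).pow(2).mean()    # maximize feature activations
        loss.backward()
        opt.step()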
- class torchelie.optim.AddSign(params, lr=0.001, beta=0.9, weight_decay=0)¶ AddSign optimizer from Neural Optimizer Search with Reinforcement Learning (Bello et al., 2017)
\(\theta_i := \theta_i - \text{lr}(1 + \text{sign}(g_i)\,\text{sign}(m_i))\,g_i\)
- Parameters
params – parameters as expected by PyTorch's optimizers
lr (float) – the learning rate
beta (float) – decay of the running average \(m\) of the gradient
- step(closure=None)¶ Update the weights.
- Parameters
closure (optional fn) – a function that computes gradients
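For illustration only (this is a reading of the formula above, not the library's internal code), the update can be sketched as follows, assuming m is an exponential moving average of the gradient with decay beta:

    import torch

    def addsign_update(param, grad, m, lr=1e-3, beta=0.9):
        # m: running average of the gradient
        m.mul_(beta).add_(grad, alpha=1 - beta)
        # amplify the gradient when its sign agrees with the running average,
        # keep it as-is when the signs disagree
        step = (1 + torch.sign(grad) * torch.sign(m)) * grad
        param.data.add_(step, alpha=-lr)
        return m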
torchelie.lr_scheduler¶
- class torchelie.lr_scheduler.CosineDecay(optimizer, total_iters: int, warmup_ratio: float = 0.05, last_epoch: int = -1, verbose: bool = False)¶ Cosine decay of the learning rate over total_iters iterations, with a warmup phase covering the first warmup_ratio fraction of the iterations.
- Parameters
optimizer (torch.optim.Optimizer) – the optimizer to schedule
total_iters (int) – total number of iterations to schedule over
warmup_ratio (float) – fraction of total_iters spent warming up the learning rate
last_epoch (int) – starting iteration
- step(*unused) → None¶ Step the scheduler to the next iteration.
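A minimal sketch; step() is called once per iteration, since the schedule is expressed in iterations rather than epochs:

    import torch
    import torchelie.lr_scheduler as tchs

    model = torch.nn.Linear(8, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    sched = tchs.CosineDecay(opt, total_iters=1000, warmup_ratio=0.05)

    for _ in range(1000):
        # ... forward / backward / opt.step() ...
        sched.step()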
- class torchelie.lr_scheduler.CurriculumScheduler(optimizer, schedule: List[Tuple[float, float, float]], last_epoch: int = -1, verbose: bool = False)¶ Allows pre-specifying learning rate and momentum changes over the course of training.
- Parameters
optimizer (torch.optim.Optimizer) – the optimizer to schedule. Currently works only with SGD
schedule (list) – a schedule. It’s a list of keypoints where each element is a 3-tuple like (iteration number, lr multiplier, mom). Values are interpolated linearly between neighboring keypoints
last_epoch (int) – starting iteration
- step(*unused) → None¶ Step the scheduler to the next iteration.
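A sketch of a schedule: keypoints of (iteration, lr multiplier, momentum), linearly interpolated in between as described above:

    import torch
    import torchelie.lr_scheduler as tchs

    model = torch.nn.Linear(8, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    schedule = [
        (0,     0.1,  0.9),   # start at 10% of the base lr
        (1000,  1.0,  0.9),   # warm up linearly to the full lr
        (10000, 0.01, 0.9),   # then decay to 1% by iteration 10000
    ]
    sched = tchs.CurriculumScheduler(opt, schedule)

    for _ in range(10000):
        # ... forward / backward / opt.step() ...
        sched.step()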
- class torchelie.lr_scheduler.HyperbolicTangentDecay(optimizer: torch.optim.optimizer.Optimizer, n_iters_total: int, tanh_lower_bound: int = -6, tanh_upper_bound: int = 3, last_epoch: int = -1, verbose: bool = False)¶ From Stochastic Gradient Descent with Hyperbolic-Tangent Decay on Classification (Hsueh et al., 2019): keeps the learning rate roughly flat for about 70% of training, then decays it following a hyperbolic tangent curve.
- get_lr() → List[float]¶
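Assuming the schedule from the paper, the learning rate at iteration \(t\) out of \(T\) (n_iters_total) follows \(\eta_t = \frac{\eta_0}{2}\left(1 - \tanh\left(L + (U - L)\frac{t}{T}\right)\right)\), where \(L\) and \(U\) are tanh_lower_bound and tanh_upper_bound; the defaults \(L = -6\), \(U = 3\) give the flat-then-decaying shape described above.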
- class torchelie.lr_scheduler.LinearDecay(optimizer, total_iters: int, warmup_ratio: float = 0.05, last_epoch: int = -1, verbose: bool = False)¶ Linear decay of the learning rate over total_iters iterations, with a warmup phase covering the first warmup_ratio fraction of the iterations.
- class torchelie.lr_scheduler.OneCycle(opt, lr: Tuple[float, float], num_iters: int, mom: Tuple[float, float] = (0.95, 0.85), log: bool = False, last_iter: int = -1)¶ Implements the 1cycle policy.
Goes from:
- lr[0] to lr[1] during the first num_iters // 3 iterations
- lr[1] to lr[0] during the next num_iters // 3 iterations
- lr[0] to lr[0] / 10 during the final num_iters // 3 iterations
- Parameters
opt (Optimizer) – the optimizer on which to modulate lr
lr (2-tuple) – lr range
num_iters (int) – total number of iterations
mom (2-tuple) – momentum range
last_iter (int) – last iteration index
- step(*unused)¶
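A minimal sketch of a full 1cycle run; lr is the (low, high) range and the scheduler is stepped once per iteration:

    import torch
    import torchelie.lr_scheduler as tchs

    model = torch.nn.Linear(8, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.95)

    num_iters = 3000
    sched = tchs.OneCycle(opt, (0.01, 0.1), num_iters, mom=(0.95, 0.85))

    for _ in range(num_iters):
        # ... forward / backward / opt.step() ...
        sched.step()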