# Optimizers
NetsPresso Trainer uses the optimizers implemented in PyTorch as is. By selecting an optimizer that suits your training recipe, you can configure training optimally. If you are unsure which optimizer to use, we recommend reading the blog post from Towards Data Science.
## Supported optimizers
The currently supported optimizers in NetsPresso Trainer are listed below. Since these optimizers are adapted from pre-existing code, most of their parameters remain unchanged, and most of the parameter descriptions are derived from the original implementations. We appreciate all the original code owners.
### Adadelta

This optimizer follows `torch.optim.Adadelta` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "adadelta" to use the Adadelta optimizer. |
| lr | (float) Coefficient that scales delta before it is applied to the parameters. |
| rho | (float) Coefficient used for computing a running average of squared gradients. |
| weight_decay | (float) Weight decay (L2 penalty). |
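A minimal configuration sketch for Adadelta. The `training`/`optimizer` nesting and the YAML layout are assumptions about the recipe format, and the values shown mirror common PyTorch defaults; adjust them to your own recipe.

```yaml
training:
  optimizer:
    name: adadelta       # selects torch.optim.Adadelta
    lr: 1.0              # scales delta before it is applied to the parameters
    rho: 0.9             # running-average coefficient for squared gradients
    weight_decay: 0.0    # L2 penalty
```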
### Adagrad

This optimizer follows `torch.optim.Adagrad` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "adagrad" to use the Adagrad optimizer. |
| lr | (float) Learning rate. |
| lr_decay | (float) Learning rate decay. |
| weight_decay | (float) Weight decay (L2 penalty). |
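A configuration sketch for Adagrad, following the same assumed YAML layout as above; the values are illustrative (roughly PyTorch defaults).

```yaml
training:
  optimizer:
    name: adagrad        # selects torch.optim.Adagrad
    lr: 0.01             # learning rate
    lr_decay: 0.0        # learning rate decay
    weight_decay: 0.0    # L2 penalty
```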
### Adam

This optimizer follows `torch.optim.Adam` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "adam" to use the Adam optimizer. |
| lr | (float) Learning rate. |
| betas | (list[float]) Coefficients used for computing running averages of the gradient and its square. |
| weight_decay | (float) Weight decay (L2 penalty). |
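A configuration sketch for Adam under the same assumed layout; the values roughly follow PyTorch defaults.

```yaml
training:
  optimizer:
    name: adam           # selects torch.optim.Adam
    lr: 0.001            # learning rate
    betas: [0.9, 0.999]  # running-average coefficients for gradient and its square
    weight_decay: 0.0    # L2 penalty
```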
### Adamax

This optimizer follows `torch.optim.Adamax` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "adamax" to use the Adamax optimizer. |
| lr | (float) Learning rate. |
| betas | (list[float]) Coefficients used for computing running averages of the gradient and its square. |
| weight_decay | (float) Weight decay (L2 penalty). |
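A configuration sketch for Adamax under the same assumed layout; the values roughly follow PyTorch defaults.

```yaml
training:
  optimizer:
    name: adamax         # selects torch.optim.Adamax
    lr: 0.002            # learning rate
    betas: [0.9, 0.999]  # running-average coefficients for gradient and its square
    weight_decay: 0.0    # L2 penalty
```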
### AdamW

This optimizer follows `torch.optim.AdamW` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "adamw" to use the AdamW optimizer. |
| lr | (float) Learning rate. |
| betas | (list[float]) Coefficients used for computing running averages of the gradient and its square. |
| weight_decay | (float) Weight decay (L2 penalty). |
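A configuration sketch for AdamW under the same assumed layout; the values roughly follow PyTorch defaults (note AdamW's default weight decay is nonzero in PyTorch).

```yaml
training:
  optimizer:
    name: adamw          # selects torch.optim.AdamW
    lr: 0.001            # learning rate
    betas: [0.9, 0.999]  # running-average coefficients for gradient and its square
    weight_decay: 0.01   # decoupled weight decay
```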
### RMSprop

This optimizer follows `torch.optim.RMSprop` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "rmsprop" to use the RMSprop optimizer. |
| lr | (float) Learning rate. |
| alpha | (float) Smoothing constant. |
| momentum | (float) Momentum factor. |
| weight_decay | (float) Weight decay (L2 penalty). |
| eps | (float) Term added to the denominator to improve numerical stability. |
RMSprop example (a configuration sketch under the same assumed YAML layout; the values roughly follow PyTorch defaults):
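```yaml
training:
  optimizer:
    name: rmsprop        # selects torch.optim.RMSprop
    lr: 0.01             # learning rate
    alpha: 0.99          # smoothing constant
    momentum: 0.0        # momentum factor
    weight_decay: 0.0    # L2 penalty
    eps: 1e-8            # numerical-stability term added to the denominator
```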
### SGD

This optimizer follows `torch.optim.SGD` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "sgd" to use the SGD optimizer. |
| lr | (float) Learning rate. |
| momentum | (float) Momentum factor. |
| weight_decay | (float) Weight decay (L2 penalty). |
| nesterov | (bool) Enables Nesterov momentum. |
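A configuration sketch for SGD under the same assumed layout. SGD has no default learning rate in PyTorch, so every value here is purely illustrative; Nesterov momentum requires a nonzero momentum factor.

```yaml
training:
  optimizer:
    name: sgd            # selects torch.optim.SGD
    lr: 0.01             # learning rate (required)
    momentum: 0.9        # momentum factor
    weight_decay: 0.0    # L2 penalty
    nesterov: true       # enable Nesterov momentum (needs momentum > 0)
```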