# Optimizers
NetsPresso Trainer uses the optimizers implemented in PyTorch as is. By selecting an optimizer that suits your training recipe, you can configure training optimally. If you are unsure which optimizer to use, we recommend reading the blog post from Towards Data Science.
## Supported optimizers
The currently supported optimizers in NetsPresso Trainer are listed below. Since these optimizers are adapted from pre-existing code, most of their parameters remain unchanged, and most of the parameter descriptions are derived from the original implementations. We appreciate all the original code owners.
### Adadelta

This optimizer follows `torch.optim.Adadelta` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "adadelta" to use the Adadelta optimizer. |
| lr | (float) Coefficient that scales delta before it is applied to the parameters. |
| rho | (float) Coefficient used for computing a running average of squared gradients. |
| weight_decay | (float) Weight decay (L2 penalty). |
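A minimal configuration sketch for Adadelta. The `training`/`optimizer` nesting and the YAML layout are assumptions about the recipe format, and the values shown mirror common PyTorch defaults; adjust them to your own recipe.

```yaml
training:
  optimizer:
    name: adadelta       # selects torch.optim.Adadelta
    lr: 1.0              # scales delta before it is applied to the parameters
    rho: 0.9             # running-average coefficient for squared gradients
    weight_decay: 0.0    # L2 penalty
```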
### Adagrad

This optimizer follows `torch.optim.Adagrad` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "adagrad" to use the Adagrad optimizer. |
| lr | (float) Learning rate. |
| lr_decay | (float) Learning rate decay. |
| weight_decay | (float) Weight decay (L2 penalty). |
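A configuration sketch for Adagrad, following the same assumed YAML layout as above; the values are illustrative (roughly PyTorch defaults).

```yaml
training:
  optimizer:
    name: adagrad        # selects torch.optim.Adagrad
    lr: 0.01             # learning rate
    lr_decay: 0.0        # learning rate decay
    weight_decay: 0.0    # L2 penalty
```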
### Adam

This optimizer follows `torch.optim.Adam` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "adam" to use the Adam optimizer. |
| lr | (float) Learning rate. |
| betas | (list[float]) Coefficients used for computing running averages of the gradient and its square. |
| weight_decay | (float) Weight decay (L2 penalty). |
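A configuration sketch for Adam under the same assumed layout; the values roughly follow PyTorch defaults.

```yaml
training:
  optimizer:
    name: adam           # selects torch.optim.Adam
    lr: 0.001            # learning rate
    betas: [0.9, 0.999]  # running-average coefficients for gradient and its square
    weight_decay: 0.0    # L2 penalty
```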
### Adamax

This optimizer follows `torch.optim.Adamax` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "adamax" to use the Adamax optimizer. |
| lr | (float) Learning rate. |
| betas | (list[float]) Coefficients used for computing running averages of the gradient and its square. |
| weight_decay | (float) Weight decay (L2 penalty). |
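A configuration sketch for Adamax under the same assumed layout; the values roughly follow PyTorch defaults.

```yaml
training:
  optimizer:
    name: adamax         # selects torch.optim.Adamax
    lr: 0.002            # learning rate
    betas: [0.9, 0.999]  # running-average coefficients for gradient and its square
    weight_decay: 0.0    # L2 penalty
```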
### AdamW

This optimizer follows `torch.optim.AdamW` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "adamw" to use the AdamW optimizer. |
| lr | (float) Learning rate. |
| betas | (list[float]) Coefficients used for computing running averages of the gradient and its square. |
| weight_decay | (float) Weight decay (L2 penalty). |
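A configuration sketch for AdamW under the same assumed layout; the values roughly follow PyTorch defaults (note AdamW's default weight decay is nonzero in PyTorch).

```yaml
training:
  optimizer:
    name: adamw          # selects torch.optim.AdamW
    lr: 0.001            # learning rate
    betas: [0.9, 0.999]  # running-average coefficients for gradient and its square
    weight_decay: 0.01   # decoupled weight decay
```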
### RMSprop

This optimizer follows `torch.optim.RMSprop` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "rmsprop" to use the RMSprop optimizer. |
| lr | (float) Learning rate. |
| alpha | (float) Smoothing constant. |
| momentum | (float) Momentum factor. |
| weight_decay | (float) Weight decay (L2 penalty). |
| eps | (float) Term added to the denominator to improve numerical stability. |
RMSprop example (a configuration sketch under the same assumed YAML layout; the values roughly follow PyTorch defaults):
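```yaml
training:
  optimizer:
    name: rmsprop        # selects torch.optim.RMSprop
    lr: 0.01             # learning rate
    alpha: 0.99          # smoothing constant
    momentum: 0.0        # momentum factor
    weight_decay: 0.0    # L2 penalty
    eps: 1e-8            # numerical-stability term added to the denominator
```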
### SGD

This optimizer follows `torch.optim.SGD` from the PyTorch library.
| Field | Description |
|---|---|
| name | (str) Name must be "sgd" to use the SGD optimizer. |
| lr | (float) Learning rate. |
| momentum | (float) Momentum factor. |
| weight_decay | (float) Weight decay (L2 penalty). |
| nesterov | (bool) Enables Nesterov momentum. |
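A configuration sketch for SGD under the same assumed layout. SGD has no default learning rate in PyTorch, so every value here is purely illustrative; Nesterov momentum requires a nonzero momentum factor.

```yaml
training:
  optimizer:
    name: sgd            # selects torch.optim.SGD
    lr: 0.01             # learning rate (required)
    momentum: 0.9        # momentum factor
    weight_decay: 0.0    # L2 penalty
    nesterov: true       # enable Nesterov momentum (needs momentum > 0)
```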