EMA (Exponential Moving Average)

In many cases, providing a model with averaged parameters brings performance benefits. The Exponential Moving Average (EMA) model is updated after each batch training step according to the following:

ema_param = decay * ema_param + (1. - decay) * training_model_param
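As a minimal sketch of this update rule, assume the parameters are plain Python floats stored in a dictionary. The function name `update_ema` and the dictionary layout are illustrative assumptions, not part of the library:

```python
def update_ema(ema_params, model_params, decay):
    """Apply one EMA step in place and return the averaged parameters.

    Hypothetical helper for illustration; parameters are floats keyed by name.
    """
    for name, value in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# The averaged copy starts out identical to the training parameters.
model_params = {"w": 1.0, "b": 0.0}
ema_params = dict(model_params)

# Pretend one training step moved the training parameters.
model_params = {"w": 2.0, "b": 1.0}
update_ema(ema_params, model_params, decay=0.9)
print(ema_params)
```

With `decay=0.9`, each averaged value moves one tenth of the way toward the new training value, so a larger decay gives a smoother but slower-moving average.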

If EMA is enabled, both validation and model saving are performed with the EMA model. Note that after the validation phase, the training model parameters are restored to their non-averaged values.
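The swap-and-restore behavior around validation could be sketched as follows. This is only an illustration under simplifying assumptions (parameters as dictionaries of floats, a hypothetical `use_ema_params` helper); the actual implementation may differ:

```python
from contextlib import contextmanager


@contextmanager
def use_ema_params(model_params, ema_params):
    """Temporarily load the averaged values into model_params, then restore.

    Hypothetical helper; both arguments are dicts of floats with the same keys.
    """
    backup = dict(model_params)       # keep the non-averaged training values
    model_params.update(ema_params)   # validate or save with averaged values
    try:
        yield model_params
    finally:
        model_params.clear()
        model_params.update(backup)   # revert so training continues unchanged


model_params = {"w": 2.0}
ema_params = {"w": 1.1}

with use_ema_params(model_params, ema_params) as params:
    seen_during_validation = dict(params)

print(seen_during_validation, model_params)
```

Restoring in a `finally` clause means the training parameters come back even if validation raises an error.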

EMA decay schedulers

It is often beneficial to start with a smaller decay value at the beginning of training and gradually use higher values as training progresses. To support this, we provide several decay scheduling methods.

Constant decay

Constant decay keeps the decay value unchanged throughout the entire training process.

Field	Description
`training.ema.name`	(str) Name must be "constant_decay" to use constant decay.
`training.ema.decay`	(float) The decay rate for EMA. Its range must be in [0, 1.0].

Constant decay example

training:
  ema:
    name: constant_decay
    decay: 0.9999

Exponential decay

Exponential decay increases the decay value with the number of updates, approaching its maximum exponentially, as follows:

applied_decay = decay * (1 - math.exp(-counter / beta))

The `decay` and `beta` values from the configuration determine the maximum decay value and the speed of convergence, respectively. The counter starts at 0 and increments by 1 with each update.
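To get a feel for how the schedule behaves, the formula can be evaluated at a few update counts. The sketch below uses a hypothetical `exp_decay` function and the values `decay=0.9999` and `beta=100`, chosen here purely for illustration:

```python
import math


def exp_decay(counter, decay, beta):
    """Decay applied at a given update count under the exponential schedule."""
    return decay * (1.0 - math.exp(-counter / beta))


# Illustrative settings: maximum decay 0.9999, convergence speed beta=100.
for counter in (0, 100, 1000):
    print(counter, exp_decay(counter, decay=0.9999, beta=100))
```

At counter 0 the applied decay is 0, so the first update copies the training parameters into the averaged model. After `beta` updates the applied decay reaches a fraction 1 - 1/e (about 63%) of its maximum, and it approaches `decay` as the counter grows.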

Field	Description
`training.ema.name`	(str) Name must be "exp_decay" to use exponential decay.
`training.ema.decay`	(float) The decay rate for EMA. For exponential decay, this is the maximum decay value. Its range must be in [0, 1.0].
`training.ema.beta`	(float) Determines how quickly the decay converges to its maximum value.

Exponential decay example

training:
  ema:
    name: exp_decay
    decay: 0.9999
    beta: 100
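
One way to picture how such a configuration selects a schedule is a small dispatch on the `name` field. This is a hedged sketch: `make_decay_schedule` is a hypothetical helper, and the dictionaries stand in for the `ema` mapping after the YAML has been parsed.

```python
import math


def make_decay_schedule(ema_config):
    """Return a function mapping the update counter to the applied decay.

    Hypothetical helper: ema_config mirrors the `ema` mapping in the YAML.
    """
    name = ema_config["name"]
    decay = ema_config["decay"]
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1.0]")
    if name == "constant_decay":
        return lambda counter: decay
    if name == "exp_decay":
        beta = ema_config["beta"]
        return lambda counter: decay * (1.0 - math.exp(-counter / beta))
    raise ValueError(f"unknown EMA decay scheduler: {name}")


constant = make_decay_schedule({"name": "constant_decay", "decay": 0.9999})
exponential = make_decay_schedule(
    {"name": "exp_decay", "decay": 0.9999, "beta": 100}
)
print(constant(0), exponential(0))
```

The constant schedule returns the configured decay from the first update, while the exponential schedule starts at 0 and rises toward it.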