
Gradient Clipping

Gradient clipping is a technique used to mitigate the exploding gradient problem in training deep neural networks, particularly in scenarios involving recurrent neural networks (RNNs) or models with long dependency chains.

During training, especially in models with deep architectures or recurrent structures, gradients can sometimes grow exponentially large. This phenomenon, known as exploding gradients, can cause several problems:

  • Numerical instability
  • Overshooting optimal parameter values
  • Divergence in the training process

Gradient clipping addresses this issue by limiting the magnitude of gradients during backpropagation. This technique ensures that gradient updates remain within a reasonable range, promoting more stable and controlled training.

Norm Clipping

Currently, netspresso-trainer supports norm clipping. This method scales down the gradient whenever its norm exceeds a threshold \(v\):

$$
\text{if } \|\mathbf{g}\| > v \text{ then } \mathbf{g} \leftarrow v \cdot \frac{\mathbf{g}}{\|\mathbf{g}\|}
$$
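For reference, the rule above can be sketched in a few lines of PyTorch. This is an illustrative snippet, not netspresso-trainer's internal implementation; PyTorch provides the equivalent built-in `torch.nn.utils.clip_grad_norm_`.

```python
import torch

def clip_grad_norm(parameters, max_norm: float) -> None:
    """Scale gradients in place so their global L2 norm does not exceed max_norm."""
    grads = [p.grad for p in parameters if p.grad is not None]
    if not grads:
        return
    # Global L2 norm over all parameter gradients
    total_norm = torch.norm(torch.stack([g.norm(2) for g in grads]), 2)
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)  # small epsilon for numerical safety
        for g in grads:
            g.mul_(scale)

# Equivalent PyTorch built-in, typically called between loss.backward() and optimizer.step():
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
```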

| Field | Description |
|---|---|
| `training.max_norm` | (float) The norm threshold, i.e. the maximum allowed gradient norm. To disable gradient clipping, set this value to None (`~`). |
Norm gradient clipping example:

```yaml
training:
    max_norm: 0.1
```
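As noted in the field description above, setting `training.max_norm` to None (`~`) turns gradient clipping off:

```yaml
training:
    max_norm: ~
```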