tf.compat.v1.train.cosine_decay
Applies cosine decay to the learning rate.
tf.compat.v1.train.cosine_decay(
learning_rate, global_step, decay_steps, alpha=0.0, name=None
)
When training a model, it is often useful to lower the learning rate as
training progresses. This function applies a cosine decay function to a
provided initial learning rate. It requires a global_step value to compute
the decayed learning rate; you can simply pass a TensorFlow variable that
you increment at each training step.
The function returns the decayed learning rate. It is computed as:
global_step = min(global_step, decay_steps)
cosine_decay = 0.5 * (1 + cos(pi * global_step / decay_steps))
decayed = (1 - alpha) * cosine_decay + alpha
decayed_learning_rate = learning_rate * decayed
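The formula above can be checked with a small pure-Python sketch (using math.cos instead of TensorFlow ops; the function name is hypothetical, and the variable names mirror the pseudocode):

```python
import math

def cosine_decay_value(learning_rate, global_step, decay_steps, alpha=0.0):
    """Pure-Python mirror of the cosine decay formula above."""
    step = min(global_step, decay_steps)  # clamp once decay_steps is reached
    cosine_decay = 0.5 * (1 + math.cos(math.pi * step / decay_steps))
    decayed = (1 - alpha) * cosine_decay + alpha
    return learning_rate * decayed

# At step 0 the rate is unchanged; at decay_steps it bottoms out at
# learning_rate * alpha; halfway it is the midpoint of the two.
print(cosine_decay_value(0.1, 0, 1000))     # 0.1
print(cosine_decay_value(0.1, 500, 1000))   # 0.05
print(cosine_decay_value(0.1, 2000, 1000))  # 0.0 (clamped; alpha=0)
```

Note that alpha acts as a floor: with alpha=0.2 the schedule never drops below 20% of the initial rate.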
Example usage:
decay_steps = 1000
lr_decayed = cosine_decay(learning_rate, global_step, decay_steps)
Args

learning_rate: A scalar float32 or float64 Tensor or a Python number.
The initial learning rate.

global_step: A scalar int32 or int64 Tensor or a Python number. Global
step to use for the decay computation.

decay_steps: A scalar int32 or int64 Tensor or a Python number. Number
of steps to decay over.

alpha: A scalar float32 or float64 Tensor or a Python number. Minimum
learning rate value as a fraction of learning_rate.

name: String. Optional name of the operation. Defaults to 'CosineDecay'.
Returns

A scalar Tensor of the same type as learning_rate. The decayed
learning rate.
Raises

ValueError: if global_step is not supplied.
Eager compatibility

When eager execution is enabled, this function returns a function which in
turn returns the decayed learning rate Tensor. This can be useful for changing
the learning rate value across different invocations of optimizer functions.
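That behavior can be emulated with an ordinary closure. The sketch below (plain Python, with a hypothetical helper name and a list standing in for a tf.Variable) shows why returning a function is useful: each call re-reads the current step value, so the same callable yields an up-to-date rate throughout training:

```python
import math

def make_cosine_decay_fn(learning_rate, step_holder, decay_steps, alpha=0.0):
    """Hypothetical closure mirroring the eager-mode behavior: the
    returned callable recomputes the decayed rate from the *current*
    value in step_holder on every invocation."""
    def decayed_lr():
        step = min(step_holder[0], decay_steps)
        cosine = 0.5 * (1 + math.cos(math.pi * step / decay_steps))
        return learning_rate * ((1 - alpha) * cosine + alpha)
    return decayed_lr

step = [0]  # mutable stand-in for a tf.Variable incremented each step
lr_fn = make_cosine_decay_fn(0.1, step, decay_steps=1000)
print(lr_fn())   # 0.1 at step 0
step[0] = 1000
print(lr_fn())   # 0.0 at the end of the schedule (alpha=0)
```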
References

Stochastic Gradient Descent with Warm Restarts:
Loshchilov et al., 2017 (https://openreview.net/pdf?id=Skq89Scxx)

Last updated 2024-04-26 UTC.