tf.compat.v1.train.linear_cosine_decay
Applies linear cosine decay to the learning rate.
tf.compat.v1.train.linear_cosine_decay(
    learning_rate,
    global_step,
    decay_steps,
    num_periods=0.5,
    alpha=0.0,
    beta=0.001,
    name=None
)
Note that linear cosine decay is more aggressive than cosine decay, so
larger initial learning rates can typically be used.
When training a model, it is often recommended to lower the learning rate as
training progresses. This function applies a linear cosine decay function
to a provided initial learning rate. It requires a global_step value to
compute the decayed learning rate; you can simply pass a TensorFlow variable
that you increment at each training step.
The function returns the decayed learning rate. It is computed as:
global_step = min(global_step, decay_steps)
linear_decay = (decay_steps - global_step) / decay_steps
cosine_decay = 0.5 * (
1 + cos(pi * 2 * num_periods * global_step / decay_steps))
decayed = (alpha + linear_decay) * cosine_decay + beta
decayed_learning_rate = learning_rate * decayed
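The computation above can be sanity-checked with a plain-Python sketch; the helper name linear_cosine_decay_value is ours for illustration, not part of the TensorFlow API:

```python
import math

def linear_cosine_decay_value(learning_rate, global_step, decay_steps,
                              num_periods=0.5, alpha=0.0, beta=0.001):
    """Scalar version of the decay computation shown above."""
    # Clamp the step so the schedule holds its final value past decay_steps.
    global_step = min(global_step, decay_steps)
    linear_decay = (decay_steps - global_step) / decay_steps
    cosine_decay = 0.5 * (
        1 + math.cos(math.pi * 2 * num_periods * global_step / decay_steps))
    decayed = (alpha + linear_decay) * cosine_decay + beta
    return learning_rate * decayed

# At step 0 the rate is learning_rate * (1 + beta); at and beyond
# decay_steps it settles at learning_rate * beta.
start = linear_cosine_decay_value(0.1, 0, 1000)
end = linear_cosine_decay_value(0.1, 1000, 1000)
```

With the defaults, beta acts as the floor of the schedule: the rate never decays all the way to zero.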
Example usage:
decay_steps = 1000
lr_decayed = linear_cosine_decay(learning_rate, global_step, decay_steps)
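To see the shape of the schedule, the same arithmetic can be tabulated over a few steps in plain Python. In particular, num_periods controls how many cosine oscillations occur within decay_steps; this is an illustrative sketch using our own helper name, not TensorFlow code:

```python
import math

def decayed_factor(step, decay_steps, num_periods=0.5, alpha=0.0, beta=0.001):
    # Same arithmetic as the computation above, applied to a plain float.
    step = min(step, decay_steps)
    linear_decay = (decay_steps - step) / decay_steps
    cosine_decay = 0.5 * (
        1 + math.cos(math.pi * 2 * num_periods * step / decay_steps))
    return (alpha + linear_decay) * cosine_decay + beta

decay_steps = 1000
for step in (0, 250, 500, 750, 1000):
    # With the default num_periods=0.5 the factor decays monotonically;
    # num_periods=2.0 adds warm-restart-style oscillations on the way down.
    print(step,
          round(decayed_factor(step, decay_steps), 4),
          round(decayed_factor(step, decay_steps, num_periods=2.0), 4))
```

The linear_decay envelope shrinks each successive cosine peak, which is what makes the restarts "decay" rather than return to the full rate.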
Args:
  learning_rate: A scalar float32 or float64 Tensor or a Python number.
    The initial learning rate.
  global_step: A scalar int32 or int64 Tensor or a Python number.
    Global step to use for the decay computation.
  decay_steps: A scalar int32 or int64 Tensor or a Python number.
    Number of steps to decay over.
  num_periods: Number of periods in the cosine part of the decay.
    See computation above.
  alpha: See computation above.
  beta: See computation above.
  name: String. Optional name of the operation. Defaults to
    'LinearCosineDecay'.

Returns:
  A scalar Tensor of the same type as learning_rate. The decayed
  learning rate.

Raises:
  ValueError: if global_step is not supplied.
Eager Compatibility:
When eager execution is enabled, this function returns a function which in
turn returns the decayed learning rate Tensor. This can be useful for changing
the learning rate value across different invocations of optimizer functions.
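The reason a callable is returned under eager execution is that the optimizer can re-evaluate the rate on every invocation as the step advances. That pattern can be emulated in plain Python with no TensorFlow dependency; all names here are hypothetical stand-ins:

```python
import math

def make_linear_cosine_decay(learning_rate, step_holder, decay_steps,
                             num_periods=0.5, alpha=0.0, beta=0.001):
    # step_holder is any object with a `value` attribute, standing in for
    # the global_step variable that training code increments.
    def decayed_lr():
        step = min(step_holder.value, decay_steps)
        linear_decay = (decay_steps - step) / decay_steps
        cosine_decay = 0.5 * (
            1 + math.cos(math.pi * 2 * num_periods * step / decay_steps))
        return learning_rate * ((alpha + linear_decay) * cosine_decay + beta)
    return decayed_lr

class Step:
    def __init__(self):
        self.value = 0

step = Step()
lr_fn = make_linear_cosine_decay(0.1, step, decay_steps=1000)
first = lr_fn()     # rate at step 0
step.value = 1000
last = lr_fn()      # same callable, new rate after the step advanced
```

Because the callable closes over step_holder, each invocation sees the current step, which is why the eager API hands the optimizer a function rather than a fixed tensor.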
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.
Last updated 2024-04-26 UTC.
References:
  Neural Optimizer Search with Reinforcement Learning:
    Bello et al., 2017 (http://proceedings.mlr.press/v70/bello17a.html)
  Stochastic Gradient Descent with Warm Restarts:
    Loshchilov et al., 2017 (https://openreview.net/forum?id=Skq89Scxx)