# tf.nn.ctc_loss
Computes CTC (Connectionist Temporal Classification) loss.
```python
tf.nn.ctc_loss(
    labels,
    logits,
    label_length,
    logit_length,
    logits_time_major=True,
    unique=None,
    blank_index=None,
    name=None
)
```
This op implements the CTC loss as presented in
[Graves et al., 2006](https://www.cs.toronto.edu/~graves/icml_2006.pdf).

Connectionist temporal classification (CTC) is a type of neural network output
and associated scoring function for training recurrent neural networks (RNNs),
such as LSTM networks, to tackle sequence problems where the timing is
variable. It can be used for tasks like on-line handwriting recognition or
recognizing phones in speech audio. CTC refers to the outputs and scoring and
is independent of the underlying neural network structure.
#### Notes:

- This op performs the softmax operation for you, so `logits` should be, e.g.,
  linear projections of outputs by an LSTM.
- Outputs true repeated classes with blanks in between, and can also output
  repeated classes with no blanks in between that need to be collapsed by the
  decoder.
- `labels` may be supplied as either a dense, zero-padded `Tensor` with a
  vector of label sequence lengths OR as a `SparseTensor`.
- On TPU: only dense padded `labels` are supported.
- On CPU and GPU: the caller may use either a `SparseTensor` or dense padded
  `labels`, but calling with a `SparseTensor` will be significantly faster.
- The default blank label is `0` instead of `num_labels - 1` (where
  `num_labels` is the innermost dimension size of `logits`), unless overridden
  by `blank_index`.
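The collapsing rule described in the notes can be illustrated with a small sketch in plain Python (the helper name `ctc_collapse` is made up for illustration): repeats are merged first and blanks removed second, which is why a blank between two identical symbols keeps them distinct.

```python
def ctc_collapse(path, blank=0):
    """Collapse a frame-level CTC path: merge repeats, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out

# Repeats with no blank in between collapse to one symbol;
# a blank between repeats keeps them distinct.
print(ctc_collapse([0, 1, 1, 0, 2, 2, 2, 0, 2]))  # [1, 2, 2]
```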
```python
tf.random.set_seed(50)
batch_size = 8
num_labels = 6
max_label_length = 5
num_frames = 12
labels = tf.random.uniform([batch_size, max_label_length],
                           minval=1, maxval=num_labels, dtype=tf.int64)
logits = tf.random.uniform([num_frames, batch_size, num_labels])
label_length = tf.random.uniform([batch_size], minval=2,
                                 maxval=max_label_length, dtype=tf.int64)
label_mask = tf.sequence_mask(label_length, maxlen=max_label_length,
                              dtype=label_length.dtype)
labels *= label_mask
logit_length = [num_frames] * batch_size
with tf.GradientTape() as t:
  t.watch(logits)
  ref_loss = tf.nn.ctc_loss(
      labels=labels,
      logits=logits,
      label_length=label_length,
      logit_length=logit_length,
      blank_index=0)
ref_grad = t.gradient(ref_loss, logits)
```
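On CPU and GPU the notes recommend the `SparseTensor` path as the faster option. A hedged sketch of the same setup using sparse labels (assuming `tf.sparse.from_dense`, which drops zero entries, so class-0 padding and blank never appear in the sparse labels); note that `label_length` becomes `None` and `blank_index` must be given explicitly:

```python
import tensorflow as tf

tf.random.set_seed(50)
batch_size, num_labels, max_label_length, num_frames = 8, 6, 5, 12
labels = tf.random.uniform([batch_size, max_label_length],
                           minval=1, maxval=num_labels, dtype=tf.int32)
label_length = tf.random.uniform([batch_size], minval=2,
                                 maxval=max_label_length, dtype=tf.int32)
labels *= tf.sequence_mask(label_length, maxlen=max_label_length,
                           dtype=label_length.dtype)
logits = tf.random.uniform([num_frames, batch_size, num_labels])
logit_length = [num_frames] * batch_size

# Zero-valued padding is dropped when converting to a SparseTensor,
# so the per-example label lengths are implicit in the sparse structure.
sparse_labels = tf.sparse.from_dense(labels)
sparse_loss = tf.nn.ctc_loss(
    labels=sparse_labels,
    logits=logits,
    label_length=None,        # lengths are implicit in the SparseTensor
    logit_length=logit_length,
    blank_index=0)            # required for SparseTensor labels
```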
| Args ||
|---|---|
| `labels` | `Tensor` of shape `[batch_size, max_label_seq_length]` or `SparseTensor`. |
| `logits` | `Tensor` of shape `[frames, batch_size, num_labels]`. If `logits_time_major == False`, shape is `[batch_size, frames, num_labels]`. |
| `label_length` | `Tensor` of shape `[batch_size]`. `None` if `labels` is a `SparseTensor`. Length of the reference label sequence in `labels`. |
| `logit_length` | `Tensor` of shape `[batch_size]`. Length of the input sequence in `logits`. |
| `logits_time_major` | (optional) If True (default), `logits` is shaped `[frames, batch_size, num_labels]`. If False, shape is `[batch_size, frames, num_labels]`. |
| `unique` | (optional) Unique label indices as computed by `ctc_unique_labels(labels)`. If supplied, enables a faster, more memory-efficient implementation on TPU. |
| `blank_index` | (optional) Set the class index to use for the blank label. Negative values start from `num_labels`, i.e., `-1` reproduces the `ctc_loss` behavior of using `num_labels - 1` for the blank symbol. There is some memory/performance overhead to switching from the default of `0`, as an additional shifted copy of `logits` may be created. |
| `name` | A name for this `Op`. Defaults to `"ctc_loss_dense"`. |
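The `logits_time_major` flag only selects the expected layout; it does not change the result. A small sketch (values are arbitrary) showing that transposing `logits` to batch-major and setting `logits_time_major=False` yields the same loss:

```python
import tensorflow as tf

tf.random.set_seed(0)
num_frames, batch_size, num_labels = 12, 4, 6
logits = tf.random.uniform([num_frames, batch_size, num_labels])
labels = tf.constant([[1, 2, 3, 0, 0]] * batch_size, dtype=tf.int64)
label_length = tf.constant([3] * batch_size, dtype=tf.int64)
logit_length = [num_frames] * batch_size

# Default: logits are time-major, [frames, batch_size, num_labels].
loss_tm = tf.nn.ctc_loss(labels, logits, label_length, logit_length,
                         blank_index=0)

# Batch-major layout of the same logits gives the same loss.
loss_bm = tf.nn.ctc_loss(labels, tf.transpose(logits, [1, 0, 2]),
                         label_length, logit_length,
                         logits_time_major=False, blank_index=0)
```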
| Returns ||
|---|---|
| `loss` | A 1-D `float` `Tensor` of shape `[batch_size]`, containing negative log probabilities. |
| Raises ||
|---|---|
| `ValueError` | Argument `blank_index` must be provided when `labels` is a `SparseTensor`. |
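The error above can be triggered by passing `SparseTensor` labels without an explicit `blank_index`; a minimal sketch:

```python
import tensorflow as tf

logits = tf.random.uniform([12, 2, 6])
sparse_labels = tf.sparse.from_dense(
    tf.constant([[1, 2, 3], [2, 4, 0]], dtype=tf.int32))

try:
  tf.nn.ctc_loss(labels=sparse_labels, logits=logits,
                 label_length=None, logit_length=[12, 12])
except ValueError as e:
  # The message explains that blank_index is required for sparse labels.
  print(e)
```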
#### References

Connectionist Temporal Classification - Labeling Unsegmented Sequence Data with Recurrent Neural Networks: [Graves et al., 2006](https://dl.acm.org/citation.cfm?id=1143891) ([pdf](http://www.cs.toronto.edu/~graves/icml_2006.pdf));
<https://en.wikipedia.org/wiki/Connectionist_temporal_classification>

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.

Last updated 2024-04-26 UTC.