tf.compat.v1.nn.sampled_softmax_loss
View source on GitHub: https://github.com/tensorflow/tensorflow/blob/v2.16.1/tensorflow/python/ops/nn_impl.py#L2209-L2311
Computes and returns the sampled softmax training loss.
tf.compat.v1.nn.sampled_softmax_loss(
    weights,
    biases,
    labels,
    inputs,
    num_sampled,
    num_classes,
    num_true=1,
    sampled_values=None,
    remove_accidental_hits=True,
    partition_strategy='mod',
    name='sampled_softmax_loss',
    seed=None
)
This is a faster way to train a softmax classifier over a huge number of
classes.
This operation is for training only. It is generally an underestimate of
the full softmax loss.
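To see why the sampled loss underestimates the full loss, consider the following NumPy sketch. The sizes and data are illustrative, and the log-expected-count correction applied by the real op is omitted; the point is only that restricting the softmax denominator to a subset of classes can never increase the loss:

```python
import numpy as np

# Toy setup: a linear classifier over many classes (hypothetical sizes).
rng = np.random.default_rng(0)
num_classes, dim, batch_size, num_sampled = 1000, 16, 4, 64
weights = rng.normal(size=(num_classes, dim))   # class embeddings
biases = rng.normal(size=num_classes)
inputs = rng.normal(size=(batch_size, dim))     # forward activations
labels = rng.integers(0, num_classes, size=batch_size)

logits = inputs @ weights.T + biases            # [batch_size, num_classes]

# Full softmax cross-entropy, per example.
full_loss = (np.log(np.exp(logits).sum(axis=1))
             - logits[np.arange(batch_size), labels])

# Simplified sampled loss: softmax restricted to the true class plus a
# random subset of the other classes.
sampled_loss = np.empty(batch_size)
for i, y in enumerate(labels):
    candidates = rng.choice(np.setdiff1d(np.arange(num_classes), [y]),
                            size=num_sampled, replace=False)
    cols = np.concatenate(([y], candidates))
    sub = logits[i, cols]
    sampled_loss[i] = np.log(np.exp(sub).sum()) - sub[0]

# Dropping positive terms from the log-sum-exp denominator can only
# shrink the loss, so the sampled loss underestimates the full loss.
assert np.all(sampled_loss <= full_loss + 1e-12)
```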
A common use case is to use this method for training, and calculate the full
softmax loss for evaluation or inference. In this case, you must set
partition_strategy="div"
for the two losses to be consistent, as in the
following example:
if mode == "train":
  loss = tf.nn.sampled_softmax_loss(
      weights=weights,
      biases=biases,
      labels=labels,
      inputs=inputs,
      ...,
      partition_strategy="div")
elif mode == "eval":
  logits = tf.matmul(inputs, tf.transpose(weights))
  logits = tf.nn.bias_add(logits, biases)
  labels_one_hot = tf.one_hot(labels, n_classes)
  loss = tf.nn.softmax_cross_entropy_with_logits(
      labels=labels_one_hot,
      logits=logits)
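The eval branch computes the full softmax cross-entropy from one-hot labels. A small NumPy check (with illustrative data) confirms that the one-hot formulation is the same as directly taking the negative log-probability of the true class:

```python
import numpy as np

rng = np.random.default_rng(1)
batch_size, n_classes = 3, 5
logits = rng.normal(size=(batch_size, n_classes))
labels = np.array([0, 3, 2])

# One-hot route, mirroring the eval branch above.
labels_one_hot = np.eye(n_classes)[labels]
log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss_one_hot = -(labels_one_hot * log_softmax).sum(axis=1)

# Direct route: -log p(true class).
loss_direct = (np.log(np.exp(logits).sum(axis=1))
               - logits[np.arange(batch_size), labels])

assert np.allclose(loss_one_hot, loss_direct)
```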
See the Candidate Sampling Algorithms Reference
(https://www.tensorflow.org/extras/candidate_sampling.pdf).
Also see Section 3 of Jean et al., 2014 for the math.
Args
  weights: A Tensor of shape [num_classes, dim], or a list of Tensor
    objects whose concatenation along dimension 0 has shape
    [num_classes, dim]. The (possibly-sharded) class embeddings.
  biases: A Tensor of shape [num_classes]. The class biases.
  labels: A Tensor of type int64 and shape [batch_size, num_true]. The
    target classes. Note that this format differs from the labels
    argument of nn.softmax_cross_entropy_with_logits.
  inputs: A Tensor of shape [batch_size, dim]. The forward activations
    of the input network.
  num_sampled: An int. The number of classes to randomly sample per batch.
  num_classes: An int. The number of possible classes.
  num_true: An int. The number of target classes per training example.
  sampled_values: A tuple of (sampled_candidates, true_expected_count,
    sampled_expected_count) returned by a *_candidate_sampler function.
    If None, defaults to log_uniform_candidate_sampler.
  remove_accidental_hits: A bool. Whether to remove "accidental hits"
    where a sampled class equals one of the target classes. Default is
    True.
  partition_strategy: A string specifying the partitioning strategy,
    relevant if len(weights) > 1. Currently "div" and "mod" are
    supported. Default is "mod". See tf.nn.embedding_lookup for more
    details.
  name: A name for the operation (optional).
  seed: Random seed for candidate sampling. Defaults to None, which
    does not set the op-level random seed for candidate sampling.
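The default log_uniform_candidate_sampler draws class k with probability proportional to log((k + 2) / (k + 1)), which strongly favors low-numbered (presumed frequent) classes, as in a Zipfian vocabulary. A NumPy sketch of that distribution, with a toy num_classes:

```python
import numpy as np

num_classes = 100  # toy vocabulary size
k = np.arange(num_classes)

# Log-uniform sampling distribution:
# P(k) = log((k + 2) / (k + 1)) / log(num_classes + 1)
probs = np.log((k + 2) / (k + 1)) / np.log(num_classes + 1)

# The telescoping sum log(2/1) + log(3/2) + ... + log((N+1)/N)
# collapses to log(N + 1), so the probabilities sum to 1.
assert np.isclose(probs.sum(), 1.0)

# Low class ids are far more likely than high ones.
assert probs[0] > probs[-1]
```

This is why, when using the default sampler, class ids should be assigned in decreasing order of frequency for the sampling distribution to approximate the true label distribution.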
Returns
  A batch_size 1-D tensor of per-example sampled softmax losses.
References
  On Using Very Large Target Vocabulary for Neural Machine Translation:
  Jean et al., 2014 (http://aclweb.org/anthology/P15-1001)
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.
Last updated 2024-04-26 UTC.