tf.lookup.TextFileInitializer
Table initializers from a text file.
tf.lookup.TextFileInitializer(
    filename,
    key_dtype,
    key_index,
    value_dtype,
    value_index,
    vocab_size=None,
    delimiter='\t',
    name=None,
    value_index_offset=0
)
This initializer assigns one entry in the table for each line in the file.
The key and value types of the table to initialize are given by key_dtype and value_dtype.
The key and value content to get from each line is specified by key_index and value_index:
- TextFileIndex.LINE_NUMBER means use the line number, starting from zero; expects data type int64.
- TextFileIndex.WHOLE_LINE means use the whole line content; expects data type string.
- A value >=0 means use the field at that index (starting at zero) after splitting the line on delimiter.
For example, suppose we create a file with the following content:
import tempfile
import tensorflow as tf
# Write three space-delimited "key value" lines to a temporary file.
f = tempfile.NamedTemporaryFile(delete=False)
content = '\n'.join(["emerson 10", "lake 20", "palmer 30"])
f.file.write(content.encode('utf-8'))
f.file.close()
The following snippet initializes a table with the first column as keys and
second column as values:
emerson -> 10
lake -> 20
palmer -> 30
init = tf.lookup.TextFileInitializer(
    filename=f.name,
    key_dtype=tf.string, key_index=0,
    value_dtype=tf.int64, value_index=1,
    delimiter=" ")
table = tf.lookup.StaticHashTable(init, default_value=-1)
table.lookup(tf.constant(['palmer','lake','tarkus'])).numpy()
array([30, 20, -1])
Similarly, to use the whole line as keys and the line number as values:
emerson 10 -> 0
lake 20 -> 1
palmer 30 -> 2
init = tf.lookup.TextFileInitializer(
    filename=f.name,
    key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
    value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
table = tf.lookup.StaticHashTable(init, -1)
table.lookup(tf.constant('palmer 30')).numpy()
2
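The three index modes used above can be pictured with a small plain-Python sketch. It is an illustration only, not TensorFlow's implementation: the names extract_field, build_table, WHOLE_LINE and LINE_NUMBER are hypothetical stand-ins defined here, and the sketch leaves column values as strings instead of converting them to value_dtype.

```python
# Hypothetical sentinels for this sketch only; with TensorFlow you would pass
# tf.lookup.TextFileIndex.WHOLE_LINE or tf.lookup.TextFileIndex.LINE_NUMBER.
WHOLE_LINE = "whole_line"
LINE_NUMBER = "line_number"


def extract_field(line, line_number, index, delimiter="\t"):
    """Return the part of a line selected by index (illustration only)."""
    if index == LINE_NUMBER:
        return line_number  # zero-based line number
    if index == WHOLE_LINE:
        return line  # entire line content, not split
    return line.split(delimiter)[index]  # zero-based field after splitting


def build_table(lines, key_index, value_index, delimiter="\t"):
    """Build a dict with one entry per line."""
    return {
        extract_field(line, n, key_index, delimiter):
            extract_field(line, n, value_index, delimiter)
        for n, line in enumerate(lines)
    }


lines = ["emerson 10", "lake 20", "palmer 30"]
# First column as keys, second column as values (kept as strings here).
by_column = build_table(lines, key_index=0, value_index=1, delimiter=" ")
# Whole line as keys, line number as values.
by_line = build_table(lines, key_index=WHOLE_LINE, value_index=LINE_NUMBER)
print(by_column)
print(by_line)
```

The two dictionaries correspond to the two tables built in the examples above, which is why the second one maps 'palmer 30' to 2.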
Args
| filename | The filename of the text file to be used for initialization. The path must be accessible from wherever the graph is initialized (e.g. trainer or eval workers). The filename may be a scalar Tensor. |
| key_dtype | The key data type. |
| key_index | The index that identifies which part of a line the table keys are taken from. |
| value_dtype | The value data type. |
| value_index | The index that identifies which part of a line the table values are taken from. |
| vocab_size | The number of elements in the file, if known. |
| delimiter | The delimiter that separates fields in a line. |
| name | A name for the operation (optional). |
| value_index_offset | A number to add to all indices extracted from the file. This is useful when you want to reserve one or more low index values for control characters. For instance, to ensure that no vocabulary item is mapped to index 0 (so you can reserve 0 for a masking value), set value_index_offset to 1; the first vocabulary element is then mapped to 1 instead of 0. |
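As a rough sketch of the value_index_offset case described in the arguments, the snippet below builds a table from a one-token-per-line vocabulary file and reserves 0 for unknown keys. The file contents, the vocab_path variable and the choice of 0 as default_value are assumptions made for illustration; the mapping noted in the comments follows from the argument description rather than from a recorded run.

```python
import tempfile

import tensorflow as tf

# Write a small vocabulary file, one token per line (hypothetical data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\n".join(["emerson", "lake", "palmer"]))
    vocab_path = f.name

init = tf.lookup.TextFileInitializer(
    filename=vocab_path,
    key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
    value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER,
    value_index_offset=1)  # shift every line-number value up by one

# With the offset, no vocabulary item should map to 0, so 0 can serve as the
# default (masking) value returned for keys that are not in the file.
table = tf.lookup.StaticHashTable(init, default_value=0)
ids = table.lookup(tf.constant(["emerson", "palmer", "tarkus"])).numpy()
print(ids)  # per the description: line 0 -> 1, line 2 -> 3, unknown -> 0
```

Note that default_value is returned unchanged for missing keys; the offset only affects values read from the file.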
Raises
| ValueError | When the filename is empty, or when the table key and value data types do not match the expected data types. |
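The two ValueError conditions can be exercised with a short sketch like the following. The helper try_build and the path "vocab.txt" are hypothetical; the sketch assumes that both checks run in the constructor before any file is read, so the file does not need to exist.

```python
import tensorflow as tf


def try_build(**kwargs):
    """Return the ValueError message raised by the constructor, or None."""
    try:
        tf.lookup.TextFileInitializer(**kwargs)
    except ValueError as err:
        return str(err)
    return None


# Case 1: an empty filename is rejected.
empty_name_error = try_build(
    filename="",
    key_dtype=tf.string, key_index=0,
    value_dtype=tf.int64, value_index=1)

# Case 2: LINE_NUMBER expects int64, so pairing it with a string key dtype
# does not match the expected data type.
bad_dtype_error = try_build(
    filename="vocab.txt",  # hypothetical path, never opened in this sketch
    key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.LINE_NUMBER,
    value_dtype=tf.string, value_index=0)

print(empty_name_error)
print(bad_dtype_error)
```

Catching these errors at construction time makes it easier to validate configuration before any table is created.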
Attributes
| key_dtype | The expected table key dtype. |
| value_dtype | The expected table value dtype. |
Methods
initialize
initialize(
    table
)
Initializes the table from a text file.
Args
| table | The table to be initialized. |
Returns
| The operation that initializes the table. |
Raises
| TypeError | When the key and value data types do not match the table key and value data types. |
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.
Last updated 2024-04-26 UTC.