tf.lookup.TextFileInitializer

Table initializer that reads keys and values from a text file.

Compat aliases for migration

tf.compat.v1.lookup.TextFileInitializer

tf.lookup.TextFileInitializer(
    filename,
    key_dtype,
    key_index,
    value_dtype,
    value_index,
    vocab_size=None,
    delimiter='\t',
    name=None,
    value_index_offset=0
)

Used in the guide
Subword tokenizers

This initializer assigns one entry in the table for each line in the file.

The key and value types of the table to initialize are given by key_dtype and value_dtype.

The content to use as key and value from each line is specified by key_index and value_index, each of which takes one of the following:

TextFileIndex.LINE_NUMBER means use the line number, starting from zero; expects data type int64.
TextFileIndex.WHOLE_LINE means use the whole line content; expects data type string.
A value >= 0 means use the field at that index (starting at zero) after splitting the line on delimiter.
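
The three selection rules above can be modeled with a small pure-Python sketch. This is not TensorFlow's implementation: `select_field` is a hypothetical helper, and `WHOLE_LINE` and `LINE_NUMBER` are local stand-ins for the `TextFileIndex` constants (their numeric values here are an assumption of the sketch). It also leaves fields as text, whereas the real initializer produces values of key_dtype and value_dtype.

```python
# Illustrative only: a pure-Python model of how key_index / value_index
# pick content from one line. Not TensorFlow's implementation.
WHOLE_LINE = -2   # stand-in for tf.lookup.TextFileIndex.WHOLE_LINE
LINE_NUMBER = -1  # stand-in for tf.lookup.TextFileIndex.LINE_NUMBER


def select_field(line, line_number, index, delimiter="\t"):
    """Return the content that `index` selects from `line`."""
    if index == LINE_NUMBER:
        return line_number                    # zero-based line number (int)
    if index == WHOLE_LINE:
        return line                           # the full line (str)
    if index >= 0:
        return line.split(delimiter)[index]   # zero-based field after splitting
    raise ValueError(f"unsupported index: {index}")


lines = ["red\t1", "green\t2"]

# Field 0 as key, field 1 as value (kept as text in this sketch).
by_column = {
    select_field(line, n, 0): select_field(line, n, 1)
    for n, line in enumerate(lines)
}

# Whole line as key, line number as value.
by_line = {
    select_field(line, n, WHOLE_LINE): select_field(line, n, LINE_NUMBER)
    for n, line in enumerate(lines)
}
```

Here `by_column` maps each first field to its second field, and `by_line` maps each full line to its zero-based position in the list.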

For example, suppose we create a file with the following content:

import tempfile
import tensorflow as tf

# Write three space-delimited "key value" lines to a temporary file.
f = tempfile.NamedTemporaryFile(delete=False)
content = '\n'.join(["emerson 10", "lake 20", "palmer 30"])
f.write(content.encode('utf-8'))
f.close()

The following snippet initializes a table with the first column as keys and second column as values:

emerson -> 10
lake -> 20
palmer -> 30

init = tf.lookup.TextFileInitializer(
   filename=f.name,
   key_dtype=tf.string, key_index=0,
   value_dtype=tf.int64, value_index=1,
   delimiter=" ")
table = tf.lookup.StaticHashTable(init, default_value=-1)
table.lookup(tf.constant(['palmer','lake','tarkus'])).numpy()
array([30, 20, -1])

Similarly, to initialize the table with the whole line as keys and the line number as values:

emerson 10 -> 0
lake 20 -> 1
palmer 30 -> 2

init = tf.lookup.TextFileInitializer(
  filename=f.name,
  key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
  value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
table = tf.lookup.StaticHashTable(init, -1)
table.lookup(tf.constant('palmer 30')).numpy()
2

Args
`filename`	The filename of the text file to be used for initialization. The path must be accessible from wherever the graph is initialized (e.g. trainer or eval workers). The filename may be a scalar `Tensor`.
`key_dtype`	The `key` data type.
`key_index`	The index that identifies which part of a line supplies the table `key` values.
`value_dtype`	The `value` data type.
`value_index`	The index that identifies which part of a line supplies the table `value` values.
`vocab_size`	The number of elements in the file, if known.
`delimiter`	The delimiter to separate fields in a line.
`name`	A name for the operation (optional).
`value_index_offset`	A number to add to all indices extracted from the file. This is useful when you want to reserve one or more low index values for control characters. For instance, to ensure that no vocabulary item is mapped to index 0 (so that 0 can be reserved for a masking value), set value_index_offset to 1; the first vocabulary element is then mapped to 1 instead of 0.
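
To make the effect of `value_index_offset` concrete, the following pure-Python sketch models the mapping that results when values are line numbers. `build_vocab_ids` and `MASK_ID` are hypothetical names used only for illustration; the sketch mirrors the behaviour described above (the offset is added to each zero-based index) and does not call TensorFlow.

```python
# Illustrative only: models how an offset shifts line-number values.
def build_vocab_ids(tokens, value_index_offset=0):
    """Map each token to its zero-based position plus an offset."""
    return {tok: i + value_index_offset for i, tok in enumerate(tokens)}


tokens = ["emerson", "lake", "palmer"]

plain = build_vocab_ids(tokens)                           # ids start at 0
shifted = build_vocab_ids(tokens, value_index_offset=1)   # ids start at 1

MASK_ID = 0  # hypothetical reserved id; no token in `shifted` uses it
```

With an offset of 1, no vocabulary item shares an id with `MASK_ID`, which is the masking use case the argument description mentions.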

Raises
`ValueError`	when the filename is empty, or when the table key and value data types do not match the expected data types.

Attributes
`key_dtype`	The expected table key dtype.
`value_dtype`	The expected table value dtype.

Methods

`initialize`

initialize(
    table
)

Initializes the table from a text file.

Args
`table`	The table to be initialized.

Returns
The operation that initializes the table.

Raises
`TypeError`	when the keys and values data types do not match the table key and value data types.
