tf.raw_ops.UnicodeDecode

Decodes each string in input into a sequence of Unicode code points.

Compat aliases for migration

tf.compat.v1.raw_ops.UnicodeDecode

tf.raw_ops.UnicodeDecode(
    input,
    input_encoding,
    errors='replace',
    replacement_char=65533,
    replace_control_characters=False,
    Tsplits=tf.dtypes.int64,
    name=None
)

The character codepoints for all strings are returned using a single vector char_values, with strings expanded to characters in row-major order.

The row_splits tensor indicates where the codepoints for each input string begin and end within the char_values tensor. In particular, the values for the ith string (in row-major order) are stored in the slice char_values[row_splits[i]:row_splits[i+1]]. Thus:

char_values[row_splits[i]+j] is the Unicode codepoint for the jth character in the ith string (in row-major order).
row_splits[i+1] - row_splits[i] is the number of characters in the ith string (in row-major order).
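
The indexing rules above can be illustrated with a minimal sketch. The sample strings and the variable names `texts`, `decoded`, and `lengths` are chosen for illustration only; eager execution (the TensorFlow 2 default) is assumed.

```python
import tensorflow as tf

# Three UTF-8 strings, including an empty one and one with a non-ASCII character.
texts = tf.constant(["hi", "", "d\u00eda"])

# Raw ops take keyword arguments only.
result = tf.raw_ops.UnicodeDecode(input=texts, input_encoding="UTF-8")
row_splits = result.row_splits.numpy().tolist()
char_values = result.char_values.numpy().tolist()

# Code points of string i live in char_values[row_splits[i]:row_splits[i + 1]].
decoded = [
    char_values[row_splits[i]:row_splits[i + 1]]
    for i in range(len(row_splits) - 1)
]

# Number of characters in string i.
lengths = [row_splits[i + 1] - row_splits[i] for i in range(len(row_splits) - 1)]

print(row_splits)  # [0, 2, 2, 5]
print(decoded)     # [[104, 105], [], [100, 237, 97]]
print(lengths)     # [2, 0, 3]
```

Note that the empty string contributes a repeated entry in row_splits and no entries in char_values.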

Args
`input`	A `Tensor` of type `string`. The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values.
`input_encoding`	A `string`. Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16", "US ASCII", "UTF-8"`.
`errors`	An optional `string` from: `"strict", "replace", "ignore"`. Defaults to `"replace"`. Error handling policy when invalid formatting is found in the input. A value of `"strict"` causes the operation to produce an InvalidArgument error on any invalid input formatting. A value of `"replace"` (the default) causes the operation to replace any invalid formatting in the input with the `replacement_char` codepoint. A value of `"ignore"` causes the operation to skip any invalid formatting in the input and produce no corresponding output character.
`replacement_char`	An optional `int`. Defaults to `65533`. The replacement character codepoint to be used in place of any invalid formatting in the input when `errors='replace'`. Any valid Unicode codepoint may be used. The default value, 65533 (0xFFFD), is the standard Unicode replacement character U+FFFD.
`replace_control_characters`	An optional `bool`. Defaults to `False`. Whether to replace the C0 control characters (U+0000 through U+001F) with the `replacement_char`.
`Tsplits`	An optional `tf.DType` from: `tf.int32, tf.int64`. Defaults to `tf.int64`. The dtype of the `row_splits` output.
`name`	A name for the operation (optional).
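
As a rough illustration of the three `errors` policies, the sketch below decodes a byte string containing one byte (`0xFF`) that is never valid in UTF-8. The helper `decode` and the sample input are hypothetical, eager execution is assumed, and the exact number of replacement codepoints emitted for longer malformed sequences is not shown here.

```python
import tensorflow as tf

# b"a\xffb": the byte 0xFF cannot appear in well-formed UTF-8.
bad = tf.constant([b"a\xffb"])

def decode(**kwargs):
    """Hypothetical helper: decode `bad` as UTF-8 and return the code points."""
    out = tf.raw_ops.UnicodeDecode(input=bad, input_encoding="UTF-8", **kwargs)
    return out.char_values.numpy().tolist()

# 'replace' (default): invalid input becomes replacement_char (65533).
replaced = decode(errors="replace")

# A custom replacement codepoint, here 63 ('?').
replaced_custom = decode(errors="replace", replacement_char=63)

# 'ignore': invalid input is skipped and yields no output character.
ignored = decode(errors="ignore")

# 'strict': invalid input raises an InvalidArgument error.
try:
    decode(errors="strict")
    strict_failed = False
except tf.errors.InvalidArgumentError:
    strict_failed = True

print(replaced, replaced_custom, ignored, strict_failed)
```

With `errors="ignore"` the malformed byte leaves no trace in the output, so character counts derived from row_splits may differ from what the raw bytes suggest.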

Returns
A tuple of `Tensor` objects (row_splits, char_values).
`row_splits`	A `Tensor` of type `Tsplits`.
`char_values`	A `Tensor` of type `int32`.
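
One way to consume the two outputs together is to wrap them in a ragged tensor with `tf.RaggedTensor.from_row_splits`. This is a sketch under the assumption of eager execution, with illustrative inputs; it also requests `tf.int32` splits through `Tsplits`.

```python
import tensorflow as tf

texts = tf.constant(["ab", "c"])

# The op returns a (row_splits, char_values) tuple, so it can be unpacked directly.
row_splits, char_values = tf.raw_ops.UnicodeDecode(
    input=texts, input_encoding="UTF-8", Tsplits=tf.int32
)

# Rebuild one row of code points per input string.
ragged = tf.RaggedTensor.from_row_splits(values=char_values, row_splits=row_splits)
rows = ragged.to_list()

print(row_splits.dtype)   # <dtype: 'int32'>
print(char_values.dtype)  # <dtype: 'int32'>
print(rows)               # [[97, 98], [99]]
```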
