tfds.deprecated.text.ByteTextEncoder

Byte-encodes text.

Inherits From: TextEncoder

tfds.deprecated.text.ByteTextEncoder(
    additional_tokens=None
)

Args
`additional_tokens`	`list<str>`, list of additional tokens. These are assigned vocab ids `[1, 1+len(additional_tokens))`, i.e. ids 1 through `len(additional_tokens)`. Useful for things like "end-of-string" tokens (e.g. "<EOS>").
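The id assignment described above can be illustrated with a small stdlib sketch. `additional_token_ids` is a hypothetical helper, not part of TFDS, and it assumes id 0 is reserved (for padding), which is why numbering starts at 1.

```python
def additional_token_ids(additional_tokens):
    """Hypothetical helper: the vocab id each additional token receives.

    Ids start at 1 because id 0 is assumed to be reserved for padding.
    """
    return {tok: i for i, tok in enumerate(additional_tokens, start=1)}


ids = additional_token_ids(["<EOS>", "<BOS>"])
print(ids)  # {'<EOS>': 1, '<BOS>': 2}, i.e. all ids lie in [1, 1 + 2)
```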

Attributes
`additional_tokens`
`vocab_size`	Size of the vocabulary. Encode produces ints in [1, vocab_size).
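As a rough guide to what `vocab_size` counts, here is a minimal sketch of one layout consistent with the ranges documented above. The layout is an assumption: id 0 reserved, one id per additional token, then one id per possible byte value. `byte_vocab_size` is a hypothetical name.

```python
NUM_BYTE_VALUES = 256  # one id per possible byte value (0..255)


def byte_vocab_size(num_additional_tokens=0):
    """Hypothetical formula for ByteTextEncoder.vocab_size.

    Assumes id 0 is reserved, ids 1..n are the additional tokens,
    and the next 256 ids cover every byte value.
    """
    return 1 + num_additional_tokens + NUM_BYTE_VALUES


print(byte_vocab_size())   # 257
print(byte_vocab_size(1))  # 258
```

Under this assumption, every encoded id falls in `range(1, byte_vocab_size(n))`, matching the [1, vocab_size) range stated for the attribute.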

Methods

decode(
    ids
)

Decodes a list of integers into text.
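To show what decoding involves, the sketch below reverses an assumed id layout: 0 is padding, 1..n are the additional tokens, and UTF-8 byte value b maps to id b + n + 1. This layout, the name `decode_sketch`, and the choice to skip id 0 wherever it appears are assumptions for illustration; it is not the TFDS implementation.

```python
def decode_sketch(ids, additional_tokens=()):
    """Hypothetical stand-in for ByteTextEncoder.decode (not the TFDS code).

    Assumed id layout: 0 = padding, 1..n = additional tokens,
    b + n + 1 = UTF-8 byte value b.
    """
    n = len(additional_tokens)
    parts = []
    pending = bytearray()  # run of bytes not yet converted to text

    def flush():
        if pending:
            # A multi-byte character spans several ids, so bytes are
            # buffered and decoded together; invalid sequences become U+FFFD.
            parts.append(pending.decode("utf-8", errors="replace"))
            pending.clear()

    for i in ids:
        if i == 0:
            continue  # padding id: skipped in this sketch
        if i <= n:
            flush()
            parts.append(additional_tokens[i - 1])
        else:
            pending.append(i - n - 1)
    flush()
    return "".join(parts)


print(decode_sketch([106, 107, 1, 0, 0], ["<EOS>"]))  # hi<EOS>
```

Buffering bytes before decoding matters because a single character such as "é" occupies two byte ids.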

encode(
    s
)

Encodes text into a list of integers.
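A minimal sketch of byte encoding with additional tokens follows, assuming the same layout as above (1..n for the additional tokens, UTF-8 byte value b mapped to id b + n + 1). `encode_sketch` is hypothetical, and matching longer tokens first is a design choice of this sketch, not documented TFDS behavior.

```python
import re


def encode_sketch(s, additional_tokens=()):
    """Hypothetical stand-in for ByteTextEncoder.encode (not the TFDS code).

    Assumed id layout: 1..n are the n additional tokens and raw UTF-8
    byte value b maps to id b + n + 1 (id 0 is never produced).
    """
    n = len(additional_tokens)
    token_to_id = {tok: i for i, tok in enumerate(additional_tokens, start=1)}
    if additional_tokens:
        # The capturing group keeps matched tokens in re.split's output;
        # longer tokens are tried first so they win over their prefixes.
        alternatives = sorted(additional_tokens, key=len, reverse=True)
        pattern = "(" + "|".join(re.escape(t) for t in alternatives) + ")"
        pieces = re.split(pattern, s)
    else:
        pieces = [s]
    ids = []
    for piece in pieces:
        if not piece:
            continue  # re.split yields empty strings around adjacent matches
        if piece in token_to_id:
            ids.append(token_to_id[piece])
        else:
            ids.extend(b + n + 1 for b in piece.encode("utf-8"))
    return ids


print(encode_sketch("hi"))                # [105, 106]
print(encode_sketch("hi<EOS>", ["<EOS>"]))  # [106, 107, 1]
```

With the actual class, the corresponding call is `tfds.deprecated.text.ByteTextEncoder(additional_tokens=["<EOS>"]).encode("hi<EOS>")`; whether its output matches this sketch exactly depends on the assumed id layout.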

@classmethod
load_from_file(
    filename_prefix
)

Load from file. Inverse of save_to_file.

save_to_file(
    filename_prefix
)

Store to file. Inverse of load_from_file.
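Because byte encoding is fixed, the additional tokens are presumably the only configuration that needs persisting. The sketch below illustrates the save/load round trip under that assumption. The helpers `save_tokens` and `load_tokens`, the `.tokens` suffix, and the one-token-per-line format are all hypothetical and do not describe the files TFDS actually writes.

```python
import os
import tempfile


def save_tokens(filename_prefix, additional_tokens):
    """Hypothetical: write one additional token per line to <prefix>.tokens."""
    with open(filename_prefix + ".tokens", "w", encoding="utf-8") as f:
        for tok in additional_tokens:
            f.write(tok + "\n")


def load_tokens(filename_prefix):
    """Hypothetical inverse of save_tokens."""
    with open(filename_prefix + ".tokens", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "byte_encoder")
    save_tokens(prefix, ["<EOS>", "<UNK>"])
    restored = load_tokens(prefix)

print(restored)  # ['<EOS>', '<UNK>']
```

This line-based format cannot represent tokens that contain newlines; a real format would need escaping or a structured encoding. With the actual class, the documented pair is `encoder.save_to_file(filename_prefix)` and `ByteTextEncoder.load_from_file(filename_prefix)`.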

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2024-04-26 UTC.