How can TensorFlow and Python be used to build a ragged tensor from a list of words?

A RaggedTensor of words can be built from the starting offsets of the words in each sentence. The flat code point values of the sentences are regrouped with tf.RaggedTensor.from_row_starts so that each row holds the code points of one word, and the result is displayed on the console. The number of words in each sentence is then computed by summing the flags that mark which characters start a word.
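To make the role of the starting offsets concrete, here is a plain-Python illustration of how a list of row starts partitions a flat list of values. It is only a sketch of the idea, not TensorFlow's implementation: the helper split_by_row_starts is hypothetical, and the input text is an assumption inferred from the code points shown in the output of this article.

```python
def split_by_row_starts(values, row_starts):
    """Split a flat list into rows.

    Row i covers values[row_starts[i]:row_starts[i + 1]]; the last row
    runs to the end of the list. (Hypothetical helper for illustration.)
    """
    row_ends = list(row_starts[1:]) + [len(values)]
    return [values[start:end] for start, end in zip(row_starts, row_ends)]


# Assumed input: two sentences laid out back to back, as in the flat
# `values` of a RaggedTensor of characters.
text = "Hello, there." + "世界こんにちは"
flat_codepoints = [ord(ch) for ch in text]

# Offset of the first character of every word in the flat list.
word_starts = [0, 5, 7, 12, 13, 15]

words = split_by_row_starts(flat_codepoints, word_starts)
print(words)

# Turn each row of code points back into text to check the grouping.
decoded = ["".join(chr(c) for c in word) for word in words]
print(decoded)  # ['Hello', ', ', 'there', '.', '世界', 'こんにちは']
```

Each row has a different length, which is why the result is stored as a ragged tensor rather than a regular one.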

Unicode strings can be represented in Python and manipulated with the Unicode equivalents of the standard string ops. Here, the Unicode strings are first separated into tokens based on script detection: a new token begins wherever the script of a character differs from the script of the character before it.
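The script-based splitting described above can be sketched as follows. This closely follows the approach of the TensorFlow Unicode tutorial and reuses the variable names from the snippet in this article; the input sentences are an assumption (the first one is inferred from the code points in the output shown here).

```python
import tensorflow as tf

# Assumed input sentences; the originals are not shown in this article.
sentence_texts = ["Hello, there.", "世界こんにちは"]

# Decode each sentence into a row of Unicode code points (a RaggedTensor).
sentence_char_codepoint = tf.strings.unicode_decode(sentence_texts, "UTF-8")

# Look up the script identifier of every character.
sentence_char_script = tf.strings.unicode_script(sentence_char_codepoint)

# A character starts a word if it is the first character of its sentence
# or if its script differs from the script of the previous character.
sentence_char_starts_word = tf.concat(
    [tf.fill([sentence_char_script.nrows(), 1], True),
     tf.not_equal(sentence_char_script[:, 1:], sentence_char_script[:, :-1])],
    axis=1)

# Offsets of the word-starting characters within the flattened characters.
word_starts = tf.squeeze(tf.where(sentence_char_starts_word.values), axis=1)
print(word_starts)
```

Note that punctuation and spaces share the same script, so ", " ends up as a single token under this scheme.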

We are using Google Colaboratory to run the code below. Google Colab (Colaboratory) runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.

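import tensorflow as tf

# Assumes sentence_char_codepoint, sentence_char_starts_word and word_starts
# were already computed in the earlier steps of the tutorial credited below.
print("Get the code point of every character in every word")
# Regroup the flat code point values into one row per word
word_char_codepoint = tf.RaggedTensor.from_row_starts(
   values=sentence_char_codepoint.values,
   row_starts=word_starts)
print(word_char_codepoint)
print("Get the number of words in the specific sentence")
# Count the word-start flags in each sentence (computed here, not printed)
sentence_num_words = tf.reduce_sum(tf.cast(sentence_char_starts_word, tf.int64), axis=1)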

Code credit: https://www.tensorflow.org/tutorials/load_data/unicode

Output

Get the code point of every character in every word
<tf.RaggedTensor [[72, 101, 108, 108, 111], [44, 32], [116, 104, 101, 114, 101], [46], [19990, 30028], [12371, 12435, 12395, 12385, 12399]]>
Get the number of words in the specific sentence

Explanation

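The flat code points are regrouped so that each row holds the code points of one word.
The resulting RaggedTensor is displayed on the console.
The number of words in each sentence is computed by summing the word-start flags; the snippet stores it in sentence_num_words but does not print it.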
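One possible use of the per-sentence word count is to group the words back into sentences and read them as text. The sketch below is an illustration under stated assumptions, not part of the original snippet: the flat code points, the word offsets, and the counts [4, 2] are hard-coded from the output above (six words, of which the first four belong to the first sentence).

```python
import tensorflow as tf

# Assumed values, hard-coded so the example runs on its own.
flat_codepoints = tf.constant(
    [ord(ch) for ch in "Hello, there." + "世界こんにちは"], dtype=tf.int32)
word_starts = tf.constant([0, 5, 7, 12, 13, 15], dtype=tf.int64)
sentence_num_words = tf.constant([4, 2], dtype=tf.int64)

# One row per word, as in the article's snippet.
word_char_codepoint = tf.RaggedTensor.from_row_starts(
    values=flat_codepoints, row_starts=word_starts)

# Group the word rows into sentences using the number of words per sentence.
sentence_word_char_codepoint = tf.RaggedTensor.from_row_lengths(
    values=word_char_codepoint, row_lengths=sentence_num_words)

# Encode every word back into a UTF-8 string for inspection.
word_list = tf.strings.unicode_encode(
    sentence_word_char_codepoint, "UTF-8").to_list()
print(word_list)
```

The result is a nested list with one inner list of byte strings per sentence.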