A RaggedTensor can be built from the starting offsets of the words in a sentence. First, the code points of all characters are grouped by word with tf.RaggedTensor.from_row_starts, using those offsets as the row starts, and the result is displayed on the console. Next, the number of words in each sentence is computed by summing the flags that mark the characters where a word starts.
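To make the role of the offsets concrete, here is a minimal plain-Python sketch of what "row starts" mean. The helper split_by_row_starts is hypothetical and written only for illustration; it is not how TensorFlow implements tf.RaggedTensor.from_row_starts, it only mimics the grouping rule on a small list of code points.

```python
def split_by_row_starts(values, row_starts):
    """Hypothetical helper: group a flat list into rows.

    Row i spans values[row_starts[i]:row_starts[i + 1]];
    the last row runs to the end of the list.
    """
    row_ends = list(row_starts[1:]) + [len(values)]
    return [list(values[start:end]) for start, end in zip(row_starts, row_ends)]


# Flat code points for "Hello, there." and the offsets where each word begins.
flat_codepoints = [72, 101, 108, 108, 111, 44, 32, 116, 104, 101, 114, 101, 46]
word_starts = [0, 5, 7, 12]

words = split_by_row_starts(flat_codepoints, word_starts)
print(words)
# [[72, 101, 108, 108, 111], [44, 32], [116, 104, 101, 114, 101], [46]]
```

Each offset marks the first character of a word, so four offsets yield four rows of different lengths, which is why a ragged structure is needed.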
Unicode strings can be represented in Python and manipulated with the Unicode equivalents of the standard string ops. Here, the Unicode strings are first split into tokens based on script detection.
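The script-based split can be sketched as follows. This is a minimal example that assumes a single hypothetical input sentence ('Hello, there.') and follows the approach of the TensorFlow Unicode tutorial: a character starts a new word whenever its script differs from the script of the previous character.

```python
import tensorflow as tf

# Hypothetical input sentence, chosen only for illustration.
sentence_texts = [u'Hello, there.']

# Decode each string into a row of Unicode code points (a RaggedTensor).
sentence_char_codepoint = tf.strings.unicode_decode(sentence_texts, 'UTF-8')

# Look up the script identifier of every code point.
sentence_char_script = tf.strings.unicode_script(sentence_char_codepoint)

# A character starts a word if it is the first character of the sentence
# or if its script differs from the script of the previous character.
sentence_char_starts_word = tf.concat(
    [tf.fill([sentence_char_script.nrows(), 1], True),
     tf.not_equal(sentence_char_script[:, 1:], sentence_char_script[:, :-1])],
    axis=1)

# Indices, in the flattened batch, of the characters that start a word.
word_starts = tf.squeeze(tf.where(sentence_char_starts_word.values), axis=1)
print(word_starts)
```

For this input the letters are Latin while the comma, the space, and the full stop belong to the Common script, so word boundaries fall at every switch between the two.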
We are using Google Colaboratory to run the code below. Google Colab (Colaboratory) runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.
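# sentence_char_codepoint, word_starts and sentence_char_starts_word
# are assumed to be defined already; this snippet does not create them.
import tensorflow as tf

print("Get the code point of every character in every word")
# Group the flat code points into one row per word.
word_char_codepoint = tf.RaggedTensor.from_row_starts(
    values=sentence_char_codepoint.values,
    row_starts=word_starts)
print(word_char_codepoint)

print("Get the number of words in the specific sentence")
# Count the word-start flags in each sentence.
sentence_num_words = tf.reduce_sum(
    tf.cast(sentence_char_starts_word, tf.int64),
    axis=1)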
Code credit: https://www.tensorflow.org/tutorials/load_data/unicode
Output
Get the code point of every character in every word
<tf.RaggedTensor [[72, 101, 108, 108, 111], [44, 32], [116, 104, 101, 114, 101], [46], [19990, 30028], [12371, 12435, 12395, 12385, 12399]]>
Get the number of words in the specific sentence
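To check what the rows in this output stand for, the code points can be turned back into text. The following is a small plain-Python sketch that copies the values from the output above and decodes them with the built-in chr function; it does not use TensorFlow.

```python
# Rows copied from the printed RaggedTensor above.
word_char_codepoint = [
    [72, 101, 108, 108, 111],
    [44, 32],
    [116, 104, 101, 114, 101],
    [46],
    [19990, 30028],
    [12371, 12435, 12395, 12385, 12399],
]

# chr() maps a code point back to its character.
words = ["".join(chr(cp) for cp in word) for word in word_char_codepoint]
print(words)
# ['Hello', ', ', 'there', '.', '世界', 'こんにちは']
```

Each row is one token: a run of characters that share the same script.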
Explanation
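- The code points of all characters are grouped by word with tf.RaggedTensor.from_row_starts, using word_starts as the row offsets.
- The resulting RaggedTensor is displayed on the console.
- The number of words in each sentence is computed by casting the word-start flags to int64 and summing them along axis 1. This value is not printed, so only its heading appears in the output.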
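The word count from the last step can be seen on a toy batch. This is a minimal sketch with hand-written, hypothetical word-start flags for two sentences of different lengths; in the code above these flags come from script detection rather than being typed in.

```python
import tensorflow as tf

# Hypothetical flags: True marks a character that starts a word.
# Sentence 0 has 3 characters, sentence 1 has 4 characters.
sentence_char_starts_word = tf.ragged.constant(
    [[True, False, True],
     [True, True, False, True]])

# Cast the Boolean flags to integers and sum them per sentence (axis=1).
sentence_num_words = tf.reduce_sum(
    tf.cast(sentence_char_starts_word, tf.int64),
    axis=1)
print(sentence_num_words.numpy())  # one count per sentence
```

Because every word contributes exactly one True flag, the per-row sum equals the number of words in that sentence (2 and 3 for these flags).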