Week 7 - Show in Class - Text Processing
In the field of AI and data analytics, we often encounter data in the form of
unstructured text. To effectively analyze this text using computational
methods, we need to transform it into a structured format that machines
can understand. This process is called text pre-processing.
o For example, the sentence "The quick brown fox jumps over
the lazy dog" becomes "quick brown fox jumps lazy dog" after
stop word removal.
o For example:
o Note that stemming does not always produce a valid word. For
example, both "university" and "universe" might be stemmed
to "univers".
o For example:
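As a rough sketch of the stop word removal and stemming points above, the
snippet below uses plain Python. The stop word list, the suffix rules, and
the function names remove_stop_words and crude_stem are assumptions made
for illustration only. Real toolkits ship much longer stop word lists and
use proper stemming algorithms such as Porter's, so treat this as a toy.

```python
# Small illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "over", "a", "an", "and", "of", "to", "in"}


def remove_stop_words(sentence):
    """Lowercase the sentence, split on whitespace, and drop stop words."""
    tokens = sentence.lower().split()
    return " ".join(t for t in tokens if t not in STOP_WORDS)


# Toy suffix rules, checked in order (an assumption; this is NOT the Porter stemmer).
SUFFIXES = ("ity", "ing", "es", "e", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(remove_stop_words("The quick brown fox jumps over the lazy dog"))
print(crude_stem("university"), crude_stem("universe"))
```

With these toy rules, both "university" and "universe" reduce to the same
non-word stem, which is the behaviour the note on stemming describes.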
Once the text has been pre-processed, we can begin to analyze its
content. A common technique for this is TF-IDF, which helps us
understand the importance of words within a document relative to a
collection of documents.
What is TF-IDF?
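TF-IDF = TF * IDF, where TF (term frequency) measures how often a word
occurs in a document and IDF (inverse document frequency) measures how
rare that word is across the collection of documents.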
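The product TF * IDF can be sketched in a few lines of Python. The notes
do not say which variants of TF and IDF are intended, so this sketch
assumes one common choice: TF as the word count divided by the document
length, and IDF as the natural log of (number of documents / number of
documents containing the word). The function names and the three tiny
documents are made up for illustration.

```python
import math


def tf(term, doc):
    """Term frequency: share of the document's tokens equal to `term`."""
    return doc.count(term) / len(doc)


def idf(term, docs):
    """Inverse document frequency: log(N / number of docs containing `term`)."""
    df = sum(1 for d in docs if term in d)
    # Assumption: a term that appears in no document gets an IDF of 0.
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term, doc, docs):
    """TF-IDF = TF * IDF for one term in one document."""
    return tf(term, doc) * idf(term, docs)


# Hypothetical pre-processed documents, already split into tokens.
docs = [
    "quick brown fox".split(),
    "lazy dog".split(),
    "quick dog".split(),
]

for term in ("fox", "quick"):
    print(term, round(tf_idf(term, docs[0], docs), 3))
```

In this toy collection "fox" appears in only one document while "quick"
appears in two, so "fox" scores higher in the first document even though
both words occur there exactly once.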