Lec 5 e Text Analytics Vector Space TF IDF
Text Analytics
Sources of Text
Applications of Text Analytics
Text Analytics Concepts & Terminology
Text EDA
Vector Space Modeling
Set-of-Words: Binary word occurrences
Bag-of-Words: Word occurrences
tf-idf
Word embedding
Government
What is the response of people towards a particular policy?
Advertisers
What is trending that could be used for advertisement?
Example: Careem used LUMSU as a promo code
Movie Makers
What did people dislike about a movie?
This information is used to deliver what people want in the future
Brand Managers
What value-added services do people want from a brand?
How do people respond to a brand's social responsibility campaigns?
Academia
Is this document plagiarized?
Retrieve similar documents
source: towardsdatascience.com
Figure credit: Francisco Rangel & Paolo Rosso [Universitat Politècnica de València]
source: devopedia.org
M Qasim (2018) Mining health reviews from online blogs and news
source:chart-studio.plotly.com
Rating distribution
source:kdnuggets.com
tf-idf adjusts for the fact that some words appear more frequently in general
(words that are frequent across all documents are less informative than rare ones)
It combines two characteristics of a word (or of a longer term, such as a bigram or trigram):
Term frequency
Inverse document frequency
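The two quantities above can be combined in a few lines of plain Python. This is only a minimal sketch: the slides do not give a formula, so it assumes one common variant, tf = (count of term in document) / (document length) and idf = log(N / df), where N is the number of documents and df is the number of documents containing the term. Libraries often use smoothed or normalized variants instead. The toy corpus and the helper name tf_idf are hypothetical, made up for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
docs = [
    "the movie was good",
    "the movie was bad",
    "the service was good and the food was good",
]

# Simple whitespace tokenization (an assumption; real pipelines do more).
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: number of documents that contain each term.
df = Counter()
for tokens in tokenized:
    df.update(set(tokens))


def tf_idf(tokens):
    """Return {term: tf * idf} for one tokenized document."""
    counts = Counter(tokens)  # bag-of-words counts for this document
    weights = {}
    for term, count in counts.items():
        tf = count / len(tokens)            # term frequency
        idf = math.log(n_docs / df[term])   # inverse document frequency
        weights[term] = tf * idf
    return weights


weights = [tf_idf(tokens) for tokens in tokenized]

# Print the terms of the second document, highest weight first.
for term, w in sorted(weights[1].items(), key=lambda kv: -kv[1]):
    print(f"{term:>6}  {w:.3f}")
```

Under this variant, a word that occurs in every document (here "the" and "was") gets idf = log(1) = 0 and therefore a weight of zero, while a word unique to one document (here "bad") gets the highest weight in that document.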
Bag of Words
tf-idf