
Lecture 2 Guide To Text Analytics Techniques


Guide to Text Analytics Techniques

Dr. Kavita Pabreja


Guide to Text Analytics Techniques

• Introduction to Text Analytics
• Challenges with Unstructured Data
• Dataset Overview
• Exploratory Data Analysis (EDA)
• Text Pre-processing Techniques
• Sentiment Classification using Naïve-Bayes
• Using N-Grams in Text Analytics
• Conclusion


Introduction to Text Analytics

• Text Analytics: Text analytics is the extraction of meaningful insights and patterns
from unstructured text data.
• Importance in Business: Analyzing customer sentiment aids businesses in
improving products based on feedback from reviews and social media.
• Real-life Applications: For instance, brands employ text analytics to assess public
opinion during marketing campaigns or product launches.


Challenges with Unstructured Data

• Handling Ambiguity: Unstructured text often carries ambiguous meanings, which
significantly complicates sentiment classification of movie reviews.
• Informal Language Challenges: Movie reviews frequently use slang or informal
expressions that standard analytics techniques struggle to interpret.
• Lack of Structure: Unlike structured data, unstructured text has no
predefined format, so it needs labor-intensive preprocessing before analysis.


Dataset Overview

• Dataset Overview: The dataset consists of movie reviews with two fields:
'text' and a labeled 'sentiment' class.
• Fields Description: 'text' holds the review comment itself, while
'sentiment' marks it as positive (1) or negative (0).
• Example Structure: The sample record {'text': 'Incredible film!',
'sentiment': 1} is a review labeled as positive.


Exploratory Data Analysis (EDA)

• Exploratory Data Analysis: Initial exploration reports the total number of
records and the sentiment distribution, which is crucial for understanding data
integrity.
• Sentiment Distribution Visualization: A count plot displays the number of
positive and negative reviews, showing how well each class is represented.
• Positive vs. Negative Insights: The analysis shows more positive than
negative sentiments, indicating favorable audience reception overall.
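
The exploration described above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the records below are hypothetical samples written in the {'text', 'sentiment'} shape of the dataset, and the counts come from the standard library rather than a plotting tool. A count plot (for example seaborn's countplot) would draw the same numbers as bars.

```python
from collections import Counter

# Hypothetical sample records in the {'text', 'sentiment'} shape described
# above; the real dataset is not reproduced here.
reviews = [
    {"text": "Incredible film!", "sentiment": 1},
    {"text": "A dull, predictable plot.", "sentiment": 0},
    {"text": "Loved the soundtrack and the cast.", "sentiment": 1},
    {"text": "Two hours I will never get back.", "sentiment": 0},
    {"text": "Surprisingly moving and well acted.", "sentiment": 1},
]

# Total number of records and how many fall into each sentiment class.
total_records = len(reviews)
sentiment_counts = Counter(r["sentiment"] for r in reviews)

print(f"Total records: {total_records}")
for label, name in ((1, "positive"), (0, "negative")):
    count = sentiment_counts[label]
    share = count / total_records
    print(f"{name:>8}: {count} ({share:.0%})")
```

A strongly skewed distribution at this stage is worth noting, since it affects how accuracy figures should be read later.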


Text Pre-processing Techniques

• Text Pre-processing Steps: Essential techniques include tokenization,
stop-word removal, stemming, and lemmatization.
• Bag-of-Words Model: This model treats each document as an unordered
collection of words, which makes it straightforward to extract the features
needed for sentiment classification.
• Key Concepts: Count vectors, document vectors, removing low-frequency words,
removing stop words, stemming, and lemmatization.
• TF-IDF Vectorization: Combines term frequency and inverse document frequency
to weight words by importance, which can improve classification accuracy.


Text Preprocessing Pipeline
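A minimal sketch of such a pipeline, chaining tokenization, stop-word removal, and stemming. Everything here is an assumption made for illustration: the stop-word list is a tiny hand-picked set, and toy_stem is a hypothetical suffix stripper standing in for a real stemmer such as the Porter stemmer.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "it", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def toy_stem(token):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the three steps in order and return the cleaned tokens."""
    tokens = tokenize(text)
    tokens = remove_stop_words(tokens)
    return [toy_stem(t) for t in tokens]


print(preprocess("The actors delivered a stunning performance"))
# → ['actor', 'deliver', 'stunn', 'performance']
```

The stem 'stunn' shows why stems are not always dictionary words; lemmatization would map the token to a proper base form instead.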
Supervised vs Unsupervised Machine Learning Approaches in NLP
• Supervised Machine Learning: Models such as those provided by Hugging Face
Transformers and Flair, and deep learning models built with TensorFlow or
PyTorch, are trained with supervised learning, which requires large labeled
datasets. Simpler traditional machine learning models such as BernoulliNB and
Logistic Regression are also supervised, since they are trained on labeled
data.
• Unsupervised / Lexicon-Based: Tools like SentiWordNet and parts of VADER and
TextBlob use predefined sentiment lexicons. They need no supervised training
and instead rely on predefined rules or scores for specific words.
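
To make the lexicon-based idea concrete, here is a minimal sketch that scores a review by summing per-word polarity values. The lexicon and the functions lexicon_score and lexicon_label are hypothetical and written only for this example; they do not reproduce how VADER, TextBlob, or SentiWordNet work internally.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score. Real sentiment lexicons
# are far larger and encode more nuance than a single number per word.
LEXICON = {
    "great": 1.0,
    "incredible": 1.0,
    "good": 0.5,
    "boring": -0.5,
    "terrible": -1.0,
    "awful": -1.0,
}


def lexicon_score(text):
    """Sum the polarity of every known word; unknown words count as 0."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(t, 0.0) for t in tokens)


def lexicon_label(text):
    """Return 1 (positive), 0 (negative), or None when the score is neutral."""
    score = lexicon_score(text)
    if score > 0:
        return 1
    if score < 0:
        return 0
    return None


print(lexicon_label("Incredible film!"))         # 1
print(lexicon_label("A terrible, boring plot"))  # 0
print(lexicon_label("I watched it yesterday"))   # None
```

No labeled training data is involved: the only knowledge the scorer has is what the lexicon encodes.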
Sentiment Classification using Naïve-Bayes
• Naïve-Bayes Algorithm Overview: A Naïve-Bayes classifier applies Bayes'
theorem under the assumption that tokens occur independently of one another
given the sentiment class.
• Model Accuracy Assessment: Use the confusion matrix together with metrics
such as precision, recall, and F1-score to evaluate model performance.
• Interpreting the Confusion Matrix: The confusion matrix tabulates true labels
against predicted labels, showing which classes the model confuses and how
often.
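
The workflow above might look like the following sketch, assuming scikit-learn with BernoulliNB on binary word-presence features. The training and test reviews are invented for the example and far too few for meaningful scores; the point is the sequence of steps, not the numbers.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.naive_bayes import BernoulliNB

# Invented toy data: 1 = positive, 0 = negative.
train_texts = [
    "an incredible film with great acting",
    "great story and wonderful cast",
    "loved every minute of it",
    "a boring and terrible film",
    "awful acting and a dull story",
    "hated the predictable plot",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["great film", "terrible plot", "wonderful acting", "dull and boring"]
test_labels = [1, 0, 1, 0]

# Binary bag-of-words: each feature records whether a word is present.
vectorizer = CountVectorizer(binary=True)
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)  # reuse the training vocabulary

model = BernoulliNB()
model.fit(X_train, train_labels)
predicted = model.predict(X_test)

# Rows are true labels, columns are predicted labels, ordered [0, 1].
cm = confusion_matrix(test_labels, predicted, labels=[0, 1])
print(cm)

# Precision, recall, and F1-score per class.
print(classification_report(
    test_labels, predicted, labels=[0, 1],
    target_names=["negative", "positive"], zero_division=0,
))
```

In the printed matrix, the diagonal holds correctly classified reviews and the off-diagonal cells hold the two kinds of error.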
Real-Life Application & Challenges: Sentiment Analysis
• Application of Sentiment Analysis: Sentiment analysis examines
movie reviews, enabling systematic classification of user sentiments
as positive or negative.
• Challenges in Informal Language: Informal language introduces
variability; slang and abbreviations can obscure sentiment meaning,
complicating accurate detection.
• Impact of Emoticons: Emoticons express sentiments visually but
require contextual understanding, often necessitating advanced
processing techniques.


Conclusion

• Emerging Technologies: Future trends include using deep learning and advanced
neural networks for more accurate text analysis.
• Methodological Advances: Techniques like transfer learning will improve
sentiment classification by adapting models across domains.
• AI Integration: Integration with AI enables real-time sentiment analysis,
giving businesses instant insight into consumer sentiment.




Let the learning continue………
