
Natural Language Processing and Social Media Analytics 01

ASSIGNMENT 1

Prashansa Thapa (Student ID: C0927405)
Jyoti Prakash Uprety (Student ID: C0928791)
Table of Contents
1. Text Preprocessing
   1.1. Loading Dataset
   1.2. Standardization
   1.3. Cleaning
   1.4. Tokenization
   1.5. Stopword Removal
   1.6. Lemmatization
   1.7. Content Handling
        1.7.1. Abbreviations and Slang
        1.7.2. Emoji
        1.7.3. Lowercasing and Extra Whitespace
2. Feature Extraction
   2.1. Bag of Words (BoW)
   2.2. TF-IDF
   2.3. Comparison
3. Model Training and Performance Evaluation
   3.1. Data Splitting
   3.2. Naïve Bayes Model
   3.3. SVM Model
   3.4. Evaluation Metrics
   3.5. Confusion Matrix
   3.6. Observations
4. Comparison
5. Challenges Faced
   5.1. Class Imbalance
   5.2. Feature Limitation
   5.3. Noisy Data
6. Recommendation for Improvement
1. Text Preprocessing
1.1. Loading Dataset
The dataset was loaded from the sentimentdataset.csv file using pandas.read_csv().
1.2. Standardization
The dataset contained many sentiment labels beyond Positive, Negative, and Neutral, so every label was mapped to one of these three primary sentiment classes using a mapping dictionary. This step ensures that the labeling scheme is uniform.
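A condensed sketch of this step follows; the complete mapping dictionary appears in the code listing at the end of this report.

# Condensed sketch; the full mapping dictionary is in the appendix code.
sentiment_mapping = {
    'Joy': 'Positive', 'Gratitude': 'Positive',
    'Grief': 'Negative', 'Frustration': 'Negative',
    'Curiosity': 'Neutral',
    # ... remaining fine-grained labels mapped the same way
}

df['Sentiment'] = df['Sentiment'].str.strip().str.capitalize()
# Any label not covered by the mapping defaults to 'Neutral'
df['Sentiment'] = df['Sentiment'].map(sentiment_mapping).fillna('Neutral')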
1.3. Cleaning
Regular expressions (re.sub()) were used to remove excess whitespace, punctuation, and special characters, so that only relevant text remained.
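A minimal sketch of this cleaning step (the exact patterns are illustrative, not necessarily the ones used in the appendix code):

import re

def clean_text(text):
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)   # drop punctuation/specials
    text = re.sub(r"\s+", " ", text).strip()      # collapse extra whitespace
    return text

clean_text("Enjoying   a beautiful day!!! @ the park :)")
# -> 'Enjoying a beautiful day the park'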
1.4. Tokenization
NLTK's word_tokenize() was used to tokenize each text entry, splitting it into individual words. This is necessary for subsequent steps such as stopword removal and lemmatization.
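For example (requires the 'punkt' resource downloaded in the appendix code):

from nltk.tokenize import word_tokenize

word_tokenize("Traffic was terrible this morning")
# -> ['Traffic', 'was', 'terrible', 'this', 'morning']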
1.5. Stopword Removal
The NLTK stopwords list was used to remove frequent words that carry little meaning, such as "the," "is," and "and." Removing this noise improves the model's accuracy.
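A small sketch of the filter; building the stopword set once keeps the lookup fast:

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
tokens = ['traffic', 'was', 'terrible', 'this', 'morning']
[t for t in tokens if t not in stop_words]
# -> ['traffic', 'terrible', 'morning']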
1.6. Lemmatization
Words were reduced to their base forms using WordNetLemmatizer(); for example, 'dogs' became 'dog'. This process ensures that inflected variants of a word are treated as the same feature. Note that WordNetLemmatizer() treats words as nouns by default, so forms such as 'running' and 'better' are only reduced to 'run' and 'good' when an explicit POS tag is supplied.
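A short illustration of that POS behavior:

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
lemmatizer.lemmatize('dogs')              # -> 'dog'  (noun is the default POS)
lemmatizer.lemmatize('running', pos='v')  # -> 'run'
lemmatizer.lemmatize('better', pos='a')   # -> 'good'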
1.7. Content handling
1.7.1. Abbreviations and Slang
Common abbreviations were expanded (for example, "u" to "you"), since social media posts frequently use informal language.
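A sketch of this substitution with a hypothetical abbreviation dictionary (the actual list of replacements depends on the corpus):

abbreviations = {'u': 'you', 'r': 'are', 'lol': 'laughing out loud'}

def expand_abbreviations(text):
    # Replace a token if it is a known abbreviation, else keep it as-is
    return " ".join(abbreviations.get(word, word) for word in text.split())

expand_abbreviations("u r late lol")
# -> 'you are late laughing out loud'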
1.7.2. Emoji
To keep the text consistent, emojis were eliminated using regular expressions.
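One common approach (an assumption on our part, not necessarily the exact pattern used here) is to match emoji code-point ranges and delete them:

import re

emoji_pattern = re.compile(
    "[\U0001F1E6-\U0001F1FF\U0001F300-\U0001FAFF\U00002600-\U000027BF]+",
    flags=re.UNICODE,
)
emoji_pattern.sub("", "Just finished an amazing workout! 💪")
# -> 'Just finished an amazing workout! '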

1.7.3. Lowercasing and Extra Whitespace
To keep the dataset consistent, the text was converted to lowercase and extraneous whitespace was removed.

2. Feature Extraction
2.1. Bag of Words (BoW)
The cleaned text was transformed into a matrix of token counts using
CountVectorizer(). Individual word frequencies were recorded using unigrams.
2.2. TF-IDF
TfidfVectorizer() used term frequency and inverse document frequency to convert text into numerical features. Default parameters were applied, producing weighted feature vectors.
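For comparison, the same toy documents under TfidfVectorizer() with scikit-learn defaults:

# With sklearn defaults, idf(t) = ln((1 + n) / (1 + df(t))) + 1 and each row
# is L2-normalized, so corpus-wide words are down-weighted.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["good day", "bad day", "good good food"]
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)
print(X.toarray().round(2))
# Within each row, the common 'day'/'good' weigh less than the rarer 'bad'/'food'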
2.3. Comparison
While TF-IDF emphasized key terms by lessening the influence of frequently used
words, BoW supplied raw frequency counts.

3. Model Training and Performance Evaluation

3.1. Data Splitting
train_test_split() was used to split the dataset into 80% training and 20% testing sets.
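A minimal sketch of the split, assuming the feature matrix X_tfidf and labels y from the appendix code; the stratify argument is an optional addition, not used in the original code:

from sklearn.model_selection import train_test_split

# stratify=y (our addition, not in the appendix code) keeps the
# Positive/Neutral/Negative ratios identical in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X_tfidf, y, test_size=0.2, random_state=42, stratify=y)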
3.2. Naïve Bayes Model
We trained MultinomialNB() on both the BoW and TF-IDF representations. Naïve Bayes' probabilistic nature made classification based on word-frequency distributions effective.

3.3. SVM Model
SVC() was trained on both feature sets. It performed especially well with TF-IDF because the weighting highlighted distinctive terms that separate sentiments; terms such as 'amazing' for Positive and 'terrible' for Negative helped the SVM form more distinct decision boundaries.

3.4. Evaluation Metrics
Accuracy, Precision, Recall, and F1-score were computed using classification_report() to evaluate the models' performance.
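These metrics follow their standard per-class definitions in terms of true positives (TP), false positives (FP), and false negatives (FN):

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]

Accuracy is the fraction of all test samples classified correctly.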
3.5. Confusion Matrix
The classification results were summarized with confusion_matrix(), and Seaborn heatmaps visualized patterns in both correct and incorrect predictions.

3.6. Observations
In our experiments, the SVM performed best with TF-IDF because it exploited the significant terms that the weighting highlights, while Naïve Bayes performed better with BoW because it relies on raw frequency patterns.

4. Comparison

Naïve Bayes with BoW: performed well because it relied on high-frequency terms and was probabilistic, making it well suited to simpler text patterns.

Naïve Bayes with TF-IDF: accuracy declined because the down-weighting of frequently used terms conflicted with its probability assumptions.

SVM with BoW: performed well enough but could not exploit word importance.

SVM with TF-IDF: achieved the highest accuracy and F1-score among all models by identifying sentiment-critical terms through the TF-IDF significance scores.

Overall: The best performance was achieved by SVM with TF-IDF, which used weighted term importance to successfully distinguish between sentiments.

5. Challenges Faced
5.1. Class Imbalance
Uneven sentiment distributions in the dataset affected the models' performance.

5.2. Feature Limitation
Unigrams alone limited the contextual interpretation of sentiments.

5.3. Noisy Data
Slang, emojis, and informal language in social media posts impacted model accuracy.

6. Recommendation for Improvement

Several key aspects must be addressed to enhance the sentiment analysis model. First, oversampling methods such as SMOTE can address class imbalance by ensuring the model sees enough samples from each sentiment class, which will improve generalization. Next, custom tokenizers and more sophisticated text-cleaning techniques can improve noise handling by properly processing the emojis, slang, and abbreviations that are prevalent in social media data. Finally, incorporating N-grams to capture word sequences and applying POS tagging to help the model understand grammatical relationships and context can improve feature engineering and, ultimately, the accuracy of sentiment classification. A sketch of two of these ideas follows.
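A hedged sketch of bigram features plus SMOTE oversampling, assuming the df['Processed_Text'] column from the appendix code; SMOTE requires the separate imbalanced-learn package:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# N-grams: unigrams + bigrams capture short word sequences
tfidf_ngrams = TfidfVectorizer(ngram_range=(1, 2))
X = tfidf_ngrams.fit_transform(df['Processed_Text'])
y = df['Sentiment']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Oversample the training split only, to avoid leaking test information
smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
# Every sentiment class now matches the majority class's training count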

import pandas as pd
import numpy as np
import re
import string
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv('sentimentdataset.csv')

df.head(2)

   Unnamed: 0.1  Unnamed: 0                                       Text Sentiment
0             0           0  Enjoying a beautiful day at the park! ...  Positive
1             1           1     Traffic was terrible this morning. ...  Negative

             Timestamp       User Platform           Hashtags  Retweets  Likes Country
0  2023-01-15 12:30:00    User123  Twitter      #Nature #Park      15.0   30.0     USA
1  2023-01-15 08:45:00  CommuterX  Twitter  #Traffic #Morning       5.0   10.0  Canada

   Year  Month  Day  Hour
0  2023      1   15    12
1  2023      1   15     8

# Keep only relevant columns
df = df[['Text', 'Sentiment']]
# Define sentiment mapping
sentiment_mapping = {
    'Positive': 'Positive', 'Negative': 'Negative', 'Neutral': 'Neutral',
    'Happy': 'Positive', 'Happiness': 'Positive', 'Joy': 'Positive',
    'Love': 'Positive', 'Excitement': 'Positive', 'Admiration': 'Positive',
    'Affection': 'Positive', 'Awe': 'Positive', 'Surprise': 'Positive',
    'Adoration': 'Positive', 'Anticipation': 'Positive', 'Calmness': 'Positive',
    'Kind': 'Positive', 'Pride': 'Positive', 'Hope': 'Positive',
    'Empowerment': 'Positive', 'Compassion': 'Positive', 'Tenderness': 'Positive',
    'Elation': 'Positive', 'Euphoria': 'Positive', 'Contentment': 'Positive',
    'Serenity': 'Positive', 'Gratitude': 'Positive', 'Fulfillment': 'Positive',
    'Reverence': 'Positive', 'Enthusiasm': 'Positive', 'Satisfaction': 'Positive',
    'Accomplishment': 'Positive', 'Wonder': 'Positive', 'Optimism': 'Positive',
    'Friendship': 'Positive', 'Success': 'Positive', 'Adventure': 'Positive',
    'Celebration': 'Positive', 'Creativity': 'Positive', 'Freedom': 'Positive',
    'Hopeful': 'Positive', 'Inspired': 'Positive', 'Zest': 'Positive',
    'Proud': 'Positive', 'Mindfulness': 'Positive',
    'Sad': 'Negative', 'Sadness': 'Negative', 'Fear': 'Negative',
    'Anger': 'Negative', 'Disgust': 'Negative', 'Disappointed': 'Negative',
    'Bitter': 'Negative', 'Shame': 'Negative', 'Despair': 'Negative',
    'Grief': 'Negative', 'Loneliness': 'Negative', 'Jealousy': 'Negative',
    'Resentment': 'Negative', 'Frustration': 'Negative', 'Boredom': 'Negative',
    'Anxiety': 'Negative', 'Helplessness': 'Negative', 'Envy': 'Negative',
    'Regret': 'Negative', 'Melancholy': 'Negative', 'Bitterness': 'Negative',
    'Heartbreak': 'Negative', 'Betrayal': 'Negative', 'Suffering': 'Negative',
    'Isolation': 'Negative', 'Darkness': 'Negative', 'Exhaustion': 'Negative',
    'Desolation': 'Negative', 'Desperation': 'Negative', 'Loss': 'Negative',
    'Heartache': 'Negative', 'Hopelessness': 'Negative', 'Hate': 'Negative',
    'Bad': 'Negative',
    'Indifference': 'Neutral', 'Confusion': 'Neutral', 'Numbness': 'Neutral',
    'Ambivalence': 'Neutral', 'Curiosity': 'Neutral', 'Reflection': 'Neutral',
    'Determination': 'Neutral', 'Sympathy': 'Neutral', 'Miscalculation': 'Neutral',
    'Obstacle': 'Neutral', 'Pressure': 'Neutral', 'Renewed Effort': 'Neutral',
    # 'Acceptance' originally appeared twice ('Positive', then 'Neutral');
    # Python keeps the last occurrence, so it is listed once here as 'Neutral'.
    'Acceptance': 'Neutral', 'Tranquility': 'Neutral', 'Observation': 'Neutral',
}

# Standardizing Sentiment column before mapping
df['Sentiment'] = df['Sentiment'].str.strip().str.capitalize()

# Replace Sentiment values with mapped values
df['Sentiment'] = df['Sentiment'].map(sentiment_mapping).fillna('Neutral')

# Display updated dataframe
print(df.head())

# Check new sentiment distribution
print(df['Sentiment'].value_counts())

Text Sentiment
0 Enjoying a beautiful day at the park! ... Positive
1 Traffic was terrible this morning. ... Negative
2 Just finished an amazing workout! 💪 ... Positive
3 Excited about the upcoming weekend getaway! ... Positive
4 Trying out a new recipe for dinner tonight. ... Neutral
Sentiment
Positive 328
Neutral 271
Negative 133
Name: count, dtype: int64

# Download necessary NLTK resources
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to C:\Users\Jyoti Prakash
[nltk_data]     Uprety\AppData\Roaming\nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to C:\Users\Jyoti Prakash
[nltk_data] Uprety\AppData\Roaming\nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to C:\Users\Jyoti Prakash
[nltk_data] Uprety\AppData\Roaming\nltk_data...
[nltk_data] Package wordnet is already up-to-date!

True

# Text Preprocessing
stop_words = set(stopwords.words('english'))  # build the stopword set once
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    text = text.lower()                                 # lowercase
    text = re.sub(f"[{string.punctuation}]", "", text)  # strip punctuation
    tokens = word_tokenize(text)                        # tokenize
    tokens = [word for word in tokens if word not in stop_words]
    tokens = [lemmatizer.lemmatize(word) for word in tokens]
    return " ".join(tokens)

df['Processed_Text'] = df['Text'].apply(preprocess_text)

# Feature Extraction (BoW and TF-IDF)
vectorizer_bow = CountVectorizer()
vectorizer_tfidf = TfidfVectorizer()
X_bow = vectorizer_bow.fit_transform(df['Processed_Text'])
X_tfidf = vectorizer_tfidf.fit_transform(df['Processed_Text'])
y = df['Sentiment']

# Train-Test Split (the shared random_state keeps the two splits row-aligned)
X_train_bow, X_test_bow, y_train, y_test = train_test_split(
    X_bow, y, test_size=0.2, random_state=42)
X_train_tfidf, X_test_tfidf, _, _ = train_test_split(
    X_tfidf, y, test_size=0.2, random_state=42)

# Model Training: Naïve Bayes on BoW features, linear SVM on TF-IDF features
nb_model = MultinomialNB()
nb_model.fit(X_train_bow, y_train)
svm_model = SVC(kernel='linear')
svm_model.fit(X_train_tfidf, y_train)

# Predictions
y_pred_nb = nb_model.predict(X_test_bow)
y_pred_svm = svm_model.predict(X_test_tfidf)

# Evaluation
print("Naïve Bayes Performance:")
print(classification_report(y_test, y_pred_nb))
print("SVM Performance:")
print(classification_report(y_test, y_pred_svm))

# Confusion Matrix
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
sns.heatmap(confusion_matrix(y_test, y_pred_nb), annot=True, fmt='d',
cmap='Blues', ax=axes[0])
axes[0].set_title("Naïve Bayes Confusion Matrix")
sns.heatmap(confusion_matrix(y_test, y_pred_svm), annot=True, fmt='d',
cmap='Blues', ax=axes[1])
axes[1].set_title("SVM Confusion Matrix")
plt.show()

Naïve Bayes Performance:
              precision    recall  f1-score   support

    Negative       0.65      0.73      0.69        30
     Neutral       0.70      0.57      0.63        54
    Positive       0.78      0.86      0.82        63

    accuracy                           0.73       147
   macro avg       0.71      0.72      0.71       147
weighted avg       0.73      0.73      0.72       147

SVM Performance:
              precision    recall  f1-score   support

    Negative       0.76      0.43      0.55        30
     Neutral       0.64      0.67      0.65        54
    Positive       0.72      0.84      0.77        63

    accuracy                           0.69       147
   macro avg       0.71      0.65      0.66       147
weighted avg       0.70      0.69      0.68       147
