Twitter Sentiment Analysis

using Machine Learning

Abhishek Pani - 21051192
Shirshak Pattnaik - 21052360
Abhijeet Pani - 21052552
Barenya Nayak - 21052577
Chandrakanta Meher - 21052580

Twitter Sentiment Analysis

Twitter sentiment analysis is the process of determining the sentiment or opinion expressed
in a tweet. It involves using machine learning algorithms to analyze the text and classify it
as positive, negative, or neutral.

Machine Learning Algorithms

Machine learning algorithms are used to train models that can automatically analyze and
classify tweets based on their sentiment. These algorithms use techniques such as SVM,
Naive Bayes, natural language processing and text classification to make predictions.

What is Sentiment Analysis?
Sentiment analysis, also known as opinion mining, is a technique used to determine the
sentiment or emotional tone expressed in a piece of text. It involves analyzing the text to
identify whether the sentiment expressed is positive, negative, or neutral.

Objective of this project

The objective of this project is to perform sentiment analysis on Twitter data using machine
learning techniques, specifically logistic regression and Naive Bayes classification algorithms.
The focus is on distinguishing between positive and negative sentiments expressed in tweets. To
achieve this, the project incorporates preprocessing steps including stemming for text cleaning.
By analyzing tweets, the aim is to develop models that accurately classify the sentiment of tweets,
enabling valuable insights into public opinion, customer sentiment, and trends on various topics
discussed on Twitter. Ultimately, the project seeks to contribute to the understanding of
sentiment dynamics in social media and provide a tool for businesses, researchers, and
individuals to gauge public sentiment effectively.

Extracting the dataset...

Dependencies...

Loading the dataset...

Performing EDA

Data Acquisition and Preprocessing


To perform sentiment analysis on Twitter data, it is important to acquire and preprocess the data effectively.
There are several methods for collecting Twitter data, including using the Twitter API, which provides access to a
wide range of tweets and related information.
Data Acquisition and Preprocessing Techniques

Technique            Description
Twitter API          Access and collect a large volume of tweets related to the desired topic.
Keyword Filtering    Filter the collected tweets by keywords or hashtags relevant to the analysis.
Language Detection   Identify the language of each tweet and remove tweets not in the desired language.
Text Cleaning        Remove unnecessary characters, such as punctuation and special symbols, from the tweet text.
Tokenization         Split the tweet text into individual words or tokens for further analysis.
Stopword Removal     Remove common words, such as 'and', 'the' and 'is', that carry little meaning for sentiment analysis.
Normalisation        Convert all words to lowercase and apply stemming or lemmatization to reduce words to their base form.
Data Sampling        Select a representative sample of the collected dataset to reduce computational requirements.
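
The cleaning, tokenization, stopword removal and normalisation steps can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual code: the stopword list is a tiny hypothetical sample, and `crude_stem` is a rough suffix-stripping stand-in for a real stemmer such as a Porter stemmer.

```python
import re

# Tiny illustrative stopword list (an assumption; real projects use a fuller list).
STOPWORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in", "it", "this"}

def crude_stem(word):
    """Strip a few common English suffixes. A rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(tweet):
    """Turn raw tweet text into a list of cleaned, stemmed tokens."""
    text = tweet.lower()                                 # normalisation: lowercase
    text = re.sub(r"https?://\S+", " ", text)            # drop URLs
    text = re.sub(r"@\w+", " ", text)                    # drop user mentions
    text = re.sub(r"[^a-z\s]", " ", text)                # text cleaning: keep letters only
    tokens = text.split()                                # tokenization on whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    return [crude_stem(t) for t in tokens]               # stemming

print(preprocess("@user The service is AMAZING!!! Loving it https://t.co/xyz #happy"))
# ['service', 'amaz', 'lov', 'happy']
```

Note that stems such as 'amaz' need not be dictionary words; they only have to be consistent across tweets.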
FEATURE ENGINEERING
1. Stemming: Reduces words to a common stem (e.g.,
"running" -> "run"). This groups inflected forms of
the same word (it does not merge true synonyms)
and reduces the number of features.
2. Regular Expressions: Define patterns to identify
specific elements (e.g., hashtags, emoticons).
Useful for removing irrelevant information or
creating sentiment features (e.g., positive
emoticons).
3. TF-IDF: Weights each word by its frequency within
a tweet and its rarity across the corpus, so that
informative words count for more than words that
appear everywhere.
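
As a worked illustration of TF-IDF, the sketch below computes the textbook form, term frequency times log(N / document frequency), on a toy corpus. The function name `tf_idf` and the example documents are assumptions made for illustration; library implementations typically use smoothed or normalised variants, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # term frequency within the document * inverse document frequency
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "day"]]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

In the second document, "bad" occurs in only one of the three documents and so receives a higher weight than "movie", which occurs in two.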

Train-Test Split

1. Train-Test Split: Divides data into training and testing sets for
machine learning.
2. Model Training: The model learns patterns and relationships
from the training set.
3. Model Evaluation: The unseen test set assesses the model's
ability to generalize to new data.
4. Overfitting Detection: Comparing training and test scores reveals
models that memorize the training data but fail on unseen data.
5. Validation Technique: Holding out data gives a more realistic
estimate of how the model will perform in practice.
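
A minimal sketch of a shuffled hold-out split, using only the standard library. The 80/20 ratio and the fixed seed are assumptions for illustration, since the slides do not state the split that was used, and `split_train_test` is a hypothetical helper name.

```python
import random

def split_train_test(samples, labels, test_fraction=0.2, seed=42):
    """Shuffle indices reproducibly, then hold out a fraction for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)          # fixed seed -> repeatable split
    n_test = int(round(len(samples) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

tweets = [f"tweet {i}" for i in range(10)]
sentiments = [i % 2 for i in range(10)]           # toy labels: 0 = negative, 1 = positive
X_train, X_test, y_train, y_test = split_train_test(tweets, sentiments)
print(len(X_train), len(X_test))                  # 8 2
```

Shuffling the same index list for samples and labels keeps each tweet paired with its own label.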
Model Evaluation

1.Naive Bayes
2.Logistic Regression

Naive Bayes Classification


Naive Bayes is a popular machine learning
algorithm used for sentiment analysis. It is
based on Bayes' theorem, which gives the
probability of an event given prior knowledge
of related events, and on the "naive"
assumption that features are independent of
one another given the class. In the context of
sentiment analysis, Naive Bayes can determine
the sentiment (positive, negative, or neutral) of
a given text based on the occurrence of
specific words or features.
How it Works
1.Training Phase: The algorithm learns from a labeled dataset, calculating the probability of each feature
occurring in each class.
2.Prediction Phase: When given a new input, the algorithm calculates the probability of the input belonging to
each class and assigns it to the class with the highest probability.
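
The two phases above can be sketched as a small multinomial Naive Bayes classifier written from scratch. This is an illustrative sketch, not the project's implementation: the class name, the toy training data and the use of add-one (Laplace) smoothing are assumptions.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def fit(self, docs, labels):
        """Training phase: count documents per class and words per class."""
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        """Prediction phase: return the class with the highest log-probability."""
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)                  # log prior
            for t in tokens:
                if t in self.vocab:                       # skip unseen words
                    # add-one smoothing avoids log(0) for words unseen in this class
                    count = self.word_counts[label][t]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

train_docs = [["love", "great"], ["great", "happy"], ["hate", "bad"], ["bad", "awful"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["great", "day"]))   # positive
```

Working in log space turns the product of many small probabilities into a sum, which avoids numerical underflow on longer texts.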

Advantages
Naive Bayes is simple and efficient, making it computationally inexpensive.
It performs well with high-dimensional data and is less prone to overfitting.
It can handle both binary and multi-class classification problems.

Applying Naive Bayes for sentiment analysis


Logistic Regression Classification

Implementing Logistic Regression for sentiment analysis


Key Findings
1. Logistic Regression is a widely used machine learning algorithm for sentiment analysis.
2. In this project it separates positive from negative tweets with a test accuracy of 0.77; a neutral class would require a multiclass extension.
3. The algorithm applies a sigmoid function to a weighted sum of the input features, producing a probability between 0 and 1.
4. By setting a threshold (commonly 0.5), we can classify the tweets as positive or negative based on the predicted probabilities.
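
The sigmoid and threshold steps can be sketched as follows. The weights, bias and two-feature layout are hypothetical values chosen for illustration; in practice they would be learned from the training data.

```python
import math

def sigmoid(z):
    """Map any real number to (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, bias):
    """Probability of the positive class: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def classify(features, weights, bias, threshold=0.5):
    """Apply the decision threshold to the predicted probability."""
    p = predict_proba(features, weights, bias)
    return "positive" if p >= threshold else "negative"

# Hypothetical learned weights for two features: counts of "great" and "bad".
weights = [1.5, -2.0]
bias = 0.0
print(classify([1.0, 0.0], weights, bias))   # positive
print(classify([0.0, 1.0], weights, bias))   # negative
```

Raising the threshold above 0.5 makes the classifier more conservative about labelling a tweet positive, at the cost of missing some positive tweets.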
Result Analysis

Naive Bayes
The training data exhibited an accuracy of 0.81, whereas the test data
showed an accuracy score of 0.74.

Logistic Regression
The training data exhibited an accuracy of 0.83, whereas the test data
showed an accuracy score of 0.77.
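
For reference, accuracy is the fraction of predictions that match the true labels. The short sketch below defines it and computes the gap between the reported training and test scores for both models; the `accuracy` helper and the `reported` dictionary are illustrative names, and the numbers are the ones quoted above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# (training accuracy, test accuracy) as reported in the results above.
reported = {
    "Naive Bayes": (0.81, 0.74),
    "Logistic Regression": (0.83, 0.77),
}
for name, (train_acc, test_acc) in reported.items():
    print(f"{name}: train-test gap = {train_acc - test_acc:.2f}")
# Naive Bayes: train-test gap = 0.07
# Logistic Regression: train-test gap = 0.06
```

Both models score lower on the test set than on the training set, and Logistic Regression is ahead on both.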
[Figure panels: Naive Bayes; Logistic Regression]

Future Work and Potential Areas for Improvement


Exploring Other Machine Learning Models
In future work, it would be beneficial to explore other machine learning models, such as
deep learning algorithms, to improve the performance of the sentiment analysis model.

Fine-tuning Model Parameters


Further optimization of the model's hyperparameters, such as the learning rate or regularization
strength, could lead to improved performance.

Handling Imbalanced Datasets


Addressing the issue of imbalanced datasets, where the number of positive and negative
sentiment tweets is significantly different, could improve the model's ability to accurately predict
sentiment.
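
One common way to handle imbalance is to weight each class inversely to its frequency, so that mistakes on the rarer class cost more during training. The sketch below computes such weights with the usual n_samples / (n_classes * class_count) heuristic; the helper name and the 8-to-2 example are assumptions for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

labels = ["positive"] * 8 + ["negative"] * 2
print(balanced_class_weights(labels))
# {'positive': 0.625, 'negative': 2.5}
```

Resampling (oversampling the minority class or undersampling the majority class) is an alternative when the model does not accept per-class weights.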

Diving into Neural Networks


Performance may improve substantially with neural models such as BERT, GPT, and CNNs, which
often give more accurate results on text classification tasks like this one.
Thank You
