DSDM Unit4

Dsdma notes

Uploaded by

nandhini.s.nandhu

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Subject Name: DATA SCIENCE AND DIGITAL MARKETING SYSTEM
Subject Code: U20CST718

Prepared By:
Mr.S.KUMARAKRISHNAN, Asst.Prof/ CSE
Mrs.P.BHAVANI, Asst.Prof / CSE
Mrs.R.DEEPA, Asst.Prof / CSE
Verified by: Approved by:
2 Marks

Q) What are the steps to process the Twitter data?

Gathering data from Twitter feeds involves the following steps:

● Getting the data
● Data pull
● Data cleaning
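
As an illustration of the data cleaning step, here is a minimal sketch using only Python's standard library. The function name `clean_tweet` and the specific rules (dropping URLs, the retweet prefix, @mentions, and the '#' sign) are assumptions chosen for illustration; the notes do not prescribe a particular set of cleaning rules.

```python
import re

def clean_tweet(text):
    """Return a lightly normalized copy of a tweet (illustrative rules only)."""
    text = re.sub(r"https?://\S+", "", text)      # drop URLs
    text = re.sub(r"^RT\s+@\w+:\s*", "", text)    # drop a leading "RT @user:" prefix
    text = re.sub(r"@\w+", "", text)              # drop remaining @mentions
    text = text.replace("#", "")                  # keep hashtag words, drop the '#'
    return re.sub(r"\s+", " ", text).strip()      # collapse runs of whitespace

print(clean_tweet("RT @user: Loving the new #phone! https://example.com/x"))
```

Which rules are appropriate depends on the later analysis; for example, hashtags or mentions may carry useful signal and could be kept instead.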
Q) What are the steps to get the Twitter API keys?
First, you need a Twitter account and a set of credentials (consumer key, consumer
secret, access token, and access secret) from the Twitter developer platform to
access the Twitter API. Follow these steps:

○ Create a Twitter user account.

○ Log in with your Twitter user account at
○ Click Create New App.
○ Fill out the form, agree to the terms, and click on Create your Twitter
application.
○ Go to the next page, click on the Keys and Access Tokens tab, and copy
your API key and API secret. Scroll down and click on Create my access
token, and copy your Access token, and Access token secret.

Q) What is Data Extraction?

Having all required authorization keys, we can prepare the toolset for data retrieval. The
Twitter API gives several ways to extract the data, but we will focus on two main
methods:
->Keyword search query to obtain recent, historical tweets
->Streaming facility, to obtain tweets as they are posted

Q) What is Rate limit/paging?

After successfully connecting to the API, we have to prepare our scripts for data
retrieval. Because the API limits data access (rate limits), it is necessary to build an
efficient workflow. Twitter allows you to get up to 100 tweets per call. If we want to
retrieve more, we need to remember the IDs of the tweets already downloaded so that we
do not extract the same tweets in later calls. This procedure is commonly called paging.
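
The paging idea can be sketched as follows. To keep the example runnable without credentials, `fetch_page` is a hypothetical stand-in that serves fake tweets from memory; it is not a real Twitter client call. The parameter names `max_id` and `count` are assumptions chosen for this illustration.

```python
# Stand-in for the remote service: 250 fake tweets with IDs 250..1, newest first.
FAKE_TWEETS = [{"id": i, "text": f"tweet {i}"} for i in range(250, 0, -1)]

def fetch_page(max_id=None, count=100):
    """Hypothetical API call: up to `count` tweets with id <= max_id, newest first."""
    matching = [t for t in FAKE_TWEETS if max_id is None or t["id"] <= max_id]
    return matching[:count]

def fetch_all(page_size=100):
    """Page through the results, remembering the oldest ID seen so far."""
    collected = []
    max_id = None
    while True:
        page = fetch_page(max_id=max_id, count=page_size)
        if not page:
            break                      # nothing older is left
        collected.extend(page)
        # Ask only for tweets strictly older than the oldest one already stored.
        max_id = page[-1]["id"] - 1
    return collected

tweets = fetch_all()
print(len(tweets), len({t["id"] for t in tweets}))
```

A real script would also need to pause between calls when the rate limit is reached, which this sketch leaves out.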

Q) What are Streaming API streams?

Another method of obtaining information from Twitter is the Streaming API. It gives
access to Twitter's global stream of data. There are several basic streaming endpoints,
each customized to certain use cases. Based on the Twitter documentation:
● Public streams: Streams of the public data flowing through Twitter. They are suitable
for following specific users or topics, and for data mining.
● User streams: Single-user streams, containing roughly all of the data corresponding
to a single user's view of Twitter.
Q)Define Sentiment Analysis
Sentiment analysis involves classifying comments or opinions in text into categories
such as "positive" or "negative", often with an implicit category of "neutral". A classic
application is tracking what people think about different topics. In data science and
machine learning, sentiment analysis is also called "opinion mining"; in marketing
terminology it is known as "voice of the customer".

Q)Define VADER
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based
model for text sentiment analysis that is sensitive to both the polarity
(positive/negative) and the intensity (strength) of emotion. It is available in the NLTK
package and can be applied directly to unlabeled text data.
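
To illustrate what "polarity and intensity" mean, here is a toy lexicon-based scorer. It is not the VADER lexicon or algorithm: the word values, booster multipliers, and negation rule are all invented for this sketch. In practice, NLTK provides `SentimentIntensityAnalyzer` in `nltk.sentiment.vader`, which requires downloading its lexicon first.

```python
# Toy lexicon: word -> valence. All values are invented for illustration.
LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.0, "terrible": -3.0}
BOOSTERS = {"very": 1.5, "extremely": 2.0}   # assumed intensity multipliers
NEGATIONS = {"not", "never"}

def score(text):
    """Sum word valences; scale after a booster, flip the sign after a negation."""
    words = text.lower().replace("!", " ").replace(".", " ").split()
    total = 0.0
    for i, word in enumerate(words):
        if word not in LEXICON:
            continue
        valence = LEXICON[word]
        previous = words[i - 1] if i > 0 else ""
        if previous in BOOSTERS:
            valence *= BOOSTERS[previous]     # "very good" is stronger than "good"
        elif previous in NEGATIONS:
            valence = -valence                # "not good" reverses the polarity
        total += valence
    return total

def label(text):
    """Map the numeric score to positive / negative / neutral."""
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

for example in ["good", "very good", "not good", "It is a phone"]:
    print(example, score(example), label(example))
```

The sign of the score gives the polarity and its magnitude gives the intensity, which is the distinction the definition above draws.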
Q) What does the preparation of a custom classifier require?
The preparation of a custom classifier requires two data sets:
● Training data set: the data from which the classifier algorithm learns the model
parameters
● Test data set: the data used to determine the accuracy of the algorithm
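
A minimal sketch of producing the two data sets from one labeled collection is shown here. The function name `train_test_split`, the 25% test fraction, and the fake labeled tweets are assumptions for illustration, not values taken from the notes.

```python
import random

def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the labeled samples and split it into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)      # fixed seed for a repeatable split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Fake labeled data: (text, label) pairs.
labeled = [(f"tweet {i}", "positive" if i % 2 else "negative") for i in range(20)]
train, test = train_test_split(labeled)
print(len(train), len(test))
```

The key property is that no sample appears in both sets, so the accuracy measured on the test set reflects data the classifier has not seen.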
Q) What is a confusion matrix?
A confusion matrix is a technique for summarizing the performance of a classification
algorithm. It shows what the classification model is getting right and what types of
errors it is making. Predictions on a classification problem are usually summarized in a
matrix of the following form (binary case):

                     Predicted positive     Predicted negative
Actual positive      True positive (TP)     False negative (FN)
Actual negative      False positive (FP)    True negative (TN)
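
As a small worked example, the four cells of a binary confusion matrix can be counted directly from the actual and predicted labels. The function name `confusion_counts` and the sample labels are made up for illustration.

```python
def confusion_counts(actual, predicted, positive="positive"):
    """Return (TP, FP, FN, TN) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif a == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn

actual    = ["positive", "positive", "negative", "negative", "positive", "negative"]
predicted = ["positive", "negative", "negative", "positive", "positive", "negative"]
print(confusion_counts(actual, predicted))
```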

Q)Define Precision and Recall
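
In terms of confusion-matrix counts, precision is TP / (TP + FP), the share of predicted positives that are actually positive, and recall is TP / (TP + FN), the share of actual positives that the model finds. The sketch below computes both; the function name is hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # assumed convention for 0/0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Example: 8 true positives, 2 false positives, 4 false negatives.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3))
```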

Q)Define K-fold cross validation

The input data is split into K parts (folds); one part is reserved for testing and the
other K-1 are used for training. This process is repeated K times, so that each part
serves as the test set once, and the evaluation metrics are averaged. This helps in
determining how well a model would generalize to new datasets.
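
The splitting and averaging can be sketched as below. The helper names `k_fold_indices` and `cross_validate` are hypothetical, and the `evaluate` callback stands in for "train a model on the training indices and score it on the test indices". The sketch does not shuffle the data first, which a real experiment usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def cross_validate(n_samples, k, evaluate):
    """Average the metric returned by evaluate(train, test) over all k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(10, 3))
print([len(test) for _, test in folds])
```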
Q)Define NER
Named-entity recognition (NER) is a subtask of information extraction that seeks to
locate and classify named entities mentioned in unstructured text into predefined
categories such as person names, organizations, locations, medical codes, time
expressions, quantities, monetary values, percentages, etc.
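
The "locate and classify" idea can be shown with a toy rule-based extractor for three of the simpler categories. The patterns and label names are assumptions made for this sketch. Categories such as person names, organizations, and locations cannot be handled reliably with rules like these and are normally recognized by trained models.

```python
import re

# Illustrative patterns only; real NER systems rely on trained models.
PATTERNS = {
    "MONEY": r"\$\d+(?:\.\d+)?",        # e.g. $12.50
    "PERCENT": r"\d+(?:\.\d+)?%",       # e.g. 5%
    "TIME": r"\b\d{1,2}:\d{2}\b",       # e.g. 10:30
}

def find_entities(text):
    """Return (entity_text, label) pairs in the order they appear in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            found.append((match.start(), match.group(), label))
    return [(entity, label) for _, entity, label in sorted(found)]

print(find_entities("Shares rose 5% to $12.50 at 10:30"))
```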
Q) Differentiate between model performance evaluation and cross validation (JAN 2024)(2 MARKS)

Model performance evaluation measures how well a trained model performs on unseen
data using metrics like accuracy, precision, recall, and F1-score. Cross-validation, on
the other hand, is a technique for assessing the generalizability of a model by
partitioning data into subsets, training on some while validating on others iteratively.
Q)What is the purpose of labelling data in sentiment analysis? (JAN 2024)(2 MARKS)
Labeling data in sentiment analysis serves to assign sentiment categories (e.g., positive, negative,
neutral) to text samples. This labeled data is essential for training machine learning models to
recognize and predict sentiments in new, unseen text. Accurate labeling provides a foundation for
supervised learning, enabling models to learn patterns and improve prediction accuracy.

5/10 Marks

Q)Explain about REST API Search endpoint

Q)Explain about Rate Limit paging
Q)Explain about Streaming API
Q)Explain about Data pull and Data Extraction with example
Q)Explain about sentiment analysis(JAN 2024)(10 MARKS)
Q)Explain about customized sentiment analysis
Q)Explain about Named Entity Recognition
Q)Explain the process of combining NER and sentiment analysis

Q)Explain the challenges and limitations of sentiment analysis in analyzing Twitter data.(JAN 2024)(5 MARKS)
Introduction
Sentiment analysis, or opinion mining, involves using natural language processing (NLP), text
analysis, and computational linguistics to identify and extract subjective information from text. On
platforms like Twitter, sentiment analysis aims to determine the sentiment behind tweets, that is,
whether they express positive, negative, or neutral opinions. However, analyzing sentiment on
Twitter poses numerous challenges and limitations. The points below cover the main issues that
complicate sentiment analysis of Twitter data.
Challenges and Limitations
1. Short and Ambiguous Text
Twitter's character limit (originally 140 characters, now 280) forces users to condense their
thoughts, often leading to ambiguous and contextually thin statements. This brevity can obscure
the true sentiment:
● Incompleteness: Short tweets may lack context, making it hard to understand the
sentiment.
● Ambiguity: Brief messages can be vague and open to multiple interpretations.
2. Use of Slang, Jargon, and Informal Language
Twitter users often employ slang, abbreviations, and internet-specific jargon. This informal
language is challenging for traditional NLP algorithms to interpret accurately:
● Evolving Language: New slang and expressions constantly emerge, requiring models to be
continuously updated.
● Non-Standard Grammar: Tweets frequently break conventional grammatical rules, further
complicating analysis.
3. Sarcasm and Irony
Sarcasm and irony are prevalent on Twitter and are particularly challenging for sentiment analysis
tools:
● Implicit Sentiment: Sarcastic statements often convey the opposite sentiment of the literal
text.
● Detection Complexity: Recognizing sarcasm requires understanding context, tone, and
sometimes even the user's historical tweets.
4. Emojis and Emoticons
Emojis and emoticons are widely used to express emotions and sentiments, but interpreting them
correctly poses several challenges:
● Varied Meanings: The same emoji can have different meanings in different contexts.
● Combination Use: Emojis are often used in combination, and their collective sentiment can
be complex to decipher.
5. Contextual Dependencies
Understanding the sentiment of a tweet often requires contextual information, which is not always
available:
● External Context: Tweets referencing external events or previous conversations require
background knowledge to interpret accurately.
● User Context: A user’s historical tweets and behavior can provide context that influences
the sentiment of their current tweet.
6. High Volume and Velocity
The immense volume and rapid generation of tweets present scalability issues:
● Real-Time Analysis: Processing and analyzing tweets in real-time requires significant
computational resources.
● Data Overload: Handling and storing the vast amounts of data generated on Twitter can be
overwhelming.
7. Spam and Noise
Twitter is rife with spam, automated bots, and irrelevant content, which can skew sentiment
analysis:
● Bot Activity: Bots can flood the platform with repetitive or biased content.
● Filtering Challenges: Distinguishing meaningful tweets from spam and noise requires
sophisticated filtering techniques.
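
As a very simple first step against repetitive content, tweets whose text is identical after light normalization can be dropped. This is only a sketch under stated assumptions: the function names and normalization rules are invented here, and the approach catches exact repeats only, not the more sophisticated spam the point above refers to.

```python
import re

def normalize(text):
    """Lowercase the text and strip URLs, @mentions, and extra whitespace."""
    text = re.sub(r"https?://\S+|@\w+", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def drop_repeats(tweets):
    """Keep the first tweet for each normalized text; later copies are dropped."""
    seen = set()
    kept = []
    for tweet in tweets:
        key = normalize(tweet)
        if key and key not in seen:     # also skips tweets that normalize to nothing
            seen.add(key)
            kept.append(tweet)
    return kept

sample = [
    "Win a FREE phone http://a.example",
    "win a free phone http://b.example",
    "Great match today!",
    "@bob   great match today!",
]
print(drop_repeats(sample))
```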
8. Multilingual Tweets
Twitter’s global user base results in a mix of languages, posing additional difficulties:
● Language Identification: Detecting the language of a tweet is the first step, which itself can
be challenging with short texts.
● Multilingual Analysis: Each language requires its own sentiment analysis model,
complicating the overall process.
9. Mixed Sentiments
Tweets can express mixed sentiments, making it hard to classify them as purely positive, negative,
or neutral:
● Complex Sentiments: A single tweet might express different sentiments towards different
aspects of a topic.
● Granularity: Fine-grained sentiment analysis is needed to handle mixed sentiments,
increasing complexity.
10. Cultural Nuances
Cultural differences can influence the interpretation of sentiments:
● Cultural Context: Understanding how different cultures express sentiment is crucial for
accurate analysis.
● Local Slang: Regional slang and expressions add another layer of complexity.
11. Temporal Dynamics
Sentiments can change over time, influenced by ongoing events and trends:
● Event-Driven Sentiments: Significant events can cause rapid shifts in sentiment.
● Historical Context: Analyzing how sentiment evolves over time requires historical data,
adding to the data processing burden.
12. Bias in Training Data
The quality and representativeness of training data affect the accuracy of sentiment analysis
models:
● Biases in Data: Training data may contain inherent biases that can skew the analysis.
● Diverse Representation: Ensuring that the training data represents diverse opinions and
demographics is essential for unbiased results.
13. Evaluation Metrics
Measuring the performance of sentiment analysis models on Twitter data is challenging:
● Ground Truth: Establishing a reliable ground truth for sentiment labels is difficult due to
subjective interpretations.
● Evaluation Standards: Developing standard evaluation metrics that capture the nuances of
sentiment analysis is complex.
