
DSDM Unit4


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Subject Name: DATA SCIENCE AND DIGITAL MARKETING SYSTEM
Subject Code: U20CST718


Prepared By:
Mr.S.KUMARAKRISHNAN, Asst.Prof/ CSE
Mrs.P.BHAVANI, Asst.Prof / CSE
Mrs.R.DEEPA, Asst.Prof / CSE
Verified by: Approved by:
2 Marks

Q) What are the steps to process Twitter data?

The following are the steps to gather and prepare data from Twitter feeds:

● Getting the data
● Data pull
● Data cleaning
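
As an illustration of the data cleaning step, here is a minimal sketch using only Python's standard library. The cleaning rules (dropping the retweet prefix, URLs, @mentions and the '#' sign, then lowercasing) and the function name clean_tweet are assumptions chosen for this example; the notes do not prescribe a particular set of rules.

```python
import re

# Patterns for common tweet noise (illustrative choices, not a fixed standard).
RETWEET_RE = re.compile(r"^\s*RT\s+@\w+:?")   # leading "RT @user:" marker
URL_RE = re.compile(r"https?://\S+")          # links
MENTION_RE = re.compile(r"@\w+")              # @mentions
WHITESPACE_RE = re.compile(r"\s+")            # runs of spaces, tabs, newlines


def clean_tweet(text: str) -> str:
    """Return a lowercased tweet without retweet marker, URLs, mentions or '#'."""
    text = RETWEET_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")             # keep the hashtag word, drop the sign
    return WHITESPACE_RE.sub(" ", text).strip().lower()


print(clean_tweet("RT @user: Loving the new #Python release! https://t.co/abc"))
```

Which elements count as noise depends on the analysis: for example, hashtags or emojis may carry sentiment and might be kept.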
Q) What are the steps to get the Twitter API keys?
First, you need a Twitter account and credentials (consumer key, consumer secret,
access token, and access token secret) from the Twitter developer platform in order
to access the Twitter API. Follow these steps:

○ Create a Twitter user account.
○ Log in with your Twitter user account at
○ Click Create New App.
○ Fill out the form, agree to the terms, and click on Create your Twitter
application.
○ Go to the next page, click on the Keys and Access Tokens tab, and copy
your API key and API secret. Scroll down and click on Create my access
token, and copy your Access token, and Access token secret.

Q) What is Data Extraction?

Once we have all the required authorization keys, we can prepare the toolset for data
retrieval. The Twitter API offers several ways to extract data, but we will focus on
two main methods:
-> Keyword search query, to obtain recent (already posted) tweets
-> Streaming facility, to obtain tweets as they are posted

Q) What is Rate limit/paging?

After successfully connecting to the API, we have to prepare our scripts for data retrieval.
Because the API limits data access (rate limits), it is necessary to build an efficient
workflow. Twitter allows you to get up to 100 tweets per call. To retrieve more, we need
to remember the IDs of the tweets already downloaded so that the same tweets are not
extracted again in subsequent calls. This procedure is commonly called paging.
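
The paging idea can be sketched as follows. This is a simulation, not a call to the real Twitter API: fake_search and ARCHIVE are hypothetical stand-ins for a search endpoint that returns tweets newest first and accepts a max_id upper bound, and fetch_all is an illustrative helper name.

```python
from typing import Any, Callable, Dict, List, Optional

Tweet = Dict[str, Any]


def fetch_all(search: Callable[[Optional[int], int], List[Tweet]],
              limit: int, page_size: int = 100) -> List[Tweet]:
    """Collect up to `limit` tweets, `page_size` at a time, without duplicates."""
    collected: List[Tweet] = []
    max_id: Optional[int] = None          # no upper bound on the first call
    while len(collected) < limit:
        count = min(page_size, limit - len(collected))
        page = search(max_id, count)
        if not page:                      # nothing older is available
            break
        collected.extend(page)
        # Remember the oldest ID seen so the next call starts just below it.
        max_id = min(t["id"] for t in page) - 1
    return collected


# Hypothetical in-memory "API": 250 tweets with IDs 250 down to 1.
ARCHIVE: List[Tweet] = [{"id": i, "text": f"tweet {i}"} for i in range(250, 0, -1)]


def fake_search(max_id: Optional[int], count: int) -> List[Tweet]:
    """Return at most `count` tweets with id <= max_id, newest first."""
    matches = [t for t in ARCHIVE if max_id is None or t["id"] <= max_id]
    return matches[:count]


tweets = fetch_all(fake_search, limit=230)
print(len(tweets), tweets[0]["id"], tweets[-1]["id"])
```

A real script would also need to wait between calls when the rate limit is reached; that part is omitted here.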

Q) What are Streaming API streams?

Another method of obtaining information from Twitter is the Streaming API. It gives
access to Twitter's global stream of data. There are several basic streaming endpoints,
each customized to certain use cases. Based on the Twitter documentation:
● Public streams: Streams of the public data flowing through Twitter. They are suitable
for following specific users or topics, and for data mining.
● User streams: Single-user streams, containing roughly all of the data corresponding to
a single user's view of Twitter.
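
To make the idea of following specific users or topics on a public stream concrete, here is a small simulation. It does not connect to Twitter; the stream is just a Python list, and filter_stream, track and follow are names assumed for this sketch.

```python
from typing import Dict, Iterable, Iterator, Set


def filter_stream(stream: Iterable[Dict[str, str]],
                  track: Set[str], follow: Set[str]) -> Iterator[Dict[str, str]]:
    """Yield tweets as they arrive if they mention a tracked word or come from a followed user."""
    for tweet in stream:
        words = set(tweet["text"].lower().split())
        if tweet["user"] in follow or words & track:
            yield tweet


# Hypothetical incoming tweets, processed one at a time.
sample_stream = [
    {"user": "alice", "text": "Learning Python today"},
    {"user": "bob", "text": "Lunch time"},
    {"user": "carol", "text": "Rainy morning"},
]

for tweet in filter_stream(sample_stream, track={"python"}, follow={"carol"}):
    print(tweet["user"], "->", tweet["text"])
```

Because the function is a generator, each tweet is handled as soon as it arrives instead of waiting for the whole stream, which mirrors how streamed data is consumed.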
Q) Define Sentiment Analysis
Sentiment analysis involves classifying comments or opinions in text into categories
such as "positive" or "negative", often with an implicit category of "neutral". A classic
sentiment application is tracking what people think about different topics.
In data science and machine learning, sentiment analysis is also called "opinion mining",
or, in marketing terminology, "voice of the customer".

Q) Define VADER
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based
model for text sentiment analysis that is sensitive to both the polarity
(positive/negative) and the intensity (strength) of emotion. It is available in the NLTK
package and can be applied directly to unlabeled text data.
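
The sketch below is a toy lexicon-and-rule scorer written in the spirit of VADER, to show how polarity and intensity can both be captured. It is not the NLTK implementation: the word scores, booster weights, negation window, squashing constant and thresholds are all made-up values for illustration.

```python
import math

# Toy valence lexicon (illustrative values, not VADER's real lexicon).
LEXICON = {"good": 2.0, "great": 3.0, "love": 3.0,
           "bad": -2.0, "terrible": -3.0, "hate": -3.0}
BOOSTERS = {"very": 1.5, "extremely": 2.0}   # intensity modifiers
NEGATIONS = {"not", "never", "no"}           # polarity flippers


def score(text: str) -> float:
    """Return a sentiment score in (-1, 1); the sign is polarity, the size is intensity."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        valence = LEXICON[tok]
        if i > 0 and tokens[i - 1] in BOOSTERS:
            valence *= BOOSTERS[tokens[i - 1]]          # "very good" > "good"
        if any(w in NEGATIONS for w in tokens[max(0, i - 2):i]):
            valence = -valence                          # "not good" flips the sign
        total += valence
    return total / math.sqrt(total * total + 15)        # squash into (-1, 1)


def label(text: str, threshold: float = 0.05) -> str:
    """Map the score to 'positive', 'negative' or 'neutral'."""
    s = score(text)
    if s >= threshold:
        return "positive"
    if s <= -threshold:
        return "negative"
    return "neutral"


for sentence in ["I love this phone", "This is not good", "The meeting is at noon"]:
    print(sentence, "->", label(sentence))
```

No training data is needed here, which is why lexicon-based approaches of this kind can be applied directly to unlabeled text.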
Q) What does the preparation of a custom classifier require?
The preparation of a custom classifier requires two data sets:
● Training data set: The data on which the classifier algorithm learns the model
parameters.
● Test data set: The data used to determine the accuracy of the algorithm.
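
A minimal sketch of how labeled data might be divided into these two sets, using only the standard library. The 80/20 ratio, the fixed seed and the function name train_test_split are assumptions made for this example; the sample tweets are invented.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, str]   # (text, label)


def train_test_split(samples: Sequence[Sample], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle the samples and return (training set, test set)."""
    rng = random.Random(seed)          # fixed seed makes the split repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Invented labeled tweets for illustration.
labeled = [(f"tweet {i}", "positive" if i % 2 else "negative") for i in range(10)]
train, test = train_test_split(labeled)
print(len(train), "training samples,", len(test), "test samples")
```

The two sets must not overlap; otherwise the accuracy measured on the test set would be overly optimistic.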
Q) What is a confusion matrix?
A confusion matrix is a technique for summarizing the performance of a classification
algorithm. It shows what the classification model is getting right and what types of
errors it is making. For a binary classification problem, the predictions are usually
summarized in the following matrix:

                    Predicted positive      Predicted negative
Actual positive     True Positive (TP)      False Negative (FN)
Actual negative     False Positive (FP)     True Negative (TN)
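
As an illustration, the four cells of a binary confusion matrix (true positives, false positives, false negatives, true negatives) can be counted from actual and predicted labels as follows. The label values and the function name confusion_counts are assumptions for this sketch.

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[str], predicted: Sequence[str],
                     positive: str = "positive") -> Dict[str, int]:
    """Count TP, FP, FN and TN for a binary classification problem."""
    counts: Counter = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# Invented labels for eight tweets.
actual = ["positive", "positive", "positive", "negative",
          "negative", "negative", "negative", "positive"]
predicted = ["positive", "negative", "positive", "negative",
             "positive", "negative", "negative", "positive"]

print(confusion_counts(actual, predicted))
```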

Q) Define Precision and Recall
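
Using the standard definitions, precision is the share of predicted positives that are actually positive, TP / (TP + FP), and recall is the share of actual positives that the model finds, TP / (TP + FN). The sketch below computes both, plus the F1-score (their harmonic mean), from confusion-matrix counts. The counts used are invented, and returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Invented counts: 8 true positives, 2 false positives, 4 false negatives.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=4)
print(round(p, 3), round(r, 3), round(f1_score(p, r), 3))
```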

Q) Define K-fold cross validation

The input data is split into K parts (folds), where one part is reserved for testing and
the other K-1 parts are used for training. This process is repeated K times, each time
with a different part held out for testing, and the evaluation metrics are averaged.
This helps in determining how well a model would generalize to new datasets.
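
The splitting and averaging can be sketched in plain Python as below. The function names are assumptions for this example, and the evaluate callback is a placeholder for training a model on the training indices and scoring it on the test indices; here it only returns a dummy value so that the sketch runs on its own.

```python
from typing import Callable, Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train indices, test indices) for each of the K folds."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples: int, k: int,
                   evaluate: Callable[[List[int], List[int]], float]) -> float:
    """Average the metric returned by `evaluate` over the K folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


for train, test in k_fold_indices(n_samples=10, k=3):
    print("train:", train, "test:", test)
```

In practice the data is usually shuffled before splitting, so that each fold contains a mix of classes.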
Q) Define NER
Named-entity recognition (NER) is a subtask of information extraction that seeks to
locate and classify named entities mentioned in unstructured text into predefined
categories such as person names, organizations, locations, medical codes, time
expressions, quantities, monetary values, percentages, etc.
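
To show what "locate and classify" means, here is a deliberately simple rule-based sketch. Practical NER systems generally rely on trained statistical models; this example only uses regular expressions for a few categories and a tiny hand-written name list (GAZETTEER), all of which are assumptions for illustration.

```python
import re
from typing import List, Tuple

# Regular expressions for a few pattern-like categories.
PATTERNS = [
    ("MONEY", re.compile(r"\$\d+(?:[.,]\d+)*")),
    ("PERCENT", re.compile(r"\d+(?:\.\d+)?%")),
    ("TIME", re.compile(r"\b\d{1,2}:\d{2}\b")),
]
# Tiny hand-written name list (hypothetical, for the example only).
GAZETTEER = {"Twitter": "ORGANIZATION", "London": "LOCATION"}


def find_entities(text: str) -> List[Tuple[str, str]]:
    """Return (entity text, category) pairs in order of appearance."""
    found = []
    for category, pattern in PATTERNS:
        for m in pattern.finditer(text):
            found.append((m.start(), m.group(), category))
    for name, category in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((m.start(), m.group(), category))
    return [(entity, category) for _, entity, category in sorted(found)]


print(find_entities("Twitter shares rose 5% to $45.20 in London at 09:30."))
```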
Q) Differentiate between model performance evaluation and cross validation (JAN
2024) (2 MARKS)

Model performance evaluation measures how well a trained model performs on unseen
data using metrics like accuracy, precision, recall, and F1-score. Cross-validation, on
the other hand, is a technique for assessing the generalizability of a model by
partitioning data into subsets, training on some while validating on others iteratively.
Q) What is the purpose of labeling data in sentiment analysis? (JAN 2024) (2
MARKS)
Labeling data in sentiment analysis serves to assign sentiment categories (e.g., positive, negative,
neutral) to text samples. This labeled data is essential for training machine learning models to
recognize and predict sentiments in new, unseen text. Accurate labeling provides a foundation for
supervised learning, enabling models to learn patterns and improve prediction accuracy.

5/10 Marks

Q) Explain the REST API Search endpoint
Q) Explain Rate Limit paging
Q) Explain the Streaming API
Q) Explain Data pull and Data Extraction with an example
Q) Explain sentiment analysis (JAN 2024) (10 MARKS)
Q) Explain customized sentiment analysis
Q) Explain Named Entity Recognition
Q) Explain the process of combining NER and sentiment analysis

Q) Explain the challenges and limitations of sentiment analysis in analyzing Twitter
data. (JAN 2024) (5 MARKS)
Introduction
Sentiment analysis, or opinion mining, involves using natural language processing (NLP), text
analysis, and computational linguistics to identify and extract subjective information from text. On
platforms like Twitter, sentiment analysis aims to determine whether a tweet expresses a positive,
negative, or neutral opinion. However, analyzing sentiment on Twitter poses numerous challenges
and limitations, which are described below.
Challenges and Limitations
1. Short and Ambiguous Text
Twitter's character limit (originally 140 characters, now 280) forces users to condense their
thoughts, often leading to ambiguous and contextually thin statements. This brevity can obscure
the true sentiment:
● Incompleteness: Short tweets may lack context, making it hard to understand the
sentiment.
● Ambiguity: Brief messages can be vague and open to multiple interpretations.
2. Use of Slang, Jargon, and Informal Language
Twitter users often employ slang, abbreviations, and internet-specific jargon. This informal
language is challenging for traditional NLP algorithms to interpret accurately:
● Evolving Language: New slang and expressions constantly emerge, requiring models to be
continuously updated.
● Non-Standard Grammar: Tweets frequently break conventional grammatical rules, further
complicating analysis.
3. Sarcasm and Irony
Sarcasm and irony are prevalent on Twitter and are particularly challenging for sentiment analysis
tools:
● Implicit Sentiment: Sarcastic statements often convey the opposite sentiment of the literal
text.
● Detection Complexity: Recognizing sarcasm requires understanding context, tone, and
sometimes even the user's historical tweets.
4. Emojis and Emoticons
Emojis and emoticons are widely used to express emotions and sentiments, but interpreting them
correctly poses several challenges:
● Varied Meanings: The same emoji can have different meanings in different contexts.
● Combination Use: Emojis are often used in combination, and their collective sentiment can
be complex to decipher.
5. Contextual Dependencies
Understanding the sentiment of a tweet often requires contextual information, which is not always
available:
● External Context: Tweets referencing external events or previous conversations require
background knowledge to interpret accurately.
● User Context: A user’s historical tweets and behavior can provide context that influences
the sentiment of their current tweet.
6. High Volume and Velocity
The immense volume and rapid generation of tweets present scalability issues:
● Real-Time Analysis: Processing and analyzing tweets in real time requires significant
computational resources.
● Data Overload: Handling and storing the vast amounts of data generated on Twitter can be
overwhelming.
7. Spam and Noise
Twitter is rife with spam, automated bots, and irrelevant content, which can skew sentiment
analysis:
● Bot Activity: Bots can flood the platform with repetitive or biased content.
● Filtering Challenges: Distinguishing meaningful tweets from spam and noise requires
sophisticated filtering techniques.
8. Multilingual Tweets
Twitter’s global user base results in a mix of languages, posing additional difficulties:
● Language Identification: Detecting the language of a tweet is the first step, which itself can
be challenging with short texts.
● Multilingual Analysis: Sentiment lexicons and models are often language-specific, so
covering several languages typically means maintaining a separate model per language or
a multilingual one, complicating the overall process.
9. Mixed Sentiments
Tweets can express mixed sentiments, making it hard to classify them as purely positive, negative,
or neutral:
● Complex Sentiments: A single tweet might express different sentiments towards different
aspects of a topic.
● Granularity: Fine-grained sentiment analysis is needed to handle mixed sentiments,
increasing complexity.
10. Cultural Nuances
Cultural differences can influence the interpretation of sentiments:
● Cultural Context: Understanding how different cultures express sentiment is crucial for
accurate analysis.
● Local Slang: Regional slang and expressions add another layer of complexity.
11. Temporal Dynamics
Sentiments can change over time, influenced by ongoing events and trends:
● Event-Driven Sentiments: Significant events can cause rapid shifts in sentiment.
● Historical Context: Analyzing how sentiment evolves over time requires historical data,
adding to the data processing burden.
12. Bias in Training Data
The quality and representativeness of training data affect the accuracy of sentiment analysis
models:
● Biases in Data: Training data may contain inherent biases that can skew the analysis.
● Diverse Representation: Ensuring that the training data represents diverse opinions and
demographics is essential for unbiased results.
13. Evaluation Metrics
Measuring the performance of sentiment analysis models on Twitter data is challenging:
● Ground Truth: Establishing a reliable ground truth for sentiment labels is difficult due to
subjective interpretations.
● Evaluation Standards: Developing standard evaluation metrics that capture the nuances of
sentiment analysis is complex.
