DSDM Unit4

Uploaded by

manekandan8214

2 Marks

Q) What are the steps to process Twitter data?

Gathering data from Twitter feeds involves the following steps:

● Getting the data
● Data pull
● Data cleaning
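
As an illustration of the data cleaning step, here is a minimal sketch that strips common noise from raw tweet text. The choice of what to remove (retweet markers, URLs, mentions, hash signs) is an assumption made for this example; the notes do not prescribe specific cleaning rules.

```python
import re

# Patterns for the noise this sketch removes (an assumed, non-exhaustive set).
RETWEET_RE = re.compile(r"^RT\s+")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+:?")


def clean_tweet(text: str) -> str:
    """Remove the retweet marker, URLs, and mentions; keep hashtag words."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = text.replace("#", "")
    # Collapse the extra whitespace left behind by the removals.
    return " ".join(text.split())


print(clean_tweet("RT @user: Loving the new #Python release! https://example.com/x"))
```

Whether to keep hashtags, emoji, or casing depends on the later analysis, so these rules would normally be adjusted per project.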

Q) What are the steps to get the Twitter API keys?


To access the Twitter API, you first need a Twitter account and credentials (consumer
key, consumer secret, access token, and access secret) from the Twitter developer
platform. Obtain them with the following steps:
○ Create a Twitter user account.
○ Log in with your Twitter user account at
○ Click Create New App.
○ Fill out the form, agree to the terms, and click on Create your Twitter
application.
○ Go to the next page, click on the Keys and Access Tokens tab, and copy
your API key and API secret. Scroll down, click on Create my access
token, and copy your Access token and Access token secret.
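
Once the four credentials are copied, a script needs some way to read them without hard-coding secrets. The sketch below assumes they are stored in environment variables; the variable names and the `load_credentials` helper are hypothetical, chosen only for this example.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names, one per credential listed above.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the four credentials, failing early if any of them is missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

The returned dictionary can then be passed to whichever Twitter client library the project uses.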

Q) What is Data Extraction?


With all the required authorization keys in hand, we can prepare the toolset for data
retrieval. The Twitter API offers several ways to extract data, but we will focus on two
main methods:
-> Keyword search query, to obtain recent, historical tweets
-> Streaming facility, to obtain tweets as they are posted

Q) What is Rate limit/paging?


After successfully connecting to the API, we have to prepare our scripts for data retrieval.
Because the API limits data access (rate limits), it is necessary to build an efficient
workflow. Twitter allows you to get up to 100 tweets per call. To retrieve more, we need
to remember the IDs of the tweets already downloaded so that the same tweets are not
extracted again during subsequent calls. This procedure is commonly called paging.
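
The paging idea can be sketched without a network connection. In the example below, `fake_search` is a hypothetical stand-in for a search call that returns at most 100 tweets, newest first, and accepts a `max_id` upper bound; it is not a real client function. The loop remembers the oldest ID seen and asks only for tweets below it.

```python
from typing import Dict, List, Optional

PAGE_SIZE = 100  # per-call limit mentioned in the text

# Stand-in for the remote service: 250 fake tweets with increasing IDs.
FAKE_TWEETS = [{"id": i, "text": f"tweet {i}"} for i in range(1, 251)]


def fake_search(max_id: Optional[int] = None, count: int = PAGE_SIZE) -> List[Dict]:
    """Hypothetical search call: newest tweets first, at most PAGE_SIZE per call."""
    matches = [t for t in FAKE_TWEETS if max_id is None or t["id"] <= max_id]
    matches.sort(key=lambda t: t["id"], reverse=True)
    return matches[: min(count, PAGE_SIZE)]


def fetch_all() -> List[Dict]:
    """Page backwards through the results until a call comes back empty."""
    collected: List[Dict] = []
    max_id: Optional[int] = None
    while True:
        page = fake_search(max_id=max_id)
        if not page:
            break
        collected.extend(page)
        # Remember the oldest ID seen so the next call starts just below it.
        max_id = min(t["id"] for t in page) - 1
    return collected


tweets = fetch_all()
print(len(tweets), "tweets,", len({t["id"] for t in tweets}), "unique IDs")
```

Against a real API, the loop would also need to pause when the rate limit is reached, which this sketch leaves out.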

Q) What are Streaming API streams?


The streaming API is another method of obtaining information from Twitter. It gives
access to Twitter's global stream of data. There are several basic streaming endpoints,
each customized to certain use cases. Based on the Twitter documentation:
● Public streams: Streams of the public data flowing through Twitter. They are suitable
for following specific users or topics, and for data mining.
● User streams: Single-user streams, containing roughly all of the data corresponding to
a single user's view of Twitter.
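
To make the difference from keyword search concrete, the sketch below handles tweets one at a time as they arrive and keeps only those on a followed topic. The stream here is simulated with a generator; `simulated_stream` and `track` are hypothetical names for this example, not part of any Twitter client.

```python
from typing import Dict, Iterable, Iterator, List


def simulated_stream() -> Iterator[Dict]:
    """Stand-in for a live connection: yields tweets one at a time."""
    samples = [
        {"user": "alice", "text": "Learning Python today"},
        {"user": "bob", "text": "Coffee first"},
        {"user": "carol", "text": "python tips for data mining"},
    ]
    for tweet in samples:
        yield tweet


def track(stream: Iterable[Dict], keywords: List[str]) -> Iterator[Dict]:
    """Pass through only tweets whose text mentions one of the keywords."""
    wanted = [k.lower() for k in keywords]
    for tweet in stream:
        text = tweet["text"].lower()
        if any(k in text for k in wanted):
            yield tweet


matched = list(track(simulated_stream(), ["python"]))
for tweet in matched:
    print(tweet["user"], "->", tweet["text"])
```

Because both functions are generators, nothing is held in memory beyond the tweet currently being processed, which is the usual pattern for an open-ended stream.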

Q) Define Sentiment Analysis


Sentiment analysis involves classifying comments or opinions in text into categories
such as "positive" or "negative", often with an implicit category of "neutral". A classic
sentiment application is tracking what people think about different topics.
In data science and machine learning, sentiment analysis is also called "opinion mining";
in marketing terminology it is called "voice of the customer".

Q) Define VADER


VADER (Valence Aware Dictionary and sEntiment Reasoner) is a model used for text
sentiment analysis that is sensitive to both polarity (positive/negative) and intensity
(strength) of emotion. It is available in the NLTK package and can be applied directly to
unlabeled text data.
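
The idea of scoring both polarity and intensity from a dictionary can be illustrated with a toy example. This is not VADER and does not use its lexicon or rules; the word scores and intensifiers below are made up for the sketch.

```python
from typing import Dict

# Toy valence lexicon (made-up scores, not VADER's own lexicon).
LEXICON: Dict[str, float] = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
# Toy intensifiers that scale the valence of the word that follows them.
BOOSTERS: Dict[str, float] = {"very": 1.5, "slightly": 0.5}


def score(text: str) -> float:
    """Sum word valences, scaling a word when an intensifier precedes it."""
    for mark in "!.,?":
        text = text.replace(mark, " ")
    words = text.lower().split()
    total = 0.0
    for i, word in enumerate(words):
        if word in LEXICON:
            boost = BOOSTERS.get(words[i - 1], 1.0) if i > 0 else 1.0
            total += boost * LEXICON[word]
    return total


def label(text: str) -> str:
    """Map the score's sign to positive, negative, or neutral."""
    value = score(text)
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"


for sentence in ["The service was very good", "The food was terrible", "I went to the store"]:
    print(sentence, "->", score(sentence), label(sentence))
```

The sign of the score gives the polarity and its magnitude gives the intensity, which is the distinction the definition above draws.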

Q) What does the preparation of a custom classifier require?


The preparation of a custom classifier requires two data sets:
● Training data set: the data on which the classifier algorithm learns the model
parameters
● Test data set: the data used to determine the accuracy of the algorithm
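
A minimal sketch of producing the two data sets from one labeled collection follows. The 80/20 proportion, the fixed seed, and the `split_data` helper are assumptions for illustration; the notes do not specify a split ratio.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_data(data: Sequence[T], test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data and return (training set, test set)."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


# Ten made-up labeled examples.
labeled = [(f"tweet {i}", "positive" if i % 2 else "negative") for i in range(10)]
train_set, test_set = split_data(labeled)
print(len(train_set), "training examples,", len(test_set), "test examples")
```

Shuffling before splitting avoids a test set that contains only the most recent or only one class of examples when the input is ordered.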

Q) What is a confusion matrix?


A confusion matrix is a technique for summarizing the performance of a classification
algorithm. It shows what the classification model is getting right and what types of
errors it is making. The predictions on a classification problem are usually laid out as a
matrix of actual classes against predicted classes, so that each cell counts one
combination (for two classes: true positives, false positives, false negatives, and
true negatives).
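
As a small worked example, a confusion matrix can be built by counting (actual, predicted) pairs. The labels and predictions below are made up, and the `confusion_matrix` helper is written for this sketch rather than taken from a library.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_matrix(actual: List[str], predicted: List[str]) -> Dict[Tuple[str, str], int]:
    """Count how often each (actual, predicted) pair of labels occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return dict(Counter(zip(actual, predicted)))


actual = ["pos", "pos", "neg", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "pos", "pos", "neg"]
cm = confusion_matrix(actual, predicted)

# Print one row per actual class, one column per predicted class.
labels = ["pos", "neg"]
print("actual \\ predicted", labels)
for a in labels:
    print(a, [cm.get((a, p), 0) for p in labels])
```

The diagonal cells hold the correct predictions; the off-diagonal cells show which kind of mistake the classifier makes.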

Q) Define Precision and Recall
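
Under the standard definitions, precision is the share of predicted positives that are actually positive, TP / (TP + FP), and recall is the share of actual positives that the model finds, TP / (TP + FN). The sketch below computes both from confusion-matrix counts; the counts in the example are made up.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up counts: 8 true positives, 2 false positives, 8 false negatives.
print("precision:", precision(tp=8, fp=2))
print("recall:", recall(tp=8, fn=8))
```

Returning 0.0 when the denominator is zero is a convention chosen for this sketch; some tools report the value as undefined instead.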

Q) Define K-fold cross validation


The input data is split into K parts, of which one is reserved for testing and the other
K-1 are used for training. This process is repeated K times, each time with a different
part held out for testing, and the evaluation metrics are averaged. This helps in
determining how well a model would generalize to new datasets.
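
The splitting scheme described above can be sketched with plain index lists. The helper name `k_fold_indices` is hypothetical, and the folds are taken in order without shuffling to keep the example short.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) once for each of the k folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    # Fold sizes differ by at most one when n is not divisible by k.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


folds = list(k_fold_indices(n=10, k=3))
for number, (train, test) in enumerate(folds, start=1):
    print(f"fold {number}: train on {len(train)} items, test on {test}")
```

In a full evaluation, a model would be trained and scored inside the loop and the K scores averaged at the end.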

Q) Define NER


Named-entity recognition (NER) is a subtask of information extraction that seeks to
locate and classify named entities mentioned in unstructured text into predefined
categories such as person names, organizations, locations, medical codes, time
expressions, quantities, monetary values, percentages, etc.
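
To show what "locate and classify" means in practice, here is a deliberately simple rule-based sketch. Real NER systems generally rely on trained statistical models; the regular expressions and the two-entry name list below are assumptions made only for this example.

```python
import re
from typing import List, Tuple

# Pattern-based categories (a tiny, assumed subset of those listed above).
PATTERNS = [
    ("MONEY", re.compile(r"\$\d+(?:\.\d+)?")),
    ("PERCENT", re.compile(r"\d+(?:\.\d+)?%")),
]
# Hypothetical lookup list of known names and their categories.
GAZETTEER = {"Twitter": "ORGANIZATION", "London": "LOCATION"}


def find_entities(text: str) -> List[Tuple[str, str]]:
    """Return (entity text, category) pairs in order of appearance."""
    found = []
    for category, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), category))
    for name, category in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.group(), category))
    found.sort()  # order by position in the text
    return [(entity, category) for _, entity, category in found]


print(find_entities("Twitter shares rose 5% to $42.50 in London"))
```

Each result pairs the located span with its category, which is the same shape of output a model-based recognizer would produce.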

5/10 Marks

Q) Explain the REST API Search endpoint
Q) Explain rate limits and paging
Q) Explain the Streaming API
Q) Explain data pull and data extraction with an example
Q) Explain sentiment analysis
Q) Explain customized sentiment analysis
Q) Explain Named Entity Recognition
Q) Explain the process of combining NER and sentiment analysis
