DSDM Unit2
2 MARKS
Q) What is an API?
API stands for Application Programming Interface. It is the medium that allows the
exchange of data points between a service and the programmer or user. The interface
can be thought of as a contract of service between two applications. This contract defines
how the two communicate with each other using requests and responses.
The two main types of API are:
• RESTful API
• Stream API
Social media APIs also have limitations. The main limitations are:
• Rate limits
• API changes
• Legal
Q) What are the connecting principles of APIs?
General connecting principles of APIs:
• App registration: Almost every social media platform requires you to register your application
on its website. This involves entering personal information and your objectives in using its
API services. This step results in the generation of certain keys, which are called
authentication and consumer keys.
• Authentication: Use the consumer keys (also called authentication keys) generated in the
previous step to authenticate your application.
• API endpoint hunting: The API endpoints are different for each provider, so it is
necessary to read the provided documentation to identify which endpoints best correspond to
your needs.
Q) What is OAuth?
OAuth is an authorization protocol that allows users to share data with an application
without sharing their password. It provides a secure authorization scheme based on a
token-based mechanism. The protocol exists in two versions, OAuth1 and OAuth2, and APIs
offer two authentication models with it: user authentication and application authentication.
The steps required to set up a client with OAuth authorization are:
1. Creating a user/developer account
2. Creating an application
3. Obtaining access tokens
4. Authorizing HTTP requests (optional)
5. Setting up permission scopes (optional)
6. Connecting to the API using obtained access tokens
Q) What is GitHub?
GitHub is one of the most important platforms for computer programmers and hobbyists. Its
main goal is to host source code repositories and empower open source communities to work
together on new technologies. The platform contains lots of valuable information about what is
happening in the community of technology enthusiasts, what the trends are, what programming
languages have started to emerge, and much more. We will use the data from GitHub to predict
the trending technologies of the future.
Q) What is YouTube?
YouTube is certainly the most popular video sharing social network and helps users to share and
monetize their media content. It has very rich content, ranging from videos by amateur users to
quality recordings by professionals. On top of the media content, it contains different kinds of
data such as comments, statistics, or captions automatically extracted from video sound. The
main advantage of YouTube is the number of users and the volume of new videos uploaded
every day. These numbers are huge and increase every day, making this
social media platform a data goldmine.
Q) What is Pinterest?
Pinterest has become one of the most important photo sharing platforms over the last few years.
It allows users to share photos found on the internet with other users by creating pins. In our
further analysis we will analyze the content and relationships between users. In order to gather
content, we have to establish a connection to the Pinterest API.
Q) What is Encoding?
Comments and conversations are textual data that we retrieve as strings.
In brief, a string is a sequence of characters represented by code points. Every string in Python
is a sequence of Unicode code points, which range from 0 through 0x10FFFF (1,114,111 decimal).
In memory, the sequence then has to be represented as a sequence of bytes (values from 0 to 255).
The rules for translating a Unicode string into a sequence of bytes are called encoding.
Q) What is Preprocessing?
Preprocessing is one of the most important parts of the analysis process. It reformats the
unstructured data into a uniform, standardized form. The characters, words, and sentences
identified at this stage are the fundamental units passed to all further processing stages.
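As a minimal sketch of this stage, assuming simple regular-expression rules, text can be broken into sentences and words as shown below. The helper names `split_sentences` and `split_words` are hypothetical, and real projects often rely on a dedicated NLP library instead.

```python
import re

def split_sentences(text):
    # Naive rule: split after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def split_words(sentence):
    # Lowercase, then keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", sentence.lower())

# Hypothetical user comment with irregular case and spacing.
raw = "I LOVE this phone!  Battery life is great. Would buy again?"
sentences = split_sentences(raw)
tokens = [split_words(s) for s in sentences]
print(sentences)
print(tokens)
```

The rules are deliberately crude (abbreviations such as "e.g." would be split incorrectly), but they show how raw text becomes the uniform units used by later steps.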
Q) What is MongoDB?
Q) APIs in a nutshell:
An API is the medium that allows the exchange of data points between a service and the
programmer or user. API concepts have been widely used in the software industry whenever
different pieces of software need to exchange data with one another. Mobile and internet
applications have been using web services and APIs to enrich information from external
sources. Social media also started creating APIs to share their data with third-party
application developers. The popularity of data science has also made APIs emerge as a
source for data mining and knowledge creation. The nature of each social media platform is
different, and so are their APIs. The steps involved in making a connection may not differ
greatly, but the data points we capture do.
RESTful API:
REST stands for Representational State Transfer and it relies on the HTTP protocol for data
transfer between machines. It was created to simplify the transfer of data between
machines, in contrast to earlier web service technologies such as CORBA, RPC, and SOAP. Since the
architecture of REST uses the HTTP protocol, it would be fair to assume that the WWW itself
is based on RESTful design. Two of the most important methods of RESTful services are:
• GET: Procedure to receive data from a distant machine
• PUT: Procedure to write data to a distant machine
Almost all the functionalities of a REST API can be used through the preceding two
methods.
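The two methods can be illustrated with a minimal sketch that uses Python's standard `urllib.request` module. The URL and payload are hypothetical, and the requests are only built, not sent, so no network access is needed.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/items/42"  # hypothetical endpoint

# GET: ask the remote machine for the current representation of a resource.
get_request = urllib.request.Request(BASE_URL, method="GET")

# PUT: send a new representation of the resource to the remote machine.
payload = json.dumps({"name": "sample"}).encode("utf-8")
put_request = urllib.request.Request(
    BASE_URL,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="PUT",
)

# The requests are only built here; urllib.request.urlopen(req) would send them.
print(get_request.get_method(), get_request.full_url)
print(put_request.get_method(), put_request.data)
```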
Stream API:
You need a Stream API when the requirement is to collect data in real time, rather than
historical data from the platform. The Stream API of Twitter is widely used to collect real-time
data from Twitter. The output is quite similar to that of a REST API apart from the real-time
aspect. We'll see examples of the Twitter Stream API and its outputs.
Advantages of social media APIs are:
• Social data: APIs allow you to extract valuable data about social media users and
content, which is used for behavioral analysis and user insights.
• App development: Thousands of software applications have been built using
social media APIs, providing additional services on top of social media platforms.
• Marketing: Social media APIs are useful in automating marketing activities such as
social media marketing by posting on platforms. They also help in enriching marketing
data through social data acquired about customers.
Limitations of social media APIs:
• Rate limits: Social media companies need to take into account the amount of data
that enters or leaves their systems. These are rules based on their infrastructural
limitations and business objectives. We cannot expect to acquire unlimited amounts
of data at our own pace. The amount of data and the speed of receiving it are clearly
stated by most social media platforms. We have to read these rules carefully and include
them in our extraction strategy.
• API changes: This is one of the biggest challenges to deal with when developing
applications or analysis using social data. Social media platforms are free to change
or stop their API services at their own will. Such changes or stoppages could severely
impact development or analytics strategies. The only advice in such situations is to
be prepared for them and to have flexible systems that can adapt to the changes.
• Legal: This challenge mainly concerns the use cases around social media APIs. The rules
and regulations of social media platforms are strict about how their
data and services may be used. We have to be conscious of the legal framework before thinking of
our usage and applications. Any use of data from APIs that doesn't conform to the
stipulated regulations risks legal implications.
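To make the rate-limit point concrete, here is a minimal sketch of pacing requests so that an extraction job stays within a quota. The quota of 180 requests per 15 minutes is a hypothetical figure and the helper names are illustrative; the real limits come from each platform's documentation.

```python
import time

def min_delay_seconds(max_requests, window_seconds):
    # Smallest pause between calls that keeps us within the quota.
    return window_seconds / max_requests

def paced(items, delay, sleep=time.sleep):
    # Yield items one by one, pausing between them to respect the limit.
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay)
        yield item

# Hypothetical quota: 180 requests per 15-minute window.
delay = min_delay_seconds(180, 15 * 60)
print(delay)
```

Passing the `sleep` function as a parameter keeps the sketch easy to try out without actually waiting.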
What is OAuth?
OAuth is simply an authorization protocol that allows users to share data with an application
without sharing the password. It is a way to obtain a secure authorization scheme based on
a token-based authorization mechanism.
There are two API authentication models using OAuth:
• User authentication
• Application authentication.
User authentication: This is the most common form of resource authentication
implementation. The signed request identifies both the application and the end user on whose
behalf the API calls are made, together with the permissions that user has granted, which are
represented by the user's access token.
Application authentication: Application authentication is a form of authentication where the
application makes API requests on its own behalf, without a user context. API calls are often
rate limited per API method, but the pool each method draws from belongs to your entire
application at large, rather than to a per-user limit.
For the purposes of social media analysis, we will in most cases use application
authentication by creating an application on each social media platform that will query the
related API.
There are several steps that are required to put in place a client with OAuth
authorization:
1. Creating a user/developer account: First of all, you have to register a user/developer
account and provide personal information such as a valid email address, name, surname,
country, and in many cases a valid telephone number (the verification process is done by
sending you a text message with a code).
2. Creating an application: Once you create your account, you will have access to a
dashboard, which is very often called a developer console. It provides all the functionalities
to manage your developer account, create and delete applications, or monitor your quota. In
order to obtain access credentials you will have to create your first application via this
interface.
3. Obtaining access tokens: Then, you generate access tokens for your application and
save them in a safe place. They will be used in your code to create an OAuth connection to
the API.
4. Authorizing HTTP requests (optional): Some APIs require HTTP request
authorization, which means that a request has to contain an additional authorization header
that provides the server with information about the identity of the application and permission
scope.
5. Setting up permission scopes (optional): Some APIs have the notion of multilevel
permissions. In that case when you generate your API key you need to specify the scope for
the key. Scope here refers to a set of allowed actions. Therefore, in cases where an
application attempts an action that is out of its scope, it will be refused. This is designed as
an additional security layer. Ideally, one should use multiple API keys, each with a restricted
scope, so that if one key is hijacked, the potential harm is limited by the restrictions
of its scope.
6. Connecting to the API using obtained access tokens: When all the preceding steps
are configured, you can make requests using your access tokens. Now, the only limitation is
the request quota, which depends on each platform.
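The authorization header mentioned in step 4 can be sketched as follows. This assumes an OAuth2-style bearer token; OAuth1 platforms sign requests differently, and the exact header format is defined by each platform's documentation. The URL, token value, and helper name are placeholders.

```python
import urllib.request

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"  # placeholder, never hard-code real tokens
API_URL = "https://api.example.com/v1/me"  # hypothetical endpoint

def build_authorized_request(url, token):
    # Attach the token in the Authorization header (bearer-token style).
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", "Bearer " + token)
    return request

# The request is only built here; urllib.request.urlopen(request) would send it.
request = build_authorized_request(API_URL, ACCESS_TOKEN)
print(request.get_header("Authorization"))
```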
If you are using the OAuth protocol, you import the related library:
Then, you have to create your authenticated connection using access tokens and application
keys that you will find in the developer console:
POST requests:
Also, a whole range of additional requests:
In order to parse the outputs, you can use different attributes and methods of the response, such as:
• r.text: This gets a string with the request output
• r.json(): This gets the request output parsed as JSON
• r.encoding: This gives the encoding of the output
Similarly, we will use an endpoint URL for the Streaming API that returns a random sample
stream of statuses:
Firstly, we encode our query. We have chosen to search for three car brands: BMW,
Mercedes, and Audi:
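A minimal sketch of this encoding step, assuming the brands are combined with the OR operator and URL-encoded with the standard library's `quote_plus` (the variable names are illustrative):

```python
from urllib.parse import quote_plus

# Combine the three brands into one search expression, then URL-encode it.
query = "BMW OR Mercedes OR Audi"
encoded_query = quote_plus(query)
print(encoded_query)  # BMW+OR+Mercedes+OR+Audi
```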
Then we execute a search request using our query and OAuth client:
The request returned a list of tweets with all the meta information. We will convert it to JSON
and print the content of each tweet we find under the text field.
Similarly, we make a request to the Streaming API to get all recent tweets:
We keep iterating through all the lines that are being returned.
If a line is not empty, we decode it as UTF-8 to avoid encoding issues and
then we print the text field from the JSON.
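The iteration logic can be sketched without a network connection by simulating the streamed lines as bytes. The sample statuses and the helper name `extract_texts` are hypothetical; with a real streaming response the lines would come from the HTTP connection instead of a list.

```python
import json

def extract_texts(lines):
    # Walk through streamed lines and collect the "text" field of each status.
    texts = []
    for line in lines:
        if not line:  # skip empty lines, which a stream may send between statuses
            continue
        status = json.loads(line.decode("utf-8"))
        if "text" in status:
            texts.append(status["text"])
    return texts

# Simulated stream content (hypothetical statuses, no network access needed).
sample_lines = [
    b'{"text": "New BMW spotted today"}',
    b"",
    '{"text": "Caf\u00e9 stop in my Audi"}'.encode("utf-8"),
]
print(extract_texts(sample_lines))
```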
There are small differences between versions, mostly in available endpoints, resources,
and parameters. We use the basic functionalities of this API so switching between
versions should not cause any problems in terms of endpoints and resources, but we
have to check the documentation to pass the right arguments.
Q) Explain the API processes of GitHub
GitHub is one of the most important platforms for computer programmers and hobbyists. Its main
goal is to host source code repositories and empower open source communities to work together
on new technologies. The platform contains lots of valuable information about what is happening
in the community of technology enthusiasts, what the trends are, what programming languages
have started to emerge, and much more. We will use the data from GitHub to predict the trending
technologies of the future.
Selecting the endpoint
The queries in our project will be mostly based on searches across different repositories.
In order to obtain results based on our criteria we will use the following endpoint:
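The endpoint itself is not reproduced in these notes. As a hedged sketch, assuming the repository search endpoint of the GitHub REST API and its `q`, `sort`, and `order` parameters, a search URL could be built like this (the helper name and the sample query are illustrative):

```python
from urllib.parse import urlencode

# Assumed endpoint: repository search in the GitHub REST API.
SEARCH_URL = "https://api.github.com/search/repositories"

def build_search_url(query, sort="stars", order="desc"):
    # Encode the search criteria as query-string parameters.
    params = {"q": query, "sort": sort, "order": order}
    return SEARCH_URL + "?" + urlencode(params)

url = build_search_url("language:python")
print(url)
```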
For Pinterest, there are multiple endpoints that will be useful for network analysis. There are
three main objects that we can get with the Pinterest API:
• User
• Board
• Pins
Q) What are the basic cleaning processes?
Social media contains different types of data: information about user profiles, statistics
(number of likes or number of followers), verbatims, and media. Quantitative data is very
convenient for an analysis using statistical and numerical methods, but unstructured data
such as user comments is much more challenging. To get meaningful information, one
has to perform the whole process of information retrieval. It starts with the definition of
the data type and data structure. On social media, unstructured data is related to text,
images, videos, and sound and we will mostly deal with textual data. Then, the data has
to be cleaned and normalized. Only after all these steps can we delve into the analysis.
Data type and encoding
Comments and conversations are textual data that we retrieve as strings. In brief, a string
is a sequence of characters represented by code points. Every string in Python is a
sequence of Unicode code points, which range from 0 through 0x10FFFF (1,114,111 decimal).
In memory, the sequence then has to be represented as a sequence of bytes (values from 0
to 255). The rules for translating a Unicode string into a sequence of bytes are called
encoding.
Encoding plays a very important role in natural language processing, because people
use more and more characters such as emojis or emoticons, which replace whole words
and express emotions. Moreover, many languages have accented characters that go beyond
the regular English alphabet. In order to deal with all the processing problems that might
be caused by these we have to use the right encoding, because comparing two strings
with different encodings is actually like comparing apples and oranges. The most
common one is UTF-8, used by default in Python 3, which can handle any type of
character. As a rule of thumb, always normalize your data to Unicode UTF-8.
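A short sketch of why the encoding matters: the same string produces different bytes under UTF-8 and Latin-1, so the raw bytes can only be compared after decoding each with the codec it was written in. The sample word and emoji are illustrative.

```python
word = "caf\u00e9"      # the word "café", with é as a single code point
emoji = "\U0001F600"    # grinning face emoji

# Encoding turns code points into bytes; the result depends on the encoding.
utf8_bytes = word.encode("utf-8")
latin1_bytes = word.encode("latin-1")
print(utf8_bytes)                  # b'caf\xc3\xa9'
print(latin1_bytes)                # b'caf\xe9'
print(utf8_bytes == latin1_bytes)  # False

# Decoding each with its own codec recovers the same Unicode string.
print(utf8_bytes.decode("utf-8") == latin1_bytes.decode("latin-1"))  # True

# UTF-8 can represent any character; an emoji takes four bytes.
print(len(emoji), len(emoji.encode("utf-8")))  # 1 4
```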
Structure of data
A better solution is to store the data in a tabular format in a pandas DataFrame, which has
multiple advantages for further processing. First of all, rows are indexed, so search
operations become much faster. There are also many optimized methods for different
kinds of processing and, above all, it allows you to optimize your own processing by using
functional programming. Moreover, a row can contain multiple fields with metadata about
verbatims, which are very often used in our analysis. It is worth remembering that a
dataset in pandas must fit into RAM. For bigger datasets we suggest the use of
SFrames.
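As a minimal sketch, assuming pandas is installed, comments and their metadata can be stored and processed like this (the column names and sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical comments with metadata stored alongside the text.
comments = pd.DataFrame(
    {
        "user": ["anna", "ben", "carl"],
        "text": ["Great video", "Not my favourite brand", "Love it"],
        "likes": [12, 3, 25],
    }
)

# Apply a function to a whole column instead of looping over rows.
comments["n_words"] = comments["text"].apply(lambda t: len(t.split()))

# Select rows by a condition on a metadata field.
popular = comments[comments["likes"] > 10]
print(popular[["user", "n_words"]])
```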