0% found this document useful (0 votes)

26 views18 pages

DS Retest

Uploaded by

manekandan8214

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

26 views18 pages

DS Retest

Uploaded by

manekandan8214

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 18

2 MARK

What is data extraction?

Data extraction is the process of collecting or retrieving disparate types of data from a variety of
sources, many of which may be poorly organized or completely unstructured.

What is Mongo DB?

MongoDB is a source-available cross-platform document-oriented database program. Classified as a
NoSQL database program, MongoDB uses JSON-like documents with optional schemas

What is the Facebook API?

The Facebook Graph API is an HTTP-based API that allows developers to extract data and functionality
from the Facebook platform. Applications can use this API to programmatically query data, post in
pages and groups, and manage ads, among other tasks.

What Is Keyword Analysis?

Keyword analysis is the process of analyzing the keywords or search phrases that bring visitors to your
website through organic and paid search. As such, keyword analysis is the starting point and cornerstone
of search marketing campaigns.

What is Feature Extraction?

Feature extraction refers to the process of transforming raw data into numerical features that can be
processed while preserving the information in the original data set. It yields better results than applying
machine learning directly to the raw data.

jv
5 MARK

Process of data cleaning:

Step 1: Remove duplicate or irrelevant observations
Remove unwanted observations from your dataset, including duplicate observations or
irrelevant observations. Duplicate observations will happen most often during data collection.
When you combine data sets from multiple places, scrape data, or receive data from clients or
multiple departments, there are opportunities to create duplicate data. De-duplication is one of
the largest areas to be considered in this process. Irrelevant observations are when you notice
observations that do not fit into the specific problem you are trying to analyze. For example, if
you want to analyze data regarding millennial customers, but your dataset includes older
generations, you might remove those irrelevant observations. This can make analysis more
efficient and minimize distraction from your primary target—as well as creating a more
manageable and more performant dataset.

Step 2: Fix structural errors

Structural errors are when you measure or transfer data and notice strange naming
conventions, typos, or incorrect capitalization. These inconsistencies can cause mislabeled
categories or classes. For example, you may find “N/A” and “Not Applicable” both appear, but
they should be analyzed as the same category.

Step 3: Filter unwanted outliers

Often, there will be one-off observations where, at a glance, they do not appear to fit within the
data you are analyzing. If you have a legitimate reason to remove an outlier, like improper data-
entry, doing so will help the performance of the data you are working with. However,
sometimes it is the appearance of an outlier that will prove a theory you are working on.
Remember: just because an outlier exists, doesn’t mean it is incorrect. This step is needed to
determine the validity of that number. If an outlier proves to be irrelevant for analysis or is a
mistake, consider removing it.

Step 4: Handle missing data

You can’t ignore missing data because many algorithms will not accept missing values. There
are a couple of ways to deal with missing data. Neither is optimal, but both can be considered.

1. As a first option, you can drop observations that have missing values, but doing this will drop or
lose information, so be mindful of this before you remove it.

2. As a second option, you can input missing values based on other observations; again, there is an
opportunity to lose integrity of the data because you may be operating from assumptions and
not actual observations.

3. As a third option, you might alter the way the data is used to effectively navigate null values.
jv
Step 5: Validate and QA
At the end of the data cleaning process, you should be able to answer these questions as a part
of basic validation:

• Does the data make sense?

• Does the data follow the appropriate rules for its field?

• Does it prove or disprove your working theory, or bring any insight to light?

• Can you find trends in the data to help you form your next theory?

• If not, is that because of a data quality issue?

False conclusions because of incorrect or “dirty” data can inform poor business strategy and
decision-making. False conclusions can lead to an embarrassing moment in a reporting meeting
when you realize your data doesn’t stand up to scrutiny. Before you get there, it is important to
create a culture of quality data in your organization. To do this, you should document the tools
you might use to create this culture and what data quality means to you.

2)Store and access social data

Setting up the environment

MongoDB requires a data directory to store all the data. The directory can be created in your working
directory:

md datadb

Starting MongoDB
We need to go to the folder where mongod.exe is stored and and run the following command:

cmd binmongod.exe
Once the MongoDB server is running in the background, we can switch to our Python environment to
connect and start working.

MongoDB using Python

MongoDB can be used directly from the shell command or through programming languages. For the
sake of our book we’ll explain how it works using Python. MongoDB is accessed using Python through a
driver module named PyMongo.

jv
We will not go into the detailed usage of MongoDB, which is beyond the scope of this book. We will see
the most common functionalities required for analysis projects. We highly recommend reading the official
MongoDB documentation.

PyMongo can be installed using the following command:

pip install pymongo

Then the following command imports it in the Python script

from pymongo import MongoClient

client = MongoClient('localhost:27017')
The database structure of MongoDB is similar to SQL languages, where you have databases, and inside
databases you have tables. In MongoDB you have databases, and inside them you have collections.
Collections are where you store the data, and databases store multiple collections. As MongoDB is a
NoSQL database, your tables do not need to have a predefined structure, you can add documents of any
composition as long as they are a JSON object. But by convention is it best practice to have a common
general structure for documents in the same collections.

To access a database named scrapper we simply have to do the following:

db_scrapper = db.scrapper
To access a collection named articles in the database scrapper we do this:

db_scrapper = db.scrapper

collection_articles = db_scrapper.articles
Once you have the client object initiated you can access all the databases and the collections very easily.

Now, we will see how to perform different operations:

• Insert: To insert a document into a collection we build a list of new documents to insert into
the database:
docs = []

for _ in range(0, 10):

# each document must be of the python type dict

docs.append({

"author": "...",

"content": "...",

"comment": ["...", ... ]

})
Inserting all the docs at once:

jv
db.collection.insert_many(docs)
Or you can insert them one by one:

for doc in docs:

db.collection.insert_one(doc)

• Find: To fetch all documents within a collection:

# as the find function returns a cursor we will iterate over the cursor to actually fetch

# the data from the database

docs = [d for d in db.collection.find()]

To fetch all documents in batches of 100 documents:

batch_size = 100

Iteration = 0

count = db.collection.count() # getting the total number

of documents in the collection

while iteration * batch_size < count:

docs = [d for d in db.collection.find().skip(batch_size *

iteration).limit(batch_size)]

Iteration += 1
To fetch documents using search queries, where the author is Jean Francois:

query = {'author': 'Jean Francois'}

docs = [d for d in db.collection.find(query)

Where the author field exists and is not null:

query = {'author': {'$exists': True, '$ne': None}}

docs = [d for d in db.collection.find(query)]

There are many other different filtering methods that provide a wide variety of flexibility and precision;
we highly recommend taking your time going through the different search operators.

• Update: To update a document where the author is Jean Francois and set the attribute
published as True:
query_search = {'author': 'Jean Francois'}

query_update = {'$set': {'published': True}}

db.collection.update_many(query_search, query_update)
jv
Or you can update just the first matching document:

db.collection.update_one(query_search, query_update)

• Remove: Remove all documents where the author is Jean Francois:

query_search = {'author': 'Jean Francois'}

db.collection.delete_many(query_search, query_update)

Or remove the first matching document:

db.collection.delete_one(query_search, query_update)

• Drop: You can drop collections by the following:

db.collection.drop()

Or you can drop the whole database:

db.dropDatabase()
We saw how to store and access data from MongoDB. MongoDB has gained a lot of popularity and is
the preferred database choice for many, especially when it comes to working with social media data.

Detail about Mongo db using python:

Create a database in MongoDB using Python
MongoDB is a general-purpose, document-based, distributed database built for modern
application developers and the cloud. It is a document database, which means it stores
data in JSON-like documents. This is an efficient way to think about data and is more
expressive and powerful than the traditional table model. MongoDB has no separate
command to create a database. Instead, it uses the use command to create a database.
The use command is used to switch to the specific database. If the database name
specified after the use keyword does not exist, then a new database is created with the
specified name.
Creating a database using Python in MongoDB
To use Python in MongoDB, we are going to import PyMongo. From that, MongoClient
can be imported which is used to create a client to the database. Using the client, a new
database can be created. Example: List of databases using MongoDB shell

(before):

• Python3

jv
# import MongoClient

from pymongo import MongoClient

# Creating a client

client = MongoClient('localhost', 27017)

# Creating a database name GFG

db = client['GFG']

print("Database is created !!")

Output:
Database is created!!

In the above example, it is clearly shown how a database is created. When creating a
client, the local host along with its port number, which is 27017 here, is passed to the
MongoClient. Then, by using the client, a new database named ‘GFG’ is created. We can
check if the database is present in the list of databases using the following code:

• Python3

list_of_db = client.list_database_names()

if "mydbase" in list_of_db:

print("Exists !!")

Output:
Exists!!

List of Databases in MongoDB shell (after) :

4)Elaborate about Maximum shares and Maximum Likes in Fb

jv
If you have a Facebook Share or a Like button on your site, as we do on this page, you will see a
number alongside it which is “the sum of:

• The number of likes of your URL

• The number of shares of your URL (this includes copy/pasting a link back to Facebook)
• The number of likes and comments on stories on Facebook about your URL”

Thus the Facebook number you will most commonly see is the total for all interactions including
likes, shares and comments. However, are total Facebook interactions a good indication of
content resonance?

Many people have argued that Facebook shares are more powerful than ‘likes’ and you should
really just look at Facebook shares to see what content is resonating with people. From our
research things are not quite so simple and there are benefits in looking at total Facebook
interactions.

Here’s what we will cover as we explore these ideas:

• ‘Likes’ are the quick and easy way to share

• Do ‘likes’ tell us anything about users and content resonance?
• The argument for valuing Facebook Shares over ‘Likes’
• Do shares correlate higher with Google rankings?
• Summary

‘Likes’ are the quick and easy way to share

According to Facebook “the Like button is the quickest way for people to share content.”

The FB developer pages go on to say “A single click on the Like button will ‘like’ pieces of
content on the web and share them on Facebook. You can also display a Share button next to
the Like button to let people add a personal message and customize who they share with.”
See https://developers.facebook.com/docs/plugins/like-button

Facebook also says to users that “a ‘Like’ is a way to give positive feedback or to connect with
things you care about on Facebook. You can like content that your friends post to give them
feedback or like a Page that you want to connect with on Facebook.”

As an example of a ‘like’ being a quick way to share content, my partner liked a post by Andy
Murray tonight and it immediately appeared at the top of my News Feed along with the image,
as you can see below.

jv
Do ‘likes’ tell us anything about users and content resonance?
Some people argue that the like’ is the “lazy” option and means very little. It is quick and easy
to ‘like’ something, unlike commenting or sharing where users need to spend some time writing
about the post they are sharing. This is definitely true, shares are harder to earn. However, for
many busy users this is the advantage of the ‘like’ button and it means they can share more
content than they would otherwise.

A Marketo and Brian Carter study found that people were eight times as likely to ‘like’ as share
or comment.

The degree to which people ‘like’ content was revealed in a Pew Research Center survey. The
survey asked thousands of Americans about their social media sharing and found that 44% of

jv
Facebook users “liked” content posted by their friends at least once a day, with 29% did so
several times per day.

A further separate study published by the National Academy of Sciences found that by
analyzing the “likes” of 86,000 volunteers they could predict the characteristics of the person
with incredible accuracy. They could predict whether someone was:

• white or African American with 95% accuracy

• male or female with 93% accuracy
• a gay male with 88% accuracy

They found they could also predict a person’s political leaning, Democrat or Republican, with
85% accuracy. This would appear to indicate quite strongly that people ‘like’ content that
resonates with them.

Thus maybe, and I accept this goes against much conventional wisdom, the frequency and
nature of ‘likes’ is a more important indicator of content resonance than is often thought.
However, let’s look at the argument for shares and why these may be a better indicator.
The argument for valuing Facebook Shares over 'Likes'
Marketers generally place a higher value on Facebook sharing than on ‘likes’ according to
this AdWeek article. The argument for the greater value of shares runs along the lines that
sharing involves greater commitment and is more likely to mean content is shown in News
Feeds.

The first argument about commitment is certainly true, as whilst a ‘like’ is frictionless, to share
can require more effort, although there is now a ‘one click’ option in Facebook to simply share
with Friends without commenting, see image below.

However, to share wider you get a dialogue box which allows you to comment along with your
share and to control the image that gets displayed with the share. You can also control where
you share and who sees your share.

jv
The second argument for shares is the potential for shared content to be seen by far more
people. A share will for example show more clearly on your own profile page than a ‘like’ which
comes under recent activity. However, it is unlikely many people will go to your personal profile
page. What most people will see is their News Feed. It is claimed that the Edgerank algorithm,
which determines what shows up in the News Feed, gives far more weight to ‘shares’ than ‘likes’
leading to more visibility for content shared than ‘liked’. However, whilst this may be the case,
the appearance of articles in News Feed is a complex matter.

Facebook say:

“The goal of News Feed is to show you the stories that matter most to you. To do this, we use
ranking to order stories based on how interesting we believe they are to you: specifically, whom
you tend to interact with, and what kinds of content you tend to like and comment on.”
Thus the Edgerank algorithm looks at many factors. Facebook is also constantly updating the
algorithm and the control a user can exercise over what appears in a News Feed. For example,
the following are two updates:

June 2015 – Time spent viewing stories is added as a factor

Facebook commented:

“We’ve discovered that if people spend significantly more time on a particular story in News
Feed than the majority of other stories they look at, this is a good sign that content was relevant
to them.”

July 2015 – Introduction of more user controls over news feeds

Facebook commented:

“To help prioritize stories, and make sure you don’t miss posts from particular friends and Pages,
you can now select which friends and Pages you would like to see at the top of your News Feed.”
Facebook’s aim is to ensure what appears in a News Feed, other than sponsored paid content, is
relevant to the user. This clearly takes into account many factors as we can see from above,
including an individual’s privacy settings. Shares and likes are just one part of a more complex
picture about what appears and where in News Feeds. As organic reach declines there is a view
that to make sure your content is seen you really have to pay to play.
jv
Shares may also be more valuable as they can be driven by different reasons to ‘likes’.
The Marketo and Brian Carter study found that people share content to share tips and advice,
to warn people, to pass on deals, to show they are part of a community and to entertain their
friends with amusing or inspirational posts. ‘Likes’ may be used more, as Facebook indicate, to
give positive feedback to people and things you care about.

I think there is another potential argument in support of shares over ‘likes’ namely that shares
are a little less easy to automate than ‘likes’. Whilst Facebook is constantly rooting out false
accounts, ‘likes’ may be slightly more at risk to exaggeration than shares through automation
and non-human activity.
Do shares correlate higher with Google rankings?
Searchmetrics produce a respected report each year looking at the factors that distinguish well-
placed sites from those with lower positions in the Google search results. We took a look at
the 2014 report and its correlations for Facebook activities.

The report found, not surprisingly, that relevant, quality content ranks better on average, and is
“identifiable by properties such as a higher word-count and semantically comprehensive
wording.” However, the report also looked in detail at the correlations of various factors with
rankings.

The top ranking correlations from their latest survey are shown below. Please remember, as the
report’s authors are at pains to point out, that correlation does not mean causation.

I have highlighted the Facebook correlations which are of interest to us here. We can see that
Facebook shares have a slightly higher correlation than comments or ‘likes’ to Google
rankings. However, we can also see that the correlation of total Facebook interactions; namely
shares, likes and comments; is the same as that for Facebook shares. This does not mean that
higher Facebook activity causes higher Google rankings but it does show that total Facebook
interaction activity has the same correlation as Facebook shares.
jv
10 MARK
Different Tools of Google Analytics

Below is the list of Google analytics tool which helps you to improve the quality of data and
increase the traffic of the website:

1.Google Tag Assistant: Google Tag Assistant is a debugging tool. It is a chrome extension
that is used to check whether google analytics, award conversion tracking, google tag
manager, and others are workings properly or not. With the Google Tag Assistant tool, you
can troubleshoot any google analytics issues quickly and fix them immediately. The tag
assistant is easy to use. It gives all the tags present on the web page which you are visiting.
This tool gives automatic suggestions to solve problems. Google Tag Assistant helps you to
find invalid events, missing tags, filters, etc.

2.RegExr: Google analytics supports 13 regular expressions. Regular expressions can be

used to apply report filters, set segments and admin filters, and define ng funnel steps.
13 google expression supported by google analytics are pipe (|), dot(.), an asterisk (*), dot
asterisk(.*), backslash(\) , caret(^), dollar sign ($) , question mark (?), parenthesis (),square
brackets ([ ]) dashes(-), plus sign (+), and curly brackets ({}).

Google Analytics URL Builder: Google Analytics URL Builder lets you create a campaign URL
based on the current URL and then use automatic reporting tools that automatically track the
URL’s progress. Campaign URL can be used to track which promotions get traffic to the site.

4.Real-Time Reporting: The real-time reporting tool allows an organization to monitor

website activity in real-time, provides information about how users reacting to your
webpage, how much time they spend on each page, how many web pages they visit through
a link, the physical location of the user, etc. With the help of this tool organization also
analyze the users’ reaction to an email campaign.

5.Google Analytics Table Booster: Google Analytics Table Booster is a chrome extension used
to enhance google analytics’s data grid. To enhance the data grid, it provides three types of
visualization. Every row can use different types of visualization, or they can combine thre e
types. It is the best way to evaluate the performance of the data.

jv
6.GA Debugger: GA debugger is a google analytics debugger. It is a chrome extension that is
used for debugging the google analytics tracking code. It allows users to debug their website
and allows them to see how other websites implemented google analytics tracking. It is easy
to use. After adding GA debugger on chrome, to have to turn it on; after that, for an opening
console, you have to press ctrl+ Shift + i. It automatically starts debugging.

7.Google Tag Manager injector: Google Tag manager injector is an open-source Chrome
extension that is used to inject google tag manager container tags into web pages. The
advantage of using this tool is it doesn’t require any JavaScript code to be added to preview
the GTM, i.e. Google Tag Manager containers.

8.WASP2: WASP stands for Web Analytics Solution Profiler. It is used for debugging the
google analytics tracking issue. It also allows users to debug other tracking issues. With the
help of WASP.crawler, you can get detailed information about tags present on the website.
WASP can audit any content or tag on the website.

9.Mobile traffic behaviour: Mobile Internet search nearly doubled between 2012 and 2013 to
maximize your possibilities by developing a mobile website and measuring its traffic conduct.
This metric shows how the mobile market affects your association’s website traffic and will be
an indicator of your site’s customer experience.

10.Audience Location: Audience location allows you to know the audience’s physical location,
improving marketing and the business. It helps you to determine the User’s area of interest
and make high revenue. Audience location helps you analyze whether you are reaching the
right audience or not, the website’s traffic, set the best strategy for marketing, etc.

11.Events: This is the most important tool which records user activity on the webpage in real-
time. It records the user’s information, such as how the user scrolls the web page, navigates
other web pages of the site, how much time he spent, and many more. These activities help
you analyze the website speed, behaviour, and response time and improve the website by
solving problems.

12.Supermetrics Add-on: Google Analytics tools widely use Supermetrics. It provides different
products. However, Supermetrics is mostly used for Google Sheets. It allows gathering all
data from different tools into Google Sheet. After that, the user can create his own
dashboard or connect the google sheet with other applications. It has a Free and paid
version.

jv
2)Analyse about Facebook API interaction on DS
Facebook Graph API?
Facebook Graph API is the primary way to interact with Facebook programmatically, as already
stated earlier. With the Graph API, apps can read & write data to Facebook. All the Facebook
SDKs use the Graph API too. I will talk about the SDKs further down the article & how it is
different from the Graph API. For now, let’s focus on the API.

You must have come across Login Via Facebook feature on third party websites or apps. That is
made possible by the Graph API. The API uses OAuth token to let users’ login in into apps via
their Facebook credentials. Though it doesn’t share the credentials with the third-party portals.
The Facebook API verifies your credentials & generates a unique authentication token & passes
it to the third-party app, verifying the login process.

Graph API is an HTTP based API via which apps can post on users walls, upload photos, share
events & stuff.

Basics of Facebook Graph API

As we can figure out from the name. The API is a social graph. Just like any other graph, it has
nodes & edges. There is a third element known as the field. What are they? I’ll explain.

Nodes – Nodes represent individual users. Technically they are individual objects with a unique
id which are linked with each other via edges. Other nodes in the graph are our friend’s nodes.

Edges – An edge represents a link between one node & the another.

Field – Field represents the data about an object like his name or the pages he likes & stuff.

How Do I Access the Data from Facebook Via Graph API?

We will have a look at how we can mine information

like Username, Email, Location, Website etc. from a user account
automatically using a python script.

First let's have a look at the code:

#!usr/bin/env python

# Program to mine data from your own facebook account

import json

import facebook

jv
def main():

token = "{Your Token}"

graph = facebook.GraphAPI(token)

#fields = ['first_name', 'location{location}','email','link']

profile = graph.get_object('me',fields='first_name,location,link,email')

#return desired fields

print(json.dumps(profile, indent=4))

if __name__ == '__main__':

main()

Here inside the main() method, we are trying to get the information about our own Facebook account. Let's try to understand
the code line-by-line:

• token: It is the access token required to access the page.

• In the next line we created a GraphAPI object to access API methods.

• Now, we have extracted the desired fields in a variable profile. Here, notice that 'me' in get_object() method indicates that
we are doing it for our own account.

• first_name: returns the first name of user.

• location: The person's current location as entered by them on their profile. This field is not related to check-ins.

• email: The person's primary email address listed on their profile. This field will not be returned if no valid email address
is available.

• link: A link to the person's timeline.

Besides these fields, there are numerous fields. For full list and description of fields refer the Facebook Graph API official
documentation, here.

Output:

jv
What is the Life Cycle of an Analytics Project?(develop with stories 😂)

To complete a data science/analytics project, you may have to go through

five major phases starting from understanding the problem and designing
the project, to collecting data, running analysis, presenting the results and
doing documentations and self reflection.

I have a strong feeling that running an analytics project is pretty similar to building a house. First, the

architect meets his/her client, understands their needs and comes up with an actionable blueprint

(Understand and Plan).

Then it requires collecting building blocks such as cements, steels, bricks… etc. You have to learn the

features of your building materials and choose the right materials for construction. Otherwise, you may

end up having a house that can be collapsed easily. This is like a data collection process where you have

to do some EDA or feature engineering to understand data and find the right data to solve your analytics

problems, or else you may not manage to get solid or concrete results from your analysis!

jv
With the building materials and blueprint handy, you can start building a house (Run Analysis). After

construction is finished, home Inspection and quality checks are required to ensure safety. Similarly, we

need to document our analytics project regarding the methodologies, conclusions and limitations.

Understand & Plan

If I’m asked the most critical phase of the whole cycle, I would say Understanding and Planning without

any hesitation because the main purpose of data science and analysis is not to create a project with fancy

technology, but to solve real problems. Therefore, the success of an analytics project is highly dependent

on how well you understand the situation, define the problem and translate the business questions to an

analytics question. From that standpoint, it’s always worth spending time thinking about the broader

context of your analytics project.

Analytics Plans

Before diving into the analysis, let’s come up with an analytics plan and set up another follow up meeting

to recap questions and reinforce expectations.

It will provide a high-level overview of the plan, giving a clear picture of the next steps and draw the link

between technical actions and the bigger picture from the business side. Here are some key elements in

my Analytics Plans:

Tailwind Css Starter Kit
100% (1)
Tailwind Css Starter Kit
47 pages
Data Science Practical Explanation
No ratings yet
Data Science Practical Explanation
26 pages
Data Science My Notes
No ratings yet
Data Science My Notes
61 pages
01 - 16 1 Geocoding - en
No ratings yet
01 - 16 1 Geocoding - en
3 pages
DSA Practical Workbook - LAb Manuals 18cs
No ratings yet
DSA Practical Workbook - LAb Manuals 18cs
141 pages
1.2.1. Retrieving Data - 1.2.2. Cleaning Data
No ratings yet
1.2.1. Retrieving Data - 1.2.2. Cleaning Data
35 pages
Unit-2 DS
No ratings yet
Unit-2 DS
10 pages
Module 2 - Final
No ratings yet
Module 2 - Final
58 pages
The Data Science Process
No ratings yet
The Data Science Process
33 pages
Python
100% (1)
Python
635 pages
UNIT 1 MongoDB Fully Complete
100% (1)
UNIT 1 MongoDB Fully Complete
60 pages
Updated Mongodb Lab Manual IV Sem
No ratings yet
Updated Mongodb Lab Manual IV Sem
48 pages
Sma Exp 3
No ratings yet
Sma Exp 3
7 pages
BDTT Lab 2023 24 Week9
No ratings yet
BDTT Lab 2023 24 Week9
26 pages
Datascience
No ratings yet
Datascience
26 pages
Lecture 3 (DS) - Steps in Data Science Process
No ratings yet
Lecture 3 (DS) - Steps in Data Science Process
57 pages
Python Record Manual
No ratings yet
Python Record Manual
18 pages
Data Mining Basics
No ratings yet
Data Mining Basics
52 pages
Big Data Visualizer Course Notes
No ratings yet
Big Data Visualizer Course Notes
20 pages
What Is Big Data Analytics
No ratings yet
What Is Big Data Analytics
3 pages
Data Mining Basics
No ratings yet
Data Mining Basics
38 pages
DS Unit 2
No ratings yet
DS Unit 2
23 pages
Machine Learning Presentation Bushra Kambo Roll No 6
No ratings yet
Machine Learning Presentation Bushra Kambo Roll No 6
18 pages
Data Analytics For IOT
No ratings yet
Data Analytics For IOT
57 pages
Dsmlusingpython
No ratings yet
Dsmlusingpython
10 pages
Data Wrangling
No ratings yet
Data Wrangling
4 pages
Guided By: Prof-P.K. Deshpande Submitted By: Kiran Zawar USN:2BA12IS020
No ratings yet
Guided By: Prof-P.K. Deshpande Submitted By: Kiran Zawar USN:2BA12IS020
27 pages
Data-Engineering Compressed
No ratings yet
Data-Engineering Compressed
20 pages
Lecture 2
No ratings yet
Lecture 2
14 pages
UNIT I - Introduction - DataScience - New
No ratings yet
UNIT I - Introduction - DataScience - New
34 pages
Ds Notes
No ratings yet
Ds Notes
88 pages
DCPP Notes
No ratings yet
DCPP Notes
6 pages
Unit-2 Bda
No ratings yet
Unit-2 Bda
11 pages
Data Analtycs Professional-1
No ratings yet
Data Analtycs Professional-1
15 pages
Big Data
No ratings yet
Big Data
11 pages
Data Scince Report
No ratings yet
Data Scince Report
11 pages
Big Data Practical 3
No ratings yet
Big Data Practical 3
4 pages
Unit - Iii - Eda
No ratings yet
Unit - Iii - Eda
25 pages
Data Analytics Curriculum
No ratings yet
Data Analytics Curriculum
8 pages
Unit 2 Data Gathering
No ratings yet
Unit 2 Data Gathering
14 pages
Lec 1
No ratings yet
Lec 1
9 pages
MongoDB Data Modeling - Sample Chapter
No ratings yet
MongoDB Data Modeling - Sample Chapter
40 pages
Pcap 31 03
No ratings yet
Pcap 31 03
6 pages
Resume Building Tips by Prafful
No ratings yet
Resume Building Tips by Prafful
7 pages
TCS NQT Coding Questions, Videos & Solutions Total 134 Sessions
No ratings yet
TCS NQT Coding Questions, Videos & Solutions Total 134 Sessions
65 pages
Data Analytics Using Python
No ratings yet
Data Analytics Using Python
10 pages
Unit 4
No ratings yet
Unit 4
60 pages
Digital Anatomy Applications of Virtual Mixed and Augmented Reality - Jean François Uhl Joaquim Jorge Daniel Simões Lopes Pedro F. Campos
No ratings yet
Digital Anatomy Applications of Virtual Mixed and Augmented Reality - Jean François Uhl Joaquim Jorge Daniel Simões Lopes Pedro F. Campos
388 pages
Udacity Dandsyllabus
No ratings yet
Udacity Dandsyllabus
7 pages
PPPoE JUNOS
No ratings yet
PPPoE JUNOS
14 pages
Book
No ratings yet
Book
475 pages
PLCopen - Creating PLCopen Compliant Function Block Libraries
No ratings yet
PLCopen - Creating PLCopen Compliant Function Block Libraries
4 pages
Spotify - Face The Music (Update 2016)
No ratings yet
Spotify - Face The Music (Update 2016)
23 pages
Project Report - 32527
No ratings yet
Project Report - 32527
51 pages
XCXVXCVCX
No ratings yet
XCXVXCVCX
12 pages
S - Slave: High Performance Interchangeable Slave Interfaces Supporting All Major Industrial Networks
No ratings yet
S - Slave: High Performance Interchangeable Slave Interfaces Supporting All Major Industrial Networks
3 pages
Relational Database Management System
No ratings yet
Relational Database Management System
6 pages
QP - Digital Documentation Worksheet Grade 10
No ratings yet
QP - Digital Documentation Worksheet Grade 10
5 pages
Export Data Using R Studio
No ratings yet
Export Data Using R Studio
9 pages
c23 Embedded
No ratings yet
c23 Embedded
18 pages
AtellicaSolution Sample Handler SpecSheet FINAL 1800000004691539
No ratings yet
AtellicaSolution Sample Handler SpecSheet FINAL 1800000004691539
3 pages
Lecture 19-RDD in Spark
No ratings yet
Lecture 19-RDD in Spark
12 pages
Ceel 311 - Prelim
No ratings yet
Ceel 311 - Prelim
57 pages
DeepL Translate - The World's Most Accurate Translator
No ratings yet
DeepL Translate - The World's Most Accurate Translator
6 pages
AMP Manual Aditya
No ratings yet
AMP Manual Aditya
61 pages
OS Functions Definition of Systems and Its Information
No ratings yet
OS Functions Definition of Systems and Its Information
14 pages
LG 15
No ratings yet
LG 15
20 pages
How To Download Free VidMate
No ratings yet
How To Download Free VidMate
8 pages
Dokumen - Tips Building Blocks For Iot Devices
No ratings yet
Dokumen - Tips Building Blocks For Iot Devices
18 pages
Sayan Majumder Bcac501
No ratings yet
Sayan Majumder Bcac501
9 pages
Croon B2B Case Study - Group 1
No ratings yet
Croon B2B Case Study - Group 1
9 pages
Folha de Trabalho Padronizado
No ratings yet
Folha de Trabalho Padronizado
3 pages
Syllabus of Python Course in Bangalore: Module 1: An Introduction To Python
No ratings yet
Syllabus of Python Course in Bangalore: Module 1: An Introduction To Python
7 pages
Technical Project
No ratings yet
Technical Project
1 page
Study Guide MO-500 Certification Exam Microsoft Access Expert ( Office 2019)
From Everand
Study Guide MO-500 Certification Exam Microsoft Access Expert ( Office 2019)
Anand Vemula
No ratings yet
Access 2016: Up To Speed
From Everand
Access 2016: Up To Speed
R.M. Hyttinen
5/5 (2)
Oracle Warehouse Builder 11g: Getting Started
From Everand
Oracle Warehouse Builder 11g: Getting Started
Bob Griesemer
No ratings yet
Getting Started with SQL Server 2012 Cube Development
From Everand
Getting Started with SQL Server 2012 Cube Development
Simon Lidberg
No ratings yet
IBM Cognos Business Intelligence
From Everand
IBM Cognos Business Intelligence
Dustin Adkison
No ratings yet
PYTHON FOR DATA ANALYTICS: Mastering Python for Comprehensive Data Analysis and Insights (2023 Guide for Beginners)
From Everand
PYTHON FOR DATA ANALYTICS: Mastering Python for Comprehensive Data Analysis and Insights (2023 Guide for Beginners)
Waldo Todd
No ratings yet
PYTHON DATA SCIENCE: A Practical Guide to Mastering Python for Data Science and Artificial Intelligence (2023 Beginner Crash Course)
From Everand
PYTHON DATA SCIENCE: A Practical Guide to Mastering Python for Data Science and Artificial Intelligence (2023 Beginner Crash Course)
Calvert Long
No ratings yet
Visual Basic 2010 Coding Briefs Data Access
From Everand
Visual Basic 2010 Coding Briefs Data Access
Kevin Hough
5/5 (1)
(Excerpts From) Investigating Performance: Design and Outcomes With Xapi
From Everand
(Excerpts From) Investigating Performance: Design and Outcomes With Xapi
Janet Laane Effron
No ratings yet
C# 2010 Coding Briefs Data Access
From Everand
C# 2010 Coding Briefs Data Access
Kevin Hough
No ratings yet
Oracle Database Administration Interview Questions You'll Most Likely Be Asked: Job Interview Questions Series
From Everand
Oracle Database Administration Interview Questions You'll Most Likely Be Asked: Job Interview Questions Series
Vibrant Publishers
5/5 (1)
Knight's Microsoft Business Intelligence 24-Hour Trainer: Leveraging Microsoft SQL Server Integration, Analysis, and Reporting Services with Excel and SharePoint
From Everand
Knight's Microsoft Business Intelligence 24-Hour Trainer: Leveraging Microsoft SQL Server Integration, Analysis, and Reporting Services with Excel and SharePoint
Brian Knight
3/5 (1)
Be Data Curious!: Be Data Curious!, #1
From Everand
Be Data Curious!: Be Data Curious!, #1
Nick Jewell
No ratings yet
Oracle Quick Guides: Part 2 - Oracle Database Design
From Everand
Oracle Quick Guides: Part 2 - Oracle Database Design
Malcolm Coxall
No ratings yet
Learn Data Warehousing in 24 Hours
From Everand
Learn Data Warehousing in 24 Hours
Alex Nordeen
No ratings yet
Practical Data Strategies and Recipes
From Everand
Practical Data Strategies and Recipes
Tom Henricksen
No ratings yet

DS Retest

Uploaded by

DS Retest

Uploaded by

2 MARK

What is data extraction?

What is Mongo DB?

What is the Facebook API?

What Is Keyword Analysis?

What is Feature Extraction?

Process of data cleaning:

Step 2: Fix structural errors

Step 3: Filter unwanted outliers

Step 4: Handle missing data

• Does the data make sense?

• If not, is that because of a data quality issue?

2)Store and access social data

Setting up the environment

MongoDB using Python

PyMongo can be installed using the following command:

pip install pymongo

from pymongo import MongoClient

To access a database named scrapper we simply have to do the following:

Now, we will see how to perform different operations:

for _ in range(0, 10):

# each document must be of the python type dict

"comment": ["...", ... ]

for doc in docs:

• Find: To fetch all documents within a collection:

# the data from the database

docs = [d for d in db.collection.find()]

count = db.collection.count() # getting the total number

of documents in the collection

while iteration * batch_size < count:

docs = [d for d in db.collection.find().skip(batch_size *

query = {'author': 'Jean Francois'}

docs = [d for d in db.collection.find(query)

query = {'author': {'$exists': True, '$ne': None}}

docs = [d for d in db.collection.find(query)]

query_update = {'$set': {'published': True}}

• Remove: Remove all documents where the author is Jean Francois:

Or remove the first matching document:

• Drop: You can drop collections by the following:

Or you can drop the whole database:

Detail about Mongo db using python:

from pymongo import MongoClient

client = MongoClient('localhost', 27017)

# Creating a database name GFG

print("Database is created !!")

List of Databases in MongoDB shell (after) :

4)Elaborate about Maximum shares and Maximum Likes in Fb

• The number of likes of your URL

Here’s what we will cover as we explore these ideas:

• ‘Likes’ are the quick and easy way to share

‘Likes’ are the quick and easy way to share

• white or African American with 95% accuracy

June 2015 – Time spent viewing stories is added as a factor

July 2015 – Introduction of more user controls over news feeds

2.RegExr: Google analytics supports 13 regular expressions. Regular expressions can be

4.Real-Time Reporting: The real-time reporting tool allows an organization to monitor

Basics of Facebook Graph API

How Do I Access the Data from Facebook Via Graph API?

We will have a look at how we can mine information

First let's have a look at the code:

# Program to mine data from your own facebook account

token = "{Your Token}"

#fields = ['first_name', 'location{location}','email','link']

#return desired fields

• token: It is the access token required to access the page.

• In the next line we created a GraphAPI object to access API methods.

• first_name: returns the first name of user.

• link: A link to the person's timeline.

To complete a data science/analytics project, you may have to go through

(Understand and Plan).

Understand & Plan

context of your analytics project.

to recap questions and reinforce expectations.

You might also like