Data Science With Python - Lesson 01 - Data Science Overview

Data science involves analyzing large amounts of data from various sources to discover patterns and extract meaningful information. It combines aspects of statistics, mathematics, programming, and data visualization. A data scientist's responsibilities include collecting and exploring data, applying statistical and mathematical models, visualizing results, and communicating findings to stakeholders. Data science is used across many fields like social media, search engines, healthcare, finance, and government to provide personalized services and products, make recommendations, enable predictive analytics, and help decision making. Python is a commonly used programming language in data science.
Copyright: © All Rights Reserved

Data Science with Python

Data Science Overview


Learning Objectives

By the end of this lesson, you will be able to:

Define Data Science

Discuss the roles and responsibilities of a Data Scientist

List various applications of Data Science

Explain the importance of Data Science

Describe Python and its importance


Data Science
What Is Data Science?

Some common definitions of Data Science are:

• A powerful new approach to make discoveries from data
• An automated way to analyze enormous amounts of data and extract information
• A new discipline that combines the aspects of statistics, mathematics, programming, and visualization to turn data into information
Components of Data Science

When you combine domain expertise and scientific methods with technology, you get Data Science.

Data Science draws on two groups of components:

• Domain Expertise and Scientific Methods: Analysis, Mathematical and Statistical Models, Scientific Tools and Methods
• Technology: Operating Systems, Python Language, Application Design, Library, Data Processing Tools
Domain Expertise and Scientific Methods

Data Scientists collect, explore, analyze, and visualize data. They apply mathematical and statistical models
to find patterns and solutions in the data.

Data analysis can be:

• Descriptive: Study a dataset to decipher the details
• Predictive: Create a model based on existing information to predict outcome and behavior
• Prescriptive: Suggest actions for a given situation using the collected information
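
The three kinds of analysis can be sketched on one small dataset. This is a minimal illustration, not a prescribed method: the ad-spend figures, the function names, and the sales target of 250 units are all hypothetical, and a least-squares line stands in for whatever model a real project would use.

```python
from statistics import mean, stdev

# Hypothetical data: monthly ad spend (in $1000s) and units sold.
ad_spend = [10, 12, 15, 18, 20, 24]
units_sold = [110, 125, 150, 172, 195, 230]


def describe(values):
    """Descriptive: summarize what the data looks like."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(xs, ys):
    """Predictive: fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def recommend(predicted_units, target_units):
    """Prescriptive: turn a prediction into a suggested action."""
    if predicted_units >= target_units:
        return "keep current spend"
    return "increase spend"


summary = describe(units_sold)
slope, intercept = fit_line(ad_spend, units_sold)
forecast = slope * 22 + intercept  # predicted units at a spend of 22
action = recommend(forecast, target_units=250)

print(summary)
print(round(forecast, 1), action)
```

The descriptive step only reports on data already collected, the predictive step extrapolates to a spend level not yet observed, and the prescriptive step maps that prediction to a decision.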
Data Processing and Analytics

Modern tools and technologies have made data processing and analytics faster and more efficient.

These technologies help Data Scientists to:

• Build and train machine learning models
• Manipulate data with technology
• Build data tools, applications, and services
• Extract information from data


Note: Data analysis that uses only technology and domain knowledge without mathematical and
statistical knowledge often leads to incorrect patterns and wrong interpretations.
This can cause serious damage to businesses.
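
One small, hypothetical illustration of this risk: a tool will happily compute an average, but statistical judgment is needed to notice that a single bad record has distorted it. The order values below are invented for the example.

```python
from statistics import mean, median

# Hypothetical order values for one day; the last entry is a data-entry error.
order_values = [42, 38, 45, 40, 41, 39, 4000]

naive_average = mean(order_values)    # pulled far upward by the outlier
robust_center = median(order_values)  # barely affected by the outlier

print(f"mean={naive_average:.2f} median={robust_center}")
```

Reporting the mean alone would suggest a typical order many times larger than any real order in the data, which is the kind of wrong interpretation the warning above describes.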
Roles and Responsibilities of a Data Scientist
Role of a Data Scientist
Basic Skills of a Data Scientist

A Data Scientist should be able to:


• Ask the right questions
• Understand data structure
• Interpret and wrangle data
• Apply statistical and mathematical methods
• Visualize data and communicate with stakeholders
• Work as a team player
Sources of Big Data

Data Scientists work with different types of datasets for various purposes. Now that Big Data is generated every
second through different media, the role of Data Science has become more important.
3 Vs of Big Data

• Volume: Enormous amount of data generated from various sources
• Velocity: Large amount of data streaming in at great speeds, which requires quick data processing
• Variety: Different formats of data: Structured, Semi-structured, and Unstructured

Big Data is a huge collection of data, typically stored on distributed systems/machines;
Hadoop clusters are a popular example.
Data Science helps extract information from the data and build information-driven enterprises.
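
The "Variety" dimension can be made concrete with a short sketch that reads one sample of each format using only the Python standard library. The sample strings and field names are hypothetical and chosen only to show how the amount of built-in structure differs.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields.
csv_text = "user_id,country\n1,US\n2,IN\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, but fields may vary per record.
json_text = '[{"user_id": 1, "tags": ["python"]}, {"user_id": 2}]'
records = json.loads(json_text)
tags_per_user = {r["user_id"]: r.get("tags", []) for r in records}

# Unstructured: free text with no schema; structure must be inferred.
review = "Great course on Python. Python examples were clear."
word_counts = {}
for word in re.findall(r"[a-z]+", review.lower()):
    word_counts[word] = word_counts.get(word, 0) + 1

print(rows)
print(tags_per_user)
print(word_counts)
```

Note how the code has to do more work as the structure decreases: the CSV reader needs no hints, the JSON records need a default for a missing field, and the free text needs a tokenization rule before anything can be counted.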
Applications of Data Science
Different Sectors Using Data Science

Various sectors use Data Science to extract the information they need to create different services and products.
Using Data Science: Social Network Platforms

LinkedIn uses data points from its users to provide them with relevant digital services and data products.

Diagram: Data Points (Profile, Groups, Location, Connections, Post, Likes) -> Information -> Digital Services and Data Products
Using Data Science: Search Engines

Google uses Data Science to provide relevant search recommendations as the user types a query.

Search keyword

Autocomplete feature is fed by data models (Machine Learning)

Fast and real-time analytics is made possible by modern and advanced infrastructure, tools, and technologies

Influencing Factors
1. Query Volume – Unique and verifiable users
2. Geographical locations
3. Keyword/phrase matches on the web
4. Some scrubbing for inappropriate content
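
A toy sketch can show how some of these factors might combine. It is not how any real search engine works: the query log, the `BLOCKLIST` set, and the `suggest` function are hypothetical, ranking is by a simple count of unique users rather than a machine learning model, and factor 3 (matches on the web) is not modeled at all.

```python
from collections import defaultdict

# Hypothetical query log: (user_id, country, query).
query_log = [
    ("u1", "US", "python tutorial"),
    ("u2", "US", "python tutorial"),
    ("u1", "US", "python tutorial"),  # repeat by the same user, counted once
    ("u3", "IN", "python pandas"),
    ("u4", "IN", "python pandas"),
    ("u5", "IN", "python pandas"),
    ("u6", "US", "pytorch install"),
    ("u7", "US", "python badword"),
]
BLOCKLIST = {"badword"}  # stand-in for inappropriate-content scrubbing


def suggest(prefix, country=None, limit=3):
    """Return up to `limit` completions for `prefix`, most popular first."""
    users_per_query = defaultdict(set)
    for user_id, user_country, query in query_log:
        if not query.startswith(prefix):
            continue
        if country is not None and user_country != country:
            continue  # factor 2: keep suggestions location-specific
        if BLOCKLIST & set(query.split()):
            continue  # factor 4: scrub inappropriate content
        users_per_query[query].add(user_id)  # factor 1: unique users only
    # Rank by number of unique users, breaking ties alphabetically.
    ranked = sorted(users_per_query, key=lambda q: (-len(users_per_query[q]), q))
    return ranked[:limit]


print(suggest("pyt"))
print(suggest("python", country="US"))
```

Counting sets of user IDs rather than raw log lines is what keeps one user who repeats a query from inflating its volume.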
Using Data Science: Healthcare

Wearable devices use Data Science to analyze data gathered by their biometric sensors.

Diagram: Wearable device -> Biometric data transfer -> IoT Gateway -> Data transfer to Servers -> Enterprise Infrastructure -> Data Analytics -> Engagement Dashboard -> Make informed decisions
Using Data Science: Finance

A loan manager can easily access and sift through a loan applicant’s financial details using Data Science.

Diagram: Loan Applicant -> Loan Application Portal -> Data transfer to Servers -> Enterprise Infrastructure -> Data Analytics (Credit report, Credit history, Approved amount, Risk etc.) -> Engagement Dashboard -> Make informed decisions
Using Data Science: Public Sector

Governments in different countries share large datasets from various domains with the public.
Data.gov is a website hosted and maintained by the U.S. government.

Large collection of datasets

Sectors/Domains
The Real Challenge

Some of the challenges Data Scientists face in the real world are:

• Data quality doesn’t conform to the set standards
• Data integration is a complex task
• Data is distributed into large clusters in HDFS, which is difficult to integrate and analyze
• Unstructured and semi-structured data are harder to analyze
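
The first challenge, data that does not conform to set standards, can be illustrated with a small validation pass. The records, the rules (age between 0 and 120, an "@" in the email), and the `quality_issues` function are assumptions made up for this sketch; real standards depend on the project.

```python
# Hypothetical records merged from two sources with inconsistent quality.
records = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": None, "email": "b@example.com"},
    {"id": 3, "age": 212, "email": "not-an-email"},
    {"id": 1, "age": 34, "email": "a@example.com"},  # duplicate id
]


def quality_issues(rows):
    """Return (id, problem) pairs for every rule a row violates."""
    issues = []
    seen_ids = set()
    for row in rows:
        if row["id"] in seen_ids:
            issues.append((row["id"], "duplicate id"))
        seen_ids.add(row["id"])
        if row["age"] is None:
            issues.append((row["id"], "missing age"))
        elif not 0 <= row["age"] <= 120:
            issues.append((row["id"], "age out of range"))
        if "@" not in row["email"]:
            issues.append((row["id"], "malformed email"))
    return issues


issues = quality_issues(records)
for record_id, problem in issues:
    print(record_id, problem)
```

Listing the problems before fixing them makes it possible to decide, rule by rule, whether to drop, repair, or flag the affected rows.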
Python
Data Analytics and Python

Python deals with each stage of data analytics efficiently by applying different libraries and packages.

Stages of Data Analytics:

• Acquire
• Wrangle
• Explore
• Model
• Visualize (for example, with Bokeh)
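
The five stages can be sketched end to end with only the Python standard library. This is an illustrative assumption, not the lesson's own code: the temperature data and function names are hypothetical, the "model" is just a per-city mean, and a text bar chart stands in for a plotting library such as Bokeh.

```python
import csv
import io
from statistics import mean

RAW_CSV = """city,temp_c
Austin,31
Austin,
Boston,22
Boston,24
austin,33
"""


def acquire(text):
    """Acquire: read raw rows from a source (an in-memory CSV here)."""
    return list(csv.DictReader(io.StringIO(text)))


def wrangle(rows):
    """Wrangle: drop rows with missing values and normalize labels and types."""
    return [
        {"city": r["city"].strip().title(), "temp_c": float(r["temp_c"])}
        for r in rows
        if r["temp_c"]
    ]


def explore(rows):
    """Explore: group readings by city to see what the data contains."""
    by_city = {}
    for r in rows:
        by_city.setdefault(r["city"], []).append(r["temp_c"])
    return by_city


def model(by_city):
    """Model: a deliberately simple model, the per-city mean temperature."""
    return {city: mean(temps) for city, temps in by_city.items()}


def visualize(estimates):
    """Visualize: a text bar chart standing in for a plotting library."""
    return [
        f"{city:<8}{'#' * round(value)} {value:.1f}"
        for city, value in sorted(estimates.items())
    ]


estimates = model(explore(wrangle(acquire(RAW_CSV))))
for line in visualize(estimates):
    print(line)
```

Each function takes the previous stage's output as its input, which mirrors the acquire, wrangle, explore, model, visualize order listed above.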
Python Tools and Technologies

Python is a general-purpose, open-source programming language that lets you work quickly and integrate
systems more effectively.
Benefits of Python

• Easy to learn
• Open source
• Efficient and multi-platform support
• Huge collection of libraries, functions, and modules
• Big open source community
• Integrates well with enterprise apps and systems
• Great vendor and product support


Big Data Platforms and Processing Frameworks for Python

Python is supported by well-established data platforms and processing frameworks that help analyze data in a
simple and efficient way.

Data Scientist

Python Tool for data analysis and processing

Big Data Processing Framework

Enterprise Big Data Platform

Big Data
Knowledge Check
Knowledge Check 1
A Data Scientist _____.

a. Asks the right questions

b. Acquires data

c. Performs data wrangling and data visualization

d. All of the above

The correct answer is d

A Data Scientist asks the right questions to the stakeholders, acquires data from various sources and data points,
performs data wrangling that makes the data available for analysis, and creates reports and plots for data
visualization.
Knowledge Check 2
The search engine’s autocomplete feature identifies unique and verifiable users who
search for a particular keyword or phrase to _____. Select all that apply.

a. Scrub inappropriate content

b. Build a query volume

c. Tag the location to a query

d. Find similar instances on the web

The correct answer is b, c

The search engine’s autocomplete feature identifies unique and verifiable users who search for a particular
keyword or phrase to build a query volume. It also helps identify the users’ locations and tag them to the query,
enabling it to be location-specific.
Knowledge Check 3
What is the sequential flow of Data Analytics?

a. Data wrangling, exploration, modeling, acquisition, and visualization

b. Data exploration, acquisition, modeling, wrangling, and visualization

c. Data acquisition, wrangling, exploration, modeling, and visualization

d. Data modeling, acquisition, exploration, wrangling, and visualization

The correct answer is c

In Data Analytics, the data is acquired from various sources and is then wrangled to ease its analysis. This is
followed by data exploration and data modeling. The final stage is data visualization, where the data is presented
and the patterns are identified.
Key Takeaways

You are now able to:

Define Data Science

Discuss the roles and responsibilities of a Data Scientist

List various applications of Data Science

Explain the importance of Data Science

Describe Python and its importance


Thank You
