Data Science With Python - Lesson 01 - Data Science Overview
Data Science
When you combine domain expertise and scientific methods with technology, you get Data Science.
[Diagram: Data Science combines data processing tools and libraries with domain expertise and scientific methods]
Data Scientists collect, explore, analyze, and visualize data. They apply mathematical and statistical models
to find patterns and solutions in the data.
[Diagram: Mathematical and Statistical Models; Scientific Tools and Methods]
Analysis
• Descriptive: Study a dataset to decipher the details
• Predictive: Create a model based on existing information to predict outcome and behavior
• Prescriptive: Suggest actions for a given situation using the collected information
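The three kinds of analysis listed above can be sketched on a tiny dataset. This is a minimal illustration using only the Python standard library; the sales figures, the `fit_line` helper, and the `STOCK_ON_HAND` threshold are all made-up assumptions, not part of the lesson.

```python
from statistics import mean, median, stdev

# Hypothetical monthly sales figures (illustration data only).
months = [1, 2, 3, 4, 5, 6]
sales = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

# Descriptive: summarize what the existing dataset contains.
summary = {"mean": mean(sales), "median": median(sales), "stdev": stdev(sales)}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


# Predictive: build a model from existing information and forecast month 7.
slope, intercept = fit_line(months, sales)
forecast = slope * 7 + intercept

# Prescriptive: suggest an action from the forecast.
# STOCK_ON_HAND is a hypothetical business threshold.
STOCK_ON_HAND = 15.0
action = "reorder" if forecast > STOCK_ON_HAND else "hold"

print(summary)
print(f"forecast for month 7: {forecast:.1f} -> {action}")
```

In practice the predictive step would usually rely on a statistics or machine learning library rather than a hand-written fit; the hand-written version is used here only to keep the sketch self-contained.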
Data Processing and Analytics
Modern tools and technologies have made data processing and analytics faster and more efficient.
[Diagram: data processing tools and libraries]
Warning: Data analysis that uses only technology and domain knowledge, without mathematical and statistical knowledge, often leads to incorrect patterns and wrong interpretations. This can cause serious damage to businesses.
Roles and Responsibilities of a Data Scientist
Role of a Data Scientist
Basic Skills of a Data Scientist
Data Scientists work with different types of datasets for various purposes. Now that Big Data is generated every
second through different media, the role of Data Science has become more important.
3 Vs of Big Data
Big Data is a huge collection of data stored on distributed systems or machines, commonly Hadoop clusters.
Data Science helps extract information from the data and build information-driven enterprises.
Applications of Data Science
Different Sectors Using Data Science
Various sectors use Data Science to extract the information they need to create different services and products.
Using Data Science: Social Network Platforms
LinkedIn uses data points from its users to provide them with relevant digital services and data products.
Data Points: Profile, Groups, Location, Connections, Post, Likes
[Diagram: data points and information feed Digital Services and Data Products]
Using Data Science: Search Engines
Google uses Data Science to provide relevant search recommendations as the user types a query.
[Figure: Search keyword]
Wearable devices use Data Science to analyze data gathered by their biometric sensors.
A loan manager can easily access and sift through a loan applicant’s financial details using Data Science.
Governments in different countries share large datasets from various domains with the public.
Data.gov is a website hosted and maintained by the U.S. government.
Sectors/Domains
The Real Challenge
Python handles each stage of data analytics efficiently through its libraries and packages.
Stages of Data Analytics:
Acquire
Wrangle
Explore
Model
Visualize (for example, with Bokeh)
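The stages above can be sketched as a small pipeline. This is a minimal illustration that uses only the Python standard library so it runs anywhere; the `RAW_CSV` data and every function name are hypothetical, and the text bar chart is only a stand-in for a plotting library such as Bokeh.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data with one missing and one malformed value.
RAW_CSV = """hours,score
1,52
2,55
,60
4,61
5,abc
6,67
"""


def acquire(text):
    """Acquire: read rows from a CSV source (an in-memory string here)."""
    return list(csv.DictReader(io.StringIO(text)))


def wrangle(rows):
    """Wrangle: convert fields to numbers and drop rows that cannot be parsed."""
    clean = []
    for row in rows:
        try:
            clean.append((float(row["hours"]), float(row["score"])))
        except (TypeError, ValueError):
            continue  # skip missing or malformed values
    return clean


def explore(pairs):
    """Explore: basic summary statistics of the cleaned scores."""
    scores = [score for _, score in pairs]
    return {"count": len(pairs), "min": min(scores),
            "max": max(scores), "mean": mean(scores)}


def model(pairs):
    """Model: least-squares line score = slope * hours + intercept."""
    xs = [hours for hours, _ in pairs]
    ys = [score for _, score in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def visualize(pairs, width=30):
    """Visualize: a text bar chart, scaled so the highest score fills `width`."""
    top = max(score for _, score in pairs)
    return [f"{hours:>4.0f} | {'#' * round(score / top * width)} {score:.0f}"
            for hours, score in pairs]


data = wrangle(acquire(RAW_CSV))
stats = explore(data)
slope, intercept = model(data)
chart = visualize(data)

print(stats)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print("\n".join(chart))
```

Each stage is a separate function so that a real project could replace one stage (for example, swapping the text chart for an interactive plot) without touching the others.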
Python Tools and Technologies
Python is a general-purpose, open-source programming language that lets you work quickly and integrate
systems more effectively.
Benefits of Python
Easy to learn
Open source
Python is supported by well-established data platforms and processing frameworks that help analyze data in a
simple and efficient way.
[Diagram: Data Scientist, Big Data]
Knowledge Check
Knowledge Check 1
A Data Scientist _____.
b. Acquires data
A Data Scientist asks the right questions to the stakeholders, acquires data from various sources and data points,
performs data wrangling that makes the data available for analysis, and creates reports and plots for data
visualization.
Knowledge Check 2
The search engine’s autocomplete feature identifies unique and verifiable users who
search for a particular keyword or phrase to _____. Select all that apply.
The search engine’s autocomplete feature identifies unique and verifiable users who search for a particular
keyword or phrase to build a query volume. It also helps identify the users’ locations and tag them to the query,
enabling it to be location-specific.
Knowledge Check 3
What is the sequential flow of Data Analytics?
In Data Analytics, the data is acquired from various sources and is then wrangled to ease its analysis. This is
followed by data exploration and data modeling. The final stage is data visualization, where the data is presented
and the patterns are identified.
Key Takeaways