Week 1 Explore The Use Case and Analyze The Dataset
DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
AI, ML, DL, data science…?
Figure: nested circles, with Deep Learning inside Machine Learning inside Artificial Intelligence
Figure: Data Science drawing on Artificial Intelligence (Machine Learning, Deep Learning), Domain knowledge, Mathematics, Statistics, Visualization, and Programming
Practical data science
Extract knowledge + insight from massive data sets
Practical data science in the cloud
Working with product reviews data
Example reviews: “It's ok.” … “It arrived damaged. Going to return.”
Labeled example:
It arrived damaged, going to return    -1 (negative)
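A minimal pandas sketch of what labeled review data might look like. The `star_rating` column, its values, and the rating-to-sentiment mapping (1-2 to -1, 3 to 0, 4-5 to 1) are assumptions for illustration, not taken from the slides.

```python
import pandas as pd

# The two example reviews from the slide; star_rating values are hypothetical.
reviews = pd.DataFrame({
    "review_body": ["It's ok.", "It arrived damaged. Going to return."],
    "star_rating": [3, 1],
})


def to_sentiment(star_rating: int) -> int:
    """Map a 1-5 star rating to -1 (negative), 0 (neutral) or 1 (positive).

    The thresholds are an assumed labeling scheme, not a fixed standard.
    """
    if star_rating <= 2:
        return -1
    if star_rating == 3:
        return 0
    return 1


reviews["sentiment"] = reviews["star_rating"].apply(to_sentiment)
print(reviews[["review_body", "sentiment"]])
```

Keeping the mapping in one small function makes it easy to change the class boundaries (for example, dropping neutral reviews) without touching the rest of the pipeline.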
Data Ingestion & Exploration
Ingest data into data lakes
AWS Glue Data Catalog
Name       Database
reviews    dsoaws_deep_learning

# Create a database in the AWS Glue Data Catalog
import awswrangler as wr

wr.catalog.create_database(
    name=...)

● Based on Presto
● No infrastructure setup / no data movement required
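A sketch of registering review files that already sit in S3 as a Glue Data Catalog table and then querying them with SQL, using the awswrangler calls `wr.catalog.create_database`, `wr.catalog.create_csv_table`, and `wr.athena.read_sql_query`. Reading the Presto bullets above as referring to Amazon Athena is our assumption, as are the column names, the tab separator, the header row, and the bucket path. The awswrangler module is passed in as a parameter so the call sequence can be dry-run offline with a stand-in object; against AWS you would pass the real module.

```python
from types import SimpleNamespace


def register_and_query_reviews(wr, database: str, table: str, s3_path: str):
    """Register a tab-separated reviews dataset in the Glue Data Catalog and query it.

    `wr` is the awswrangler module (or a stand-in with the same call shape).
    Column names and file layout are hypothetical.
    """
    # Create the database if it does not exist yet.
    wr.catalog.create_database(name=database, exist_ok=True)
    # Register the files already in S3 as a table; only metadata is written,
    # the data itself is not copied or moved.
    wr.catalog.create_csv_table(
        database=database,
        table=table,
        path=s3_path,
        columns_types={"review_body": "string", "star_rating": "int"},
        sep="\t",
        skip_header_line_count=1,
    )
    # Run SQL against the catalogued table; awswrangler returns a pandas DataFrame.
    sql = f"SELECT star_rating, COUNT(*) AS num_reviews FROM {table} GROUP BY star_rating"
    return wr.athena.read_sql_query(sql=sql, database=database)


# Offline stand-in that records calls instead of contacting AWS (dry run only).
calls = []
dry_run_wr = SimpleNamespace(
    catalog=SimpleNamespace(
        create_database=lambda **kw: calls.append(("create_database", kw)),
        create_csv_table=lambda **kw: calls.append(("create_csv_table", kw)),
    ),
    athena=SimpleNamespace(read_sql_query=lambda **kw: kw),
)

# "s3://my-bucket/reviews/" is a placeholder path.
result = register_and_query_reviews(
    dry_run_wr, "dsoaws_deep_learning", "reviews", "s3://my-bucket/reviews/"
)
print(calls)
print(result["sql"])
```

With AWS credentials configured, the same function would be called as `register_and_query_reviews(wr, "dsoaws_deep_learning", "reviews", "s3://<your-bucket>/<prefix>/")` after `import awswrangler as wr`.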
Data Visualization
Popular Python data analysis & visualization tools
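Python visualization code
import matplotlib.pyplot as plt

# df: pandas DataFrame with one row per review and a "num_words" column
summary = df["num_words"].describe(
    percentiles=[0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00])

# Histogram of review lengths, with a red line at the 100th percentile (longest review)
df["num_words"].plot.hist(
    xticks=[0, 16, 32, 64, 128, 256], bins=100,
    range=[0, 256]).axvline(x=summary["100%"], c="red")
plt.show()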
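The plotting code assumes `df` already has a `num_words` column. One simple way to derive it is sketched below, assuming a hypothetical `review_body` text column and counting whitespace-separated tokens as words; it uses the two example reviews and leaves out plotting so it runs headless.

```python
import pandas as pd

# Toy stand-in for the reviews DataFrame; the review_body column name is assumed.
df = pd.DataFrame({
    "review_body": ["It's ok.", "It arrived damaged. Going to return."]
})

# Word count = number of whitespace-separated tokens (punctuation stays attached).
df["num_words"] = df["review_body"].str.split().str.len()

# Same percentiles as the slide; the 1.00 percentile appears as the "100%" row
# and always equals the maximum.
summary = df["num_words"].describe(
    percentiles=[0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00])
print(summary)
```

Splitting on whitespace is a rough proxy for words; a tokenizer would count punctuation and contractions differently.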
What is the distribution of review lengths (number of words)?
mean 52.51
std 31.38
min 1.00
10% 10.00
20% 22.00
30% 32.00
40% 41.00
50% 51.00
60% 61.00
70% 73.00
80% 88.00
90% 97.00
100% 115.00
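The table answers "how many words does the review at the p-th percentile have". The inverse question, what share of reviews has at most k words, is the empirical CDF. A small sketch on toy word counts (not the dataset summarized above):

```python
import pandas as pd


def share_at_or_below(num_words: pd.Series, cutoff: int) -> float:
    """Fraction of reviews whose word count is <= cutoff (empirical CDF at cutoff)."""
    return float((num_words <= cutoff).mean())


# Toy word counts for illustration only.
num_words = pd.Series([3, 12, 25, 40, 70, 110])
for cutoff in (32, 64, 128):
    print(cutoff, round(share_at_or_below(num_words, cutoff), 2))
```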