Week 1 Explore The Use Case and Analyze The Dataset

The document outlines the copyright notice for slides distributed under a Creative Commons License by DeepLearning.AI for educational purposes. It discusses practical data science in the cloud, including data ingestion, exploration, and machine learning workflows using AWS tools. Additionally, it covers popular machine learning tasks, sentiment analysis of product reviews, and data visualization techniques.


Copyright Notice

These slides are distributed under the Creative Commons License.

DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.

For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode

Practical Data Science
Explore the Use Case and Analyze the Dataset

Practical Data Science in the Cloud
Introduction

AI, ML, DL, data science…?

● Artificial Intelligence contains Machine Learning, which in turn contains Deep Learning
● Data Science intersects these fields and draws on:
○ Domain knowledge
○ Mathematics
○ Statistics
○ Visualization
○ Programming

Practical Data Science?

Practical data science

Massive data sets → Extract → Knowledge + Insight

… in the Cloud?

Practical data science in the cloud

Local notebook / prototype:
● Limited by existing hardware

Cloud:
● Store & process any amount of data
● Large data science and ML toolbox
● Scale up
● Scale out
● Elastic infrastructure

Data science and ML toolbox

Machine Learning Workflow

Ingest & Analyze
● Data exploration
● Bias detection
● Services: Amazon S3 & Amazon Athena; AWS Glue; Amazon SageMaker Data Wrangler & Clarify

Prepare & Transform
● Feature engineering
● Feature store
● Services: Amazon SageMaker Data Wrangler; Amazon SageMaker Processing Jobs; Amazon SageMaker Feature Store

Train & Tune
● Automated ML
● Model train and tune
● Services: Amazon SageMaker Autopilot; Amazon SageMaker Training & Debugger; Amazon SageMaker Hyperparameter Tuning

Deploy & Manage
● Model deployment
● Automated pipelines
● Services: Amazon SageMaker Endpoints; Amazon SageMaker Batch Transform; Amazon SageMaker Pipelines

Use Case and Dataset
Introduction

Popular ML tasks and learning paradigms

● Classification & Regression: Supervised
● Clustering: Unsupervised
● Image Processing: Computer Vision
● Text Analysis: NLP / NLU

Multi-class classification for sentiment analysis of product reviews

● “I simply love it!”
● “It's ok.”
● “It arrived damaged. Going to return.”

Working with product reviews data

Review Text                            Sentiment
(input feature for model training)     (label for model training)

I simply love it!                      1 (positive)
It's ok.                               0 (neutral)
It arrived damaged, going to return    -1 (negative)
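
The feature/label split in the table above can be sketched in a few lines of plain Python. This is only an illustration: the `SENTIMENT_LABELS` mapping and the `split_features_and_labels` helper are hypothetical names introduced here, and the rows are the three example reviews from the slide, not real dataset records.

```python
# Hypothetical label map matching the sentiment codes in the table.
SENTIMENT_LABELS = {1: "positive", 0: "neutral", -1: "negative"}

# (review text, sentiment) pairs copied from the slide's example rows.
reviews = [
    ("I simply love it!", 1),
    ("It's ok.", 0),
    ("It arrived damaged, going to return", -1),
]


def split_features_and_labels(rows):
    """Separate the input feature (text) from the training label (sentiment)."""
    texts = [text for text, _ in rows]
    labels = [label for _, label in rows]
    return texts, labels


texts, labels = split_features_and_labels(reviews)
for text, label in zip(texts, labels):
    print(f"{label:>2} ({SENTIMENT_LABELS[label]}): {text}")
```

With three possible label values, a model trained on these pairs is a multi-class classifier rather than a binary one.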

Data Ingestion & Exploration

Ingest data into data lakes

● Centralized and secure repository
● Store, discover and share data at any scale
○ structured relational data
○ semi-structured data
○ unstructured data
○ streaming data
● Governance

Data lakes on Amazon S3

● Amazon Simple Storage Service (Amazon S3)
● Object storage
● Durable, available, exabyte scale
● Secure, compliant, auditable

Diagram: Data Warehousing, Analytics, and Machine Learning on top of Amazon S3

AWS Data Wrangler

● Open source Python library
● Connects pandas DataFrames and AWS data services
● Load/unload data from
○ data lakes
○ data warehouses
○ databases

    !pip install awswrangler

    import awswrangler as wr
    import pandas as pd

    # Retrieving the data directly from Amazon S3
    df = wr.s3.read_csv(path='s3://bucket/prefix/')

Register data with AWS Glue Data Catalog

● Creates reference to data ("S3-to-table" mapping)
● Just metadata / schema stored in tables
● No data is moved
● AWS Glue Crawlers can be set up to automatically
○ infer data schema
○ update data catalog

AWS Glue Data Catalog entry:

Name              reviews
Database          dsoaws_deep_learning
Classification    csv
Location          s3://<bucket>/<prefix>
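
One way to picture the "S3-to-table" mapping is as a small metadata record that points at an S3 location without holding any rows. The sketch below is purely illustrative: `CatalogTable` is a hypothetical class, not part of the AWS Glue API, and the column types are assumptions. The name, database, classification, and location values are the ones shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogTable:
    """Hypothetical stand-in for a Data Catalog entry: metadata only, no rows."""

    name: str
    database: str
    classification: str
    location: str  # S3 prefix the table points at; the data itself stays there
    column_types: dict = field(default_factory=dict)


reviews_table = CatalogTable(
    name="reviews",
    database="dsoaws_deep_learning",
    classification="csv",
    location="s3://<bucket>/<prefix>",  # placeholder kept as on the slide
    # Column names appear in the deck's queries; the types are assumed.
    column_types={
        "review_body": "string",
        "sentiment": "int",
        "product_category": "string",
    },
)

print(f"{reviews_table.database}.{reviews_table.name} -> {reviews_table.location}")
```

Because the record stores only a location and a schema, registering a table moves no data.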

Register data with AWS Glue Data Catalog

    import awswrangler as wr

    # Create a database in the AWS Glue Data Catalog
    wr.catalog.create_database(
        name=...)

    # Create CSV table (metadata only) in the AWS Glue Data Catalog
    wr.catalog.create_csv_table(
        table=...,
        column_types=...,
        ...)

Query data with Amazon Athena

● Query data in S3
● Using SQL
● No infrastructure to set up
● Schema lookup in AWS Glue Data Catalog
● No data to load

Python:

    import awswrangler as wr

    # Create Amazon Athena S3 bucket
    wr.athena.create_athena_bucket()

    # Execute SQL query on Amazon Athena
    df = wr.athena.read_sql_query(
        sql=...,
        database=...)

SQL:

    'SELECT product_category FROM reviews'

Query data with Amazon Athena

● Complex analytical queries
● Gigabytes > Terabytes > Petabytes
● Scales automatically
● Runs queries in parallel
● Based on Presto
● No infrastructure setup / no data movement required

Data Visualization

Popular Python data analysis & visualization tools

    pip install pandas
    pip install numpy
    pip install matplotlib
    pip install seaborn

How many reviews are in each sentiment class?

SQL query:

    SELECT sentiment, COUNT(*) AS count_sentiment
    FROM dsoaws_deep_learning.reviews
    GROUP BY sentiment
    ORDER BY sentiment DESC, count_sentiment

Python visualization code:

    import matplotlib.pyplot as plt

    # df holds the result of the SQL query above
    chart = df.plot.bar(
        x="sentiment",
        y="count_sentiment")
    plt.xlabel("sentiment")
    plt.show()
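
To see the shape of the result the sentiment-count query returns without an AWS account, the same aggregation can be run against an in-memory SQLite table. This is a rough sketch under stated assumptions: SQLite stands in for Amazon Athena, the database prefix is dropped, and the sample rows (including the "Great value" review) are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows: (review_body, sentiment). Not real dataset values.
sample_reviews = [
    ("I simply love it!", 1),
    ("Great value", 1),
    ("It's ok.", 0),
    ("It arrived damaged, going to return", -1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (review_body TEXT, sentiment INTEGER)")
conn.executemany("INSERT INTO reviews VALUES (?, ?)", sample_reviews)

# Same aggregation as the slide's query, minus the Athena database prefix.
rows = conn.execute(
    """
    SELECT sentiment, COUNT(*) AS count_sentiment
    FROM reviews
    GROUP BY sentiment
    ORDER BY sentiment DESC, count_sentiment
    """
).fetchall()
conn.close()

# One row per sentiment class, highest sentiment value first.
for sentiment, count_sentiment in rows:
    print(sentiment, count_sentiment)
```

Each result row pairs a sentiment class with its review count, which is the two-column shape the bar chart code expects for `x` and `y`.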

What is the distribution of review lengths? (number of words)

SQL query:

    SELECT CARDINALITY(SPLIT(review_body, ' ')) AS num_words
    FROM dsoaws_deep_learning.reviews

Python visualization code:

    # df holds the result of the SQL query above
    summary = df["num_words"].describe(
        percentiles=[0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00])

    df["num_words"].plot.hist(
        xticks=[0, 16, 32, 64, 128, 256],
        bins=100,
        range=[0, 256]).axvline(x=summary["100%"], c="red")

Summary statistics of review lengths (number of words):

mean 52.51
std 31.38
min 1.00
10% 10.00
20% 22.00
30% 32.00
40% 41.00
50% 51.00
60% 61.00
70% 73.00
80% 88.00
90% 97.00
100% 115.00
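
The word-count and percentile steps can also be approximated with the Python standard library alone. The sketch below is an illustration, not the course's code: `count_words` is a hypothetical helper that mirrors `SPLIT(review_body, ' ')` by splitting on single spaces, the input is just the three example reviews from the slides, and `statistics.quantiles` with `method="inclusive"` is used on the assumption that its linear interpolation corresponds to the default behavior of pandas' `describe`.

```python
import statistics

# Example review texts taken from the slides; a real run would use the dataset.
review_bodies = [
    "I simply love it!",
    "It's ok.",
    "It arrived damaged, going to return",
]


def count_words(text):
    """Split on single spaces, like SPLIT(review_body, ' ') in the SQL query."""
    return len(text.split(" "))


num_words = [count_words(body) for body in review_bodies]

# Nine cut points for the 10%..90% percentiles, using linear interpolation.
deciles = statistics.quantiles(num_words, n=10, method="inclusive")

print("mean", statistics.mean(num_words))
print("min", min(num_words))
for pct, value in zip(range(10, 100, 10), deciles):
    print(f"{pct}%", value)
print("100%", max(num_words))
```

Splitting on a single space counts punctuation attached to a word as part of that word, so "damaged," is one token here, just as it would be in the SQL version.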
