Introduction to Data Science

Assoc. Prof. Peerapon Vateekul, Ph.D.


Department of Computer Engineering,
Faculty of Engineering, Chulalongkorn University

* Part of this slide is modified from a slide of Prof. Natawut
www.cp.eng.chula.ac.th/~peerapon/
Outline

• Introduction
  • Data is important
  • Data Science Definition by Dr. Virote
  • Data Science Definition by Aj. Natawut
• Big Data
• Data Science Process & Data Science Trend


Introduction


Data is important (in 2017)

• Alphabet (Google’s parent company), Amazon, Apple, Facebook and Microsoft
  • $25bn in net profit in the first quarter of 2017
  • Amazon captures half of all dollars spent online in America.
• Google and Facebook accounted for almost all the revenue growth in digital advertising in America last year

https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data

Data is important (in 2018)! (cont.)

Data Science (AI, ML, DM)

https://www.epmag.com/new-oil-1720651

Who analyzes these data?


What is Data Science?

• Data
  • Facts and statistics collected for reference or analysis
• Science
  • A systematic study through observation and experiment
• Data Science
  • The scientific exploration of data to extract meaning or insight, and the construction of software to utilize such insight in a business context.

[Figure: Data Preparation, Data Analysis, Data Visualization, Data Product]

What is Data Science? (cont.)

1. Transform data into valuable insights

2. Transform data into data products

3. Transform data into interesting stories

Code Mania 2 (01), Jan-2015



1) Transform data into valuable insights



1) Transform data into valuable insights (cont.)

Code Mania 2, Jan-2015

http://nypost.com/2016/12/05/amazon-introduces-next-major-job-killer-to-face-americans/

2) Transform data into data products



3) Transform data into interesting stories


Consumer Price Index (CPI) - Inflation

http://www.thebillionpricesproject.com/

https://www.hbs.edu/faculty/Publication%20Files/BPP_JEP_m_13b5e009-4162-4f2c-b507-593a9a98c082.pdf

Google Flu Trends

Ginsberg, Jeremy; Mohebbi, Matthew H.; Patel, Rajan S.; Brammer, Lynnette;
Smolinski, Mark S.; Brilliant, Larry (19 February 2009). "Detecting influenza
epidemics using search engine query data". Nature. 457 (7232): 1012–1014.

What are they using data science for?

1. Measurement

2. Insights

3. Data Products

1) Measurement

• To make a decision based on data
• Also known as benchmarking
• Turning qualitative information into quantitative values
  • Usually called metrics or indicators
• Direct and indirect measurement

Why do we need to measure?

• Comparison between alternatives (make a selection)
  • Choosing which notebook to buy
• Comparison after improvement or tuning
  • Should I add memory to my notebook?
• A/B Testing (split testing)
  • Let the actual users decide their preferences
  • Very popular for UI design

A/B Testing

Source: https://vwo.com/ab-testing/
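
To make the A/B testing idea concrete, here is a minimal sketch of how two page variants might be compared. It assumes a two-proportion z-test, which is one common choice and not necessarily the method used by the tools cited in these slides. The function name `ab_test` and the visitor and conversion counts are made up for illustration.

```python
import math


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test for an A/B experiment.

    conv_a, conv_b: number of users who converted in variants A and B
    n_a, n_b:       number of users who saw variants A and B
    Returns (rate_a, rate_b, z, p_value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis "A and B perform the same"
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical experiment: 5,000 visitors per variant
rate_a, rate_b, z, p_value = ab_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rate is unlikely to be due to chance alone, so the users' behavior, rather than the designer's opinion, decides which variant to keep.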

Example: SimCity

Source: https://blog.optimizely.com/2015/06/04/ecommerce-conversion-optimization-case-studies/

Example: SmartWool

Source: https://blog.optimizely.com/2015/06/04/ecommerce-conversion-optimization-case-studies/

2) Insights
https://blogs.scientificamerican.com/guest-blog/9-bizarre-and-surprising-insights-from-data-science/

• Good understanding of user behavior can lead to new product development or improvements of the existing products
• Walmart: Pop-Tarts before a hurricane
  • Before a hurricane, Strawberry Pop-Tart sales increased about sevenfold
• Financial startup: typing with proper capitalization indicates creditworthiness
  • Online loan applicants who complete the application form with the correct case are more dependable debtors
• Starbucks uses customer purchase information from the My Starbucks mobile app to figure out new products

Example: Tracing Traffic



Example: Tracing Traffic

GPS Average Speed
[Figure panels: 6:00-10:00, 10:00-15:00, 15:00-18:00]

Bus Drivers’ Behaviors
[Figure panels: Bus A and Bus B on 07/03/2016 ~17:00, 10/03/2016 ~09:00, and 10/03/2016 ~17:00]


3) Data Products

• An application or system that uses data to provide “intelligent” products or services, which create more data that can be further used
• Machine learning plays an important role in building great data products

Machine Learning Classification

• Identify which of a set of categories a new observation belongs to
  • Example: spam filtering, customer churn prediction, complaint classification
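
As an illustration of classification, the sketch below trains a tiny spam filter and uses it to label new messages. It assumes a multinomial Naive Bayes model with add-one smoothing, which is only one of many possible classifiers. The training messages and the function names `train` and `predict` are hypothetical.

```python
import math
from collections import Counter


def train(messages):
    """Count words per label. messages: list of (text, label) pairs."""
    word_counts = {}          # label -> Counter of words seen with that label
    label_counts = Counter()  # label -> number of training messages
    for text, label in messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # prior probability of the label
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up historical (labeled) data
training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report due monday", "ham"),
]
model = train(training_data)
print(predict(model, "free prize"))     # a new, unlabeled observation
print(predict(model, "monday report"))
```

The same two steps, training on historical labeled data and then predicting for new observations, apply to churn prediction or complaint classification with different features.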



Example: Students Grade Prediction


[Figure: Historical Data, Training, Model; Current Students, Predicting, Predicted Outcomes; label: OS × Data Struct × Prog]

Example: Amazon Recommendation

• Amazon sells 480M products (485k new products per day)
• Use recommendation systems to bring products to customers
• Analyze data from 300M customers
  • Purchase history
  • Reviews / Ratings
  • Search history
  • Views
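
The sketch below shows one simple way purchase history could drive recommendations: counting which items are bought together. This is an assumption for illustration only and is not a description of Amazon's actual system. The baskets and the function names `build_cooccurrence` and `recommend` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = {}
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(co, item, k=2):
    """Return up to k items most frequently bought together with `item`."""
    return [other for other, _ in co.get(item, Counter()).most_common(k)]


# Made-up purchase history: one list per customer order
purchase_history = [
    ["laptop", "mouse", "laptop bag"],
    ["laptop", "mouse"],
    ["mouse", "mouse pad"],
    ["laptop", "laptop bag"],
]
co = build_cooccurrence(purchase_history)
print(recommend(co, "mouse", k=1))
print(recommend(co, "mouse pad"))
```

Reviews, ratings, search history, and views could be added as further signals; co-purchase counts are used here only because they are the easiest to show in a few lines.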

Case study: Alibaba Fraud Detection

Source: http://www.sciencedirect.com/science/article/pii/S2405918815000021

Case study: Predictive Policing

Used by 60 cities in the US, e.g. Atlanta and LA.

Source: http://www.forbes.com/sites/ellenhuet/2015/02/11/predpol-predictive-policing

Drew Conway’s Data Science Venn diagram (Skills)

Drew Conway’s Venn diagram of data science, 2010

[Figure: Data Preparation, Data Analysis, Data Visualization, Data Product]

Chula Data Science

https://odsc.medium.com/40-must-know-data-science-skills-and-frameworks-for-2023-582fef0bc3fa
Big Data


Big Data Explosion


What Happens in an Internet Minute? (Source: Intel)

• 204 million emails sent
• 47,000 app downloads
• $83,000 in sales
• 20 million photo views
• 3,000 photo uploads
• 61,141 hours of music
• 320 new Twitter accounts
• 100,000 new tweets
• 1,300 new mobile users
• 100+ new LinkedIn accounts
• 277,000 logins
• 6 million Facebook views
• 2+ million search queries
• 30 hours of video uploaded
• 1.3 million video views


https://www.ibmbigdatahub.com/infographic/four-vs-big-data

Now 42 V’s of Big Data

42 V’s?!?

Big Data Driver: Internal + External Data

https://owletcare.com/

https://findair.eu/#Produkt

Big Data Analytics

• It is a process of examining Big Data to uncover useful information and knowledge.

• More data means better decisions!

Big Challenges

External Data

Unstructured Data

Big Data Challenges

Same tasks, but much more difficult!


Big Data Solution

INFRASTRUCTURE and ALGORITHM

Big Data Solution (cont.)
Scale-out Infrastructure

Vertical Scaling (Scale-up) vs. Horizontal Scaling (Scale-out)

Big Data Solution (cont.)

In-memory & Distributed Computing

Resilient Distributed Datasets (RDD)


[Figure: computers COM 1, COM 2, COM 3, COM 4, ..., each with its own RAM]
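
The idea of splitting data across the memory of several machines can be sketched in plain Python. This is a single-process simulation under stated assumptions, not Spark's RDD API: each list slice stands in for the partition held in one computer's RAM, and the function names `partition`, `local_count`, and `word_count` are hypothetical.

```python
from collections import Counter
from functools import reduce


def partition(records, num_workers):
    """Round-robin split: each worker keeps one slice of the data in memory."""
    return [records[i::num_workers] for i in range(num_workers)]


def local_count(chunk):
    """Work done independently on one worker: count words in its own slice."""
    return Counter(word for line in chunk for word in line.split())


def word_count(records, num_workers=4):
    chunks = partition(records, num_workers)
    # On a real cluster these partial counts would be computed in parallel
    partials = [local_count(chunk) for chunk in chunks]
    # Combine the small per-worker results into the final answer
    return reduce(lambda a, b: a + b, partials, Counter())


lines = ["big data", "data science", "big big"]
print(word_count(lines, num_workers=2))
```

Adding workers (scale-out) shrinks each slice without changing the result, which is why the same answer comes back for any value of `num_workers`.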



SQL
NoSQL
Python

Hadoop
Spark


https://blog.datath.com/data-engineer-guide/

Data Scientist + ML Engineer

https://vocal.media/education/data-scientist-vs-data-engineer-vs-ml-engineer-vs-ml-ops-engineer

MLOps = ML + DEV + OPS

Data Science Process

Data Science Process

Dr. Virote

1. Transform data into valuable insights

2. Transform data into data products

3. Transform data into interesting stories

Aj. Natawut

1. Measurement (decision)

2. Insights (knowledge)

3. Data Products (innovation, intelligence)


Data Analytics (Data Science)

Types of Data Science Projects

Valuable insights
• Data visualization
• Analytical skills & storytelling
• Infographic

Advanced analytics
• AI/Machine Learning/Deep Learning
• Prediction, Forecasting, Clustering, etc.

https://dataforest.ai/blog/best-business-intelligence-tool-of-2023-top-16-bi-tools-by-dataforest

• 1) Rule-based AI
• 2) Machine Learning (ML)
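
The difference between the two approaches can be sketched as follows: in rule-based AI a person writes the decision rule, while in machine learning the rule is derived from labeled examples. The feature (number of exclamation marks in a message), the data, and the function names are all made up for illustration.

```python
def rule_based_is_spam(exclamations):
    """Rule-based AI: the threshold 3 was picked by hand (an assumption)."""
    return exclamations >= 3


def learn_threshold(examples):
    """Machine learning: pick the threshold that classifies the most
    training examples correctly. examples: list of (value, is_spam)."""
    candidates = sorted({value for value, _ in examples})
    best_threshold, best_correct = None, -1
    for threshold in candidates:
        correct = sum((value >= threshold) == label for value, label in examples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def learned_is_spam(exclamations, threshold):
    return exclamations >= threshold


# Hypothetical labeled data: (exclamation marks, is_spam)
examples = [(0, False), (1, False), (2, False), (4, True), (5, True), (7, True)]
threshold = learn_threshold(examples)
print(threshold)
print(rule_based_is_spam(3), learned_is_spam(3, threshold))
```

When the data changes, the learned threshold changes with it after retraining, whereas the hand-written rule stays the same until someone edits it.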

https://mc.ai/machine-learning-basics-artificial-intelligence-machine-learning-and-deep-learning/

Machine Learning (ML)

https://www.gartner.com/en/articles/gartner-top-10-strategic-technology-trends-for-2024

Data Trend in 2024 (cont.)

• AI (AI everywhere & Gen AI) is the key component.
• Knowledge without action (Platform Engineering) is meaningless.
• Cloud technology is a modern infrastructure.

Vit Niennattrakul, Ph.D.

AWS Academy Service
AWS Academy Learner Lab
• Amazon API Gateway • AWS Cost and Usage Report • Amazon Forecast • AWS OpsWorks
• AWS App Mesh • AWS Cost Explorer • AWS Glue • Amazon Personalize
• Application Auto Scaling • AWS Data Pipeline • AWS Glue DataBrew • Amazon QuickSight
• AWS AppSync • AWS DeepComposer • Amazon GuardDuty • Amazon Redshift
• Amazon Athena • AWS DeepLens • AWS Health • Amazon Relational Database Service (RDS)
• Amazon Aurora • AWS DeepRacer • AWS Identity and Access Management (IAM) • AWS Resource Groups & Tag Editor
• AWS Backup • AWS Directory Service • AWS IAM Access Analyzer • AWS RoboMaker
• AWS Certificate Manager (ACM) • Amazon EC2 Auto Scaling • Amazon Inspector • Amazon Route 53
• AWS Batch • AWS Elastic Beanstalk • AWS IoT 1-Click • AWS Secrets Manager
• AWS Cloud9 • Amazon Elastic Block Store (EBS) • AWS IoT Analytics • AWS Security Hub
• AWS CloudFormation • Amazon Elastic Container Registry (ECR) • AWS IoT Core • AWS Security Token Service (STS)
• Amazon CloudFront • Amazon Elastic Container Service (ECS) • AWS IoT Greengrass • AWS Serverless Application Repository (SAR)
• Amazon CloudSearch • Amazon Elastic File System (EFS) • Amazon Kendra • AWS Service Catalog
• AWS CloudShell • Amazon Elastic Inference • AWS Key Management Service (KMS) • Amazon Simple Notification Service (SNS)
• AWS CloudTrail • Amazon Elastic Kubernetes Service (EKS) • Amazon Kinesis • Amazon Simple Queue Service (SQS)
• Amazon CloudWatch • Elastic Load Balancing (ELB) • Amazon Lex • Amazon Simple Storage Service (S3)
• AWS CodeCommit • Amazon Elastic MapReduce (EMR) • Amazon Machine Learning (Amazon ML) • Amazon Simple Storage Service Glacier (S3 Glacier)
• AWS CodeDeploy • Amazon ElastiCache • AWS Marketplace Subscriptions • Amazon Simple Workflow Service (SWF)
• Amazon CodeWhisperer • Amazon EventBridge • AWS Mobile Hub • AWS Step Functions
• AWS Config • AWS Fargate • Amazon Neptune • AWS Storage Gateway
• AWS Systems Manager (SSM) • Amazon Timestream • Amazon Virtual Private Cloud (Amazon VPC) • AWS Well-Architected Tool
• Amazon Textract • AWS Trusted Advisor • AWS WAF - Web Application Firewall • AWS X-Ray

AWS Academy Lab Project - Cloud Data Pipeline Builder Both Learner Lab & Lab Project - Cloud Data Pipeline Builder
• Amazon Managed Streaming for Apache Kafka (Amazon MSK) • Amazon SageMaker
• Amazon Elastic Compute Cloud (EC2)
• Amazon DynamoDB
• AWS Lambda
• Amazon Kinesis Video Streams
• Amazon Rekognition

https://awsacademy.instructure.com/login/canvas

Conclusion

1) Data Analytics Module (AI/ML)
2) Data Engineering Module (Data Pipeline)
3) Data Visualization Module
4) Cloud technology

Any questions? :)