Data Science Overview
Outline
• Introduction
• Data is important
• Data Science Definition by Dr. Virote
• Data Science Definition by Aj. Natawut
• Big Data
https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data
Data Science (AI, ML, DM)
https://www.epmag.com/new-oil-1720651
Data
• Facts and statistics collected for reference or analysis
Science
• A systematic study through observation and experiment
Data Science
• The scientific exploration of data to extract meaning or insight, and the construction of software to utilize such insight in a business context.
http://nypost.com/2016/12/05/amazon-introduces-next-major-job-killer-to-face-americans/
http://www.thebillionpricesproject.com/
https://www.hbs.edu/faculty/Publication%20Files/BPP_JEP_m_13b5e009-4162-4f2c-b507-593a9a98c082.pdf
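The Billion Prices Project builds price indexes from prices collected online every day. As a minimal sketch of the aggregation step, assuming a set of matched products observed in a base period and a later period, an unweighted Jevons index takes the geometric mean of the price relatives. This is a standard elementary index formula, not necessarily the project's exact methodology, and the products and prices below are hypothetical.

```python
import math


def jevons_index(base_prices: dict[str, float], current_prices: dict[str, float]) -> float:
    """Unweighted Jevons price index (base period = 100).

    Geometric mean of price relatives p_t / p_0, computed only over products
    observed in both periods.
    """
    common = base_prices.keys() & current_prices.keys()
    if not common:
        raise ValueError("no products observed in both periods")
    # Average the log price relatives, then exponentiate: a geometric mean.
    mean_log = sum(math.log(current_prices[p] / base_prices[p]) for p in common) / len(common)
    return 100.0 * math.exp(mean_log)


# Hypothetical prices for three products collected on two days.
day0 = {"milk": 2.00, "bread": 1.50, "eggs": 3.00}
day1 = {"milk": 2.20, "bread": 1.50, "eggs": 3.30}
print(round(jevons_index(day0, day1), 2))  # 106.56
```

The geometric mean is used because it treats a 10% rise and the matching proportional fall symmetrically; a real index would also weight categories, which this sketch omits.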
Ginsberg, Jeremy; Mohebbi, Matthew H.; Patel, Rajan S.; Brammer, Lynnette;
Smolinski, Mark S.; Brilliant, Larry (19 February 2009). "Detecting influenza
epidemics using search engine query data". Nature. 457 (7232): 1012–1014.
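Ginsberg et al. relate the fraction of physician visits for influenza-like illness (ILI), P, to the fraction of ILI-related search queries, Q, through a simple linear model on the log-odds scale: logit(P) = β0 + β1·logit(Q) + ε. A minimal ordinary-least-squares sketch of fitting that model follows; the query and visit fractions are made-up toy values generated from known coefficients, not data from the paper.

```python
import math


def logit(x: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(x / (1.0 - x))


def fit_logit_model(q: list[float], p: list[float]) -> tuple[float, float]:
    """Ordinary least squares for logit(P) = b0 + b1 * logit(Q).

    q: fractions of ILI-related search queries; p: fractions of ILI physician visits.
    """
    xs = [logit(v) for v in q]
    ys = [logit(v) for v in p]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def predict_visits(q: float, b0: float, b1: float) -> float:
    """Estimate the ILI visit fraction from a query fraction by inverting the logit."""
    z = b0 + b1 * logit(q)
    return 1.0 / (1.0 + math.exp(-z))


# Toy data generated from known coefficients (b0 = 0.5, b1 = 0.9).
q_weeks = [0.001, 0.002, 0.004, 0.008]
p_weeks = [1.0 / (1.0 + math.exp(-(0.5 + 0.9 * logit(v)))) for v in q_weeks]
b0, b1 = fit_logit_model(q_weeks, p_weeks)
print(round(b0, 3), round(b1, 3))  # 0.5 0.9
```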
1. Measurement
2. Insights
3. Data Products
1) Measurement
• a.k.a. benchmarking
A/B Testing
Source: https://vwo.com/ab-testing/
Example: SimCity
Source: https://blog.optimizely.com/2015/06/04/ecommerce-conversion-optimization-case-studies/
Example: SmartWool
Source: https://blog.optimizely.com/2015/06/04/ecommerce-conversion-optimization-case-studies/
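Deciding whether variant B really beats variant A in an A/B test is a measurement question. A minimal sketch, assuming conversions are counted per visitor and the two groups are independent, uses a two-sided two-proportion z-test; the visitor and conversion counts are hypothetical, and real experiments also need a pre-planned sample size.

```python
import math
from statistics import NormalDist


def ab_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test comparing the conversion rates of A and B.

    Returns (z statistic, p-value). A positive z means B converts better than A.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis that A and B are equal.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical test: 2,000 visitors per variant, 10% vs 12.5% conversion.
z, p = ab_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (conventionally below 0.05) suggests the difference in conversion is unlikely to be chance alone.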
2) Insights
https://blogs.scientificamerican.com/guest-blog/9-bizarre-and-surprising-insights-from-data-science/
3) Data Products
Training → Model → Predicting
OS × Data Structures × Programming
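The Training → Model → Predicting flow can be shown with a deliberately tiny classifier. The nearest-centroid method, the customer features, and the labels below are illustrative assumptions, not something the slides specify: training condenses labeled examples into a model, and prediction applies that model to a new input.

```python
import math
from dataclasses import dataclass


@dataclass
class CentroidModel:
    """The model produced by training: one mean feature vector per class label."""
    centroids: dict[str, tuple[float, ...]]


def train(features: list[tuple[float, ...]], labels: list[str]) -> CentroidModel:
    """Training: average the feature vectors belonging to each class."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return CentroidModel({y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()})


def predict(model: CentroidModel, x: tuple[float, ...]) -> str:
    """Predicting: return the label whose centroid is closest in Euclidean distance."""
    return min(model.centroids, key=lambda y: math.dist(x, model.centroids[y]))


# Hypothetical features: (visits per month, average basket in dollars).
X = [(2.0, 20.0), (3.0, 25.0), (20.0, 80.0), (25.0, 90.0)]
y = ["casual", "casual", "loyal", "loyal"]
model = train(X, y)
print(predict(model, (22.0, 85.0)))  # loyal
```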
Example: Amazon Recommendation
• Amazon sells 480M products (485k new products per day)
• Uses recommendation systems to bring products to customers
Source: http://www.sciencedirect.com/science/article/pii/S2405918815000021
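Recommendation systems can be built in many ways. A minimal sketch, assuming item-to-item collaborative filtering (the approach Amazon engineers described in a 2003 paper; the current production system is not covered by the slides) over a tiny hypothetical purchase history: items bought by overlapping sets of customers count as similar, and a customer is offered unseen items most similar to what they already bought.

```python
import math
from collections import defaultdict


def item_similarities(purchases: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Cosine similarity between items, each represented by the set of customers
    who bought it (a binary purchase vector)."""
    buyers: dict[str, set[str]] = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    sims: dict[str, dict[str, float]] = defaultdict(dict)
    item_list = sorted(buyers)
    for i, a in enumerate(item_list):
        for b in item_list[i + 1:]:
            overlap = len(buyers[a] & buyers[b])
            if overlap:
                s = overlap / math.sqrt(len(buyers[a]) * len(buyers[b]))
                sims[a][b] = sims[b][a] = s
    return sims


def recommend(customer: str, purchases: dict[str, set[str]], top_n: int = 3) -> list[str]:
    """Score each unseen item by summing its similarity to the customer's items."""
    sims = item_similarities(purchases)
    owned = purchases[customer]
    scores: dict[str, float] = defaultdict(float)
    for item in owned:
        for other, s in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += s
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda it: (-scores[it], it))[:top_n]


# Hypothetical purchase history.
purchases = {
    "ann": {"tent", "sleeping_bag", "stove"},
    "bob": {"tent", "sleeping_bag"},
    "cat": {"tent", "lantern"},
    "dan": {"novel"},
}
print(recommend("bob", purchases))  # ['stove', 'lantern']
```

Item-to-item similarities can be precomputed offline, which is what lets this style of recommender scale to very large catalogs.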
Source: http://www.forbes.com/sites/ellenhuet/2015/02/11/predpol-predictive-policing
Data Preparation → Data Analysis → Data Visualization → Data Product
https://odsc.medium.com/40-must-know-data-science-skills-and-frameworks-for-2023-582fef0bc3fa
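The four stages of the process (preparation, analysis, visualization, product) can be strung together in a few small functions. This is a sketch only: the CSV of regional sales, the cleaning rule, the text bar chart, and the "expected sales" product are all hypothetical, chosen so each stage is visible with the Python standard library alone.

```python
import csv
import io
from collections.abc import Callable
from statistics import mean

# Hypothetical raw data with a missing value and a malformed value.
RAW = """region,sales
north,120
south,
north,80
south,200
east,abc
east,50
"""


def prepare(raw: str) -> list[tuple[str, float]]:
    """Data preparation: parse CSV text and drop rows whose sales are missing or non-numeric."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw)):
        try:
            rows.append((rec["region"], float(rec["sales"])))
        except ValueError:
            continue  # skip empty or malformed values
    return rows


def analyze(rows: list[tuple[str, float]]) -> dict[str, float]:
    """Data analysis: average sales per region."""
    by_region: dict[str, list[float]] = {}
    for region, sales in rows:
        by_region.setdefault(region, []).append(sales)
    return {r: mean(v) for r, v in by_region.items()}


def visualize(summary: dict[str, float], scale: float = 10.0) -> str:
    """Data visualization: a plain-text bar chart with one '#' per `scale` units."""
    return "\n".join(f"{r:<6}{'#' * int(v // scale)} {v:.1f}" for r, v in sorted(summary.items()))


def make_product(summary: dict[str, float]) -> Callable[[str], float]:
    """Data product: package the analysis as a function other software can call.
    Unknown regions fall back to the overall average."""
    def expected_sales(region: str) -> float:
        return summary.get(region, mean(summary.values()))
    return expected_sales


rows = prepare(RAW)
summary = analyze(rows)
print(visualize(summary))
expected_sales = make_product(summary)
print(expected_sales("north"))  # 100.0
```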
Big Data
What Happens in an Internet Minute? (Source: Intel)
• Emails sent: …
• Hours of music: 61,141
• New Twitter accounts: 320
• New tweets: 100,000
• New mobile users: 1,300
• New LinkedIn accounts: 100+
• Logins: 277,000
• Facebook views: 6 million
• Search queries: 2+ million
• Hours of video uploaded: 30
• Video views: 1.3 million
https://www.ibmbigdatahub.com/infographic/four-vs-big-data
Now: 42 V's of Big Data
42 V’s?!?
Big Data Driver: Internal + External Data
https://owletcare.com/
https://findair.eu/#Produkt
Big Data Analytics
Big Challenges
• External Data
• Unstructured Data
Big Data Challenges
• Infrastructure
• Algorithm
Big Data Solution (cont.)
Scale-out Infrastructure
SQL
NoSQL
Python
Hadoop
Spark
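Hadoop and Spark scale out by splitting data across many machines and processing the pieces in parallel. The classic illustration is MapReduce word count. The single-process sketch below only mimics the map, shuffle, and reduce phases in plain Python; it is not Hadoop or Spark API code, and the two input strings stand in for data blocks that would live on different nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine the values for each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


# Each string stands in for a block of data stored on a different node.
splits = ["big data needs big infrastructure", "Spark and Hadoop process big data"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(counts["big"], counts["data"])  # 3 2
```

In a real cluster the map calls run in parallel on the nodes holding each split; in Spark's RDD API the same job is commonly written with `flatMap`, `map`, and `reduceByKey`.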
https://blog.datath.com/data-engineer-guide/
Data Scientist + ML Engineer
https://vocal.media/education/data-scientist-vs-data-engineer-vs-ml-engineer-vs-ml-ops-engineer
MLOps = ML + Dev + Ops
Data Science Process
Dr. Virote
Aj. Natawut
1. Measurement (decision)
2. Insights (knowledge)
https://dataforest.ai/blog/best-business-intelligence-tool-of-2023-top-16-bi-tools-by-dataforest
1) Rule-based AI
https://mc.ai/machine-learning-basics-artificial-intelligence-machine-learning-and-deep-learning/
2) Machine Learning (ML)
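The difference between the two approaches fits in a few lines: in rule-based AI a person writes the decision rule, while in machine learning the rule's parameter is estimated from labeled examples. The spam scenario, the "free money" rule, and the link-count feature are hypothetical choices for illustration; the learner is a one-feature decision stump, about the simplest ML model there is.

```python
def rule_based_is_spam(message: str) -> bool:
    """Rule-based AI: a human writes the decision rule explicitly."""
    return "free money" in message.lower()


def learn_threshold(values: list[float], labels: list[bool]) -> float:
    """Machine learning with a one-feature decision stump: pick the threshold t
    that makes the fewest mistakes on labeled data when predicting value >= t."""
    candidates = sorted(set(values))

    def errors(t: float) -> int:
        return sum((v >= t) != y for v, y in zip(values, labels))

    # min() returns the first candidate with the lowest error count.
    return min(candidates, key=errors)


print(rule_based_is_spam("Claim your FREE MONEY now"))  # True

# Hypothetical training data: number of links per message and whether it was spam.
links = [0, 1, 1, 4, 5, 7]
is_spam = [False, False, False, True, True, True]
t = learn_threshold(links, is_spam)
print(t)  # 4
```

The rule-based version never changes unless someone edits it; the learned threshold changes automatically when retrained on new labeled data.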
https://www.gartner.com/en/articles/gartner-top-10-strategic-technology-trends-for-2024
Data Trends in 2024 (cont.)
AWS Academy Lab Project - Cloud Data Pipeline Builder
Both Learner Lab & Lab Project - Cloud Data Pipeline Builder
• Amazon Managed Streaming for Apache Kafka (Amazon MSK)
• Amazon SageMaker
• Amazon Elastic Compute Cloud (EC2)
• Amazon DynamoDB
• AWS Lambda
• Amazon Kinesis Video Streams
• Amazon Rekognition
https://awsacademy.instructure.com/login/canvas
Conclusion
1) Data Analytics Module (AI/ML)
3) Data Visualization Module
4) Cloud technology