Week 2 Lecture 3
Landscape
6/25/22 11:59 AM 1
Oil and Gas
05/18/2025 07:38 AM 2
Production Forecasting
Production Forecasting and Well shut-ins
Predict/Detect Screen-Outs
Predict Hydraulic Pump Failure
Offline Maintenance Scope Setting
Discussion
Best Practices To Execute Data Science Projects?
People
[Diagram labels: People, Science, Data, Data Analytics]
BUILDING RIGHT DATA SCIENCE CAPABILITY
DATA SCIENCE TEAM?
Data Science Team Key Roles
• Manager/Lead
• Data Scientist
• Data Engineer
• Data Architect
• Data Analyst
Team within Organizations
Who Is a Data Scientist?
Skill areas:
• Math, Statistics, AI, Machine learning
• Domain Knowledge, BI tools (Visualization)
• Computer Science (Programming, Databases)
Competencies:
• Team player
• Ready to face failure
• Communicative
• Ready to get out of comfort zone
• Analytical mindset
• Curiosity
• Empathy
• Proactive
Major Roles & Skills
[Figure: skill profiles for Data Engineer, Data Analyst, and Data Scientist. Labels: Operation, Developing, Infrastructure Design, Visualization/Interpretation, Machine Learning, Math & Statistics]
Data Engineer
• Infrastructure
• Database (SQL and NoSQL)
• Feature engineering
Data Analyst
• Statistics
• Business Intelligence
• Reporting
Data Scientist
• Statistics
• Machine learning
• Programming languages
Skill Set Data Scientist
Satisfactory
• Basic statistics
• Basic Machine learning (Regression and Decision Trees)
• Feature engineering
• Excel, Basic SQL
• Basic R and Python
• BI tools - Visualization
• Domain Knowledge and communication skills
Average
• Machine learning (ANN, SVM, PCA, Naïve Bayes, k-means, KNN, etc.)
• Programming (R, Python)
• SQL
• Domain Knowledge and communication skills
Advanced
• Deep learning
• Unstructured data analysis (text analysis, video analytics)
• Understanding of Hadoop-based ecosystems (Hortonworks and Cloudera)
• NoSQL
• Hive, Pig, SparkSQL
What are the right skills for Data Scientist & Data Engineer?
Data Scientist
• Visualization
• Story Telling
• Math
• Statistics
Shared
• Programming
Data Engineer
• System Implementation
• DB Administration
Processes
[Diagram labels: People, Science, Data, Data Analytics]
An interdisciplinary field that employs sophisticated tools and techniques to extract knowledge
and actionable insights from structured or unstructured data in order to optimize business
objectives.
Design Thinking Process
1 Problem Definition
• Create a point of view that is based on user needs and insights.
• What do you want to solve?
• What are their needs?
5 Testing
• Share your prototyped idea with your original user for feedback.
• What does the user think about your solution?
• What worked?
• What didn't?
[Diagram: data lifecycle stages. Labels: Discover, Collect, Process, Integrate, Store, Analyze, Expose]
Data Science
• Internal data
• External data
• Correct faulty and missing data
• Combine and enrich data features
• React quickly to events
• Dynamic alerts
CRISP-DM (Cross-Industry Standard Process for Data Mining)
[Diagram labels recovered: Business Understanding, Data Understanding, Data Strategy, Data, Deployment]
DATA SCIENCE & DESIGN THINKING LINKAGE
Data science steps:
Step 1: Define hypothesis to test or prediction to be made
Step 2: Gather data, and more data (Data Lake: SQL + Hadoop)
Step 3: Prepare data: build schema (schema-on-query)
Step 4: Visualize the data (Tableau, MicroStrategy, Spotfire, ggplot2, ...)
Step 5: Build analytic models (Python, R, Mahout)
Step 6: Evaluate model “Goodness of fit” (coefficients, confidence level)
Design thinking stages:
Empathize
• Individual & small group interviews
• Seek to understand; non-judgmental
Define
• Personas (objectives, decisions, challenges)
• Envisioning variables that might be better predictors of performance
Ideate
• Data visualization
• Descriptive analytics
• Illustrative analytics
Prototype
• Predictive analytics
• Prescriptive analytics
Test
• Goodness of fit
• Codify impediments
• Fail fast / learn faster / iterate
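As a rough illustration of Step 6, goodness of fit for a simple linear model is often summarized with the coefficient of determination (R squared). The sketch below is only one possible reading of the slide: the function names `fit_line` and `r_squared` and the monthly figures are hypothetical, not taken from the lecture.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: month index versus a measured quantity.
months = [1, 2, 3, 4, 5]
measured = [10.0, 12.1, 13.9, 16.2, 18.0]

slope, intercept = fit_line(months, measured)
r2 = r_squared(months, measured, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.4f}")
```

A value of R squared close to 1 means the line explains most of the variation; a low value is a signal to iterate, in the "fail fast / learn faster" spirit of the Test stage.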
Data Science Process
Developing Data Science With Strategic Capability Guidelines
Analytics Capability Adoption Curve
[Figure: degree of support rising through the stages Awareness, Understanding, Acceptance, Adoption, Ownership]
ENABLE ANALYTICAL STRATEGY
Three bands, each answering one question:
• Insights: What happened?
• Foresights: What will happen?
• Optimize: What is the best we can do?
Stages, from lowest to highest maturity:
Raw Data
• Product, Sales, Inventory, Customer
Standard Reporting
• Comp Sales
• Sell-thru
OLAP Reporting
• Multi-dimensional querying
• Drill-thru
• Drill-Across
Insights/Limited What-if
• Basic scenario analysis
Descriptive Modeling
• Describe historical event
• Insights in inference & causality
Predictive Modeling
• Modeling targeted to enable decisions
Optimization
• Prescription of best choice amongst a complex web of options
What is Agile Methodology?
• Mapping processes to determine the starting point and the “Actual State”.
• Determining KPIs, reports, and other metrics, per user, in order to monitor that the process is working as agreed.
Agile Methodology
The slide shows three iterations. Each iteration runs the same five phases and feeds into the next iteration.
Data Understanding
• Collect Data
• Describe Data
• Explore Data
• Verify Data Quality
Data Preparation
• Select Data
• Clean Data
• Integrate Data
• Format Data
Modeling
• Select Modeling Technique
• Generate Test Design
• Build Model
• Assess Model
Evaluation
• Evaluate Results
• Review Process
• Determine Next Steps
• Evaluate results with domain experts
Deployment
• Plan Deployment
• Plan Monitoring
• Review Project
• Results / Outcomes
Emerging Technologies
Top Emerging Technologies
In this Class
1 Artificial Intelligence
2 Machine Learning and Deep Learning
3 Natural Language Processing
4 Computer Vision
9 Smart Cities
Statistics About Emerging Technologies
The Internet of Things & Smart Cities
• $1.5 trillion potential market
• 50 billion devices
• $20 billion by 2050 on sensors alone
Artificial Intelligence
• 80% to 90% of the world's data is unstructured
Quantum Computing
• $39.2 million potential market in 2017
• $2.2 billion in 2025
Cybersecurity
• 4.5 billion records breached in the first half of 2018
• Hackers attack a computer every 39 seconds
Artificial Intelligence
Timeline:
• The term ‘Artificial Intelligence’ was coined by John McCarthy
• ‘Shakey’, the first general-purpose mobile robot, was built
• The supercomputer ‘Deep Blue’ was designed; it defeated the world chess champion in a game
• The first commercially successful robotic vacuum cleaner was created
• Speech recognition, video analytics, industrial robots, smart homes, and many more
Artificial Intelligence Future
Current Status of Artificial Intelligence
• Robotics
• Expert Systems: deterministic rules & processes & decisions
• Supervised learners (predictors): Naïve Bayes classifiers, Decision Tree, Regression Trees
• Associative learners: Clustering, K-Means
An Expert System
• Knowledge Base: a huge, organized set of knowledge about a particular subject. It contains facts and judgmental knowledge, which give it the ability to guess like a human.
[Diagram: a non-expert user sends a Query to the Expert System and receives Advice. Inside the Expert System, the User Interface talks to the Inference Engine, which draws on the Knowledge Base; the Knowledge Base is filled with knowledge from an expert.]
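The three parts of the diagram (knowledge base, inference engine, user interface) can be sketched in a few lines. This is a minimal illustration using forward chaining, which is one common way to build an inference engine, not necessarily the one the lecture has in mind. The rules, fact names, and the `infer` and `ask` functions are all hypothetical.

```python
# Knowledge base: if-then rules captured from an expert.
# All rule and fact names are hypothetical, for illustration only.
RULES = [
    ({"pressure_dropping", "flow_rate_dropping"}, "possible_pump_wear"),
    ({"possible_pump_wear", "vibration_high"}, "advise_pump_inspection"),
]


def infer(facts, rules):
    """Inference engine: forward chaining until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Fire the rule when all its conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def ask(observations):
    """User interface: take the user's query and return the advice found."""
    derived = infer(observations, RULES)
    return sorted(fact for fact in derived if fact.startswith("advise_"))


advice = ask({"pressure_dropping", "flow_rate_dropping", "vibration_high"})
print(advice)
```

Keeping the rules as plain data, separate from `infer`, mirrors the diagram: an expert can extend the knowledge base without touching the inference engine.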
Tools For Artificial Intelligence
Advantages & Disadvantages of A.I
Advantages
• The chances of error are almost nil
• It can be used to explore space and the depths of the ocean
• Smartphones are the greatest example of AI
• It can be used in time-consuming tasks efficiently
• Algorithms can help doctors assess patients and their health risks
• Machines do not require sleep or breaks and are able to function without stopping
Disadvantages
• High cost
• Decrease in demand for human labor
• AI may be programmed to do something devastating
• Machine ethics
• The storage and success are not as effective as human brains
• No improvement with experience
How to choose technology to implement data science in any organization?
• Examine their domain expertise
• Study their experiences
• Read reviews on specialized sites
• Look through their development activities
• Learn their workflow and best methodologies
Machine Learning & Deep Learning
Types of Learning
Supervised: Learning with a labeled training set
Example: email classification with already labeled emails
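The email example can be made concrete with a very small classifier. The sketch below uses a Naïve Bayes model with Laplace smoothing as one possible choice of supervised learner. The four training emails, their labels, and the `train` and `predict` functions are hypothetical and far too small for real use.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled training set: (email text, label).
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached for review", "ham"),
]


def train(examples):
    """Count words per label; these counts are the whole 'model'."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.split():
            # Laplace smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][word]
            score += math.log((count + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(predict("free prize", model))
print(predict("monday project review", model))
```

The labels in `TRAINING` are what make this supervised: the model only learns which words go with which class because each example arrives already labeled.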
Machine Learning
Machine Learning is a field of computer science that gives computers the
ability to learn without being explicitly programmed
Machine Learning vs Traditional Programming
Traditional Programming: Data + Program → Computer → Output
Machine Learning Approach: Data + Output → Computer → Program
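The contrast can be shown with a toy rule. In the first function a person writes the rule; in the second, the rule (a threshold) is derived from data plus known outputs. The temperatures, labels, and function names are hypothetical, and searching over a single threshold stands in for a real learning algorithm.

```python
def rule_based_is_hot(temp_c):
    """Traditional programming: the rule is written by hand."""
    return temp_c >= 30


def learn_threshold(temps, labels):
    """Machine learning approach: derive the rule from data and outputs.

    Tries every midpoint between neighbouring temperatures and keeps the
    one that classifies the most training examples correctly.
    """
    points = sorted(set(temps))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]

    def n_correct(threshold):
        return sum((t >= threshold) == y for t, y in zip(temps, labels))

    return max(candidates, key=n_correct)


# Hypothetical data (inputs) and known outputs (labels).
temps = [18, 22, 25, 31, 34, 38]
labels = [False, False, False, True, True, True]

threshold = learn_threshold(temps, labels)


def learned_is_hot(temp_c):
    """The 'program' produced by learning: same shape as the manual rule."""
    return temp_c >= threshold


print(threshold, learned_is_hot(35), learned_is_hot(20))
```

Both functions end up with the same form; the difference is where the number comes from, which is the point of the two diagrams above.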
Machine Learning - Workflow
Data Preprocessing
• Emoji translator
• Stop words removal
• Lemmatization
• Stemming
• etc.
Feature Engineering
• EDA
• Deriving new features from available attributes
Feature Selection
• Pandas
• Correlation
• Feature selection
• Sklearn
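The three stages can be sketched end to end. The slide names Pandas and Sklearn as tools; the sketch below uses only the standard library so that it stays self-contained, and every name and value in it is hypothetical: the stop-word list, the crude suffix stemmer, the derived features, and the 0.5 correlation cut-off.

```python
import math

STOP_WORDS = {"the", "is", "a", "and", "are", "of"}  # tiny illustrative list


def preprocess(text):
    """Data preprocessing: lowercase, drop stop words, crude stemming."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    stemmed = []
    for token in tokens:
        for suffix in ("ing", "ed", "s"):
            # Strip one known suffix, but keep at least three characters.
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                token = token[: -len(suffix)]
                break
        stemmed.append(token)
    return stemmed


def engineer_features(tokens):
    """Feature engineering: derive new numeric features from the tokens."""
    n = len(tokens)
    avg_len = sum(len(t) for t in tokens) / n if n else 0.0
    return {"n_tokens": n, "avg_token_len": avg_len}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, min_abs_corr=0.5):
    """Feature selection: keep columns that correlate with the target."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= min_abs_corr]


tokens = preprocess("The pumps are failing and leaking")
features = engineer_features(tokens)

# Hypothetical feature columns for four documents, plus a binary target.
columns = {"n_tokens": [3, 5, 2, 6], "noise": [1, 2, 2, 1]}
target = [0, 1, 0, 1]
selected = select_features(columns, target)
print(tokens, features, selected)
```

A real project would swap the suffix stripping for a proper stemmer or lemmatizer and compute correlations on a data frame, but the order of the stages stays the same.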