Lecture 5 - Lifecycle of A Data Science Project
But first,
• Check your groups
• Contact your team members
• Decide datasets
• Discuss contributions
• Email from TAs
• Language preferences
• Schedule group meeting with TAs (preferably after)
• Some changes in the scoring scheme for the group assignment
• Questions on Slack
We are looking for 5-8 students to comprise the reference group. The purpose of the reference group is to provide
constructive feedback about the course through an ongoing open dialogue with other students throughout the semester.
You can read more about the tasks of the reference group at this link.
If you want to sign up to be a member of the reference group, use this link.
A survey to evaluate the course will be sent to all students during the last week.
CRISP-DM, with a use case
What is CRISP-DM
Cross-industry standard process for data mining (CRISP-DM)
Source
An open standard developed in 1996 by leading companies in data analysis
[Figure: CRISP-DM cycle, including a "Maintenance and monitoring" stage]
Business Understanding
• Initially, it is vital to understand the problem to be solved
Documentation
• One-pager
• Design document
Data analytics
Examining data to answer questions, identify trends, and extract insights.
Descriptive analysis
• Pull trends from raw data and describe them succinctly
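A minimal sketch of descriptive analysis with pandas. It assumes a hypothetical hourly series called `load` filled with synthetic numbers (not course data). The example produces summary statistics and a daily-mean trend.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly measurements for one week (synthetic, for illustration only)
rng = np.random.default_rng(0)
index = pd.date_range("2021-01-01", periods=7 * 24, freq=pd.Timedelta(hours=1))
daily_cycle = 10 * np.sin(2 * np.pi * index.hour.to_numpy() / 24)
load = pd.Series(50 + daily_cycle + rng.normal(0, 2, len(index)), index=index, name="load")

# Succinct description of the raw data: count, mean, std, min, quartiles, max
summary = load.describe()

# A trend pulled from the raw data: the average value per day
daily_mean = load.resample("D").mean()

print(summary.round(2))
print(daily_mean.round(2))
```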
Diagnostic analysis
• Comparing coexisting trends or movement, uncovering correlations between
variables, and determining causal relationships where possible.
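Uncovering correlations between variables can be sketched with a correlation matrix. The sketch assumes synthetic data: `load` is built to fall as `temperature` rises, and `noise` is unrelated to both.

```python
import numpy as np
import pandas as pd

# Synthetic data (hypothetical): load depends on temperature; "noise" is unrelated
rng = np.random.default_rng(1)
temperature = rng.uniform(-10, 25, 500)
df = pd.DataFrame({
    "temperature": temperature,
    "load": 100 - 2.0 * temperature + rng.normal(0, 5, 500),
    "noise": rng.normal(0, 1, 500),
})

# Pearson correlation between every pair of columns
corr = df.corr()
print(corr.round(2))
```

A strong correlation is only a starting point for diagnosis. Establishing that temperature *causes* the change in load needs domain knowledge or an experimental design, which is why the slide says "where possible".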
Spurious Correlations
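One common source of spurious correlation is a hidden confounder that drives both variables. The sketch below uses invented numbers, with temperature assumed to drive both ice cream sales and drownings. It compares the raw correlation with a partial correlation computed after removing a linear fit on temperature.

```python
import numpy as np

# Synthetic example (hypothetical numbers): temperature drives both series
rng = np.random.default_rng(42)
n = 365
temperature = rng.uniform(10, 35, n)
ice_cream_sales = 20 * temperature + rng.normal(0, 20, n)
drownings = 0.3 * temperature + rng.normal(0, 0.5, n)

# The raw correlation between the two outcomes looks strong
raw_corr = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(y, x):
    """Residuals of y after removing a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Controlling for temperature (partial correlation) makes it largely disappear
partial_corr = np.corrcoef(residuals(ice_cream_sales, temperature),
                           residuals(drownings, temperature))[0, 1]

print(f"raw: {raw_corr:.2f}, controlling for temperature: {partial_corr:.2f}")
```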
Predictive analysis
• Predict future trends and events using the data at hand
Source
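A very small predictive sketch, assuming synthetic monthly data with a linear upward trend. The model is fitted only on earlier months and then checked on the last six months, which it has not seen. A straight-line trend is used purely for illustration; real forecasting models are usually richer.

```python
import numpy as np

# Hypothetical monthly observations with an upward trend (synthetic)
rng = np.random.default_rng(3)
t = np.arange(24)                                  # months 0..23
y = 200 + 5 * t + rng.normal(0, 3, t.size)

# Fit on the first 18 months, forecast the last 6
train, test = slice(0, 18), slice(18, 24)
slope, intercept = np.polyfit(t[train], y[train], 1)

forecast = slope * t[test] + intercept
mae = np.mean(np.abs(forecast - y[test]))          # mean absolute error on held-out months
print(f"slope={slope:.2f}, MAE on held-out months={mae:.2f}")
```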
Prescriptive analysis
• Suggest actionable takeaways by weighing the relevant factors in a scenario
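Prescriptive analysis goes one step beyond a prediction. It evaluates candidate actions against predicted outcomes and recommends the best one. The sketch below makes several assumptions, all invented for illustration:

- demand scenarios are samples from a predictive model;
- the unit price is 10;
- the unit cost is 4.

It then recommends the order quantity with the highest expected profit.

```python
import numpy as np

# Hypothetical inputs: demand scenarios (e.g. samples from a forecast), price and cost
rng = np.random.default_rng(7)
demand_scenarios = np.maximum(rng.normal(100, 20, 10_000), 0)
price, cost = 10.0, 4.0

def expected_profit(order_qty: float) -> float:
    """Average profit across scenarios: sell min(order, demand), pay for everything ordered."""
    sold = np.minimum(order_qty, demand_scenarios)
    return float(np.mean(price * sold - cost * order_qty))

# Evaluate each candidate action and recommend the one with the highest expected profit
candidates = np.arange(50, 151)
profits = [expected_profit(q) for q in candidates]
best_qty = int(candidates[int(np.argmax(profits))])
print(f"recommended order quantity: {best_qty}")
```

For this particular objective, the optimal order is known in closed form: it is the (price - cost) / price quantile of demand, so about the 60th percentile here. The brute-force search is shown because it also works for objectives that have no closed form.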
Data Understanding
• If solving the business problem is the goal, the data
comprise the available raw material from which the
solution will be built
• https://www.kaggle.com/trnderenergikraft/grid-loss-time-series-dataset
2. Calendar features
3. Weather forecasts
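A first data-understanding pass on an hourly time series can be sketched as follows. The frame is a hypothetical stand-in: the column names `grid_loss` and `temperature_forecast` are illustrative and are not the grid-loss dataset's actual schema. The sketch deliberately drops three timestamps and blanks one value, then finds both problems.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly frame; column names are illustrative, not the real dataset schema
index = pd.date_range("2021-01-01", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"grid_loss": np.linspace(10, 20, 48),
                   "temperature_forecast": np.linspace(-5, 5, 48)}, index=index)
df = df.drop(index[[5, 6, 30]])       # simulate missing timestamps
df.iloc[10, 0] = np.nan               # simulate a missing value

# First look: types, ranges and missing values per column
print(df.dtypes)
print(df.describe())
missing_per_column = df.isna().sum()

# Time series usually need a regular index: find timestamps absent from the hourly grid
expected = pd.date_range(df.index.min(), df.index.max(), freq=pd.Timedelta(hours=1))
missing_timestamps = expected.difference(df.index)
print(missing_per_column)
print(missing_timestamps)
```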
Data Understanding
Dos and Don'ts
Data Preparation
1. Select data
• Select features
2. Clean data
• Correct data errors
• Make coding consistent
• Fill in or infer missing data
3. Construct data
• Generate derived attributes
4. Integrate data
• Merge information from different sources
5. Format data
• Convert to format convenient for modelling
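The five preparation steps can be sketched in pandas on two tiny hypothetical tables. All column names and values are invented, and the -999 error code is an assumption. Linear interpolation is just one possible way to fill missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical raw inputs: measurements plus a separate weather table
measurements = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01", periods=6, freq=pd.Timedelta(hours=1)),
    "load": [10.0, 11.0, np.nan, 13.0, -999.0, 15.0],   # -999 is an assumed error code
    "region": ["north", "North", "NORTH", "north", "north", "north"],
    "operator_note": ["", "", "", "ok", "", ""],
})
weather = pd.DataFrame({
    "timestamp": measurements["timestamp"],
    "temperature": [1.0, 0.5, 0.0, -0.5, -1.0, -1.5],
})

# 1. Select data: keep only the features we plan to use
df = measurements[["timestamp", "load", "region"]].copy()

# 2. Clean data: correct errors, make coding consistent, fill in missing values
df["load"] = df["load"].replace(-999.0, np.nan)
df["region"] = df["region"].str.lower()
df["load"] = df["load"].interpolate()               # linear interpolation between neighbours

# 3. Construct data: derive a new attribute
df["hour"] = df["timestamp"].dt.hour

# 4. Integrate data: merge the weather source on the shared timestamp key
df = df.merge(weather, on="timestamp", how="left")

# 5. Format data: numeric feature matrix X and target y for modelling
X = pd.get_dummies(df[["hour", "temperature", "region"]], columns=["region"]).to_numpy(dtype=float)
y = df["load"].to_numpy()
```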
2. Calendar features
• Categorical features
• Encoding?
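Two common answers to "Encoding?" for calendar features are sketched below on hypothetical hourly timestamps:

- One-hot encoding treats each weekday as an unordered category.
- Cyclical (sine/cosine) encoding keeps 23:00 and 00:00 as close together as any two adjacent hours.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly timestamps starting on Monday 2021-01-04 (two days)
ts = pd.Series(pd.date_range("2021-01-04", periods=48, freq=pd.Timedelta(hours=1)))

calendar = pd.DataFrame({
    "hour": ts.dt.hour,
    "weekday": ts.dt.dayofweek,                      # Monday=0 ... Sunday=6
    "is_weekend": (ts.dt.dayofweek >= 5).astype(int),
})

# Option A: one-hot encoding (only weekdays present in the data get a column)
weekday_onehot = pd.get_dummies(calendar["weekday"], prefix="weekday")

# Option B: cyclical encoding of the hour on the unit circle
calendar["hour_sin"] = np.sin(2 * np.pi * calendar["hour"] / 24)
calendar["hour_cos"] = np.cos(2 * np.pi * calendar["hour"] / 24)

# Distance between 23:00 and 00:00 in (sin, cos) space equals that of any adjacent hours
p23 = calendar.loc[calendar["hour"] == 23, ["hour_sin", "hour_cos"]].iloc[0]
p00 = calendar.loc[calendar["hour"] == 0, ["hour_sin", "hour_cos"]].iloc[0]
gap_23_to_00 = float(np.hypot(*(p23 - p00)))
```

Which option fits depends on the model. Tree-based models often cope with plain integer codes, while linear models usually benefit from one-hot or cyclical features.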
Modelling
1. Select modelling techniques
• Select an algorithm or a model
2. Build the model
• Feature selection
• Hyperparameter optimization
• Training and validation
3. Assess model
• Model performance on test dataset
• Time
• Other Key Performance Indicators (KPIs)
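The modelling steps can be sketched with scikit-learn on synthetic, time-ordered data. Ridge regression and the alpha grid are arbitrary choices for illustration. Feature selection is left out to keep the example short.

The sketch follows the three steps:

- Hold out the last 20% of samples as an untouched test set.
- Tune hyperparameters with time-ordered cross-validation on the remaining data.
- Report test error together with fit and prediction time as simple KPIs.

```python
import time
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic time-ordered data (hypothetical): 3 features, linear signal plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.3, 500)

# Keep time order: the last 20% is the untouched test set
split = int(0.8 * len(X))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# 1. Select a technique (Ridge, for illustration)
# 2. Build: hyperparameter optimisation with time-ordered cross-validation
search = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
                      cv=TimeSeriesSplit(n_splits=5), scoring="neg_mean_absolute_error")
start = time.perf_counter()
search.fit(X_train, y_train)                 # refits the best model on all training data
train_seconds = time.perf_counter() - start

# 3. Assess: performance on the test set, plus time as an extra KPI
start = time.perf_counter()
y_pred = search.predict(X_test)
predict_seconds = time.perf_counter() - start
test_mae = mean_absolute_error(y_test, y_pred)
print(f"best alpha={search.best_params_['alpha']}, test MAE={test_mae:.3f}, "
      f"fit {train_seconds:.3f}s, predict {predict_seconds:.4f}s")
```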
Breck, Eric, et al. "The ML test score: A rubric for ML production readiness and technical debt
reduction." 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017.
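As we read the scoring in Breck et al., the ML Test Score works like this:

- A rubric test earns 0.5 points if it is run manually with documented results.
- It earns 1 point if it is run automatically on a repeated basis.
- Points are summed within each of the four sections: data, model development, infrastructure, monitoring.
- The final score is the minimum of the four section scores.

Below is a minimal sketch of that arithmetic. The test statuses are invented, and only three tests per section are shown instead of the full rubric.

```python
from typing import Dict, List

# Points per test status, following our reading of Breck et al. (2017)
POINTS = {"none": 0.0, "manual": 0.5, "automated": 1.0}

def ml_test_score(sections: Dict[str, List[str]]) -> float:
    """Sum points within each section, then take the minimum across sections."""
    section_scores = {name: sum(POINTS[status] for status in statuses)
                      for name, statuses in sections.items()}
    return min(section_scores.values())

# Hypothetical statuses for a few tests in each section
example = {
    "features_and_data": ["automated", "manual", "none"],
    "model_development": ["automated", "automated", "manual"],
    "ml_infrastructure": ["manual", "none", "none"],
    "monitoring": ["automated", "manual", "manual"],
}
print(ml_test_score(example))
```

Taking the minimum means a team cannot compensate for neglected infrastructure tests by over-investing in model tests. The weakest section sets the score.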
Important Deadlines
When you will need to deliver or complete a task
Lecture Plan
Unpacking the course syllabus
Week 1 (23/8): Lecture 1: Introduction [Nisha Dalal]
Week 5 (20/9): No lecture
Week 6 (27/9): Lecture 5: Lifecycle of a Data Science project I [Nisha Dalal]
Week 8 (11/10): Lecture 7: Data Visualization & Storytelling [Manos Papagiannidis]
Week 12 (8/11): Lecture 10: Decision making with data science [Nisha Dalal]
https://tinyurl.com/2xxh5uhx
Questions & Discussion
Nisha Dalal