Week 3 - LAQ

The document discusses the data science life cycle which includes problem identification, data collection, data preprocessing, data analysis, data modeling, model evaluation, model training, and model deployment. It describes each step and who is typically involved at each stage such as domain experts, data scientists, machine learning engineers, and data engineers.

Uploaded by

G Kishore

Explain the Data Science Life Cycle in detail.

A data science life cycle is the iterative sequence of steps taken to build, deliver and
maintain a data science product. Not all data science projects are built the same way, so
their life cycles vary as well. Still, we can picture a general life cycle that covers the
most common steps: data extraction, preparation, cleansing, modeling, evaluation and so on.
This general process typically relies on machine learning algorithms and statistical
practices to produce better prediction models. A widely used reference model for it is the
“Cross Industry Standard Process for Data Mining” (CRISP-DM).
We will go through these steps individually in the following sections and see how
businesses carry them out in data science projects. But before that, let us look at the
professionals involved in a data science project.

Who Is Involved in the Projects?

• Domain Expert: Data science projects are applied in different real-world domains or
industries such as banking, healthcare and the petroleum industry. A domain expert is a
person who has experience working in a particular domain and knows it inside out.
• Business Analyst: A business analyst is needed to understand the business needs in the
identified domain. This person can guide the team in devising the right solution and a
timeline for it.
• Data Scientist: A data scientist is an expert in data science projects, has experience
working with data, and can work out what data is needed to produce the required solution.
• Machine Learning Engineer: A machine learning engineer can advise on which model to
apply to get the desired output and can devise a solution that produces the correct and
required output.
• Data Engineer and Architect: Data architects and data engineers are the experts in the
modeling of data. They look after the visualization of data for better understanding, as
well as its storage and efficient retrieval.

The Lifecycle of Data Science

The major steps in the life cycle of a Data Science project are as follows:

1. Problem identification
This is a crucial step in any data science project. The first task is to understand how
data science can be useful in the domain under consideration and to identify the tasks
that would benefit from it. Domain experts and data scientists are the key people in
problem identification. The domain expert has in-depth knowledge of the application domain
and knows exactly what problem is to be solved. The data scientist understands the domain
and helps identify the problem and possible solutions to it.

2. Business Understanding
Business understanding means understanding exactly what the customer wants from a business
perspective. Whether the customer wishes to make predictions, improve sales, minimize
losses or optimize a particular process, these wishes form the business goals. During
business understanding, two important steps are followed:
• KPI (Key Performance Indicator)
For any data science project, key performance indicators define the performance or
success of the project. The customer and the data science project team need to agree on
the business-related indicators and the corresponding data science project goals. The
business indicators are devised from the business need, and the project team then decides
its goals and indicators accordingly. For example, suppose the business need is to
optimise the overall spending of the company; the data science goal could then be to
manage double the number of clients with the existing resources. Defining the key
performance indicators is crucial for any data science project, because the cost of the
solution differs for different goals.
• SLA (Service Level Agreement)
Once the performance indicators are set, the service level agreement must be finalized.
Its terms are decided according to the business goals. For example, an airline reservation
system may need to process, say, 1000 users simultaneously. The requirement that the
product must satisfy this level of service then becomes part of the service level
agreement.
Once the performance indicators are agreed and the service level agreement is completed,
the project proceeds to the next important step.

3. Collecting Data
Data collection is an important step because it forms the base for achieving the targeted
business goals. Data flows into the system in various ways, as shown in Figure 2.

Basic data collection can be done using surveys, and the data collected through surveys
generally provides important insights. Much of the data comes from the various processes
followed in the enterprise. At each step, data is recorded in the software systems used in
the enterprise, and this data is important for understanding the process followed from
product development through deployment and delivery. Historical data available in archives
is also important for understanding the business better. Transactional data plays a vital
role as well, because it is collected on a daily basis. Many statistical methods are
applied to the data to extract the important business-related information. Data plays the
major role in a data science project, so proper data collection methods are important.

4. Pre-processing data
A large amount of data is collected from archives, daily transactions and intermediate
records. The data is available in various formats and forms, and some of it may exist only
as hard copy. It is also scattered across various servers in different places. All of this
data is extracted, converted into a single format and then processed. Typically, a data
warehouse is constructed in which the Extract, Transform, Load (ETL) operations are carried
out. The ETL operation is vital in a data science project. The data architect plays an
important role at this stage, deciding the structure of the data warehouse and performing
the ETL steps.
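
The ETL steps described above can be sketched in a few lines of Python. This is only an
illustrative sketch using the standard library: the two CSV sources, their column names
and the "orders" table are hypothetical, and an in-memory SQLite database stands in for a
real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw exports from two source systems with inconsistent formats.
SALES_CSV = """order_id,amount,order_date
1, 120.50 ,2024-01-05
2,80,2024-01-06
3,,2024-01-07
"""

ARCHIVE_CSV = """id;total;date
4;45.25;05/01/2024
"""


def extract(text, delimiter):
    """Extract: read rows from one source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform_sales(rows):
    """Transform: strip whitespace and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), float(amount), row["order_date"].strip()))
    return cleaned


def transform_archive(rows):
    """Transform: map archive columns and convert DD/MM/YYYY dates to ISO format."""
    cleaned = []
    for row in rows:
        day, month, year = row["date"].split("/")
        cleaned.append((int(row["id"]), float(row["total"]), f"{year}-{month}-{day}"))
    return cleaned


def load(conn, records):
    """Load: write the unified records into a single warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a data warehouse
load(conn, transform_sales(extract(SALES_CSV, ",")))
load(conn, transform_archive(extract(ARCHIVE_CSV, ";")))
row_count, total = conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(row_count, total)
```

Keeping extract, transform and load as separate functions mirrors the three ETL stages, so
a new source only needs its own transform step before it can be loaded into the same table.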

5. Analyzing data
Now that the data is available in the required format, the next important step is to
understand it in depth. This understanding comes from analysing the data with the various
statistical tools available. A data engineer plays a vital role in the analysis of data.
This step is also called Exploratory Data Analysis (EDA). Here the data is examined using
various statistical functions, and the dependent and independent variables (features) are
identified. Careful analysis reveals which data or features are important and how the data
is spread. Various plots are used to visualize the data for better understanding. Tools
such as Tableau and Power BI are popular for exploratory data analysis and visualization.
Knowledge of data science with Python and R is important for performing EDA on any type of
data.
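
As a minimal sketch of what this analysis can look like in Python, the example below
computes summary statistics (centre and spread) for one feature and its correlation with a
target. The two lists are made-up illustrative values, and the function names are
assumptions chosen for this example.

```python
import statistics

# Hypothetical observations: advertising spend (feature) and sales (target).
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 45.0, 65.0, 85.0, 105.0]


def describe(values):
    """Return basic summary statistics showing the centre and spread of the data."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation between a candidate feature and the target."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


summary = describe(ad_spend)
correlation = pearson(ad_spend, sales)
print(summary)
print(round(correlation, 3))
```

A correlation close to 1 or -1 suggests the feature is worth keeping as an independent
variable, while a value near 0 suggests it carries little linear information about the
target.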

6. Data Modelling
Data modelling is the next important step once the data has been analysed and visualized.
Only the important components are retained in the dataset, so the data is refined further.
The question now is how to model the data and which tasks are suitable for modelling.
Whether a task such as classification or regression is suitable depends on the business
value required. Within each of these tasks, many modelling approaches are available. The
machine learning engineer applies various algorithms to the data and generates the output.
While modelling, the models are often first tested on dummy data that is similar to the
actual data.
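
To illustrate a regression task tested on dummy data first, here is a small sketch that
fits a straight line by ordinary least squares. The dummy data follows a known rule
(y = 3x + 2), chosen here only so that the fitted model can be checked; the function names
are illustrative.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Dummy data generated from y = 3x + 2, so the expected fit is known in advance.
dummy_x = [0.0, 1.0, 2.0, 3.0, 4.0]
dummy_y = [2.0, 5.0, 8.0, 11.0, 14.0]

model = fit_line(dummy_x, dummy_y)
print(model)
print(predict(model, 10.0))
```

If the model recovers the known rule on dummy data, the same code can then be pointed at
the actual data with more confidence that any poor results come from the data, not from a
bug in the modelling code.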

7. Model Evaluation/ Monitoring

Since there are various ways to model the data, it is important to decide which one is
effective. This makes the model evaluation and monitoring phase crucial. The model is now
tested with actual data. If only a small amount of data is available, the output is
monitored for improvement. The data may change while the model is being evaluated or
tested, and the output can change drastically as a result. So, while evaluating the model,
the following two analyses are important:
• Data Drift Analysis: A change in the input data is called data drift. Data drift is a
common phenomenon in data science, because the data changes with the situation. Analysing
this change is called data drift analysis. The accuracy of the model depends on how well
it handles data drift. Changes in the data are mainly due to changes in its statistical
properties.
• Model Drift Analysis: Machine learning techniques can be used to discover drift. More
sophisticated methods such as Adaptive Windowing (ADWIN) and the Page-Hinkley test are
also available. Model drift analysis is important because change is constant. Incremental
learning can also be used effectively, where the model is exposed to new data
incrementally.
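
As a rough sketch of one of the drift detection methods named above, the following is a
simplified, one-sided Page-Hinkley test that flags an upward shift in the mean of a stream
(for example, a monitored input feature or a model's error). The class name, the default
values of delta and threshold, and the synthetic stream are assumptions made for
illustration, not tuned settings.

```python
class PageHinkley:
    """Simplified one-sided Page-Hinkley test for an increase in the stream mean."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # detection threshold (lambda)
        self.count = 0
        self.mean = 0.0
        self.cumulative = 0.0       # running sum of (x - mean - delta)
        self.minimum = float("inf") # smallest cumulative sum seen so far

    def update(self, value):
        """Add one observation; return True if drift is detected."""
        self.count += 1
        self.mean += (value - self.mean) / self.count
        self.cumulative += value - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return self.cumulative - self.minimum > self.threshold


def first_drift(stream, detector):
    """Return the index of the first detected drift, or None if there is none."""
    for index, value in enumerate(stream):
        if detector.update(value):
            return index
    return None


stable = [1.0] * 50   # synthetic values before the change
shifted = [3.0] * 50  # synthetic values after the mean shifts upward

detected_at = first_drift(stable + shifted, PageHinkley())
no_drift = first_drift(stable + stable, PageHinkley())
print(detected_at, no_drift)
```

A larger threshold makes the detector slower but less prone to false alarms. Detecting a
downward shift would need a mirrored statistic that tracks the maximum instead of the
minimum.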

8. Model Training
Once the task, the model and the drift analysis approach are finalised, the next important
step is to train the model. Training can be done in phases, in which the important
parameters are fine-tuned further to get the required accuracy. The model is then exposed
to actual data in the production phase and its output is monitored.
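
One possible reading of training in phases is sketched below: a linear model is trained
by gradient descent with a coarse learning rate first and a finer one afterwards. The
data, the learning rates and the number of epochs are illustrative assumptions, not
recommended values.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_phase(w, b, xs, ys, learning_rate, epochs):
    """Run batch gradient descent on mean squared error for one training phase."""
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Illustrative training data generated from y = 3x + 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, 8.0, 11.0, 14.0]

w, b = 0.0, 0.0
history = [mse(w, b, xs, ys)]  # loss before training and after each phase
for learning_rate, epochs in [(0.1, 200), (0.01, 200)]:  # coarse phase, then fine phase
    w, b = train_phase(w, b, xs, ys, learning_rate, epochs)
    history.append(mse(w, b, xs, ys))

print(round(w, 3), round(b, 3))
print(history)
```

Recording the loss after each phase gives a simple way to monitor whether further
fine-tuning is still improving the model.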

9. Model Deployment
Once the model has been trained with actual data and its parameters have been fine-tuned,
it is deployed. The model is now exposed to real-time data flowing into the system and
generates output. It can be deployed as a web service or embedded in an edge or mobile
application. This is a very important step, because the model is now exposed to the real
world.
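
A minimal sketch of deploying a model as a web service, using only Python's standard
library, could look like this. The model parameters, the JSON field name "x", and the
host and port are all hypothetical; a production deployment would normally use a proper
web framework and server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL = {"slope": 3.0, "intercept": 2.0}  # hypothetical trained parameters


def predict(x):
    """Apply the (hypothetical) trained linear model to one input."""
    return MODEL["slope"] * x + MODEL["intercept"]


def handle_payload(body):
    """Parse a JSON request body and return a JSON response body."""
    data = json.loads(body)
    return json.dumps({"prediction": predict(float(data["x"]))})


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            response = handle_payload(self.rfile.read(length))
            status = 200
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing "x", or a non-numeric value
            response = json.dumps({"error": "expected a JSON object with a numeric 'x' field"})
            status = 400
        payload = response.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(host="127.0.0.1", port=8000):
    """Create the server; call serve_forever() on the result to start it."""
    return HTTPServer((host, port), PredictionHandler)
```

Calling make_server().serve_forever() starts the service, after which a client can POST a
body such as {"x": 2} to it and receive the prediction as JSON. Keeping handle_payload
separate from the HTTP handler makes the prediction logic easy to test without a network.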

10. Deriving insights and generating BI reports
After the model is deployed in the real world, the next step is to find out how it behaves
in real-world scenarios. The model is used to obtain insights that aid strategic business
decisions, and the business goals are tied to these insights. Various reports are
generated to see how the business is performing. These reports help determine whether the
key performance indicators have been achieved.

11. Taking a decision based on insight

For data science to do wonders, every step described above has to be carried out carefully
and accurately. When the steps are followed properly, the reports generated in the
previous step help in making key decisions for the organization. The insights generated
support strategic decisions; for example, the organization can predict its need for raw
materials in advance. Data science can be of great help in making many important decisions
related to business growth and better revenue generation.
