Introduction To CRISP DM Framework For Data Science and Machine Learning
Chapter 1 - Introduction to CRISP DM Framework for Data Science and Machine Learning
Published on June 21, 2018
CRISP DM Framework
In my first post, I would like to discuss the basic framework that is normally used and implemented in any Data Science/ML project. It is very important for anyone working on such a project to follow a streamlined approach to creating a Machine Learning model. Doing so also ensures that we do not miss any of the steps required to create our Machine Learning model.
Of the many such methodologies available, one of the most widely used is the CRISP DM (Cross-Industry Standard Process for Data Mining) Framework, which consists of the following six phases:
https://www.linkedin.com/pulse/chapter-1-introduction-crisp-dm-framework-data-science-anshul-roy/ 1/7
10/12/2018 Chapter 1 - Introduction to CRISP DM Framework for Data Science and Machine Learning | LinkedIn
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment
I will try to cover the key activities and the significance of each of these phases.
1. Business Understanding –
This step mostly focuses on understanding the business from all its different aspects. It involves the following steps:
d. Flow Chart
2. Data Understanding –
Note: Independent variables in the data are the variables used to make Machine Learning predictions. Dependent variables are the variables we need to predict.
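To make the distinction concrete, here is a minimal Python sketch, assuming the data is held as a list of dictionaries (one per row) and that the column to predict is known in advance. The helper `split_features_target` and the column names are hypothetical, chosen only for illustration.

```python
from typing import Any


def split_features_target(rows: list[dict[str, Any]], target: str):
    """Split each row into independent variables (X) and the dependent variable (y)."""
    X = [{k: v for k, v in row.items() if k != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y


# Hypothetical loan data: "defaulted" is the dependent variable we want to predict.
rows = [
    {"income": 52000, "age": 34, "defaulted": 0},
    {"income": 18000, "age": 22, "defaulted": 1},
]
X, y = split_features_target(rows, target="defaulted")
print(X)  # [{'income': 52000, 'age': 34}, {'income': 18000, 'age': 22}]
print(y)  # [0, 1]
```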
3. Data Preparation:
In this step, we prepare and clean the provided data. Several steps should be followed to complete the data preparation phase.
a. The first and foremost step is NA treatment. Real-world data is rarely clean and usually contains NA (missing) values. We must identify such values and fill or replace them appropriately. There are many NA-treatment techniques, and there are packages in R and Python that treat such values automatically based on some default logic. However, it is always good to do this manually: that way we understand the data even better and can replace the NAs based on our understanding of the business requirement. An article from Analytics Vidhya explains R packages that can perform NA treatment automatically.
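As an illustration of manual NA treatment, the sketch below first counts missing values per column and then fills numeric columns with the column median and other columns with the most frequent value. It assumes plain Python rows (dictionaries) with `None` marking an NA; the median/mode rule is only one common default, and the business context may call for a different fill value.

```python
from collections import Counter
from statistics import median
from typing import Any


def count_na(rows: list[dict[str, Any]]) -> dict[str, int]:
    """Count missing values (None) per column, to inspect before deciding how to fill."""
    counts: Counter = Counter()
    for row in rows:
        for col, val in row.items():
            if val is None:
                counts[col] += 1
    return dict(counts)


def fill_na(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return new rows with numeric NAs filled by the column median, others by the mode."""
    columns = {col for row in rows for col in row}
    fill_values: dict[str, Any] = {}
    for col in columns:
        present = [row[col] for row in rows if row.get(col) is not None]
        if not present:
            continue  # column is entirely NA: leave it for manual review
        if all(isinstance(v, (int, float)) for v in present):
            fill_values[col] = median(present)
        else:
            fill_values[col] = Counter(present).most_common(1)[0][0]
    return [
        {col: fill_values.get(col) if val is None else val for col, val in row.items()}
        for row in rows
    ]


# Hypothetical data: None marks an NA.
rows = [
    {"age": 30, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 40, "city": None},
    {"age": 50, "city": "Pune"},
]
print(count_na(rows))  # {'age': 1, 'city': 1}
filled = fill_na(rows)
print(filled[1]["age"], filled[2]["city"])  # 40 Pune
```

Counting before filling mirrors the point above: looking at where the NAs are is what lets us choose a fill value that makes business sense.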
b. The next step is to treat Nulls. This step is as important as NA treatment, and from my experience I follow the steps below for Null treatment. They may vary depending on the data at hand, but they should help to some extent.
Feature Engineering is one such step that deserves further exploration and can contribute significantly to the outcome.
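A small sketch of what feature engineering can look like: deriving new variables from existing ones. The columns (`income`, `loan_amount`, `application_date`) and the derived features are hypothetical examples, not taken from the article.

```python
from datetime import date
from typing import Any


def add_features(row: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the row with a few derived (engineered) features added."""
    out = dict(row)
    # Ratio feature: size of the requested loan relative to income (None if income is 0).
    out["loan_to_income"] = row["loan_amount"] / row["income"] if row["income"] else None
    # Date decomposition: raw dates are rarely useful to a model, but their parts can be.
    d: date = row["application_date"]
    out["application_month"] = d.month
    out["is_weekend"] = d.weekday() >= 5  # weekday(): Monday=0 ... Sunday=6
    return out


row = {"income": 50000, "loan_amount": 12500, "application_date": date(2018, 6, 23)}
print(add_features(row))
```

Returning a copy keeps the raw data untouched, so different feature sets can be tried and compared.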
4. Modeling:
Once the above steps are complete, the data meets the basic prerequisites for ML, and we can proceed to apply different ML algorithms. The choice of algorithm depends entirely on the business requirement, the available data and the desired outcome. Ideally, we should try different algorithms or combinations of algorithms (ensembles) to arrive at the best final model. We will discuss the different ML algorithms in detail later.
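To illustrate trying several algorithms, plus an ensemble of them, and keeping the best one, here is a dependency-free Python sketch. The three models (a majority-class baseline, a 1-nearest-neighbour classifier and a decision stump) and the toy data are stand-ins chosen for brevity; in practice one would use a library such as scikit-learn and real data.

```python
from collections import Counter
from typing import Callable

Point = tuple[float, ...]
Model = Callable[[Point], int]


def fit_majority(X: list[Point], y: list[int]) -> Model:
    """Baseline: always predict the most common class seen in training."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def fit_1nn(X: list[Point], y: list[int]) -> Model:
    """1-nearest-neighbour: copy the label of the closest training point."""
    def predict(x: Point) -> int:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return y[dists.index(min(dists))]
    return predict


def fit_stump(X: list[Point], y: list[int]) -> Model:
    """Decision stump: predict 1 when feature 0 >= t, choosing t by training accuracy."""
    best_t, best_acc = X[0][0], -1.0
    for t in sorted(x[0] for x in X):
        acc = sum((x[0] >= t) == bool(label) for x, label in zip(X, y)) / len(y)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x[0] >= best_t)


def vote(models: list[Model]) -> Model:
    """Simple ensemble: majority vote over the individual models' predictions."""
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]


def accuracy(model: Model, X: list[Point], y: list[int]) -> float:
    return sum(model(x) == label for x, label in zip(X, y)) / len(y)


# Toy training and validation data (hypothetical).
X_train = [(1.0, 1.0), (2.0, 1.5), (3.0, 2.0), (6.0, 5.0), (7.0, 6.5), (8.0, 6.0)]
y_train = [0, 0, 0, 1, 1, 1]
X_val = [(1.5, 1.2), (2.5, 2.0), (6.5, 5.5), (7.5, 6.0)]
y_val = [0, 0, 1, 1]

base = [f(X_train, y_train) for f in (fit_majority, fit_1nn, fit_stump)]
candidates = {"majority": base[0], "1-NN": base[1], "stump": base[2], "vote": vote(base)}

# Score every candidate on held-out validation data and keep the best one.
scores = {name: accuracy(m, X_val, y_val) for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

On this tiny data set several candidates tie, and `max` simply keeps the first; with real data one would compare candidates using cross-validation rather than a single small split.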
6. Deployment:
Finally, once the model has been built and evaluated on the test and validation data, it is presented to the business (with a PPT). The model then undergoes further real-time evaluation and testing, such as A/B testing, and after all the approvals the code is pushed to production (PROD), where it runs on live data.
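For the A/B testing step, one common way (not necessarily the one the author's team used) to decide whether a new model beats the current one is a two-proportion z-test on a success metric such as conversion rate. The sketch below assumes users were randomly split between the current model (A) and the new model (B); the function name and the counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value). Uses the pooled proportion under the null hypothesis
    that both variants convert at the same rate (normal approximation).
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all-success or all-failure data: the test is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value


# Hypothetical results: A = current model, B = new model.
z, p = two_proportion_z_test(success_a=200, n_a=5000, success_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
if p < 0.05:
    print("B's conversion rate differs significantly from A's at the 5% level")
```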
Anshul Roy
Sr. Data Engineer/Data Scientist on R/Python/Spark/Scala/Big Data