Notes On Data Science Methodologies
Deriving answers
1. How can the data be visualized to arrive at the required answer?
2. Does the model really answer the initial question, or does it need to be adjusted?
3. Can you put the model into practice?
4. Can you get constructive feedback on how well the questions were answered?
3. Data Preparation: Once the data has been collected, it must be transformed into
a usable subset, unless it is determined that more data is needed. Once a dataset
is chosen, it must be checked for questionable, missing, or ambiguous cases.
Data Preparation is common to both CRISP-DM and the Foundational Methodology.
4. Modeling: Once prepared for use, the data must be expressed through
appropriate models that give meaningful insights and, ideally, new knowledge. This
is the purpose of data mining: to create knowledge that has meaning
and utility. The use of models reveals patterns and structures within the data that
provide insight into the features of interest. Models are selected on a portion of
the data, and adjustments are made if necessary. Model selection is both an art and
a science. Both the Foundational Methodology and CRISP-DM are required for the
subsequent stage.
6. Deployment: In the deployment step, the model is used on new data outside
the scope of the original dataset and by new stakeholders. The new interactions at
this phase might reveal new variables and needs for the dataset and model. These
new challenges could initiate revision of the business needs and actions, the
model and data, or both.
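As a rough illustration of the Data Preparation and Modeling steps above, the following is a minimal Python sketch, not a prescribed procedure. The records, the field names (`id`, `age`, `income`), the rule that a negative income is "questionable", and the 25% holdout fraction are all assumptions made for the example. It flags rows with missing or questionable values, then sets aside a portion of the clean data on which candidate models could be compared.

```python
import random

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},   # missing age
    {"id": 3, "age": 29, "income": -1},        # questionable: negative income
    {"id": 4, "age": 41, "income": 48000},
    {"id": 5, "age": 56, "income": 75000},
    {"id": 6, "age": 23, "income": 39000},
]

def is_clean(row):
    """Keep a row only if no field is missing and income is non-negative."""
    if any(value is None for value in row.values()):
        return False
    return row["income"] >= 0

def split_holdout(rows, holdout_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, holdout)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Data preparation: separate usable rows from rows that need review.
clean = [row for row in records if is_clean(row)]
flagged = [row["id"] for row in records if not is_clean(row)]

# Modeling: reserve a portion of the clean data for comparing models.
train, holdout = split_holdout(clean)

print("flagged for review:", flagged)
print("train size:", len(train), "holdout size:", len(holdout))
```

In practice the flagged rows would be reviewed rather than silently dropped, since they may indicate that more data is needed.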
CRISP-DM is a highly flexible and cyclical model. Flexibility is required at each step,
along with communication, to keep the project on track. At any of the six stages, it may be
necessary to revisit an earlier stage and make changes. The key point is that the process
is cyclical: even at the finish, you have another business
understanding encounter to discuss the model's viability after deployment. The journey
continues.
The CRISP-DM model is flexible and can be customized easily. For example, if your
organization aims to detect money laundering, it is likely that you will sift through large amounts
of data without a specific modeling goal. Instead of modeling, your work will focus on data
exploration and visualization to uncover suspicious patterns in financial data. CRISP-DM allows
you to create a data mining model that fits your particular needs.
In such a situation, the modeling, evaluation, and deployment phases might be less relevant than
the data understanding and preparation phases. However, it is still important to consider some of
the questions raised during these later phases for long-term planning and future data mining
goals.
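The exploration-focused use of CRISP-DM described above can be sketched in a few lines of Python. This is only one possible illustration: the transactions are made up, and scoring amounts by their distance from the median in units of the median absolute deviation (MAD) with a threshold of 5 is an assumed heuristic, not a method taken from CRISP-DM or from any anti-money-laundering standard.

```python
from statistics import median

# Hypothetical (account, amount) transactions; values are illustrative only.
transactions = [
    ("A", 120.0), ("A", 95.0), ("B", 110.0), ("B", 130.0),
    ("C", 105.0), ("C", 9800.0), ("D", 115.0), ("D", 100.0),
]

def flag_unusual(rows, threshold=5.0):
    """Return rows whose amount lies more than `threshold` MADs from the median."""
    amounts = [amount for _, amount in rows]
    center = median(amounts)
    mad = median(abs(amount - center) for amount in amounts)
    if mad == 0:
        # All amounts are (nearly) identical, so nothing stands out.
        return []
    return [
        (account, amount)
        for account, amount in rows
        if abs(amount - center) / mad > threshold
    ]

for account, amount in flag_unusual(transactions):
    print(f"account {account}: amount {amount} looks unusual")
```

A flag like this is a starting point for visual inspection and follow-up questions, not a conclusion about any account.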