Data Science Methodologies
Different Methodologies: KDD, SEMMA, CRISP-DM, TDSP
• Managing data science (DS) projects involves navigating complexity through tailored project management strategies.
• This includes orchestrating diverse teams, handling extensive data sets, and ensuring alignment with both business objectives and rigorous scientific standards.
• Over the past few years, there has been considerable effort to standardize these methodologies and to define the best practices followed when building data science solutions and projects.
Different Methodologies for DS
• This section surveys the project management methodologies and frameworks used to build data mining solutions.
• We will look at:
• Knowledge Discovery in Databases (KDD),
• Cross-Industry Standard Process for Data Mining (CRISP-DM),
• SEMMA, which stands for Sample, Explore, Modify, Model, Assess and refers to the process of conducting a data mining project,
• Team Data Science Process (TDSP).
DIKW pyramid
• The DIKW (Data, Information, Knowledge, Wisdom) pyramid is also known variously as the DIKW hierarchy, wisdom hierarchy, knowledge hierarchy, information hierarchy, information pyramid, and data pyramid.
• It refers loosely to a class of models for representing purported structural and/or functional relationships between data, information, knowledge, and wisdom.
1. Knowledge Discovery in Databases (KDD)
• KDD is a comprehensive process used in data mining and machine learning to extract useful knowledge from large datasets.
• The KDD process typically consists of several stages, often represented as a sequence:
1. Selection:
Objective: The selection stage focuses on identifying the data relevant to the analysis and decision-making process and retrieving it from various sources.
Activities:
• Define the criteria for selecting data based on the problem domain and objectives.
• Gather data that meets the defined criteria from databases, data warehouses, or other sources.
• Ensure the collected data is comprehensive and representative of the problem at hand.
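The selection activities above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the KDD definition: each source is assumed to be an in-memory list of dictionaries, and the field names (`region`, `year`, ...), the criterion, and the helper `select_records` are all hypothetical. A real pipeline would typically push the criteria into a database or warehouse query instead.

```python
from typing import Any, Callable, Dict, Iterable, List

# Assumption: a record is a plain dict of field -> value (illustrative only).
Record = Dict[str, Any]

def select_records(sources: Iterable[List[Record]],
                   criterion: Callable[[Record], bool]) -> List[Record]:
    """Gather records from several sources, keeping those that meet the criterion."""
    return [record for source in sources for record in source if criterion(record)]

# Hypothetical sources: an operational database and a data warehouse extract.
sales_db = [
    {"customer_id": 1, "region": "north", "year": 2023, "amount": 120.0},
    {"customer_id": 2, "region": "south", "year": 2021, "amount": 80.0},
]
warehouse = [
    {"customer_id": 3, "region": "north", "year": 2024, "amount": 95.5},
    {"customer_id": 4, "region": "north", "year": 2019, "amount": 40.0},
]

def is_relevant(record: Record) -> bool:
    """Hypothetical criterion derived from the problem: recent records for one region."""
    return record["region"] == "north" and record["year"] >= 2023

selected = select_records([sales_db, warehouse], is_relevant)
print([r["customer_id"] for r in selected])  # [1, 3]

# A simple completeness check: every selected record has the fields the analysis needs.
REQUIRED_FIELDS = {"customer_id", "region", "year", "amount"}
incomplete = [r for r in selected if not REQUIRED_FIELDS.issubset(r)]
print(len(incomplete))  # 0
```

Keeping the criterion as a separate function makes the selection rules explicit and easy to revise as the problem definition changes; checking representativeness properly (for example, comparing selected proportions against the full population) would need domain knowledge beyond this sketch.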
2. Pre-processing:
Activities:
• Clean the data by handling missing values, outliers, and noise.
• Normalize or standardize data to ensure consistency and
comparability across different variables.
• Perform feature selection or extraction to identify the attributes that contribute most to the analysis.
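The pre-processing activities can be illustrated with a small standard-library sketch. The specific techniques are assumptions chosen for illustration, not prescribed by KDD: missing values are marked `None` and filled with the column mean, outliers are clipped to the 1.5 * IQR fences, scaling uses min-max normalization by default (z-score standardization is offered as an alternative), and feature selection is reduced to dropping columns whose scaled variance does not exceed a threshold. All function names and the sample data are hypothetical.

```python
import statistics

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def clip_outliers_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

def min_max(values):
    """Normalize values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Standardize to mean 0 and population standard deviation 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

def preprocess(columns, scale=min_max, min_variance=0.0):
    """Clean and scale each column, keeping those whose variance exceeds min_variance."""
    kept = {}
    for name, values in columns.items():
        cleaned = clip_outliers_iqr(impute_mean(values))  # missing values, then outliers
        scaled = scale(cleaned)                            # normalization or standardization
        if statistics.pvariance(scaled) > min_variance:   # simple variance-based selection
            kept[name] = scaled
    return kept

data = {
    "age": [25, None, 31, 40, 29],
    "income": [30.0, 32.0, 31.0, 1000.0, 29.0],  # 1000.0 is an outlier
    "constant": [1, 1, 1, 1, 1],                 # carries no information
}
clean = preprocess(data)
print(sorted(clean))  # ['age', 'income']
```

With min-max scaling, `income` lands in [0, 1] once its outlier is clipped, and the constant column is dropped because it has no variance. Variance thresholding is only one simple heuristic; correlation-based or model-based selection, and feature extraction methods such as PCA, are common alternatives.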
3. Transformation: