Data Science Process

Uploaded by

krishnan

The data science process consists of six stages:


1. Discovery or setting the research goal
2. Retrieving data
3. Data preparation
4. Data exploration
5. Data modeling
6. Presentation and automation
• Step 1: Discovery or setting the research goal
• This step involves understanding the business question and defining the
goal of the project, including identifying the internal and external
data sources that can help answer it.
• Step 2: Retrieving data
• This is the collection of the data required for the project. It also
involves gaining a business understanding of the data we have and
deciphering what each piece of data means.
• This could entail determining exactly what data is required and the
best methods for obtaining it.
• If we are given a data set by a client, for example, we need to know
what each column and row represents.
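As a minimal sketch of getting to know a client data set, the snippet below reads a small CSV and reports, for each column, how many values are filled in and what one example value looks like. The column names and records in `RAW_CSV` are hypothetical, made up for illustration; only the Python standard library is used.

```python
import csv
import io

# Hypothetical client extract; in practice this would come from a file or a database query.
RAW_CSV = """customer_id,age,city,monthly_spend
1,34,Chennai,120.50
2,,Mumbai,80.00
3,45,Delhi,
"""

def describe_columns(csv_text):
    """Return, per column, the count of non-empty values and one example value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows if row[column] != ""]
        summary[column] = {
            "non_empty": len(values),
            "example": values[0] if values else None,
        }
    return summary

column_summary = describe_columns(RAW_CSV)
for name, info in column_summary.items():
    print(name, info)
```

A summary like this is only a starting point; what each column means still has to be confirmed with whoever supplied the data.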
• Step 3: Data preparation
• Data can have many inconsistencies, such as missing values, blank
columns and incorrect data formats, which need to be cleaned.
• We need to process, explore and condition the data before modeling.
Clean data gives better predictions.
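The cleaning tasks listed above can be sketched roughly as follows. The records in `raw_rows` are hypothetical, every kept column is assumed to be numeric, and filling gaps with the column median is just one simple imputation choice among several.

```python
from statistics import median

# Hypothetical raw records: a missing value, a blank column, and a padded number.
raw_rows = [
    {"age": "34", "spend": "120.50", "notes": ""},
    {"age": "", "spend": "80", "notes": ""},
    {"age": "45", "spend": " 95.25 ", "notes": ""},
]

def to_float(text):
    """Convert text to float; return None when the value is missing or malformed."""
    try:
        return float(text.strip())
    except ValueError:
        return None

def clean(rows):
    """Drop blank columns, fix number formats, and fill missing values."""
    # Keep only columns that have a value in at least one row.
    kept = [c for c in rows[0] if any(r[c].strip() for r in rows)]
    parsed = [{c: to_float(r[c]) for c in kept} for r in rows]
    # Fill remaining gaps with the column median (an assumed, simple strategy).
    for c in kept:
        known = [r[c] for r in parsed if r[c] is not None]
        fill = median(known)
        for r in parsed:
            if r[c] is None:
                r[c] = fill
    return parsed

clean_rows = clean(raw_rows)
for row in clean_rows:
    print(row)
```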
• Step 4: Data exploration
• Data exploration is about gaining a deeper understanding of the data.
• Try to understand how variables interact with each other, the
distribution of the data and whether there are outliers.
• To achieve this, use descriptive statistics, visual techniques and simple
modeling.
• This step is also called Exploratory Data Analysis (EDA).
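To illustrate the descriptive-statistics part of exploration, here is a small sketch that summarizes one variable, flags outliers and measures how two variables move together. The sample values are hypothetical, and the 1.5 × IQR outlier rule and Pearson correlation are assumed choices, not methods prescribed by the slides.

```python
from statistics import mean, median, quantiles

# Hypothetical measurements for two variables.
spend = [80.0, 95.25, 102.0, 110.0, 120.5, 125.0, 900.0]
visits = [2.0, 3.0, 3.0, 4.0, 5.0, 5.0, 6.0]

def describe(values):
    """Summarize a variable and flag outliers with the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": mean(values),
        "median": median(values),
        "outliers": [v for v in values if v < low or v > high],
    }

def pearson(xs, ys):
    """Pearson correlation: how strongly two variables move together linearly."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

summary = describe(spend)
print(summary)
print("correlation(spend, visits):", pearson(spend, visits))
```

Visual techniques such as histograms and scatter plots would normally complement these numbers.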
• Step 5: Data modeling
• In this step, the actual model building process starts. Here, the data
scientist splits the dataset into training and testing sets.
• Techniques like association, classification and clustering are applied to
the training data set. The model, once prepared, is tested against the
"testing" dataset.
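The train-then-test workflow described above might look like the following sketch. The data set is hypothetical, and a 1-nearest-neighbour classifier stands in for whichever technique a real project would use. The 75/25 split and the fixed seed are also assumptions.

```python
import random

# Hypothetical labeled data: (features, label), in two well-separated groups.
dataset = [((float(i), float(i % 3)), "low") for i in range(10)]
dataset += [((100.0 + i, float(i % 3)), "high") for i in range(10)]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(train_rows, features):
    """Classify by copying the label of the closest training row."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], features))
    return min(train_rows, key=squared_distance)[1]

train_rows, test_rows = train_test_split(dataset)
# The model only ever sees train_rows; test_rows measure how well it generalizes.
correct = sum(predict_1nn(train_rows, f) == label for f, label in test_rows)
accuracy = correct / len(test_rows)
print(f"train={len(train_rows)} test={len(test_rows)} accuracy={accuracy:.2f}")
```

Keeping the test rows out of training is the point of the split: accuracy measured on rows the model has already seen would overstate its quality.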
• Step 6: Presentation and automation
• Deliver the final baselined model with reports, code and technical
documents in this stage.
• The model is deployed into a real-time production environment after
thorough testing.
• In this stage, the key findings are communicated to all stakeholders.
• This helps to decide whether the project is a success or a failure,
based on the results from the model.
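One way to picture "deployed after thorough testing" is a release gate that packages the model together with its report only when the test metrics are good enough. The sketch below assumes a hypothetical `package_model` helper, a made-up 0.9 accuracy threshold and JSON as the delivery format. None of these come from the slides.

```python
import json

def package_model(parameters, metrics, min_accuracy=0.9):
    """Bundle model parameters and metrics, but only if the release gate passes."""
    if metrics["accuracy"] < min_accuracy:
        raise ValueError("model did not meet the accuracy threshold; not releasing")
    return json.dumps({"parameters": parameters, "metrics": metrics}, sort_keys=True)

# Hypothetical model parameters and test results.
artifact = package_model({"threshold": 50.0}, {"accuracy": 0.95, "test_rows": 5})

# A production service (or a stakeholder report) can reload the same bundle.
restored = json.loads(artifact)
print(restored["metrics"])
```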
