Data Science

The document outlines the five key stages of a Data Science project: defining a problem, data processing, modeling, evaluation, and deployment. Each stage is crucial, starting with clearly identifying the problem and measure of success, followed by data collection and preparation, model creation, evaluation of model performance, and finally deploying the model into production. The process emphasizes the importance of each stage in ensuring a successful Data Science project across various domains.

Uploaded by

janshijha893

INTRODUCTION TO DATA SCIENCE

Stages in a data science project

Data Science workflows occur in a wide range of domains and areas of expertise, such as
biology, geography, finance or business, among others. This means that Data Science projects can
take on very different challenges and focuses, resulting in very different methods and data sets
being used. Whatever the domain, a Data Science project will have to go through five key stages:
defining a problem, data processing, modelling, evaluation and deployment.

Defining a problem

The first stage of any Data Science project is to identify and define a problem to be solved.
Without a clearly defined problem to solve, it can be difficult to know how to tackle the
problem.
For a Data Science project this can include deciding what method to use, such as classification,
regression or clustering. Also, without a clearly defined problem, it can be hard to
determine what your measure of success would be.

Without a defined measure of success, you can never know when your project is complete
or is good enough to be used in production.
A challenge with this is being able to define a problem small enough that it can be
solved/tackled individually.

Data Processing

Once you have your problem, how you are going to measure success, and an idea of the
methods you will be using, you can then go about performing the all-important task of data
processing. This is often the stage that will take the longest in any Data Science project
and can regularly be the most important stage.
There are a variety of tasks that need to occur at this stage depending on what problem
you are going to tackle. The first is often finding ways to create or capture data that
doesn't exist yet.

Once you have created this data, you then need to collect it somewhere and in a format
that is useful for your model. This will depend on what method you will be using in the
modelling phase, but it will involve figuring out how you will feed the data into your
model.

The final part of this is to then perform any pre-processing steps to ensure that the data is
clean enough for the modelling method to work. This may involve removing outliers (or
choosing to keep them), handling null values (deciding whether a null value is itself a
meaningful measure or whether it should be imputed, for example with the average), or
standardising the measures.
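
The pre-processing steps above can be sketched in a few lines of Python. This is only a
minimal illustration using the standard library: the function names, the made-up readings,
and the choice of a median-absolute-deviation rule for spotting outliers are all assumptions
made for the example, not something the original text prescribes.

```python
from statistics import mean, median, stdev

def mask_outliers(values, k=5.0):
    """Set values far from the median to None (assumed rule: more than k MADs away)."""
    observed = [v for v in values if v is not None]
    med = median(observed)
    mad = median(abs(v - med) for v in observed)  # median absolute deviation
    return [None if v is not None and abs(v - med) > k * mad else v for v in values]

def impute_mean(values):
    """Replace None entries with the average of the remaining values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def standardise(values):
    """Rescale to zero mean and unit sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical raw measurements: one missing value and one extreme reading.
raw = [10.0, 12.0, None, 11.0, 13.0, 9.0, 200.0]

masked = mask_outliers(raw)    # the extreme reading becomes None
cleaned = impute_mean(masked)  # both gaps are filled with the average
scaled = standardise(cleaned)  # ready to feed into a model

print(cleaned)
```

Here the outlier is masked before imputing so that it does not distort the average used to
fill the gaps. Whether an extreme value should be dropped, kept or treated as missing
remains a judgement that depends on the problem.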

Modelling

The next part, and often the most fun and exciting part, is the modelling phase of the Data
Science project. The format this will take will depend primarily on what the problem is
and how you defined success in the first step, and secondarily on how you processed the
data.

Unfortunately, this is often the part that will take the least amount of time of any Data
Science project, especially when there are many frameworks or libraries that exist, such as
sklearn, statsmodels and tensorflow, that can be readily utilised.
You should have selected the method that you will be using to model your data in the
defining a problem stage, and this may include simple graphical exploration, regression,
classification or clustering.
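
As a small illustration of the regression case, the sketch below fits a straight line by
ordinary least squares using only the Python standard library. The data points and the
function names are invented for the example. In practice one of the libraries mentioned
above would normally do this work.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical processed data: one input measure and one target.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

model = fit_line(xs, ys)
print(model, predict(model, 6.0))
```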

Evaluation

Once you have created and implemented your model, you then need to know how to
evaluate it. Again, this goes back to the problem formulation stage where you will have
defined your measure of success, but this is often one of the most important stages.
Depending on how you processed your data and set up your model, you may have a
holdout dataset or testing data set that can be used to evaluate your model. On this dataset,
you are aiming to see how well your model performs in terms of both accuracy and
reliability.
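
A holdout evaluation can be sketched as follows. Everything here is an assumption made for
illustration: the toy data, the 25% holdout fraction, the simple threshold classifier, and
the idea of repeating the split with several seeds as a rough check on how stable the score
is.

```python
import random
from statistics import mean

def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """Toy classifier: cut halfway between the two class means (class 1 assumed larger)."""
    mean0 = mean(x for x, label in train if label == 0)
    mean1 = mean(x for x, label in train if label == 1)
    return (mean0 + mean1) / 2

def accuracy(threshold, rows):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum((x > threshold) == (label == 1) for x, label in rows)
    return hits / len(rows)

# Hypothetical labelled data: (measure, class) pairs for two well-separated classes.
rows = [(float(x), 0) for x in range(1, 11)] + [(float(x), 1) for x in range(21, 31)]

train, test = holdout_split(rows)
threshold = fit_threshold(train)           # the model only ever sees the training rows
holdout_accuracy = accuracy(threshold, test)

# Repeating the split with different seeds gives a rough sense of reliability.
scores = [accuracy(fit_threshold(tr), te)
          for tr, te in (holdout_split(rows, seed=s) for s in range(5))]
print(holdout_accuracy, scores)
```

The key point is that the threshold is fitted on the training rows only, so the score on the
holdout rows reflects data the model has not seen.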

Deployment

Finally, once you have robustly evaluated your model and are satisfied with the results, you
can deploy it into production. This can mean a variety of things, such as whether you use the
insights from the model to make changes in your business, whether you use your model to check
whether changes that have been made were successful, or whether the model is deployed
somewhere to continually receive and evaluate live data.

[Figure: the five stages of a Data Science project: defining a problem, data processing,
modelling, evaluation, deploying to production]
