Data Science
Data Science workflows arise in a wide range of domains and areas of expertise, such as
biology, geography, finance or business, among others. This means that Data Science projects can
take on very different challenges and focuses, resulting in very different methods and data sets
being used. A Data Science project will have to go through five key stages: defining a problem,
data processing, modelling, evaluation and deployment.
Defining a problem
The first stage of any Data Science project is to identify and define a problem to be solved.
Without a clearly defined problem, it can be difficult to know how to tackle it.
For a Data Science project, this includes deciding which method to use, such as classification,
regression or clustering. Also, without a clearly defined problem, it can be hard to
determine what your measure of success would be.
Without a defined measure of success, you can never know when your project is complete
or good enough to be used in production.
A challenge here is defining a problem small enough that it can be solved or tackled
individually.
Data Processing
Once you have your problem, how you are going to measure success, and an idea of the
methods you will be using, you can then go about performing the all-important task of data
processing. This is often the stage that takes the longest in any Data Science project
and can regularly be the most important stage.
There are a variety of tasks that need to occur at this stage depending on what problem
you are going to tackle. The first is often finding ways to create or capture data that
doesn't exist yet.
INTRODUCTION TO DATA SCIENCE
Once you have created this data, you then need to collect it somewhere and in a format
that is useful for your model. This will depend on what method you will be using in the
modelling phase but it will involve figuring out how you will feed the data into your
model.
The final part of this is to perform any pre-processing steps to ensure that the data is
clean enough for the modelling method to work. This may involve removing outliers or
choosing to keep them, deciding how to treat null values (whether a null value is itself a
meaningful measure or whether it should be imputed, for example with the average), or
standardising the measures.
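
As a rough illustration of these pre-processing steps, the sketch below uses pandas on a small made-up table. The column names, the values and the 1.5 x IQR outlier rule are all assumptions chosen for the example; the right choices will depend on your own problem and data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw measurements: two missing values and one implausible height.
raw = pd.DataFrame({
    "height_cm": [170.0, 165.0, np.nan, 180.0, 175.0, 990.0],
    "weight_kg": [70.0, 60.0, 65.0, np.nan, 80.0, 75.0],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop outlier rows, impute nulls with the column mean, then standardise."""
    cleaned = df.copy()

    # 1. Outliers: flag values outside 1.5 * IQR of the quartiles
    #    (one common convention, assumed here) and drop those rows.
    q1 = cleaned.quantile(0.25)
    q3 = cleaned.quantile(0.75)
    iqr = q3 - q1
    too_low = cleaned.lt(q1 - 1.5 * iqr, axis="columns")
    too_high = cleaned.gt(q3 + 1.5 * iqr, axis="columns")
    cleaned = cleaned[~(too_low | too_high).any(axis=1)]

    # 2. Null values: here we assume a null means "not recorded",
    #    so we impute it with the column average.
    cleaned = cleaned.fillna(cleaned.mean())

    # 3. Standardise each measure to zero mean and unit standard deviation.
    return (cleaned - cleaned.mean()) / cleaned.std()


clean = preprocess(raw)
print(clean)
```

Whether an outlier should be dropped, or a null imputed rather than kept as its own signal, is a judgement call that goes back to how the problem was defined.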
Modelling
The next part, and often the most fun and exciting part, is the modelling phase of the Data
Science project. The format this will take will depend primarily on what the problem is
and how you defined success in the first step, and secondarily on how you processed the
data.
Unfortunately, this is often the part that will take the least amount of time of any Data
Science project, especially when there are many frameworks or libraries that exist, such as
sklearn, statsmodels and tensorflow, that can be readily utilised.
You should have selected the method that you will be using to model your data in the
defining a problem stage, and this may include simple graphical exploration, regression,
classification or clustering.
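
To give a sense of how little code the modelling step can need, here is a minimal sketch using sklearn for a classification problem. The iris dataset bundled with sklearn and the choice of logistic regression are assumptions made purely for illustration, not a recommendation for any particular project.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example data shipped with sklearn: 150 flowers, 4 measures, 3 classes.
X, y = load_iris(return_X_y=True)

# Standardise the measures, then fit a simple classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Predict the class of the first five records.
predicted = model.predict(X[:5])
print(predicted)
```

Regression and clustering estimators in sklearn follow the same fit-then-predict pattern, so swapping the method is usually a small change compared with the work done in the earlier stages.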
Evaluation
Once you have created and implemented your model, you then need to know how to
evaluate it. Again, this goes back to the problem formulation stage, where you will have
defined your measure of success, but this is often one of the most important stages.
Depending on how you processed your data and set up your model, you may have a
holdout dataset or testing data set that can be used to evaluate your model. On this dataset,
you are aiming to see how well your model performs in terms of both accuracy and
reliability.
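
One possible way to set this up with sklearn is sketched below. The 25% holdout size, the use of cross-validation as a rough check on reliability, and the 0.9 accuracy target are all assumptions for the example; your own measure of success should come from the problem definition.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Keep 25% of the data aside as a holdout set the model never sees in training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Accuracy: how often the model is right on the holdout data.
holdout_accuracy = accuracy_score(y_test, model.predict(X_test))

# Reliability: how much the score varies across 5 different training folds.
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_train, y_train, cv=5
)

# Hypothetical measure of success agreed when the problem was defined.
SUCCESS_THRESHOLD = 0.9
meets_target = bool(holdout_accuracy >= SUCCESS_THRESHOLD)

print(f"holdout accuracy: {holdout_accuracy:.3f}")
print(f"cross-validation: mean {cv_scores.mean():.3f}, std {cv_scores.std():.3f}")
print(f"meets target: {meets_target}")
```

A high holdout score together with a small spread across the folds suggests the model is both accurate and reasonably stable; a large spread is a hint that the result depends heavily on which data it happened to be trained on.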
Deployment
Finally, once you have robustly evaluated your model and are satisfied with the results, you
can deploy it into production. This can mean a variety of things, such as whether you use the
insights from the model to make changes in your business, whether you use your model to check
whether changes that have been made were successful, or whether the model is deployed
somewhere to continually receive and evaluate live data.
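
For the last of these options, a very simplified sketch of the idea is shown below: the trained model is serialised, reloaded as it would be inside a production service, and used to score an incoming record. The use of Python's pickle module, the function name score_live_record and the example record are assumptions for illustration; real deployments vary widely.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a model as in the modelling stage (example data, assumed here).
X, y = load_iris(return_X_y=True)
trained = LogisticRegression(max_iter=1000).fit(X, y)

# Serialise the trained model so it can be shipped to another process.
blob = pickle.dumps(trained)

# In the production service: load the model once at start-up.
deployed = pickle.loads(blob)


def score_live_record(record):
    """Return the predicted class for one incoming record of four measures."""
    return int(deployed.predict([record])[0])


# Hypothetical live record arriving from an upstream system.
live_record = [5.1, 3.5, 1.4, 0.2]
prediction = score_live_record(live_record)
print(prediction)
```

Only unpickle data from a source you trust, since loading a pickle can execute arbitrary code.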
[Diagram: Defining a problem, Data processing, Modelling, Evaluation, Deploying to production]