Unit 3
If you work in a technical domain or are a student with a technical background, you have almost
certainly heard about Data Science. It is one of the fastest-growing fields in today’s tech
market, and that growth is likely to continue as the world becomes more digital day by
day. Data has the capacity to shape the future. In this article, we will learn about
Data Science and the process it involves.
What is Data Science?
Data can be very valuable if we know how to manipulate it to uncover the hidden patterns in
it. The logic and process behind this manipulation is what is known as Data
Science. The Data Science process covers everything from formulating the problem statement and
collecting data to extracting the required results, and the professional who ensures that the whole
process runs smoothly is known as the Data Scientist. There are other job roles in this
domain as well, such as:
1. Data Engineers
2. Data Analysts
3. Data Architects
4. Machine Learning Engineers
5. Deep Learning Engineers
Data Science Process Life Cycle
Certain steps are necessary for any task in the field of data science in order to derive useful
results from the data at hand.
Data Collection – After formulating the problem statement, the main task is to collect data that
can help us in our analysis and manipulation. Sometimes data is collected by conducting a survey,
and at other times it is gathered by web scraping.
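As a minimal sketch of the scraping route, the example below pulls rows out of an HTML table using only Python's standard `html.parser` module. The HTML fragment, the `TableParser` class, and the `scrape_table` helper are hypothetical names and made-up data used for illustration; a real page would first have to be downloaded and would usually need more careful handling.

```python
from html.parser import HTMLParser

# Made-up HTML fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>item</th><th>price</th></tr>
  <tr><td>pen</td><td>10</td></tr>
  <tr><td>book</td><td>250</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._current_row is not None:
            self.rows.append(self._current_row)
            self._current_row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Keep only text that sits inside a cell of an open row.
        if self._in_cell and self._current_row is not None:
            self._current_row.append(data.strip())


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
print(records)
```

Note that every scraped value is still a string at this point; converting and validating the values belongs to the cleaning step.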
Data Cleaning – Most real-world data is unstructured and requires cleaning and conversion
into structured data before it can be used for any analysis or modeling.
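The sketch below illustrates what cleaning can look like on a handful of made-up survey records: it trims whitespace, normalizes casing, converts the age to an integer, and drops rows that are incomplete or duplicated. The field names and the `clean_rows` helper are assumptions made for this example, and the rules that make sense in practice depend on the dataset.

```python
# Made-up raw records with stray spaces, inconsistent casing,
# a missing value, a non-numeric value, and a duplicate.
raw_rows = [
    {"name": " Asha ", "age": "29", "city": "pune"},
    {"name": "Ravi", "age": "", "city": "Delhi"},
    {"name": "asha", "age": "29", "city": "Pune "},
    {"name": "Meera", "age": "forty", "city": "Mumbai"},
    {"name": "John", "age": "41", "city": "mumbai"},
]


def clean_rows(rows):
    """Return structured records with normalized text and integer ages."""
    cleaned = []
    seen = set()
    for row in rows:
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        age_text = row["age"].strip()
        # Drop rows whose age is missing or is not a whole number.
        if not age_text.isdigit():
            continue
        record = (name, int(age_text), city)
        # Drop rows that duplicate an earlier one after normalization.
        if record in seen:
            continue
        seen.add(record)
        cleaned.append({"name": name, "age": record[1], "city": city})
    return cleaned


clean = clean_rows(raw_rows)
print(clean)
```

Dropping incomplete rows is only one option; depending on the problem, missing values might instead be filled in with a default or an estimate.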
Exploratory Data Analysis – This is the step in which we try to find the hidden patterns in the
data at hand. We also analyze the different factors that affect the target variable and the extent
to which they do so. How the independent features are related to each other, and what can be done to
achieve the desired results, are questions this step can answer as well. It also
gives us a direction in which to work when we start the modeling process.
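One common way to quantify how strongly a feature is related to the target, or to another feature, is the Pearson correlation coefficient. The sketch below computes it by hand on a small made-up dataset; the column names and values are assumptions chosen for illustration, and correlation captures only linear relationships.

```python
from math import sqrt
from statistics import mean

# Made-up dataset: two independent features and one target variable.
data = {
    "hours_studied": [1, 2, 3, 4, 5],
    "hours_slept": [8, 7, 7, 6, 5],
    "exam_score": [52, 58, 65, 70, 78],
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


target = "exam_score"

# How strongly does each feature move with the target?
feature_vs_target = {
    name: pearson(values, data[target])
    for name, values in data.items()
    if name != target
}

# How strongly are the two features related to each other?
feature_vs_feature = pearson(data["hours_studied"], data["hours_slept"])

for name, r in feature_vs_target.items():
    print(f"{name} vs {target}: r = {r:.3f}")
print(f"hours_studied vs hours_slept: r = {feature_vs_feature:.3f}")
```

A value close to 1 or -1 suggests a strong linear relationship, while two features that are highly correlated with each other carry overlapping information, which is worth knowing before modeling.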
Model Building – Many machine learning algorithms and techniques have been
developed that can identify complex patterns in the data, a task that would be very tedious
for a human to do by hand. In this step, we train such algorithms on the prepared data.
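As a minimal sketch of model building, the example below fits a straight line to made-up data with ordinary least squares and then checks it on a holdout set that was not used for fitting. Simple linear regression is chosen here only because it is short enough to write out in full; the data and the `fit_line` and `mean_absolute_error` helpers are assumptions for illustration.

```python
from statistics import mean

# Made-up (feature, target) pairs that roughly follow y = 2x + 1.
samples = [
    (1, 3.1), (2, 4.9), (3, 7.2), (4, 9.0),
    (5, 10.8), (6, 13.1), (7, 15.0), (8, 16.9),
]

# Fit on the first six pairs and keep the last two as a holdout set.
train, holdout = samples[:6], samples[6:]


def fit_line(pairs):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    slope = (
        sum((x - mx) * (y - my) for x, y in pairs)
        / sum((x - mx) ** 2 for x in xs)
    )
    intercept = my - slope * mx
    return slope, intercept


def mean_absolute_error(pairs, slope, intercept):
    """Average absolute gap between predictions and actual targets."""
    return mean([abs(y - (slope * x + intercept)) for x, y in pairs])


slope, intercept = fit_line(train)
holdout_mae = mean_absolute_error(holdout, slope, intercept)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"holdout MAE = {holdout_mae:.3f}")
```

Evaluating on data the model has not seen gives a more honest estimate of how it will behave on new inputs than the error on the training data.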
Model Deployment – Once a model has been developed and gives good results on the holdout or
real-world dataset, we deploy it and monitor its performance. This is the main part, where
what we learned from the data is applied to real-world applications and use cases.
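The sketch below shows one simple way to picture deployment and monitoring, assuming the model is a fitted line whose parameters can be stored as JSON. The parameter values, the `ErrorMonitor` class, the error threshold, and the live observations are all hypothetical; production systems typically rely on dedicated serving and monitoring tools.

```python
import json
import os
import tempfile

# Assumed result of the model-building step: y = slope * x + intercept.
model = {"slope": 2.0, "intercept": 1.0}


def save_model(params, path):
    """Persist the model parameters so another program can load them."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_model(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict(params, x):
    return params["slope"] * x + params["intercept"]


class ErrorMonitor:
    """Tracks the mean absolute error of live predictions."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.total_error = 0.0
        self.count = 0

    def record(self, predicted, actual):
        self.total_error += abs(predicted - actual)
        self.count += 1

    @property
    def mean_error(self):
        return self.total_error / self.count if self.count else 0.0

    def needs_retraining(self):
        # Flag the model once its average error exceeds the threshold.
        return self.mean_error > self.threshold


# "Deploy": write the parameters to disk and load them back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    save_model(model, path)
    deployed = load_model(path)

# "Monitor": compare predictions with made-up values observed later.
monitor = ErrorMonitor(threshold=1.0)
for x, actual in [(1, 3.2), (2, 4.9), (3, 9.5)]:
    monitor.record(predict(deployed, x), actual)

print(f"mean error = {monitor.mean_error:.3f}")
print(f"needs retraining: {monitor.needs_retraining()}")
```

If the monitored error keeps rising, that is a signal to return to the earlier steps, collect fresh data, and rebuild the model.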