Data Science Process
• Step 2: Retrieving data
• This step involves acquiring data from all the identified internal and
external sources that help to answer the business question.
• It is the collection of the data required for the project. It also
involves gaining a business understanding of the data we have and
deciphering what each piece of data means.
• This could entail determining exactly what data is required and the
best methods for obtaining it.
• If we have been given a data set by a client, for example, we need to
know what each column and row represents.
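A minimal sketch of this first look at a client data set, using only the Python standard library. The CSV content, the column names and the `describe_columns` helper are hypothetical and exist only for illustration; a real project would read from the actual file or database.

```python
import csv
import io

# Hypothetical client extract (assumption); normally this comes from a file or database.
RAW_CSV = """customer_id,age,city,monthly_spend
1,34,Pune,120.50
2,,Mumbai,80.00
3,45,,
"""

def describe_columns(csv_text):
    """Return the row count and, per column, the non-empty count and one sample value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for column in reader.fieldnames:
        values = [row[column] for row in rows if row[column] != ""]
        summary[column] = {
            "non_empty": len(values),
            "sample": values[0] if values else None,
        }
    return len(rows), summary

row_count, column_summary = describe_columns(RAW_CSV)
print(f"{row_count} rows")
for name, info in column_summary.items():
    print(f"{name}: {info['non_empty']} non-empty, e.g. {info['sample']}")
```

A summary like this is a starting point for asking the client what each column means and why some values are missing.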
• Step 3: Data preparation
• Data can have many inconsistencies, such as missing values, blank
columns and incorrect data formats, which need to be cleaned.
• We need to process, explore and condition the data before modeling.
Clean data generally gives better predictions.
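The three kinds of inconsistency named above can be handled as in the following sketch. The records, the `to_number` and `clean` helpers, and the choice of filling missing values with the column median are all assumptions made for illustration; other strategies (dropping rows, using the mean) may suit a given project better.

```python
import statistics

# Hypothetical raw records (assumption): all values arrive as strings.
raw_rows = [
    {"age": "34", "income": "52000", "notes": ""},
    {"age": "", "income": "61,000", "notes": ""},
    {"age": "45", "income": "", "notes": ""},
    {"age": "29", "income": "48000", "notes": ""},
]

def to_number(text):
    """Convert a string such as '61,000' to a float; return None for blanks."""
    cleaned = text.replace(",", "").strip()
    return float(cleaned) if cleaned else None

def clean(rows):
    # 1. Drop columns that are blank in every row.
    columns = [c for c in rows[0] if any(r[c].strip() for r in rows)]
    # 2. Fix the data format: parse every remaining value as a number.
    parsed = [{c: to_number(r[c]) for c in columns} for r in rows]
    # 3. Fill missing values with the column median (one common choice).
    for c in columns:
        known = [r[c] for r in parsed if r[c] is not None]
        median = statistics.median(known)
        for r in parsed:
            if r[c] is None:
                r[c] = median
    return parsed

clean_rows = clean(raw_rows)
for row in clean_rows:
    print(row)
```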
• Step 4: Data exploration
• Data exploration is about gaining a deeper understanding of the data.
• Try to understand how variables interact with each other, the
distribution of the data and whether there are outliers.
• To achieve this, use descriptive statistics, visual techniques and simple
modeling.
• This step is also called Exploratory Data Analysis (EDA).
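As a rough illustration of the descriptive side of exploration, the sketch below summarises a variable, flags outliers and measures how two variables move together. The data is made up, the helper names are hypothetical, and the 1.5 x IQR rule is just one common convention for flagging outliers; visual techniques such as histograms and scatter plots would normally accompany these numbers.

```python
import math
import statistics

# Hypothetical sample (assumption): hours studied and exam score for eight students.
hours = [2, 3, 4, 5, 6, 7, 8, 9]
scores = [51, 55, 60, 64, 70, 74, 79, 5]  # the last score looks suspicious

def describe(values):
    """Basic descriptive statistics for one variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

def iqr_outliers(values):
    """Values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

print(describe(scores))
print("outliers:", iqr_outliers(scores))
# Correlation with the suspicious last record left out.
print("correlation:", pearson(hours[:-1], scores[:-1]))
```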
• Step 5: Data modelling
• In this step, the actual model building process starts. Here, the data
scientist splits the dataset into training and testing sets.
• Techniques like association, classification and clustering are applied to
the training data set. The model, once prepared, is tested against the
"testing" dataset.
• Step 6: Presentation and automation
• Deliver the final baselined model with reports, code and technical
documents in this stage.
• The model is deployed into a real-time production environment after
thorough testing.
• In this stage, the key findings are communicated to all stakeholders.
• This helps to decide whether the project is a success or a failure,
based on the results from the model.