Data Analytics Process

The document discusses the data analysis process, including defining the business problem, data sourcing and collection, and data cleaning. It provides an example of an insurance company analyzing customer churn. Key steps are understanding the root cause through discussions with business teams, gathering relevant customer, agent, and insurance data from various sources, and cleaning the data by addressing missing values, normalization, and creating new indicators. The goal is to transform raw data into useful information for making better business decisions.

Uploaded by

Yashi Gupta
Copyright
© © All Rights Reserved

Data Analytics Process

The Motivation Behind Data Analysis Process

Given the considerable amount of data collected by industries nowadays, they need to adopt
the right analytics strategies for better decision-making. In this conceptual blog, we will start
by building your understanding of the data analysis process before providing an in-depth
explanation of all the steps involved.

What is Data Analysis?

Data analysis is the process of examining historical data with techniques such as statistical
analysis and data visualization in order to give organizations meaningful insights for better
decision-making. Let's apply the complete data analysis process to the following real-world
data analytics project for better understanding.

Data Analysis Process Example with a Data Analytic Project in Insurance

Imagine an insurance company that decides whether or not to compensate its clients
based on the type of insurance they have subscribed to (auto or home) and the detailed brief
submitted to support their claims.

The company has noticed a 30% customer churn over the past few months. It seeks data
analyst expertise to help properly identify the root cause of the problem so that it does
not keep losing customers. As a starting point, the manager thinks that the churn is
due to the time agents take to process clients' requests.

Understanding the Role of a Data Analyst in the Data Analysis Process

The job of a Data Analyst is to understand the business problem better, collect appropriate
data, and process and explore it to extract useful information that helps the insurance
company make smart business decisions.

Data Analysis Process - Fundamental Steps of a Data Analytics Project

As a data analyst, you might find it challenging to make the best use of your data. Following
the data analysis process and best practices for each new or existing data analysis project will
help you make the most out of the data for the business.

Data Analysis Process Step 1 - Define and Understand the Business Problem

In the use case, the company stated that the delay in request processing might be causing
customer churn. This is an assumption rather than a precise problem definition. The goal of a
data analyst in the first step of the data analysis process is to get clarification on the
problem from the business. To do so, data analysts schedule a meeting with the following
people from the Business and the Data Consulting teams.

Business Team

• Head of the Insurance Company, who is responsible for the coordination of both auto
and home insurance departments.

• Managers of the auto and home insurance departments, because they better understand
their respective departments.

Data Consulting Team

• Data Manager

• Data Analyst/Data Scientist/Data Engineer

Here's how the discussion between the Business and Data Consulting teams could
proceed:

Business team: We want to know why we are currently facing this level of customer churn.

Data team: "Currently facing", meaning you did not have this problem in the past?

Business team: No, because we only had the auto insurance department in the past.

Data team: Could you please describe how requests are processed?

Business team: The customers send their request, we check that the required documents
are complete, and only then do we proceed.

Data team: What did staffing look like before the home insurance department was added?

Business team: We just trained some people from auto insurance to join the new department.

etc.

At the end of such a discussion, the Data team could develop a better understanding of the
Business problem and then adopt analytic strategies to facilitate the process.

Avoiding as much technical jargon as possible during this phase is also important. Your goal
is to harness your soft skills and domain knowledge as much as you can for a smooth
discussion with the business.

Commonly Used KPI Monitoring Tools in the Data Analysis Process

Understanding a business problem includes defining Key Performance Indicators (KPIs)
to keep track of the performance of the deliverables. Different licensed and open-source
tools exist for this, as shown below:

Tools                       Description

[name not preserved]        A licensed tool used by businesses to visualize, query, and
                            understand their metrics from multiple data sources simultaneously.

[name not preserved]        A free visualization tool that can store real-time metrics, which
                            is handy when dealing with time series use cases.

Datadog                     A licensed monitoring and analytics tool used for both performance
                            metrics and event monitoring of infrastructure and cloud-based
                            services.
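
Whatever tool is used, the KPI itself has to be defined first. For the use case, the obvious candidate is the monthly churn rate. The snippet below is a minimal sketch in Python, assuming churn is measured as the share of customers present at the start of a month who left during that month. The function name and all figures are hypothetical.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Share of customers present at the start of a period who left during it."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


# Hypothetical monthly figures: (customers at start of month, customers lost).
monthly_counts = {
    "2023-01": (1000, 50),
    "2023-02": (950, 95),
    "2023-03": (855, 171),
}

# One KPI value per month, ready to be plotted or pushed to a dashboard.
monthly_churn = {
    month: churn_rate(start, lost)
    for month, (start, lost) in monthly_counts.items()
}

for month, rate in monthly_churn.items():
    print(f"{month}: {rate:.1%}")
```

Agreeing on such a definition with the business team early avoids later disputes about what the dashboard actually shows.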
 

Data Analysis Process Step 2 - Data Sourcing and Data Collection

Once the data analyst understands the business problem, the next step is to take an
inventory of existing information and collect the data set that best fits the analytics use case.

This can be first-party data, third-party data, or data from open repositories.
First-party data is the data accessible within the company, and third-party data
is what the company buys from external sources.

The collected data must be legally and technically exploitable, reliable, and sufficiently
up-to-date for the stated problem.

We can imagine that we have the following four sources of data available for our use case:

• Requests' Statistics

o Request conversion rate: number of clients' requests that made it to the next
step after the first submission.

o Time spent by an agent on examining the completion of a given client's
request.

• Client's Attributes

o The age and address of each client.

o Date of subscription to the insurance company's service.

o The feedback of each client on the analysis process of their previous request.

• Insurance Data

o List of documents required for processing auto insurance requests.

o List of documents required for processing home insurance requests.

o Agents' arrival date in each department.

• Client's Raw Data

o A document explaining the reason for the customer's request.

This data gathered by the Data Engineer is then used further in the data analysis process by
Data Analysts and Data Scientists.
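
To make the first data source more concrete, here is a minimal Python sketch of how the request conversion rate could be derived from raw request records. It assumes the rate is the share of requests that moved to the next step after the first submission. The `Request` record, its fields, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    client_id: str
    department: str            # "auto" or "home"
    passed_first_check: bool   # True if it moved on after the first submission


def conversion_rate_by_department(requests):
    """Return {department: share of requests that passed the first check}."""
    totals, passed = {}, {}
    for req in requests:
        totals[req.department] = totals.get(req.department, 0) + 1
        passed[req.department] = passed.get(req.department, 0) + int(req.passed_first_check)
    return {dept: passed[dept] / totals[dept] for dept in totals}


# Hypothetical sample of requests.
requests = [
    Request("c1", "auto", True),
    Request("c2", "auto", True),
    Request("c3", "auto", False),
    Request("c4", "auto", True),
    Request("c5", "home", False),
    Request("c6", "home", True),
]

rates = conversion_rate_by_department(requests)
print(rates)
```

Computing the rate per department rather than globally keeps the comparison between auto and home insurance visible, which matters for the churn question.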

Commonly Used Data Collection and Storage Tools in the Data Analysis Process
The Data Engineer is responsible for creating the right data pipelines to gather and store these
data in a data warehouse or a data lake using different big data technologies such as Scala,
PostgreSQL, Python, etc.

Tools                       Description

Scala                       One of the main reasons for using Scala is its ability to provide
                            parallelization features for processing large data sets, which can
                            be very useful when collecting data from multiple sources.

PostgreSQL                  Open-source relational database for storing and querying data. It
                            provides many features to protect data integrity and to help
                            manage data no matter the size.

Python                      The simplicity and readability of Python make it one of the most
                            used tools by data practitioners. It offers multiple libraries to
                            collect data from websites.


Data Analysis Process Step 3 - Data Cleaning

Data Cleaning - An Integral Part of the Analytics Process

Data cleaning is one of the major steps in the data analysis process, and a Data Analyst
can spend around 70 to 90% of their time on it. The step is worth that much time
because high-quality data benefits the whole organization:

• Detecting and correcting errors early avoids costly mistakes.

• Creating the correct key performance indicators from the raw data makes
decision-making easier.

• Working with quality data improves team productivity, because the team does not need
to allocate time to deal with incorrect data.

Below are the key tasks in the data cleaning process:

• Dealing with missing values

o In our data analytics process example, we can replace a missing request
conversion rate with the median value specific to each department.

• Normalizing variables

o The time spent on request examination may be measured in days by home
insurance agents and in hours by auto insurance agents. Normalization
consists of using the same unit of measure for both departments, say hours.

o Each address can be represented by its postal code instead of the complete
address.

o In the insurance data, the auto and home departments can require the same ID
document under two names, ID_home and ID_auto, which can be normalized to ID.

• Replacing dates with durations to know how long each client has been using the
company's service and how long each agent has been in a specific department.

• Creating key indicators based on business knowledge

o The age of the customer when subscribing for the first time to the company's
insurance service.

o The total number of requests made by each client.

o The period with the highest number of requests.

• Encoding certain variables

o The agents' arrival date can be replaced by their seniority: the longer the
period since arrival, the more senior the agent.

• Correcting errors in the data

o The clients' raw textual data might contain some grammatical errors, so
running them through the

Data cleaning can be done using programming languages such as Python, R, etc. The
previous list of processes is not exhaustive but specific to our case for a better understanding
of the process.
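
Three of the cleaning tasks listed above (median imputation per department, unit normalization, and replacing dates with durations) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the record layout, the field names, the 24-hour day, and the fixed reference date are all hypothetical.

```python
from datetime import date
from statistics import median

# Hypothetical raw records; None marks a missing conversion rate.
records = [
    {"department": "auto", "conversion_rate": 0.80, "exam_time": 5,
     "unit": "hours", "subscribed": date(2020, 1, 15)},
    {"department": "auto", "conversion_rate": None, "exam_time": 3,
     "unit": "hours", "subscribed": date(2021, 6, 1)},
    {"department": "auto", "conversion_rate": 0.60, "exam_time": 4,
     "unit": "hours", "subscribed": date(2019, 3, 10)},
    {"department": "home", "conversion_rate": 0.40, "exam_time": 2,
     "unit": "days", "subscribed": date(2022, 9, 30)},
    {"department": "home", "conversion_rate": None, "exam_time": 1,
     "unit": "days", "subscribed": date(2022, 11, 5)},
]

HOURS_PER_UNIT = {"hours": 1, "days": 24}  # assumption: one "day" is 24 hours
REFERENCE_DATE = date(2023, 1, 1)          # fixed so the example is reproducible


def department_medians(rows):
    """Median conversion rate per department, ignoring missing values."""
    by_dept = {}
    for row in rows:
        if row["conversion_rate"] is not None:
            by_dept.setdefault(row["department"], []).append(row["conversion_rate"])
    return {dept: median(values) for dept, values in by_dept.items()}


def clean(rows):
    medians = department_medians(rows)
    cleaned_rows = []
    for row in rows:
        rate = row["conversion_rate"]
        cleaned_rows.append({
            "department": row["department"],
            # 1. Missing values: fall back to the department median.
            "conversion_rate": medians[row["department"]] if rate is None else rate,
            # 2. Normalization: express every examination time in hours.
            "exam_time_hours": row["exam_time"] * HOURS_PER_UNIT[row["unit"]],
            # 3. Dates to durations: days elapsed since subscription.
            "tenure_days": (REFERENCE_DATE - row["subscribed"]).days,
        })
    return cleaned_rows


cleaned = clean(records)
for row in cleaned:
    print(row)
```

Note that the sketch would fail for a department in which every conversion rate is missing. A real pipeline would need an explicit rule for that case, agreed with the business.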

Commonly Used Data Cleaning Tools in the Data Analytics Process

There are many tools for data cleaning, but the focus here is on the open-source
ones, as shown below.

Tools                       Description

[name not preserved]        Distributed processing system used by data scientists to reduce
                            the cost and time required for the Extract, Transform and Load
                            process, thanks to its ability to deal with several petabytes of
                            data at a time.

Python                      When it comes to data processing, Python can be the tool to go
                            for, because it has a lot of analytics libraries for processing
                            complex data structures.


Data Analysis Process Step 4 - Analysing the Data for Interpretations and Insights

A data scientist is likely to feel relieved once the data is cleaned. Now comes the time to
apply curiosity, analytical thinking, and data storytelling skills, using data visualization
tools and statistical analysis approaches to answer the business problem appropriately.

The data analysis process you will go through depends on the business problem you are
trying to solve. Most business problems fall into the following five data analysis categories:

What happened? --> Descriptive Analysis

This is usually the first question the business team wants answered before diving into any
other exploration.

Referring to our use case, the insurance company can use descriptive analytics to understand
what has happened in the past few months, and run statistical hypothesis tests to check
whether the data supports or contradicts the claim of the insurance manager.
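
As one possible way to test the manager's claim, the sketch below runs a one-sided two-proportion z-test in Python. The null hypothesis is that clients with slowly processed requests churn at the same rate as clients with quickly processed requests. The choice of this particular test, the split into "slow" and "fast" groups, and all counts are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(churned_a, total_a, churned_b, total_b):
    """One-sided z-test for H1: churn proportion in group A > group B.

    Returns (z statistic, p-value). Uses the pooled-proportion standard
    error, which relies on the usual large-sample normal approximation.
    """
    p_a = churned_a / total_a
    p_b = churned_b / total_b
    pooled = (churned_a + churned_b) / (total_a + total_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts: group A = slow processing, group B = fast processing.
z, p_value = two_proportion_z_test(churned_a=90, total_a=200,
                                   churned_b=60, total_b=300)

ALPHA = 0.05  # significance level, to be agreed with the business
if p_value < ALPHA:
    print(f"z = {z:.2f}: reject the null hypothesis of equal churn rates")
else:
    print(f"z = {z:.2f}: not enough evidence to reject the null hypothesis")
```

A significant result would only show that slow processing and churn go together. It would not by itself prove that the delay causes the churn, which is what the diagnostic step is for.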

Why did it happen? --> Diagnostic Analysis


Now that we know what happened, the next logical step could be to know why it happened.
Here is where the diagnostic analysis process comes in handy, and combining it with the
descriptive analysis process can help the business take actionable decisions to mitigate
customer churn.

What relationship exists in my data? --> Exploratory Data Analysis/EDA


This process is about exploring the raw data to find out what can be learned from it. It
involves the use of different data visualization techniques so that you can understand:

• the distribution of the variables in your data by examining their shape: whether they
are right-skewed, left-skewed, normally distributed, etc.

• the potential outliers in the data set and the relationships between the variables.

• whether there is a notion of temporality in your data set.
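
Two of these checks also have simple numerical counterparts that can complement the plots. The Python sketch below computes the skewness of a variable and flags outliers with the interquartile-range rule (Tukey's rule). The sample of examination times is hypothetical, and the 1.5 multiplier is the conventional default rather than something prescribed by the use case.

```python
from statistics import mean, pstdev, quantiles


def skewness(values):
    """Population skewness: > 0 means a longer right tail, < 0 a longer left tail."""
    mu, sigma = mean(values), pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical request-examination times in hours.
exam_hours = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]

print("skewness:", round(skewness(exam_hours), 2))
print("outliers:", iqr_outliers(exam_hours))
```

The `skewness` helper divides by the standard deviation, so it is not defined for a constant variable.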

What Will happen? --> Predictive Analysis


As the name suggests, predictive analytics is all about trying to predict future trends based on
the findings of the diagnostic and exploratory analyses.

A good understanding of the trends and relationships in the data can guide the choice of the
task to perform, whether it is clustering, classification, regression analysis, etc.
Once the data analyst has an idea of the task, they can proceed with the scientific literature
review phase, which aims to benchmark state-of-the-art machine learning, artificial
intelligence, or even statistical solutions for the use case.
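
As a deliberately simple instance of the regression task, the sketch below fits a straight line to monthly churn figures by ordinary least squares and extrapolates one month ahead. The data is hypothetical, and a linear trend is an assumption chosen for brevity. A real project would compare several candidate models, as described above.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical monthly churn rates in percent, for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
churn_pct = [10, 13, 13, 17, 18, 21]

slope, intercept = fit_line(months, churn_pct)
forecast_month_7 = slope * 7 + intercept

print(f"trend: {slope:+.2f} points per month")
print(f"forecast for month 7: {forecast_month_7:.1f}%")
```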

How will it happen? --> Prescriptive Analysis


Answering such a question is a highly coveted skill in any business, which makes prescriptive
analysis one of the most effective types of data analysis. Knowing what happened and
why it happened, a data analyst can use prescriptive analytics to make recommendations for
the future. These allow the business to take the appropriate actions for a better return on
investment in the short, medium, and long term, while adapting its data collection strategy
and ultimately realigning its performance indicators.

Data Analysis Process Step 5 - Communicate Results and Eventually Readjust the Problem

Data Analysts can communicate their findings to the Business using different business
analytics solutions and open source tools.

In our use case, the Data Analyst might conclude that the customer churn is due to the delay
created during pre-processing.

However, the analysis might reveal two facts beyond the direct observation of
the insurance manager:

(1) Agents spend more time checking that a request's documents are complete than
analysing whether the request is worth the compensation.

(2) Once the documents are complete, agents need additional time to identify which
department the request is intended for.

Now a new question arises.

How can the processing of clients' documents be improved?

This question means that the Data team needs to provide the business team with the right
recommendations to mitigate customer churn.

Commonly Used Data Visualization Tools in the Data Analysis Process


Different factors can lead a company to use one tool over another for data visualization. The
most important skill is to communicate your result properly, regardless of the data
visualization tool. Below are some of the most commonly used data visualization tools:

Tools                       Description

Tableau                     Business Intelligence software that provides an intuitive
                            drag-and-drop interface for analytics and visualization. The
                            non-technical aspect makes it stand out in the industry.

PowerBI                     Similar to Tableau, PowerBI is also a Business Intelligence and
                            Data Visualization tool. It allows the conversion of data from
                            multiple sources into interactive business intelligence reports
                            and supports both Python and R.

[name not preserved]        A Python framework for creating visualizations ranging from
                            simple to more advanced.

PowerPoint                  When it comes to presenting results, PowerPoint is one of the top
                            tools to adopt because it allows users to translate complex
                            information into easily digestible visualizations.


Data Analysis Process Step 6- Choose the Right Models

Choosing the right model depends on the result of the data analysis. A flawed or misread
analysis will ultimately lead to choosing the wrong models.

As a data analyst, you can make the following recommendations to mitigate the previously
identified two facts. In addition, a new discussion will be required to set the key success and
performance indicators for the data analysis project.

(1) Conversational agent for document completion

The document completion issue might be solved by creating a conversational agent (chatbot)
that focuses on the following actions:

• check whether each client's documents are complete

• instantly notify the client whether the list of requested documents is complete
or not.
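
The core of such a completeness check can be as simple as a set difference between the required and the submitted documents. The Python sketch below illustrates the idea only. The document names, the `REQUIRED_DOCUMENTS` mapping, and the message wording are hypothetical, and a real chatbot would add document recognition and a conversation layer on top.

```python
# Hypothetical required-document lists per insurance type.
REQUIRED_DOCUMENTS = {
    "auto": {"ID", "driving_licence", "accident_report"},
    "home": {"ID", "property_deed", "damage_photos"},
}


def check_completion(insurance_type, submitted):
    """Return (is_complete, sorted list of missing documents)."""
    missing = REQUIRED_DOCUMENTS[insurance_type] - set(submitted)
    return not missing, sorted(missing)


def notification(insurance_type, submitted):
    """Message the chatbot could send right after a submission."""
    complete, missing = check_completion(insurance_type, submitted)
    if complete:
        return "Your request is complete and has been forwarded for review."
    return "Your request is incomplete. Missing documents: " + ", ".join(missing)


print(notification("auto", ["ID", "accident_report"]))
print(notification("home", ["ID", "property_deed", "damage_photos"]))
```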

(2) Document submission to the right department

Once the document is completed, a second machine learning model is responsible for
submitting it to the right department when the confidence score satisfies a given threshold
defined by the business team.
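
The thresholding logic around such a model could look like the Python sketch below. It assumes the classifier returns one confidence score per department. The function name, the 0.8 default threshold, and the fallback to manual review when the threshold is not met are assumptions, since the text only specifies what happens when the threshold is satisfied.

```python
def route_request(department_scores, threshold=0.8):
    """Pick the department with the highest score, or fall back to manual review.

    department_scores: {department: confidence in [0, 1]}, as a classifier
    might return. The scores and the default threshold are placeholders.
    """
    department, confidence = max(department_scores.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return department
    return "manual_review"  # assumed fallback: a human agent decides


print(route_request({"auto": 0.93, "home": 0.07}))
print(route_request({"auto": 0.55, "home": 0.45}))
```

Keeping the threshold as a parameter lets the business team tune the trade-off between automation and the risk of sending a request to the wrong department.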

Implement and Validate the Models

Once the model is implemented by the Data Science team, a validation phase is required with
the business to ensure that the result is aligned with the business metrics.

Deploy the Models

Before the model deployment, different aspects of the target environment need to be taken
into consideration, such as:

• the infrastructure that will host the model and its dependencies on existing
applications.

• change management, to identify how the current team will efficiently and comfortably
interact with the model.

Data Analysis Process Step 7 - Monitor the Model Performance

Machine learning models are not traditional applications, so monitoring their performance
over time is crucial. You can get users' and business feedback to improve them.
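
One lightweight way to monitor a deployed model is to track its accuracy over a sliding window of recent predictions and raise a flag when it drops below an agreed level. The Python sketch below is a minimal illustration. The class name, window size, and alert level are hypothetical, and it assumes the true outcome (for example, the department an agent finally assigned) becomes available after each prediction.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent predictions and flags degradation."""

    def __init__(self, window_size=100, alert_below=0.9):
        self.outcomes = deque(maxlen=window_size)  # True = prediction was correct
        self.alert_below = alert_below

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None  # nothing observed yet
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        acc = self.accuracy()
        return acc is not None and acc < self.alert_below


# Hypothetical stream of (predicted department, department confirmed by an agent).
monitor = AccuracyMonitor(window_size=4, alert_below=0.75)
for predicted, actual in [("auto", "auto"), ("home", "home"),
                          ("auto", "home"), ("home", "auto")]:
    monitor.record(predicted, actual)

print("accuracy:", monitor.accuracy())
print("needs attention:", monitor.needs_attention())
```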

We hope this article has given you a complete overview of the data analysis lifecycle. There
might be more or fewer steps in the analysis process from one data analysis project to
another. Still, a data analyst will likely come across at least the first five steps when solving a
real-world business problem. You have the complete data analytics project plan template to
help you efficiently plan your next data analysis project.
