Unit 4
Analytics
Contents
• 4.1 Steps in Data Analysis
• 4.2 Working with methods for analyzing variety of data
• 4.3 Working with large data
• 4.4 Data Visualization using advanced graphs
Steps in Data Analysis
1. Define the Problem or Research Question
In the first step of the process, the data analyst is given a problem or
business task.
• The analyst has to understand the task and the stakeholder’s expectations
for the solution.
• A stakeholder is a person who has invested money and resources in a
project.
• The analyst must be able to ask different questions in order to find the
right solution to their problem.
• The analyst has to find the root cause of the problem in order to fully
understand it.
• The analyst should communicate effectively with the stakeholders and other
colleagues to completely understand what the underlying problem is.
• Questions to ask yourself in this step (the Ask phase) are:
• What are the problems that are being mentioned by my stakeholders?
• What are their expectations for the solutions?
2. Collect Data
• The second step is to prepare or collect the data.
• This step includes collecting data and storing it for further analysis.
• Based on the task given, the analyst has to collect the data from multiple
sources.
• All data fit into one of three categories: first-party, second-party, and third-
party data.
• First-party data are data that you, or your company, have directly collected from
customers.
• Second-party data is the first-party data of another organization, i.e. data that
the organization collected itself and then sold.
• It might be available directly from the company or through a private marketplace.
• Third-party data is data that has been collected and aggregated from numerous
sources by a third-party organization. Often (though not always) third-party data
contains a vast amount of unstructured data points (big data).
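The collection step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two CSV strings, the column names, and the helper load_source are hypothetical stand-ins for real exports. It shows data from more than one source being stored together, with each row tagged by its category so its origin stays known during later analysis.

```python
import csv
import io

# Hypothetical CSV exports standing in for real sources (illustrative data only).
FIRST_PARTY_CSV = "customer_id,amount\n101,250\n102,400\n"
THIRD_PARTY_CSV = "customer_id,amount\n900,120\n"


def load_source(csv_text, source_label):
    """Parse CSV text and tag every row with the category it came from."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{**row, "source": source_label} for row in reader]


# Store the rows from all sources in one list for further analysis.
collected = (
    load_source(FIRST_PARTY_CSV, "first-party")
    + load_source(THIRD_PARTY_CSV, "third-party")
)

for row in collected:
    print(row)
```

Note that csv.DictReader returns every value as a string, so numeric columns such as amount would still need converting before analysis.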
3. Cleaning the data
• Removing major errors, duplicates, and outliers: all of these are inevitable problems when
aggregating data from numerous sources.
• Removing unwanted data points: excluding irrelevant observations that have no bearing on
your intended analysis.
• Bringing structure to your data: general 'housekeeping', i.e. fixing typos or layout issues, which
will help you map and manipulate your data more easily.
• Filling in major gaps: as you're tidying up, you might notice that important data are missing.
Once you've identified gaps, you can go about filling them.
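The four cleaning tasks above can be sketched with the Python standard library. This is a minimal illustration, not a prescribed method: the sample records, the field names, the 0 to 120 age range used to flag outliers, and the choice of the median to fill gaps are all assumptions made for the example.

```python
from statistics import median

# Hypothetical records aggregated from several sources (illustrative data only).
raw = [
    {"id": 1, "city": " pune", "age": 34, "note": "x"},
    {"id": 1, "city": " pune", "age": 34, "note": "x"},      # exact duplicate
    {"id": 2, "city": "Mumbai ", "age": None, "note": "y"},  # missing age
    {"id": 3, "city": "DELHI", "age": 29, "note": "z"},
    {"id": 4, "city": "Pune", "age": 410, "note": "w"},      # implausible age
    {"id": 5, "city": "mumbai", "age": 41, "note": "v"},
]


def clean(records, min_age=0, max_age=120):
    """Return a cleaned copy of records; the age limits are assumptions."""
    seen = set()
    kept = []
    for rec in records:
        # Remove unwanted data: drop the 'note' field, assumed irrelevant here.
        row = {k: v for k, v in rec.items() if k != "note"}
        # Bring structure: normalise whitespace and capitalisation.
        row["city"] = row["city"].strip().title()
        # Remove duplicates: skip rows already seen after normalisation.
        key = (row["id"], row["city"], row["age"])
        if key in seen:
            continue
        seen.add(key)
        # Remove outliers: drop rows whose age is outside the plausible range.
        if row["age"] is not None and not (min_age <= row["age"] <= max_age):
            continue
        kept.append(row)
    # Fill gaps: replace missing ages with the median of the known ages.
    known = [r["age"] for r in kept if r["age"] is not None]
    fill = median(known) if known else None
    for r in kept:
        if r["age"] is None:
            r["age"] = fill
    return kept


cleaned = clean(raw)
for row in cleaned:
    print(row)
```

With this sample, the duplicate of record 1 and the record with age 410 are dropped, and the missing age of record 2 is filled with the median of the remaining known ages. Whether to drop an outlier or fill a gap depends on the analysis, so the rules here should be treated as placeholders.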
4. Analyzing the data