
Module Data Analysis

This document provides an overview of data analysis techniques. It discusses exploring data to understand its structure and find stories within. Techniques include asking questions about the data, visualizing it, and identifying outliers. The goals of exploratory analysis are to suggest hypotheses, assess assumptions, support tool selection, and provide a basis for further data collection or experiments. Key analytical tools covered are averages, linear regression, and applying statistical methods while balancing rigor with practical business needs.

Uploaded by

Murad Alhakem
Copyright
© All Rights Reserved

Module: Data analysis

Learning Objectives
Understand how your data is structured
Assess and validate the quality of data: is it already in a standardized format? What types of
data do we have? Is it well documented? What restrictions apply?
Understand how to create data about your data

[Figure: data process diagram with the stages Data collection, Can we use this data?, Finding a Story, Telling your story and Trying it out, plus the label Asking Questions]

Topics covered in this module:

1. What is data analysis or how do you find a story in your data?
2. Analytical tools
3. Applied analytics

Who is it for:
Data analysts & Data Owners

How do you find your story?

You’ve collected and processed your data, and now the fun starts. Data analysis is the practice of
applying statistical and analytical tools to a dataset to surface actionable business insights. But staring
down a huge dataset can be intimidating. Here are a few techniques to get you underway.

Data can be intimidating, especially when it comes to analyzing it. But remember, most of statistics
is just special ways of counting, and we’re all pretty good at counting! Establishing a data culture
doesn't mean that we will all become statistical wizards, but it does mean learning enough about
analysis to understand its creative potential. The data analysis process is also an opportunity to
bring people together, and there are many ways to do this while finding a story:
https://databasic.io/en/culture/#activities

Exploratory data analysis

If a friend asked you to go hiking with them, what questions might you ask to be better prepared?
You might want to know how long the trail is and how well kept it is. You might ask whether there is
elevation gain and how evenly it is distributed, or whether there are outliers like water crossings or
boulder scrambles. These questions help you understand the general structure of what you’re engaging
with and aid your preparation.

Similarly, exploratory data analysis is a set of techniques designed to get you familiar with a dataset,
fast. Visualizations usually let you grasp some of its key features quickly. It’s an iterative process:
answers to your questions lead to new questions and, eventually, new answers.
Some questions you might ask during exploratory data analysis include:

1. How many rows and columns are there?
2. Are there any null values? A lot or just a few or none?
3. How are the different measures distributed?
4. Are there any outliers?
5. What’s the biggest value? The smallest? The mean and median averages?
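The questions above can be answered in a few lines of code. The sketch below is a minimal illustration in Python, using only the standard library. It assumes a small hypothetical dataset held as a list of dictionaries, with None marking a null value. The column names, the values and the helper name `explore` are invented for illustration. Outliers are flagged with the common 1.5 * IQR rule, which is one convention among several rather than the only definition of an outlier.

```python
import statistics

# Hypothetical dataset: one dict per row; None marks a null value.
rows = [
    {"region": "North", "sales": 120.0},
    {"region": "South", "sales": 135.0},
    {"region": "North", "sales": None},
    {"region": "East", "sales": 128.0},
    {"region": "South", "sales": 990.0},
    {"region": "East", "sales": 117.0},
    {"region": "North", "sales": 131.0},
    {"region": "West", "sales": 124.0},
]

def explore(rows, column):
    """Answer the basic exploratory questions for one numeric column (hypothetical helper)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0  # assumes every row has the same keys
    values = [row[column] for row in rows if row[column] is not None]
    # Quartiles describe the distribution; the 1.5 * IQR rule flags outliers.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "rows": n_rows,
        "columns": n_cols,
        "nulls": n_rows - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "quartiles": (q1, q2, q3),
        "outliers": [v for v in values if v < low or v > high],
    }

print(explore(rows, "sales"))
```

In this made-up data the mean (about 249) sits far above the median (128). That gap is a first hint that the single 990 value is skewing the average.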

Hands-on group activity: “Finding a Story” https://datatherapy.org/activities/activity-finding-a-story-in-data/

Exploratory data analysis has the following stated goals:

1. Suggest hypotheses about the causes of observed phenomena
2. Assess assumptions on which statistical inference will be based
3. Support the selection of appropriate statistical tools and techniques
4. Provide a basis for further data collection through surveys or experiments

Essentially, exploratory data analysis should occur before you have enough information to begin
hypothesis testing, and it helps you get there faster. It also lets the data suggest a model for
future testing. The stated goals may sound intimidating, but in practice exploration can be as simple
as dropping the data into Tableau and slicing and dicing it, looking for something interesting.

Joins vs Unions

If you are working with multiple sources of data (it might be as simple as a couple of different Excel tables or
as complicated as different databases), you will have to identify how these sources of data should interact
with each other. In just a few words: joins work with columns (horizontally) and unions work with rows
(vertically). A join combines columns from two sources by matching rows on a shared key; a union stacks
the rows of sources that share the same columns. For more detail, see this resource on joins and unions in Tableau:
https://www.thedataschool.co.uk/diego-parkertheinformationlab-co-uk/joins-and-unions/
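To make the distinction concrete, here is a minimal Python sketch. It uses two small hypothetical tables (`stores` and `sales`) and a hypothetical second extract (`more_sales`). The helper names `inner_join` and `union` are invented for illustration. `inner_join` only keeps rows whose key appears in both tables. Other join types (left, right, full outer) also keep unmatched rows, and this sketch does not cover them.

```python
# Hypothetical tables sharing the key column "store_id".
stores = [
    {"store_id": 1, "city": "Leeds"},
    {"store_id": 2, "city": "York"},
]
sales = [
    {"store_id": 1, "sales": 500},
    {"store_id": 2, "sales": 300},
    {"store_id": 3, "sales": 150},  # no matching row in stores
]
# Hypothetical later extract with the same columns as sales.
more_sales = [
    {"store_id": 1, "sales": 450},
]

def inner_join(left, right, key):
    """Join (horizontal): add the right table's columns to left rows whose key matches."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined

def union(*tables):
    """Union (vertical): stack rows of tables with the same columns, keeping duplicates like SQL UNION ALL."""
    return [dict(row) for table in tables for row in table]

print(inner_join(stores, sales, "store_id"))  # rows gain a "city" column
print(union(sales, more_sales))               # same columns, more rows
```

Store 3 disappears from the join because it has no matching row in `stores`. The union simply grows from three rows to four.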

Data dredging

Data dredging or data fishing is the process of (usually programmatically) identifying relationships
between variables. This can be an incredibly powerful technique to uncover elements that interact in
unexpected ways.

Correlation – Causation – Coincidence

However, in the world of big data, often so many variables are available to pair together that
eventually some will return a false positive. Evaluating two variables over the same period of time
allows application of a statistical technique to determine a correlation coefficient, a value between
negative 1 and 1 which expresses whether their two curves correlate (and how strongly, and
positively or negatively). To oversimplify, when there is a change in one line, does it correlate with a
similar change in the other line?
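The correlation coefficient described above is usually the Pearson coefficient. It is the covariance of the two series divided by the product of their standard deviations. Below is a minimal standard-library sketch, assuming two equally long numeric series. The function name `pearson_r` and the example figures are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long series; always between -1 and 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))  # co-movement around the means
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / norm

# Hypothetical monthly figures for two metrics.
ad_spend = [10, 12, 15, 11, 18, 20]
site_visits = [200, 230, 290, 215, 340, 390]
print(round(pearson_r(ad_spend, site_visits), 3))
```

On Python 3.10 or later, `statistics.correlation` computes the same coefficient without a hand-written helper.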

Correlation is the existence of a relationship between two or more variables: as one variable goes
up or down, another tends to go up or down with it (a positive correlation) or in the opposite
direction (a negative correlation).
Correlation does not mean that one variable is causing the other to change.

To learn more about correlations: https://qcc.qlik.com/course/view.php?id=401


Interestingly, in large datasets with near-infinite lines to choose from, random chance will
eventually produce two lines that appear to correlate but are in reality unrelated. These are known
as spurious correlations, and pulling from publicly available datasets can produce humorous (and
instructive!) examples:

http://tylervigen.com/spurious-correlations
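The effect is easy to reproduce. The sketch below generates a set of hypothetical random-walk series, which are unrelated by construction. It then dredges every pair for a strong correlation. The series count, length, seed and 0.8 threshold are arbitrary assumptions. The point is only that, given enough pairs, some unrelated series will correlate strongly by chance.

```python
import itertools
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def random_walk(rng, length):
    """A series built from independent random steps; any trend in it is pure chance."""
    value, series = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        series.append(value)
    return series

def dredge(series_by_name, threshold=0.8):
    """Compare every pair of series and keep the pairs whose |r| reaches the threshold."""
    hits = []
    for (name_a, xs), (name_b, ys) in itertools.combinations(series_by_name.items(), 2):
        r = pearson_r(xs, ys)
        if abs(r) >= threshold:
            hits.append((name_a, name_b, r))
    return hits

# Hypothetical setup: 40 unrelated series of 12 points each, fixed seed for repeatability.
rng = random.Random(42)
series = {f"metric_{i:02d}": random_walk(rng, 12) for i in range(40)}
hits = dredge(series)
pairs_checked = math.comb(len(series), 2)
print(f"{len(hits)} of {pairs_checked} unrelated pairs have |r| >= 0.8")
```

Every hit is a false positive by construction. This is why correlations found by dredging need to be checked against the business context.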

Data dredging is a useful tool, but use it with an understanding of its pitfalls. Metrics that seem to
correlate should be investigated within the context of the business rather than taken at face value.

Analytical tools
Average types

We looked at this during the exploratory phase, but now we understand the data better. Outliers
can have a large impact on the mean in particular—you might consider excluding them. Does the
mean change dramatically if you slice by any particular dimension? Are the mean and median far
apart, indicating a skewed distribution? Start slicing the data in a few ways and see what you can
discover.
Further reading: https://betterexplained.com/articles/how-to-analyze-data-using-the-average/
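Here is a minimal sketch of slicing averages by a dimension. It uses hypothetical order values tagged by sales channel; the data and the helper name `averages_by` are invented for illustration. For each slice it reports the mean, the median and the gap between them. A large gap points to a skewed slice or an outlier.

```python
import statistics
from collections import defaultdict

# Hypothetical order values tagged by sales channel; one extreme order sits in "web".
orders = [
    ("store", 40.0), ("store", 55.0), ("store", 48.0),
    ("web", 42.0), ("web", 50.0), ("web", 46.0), ("web", 2000.0),
]

def averages_by(rows):
    """Return the mean, median and mean-minus-median gap for each dimension value."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    summary = {}
    for key, values in groups.items():
        mean = statistics.mean(values)
        median = statistics.median(values)
        summary[key] = {"mean": mean, "median": median, "gap": mean - median}
    return summary

print(averages_by(orders))
# Re-run without the extreme order (the 1000 cut-off is arbitrary) to see how far it moved the mean.
print(averages_by([(k, v) for k, v in orders if v < 1000]))
```

In this made-up data, the single 2,000 order pulls the web mean up to 534.5 while the web median stays at 48. Excluding that order brings the web mean back to 46.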

Linear regression

Linear regression is a common and powerful statistical tool that models the relationship
between a dependent variable and an explanatory variable. Most relevant to business scenarios, it
allows you to forecast the dependent variable from a future value of the explanatory variable.
Although it is widely used, it is more complex than can be captured here, but it is a technique
worth investing time in learning.

Linear regression in Tableau: https://www.thedataschool.co.uk/emily-dowling/calculate-linear-regression-line-tableau/

Tutorial using R: https://www.machinelearningplus.com/machine-learning/complete-introduction-linear-regression-r/
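With a single explanatory variable, the least-squares line can be computed directly. The slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the two means. Below is a minimal sketch using hypothetical figures (ad spend versus sales, both in thousands) and hypothetical helper names. A real analysis would also check residuals and goodness of fit, which this sketch leaves out.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one explanatory variable: y ≈ intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the explanatory variable must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)
    return slope, intercept

def predict(slope, intercept, x):
    """Forecast the dependent variable for a (future) value of the explanatory variable."""
    return intercept + slope * x

# Hypothetical data: monthly ad spend and sales, both in thousands.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = fit_line(ad_spend, sales)
print(f"sales ≈ {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"forecast for a planned spend of 6: {predict(slope, intercept, 6.0):.2f}")
```

On Python 3.10 or later, `statistics.linear_regression(x, y)` returns the same slope and intercept.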

Applied analytics
Applying statistical rigor is highly effective when it can be used, but the ambiguity inherent to daily
work frequently causes conditions to fall short of ideal. Analysts are often forced to find a point
where their work is not perfect, just “good enough.”

Bring it all together

The business is your partner in any analysis. It’s often better to engage them early and often in your
work; they may provide context that will help you figure out what to test next, or prevent you from
puzzling over a strange trend or outlier for days.

Despite how it may appear, data analysis is a creative pursuit. Getting a diverse group of
stakeholders in the same room to discuss juicy data may inspire insights that no individual could
glean alone. Analytical rigor is important, but business is messy and sometimes needs a messy
solution.

One-way and two-way doors

Business decisions can be thought of as one-way or two-way doors. If a decision is irreversible, or
reversing it would be so taxing as to be realistically unfeasible, it’s a one-way door. If it’s a decision
you can try and quickly abandon at any point, it’s a two-way door (though a two-way door might still be
heavy to open).

Speed matters, and sometimes recommending a strategy you feel reasonably good about is faster
than investing more time in upfront analysis. Knowing when you have hit the point of diminishing
returns, along with the context within the business, will help you gauge when to keep analysing
and when to bring your findings to the decision-maker.
