Fundamentals of Data Visualization
Data Visualization through Microsoft Power BI
Marcelo Guerra Hahn
Fundamentals of Data Visualization: Data Visualization through Microsoft Power BI
© Marcelo Hahn 2024
ISBN 978-87-403-4909-2
FUNDAMENTALS OF DATA VISUALIZATION Contents
CONTENTS
About the Author
Preface
1 Visualization
1.1 Introduction
1.2 Why Do Data Visualizations Work?
1.3 What Are Data Visualizations Used For?
1.4 When not to Use Visualizations
2 Data Fields
2.1 Introduction
2.2 Basic Field Types
3 Field Transformations
3.1 Introduction
3.2 Splitting
3.3 Concatenation
3.4 Simple Math
4 Bar Charts
4.1 Introduction
4.2 Line Charts
4.3 Maps
6 Correlations
6.1 Introduction
6.2 Scatterplots
6.3 Trend Lines
6.4 Measures of Fit
7 Time Series
7.1 Introduction
7.2 Formatting Dates
7.3 Using Cycle Plots to Depict Seasonality
7.4 Forecasting
8 Storytelling
8.1 Introduction
8.2 Aspects of A Good Story
8.3 Advantages of Storytelling
8.4 Creating A Story
Conclusion
Table of Figures
References
PREFACE
Data Visualization simplifies analyzing data by presenting it in a format more accessible for
human brains to understand. Since our eyes are more drawn to colors and patterns than
words and numbers, this tool helps communicate information faster and more efficiently
using graphical representations such as charts, tables, and maps.
For example, an organization can choose to represent its sales data with a map in addition
to words and numbers. This visual map, color-coded based on the sales numbers,
would help the organization understand how sales increase or decrease or how the trend
moves over a certain period.
Data visualization is a helpful tool for every field in which data plays an important role,
in industries ranging from marketing to finance to tech. As a result, the ability to produce
visualizations has become an important, increasingly sought-after skill in today’s
data analysis. This demand has also led to the rise of visualization tools such as PowerBI,
Tableau, and Looker.
This book introduces the basic concepts behind visualizations and provides examples of
each in PowerBI, with step-by-step tutorials on how to create each visualization.
1 VISUALIZATION
1.1 INTRODUCTION
PowerBI is a Microsoft-owned data visualization tool. According to Microsoft’s official website,
“PowerBI is a business analytics solution that allows you to visualize your data and share
insights across your organization, or embed them in your app or website.” Coming from a
different technology area, I was not aware of the full power of visualizations or even of how
they worked. As I started researching the subject, I came across Hans
Rosling’s TED Talk, “The best stats you’ve ever seen.”
If you have 20 minutes to spare, I recommend watching it. What Doctor Rosling did in this
presentation showed me the real power of visualizations. In short, he presented an
insight that, even though factually correct, goes against people’s perceptions. He then used
a very well-crafted visualization to explain that insight to the public. This ability to make a
surprising insight convincing is the key reason visualizations are becoming increasingly prevalent. By
the end of this book, you should have acquired the tools needed to build the visualization
Rosling showed in his talk.
Data visualization uses graphic elements to represent data in a more understandable and
usable manner. The effective use of visualizations has become essential in this era
of massive amounts of complex data, mainly because it provides a practical way of representing
patterns and trends.
Visualization tools were not always as widespread as they are today; only in the last few decades
have they become common tools that anyone can use. Twenty years ago, data analytics was
mainly reserved for statisticians who had the specialized skills to collect, analyze, and interpret
data and create visualizations, and for companies with massive
supercomputers. The work of important figures such as Edward Tufte and John
Tukey paved the way for visualization techniques to reach more than just these statisticians.
Along with the advancement of technology, data visualization became widely available, facilitating
data exploration and analysis.
For example, look at the following sequence of numbers and identify all instances of the
number 5.
16385024654420
39321856807581
93071210762917
71908091154386
36048914702605
Now look at the same sequence with each 5 shown in a different color, and try to identify all instances of the number 5.
16385024654420
39321856807581
93071210762917
71908091154386
36048914702605
Notice how the brain instinctively picks out a visual characteristic such as color much more rapidly
than the shape of a number. Using color makes it easier to interpret sequences
and identify patterns with less effort.
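The highlighting exercise above can be reproduced in a terminal with a few lines of Python. This is a minimal sketch, assuming a terminal that understands ANSI color codes; the function name highlight_digit and the choice of red are illustrative and do not come from the book.

```python
# ANSI escape codes for red text; the choice of red is arbitrary.
RED = "\033[31m"
RESET = "\033[0m"


def highlight_digit(text: str, digit: str = "5") -> str:
    """Wrap every occurrence of `digit` in ANSI color codes."""
    return text.replace(digit, f"{RED}{digit}{RESET}")


# The five rows of digits from the example above.
rows = [
    "16385024654420",
    "39321856807581",
    "93071210762917",
    "71908091154386",
    "36048914702605",
]

for row in rows:
    print(highlight_digit(row))
```

Printed this way, the 5s stand out before the reader has consciously scanned each digit.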
As a second example, look at the following table containing the World Population and
Annual Growth Rate between 2010 and 2020, and identify if these numbers are increasing
or decreasing.
Table 1.1 World Population and Annual Growth Rate in The Last Decade
Then look at the following graph and identify whether the population and rate are increasing or
decreasing. Notice how the data is more easily processed, and the rate of increase more
quickly interpreted, when the numbers are represented in a graph.
Figure 1.1 World Population and Annual Growth Rate in The Last Decade
Finance: In this field, data visualizations help reveal patterns, anomalies, fluctuations in the
price of assets, etc. Financial visualizations usually shift the emphasis from
retrospective insight to forecasting, helping teams make data-driven decisions on, for example, entering
new markets, financing and investing, and analyzing assets and capital.
Healthcare: Some public health variables, such as mortality rates for a particular disease
across a specific area, can be represented through a colored map. Infographics are also
a pivotal device for educating the public about public health matters. Moreover,
visualization techniques can give health personnel insights into a patient’s care
coordination and help them track and monitor the patient’s health status, records, etc.
Marketing: Visualizations can help marketing teams and stakeholders explore sales figures,
market research analytics, campaign statistics, marketing strategies, etc., aiding their
decision-making process. Marketing teams could, for example, use visualizations
when crafting content for campaigns and sharing data with consumers.
2 DATA FIELDS
2.1 INTRODUCTION
Choosing the best possible visualization for the data we observe is one of the critical
aspects of data visualization. Naturally, multiple factors play a role in this choice.
The first factor to consider is the type of data field.
2.2.1 NUMBERS
Mathematically, numbers can be classified in multiple ways. However, two distinctions
matter when creating visualizations: whole vs. decimal and discrete vs. continuous.
the scale. For example, the number of customers in a restaurant can be any whole
number from 0 upward with, in theory, no upper limit. Similarly, the price of a product can
take any value from 0 upward, again with no upper limit in theory.
2.2.2 STRINGS
Strings are sequences of characters that can include letters, numbers, and separators, which
can be either visible or invisible and may be repeated. Strings are categorical, meaning they
represent a concept (such as a name, a location, or a product). For example, “Apple”
is a string, and “221B Baker St.” is a string. In addition, a string can be a constant or
a variable. We will look into two specific kinds of strings: dates and locations.
2.2.3 DATES
Dates are a particular combination of numbers and characters (i.e., the separators) that represents a specific
moment. They usually include three components: year, month, and day, and can be grouped
by year, quarter, month, and day of the week. Additionally, dates support mathematical and
logical operations, for example:
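As one illustration of such operations, the following Python sketch uses the standard datetime module. The two dates and the variable names are hypothetical examples chosen for this sketch, and PowerBI exposes comparable operations through its own functions rather than through this code.

```python
from datetime import date, timedelta

# Hypothetical example dates.
order_date = date(2024, 3, 15)
delivery_date = date(2024, 4, 2)

# Mathematical operations: subtract two dates, or add a number of days.
days_between = (delivery_date - order_date).days
due_date = order_date + timedelta(days=30)

# Logical operation: compare two dates.
is_later = delivery_date > order_date

# Grouping keys: quarter and day of the week (Monday is 0, Sunday is 6).
quarter = (order_date.month - 1) // 3 + 1
weekday_index = order_date.weekday()

print(days_between, due_date, is_later, quarter, weekday_index)
```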
2.2.4 LOCATIONS
Locations are strings that carry semantics: they represent places in the world. Locations
can be countries, cities, zip codes, etc. Locations can also be global coordinates. Figure
2.1 shows locations in PowerBI.
3 FIELD TRANSFORMATIONS
3.1 INTRODUCTION
Now that we have discussed the purpose and basic types of data fields, it’s time to shed
light on the next step – field transformation.
Once the data is stored in appropriate fields, we may need to process it to ensure it is ready
for further data analysis. For example, an organization records daily sales in its system. Those
records may have dates as separate columns representing Day, Month, and Year, while for
analysis purposes, we need this information as only one field. Similarly, the system could
have customer addresses as one field when we need separate fields that include state and
zip code information.
When we process data from the data source, we need not replace the existing data with the
output. We can instead create new fields, which are referred to as Calculated Fields.
We can transform fields in the following ways: splitting string data, concatenating string
fields, performing simple math calculations, and transforming data from one data type to another.
3.2 SPLITTING
When we extract data from a source, specific fields may contain string data. For instance,
the data can be payroll information about employees working at a retail store. It may
include a single field holding each employee’s first and last name, which we can separate
into a first-name field and a last-name field. For this purpose, we must specify a common separator.
This separator tells the software where to divide, or split, the data into multiple fields.
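The idea of splitting on a separator can be sketched in Python as follows. This is only an illustration of the concept, not PowerBI's Split feature; the function name split_full_name and the sample names are assumptions made for this sketch.

```python
def split_full_name(full_name: str, separator: str = " ") -> tuple[str, str]:
    """Split a full name into two fields at the first occurrence of `separator`."""
    first_part, _, second_part = full_name.partition(separator)
    return first_part, second_part


# Hypothetical payroll entries, one with a space and one with a comma separator.
print(split_full_name("Ada Lovelace"))
print(split_full_name("Lovelace, Ada", separator=", "))
```

Splitting only at the first separator keeps a multi-word last name together in the second field; a different rule may suit other data.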
3.3 CONCATENATION
Along with the Split function, PowerBI offers another operation called concatenation.
Concatenation combines two strings, for example with the help of the ‘&’ operator.
When applied to strings, this operator attaches the second string to the end of the first string.
The PowerBI CONCATENATE function is a DAX function that combines two text strings
into one. Text, integers, Boolean values displayed as text, or a mix of those elements can
be joined. If the columns hold appropriate values, you can use the following formula
to concatenate strings in PowerBI:
CONCATENATE("Hello ", "World")
Once we specify this formula, the tool combines the given strings and shows the result
in a new field. In the above example, a space is deliberately left after “Hello” to
ensure a space between the two arguments when they are concatenated. If no space had
been left, the output would have been “HelloWorld”.
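The effect of the trailing space is easy to check outside PowerBI as well. The following Python sketch mirrors the behavior described above using Python's own string operator; it is an analogy, not DAX, and the variable names are illustrative.

```python
# With a trailing space after "Hello", the words are separated.
greeting = "Hello " + "World"

# Without the space, the two strings run together.
no_space = "Hello" + "World"

# Numbers and Boolean values must be converted to text before joining.
mixed = "Order " + str(42) + " shipped: " + str(True)

print(greeting)
print(no_space)
print(mixed)
```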
This formula can be applied to the existing data by creating a new data field and entering
this formula in the Calculation Editor after naming the field as ‘Profit Margin.’ The software
marks this new data field as a Measures data type since it includes numerical data.
4 BAR CHARTS
4.1 INTRODUCTION
Bar charts are one of the most widely used methods of data visualization. They allow us to
scan information presented as vertical or horizontal bars. We can use bar charts to manage
large data sets by categorizing them based on their numerical values.
Imagine an ice cream parlor that surveys 1000 people about how much they like
the experimental flavors it offers: nutty vanilla, chocolate smoothie, caramel brownie,
hazelnut cookie fudge, and strawberry swirl fudge. The survey results revealed the following:
As seen above, the data in the bar graph is plotted according to the frequency of each
category of ice cream. The length or height of each bar is proportional to the value it represents.
Thus, the X-axis shows the various flavors, and the corresponding number of people who like
a particular flavor is depicted on the Y-axis.
Bar Charts or bar graphs represent grouped data as vertical or horizontal bars. The
length of each bar is proportional to the value it measures. A Bar Chart possesses the following characteristics:
We use the following pointers to determine when to use a Bar Chart for data visualization.
A Bar Chart allows you to compare different categories and changes in these categories. So,
to plot profits and losses from different departments in a store, you will have a bar for each
department. This bar will extend up along the positive vertical axis to depict a profit or down
along the negative axis to depict a loss.
We may use a Bar Chart to depict changes in data over time, for example, changes
from year to year or quarter to quarter. Continuing the above example, you can show a
trend over time with the help of bars representing each quarter for the whole store.
However, you should avoid using a Bar Chart when the number of groups you wish to compare
is more than 10. Also, avoid using a Bar Chart if the data you want to visualize is continuous.
In the above diagram, the X-axis represents the range of income. The corresponding number
of people falling in each income range is depicted on the Y-axis.
As evident from the Grouped Bar Chart, we can read the percentage of people with
cardiac and orthopedic issues for two age groups: 70-79 years and 80-100 years. The X-axis
represents the age groups, and the Y-axis represents the percentage of people
having these ailments.
Let us assume that a department store tabulates the data showing sales made by various
salespeople.
The X-axis represents salespeople’s names, while the Y-axis represents sales in dollars (in
thousands). You can see the sales value for clothing, kitchens, shoes, bedding, and toys.
4.1.4.1 Advantages
Using Bar Charts as a tool of data visualization offers the following benefits. Bar Charts:
4.1.4.2 Disadvantages
The disadvantages of using bar charts are listed below. Bar Charts:
Line graphs typically aid in analyzing the trend, allowing you to gauge whether a quantity
on the Y-axis increases or decreases over time.
The table below shows Norah’s height (in feet), measured at intervals of 2 years.
Age 2 4 6 8 10 12
This data can be shown with the help of a line graph in the following manner.
Note that the X-axis represents age, and the Y-axis shows a change in height over the defined
time intervals. In a Line Chart, an upward slope indicates that the values have increased
(like in the above example), and a downward slope indicates that the values have decreased.
Let us understand the various components of a Line Chart with the help of the above
example.
Title: The title reveals what the graph is about. So, this lets you know what information
is depicted in the Line Chart.
Labels: Both axes have labels, which help you gauge the data shown in the graph. Here,
the X-axis is labeled as age, and the Y-axis is labeled as height.
Scale: The Scale of the Line Chart tells you the number of units used to define each point
on the graph.
Points: The points represent the x and y coordinates. As seen above, the data on the X-axis
shows an independent variable, and the information on the Y-axis is the dependent variable.
Lines: The lines connecting the points estimate the value between each point. We can
conclude that the line is the actual graph, while other parts of the chart are guides that
help you to understand the sequence.
Line Charts can be in the form of a Simple Line Chart or Multiple Line Chart.
Let us assume that Robin bought a car in 2015. The table below shows its depreciated
value in the subsequent years.
2015 54,795
2016 49,316
2017 44,384
2018 39,946
2019 35,951
2020 32,356
2021 29,120
Observe that the line graph has a downward slope, which tells you that the car’s value has
decreased.
The table below shows the number of students who enrolled in college and opted for Economics
as their major from 2014 to 2021. The states being represented are Ohio and Illinois.
In the Multiple Line Chart above, the yellow line represents the students of Ohio,
and the blue line represents the students of Illinois.
4.3 MAPS
Maps are used to analyze and display geographically related data. Maps provide a visual
representation of each region’s distribution or proportion of data. This allows for deciphering
deeper information to make better decisions.
Using Maps for data visualization may offer you the following benefits:
Consider the map below. It shows region-wise profit for a superstore. Notice that the
regions are divided into Central, East, South, and West. Using maps for data visualization
provides a clear-cut picture of the company’s performance in terms of geography.
The map below shows the state-wise sales, which are represented by dots. This map has
been created using PowerBI, and when the cursor is placed over a particular state, the sales
value is displayed, as shown in the figure below.
5 SINGLE VARIABLE STATISTICS
5.1 INTRODUCTION
Single variable or univariate statistics refers to data given as a list of numbers. You have often
made lists when you go grocery shopping. That is an example of single-variable statistics.
However, when analyzing the data, you may have to use techniques to summarize and display it
effectively. The following methods available for single variables can be of great help:
In the same sample, the distribution of the variable ‘sex’ indicates that 45 people have a
score of ‘male’ and 55 have a score of ‘female.’
Frequency tables offer an efficient way to show the distribution of a variable. The following
table shows the test scores of 20 students.
Marks Obtained  Number of Students (Frequency)
8 2
12 1
15 2
19 2
20 3
22 2
23 1
25 3
27 1
29 2
30 1
The frequency distribution table above is an ungrouped frequency distribution table. Such
tables are apt for representing smaller data sets. For larger data sets, however, grouping the
data into class intervals gives greater clarity. Thus, the above table can be grouped
into class intervals as follows.
Marks Obtained  Number of Students (Frequency)
5-10 2
10-15 1
15-20 4
20-25 6
25-30 6
30-35 1
Here, it is worth noting that the upper limit of each class interval repeats as the lower limit of the
next class interval. Values equal to the upper limit of an interval are therefore included in
the next class interval. As we can observe, two students obtained 15 marks. They have been
included in the class interval of 15-20, not 10-15.
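The grouping rule described above (lower limit included, upper limit excluded) can be sketched in Python as follows. The function name group_into_intervals and the fixed interval width of 5 are assumptions made for this sketch; the marks and frequencies are those of the ungrouped table.

```python
from collections import Counter

# Ungrouped frequency table from the text: marks obtained -> number of students.
marks_frequency = {8: 2, 12: 1, 15: 2, 19: 2, 20: 3, 22: 2,
                   23: 1, 25: 3, 27: 1, 29: 2, 30: 1}


def group_into_intervals(frequency: dict, width: int = 5) -> dict:
    """Group values into intervals [lower, lower + width).

    A value equal to an interval's upper limit falls into the next interval.
    """
    grouped = Counter()
    for mark, count in frequency.items():
        lower = (mark // width) * width
        grouped[(lower, lower + width)] += count
    return dict(sorted(grouped.items()))


grouped = group_into_intervals(marks_frequency)
for (lower, upper), count in grouped.items():
    print(f"{lower}-{upper}: {count}")
```

Running the sketch reproduces the grouped table, with the two students who scored 15 counted in 15-20.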
5.3 HISTOGRAMS
A histogram represents the same information as a frequency table graphically. A histogram
always groups numbers into ranges, and the height of each bar shows how many values
fall into each range.
The X-axis represents the variable, and the Y-axis represents the frequency. For example,
Jonathan has pear trees with varying heights in his orchard. The heights and their corresponding
frequencies are listed in the table below.
12-18 4
18-24 6
24-30 7
30-36 8
36-42 10
42-48 13
48-54 5
The X-axis represents the variable, and Y-axis represents the corresponding frequency. Each
vertical bar represents the number of trees within a particular range.
A histogram can have six main shapes, depicting different distribution types. These shapes are:
The central tendency of a distribution represents its middle: the point around which the
scores of the distribution tend to cluster.
Here, we discuss the mean as a measure of central tendency and the standard deviation
as a measure of variability.
5.4.1 MEAN
The mean (also known as the average) of a data set can be determined by adding all the
numbers in the data set and then dividing this sum by the number of values in that set.
The distribution in the two charts below shows that the lower one has lower variability than
the upper one. The scores of the upper distribution are spread across a much greater range,
while those of the lower distribution are relatively closer to the center.
Numerically we can also consider the following example where the mean of two sets of
data is the same:
The most widely used measure of variability is the standard deviation. Standard deviation
is a measure that indicates the dispersion of a set of data from its mean. The higher the
variability, the greater the standard deviation.
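To make the relationship between mean and standard deviation concrete, here is a minimal Python sketch using the standard statistics module. The two data sets are hypothetical values invented for this illustration, chosen so that both have the same mean.

```python
from statistics import mean, pstdev

# Two hypothetical data sets with the same mean but different variability.
tight = [48, 49, 50, 51, 52]
spread = [10, 30, 50, 70, 90]

# The mean is the sum of the values divided by the number of values.
mean_tight = mean(tight)
mean_spread = mean(spread)

# pstdev computes the population standard deviation: the square root of
# the average squared distance from the mean.
std_tight = pstdev(tight)
std_spread = pstdev(spread)

print(mean_tight, mean_spread)
print(std_tight, std_spread)
```

Both sets are centered on the same value, yet the second set has a much larger standard deviation because its values lie farther from the mean.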
Percentile ranks are typically used to report results on standardized tests of ability or
achievement, such as the GRE. On its own, a student’s total GRE score of 1380 does not convey much meaning.
However, if the total score of 1380 is approximately the 90th percentile, this student performed
better than about 90% of the students who took the GRE.
Suppose the sales goal of your team is $90,000. This can be shown as a reference line along
with the sales data of your team.
The reference line allows you to see which team members have achieved their sales
targets and which have not.
The above graphic shows the usage of the SUM function in Excel. You may use different
types of aggregate functions to make specific calculations.
The box plot draws a box from the first to the third quartile. A vertical line passes through
the box at the median (the mid-value). The whiskers extend from each quartile to the
maximum or minimum.
Take two car manufacturers, Honda and Toyota. We use a box plot to compare
the miles per gallon of each manufacturer’s cars. The data that will be used is given below:
Honda Accord 27
Honda Civic 34
Honda CRV 27
Honda CR-Z 33
Honda Fit 36
Honda Odyssey 22
Honda Pilot 23
Toyota 4Runner 18
Toyota Avalon 24
Toyota Camry 28
Toyota Corolla 31
Toyota Highlander 21
Toyota RAV4 25
Toyota Sienna 21
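The values a box plot draws (minimum, first quartile, median, third quartile, maximum) can be computed from this table with Python's statistics module. This is a sketch under one assumption worth stating: quartiles can be calculated by several conventions, and the "inclusive" method used here may differ slightly from the one a given charting tool applies.

```python
from statistics import quantiles

# Miles per gallon by manufacturer, taken from the table above.
mpg = {
    "Honda": [27, 34, 27, 33, 36, 22, 23],
    "Toyota": [18, 24, 28, 31, 21, 25, 21],
}


def five_number_summary(values: list) -> dict:
    """Return the five values that define a box plot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


summaries = {maker: five_number_summary(values) for maker, values in mpg.items()}
for maker, summary in summaries.items():
    print(maker, summary)
```

The box spans q1 to q3, the line inside the box sits at the median, and the whiskers reach to min and max.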
6 CORRELATIONS
6.1 INTRODUCTION
Quite often, in statistics, you might be required to study the relationship between two or
more quantitative variables. You can do so with the help of correlation.
Correlation implies a relationship pattern between the values of two or more variables. For
example, there is a correlation between the sales of hot beverages and coats. As the sales of
hot drinks increase, the sales of coats also increase.
6.1.1 CAUSATION
While talking about correlation, we need to remember another vital concept: causation.
Causation takes a step further from correlation. It states that any change in the value of
one variable will cause a difference in the value of another variable. This implies that one
event is a result of the occurrence of another event. So, you may refer to it as cause and
effect. For example, smoking leads to an increase in the risk of developing lung cancer.
The distinction between correlation and causation is that correlation does
not automatically imply that a change in one variable will lead to a change in the value of
another variable. So, smoking is correlated with alcoholism, but it does not cause alcoholism.
1. Positive correlation – When one variable increases, so does the other. For
example, the number of calories burnt increases with the hours put into
exercise.
2. Negative correlation – When one variable increases, the other decreases. For
example, when the price of commodities increases, their demand decreases.
3. No correlation – The two variables show no statistical relationship. For
example, there exists no correlation between height and GPA.
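The direction and strength of a correlation are commonly summarized by the Pearson correlation coefficient r, which ranges from -1 to 1. The following Python sketch computes it from first principles; the function name pearson_r and all data values are hypothetical and serve only to illustrate a positive and a negative correlation.

```python
from math import sqrt


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours of exercise vs. calories burnt (positive).
hours = [1, 2, 3, 4, 5]
calories = [110, 190, 320, 390, 510]

# Hypothetical data: price of a commodity vs. units demanded (negative).
price = [1, 2, 3, 4, 5]
demand = [50, 41, 33, 22, 10]

print(pearson_r(hours, calories))
print(pearson_r(price, demand))
```

A value near 1 indicates a positive correlation, a value near -1 a negative correlation, and a value near 0 no linear correlation.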
6.2 SCATTERPLOTS
Using scatterplots, also called scatter graphs, scatter charts, or scatter diagrams, you can visually express a correlation.
Scatter plots provide a graphical view of the relationship between two numerical
variables. The correlation is shown by marking a dot for each pair of values.
Observe the scatter diagram shown in section 6.1.1. Dots that lie close to each
other along a particular direction show a higher degree of correlation. When the dots are
scattered and show neither similarity nor direction, it indicates a low degree of correlation.
However, you should remember that scatter diagrams show only an approximation of the
relationship or closeness of data. They do not offer a precise measurement of the relationship.
Linda wanted to ascertain whether a correlation exists between the diameter of a tree and
its height. She has the following data:
1. Draw the axes. She represents the diameter on the X-axis and the height on the
Y-axis.
2. Make a dot at the position given by each pair of values (x, y). Thus, she uses
the coordinates 4.1 and 3.1 to draw the first dot.
3. Repeat this process for all the data points; now, she has a scatter plot.
As evident from the above diagram, a positive correlation exists between the diameter and
height.
Consider another example where Steven wants to determine the correlation between age
and the number of pets people own. His scatter plot looks as follows.
It can be seen that there is no correlation between age and the number of pets a person owns.
Below is a scatter plot that depicts a particular store’s sales and profit data.
From this data, you can see that the general pattern of the graph slopes upwards.
Consequently, a trend line can be drawn to depict this trend, as shown in the diagram below.
Figure 6.6 Scatter Plot in PowerBI Profit vs. Sales with Trendline
A trend line may go through some points but need not go through all. Looking at this trend
line, you can see a positive trend in the data. Furthermore, you can estimate that when
sales reach $20,000, the profit will be about $9,000.
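A straight trend line is usually obtained by a least-squares fit, and predictions are read off the fitted line. The Python sketch below shows the calculation; the function name fit_trend_line and the sales and profit figures are hypothetical and are not the data behind Figure 6.6, so the predicted value differs from the one quoted above.

```python
def fit_trend_line(xs: list, ys: list) -> tuple:
    """Least-squares straight line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales and profit, both in thousands of dollars.
sales = [1.0, 2.0, 3.0, 4.0, 5.0]
profit = [0.5, 0.9, 1.4, 1.8, 2.2]

slope, intercept = fit_trend_line(sales, profit)

# Extrapolate the line to sales of 20 (thousand dollars).
predicted_profit = slope * 20.0 + intercept
print(slope, intercept, predicted_profit)
```

Predictions far outside the observed range, as here, assume that the same linear pattern continues, which should be checked against domain knowledge.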
Consider another example where a teacher surveyed her students to determine the correlation
between the number of hours they watched television and their test scores. She got the
following scatter plot and the corresponding trend line, which shows a negative trend.
A trend line can only be drawn for positive or negative correlations. It cannot be drawn
for data where no correlation exists.
Let us talk about the R-squared value, denoted by R², which is the square of the correlation coefficient r.
It allows you to measure the proportion of variation in the dependent variable (usually
represented on the Y-axis) that can be attributed to the independent variable (traditionally represented
on the X-axis).
Thus, the R-squared value, also known as the coefficient of determination, tells you how strongly
correlated the independent and dependent variables are. If the R-squared value is close to 1,
it suggests that the independent and dependent variables are closely correlated. If
the R-squared value is close to 0, it indicates that the independent and dependent
variables are uncorrelated.
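The R-squared value of a straight-line fit can be computed as one minus the ratio of unexplained variation to total variation. The following Python sketch does this for a least-squares line; the function name r_squared and the two small data sets are hypothetical and were chosen to show one value close to 1 and one close to 0.

```python
def r_squared(xs: list, ys: list) -> float:
    """R-squared of the least-squares straight line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the line vs. total variation in y.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data lying exactly on a line, and data with a weak pattern.
strong = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
weak = r_squared([1, 2, 3, 4], [2, 1, 2, 1])
print(strong, weak)
```

The sketch does not handle the degenerate case in which all y values are equal, where the total variation is zero.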
In the above correlations, the R-squared value will allow you to ascertain how correlated
your independent and dependent variables are.
Look at the following table, which gives the values of r and R² and the associations displayed
by the variables. It also indicates how the variables appear on the trend line of a scatter plot.
Value of r  Value of R²  Type of association  Location of variables on the trend line
Notice that in the scatter plot shown below, the value of R² is displayed. This value is 0.82, which
indicates a significant positive linear association. So, the points lie close to the linear trend line.
7 TIME SERIES
7.1 INTRODUCTION
Time series means a presentation of data in chronological order. For this purpose, the
statistical data is collected over time, usually at equal intervals (hourly, daily, weekly, monthly,
quarterly, annually, etc.). The examples listed below represent data that is chronological:
Using a time-series graph, you can plot repeated measurements over regular intervals. The
time is displayed on the X-axis, and the dependent variable is on the Y-axis. The data points
are joined, usually with straight lines. Suppose you want to analyze the number of views
on a particular YouTube channel over six months.
Month Views
January 180,000
February 90,000
March 140,000
April 220,000
May 252,000
June 280,000
This data can be depicted with the help of a time series graph shown below.
Changes in time-series graphs result from various factors: natural, economic, social,
industrial, or political. The effects of these factors are known as the components of a time
series, which are listed below.
1. Secular trend or long-term trend: This can be seen with the help of peaks and
troughs in the time-series graphs. It depicts the general tendency of data to
increase, decrease, or stagnate over a period of time. For example, time series relating
to business may show an upward tendency, whereas time series about death
rates may show a downward trend.
2. Seasonal variations: Such variations include changes that take place due to
rhythmic forces which occur in a regular periodic manner. Seasonal variations
are calculated when you record data in weeks, months, years, etc. For example,
the sale of ice cream increases in the summer season. Also, sales in department
stores are higher during the festive seasons than on regular days.
3. Cyclical variations: These refer to ups and downs that recur over
periods longer than a year. Cyclical variations are of a longer duration and may not
follow precisely similar patterns after equal intervals of time. For example, cyclical
variations can be seen in a business cycle. These cycles entail intervals of
prosperity, recession, depression, and recovery. The usual period of a business
cycle may range between 5 and 11 years.
4. Random or irregular variations: These arise from unforeseen and unpredictable
circumstances. For example, variations caused by famines, floods, strikes,
landslides, wars, etc.
You can draw the same time series as a multiple-line chart or as a single-line chart. Consider
the following sales data:
Year Quarter Value (in 1000$)
2017 Q1 78
2017 Q2 56
2017 Q3 62
2017 Q4 66
2018 Q1 85
2018 Q2 36
2018 Q3 46
2018 Q4 96
2019 Q1 96
2019 Q2 37
2019 Q3 76
2019 Q4 65
2020 Q1 56
2020 Q2 83
2020 Q3 28
2020 Q4 20
As is evident from the multiple-line chart, quarter-one sales peaked in 2019 (at 96) and
then dipped in 2020 (to 56).
However, the limitation here is that you cannot see the general trend for the data. Now,
let us try to plot this data using a single-line chart.
The data plotted on a single-line chart shows the presence or absence of trends/cycles. Here, the
annual sales totals rose gradually between 2017 and 2019 (262, 263, and 274) and declined in 2020 (187).
However, in the above single-line chart, it isn’t easy to see the effect of each quarter on
sales. This is where cycle plots can prove to be helpful.
Cycle plots allow you to incorporate both types of data – the quarter-of-the-year effect and
the trend/cycle data.
This cycle plot allows you to view yearly sales and the quarter-of-the-year effect. Here, you
can see the annual sales value for individual quarters. Observe the peak of sales achieved
for quarter 1 in 2019 and quarter 4 in 2018.
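The regrouping behind a cycle plot can be sketched in Python. The data is copied from the sales table above; `group_by_quarter` is a hypothetical helper name, and printing each quarter's mean is an assumption about how one might add a reference level to each panel, not a description of a PowerBI feature.

```python
from collections import defaultdict

# Quarterly sales (in thousands of dollars) copied from the table above.
sales = [
    (2017, "Q1", 78), (2017, "Q2", 56), (2017, "Q3", 62), (2017, "Q4", 66),
    (2018, "Q1", 85), (2018, "Q2", 36), (2018, "Q3", 46), (2018, "Q4", 96),
    (2019, "Q1", 96), (2019, "Q2", 37), (2019, "Q3", 76), (2019, "Q4", 65),
    (2020, "Q1", 56), (2020, "Q2", 83), (2020, "Q3", 28), (2020, "Q4", 20),
]

def group_by_quarter(rows):
    """Map each quarter to its chronological list of (year, value) points."""
    panels = defaultdict(list)
    for year, quarter, value in sorted(rows):
        panels[quarter].append((year, value))
    return dict(panels)

panels = group_by_quarter(sales)
for quarter in sorted(panels):
    values = [value for _, value in panels[quarter]]
    mean = sum(values) / len(values)
    # Each panel of a cycle plot would draw `values` against the years.
    print(quarter, values, f"mean = {mean:.2f}")
```

Each printed row corresponds to one panel of the cycle plot: the values show how that quarter changed from year to year, and comparing the means across rows shows the quarter-of-the-year effect.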
7.4 FORECASTING
Time series graphs can be used for making predictions since they depict time-based
data. The process of making predictions is also called forecasting or extrapolation.
Forecasting involves considering historical data to predict future observations. Below are a
few examples that provide a better picture of this concept.
1. Forecasting the rice yield of each state for each year.
2. Forecasting the birth rate in each city for each year.
3. Forecasting weather conditions in each city for each month.
4. Forecasting electricity consumption by each household for each month.
5. Forecasting sales of each product for each day.
6. Forecasting the number of passengers traveling by train each day.
While forecasting time series data, the primary aim is to estimate how the sequence of
observations will move into the future.
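As a rough illustration of extrapolation, the Python sketch below fits a straight line to the monthly view counts from Section 7.1 and extends it two months ahead. This is a deliberately simple least-squares trend, chosen only for illustration; it is not meant to reproduce the model a tool such as PowerBI applies, and `fit_line` and `forecast` are hypothetical helper names.

```python
def fit_line(ys):
    """Least-squares line through (1, ys[0]), (2, ys[1]), ...

    Returns (slope, intercept).
    """
    n = len(ys)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Monthly views, January to June, from the table in Section 7.1.
views = [180_000, 90_000, 140_000, 220_000, 252_000, 280_000]

slope, intercept = fit_line(views)

def forecast(month_index):
    """Extend the fitted line to a month index (7 = July, 8 = August)."""
    return intercept + slope * month_index

for month_index, name in [(7, "July"), (8, "August")]:
    print(f"{name}: about {forecast(month_index):,.0f} views")
```

A straight line ignores seasonal and cyclical components, so a forecast like this is only reasonable when the historical data shows a steady long-term trend.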
Software such as PowerBI can generate forecasts automatically. Observe that the graph below
predicts the profits for 2021 and 2022 based on the profits earned up to 2020.
8 STORYTELLING
8.1 INTRODUCTION
Data plays a significant role in business operations and their related decisions. Data
visualization allows you to generate various types of charts and tables. It thus enables
you to present the substance of your metrics visually.
Storytelling with data links data with human communication to create an engaging narrative
supported by facts. Storytelling uses data visualization techniques like charts, tables, and
graphs. Data-driven stories are tailored to the specific audience and the context to which
they cater. This makes the data clearer, so the audience can better absorb the
message in the data.
Observe the narrative along with the data. It allows better data comprehension by providing
a visual aid to the audience.
Two common approaches can be used for storytelling: explorative and narrative.
In the example below, we have created a story using three sheets. Each sheet forms a story
point. The graphic below demonstrates this.
The first story point shows the quarterly sales trend for 2014, 2015, 2016, and 2017. The
accompanying narrative supports the chart in the story.
The following story point demonstrates sales made in the Consumer, Corporate, and Home Office
segments. The Consumer segment contributes the highest percentage of sales, which is
supported by the narrative at the bottom.
The third and last story point explains the profits earned in different states.
To summarize, supporting your data visualizations with appropriate stories allows viewers
to understand and engage with data better. This will enable businesses to communicate
effectively with their audience and deliver the desired message.
PowerBI provides a dynamic narrative capability that allows changing the narrative according
to the selected date label without changing the sheet.
CONCLUSION
Power BI is a powerful data visualization tool that finds extensive usage across various industries
and sectors. One key application is in business intelligence and analytics. Organizations
use Power BI to transform raw data into visually appealing and interactive reports and
dashboards. This enables decision-makers to gain insights quickly and make data-driven
choices. Whether it’s tracking sales performance, monitoring key performance indicators, or
analyzing market trends, Power BI provides a versatile platform to create custom visualizations
tailored to specific business needs. This not only enhances data comprehension but also
fosters collaboration among teams by sharing real-time, interactive dashboards.
TABLE OF FIGURES
Figure 1.1 World Population and Annual Growth Rate in The Last Decade
Figure 2.1 Locations
Figure 4.1 Ice Cream preferences
Figure 4.2 Vertical Bar Chart
Figure 4.3 Horizontal Bar Chart
Figure 4.4 Grouped Bar Chart
Figure 4.5 Stacked Bar Chart
Figure 4.6 Line Chart
Figure 4.7 Line Chart Components
Figure 4.8 Simple Line Chart
Figure 4.9 Multiple Line Chart
Figure 4.10 Profit Map
Figure 4.11 Sales Map
Figure 5.1 Range Histogram
Figure 5.2 Bell Shaped Histogram
Figure 5.3 Bi-Modal Histogram
Figure 5.4 Right Skewed Histogram
Figure 5.5 Left Skewed Histogram
Figure 5.6 Uniform Histogram
Figure 5.7 Random Histogram
Figure 5.8 Higher Variability Distribution
Figure 5.9 Lower Variability Distribution
Figure 5.10 Reference Line in PowerBI
Figure 5.11 Aggregate Functions in Excel
Figure 5.12 Aggregate Functions in Excel (Result of aggregate formula)
Figure 5.13 Aggregate Functions in PowerBI
Figure 5.14 Box Plot
Figure 5.15 Box Plot in PowerBI
Figure 6.1 Looking at correlations
Figure 6.2 Potential Correlations
Figure 6.3 Scatter Plot in PowerBI with Correlation
Figure 6.4 Scatter Plot in PowerBI without Correlation
Figure 6.5 Scatter Plot in PowerBI Profit vs. Sales
Figure 6.6 Scatter Plot in PowerBI Profit vs. Sales with Trendline
Figure 6.7 Scatter Plot in PowerBI with Negative Trend
Figure 6.8 R^2 in PowerBI
REFERENCES
Chapter 1
1. https://learn.microsoft.com/en-us/power-bi/fundamentals/power-bi-overview
2. https://www.ted.com/talks/hans_rosling_the_best_stats_you_ve_ever_seen?language=en
3. https://www.techtarget.com/searchbusinessanalytics/definition/data-visualization
4. https://www.globaldata.com/data-insights/macroeconomic/world-population-from/
5. https://www.nobledesktop.com/learn/data-visualization/industries-and-professions
Chapter 2
1. https://www.geeksforgeeks.org/difference-between-syntax-and-semantics/
2. https://amplitude.com/blog/data-types
Chapter 3
1. https://www.tibco.com/reference-center/what-is-data-transformation
2. https://learn.microsoft.com/en-us/dax/concatenate-function-dax
3. https://www.techrepublic.com/article/calculate-profit-margin-power-bi-calculated-column/
Chapter 4
1. https://www.fe.training/free-resources/power-bi-data-visualization/bar-and-column-charts-in-power-bi/
2. https://www.indeed.com/career-advice/career-development/types-of-bar-graphs
3. https://www.techquintal.com/advantages-and-disadvantages-of-bar-diagram/
4. https://www.investopedia.com/terms/l/linechart.asp
5. https://www.geeksforgeeks.org/power-bi-format-line-chart/
6. https://powerbidocs.com/2020/10/26/small-multiple-line-chart-visual-in-power-bi/
7. https://spreadsheeto.com/power-bi-map/
Chapter 5
1. https://www.hellovaia.com/explanations/math/statistics/single-variable-data/
2. https://statisticsbyjim.com/basics/frequency-table/
3. https://www.statology.org/describe-shape-of-histogram/
4. https://soc.utah.edu/sociology3112/central-tendency-variability.php
5. https://web.mnstate.edu/malonech/Psy%20230/Notes/Percentiles%20GW2.htm
6. https://www.tableau.com/drive/reference-lines-as-visual-statistics
7. https://www.wallstreetmojo.com/power-bi-aggregate/
8. https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/box-whisker-plots/a/box-plot-review
Chapter 6
1. https://amplitude.com/blog/causation-correlation
2. https://www.simplypsychology.org/correlation.html
3. https://visme.co/blog/scatter-plot/
4. https://zebrabi.com/guide/how-to-add-trendline-in-power-bi-2/
5. https://corporatefinanceinstitute.com/resources/data-science/r-squared/
Chapter 7
1. https://study.com/academy/lesson/time-series-plots-definition-features.html
2. https://www.toppr.com/guides/business-mathematics-and-statistics/time-series-analysis/components-of-time-series/
3. https://simplexct.com/to-find-seasonality-use-cycle-plots
4. https://thirdspacelearning.com/gcse-maths/statistics/time-series-graph/
Chapter 8
1. https://www.techtarget.com/searchcio/definition/data-storytelling
2. https://www.analyticsvidhya.com/blog/2020/05/art-storytelling-analytics-data-science/