R2. Data Visualisation
However, we are often interested in the proportion, or percentage, of items in each
class. The relative frequency of each class can be determined by dividing the
category frequency by n, the number of observations. Relative frequency multiplied
by 100 gives the percent frequency, as shown in the table.

Class     Frequency   Relative frequency   Percent frequency
Blue      24          0.40                 40.00
Black     9           0.15                 15.00
Red       12          0.20                 20.00
Green     6           0.10                 10.00
Orange    4           0.07                 6.67
others    5           0.08                 8.33
TOTAL     60          1                    100
The sum of the frequencies equals the number of observations, the sum of the relative
frequencies equals 1, and the sum of the percent frequencies is always 100.
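
The calculation described above can be sketched in a few lines of Python. This is only an illustration: the frequencies are copied from the table, and the variable names are assumptions introduced for this example.

```python
# Class frequencies copied from the table above.
frequencies = {"Blue": 24, "Black": 9, "Red": 12, "Green": 6, "Orange": 4, "others": 5}

n = sum(frequencies.values())  # number of observations

# Relative frequency = class frequency / n.
relative = {name: count / n for name, count in frequencies.items()}
# Percent frequency = relative frequency * 100.
percent = {name: 100 * rel for name, rel in relative.items()}

for name, count in frequencies.items():
    print(f"{name:<8}{count:>4}{relative[name]:>8.2f}{percent[name]:>8.2f}")
print(f"{'TOTAL':<8}{n:>4}{sum(relative.values()):>8.2f}{sum(percent.values()):>8.2f}")
```

Rounded to two decimals, the printed rows should match the table, and the totals confirm that the relative frequencies sum to 1 and the percent frequencies to 100.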
The two most common plots of a categorical variable are Bar and Pie charts:
[Figure: Bar chart and Pie chart of the frequencies: Blue 24, Black 9, Red 12, Green 6, Orange 4, others 5.]
The bars in a bar chart can be drawn horizontally or vertically and arranged in any order
of frequency. A pie chart shows the distribution of a categorical variable as proportions or
percentages. Use a bar chart to show frequencies and a pie chart to show proportions.
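
As a rough illustration of how the bars can be reordered without changing the information they carry, the sketch below draws a horizontal bar chart as plain text. The function name `text_bar_chart` and its parameter are hypothetical helpers written for this example; a plotting library would normally be used instead.

```python
# Class frequencies copied from the table above.
frequencies = {"Blue": 24, "Black": 9, "Red": 12, "Green": 6, "Orange": 4, "others": 5}

def text_bar_chart(freqs, sort_by_frequency=False):
    """Return a horizontal bar chart as text lines, one '#' per observation."""
    items = list(freqs.items())
    if sort_by_frequency:
        # Largest class first; the bar lengths themselves do not change.
        items.sort(key=lambda item: item[1], reverse=True)
    return [f"{name:<7}| {'#' * count} {count}" for name, count in items]

# Bars in the original order, then in descending order of frequency.
for line in text_bar_chart(frequencies):
    print(line)
print()
for line in text_bar_chart(frequencies, sort_by_frequency=True):
    print(line)
```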
54 49 65 73 57 62 62 60 64 69 52 60 60 68 64 65 54 66 65 68 58 58 68
67 63 58 73 64 66 58 73 68 54 56 52 65 59 64 67 57 63 66 72 75 72 63
60 66 63 59 63 64 60 60 64 57 61 58 72 66 58 61 61 65 62 62 65 56 72
59 62 60 63 66 63 59 55 68 61 63 65 57 60 63 56 59 63 74 62 67 61 74
71 65 64 64 70 62 61 52
For quantitative data, a frequency distribution table is created by grouping the values
into class intervals, or ranges of numbers, in a 3-step method:
Determine: 1. the number of classes, 2. the width of each class, and 3. the class limits.
Number of classes: Classes are formed by specifying the ranges that will be used to group the data.
The goal is to use enough classes to show the variation in the data, but not so many classes
that some contain only a few data items.
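Class width: The approximate class width can be determined by subtracting the smallest value
from the largest and dividing the result by the number of classes. For the data above,
(75 - 49) / 7 is about 3.7, which rounds up to 4. Usually, the class width is the same for
all classes.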
Class limits: Class limits must be chosen such that each data item belongs to one and only
one class. The lower limit identifies the smallest possible value assigned to the class; the
upper limit identifies the largest possible value assigned to the class.
Class midpoint: The value halfway between the lower and upper class limits.
Table 2 below uses 7 classes with a class width (interval) of 4, and the class limits are
non-overlapping, so each value falls in exactly one class.
Table 2: Frequency distribution of grouped data
Class Frequency
49 - 52 4
53 - 56 7
57 - 60 23
61 - 64 31
65 - 68 22
69 - 72 7
73 - 76 6
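
The three steps can be sketched in Python as follows. This is a minimal sketch, assuming whole-number data, classes that start at the smallest value, and a width rounded up to the next integer; the variable names are illustrative and not part of the original notes.

```python
import math

# The 100 observations listed above.
data = [
    54, 49, 65, 73, 57, 62, 62, 60, 64, 69, 52, 60, 60, 68, 64, 65, 54, 66, 65, 68, 58, 58, 68,
    67, 63, 58, 73, 64, 66, 58, 73, 68, 54, 56, 52, 65, 59, 64, 67, 57, 63, 66, 72, 75, 72, 63,
    60, 66, 63, 59, 63, 64, 60, 60, 64, 57, 61, 58, 72, 66, 58, 61, 61, 65, 62, 62, 65, 56, 72,
    59, 62, 60, 63, 66, 63, 59, 55, 68, 61, 63, 65, 57, 60, 63, 56, 59, 63, 74, 62, 67, 61, 74,
    71, 65, 64, 64, 70, 62, 61, 52,
]

# Step 1: number of classes (a choice made by the analyst; 7 as in Table 2).
num_classes = 7

# Step 2: approximate class width = (largest - smallest) / number of classes, rounded up.
width = math.ceil((max(data) - min(data)) / num_classes)

# Step 3: non-overlapping class limits, starting at the smallest value.
start = min(data)
classes = [(start + i * width, start + (i + 1) * width - 1) for i in range(num_classes)]

# Count the values in each class and record the class midpoint.
table = []
for low, high in classes:
    count = sum(low <= x <= high for x in data)
    midpoint = (low + high) / 2
    table.append((low, high, midpoint, count))

for low, high, midpoint, count in table:
    print(f"{low} - {high}   midpoint {midpoint:5.1f}   frequency {count}")
```

Under these assumptions the computed width is 4 and the class limits and frequencies agree with Table 2. A different number of classes or starting point would give a different, equally valid table.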
Remember, no single frequency distribution is best for a data set; the goal is to reveal the
natural grouping of the data.
As with categorical data, bar or pie charts can be plotted here too, but the histogram is a
better visualiser, as shown here: its bars represent the frequency of values within each interval.
[Figure: Histogram of the Table 2 frequencies (4, 7, 23, 31, 22, 7, 6) for the classes 49-52, 53-56, 57-60, 61-64, 65-68, 69-72 and 73-76.]
Histograms help us understand the patterns, trends, and central tendencies of a dataset. By
using grouped data, we can manage large datasets more efficiently and create informative
visualizations.
The bar, pie and histogram charts all follow the Area Principle: the area of the part of the
plot that shows data is proportional to the amount of data it represents.
Line charts are plotted to display data points connected by lines. They are excellent for
showing trends over time, like the fluctuation of stock prices throughout the year.
[Figure: Line chart of Sales in '00 (y-axis) by Year (x-axis), 2010 to 2024.]
Charts vs Distribution:
Charts and distributions are two visual representations commonly used in data analysis, and
they are not the same thing.
Charts are great for visualizing trends, comparisons, and relationships, while distribution graphs
provide insights into the distribution and characteristics of data values. Each chart type has its own
strengths and is chosen based on the nature of the data and the intended message.
Bar charts are ideal for comparing categorical data, while line charts are useful for
illustrating trends over time. Pie charts help visualize proportions or percentages, and there
are scatter plots which show the relationship between two variables.
A distribution graph, also known as a histogram or frequency plot, focuses on displaying the
distribution or spread of the data. The shape of the graph, whether symmetrical, skewed, or
bimodal, provides insight into the characteristics of the data values.
In a chart, the x-axis is the Entity and the y-axis is the Behaviour of the Entity. In a
distribution, the x-axis is that Behaviour (the quantity on the chart's y-axis) and the y-axis
is its frequency. A chart changes when the data is sorted, but once the data is arranged in a
frequency table there is only one way to plot it.
This difference can be seen in the example below, for data collected over 30 days.
53 53 49
46 53 42
50 48 56
57 43 51
46 55 48
44 49 50
41 57 45
44 51 48
56 53 41
52 43 58
[Figure: Chart of the 30 daily values (y-axis, 0 to 80) against day (x-axis, 1 to 30).]
Lower limit - Upper limit   Frequency
40 - 44                     7
45 - 49                     8
50 - 54                     9
55 - 59                     6
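
A small Python sketch can make the contrast concrete: reordering the observations changes the chart, but not the frequency table. It assumes the 30 values are read row by row as days 1 to 30 (the original layout does not state the order), and the helper name `frequency_table` is hypothetical.

```python
# The 30 daily values, read row by row (assumed order of days).
daily_values = [
    53, 53, 49, 46, 53, 42, 50, 48, 56, 57, 43, 51, 46, 55, 48,
    44, 49, 50, 41, 57, 45, 44, 51, 48, 56, 53, 41, 52, 43, 58,
]

# Class limits copied from the frequency table above.
classes = [(40, 44), (45, 49), (50, 54), (55, 59)]

def frequency_table(values):
    """Count how many values fall inside each class (limits inclusive)."""
    return [sum(low <= v <= high for v in values) for low, high in classes]

# A chart plots (day, value) pairs, so the order of the data matters.
chart_points = list(enumerate(daily_values, start=1))
sorted_points = list(enumerate(sorted(daily_values), start=1))
print("Chart changes when data is sorted:", chart_points != sorted_points)

# A distribution only counts values per class, so the order does not matter.
print("Frequencies (original order):", frequency_table(daily_values))
print("Frequencies (sorted order):  ", frequency_table(sorted(daily_values)))
```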
Charts are your go-to for visualizing trends, making comparisons, and highlighting
relationships within your data.
Distribution Graphs, on the other hand, are perfect for uncovering the distribution and
characteristics of data values.
Outliers
Outliers are extreme data points in a dataset; they can affect the data summary. They can be
the most informative part of your data, or they can be errors. Such extreme data, like
missing values, need to be validated, and how they will be reported in the analysis should be
decided in advance. Finally, the researcher must decide on the retention or exclusion of each
outlier.
[Figure: Chart of labelled data values, most between 41 and 56, with one outlying value of 7.]
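
The notes do not prescribe a rule for detecting outliers. One common convention, used here purely as an assumption, flags values more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies it to illustrative values (not the exact data behind the figure) and shows how a single outlier shifts the mean.

```python
import statistics

# Illustrative values only; one value (7) is far from the rest.
values = [56, 53, 52, 46, 46, 44, 44, 50, 41, 7]

# Quartiles from the standard library (default "exclusive" method).
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Assumed convention: flag anything beyond 1.5 * IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

# Compare the summary with and without the flagged values.
kept = [v for v in values if v not in outliers]
mean_with = statistics.mean(values)
mean_without = statistics.mean(kept)

print("Flagged as outliers:", outliers)
print("Mean with outliers:", mean_with)
print("Mean without outliers:", mean_without)
```

The rule only flags candidates. Whether a flagged value is an error or a genuine, informative observation remains the researcher's decision.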