
R2. Data Visualisation

The document discusses data visualization and summarization techniques for both categorical and quantitative data, emphasizing the importance of organizing and visually representing data for effective analysis. It explains methods such as frequency distribution, bar and pie charts for categorical data, and histograms for quantitative data, along with the significance of outliers in data interpretation. The document also highlights the differences between charts and distribution graphs in conveying data insights.


Reading 2: Data Visualisation and Summarisation

Data Visualisation and Summarisation


In a previous article (R1), we covered the classification of data into two main types:
categorical and quantitative. Categorical data use labels, while quantitative data use
numerical values to express quantity. This article explores common techniques for
summarizing both types of data in tables and graphs, as frequently seen in annual reports,
news articles, and research. Understanding how to create and interpret these summaries is
crucial for effective data analysis and for presenting the results in a simple, communicable
way.
To gain meaningful insights, the raw data must be organized, categorized, visually
represented, and analysed. Organizing and categorizing the data produces a frequency
distribution; representing it visually (data visualization) uncovers patterns, trends, and
correlations that may not be obvious in the raw data. Here we provide an example to
illustrate this process.
Summarise Categorical data:
Data on the shirt colour of a recent purchase were collected from 60 persons; the number of
shirts of each colour was counted and arranged in a table, creating a frequency table.

Colour     Frequency   Relative frequency   Percent frequency
Blue          24             0.40                40.00
Black          9             0.15                15.00
Red           12             0.20                20.00
Green          6             0.10                10.00
Orange         4             0.07                 6.67
others         5             0.08                 8.33
TOTAL         60             1.00               100.00

The frequency column gives the count of items in each class. However, we are often interested
in the proportion, or percentage, of items in each class. The relative frequency of each class
is found by dividing the class frequency by n, the total number of observations (here n = 60).
Multiplying the relative frequency by 100 gives the percent frequency, as shown in the table.

The frequencies sum to the number of observations, the relative frequencies sum to 1, and the
percent frequencies sum to 100 (up to rounding).
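The calculation behind the table can be sketched in a few lines of Python. This is a minimal illustration, assuming the 60 raw responses are rebuilt from the counts in the table (the original responses and their order are not given); the function name `frequency_table` is hypothetical.

```python
from collections import Counter

# Hypothetical reconstruction: the raw list of 60 responses is rebuilt from the
# counts in the frequency table; the original order of responses is not known.
counts = {"Blue": 24, "Black": 9, "Red": 12, "Green": 6, "Orange": 4, "others": 5}
responses = [colour for colour, k in counts.items() for _ in range(k)]


def frequency_table(data):
    """Return (category, frequency, relative frequency, percent frequency) rows."""
    n = len(data)                 # number of observations
    freq = Counter(data)          # frequency of each category, in first-seen order
    rows = []
    for category, f in freq.items():
        relative = f / n          # relative frequency = frequency / n
        rows.append((category, f, relative, relative * 100))
    return rows


table = frequency_table(responses)
for category, f, rel, pct in table:
    print(f"{category:<8}{f:>4}{rel:>8.2f}{pct:>8.2f}")
print(f"{'TOTAL':<8}{sum(r[1] for r in table):>4}"
      f"{sum(r[2] for r in table):>8.2f}{sum(r[3] for r in table):>8.2f}")
```

The totals line confirms the three checks stated above: 60 observations, relative frequencies summing to 1, and percent frequencies summing to 100.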
The two most common plots of a categorical variable are Bar and Pie charts:

[Figures: Bar Chart and Pie chart of shirt colour (Blue 24, Black 9, Red 12, Green 6, Orange 4, others 5)]

The bars in a bar chart can be drawn horizontally or vertically, and in any order, for example
in order of frequency. A pie chart shows the distribution of a categorical variable as
proportions or percentages. Use a bar chart to show frequencies and a pie chart to show
proportions.
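As a rough, standard-library-only sketch (in practice a plotting library such as matplotlib would usually draw the charts; that choice is an assumption, not part of these notes), the same frequencies can drive a text bar chart and the slice angles of a pie chart. Each slice's angle is its relative frequency times 360 degrees.

```python
# Shirt-colour frequencies from the frequency table (n = 60).
colours = {"Blue": 24, "Black": 9, "Red": 12, "Green": 6, "Orange": 4, "others": 5}
n = sum(colours.values())

# Bar chart (text version): bar length equals the frequency, one '#' per shirt.
# Sorting by frequency is optional; any order of bars is valid.
for colour, f in sorted(colours.items(), key=lambda item: item[1], reverse=True):
    print(f"{colour:<7}| {'#' * f} {f}")

# Pie chart: each slice's angle is its share of 360 degrees,
# i.e. relative frequency (f / n) times 360.
angles = {colour: f * 360 / n for colour, f in colours.items()}
for colour, angle in angles.items():
    print(f"{colour:<7}{angle:6.1f} degrees  {colours[colour] / n:6.1%}")
```

Blue, with 24 of 60 shirts, gets 144 degrees (40% of the pie); the bar chart shows the same class as its longest bar of 24.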

Notes prepared by Dr. Debmallya Chatterjee and Ms. Binita Salian.



Summarise Quantitative data


Take the example of data collected on the number of cars serviced per week at a particular
garage.
Table 1: Number of cars serviced/week data for 100 weeks

54 49 65 73 57 62 62 60 64 69 52 60 60 68 64 65 54 66 65 68 58 58 68
67 63 58 73 64 66 58 73 68 54 56 52 65 59 64 67 57 63 66 72 75 72 63
60 66 63 59 63 64 60 60 64 57 61 58 72 66 58 61 61 65 62 62 65 56 72
59 62 60 63 66 63 59 55 68 61 63 65 57 60 63 56 59 63 74 62 67 61 74
71 65 64 64 70 62 61 52

For quantitative data, a frequency distribution table is created by grouping the values into
class intervals, or ranges of numbers. This takes three steps, determining: 1. the number of
classes, 2. the width of each class, and 3. the class limits.
Number of classes: Classes are formed by specifying the ranges that will be used to group the
data. The goal is to use enough classes to show the variation in the data, but not so many
that some classes contain only a few data items.
Class width: The approximate class width is found by subtracting the smallest value from the
largest and dividing the result by the number of classes. Usually, the same width is used for
every class.
Class limits: Class limits must be chosen so that each data item belongs to one and only one
class. The lower limit identifies the smallest possible value assigned to the class; the
upper limit identifies the largest possible value assigned to the class.
Class midpoint: the value halfway between the lower and upper class limits.
Table 2 below uses 7 classes with a class width (interval) of 4, and the class limits do not
overlap, so each value falls into exactly one class.
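For the data in Table 1 the smallest value is 49 and the largest is 75, so with 7 classes the approximate class width works out as below; rounding up to a convenient whole number gives the width of 4 used in Table 2.

$$
\text{approximate class width}
= \frac{\text{largest value} - \text{smallest value}}{\text{number of classes}}
= \frac{75 - 49}{7} \approx 3.71 \;\Rightarrow\; 4
$$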
Table 2: Frequency distribution of grouped data

Class Frequency
49 - 52 4
53 - 56 7
57 - 60 23
61 - 64 31
65 - 68 22
69 - 72 7
73 - 76 6

Remember, no single frequency distribution is best for a data set; the goal is to reveal the
natural grouping of the data.
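The three steps can be checked with a short Python sketch that rebuilds Table 2 from the Table 1 values. It assumes whole-number data, so a class with lower limit L and width w has upper limit L + w - 1; the function name `grouped_frequency` is hypothetical.

```python
# Weekly counts of cars serviced, transcribed from Table 1 (100 weeks).
cars = [
    54, 49, 65, 73, 57, 62, 62, 60, 64, 69, 52, 60, 60, 68, 64, 65, 54, 66, 65, 68, 58, 58, 68,
    67, 63, 58, 73, 64, 66, 58, 73, 68, 54, 56, 52, 65, 59, 64, 67, 57, 63, 66, 72, 75, 72, 63,
    60, 66, 63, 59, 63, 64, 60, 60, 64, 57, 61, 58, 72, 66, 58, 61, 61, 65, 62, 62, 65, 56, 72,
    59, 62, 60, 63, 66, 63, 59, 55, 68, 61, 63, 65, 57, 60, 63, 56, 59, 63, 74, 62, 67, 61, 74,
    71, 65, 64, 64, 70, 62, 61, 52,
]


def grouped_frequency(data, first_lower, width, n_classes):
    """Count values in non-overlapping whole-number classes [lower, lower + width - 1]."""
    rows = []
    for i in range(n_classes):
        lower = first_lower + i * width
        upper = lower + width - 1          # inclusive upper limit for whole numbers
        midpoint = (lower + upper) / 2     # value halfway between the class limits
        count = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, midpoint, count))
    return rows


# Step 2: approximate width = (largest - smallest) / number of classes, about 3.71.
approx_width = (max(cars) - min(cars)) / 7
# Steps 1 and 3: 7 classes, width rounded up to 4, first lower limit 49.
table2 = grouped_frequency(cars, first_lower=49, width=4, n_classes=7)
for lower, upper, mid, count in table2:
    print(f"{lower} - {upper}  midpoint {mid:>5}  frequency {count}")
```

The counts printed match Table 2 (4, 7, 23, 31, 22, 7, 6), and the midpoints run from 50.5 to 74.5.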


As with categorical data, bar or pie charts can be plotted here too, but a histogram is a
better visualiser, as shown here: each bar represents the frequency of values within one
class interval.

[Figure: Histogram of Table 2, class intervals 49-52 to 73-76 on the x-axis, frequencies 4, 7, 23, 31, 22, 7, 6]
Histograms help us understand the patterns, trends, and central tendencies of a dataset. By
using grouped data, we can manage large datasets more efficiently and create informative
visualizations.

Bar charts, pie charts, and histograms all follow the area principle: the area of the plot
that shows the data is proportional to the amount of data.
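A minimal text-only sketch of the histogram for Table 2 illustrates the area principle: because every class has the same width, bar length (and so bar area) is proportional to frequency, with one '#' per week. The function name `text_histogram` is hypothetical.

```python
# Class intervals and frequencies from Table 2.
classes = ["49-52", "53-56", "57-60", "61-64", "65-68", "69-72", "73-76"]
freqs = [4, 7, 23, 31, 22, 7, 6]


def text_histogram(labels, counts):
    """Return one line per class; each '#' stands for one observation.

    With equal class widths, the amount of '#' drawn is proportional to
    the frequency, which is the area principle in its simplest form.
    """
    return [f"{label:>6} | {'#' * count} {count}" for label, count in zip(labels, counts)]


for line in text_histogram(classes, freqs):
    print(line)
```

Counting every '#' gives 100, one per week of data, so the total "area" of the plot equals the total number of observations.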

Line charts display data points connected by lines. They are excellent for showing trends
over time, such as the fluctuation of stock prices throughout the year.

[Figure: Line chart of sales (in '00) by year, 2010 to 2024]

Charts vs Distribution:
Charts and distribution graphs are two visual representations commonly used in data
analysis, and they are different.

Charts are great for visualizing trends, comparisons, and relationships, while distribution graphs
provide insights into the distribution and characteristics of data values. Each chart type has its own
strengths and is chosen based on the nature of the data and the intended message.

Bar charts are ideal for comparing categorical data, while line charts are useful for
illustrating trends over time. Pie charts help visualize proportions or percentages, and
scatter plots show the relationship between two variables.


A distribution graph, such as a histogram or frequency plot, displays the distribution or
spread of the data. Its shape (symmetrical, skewed, or bimodal) provides insight into the
distribution and characteristics of the data values.
In a chart, the x-axis shows the entity (here, the day) and the y-axis shows the behaviour of
that entity (the value recorded). In a distribution graph, that behaviour moves to the x-axis
and the y-axis shows its frequency. A chart changes when the data are sorted, but once the
data are arranged in a frequency table, there is only one way to plot them.
This difference can be seen in the example below, for data collected over 30 days:

53 53 49
46 53 42
50 48 56
57 43 51
46 55 48
44 49 50
41 57 45
44 51 48
56 53 41
52 43 58

[Figure: Chart of the 30 daily values plotted in the order collected (x-axis: day 1 to 30)]

We create a frequency distribution table and plot a histogram:

Class (lower - upper limit)   Frequency
40 - 44                        7
45 - 49                        8
50 - 54                        9
55 - 59                        6
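The point that sorting changes a chart but not a distribution can be checked directly. This sketch assumes the 30 values were collected in the order obtained by reading the grid row by row (the source does not state the order); the frequency table does not depend on that assumption. The function name `frequency_by_class` is hypothetical.

```python
# The 30 daily values from the example above. Reading the grid row by row is an
# assumption about collection order; the frequency table does not depend on it.
days = [53, 53, 49, 46, 53, 42, 50, 48, 56, 57, 43, 51, 46, 55, 48,
        44, 49, 50, 41, 57, 45, 44, 51, 48, 56, 53, 41, 52, 43, 58]


def frequency_by_class(values, lower_limits, width):
    """Map each class (lower, upper) to the number of values it contains."""
    table = {}
    for lower in lower_limits:
        upper = lower + width - 1            # whole-number data, limits inclusive
        table[(lower, upper)] = sum(lower <= v <= upper for v in values)
    return table


limits = [40, 45, 50, 55]
as_collected = frequency_by_class(days, limits, width=5)
after_sorting = frequency_by_class(sorted(days), limits, width=5)

# A chart of value against day looks different once the data are sorted...
print("first five as collected:", days[:5])
print("first five after sorting:", sorted(days)[:5])
# ...but the frequency table, and therefore the histogram, is unchanged.
print(as_collected == after_sorting, as_collected)
```

Both orderings give the same counts (7, 8, 9, 6), matching the table above.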

Charts are your go-to for visualizing trends, making comparisons, and highlighting
relationships within your data.
Distribution graphs, on the other hand, are perfect for uncovering the distribution and
characteristics of data values.


Outliers
[Figure: Scatter plot illustrating an outlier]
Outliers are extreme data points in a dataset, and they can distort a data summary. They can
be the most informative part of your data, or they can be errors. Like missing values, such
extreme data need to be validated, and how they will be reported in the analysis should be
decided in advance. Finally, the researcher must decide whether to retain or exclude each
outlier.
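These notes do not prescribe a rule for spotting outliers. One common convention, swapped in here purely as an illustration, is Tukey's fences: flag values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The sketch applies it to the Table 1 data using Python's `statistics.quantiles`; the function name `iqr_outliers` is hypothetical.

```python
import statistics

# Weekly counts of cars serviced, transcribed from Table 1 (100 weeks).
cars = [
    54, 49, 65, 73, 57, 62, 62, 60, 64, 69, 52, 60, 60, 68, 64, 65, 54, 66, 65, 68, 58, 58, 68,
    67, 63, 58, 73, 64, 66, 58, 73, 68, 54, 56, 52, 65, 59, 64, 67, 57, 63, 66, 72, 75, 72, 63,
    60, 66, 63, 59, 63, 64, 60, 60, 64, 57, 61, 58, 72, 66, 58, 61, 61, 65, 62, 62, 65, 56, 72,
    59, 62, 60, 63, 66, 63, 59, 55, 68, 61, 63, 65, 57, 60, 63, 56, 59, 63, 74, 62, 67, 61, 74,
    71, 65, 64, 64, 70, 62, 61, 52,
]


def iqr_outliers(data, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles uses its default 'exclusive' method here.
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    flagged = [x for x in data if x < low or x > high]
    return flagged, (low, high)


flagged, fences = iqr_outliers(cars)
print("fences:", fences, "flagged:", flagged)
```

For Table 1 the quartiles are 59 and 66, so the fences are 48.5 and 76.5 and no weekly count is flagged. Any value a rule like this does flag is only a candidate: it still needs the validation and retain-or-exclude decision described above.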

