Reading 2: Data Visualisation and Summarisation

Data Visualisation and Summarisation

In a previous article (R1), we classified data into two main types: categorical and
quantitative. Categorical data use labels, while quantitative data use numerical values that
express quantity. This article explores common techniques for summarising both types of data in
tables and graphs of the kind frequently seen in annual reports, news articles, and research.
Understanding how to create and interpret these summaries is crucial for analysing data
effectively and presenting it in a simple, communicable way.
To gain meaningful insights, the raw data must be organised, categorised, visually
represented, and analysed. Categorising the data into classes and counting the items in each
class gives a frequency distribution; data visualisation then uncovers patterns, trends, and
correlations that may not be obvious in the raw data. Here we provide an example to illustrate
this process.
Summarise Categorical data:
Data on the shirt colour of a recent purchase were collected from 60 persons. The shirts of each
colour were counted and the counts arranged in tabular form, creating a frequency table.

Colour     Frequency    Relative frequency    Percent frequency
Blue          24              0.40                  40.00
Black          9              0.15                  15.00
Red           12              0.20                  20.00
Green          6              0.10                  10.00
Orange         4              0.07                   6.67
others         5              0.08                   8.33
TOTAL         60              1                    100

However, we are often interested in the proportion, or percentage, of items in each class. The
relative frequency of each class can be determined by dividing the category frequency by n, the
number of observations. Relative frequency multiplied by 100 gives the percent frequency, as
shown in the table.

The sum of the frequencies equals the number of observations, the sum of the relative
frequencies equals 1, and the sum of the percent frequencies is always 100.
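
The frequency table can be reproduced with a short Python sketch. It is only an illustration: the counts come from the table above, while the raw list of 60 observations is rebuilt in an arbitrary order and the variable names are our own.

```python
from collections import Counter

# Shirt-colour counts taken from the frequency table.
counts = {"Blue": 24, "Black": 9, "Red": 12, "Green": 6, "Orange": 4, "others": 5}

# Rebuild a raw list of 60 observations (the ordering is hypothetical) and tally it.
observations = [colour for colour, k in counts.items() for _ in range(k)]
frequency = Counter(observations)
n = sum(frequency.values())  # number of observations

# Relative frequency = frequency / n; percent frequency = relative frequency x 100.
relative = {colour: f / n for colour, f in frequency.items()}
percent = {colour: 100 * r for colour, r in relative.items()}

print(f"{'Colour':<8}{'Freq':>6}{'Rel.':>8}{'Pct.':>8}")
for colour in counts:
    print(f"{colour:<8}{frequency[colour]:>6}{relative[colour]:>8.2f}{percent[colour]:>8.2f}")
print(f"{'TOTAL':<8}{n:>6}{sum(relative.values()):>8.2f}{sum(percent.values()):>8.2f}")
```

The last printed row checks the three totals stated above: 60 observations, relative frequencies summing to 1, and percent frequencies summing to 100.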
The two most common plots of a categorical variable are Bar and Pie charts:

[Figure: Bar chart and Pie chart of the shirt-colour frequencies: Blue 24, Black 9, Red 12,
Green 6, Orange 4, others 5]

The bars in a bar chart can be drawn horizontally or vertically and arranged in any order, for
example by frequency. A pie chart shows the distribution of a categorical variable as
proportions or percentages. Use a bar chart to show frequencies and a pie chart to show
proportions.
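
As a rough sketch of what the two plots encode, the Python below prints a text bar chart ordered by frequency and computes the angle of each pie slice. It uses only the standard library; the text rendering and the variable names are our own choices, not part of the original notes.

```python
# Shirt-colour counts from the frequency table.
counts = {"Blue": 24, "Black": 9, "Red": 12, "Green": 6, "Orange": 4, "others": 5}
n = sum(counts.values())

# Bar chart: one '#' per observation, bars ordered by descending frequency.
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
for colour, freq in ordered:
    print(f"{colour:<7}{'#' * freq} {freq}")

# Pie chart: each slice's angle is its share of the 360-degree circle.
angles = {colour: 360 * freq / n for colour, freq in counts.items()}
for colour, angle in angles.items():
    print(f"{colour:<7}{angle:6.1f} degrees ({100 * counts[colour] / n:.2f}%)")
```

The bar lengths carry the frequencies, while the slice angles carry the proportions, which is why the bar chart suits counts and the pie chart suits shares of a whole.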

Notes prepared by Dr. Debmallya Chatterjee and Ms. Binita Salian.

Summarise Quantitative data

Consider data collected on the number of cars serviced per week at a particular garage.
Table 1: Number of cars serviced per week, for 100 weeks

54 49 65 73 57 62 62 60 64 69 52 60 60 68 64 65 54 66 65 68 58 58 68
67 63 58 73 64 66 58 73 68 54 56 52 65 59 64 67 57 63 66 72 75 72 63
60 66 63 59 63 64 60 60 64 57 61 58 72 66 58 61 61 65 62 62 65 56 72
59 62 60 63 66 63 59 55 68 61 63 65 57 60 63 56 59 63 74 62 67 61 74
71 65 64 64 70 62 61 52

For quantitative data, the frequency distribution table is created by grouping the values
into class intervals (ranges of numbers). This takes three steps, determining:
1. the number of classes, 2. the width of each class, and 3. the class limits.
Number of classes: Classes are formed by specifying the ranges that will be used to group the
data. The goal is to use enough classes to show the variation in the data, but not so many
classes that some contain only a few data items.
Class width: The approximate class width can be determined by subtracting the smallest value
from the largest and dividing the result by the number of classes. Usually, all classes are
given the same width. For Table 1 with 7 classes, this gives (75 - 49) / 7 ≈ 3.7, which is
rounded up to 4.
Class limits: Class limits must be chosen such that each data item belongs to one and only
one class. The lower limit identifies the smallest possible value assigned to the class; the
upper limit identifies the largest possible value assigned to the class.
Class midpoint: The value halfway between the lower and upper class limits.
Table 2 below uses 7 classes and a class width (interval) of 4, and each interval has
unique, non-overlapping class limits.
Table 2: Frequency distribution of grouped data

Class Frequency
49 - 52 4
53 - 56 7
57 - 60 23
61 - 64 31
65 - 68 22
69 - 72 7
73 - 76 6

Remember, no single frequency distribution is best for a data set; the goal is to reveal the
natural grouping of the data.
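
The three steps can be sketched in Python using the Table 1 data. This is a minimal illustration, not a prescribed method: the choice of 7 classes comes from the text, while rounding the approximate width up to a whole number and starting the first class at the smallest value are assumptions that happen to reproduce Table 2.

```python
import math

# Table 1: number of cars serviced per week for 100 weeks.
cars = [
    54, 49, 65, 73, 57, 62, 62, 60, 64, 69, 52, 60, 60, 68, 64, 65, 54, 66, 65, 68, 58, 58, 68,
    67, 63, 58, 73, 64, 66, 58, 73, 68, 54, 56, 52, 65, 59, 64, 67, 57, 63, 66, 72, 75, 72, 63,
    60, 66, 63, 59, 63, 64, 60, 60, 64, 57, 61, 58, 72, 66, 58, 61, 61, 65, 62, 62, 65, 56, 72,
    59, 62, 60, 63, 66, 63, 59, 55, 68, 61, 63, 65, 57, 60, 63, 56, 59, 63, 74, 62, 67, 61, 74,
    71, 65, 64, 64, 70, 62, 61, 52,
]

# Step 1: number of classes (the value used in the text).
num_classes = 7

# Step 2: approximate width = (largest - smallest) / number of classes, rounded up (assumption).
approx_width = (max(cars) - min(cars)) / num_classes
width = math.ceil(approx_width)

# Step 3: class limits. For integer data each class runs from `lower` to `lower + width - 1`,
# so every value belongs to exactly one class.
start = min(cars)
classes = [(lo, lo + width - 1) for lo in range(start, start + num_classes * width, width)]

frequencies = [sum(lo <= x <= hi for x in cars) for lo, hi in classes]
midpoints = [(lo + hi) / 2 for lo, hi in classes]

for (lo, hi), freq, mid in zip(classes, frequencies, midpoints):
    print(f"{lo} - {hi}   frequency {freq:>2}   midpoint {mid}")
```

Changing `num_classes` or the starting limit gives a different but equally valid table, which is the sense in which no single frequency distribution is best.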

As with categorical data, one can plot bar or pie charts here too, but the histogram is a
better visualiser, as shown here: its bars represent the frequency of values within each interval.

[Figure: Histogram of the grouped data in Table 2; frequencies 4, 7, 23, 31, 22, 7, 6 for the
classes 49-52, 53-56, 57-60, 61-64, 65-68, 69-72, 73-76]

Histograms help us understand the patterns, trends, and central tendencies of a dataset. By
using grouped data, we can manage large datasets more efficiently and create informative
visualizations.

Bar charts, pie charts, and histograms all follow the area principle: the area of the part of
the plot that represents a category or class is proportional to the amount of data it represents.
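
The proportionality described above can be checked numerically for the histogram. The sketch below is an illustration under the assumption that every bar is drawn with the same width of 4 (the class width of Table 2); the text rendering and names are our own.

```python
# Table 2 classes and frequencies.
classes = [(49, 52), (53, 56), (57, 60), (61, 64), (65, 68), (69, 72), (73, 76)]
frequencies = [4, 7, 23, 31, 22, 7, 6]
width = 4  # every class covers 4 integer values

n = sum(frequencies)
areas = [freq * width for freq in frequencies]  # bar height x bar width
area_share = [a / sum(areas) for a in areas]    # each bar's share of the total area
relative = [freq / n for freq in frequencies]   # each class's share of the data

for (lo, hi), freq, share in zip(classes, frequencies, area_share):
    print(f"{lo}-{hi} {'#' * freq:<31} {freq:>2}  area share {share:.2f}")
```

With equal class widths, each bar's share of the total area equals the relative frequency of its class, so comparing bar heights is enough.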

Line charts are plotted to display data points connected by lines. They are excellent for
showing trends over time, like the fluctuation of stock prices throughout the year.

[Figure: Line chart of Sales in '00 (y-axis) against Year (x-axis), 2010 to 2024]

Charts Vs Distribution:
Charts and distribution graphs are two visual representations commonly used in data analysis,
and they are different.

Charts are great for visualizing trends, comparisons, and relationships, while distribution graphs
provide insights into the distribution and characteristics of data values. Each chart type has its own
strengths and is chosen based on the nature of the data and the intended message.

Bar charts are ideal for comparing categorical data, while line charts are useful for
illustrating trends over time. Pie charts help visualize proportions or percentages, and
scatter plots show the relationship between two variables.

A distribution graph, also known as a histogram or frequency plot, focuses on displaying the
distribution or spread of the data. The shape of the graph, whether it is symmetrical, skewed,
or bimodal, provides insight into the distribution and characteristics of the data values.
In a chart, the x-axis is the Entity and the y-axis is the Behaviour of the Entity. In a
distribution, the x-axis is the Behaviour and the y-axis is the number of Entities showing
that Behaviour (the frequency). A chart changes when the data are sorted differently, but once
the data are arranged in a frequency table there is only one way to plot them.
This difference can be seen in the example below, which uses data collected over 30 days.

53 53 49
46 53 42
50 48 56
57 43 51
46 55 48
44 49 50
41 57 45
44 51 48
56 53 41
52 43 58

[Figure: Chart of the 30 daily values, with the day (1 to 30) on the x-axis and the value
(scale 0 to 80) on the y-axis]

We create a frequency distribution table and plot a histogram:

Lower limit - Upper limit    Frequency
40 - 44                      7
45 - 49                      8
50 - 54                      9
55 - 59                      6
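
The difference between a chart and a distribution can be illustrated with the same 30 values. In this sketch the values are read from the data block row by row, which is an assumption about the day order; the function name `frequency_table` is our own.

```python
# The 30 daily values from the example, read row by row (the day order is an assumption).
daily = [53, 53, 49, 46, 53, 42, 50, 48, 56, 57, 43, 51, 46, 55, 48,
         44, 49, 50, 41, 57, 45, 44, 51, 48, 56, 53, 41, 52, 43, 58]

# Class limits from the frequency distribution table.
classes = [(40, 44), (45, 49), (50, 54), (55, 59)]

def frequency_table(values):
    """Count how many values fall inside each (lower, upper) class."""
    return [sum(lo <= v <= hi for v in values) for lo, hi in classes]

resorted = sorted(daily)  # a different ordering of the same data

# A chart plots the values in sequence, so reordering changes the picture.
chart_changes = daily != resorted
# A distribution only counts values per class, so reordering changes nothing.
table_unchanged = frequency_table(daily) == frequency_table(resorted)

print("Frequencies:", frequency_table(daily))
print("Chart changes when sorted:", chart_changes)
print("Frequency table unchanged:", table_unchanged)
```

Whatever order the days are listed in, the class counts stay the same, which is why the histogram can be drawn in only one way.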

Charts are your go-to for visualizing trends, making comparisons, and highlighting
relationships within your data.
Distribution Graphs, on the other hand, are perfect for uncovering the distribution and
characteristics of data values.

Outliers
Outliers are extreme data points in a dataset, and they can affect the data summary. They can
be the most informative part of your data, or they can be errors. Such extreme data, like
missing values, need to be validated, and how they will be reported in the analysis should be
decided in advance. Finally, the researcher must decide on the retention or exclusion of each
outlier.

[Figure: Plot of ten observations (56, 53, 52, 50, 46, 46, 44, 44, 41 and 7); the value 7
stands apart from the rest as an outlier]
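
One common way to screen for outliers is the 1.5 x IQR rule. The notes do not prescribe a detection method, so the rule used below is an assumption, as are the ten values, which are read from the labels of the outlier figure. The sketch also shows how a single extreme value shifts the mean.

```python
import statistics

# Values read from the labels of the outlier figure (an assumption; their order is unknown).
values = [56, 53, 52, 50, 46, 46, 44, 44, 41, 7]

# 1.5 x IQR rule: an assumed screening rule, not one prescribed by the notes.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]
kept = [v for v in values if lower_fence <= v <= upper_fence]

# Compare the summary with and without the flagged values.
mean_all = statistics.mean(values)
mean_kept = statistics.mean(kept)

print("Flagged as outliers:", outliers)
print("Mean with outliers:", mean_all)
print("Mean without outliers:", mean_kept)
```

Flagging a value is only the first step: whether it is an error to exclude or an informative observation to retain remains the researcher's decision.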
