8614, Unit 3 5
Muhammad Riaz
Assistant Professor
UNIT-3
• Statistical Graphics / Exploratory Data
Analysis
Introduction
• Graphical representation of data is for the purpose of
easier interpretation. Facts and figures as such do not
catch our attention unless they are presented in an
interesting way.
• Graphical representation of data is one of the most
commonly used and interesting modes of presentation.
The purpose of this unit is to make you familiar with
this mode of presentation.
Bar Chart
• Bar charts are one of the most commonly used
graphical representations of data. They are used to
compare values visually with each other.
• They are easy to create and interpret.
• They are also flexible: the standard bar chart has
several variations, including vertical or
horizontal bar charts, component or grouped
charts, and stacked bar charts.
• Data for a bar chart are entered in columns.
• Each numeric data value becomes a bar.
• The chart is constructed such that the lengths of the
different bars are proportional to the size of the
category they represent. In a vertical bar chart the
x-axis represents the different categories and has no
scale, while the y-axis has a scale and indicates the
units of measurement. In a horizontal bar chart the
roles of the two axes are reversed.
• The following figure shows the results of a student
in the first, second and third terms in the subjects of
English, Urdu, Mathematics and Pak-Studies.
Bar Chart Example
(figure not reproduced; slide dated 7/17/2020)
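As a rough illustration of bars whose lengths are proportional to the values they represent, the sketch below prints a horizontal text bar chart. The marks are hypothetical (they are not read from the figure), and the names marks, bar_length and text_bar_chart are made up for this example.

```python
# Hypothetical marks of one student in one term (NOT taken from the figure).
marks = {"English": 70, "Urdu": 65, "Mathematics": 85, "Pak-Studies": 60}


def bar_length(value, units_per_char=5):
    """Return the bar length in characters, proportional to the value."""
    return round(value / units_per_char)


def text_bar_chart(data, units_per_char=5):
    """Build one horizontal bar per category.

    The category axis has no scale; the bar length carries the measurement.
    """
    width = max(len(name) for name in data)
    lines = []
    for name, value in data.items():
        bar = "#" * bar_length(value, units_per_char)
        lines.append(f"{name.ljust(width)} | {bar} {value}")
    return lines


for line in text_bar_chart(marks):
    print(line)
```

Because this chart is horizontal, the categories run down the vertical axis and the scaled axis is the horizontal one, as described for horizontal bar charts above.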
Advantages of bar charts
i) They show each data category in a frequency distribution.
ii) They display relative numbers / proportions of
multiple categories.
iii) They summarize a large amount of data in an easily
interpretable manner.
iv) They make trends easier to highlight than tables
do.
v) They allow estimates to be made quickly and
accurately.
vi) They are easily accessible to everyone.
Disadvantages of bar charts
i) They often require additional explanation.
ii) They fail to expose key assumptions, causes,
impacts and patterns.
Pictograms
A pictogram is a graphical symbol that conveys its
meaning through its pictorial resemblance to a
physical object. A pictogram may include a symbol
plus graphic elements such as a border, background
pattern, or color that are intended to convey specific
information. We can also say that a pictogram is a
kind of graph that uses pictures instead of bars to
represent the data under analysis. A pictogram is also
called a “pictograph”, or simply a “picto”.
A pictogram or pictograph represents the
frequency of data as pictures or symbols. Each
picture or symbol may represent one or more
units of data.
To successfully convey the meaning, a pictogram:
i) Should be self-explanatory.
ii) Should be recognizable by all people.
iii) Must represent a general concept.
iv) Should be clear, concise and interesting.
v) Should be identifiable as a set, through uniform
treatment of scale, style and subject.
vi) Should be highly visible.
vii) Should not be dependent upon a border and
should work equally well in positive or negative
form.
viii) Should be attractive when used with other
design elements and typestyles.
Advantages of pictograms
i) Pictograms can make warnings more eye-
catching.
ii) They can serve as an “instant reminder” of a
hazard or an established message.
iii) They may improve warning comprehension
for those with visual or literacy difficulties.
iv) They have the potential to be interpreted
more accurately and more quickly than words.
v) They can be recognized and recalled far
better than words.
vi) They may be better when undertaking
familiar routine tasks.
Disadvantages of pictograms
There are a number of disadvantages of
relying on pictograms.
i) Very few pictograms are universally
understood.
ii) Even well-understood pictograms will not
be interpreted in the same way by all groups of
people and across all cultures, and it takes years
for any pictogram to reach maximum
effectiveness.
iii) They can be read with the opposite or an
otherwise undesired meaning, which
can create additional confusion.
Example
The following table shows the number of laptops sold by a
company for the months January to March. Construct a
pictograph for the table.
Month       Laptops sold
January     25
February    15
March       20
Solution: one symbol (shown here as ■) represents 5 laptops.
January     ■ ■ ■ ■ ■
February    ■ ■ ■
March       ■ ■ ■ ■
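The arithmetic behind the pictograph can be sketched as follows: each month's sales are divided by the number of laptops one symbol stands for. The sales figures come from the example; the function name pictograph_rows and the "*" stand-in for the picture symbol are assumptions made for this sketch.

```python
# Sales figures from the example above.
sales = {"January": 25, "February": 15, "March": 20}
UNITS_PER_SYMBOL = 5  # one symbol represents 5 laptops
SYMBOL = "*"          # stand-in for the laptop picture


def pictograph_rows(data, units_per_symbol, symbol):
    """Return one row of repeated symbols per category."""
    rows = {}
    for label, count in data.items():
        full, remainder = divmod(count, units_per_symbol)
        if remainder:
            # This sketch handles whole symbols only; partial symbols
            # (for example half a picture) are not drawn.
            raise ValueError(f"{label}: {count} is not a multiple of {units_per_symbol}")
        rows[label] = symbol * full
    return rows


rows = pictograph_rows(sales, UNITS_PER_SYMBOL, SYMBOL)
for month, symbols in rows.items():
    print(f"{month:<9} {symbols}")
```

January therefore needs 5 symbols, February 3 and March 4.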
Histogram
A histogram is a type of graph that provides a
visual interpretation of numerical data by
indicating the number of data points that lie
within each range of values (class).
Classes   Frequency   Cumulative Frequency
                      Less Than       Greater Than
1-10      1           1               29 + 1 = 30
11-20     4           1 + 4 = 5       25 + 4 = 29
21-30     3           5 + 3 = 8       22 + 3 = 25
31-40     7           8 + 7 = 15      15 + 7 = 22
41-50     7           15 + 7 = 22     8 + 7 = 15
51-60     7           22 + 7 = 29     1 + 7 = 8
Example
Marks of 33 students of a class, obtained
in a test out of 100, are given below:
78, 64, 55, 65, 52, 67, 69, 77, 79, 36, 57,
47, 55, 57, 39, 45, 54, 69, 75, 25, 74, 38,
29, 33, 79, 37, 42, 49, 48, 58, 66, 61, 85.
Solution:
Classes   Frequency   Cumulative Frequency
                      Less Than       Greater Than
21-30     2           2               31 + 2 = 33
31-40     5           2 + 5 = 7       26 + 5 = 31
41-50     5           7 + 5 = 12      21 + 5 = 26
51-60     7           12 + 7 = 19     14 + 7 = 21
61-70     7           19 + 7 = 26     7 + 7 = 14
71-80     6           26 + 6 = 32     1 + 6 = 7
81-90     1           32 + 1 = 33     1
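The class frequencies and both cumulative columns can be checked with a short script. This is only a sketch: the marks are those listed in the example, while the function names frequency_table and cumulative, and the choice to start the first class at 21 with a width of 10, are assumptions that mirror the classes used in the solution.

```python
# Marks of the 33 students, as listed in the example.
marks = [78, 64, 55, 65, 52, 67, 69, 77, 79, 36, 57,
         47, 55, 57, 39, 45, 54, 69, 75, 25, 74, 38,
         29, 33, 79, 37, 42, 49, 48, 58, 66, 61, 85]


def frequency_table(data, first_lower=21, width=10):
    """Count how many values fall in each class (21-30, 31-40, ...)."""
    rows = []
    lower = first_lower
    while lower <= max(data):
        upper = lower + width - 1
        freq = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, freq))
        lower += width
    return rows


def cumulative(rows):
    """Add 'less than' and 'greater than' cumulative frequencies."""
    total = sum(freq for _, _, freq in rows)
    less_than = 0
    table = []
    for lower, upper, freq in rows:
        greater_than = total - less_than  # this class and all classes above it
        less_than += freq                 # this class and all classes below it
        table.append((f"{lower}-{upper}", freq, less_than, greater_than))
    return table


for row in cumulative(frequency_table(marks)):
    print("{:<7}{:>4}{:>6}{:>6}".format(*row))
```

The heights of the histogram bars are the values in the frequency column, one bar per class.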
Scatter Plot
A scatter plot plots data in the XY-plane
to show how much one variable or data set is
affected by another. Its points show
the relationship between two variables or two
sets of data.
These points are sometimes called markers,
and the position of each point depends on the
values in the columns assigned to the X and Y axes.
A scatter plot gives a good visual picture of the
relationship or association between two
variables or data sets, and aids the
interpretation of the correlation coefficient or
regression model.
The relationship between two data sets or
variables is called correlation.
If the markers lie close together and form a
straight line in the scatter plot, the two
variables or data sets have a high correlation.
If the markers are spread evenly over the
scatter plot, the correlation is low or zero.
Correlation is positive when the values
increase together, i.e. if one value increases
the other also increases, and if one value
decreases the other also decreases.
On the other hand, correlation is negative
when one value decreases as the other increases,
and vice versa.
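The sign and strength of a correlation can be illustrated numerically. The sketch below assumes Pearson's correlation coefficient, which the text does not name explicitly, and uses small made-up data sets; the function name pearson_r and all values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired data (not from the source).
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases with x: markers on a rising line
falling = [10, 8, 6, 4, 2]   # decreases as x increases: markers on a falling line
scattered = [5, 1, 4, 2, 3]  # no clear pattern

print("rising:   ", pearson_r(x, rising))
print("falling:  ", pearson_r(x, falling))
print("scattered:", pearson_r(x, scattered))
```

A value near +1 or -1 corresponds to markers lying close to a straight line, while a value near 0 corresponds to markers spread over the plot without a linear pattern.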
A scatter plot provides answers to the
following questions.
i) Are variables X and Y or two data sets
related?
ii) Are variables X and Y or two data sets
linearly related?
iii) Are variables X and Y or two data sets
non-linearly related?
iv) Does the variation in Y or one data set
change depending on X or the other data set?
v) Are there outliers?
When to Use Scatter Plot?
The following situations provide a rationale for using a
scatter plot.
i) When there is paired numerical data.
ii) When the dependent variable has multiple
values for each value of the independent variable.
iii) When the researcher tries to determine
whether the two variables are related, such as:
a) When trying to identify potential root causes of
problems.
b) When determining objectively whether a particular
cause and effect are related.
c) When determining whether two effects that
appear to be related both occur with the same
cause.
Example of Scatter Plots
Box Plot
The box plot is an exploratory graph. It is a
standardized way of displaying the distribution of
data based on the five summary statistics:
minimum, first quartile, median, third quartile,
and maximum.
The first and third quartiles are called the two hinges:
the first quartile is the lower hinge and the third
quartile is the upper hinge. The minimum and the maximum
are the two whiskers: the minimum is the lower whisker
and the maximum is the upper whisker.
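The five summary statistics can be computed directly from the data. The sketch below reuses the 33 test marks from the histogram example and takes each hinge as the median of the lower or upper half of the sorted data, leaving out the middle value when the count is odd. This is only one of several quartile conventions, so other software may report slightly different hinges; the function name five_number_summary is an assumption.

```python
from statistics import median

# The 33 test marks from the histogram example.
marks = [78, 64, 55, 65, 52, 67, 69, 77, 79, 36, 57,
         47, 55, 57, 39, 45, 54, 69, 75, 25, 74, 38,
         29, 33, 79, 37, 42, 49, 48, 58, 66, 61, 85]


def five_number_summary(data):
    """Return (minimum, lower hinge, median, upper hinge, maximum).

    Assumed convention: hinges are the medians of the two halves of the
    sorted data, excluding the middle value when the count is odd.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]
    upper_half = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])


labels = ("minimum", "lower hinge", "median", "upper hinge", "maximum")
for label, value in zip(labels, five_number_summary(marks)):
    print(f"{label:<12}{value}")
```

In a box plot of these marks, the box would span the two hinges, with a line at the median and whiskers reaching to the minimum and maximum.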
Box plot gives us information about the location
and variation in the data set. Particularly it helps
us in detecting and illustrating location and
variation changes between different groups of
data.
Types of Box Plot
Commonly used types of box plot are single
box plot and multiple box plot.
Single box plot
A single box plot can be drawn for one set of
data with no distinct groups. In such a plot the
width of the box is arbitrary.
Multiple box plot
Multiple box plots can be drawn together to
compare multiple data sets or to compare
groups in a single data set. In such a plot the
width of the box plot can be set proportional to
the number of points in the given group or
The box plot provides answers to the following
questions.
i) Is a factor significant?
ii) Does the location differ between subgroups
or between different data sets?
iii) Does the variation differ between
subgroups or between different data sets?
iv) Are there any outliers?