
8614, Unit 3 5

Educational statistics

Uploaded by

maahiahmed1648

By

Muhammad Riaz
Assistant Professor
UNIT-3
• Statistical Graphics / Exploratory Data
Analysis
Introduction
• Data are represented graphically to make them easier to
interpret. Facts and figures as such do not catch our
attention unless they are presented in an interesting way.
• Graphical representation is one of the most commonly
used modes of presenting data. The purpose of this unit
is to make you familiar with this mode of presentation.
Bar Chart
• Bar charts are one of the most commonly used
graphical representations of data, used to visually
compare values with each other.
• They are easy to create and interpret.
• They are also flexible, with several variations on
the standard bar chart, including vertical or
horizontal bar charts, component or grouped
charts, and stacked bar charts.
• Data for a bar chart are entered in columns.
• Each numeric data value becomes a bar.
• The chart is constructed such that the lengths of the
different bars are proportional to the size of the
category they represent. In a vertical bar chart the
x-axis represents the different categories and has no
scale, while the y-axis has a scale and indicates the
units of measurement. In a horizontal bar chart the
roles of the two axes are reversed.
• The following figure shows the results of the first,
second and third term of a student in the subjects of
English, Urdu, Mathematics and Pak-Studies.
Bar Chart Example
[Figure: bar chart of a student's term results by subject]
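The proportionality rule for bar lengths can be sketched in a few lines of Python. This is only a rough illustration: the function name `text_bar_chart` and the marks are made up for the sketch and are not the data of the figure.

```python
def text_bar_chart(values, width=40, symbol="#"):
    """Return one text line per category; bar length is proportional to the value."""
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        # Scale every bar so that the largest value fills the full width.
        bar_length = round(value / largest * width)
        lines.append(f"{label.ljust(label_width)} | {symbol * bar_length} {value}")
    return lines


# Hypothetical first-term marks (illustrative only).
first_term = {"English": 60, "Urdu": 75, "Mathematics": 90, "Pak-Studies": 45}
chart = text_bar_chart(first_term, width=30)
print("\n".join(chart))
```

The categories sit on the unscaled axis and the bar lengths carry the scale, which matches the layout of a horizontal bar chart.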
Advantages of bar charts
i) They show each data category in a frequency distribution.
ii) They display relative numbers / proportions of
multiple categories.
iii) They summarize a large amount of data in an easily
interpretable manner.
iv) They make trends easier to highlight than tables
do.
v) They allow estimates to be made quickly and
accurately.
vi) They are easily accessible to everyone.
Disadvantages of bar charts
i) They often require additional explanation.
ii) They fail to expose key assumptions, causes,
impacts and patterns.
Pictograms
A pictogram is a graphical symbol that conveys its
meaning through its pictorial resemblance to a
physical object. A pictogram may include a symbol
plus graphic elements such as a border, background
pattern, or color that are intended to convey specific
information. We can also say that a pictogram is a
kind of graph that uses pictures instead of bars to
represent the data under analysis. A pictogram is also
called a “pictograph”, or simply a “picto”.
A pictogram or pictograph represents the
frequency of data as pictures or symbols. Each
picture or symbol may represent one or more
units of data.
To successfully convey the meaning, a pictogram:
i) Should be self-explanatory.
ii) Should be recognizable by all people.
iii) Must represent a general concept.
iv) Should be clear, concise and interesting.
v) Should be identifiable as a set, through uniform
treatment of scale, style and subject.
vi) Should be highly visible.
vii) Should not be dependent upon a border and
should work equally well in positive or negative
form.
viii) Should be attractive when used with other
design elements and typestyles.
Advantages of pictograms
i) Pictograms can make warnings more eye-
catching.
ii) They can serve as an “instant reminder” of a
hazard or an established message.
iii) They may improve warning comprehension
for those with visual or literacy difficulties.
iv) They have the potential to be interpreted
more accurately and more quickly than words.
v) They can be recognized and recalled far
better than words.
vi) They may be better when undertaking
familiar routine tasks.
Disadvantages of pictograms
There are a number of disadvantages of
relying on pictograms.
i) Very few pictograms are universally
understood.
ii) Even well understood pictograms will not
be interpreted equally by all groups of people
and across all cultures, and it takes years for
any pictogram to reach maximum
effectiveness.
iii) They can be interpreted with the opposite,
or an otherwise undesired, meaning, which
can create additional confusion.
Example
The following table shows the number of laptops sold by a
company for the months January to March. Construct a
pictograph for the table.
Month      Laptops sold
January    25
February   15
March      20
Solution: let each symbol represent 5 laptops.
January:   5 symbols (25 ÷ 5)
February:  3 symbols (15 ÷ 5)
March:     4 symbols (20 ÷ 5)
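The laptop example can also be worked out with a short script. This is a minimal sketch: the function name `pictograph` and the use of `*` as the symbol are assumptions made for illustration, since the original picture symbol is not available.

```python
def pictograph(counts, units_per_symbol, symbol="*"):
    """Return {label: row of symbols}, one symbol per `units_per_symbol` units."""
    rows = {}
    for label, count in counts.items():
        whole, remainder = divmod(count, units_per_symbol)
        if remainder:
            # This sketch does not draw partial symbols.
            raise ValueError(f"{label}: {count} is not a multiple of {units_per_symbol}")
        rows[label] = " ".join([symbol] * whole)
    return rows


# Data from the example: laptops sold per month, one symbol = 5 laptops.
laptops_sold = {"January": 25, "February": 15, "March": 20}
rows = pictograph(laptops_sold, units_per_symbol=5)
for month, row in rows.items():
    print(f"{month:<9} {row}")
```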
Histogram
A histogram is a type of graph that provides a
visual interpretation of numerical data by
indicating the number of data points that fall
within each range of values.

A histogram looks similar to a bar chart. Both are
ways to display a data set. The height of a bar
corresponds to the frequency (or relative frequency)
of the data in the class: the higher the bar, the
greater the frequency, and vice versa. The main
difference between these graphs is the level of
measurement of the data.
Bar graphs are used for data at the nominal level of
measurement; they show the frequency of
categorical data.
On the other hand, histograms are used for data
that are at least at the ordinal level of measurement.
A bar graph presents actual counts against
categories. The height of the bar indicates the
number of items in that category. A histogram
displays numerical values grouped into bins.
While creating a histogram, you are in effect
creating a bar graph that shows how many data
points fall within each range of values (an
interval), called a bin.
There are no hard and fast rules about how many
bins there should be, but usually there are 5-20
bins. Fewer than 5 bins carry little meaning, and
more than 20 bins make the data hard to read
and interpret. Often 5-7 bins are enough.
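How the choice of bins changes the picture can be seen with a small sketch. The helper `bin_counts` and the heights below are hypothetical; they only illustrate counting data points per equal-width bin.

```python
def bin_counts(data, start, bin_width, n_bins):
    """Count values in n_bins equal-width bins: [start, start + bin_width), and so on."""
    counts = [0] * n_bins
    for x in data:
        index = int((x - start) // bin_width)
        if not 0 <= index < n_bins:
            raise ValueError(f"{x} lies outside the binned range")
        counts[index] += 1
    return counts


# Hypothetical heights in cm (illustrative only).
heights = [150, 152, 155, 158, 160, 161, 163, 165, 166, 168, 170, 171, 174, 178, 183]

wide_bins = bin_counts(heights, start=150, bin_width=10, n_bins=4)
narrow_bins = bin_counts(heights, start=150, bin_width=5, n_bins=7)
print(wide_bins)    # counts for 150-159, 160-169, 170-179, 180-189
print(narrow_bins)  # counts for 150-154, 155-159, ..., 180-184
```

Both lists describe the same 15 values; only the bin width differs, which is why trying more than one grouping is worthwhile.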
Shapes of Histogram
i) Bell-shaped
ii) Bimodal
A bimodal shape, shown below, has two
peaks. This shape may show that the data has
come from two different systems. Often in a
single system, there may be two modes in the
data set.
iv) Skewed left
Some histograms will show a skewed distribution
to the left, as shown below. A distribution skewed
to the left is said to be negatively skewed.
This kind of distribution has a large number of
occurrences in the upper value cells (right side)
and few in the lower value cells (left side).
A skewed distribution can result when data is
gathered from a system with an upper boundary,
such as 100.
In other words, all the collected data has values
less than 100.
v) Uniform
A uniform distribution provides little information
about the system. It may describe a distribution
which has several modes (peaks).
If your histogram has this shape, check to see if
several sources of variation have been combined.
If so, analyze them separately.
If multiple sources of variation do not seem to be
the cause of this pattern, different groupings can
be tried to see if a more useful pattern results.
This could be as simple as changing the starting
and ending points of the cells, or changing the
number of cells.
A uniform distribution often means that the
number of classes is too small.
vi) Random
A random distribution, as shown below, has
no apparent pattern. A random distribution
often means there are too many classes.
Frequency Polygon
The frequency polygon is a graph that
displays data by using lines that connect
points plotted for the frequencies at the
midpoints of the classes.
This graph is useful for understanding the
shape of a distribution. It is also a good
choice for displaying cumulative
frequency distributions.
A frequency polygon is similar to a
histogram. The difference is that a
histogram is made up of rectangles while a
frequency polygon resembles a line graph.
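The points of a frequency polygon are simply (class midpoint, frequency) pairs. The sketch below computes them; the function name `polygon_points` and the class frequencies are hypothetical and used only for illustration.

```python
def polygon_points(class_limits, frequencies):
    """Pair each class midpoint with the frequency of that class."""
    return [
        ((lower + upper) / 2, frequency)
        for (lower, upper), frequency in zip(class_limits, frequencies)
    ]


# Hypothetical classes and frequencies (illustrative only).
classes = [(1, 10), (11, 20), (21, 30), (31, 40)]
freqs = [2, 5, 8, 3]

points = polygon_points(classes, freqs)
for midpoint, frequency in points:
    print(midpoint, frequency)
```

Joining these points with straight lines, in order, gives the polygon.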
Cumulative Frequency Polygon or
Ogive
The cumulative frequency is the sum of the
frequencies accumulated up to the upper
boundary of a class in the distribution.
An ogive is drawn on the basis of cumulative
frequency. To construct an ogive, we first
have to form a cumulative frequency table.
The upper limits of the classes are taken on
the x-axis and the cumulative frequencies on
the y-axis, and the points are plotted.
There are two methods of drawing a
cumulative frequency curve or ogive.

i) The less than method


In this method a frequency distribution is
prepared which gives the number of items
that are less than a certain size. It gives a
series which is cumulatively upward.
ii) The greater than method
In this method a frequency distribution is
prepared that gives the number of items that
exceed a certain size and gives a series which
is cumulatively downward.
Example
Marks of 30 students of a class, obtained in a
test out of 70, are given below: 42, 21, 50, 37,
38, 42, 49, 52, 38, 53, 57, 47, 29, 59, 61, 33,
17, 17, 39, 44, 42, 39, 14, 7, 27, 19, 54, 51, 35,
55.
Solution

Classes   Frequency   Cumulative Frequency
                      Less Than       Greater Than
1-10      1           1               29 + 1 = 30
11-20     4           1 + 4 = 5       25 + 4 = 29
21-30     3           5 + 3 = 8       22 + 3 = 25
31-40     7           8 + 7 = 15      15 + 7 = 22
41-50     7           15 + 7 = 22     8 + 7 = 15
51-60     7           22 + 7 = 29     1 + 7 = 8
61-70     1           29 + 1 = 30     1
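The cumulative frequency table for the 30 marks can be checked with a short script. This is a sketch under one assumption: the classes run from 1-10 up to 61-70, so that the single mark of 61 is also counted. The variable names are illustrative.

```python
from itertools import accumulate

# Marks of the 30 students from the example (test out of 70).
marks = [42, 21, 50, 37, 38, 42, 49, 52, 38, 53, 57, 47, 29, 59, 61, 33,
         17, 17, 39, 44, 42, 39, 14, 7, 27, 19, 54, 51, 35, 55]

class_limits = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50), (51, 60), (61, 70)]

# Frequency of each class: how many marks lie between its limits (inclusive).
frequencies = [sum(lower <= m <= upper for m in marks) for lower, upper in class_limits]

# "Less than" series: running total from the first class upward.
less_than = list(accumulate(frequencies))

# "Greater than" series: running total from the last class downward.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

for (lower, upper), f, lt, gt in zip(class_limits, frequencies, less_than, greater_than):
    print(f"{lower}-{upper}\t{f}\t{lt}\t{gt}")
```

The less-than series ends at the total number of students and the greater-than series starts at it, which is a quick check that no mark was missed.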
Example
Marks of 33 students of a class, obtained
in a test out of 100, are given below: 78,
64, 55, 65, 52, 67, 69, 77, 79,36, 57, 47,
55, 57, 39, 45, 54, 69, 75, 25, 74, 38, 29,
33, 79, 37, 42, 49, 48, 58, 66, 61, 85.
Solution:
Classes   Frequency   Cumulative Frequency
                      Less Than       Greater Than
21-30     2           2               31 + 2 = 33
31-40     5           2 + 5 = 7       26 + 5 = 31
41-50     5           7 + 5 = 12      21 + 5 = 26
51-60     7           12 + 7 = 19     14 + 7 = 21
61-70     7           19 + 7 = 26     7 + 7 = 14
71-80     6           26 + 6 = 32     1 + 6 = 7
81-90     1           32 + 1 = 33     1
Scatter Plot
A scatter plot is used to plot data in the XY-plane
to show how much one variable or data set is
affected by another. It has points that show
the relationship between two variables or two
sets of data.
These points are sometimes called markers,
and the position of each point depends on the
values of the two columns plotted on the X and Y axes.
A scatter plot gives a good visual picture of the
relationship or association between two
variables or data sets, and aids the
interpretation of the correlation coefficient or
regression model.
The relationship between two data sets or
variables is called correlation.
If the markers lie close together along a
straight line in the scatter plot, the two
variables or data sets have high correlation.
If the markers are evenly scattered across the
plot, the correlation is low, or zero.
Correlation is positive when the values
increase together, i.e. if one value increases
the other will also increase or if one value
decreases the other will also decrease.
On the other hand, correlation is negative
when one value increases as the other decreases,
and vice versa.
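The direction and strength of a linear relationship can be put into a number with the Pearson correlation coefficient. The sketch below is one way to compute it by hand; the function name `pearson_r` and the paired data are hypothetical and serve only as an illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired values xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired data (illustrative only).
hours_studied = [1, 2, 3, 4, 5]
marks_rising = [52, 55, 61, 64, 70]   # rises with hours: positive correlation
marks_falling = [10, 8, 6, 4, 2]      # falls exactly linearly: correlation of -1

r_positive = pearson_r(hours_studied, marks_rising)
r_negative = pearson_r(hours_studied, marks_falling)
print(r_positive, r_negative)
```

Values near +1 or -1 correspond to markers lying close to a straight line; values near 0 correspond to markers with no linear pattern.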
A scatter plot helps answer the
following questions.
i) Are variables X and Y, or the two data sets,
related?
ii) Are variables X and Y, or the two data sets,
linearly related?
iii) Are variables X and Y, or the two data sets,
non-linearly related?
iv) Does the variation in Y (or one data set)
change depending on X (or the other data set)?
v) Are there outliers?
When to Use Scatter Plot?
Following situations provide a rationale to use a scatter
plot.
i) When there is paired numerical data.
ii) When the dependent variable has multiple
values for each value of the independent variable.
iii) When the researcher tries to determine
whether the two variables are related, such as:
a) When trying to identify potential root causes of
problems.
b) To determine objectively whether a particular
cause and effect are related.
c) When determining whether two effects that
appear to be related both occur with the same
cause.
Example of Scatter Plots
[Figures: example scatter plots]
Box Plot
The box plot is an exploratory graph. It is a
standardized way of displaying the distribution of
data based on the five-number summary:
minimum, first quartile, median, third quartile,
and maximum.
The first and third quartiles are called the hinges:
the first quartile is the lower hinge and the third
quartile is the upper hinge. The whiskers extend to
the minimum and the maximum: the lower whisker
ends at the minimum and the upper whisker at the
maximum.
Box plot gives us information about the location
and variation in the data set. Particularly it helps
us in detecting and illustrating location and
variation changes between different groups of
data.
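The five values on which a box plot is built can be computed with Python's standard library. This is a minimal sketch: the sample is hypothetical, and the "inclusive" quartile method is an assumption, since textbooks and software use slightly different quartile conventions.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median) and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


# Hypothetical sample of nine marks (illustrative only).
sample = [7, 14, 17, 19, 21, 27, 29, 33, 35]
summary = five_number_summary(sample)
print(summary)
```

The box is drawn from the first to the third quartile with a line at the median, and the whiskers run out to the minimum and the maximum.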
Types of Box Plot
Commonly used types of box plot are single
box plot and multiple box plot.
Single box plot
A single box plot can be drawn for one set of
data with no distinct groups. In such a plot the
width of the box is arbitrary.
Multiple box plot
Multiple box plots can be drawn together to
compare multiple data sets or to compare
groups in a single data set. In such a plot the
width of the box plot can be set proportional to
the number of points in the given group or
The box plot provides answers to the following
questions.
i) Is a factor significant?
ii) Does the location differ between subgroups
or between different data sets?
iii) Does the variation differ between
subgroups or between different data sets?
iv) Are there any outliers?

A box plot can tell whether a data set is
symmetric (when the median is in the center of
the box), but it can’t tell the shape of the
symmetry the way a histogram can.
Pie Chart
A pie chart displays data as pie slices of
varying sizes. The size of a slice shows how
much of the data falls in one category.
The bigger the slice, the more of that
particular data was gathered, and vice versa.
Pie charts are mainly used to show
comparison among various segments of data.
The main purpose of using a pie chart is to
show part-whole relationship. These charts
are used for displaying data that are
classified into nominal or ordinal categories.
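The part-whole relationship determines the angle of each slice: a category's share of the total, multiplied by 360 degrees. The sketch below shows the arithmetic; the function name `slice_angles` and the survey counts are hypothetical.

```python
def slice_angles(counts):
    """Return {category: central angle in degrees} for a pie chart."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("the total must be positive")
    # Each slice gets the same share of 360 degrees as its share of the total.
    return {label: count * 360 / total for label, count in counts.items()}


# Hypothetical survey of favourite subjects among 100 students (illustrative only).
favourite_subject = {"English": 10, "Urdu": 20, "Mathematics": 30, "Pak-Studies": 40}

angles = slice_angles(favourite_subject)
for subject, angle in angles.items():
    print(f"{subject:<12} {angle:6.1f} degrees")
```

Because the categories are mutually exclusive and together make up the whole, the angles add up to 360 degrees.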
How to Read a Pie Chart?
It is easy to read and interpret a pie-chart.
Usually, a pie-chart has several bits of data,
and each is pictured on a pie-chart as a pie
slice.
Some data have larger slices than others. So it
is easy to decide which data have maximum
frequency and which have minimum.
When to Use the Pie Chart?
There are some simple criteria that can be
used to determine whether a pie chart is the right
choice for a given set of data.
i) Do the parts make up a meaningful whole?
Pie charts should be used only if the parts or slices
together define the entire set of data in a way that
makes sense to the viewer.
ii) Are the parts mutually exclusive?
If there is overlap between the parts, it is better
to use any other chart.
iii) Do you want to compare the parts to each
other or the parts to the whole?
If the main purpose is to show a part-whole
relationship, a pie chart is useful; if the
main purpose is to show a part-part relationship,
a pie chart is a poor choice and it is wiser to use
another chart.
iv) How many parts do you have?
If there are more than five to seven parts, it is
advisable to use a different chart. Pie charts
with lots of slices of varying size are hard to
read.
Drawbacks of Pie Charts
There are two features that help us read the
values on a pie chart: the angle a slice covers
(compared to the entire circle) and the area of
the slice (compared to the entire circle).
Generally, we are not very good at measuring
angles. We only recognize angles of 90° and
180° with a high degree of precision. Other
angles are rather impossible to perceive with a
