Arba Minch University Basic Statistics

2. VISUAL DESCRIPTION OF DATA


❖ Objectives of this chapter
At the end of this chapter, students should be able to:
• Construct a frequency distribution and a histogram.
• Construct a stem-and-leaf plot, dot plot, and scatter diagram to represent data.
• Visually represent data by using graphs and charts.

2.1. Methods of Data Presentation

Once data have been collected, what do we do with them next? They have to be presented so that they can be of use. Collected data, also known as 'raw data', are always in an unorganized form and need to be organized and presented in a meaningful and readily comprehensible form in order to facilitate further statistical analysis. Raw data are recorded information in its original collected form, whether counts or measurements. Classification is a preliminary step that prepares the ground for proper presentation of data.

After collecting and organizing data, the next important task is the effective presentation of the bulk of the data. The major objectives of data presentation are:
➢ To present data in a visual and more understandable form.
➢ To make the data more attractive to the reader.
➢ To facilitate quick comparisons using measures of location and dispersion.
➢ To enable the reader to determine the shape and nature of the distribution in order to make statistical inferences.
➢ To facilitate further statistical analysis.

There are three methods of data presentation, namely tables (e.g., frequency distribution), graphs (e.g., histogram), and diagrams (e.g., bar chart). They are commonly used to summarize both qualitative and quantitative data.

2.2. Tabular Presentation of Data

Tabulation is the process of summarizing classified or grouped data in the form of a table so that it is easily understood and an investigator is quickly able to locate the desired information.
Tables are important for summarizing a large volume of data in a more understandable way. Based on the characteristics they present, tables are classified as:
i. Simple (one-way) table: a table which presents one characteristic, for example an age distribution.
ii. Two-way table: a table which presents two characteristics in columns and rows, for example age versus sex.
iii. Higher-order table: a table which presents three or more characteristics in one table.
In statistics we usually use a frequency distribution table for different types of data.

Frequency Distribution: is the organization of raw data in table form, using classes and frequencies, where the frequency (f) is the number of values in a specific class of the distribution.

There are three basic types of frequency distributions, and there are specific procedures for
constructing each type. The three types are categorical, ungrouped and grouped frequency
distributions.

2.2.1. Categorical Frequency Distribution (CaFD)


The CaFD is used for data which can be placed in specific categories such as nominal or
ordinal level data. For example, for data such as political affiliation, religious affiliation,
blood type, or major field of study categorical frequency distribution is appropriate.

❖ Steps of constructing CaFD


1. Verify that the data are on a nominal or ordinal scale of measurement.
2. Make a table as shown below.
   A        B        C            D
   Class    Tally    Frequency    Percent
3. Put the distinct values of the data set in column A.
4. Tally the data and place the results in column B.
5. Count the tallies and place the results in column C.
6. Find the percentage of values in each class by using the formula (f / n) × 100%, and place the results in column D, where f is the frequency of the class and n is the total number of values.

Example 2.1: Twenty-five army inductees were given a blood test to determine their blood
type. The data set is given as follows:
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A

Construct a frequency distribution for the above data.

❖ Solution:
The data are nominal, so we use a categorical frequency distribution to present them. Following the six steps above gives the frequency distribution below.
A        B             C            D
Class    Tally         Frequency    Percent
A        /////         5            20
B        ///// //      7            28
O        ///// ////    9            36
AB       ////          4            16
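
The six steps can also be sketched in code. The following is a minimal illustration, assuming Python and its standard library; the function name `categorical_frequency` is hypothetical and is not part of the notes. It counts each blood type of Example 2.1 and converts the count to a percentage with (f / n) × 100.

```python
from collections import Counter

# Blood types of the 25 inductees in Example 2.1
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()


def categorical_frequency(values):
    """Return {class: (frequency, percent)} for nominal or ordinal data."""
    n = len(values)                      # total number of values
    counts = Counter(values)             # steps 3 to 5: distinct classes and their counts
    return {cls: (f, 100 * f / n) for cls, f in counts.items()}  # step 6


table = categorical_frequency(blood_types)
for cls in ["A", "B", "O", "AB"]:
    f, pct = table[cls]
    print(f"{cls:<3} {f:>2} {pct:>5.1f}%")
```

The tally column is omitted because the counter replaces tallying by hand.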

2.2.2. Ungrouped Frequency Distribution (UFD)


It is a table of all the raw values that could possibly occur in the data, along with the number of times each actually occurred. In other words, a UFD is a distribution that uses individual data values along with their frequencies. It is often constructed for a small set of data on a discrete variable (when the data are numerical), and when the range of the data is small.
The major components of this type of frequency distribution are class, tally, frequency, and cumulative frequency.
Cumulative frequencies (CF): show how many values are accumulated up to and including a specific class (less than type) or from a specific class upward (more than type).
Less than Cumulative Frequency (LCF): is the total number of observations in and below the specified class.
More than Cumulative Frequency (MCF): is the total number of observations in and above the specified class.

❖ Steps of Constructing UFD:


1. First find the smallest and largest raw score in the collected data.
2. Arrange the data in order of magnitude and count the frequency.
3. To facilitate counting one may include a column of tallies.
4. Put respective frequency and cumulative frequency (LCF and MCF) along each
ordered data.

Example 2.2: A demographer is interested in the number of children a family may have. He/she took a sample of 30 families and obtained the following observations.
Number of children in a sample of 30 families
4 2 4 3 2 8
3 4 4 2 2 8
5 3 4 5 4 5
4 3 5 2 7 3
3 6 7 3 8 4
Construct a frequency distribution for this data.
❖ Solution:
• Find the range: Range = Max − Min = 8 − 2 = 6.
• Arrange the individual observations in ascending or descending order of magnitude; the ordered series is called an array. The array of the number of children in the 30 families is:
  2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 7, 7, 8, 8, 8
• The frequency distribution of the number of children in the 30 families is as follows:

No. of children   No. of families
(Class)           (Frequency)       LCF                        MCF
2                 5                 5                          30 = 5+(7+8+4+1+2+3)
3                 7                 12 = 7+(5)                 25 = 7+(8+4+1+2+3)
4                 8                 20 = 8+(5+7)               18 = 8+(4+1+2+3)
5                 4                 24 = 4+(5+7+8)             10 = 4+(1+2+3)
6                 1                 25 = 1+(5+7+8+4)           6 = 1+(2+3)
7                 2                 27 = 2+(5+7+8+4+1)         5 = 2+(3)
8                 3                 30 = 3+(5+7+8+4+1+2)       3
Each individual value is presented separately, which is why it is called an ungrouped frequency distribution.
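
The construction in Example 2.2 can be sketched as follows. This is an illustration only, assuming Python; the function name `ungrouped_frequency` is hypothetical. The LCF is a running total from the smallest class upward, and the MCF is a running total from the largest class downward.

```python
from collections import Counter
from itertools import accumulate

# Number of children in the 30 sampled families (Example 2.2)
children = [
    4, 2, 4, 3, 2, 8,
    3, 4, 4, 2, 2, 8,
    5, 3, 4, 5, 4, 5,
    4, 3, 5, 2, 7, 3,
    3, 6, 7, 3, 8, 4,
]


def ungrouped_frequency(values):
    """Return rows of (class, frequency, LCF, MCF), ordered by class."""
    counts = Counter(values)
    classes = sorted(counts)                       # the ordered distinct values
    freqs = [counts[c] for c in classes]
    lcf = list(accumulate(freqs))                  # running total from the bottom
    mcf = list(accumulate(reversed(freqs)))[::-1]  # running total from the top
    return list(zip(classes, freqs, lcf, mcf))


rows = ungrouped_frequency(children)
for cls, f, lcf, mcf in rows:
    print(f"{cls:>5} {f:>3} {lcf:>4} {mcf:>4}")
```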

2.2.3. Grouped Frequency Distribution (GFD)


It is a frequency distribution in which several values are grouped into one class; each class is more than one unit in width. We use this type of frequency distribution when the range of the data is large, and for data from a continuous variable.
Some of the basic terms that are most frequently used when dealing with grouped frequency distributions are:
Class limits (CL): separate one class in a GFD from another. The limits could actually appear in the data, and there are gaps between the upper limit of one class and the lower limit of the next.
Lower Class Limits (LCL): are the smallest numbers that can belong to the different classes.
Upper Class Limits (UCL): are the largest numbers that can belong to the different classes.
Units of measurement (U): the distance between two possible consecutive measures. It is usually taken as 1, 0.1, 0.01, 0.001, and so on.
Class Boundaries (CB) (true class limits): are the numbers used to separate classes, but without the gaps created by class limits.
Lower Class Boundary (LCB): is found by subtracting U/2 from the corresponding LCL.
Upper Class Boundary (UCB): is found by adding U/2 to the corresponding UCL.
Class Mark/Mid Point (CM): is the midpoint of a class. Each class midpoint can be found by adding the LCL to the UCL (or the LCB to the UCB) and dividing the sum by 2.
Class Width (W): is the difference between two consecutive LCLs or two consecutive LCBs.
Cumulative Frequency Distribution (CFD): is the tabular arrangement of class intervals together with their corresponding cumulative frequencies (CF). It can be of the more than type (MCF) or the less than type (LCF), depending on the type of CF used.
Relative frequency (rf): is the frequency divided by the total frequency.
Relative cumulative frequency (rcf): is the CF divided by the total frequency.

❖ Steps in constructing GFD


1. Find the Highest (H) and the Lowest (L) values
2. Find the Range; Range = Maximum − Minimum or R = H − L

3. Select the number of classes desired. Here, we have two choices to get the desired number of classes:
i. Use Sturges' rule. That is, K = 1 + 3.32 log₁₀ n, where K is the number of classes and n is the number of observations. Round the result up to the next integer.
ii. Select the number of classes arbitrarily, conventionally between 5 and 20. If you cannot calculate K by Sturges' rule, this method is more appropriate.
When we choose the number of classes, we have to think about the following criteria:
i. There should be between 5 and 20 classes.
ii. The classes must be mutually exclusive. This means that no data value can fall into
two different classes
iii. The classes must be all inclusive or exhaustive. This means that all data values
must be included.
iv. The classes must be continuous. There are no gaps in a frequency distribution. The only exception occurs when the class with a zero frequency is the first or last. A class with a zero frequency at either end can be omitted without affecting the distribution.
v. The classes must be equal in width. The exception here is the first or last class. It is possible to have a "below ..." or "... and above" class. This is often used with ages.

4. Find the Class Width (W) by dividing the range by the number of classes:
W = R / K, that is, W = Range / Number of Classes
Note that: Round the value of W up to the nearest whole number if there is a remainder. For instance, 4.7 ≈ 5 and 4.12 ≈ 5.
5. Select the Starting Point as the LCL of the first class. This is usually the lowest score (observation). Add the width to that score to get the LCL of the next class. Keep adding until you reach the desired number of classes (K) calculated in step 3.
6. Find the UCL: subtract the unit of measurement (U) from the LCL of the second class in order to get the UCL of the first class. Then add W to each UCL to get all the UCLs.
Unit of measurement (U): the smallest possible difference between two consecutive measurements. For instance, for the data 28, 23, 52 the unit of measurement is one: take one datum arbitrarily, say 23; the next possible value is 24, so U = 24 − 23 = 1. If the data set is 24.12, 30, 21.2, give priority to the datum with the most decimal places. Take 24.12; the next possible value is 24.13, so U = 24.13 − 24.12 = 0.01.
Note that: U = 1 is the maximum value of the unit of measurement and is the value used when we have no other information about the data.
7. Find the Class Boundaries (CB): Lower Class Boundary = Lower Class Limit − U/2 and Upper Class Boundary = Upper Class Limit + U/2.
8. Find the Class Mark/Mid Points (CM)
9. Tally the data and write the numerical values for tallies in the frequency column.

10. Find Cumulative Frequency (LCF and MCF)


11. If necessary, find Relative frequency (rf): it is the frequency (f) divided by the total
frequency (Tf) and find Relative cumulative frequency (rcf): it is the CF divided by the
total frequency (Tf).
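
Steps 1 to 8 can be sketched in code. The following is a minimal illustration, assuming Python and whole-number data (U = 1); the function name `build_classes` and the dictionary keys are hypothetical. It is run on the data of Example 2.3. One caveat of the sketch: when R / K is already a whole number, rounding up changes nothing and the highest value falls just outside the last class, so an extra class or a larger width would be needed; the code does not handle that case.

```python
import math


def build_classes(data, unit=1):
    """Steps 1-8 for whole-number data: return (K, W, list of classes)."""
    low, high = min(data), max(data)                  # step 1
    data_range = high - low                           # step 2
    k = math.ceil(1 + 3.32 * math.log10(len(data)))   # step 3: Sturges' rule, rounded up
    width = math.ceil(data_range / k)                 # step 4: class width, rounded up
    classes = []
    for i in range(k):
        lcl = low + i * width                         # step 5: lower class limit
        ucl = lcl + width - unit                      # step 6: upper class limit
        classes.append({
            "limits": (lcl, ucl),
            "boundaries": (lcl - unit / 2, ucl + unit / 2),  # step 7
            "mark": (lcl + ucl) / 2,                         # step 8
        })
    return k, width, classes


scores = [11, 29, 6, 33, 14, 21, 18, 17, 22, 38,
          31, 22, 27, 19, 22, 23, 26, 39, 34, 27]
k, width, classes = build_classes(scores)
for c in classes:
    print(c["limits"], c["boundaries"], c["mark"])
```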

Example 2.3: Consider the following set of data and construct the frequency distribution.
11 29 6 33 14 21 18 17 22 38
31 22 27 19 22 23 26 39 34 27
❖ Solution:
Using steps to construct grouped frequency distribution
1. Highest value=39, Lowest value=6
2. R = 39 - 6 = 33
3. K = 1 + 3.32 log₁₀ 20 = 5.32 ≈ 6, which is the desired number of classes.
4. W = R / K = 33 / 6 = 5.5 ≈ 6
5. Select the starting point. Take the minimum, which is 6, then add the width 6 to it to get the LCL of the next class.
➢ Lower class limit of the first class: LCL1 = 6
➢ Lower class limit of the second class: LCL2 = LCL1 + W = 6 + 6 = 12
➢ LCL3 = LCL2 + W = 12 + 6 = 18
➢ LCL4 = LCL3 + W = 18 + 6 = 24, and so on.
The lower class limits are 6, 12, 18, 24, 30, 36.

6. Find the Upper Class Limits (UCL). The unit of measurement (U) is one.
➢ Upper class limit of the first class: UCL1 = LCL2 − U = 12 − 1 = 11
➢ UCL2 = LCL3 − U = 18 − 1 = 17 or UCL1 + W = 11 + 6 = 17
➢ UCL3 = LCL4 − U = 24 − 1 = 23 or UCL2 + W = 17 + 6 = 23, and so on.
The upper class limits are 11, 17, 23, 29, 35, 41.

Therefore, 6 - 11 is the first class, 12 - 17 is the second class, and so on.
Class (K)    Class limits (CL)
1            6 - 11
2            12 - 17
3            18 - 23
4            24 - 29
5            30 - 35
6            36 - 41
7. Find the Class Boundaries (CB). Use the formulas in step 7 with U = 1: LCBi = LCLi − 0.5 and UCBi = UCLi + 0.5.
➢ LCB1 = LCL1 − U/2 = 6 − 0.5 = 5.5
➢ LCB2 = LCL2 − U/2 = 12 − 0.5 = 11.5 or LCB1 + W = 5.5 + 6 = 11.5, and so on.
➢ UCB1 = UCL1 + U/2 = 11 + 0.5 = 11.5
➢ UCB2 = UCL2 + U/2 = 17 + 0.5 = 17.5 or UCB1 + W = 11.5 + 6 = 17.5, and so on.
8. From the class limits or the class boundaries, the class mark (CM) of each class is:
➢ CM1 = (LCL1 + UCL1)/2 = (6 + 11)/2 = 8.5 or (LCB1 + UCB1)/2 = (5.5 + 11.5)/2 = 8.5, and so on.
9. Tally the data and write the numerical values for tallies in the frequency column.
Class (K)    Class limits (CL)    Frequency (f)
1            6 - 11               2
2            12 - 17              2
3            18 - 23              7
4            24 - 29              4
5            30 - 35              3
6            36 - 41              2
Total frequency                   20

10. Find Cumulative Frequency (LCF and MCF)

K    Frequency (f)    LCF                   MCF
1    2                2                     20 = 2+(2+7+4+3+2)
2    2                2+2 = 4               18 = 2+(7+4+3+2)
3    7                2+2+7 = 11            16 = 7+(4+3+2)
4    4                2+2+7+4 = 15          9 = 4+(3+2)
5    3                2+2+7+4+3 = 18        5 = 3+(2)
6    2                2+2+7+4+3+2 = 20      2

11. Find Relative frequency (rf): it is the frequency (f) divided by the total frequency (Tf)
and find Relative cumulative frequency (rcf): it is the CF divided by the total
frequency (Tf).
➢ rf1= f1/Tf= 2/20= 0.1, rf3= f3/Tf= 7/20= 0.35......continue
➢ less than type rcf1= LCF1/Tf= 2/20= 0.1,
➢ less than type rcf2= LCF2/Tf= 4/20= 0.2........continue
➢ more than type rcf1= MCF1/Tf= 20/20= 1,
➢ more than type rcf2= MCF2/Tf= 18/20= 0.9.....continue
K    f    LCF    MCF    rf      rcf (less than type)    rcf (more than type)
1    2    2      20     0.10    0.10                    1.00
2    2    4      18     0.10    0.20                    0.90
3    7    11     16     0.35    0.55                    0.80
4    4    15     9      0.20    0.75                    0.45
5    3    18     5      0.15    0.90                    0.25
6    2    20     2      0.10    1.00                    0.10

Therefore, the overall grouped frequency distribution of the given data set is shown below.
K    CL         CB             CM      f    LCF    MCF    rf      rcf (less than type)    rcf (more than type)
1    6 - 11     5.5 - 11.5     8.5     2    2      20     0.10    0.10                    1.00
2    12 - 17    11.5 - 17.5    14.5    2    4      18     0.10    0.20                    0.90
3    18 - 23    17.5 - 23.5    20.5    7    11     16     0.35    0.55                    0.80
4    24 - 29    23.5 - 29.5    26.5    4    15     9      0.20    0.75                    0.45
5    30 - 35    29.5 - 35.5    32.5    3    18     5      0.15    0.90                    0.25
6    36 - 41    35.5 - 41.5    38.5    2    20     2      0.10    1.00                    0.10
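
Steps 9 to 11 of Example 2.3 can be checked with a short sketch. It assumes Python and takes the class limits found above as given; the function name `grouped_frequency` and the dictionary keys are hypothetical.

```python
from itertools import accumulate

data = [11, 29, 6, 33, 14, 21, 18, 17, 22, 38,
        31, 22, 27, 19, 22, 23, 26, 39, 34, 27]
limits = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]


def grouped_frequency(data, limits):
    """Tally data into the given class limits and add cumulative and relative columns."""
    # Step 9: frequency of each class (both limits are included in the class)
    freqs = [sum(lcl <= x <= ucl for x in data) for lcl, ucl in limits]
    total = sum(freqs)
    # Step 10: less than and more than cumulative frequencies
    lcf = list(accumulate(freqs))
    mcf = list(accumulate(reversed(freqs)))[::-1]
    rows = []
    for (lcl, ucl), f, lc, mc in zip(limits, freqs, lcf, mcf):
        rows.append({
            "class": f"{lcl} - {ucl}",
            "f": f, "LCF": lc, "MCF": mc,
            # Step 11: relative and relative cumulative frequencies
            "rf": f / total, "rcf_less": lc / total, "rcf_more": mc / total,
        })
    return rows


rows = grouped_frequency(data, limits)
for r in rows:
    print(f'{r["class"]:<8} {r["f"]:>2} {r["LCF"]:>3} {r["MCF"]:>3} '
          f'{r["rf"]:.2f} {r["rcf_less"]:.2f} {r["rcf_more"]:.2f}')
```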

Example 2.4: The following data are the percentage coverage of forest in countries in Africa. Construct a frequency distribution showing CL, CB, CM, and f by using Sturges' rule.
30, 25, 23, 41, 39, 27, 41, 24, 32, 29, 35, 31, 36, 33, 36, 42, 35, 37, 41, and 29
❖ Solution
1. Given the number of observations n = 20, the number of classes is K = 1 + 3.32 log₁₀ 20 = 5.32 ≈ 5, where K is the number of classes (5 classes of width 4 are enough to cover the range from 23 to 42).
2. Class width (W) = (highest value − lowest value) / K = (42 − 23) / 5 = 3.8 ≈ 4

Classes    Class boundary    Class mark    Frequency
23 - 26    22.5 - 26.5       24.5          3
27 - 30    26.5 - 30.5       28.5          4
31 - 34    30.5 - 34.5       32.5          3
35 - 38    34.5 - 38.5       36.5          5
39 - 42    38.5 - 42.5       40.5          5
Total                                      20

2.3. Graphical Presentation of Data

Graphical presentation is most often used for continuous data, that is, for results from a grouped frequency distribution and for continuous variables distributed over time.

2.3.1. Histogram
✓ A histogram is a special type of bar graph in which the horizontal scale represents class intervals of data values and the vertical scale represents frequencies (f).
✓ The heights of the bars correspond to the frequency values, and the bars are drawn adjacent to each other (without gaps).
✓ We can construct a histogram after we have first completed a frequency distribution
table for a data set.
Example 2.4: The histogram for the data in example 2.3 is (See Figure 2.1)

2.3.2. Frequency Polygon


✓ A frequency polygon uses line segments connected to points located directly above the class midpoint (class mark) values.
✓ A histogram can be easily transformed into a frequency polygon by joining the midpoints of the tops of the rectangles by straight lines.
✓ The heights of the points correspond to the class frequencies, and the line segments are extended to the left and right so that the graph begins and ends on the horizontal axis, at the points where the midpoints of the preceding and following classes would be located.

Example 2.5: The frequency polygon for the data in example 2.3 is (See Figure 2.2)

Figure 2.1: Histogram (vertical axis: frequency)

Figure 2.2: Frequency polygon (horizontal axis: midpoints from 2.5 to 44.5; vertical axis: frequency)
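
The points joined by a frequency polygon can be listed with a short sketch. It assumes Python and equal class widths; the function name `polygon_points` is hypothetical. For the classes of Example 2.3 the polygon is closed at the midpoints 2.5 and 44.5, which are one class width beyond the first and last class marks.

```python
def polygon_points(marks, freqs):
    """Return the (midpoint, frequency) points of a frequency polygon.

    The polygon is closed on the horizontal axis one class width before the
    first class mark and one class width after the last one.
    """
    width = marks[1] - marks[0]                       # assumes equal class widths
    xs = [marks[0] - width] + list(marks) + [marks[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))


marks = [8.5, 14.5, 20.5, 26.5, 32.5, 38.5]   # class marks from Example 2.3
freqs = [2, 2, 7, 4, 3, 2]                    # class frequencies from Example 2.3
points = polygon_points(marks, freqs)
print(points)
```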

2.3.3. Ogive (Cumulative Frequency Curve)

✓ An ogive (pronounced "oh-jive") is a line graph that depicts cumulative frequencies, just as the cumulative frequency distribution lists cumulative frequencies.
✓ A cumulative frequency distribution enables us to know how many observations are above or below a certain value.
✓ Note that the ogive uses class boundaries along the horizontal scale, and the graph begins with the lower boundary of the first class and ends with the upper boundary of the last class.
✓ There are two types of ogive, namely the less than ogive and the more than ogive.
✓ A 'less than' ogive rises as it moves to the right, while a 'more than' ogive declines as it moves to the right.
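
The points of the two curves can be sketched as follows, assuming Python and one common plotting convention: the less than curve plots each LCF against the upper class boundary and starts at zero on the first lower boundary, while the more than curve plots each MCF against the lower class boundary and ends at zero on the last upper boundary. The function name `ogive_points` is hypothetical; the data are the classes of Example 2.3.

```python
from itertools import accumulate


def ogive_points(boundaries, freqs):
    """Return (less_than_points, more_than_points) for the two cumulative curves."""
    lower = [lo for lo, _ in boundaries]
    upper = [hi for _, hi in boundaries]
    lcf = list(accumulate(freqs))
    mcf = list(accumulate(reversed(freqs)))[::-1]
    less_than = [(lower[0], 0)] + list(zip(upper, lcf))   # rises to the right
    more_than = list(zip(lower, mcf)) + [(upper[-1], 0)]  # falls to the right
    return less_than, more_than


boundaries = [(5.5, 11.5), (11.5, 17.5), (17.5, 23.5),
              (23.5, 29.5), (29.5, 35.5), (35.5, 41.5)]
freqs = [2, 2, 7, 4, 3, 2]
less_than, more_than = ogive_points(boundaries, freqs)
print(less_than)
print(more_than)
```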

2.4. Diagrammatic Presentation of the Data

In the last lesson we observed that the technique of tabulation helps us to put unorganized collected data into an orderly form so that it is easily understood and the needed information is quickly located. However, grouped data or too many figures in a table do not always appeal to the general reader, as too many figures are generally confusing and fail to convey the definite pattern or trend of the figures. A picture is said to be worth 10,000 words, i.e., through pictorial presentation data can be presented in an interesting form. Diagrams are important because:

➢ They have greater attraction.
➢ They facilitate comparison.
➢ They are easily understandable.
➢ They are appropriate for presenting discrete data.

The most commonly used diagrammatic presentations for discrete as well as qualitative data are line diagrams, pie charts, bar charts, and pictograms.

2.4.1. Line diagram


This is the simplest form of diagram. The height of each line indicates the value of an item
that is being measured. The line diagram is drawn taking a suitable scale.
Example 2.5: The following data represent the sales by product, 1957 to 1959, of a given company for three products A, B, and C.
Product    Sales($) in 1957    Sales($) in 1958    Sales($) in 1959
A          12                  14                  18
B          24                  21                  18
C          24                  35                  54
Draw a line diagram to represent the sales by product from 1957 to 1959.
Solution:

Figure 2.3: Line diagram of the three products (horizontal axis: products A, B, C; vertical axis: sales in $; series: sales in 1957, 1958, and 1959)

2.4.2. Pie Chart


A pie chart can be used to compare the relation between a whole and its components. A pie chart is a circular diagram in which the area of each sector of the circle represents a component. To construct a pie chart (sector diagram), we draw a circle with radius equal to the square root of the total. The total angle of the circle is 360°.
The angle and percentage of each component are calculated by the formulas:

Angle of Sector = (Component Part / Total) × 360°
Percentage of Sector = (Component Part / Total) × 100

These angles are marked off in the circle by means of a protractor to show the different components.
Example 2.6: The following table gives the details of the monthly budget of a family. Represent these figures by a suitable diagram.
Item of Expenditure    Family Budget
Food                   $600
Clothing               $100
House Rent             $400
Fuel and Lighting      $100
Miscellaneous          $300
Total                  $1500

Solution: The necessary computations are given below:

➢ Angle of Sector (Food) = (600 / 1500) × 360° = 144°, Percentage of Food = (600 / 1500) × 100 = 40%
Item of Expenditure    Family Budget    Angle of Sector    %
Food                   $600             144°               40
Clothing               $100             24°                6.67
House Rent             $400             96°                26.67
Fuel and Lighting      $100             24°                6.67
Miscellaneous          $300             72°                20
Total                  $1500            360°               100

Figure 2.4: Pie chart of the monthly budget of a family
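
The sector computations of Example 2.6 can be reproduced with a short sketch. It assumes Python; the function name `pie_sectors` is hypothetical. Each component is divided by the total and scaled to 360 degrees and to 100 percent.

```python
# Monthly family budget from Example 2.6 (in $)
budget = {
    "Food": 600,
    "Clothing": 100,
    "House Rent": 400,
    "Fuel and Lighting": 100,
    "Miscellaneous": 300,
}


def pie_sectors(parts):
    """Return {item: (angle in degrees, percentage)} for a pie chart."""
    total = sum(parts.values())
    return {name: (360 * value / total, 100 * value / total)
            for name, value in parts.items()}


sectors = pie_sectors(budget)
for name, (angle, pct) in sectors.items():
    print(f"{name:<18} {angle:>6.1f} deg {pct:>6.2f}%")
```

The angles add up to 360 degrees, which is a quick check that no component has been left out.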

2.4.3. Bar Charts


A bar chart is a set of bars (thick lines or narrow rectangles) representing some magnitude over time or space.
Bar charts are useful for comparing aggregates over time or space.
Bars can be drawn either vertically or horizontally.
There are different types of bar charts, the most common being:
➢ Simple bar chart
➢ Component or subdivided bar chart
➢ Multiple bar chart
➢ Deviation or two-way bar chart

a. Simple Bar Chart:


✓ It is used to represent data involving only one variable classified on a spatial, quantitative, or temporal basis.
✓ In a simple bar chart, we make bars of equal width but variable length, i.e., the magnitude of a quantity is represented by the height or length of the bars.
Example 2.7: Draw simple bar charts to represent the sales by product in 1957 and 1958 using the data in example 2.5.

Figure 2.5: Simple bar charts of the three products in 1957 and 1958 (horizontal axis: products A, B, C; vertical axis: sales in $)

b. Subdivided or Component Bar chart:


✓ When we want to show how a total (or aggregate) is divided into its component parts, we use a component bar chart.
✓ The bars represent the total value of a variable, with each total broken into its component parts; different colours or designs are used for identification.
Example 2.7: Draw a component bar chart to represent the sales by product from 1957 to 1959 using the data in example 2.5.
Solution:

Figure 2.6: Component bar charts of the three products in 1957, 1958 and 1959
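
The way each bar of a component bar chart is split can be sketched as follows. This is an illustration only, assuming Python and that the components are stacked in the order A, B, C from the base line upward; the function name `stack_segments` is hypothetical. Each segment is described by where it starts and ends on the vertical scale, and the height of the whole bar is the yearly total.

```python
# Sales ($) by product for 1957, 1958 and 1959 (data of example 2.5)
years = [1957, 1958, 1959]
sales = {"A": [12, 14, 18], "B": [24, 21, 18], "C": [24, 35, 54]}


def stack_segments(series):
    """Return ({name: [(start, end), ...]}, totals) for stacked bars."""
    n_bars = len(next(iter(series.values())))
    tops = [0] * n_bars                      # current height of each bar
    segments = {}
    for name, values in series.items():
        segments[name] = [(t, t + v) for t, v in zip(tops, values)]
        tops = [t + v for t, v in zip(tops, values)]
    return segments, tops


segments, totals = stack_segments(sales)
for year, total in zip(years, totals):
    print(year, total)
```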

c. Multiple Bars chart:


✓ When two or more interrelated series of data are depicted by a bar diagram, the diagram is known as a multiple bar diagram.
✓ Suppose we have export and import figures for a few years. We can display them by two bars close to each other, one representing exports and the other representing imports.
✓ Multiple bar diagrams are particularly suitable where some comparison is involved.
Example 2.8: Draw a multiple bar chart to represent the sales by product from 1957 to 1959 using the data in example 2.5.
Solution:
Figure 2.7: Multiple bar charts of the three products in 1957, 1958 and 1959 (title: Sales by product 1957-1959; horizontal axis: year of production; vertical axis: sales in $)

d. Deviation Bar Diagram:


✓ A deviation bar diagram is used when the data contain both positive and negative values, such as data on net profit, net expense, percent change, etc.
✓ It is possible that in one or two years, instead of earning a net profit, the company sustained a net loss. In such a case, the data on net profit are displayed above the base line and the data on net loss below it.
Example 2.9: Suppose we have the following data relating to the net profit (percent) of three commodities.

Commodity    Net profit
Soap         80
Sugar        -90
Coffee       125
Solution:

Figure 2.8: Deviation bar diagram of the three commodities (vertical axis: net profit, from -200 to 200)

2.4.4. Pictogram
In this diagram, we represent data by means of some picture symbols. We decide about a
suitable picture to represent a definite number of units in which the variable is measured.

Example 2.10: Draw a pictogram to represent the following population of a town.

Year          1989    1990    1991    1992
Population    2000    3000    5000    7000

2.5. The Stem-and-Leaf Display and the Dotplot

Stem and Leaf Plots

The stem and leaf plot is a method of organizing data and is a combination of sorting and
graphing. It has the advantage over a grouped frequency distribution of retaining the actual
data while showing them in graphical form. A stem and leaf plot is a data plot that uses part
of the data value as the stem and part of the data value as the leaf to form groups or classes.

Example:
At an internet business center, the number of customers served each day for 20 days is shown. Construct a stem and leaf plot for the data.
25 31 20 32 13
14 43 02 57 23
36 32 33 32 44
32 52 44 51 45
Solution:

Step 1: Arrange the data in order:

02, 13, 14, 20, 23, 25, 31, 32, 32, 32,
32, 33, 36, 43, 44, 44, 45, 51, 52, 57

Step 2: Separate the data according to the first digit, as shown.

02    13, 14    20, 23, 25    31, 32, 32, 32, 32, 33, 36    43, 44, 44, 45    51, 52, 57

Step 3: A display can be made by using the leading digit as the stem and the trailing digit as the leaf. For example, for the value 32, the leading digit, 3, is the stem and the trailing digit, 2, is the leaf. For the value 14, the 1 is the stem and the 4 is the leaf. Now a plot can be constructed as shown below.

Stem | Leaf
0    | 2
1    | 3 4
2    | 0 3 5
3    | 1 2 2 2 2 3 6
4    | 3 4 4 5
5    | 1 2 7

The plot shows that the distribution peaks in the center and that there are no gaps in the data. For 7 of the 20 days, the number of customers visiting the internet center was between 31 and 36. The plot also shows that the center served from a minimum of 2 customers to a maximum of 57 customers in any one day.
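
The three steps can be sketched in code. The following is a minimal illustration, assuming Python and non-negative whole numbers below 100, so that the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Customers served per day for 20 days
customers = [25, 31, 20, 32, 13,
             14, 43, 2, 57, 23,
             36, 32, 33, 32, 44,
             32, 52, 44, 51, 45]


def stem_and_leaf(values):
    """Return the lines of a stem and leaf plot (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for v in sorted(values):             # step 1: arrange the data in order
        stems[v // 10].append(v % 10)    # steps 2 and 3: split into stem and leaf
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


lines = stem_and_leaf(customers)
print("\n".join(lines))
```

Stems with no data are still printed, so that any gap in the data remains visible.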

The Dotplot

A dotplot uses points or dots to represent the data values. If the data values occur more than
once, the corresponding points are plotted above one another. A dotplot is a statistical graph
in which each data value is plotted as a point (dot) above the horizontal axis. Dotplots are
used to show how the data values are distributed and to see if there are any extremely high or
low data values.
Example
The data show the number of named storms each year for the last 40 years. Construct and
analyse a dotplot for the data.
19 15 14 7 6 11 11
9 16 8 8 11 9 8
16 12 13 14 13 12 7
15 15 19 11 4 6 13
10 15 7 12 6 10
28 12 8 7 12 9

Step 1: Find the lowest and highest data values, and decide what scale to use on the
horizontal axis. The lowest data value is 4 and the highest data value is 28, so a scale from 4
to 28 is needed.
Step 2: Draw a horizontal line, and draw the scale on the line.
Step 3: Plot each data value above the line. If the value occurs more than once, plot the other
point above the first point.

The graph shows that in the majority of years there were between 6 and 16 named storms. There were only 3 years with 19 or more named storms.
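
A text version of the dotplot can be produced with a short sketch. It assumes Python and whole-number data; the function name `text_dotplot` is hypothetical. Each value on the scale gets a column three characters wide, and one asterisk is stacked in that column for every time the value occurs.

```python
from collections import Counter

# Number of named storms in each of the 40 years
storms = [
    19, 15, 14, 7, 6, 11, 11,
    9, 16, 8, 8, 11, 9, 8,
    16, 12, 13, 14, 13, 12, 7,
    15, 15, 19, 11, 4, 6, 13,
    10, 15, 7, 12, 6, 10,
    28, 12, 8, 7, 12, 9,
]


def text_dotplot(values):
    """Return the lines of a text dotplot: dot rows first, axis labels last."""
    counts = Counter(values)
    scale = range(min(values), max(values) + 1)       # step 1: choose the scale
    rows = []
    for level in range(max(counts.values()), 0, -1):  # step 3: stack the dots
        row = "".join(" * " if counts[v] >= level else "   " for v in scale)
        rows.append(row.rstrip())
    axis = "".join(f"{v:^3}" for v in scale)          # step 2: draw the scale
    return rows + [axis.rstrip()]


plot = text_dotplot(storms)
print("\n".join(plot))
```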

Exercise 2.1: Draw the ogive curves (less than ogive and more than ogive) for the following data set:

Weekly Earnings (Br)    Number of Employees
Below 550               5
550-600                 10
600-650                 22
650-700                 30
700-750                 16
750-800                 12
800-850                 15

Exercise 2.2: The following is the distribution of the size of certain farms selected at random from a district. Draw the histogram, the frequency polygon, and the ogive curves (less than ogive and more than ogive).
Size of farms    No. of farms
5-15             8
15-25            12
25-35            17
35-45            29
45-55            31
55-65            5
65-75            3
