Graphical Representation of Statistical Data

1) This document discusses different types of graphical representations that can be used to visually display statistical data, including bar diagrams, multiple bar diagrams, and component bar charts. 2) Bar diagrams can be used to represent both categorical and numerical data by using either horizontal or vertical bars of varying lengths. Multiple bar diagrams allow comparison of two or more data points by grouping the corresponding bars together. 3) Component bar charts divide each bar into sections proportional to the component parts, allowing presentation of data cumulation and percentages. Graphical representations make statistical data easier for non-numerical people to understand compared to tabulated figures.

Uploaded by

mishti
Copyright
© © All Rights Reserved
Graphical Representation of
Statistical Data
GRAPHICAL REPRESENTATION:
Tabulation is a good method of condensing and representing data in a readily
understandable form, but many people have no taste for figures. They would
prefer a form of representation in which figures are avoided. This purpose is
achieved by presenting statistical data in a visual form. The visual display
of statistical data in the form of points, lines, areas and other geometrical forms
and symbols is, in the most general terms, known as Graphical Representation.
With this method, statistical data can be studied without going through figures
presented in the form of tables.
The forms of visual representation are described in the sections that follow. The basic
difference between a graph and a diagram is that a graph is a representation of
data by a continuous curve, usually shown on graph paper, while a diagram is
any other one-, two- or three-dimensional form of visual representation.

1) Simple Bar Diagrams-


Simple Bar Chart: Simple bar diagrams are made to represent geographical,
historical, numerical and qualitative data. Vertical or horizontal bars are used
to represent the data when the difference between the quantities is not very
large. The quantities may be arranged in ascending or descending order, but
time series data are not rearranged and stay in chronological order. (A time series
consists of numerical data collected, observed or recorded at more or less regular
intervals of time: each hour, day, month, quarter or year.)

Suggestions for Constructing Bar Chart:


1. The bars should be constructed horizontally when the categorized
observations are the outcomes of a categorical variable. The bars
should be constructed vertically when the categorized observations
are outcomes of a numerical variable.
2. All bars should have the same width so as not to mislead the
reader. Only the length should differ.
3. Spaces between bars should range from one-half the width of a bar
to the width of a bar.
4. Scales and guidelines are useful aids in reading a chart and should
be included. The zero point or origin should be indicated.
5. The axes of the chart should be indicated.
6. Any “keys” to interpreting the chart may be included within the
body of the chart or below the body of the chart.
7. Footnotes or source notes, when appropriate, are presented after
the title of the chart or at the bottom edge of the chart’s frame.
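
The suggestions above can be tried out without any plotting library. The following is a minimal sketch, not a definitive implementation: it prints horizontal bars of equal thickness whose lengths are proportional to the values, measured from a zero origin. The function name `text_bar_chart` and its parameters are illustrative assumptions; the figures are those of the squash champions example that follows.

```python
def text_bar_chart(data, max_width=40, symbol="#"):
    """Return one text line per category; bar length is proportional to the value.

    The scale starts at zero, so the bar lengths can be compared directly.
    """
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Scale every value against the largest one so the longest bar
        # is exactly max_width symbols long.
        bar = symbol * round(value / largest * max_width)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


# Squash World Open champions, 1976-2001 (categorical data, horizontal bars).
champions = {
    "Pakistan": 14,
    "Australia": 6,
    "United Kingdom": 2,
    "Canada": 2,
    "New Zealand": 1,
}

for line in text_bar_chart(champions):
    print(line)
```

Each printed line carries its label and value, which plays the role of the axis labels and scale recommended in the list.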


a) Categorical Data:-
1) Example: - Squash World Open Champions from 1976-2001.

Country No. of times


Pakistan 14
Australia 6
United Kingdom 2
Canada 2
New Zealand 1

2) Example: - World's Largest Deserts by Area in sq km.
Source: - Microsoft Encarta

Deserts Sq Km
Sahara 9100000
Gobi 1300000
Patagonian 670000
Rub al Khali 650000
Great Sandy 390500
Great Victoria 390500


b) Numerical Data:-
1) Example: - The Following data indicates Consumption of Raw
Material by Industry in Pakistan during 1999-2004.
Source:-APTMA
period Consumption ('000' Kgs)
1999-00 1,566,348
2000-01 1,673,280
2001-02 1,755,669
2002-03 1,943,197
2003-04 1,938,678

2) Example: - The following data shows Pakistan’s share of world trade
in cotton yarn, in percent of world trade.
Source: - World Textile Demand

Year Share (%)


1999 26.1
2000 27.3
2001 26.9
2002 27
2003 23.8

1. The following table gives the birth rate per thousand of different countries over a
certain period of time.
COUNTRY BIRTH RATE (PER THOUSAND)
INDIA 33
GERMANY 15
UK 20
CHINA 40
DENMARK 30
SWEDEN 15

DIAGRAM:

2. The following table shows the price list of various brands of cars with 2 Liter
engine.


CAR VALUE IN RUPEES


TOYOTA 1554000
NISSAN 1734000
HONDA 2500000
MERCEDES 5800000
BMW 3300000
MITSUBISHI 2015000

DIAGRAM:

Data Array: The arrangement of raw data by observations in either ascending or
descending order.

2) Multiple Bar Diagrams:-


A multiple bar chart shows two or more characteristics corresponding to the values
of a common variable in the form of grouped bars, whose lengths are proportional to the
values of the characteristics, and each of which is shaded or coloured differently to aid
identification. This is a good device for the comparison of two or more kinds of
information. For example, imports, exports and productions of a country can be
compared from year to year by grouping the three bars together.

1) Example: - The following data shows production of yarn by Punjab
and Sindh from 2000-04. Source: - APTMA
Production ('000 Kgs)
Period Punjab Sindh
2000-01 1,239,358 365,029
2001-02 1,283,964 391,492
2002-03 1,353,027 417,513
2003-04 1,411,817 428,411

2) Example: - The following data shows marks obtained by the position
holders, boys and girls, in the Federal Board HSSC-2 examination 2005.
Source: - www.fbise.edu.pk
Discipline Boys Girls
Pre-Engineering 978 1003
Pre-Medical 974 979
Humanities 896 909
Commerce 844 854

1. The table below gives data relating to the exports and imports of a certain country
X (in thousands of dollars) during the five years ending in 2004-05.

YEAR EXPORT IMPORT


2000-01 315 249
2001-02 423 234
2002-03 123 300
2003-04 278 250
2004-05 350 190

DIAGRAM:

2. The table given below shows the values of shares of P.T.C.L.A, OGDCL, & PPL
in KSE on trading days of a week.

SHARES MON TUES WED THURS FRI


P.T.C.L.A 62.58 63.15 65.69 66.70 68.85
OGDCL 115.45 118.90 119.95 123.30 125.50
PPL 208.30 204.15 202.36 205.70 210.95

DIAGRAM:

3) Component Bar Chart:-


A component bar chart is an effective technique in which each bar is divided into
two or more sections, proportional in size to the component parts of a total being
displayed by each bar. The various component parts shown as sections of the bar are
shaded or coloured differently to increase the overall effectiveness of the diagram. The
component bar charts are used to present the cumulation of the various components of
data and the percentages. They are also known as sub-divided bars.
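
The arithmetic behind a sub-divided bar can be sketched as follows. This is an illustrative helper under stated assumptions, not a prescribed method: `component_segments` and `percentage_segments` are hypothetical names, and the figures are the 2001 row of the students-by-technology exercise later in this section. Each section of the bar starts where the previous one ends, and the percentage version rescales the sections so that every bar totals 100.

```python
def component_segments(components):
    """Return (name, start, end) for each section of one sub-divided bar."""
    segments = []
    start = 0
    for name, value in components.items():
        end = start + value          # each section is stacked on the previous one
        segments.append((name, start, end))
        start = end
    return segments


def percentage_segments(components):
    """Same as component_segments, but rescaled so that the bar totals 100%."""
    total = sum(components.values())
    scaled = {name: value * 100 / total for name, value in components.items()}
    return component_segments(scaled)


# Number of students in each technology in 2001.
students_2001 = {"Spinning": 40, "Weaving": 45, "Processing": 48, "Garments": 55}

for name, start, end in component_segments(students_2001):
    print(f"{name}: from {start} to {end}")
```

The last `end` value equals the bar's total height, which is the cumulation that the chart displays.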

1) Example: - The following data shows the rates of births, deaths and
migrants per 1000 in different regions of the world.
Source: - Microsoft Encarta
Rate Per 1000
Region Births Deaths Migrants
World 21.2 8.9 1.5
Africa 35.9 14.2 1.7
Asia 20.7 7.6 2
Europe 10.2 11.4 1.1

2) Example: - The following data shows World War Two casualties of some
countries. Source: - Microsoft Encarta
Casualties
Country Military killed Military wounded Prisoners of war
India 24,250 64,250 79,500
New Zealand 12,250 19,250 8,500
Australia 23,250 39,750 26,250
Canada 37,500 53,250 9,750

1. The following table shows the number of students in various technologies in
certain years.

YEAR SPINNING WEAVING PROCESSING GARMENTS


2001 40 45 48 55
2002 30 32 35 38
2003 35 40 46 50
2004 50 55 62 65

DIAGRAM:

2. The following table shows the salary packages of university staff.

SALARY TEACHER ASSIST. PROFESSOR PROFESSOR DEAN VICE CHANCELLOR
BASIC PAY 25 30 40 50 60
MEDICAL 5 5 5 5 5
TRANSPORT 4 4 4 4 4
HOUSE RENT 8 12 14 16 20
TOTAL 42 51 63 75 89

DIAGRAM:


4) Pie Chart:-
A pie-diagram, also known as sector diagram, is a graphic device consisting of a
circle divided into sectors or pie-shaped pieces whose areas are proportional to the
various parts into which the whole quantity is divided. The sectors are shaded or
coloured differently to show the relationship of parts to the whole.
Procedure for construction of pie chart: Draw a circle of any convenient
radius. As a circle consists of 360°, the whole quantity to be displayed is
equated to 360°. The proportion that each component part or category bears to the
whole quantity will be the corresponding proportion of 360°. These
corresponding proportions, i.e. angles, are calculated by
$$\text{Angle} = \frac{\text{component part}}{\text{whole quantity}} \times 360^\circ$$
Then divide the circle into different sectors by constructing angles at the
centre by means of a protractor and draw the corresponding radii.
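
The angle calculation can be checked with a few lines of code. This is a minimal sketch: the helper name `pie_angles` is an assumption made for illustration, and the counts are the World Cup wins from the first example below. Each sector angle is the component part divided by the whole quantity, multiplied by 360.

```python
def pie_angles(parts):
    """Return the sector angle in degrees for each component part."""
    whole = sum(parts.values())
    return {name: value / whole * 360 for name, value in parts.items()}


# Soccer World Cup wins, 1930-1998.
wins = {
    "Uruguay": 2,
    "Italy": 3,
    "West Germany": 3,
    "Brazil": 4,
    "England": 1,
    "Argentina": 2,
    "France": 1,
}

angles = pie_angles(wins)
for country, angle in angles.items():
    print(f"{country}: {angle:.1f} degrees")
```

Working from the raw counts rather than from rounded percentages keeps the angles summing to 360°.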

1) Example: - The data shows the Soccer World Cup wins of different teams
from 1930-1998. Source: - www.google.com
Country No. of times won Percentage (%) Angle (out of 360°)
Uruguay 2 12.5 45
Italy 3 18.75 67.5
West Germany 3 18.75 67.5
Brazil 4 25 90
England 1 6.25 22.5
Argentina 2 12.5 45
France 1 6.25 22.5


2) Example: - The given data shows exports of the USA in billion dollars from 1975-2000.
Source: - Microsoft Encarta
Year Exports (billion dollars) Percentage (%) Angle (out of 360°)
1975 132.6 4 14.4
1980 271.8 9 32.4
1985 288.8 9 32.4
1990 537.2 17 61.2
1995 794.2 26 93.6
2000 1065.7 35 126

1. The following table shows the yearly expenditure of Mr. Ted, a college
undergraduate, in various categories.

CATEGORIES EXPENDITURE %age Out of 360
TUITION FEES 6000 20 72
BOOKS AND LAB 2000 6.67 24
CLOTHES CLEANING 2000 6.67 24
ROOM AND BOARDING 12000 40 144
TRANSPORTATION 3000 10 36
INSURANCE 1000 3.33 12
SUNDRY EXPENSES 4000 13.33 48
TOTAL 30000

DIAGRAM:

2. The pie chart below shows the fractions of dogs in a dog competition in seven
different groups of dog breeds. Suppose 1000 dogs entered the competition in all.

GROUPS NO. OF DOGS %age Out of 360
SPORTING GROUP 240 24 86.4
WORKING GROUP 210 21 75.6
HOUND GROUP 160 16 57.6
TERRIER GROUP 160 16 57.6
TOY GROUP 120 12 43.2
NON-SPORTING GROUP 50 5 18
HERDING GROUP 60 6 21.6
TOTAL 1000

DIAGRAM:

5) Frequency Polygons:-

1) Example: - The following data shows marks obtained by students of a
class in a quiz.
Marks No. of Students
10 5
11 7
12 11
13 15
14 24
15 16
16 11
17 6
18 3
19 2
20 10


2) Example: - The following data displays the monthly wages of persons in
dollars.
Wages No. of persons
50 12
55 3
60 7
70 8
75 19

1. The following table shows the Frequency distribution for quantity of Glucose in
100 people.

QUANTITY OF GLUCOSE FREQUENCY


64 0
68 1
72 0
76 0
80 2
84 8
88 5

92 14
96 18
100 11
104 18
108 6
112 8
116 5
120 3
124 1
128 0
132 0
136 0
140 0
144 0

DIAGRAM:

2. The following chart shows the number of lunatics and their frequency.

No. of lunatics Frequency


0 0
25 2
100 6
200 2
300 0
400 3
500 1
550 0

DIAGRAM:


6) Cumulative Frequency Polygons:-


1) Example: - The data below is a frequency table
Cumulative
Midpoints(x) Frequency(f) Frequency(C.F)
25 18 18
35 25 43
45 44 87
55 88 175
65 91 266
75 97 363
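
The cumulative frequency column is a running total of the frequency column. As a small sketch using only the standard library (the variable names are illustrative), the table above can be reproduced like this:

```python
from itertools import accumulate

midpoints = [25, 35, 45, 55, 65, 75]
frequencies = [18, 25, 44, 88, 91, 97]

# Each cumulative frequency is the sum of all frequencies up to that row.
cumulative = list(accumulate(frequencies))

# Points to join with straight lines when drawing the cumulative polygon.
ogive_points = list(zip(midpoints, cumulative))

for x, cf in ogive_points:
    print(x, cf)
```

The final cumulative value always equals the total number of observations, which is a quick check on the arithmetic.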

2) Example: -The data shows wickets taken by a cricket player in his debut.
Wickets Cumulative
taken(x) No. of times(f) Frequency(c.f)
1 22 22
2 19 41
3 20 61
4 12 73
5 9 82
6 3 85


1. The following table shows the frequency distribution for quantity of Glucose in
100 people.

QUANTITY OF GLUCOSE FREQUENCY CUMULATIVE FREQUENCY
64 0 0
68 1 1
72 0 1
76 0 1
80 2 3
84 8 11
88 5 16
92 14 30
96 18 48
100 11 59
104 18 77
108 6 83
112 8 91
116 5 96
120 3 99
124 1 100
128 0 100
132 0 100
136 0 100
140 0 100
144 0 100

DIAGRAM:


2. The following chart shows the number of lunatics and their frequency

No. of lunatics Frequency Cumulative Frequency
0 0 0
25 2 2
100 6 8
200 2 10
300 0 10
400 3 13
500 1 14
550 0 14
DIAGRAM:


7) Pareto Diagrams:-
Definition: A bar graph used to arrange information in such a way that
priorities for process improvement can be established.

A Pareto diagram is used to determine which characteristic is the major
contributor in a process. The diagram is constructed by ranking the data by
frequency of occurrence and plotting the bars in descending order.

Purposes:

• To display the relative importance of data.
• To direct efforts to the biggest improvement opportunity by
highlighting the vital few in contrast to the useful many.

Pareto diagrams are named after Vilfredo Pareto, an Italian sociologist and
economist, who invented this method of information presentation toward
the end of the 19th century. The chart is similar to the histogram or bar
chart, except that the bars are arranged in decreasing order from left to
right along the abscissa. The fundamental idea behind the use of Pareto
diagrams for quality improvement is that the first few (as presented on the
diagram) contributing causes to a problem usually account for the majority
of the result. Thus, targeting these "major causes" for elimination results
in the most cost-effective improvement scheme.

How to Construct:

1. Determine the categories and the units for comparison of the data,
such as frequency, cost, or time.
2. Total the raw data in each category, then determine the grand total
by adding the totals of each category.
3. Re-order the categories from largest to smallest.
4. Determine the cumulative percent of each category (i.e., the sum of
each category plus all categories that precede it in the rank order,
divided by the grand total and multiplied by 100).
5. Draw and label the left-hand vertical axis with the unit of
comparison, such as frequency, cost or time.
6. Draw and label the horizontal axis with the categories. List from
left to right in rank order.
7. Draw and label the right-hand vertical axis from 0 to 100 percent.
The 100 percent should line up with the grand total on the left-hand
vertical axis.
8. Beginning with the largest category, draw in bars for each category
representing the total for that category.
9. Draw a line graph beginning at the right-hand corner of the first
bar to represent the cumulative percent for each category as
measured on the right-hand axis.

10. Analyze the chart. Usually the top 20% of the categories will
comprise roughly 80% of the cumulative total.
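
Steps 2 to 4 of the construction are pure arithmetic and can be sketched in code. The helper name `pareto_table` and the defect tallies are hypothetical, chosen only for illustration; they are not data from this document.

```python
def pareto_table(totals):
    """Rank categories from largest to smallest and add cumulative percentages.

    Returns rows of (category, total, percent, cumulative_percent).
    """
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, value in ranked:
        running += value  # this category plus all categories ranked before it
        rows.append((
            category,
            value,
            value * 100 / grand_total,
            running * 100 / grand_total,
        ))
    return rows


# Hypothetical defect tallies (illustration only).
defects = {"Scratches": 12, "Dents": 30, "Cracks": 5, "Stains": 3}

for category, value, percent, cumulative in pareto_table(defects):
    print(f"{category:<10} {value:>3} {percent:6.1f} {cumulative:6.1f}")
```

The second column gives the bar heights for the left-hand axis, and the last column gives the points of the cumulative line read on the right-hand axis.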

Tips:

• Create before and after comparisons of Pareto charts to show the
impact of improvement efforts.
• Construct Pareto charts using different measurement scales:
frequency, cost or time.
• Pareto charts are useful displays of data for presentations.
• Use objective data to perform Pareto analysis rather than team
members' opinions.
• If there is no clear distinction between the categories (if all bars
are roughly the same height, or half of the categories are required to
account for 60 percent of the effect), consider organizing the data
in a different manner and repeating the Pareto analysis.
• Pareto analysis is most effective when the problem at hand is
defined in terms of shrinking the PV to a customer target, for
example, reducing defects or eliminating the non-value-added time
in a process.

1) Example: - The following data shows the price of Polyester Staple Fiber
at Karachi in Rs/Kg. Source: - APTMA
Months Price Percentage (%) Cumulative Percentage
May 111.72 17.8 17.8
April 111.55 17.78 35.58
March 106.95 17.05 52.63
January 103.5 16.5 69.13
February 103.5 16.5 85.63
June 90.15 14.37 100

2) Example: - The following data shows areas of continents in million sq.
miles.
Continent Area (million sq miles) Percentage (%) Cumulative Percentage

Asia 17.1 35 35
Africa 11.7 24 59
North America 9.4 19 78
South America 6.9 14 92
Europe 3.9 8 100

1. The table given below shows the problems with the computer.

PROBLEMS FREQUENCY FRACTIONS


Setup Difficulty 80 0.296296
Not Easy to Use 45 0.166667
Unspecified 25 0.092593
Not Fast Enough 22 0.081481
Too Slow 15 0.055556
Incompatible 10 0.037037
Internet Inoperative 10 0.037037
Too Heavy 9 0.033333
Too loud 6 0.022222
Too Small 6 0.022222
Power Inop 5 0.018519
Bad Color 4 0.014815
Screen Small 3 0.011111
Dim Screen 3 0.011111
Cord too short 3 0.011111
Slow Internet 3 0.011111
Too Fast 3 0.011111
Floppy Slow 3 0.011111
Too Big 2 0.007407
Case smells 2 0.007407
No Printer 2 0.007407
No Books 2 0.007407
Pwr Btn Stiff 1 0.003704
Won't Start 1 0.003704

Won't Print 1 0.003704


Didn't talk 1 0.003704
No Movies 1 0.003704
No Help 1 0.003704
No Manuals 1 0.003704

DIAGRAM:

2. The following table shows the defect in production in a factory.

TYPE SUBTOTAL % OF TOTAL


HIGH TURN ON SPEED 18 14.754
HIGH RIPPLE CURRENT 38 31.147
HIGH LEAKAGE 12 9.836


LOW OUTPUT AT LOW SPEED 15 12.295
LOW OUTPUT AT HIGH SPEED 7 5.737
DEAD UNIT 4 3.278
BAD REGULATOR 22 18.032
BAD VOLTAGE SETPOINT 6 4.918

DIAGRAM:

8) Pictographs:-
1) Example: - The following data shows Bikes assembled by a company
on different week days from Monday to Saturday.
Scale: - 1 Bike picture represents 5 bikes

Day Production
Monday 12
Tuesday 23
Wednesday 17
Thursday 27
Friday 10
Saturday 22
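
With a scale of one picture per 5 bikes, each day's production becomes a number of whole pictures plus a fraction of one more. A small sketch of that conversion follows; `pictograph_symbols` is an illustrative name, not part of the original notes.

```python
def pictograph_symbols(value, scale):
    """Return (whole pictures, fraction of one further picture) for a value."""
    whole, remainder = divmod(value, scale)
    return whole, remainder / scale


production = {
    "Monday": 12,
    "Tuesday": 23,
    "Wednesday": 17,
    "Thursday": 27,
    "Friday": 10,
    "Saturday": 22,
}

for day, bikes in production.items():
    whole, part = pictograph_symbols(bikes, scale=5)
    print(f"{day}: {whole} full pictures and {part:.1f} of a picture")
```

Monday's 12 bikes, for instance, need two full pictures and two fifths of a third one.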

2) Example: - The following data shows gold medals won by different
nations in the Athens Olympics 2004.
Source: - www.olympic.org

Nation Gold won


USA 35
China 32
Russia 27
Australia 17
Japan 16


9) Fishbone diagram:-
Dr. Kaoru Ishikawa, a Japanese quality control statistician, invented the fishbone
diagram. Therefore, it may be referred to as the Ishikawa diagram. The fishbone diagram
is an analysis tool that provides a systematic way of looking at effects and the causes that
create or contribute to those effects. Because of the function of the fishbone diagram, it
may be referred to as a cause-and-effect diagram. The design of the diagram looks much
like the skeleton of a fish. Therefore, it is often referred to as the fishbone diagram.
Whatever name you choose, remember that the value of the fishbone diagram is to assist
teams in categorizing the many potential causes of problems or issues in an orderly way
and in identifying root causes.
When should a fishbone diagram be used?
Does the team...
• Need to study a problem/issue to determine the root cause?
• Want to study all the possible reasons why a process is beginning to have
difficulties, problems, or breakdowns?
• Need to identify areas for data collection?
• Want to study why a process is not performing properly or producing the desired
results?

1) Example: - This diagram indicates the causes and effect of heavy school
bags. The effect is back pain.

2) Example: - This diagram indicates some possible reasons for a high
reject rate of machine parts.


EXAMPLE # 1:
Draw a fishbone diagram of doorknob by showing its parts.

GRAPH:


EXAMPLE # 2:
Draw a fishbone diagram of Biological Warfare Disease.
GRAPH:


1. The CEO of a call center wants to know why not all calls are
answered and also wants to improve the ability to handle calls.
DIAGRAM:

2. The following fishbone diagram illustrates the causes of a project
deadline not being met.
DIAGRAM:


10) Histogram:-

1) Example: - The following data shows the number of workers in different factories (h is the class width).
No. of workers Factories h
70-75 15 5
75-80 9 5
80-85 25 5
85-90 18 5
90-95 27 5

2) Example: - The following data shows marks obtained by 50 students in a Physics
exam.
Marks obtained Number of Students h

0-10 3 10
10-20 6 10
20-30 12 10
30-40 18 10
40-50 11 10

11) Stem and Leaf Diagram:-


A stem-and-leaf display separates data entries into “leading digits” or “stems” and
“trailing digits” or “leaves.” For example, since the annual costs (in $000) in the
private institution data set all have two-digit integer parts, the tens and units
columns would be the leading digits, and the remaining column (the tenths
column) would be the trailing digit. Thus, an entry of 26.4 (corresponding to
$26,400) has a stem of 26 and a trailing digit or leaf of 4.
The following figure depicts the stem-and-leaf display of the annual cost of
attending the 50 sampled private colleges and universities.

Steps to follow in constructing a Stem and Leaf Display

1. Divide each observation in the data set into two parts, the Stem and the
Leaf.
2. List the stems in order in a column, starting with the smallest stem and
ending with the largest.
3. Proceed through the data set, placing the leaf for each observation in the
appropriate stem row.

Depending on the data, a display can use one, two or five lines per stem. Among
the different stems, two-line stems are widely used.
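
The three construction steps can be sketched in a few lines. This is a simplified illustration that assumes whole-number data, takes the units digit as the leaf and uses one line per stem; the function name `stem_and_leaf` is an assumption. The prices are those of the second example later in this section.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: leaves as a string}, one line per stem, leaves sorted.

    Assumes non-negative whole numbers; the leaf is the units digit.
    Stems with no observations are left out.
    """
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # step 1: split into stem and leaf
        rows[stem].append(leaf)          # step 3: place the leaf in its stem row
    # step 2: list the stems in order, smallest first
    return {stem: "".join(str(leaf) for leaf in rows[stem]) for stem in sorted(rows)}


prices = [
    10, 55, 63, 36, 29, 10, 20, 35, 18, 10,
    14, 15, 20, 22, 23, 60, 58, 30, 85, 90,
    62, 75, 20, 35, 40, 10, 15, 17, 18, 29,
]

display = stem_and_leaf(prices)
for stem, leaves in display.items():
    print(f"{len(leaves):>2} {stem} {leaves}")
```

The first printed column is the frequency of each stem, so the display doubles as a frequency distribution while keeping the original values readable.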

Advantages of a stem and leaf display over a frequency distribution
(considered in the next section):

1. The original data are preserved.
2. A stem and leaf display arranges the data in an orderly fashion and makes
it easy to determine certain numerical characteristics to be discussed in
the following chapter.
3. The classes and the numbers falling in them are quickly determined once we have
selected the digits that we want to use for the stems and leaves.

1) Example: - The following data shows the sugar levels of 50 patients.

122 132 128 140 110 170 165 112 145 167
176 189 190 156 188 166 165 145 178 133
118 192 185 173 169 154 148 122 113 111
170 122 132 137 109 165 197 129 167 178
109 118 187 176 151 176 145 189 145 123

Blood Sugar Level
Frequency Stem Leaf (unit = 1)
2 10 99
6 11 012388
6 12 222389
4 13 2237
6 14 055558
3 15 146
7 16 5556779
8 17 00366688
5 18 57899
3 19 027
2) Example: - The following data shows the prices of different goods in Pakistani
rupees.
10 55 63 36 29 10 20 35 18 10
14 15 20 22 23 60 58 30 85 90
62 75 20 35 40 10 15 17 18 29

Price of Items in a Utility store
Frequency Stem Leaf (unit = 1)
10 1 0000455788
7 2 0002399
4 3 0556
1 4 0
2 5 58
3 6 023
1 7 5
1 8 5
1 9 0

12) Box and Whisker Plot:-

A box-and-whisker plot is a visual display of the five-number summary. It
is a simplified boxplot taught to beginners. It does not show outliers:
the whiskers extend all the way to the minimum and maximum
values, regardless of how far out they may be.


A box and whisker graph is used to display a set of data
so that you can easily see where most of the numbers are.

For example, suppose you were to catch and measure
the length of 13 fish in a lake:

A box and whisker plot is based on medians. The first step is to
rewrite the data in order, from smallest length to largest:

Now find the median of all the numbers. Notice that since there are
13 numbers, the middle one will be the seventh number:

This must be the median (middle number) because there are six
numbers on each side.

The next step is to find the lower median. This is the middle of the
lower six numbers. The exact centre is half-way between 8 and 9 ...
which would be 8.5
Now find the upper median. This is the middle of the upper six
numbers. The exact centre is half-way between 14 and 14 ... which
must be 14


Now you are ready to construct the actual box & whisker graph. First
you will need to draw an ordinary number line that extends far
enough in both directions to include all the numbers in your data:

First, locate the main median 12 using a vertical line just above your
number line:

Now locate the lower median 8.5 and the upper median 14 with
similar vertical lines:

Next, draw a box using the lower and upper median lines as
endpoints:

Finally, the whiskers extend out to the data's smallest number 5 and
largest number 20:

This is a box & whisker plot!

But what does it mean? What information about the data does this
graph give you?

Well, it's obvious from the graph that the lengths of the fish were as
small as 5 cm, and as long as 20 cm. This gives you the range of the
data ... 15.
You also know the median, or middle value was 12 cm.
Since the medians (three of them) represent the middle points, they
split the data into four equal parts. In other words:
• one quarter of the data numbers are less than 8.5
• one quarter of the data numbers are between 8.5 and 12
• one quarter of the data numbers are between 12 and 14
• one quarter of the data numbers are greater than 14

The shading below, as an example, shows the quarter of the numbers
that are between 12 and 14:

Here is a picture of the quarter of the data that is between 8.5 and 12.
Notice that the data is more spread out here:

This picture is showing where half the data numbers are. Half of all
the fish caught had a length between 8.5 and 14 centimetres:


Graphical representation of
Textile Related data
1. SIMPLE BAR DIAGRAM
a) Exported man made fiber

Years Quantity of fiber exported
1993-94 25,422
1994-95 61,485
1995-96 28,714
1996-97 48,484
1997-98 34,015
1998-99 34,515
1999-00 22,716
2000-01 28,524
2001-02 45,665
2002-03 66,653
2003-04 54,878


b) Consumption of cotton

Year Cotton
1990-91 1,128,978
1991-92 1,257,399
1992-93 1,318,892
1993-94 1,511,610
1994-95 1,412,732
1995-96 1,509,955
1996-97 1,444,368
1997-98 1,471,169
1998-99 1,441,923
1999-00 1,566,348
2000-01 1,673,280
2001-02 1,755,669
2002-03 1,943,197
2003-04 1,938,678

[Bar chart: consumption of cotton, 1990-91 to 2003-04; vertical axis from 0 to 2,500,000]

2. MULTIPLE BAR DIAGRAM

a) Production of cloth province wise

PERIOD PUNJAB SINDH N.W.F.P BALUCHISTAN


1993-94 223,789 158,693 6,255 0
1994-95 174,293 143,720 3,828 0
1995-96 189,559 137,422 0 0
1996-97 208,107 125,388 0 0
1997-98 212,813 256,258 8 2
1998-99 230,018 154,519 24 0
1999-00 292,536 144,634 20 0
2000-01 309,634 180,530 0 0
2001-02 355,370 256,952 0 2
2002-03 317,981 264,164 0 0
2003-04 358,962 255,963 0 0


b) Textile related exports


%age of textile material exported
Year yarn cloth others
1971-72 55.6 35.5 9
1979-80 32.87 38.9 28.14
1989-90 33.2 22.32 44.35
1996-97 28.09 25.12 46.77
1997-98 23.7 25.56 50.72
1998-99 20.72 24.45 55
1999-00 20.95 21.44 57.59
2000-01 20.6 19.8 59.59
2001-02 16 20 64
2002-03 13 19 68
2003-04 14 21 65


3. COMPONENT BAR DIAGRAM

a) Production of yarn

%age of yarn produced
Year Blended Coarse Medium Fine
1997-98 23 45 27 4
1998-99 26 47 24 2
1999-00 25 49 23 2
2000-01 25 50 22 3
2001-02 26 49 21 3
2002-03 25 46 25 4
2003-04 20 50 25 5


B) Production of cloth

%age of cloth produced
Year grey bleached Dyed & printed blended
1971-72 64 17 19 -
1981-82 61 10 16 13
1991-92 52 6 21 21
1997-98 61 4 19 16
1998-99 51 7 25 17
1999-00 60 3 23 14
2000-01 57 4 25 14
2001-02 56 3 27 14
2002-03 51 5 28 16
2003-04 49 6 30 15

4. FREQUENCY POLYGONS

a) Export of Wool yarn

Years Quantity
1995-96 455,693
1996-97 212,452
1997-98 512,862
1998-99 421,481
1999-00 512,971
2000-01 512,467
2001-02 544,217
2002-03 519,329
2003-04 458,962

B) Export of cloth
Year Quantity of cloth exported
1971-72 409808
1979-80 545768
1989-90 1017868
1996-97 1257430
1997-98 1271272
1998-99 1355166
1999-00 1574876
2000-01 1735824
2001-02 1957353
2002-03 2036321
2003-04 2378900

5. CUMULATIVE FREQUENCY
POLYGONS

A) Export of cotton yarn


Year Quantity (Kgs) %age of cotton yarn exported Cumulative %
1991-92 505,863 7.527587673 7.527587673
1992-93 555,294 8.263154785 15.79074246
1993-94 578,648 8.61067829 24.40142075
1994-95 522,091 7.76907142 32.17049217
1995-96 535,889 7.974395104 40.14488727
1996-97 508,188 7.562185264 47.70707254
1997-98 461,919 6.873670876 54.58074341
1998-99 421,481 6.271925758 60.85266917
1999-00 512,971 7.633359578 68.48602875
2000-01 545,134 8.111967032 76.59799578
2001-02 544,217 8.098321444 84.69631722
2002-03 519,329 7.72797097 92.42428819
2003-04 509,097 7.575711806 100

b) Export of cloth

Year Quantity of cloth exported % Quantity Cumulative %
1996-97 1257430 9.26820107 9.26820107
1997-98 1271272 9.370226979 18.63842805
1998-99 1355166 9.988588606 28.62701666
1999-00 1574876 11.6080159 40.23503256
2000-01 1735824 12.79432323 53.02935579
2001-02 1957353 14.42715791 67.4565137
2002-03 2036321 15.00921123 82.46572493
2003-04 2378900 17.53427509 100

6. PARETO DIAGRAM

A) Export of yarn
Year Quantity of yarn exported % quantity Cumulative %
1996-97 1,411,519 18.89990509 18.89990509
1997-98 1,153,542 15.44565416 34.34555925
2003-04 1,141,219 15.28065212 49.62621137
2000-01 1,076,600 14.41541901 64.04163038
1999-00 1,071,616 14.34868443 78.39031481
2002-03 926,358 12.40371421 90.79402902
1998-99 345,169 4.621731157 95.41576018
2001-02 342,369 4.58423982 100

Cotton prices month wise in year 2004-05

Month Rs per maund % value Cumulative % value
Jul 2615 10.06040088 10.06040088
Aug 2273 8.744662024 18.8050629
Mar 2272 8.740814835 27.54587774
June 2251 8.660023853 36.20590159
Apr 2233 8.590774439 44.79667603
May 2217 8.529219405 53.32589544
Sep 2205 8.48305313 61.80894857
Feb 2164 8.325318355 70.13426692
Jan 2035 7.829030893 77.96329781
Oct 1940 7.463547878 85.42684569
Nov 1923 7.398145655 92.82499135
Dec 1865 7.175008656 100

7. PIE CHART
a) World wide fiber production in year 2004
(Quantity measured in 1000 tonnes)

Type of fiber Quantity of fiber produced % quantity Angle (out of 360°)
Man made fiber 34560 56.86 204.72
Cotton 24900 40.97 147.50
Wool 1215 1.99 7.19
Silk 97 0.16 0.57

B) Production of cloth province wise in year 1998-99
Province Quantity of cloth produced % quantity Angle of sector in degrees
Punjab 230,018 59.81 215.32
Sindh 154,519 40.18 144.65
N.W.F.P 24 0.0062 0.022
Baluchistan 0 0 0


Multiple Choice Questions


1. Consider the following output from DataDesk when analyzing the pH values of
the 1986 data collected on precipitation events.

Summary stats for 1986
NumNumeric = 55
NumNonNumeric = 0
NumCases = 55
Mean = 6.0673
Median = 6.1000
Std Deviation = 0.47339
Range = 2.4000
Minimum = 4.6000
Maximum = 7
75-th %ile = 6.4000

Which of the following is NOT CORRECT?
a. The 25th percentile is about 5.9.
b. Some outliers appear to be present below a pH of 5.4.
c. About 95% of the observations have pH values in the approximate range
6±1.
d. About 10% of the values are in the range 5.8 to 6.0.
e. About 75% of the values are less than 6.4.
(d)

3. The following is a histogram showing the actual frequency of the closing prices
on the New York exchange of a particular stock.

Based on the above frequency histogram for New York Stock exchange, the class
that contains the 80th percentile is:
a. 20-30
b. 10-20
c. 40-50
d. 50-60
e. 30-40
(e)

4. A histogram of the heights of 39 plants is as follows:

The 75th percentile of the height distribution is approximately:
a. 9.4
b. 9.7
c. 7.7
d. 7.5
e. 10.0
(b)

5. The weights of the male and female students in a class are summarized in the
following boxplots:

Which of the following is NOT correct?


a. About 50% of the male students have weights between 150 and 185 lbs.
b. About 25% of female students have weights more than 130 lbs.
c. The median weight of male students is about 162 lbs.
d. The mean weight of female students is about 120 because of symmetry.
e. The male students have less variability than the female students.

(e)

6. Consider the following box plots of the grades in a course in statistics for each sex
drawn according to the convention that the whiskers reach the 10th and 90th
percentiles.

Which of the following is correct?


a. The mean grade of the female students is about 72.
b. The median grade for the male students is about 68.
c. About 25% of female students get grades above 72.
d. About 10% of male students get grades below 60.
e. About 50% of female students get grades between 62 and 82.

(e)

7. Consider the following box-plot of the yield of barley drawn using the convention
that the whiskers reach the 10th and 90th percentiles.

Which of the following is NOT correct?


a. The mean is about 200 g/400 m².
b. The median is about 180 g/400 m².
c. About 50% of the yields are between 160 and 220 g/400 m².
d. About 25% of the yields are below 220 g/400 m².
e. About 10% of the yields are below 130 g/400 m².

(d)

8. Consider the following ogive of the scores of students in an introductory statistics
course:

A grade of C or C+ is assigned to a student who scores between 55 and 70. The
percentage of students that obtained a grade of C or C+ is:
a. 25%
b. 30%
c. 20%
d. 50%
e. 15%

(c)

9. Which of the following is NOT correct about constructing histograms?


a. The approximate number of classes is 1+3.3log(n).
b. All class intervals should be of equal width.
c. The bars of the histogram are centred over the class mark (midpoint).
d. The first and last classes should be open-ended to account for extreme
points.
e. There should be no spaces between bars.

(d)

10. Forest companies routinely take samples from tracts that have been replanted to
monitor the growth of the trees. Suppose that in a recent sample of two tracts, the
diameter of the trees was measured with the following results:

Tract        A         B
Trees        75        210
Range (mm)   232-315   215-250

Histograms to compare the two groups are to be constructed. Which of the
following is NOT recommended?

a. The number of classes in Group A and Group B should be around 7 or 8.
b. The class width of both groups will be 10 mm.
c. The class bounds will be 232-242 mm, 242-252 mm, etc. for Group A and
215-225 mm, 225-235 mm, etc. for Group B.
d. The vertical scale for both groups should be relative frequency (%).
e. The two histograms will be stacked and aligned so that the vertical axes
are the same and the horizontal axes are identical.

(c)
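
The rule of thumb quoted in question 9, about 1 + 3.3 log(n) classes, can be applied to the two tract sample sizes. The sketch below assumes the logarithm is base 10, which is the usual reading of this rule; the function name approx_num_classes is only illustrative.

```python
import math


def approx_num_classes(n):
    """Rule of thumb: roughly 1 + 3.3 * log10(n) classes for n observations."""
    return 1 + 3.3 * math.log10(n)


# Sample sizes from the table in question 10.
tract_sizes = {"A": 75, "B": 210}
for tract, n in tract_sizes.items():
    print(f"Tract {tract}: n = {n}, about {approx_num_classes(n):.1f} classes")
```

Under that assumption the rule gives a little over 7 classes for tract A and between 8 and 9 for tract B, which is in line with the "around 7 or 8" of option a.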

11. Consider the following SAS procedure to construct a histogram.

PROC CHART DATA=BARLEY;
   VBAR YIELD / TYPE=PERCENT
        MIDPOINTS=90 TO 300 BY 30;

Which of the following statements is correct?

a. The first bar will contain only values from 75 to 105.
b. All yields below 90 will not be used.
c. The vertical axis will be labelled with the actual frequency in each class.
d. Several charts will be produced, one for each distinct year.
e. The last class will include values above 315.

(e)
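
Equally spaced midpoints imply nominal class bounds of midpoint minus half the step to midpoint plus half the step. The Python sketch below only illustrates that arithmetic for the midpoints 90 to 300 by 30. It does not model SAS itself, and the function name class_intervals is hypothetical.

```python
def class_intervals(first_mid, last_mid, step):
    """Nominal (lower, upper) bounds for equally spaced class midpoints."""
    half = step / 2
    midpoints = range(first_mid, last_mid + 1, step)
    return [(m - half, m + half) for m in midpoints]


intervals = class_intervals(90, 300, 30)
for lower, upper in intervals:
    print(f"{lower:g} to {upper:g}")
```

The nominal bounds run from 75-105 for the first bar to 285-315 for the last. According to the answer key, values beyond these outer bounds are not dropped but counted in the end bars, which is why options a and b are incorrect.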

12. For each student in a class, the sex and weight (in kilograms) are recorded.
Consider the following SAS program:

DATA STUDENTS;
   INPUT SEX $ WEIGHT;
   DATALINES;
F 62
.
. (<-- more data here)
.
M 78
;
PROC SORT DATA=STUDENTS;
   BY SEX;
PROC CHART DATA=STUDENTS;
   VBAR WEIGHT/TYPE=PERCENT MIDPOINTS=55 TO 95 BY 10;
   BY SEX;

Which of the following statements is correct?

a. The first class will contain weights from 55 to 65 kg.
b. All weights below 55 kilograms will be discarded.
c. One vertical bar chart will be produced; it will contain the weights of all of
the students, and the males and females will be indistinguishable.
d. The vertical axis will be labelled "frequency".
e. Two separate vertical bar charts will be produced -- one for males and one
for females.

(e)

13. A single stem-and-leaf plot is a useful tool because:


a. it includes the average and the standard deviation.
b. it shows the percentage distribution of the data values.
c. it enables us to examine the data values for the presence of trends, cycles,
and seasonal variation.
d. it enables us to locate the centre of the data, see the overall shape of the
distribution, and look for marked deviations from the overall shape.
e. it enables us to compare this dataset against others of a similar kind.

(d)

14. Forty students wrote a Statistics examination having a maximum of 50 marks.
The mark distribution is given in the following stem-and-leaf plot:

0|28
1|2245
2|01333358889
3|001356679
4|22444466788
5|000

The third quartile of the mark distribution is equal to:

a. 75
b. 44
c. 32
d. 37.5
e. 30

(b)
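
A stem-and-leaf plot can be expanded back into the ordered data, after which a quartile is read off by position. The sketch below does this for the plot in question 14. The quartile convention used (average of the values at positions 3n/4 and 3n/4 + 1 when n is divisible by 4) is one common choice and is an assumption here, since the question does not state which convention it intends.

```python
# Stem-and-leaf plot from question 14: stem = tens digit, leaves = units digits.
stem_and_leaf = {
    0: "28",
    1: "2245",
    2: "01333358889",
    3: "001356679",
    4: "22444466788",
    5: "000",
}


def expand(plot):
    """Turn {stem: leaf digits} into a sorted list of values (10 * stem + leaf)."""
    return sorted(
        10 * stem + int(leaf) for stem, leaves in plot.items() for leaf in leaves
    )


marks = expand(stem_and_leaf)
n = len(marks)

# Assumed convention: average the values at 1-based positions 3n/4 and 3n/4 + 1.
k = 3 * n // 4
third_quartile = (marks[k - 1] + marks[k]) / 2
print(n, third_quartile)
```

With 40 marks, positions 30 and 31 both hold the value 44, so the choice of convention does not affect the result here.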

15. Thirty students wrote a statistics examination having a maximum of 50 marks.
The mark distribution is given in the following stem-and-leaf plot:

0|9
1|225
2|013335889
3|00136679
4|02244478
5|0

The median mark is equal to:

a. 30.5
b. 30.0
c. 25.0
d. 28.5
e. 44.0

(a)

16. Rainwater was collected in water collectors at thirty different sites near an
industrial basin and the amount of acidity (pH level) was measured. The following
stem-and-leaf diagram shows the pH values that ranged from 2.6 to 6.3.

Stems Leaves
2 679
3 237789
4 1222446899
5 0556788
6 0233

The median acidity is:

a. 4.2
b. 4.4
c. 4.5
d. 4.6
e. Average of 15 and 16.

(c)

17. Refer to the previous question. Which of the following box-plots is correct:

(e)

18. Refer to the previous question. The interquartile range is:


a. 7.75
b. 23.25
c. 5.625
d. 3.77
e. 1.855

(e)
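
The options 7.75 and 23.25 are the positions given by the (n + 1)p convention for n = 30, so the sketch below assumes that convention with linear interpolation between neighbouring ordered values. Other quartile conventions exist and give slightly different results. The helper name value_at is only illustrative.

```python
# Stem-and-leaf plot from question 16: stem = units of pH, leaves = tenths.
ph_plot = {2: "679", 3: "237789", 4: "1222446899", 5: "0556788", 6: "0233"}

# Work in tenths of a pH unit (integers) to keep the arithmetic exact.
tenths = sorted(
    10 * stem + int(leaf) for stem, leaves in ph_plot.items() for leaf in leaves
)


def value_at(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    lower = int(position)
    frac = position - lower
    below = sorted_values[lower - 1]
    above = sorted_values[min(lower, len(sorted_values) - 1)]
    return below + frac * (above - below)


n = len(tenths)
q1 = value_at(tenths, 0.25 * (n + 1)) / 10
median = value_at(tenths, 0.50 * (n + 1)) / 10
q3 = value_at(tenths, 0.75 * (n + 1)) / 10
iqr = q3 - q1
print(q1, median, q3, iqr)
```

Under this convention the first quartile is 3.775, the median 4.5 and the third quartile 5.625, giving an interquartile range of 1.85. The keyed value 1.855 corresponds to using the first quartile shortened to 3.77, as in option d.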

19. The following is a stem-plot of the birth weights of male babies born to the
smoking group. The stems are in units of kg.

Stems Leaves
2 3,4,6,7,7,8,8,8,9
3 2,2,3,4,6,7,8,9
4 1,2,2,3,4
5 3,5,5,6

The median birth weight is:

a. 13.5
b. 3.2
c. 3.5
d. 3.7
e. Average of 13 and 14.

(c)

20. Refer to the previous question. The first quartile (25th percentile) of the weights is:
a. 2.3
b. 2.7
c. 0.25
d. 6.5
e. 2.8

(e)

21. Which one of the following statements is FALSE?


a. Pie charts are better than bar graphs for comparing relative sizes.
b. Data that are nominal scale are presented using frequency tables.
c. Means and standard deviation of ordinal data are meaningless.
d. The scatter-plot is the basic graphic tool for investigating relationships
between two interval or ratio scaled variables.
e. Box-plots are a good choice for comparing the distribution of values
among groups.

(a)

22. Which of the following is NOT CORRECT?


a. The scatterplot is the basic graphical tool for investigating relationships
between two continuous interval or ratio scaled variables.
b. The frequency table is useful for summarizing data from a nominal scaled
variable.
c. Means and standard deviations of nominal or ordinal scaled variables are
useful summary measures.
d. Pie charts don’t perform well because people have difficulty in accurately
quantifying angles.
e. Boxplots perform well for comparing groups because it is relatively
straightforward to see how the mean and median change over the groups.

(c)


23. The following is a display comparing the favorite TV shows selected from a
specified set by gender. Each person had to select one preferred show from the
three shows given below:

        <--------------------- Percent ------------------->  Sample
            10   20   30   40   50   60   70   80   90  100  Size
        |----|----|----|----|----|----|----|----|----|----|
Male:   sssssssssssssssssssssfffffffffffffffbbbbbbbbbbbbbbb  100

Female: sssssssssssffffffffffffffffffffffffffffffbbbbbbbbbb  300

ssss = Star Trek; ffff = Friends; bbbb = Baywatch

Which of the following is FALSE:

a. A greater percentage of males than females have Star Trek as their favorite
show.
b. More females (in absolute number) selected Baywatch as their favorite
show than males.
c. About 1/5 of females surveyed selected Star Trek as their favorite show
from the three given.
d. The modal favorite show (from the three specified) for males is Star Trek.
e. About twice as many females (in relative numbers) enjoy Star Trek as
males.

(e)

24. An experiment was conducted to investigate the effect of a new weed killer to
suppress weed germination in onion crops. Two chemicals were used, the standard
weed killer (C) and the new chemical (W). Both chemicals were tested at high
and low concentrations. Measurements of the % weed germination were made on
each of 50 plots for each treatment combination. Here are some box-plots of
the results, where the whiskers extend to the min and max of the data.

0 10 20 30 40 50
|---------|--------|---------|---------|---------|
|
| _______________
W-low conc. | -----------|________|______|--------
|
| ___________
C-low conc. | ----|_____|_____|----
|
| ______________
W-high conc | -----|_______|______|---------- * * *
|
| _________
C-high conc. | --|____|____|---

Which of the following is NOT a feature of this data:

a. At either high or low concentrations, the new chemical (W) gives better
control of weed germination than the control (C).
b. Fewer weeds germinate at higher concentrations of both chemicals.


c. The results from the control chemical are less variable than the new
chemical.
d. High or low concentrations of either chemical have approximately the
same effects on weed germination.
e. Some of the results from the low concentration of weed killer W have
fewer weeds germinating than some of the results from the high
concentration of W.

(d)
