Chapter1 Statistics
A population is said to be infinite if it has an infinite number of units, for example the number of
stars in the sky or the number of people watching television programmes.
Sample: Statisticians use the word sample to describe a portion chosen from the population. A
finite subset of statistical individuals defined in a population is called a sample. The number of
units in a sample is called the sample size.
Variable or Variate: A quantity which can vary from one individual or object to another is
called a variable, e.g. age, height, weight, price of a commodity, etc. A variable is usually denoted by
one of the last letters of the alphabet, e.g. X, Y, Z.
Continuous variable: A variable which can assume any value within a given range is called
a continuous variable, e.g. the height or weight of a student.
Discrete variable: A variable which can assume only specific values within a given range is
called a discrete variable, e.g. the number of houses in a street, the number of children in a family, etc.
Constant: A quantity which can assume only one value is called a constant. A constant is
usually denoted by one of the first letters of the alphabet, e.g. a, b, c.
Frequency Distribution
A frequency distribution is a tabular arrangement of the data that shows the distribution
of observations among different classes.
A frequency distribution is a method of classifying data into classes or intervals in such a way
that the number (or frequency) of observations in each class can be determined. This method provides a way of
reviewing a set of numbers without actually having to consider the individual numbers, and it can
be very useful when dealing with large amounts of data.
Class-limits: The class-limits are defined as the numbers or the values of the variable
which describe the classes; the smaller number is the lower class limit and the larger
number is the upper class limit.
For example, the marks obtained by students in a class may be grouped into the classes
10-14, 15-19, 20-24, 25-29, etc. Here 10 is the lower class limit and 14 is the upper class limit of the first class.
Class boundaries: A class boundary is located midway between the upper limit of a
class and the lower limit of the next higher class.
For example, if the marks obtained by students in a class are grouped into the classes
10-14, 15-19, 20-24, 25-29, etc.,
then the class boundaries are: 9.5-14.5, 14.5-19.5, 19.5-24.5, 24.5-29.5, etc.
Class mark: A class mark is the midpoint of a class, i.e. the average of its lower and upper class limits.
It divides the class into two equal parts. It is denoted by the variable X.
Class Interval: The class interval or class width is the difference between the upper and lower class
boundaries of a class; it is denoted by h. It can also be obtained as the difference between two successive
lower class limits, two successive upper class limits, or two successive class marks.
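As a rough illustration of the definitions above, the following Python sketch derives the class boundaries, class marks and class width for the marks example. The function name `describe_classes` is hypothetical, and the sketch assumes the classes are equally spaced, with the same gap between every upper limit and the next lower limit.

```python
def describe_classes(limits):
    """Return (boundaries, marks, width) for a list of (lower, upper) class limits.

    Assumes the classes are in increasing order and that the gap between the
    upper limit of one class and the lower limit of the next is constant.
    """
    # Gap between consecutive classes, e.g. 15 - 14 = 1 for the marks example.
    gap = limits[1][0] - limits[0][1]
    half = gap / 2
    # Each boundary lies midway between an upper limit and the next lower limit.
    boundaries = [(lo - half, hi + half) for lo, hi in limits]
    # The class mark is the midpoint of the class.
    marks = [(lo + hi) / 2 for lo, hi in limits]
    # The class width h is the difference between the class boundaries.
    width = boundaries[0][1] - boundaries[0][0]
    return boundaries, marks, width


limits = [(10, 14), (15, 19), (20, 24), (25, 29)]
boundaries, marks, width = describe_classes(limits)
print(boundaries)
print(marks)
print(width)
```

The width obtained from the boundaries (14.5 - 9.5) agrees with the difference between successive lower limits (15 - 10) and between successive class marks (17 - 12).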
Class Frequency
The number of values falling in a class is called the class frequency, denoted by f.
Relative Frequency
The frequency of a class divided by the total frequency of all the classes is called the relative
frequency:
$\text{Relative Frequency} = \dfrac{f}{\sum f}$
Percentage Relative Frequency
If the relative frequencies are multiplied by 100, we obtain the percentage relative frequencies. A table
showing percentage relative frequencies is called a percentage frequency distribution.
Cumulative Frequency
The total frequency of all values less than the upper class boundary of a given class
is called the cumulative frequency of that class.
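The three derived columns can be computed directly from the class frequencies. The sketch below is only an illustration: the variable names are ours, and the frequencies are those of the apple-weight example worked later in this chapter.

```python
from itertools import accumulate

# Class frequencies f (taken from the apple-weight example in this chapter).
frequencies = [9, 10, 17, 10, 5, 4, 5]
total = sum(frequencies)

# Relative frequency: f divided by the total frequency.
relative = [f / total for f in frequencies]

# Percentage relative frequency: relative frequency multiplied by 100.
percentage = [100 * r for r in relative]

# Cumulative frequency: running total of the frequencies up to each class.
cumulative = list(accumulate(frequencies))

for f, r, p, c in zip(frequencies, relative, percentage, cumulative):
    print(f"{f:>3} {r:>7.4f} {p:>7.2f} {c:>4}")
```

The relative frequencies always add up to 1, and the last cumulative frequency equals the total number of observations, which gives a quick check on the table.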
Construction of a Frequency Distribution
There are no hard and fast rules to construct a frequency distribution; however some basic
guidelines must be observed.
i) Appropriate number of classes in a frequency distribution
The number of classes, denoted by C, depends on the situation and the amount of data. The choice is
largely arbitrary, but it is generally accepted that the number of classes should be between 5 and 20,
depending on the amount of data. A useful suggestion regarding the number of classes is given by Sturges' rule.
The rule is:
$C = 3.3 \log_{10}(n) + 1$
where C denotes the number of classes and n is the number of observations. For example, if
there are 25 observations in a data set, then
$C = 3.3 \log_{10}(25) + 1 = 3.3\,(1.3979) + 1 = 5.61 \approx 6$
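The rule is easy to evaluate numerically. The following minimal sketch assumes the logarithm is taken to base 10, which is what the worked value 1.3979 for n = 25 implies; the function name `sturges_classes` is hypothetical.

```python
import math


def sturges_classes(n):
    """Suggested number of classes for n observations: 3.3 * log10(n) + 1."""
    return 3.3 * math.log10(n) + 1


c = sturges_classes(25)
print(round(c, 2))  # unrounded suggestion for n = 25
print(round(c))     # rounded to a whole number of classes
```

The result is only a suggestion; in practice it is rounded to a convenient whole number and adjusted to suit the data.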
ii) Find the lowest value and the highest value in the data.
iii) Find the range: The range R is obtained by subtracting the lowest value $X_S$ from the
highest value $X_L$, i.e. $R = X_L - X_S$.
iv) Divide the range by the number of classes to find the class width or class interval, i.e. $h = \dfrac{R}{C}$.
v) Determine the value at which the lowest interval should begin. It should ordinarily be a
multiple of the class interval.
vi) Determine the remaining class-limits and class boundaries by adding the class interval
repeatedly. The lowest class should be placed at the top and the rest should follow according
to size. Sometimes, the highest class is placed at the top.
vii) Using the tally system, enter the raw data in the appropriate class intervals. It is
customary for convenience in counting to place the first four bars or strokes vertically and fifth
one diagonally so as to have a set of five. Sometimes for a smaller data set, the actual values
can be written against each class instead of tally bars.
viii) Convert each tally to a frequency (f).
ix) Finally, total the frequency column to see that all the data have been accounted for.
Example: Make a frequency distribution of the weights, recorded to the nearest gram, of
60 apples picked at random from a consignment:
Ungrouped data
106 107 76 82 109 107 115 93 187 95 123 125
111 92 86 70 126 68 130 129 139 119 115 128
100 186 84 99 113 204 111 141 136 123 90 115
98 110 78 185 162 178 140 152 173 146 158 194
148 90 107 181 131 75 184 104 110 80 118 82
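Solution:
Step 1: The highest value is 204 and the lowest value is 68, so the range is $R = X_L - X_S = 204 - 68 = 136$.
Step 2: Suppose we decide to take C = 7 classes. Then the class interval is
$h = \dfrac{R}{C} = \dfrac{136}{7} = 19.43 \approx 20$.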
Grouped data or frequency distribution
Classes    Tally                  Frequency   Class boundaries   Midpoint X
65-84      ||||/ ||||                  9      64.5-84.5             74.5
85-104     ||||/ ||||/                10      84.5-104.5            94.5
105-124    ||||/ ||||/ ||||/ ||       17      104.5-124.5          114.5
125-144    ||||/ ||||/                10      124.5-144.5          134.5
145-164    ||||/                       5      144.5-164.5          154.5
165-184    ||||                        4      164.5-184.5          174.5
185-204    ||||/                       5      184.5-204.5          194.5
Total                                 60
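The construction steps can be checked with a short program. The sketch below is one possible way to do it, not a prescribed method: the function name `frequency_table` is hypothetical, and it assumes whole-number data so that each class runs from a lower limit to lower limit + h - 1, with boundaries 0.5 either side.

```python
weights = [
    106, 107, 76, 82, 109, 107, 115, 93, 187, 95, 123, 125,
    111, 92, 86, 70, 126, 68, 130, 129, 139, 119, 115, 128,
    100, 186, 84, 99, 113, 204, 111, 141, 136, 123, 90, 115,
    98, 110, 78, 185, 162, 178, 140, 152, 173, 146, 158, 194,
    148, 90, 107, 181, 131, 75, 184, 104, 110, 80, 118, 82,
]


def frequency_table(data, start, width, n_classes):
    """Build rows of (lower, upper, f, lower boundary, upper boundary, midpoint)."""
    rows = []
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1  # class limits for whole-number data
        f = sum(lower <= x <= upper for x in data)  # count values in the class
        rows.append((lower, upper, f, lower - 0.5, upper + 0.5, (lower + upper) / 2))
    return rows


data_range = max(weights) - min(weights)  # R = highest - lowest
raw_width = data_range / 7                # h = R / C with C = 7
width = 20                                # raw width rounded up to a convenient value
table = frequency_table(weights, start=65, width=width, n_classes=7)

for lower, upper, f, lb, ub, mid in table:
    print(f"{lower}-{upper:<5} {f:>3}  {lb}-{ub}  {mid}")
print("Total", sum(row[2] for row in table))
```

Totalling the frequency column at the end mirrors step ix): if the total differs from the number of observations, some value has fallen outside the classes.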
Example:
The following data give the index numbers of 100 commodities in a certain year. Make a
frequency distribution.
Solution:
Step 1: We first find the range R. As the maximum value is 153 and the
minimum value is 61, the range is $R = X_L - X_S = 153 - 61 = 92$.
Step 2: We next decide the number of classes. Suppose we decide to take C = 10 classes.
Then the class interval is $h = \dfrac{R}{C} = \dfrac{92}{10} = 9.2$. Typically, the value of R/C is rounded up to the
next value determined by the precision of measurement to produce a convenient value, so we take h = 10.
Step 3: Next we decide to locate the lower limit of the first class at 60. With this choice, the class limits
will be 60-69, 70-79, 80-89, ….
Step 4: To determine the frequency of each class we use either an entry table (for a small
data set) or a tally column. If a piece of data falls in a class, we record a tally mark (l) in the
tally column corresponding to that class.
The frequency distribution is then constructed as follows:
Classes
(Index Number)   Tally                       Frequency   Class boundaries   Midpoint
60-69            |||                              3      59.5-69.5             64.5
70-79            ||||/ ||||                       9      69.5-79.5             74.5
80-89            ||||/ ||||                       9      79.5-89.5             84.5
90-99            ||||/ ||||/ |||                 13      89.5-99.5             94.5
100-109          ||||/ ||||/ ||||/ ||||/ |       21      99.5-109.5           104.5
110-119          ||||/ ||||/ ||||/ ||||          19      109.5-119.5          114.5
120-129          ||||/ ||||/ ||                  12      119.5-129.5          124.5
130-139          ||||/                            5      129.5-139.5          134.5
140-149          ||||/ ||                         7      139.5-149.5          144.5
150-159          ||                               2      149.5-159.5          154.5
Total                                           100
Example: The following data set represents the amounts of cash (in rupees) spent on a
particular day by 25 students. Construct a grouped frequency table.
Graph: Data can be effectively presented by means of a graph. A graph consists of curves or
straight lines.
Diagram or Graph:
• A diagram or graph is a pictorial means of portraying and summarizing data.
Tabulation is undoubtedly a good method of condensing and summarizing data, but many
people have no taste for numbers. They may prefer a form of representation in which
figures can be avoided. Moreover, a pictorial presentation of the data often makes
certain features of the data more apparent than a tabular presentation.
• In the media it is common to represent data graphically, and the use of
computer graphics has enhanced this further.
• Diagram refers to various types of devices such as bars, circles, pictorials, etc.
Diagrammatic representation is suited to spatial series. The following are the
advantages and limitations of diagrammatic presentations.
Example: A sample of 50 college students who were planning to go to Punjab
University was taken. Each of the students was asked which of the following master's programs he or she
intended to choose: Statistics, Economics, Business, Information Technology (IT), Arts and
Other. The responses of these students are presented in the table below. Construct a simple bar
chart for this data.
Masters Program    f
Statistics         6
Economics         10
Business          12
IT                15
Arts               3
Other              4
[Figure: simple bar chart. X-axis: Masters Programs (Stat, Eco, Business, IT, Arts, Others); Y-axis: Frequency.]
Example:
Draw a simple bar diagram to represent the Sales of a Company for 5 years.
Newspaper             50    75   100
Books Printing        60    65    75
Wrapping              20    15    25
Special Variations    10    18    15
Others                40    45    40
[Figure: multiple bar chart of three series (Series1, Series2, Series3) for Newspaper, Books Printing, Wrapping, Special Variations and Others.]
[Figure: multiple bar chart of two series (Series1, Series2) for Lahore, Karachi, Rawalpindi, Peshawar and Quetta.]
Pie chart: The pie chart or pie diagram is a division of a circular region into different sectors.
It is constructed by dividing the total angle of a circle, 360 degrees, among the different components.
The angle for each sector is obtained by the relation
$\theta = \text{Angle} = \dfrac{\text{Component part}}{\text{Whole quantity}} \times 360^\circ$
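As an illustration of this relation, the sketch below computes the sector angles for the masters-program survey of 50 students given earlier. The notes do not draw a pie chart for that data, so using it here is our own choice, and the variable names are hypothetical.

```python
# Frequencies from the masters-program example (50 students in total).
programs = {
    "Statistics": 6,
    "Economics": 10,
    "Business": 12,
    "IT": 15,
    "Arts": 3,
    "Other": 4,
}

whole = sum(programs.values())

# Angle of each sector = (component part / whole quantity) * 360 degrees.
angles = {name: part * 360 / whole for name, part in programs.items()}

for name, angle in angles.items():
    print(f"{name:<11} {angle:6.1f} degrees")
```

The sector angles must add up to 360 degrees, which serves as a check on the arithmetic.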
HISTOGRAM
A histogram is a bar chart or graph showing the frequency of occurrence of each value of the
variable being analysed. In a histogram, data are plotted as a series of rectangles. Class
boundaries are shown on the X-axis and the frequencies on the Y-axis. The height of
each rectangle represents the frequency of the class interval. Each rectangle is drawn adjacent to
the next so as to give a continuous picture.
Frequency Polygon:
If we mark the midpoints of the top horizontal sides of the rectangles in a histogram and join
them by straight lines, the figure so formed is called a frequency polygon. This is done
under the assumption that the frequencies in a class interval are evenly distributed throughout
the class. The area of the polygon is equal to the area of the histogram, because the area left
outside is just equal to the area included in it.
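The equality of the two areas can be verified numerically. The sketch below uses the class marks and frequencies of the apple-weight example and rests on two assumptions that the notes do not state explicitly: all classes have the same width, and the polygon is closed on the X-axis at the midpoints of the empty classes on either side. The function names are hypothetical.

```python
def histogram_area(frequencies, width):
    """Total area of the rectangles: each has base `width` and height f."""
    return width * sum(frequencies)


def polygon_area(midpoints, frequencies, width):
    """Area under the frequency polygon, computed with the trapezoid rule.

    The polygon is closed by adding zero-frequency points one class width
    before the first midpoint and one class width after the last.
    """
    xs = [midpoints[0] - width] + list(midpoints) + [midpoints[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
        for i in range(len(xs) - 1)
    )


midpoints = [74.5, 94.5, 114.5, 134.5, 154.5, 174.5, 194.5]
frequencies = [9, 10, 17, 10, 5, 4, 5]
width = 20

print(histogram_area(frequencies, width))
print(polygon_area(midpoints, frequencies, width))
```

Both calls give the same area, h times the total frequency, because each triangle cut off a rectangle by the polygon is matched by an equal triangle added outside it.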
Frequency Curve:
When a frequency polygon or a histogram is constructed over class intervals made sufficiently
small for a large number of observations, it approaches a smooth and continuous curve called a
frequency curve.