Chapter1 Statistics

Statistics is the scientific study of collecting, organizing, summarizing, presenting, and analyzing data. There are four main stages: collection of data, presentation of data, analysis of data, and interpretation of data. A population is the total group being studied, while a sample is a subset of the population. Data can be organized using frequency distributions, which classify data into groups and count the frequency of observations in each group.

Dr. Muhammad Riaz, Department of Mathematics, University of the Punjab, Lahore


Email: [email protected]
STATISTICS
Chapter 1
DEFINITION
Statistics is concerned with scientific methods for collecting, organizing, summarizing,
presenting and analyzing data as well as deriving valid conclusions and making reasonable
decisions on the basis of this analysis. Statistics is concerned with the systematic collection of
numerical data and its interpretation.
According to this definition there are four stages:
1. Collection of Data
2. Presentation of data
3. Analysis of data
4. Interpretation of data
Collection of Data: This is the first step and the foundation upon which the entire analysis
rests. Careful planning is essential before collecting the data. There are different methods of
collecting data, such as census, sampling, primary and secondary collection, and the
investigator should use the method appropriate to the study.
Presentation of Data: The mass of data collected should be presented in a suitable, concise
form for further analysis. The collected data may be presented in tabular, diagrammatic or
graphic form.
Analysis of Data: The presented data should be carefully analyzed in order to draw inferences
from them, using tools such as measures of central tendency, dispersion, correlation and
regression.
Interpretation of Data: The final step is drawing conclusions from the data collected. A valid
conclusion must be drawn on the basis of the analysis. A high degree of skill and experience is
necessary for the interpretation.

Population and Sample


In collecting data concerning the characteristics of a group of individuals or objects, such as
heights, weights or marks, the entire group is called the population and a small part of the
group is called a sample.
OR
The totality of observations sharing some common characteristic is called the population. In
other words, the population is the complete set of all possible observations of the type which
is to be investigated,
e.g. the students in a college, the citizens of a city, etc.

Finite population and infinite population:


A population is said to be finite if it consists of a finite number of units. The workers in a
factory and the articles produced by a company on a particular day are examples of finite
populations. The total number of units in a population is called the population size.
A population is said to be infinite if it has an infinite number of units, for example the
stars in the sky or the people watching television programmes.
Sample: Statisticians use the word sample to describe a portion chosen from the population. A
finite subset of the statistical individuals in a population is called a sample. The number of
units in a sample is called the sample size.
Variable or Variate: A quantity which can vary from one individual or object to another is
called a variable, e.g. age, height, weight, price of a commodity etc. A variable is usually
denoted by one of the last letters of the alphabet, e.g. X, Y, Z.
Continuous variable: A variable which can assume any value within a given range is called
a continuous variable, e.g. the height or weight of a student.
Discrete variable: A variable which can assume only specific values within a given range is
called a discrete variable, e.g. the number of houses in a street, the number of children in a
family etc.
Constant: A quantity which can assume only one value is called a constant. A constant is
usually denoted by one of the first letters of the alphabet, e.g. a, b, c etc.

Raw data / ungrouped data:


Raw data are collected values which have not been arranged (organized) numerically in a
systematic order.

Frequency Distribution
A frequency distribution is a tabular arrangement of the data that shows the distribution
of observations among different classes.
A frequency distribution is a method of classifying data into classes or intervals in such a way
that the number (or frequency) of observations in each class can be determined. This method
provides a way of reviewing a set of numbers without actually having to consider the individual
numbers, and it can be very useful when dealing with large amounts of data.
Class-limits: The class-limits are defined as the numbers or the values of the variable
which describe the classes; the smaller number is the lower class limit and the larger
number is the upper class limit.
For example: The marks obtained by students in a class are grouped as:
10-14, 15-19, 20-24, 25-29, etc.
Here 10 is the lower class limit and 14 is the upper class limit of the first class.
Class boundaries: A class boundary is located midway between the upper limit of a
class and the lower limit of the next higher class.
For example: The marks obtained by students in a class are:
10-14, 15-19, 20-24, 25-29, etc.
Then class boundaries are: 9.5-14.5, 14.5-19.5, 19.5-24.5, 24.5-29.5, etc.
Class mark: A class mark is the midpoint of a class, i.e. the average of its lower and upper
class limits. It divides the class into two equal parts. It is denoted by the variable X.
Class Interval: The class interval or class width is the difference between the upper and
lower class boundaries of a class; it is denoted by h. It can also be obtained as the difference
between two successive lower class limits, two successive upper class limits, or two
successive class marks.
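The definitions of class boundaries, class mark and class interval can be checked on the marks example above. The following is a minimal sketch in Python; the variable names (limits, gap, boundaries, marks) are chosen here for illustration and do not come from the notes.

```python
# Class limits from the marks example: 10-14, 15-19, 20-24, 25-29.
limits = [(10, 14), (15, 19), (20, 24), (25, 29)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = limits[1][0] - limits[0][1]          # 15 - 14 = 1

# A class boundary lies midway across that gap, half a gap beyond each limit.
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]

# Class mark X: the midpoint of each class.
marks = [(lo + hi) / 2 for lo, hi in limits]

# Class interval h: difference between the boundaries of a class.
h = boundaries[0][1] - boundaries[0][0]

for (lo, hi), (lb, ub), x in zip(limits, boundaries, marks):
    print(f"{lo}-{hi}: boundaries {lb}-{ub}, class mark {x}")
print("class interval h =", h)
```

The same value of h is obtained from two successive lower limits (15 - 10), two successive upper limits (19 - 14) or two successive class marks (17 - 12).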
Class Frequency
The number of values falling in a particular class is called the class frequency. It is
denoted by f.
Relative Frequency
The frequency of a class divided by the total frequency of all the classes is called the relative
frequency:
$\text{Relative Frequency} = \dfrac{f}{\sum f}$
Percentage Relative Frequency
If the relative frequencies are multiplied by 100, we obtain the percentage relative
frequencies. A table showing percentage relative frequencies is called a percentage frequency
distribution.
Cumulative Frequency
The total frequency of all the classes lying below the upper class boundary of a given class
(the class itself included) is called the cumulative frequency of that class.
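As an illustration of these three definitions, the sketch below computes relative, percentage and cumulative frequencies in Python. The class frequencies are those of the seven classes in the apple-weights example of this chapter; the variable names are illustrative only.

```python
from itertools import accumulate

# Class frequencies f of the seven classes in the apple-weights example.
freqs = [9, 10, 17, 10, 5, 4, 5]
total = sum(freqs)                            # total frequency, sum of f

relative = [f / total for f in freqs]         # f divided by the total frequency
percentage = [100 * r for r in relative]      # relative frequency times 100
cumulative = list(accumulate(freqs))          # running total of f

print("  f   rel.f   %rel.f   cum.f")
for f, r, p, c in zip(freqs, relative, percentage, cumulative):
    print(f"{f:3d}   {r:.3f}   {p:5.1f}   {c:5d}")
```

The relative frequencies add up to 1, the percentage relative frequencies to 100, and the last cumulative frequency equals the total frequency.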
Construction of a Frequency Distribution
There are no hard and fast rules for constructing a frequency distribution; however, some basic
guidelines should be observed.
i) Appropriate number of classes in a frequency distribution
The number of classes, denoted by C, depends on the situation and the amount of data. There are
no hard and fast rules regarding the number of classes to use, and the choice is largely
arbitrary. It is generally accepted that the number of classes should be between 5 and 20,
depending on the amount of data. A useful suggestion regarding the number of classes is given by
Sturges' rule.
The rule is:
$C = 3.3 \log_{10}(n) + 1$
where C denotes the number of classes and n is the number of observations. For example, if
there are 25 observations in a data set, then
$C = 3.3 \log_{10}(25) + 1 = 3.3(1.3979) + 1 = 5.61 \approx 6$
ii) Find the lowest value and the highest value in the data.
iii) Find the range: The range R is obtained by subtracting the smallest value $X_S$ from the
largest value $X_L$, i.e. $R = X_L - X_S$.
iv) Divide the range by the number of classes to find the class width or class interval, i.e.
$h = \dfrac{R}{C}$.
v) Determine the value at which the lowest interval should begin. It should ordinarily be a
multiple of the class interval.
vi) Determine the remaining class-limits and class boundaries by adding the class interval
repeatedly. The lowest class should be placed at the top and the rest should follow according
to size. Sometimes, the highest class is placed at the top.
vii) Using the tally system, enter the raw data in the appropriate class intervals. It is
customary for convenience in counting to place the first four bars or strokes vertically and fifth
one diagonally so as to have a set of five. Sometimes for a smaller data set, the actual values
can be written against each class instead of tally bars.
viii) Convert each tally to a frequency (f).
ix) Finally, total the frequency column to see that all the data have been accounted for.
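The steps above can be sketched in Python as follows. This is only one possible reading of the guidelines, under stated assumptions: the data are whole numbers, Sturges' value and R/C are both rounded up, and classes are added until the largest value is covered. The function names and the small marks list are hypothetical and are not taken from the notes.

```python
import math

def sturges_classes(n):
    """Step i: C = 3.3 log10(n) + 1, rounded up (rounding up is an assumption)."""
    return math.ceil(3.3 * math.log10(n) + 1)

def frequency_distribution(data, num_classes=None, start=None):
    """Return (lower limit, upper limit, frequency) rows for whole-number data."""
    c = num_classes if num_classes is not None else sturges_classes(len(data))
    smallest, largest = min(data), max(data)       # step ii
    r = largest - smallest                         # step iii: R = X_L - X_S
    h = max(1, math.ceil(r / c))                   # step iv, rounded up
    if start is None:
        start = (smallest // h) * h                # step v: a multiple of h
    rows = []
    lower = start
    while lower <= largest:                        # step vi: add h repeatedly
        upper = lower + h - 1
        f = sum(1 for x in data if lower <= x <= upper)   # steps vii and viii
        rows.append((lower, upper, f))
        lower += h
    # Step ix: every observation must have been counted exactly once.
    assert sum(f for _, _, f in rows) == len(data)
    return rows

# Hypothetical marks of 12 students, used only to exercise the function.
marks = [12, 25, 17, 31, 22, 19, 28, 14, 35, 21, 26, 18]
rows = frequency_distribution(marks)
for lower, upper, f in rows:
    print(f"{lower}-{upper}: {f}")
```

Because the first class starts at a multiple of h rather than at the smallest value, the table may need one class more than C to reach the largest value, as happens with this small data set.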

Example: Make a frequency distribution of the weights, recorded to the nearest gram, of
60 apples picked at random from a consignment:
Ungrouped data
106 107 76 82 109 107 115 93 187 95 123 125
111 92 86 70 126 68 130 129 139 119 115 128
100 186 84 99 113 204 111 141 136 123 90 115
98 110 78 185 162 178 140 152 173 146 158 194
148 90 107 181 131 75 184 104 110 80 118 82
Solution:

Step 1: We first find the range $R = X_L - X_S = 204 - 68 = 136$.

Step 2: Suppose we decide to take C = 7 classes. Then the class interval is
$h = \dfrac{R}{C} = \dfrac{136}{7} = 19.43 \approx 20$.
Grouped data or frequency distribution

Classes    Tally                  Frequency   Class boundaries   Mid-point X
65-84      ||||/ ||||                  9       64.5-84.5            74.5
85-104     ||||/ ||||/                10       84.5-104.5           94.5
105-124    ||||/ ||||/ ||||/ ||       17       104.5-124.5         114.5
125-144    ||||/ ||||/                10       124.5-144.5         134.5
145-164    ||||/                       5       144.5-164.5         154.5
165-184    ||||                        4       164.5-184.5         174.5
185-204    ||||/                       5       184.5-204.5         194.5
Total                                 60
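The frequencies in this table can be verified with a short Python sketch. It assigns each weight to a class by locating it among the class boundaries with the standard-library bisect module; the approach and the variable names are illustrative and are not the method used in the notes, which relies on tally marks.

```python
from bisect import bisect_right

# Weights of the 60 apples (ungrouped data of the example).
weights = [
    106, 107, 76, 82, 109, 107, 115, 93, 187, 95, 123, 125,
    111, 92, 86, 70, 126, 68, 130, 129, 139, 119, 115, 128,
    100, 186, 84, 99, 113, 204, 111, 141, 136, 123, 90, 115,
    98, 110, 78, 185, 162, 178, 140, 152, 173, 146, 158, 194,
    148, 90, 107, 181, 131, 75, 184, 104, 110, 80, 118, 82,
]

# Class boundaries of the seven classes 65-84, 85-104, ..., 185-204.
boundaries = [64.5, 84.5, 104.5, 124.5, 144.5, 164.5, 184.5, 204.5]

freqs = [0] * (len(boundaries) - 1)
for w in weights:
    # Index of the class whose boundaries enclose w.
    freqs[bisect_right(boundaries, w) - 1] += 1

for i, f in enumerate(freqs):
    lb, ub = boundaries[i], boundaries[i + 1]
    print(f"{lb}-{ub}  mid-point {(lb + ub) / 2}  frequency {f}")
print("total", sum(freqs))
```

Since the weights are whole numbers and the boundaries end in .5, no observation can fall exactly on a boundary, so every weight belongs to exactly one class.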

Example:
The following data give the index numbers of 100 commodities in a certain year. Make a
frequency distribution.

91 120 138 96 99 113 97 94 119 111
118 83 91 86 71 119 123 87 151 117
87 116 134 90 61 141 104 115 125 79
119 124 112 145 96 114 114 106 113 89
110 111 75 106 153 63 107 96 100 96
81 101 104 108 147 133 100 109 104 110
143 77 109 138 113 86 121 86 136 117
99 95 90 100 104 79 68 88 116 101
144 127 101 128 102 105 106 122 76 78
73 147 127 129 140 120 129 77 108 109

Solution:
Step 1: We first find the range R. As the maximum value is 153 and the
minimum value is 61, the range is $R = X_L - X_S = 153 - 61 = 92$.

Step 2: We next decide the number of classes. Suppose we decide to take C = 10 classes.
Then the class interval is $h = \dfrac{R}{C} = \dfrac{92}{10} = 9.2$. Typically, the value of
R/C is rounded up to the next value determined by the precision of measurement to produce a
convenient value, so h = 10.
Step 3: Next we decide to locate the lower limit of the first class at 60. With this choice,
the class limits will be 60-69, 70-79, 80-89, ….
Step 4: To determine the frequency of each class we use either an entry table (for a small
data set) or a tally column. If a piece of data falls in a class, we record a tally mark (|) in
the tally column corresponding to that class.
The frequency distribution is then constructed as follows:

Classes
(Index Number)   Tally                         Frequency   Class boundaries   Mid-point
60-69            |||                                3       59.5-69.5            64.5
70-79            ||||/ ||||                         9       69.5-79.5            74.5
80-89            ||||/ ||||                         9       79.5-89.5            84.5
90-99            ||||/ ||||/ |||                   13       89.5-99.5            94.5
100-109          ||||/ ||||/ ||||/ ||||/ |         21       99.5-109.5          104.5
110-119          ||||/ ||||/ ||||/ ||||            19       109.5-119.5         114.5
120-129          ||||/ ||||/ ||                    12       119.5-129.5         124.5
130-139          ||||/                              5       129.5-139.5         134.5
140-149          ||||/ ||                           7       139.5-149.5         144.5
150-159          ||                                 2       149.5-159.5         154.5
Total                                             100
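The tally column of this example can be reproduced with a small Python sketch. The notation "||||/" for a completed group of five (four vertical strokes crossed by a diagonal one) is an assumption made here for plain text, and the helper name tally is hypothetical.

```python
# Index numbers of the 100 commodities (data of the example).
index_numbers = [
    91, 120, 138, 96, 99, 113, 97, 94, 119, 111,
    118, 83, 91, 86, 71, 119, 123, 87, 151, 117,
    87, 116, 134, 90, 61, 141, 104, 115, 125, 79,
    119, 124, 112, 145, 96, 114, 114, 106, 113, 89,
    110, 111, 75, 106, 153, 63, 107, 96, 100, 96,
    81, 101, 104, 108, 147, 133, 100, 109, 104, 110,
    143, 77, 109, 138, 113, 86, 121, 86, 136, 117,
    99, 95, 90, 100, 104, 79, 68, 88, 116, 101,
    144, 127, 101, 128, 102, 105, 106, 122, 76, 78,
    73, 147, 127, 129, 140, 120, 129, 77, 108, 109,
]

def tally(f):
    """Tally marks for frequency f: '||||/' per full five, then the remainder."""
    groups = ["||||/"] * (f // 5)
    if f % 5:
        groups.append("|" * (f % 5))
    return " ".join(groups)

start, h, n_classes = 60, 10, 10       # first lower limit, class interval, C
freqs = [0] * n_classes
for x in index_numbers:
    freqs[(x - start) // h] += 1       # class 0 is 60-69, class 1 is 70-79, ...

for i, f in enumerate(freqs):
    lower = start + i * h
    label = f"{lower}-{lower + h - 1}"
    print(f"{label:<8} {tally(f):<30} {f}")
print("total", sum(freqs))
```

Integer division by the class interval works here because the first lower limit (60) is a multiple of h and the data are whole numbers.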

Example: The following data set represents the amounts of cash (in rupees) spent on a
particular day by 25 students. Construct a grouped frequency table.

39.78 28.30 28.31 17.95 44.47
46.65 31.47 33.45 29.17 48.39
82.71 43.63 41.17 47.32 52.16
25.94 50.32 35.25 35.70 17.89
60.20 48.14 22.78 38.22 23.25

Class          Tally         f    Class boundaries   X
17.85-30.84    ||||/ |||     8    17.845-30.845      24.345
30.85-43.84    ||||/ |||     8    30.845-43.845      37.345
43.85-56.84    ||||/ ||      7    43.845-56.845      50.345
56.85-69.84    |             1    56.845-69.845      63.345
69.85-82.84    |             1    69.845-82.845      76.345
Total                       25
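For data recorded to two decimal places, the class boundaries lie half a unit of precision (0.005) beyond the class limits. The sketch below recomputes the table under that reading. The class interval h = 13 and the first lower limit 17.85 are read off the table rather than stated in the notes, and the first entry of the second data row is taken to be 46.65.

```python
# Amounts of cash (in rupees) spent by the 25 students.
amounts = [
    39.78, 28.30, 28.31, 17.95, 44.47,
    46.65, 31.47, 33.45, 29.17, 48.39,   # 46.65 is the assumed reading of this entry
    82.71, 43.63, 41.17, 47.32, 52.16,
    25.94, 50.32, 35.25, 35.70, 17.89,
    60.20, 48.14, 22.78, 38.22, 23.25,
]

precision = 0.01                  # data recorded to the nearest 0.01
h = 13.0                          # class interval, read off the table
first_lower_limit = 17.85         # lower limit of the first class
n_classes = 5

# The first lower class boundary lies half a unit of precision below the limit.
lower_boundary = first_lower_limit - precision / 2      # 17.845

freqs = [0] * n_classes
for x in amounts:
    freqs[int((x - lower_boundary) // h)] += 1

for i, f in enumerate(freqs):
    lb = lower_boundary + i * h
    ub = lb + h
    print(f"{lb:.3f}-{ub:.3f}  f={f}  X={(lb + ub) / 2:.3f}")
print("total", sum(freqs))
```

No amount in this data set lies close enough to a boundary for floating-point rounding to change its class; for data that do, exact decimal arithmetic would be a safer choice.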

Graph: Data can be effectively presented by means of a graph. A graph consists of curves or
straight lines.

Diagram or Graph:
• A diagram or graph is a pictorial means of portraying and summarizing data. No
doubt tabulation is a good method of condensing and summarizing data, but many
people have no taste for numbers. They may prefer a form of representation in which
figures can be avoided. Moreover, a pictorial presentation of the data often makes
certain features of the data more apparent than a tabular presentation.
• In the media it is common to represent data graphically, and the use of
computer graphics has enhanced this further.
• Diagram refers to various types of devices such as bars, circles, pictorials etc.
Diagrammatic representation is suited to spatial series. The following are the
advantages and limitations of diagrammatic presentations.

SIMPLE BAR CHART

Example: A sample of 50 college students who were planning to go to Punjab University was
taken. Each of the students was asked which of the following master's programs he or she
intended to choose: Statistics, Economics, Business, Information Technology (IT), Arts or
other. The responses of these students are presented in the table below. Construct a simple bar
chart for these data.
Masters Program    f
Statistics         6
Economics         10
Business          12
IT                15
Arts               3
Other              4

[Figure: simple bar chart with Frequency on the Y-axis and Masters Programs (Stat, Eco, Business, IT, Arts, Others) on the X-axis]
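A rough text version of this bar chart can be produced with a few lines of Python. This is only an illustrative sketch: the bars are drawn horizontally with one '#' per student, whereas the chart in the notes uses vertical bars, and the variable names are hypothetical.

```python
# Frequencies of the intended master's programs (table of the example).
programs = {
    "Statistics": 6,
    "Economics": 10,
    "Business": 12,
    "IT": 15,
    "Arts": 3,
    "Other": 4,
}

# One line per category; the bar length is proportional to the frequency.
lines = [f"{name:<11}|{'#' * f} {f}" for name, f in programs.items()]
print("\n".join(lines))
print("total", sum(programs.values()))
```

In a simple bar chart only the length of each bar carries information; the bars have equal width and are separated by equal gaps.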

Example:
Draw a simple bar diagram to represent the Sales of a Company for 5 years.

Year              1997    1998    1999    2000    2001
Sales (Rupees)   50000   60000   70000   80000   90000
[Figure: simple bar chart with Sales (000 Rupees) on the Y-axis and Year (1997 to 2001) on the X-axis]

Multiple Bar Chart


When two or more sets of data with a common characteristic are to be represented in the same
diagram, a multiple bar diagram is drawn.
Example: The following table gives the sales of paper (1000 tons) in Lahore for
the last three years. Draw a multiple bar diagram to represent the data.

Categories            2000   2001   2002
Newspaper               50     75    100
Books Printing          60     65     75
Wrapping                20     15     25
Special Variations      10     18     15
Others                  40     45     40

[Figure: multiple bar chart of sales by category (Newspaper, Books Printing, Wrapping, Special Variations, Others), with three bars per category labelled Series1, Series2 and Series3]
Component or Sectional Bar Chart:


In a component bar chart, a bar is drawn to represent the total frequency and is then divided
into components or sections whose lengths are proportional to the frequencies of the categories
they represent. This diagram can also be drawn in percentage form, where each bar represents
100%; it is then known as a percentage component bar chart.
Example: Draw the component bar chart for the following data.

Cities        Total   Males   Females
Lahore          7      3.7     3.3
Karachi        10      5.5     4.5
Rawalpindi      4      2.2     1.8
Peshawar        4.5    2.5     2.0
Quetta          2.0    1.1     0.9

[Figure: component bar chart with one bar per city (Lahore, Karachi, Rawalpindi, Peshawar, Quetta), each bar divided into two components labelled Series1 and Series2]
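For the percentage form of the component bar chart, each component is expressed as a percentage of its bar total. The following Python sketch does this for the table above; the dictionary layout and names are illustrative only.

```python
# (males, females) for each city, taken from the table of the example.
cities = {
    "Lahore": (3.7, 3.3),
    "Karachi": (5.5, 4.5),
    "Rawalpindi": (2.2, 1.8),
    "Peshawar": (2.5, 2.0),
    "Quetta": (1.1, 0.9),
}

percentages = {}
for city, (males, females) in cities.items():
    total = males + females                 # length of the whole bar
    percentages[city] = (100 * males / total, 100 * females / total)

for city, (pm, pf) in percentages.items():
    print(f"{city:<11} males {pm:5.1f}%  females {pf:5.1f}%")
```

In the percentage component bar chart every bar has the same length (100%), so only the proportions within each city can be compared, not the totals.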

Pie chart: The pie chart or pie diagram is a division of a circular region into different sectors.
It is constructed by dividing the total angle of a circle, 360 degrees, into different components.
The angle for each sector is obtained by the relation

$\theta = \text{Angle} = \dfrac{\text{Component part}}{\text{Whole quantity}} \times 360^\circ$

Example: Represent the expenditures on various items of a family by a pie chart.

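Items             Expenditures   Angle
Food                   50        (50/150) × 360 = 120°
Clothing               30        (30/150) × 360 = 72°
House Rent             20        (20/150) × 360 = 48°
Fuel and light         15        (15/150) × 360 = 36°
Miscellaneous          35        (35/150) × 360 = 84°
Total                 150        360°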
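The sector angles follow directly from the relation Angle = (component part / whole quantity) × 360. A minimal Python sketch of the calculation, with illustrative variable names:

```python
# Family expenditures from the example.
expenditures = {
    "Food": 50,
    "Clothing": 30,
    "House Rent": 20,
    "Fuel and light": 15,
    "Miscellaneous": 35,
}

total = sum(expenditures.values())          # whole quantity

# Angle of each sector in degrees: component part * 360 / whole quantity.
angles = {item: amount * 360 / total for item, amount in expenditures.items()}

for item, angle in angles.items():
    print(f"{item:<15} {angle:6.1f} degrees")
print("sum of angles:", sum(angles.values()))
```

The angles must add up to 360 degrees, which is a useful check before drawing the sectors.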

HISTOGRAM
A histogram is a bar chart or graph showing the frequency of occurrence of each value of the
variable being analysed. In a histogram, data are plotted as a series of rectangles. Class
boundaries are shown on the X-axis and the frequencies on the Y-axis. When the classes are of
equal width, the height of each rectangle represents the frequency of the class interval. Each
rectangle is joined to the next so as to give a continuous picture.
Frequency Polygon:
If we mark the midpoints of the top horizontal sides of the rectangles in a histogram and join
them by straight lines, the figure so formed is called a frequency polygon. This is done
under the assumption that the frequencies in a class interval are evenly distributed throughout
the class. The area of the polygon is equal to the area of the histogram, because the area left
outside is just equal to the area included in it.
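The equality of the two areas can be checked numerically. The sketch below uses the class boundaries and frequencies of the apple-weights example and assumes, as is usual but not stated in the notes, that the classes have equal width and that the polygon is closed by joining it to the midpoints of an empty class on either side. The variable names are illustrative.

```python
# Class boundaries and frequencies from the apple-weights example.
boundaries = [64.5, 84.5, 104.5, 124.5, 144.5, 164.5, 184.5, 204.5]
freqs = [9, 10, 17, 10, 5, 4, 5]
h = boundaries[1] - boundaries[0]            # common class interval, 20

# Area of the histogram: sum of (width * height) over the rectangles.
hist_area = sum(h * f for f in freqs)

# Vertices of the frequency polygon: class marks against frequencies,
# closed with zero frequency at the marks of the neighbouring empty classes.
mids = [(boundaries[i] + boundaries[i + 1]) / 2 for i in range(len(freqs))]
xs = [mids[0] - h] + mids + [mids[-1] + h]
ys = [0] + freqs + [0]

# Area under the polygon by the trapezium rule between successive vertices.
poly_area = sum(
    (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1)
)

print("histogram area:", hist_area)
print("polygon area:  ", poly_area)
```

If the polygon is not closed at both ends, or if the class widths differ, the two areas are in general no longer equal.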

Frequency Curve:
When a frequency polygon or a histogram is constructed over class intervals made sufficiently
small for a large number of observations, it approaches a smooth and continuous curve called a
frequency curve.
Stem-and-leaf display: Stem = leading digit(s), Leaf = trailing digit(s)

If the number is 243 then
Stem = 2, Leaf = 43 OR Stem = 24, Leaf = 3
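The split of a number into stem and leaf can be sketched in Python with divmod. The helper name split_stem_leaf and its leaf_digits parameter are hypothetical; the nine weights used for the small display are the apple weights that fall in the class 65-84 of the earlier example.

```python
from collections import defaultdict

def split_stem_leaf(number, leaf_digits=1):
    """Split a non-negative whole number into (stem, leaf)."""
    return divmod(number, 10 ** leaf_digits)

print(split_stem_leaf(243, leaf_digits=2))   # stem 2, leaf 43
print(split_stem_leaf(243, leaf_digits=1))   # stem 24, leaf 3

# A small stem-and-leaf display with one-digit leaves.
weights = [76, 82, 70, 68, 84, 78, 75, 80, 82]
display = defaultdict(list)
for w in sorted(weights):
    stem, leaf = split_stem_leaf(w)
    display[stem].append(leaf)

for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```

Unlike a frequency distribution, the display keeps every original value while still showing how the data are distributed over the stems.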

Cumulative frequency polygon or ogive:


It is a graph obtained by plotting the cumulative frequencies against the upper class
boundaries.
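The points of an ogive can be listed with a short Python sketch. It uses the upper class boundaries and frequencies of the apple-weights example; starting the curve at the lowest class boundary with cumulative frequency 0 is a common convention assumed here, not something stated in the notes.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the apple-weights example.
upper_boundaries = [84.5, 104.5, 124.5, 144.5, 164.5, 184.5, 204.5]
freqs = [9, 10, 17, 10, 5, 4, 5]

cumulative = list(accumulate(freqs))         # less-than cumulative frequencies

# Points to plot: (upper class boundary, cumulative frequency),
# preceded by the lowest boundary with cumulative frequency 0.
points = [(64.5, 0)] + list(zip(upper_boundaries, cumulative))

for x, cf in points:
    print(f"less than {x}: {cf}")
```

Joining these points by straight lines gives the cumulative frequency polygon, which never decreases and ends at the total frequency.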
