

Meaning of Data
• "Data" is a plural word (singular: datum) and refers to a collection
of pieces of information on some variables.
• Data are the raw, unorganized facts and figures collected from any
field of inquiry.

• Data are the raw material of statistics: the outcome of facts (sex,
occupation, etc.), events (birth, death, disease, etc.) or
measurements (height, weight, etc.).

We may define data as numbers whose common characteristic is
variability or variation.
Investigators generate data through some process of counting,
observation or measurement.

Example: Suppose we want to know whether the male workers of an
industry smoke or not. All the workers of the industry may then be
classified into two categories: smokers and non-smokers. The numbers of
smokers and non-smokers are numerical data, obtained through the
process of counting.

Dr. Md. Siddikur Rahman, Associate Professor, Dept. of Statistics, BRUR
We may further attempt to record their ages or measure their heights
and weights.
Some information can be obtained simply by observing, e.g. we may
observe whether a given day is rainy or sunny.
All this information constitutes data.
Different Types of Data
Statistical data may be broadly classified in two ways:

1. Primary and secondary data

2. Quantitative and qualitative data

Sources of Data
Depending upon the source, statistical data are of two types:
• Primary data
The data which are originally collected by an investigator or an
agent for the first time for the purpose of statistical enquiry are
known as primary data. Such data are thus original in character.
• Secondary data
The data which are collected or obtained from some published or
unpublished sources are called secondary data. This type of data is
not original in character.

For example, the reports and publications of the Bangladesh Bureau
of Statistics (BBS) are primary data for that organization but secondary
data for those who use them.
Methods of Data Collection

1. Complete enumeration survey or census method

2. Sample Survey or Sampling method.
3. Experiments
4. Observational Studies

Census method
• In the census method, every unit of the population is studied.

Sample Survey
• In the sampling method, instead of every unit of the population,
only a part of the population is studied, and conclusions about the
entire population are drawn on the basis of the sample.
• Sampling is a technique for selecting some representative units
from the population.

Sample surveys involve the selection and study of a sample of items
from a population. A sample is a set of members chosen from a
population, but not the whole population. A survey of a whole
population is called a census.
In any survey, the information is collected according to some
pre-formulated questions. A set of questions for a survey constitutes
a questionnaire or a schedule.
A questionnaire is an instrument that is generally mailed or handed over
to the respondents and filled in by them with no help from the
interviewer or any other person.

A schedule, also known as an interview schedule, is an instrument that is
not given to the respondents; it is filled in by the interviewer, who
reads the questions to the respondents and records the answers as
provided by them.


Suppose that, from a sample of 50 workers out of 500 workers, the
researcher collected such data as the workers' age, level of education,
wage and religion by directly interviewing them. This information
contains both qualitative and quantitative data.

Having obtained the data, the most usual questions one might ask are:
• How many of the workers are below 30 years of age? Over 50?
• How many of them earn between 74 and 81 taka?
• How many of them have a secondary level of education?
• Are the workers frequently absent from work?

The answers to the above questions can be given by simply counting the
cases that appear in the table. But this will be a cumbersome job, and
sometimes an impossible one, if the number of cases is very large. What
would one then expect us to do with this large volume of data?
Most of us would wish that someone had classified, categorized or
summarized the data in a more convenient and readily interpretable
form.
Tabular and graphical procedures provide useful ways and means of
organizing and describing the data so that they are more easily used
and interpreted.
The concept of frequency distribution is introduced here as a tabular
method of summarizing data. This frequency distribution can also be
displayed graphically using a number of diagrams, charts, plots and
graphs.

Presentation of data
1. Arrangement
   • Ascending (lowest to highest)
   • Descending (highest to lowest)
2. Tabulation
   • Frequency table
   • Frequency distribution
   • Cross tabulation
3. Graphs
   • Histogram
   • Frequency curve (polygon)
   • Line chart
   • Scatter diagram
4. Diagrams
   • Bar diagram
   • Pie diagram

Frequency distribution
A frequency distribution is a set of mutually exclusive classes or
categories together with the frequency of occurrence of items, values or
observations in each class or category in a given set of data, usually
presented in tabular form.
Frequency distribution:
• A tabular presentation of data showing the number of observations in
each class.
• A grouping of data into mutually exclusive classes showing the
number of observations in each class, usually presented in tabular
form.

Purpose of constructing a frequency distribution

A frequency distribution
• provides a good overall picture of the information;
• is usually presented in tabular form;
• shows the frequency of measurements or observations in each of
several non-overlapping classes;
• may also be displayed graphically, or by a statement, algebraic
formula or rule pairing a class of observations with its frequency;
• can be constructed for both qualitative data and quantitative data.
Steps of constructing frequency distribution
• Choose the categories into which the data are to be grouped.
• Sort or tally the items or observations into the appropriate categories.
• Count the number of items or observations falling in each category.
• Display the results in a table.

Construction of a Frequency distribution for qualitative data:

Frequency distribution for the family size data

Family size    Number of workers    Percent (%)
Large                 16                32
Medium                24                48
Small                 10                20
Total                 50               100

Here 16, 24 and 10 are called the class frequencies.
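
The tally-and-count steps can be sketched in a few lines of Python. This is only an illustration: the raw list of 50 responses is not given in these notes, so `family_sizes` below is a hypothetical list rebuilt from the class frequencies (16, 24, 10), the label "Small" for the third category is taken from the family size table later in these notes, and `frequency_table` is an assumed helper name.

```python
from collections import Counter

# Hypothetical raw observations, rebuilt from the class frequencies
# 16, 24 and 10; the original list of responses is not given.
family_sizes = ["Large"] * 16 + ["Medium"] * 24 + ["Small"] * 10


def frequency_table(observations):
    """Return {category: (frequency, percent)} for qualitative data."""
    counts = Counter(observations)      # tally the items into categories
    total = sum(counts.values())        # total number of observations
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


table = frequency_table(family_sizes)
for category, (freq, pct) in table.items():
    print(f"{category:<8}{freq:>4}{pct:>7.0f}")
```

The same helper works for any list of category labels, for example the religion of each worker.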

Frequency distribution of workers by religion

Religion    Number of workers    Percent (%)
Muslim             36                72
Hindu               9                18
                    5                10
Total              50               100

Construction of a Frequency distribution for quantitative data:

The construction of a frequency distribution for numerical or
quantitative data is very similar to that for qualitative data, except
that the data have to be grouped into classes of appropriate intervals.
The first step is to form an array (an ordering of the values of the
variable, usually in ascending order, i.e. from smallest to largest).

Table: Array of daily wage data
50 63 70 75 84
51 65 71 75 85
54 65 72 76 86
56 66 72 77 87
56 67 72 79 88
57 68 73 80 88
59 68 73 81 89
60 69 74 82 93
61 69 74 82 93
62 70 74 83 97

We see that the minimum wage is TK 50 and the maximum is TK 97, and
that some values are repeated more than once (65, 68, 69, ...).

So the ungrouped frequency distribution of the raw data will be

Wage Frequency
51 1

… …
… …
97 1
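
As a rough sketch, the full ungrouped distribution can be produced by counting how often each distinct wage occurs. The list `wages` below is the array of daily wage data read column by column, assuming that the fused entry "7175" stands for the two values 71 and 75; the variable names are illustrative.

```python
from collections import Counter

# Daily wage data (TK), read column by column from the array above.
wages = [50, 51, 54, 56, 56, 57, 59, 60, 61, 62,
         63, 65, 65, 66, 67, 68, 68, 69, 69, 70,
         70, 71, 72, 72, 72, 73, 73, 74, 74, 74,
         75, 75, 76, 77, 79, 80, 81, 82, 82, 83,
         84, 85, 86, 87, 88, 88, 89, 93, 93, 97]

ungrouped = Counter(wages)          # frequency of each distinct wage

print("Wage  Frequency")
for wage in sorted(ungrouped):
    print(f"{wage:>4}  {ungrouped[wage]:>9}")
```

With 50 observations spread over many distinct values, such a table is long, which is the motivation for grouping the values into classes.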

Formation of grouped discrete frequency distribution
• Grouping, however, has limitations too.
• One disadvantage of a grouped distribution is the loss of information.
Example: The numbers of complete days the workers were absent from
work are arranged below in an ascending array:
5 8 9 9 10 10 10 10 11 11
12 12 12 13 13 13 14 14 14 15
15 15 15 16 16 16 16 14 17 17
17 18 18 18 18 18 19 19 19 19
20 21 21 22 23 24 26 27 29 33

Solution: We refer to the unorganized information above as raw data or
ungrouped data.
The steps for organizing data into a frequency distribution:
Step 1: Find the lowest and the highest values:
The lowest: 5
The highest: 33
The range = 33 – 5 = 28
Total number of observations, N = 50
Step 2: Determine the number of classes and the class interval or width:
k = number of classes = 1 + 3.322 log10 N = 1 + 3.322 log10 50 ≈ 6.64
Sturges' rule suggests choosing the smallest k such that 2^k ≥ N; here
2^6 = 64 ≥ 50, so the rule suggests a value of k = 6.
k should not be less than 4 and should not be more than 20.
Generally the class interval or width should be the same for all classes.
Class interval:
$$ i = \frac{\text{Range}}{1 + 3.322 \log_{10} N} $$
where i is the class interval. In our example,
$$ i = \frac{28}{6.64} \approx 4.21 $$
In practice this interval size is rounded up to some convenient number,
such as a multiple of 5 or 10 or 100. The value of 5 will be used in our
example.
Step 3: Set the individual class limits.
Step 4: Tally the observations into the classes.
Step 5: Count the number of items in each class.
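
The arithmetic of Steps 1 and 2 can be checked with a short Python sketch. The function names `sturges_classes` and `smallest_k` are illustrative assumptions, not part of any library; the inputs (N = 50, lowest value 5, highest value 33) are those of the absence example.

```python
import math


def sturges_classes(n):
    """Number of classes suggested by k = 1 + 3.322 log10(N)."""
    return 1 + 3.322 * math.log10(n)


def smallest_k(n):
    """Smallest whole number k such that 2**k >= N."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


n, lowest, highest = 50, 5, 33
data_range = highest - lowest        # 33 - 5 = 28
k_exact = sturges_classes(n)         # about 6.64
k = smallest_k(n)                    # 2**6 = 64 >= 50
width = data_range / k_exact         # about 4.21, rounded up in practice

print(f"range = {data_range}, k = {k_exact:.2f} (use {k}), i = {width:.2f}")
```

The computed width is only a guide; a convenient round number at or above it is then chosen as the class interval.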

Table: Frequency distribution of the workers' absence in days

Class interval    Frequency
5 – 9                  4
10 – 14               15
15 – 19               21
20 – 24                6
25 – 29                3
30 – 34                1
For the class 15–19 (say), the class mid-point is 17 and the class width is
5. The class is read as 15 to 19 inclusive, and its width is obtained as
(19 – 15) + 1 = 5.
• For discrete distributions, the class limits are always inclusive.
Formation of continuous frequency distribution
The ages of the 50 workers are shown in the following table. Construct a
frequency distribution
25 33 37 42 45
28 34 37 42 46
29 35 37 42 46
30 35 38 43 46
31 35 38 43 46
32 36 38 43 46
32 36 38 43 47
32 36 39 44 50
32 36 40 44 51
33 36 41 44 52
33 37 42 45 54

Continuous frequency distribution of ages

Class interval    Number of workers
25 – 29                   3
30 – 34                   9
35 – 39                  15
40 – 44                  12
45 – 49                   7
50 – 54                   4
Total                    50

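The above distribution does not maintain the continuity of the age data,
since there is a gap between the upper limit of one class and the lower
limit of the next. So we have to reconstruct the frequency distribution
with the correction factor
$$ C = \frac{\text{Lower limit of second class} - \text{Upper limit of first class}}{2} = \frac{30 - 29}{2} = 0.5 $$
C is subtracted from each lower limit and added to each upper limit.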

Distribution showing true class limits

Class interval    Number of workers
24.5 – 29.5               3
29.5 – 34.5               9
34.5 – 39.5              15
39.5 – 44.5              12
44.5 – 49.5               7
49.5 – 54.5               4

• This type of classification is known as the exclusive method of
classification.
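
A minimal sketch of the correction-factor calculation, assuming the stated class limits are stored as (lower, upper) pairs; the function name `true_class_limits` is illustrative.

```python
def true_class_limits(classes):
    """Convert stated class limits to true class limits (boundaries)."""
    # Correction factor: half the gap between the upper limit of the
    # first class and the lower limit of the second class.
    c = (classes[1][0] - classes[0][1]) / 2
    # Subtract C from each lower limit and add C to each upper limit.
    return c, [(lower - c, upper + c) for lower, upper in classes]


age_classes = [(25, 29), (30, 34), (35, 39), (40, 44), (45, 49), (50, 54)]
c, boundaries = true_class_limits(age_classes)

print("C =", c)
for lower, upper in boundaries:
    print(f"{lower} – {upper}")
```

After the correction, the upper boundary of each class coincides with the lower boundary of the next, so the classes cover the age scale without gaps.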

Other forms of frequency distribution

• Percentage distribution
• Relative frequency distribution
Instead of presenting the frequencies in absolute terms, it is sometimes
convenient to express the frequencies in relative terms, as proportions
of the total. The resulting distribution is then called a relative
frequency distribution; multiplying the proportions by 100 gives the
percentage distribution.
For the wage data, the percentage frequency distribution and the relative
frequency distribution are shown below:

Class interval/    Absolute     Percentage    Relative
boundaries         frequency    frequency     frequency
49.5 – 57.5            6           12.0         0.12
57.5 – 65.5            7           14.0         0.14
65.5 – 73.5           14           28.0         0.28
73.5 – 81.5           10           20.0         0.20
81.5 – 89.5           10           20.0         0.20
89.5 – 97.5            3            6.0         0.06
Total                 50          100.0         1.00
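
The table can be reproduced from the raw wage data with a small Python sketch. It assumes the wage array given earlier (with the fused entry "7175" read as 71 and 75) and the class boundaries of the table; `grouped_frequencies` is an illustrative helper name.

```python
# Daily wage data (TK) and the class boundaries used in the table.
wages = [50, 51, 54, 56, 56, 57, 59, 60, 61, 62,
         63, 65, 65, 66, 67, 68, 68, 69, 69, 70,
         70, 71, 72, 72, 72, 73, 73, 74, 74, 74,
         75, 75, 76, 77, 79, 80, 81, 82, 82, 83,
         84, 85, 86, 87, 88, 88, 89, 93, 93, 97]
boundaries = [49.5, 57.5, 65.5, 73.5, 81.5, 89.5, 97.5]


def grouped_frequencies(values, edges):
    """Count the values falling in each class [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    return counts


freq = grouped_frequencies(wages, boundaries)
n = sum(freq)
relative = [f / n for f in freq]          # proportions of the total
percent = [100 * f / n for f in freq]     # percentages of the total

for i, f in enumerate(freq):
    print(f"{boundaries[i]} – {boundaries[i + 1]}: "
          f"{f:>2}  {percent[i]:>5.1f}  {relative[i]:.2f}")
```

The relative frequencies always add up to 1 and the percentages to 100, which is a quick check on the tallying.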

Cumulative frequency distribution

From the above table we may want to know:
• How many of the workers earn less than taka 73.5?
• How many of the workers earn taka 57.5 or more?
There are two types of cumulative frequency distribution:
1. Less than frequency distribution for wage data
Class interval/                 Cumulative    % Cumulative
boundaries         Frequency    frequency     frequency
49.5 – 57.5            6             6           12.0
57.5 – 65.5            7            13           26.0
65.5 – 73.5           14            27           54.0
73.5 – 81.5           10            37           74.0
81.5 – 89.5           10            47           94.0
89.5 – 97.5            3            50          100.0
Total                 50

Thus the cumulative frequency 27 in column 3 states that 27 workers
received less than taka 73.5. In other words, 54% of the workers
received less than taka 73.5.
2. More than frequency distribution for wage data
Class interval/                 Cumulative    % Cumulative
boundaries         Frequency    frequency     frequency
49.5 – 57.5            6            50          100.0
57.5 – 65.5            7            44           88.0
65.5 – 73.5           14            37           74.0
73.5 – 81.5           10            23           46.0
81.5 – 89.5           10            13           26.0
89.5 – 97.5            3             3            6.0
Total                 50

Out of 50 workers, 44 (88%) receive a sum of taka 57.5 or more.
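
Both cumulative distributions follow from the class frequencies by running totals, one taken from the top of the table and one from the bottom. A brief sketch using the standard-library `itertools.accumulate`; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies of the wage data, lowest class first.
freq = [6, 7, 14, 10, 10, 3]
n = sum(freq)

# "Less than" type: running total from the first class downwards.
less_than = list(accumulate(freq))
# "More than" type: running total from the last class upwards.
more_than = list(accumulate(freq[::-1]))[::-1]

less_than_pct = [100 * c / n for c in less_than]
more_than_pct = [100 * c / n for c in more_than]

print(less_than)   # workers earning less than each upper boundary
print(more_than)   # workers earning each lower boundary or more
```

Here `less_than[2]` is the number of workers earning less than taka 73.5, and `more_than[1]` is the number earning taka 57.5 or more.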

The tables discussed above are called univariate frequency distributions
because they involve only one variable. Any statistical analysis based on
these tables is called univariate analysis.

Frequently, more than one variable is studied simultaneously to
establish the causal relationship among the variables. For example, we
might be interested in studying the relationship between the education of
the workers and their family size. Such an analysis is facilitated by
constructing what is called a bivariate frequency distribution.
When such tables are constructed with qualitative data, the resulting
table is called a contingency table. The simplest method of looking at
relations between variables in a contingency table is to do a percentage
comparison based on the row totals, column totals or the overall total.

Family size and education level of 50 workers

Family size
Education Large Medium Small Total
None 4 6 1 11
Primary 6 8 5 19
Higher 6 10 4 20
Total 16 24 10 50

The above table is a contingency table of two categorical (qualitative)
variables. Since two variables are involved in the construction of the
above table, the table is known as a bivariate (or 2-way) table. Since both
variables have three levels, it is also known as a 3x3 (read three by three)
cross table.
This table is intended to answer questions of the type:
Does education have any effect on the family size?
Here, family size is the dependent variable (problem) and education is
an independent variable (the factor).
The totals in the columns and rows are called the marginal frequencies.
The distributions formed with these frequencies are known as the marginal
distributions.
Marginal distribution of respondents by level of education

Education    Total    Percent
None           11        22
Primary        19        38
Higher         20        40
Total          50       100
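
The marginal frequencies and a row-percentage comparison can be computed directly from the cell counts of the contingency table. The sketch below stores the table as a nested dictionary; this representation and the variable names are assumptions made for illustration.

```python
# Cell counts of the education by family size contingency table.
table = {
    "None":    {"Large": 4, "Medium": 6,  "Small": 1},
    "Primary": {"Large": 6, "Medium": 8,  "Small": 5},
    "Higher":  {"Large": 6, "Medium": 10, "Small": 4},
}

# Marginal frequencies: row totals (education) and column totals (family size).
row_totals = {edu: sum(cells.values()) for edu, cells in table.items()}
col_totals = {}
for cells in table.values():
    for size, count in cells.items():
        col_totals[size] = col_totals.get(size, 0) + count
grand_total = sum(row_totals.values())

# Marginal distribution of education, in percent of all respondents.
education_pct = {edu: 100 * t / grand_total for edu, t in row_totals.items()}

# Percentage comparison based on the row totals.
row_pct = {
    edu: {size: round(100 * count / row_totals[edu], 1)
          for size, count in cells.items()}
    for edu, cells in table.items()
}

print(row_totals, col_totals, grand_total)
print(education_pct)
print(row_pct)
```

Row percentages show how family size is distributed within each education level, which is the comparison suited to treating education as the independent variable.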

In a similar way, one can think of a trivariate table, in which three
variables are involved: one dependent and two independent variables.
The concept can be extended to multivariate cases, which involve
several variables.

A trivariate table for education and family size by religion

                              Family size
Religion      Education    Large    Medium    Small    Total
Muslim        None            4        3         1        8
              Primary         5        5         3       13
              Higher          4        7         4       15
              Total          13       15         8       36
Non-Muslim    None            0        3         0        3
              Primary         1        3         2        6
              Higher          2        3         0        5
              Total           3        9         2       14
Suppose that from the above table, we infer by some statistical testing
that workers with more education are more likely to have a smaller family
than those who are less educated. One might argue that this difference is
due to religion. To test this claim, one can form a table that displays the
relationship between education and family size for each religion category.
This table is a trivariate table. Religion, in this particular instance, is
known as the control variable.
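
As an illustrative sketch, the trivariate table can be held as one education by family size layer per religion category. Adding the layers together recovers the bivariate table, and a percentage can then be compared within each layer. The function names `collapse` and `percent_small` are assumptions, and the percentage of small families is just one possible comparison.

```python
# One layer per religion; counts are in the order (Large, Medium, Small).
by_religion = {
    "Muslim":     {"None": (4, 3, 1), "Primary": (5, 5, 3), "Higher": (4, 7, 4)},
    "Non-Muslim": {"None": (0, 3, 0), "Primary": (1, 3, 2), "Higher": (2, 3, 0)},
}


def collapse(layers):
    """Add the layers cell by cell, ignoring the control variable."""
    combined = {}
    for layer in layers.values():
        for edu, counts in layer.items():
            previous = combined.get(edu, (0,) * len(counts))
            combined[edu] = tuple(a + b for a, b in zip(previous, counts))
    return combined


def percent_small(layer):
    """Percentage of small families within each education level."""
    return {edu: round(100 * counts[2] / sum(counts), 1)
            for edu, counts in layer.items()}


bivariate = collapse(by_religion)
small_by_religion = {rel: percent_small(layer)
                     for rel, layer in by_religion.items()}

print(bivariate)
print(small_by_religion)
```

Comparing the percentages layer by layer shows whether the education and family size pattern looks the same within each religion category.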

Presenting data by graphs and diagrams

• The most common forms of diagrams for presenting categorical
(qualitative) data are:
   • Bar diagram
   • Pie diagram
   • Multiple bar diagram
   • Component bar diagram
   • Pareto diagram

Bar diagram/Bar chart: The classes are reported on the horizontal
axis and the class frequencies on the vertical axis.

Figure: Bar chart of Laptop sold (horizontal axis: Laptop Type, with
categories Foreign and Domestic)

Pie diagram/Pie chart: A chart that shows the proportion or
percentage that each class represents of the total number of frequencies.

Figure: Pie chart of Laptop sold by Daffodil IT Dhaka

Diagrams for discrete data
• Dot diagram
Diagrams for continuous data
• Histogram
Histogram: A graph in which the classes are marked on the horizontal
axis and the class frequencies on the vertical axis.

Figure: Histogram of the selling price of the Laptops (horizontal axis:
selling price in thousand TK, classes 14 – 24 to 64 – 74; vertical axis:
number of laptops)

Differences between a Bar Diagram and A Histogram?

Stem and leaf plot
The stem and leaf plot is a simple device for constructing a
histogram-like picture of a frequency distribution. It allows us to use
the information contained in a frequency distribution to show the range
and concentrations of the scores, the shape of the distribution, and the
presence of any outliers in the distribution.
Compared to other graphical methods, the stem and leaf plot is an easy
and quick way of displaying data.
The following data represent the marks obtained by 20 students in a
statistics test:
84 17 38 45 47 53 76 54 75 22 66 65 55 54 51 33 39 19 54 72
Use a stem and leaf plot to display the data.
Here, the lowest score is 17 and the highest score is 84. The leading
digit (tens) of each score is taken as the stem and the trailing digit
(units) as the leaf; for the score 84, for example, the stem is 8 and the
leaf is 4.

The complete diagram in unordered sequence is

Stem | Leaf
1    | 7 9
2    | 2
3    | 8 3 9
4    | 5 7
5    | 3 4 5 4 1 4
6    | 6 5
7    | 6 5 2
8    | 4

The final figure in ordered sequence is
Stem | Leaf
1    | 7 9
2    | 2
3    | 3 8 9
4    | 5 7
5    | 1 3 4 4 4 5
6    | 5 6
7    | 2 5 6
8    | 4
There are more scores in the fifties than any other group; 8 scores are
less than 50, and only 4 scores are above 70.
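
The construction can be sketched in Python for two-digit scores: each value is split into a tens digit (stem) and a units digit (leaf). The function name `stem_and_leaf` and its `ordered` flag are illustrative assumptions.

```python
marks = [84, 17, 38, 45, 47, 53, 76, 54, 75, 22,
         66, 65, 55, 54, 51, 33, 39, 19, 54, 72]


def stem_and_leaf(values, ordered=True):
    """Group two-digit values by tens digit (stem); leaves are units digits."""
    stems = {}
    for value in values:
        stem, leaf = divmod(value, 10)       # e.g. 84 -> stem 8, leaf 4
        stems.setdefault(stem, []).append(leaf)
    if ordered:
        for leaves in stems.values():
            leaves.sort()                    # ordered sequence of leaves
    return dict(sorted(stems.items()))       # stems in ascending order


plot = stem_and_leaf(marks)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Calling `stem_and_leaf(marks, ordered=False)` keeps the leaves in the order in which the scores appear, which corresponds to the unordered diagram.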

Frequency polygon
A frequency polygon also shows the shape of a distribution and is
similar to a histogram. The midpoint of each class is scaled on the
X-axis and the class frequencies on the Y-axis.

Selling price                     Number of laptops sold
(in thousand taka)    Midpoint    (frequency)
14 – 24                  19              5
24 – 34                  29              2
34 – 44                  39              7
44 – 54                  49              6
54 – 64                  59              5
64 – 74                  69              5
Total                                   30
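
The points of the frequency polygon are simply (midpoint, frequency) pairs, with each midpoint the average of the two class limits. A short sketch with illustrative variable names:

```python
# Selling price classes (thousand taka) and their frequencies.
classes = [(14, 24), (24, 34), (34, 44), (44, 54), (54, 64), (64, 74)]
frequencies = [5, 2, 7, 6, 5, 5]

# Class midpoint = (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Points to be joined by straight lines: X = midpoint, Y = frequency.
polygon_points = list(zip(midpoints, frequencies))

for x, y in polygon_points:
    print(f"midpoint {x:.0f}: frequency {y}")
```

Joining these points in order of the midpoints gives the frequency polygon.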

Construct a cumulative frequency polygon (ogive).

Cumulative frequency distribution for selling price of laptops

Selling price         Number of laptops    Cumulative frequency
(in thousand taka)    sold (frequency)     Top       Bottom
14 – 24                      5               5         30
24 – 34                      2               7         25
34 – 44                      7              14         23
44 – 54                      6              20         16
54 – 64                      5              25         10
64 – 74                      5              30          5
Total                       30

Line diagram
A line graph is particularly useful for numerical data if we wish to show
time series data.
Census of Population of Bangladesh: 1901-1991
Year Population (in million)
1901 28.9
1911 31.6
1921 33.2
1931 35.6
1941 42
1951 44.2
1961 55.2
1974 76.4
1981 89.9
1991 111.5


Figure: Line diagram of population (in million) by census year


Scatter diagram
Scatter diagrams are useful for displaying information on two quantitative
variables which are believed to be inter-related. Height and weight, age
and height, and income and expenditure are some data sets that are
assumed to be related to each other. The data below relate to the age at
marriage of 20 couples obtained in a survey. A scatter diagram displays
these data. The diagram clearly demonstrates that as the age of the
husband increases, the wife's age also increases, thus implying that a
positive relationship exists between husband's age and wife's age.


Figure: Scatter diagram of wife's age against husband's age

Solve self-review 2-1 (Page: 26), self-review 2-2 (Page: 31), self-review 2-4
(Page: 33), Exercises (15-18, Page 39), self-review 2-6 (Page: 42).
