
Processing of Data

The document provides an overview of data processing in statistics, focusing on the definition and types of variables, including qualitative and quantitative variables. It discusses classification of data based on geographical, chronological, qualitative, and quantitative criteria, as well as methods for forming frequency distributions and cumulative frequency distributions. Additionally, it outlines principles for classifying data into intervals and the structure of statistical tables.

Uploaded by

sabbir.00769
Copyright © All Rights Reserved

Bangladesh University of Business and Technology

Course Title: Introduction to Statistics


Chapter: Processing of Data

Variable
If we observe a characteristic and find that it takes different values in different
persons, places or things, we call the characteristic a variable. We do this simply
because the characteristic is not the same when observed in the different individuals
or objects that possess it.

Example

i. The heights of adult males


ii. The weights of preschool children

Types of Variables

Variable
   Qualitative, e.g. hair color
   Quantitative
      Discrete, e.g. children in a family
      Continuous, e.g. weight of a student

Types of Variables

There are two basic types of variables

1. Qualitative

2. Quantitative

Qualitative Variable or Attribute
When the characteristic being studied is nonnumeric, it is called a qualitative variable or
an attribute.

Examples
Examples of qualitative variables are gender, religious affiliation, type of automobile
owned, state of birth and eye color.

When the data are qualitative, we are usually interested in how many or what proportion
fall in each category. For example, what percent of the population has blue eyes? How
many Catholics and how many Protestants are there in the United States?
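The counting of categories described above can be sketched in Python. This is only an illustration: the eye-color list below is hypothetical sample data, not taken from the text, and `Counter` from the standard library does the per-category counting.

```python
from collections import Counter

# Hypothetical sample of eye colors (illustrative data only).
eye_colors = ["brown", "blue", "brown", "green", "blue", "brown", "brown", "blue"]

counts = Counter(eye_colors)       # how many observations fall in each category
n = len(eye_colors)
proportions = {color: count / n for color, count in counts.items()}

for color, count in counts.most_common():
    print(f"{color:<6} {count:>2}  {proportions[color]:.1%}")
```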

Quantitative Variable
When the variable studied can be reported numerically, the variable is called a
quantitative variable.

Examples
Examples of quantitative variables are the balance in your checking account, the ages of
company presidents, the life of an automobile battery (such as 42 months) and the number
of children in a family.

Quantitative variables are either discrete or continuous.

i. Discrete Variables

Discrete variables can assume only certain values, and there are usually “gaps” between
the values. Examples of discrete variables are the number of bedrooms in a house (1, 2, 3,
4, etc.), the number of cars arriving and the number of students in each section of a
course (25 in section A, 42 in section B and 18 in section C).

ii. Continuous Variables

Observations of a continuous variable can assume any value within a specific range.
Examples of continuous variables are the air pressure in a tire and the weight of a
shipment of tomatoes.

Classification of Data
After the data have been collected and edited, an important step in processing them is
classification.

Types of Classification
Broadly, data can be classified on the following four bases:

i. Geographical, i.e. area-wise, e.g. cities, districts, etc.
ii. Chronological, i.e. on the basis of time
iii. Qualitative, i.e. according to some attributes
iv. Quantitative, i.e. in terms of magnitudes

Classification of Data
   Geographical Classification
   Chronological Classification
   Qualitative Classification
   Quantitative Classification

i. Geographical Classification

In geographical classification, data are classified on the basis of geographical or
locational differences between the various items. For example, presenting the production
of sugarcane, wheat, rice, etc. for various states would be called geographical
classification.

Geographical classifications are usually listed in alphabetical order for easy reference.
Items may also be listed by size to emphasize the important areas, as in ranking the
states by population.

ii. Chronological Classification

When data are observed over a period of time, the classification is known as
chronological classification. For example, the sales figures of a company are given
below:

Year    Sales (Tk. lakhs)
2000    18810
2001    23601
2002    23816
2003    32435
2004    39343

iii. Qualitative Classification

In qualitative classification, data are classified on the basis of some attribute or quality
such as sex, color of hair, literacy, religion, etc. The point to note in this type of
classification is that the attribute cannot be measured; we can only note whether it is
present or absent in each unit. For example, if the attribute under study is blindness,
we may find out how many persons in a given population are blind and how many are not.

Population
   Blind
   Non-blind

iv. Quantitative Classification

Quantitative classification refers to the classification of data according to some
characteristic that can be measured, such as height, weight, income, sales, etc. For
example, the workers of a factory may be classified according to wages as follows:

Monthly Wages (Tk.)    No. of Workers
1500-1600              50
1600-1700              200
1700-1800              260

Formation of a Frequency Distribution

The process of preparing a frequency distribution of individual values is very simple.
We just count the number of times each value is repeated; this count is called the
frequency of that value (or class). To make counting easier, prepare a column of “tally”
marks. In another column, place all possible values of the variable from the lowest to
the highest. Then, for each observation, put a bar (vertical line) opposite the value to
which it relates. Conventionally, every fifth bar is drawn across the preceding four, so
that the bars are grouped in fives.

Finally, count the number of bars corresponding to each value of the variable and place
the count in the frequency column.

Example
The number of refrigerators sold on 22 working days by a leading agency house:

23 30 20 26 30 20 23 40 40 26 20 30

23 40 28 26 23 40 28 28 30 30

Frequency distribution of the number of refrigerators sold

No. of Refrigerators    Tally    Frequency (No. of Days)
20                      |||      3
23                      ||||     4
26                      |||      3
28                      |||      3
30                      ||||/    5
40                      ||||     4

The table clearly shows that 20 refrigerators were sold on each of 3 days, 23
refrigerators on each of 4 days, and so on.
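The counting procedure can be checked with a short Python sketch using the 22 sales figures from the example. Drawing a group of five as `||||/` is an assumption about how to render tally marks in plain text, and the helper name `tally` is hypothetical.

```python
from collections import Counter

# Refrigerators sold on each of the 22 working days (data from the example above).
sales = [23, 30, 20, 26, 30, 20, 23, 40, 40, 26, 20, 30,
         23, 40, 28, 26, 23, 40, 28, 28, 30, 30]

def tally(count):
    """Tally marks: each group of five drawn as '||||/', the remainder as single bars."""
    groups, rest = divmod(count, 5)
    return " ".join(["||||/"] * groups + (["|" * rest] if rest else []))

freq = Counter(sales)
print(f"{'Refrigerators':<14}{'Tally':<10}{'Days':>4}")
for value in sorted(freq):          # values listed from lowest to highest
    print(f"{value:<14}{tally(freq[value]):<10}{freq[value]:>4}")
print(f"{'Total':<24}{sum(freq.values()):>4}")
```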

This method of classification condenses the data only when values are largely repeated;
otherwise there will be hardly any condensation. To make the series more compact, so that
its characteristics can be studied easily, the data may be classified according to class
intervals.

Cumulative Frequency
In some situations we may be interested not in the frequencies of the various classes,
but in the frequencies or proportions of observations that are “less than” or “greater
than” a given value. This leads to a cumulative frequency distribution, which is derived
from a frequency distribution by forming a cumulative frequency column. For a “less than”
distribution, this column is computed by adding the successive class frequencies from top
to bottom: the entry for the top interval is the frequency of that class, the entry
opposite the second interval is the sum of the frequencies of the first and second
classes, and so on.

If we divide each cumulative frequency by N, the total number of observations, we get
the relative cumulative frequencies, which are often expressed as percentages.

Class    f     c.f.    Relative c.f.
0-10     4     4       4/96
10-20    12    16      16/96
20-30    24    40      40/96
30-40    36    76      76/96
40-50    20    96      96/96
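The cumulative column in the table above is a running sum, so it can be sketched with `itertools.accumulate`; dividing by N gives the relative cumulative frequencies. This is a minimal illustration of a “less than” distribution using the table's own figures.

```python
from fractions import Fraction
from itertools import accumulate

# Class intervals and frequencies from the table above.
classes = ["0-10", "10-20", "20-30", "30-40", "40-50"]
freqs = [4, 12, 24, 36, 20]

cum_freqs = list(accumulate(freqs))      # running "less than" totals, top to bottom
N = cum_freqs[-1]                        # total number of observations
relative = [Fraction(cf, N) for cf in cum_freqs]

for cls, f, cf, rel in zip(classes, freqs, cum_freqs, relative):
    print(f"{cls:<6}{f:>4}{cf:>5}  {cf}/{N} = {float(rel):.1%}")
```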

Classification according to class intervals

This type of classification is the most popular in practice. The following technical
terms are important when data are classified according to class intervals:

i. Class limits

The class limits are the lowest and the highest values that can be included in a class.
For example, take the class 20-40: the lowest value of this class is 20 and the highest
is 40. The two boundaries of a class are known as the lower limit and the upper limit of
the class. The lower limit of a class is the value below which there can be no value in
that class, and the upper limit is the value above which no value can belong to that
class. In the class 70-89, 70 is the lower limit and 89 is the upper limit, i.e. this
class can contain no value less than 70 or more than 89. Similarly, in the class 90-109
there can be no value less than 90 or more than 109.

ii. Class intervals

The span of a class, that is, the difference between the upper limit and the lower limit,
is known as the class interval. For example, in the class 20-40, the class interval is 20
(i.e. 40 minus 20). The size of the class interval is determined by the number of classes
and the total range of the data.

iii. Class frequency

The number of observations corresponding to a particular class is known as the frequency
of that class or the class frequency.

iv. Class mid-point

It is the value lying half-way between the lower and the upper limits of a class. The
mid-point of a class is calculated as follows:

Mid-point of a class = (Upper limit of the class + Lower limit of the class) / 2
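The class interval and mid-point definitions above translate directly into two small functions. A minimal sketch: classes are written as hypothetical `(lower, upper)` pairs, and the function names are illustrative rather than standard.

```python
# Classes written as (lower limit, upper limit); 20-40 is the example from the text.
classes = [(0, 10), (10, 20), (20, 40)]

def class_interval(lower, upper):
    """Width of a class: upper limit minus lower limit."""
    return upper - lower

def mid_point(lower, upper):
    """Value half-way between the lower and upper class limits."""
    return (upper + lower) / 2

for lower, upper in classes:
    print(f"{lower}-{upper}: interval = {class_interval(lower, upper)}, "
          f"mid-point = {mid_point(lower, upper)}")
```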

Methods of classifying the data according to class interval
There are two methods of classifying data according to class intervals, namely:

a. Exclusive method
b. Inclusive method

a. Exclusive Method

When the class intervals are fixed so that the upper limit of one class is the lower limit
of the next class, the method is known as the 'Exclusive' method of classification. The
following data are classified on this basis:

Income (Tk.)    No. of Employees
1800-1900       50
1900-2000       100
2000-2200       200

The 'Exclusive' method ensures continuity of the data, since the upper limit of one class
is the lower limit of the next class. Thus, in the above example, there are 50 persons
whose income is between Tk. 1800 and Tk. 1899.99. A person earning exactly Tk. 1900
would be included in the class 1900-2000.

Whenever this method is used, clear instructions should be given in the questionnaire
about where a value equal to a class limit belongs. The reader should note that when
class intervals are written as 0-10, 10-20, etc., the upper limit is always presumed to be
exclusive, i.e. an observation exactly equal to the upper limit is not included in that
class.
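The exclusive rule (lower limit included, upper limit excluded) can be sketched with the standard-library `bisect` module, using the income classes from the table above. The function name `exclusive_class` is hypothetical.

```python
import bisect

# Lower limits of the classes 1800-1900, 1900-2000 and 2000-2200 (last upper limit 2200).
lower_limits = [1800, 1900, 2000]
last_upper = 2200

def exclusive_class(income):
    """Return the (lower, upper) class containing income; the upper limit is excluded."""
    if not lower_limits[0] <= income < last_upper:
        raise ValueError(f"{income} lies outside the classified range")
    i = bisect.bisect_right(lower_limits, income) - 1
    upper = lower_limits[i + 1] if i + 1 < len(lower_limits) else last_upper
    return lower_limits[i], upper

print(exclusive_class(1899.99))  # (1800, 1900)
print(exclusive_class(1900))     # (1900, 2000): a value equal to an upper limit goes up
```

`bisect_right` places a value that equals a lower limit into the class starting at that limit, which is exactly the exclusive convention.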

b. Inclusive method

Under the 'Inclusive' method of classification, the upper limit of one class is included
in that class itself.

Income (Tk.)    No. of Employees
800-899         50
900-999         100
1000-1099       200

In the class 800-899 we include persons whose income is between Tk. 800 and Tk. 899.
A person whose income is exactly Tk. 900 is included in the next class.
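Under the inclusive method both limits belong to the class, so a lookup checks `lower <= x <= upper`. This sketch assumes incomes are recorded in whole taka (which is why the gaps such as 899 to 900 do not matter); the function name `inclusive_class` is hypothetical.

```python
# Classes from the inclusive-method table above: both limits belong to the class.
classes = [(800, 899), (900, 999), (1000, 1099)]

def inclusive_class(income):
    """Return the class whose limits satisfy lower <= income <= upper, or None."""
    for lower, upper in classes:
        if lower <= income <= upper:
            return lower, upper
    return None  # a non-integer income such as 899.5 falls in a gap between classes

print(inclusive_class(899))   # (800, 899)
print(inclusive_class(900))   # (900, 999)
```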

Principles of Classification

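It is difficult to lay down hard and fast rules for classifying data, since the appropriate
type of classification depends on the purpose of the study. However, the following
principles are generally useful: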

1. The number of classes should preferably be between 5 and 15. However, there
is no rigidity about it. The classes can be more than 15 depending upon the total number
of observations in the series and the details required, but they should not be less than five
because in that case the classification may not reveal the essential characteristics.

Sturges suggested the following formula for determining the approximate number of
classes:

K = 1 + 3.322 log N

where
K = the approximate number of classes
N = the total number of observations
log = the common logarithm (to base 10)

However, the precise number of classes to be used for a given variable depends on
personal judgment and other considerations, such as the detail required and the ease of
calculation in further statistical work.

2. As far as possible, one should avoid awkward values of class intervals, e.g. 3, 7, 11,
26, 39, etc. Preferably, class intervals should be five or multiples of five, such as 10,
20, 25, 100, etc.

3. The starting point, i.e. the lower limit of the first class, should be zero, 5 or a
multiple of 5. For example, if the lowest value of the series is 63 and we have taken a
class interval of 10, then the first class should be 60-70 instead of 63-73. Similarly, if
the lowest value of the series is 76 and the class interval is 5, then the first class
should be 75-80 rather than 76-81.
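The three principles above can be sketched as small helper functions. This is one possible reading, not a fixed rule: "prefer multiples of five" is interpreted here as rounding a computed width up to the next multiple of 5, and the function names are hypothetical.

```python
import math

def sturges_classes(n):
    """Sturges' formula: approximate number of classes K = 1 + 3.322 log10(N)."""
    return 1 + 3.322 * math.log10(n)

def suitable_interval(raw_width, step=5):
    """Round a computed class width up to the next multiple of `step` (principle 2)."""
    return step * math.ceil(raw_width / step)

def starting_point(lowest, step=5):
    """Largest multiple of `step` not exceeding the lowest value (principle 3)."""
    return step * math.floor(lowest / step)

for n in (30, 100, 1000):
    print(f"N = {n:>4}: K = {sturges_classes(n):.2f}")

print(starting_point(63), starting_point(76))             # 60 75
print(suitable_interval(8.97), suitable_interval(9.64))   # 10 10
```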

Example
The profits (in lakhs of Tk.) of 30 Bangladeshi companies for the year 2005-2006 are
given below:

18 16 23 37 35 49 63 65 55

45 58 57 69 20 22 35 42 37

42 48 53 49 65 39 48 67 25

29 58 65

Classify the above data taking a suitable class interval.

Solution
Let us determine a suitable class interval with the help of the following formula:

i = Range / K

where K = 1 + 3.322 log N and Range = Highest value - Lowest value.

We have N = 30, Highest value = 69 and Lowest value = 16, so

K = 1 + 3.322 log 30 = 5.91 ≈ 6,   Range = 69 - 16 = 53

i = Range / K = 53 / 5.91 ≈ 8.97, or about 9

Since awkward values such as 3, 7, 9, etc. should be avoided, we take 10 as the class
interval; as the lowest value is 16, the first class is 15-25.

Frequency Distribution of the Profits

Profit (Tk. lakhs)    Tally       No. of Companies
15-25                 ||||/       5
25-35                 ||          2
35-45                 ||||/ ||    7
45-55                 ||||/ |     6
55-65                 ||||/       5
65-75                 ||||/       5
Total                             30
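The tally counts can be verified with a short sketch. Under the exclusive method, the class index of a value is `(value - start) // width`, so a value such as 25 or 35 falls into the class it starts, not the one it ends.

```python
# Profits (Tk. lakhs) of the 30 companies in the example above.
profits = [18, 16, 23, 37, 35, 49, 63, 65, 55,
           45, 58, 57, 69, 20, 22, 35, 42, 37,
           42, 48, 53, 49, 65, 39, 48, 67, 25,
           29, 58, 65]

start, width, n_classes = 15, 10, 6   # first class 15-25, class interval 10

counts = [0] * n_classes
for p in profits:
    counts[(p - start) // width] += 1  # exclusive method: 25 goes to 25-35, not 15-25

for k, c in enumerate(counts):
    lower = start + k * width
    print(f"{lower}-{lower + width}: {c}")
print("Total:", sum(counts))
```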

Example
The following are the marks of 30 students in statistics. Prepare a frequency
distribution taking a suitable class interval.

12 33 23 25 18 35 37 49 54 51 37 15

27 33 42 45 47 55 69 65 63 46 29 18

37 45 46 59 29 55

Solution
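Let us determine a suitable class interval with the help of the following formula:

i = Range / K

where K = 1 + 3.322 log N and Range = Highest value - Lowest value.

We have N = 30, Highest value = 69 and Lowest value = 12, so

K = 1 + 3.322 log 30 = 5.91 ≈ 6,   Range = 69 - 12 = 57

i = Range / K = 57 / 5.91 ≈ 9.64, or about 10

We therefore take 10 as the class interval; as the lowest value is 12, the first class
is 10-20.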

Frequency Distribution of the Marks in Statistics

Marks    Tally       Frequency
10-20    ||||        4
20-30    ||||/       5
30-40    ||||/ |     6
40-50    ||||/ ||    7
50-60    ||||/       5
60-70    |||         3
Total                30
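The whole procedure of this example (Sturges' K, Range / K, a rounded interval, a starting point and the class counts) can be chained together in one sketch. Rounding the width up to a multiple of 5 and flooring the start to a multiple of 5 are assumptions that implement the principles of classification in one particular way.

```python
import math

# Marks of the 30 students from the example above.
marks = [12, 33, 23, 25, 18, 35, 37, 49, 54, 51, 37, 15,
         27, 33, 42, 45, 47, 55, 69, 65, 63, 46, 29, 18,
         37, 45, 46, 59, 29, 55]

n = len(marks)
k = 1 + 3.322 * math.log10(n)                 # Sturges: about 5.91
raw_width = (max(marks) - min(marks)) / k     # Range / K, roughly 9.6
width = 5 * math.ceil(raw_width / 5)          # round up to a multiple of 5 -> 10
start = 5 * (min(marks) // 5)                 # lowest value 12 -> start at 10

n_classes = (max(marks) - start) // width + 1  # enough classes to hold the maximum
counts = [0] * n_classes
for m in marks:
    counts[(m - start) // width] += 1          # exclusive classes: 20 goes to 20-30

for i, c in enumerate(counts):
    lower = start + i * width
    print(f"{lower}-{lower + width}: {c}")
```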

Tabulation of Data
One of the simplest and most revealing devices for summarizing data and presenting them
in a meaningful fashion is the statistical table. A table is a systematic arrangement of
statistical data in columns and rows. Rows are horizontal arrangements, whereas columns
are vertical ones.

Parts of a table
The various parts of a table may vary from case to case depending upon the given data.
But a good table must contain at least the following parts:
1. Table number
2. Title of the table
3. Caption
4. Stub
5. Body of the table
6. Head note
7. Footnote

1. Table number
Each table should be numbered. Practices differ as to where this number is placed: it
may be given at the top centre above the title, at the side of the table at the top, or
at the bottom of the table on the left-hand side.
2. Title of the table
Every table must have a suitable title.
3. Caption
Captions are the column headings. They explain what each column represents. A caption
may consist of one or more column headings, and under a column heading there may be
sub-heads.
4. Stub
As distinguished from captions, stubs are the designations of the rows, i.e. the row
headings.
5. Body
The body of the table contains the numerical information. This is the most vital part of
the table.
6. Head note
It is used to explain certain points relating to the whole table that have not been
included in the title, captions or stubs. For example, the unit of measurement is
frequently written as the head note, such as “in thousands”, “in millions” or “in crores”.
7. Footnote
Anything in a table that the reader may find difficult to understand from the title,
captions and stubs should be explained in footnotes.

Types of Tables

Tables may broadly be classified into two categories:

1. Simple and complex tables, and
2. General purpose and special purpose (or summary) tables

1. Simple and Complex tables

i. Simple table or one way table

In this type of table only one characteristic is shown. This is the simplest kind of
table. The following is an illustration of such a table:

Number of Employees in an Organization by Age Group

Age (in years)    No. of Employees
Below 25          50
25-35             67
35-45             43
45-55             15
55 and above      5
Total             180

ii. Two-way Table

Such a table shows two characteristics and is formed when either the stub or the caption
is divided into two coordinate parts.

Number of Employees in an Organization According to Age and Sex

                  Employees
Age (in years)    Males    Females    Total
Below 25          32       18         50
25-35             40       27         67
35-45             25       18         43
45-55             10       5          15
55 & above        5        -          5
Total             112      68         180

iii. Higher order table

When three or more characteristics are represented in the same table, such a table is
called higher order table.
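The row and column totals of the two-way table above can be computed from its cells, which is a quick way to check a table before publishing it. In this sketch the "-" entry for females aged 55 and above is assumed to mean nil and is stored as 0.

```python
# Employees by age group and sex, from the two-way table above ("-" stored as 0).
table = {
    "Below 25":   {"Males": 32, "Females": 18},
    "25-35":      {"Males": 40, "Females": 27},
    "35-45":      {"Males": 25, "Females": 18},
    "45-55":      {"Males": 10, "Females": 5},
    "55 & above": {"Males": 5,  "Females": 0},
}

columns = ["Males", "Females"]
row_totals = {age: sum(cells.values()) for age, cells in table.items()}
col_totals = {c: sum(cells[c] for cells in table.values()) for c in columns}
grand_total = sum(row_totals.values())

print(f"{'Age (years)':<13}{'Males':>7}{'Females':>9}{'Total':>7}")
for age, cells in table.items():
    print(f"{age:<13}{cells['Males']:>7}{cells['Females']:>9}{row_totals[age]:>7}")
print(f"{'Total':<13}{col_totals['Males']:>7}{col_totals['Females']:>9}{grand_total:>7}")
```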

2. General Purpose and Special Purpose Tables

General purpose tables, also known as reference tables or repository tables, provide
information for general use or reference.

Special purpose tables, also known as summary or analytical tables, provide information
for a particular discussion. They show relationships between different groups of figures.

Example

Table showing percentage of coffee drinkers

                   Town X                     Town Y
Attributes         Males   Females   Total    Males   Females   Total
Coffee drinkers    16      9         25       18      10        28
Non-drinkers       36      39        75       37      35        72
Total              52      48        100      55      45        100

Charting Data
A chart can take the shape of either a diagram or a graph. For the sake of clarity we will
discuss them under two separate heads:

i. Diagrams
ii. Graphs

Diagrams
For representing data, diagrams are more commonly used than graphs.

General Rules for Constructing Diagrams

1. Title
Every diagram must be given a suitable title. The title should convey, in as few words as
possible, the main idea that the diagram is intended to portray.

2. Proportion between width and height

A proper proportion between the height and width of the diagram should be maintained.
If either the height or the width is too short or too long in proportion, the diagram
will look unattractive.

3. Selection of appropriate scale

The scale showing the values should be in even numbers or in multiples of five or ten,
e.g. 25, 50, 75 or 20, 40, 60. Odd values like 1, 3, 5 and 7 should be avoided.

4. Footnotes
In order to clarify certain points about the diagrams footnotes may be given at the bottom
of the diagram.

5. Index
An index illustrating the different types of lines, shades or colors used should be
given, so that the reader can easily make out the meaning of the diagram.

6. Neatness and cleanliness

Diagrams should be absolutely neat and clean.

7. Simplicity
Diagrams should be as simple as possible so that the reader can understand their meaning
clearly.

Types of Diagrams
   One-dimensional diagrams: bar diagrams
   Two-dimensional diagrams: rectangles, squares and circles
   Pictograms and cartograms

One- dimensional or Bar Diagrams

Bar diagrams are the most common type of diagrams used in practice. A bar is a thick
line whose width is shown merely for attention. They are called one-dimensional because
it is only the length of the bar that matters and not the width.

Points to be kept in mind while constructing Bar Diagrams

1. The width of the bars should be uniform throughout the diagram.

2. The gap between one bar and another bar should be uniform throughout.

3. Bars may be either horizontal or vertical. Vertical bars are generally preferred
because they look better and also facilitate comparison.

4. While constructing the bar diagrams, it is desirable to write the respective figure
at the end of each bar so that the reader can know the precise value without
looking at the scale.

Types of Bar Diagrams

1. Simple Bar Diagrams
2. Sub-divided Bar Diagrams
3. Multiple Bar Diagrams
4. Percentage Bar Diagrams
5. Deviation Bar Diagrams
6. Broken Bar Diagrams

Simple Bar Diagrams

A simple bar diagram is used to represent only one variable. For example, the figures of
sales, production, population, etc. for various years may be shown by means of a simple
bar diagram. However, an important limitation of such diagrams is that they can present
only one classification or one category of data.

Example

The funds flow of Goodwill India Ltd from 1991-92 to 1995-96 is given below:

Year       Funds Flow (Rs. crores)
1991-92    85.80
1992-93    109.61
1993-94    204.29
1994-95    126.31
1995-96    209.89

Represent these data by a suitable bar diagram.

Fig: Simple bar diagram of funds flow (Rs. crores) by year, 1991-92 to 1995-96, with the value written on each bar
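A simple bar diagram can be sketched in plain text to show the key idea that only the length of a bar carries meaning. This sketch draws horizontal bars for convenience (the notes prefer vertical bars on paper), and the scale of one `#` per Rs. 5 crores is an arbitrary assumption.

```python
# Funds flow of Goodwill India Ltd (Rs. crores), from the table above.
funds_flow = {
    "1991-92": 85.80,
    "1992-93": 109.61,
    "1993-94": 204.29,
    "1994-95": 126.31,
    "1995-96": 209.89,
}

SCALE = 5  # one '#' per Rs. 5 crores (arbitrary scale chosen for illustration)

def bar(value, scale=SCALE):
    """Bar whose length is proportional to the value; its width carries no meaning."""
    return "#" * round(value / scale)

for year, value in funds_flow.items():
    # Write the figure at the end of each bar, as recommended for bar diagrams.
    print(f"{year} | {bar(value)} {value:.2f}")
```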

Sub-divided Bar Diagrams
These diagrams are used to represent the various parts of a total. For example, the
number of employees in the various departments of a company may be represented by a
sub-divided bar diagram. While constructing such a diagram, the various components in each
bar should be kept in the same order. To distinguish between the different components, it
is useful to use different shades or colors. Sub-divided bar diagrams can be vertical as
well as horizontal.

Example
Represent the following data by a sub-divided bar diagram (in Rs. crores):

Year       Department 1    Department 2    Department 3
1983-84    233.8           365.3           283.4
1984-85    301.8           484.7           473.8
1985-86    303.2           668.6           402.8

Fig: Sub-divided bar diagram (Rs. crores), one bar per department, each divided into segments for 1983-84, 1984-85 and 1985-86
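Drawing a sub-divided bar requires the start and end of each component, which are running sums taken in a fixed order. The sketch below treats each year as one bar, as in the data table (the figure stacks by department instead). The helper name `segments` is hypothetical.

```python
from itertools import accumulate

# Figures (Rs. crores) from the sub-divided bar example above.
departments = ["Department 1", "Department 2", "Department 3"]
data = {
    "1983-84": [233.8, 365.3, 283.4],
    "1984-85": [301.8, 484.7, 473.8],
    "1985-86": [303.2, 668.6, 402.8],
}

def segments(parts):
    """(start, end) of each component when stacked in the same fixed order."""
    ends = list(accumulate(parts))
    starts = [0.0] + ends[:-1]
    return list(zip(starts, ends))

for year, parts in data.items():
    print(year)
    for dept, (start, end) in zip(departments, segments(parts)):
        print(f"  {dept}: {start:7.1f} to {end:7.1f}")
```

The end of the last segment is the total height of the bar.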

Multiple bar Diagrams

In a multiple bar diagram, two or more sets of inter-related data are represented. The
technique of drawing such a diagram is the same as that of a simple bar diagram. The only
difference is that, since more than one phenomenon is represented, different shades,
colors or cross-hatchings are used to distinguish between the bars.

Example

Corporate Sector Profits (Rs. crores)

                      1994-95    1995-96
Gross profits         3104       3123
Profits before tax    1663       1376
Profits after tax     1219       982

Fig: Multiple bar diagram of corporate sector profits (Rs. crores) for 1994-95 and 1995-96, with separate bars for gross profits, profits before tax and profits after tax

Graphs of Frequency Distributions
A frequency distribution can be presented graphically in any of the following diagrams:

1. Histogram
2. Frequency Polygon
3. Smoothed frequency curve
4. Cumulative frequency curves or 'Ogives'

Histogram
A histogram is a graphical method for presenting data, where the observations are located
on a horizontal axis (usually grouped into intervals) and the frequency of those
observations is depicted along the vertical axis.

While constructing a histogram, the variable (class interval) is always taken on the
X-axis and the frequencies on the Y-axis. The width of each rectangle on the X-axis
remains the same if the class intervals are uniform throughout; if they differ, the widths
of the rectangles also vary. The Y-axis represents the frequency of each class, which
determines the height of its rectangle.

The histogram is the most widely used method for the graphical presentation of a
frequency distribution. However, a histogram cannot be constructed for a distribution
with open-end classes.

Histogram and Bar Diagram


A histogram is used only for representing a frequency distribution, whereas a bar diagram
is never used for representing a frequency distribution. A bar diagram is one-dimensional,
i.e. only the length of the bar matters and not the width; a histogram is two-dimensional,
i.e. both the length and the width of each rectangle are important.

The technique of constructing a histogram is illustrated below:

i. For distributions having equal class intervals

ii. For distributions having unequal class intervals

Construction of Histogram when Class-intervals are Equal
When class intervals are equal, take frequency on the Y-axis and the variable on the
X-axis, and construct adjacent rectangles. In such a case the heights of the rectangles
will be proportional to the frequencies.

Example

Represent the following data by a histogram:

Size Class    Frequency
0-10          5
10-20         11
20-30         19
30-40         21
40-50         16
50-60         10
60-70         8
70-80         6
80-90         3
90-100        1

Fig: Histogram when class intervals are equal (size class on the X-axis, frequency on the Y-axis)

Construction of Histogram when Class-intervals are Unequal

When class intervals are unequal, the frequencies must be adjusted before constructing the
histogram. To make the adjustment, we take the class with the lowest class interval and
adjust the frequencies of the other classes as follows: if a class interval is twice as
wide as the lowest class interval, we divide the height of its rectangle by two; if it is
three times as wide, we divide the height of its rectangle by three, and so on.

Example
Represent the following data by means of a histogram.

Weekly Wages (Rs.)    No. of Workers
110-115               7
115-120               19
120-125               27
125-130               15
130-140               12 (6)
140-160               12 (3)
160-180               8 (2)

The figures in brackets are the adjusted frequencies (rectangle heights) relative to the
lowest class interval of 5.
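The adjustment rule for unequal class intervals amounts to scaling each frequency by (narrowest width / class width). This minimal sketch reproduces the bracketed heights in the wages table.

```python
# Weekly wages (Rs.) and numbers of workers, from the table above.
classes = [(110, 115), (115, 120), (120, 125), (125, 130),
           (130, 140), (140, 160), (160, 180)]
workers = [7, 19, 27, 15, 12, 12, 8]

narrowest = min(upper - lower for lower, upper in classes)   # 5

# A class k times as wide as the narrowest one has its height divided by k.
heights = [f * narrowest / (upper - lower)
           for (lower, upper), f in zip(classes, workers)]

for (lower, upper), f, h in zip(classes, workers, heights):
    print(f"{lower}-{upper}: frequency {f:>2}, rectangle height {h:g}")
```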

Pie Diagram
This type of diagram enables us to show the partitioning of a total into its component
parts. A very common use of the pie chart is to represent the division of a sum of money
into its components. For example, the entire circle, or pie, may represent the budget of
a family for a month, and the sectors may represent the portions of the budget allotted
to rent, food, clothing and so on. Similarly, a pie diagram can show how each rupee spent
by a firm is distributed over various heads such as wages, raw materials, administration
expenses, etc.

Example
Areas of continents of the world

Continent        Area (millions of square miles)
Africa           11.7
Asia             10.4
Europe           1.9
North America    9.4
Oceania          3.3
South America    6.9
U.S.S.R          7.9
Total            51.5

Fig: Pie diagram of the areas of the continents (millions of square miles)

The pie diagram is intended to compare the distinct components that together constitute
a whole. The whole is represented by a circle of arbitrary radius, and the segments of the
circle represent the component parts. To construct such a diagram we use the fact that
the whole (51.5 in the above illustration) corresponds to the total number of degrees at
the centre of the circle, namely 360°. These 360° are then divided among the various
components in proportion to their sizes. Thus, in the above illustration, the segment
representing Asia subtends an angle of about 73° (= 360°/51.5 × 10.4) at the centre of
the circle.

This diagram should be used sparingly, especially if there are many segments.
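The angle calculation shown for Asia applies to every component: angle = 360 × component / total. A short sketch for all continents in the table:

```python
# Areas of the continents (millions of square miles), from the table above.
areas = {
    "Africa": 11.7, "Asia": 10.4, "Europe": 1.9, "North America": 9.4,
    "Oceania": 3.3, "South America": 6.9, "U.S.S.R": 7.9,
}

total = sum(areas.values())                               # 51.5
angles = {name: 360 * a / total for name, a in areas.items()}

for name, angle in angles.items():
    print(f"{name:<14}{areas[name]:>5}  {angle:6.1f} degrees")
```

The angles add up to 360°, which is a useful check before drawing the sectors.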

Line Diagram
If we are given the values of a variable at different points of time, the set of values is
known as a time series. The line diagram is used to represent this type of data. In this
diagram, time is represented along the X-axis and the variable is plotted along the
Y-axis. Thus we get a point for each time period, and the successive points, when
connected by straight lines, give the desired diagram. Often a smooth curve is drawn
through these points. This diagram is alternatively called a line diagram or a time
series graph.

Example

Below are given the figures of production (in thousand quintals) of a sugar factory:

Year    Production ('000 quintals)
1999    80
2000    90
2001    92
2002    83
2003    94
2004    99
2005    92

Plot these figures using a line diagram.

Fig: Line diagram of production ('000 quintals) against year, 1999-2005
