Lecture 2_classification_frequency

Uploaded by

mohakbhasin.work
PRESENTATION OF DATA

Dr. Neha S. Dhanawade


Assistant Professor, GIPE
SOURCE OF DATA

How do we collect or find data?

According to the source of data:


○ Primary data
○ Secondary data
SOURCE OF DATA

Sources of Primary Data:
● Surveys
● Focus groups
● Questionnaires
● Personal interviews
● Experiments and observational study

Sources of Secondary data:
● Health departments
● Vital Statistics – birth, death certificates
● Hospital or clinic record
● Laboratory, Schools
● Surveillance data from state government programs
● Federal agency statistics – Census etc.
DATA COLLECTION METHODS

● Documents
● Direct Observations
● Survey
DRAFTING THE QUESTIONNAIRE
A questionnaire is a standardised set of questions administered to the respondents in a survey.

Respondents are required to interpret a pre-established set of questions and to supply the information these questions seek.
INITIAL CONSIDERATIONS

● Is the development of a new questionnaire . . .


○ Necessary?

○ Feasible?

● Select mode of administration


○ Web Based?

○ Paper and pencil?

○ Combination?
STEPS
1. Determine the Objective (consider analyses)
2. Determine the Mode of Administration
3. Determine the Sampling Methodology
4. Construct the Questionnaire
5. Institutional Approval
6. Conduct the Pilot Study
7. Write the Initial Communication
8. Send the Questionnaire
9. Follow up
10. Analyze the Results
PRACTICAL EXAMPLE

Problem:
Service businesses such as banks have to worry about the problem of ‘churn’, i.e. customers leaving to join another service provider. It is important to understand which aspects of the service influence a customer’s decision in this regard, so that management can concentrate its improvement efforts on those priorities. Management has therefore decided to survey its customers. Draft a questionnaire.
PRESENTATION OF DATA

Concept of Variable
Classification
Frequency distributions
Tabulation of data
VARIABLE
A characteristic which takes on different values in different persons, places or things.
Example:
Availability and quality of water
Number and value of livestock.
Livestock purchases and sales.
Land and buildings,
deadstock and circulating capital.
Grants and Subsidies.
Sales, Investments, logistics and transportation
Employment, inflation, productivity, Value Added Tax,
taxes and interest rates
Age, sex, business income and expenses, capital expenditure,
class grades, eye colour, vehicle type
Diastolic blood pressure,
heart rate,
the heights of adult males,
the weights of preschool children
DEFINITION OF CLASSIFICATION

● Classification is the process of arranging data into sequences and groups according to their common characteristics or separating them into different but related parts.
BASIS OF DATA CLASSIFICATION

Broadly, there are four bases of classification:

(i) Geographical classification


(ii) Chronological classification
(iii) Qualitative classification
(iv) Quantitative classification
1. Geographical
In geographical classification, data are classified on the
basis of geographical or location differences such as cities,
districts, or villages i.e. area wise
Example
2. Chronological or Temporal
When data are classified on the basis of time (for example by year), the classification is
known as chronological classification.

Table: 2 Death due to specific Disease


Year Number
1990 10
1991 5
1992 12
1993 6
1994 9
1995 3
1996 3
1997 5
1998 12
1999 12
2000 8
2001 7
2002 8
Total 100
3. Qualitative
Data are classified on the basis of descriptive characteristics or on the
basis of attributes like sex, literacy, region, caste, or education, which
cannot be quantified. This is done in two ways:
1. Simple classification: two subclasses (e.g. male or female).
2. Manifold classification: more than two classes, or based on more
than one attribute at a time.
Example: People by place of residence, sex and literacy
Place of residence:  Rural                                        Urban
Sex:                 Male                  Female                 Male                  Female
Literacy:            Literate  Illiterate  Literate  Illiterate   Literate  Illiterate  Literate  Illiterate
4. Quantitative: On the basis of quantitative class intervals
In this classification, data are classified on the basis of
characteristics which can be measured such as height, weight,
income, expenditure, production, or sales.
Examples of continuous and discrete variables in a data set are
shown in the following Table
Practical Example:
In a survey of 35 families in a village, the number of
children per family was recorded and the following data were obtained.

1 0 2 3 4 5 6
7 2 3 4 0 2 5
8 4 5 9 6 3 2
7 6 5 3 3 7 8
9 7 9 4 5 4 3
PRINCIPLES OF CLASSIFICATION

There are no hard and fast rules for deciding the class interval; however, it depends upon:
■ Knowledge of the data
■ Lowest and highest value of the set of observations
■ Utility of the class intervals for meaningful comparison and interpretation

Classification will be called exclusive (Continuous),
when the class intervals are so fixed that the upper
limit of one class is the lower limit of the next class
and the upper limit is not included in the class.
An example
Income (Rs.)                             No. of families
1000 – 1100 (1000 but under 1100)        15
1100 – 1200 (1100 but under 1200)        25
1200 – 1300 (1200 but under 1300)        10
Total                                    50
● Classification will be inclusive (discontinuous) when both the upper and lower limits of a class are included in that class itself.
Income (Rs.)                                 No. of persons
1000 – 1099 (1000 to 1099, both included)    50
1100 – 1199 (1100 to 1199, both included)    100
1200 – 1299 (1200 to 1299, both included)    200
Total                                        350
Exclusive Series vs. Inclusive Series

Limits
  Exclusive series: Upper limit of one class is equal to the lower limit of next class.
  Inclusive series: The two limits are not equal.
Inclusion
  Exclusive series: The value equal to the upper limit is included in the next class.
  Inclusive series: Both upper & lower limits are included in the same class.
Conversion
  Exclusive series: It does not require any conversion.
  Inclusive series: Inclusive series is converted into exclusive series for calculation purpose.
Suitability
  Exclusive series: It is suitable in all situations.
  Inclusive series: It is suitable only when the values are in integers.
● Discontinuous class intervals can be made continuous by applying the correction factor (CF).

CF = (Lower limit of 2nd class − Upper limit of 1st class) / 2

From the last example, CF = (1100 − 1099)/2 = 0.5

The correction factor is subtracted from the lower limit and added to the upper limit of every class to make the class intervals continuous.
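
The correction-factor step can be sketched in a few lines of Python. This is only an illustration using the inclusive income classes from the example; the function names `correction_factor` and `make_continuous` are made up for this sketch.

```python
# Inclusive class limits taken from the income example (Rs.).
inclusive_classes = [(1000, 1099), (1100, 1199), (1200, 1299)]


def correction_factor(classes):
    """Half the gap between the lower limit of the 2nd class
    and the upper limit of the 1st class."""
    first_upper = classes[0][1]
    second_lower = classes[1][0]
    return (second_lower - first_upper) / 2


def make_continuous(classes):
    """Subtract the correction factor from every lower limit
    and add it to every upper limit."""
    cf = correction_factor(classes)
    return [(lower - cf, upper + cf) for lower, upper in classes]


cf = correction_factor(inclusive_classes)
boundaries = make_continuous(inclusive_classes)

print(cf)          # 0.5
print(boundaries)  # [(999.5, 1099.5), (1099.5, 1199.5), (1199.5, 1299.5)]
```

After the adjustment, the upper boundary of each class equals the lower boundary of the next, which is what makes the series continuous.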
FREQUENCY DISTRIBUTIONS

● Quantitative Variables:
○ Discrete variable
○ Continuous variable
● Qualitative variable (attributes)
The manner in which the total number of observations is
distributed over different classes is called a Frequency
Distribution.
FREQUENCY DISTRIBUTION OF AN
ATTRIBUTE
Calcutta, Bombay and Madras were surveyed. Each was asked, among other questions, whether he/she knew about COVID-19. The results are tabulated.

Table: Results of survey on Awareness of COVID-19
State of Knowledge    Number of people
Aware                 1054
Unaware               620
Total                 1674

Table: Proportion of people Aware of COVID-19
State of Knowledge    Relative frequency
Aware                 0.630
Unaware               0.370
Total                 1.000
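
As a quick check, the relative frequencies can be reproduced by dividing each count by the total. The sketch below uses only the counts from the survey table; the variable names are illustrative.

```python
# Counts from the awareness survey table.
counts = {"Aware": 1054, "Unaware": 620}

total = sum(counts.values())

# Relative frequency = class frequency / total frequency, rounded to 3 places.
relative = {state: round(n / total, 3) for state, n in counts.items()}

print(total)     # 1674
print(relative)  # {'Aware': 0.63, 'Unaware': 0.37}
```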
FREQUENCY DISTRIBUTION FOR
DISCRETE VARIABLE
● Ungrouped frequency distributions can be used for data that
can be enumerated and when the range of values in the data
set is not large.
Example: In a survey of 35 families in a village, the number
of children per family was recorded and the following data were obtained.

1 0 2 3 4 5 6
7 2 3 4 0 2 5
8 4 5 9 6 3 2
7 6 5 3 3 7 8
9 7 9 4 5 4 3
STEPS FOR FREQUENCY
DISTRIBUTION
1. Find the largest and smallest values.
2. Form a table with 10 classes for the 10 values (0 to 9).
3. Look at the given values of the variable one by one and for each value put a tally mark in the table against the appropriate class.
4. To facilitate counting, the tally marks are arranged in blocks of five, every fifth stroke being drawn across the preceding four. This is done below.
Table 6: Frequency Table
No. of      Tallies    Frequency    Cumulative Frequency    Cumulative Frequency
children                            (Less than type)        (More than type)
0           ⏐⏐         2            2                       35
1           ⏐          1            3                       33
2           ⏐⏐⏐⏐       4            7                       32
3           ⏐⏐⏐⏐       6            13                      28
4           ⏐⏐⏐⏐       5            18                      22
5           ⏐⏐⏐⏐       5            23                      17
6           ⏐⏐⏐        3            26                      12
7           ⏐⏐⏐⏐       4            30                      9
8           ⏐⏐         2            32                      5
9           ⏐⏐⏐        3            35                      3

Cumulative frequency (less than type) is the number of observations less than or equal to the class value or the upper class limit of each class.
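
The tallying and the two cumulative columns can be reproduced with a short Python sketch. It uses the 35 family sizes from the example; `Counter` does the counting that the tally marks do by hand, and the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

# Number of children in each of the 35 surveyed families.
children = [
    1, 0, 2, 3, 4, 5, 6,
    7, 2, 3, 4, 0, 2, 5,
    8, 4, 5, 9, 6, 3, 2,
    7, 6, 5, 3, 3, 7, 8,
    9, 7, 9, 4, 5, 4, 3,
]

counts = Counter(children)

# One class per distinct value between the smallest and the largest.
values = list(range(min(children), max(children) + 1))
freq = [counts[v] for v in values]

# Less-than type: running total from the first class downwards.
less_than = list(accumulate(freq))
# More-than type: running total from the last class upwards.
more_than = list(accumulate(reversed(freq)))[::-1]

print("value\tfreq\tless\tmore")
for row in zip(values, freq, less_than, more_than):
    print(*row, sep="\t")
```

The last less-than value and the first more-than value both equal 35, the total number of families, which is a useful check on the tally.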
FREQUENCY DISTRIBUTION FOR
CONTINUOUS VARIABLE

● Grouped frequency distributions can be used when the range of values in the data set is very large.
● The data must be grouped into classes that are more than one unit in width.
● Example: the life of electric bulbs in hours.
LIFETIMES OF ELECTRIC BULBS -
EXAMPLE

Type of bulbs    No. of bulbs    %
incandescent     6               15
halogen          14              35
fluorescent      5               12.5
LED              15              37.5
total            40              100

Class Limit    Class Boundaries    Frequency    Cumulative Frequency
24-37          23.5-37.5           4            4
38-51          37.5-51.5           14           18
52-65          51.5-65.5           7            25
TERMS ASSOCIATED WITH A
GROUPED FREQUENCY
DISTRIBUTION

● class limits
● lower class limit
● upper class limit
● class width
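
To make these terms concrete, the sketch below derives boundaries and widths from the class limits of the bulb-lifetime table (24-37, 38-51, 52-65). It assumes the usual convention that class width is the difference between the upper and lower class boundaries; the variable names are illustrative.

```python
# (lower class limit, upper class limit) for each class.
class_limits = [(24, 37), (38, 51), (52, 65)]

# Gap between one class's upper limit and the next class's lower limit.
gap = class_limits[1][0] - class_limits[0][1]   # 38 - 37 = 1
half_gap = gap / 2

rows = []
for lower, upper in class_limits:
    lower_boundary = lower - half_gap
    upper_boundary = upper + half_gap
    # Assumed convention: width = upper boundary - lower boundary.
    width = upper_boundary - lower_boundary
    rows.append((lower, upper, lower_boundary, upper_boundary, width))

for row in rows:
    print(row)
```

The first row comes out as limits 24 and 37, boundaries 23.5 and 37.5, and a width of 14, matching the table.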
STEPS FOR CONSTRUCTING
‘A GROUPED FREQUENCY DISTRIBUTION’
● Find the highest and lowest value.
● Find the range.
● Select the number of classes desired.
   Step 1: Decide on the number of classes.
   A useful recipe to determine the number of classes (k) is the “2 to the k rule”: choose the smallest k such that 2^k > n, where n is the number of observations.
   Example: there were 80 vehicles sold, so n = 80. If we try k = 6, which means we would use 6 classes, then 2^6 = 64, somewhat less than 80. Hence, 6 is not enough classes. If we let k = 7, then 2^7 = 128, which is greater than 80. So the recommended number of classes is 7.

● Find the width by dividing the range by the number of classes and rounding up.
   Step 2: Determine the class interval or width.
   The formula is: i ≥ (H − L)/k, where i is the class interval, H is the highest observed value, L is the lowest observed value, and k is the number of classes.
   ($35,925 − $15,546)/7 = $2,911
   Round up to some convenient number, such as a multiple of 10 or 100. Use a class width of $3,000.

● Select a starting point (usually the lowest value); add the width to get the lower limits.
● Find the upper class limits.
● Find the boundaries.
● Tally the data, find the frequencies and find the cumulative frequency.
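
The two worked steps (number of classes and class width) can be sketched as below. The figures are those of the vehicle example; the helper names and the `round_to` parameter are assumptions made for this illustration.

```python
import math


def number_of_classes(n):
    """Smallest k with 2**k > n (the "2 to the k" rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def class_width(highest, lowest, k, round_to):
    """Return the raw width (H - L) / k and the same value
    rounded up to the next multiple of round_to."""
    raw = (highest - lowest) / k
    return raw, math.ceil(raw / round_to) * round_to


k = number_of_classes(80)                      # 80 vehicles sold
raw, width = class_width(35925, 15546, k, round_to=100)

print(k)           # 7
print(round(raw))  # 2911
print(width)       # 3000
```

Rounding up rather than to the nearest value guarantees that k classes of the chosen width still cover the whole range.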
GROUPED FREQUENCY
DISTRIBUTION - EXAMPLE
In a survey of 20 patients who smoked, the following data were obtained. Each value represents the number of cigarettes the patient smoked per day. Construct a frequency distribution using six classes.
Difference Between Classification and Tabulation
TYPE OF CLASSIFICATION

One way table: When the data are classified with respect to one characteristic,
the table is known as a one way table.
For example, if data are collected on the number of students in the degree college for the
year 2013-14, we can tabulate the information according to classes.

Table No. 1
No. of students in the college for the year 2013-14

F. Y. M.A. S.Y. M.A. Total


TYPE OF CLASSIFICATION
Two way table: When the data are classified with respect to two characteristics,
the table is known as a two way table.
For example, if data are collected on the number of students in the degree college for the
year 2013-14, we can tabulate the information according to classes and sex.

Table No. 2
No. of students in the college for the year 2013-14
Class F. Y. M.A. S.Y. M.A. Total
Sex
Male
Female
Total
TYPE OF CLASSIFICATION
Three way table: If we add one more characteristic to the previous data, namely
category (open, reserved), we can tabulate the information according to
class, sex and category; for this we prepare a three way table.

Table No. 3
No. of students in the college for the year 2013-14
Class:       F. Y. M.A.           S.Y. M.A.            Total
Category:    Open    Reserved     Open    Reserved     Open    Reserved
Gender
Male
Female
Total
PRACTICAL EXAMPLE
A survey of 370 students from the Commerce Faculty and 130
students from the Science Faculty revealed that 180 students were
studying for only C.A. Examinations, 140 for only Costing
Examinations, and 80 for both C.A. and Costing Examinations. The
rest had opted for part-time management courses. Of those studying
for Costing only, 13 were girls and 90 boys belonged to the
Commerce Faculty. Out of the 80 studying for both C.A. and
Costing, 72 were from the Commerce Faculty amongst whom 70
were boys. Amongst those who opted for part-time management
courses, 50 boys were from the Science Faculty and 30 boys and 10
girls from the Commerce Faculty. In all, there were 110 boys in the
Science Faculty.
Present this information in a tabular form. Find the number of
students from the Science Faculty studying for part-time
management courses.
SOLUTION
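
The final figure asked for can be checked with simple arithmetic, sketched below. This is not the full table, only the part-time management count derived from the numbers stated in the problem; the variable names are illustrative.

```python
# Totals stated in the problem.
total_students = 370 + 130              # Commerce + Science
ca_only, costing_only, both = 180, 140, 80

# "The rest" opted for part-time management courses.
part_time_total = total_students - (ca_only + costing_only + both)

# Commerce Faculty part-time students: 30 boys and 10 girls.
part_time_commerce = 30 + 10

# Whatever remains must come from the Science Faculty.
part_time_science = part_time_total - part_time_commerce

# 50 of the Science part-time students are boys, so the rest are girls.
part_time_science_girls = part_time_science - 50

print(part_time_total)          # 100
print(part_time_science)        # 60
print(part_time_science_girls)  # 10
```

On these figures, 60 Science Faculty students are studying for part-time management courses (50 boys and 10 girls).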
PRACTICAL EXAMPLE WITH EXCEL
Create a frequency distribution in Excel:

A sample of 100 claims for damage due to water leakage on an insurance company’s household contents policies might be as follows:
243 306 271 396 287 399 466 269 295 330
425 324 228 113 226 176 320 230 404 487
127 74 523 164 366 343 330 436 141 388
293 464 200 392 265 403 372 259 426 262
221 355 324 374 347 261 278 113 135 291
176 342 443 239 302 483 231 292 373 346
293 236 223 371 287 400 314 468 337 308
359 352 273 267 277 184 286 214 351 270
330 238 248 419 330 319 440 427 343 414
291 299 265 318 415 372 238 323 411 494
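
The slide builds this table in Excel. As a cross-check, the same grouped frequency distribution can be sketched in Python. The choices below are assumptions for illustration only: the number of classes comes from the “2 to the k” rule, the width is rounded up to a multiple of 10, the first class starts at the lowest value rounded down to a multiple of 10, and classes are exclusive (upper limit not included).

```python
import math
from itertools import accumulate

claims = [
    243, 306, 271, 396, 287, 399, 466, 269, 295, 330,
    425, 324, 228, 113, 226, 176, 320, 230, 404, 487,
    127, 74, 523, 164, 366, 343, 330, 436, 141, 388,
    293, 464, 200, 392, 265, 403, 372, 259, 426, 262,
    221, 355, 324, 374, 347, 261, 278, 113, 135, 291,
    176, 342, 443, 239, 302, 483, 231, 292, 373, 346,
    293, 236, 223, 371, 287, 400, 314, 468, 337, 308,
    359, 352, 273, 267, 277, 184, 286, 214, 351, 270,
    330, 238, 248, 419, 330, 319, 440, 427, 343, 414,
    291, 299, 265, 318, 415, 372, 238, 323, 411, 494,
]

n = len(claims)

# Number of classes: smallest k with 2**k > n.
k = 1
while 2 ** k <= n:
    k += 1

low, high = min(claims), max(claims)

# Width: (H - L) / k, rounded up to a multiple of 10 (an assumed convenience).
width = math.ceil((high - low) / k / 10) * 10
# Starting point: lowest value rounded down to a multiple of 10 (assumed).
start = (low // 10) * 10

# Exclusive classes [lower, upper): count each claim in its class.
freq = [0] * k
for x in claims:
    freq[(x - start) // width] += 1

cumulative = list(accumulate(freq))

for i, (f, c) in enumerate(zip(freq, cumulative)):
    lower = start + i * width
    print(f"{lower}-{lower + width}\t{f}\t{c}")
```

With these assumptions the sketch uses 7 classes of width 70 starting at 70, and the final cumulative frequency equals 100, the sample size.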
