
Class 2

Statistics and Probability notes 2.

Uploaded by

xyztempo69
Copyright
© All Rights Reserved

2.2 Classification of Data
Classification of data depends on the type of variable we measure. We
can observe different types of variables in practice, each of which belongs
to any one category of the following broad classification:
Variable
    Qualitative (or Categorical)
        Ordinal
        Non-ordinal
    Quantitative (or Numerical)
        Discrete
        Continuous

Fig. 2.1 Classification of variable
A variable which cannot be measured numerically but can be measured by its
quality is called a qualitative variable. Since these variables place
individuals in different categories, they are also called categorical
variables (or attributes). Variables like gender, hair-colour, nationality,
'response of a patient to a therapy' (none, partial or complete cure),
'cloudiness of sky' (overcast, mostly cloudy, partly cloudy or sunny) etc.
are examples of categorical variables. There are some categorical
variables where the categories can be arranged in a natural order, i.e., from
small to big or from high to low etc. Those variables are called ordinal
categorical variables. For example, income group may be classified as high,
medium and low, effectiveness of medicine as low, moderate and high,
social class as upper, middle and low etc. In the other type of categorical
variable no idea of order amongst the categories exists. They are called
non-ordinal, e.g., blood group, religion, marital status etc.
A variable which can be measured numerically is called a quantitative
(or numerical) variable. Height, weight, temperature, house rent etc. are
examples of quantitative variables.
Note that variables like telephone number, date of birth etc., though
expressed in some numbers, are not quantitative, but qualitative. This
type of qualitative variable is known as a nominal variable. They do not
have any arithmetic property of the numbers. Arithmetic operations
like addition, division and averaging carry no practical meaning for those
variables. These operations are meaningful only for quantitative variables;
it does not make any sense to take an average of genders or telephone
numbers or dates of birth.
A quantitative variable can further be classified as discrete and
continuous. A quantitative variable which can assume a finite or countable
number of discrete or isolated values is called a discrete variable, e.g.,
word length, family size, number of children in a family, number of
telephone calls received by an operator each day, number of people visiting
a bank on a day, number of cattle owned by a farmer, number of students
in a class etc. It is worth mentioning that a discrete variable can also
assume fractional values, for example, the proportion of women in a group
of 10. Its possible values are 0, 0.1, 0.2, 0.3, ..., 0.9, 1. Discrete variables
can again be classified as finite-valued or infinite-valued according as
the variables have a finite or infinite number of possible values, for example,
'number of telephone calls' is an infinite-valued (countably infinite) discrete
variable, while 'number of students in a class' is a finite-valued discrete variable.
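
The point that a discrete variable may take fractional values can be checked by listing them. The following is a minimal sketch, assuming a group of size n in which the count of women k runs from 0 to n; the function name `possible_proportions` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def possible_proportions(group_size):
    """Return every value the proportion k/group_size can take, k = 0..group_size."""
    return [Fraction(k, group_size) for k in range(group_size + 1)]

# For a group of 10 there are exactly 11 isolated values between 0 and 1.
values = possible_proportions(10)
print(len(values))
print([float(v) for v in values])
```

The values are fractional, yet they are finite in number and isolated from one another, which is what makes the variable discrete rather than continuous.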

A quantitative variable which can assume any numerical value within
a certain interval of the real line is called a continuous variable. Daily
temperature, time taken to finish a job, weight, height, age etc. are examples
of continuous variables. A continuous variable can assume values with finite
or infinite range. For example, 'IQ of a student' has a finite range of values,
whereas 'yield of paddy per acre' can have an infinite range of values.

Note that sometimes the same continuous variable may be viewed as
discrete, and a discrete one as continuous. Normally, 'age' is a continuous
variable (as we are continuously growing older with time). But if we define
it as 'age at last birthday' (lbd), it will then become discrete, measured in
complete years.
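
As a small illustration of how a continuous variable becomes discrete, the sketch below takes an exact age in years (an assumed input, given as a decimal number) and reduces it to age at last birthday by dropping the fractional part. The function name `age_last_birthday` is hypothetical.

```python
import math

def age_last_birthday(exact_age_years):
    """Convert a continuous age (in years) to completed years."""
    if exact_age_years < 0:
        raise ValueError("age cannot be negative")
    return math.floor(exact_age_years)

# Any exact age from 23 up to (but not including) 24 maps to the value 23.
print(age_last_birthday(23.02), age_last_birthday(23.97))
```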
Likewise sometimes a quantitative variable can be looked upon as a
qualitative variable. If we record age in categories like 'below 10 years,
10 years to under 20 years, 20 years or more', then it would be considered
as an ordinal qualitative variable.
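
The recoding of age into ordered categories can be sketched as follows. This is only an illustration using the three categories named in the text; the names `AGE_GROUPS` and `age_group` are assumptions introduced here.

```python
# Categories listed in their natural order, lowest first.
AGE_GROUPS = ["below 10 years", "10 years to under 20 years", "20 years or more"]

def age_group(age_years):
    """Map a numerical age to one of the ordered categories."""
    if age_years < 10:
        return AGE_GROUPS[0]
    if age_years < 20:
        return AGE_GROUPS[1]
    return AGE_GROUPS[2]

ages = [4, 10, 19.5, 20, 63]
print([age_group(a) for a in ages])
```

After recoding, the categories can still be ranked, but arithmetic such as averaging no longer applies to them, which is why the result is treated as an ordinal qualitative variable.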

From now on, by the word 'variable' (or 'variate') we will refer to a
'quantitative variable', and by 'attribute' we refer to a 'qualitative variable'.
Now, depending on the type of the variable we measure, we classify
the data as follows:


Data
    Qualitative
        Ordinal
        Non-ordinal
    Quantitative
        Discrete
        Continuous

Fig. 2.2 Classification of data
The data collected are called qualitative, if measured on a qualitative
variable, and quantitative, if measured on a quantitative variable. The
quantitative data are measured on a naturally occurring numerical scale.
The qualitative data do not contain any numerical observations.

The following is another way of classifying the data

Data
    Frequency data
    Non-frequency (or series) data
        Time series data
        Cross-sectional data

Fig. 2.3 Alternative classification of data


Data where the values of the variable occur repeatedly and are
expressed as the statement of possible variate values together with
respective frequencies (the number of times of occurrence of that particular
value in the data set) are called frequency data.
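
To make the idea of frequency data concrete, the sketch below turns a list of raw observations into a statement of variate values with their frequencies. The observations are made-up values of 'number of children in a family' for ten families, used purely for illustration, and the function name `frequency_table` is hypothetical.

```python
from collections import Counter

def frequency_table(observations):
    """Return (value, frequency) pairs sorted by variate value."""
    return sorted(Counter(observations).items())

# Hypothetical raw data: number of children in each of ten families.
children = [2, 0, 1, 2, 3, 1, 2, 0, 1, 2]
table = frequency_table(children)
for value, freq in table:
    print(value, freq)
```

The frequencies add up to the number of observations, so the table carries the same information as the raw list apart from the order in which the values were recorded.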
Data arranged or recorded in the form of series are called non-frequency
data or series data. Non-frequency data are of two types, viz.
time series (or chronological or historical or temporal) data and
cross-sectional data. The data collected on the same unit for the same variable
for different points (or periods) of time are called time series data. Average
prices of houses (rate per square foot of area) in Calcutta for the years
1990-2002, yield of paddy recorded for the last ten years, petroleum price
at Calcutta in the last five years, export of tea in the last twelve months etc.
are examples of time-series data. Such data arise due to the change in
variate values over time.
The data collected on different units for the same point (or period) of
time are called cross-sectional data. The present prices for ten cars
(1999 models) of different makes are an example of cross-sectional data.
In particular, if cross-sectional data relate to different places, then they are
called spatial (or geographical) series data, such as maximum temperature
of different cities in India on a particular day, total population of different
states as per the 1991 census, yield of paddy in different districts of West
Bengal in 2001, the amount of gold reserve (in millions of fine troy ounces)
held by governments of different countries in the year 2000.
The data that are measured across two dimensions, time and cross
section, are called longitudinal data or panel data, for example, the data
on petroleum prices for the last seven years in five cities, like Calcutta, Delhi,
Mumbai, Chennai, Hyderabad, or weights of five new born babies in their
first six months.
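
The relation between the three kinds of series data can be sketched with a small data structure. In the illustration below a panel is stored as a mapping from (unit, period) to a value; fixing the unit gives a time series and fixing the period gives a cross-section. The city names follow the text, but the numbers are placeholders and not real prices, and the function names are hypothetical.

```python
# Hypothetical panel: (unit, period) -> value. Values are placeholders only.
panel = {
    ("Calcutta", 2000): 10.0,
    ("Calcutta", 2001): 11.0,
    ("Delhi", 2000): 12.0,
    ("Delhi", 2001): 13.0,
}

def time_series(data, unit):
    """Same unit, different periods: returns {period: value}."""
    return {p: v for (u, p), v in sorted(data.items()) if u == unit}

def cross_section(data, period):
    """Different units, same period: returns {unit: value}."""
    return {u: v for (u, p), v in sorted(data.items()) if p == period}

print(time_series(panel, "Delhi"))
print(cross_section(panel, 2001))
```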

2.3 Collection of Data


Every statistical investigation begins with a definite objective, and
collection of data is the first step towards achieving that objective. At the
time of collecting the data we have to keep the following basic points in
mind: what purpose the data are expected to serve, what the related
variables are and what the target population is, i.e., what individuals the
data are going to describe.
There are two sources of data:

Primary source, and


Secondary source
Primary source is a source where data are collected directly from the
field of enquiry, and secondary source is a source where data, originally
collected by some other agency, are available in a published form.
The data from a primary source are called primary data and the data
from a secondary source are called secondary data. Primary data are those
which are collected for the first time directly from the field of enquiry
according to the purpose of the study. These data are original in nature,
thus more reliable. They can be used with greater confidence, because
the investigator (or the authority) himself is responsible for collection of
the data and has enough control on different aspects of the data collection.
But the collection of such data requires more money, man-power and
time. Secondary data are those which have previously been collected by
some other agency for some other purpose and are just compiled from that
source for the purpose of the current study. Data compiled from books,
journals, etc. are secondary data. It is clear that the primary data of one
study become secondary data when used for some other statistical
investigation. Census data, collected for the purpose of population census,
are primary data, but the same data, compiled from the census report when
used for some other study, become secondary data. The collection of
secondary data is not very expensive, and is less time-consuming.
However, while collecting secondary data, the investigator has to be very
careful about the reliability of the source, the method used to collect the
original data, the scope and objective of the original survey, the definitions
and units of measurement used at the time of collecting the original data
etc. While using secondary data one should keep in mind that the
data may contain transcription error or estimation error due to rounding
off the actual figures, and so, he should be careful in this regard.
