Lecture - 1 Introduction
INTRODUCTION
Meanings and Definitions of Statistics
It is extremely difficult to define Statistics, and harder still to do so in a
few words. Perhaps no definition of Statistics is beyond controversy. Before an attempt is
made to define the subject, it is necessary to point out that the term Statistics is used in
three distinct senses:
Although the term is used in three different senses, it is usually clear from the
context what it means in any particular instance and there is hardly any room for
confusion in practice.
In recent times, it has also been defined as the theory of decision-making in the
face of uncertainty.
STATISTICS
Types of Statistics
The study of statistics is usually divided into two categories: descriptive statistics and
inferential statistics.
Uses of Statistics
Population and sample are two very important terms in Statistics. It is necessary
to define these terms and also, to distinguish between them.
Sources of Data
Primary data: When we collect data ourselves, the data are called primary
data. For example, data collected from our own experimental plots, enquiries,
investigations, or surveys are primary data.
Secondary data: We can also obtain data from available sources; such
data are called secondary data. The main sources of secondary data in our country are
BBS, ICDDRB, BRAC, NIPORT (National Institute of Population Research and Training),
and some other NGOs; the largest source of secondary data is the Internet.
Variable:
A variable is a measurable quantity which varies from one individual to another.
Examples are the height of the students, the weight of the students, and the expenditure
of the students.
Random Variable: A variable whose values occur with associated probabilities is
called a random variable.
Types of Variables:
There are two basic types of variables: (1) Qualitative and (2) Quantitative.
Quantitative Variable: A quantitative variable is a measurable quantity that can vary
within its domain. For example, the yield of a crop is a variable because it is a measurable
quantity within its domain. Conceptually, the domain of a variable is defined by all possible
measurements that the variable can take; in other words, all possible values
of a variable constitute its domain. For the variable yield of a crop, if the lowest
value in the measurements is taken to be 0 and the highest value is 30, then the
domain of the yield of that crop is the range from 0 to 30.
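As a small illustration of the idea of a domain, the following Python sketch takes a handful of made-up yield measurements and reports the lowest and highest observed values, which bound the domain in the sense described above. The values and the names `yields` and `domain` are assumptions chosen for illustration, not data from this lecture.

```python
# Hypothetical yield measurements for one crop (illustrative values only).
yields = [0.0, 12.5, 7.3, 30.0, 18.9, 24.1]

# The observed domain is bounded by the smallest and largest measurement.
domain = (min(yields), max(yields))

print(f"Domain of yield: {domain[0]} to {domain[1]}")
```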
Quantitative Variable:
Discrete variable: When a variable can assume only isolated values, it is called a
discrete variable. For example, if the number of students in different departments is
the variable of interest, it cannot assume fractional values and
hence it is a discrete variable. The number of children in a family, the number of fruits
on a tree, and the number of cell phones in a family are further examples of discrete variables.
Level of Measurement
Nominal level:
All qualitative measurements are nominal, regardless of whether the categories
are designated by names (red, white, male) or numerals (June 20, Room no. 10, account
no., ID no., etc.). In the nominal level of measurement, the categories differ from one another
only in name. At this level of measurement one must ensure that the categories
are homogeneous and mutually exclusive, and that no ordered relationship
between categories is assumed. Some examples are eye color, religion, and place of
residence.
For the nominal level of measurement, observations of a qualitative variable can
only be classified and counted. There is no particular order to the labels.
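The statement that nominal observations can only be classified and counted can be sketched in Python as follows. The eye-color observations and the names `eye_color` and `counts` are hypothetical, introduced only for illustration.

```python
from collections import Counter

# Hypothetical eye-color observations (illustrative values only).
eye_color = ["brown", "blue", "brown", "green", "brown", "blue"]

# At the nominal level the meaningful summary is a count per category.
counts = Counter(eye_color)

for category, frequency in counts.items():
    print(category, frequency)

# The most frequent category (the mode) is meaningful; an average is not.
mode_category, mode_count = counts.most_common(1)[0]
print("Mode:", mode_category)
```

Because the labels have no order, sorting them alphabetically or by frequency is only a display choice and carries no statistical meaning.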
Ordinal Level:
When there is an ordered relationship among the categories, we achieve what we
refer to as the ordinal level of measurement. The categories are distinct, mutually
exclusive and exhaustive as well. Examples of ordinal data are academic degrees (M.A.,
B.A., etc.), socio-economic status (high, medium, low), and rank in a job.
The next higher level of data is the ordinal level. Table 1-2 lists the student
ratings of Professor James Brunner in an Introduction to Finance course. Each student in
the class answered the question "Overall, how did you rate the instructor in this class?"
The variable rating illustrates the use of the ordinal scale of measurement. One
classification is "higher" or "better" than the next one. That is, "Superior" is better than
"Good," "Good" is better than "Average," and so on. However, we are not able to
distinguish the magnitude of the differences between groups. Is the difference between
"Superior" and "Good" the same as the difference between "Poor" and "Inferior"? We
cannot tell. If we substitute a 5 for "Superior" and a 4 for "Good," we can conclude that
the rating of "Superior" is better than the rating of "Good," but we cannot add a rating
of "Superior" to a rating of "Good" and obtain a meaningful result. Further, we
cannot conclude that a rating of "Good" (rating is 4) is necessarily twice as high as a
"Poor" (rating is 2). We can only conclude that a rating of "Good" is better than a rating
of "Poor"; we cannot conclude how much better it is.
Rating      Frequency
Superior        6
Good           28
Average        25
Poor           12
Inferior        3
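Since the ratings are ordered, summaries that rely only on order, such as cumulative frequencies and the median category, are meaningful even though sums and averages of the labels are not. A minimal Python sketch using the frequencies from Table 1-2 is given below; the variable names are assumptions made for illustration.

```python
# Frequencies from Table 1-2, listed from the lowest rating to the highest.
ratings = [("Inferior", 3), ("Poor", 12), ("Average", 25), ("Good", 28), ("Superior", 6)]

total = sum(freq for _, freq in ratings)

# Walk up the ordered categories until half of the responses are covered.
cumulative = 0
median_rating = None
for label, freq in ratings:
    cumulative += freq
    if median_rating is None and cumulative >= total / 2:
        median_rating = label
    print(f"{label:<9}{freq:>3}  cumulative: {cumulative}")

print("Total responses:", total)
print("Median rating:", median_rating)
```

The sketch uses only the order of the categories; it never adds or averages the rating labels themselves.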
Interval Level:
The interval level of measurement includes all the properties of the nominal and
ordinal levels, plus the additional property that the difference (interval) between values is
known and of constant size. The zero point on an interval scale is arbitrary. Some examples
are temperature, IQ test scores, and calendar time.
The interval level of measurement is the next highest level. It includes all the
characteristics of the ordinal level, but, in addition, the difference between values is a
constant size. An example of the interval level of measurement is temperature. Suppose the
high temperatures on three consecutive winter days in Boston are 28, 31, and 20 degrees
Fahrenheit. These temperatures can be easily ranked, but we can also determine the
difference between temperatures. This is possible because 1 degree Fahrenheit represents
a constant unit of measurement. Equal differences between two temperatures are the same,
regardless of their position on the scale. That is, the difference between 10 degrees
Fahrenheit and 15 degrees is 5, the difference between 50 and 55 degrees is also 5 degrees.
It is also important to note that 0 is just a point on the scale. It does not represent the
absence of the condition. Zero degrees Fahrenheit does not represent the absence of
heat, just that it is cold! In fact, 0 degrees Fahrenheit is about -18 degrees on the Celsius
scale.
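The temperature example can be checked numerically. The Python sketch below uses the standard Fahrenheit-to-Celsius conversion to show that equal differences remain equal on either scale, while ratios do not carry over because the zero point is arbitrary. The function name `fahrenheit_to_celsius` is chosen here for illustration.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Zero on the Fahrenheit scale is just another point on the Celsius scale.
print(round(fahrenheit_to_celsius(0), 1))  # -17.8

# Equal differences stay equal after changing the scale ...
diff_low = fahrenheit_to_celsius(15) - fahrenheit_to_celsius(10)
diff_high = fahrenheit_to_celsius(55) - fahrenheit_to_celsius(50)
print(round(diff_low, 2), round(diff_high, 2))

# ... but ratios do not survive: 50 F is "five times" 10 F only in Fahrenheit.
ratio_f = 50 / 10
ratio_c = fahrenheit_to_celsius(50) / fahrenheit_to_celsius(10)
print(ratio_f, round(ratio_c, 2))
```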
Why is the "size" scale an interval measurement? Observe that as the size changes by 2
units (say from size 10 to size 12, or from size 24 to size 26), each of the measurements
increases by 2 inches. To put it another way, the intervals are the same.
Ratio level:
In practice, almost all quantitative data fall under the ratio level of measurement. It has
all the ordering and distance properties of the interval level. In addition, a "zero point" can
be meaningfully designated, and thus the ratio between two numbers is also meaningful.
Examples are height, weight, fat consumed, and wages.
Practically all quantitative data is recorded on the ratio level of measurement. The
ratio level is the "highest" level of measurement. It has all the characteristics of the
interval level, but in addition, the 0 point is meaningful and the ratio between two
numbers is meaningful. Examples of the ratio scale of measurement include wages,
units of production, weight, changes in stock prices, distance between branch offices,
and height. Money is a good illustration. If you have zero dollars, then you have no
money. Weight is another example. If the dial on the scale of a correctly calibrated
device is at 0, then there is a complete absence of weight. The ratio of two numbers is
also meaningful. If Jim earns $40,000 per year selling insurance and Rob earns $80,000
per year selling cars, then Rob earns twice as much as Jim.
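The earnings example can be written out as a short Python sketch. It shows that, because zero is a true zero on a ratio scale, the ratio is unchanged when the same amounts are expressed in a different unit; the conversion to cents is an assumption added here for illustration.

```python
# Annual earnings from the example above, in dollars.
jim_dollars = 40_000
rob_dollars = 80_000

ratio_in_dollars = rob_dollars / jim_dollars

# Re-express the same earnings in cents; zero dollars is still zero cents.
jim_cents = jim_dollars * 100
rob_cents = rob_dollars * 100
ratio_in_cents = rob_cents / jim_cents

print(ratio_in_dollars, ratio_in_cents)
```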
[Figure: Types of Variables, with branches Qualitative and Quantitative]
Statistic: Any function of the sample values whose value is known once the sample is
observed is called a statistic. When a statistic is used to estimate a parameter, it is
also called an estimator. From a practical point of view, the numeric value obtained
from an estimator for a particular sample is called an estimate of the
parameter.
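The distinction between parameter, estimator, and estimate can be sketched in Python. The population of heights, the way the sample is drawn, and all names below are hypothetical and serve only to illustrate the definitions; in practice the population mean would usually be unknown.

```python
from statistics import mean

# Hypothetical population of student heights in cm (illustrative values only).
population = [150, 152, 155, 158, 160, 162, 165, 168, 170, 175]

# The parameter: the population mean, usually unknown in practice.
parameter = mean(population)

# A sample taken from the population (here, every third student).
sample = population[::3]

def sample_mean(values):
    """Estimator: a function of the sample values used to estimate the mean."""
    return mean(values)

# The estimate: the numeric value the estimator gives for this sample.
estimate = sample_mean(sample)

print("Parameter:", parameter)
print("Sample:", sample)
print("Estimate:", estimate)
```

A different sample would generally give a different estimate, while the estimator (the rule) and the parameter stay the same.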
It is important to note that whether a census or a sample is used, both provide information that can be
used to draw conclusions about the whole population.
Information from the sampled units is used to estimate the characteristics for the entire population of
interest.
Once a population has been identified, a decision needs to be made about whether
taking a census or selecting a sample is the more suitable option. There
are advantages and disadvantages to using a census or a sample to study a population: