Lecture - 1 Introduction

This document provides an introduction to statistics, including: 1) Statistics is the science of collecting, organizing, summarizing, presenting, analyzing, and interpreting data to assist in making more effective decisions. 2) There are two main categories of statistics: descriptive statistics which organizes and summarizes data, and inferential statistics which estimates properties of populations based on samples. 3) Statistics has wide applications across many fields of science and research. It is important that statistics are used carefully and appropriately to avoid misleading conclusions.

Uploaded by

Saiqa Riidi

Introduction and Presentation of Data

INTRODUCTION
Meanings and Definitions of Statistics

It is extremely difficult to define Statistics and, for that matter, most difficult in a
few words. No definition of Statistics is perhaps beyond controversy. Before an attempt is
made to define the subject, it is necessary to point out that the term Statistics is used in
three distinct senses:

1) By Statistics we often mean numerical data relating to any field of enquiry.


For example, “statistics of agricultural production”, “statistics of prices”,
“statistics of births and deaths”, and so on.

2) By Statistics we refer to the scientific method by which we collect, elucidate
(explain), analyse and interpret numerical data.

3) By Statistics we frequently mean a set of numerical characteristics calculated
from a sample.

Although the term is used in three different senses, it is usually clear from the
context what it means in any particular instance and there is hardly any room for
confusion in practice.

According to Sir Ronald A. Fisher (known as the father of Statistics), “The
science of Statistics is essentially a branch of Applied Mathematics and may be regarded
as mathematics applied to observational data”.

In recent times, it has also been defined as the theory of decision-making in the
face of uncertainty.

However, Statistics is concerned with scientific methods for collecting,
organizing, summarizing, presenting, analyzing and interpreting data as well as with
drawing valid conclusions and making reasonable and effective decisions on the basis of
such analysis.

Professor Dr. Khandoker Saif Uddin Lecture # 1, Page 1


STATISTICS

Statistics is the science of collecting, organizing, summarizing, presenting, analyzing,
and interpreting data to assist in making more effective decisions.

Types of Statistics

The study of statistics is usually divided into two categories: descriptive statistics and
inferential statistics.

Descriptive Statistics: Methods of organizing, summarizing, and presenting data in
an informative way are usually referred to as descriptive statistics.

Inferential Statistics: The methods used to estimate a property of a population on the
basis of a sample are called inferential statistics.

Uses of Statistics

Most of the statistical methods were originally developed to study problems in
biology and agriculture. This very fact would indicate that statistical methods have
extensive applications in these sciences.

In fact, it is impossible to visualize empirical research in most sciences without
the help of methods of Statistics.

Precaution Regarding Misuses

Like any other scientific method, Statistics is liable to be misused through
ignorance, preconceived notions or deliberate manipulation of data. While comparing
certain varieties or treatments, it is important to make sure that the experimental units
receiving different varieties or treatments are equivalent in every respect. Lack of
precaution on this count has often given rise to misleading results. It is of utmost
importance to ensure that the primary data are free from error and that the sample studied
is random, as statisticians so frequently assume. If this basic condition is not fulfilled, no
amount of statistical analysis can shed light on the phenomenon being studied. Statistical
methods cannot possibly reveal anything that is not already implicit in the data. A
“significant” answer does not prove that some hypothesis is true or false; it is quite
possible that an unlikely event has happened. Again, certain differences may be
significant if based on sufficiently large numbers. In conclusion, we might say that
Statistics is a very powerful but rather delicate tool and it has to be handled with care
and prudence. Expert advice may be needed at various stages of a statistical enquiry.

Population and Sample

Population and sample are two very important terms in Statistics. It is necessary
to define these terms and also, to distinguish between them.

Population: An aggregate of all individuals or items (actual or possible) under
study having some common characteristics is called a population. For example,
if we are interested in the average height of the IUBAT students, only the IUBAT
students will constitute the population.

Sample: A small but representative part of a population is called a sample. For
example, if we randomly select some of the students, the selected students will
constitute the sample.
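
The distinction between a population and a sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 500 heights are simulated with made-up parameters (they are not real IUBAT data), and the sample size of 30 is an arbitrary choice.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of 500 students, simulated for illustration.
population = [round(random.gauss(165, 8), 1) for _ in range(500)]

# Simple random sample of 30 students, drawn without replacement.
sample_indices = random.sample(range(len(population)), k=30)
sample = [population[i] for i in sample_indices]

print("population size:", len(population))
print("sample size:", len(sample))
print("population mean:", round(statistics.mean(population), 2))
print("sample mean:", round(statistics.mean(sample), 2))
```

The two means will generally be close but not identical, because the sample contains only part of the population.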

The Raw Material of Statistics/Data

Data: The raw materials of statistics consist of numbers or observations, usually
obtained by some process of counting or measurement, and are referred to collectively
as data.

Sources of Data

There are two sources: primary and secondary.

Primary data: Data that we collect ourselves are called primary data. For example,
the data collected from our own experimental plots, enquiry, investigation or survey
are primary data.

Secondary data: Data obtained from already available sources are called secondary
data. The main sources of secondary data in our country are BBS, ICDDRB, BRAC,
NIPORT (National Institute of Population Research and Training) and some other NGOs;
the largest source of secondary data is the Internet.

Variable:
A variable is a measurable quantity which varies from one individual to another. For
example, the height of the students, weight of the students, expenditure of the students,
etc.


Random Variable: A variable whose values are associated with probabilities is called a
random variable.

Types of Variables:

There are two basic types of variables: (1) Qualitative and (2) Quantitative.

Qualitative Variables: In certain statistical investigations we are concerned only with
the presence or absence of some characteristic in a set of objects or individuals. In this
situation we only count how many individuals do or do not possess the characteristic.
That is, when a variable is recorded on the basis of a characteristic rather than a
numerical measurement, it is called a qualitative variable, and data of this type are called
qualitative data. The characteristic used to classify an individual into different
categories is called an attribute. Examples of qualitative variables are gender, religious
affiliation, marital status, state of birth, brand of PC, etc.

Quantitative Variable: A variable is a measurable quantity, which can vary within its
domain. For example, yield of a crop is a variable, because it is a measurable quantity
within its domain. Conceptually, the domain of a variable is defined by all possible
measurements that can be taken by the variable. Thus, we can say that all possible values
of a variable will constitute its domain. For the variable, yield of a crop, if the lowest
value in the measurements is considered to be 0 and the highest value is 30, then the
domain of yield of that crop is obviously (0-30).

Types of Quantitative Variables:

There are two types of quantitative variables:

Discrete variable: When a variable can assume only isolated values, it is called a
discrete variable. For example, if the number of students in different departments is
the variable of interest, it is obvious that it cannot assume fractional values and
hence it is a discrete variable. Number of children in a family, number of fruits on a
tree, number of cell phones in a family, etc. are examples of discrete variables.

Continuous variable: A variable is said to be continuous if it can theoretically
assume any value within a given range or ranges. Examples of such variables
are height of the students, monthly income of a family, air temperature, etc.


Level of Measurement

Statistical data, whether qualitative or quantitative, are generated through some
measurement or observational process. Measurement is essentially the task of assigning
numbers to observations according to certain rules. The way in which the numbers are
assigned to observations determines the scale of measurement being used. There are four
levels of measurement. These are (a) Nominal level (b) Ordinal level (c) Interval level (d)
Ratio level.

Nominal level:
All qualitative measurements are nominal, regardless of whether the categories
are designated by names (red, white, male) or numerals (June 20, Room no. 10, account
no., ID no., etc.). In the nominal level of measurement, the categories differ from one
another only in name. What one must ensure at this level of measurement is that the
categories are homogeneous and mutually exclusive, and that no ordered relationship
between categories is assumed. Some examples are eye color, religion, place of
residence, etc.
At the nominal level of measurement, observations of a qualitative variable can
only be classified and counted. There is no particular order to the labels.

TABLE 1-1 Source of World Oil Supply for 2004

Source                     Millions of Barrels per Day    Percent
OPEC                       32.91                          39.7
OECD (Including U.S.)*     22.76                          27.4
Former U.S.S.R.            11.33                          13.7
China                       3.62                           4.4
Other                      12.35                          14.9
Total                      82.97                         100.1
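
The percent column can be reproduced from the barrels column, which also explains why it adds up to 100.1 rather than 100: each share is rounded to one decimal place before summing. A minimal sketch (the variable names are illustrative only):

```python
# Millions of barrels per day, copied from Table 1-1.
supply = {
    "OPEC": 32.91,
    "OECD (including U.S.)": 22.76,
    "Former U.S.S.R.": 11.33,
    "China": 3.62,
    "Other": 12.35,
}

total = sum(supply.values())

# Percent share of each source, rounded to one decimal place as in the table.
percent = {source: round(100 * barrels / total, 1) for source, barrels in supply.items()}

for source, share in percent.items():
    print(f"{source:<24}{supply[source]:>8.2f}{share:>8.1f}")
print(f"{'Total':<24}{total:>8.2f}{sum(percent.values()):>8.1f}")
```

Note that the sources are only labels: the rows could be listed in any order without losing information, which is what makes the classification nominal.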

Ordinal Level:
When there is an ordered relationship among the categories, we achieve what we
refer to as the ordinal level of measurement. The categories are distinct, mutually
exclusive and exhaustive as well. Examples of ordinal data are academic degrees (MA,
BA, etc.), socio-economic status (high, medium, low), rank in job, etc.

The next higher level of data is the ordinal level. Table 1-2 lists the student
ratings of Professor James Brunner in an Introduction to Finance course. Each student in
the class answered the question “Overall, how did you rate the instructor in this class?"
The variable rating illustrates the use of the ordinal scale of measurement. One
classification is "higher" or "better" than the next one. That is, "Superior" is better than
"Good," "Good" is better than "Average," and so on. However, we are not able to
distinguish the magnitude of the differences between groups. Is the difference between
"Superior" and "Good" the same as the difference between "Poor" and "Inferior"? We
cannot tell. If we substitute a 5 for "Superior" and a 4 for "Good," we can conclude that
the rating of "Superior" is better than the rating of "Good," but we cannot add a ranking
of "Superior" and a ranking of "Good," with the result being meaningful. Further we
cannot conclude that a rating of "Good" (rating is 4) is necessarily twice as high as a
"Poor" (rating is 2). We can only conclude that a rating of "Good" is better than a rating
of "Poor." We cannot conclude how much better the rating is.

TABLE 1-2 Rating of Finance Professor

Rating      Frequency
Superior     6
Good        28
Average     25
Poor        12
Inferior     3
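
Because ordinal categories can be ranked but not added, summaries that rely only on order, such as the median or the most frequent category, remain meaningful, while an arithmetic mean of the codes does not. A minimal sketch using the frequencies of Table 1-2 (the integer ranks are just positions in the ordered list, not measurements):

```python
from statistics import median_low

# Categories listed from lowest to highest; only this order is meaningful.
order = ["Inferior", "Poor", "Average", "Good", "Superior"]
frequency = {"Superior": 6, "Good": 28, "Average": 25, "Poor": 12, "Inferior": 3}

n = sum(frequency.values())

# Expand the table into one rank (position in `order`) per student.
ranks = [order.index(label) for label, count in frequency.items() for _ in range(count)]

# median_low always returns one of the observed ranks, so it maps back to a category.
median_rating = order[median_low(ranks)]
modal_rating = max(frequency, key=frequency.get)

print("students:", n)
print("median rating:", median_rating)
print("most frequent rating:", modal_rating)
```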

Interval Level:
The interval level of measurement includes all the properties of the nominal and
ordinal levels, plus the additional property that the difference (interval) between values
is known and of constant size. Here an arbitrary zero point is assumed. Some examples are
temperature, IQ test score and calendar time.

The interval level of measurement is the next highest level. It includes all the
characteristics of the ordinal level, but, in addition, the difference between values is a
constant size. An example of the interval level of measurement is temperature. Suppose the
high temperatures on three consecutive winter days in Boston are 28, 31, and 20 degrees
Fahrenheit. These temperatures can be easily ranked, but we can also determine the
difference between temperatures. This is possible because 1 degree Fahrenheit represents
a constant unit of measurement. Equal differences between two temperatures are the same,
regardless of their position on the scale. That is, the difference between 10 degrees
Fahrenheit and 15 degrees is 5, the difference between 50 and 55 degrees is also 5 degrees.
It is also important to note that 0 is just a point on the scale. It does not represent the
absence of the condition. Zero degrees Fahrenheit does not represent the absence of
heat, just that it is cold! In fact 0 degrees Fahrenheit is about -18 degrees on the Celsius
scale.
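
The role of the arbitrary zero can be checked numerically by converting the same temperatures to Celsius: equal differences remain equal, but ratios change. The sketch below uses the standard Fahrenheit-to-Celsius conversion; the 40 and 20 degree values are picked only for illustration.

```python
import math

def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Equal differences stay equal after changing the scale.
diff_low = fahrenheit_to_celsius(15) - fahrenheit_to_celsius(10)
diff_high = fahrenheit_to_celsius(55) - fahrenheit_to_celsius(50)
print(round(diff_low, 2), round(diff_high, 2))

# Ratios do not survive the change of scale, because zero is arbitrary.
ratio_f = 40 / 20
ratio_c = fahrenheit_to_celsius(40) / fahrenheit_to_celsius(20)
print(ratio_f, round(ratio_c, 2))

# 0 degrees Fahrenheit is simply another point on the Celsius scale.
print(round(fahrenheit_to_celsius(0), 1))
```

Since "twice as hot" in Fahrenheit is not "twice as hot" in Celsius, statements about ratios are not meaningful at the interval level.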


Another example of the interval scale of measurement is women's dress sizes.
Listed below is information on several dimensions of a standard U.S. women's dress.

Size    Bust (in)    Waist (in)    Hips (in)
 8      32           24            35
10      34           26            37
12      36           28            39
14      38           30            41
16      40           32            43
18      42           34            45
20      44           36            47
22      46           38            49

Why is the "size" scale an interval measurement? Observe as the size changes by 2
units (say from size 10 to size 12 or from size 24 to size 26) each of the measurements
increases by 2 inches. To put it another way the intervals are the same.

Ratio level:
In practice all quantitative data fall under the ratio level of measurement. It has
all the ordering and distance properties of the interval level. In addition, a ‘zero point’
can be meaningfully designated and thus the ratio between two numbers is also meaningful.
Examples are height, weight, fat consumed, wages, etc.

Practically all quantitative data is recorded on the ratio level of measurement. The
ratio level is the "highest" level of measurement. It has all the characteristics of the
interval level, but in addition, the 0 point is meaningful and the ratio between two
numbers is meaningful. Examples of the ratio scale of measurement include wages,
units of production, weight, changes in stock prices, distance between branch offices,
and height. Money is a good illustration. If you have zero dollars, then you have no
money. Weight is another example. If the dial on the scale of a correctly calibrated
device is at 0, then there is a complete absence of weight. The ratio of two numbers is
also meaningful. If Jim earns $40,000 per year selling insurance and Rob earns $80,000
per year selling cars, then Rob earns twice as much as Jim.

TABLE 1-3 Father–Son Income Combinations

Name      Father      Son
Lahey     $80,000     $40,000
Nale       90,000      30,000
Rho        60,000     120,000
Steele     75,000     130,000
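
Since income is measured at the ratio level, dividing one income by another gives a meaningful number. A short sketch computing the son-to-father income ratio for the families in Table 1-3 (the dictionary layout is just one convenient way to hold the table):

```python
# Incomes in dollars, copied from Table 1-3 as (father, son) pairs.
incomes = {
    "Lahey": (80_000, 40_000),
    "Nale": (90_000, 30_000),
    "Rho": (60_000, 120_000),
    "Steele": (75_000, 130_000),
}

# Son's income as a multiple of the father's; meaningful because zero dollars means no income.
ratios = {name: son / father for name, (father, son) in incomes.items()}

for name, ratio in ratios.items():
    print(f"{name}: son earns {ratio:.2f} times the father's income")
```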

Flow Chart of Types of Variables

Types of Variables
  Qualitative
    - Brand of PC
    - Marital status
    - Hair color
  Quantitative
    Discrete
      - Children in family
      - Strokes on a golf hole
      - TV sets owned
    Continuous
      - Amount of income tax paid
      - Weight of a student
      - Yearly rainfall in Dhaka Dist.

Parameter and Statistic

Parameter: Any property, characteristic or functional form of relationship
calculated from the population, and usually unknown, is called a parameter.

Statistic: Any function of sample values, whose value is known once the sample has
been observed, is called a statistic. A statistic is also called an estimator when it
is used to estimate a parameter. From a practical point of view, the numeric value
obtained from an estimator for a given sample is called an estimate of the
parameter.
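
The three terms can be told apart in a small sketch. Everything here is assumed for illustration: the population of 2,000 crop yields is simulated on the 0 to 30 range, and the sample mean is chosen as the estimator of the population mean.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population of 2,000 crop yields, simulated for illustration.
population = [random.uniform(0, 30) for _ in range(2000)]

# Parameter: computed from the whole population, usually unknown in practice.
parameter = statistics.mean(population)

# Estimator: a rule (function) applied to sample values; here, the sample mean.
estimator = statistics.mean

# Estimate: the numeric value the estimator gives for one particular sample.
sample = random.sample(population, k=50)
estimate = estimator(sample)

print("parameter (population mean):", round(parameter, 2))
print("estimate (sample mean):", round(estimate, 2))
```

A different sample would give a different estimate, while the parameter stays fixed.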

How do we study a population?


A population may be studied using one of two approaches: taking a census, or selecting a sample. 

It is important to note that whether a census or a sample is used, both provide information that can be
used to draw conclusions about the whole population. 


What is a census (complete enumeration)?


A census is a study of every unit, everyone or everything, in a population. It is known as
a complete enumeration, which means a complete count.

What is a sample (partial enumeration)?


A sample is a subset of units in a population, selected to represent all units in a population of
interest. It is a partial enumeration because it is a count from part of the population.

Information from the sampled units is used to estimate the characteristics for the entire population of
interest.

When to use a census or a sample? 

Once a population has been identified a decision needs to be made about whether
taking a census or selecting a sample will be the more suitable option. There
are advantages and disadvantages to using a census or sample to study a population:

Pros of a CENSUS
- provides a true measure of the population (no sampling error)
- benchmark data may be obtained for future studies
- detailed information about small sub-groups within the population is more likely to be
  available

Cons of a CENSUS
- may be difficult to enumerate all units of the population within the available time
- higher costs, both in staff and monetary terms, than for a sample
- generally takes longer to collect, process, and release data than from a sample

Pros of a SAMPLE
- costs would generally be lower than for a census
- results may be available in less time
- if good sampling techniques are used, the results can be very representative of the actual
  population

Cons of a SAMPLE
- data may not be representative of the total population, particularly where the sample size is
  small
- often not suitable for producing benchmark data; as data are collected from a subset of units
  and inferences made about the whole population, the data are subject to 'sampling' error
- decreased number of units will reduce the detailed information available about sub-groups
  within a population
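
The trade-off around sampling error can be illustrated with a small simulation. This is a sketch under assumptions, not a result from the lecture: the population of 10,000 values is simulated, and the sample sizes (10, 100, 1000) and the 200 repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 values, simulated for illustration.
population = [random.gauss(50, 10) for _ in range(10_000)]

# A census measures every unit, so it has no sampling error.
census_mean = statistics.mean(population)

def average_sampling_error(n, repeats=200):
    """Average absolute gap between a sample mean and the census mean."""
    gaps = []
    for _ in range(repeats):
        sample = random.sample(population, k=n)
        gaps.append(abs(statistics.mean(sample) - census_mean))
    return statistics.mean(gaps)

errors = {n: average_sampling_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"sample size {n:>4}: average sampling error {err:.3f}")
```

On average, larger samples land closer to the census value, which is why small samples are singled out above as the riskier case.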
