Lecture - 1 Introduction

This document provides an introduction to statistics, including: 1) Statistics is the science of collecting, organizing, summarizing, presenting, analyzing, and interpreting data to assist in making more effective decisions. 2) There are two main categories of statistics: descriptive statistics which organizes and summarizes data, and inferential statistics which estimates properties of populations based on samples. 3) Statistics has wide applications across many fields of science and research. It is important that statistics are used carefully and appropriately to avoid misleading conclusions.

Uploaded by

Saiqa Riidi

Introduction and Presentation of Data

INTRODUCTION
Meanings and Definitions of Statistics

It is extremely difficult to define Statistics and, for that matter, most difficult in a
few words. No definition of Statistics is perhaps beyond controversy. Before an attempt is
made to define the subject, it is necessary to point out that the term Statistics is used in
three distinct senses:

1) By Statistics we often mean numerical data relating to any field of enquiry.

For example, “statistics of agricultural production”, “statistics of prices”,
“statistics of births and deaths”, and so on.

2) By Statistics we refer to the scientific method by which we collect, elucidate
(explain), analyse and interpret numerical data.

3) By Statistics we frequently mean a set of numerical characteristics calculated
from a sample.

Although the term is used in three different senses, it is usually clear from the
context what it means in any particular instance and there is hardly any room for
confusion in practice.

According to Sir Ronald A. Fisher (known as the father of Statistics), “The
science of Statistics is essentially a branch of Applied Mathematics and may be regarded
as mathematics applied to observational data”.

In recent times, it has also been defined as the theory of decision-making in the
face of uncertainty.

However, Statistics is concerned with scientific methods for collecting,

organizing, summarizing, presenting, analyzing and interpreting data as well as with
drawing valid conclusions and making reasonable and effective decisions on the basis of
such analysis.

Professor Dr. Khandoker Saif Uddin Lecture # 1, Page 1

STATISTICS

Statistics is the science of collecting, organizing, summarizing, presenting, analyzing,

and interpreting data to assist in making more effective decisions.

Types of Statistics

The study of statistics is usually divided into two categories: descriptive statistics and
inferential statistics.

Descriptive Statistics: Methods of organizing, summarizing, and presenting data in
an informative way are usually referred to as descriptive statistics.

Inferential Statistics: The methods used to estimate a property of a population on the
basis of a sample are called inferential statistics.

Uses of Statistics

Most of the statistical methods were originally developed to study problems in

biology and agriculture. This very fact would indicate that statistical methods have
extensive applications in these sciences.

In fact, it is impossible to visualize empirical research in most sciences without

the help of methods of Statistics.

Precaution Regarding Misuses

Like any other scientific method, Statistics is liable to be misused through
ignorance, preconceived notions or deliberate manipulation of data. While comparing
certain varieties or treatments, it is important to make sure that the experimental units
receiving different varieties or treatments are equivalent in every respect. Lack of
precaution on this count has often given rise to misleading results. It is of utmost
importance to ensure that the primary data are free from error and that the sample studied is
random, as statisticians so frequently assume. If this basic condition is not fulfilled, no
amount of statistical analysis can shed light on the phenomenon being studied. Statistical
methods cannot possibly reveal anything that is not already implicit in the data. A
“significant” answer does not prove that some hypothesis is true or false; it is quite
possible that an unlikely event has happened. Again, even small differences may turn out to be
significant if based on sufficiently large numbers. In conclusion, we might say that
Statistics is a very powerful but rather delicate tool and it has to be handled with care
and prudence. Expert advice may be needed at various stages of a statistical enquiry.

Population and Sample

Population and sample are two very important terms in Statistics. It is necessary
to define these terms and also, to distinguish between them.

Population: An aggregate of all individuals or items (actual or possible) under
study that have some common characteristic is called a population. For example,
if we are interested in the average height of IUBAT students, then all IUBAT
students, and only they, constitute the population.

Sample: A small but representative part of a population is called a sample. For
example, if we randomly select some of the students, the selected students
constitute the sample.
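
The distinction between a population and a sample can be sketched in a few lines of Python. This is only an illustration: the heights below are generated artificially (the mean of 165 cm, the spread of 8 cm and the population size of 500 are assumptions, not figures from the lecture), and `random.sample` is used as one simple way of drawing a random sample without replacement.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of 500 students, generated for illustration.
population = [round(rng.gauss(165, 8), 1) for _ in range(500)]

# Simple random sample without replacement: every student is equally likely to be chosen.
sample = rng.sample(population, k=30)

print("population size:", len(population))
print("sample size:", len(sample))
```

Every value in `sample` also belongs to `population`; the sample is simply a randomly chosen part of it.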

The Raw Material of Statistics/Data

Data: The raw materials of statistics consist of numbers or observations, usually
obtained by some process of counting or measurement, and are referred to collectively
as data.

Sources of Data

There are two sources: primary and secondary.

Primary data: When we collect data ourselves, the data are called primary
data. For example, data collected from our own experimental plots, enquiry,
investigation or survey are primary data.

Secondary data: We can also get data from available sources, and such
data are called secondary data. The main sources of secondary data in our country are
BBS, ICDDRB, BRAC, NIPORT (National Institute of Population Research and Training) and
some other NGOs; the largest source of secondary data is the Internet.

Variable:
A variable is a measurable quantity which varies from one individual to another. For
example, the height of the students, weight of the students, expenditure of the students,
etc.


Random Variable: A variable whose values are associated with probabilities is called a
random variable.

Types of Variables:

There are two basic types of variables: (1) Qualitative and (2) Quantitative.

Qualitative Variables: In certain statistical investigations we are concerned only with
the presence or absence of some characteristic in a set of objects or individuals. In this
situation we only count how many individuals do or do not possess the characteristic.
That is, a variable that is recorded according to some characteristic rather than measured
numerically is called a qualitative variable, and this type of data is called qualitative data. The
characteristic used to classify an individual into different categories is called an
attribute. Examples of qualitative variables are gender, religious affiliation, marital
status, state of birth, brand of PC, etc.

Quantitative Variable: A variable is a measurable quantity, which can vary within its
domain. For example, yield of a crop is a variable, because it is a measurable quantity
within its domain. Conceptually, the domain of a variable is defined by all possible
measurements that can be taken by the variable. Thus, we can say that all possible values
of a variable will constitute its domain. For the variable, yield of a crop, if the lowest
value in the measurements is considered to be 0 and the highest value is 30, then the
domain of yield of that crop is obviously (0-30).

Types of Quantitative Variables:

There are two types of quantitative variables:

Discrete variable: When a variable can assume only isolated values, it is called a
discrete variable. For example, if the number of students in different departments is
the variable of interest, it is obvious that it cannot assume fractional values and
hence it is a discrete variable. The number of children in a family, the number of fruits
on a tree, the number of cell phones in a family, etc. are examples of discrete variables.

Continuous variable: A variable is said to be continuous if it can theoretically
assume any value within a given range or ranges. Examples of such variables
are the height of the students, the monthly income of a family, air temperature, etc.


Level of Measurement

Statistical data, whether qualitative or quantitative, are generated through some
measurement or observational process. Measurement is essentially the task of assigning
numbers to observations according to certain rules. The way in which the numbers are
assigned to observations determines the scale of measurement being used. There are four
levels of measurement. These are (a) Nominal level (b) Ordinal level (c) Interval level (d)
Ratio level.

Nominal level:
All qualitative measurements are nominal, regardless of whether the categories
are designated by names (red, white, male) or numerals (June 20, Room no. 10, account
no., ID no., etc.). In the nominal level of measurement, the categories differ from one another
only in name. What one must ensure at this level of measurement is that the categories
are homogeneous and mutually exclusive, and that no ordered relationship between
categories is assumed. Some examples are eye color, religion, place of
residence, etc.
At the nominal level of measurement, observations of a qualitative variable can
only be classified and counted. There is no particular order to the labels.

TABLE 1-1 Source of World Oil Supply for 2004

Source                    Millions of Barrels per Day    Percent
OPEC                      32.91                          39.7
OECD (Including U.S.)*    22.76                          27.4
Former U.S.S.R.           11.33                          13.7
China                     3.62                           4.4
Other                     12.35                          14.9
Total                     82.97                          100.1
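
The percent column of Table 1-1 can be reproduced from the barrels-per-day figures. The short sketch below uses only the numbers in the table; the variable names are illustrative. It also shows why the percentages add up to 100.1 rather than 100: each percentage is rounded to one decimal place before the column is summed.

```python
# Millions of barrels per day, copied from Table 1-1.
supply = {
    "OPEC": 32.91,
    "OECD (including U.S.)": 22.76,
    "Former U.S.S.R.": 11.33,
    "China": 3.62,
    "Other": 12.35,
}

total = sum(supply.values())

# Percent share of each source, rounded to one decimal place as in the table.
percent = {source: round(100 * barrels / total, 1) for source, barrels in supply.items()}

for source, share in percent.items():
    print(f"{source:<24}{supply[source]:>8.2f}{share:>8.1f}")

# The rounded shares sum to 100.1 because of rounding in the individual rows.
print("total barrels:", round(total, 2))
print("sum of rounded percents:", round(sum(percent.values()), 1))
```

Since the sources are nominal categories, counting and computing shares like this is all that can meaningfully be done with the labels themselves; their order in the table carries no information.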

Ordinal Level:
When there is an ordered relationship among the categories, we achieve what we
refer to as the ordinal level of measurement. The categories are distinct, mutually
exclusive and exhaustive as well. Examples of ordinal data are academic degrees (MA,
BA, etc.), socio-economic status (high, medium, low), rank in job, etc.

The next higher level of data is the ordinal level. Table 1-2 lists the student
ratings of Professor James Brunner in an Introduction to Finance course. Each student in
the class answered the question "Overall, how did you rate the instructor in this class?"
The variable rating illustrates the use of the ordinal scale of measurement. One
classification is "higher" or "better" than the next one. That is, "Superior" is better than
"Good," "Good" is better than "Average," and so on. However, we are not able to
distinguish the magnitude of the differences between groups. Is the difference between
"Superior" and "Good" the same as the difference between "Poor" and "Inferior"? We
cannot tell. If we substitute a 5 for "Superior" and a 4 for "Good," we can conclude that
the rating of "Superior" is better than the rating of "Good," but we cannot add a ranking
of "Superior" and a ranking of "Good," with the result being meaningful. Further we
cannot conclude that a rating of "Good" (rating is 4) is necessarily twice as high as a
"Poor" (rating is 2). We can only conclude that a rating of "Good" is better than a rating
of "Poor." We cannot conclude how much better the rating is.

TABLE 1-2 Rating of Finance Professor

Rating Frequency
Superior 6
Good 28
Average 25
Poor 12
Inferior 3
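
Because ordinal categories can be ranked but not added, suitable summaries of Table 1-2 are counts, percentages, the most frequent category and the middle category. The sketch below uses the frequencies from the table; the helper `median_category` is a hypothetical name introduced here, and it returns the category containing the lower of the two middle observations when the total count is even.

```python
# Ratings from Table 1-2, listed from lowest to highest category.
ratings = [("Inferior", 3), ("Poor", 12), ("Average", 25), ("Good", 28), ("Superior", 6)]

n = sum(count for _, count in ratings)

# Relative frequency (percent) of each rating.
relative = {label: round(100 * count / n, 1) for label, count in ratings}

# Most frequent category (the mode).
modal = max(ratings, key=lambda pair: pair[1])[0]

def median_category(ordered_counts):
    """Return the category that contains the (lower) middle observation."""
    total = sum(count for _, count in ordered_counts)
    middle = (total + 1) // 2  # position of the lower middle observation
    running = 0
    for label, count in ordered_counts:
        running += count
        if running >= middle:
            return label

print("students:", n)
print("relative frequencies:", relative)
print("mode:", modal)
print("median category:", median_category(ratings))
```

No mean is computed here: averaging codes such as 5 for "Superior" and 4 for "Good" would assume equal distances between categories, which the ordinal level does not provide.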

Interval Level:
The interval level of measurement includes all the properties of the nominal and
ordinal level but an additional property that the difference (interval) between values is
known and of constant size. Here an arbitrary zero point is assumed. Some examples are
Temperature, IQ test score, Calendar time.

The interval level of measurement is the next highest level. It includes all the
characteristics of the ordinal level, but, in addition, the difference between values is a
constant size. An example of the interval level of measurement is temperature. Suppose the
high temperatures on three consecutive winter days in Boston are 28, 31, and 20 degrees
Fahrenheit. These temperatures can be easily ranked, but we can also determine the
difference between temperatures. This is possible because 1 degree Fahrenheit represents
a constant unit of measurement. Equal differences between two temperatures are the same,
regardless of their position on the scale. That is, the difference between 10 degrees
Fahrenheit and 15 degrees is 5, the difference between 50 and 55 degrees is also 5 degrees.
It is also important to note that 0 is just a point on the scale. It does not represent the
absence of the condition. Zero degrees Fahrenheit does not represent the absence of
heat, just that it is cold! In fact 0 degrees Fahrenheit is about -18 degrees on the Celsius
scale.
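
The interval properties described above can be checked numerically. The following sketch uses the standard Fahrenheit-to-Celsius conversion; the function name and the example temperatures of 25 and 50 degrees are chosen here for illustration. Equal differences stay equal when the unit changes, but a ratio such as "twice as hot" does not survive the change of scale, because zero is arbitrary.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Equal differences on the Fahrenheit scale remain equal on the Celsius scale.
diff_low = fahrenheit_to_celsius(15) - fahrenheit_to_celsius(10)
diff_high = fahrenheit_to_celsius(55) - fahrenheit_to_celsius(50)
print(round(diff_low, 2), round(diff_high, 2))

# Ratios are not meaningful: 50 is twice 25 in Fahrenheit, but not in Celsius.
ratio_f = 50 / 25
ratio_c = fahrenheit_to_celsius(50) / fahrenheit_to_celsius(25)
print(ratio_f, round(ratio_c, 2))

# Zero degrees Fahrenheit is simply another point on the Celsius scale.
print(round(fahrenheit_to_celsius(0), 1))
```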


Another example of the interval scale of measurement is women's dress sizes.

Listed below is information on several dimensions of a standard U.S. women's dress.

Size    Bust (in)    Waist (in)    Hips (in)
8       32           24            35
10      34           26            37
12      36           28            39
14      38           30            41
16      40           32            43
18      42           34            45
20      44           36            47
22      46           38            49

Why is the "size" scale an interval measurement? Observe as the size changes by 2
units (say from size 10 to size 12 or from size 24 to size 26) each of the measurements
increases by 2 inches. To put it another way the intervals are the same.

Ratio level:
In practice all quantitative data fall under the ratio level of measurement. It has
all the ordering and distance properties of the interval level. In addition, a ‘zero point’ can
be meaningfully designated and thus the ratio between two numbers is also meaningful.
Examples are height, weight, fat consumed, wages, etc.

Practically all quantitative data is recorded on the ratio level of measurement. The
ratio level is the "highest" level of measurement. It has all the characteristics of the
interval level, but in addition, the 0 point is meaningful and the ratio between two
numbers is meaningful. Examples of the ratio scale of measurement include wages,
units of production, weight, changes in stock prices, distance between branch offices,
and height. Money is a good illustration. If you have zero dollars, then you have no
money. Weight is another example. If the dial on the scale of a correctly calibrated
device is at 0, then there is a complete absence of weight. The ratio of two numbers is
also meaningful. If Jim earns $40,000 per year selling insurance and Rob earns $80,000
per year selling cars, then Rob earns twice as much as Jim.

TABLE 1-3 Father–Son Income Combinations

Name      Father     Son
Lahey     $80,000    $40,000
Nale      90,000     30,000
Rho       60,000     120,000
Steele    75,000     130,000
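
Since income is measured at the ratio level, dividing one income by another gives a meaningful number. As a small illustration, the sketch below computes the son-to-father income ratio for each family in Table 1-3; the figures come from the table and the variable names are illustrative.

```python
# (father's income, son's income) in dollars, copied from Table 1-3.
incomes = {
    "Lahey": (80_000, 40_000),
    "Nale": (90_000, 30_000),
    "Rho": (60_000, 120_000),
    "Steele": (75_000, 130_000),
}

# Ratio of the son's income to the father's income.
ratios = {name: son / father for name, (father, son) in incomes.items()}

for name, ratio in ratios.items():
    print(f"{name:<8}{ratio:.2f}")
```

A ratio of 0.50 for Lahey means the son earns half as much as the father, while 2.00 for Rho means the son earns twice as much.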

Flow Chart of Types of variables

Types of Variables

    Qualitative
        - Brand of PC
        - Marital Status
        - Hair Color

    Quantitative
        Discrete
            - Children in family
            - Strokes on a golf hole
            - TV sets owned
        Continuous
            - Amount of income tax paid
            - Weight of a student
            - Yearly rainfall in Dhaka Dist.

Parameter and Statistic

Parameter: Any property, characteristic or functional form of relationship
calculated from the population, which is usually unknown, is called a
parameter.

Statistic: Any function of sample values that is used to estimate a parameter, and
whose value is known once the sample is observed, is called a statistic. A statistic is also
called an estimator when it is used to estimate a parameter. From a practical point of view,
the numeric value obtained from an estimator is called an estimate of the
parameter.
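
The terms parameter, estimator and estimate can be separated in a short sketch. Everything here is hypothetical: the population of crop yields is generated artificially within the 0 to 30 domain used earlier, and `sample_mean` is just one possible estimator, chosen for simplicity. In real studies the parameter is usually unknown; it can be computed here only because the whole artificial population is available.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: yields of 1,000 plots, each between 0 and 30.
population = [rng.uniform(0, 30) for _ in range(1000)]

# Parameter: a characteristic of the whole population (usually unknown in practice).
parameter = statistics.mean(population)

def sample_mean(values):
    """Estimator: a rule (function of the sample values) for estimating the population mean."""
    return sum(values) / len(values)

# Draw a random sample and apply the estimator to it.
sample = rng.sample(population, 25)

# Estimate: the numeric value the estimator gives for this particular sample.
estimate = sample_mean(sample)

print("parameter (population mean):", round(parameter, 2))
print("estimate (sample mean):", round(estimate, 2))
```

A different random sample would generally give a different estimate, while the parameter stays fixed.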

How do we study a population?

A population may be studied using one of two approaches: taking a census, or selecting a sample.

It is important to note that whether a census or a sample is used, both provide information that can be
used to draw conclusions about the whole population.


What is a census (complete enumeration)?

A census is a study of every unit, everyone or everything, in a population. It is known as
a complete enumeration, which means a complete count.

What is a sample (partial enumeration)?

A sample is a subset of units in a population, selected to represent all units in a population of
interest. It is a partial enumeration because it is a count from part of the population.

Information from the sampled units is used to estimate the characteristics for the entire population of
interest.

When to use a census or a sample?

Once a population has been identified a decision needs to be made about whether
taking a census or selecting a sample will be the more suitable option. There
are advantages and disadvantages to using a census or sample to study a population:

Pros of a CENSUS
- provides a true measure of the population (no sampling error)
- benchmark data may be obtained for future studies
- detailed information about small sub-groups within the population is more likely to be
available

Cons of a CENSUS
- may be difficult to enumerate all units of the population within the available time
- higher costs, both in staff and monetary terms, than for a sample
- generally takes longer to collect, process, and release data than from a sample

Pros of a SAMPLE
- costs would generally be lower than for a census
- results may be available in less time
- if good sampling techniques are used, the results can be very representative of the actual
population

Cons of a SAMPLE
- data may not be representative of the total population, particularly where the sample size is
small
- often not suitable for producing benchmark data, as data are collected from a subset of units
and inferences made about the whole population, the data are subject to 'sampling' error
- decreased number of units will reduce the detailed information available about sub-groups
within a population
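
The idea of sampling error in the comparison above can be illustrated with a small simulation. This is a sketch under stated assumptions only: the population is artificial (2,000 heights generated with a mean of 165 cm and a spread of 8 cm), the sample sizes of 10, 100 and 1,000 are arbitrary, and `sampling_errors` is a hypothetical helper that measures how far repeated sample means fall from the population mean.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Hypothetical population of 2,000 student heights (cm); not data from the lecture.
population = [rng.gauss(165, 8) for _ in range(2000)]
true_mean = statistics.mean(population)

# A census measures every unit, so its mean has no sampling error.
census_error = statistics.mean(population) - true_mean

def sampling_errors(sample_size, repeats=200):
    """Errors of the sample mean over repeated simple random samples."""
    return [
        statistics.mean(rng.sample(population, sample_size)) - true_mean
        for _ in range(repeats)
    ]

# Average size of the sampling error for a few sample sizes.
mean_abs_error = {}
for n in (10, 100, 1000):
    errors = sampling_errors(n)
    mean_abs_error[n] = statistics.mean([abs(e) for e in errors])
    print(n, round(mean_abs_error[n], 3))

print("census error:", census_error)
```

In runs like this the average error typically shrinks as the sample grows, which is the trade-off the table describes: a sample costs less than a census, but its results carry sampling error, especially when the sample is small.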

