Chapter 2
Basic Concepts in Statistics
Chapter Goals
Create an initial image of the field of
statistics.
Brief history of Statistics
Know the branches of Statistics.
Introduce several basic vocabulary words
used in studying statistics: population,
variable, statistic.
What is Statistics?
STATISTICS
The science of collecting, describing, and interpreting
data.
It can give an instant overall picture of the data through
graphical presentation or numerical summarization,
irrespective of the number of data points.
It is the methodology which scientists and
mathematicians have developed for interpreting and
drawing conclusions from collected data.
It is clear that statistics is much more than
just the tabulation of numbers and the
graphical presentation of these tabulated
numbers. Statistics is also the science of
gaining information from numerical and
categorical data. Statistical methods can be
used to find answers to questions like:
What kind and how much data need to be collected?
How should we organize and summarize the data?
How can we analyze the data and draw conclusions
from it?
How can we assess the strength of the conclusions
and evaluate their uncertainty?
That is, statistics provides methods for:
1. Design: Planning and carrying out research
studies.
2. Description: Summarizing and exploring data.
3. Inference: Making predictions and generalizing
about phenomena represented by the data.
Brief History of Statistics
Keywords: ancient chiefs, trained warriors, taxes, kingdom
17th to 18th century
Mathematicians were asked by gamblers to develop
principles that would improve their chances of
winning.
Mathematicians: Bernoulli and De Moivre
> probability
De Moivre - developed the equation for the normal
curve.
19th century
Laplace and Gauss
> applied probability principles to astronomy
Early 19th century
Quetelet - Belgian statistician
> investigation of social and educational
problems
> statistical theory as a general method of
research for the sciences.
Social sciences
Heredity
Eugenics
psychology
anthropometry
statistics
measurement between two variables
centiles and percentiles
Francis Galton
Correlation and regression
PEARSON - GALTON
psychologists
Europe in 1880
applied agriculture and biological setting
(E. L. Thorndike)
JAMES MCKEEN CATTELL
In the 20th century
R.A. Fisher
Applied statistics in agricultural and biological
settings.
Data can be classified into two
types
* Continuous data
. can be measured to varying degrees of
precision.
(e.g. 1 yard equals 3 feet)
* Discontinuous / discrete data
. measurement expressed in whole units
(e.g. number of objects)
According to Stevens
Types of Measurement
Scales of Measurement
4 types of measurement
1. Nominal scale - used as a measure of identity
e.g. classifying individuals into categories: yes or no, M and F.
2. Ordinal scale - used to rank individuals or objects
in order.
e.g. harder or softer, cold or hot
3. Interval scale - numbers that reflect differences among
items, without an absolute zero point.
e.g. score in a test, temperature in Fahrenheit, calendar
dates
4. Ratio scale - measures of length, weight, loudness,
softness, width and so on;
the highest type of scale
Terminologies
Concepts of Statistics
Population - the collection of all
individuals or items under consideration in a
statistical study. Two kinds of populations:
finite (countable) or infinite (uncountable).
Sample - that part of the population from
which information is collected.
Data - the facts and figures collected,
summarized, analyzed, and interpreted.
The data collected in a particular study
are referred to as the data set.
Variable - any characteristic of an individual or entity.
A variable can take different values for different
individuals. Variables can be:
Interval - Values of the variable are ordered as in
Ordinal, and additionally, differences between values are
meaningful; however, the scale is not absolutely anchored.
Calendar dates and temperatures on the Fahrenheit scale are
examples. Addition and subtraction, but not multiplication
and division, are meaningful operations.
Ratio - Variables with all properties of Interval plus an
absolute, non-arbitrary zero point, e.g. age, weight,
temperature (Kelvin). Addition, subtraction, multiplication,
and division are all meaningful operations.
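The difference between the interval and ratio scales can be illustrated with a short Python sketch. The two temperature readings are hypothetical values chosen only for illustration, and the helper name fahrenheit_to_kelvin is an assumption of this example.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert a Fahrenheit reading (interval scale) to Kelvin (ratio scale)."""
    return (temp_f - 32) * 5 / 9 + 273.15

# Hypothetical readings, chosen only for illustration.
cool_f, warm_f = 40.0, 80.0

# Differences are meaningful on an interval scale.
difference_f = warm_f - cool_f

# This ratio is 2.0, but it is not meaningful: 0 degrees Fahrenheit is an arbitrary zero.
naive_ratio = warm_f / cool_f

# On the Kelvin scale the zero is absolute, so the ratio is meaningful.
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(f"difference: {difference_f} degrees F")
print(f"naive Fahrenheit ratio: {naive_ratio}")
print(f"Kelvin ratio: {kelvin_ratio:.3f}")
```

The Kelvin ratio comes out only slightly above 1, so 80 degrees Fahrenheit is not "twice as hot" as 40 degrees Fahrenheit.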
TYPES OF STATISTICS
Descriptive Statistics
is concerned with summary calculations,
graphs, charts, and tables. It is a set of
methods to describe data that we have collected.
Of 350 randomly selected people in the town of Luserna,
Italy, 280 people had the last name Nicolussi. An example
of descriptive statistics is the following statement:
"80% of these people have the last name Nicolussi."
This is a descriptive statement because it can actually be
verified from the information provided.
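The 80% figure in the Luserna example can be checked directly from the two counts given. A minimal Python sketch (the variable names are assumptions of this example):

```python
# Counts taken from the Luserna example.
sample_size = 350
nicolussi_count = 280

# A descriptive statistic summarizes only the data actually collected.
proportion = nicolussi_count / sample_size

print(f"{proportion:.0%} of the sampled people have the last name Nicolussi")
```

Because the calculation uses nothing beyond the 350 observed people, the statement is descriptive.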
Inferential Statistics
is a method used to generalize from a sample to a
population. For example, the average income of all
families (the population) in India can be estimated from
figures obtained from a few hundred families (the
sample).
This is also a set of methods used to make a
generalization, estimate, prediction or decision.
The major use of inferential statistics is to use
information from a sample to infer something about
a population.
Of 350 randomly selected people in the town of Luserna,
Italy, 280 people had the last name Nicolussi. An example
of inferential statistics is the following statement:
"80% of all people living in Italy have the last name
Nicolussi."
We have no information about all people living in Italy, just
about the 350 living in Luserna. We have taken that
information and generalized it to talk about all people living
in Italy. The easiest way to tell that this statement is not
descriptive is by trying to verify it based upon the
information provided.
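The idea of using a sample to estimate something about a population can be sketched in Python. This is only an illustration: the population below is simulated with made-up numbers, not real income data, and in a real study the population value would not be available for comparison.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 simulated family incomes (made-up numbers).
population = [random.gauss(50_000, 15_000) for _ in range(100_000)]

# In practice only the sample is observed.
sample = random.sample(population, 300)

# Statistic computed from the sample; used as the estimate.
sample_mean = statistics.mean(sample)

# Population value; normally unknown, computed here only to compare.
population_mean = statistics.mean(population)

print(f"estimate from the sample: {sample_mean:,.0f}")
print(f"population mean:          {population_mean:,.0f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean. That gap is the uncertainty that inferential methods try to evaluate.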
VARIABLES
Variable - any characteristic of an individual or entity. A
variable can take different values for different individuals.
- any characteristic, number, or quantity that can be
measured or counted.
Variables can be
Categorical variables - have values that describe a quality or
characteristic of a data unit, like what type or which
category
Nominal - Categorical variables with no inherent order or ranking
sequence such as names or classes (e.g., gender). Values may be
numerals, but without numerical meaning (e.g., I, II, III). The only
operation that can be applied to Nominal variables is enumeration.
Ordinal - Variables with an inherent rank or order, e.g. mild,
moderate, severe. Can be compared for equality, or greater or less,
but not how much greater or less.
Numeric or quantitative variables - have values that
describe a measurable quantity as a number, like how
many or how much
continuous variable - observations can take any value
within a certain range of real numbers
discrete variable - observations can take a value based on
a count from a set of distinct whole values.
DATA
Two types of statistical presentation of data: graphical
and numerical.
Graphical Presentation: We look for the overall pattern
and for striking deviations from that pattern. The overall
pattern is usually described by the shape, center, and
spread of the data.
Bar diagrams and pie charts are used for categorical
variables.
A histogram is used for a numerical variable.
Graphical
Presentation
Bar Diagram: Lists the categories and presents the
percent or count of individuals who fall in each category.
Categorical Variable
Pie Chart: Lists the categories and presents the percent or
count of individuals who fall in each category.
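The counts and percents that a bar diagram or pie chart displays can be tabulated with a few lines of Python. This is a minimal sketch: the observations are hypothetical, and the text bars stand in for a real chart.

```python
from collections import Counter

# Hypothetical categorical observations, made up for illustration.
severity = ["mild", "severe", "mild", "moderate",
            "mild", "moderate", "mild", "severe"]

counts = Counter(severity)  # number of individuals in each category
total = len(severity)

# One text bar per category, with its count and percent.
for category in ["mild", "moderate", "severe"]:
    count = counts[category]
    percent = 100 * count / total
    print(f"{category:<9}{'#' * count} {count} ({percent:.1f}%)")
```

Each bar's length is the category count. A pie chart would show the same percents as slices of a circle.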
A fundamental concept in summary statistics is that of a
central value for a set of observations and the extent to
which the central value characterizes the whole set of
data. Measures of central value such as the mean or
median must be coupled with measures of data
dispersion (e.g., average distance from the mean) to
indicate how well the central value characterizes the
data as a whole.
To understand how well a central value characterizes a set of observations,
let us consider the following two sets of data:
A: 30, 50, 70
B: 40, 50, 60
The mean of both data sets is 50. But the distance of the observations
from the mean is larger in data set A than in data set B. Thus, the mean
of data set B represents its data better than the mean of data set A
represents its data.
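The comparison of data sets A and B can be checked numerically. The sketch below uses the average distance from the mean as the measure of dispersion. The function name average_distance_from_mean is an assumption of this example.

```python
from statistics import mean

def average_distance_from_mean(values):
    """Average of the absolute distances |x - mean| over all observations."""
    center = mean(values)
    return mean(abs(x - center) for x in values)

data_a = [30, 50, 70]
data_b = [40, 50, 60]

for name, data in [("A", data_a), ("B", data_b)]:
    print(f"{name}: mean = {mean(data)}, "
          f"average distance from mean = {average_distance_from_mean(data):.2f}")
```

Both means are 50, but the average distance is about 13.33 for A and about 6.67 for B, so the observations in B sit closer to their mean.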
Numerical
Presentation
Histogram: The overall pattern can be described by its shape,
center, and spread. The following age distribution is right
skewed. The center lies between 80 and 100. There are no outliers.
Numerical Variable