Introduction to Statistics
Definitions of Statistics
• In your view, what is statistics?
• What purpose does Statistics serve?
Statistics consist of facts and figures such as the average
annual snowfall in Denver or Derek Jeter’s lifetime
batting average. These statistics are usually informative
and time-saving because they condense large quantities
of information into a few simple figures.
Purposes of Statistics
Statistics serve two general purposes:
1. Statistics are used to organize and summarize the
information so that the researcher can see what
happened in the research study and can communicate
the results to others.
2. Statistics help the researcher to answer the questions
that initiated the research by determining exactly what
general conclusions are justified based on the specific
results that were obtained.
Populations and Samples
Population
• A population is the set of all the individuals of interest in a particular study.
• A population can be quite large—for example, the entire set of women on the planet
Earth. A researcher might be more specific, limiting the population for study to women
who are registered voters in the United States. Perhaps the investigator would like to
study the population consisting of women who are heads of state. Populations can
obviously vary in size from extremely large to very small, depending on how the
investigator defines the population.
Sample
A sample is a set of individuals selected from a population,
usually intended to represent the population in a research study.
• Just as we saw with populations, samples can vary in size. For example, one study
might examine a sample of only 10 students in a graduate program and another study
might use a sample of more than 10,000 people who take a specific cholesterol
medication.
Variables and Data
Variable
A variable is a characteristic or condition that
changes or has different values for different
individuals.
Researchers are interested in specific characteristics of the individuals in the
population (or in the sample), or they are interested in outside factors that may
influence the individuals. For example, a researcher may be interested in the
influence of the weather on people’s moods. As the weather changes, do
people’s moods also change? Something that can change or have different
values is called a variable.
Data
Data (plural) are measurements or observations. A data
set is a collection of measurements or observations. A
datum (singular) is a single measurement or observation
and is commonly called a score or raw score.
To demonstrate changes in variables, it is necessary to make measurements of
the variables being examined. The measurement obtained for each individual is
called a datum, or more commonly, a score or raw score. The complete set of
scores is called the data set or simply the data.
Parameters and Statistics
A parameter is a value, usually a numerical value,
that describes a population. A parameter is usually
derived from measurements of the individuals in the
population.
A statistic is a value, usually a numerical value,
that describes a sample. A statistic is usually derived
from measurements of the individuals in the sample.
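The distinction can be sketched in a few lines of Python. The scores below are hypothetical and the class of eight students is an assumption made only for illustration: the mean of all eight scores is a parameter, while the mean of three randomly chosen scores is a statistic.

```python
import random
import statistics

# Hypothetical population: exam scores for all 8 students in a small class.
population = [72, 85, 90, 66, 78, 94, 81, 74]

# A parameter describes the whole population.
population_mean = statistics.mean(population)

# A statistic describes a sample drawn from that population.
rng = random.Random(0)  # fixed seed so the same sample is drawn each run
sample = rng.sample(population, 3)
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("sample:", sample)
print("statistic (sample mean):", sample_mean)
```

The same calculation (a mean) yields a parameter or a statistic depending only on whether it is applied to the population or to a sample.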
Descriptive and Inferential Statistical Methods
Descriptive Statistics
• Descriptive statistics are statistical
procedures used to summarize, organize,
and simplify data.
Descriptive statistics are techniques that take raw scores and organize or
summarize them in a form that is more manageable. Often the scores are
organized in a table or a graph so that it is possible to see the entire set of
scores. Another common technique is to summarize a set of scores by
computing an average. Note that even if the data set has hundreds of scores,
the average provides a single descriptive value for the entire set.
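As a minimal sketch of these two techniques, the Python below organizes a set of raw scores into a frequency table and then summarizes them with a mean. The ten quiz scores are hypothetical.

```python
from collections import Counter
import statistics

# Hypothetical raw scores from a 10-point quiz.
scores = [7, 9, 8, 7, 10, 6, 8, 7, 9, 8]

# Organize: a frequency table shows how often each score occurs.
frequency = Counter(scores)
print("score  frequency")
for score in sorted(frequency, reverse=True):
    print(f"{score:>5}  {frequency[score]:>9}")

# Summarize: one average describes the entire set of scores.
average = statistics.mean(scores)
print("mean =", average)
```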
Inferential Statistics
Inferential statistics consist of techniques that
allow us to study samples and then make
generalizations about the populations from which
they were selected.
Because populations are typically very large, it usually is not possible to
measure everyone in the population. Therefore, a sample is selected to
represent the population. By analyzing the results from the sample, we hope to
make general statements about the population. Typically, researchers use
sample statistics as the basis for drawing conclusions about population
parameters.
Sampling Error
Sampling error is the naturally occurring
discrepancy, or error, that exists between a
sample statistic and the corresponding population
parameter.
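A small simulation can make sampling error concrete. The population here is randomly generated and purely hypothetical, and the sample size of 25 is an arbitrary choice. Each sample mean differs somewhat from the population mean, and that difference is the sampling error for that sample.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 test scores between 50 and 100.
population = [rng.randint(50, 100) for _ in range(1000)]
mu = statistics.mean(population)  # population parameter

errors = []
for trial in range(3):
    sample = rng.sample(population, 25)  # select 25 individuals
    m = statistics.mean(sample)          # sample statistic
    errors.append(m - mu)                # sampling error for this sample
    print(f"sample {trial + 1}: mean = {m:.2f}, sampling error = {m - mu:+.2f}")

print(f"population mean = {mu:.2f}")
```

Different samples from the same population generally produce different statistics, so the size and direction of the error change from sample to sample.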
Variables and Measurement
Discrete and Continuous Variables
• A discrete variable consists of separate,
indivisible categories. No values can exist
between two neighboring categories.
Discrete variables are commonly restricted to whole, countable numbers—for example,
the number of children in a family or the number of students attending class.
What are other examples of discrete variables?
For a continuous variable, there are an infinite
number of possible values that fall between any two
observed values. A continuous variable is divisible into
an infinite number of fractional parts.
• When measuring a continuous variable, it should be very rare to obtain
identical measurements for two different individuals. Because a
continuous variable has an infinite number of possible values, it should
be almost impossible for two people to have exactly the same score.
• When measuring a continuous variable, each measurement category is
actually an interval that must be defined by boundaries. For example,
two people who both claim to weigh 150 pounds are probably not
exactly the same weight. However, they are both around 150 pounds.
One person may actually weigh 149.6 and the other 150.3.
• Is gender a discrete variable or continuous variable?
• Is height a discrete variable or continuous variable?
• Is weight a discrete variable or continuous variable?
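The 150-pound example above can be sketched in Python. This assumes weight is reported to the nearest whole pound, so the boundaries sit half a unit on either side of the reported value; the function name `interval_boundaries` is hypothetical.

```python
def interval_boundaries(reported, unit=1.0):
    """Return the lower and upper boundaries of a measurement category.

    Assumes values are reported to the nearest `unit`, so the category
    extends half a unit below and half a unit above the reported value.
    """
    half = unit / 2
    return reported - half, reported + half

lower, upper = interval_boundaries(150)
print("boundaries for 150 pounds:", lower, "to", upper)

# Two different actual weights that are both reported as 150 pounds.
for actual in (149.6, 150.3):
    print(actual, "falls in the 150 category:", lower <= actual < upper)
```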
Scales of Measurement
• It should be obvious by now that data collection
requires that we make measurements of our
observations. Measurement involves assigning
individuals or events to categories. The categories can
simply be names such as male/female or
employed/unemployed, or they can be numerical values
such as 68 inches or 175 pounds. The categories used
to measure a variable make up a scale of measurement,
and the relationships between the categories determine
different types of scales.
The Nominal Scale
A nominal scale consists of a set of categories that
have different names. Measurements on a nominal scale
label and categorize observations, but do not make any
quantitative distinctions between observations.
• If you were measuring the academic majors for a group of college
students, the categories would be art, biology, business,
chemistry, and so on. Each student would be classified in one
category according to his or her major.
• Does your attendance list fall into this category?
The Ordinal Scale
An ordinal scale consists of a set of categories
that are organized in an ordered sequence.
Measurements on an ordinal scale rank
observations in terms of size or magnitude.
An ordinal scale consists of a series of ranks (first, second, third,
and so on) like the order of finish in a horse race. Occasionally, the
categories are identified by verbal labels like small, medium, and
large drink sizes at a fast-food restaurant.
The Interval and Ratio Scales
An interval scale consists of ordered categories that are
all intervals of exactly the same size. Equal differences
between numbers on the scale reflect equal differences in
magnitude. However, the zero point on an interval scale is
arbitrary and does not indicate a zero amount of the
variable being measured.
A ratio scale is an interval scale with the additional feature
of an absolute zero point. With a ratio scale, ratios of
numbers do reflect ratios of magnitude.
• A temperature of 0° Fahrenheit does not mean
that there is no temperature, and it does not
prohibit the temperature from going even lower.
Interval scales with an arbitrary zero point are
relatively rare.
• A ratio scale is anchored by a zero point that is
not arbitrary but rather is a meaningful value
representing none (a complete absence) of the
variable being measured. The existence of an
absolute, non-arbitrary zero point means that we
can measure the absolute amount of the variable;
that is, we can measure the distance from 0.
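To illustrate why ratios are meaningful only on a ratio scale, the sketch below compares two temperatures. The Kelvin scale is not mentioned above and is brought in here as an example of a temperature scale with an absolute zero; the two temperatures are hypothetical.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale)."""
    return (deg_f - 32) * 5 / 9 + 273.15

warm_f, cool_f = 40, 20  # hypothetical temperatures in degrees Fahrenheit

# Differences are meaningful on an interval scale.
difference_f = warm_f - cool_f

# The ratio of the Fahrenheit numbers suggests "twice as hot" ...
naive_ratio = warm_f / cool_f

# ... but measured from absolute zero, the ratio is much closer to 1.
true_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print("difference in degrees F:", difference_f)
print("ratio of Fahrenheit values:", naive_ratio)
print("ratio of Kelvin values:", round(true_ratio, 3))
```

Because 0° Fahrenheit is an arbitrary point, the Fahrenheit ratio does not reflect a ratio of magnitudes; only the scale anchored at an absolute zero supports that comparison.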