Stt041 Module 1
INTRODUCTION TO STATISTICS
Overview/Introduction:
In this chapter, you will be introduced to the basic concepts and goals of statistics. For
instance, statistics were used to construct the following graphs, which show the fastest
growing U.S. cities (population over 100,000) in 2008 by percent increase in population,
U.S. cities with the largest numerical increases in population, and the regions where the
cities are located.
For the 2010 Census, the Census Bureau sent short forms to every household. Short forms
ask all members of every household such things as their gender, age, race, and ethnicity.
Previously, a long form, which covered additional topics, was sent to about 17% of the
population. But for the first time since 1940, the long form is being replaced by the
American Community Survey, which will survey about 3 million households a year
throughout the decade. These 3 million households will form a sample. In this course, you
will learn how the data collected from a sample are used to infer characteristics about the
entire population.
STT041 Module 1
PA
MODULE 1. INTRODUCTION TO STATISTICS
A census consists of data from an entire population. But, unless a population is small, it is
usually impractical to obtain all the population data. In most studies, information must be
obtained from a sample.
Data Sets
There are two types of data sets you will use when studying statistics. These data sets are
called populations and samples.
A population is the collection of all outcomes, responses, measurements, or counts that are
of interest.
A sample is a subset, or part, of a population.
A sample should be representative of a population so that sample data can be used to form
conclusions about that population. Sample data must be collected using an appropriate
method, such as random sampling.
Solution
The population consists of the responses of all adults in the United States, and the sample
consists of the responses of the 1500 adults in the United States in the survey. The sample is
a subset of the responses of all adults in the United States. The sample data set consists of
855 yes’s and 645 no’s.
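As a small illustration of how the sample data set above is summarized, the sketch below computes the sample size and the proportion of yes responses from the two counts given in the solution. The variable names are chosen here for illustration only.

```python
# Counts taken from the solution above: 855 yes responses and 645 no responses.
yes_count = 855
no_count = 645

# The sample consists of all responses collected in the survey.
sample_size = yes_count + no_count

# Proportion of the sample that answered yes (a sample statistic).
proportion_yes = yes_count / sample_size

print(f"Sample size: {sample_size}")
print(f"Proportion answering yes: {proportion_yes:.2f}")
```

The proportion describes only the 1500 adults surveyed; using it to say something about all adults in the United States is a matter for inferential statistics.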
Solution
1. Because the average of $83,121 is based on a subset of the population, it is a sample
statistic.
2. Because the SAT score of 1442 is based on all the students who accepted admission offers
in 2009, it is a population parameter.
BRANCHES OF STATISTICS
The study of statistics has two major branches: descriptive statistics and inferential statistics.
Descriptive statistics is the branch of statistics that involves the organization,
summarization, and display of data.
Inferential statistics is the branch of statistics that involves using a sample to draw
conclusions about a population. A basic tool in the study of inferential statistics is
probability.
Solution
Descriptive statistics involves statements such as “For
unmarried men, approximately 70% were alive at age 65”
and “For married men, 90% were alive at 65.” A possible
inference drawn from the study is that being married is
associated with a longer life for men.
MEASUREMENT
Measurement refers to the process of determining the value or label, either qualitatively or
quantitatively, of a particular variable for a particular unit of analysis.
Levels of Measurement
Level Characteristics
1. Nominal - numbers or symbols are used simply to classify an object, person,
or characteristic into categories
- the categories must be distinct, non-overlapping, and exhaustive
- weakest level of measurement
Example: Religious Affiliation:
Roman Catholic, Iglesia Ni Cristo, Baptist
Anemic
3. Interval - contains the properties of the ordinal level
- the distances between any two numbers on the scale are of known
sizes
- characterized by a common and constant unit of measurement
- units of measurement are arbitrary
- the number zero does not imply the absence of the characteristic
under consideration (thus, the zero point is arbitrary)
Examples: Temperature in °C and °F; intelligence quotient
(75, 100, etc.)
4. Ratio - contains the properties of the interval level
- has a true zero point, that is, the number zero indicates the absence
of the characteristic under consideration
- strongest level of measurement
Examples: Scores in a test; height in meters,
feet, etc.
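The difference between an arbitrary zero (interval level) and a true zero (ratio level) can be checked with a little arithmetic. The sketch below is only an illustration: the temperatures and heights are made-up values, and the helper functions are named here for convenience. It shows that a ratio of two temperatures changes when the unit changes, while a ratio of two heights does not.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def meters_to_feet(m):
    """Convert a length from meters to feet (approximate conversion factor)."""
    return m * 3.28084


# Interval level: 20 degrees C looks like "twice" 10 degrees C ...
temp_ratio_c = 20 / 10
# ... but the same two temperatures in degrees F give a different ratio.
temp_ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio level: 2 m is twice 1 m, and it stays twice as tall in feet.
height_ratio_m = 2.0 / 1.0
height_ratio_ft = meters_to_feet(2.0) / meters_to_feet(1.0)

print(f"Temperature ratio in C: {temp_ratio_c:.2f}, in F: {temp_ratio_f:.2f}")
print(f"Height ratio in m: {height_ratio_m:.2f}, in ft: {height_ratio_ft:.2f}")
```

Because the temperature ratio depends on the unit, statements such as "twice as hot" are not meaningful at the interval level, whereas "twice as tall" is meaningful at the ratio level.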
questionable. Even with the best methods of sampling, a sampling error may occur. A
sampling error is the difference between the results of a sample and those of the population.
When you learn about inferential statistics, you will learn techniques for controlling sampling
errors.
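A sampling error can be made concrete with a short simulation. The sketch below is a minimal illustration under stated assumptions: the population is a made-up list of 1000 household sizes, the sample size of 50 is arbitrary, and the mean is used as the "result" being compared.

```python
import random
from statistics import mean

# Fixed seed so the illustration is repeatable.
rng = random.Random(0)

# Hypothetical population: the number of people in each of 1000 households.
population = [rng.randint(1, 6) for _ in range(1000)]

# Simple random sample of 50 households, drawn without replacement.
sample = rng.sample(population, 50)

population_mean = mean(population)   # population parameter
sample_mean = mean(sample)           # sample statistic

# Sampling error: the sample result minus the population result.
sampling_error = sample_mean - population_mean

print(f"Population mean: {population_mean:.3f}")
print(f"Sample mean:     {sample_mean:.3f}")
print(f"Sampling error:  {sampling_error:.3f}")
```

In practice the population mean is usually unknown, which is why the sampling error has to be estimated rather than computed directly as it is here.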
Advantages of Survey Sampling
reduced cost
greater speed
greater scope
greater accuracy
Purposive Sampling sets out to make a sample agree with the population
in regard to certain characteristics
Disadvantages
The sample chosen may be widely spread, thus entailing high transportation
costs.
A population list, or frame, is needed.
The sample chosen may not be truly typical of the population if the population is
heterogeneous with respect to the characteristic under study.
There are several other commonly used sampling techniques. Each has advantages and
disadvantages.
• Stratified Sample: When it is important for the sample to have members from each segment
of the population, you should use a stratified sample. Depending on the focus of the study,
members of the population are divided into two or more subsets, called strata, that share a
similar characteristic such as age, gender, ethnicity, or even political preference. A sample is
then randomly selected from each of the strata. Using a stratified sample ensures that each
segment of the population is represented. For instance, to collect a stratified sample of the
number of people who live in West Ridge County households, you could divide the
households into socioeconomic levels, and then randomly select households from each level.
Stratified random sampling then consists of selecting a SRS from each of the strata into
which the population has been divided.
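The procedure just described, drawing a simple random sample from each stratum, can be sketched as follows. This is an illustration only: the household identifiers, the three socioeconomic levels, and the choice of 5 households per stratum are all assumptions made for the example.

```python
import random

# Fixed seed so the illustration is repeatable.
rng = random.Random(42)

# Hypothetical frame: household IDs listed separately for each stratum.
strata = {
    "low": [f"L{i}" for i in range(100)],
    "middle": [f"M{i}" for i in range(60)],
    "high": [f"H{i}" for i in range(40)],
}

# Assumed number of households to draw from each stratum.
per_stratum = 5

# Draw a simple random sample (without replacement) from every stratum.
stratified_sample = {
    level: rng.sample(households, per_stratum)
    for level, households in strata.items()
}

for level, chosen in stratified_sample.items():
    print(level, chosen)
```

Every stratum contributes households to the sample, which is what guarantees that each segment of the population is represented.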
Advantages
Stratification may bring about a gain in precision of the estimates of characteristics
of the population.
It allows a more comprehensive data analysis since information is provided in each
stratum.
It is administratively convenient.
Disadvantages
A listing of the population for each stratum is needed.
The stratification of the population may require additional prior information about
the population and its strata.
• Cluster Sample: When the population falls into naturally occurring subgroups, each having
similar characteristics, a cluster sample may be the most appropriate. To select a cluster
sample, divide the population into groups, called clusters, and select all of the members in
one or more (but not all) of the clusters. Examples of clusters could be different sections of
the same course or different branches of a bank. For instance, to collect a cluster sample of
the number of people who live in West Ridge County households, divide the households into
groups according to zip codes, then select all the households in one or more, but not all, zip
codes and count the number of people living in each household. In using a cluster sample,
care must be taken to ensure that all clusters have similar characteristics. For instance, if one
of the zip code clusters has a greater proportion of high-income people, the data might not
be representative of the population.
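The zip code example above can be sketched in a few lines. The zip code labels, the household sizes, and the decision to select 2 of the 4 clusters are hypothetical values chosen for illustration.

```python
import random

# Fixed seed so the illustration is repeatable.
rng = random.Random(7)

# Hypothetical clusters: number of people in each household, grouped by zip code.
clusters = {
    "zip_A": [2, 4, 3, 1, 5],
    "zip_B": [3, 3, 2, 4],
    "zip_C": [1, 2, 2, 6, 3, 4],
    "zip_D": [5, 2, 3],
}

# Randomly select some, but not all, of the clusters.
chosen_zips = rng.sample(sorted(clusters), 2)

# A cluster sample includes every household in each selected cluster.
cluster_sample = [size for z in chosen_zips for size in clusters[z]]

print("Selected zip codes:", chosen_zips)
print("Household sizes in the sample:", cluster_sample)
```

Unlike the stratified approach, randomness enters only in the choice of clusters; within a selected cluster, every household is included.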
Advantages
A population list is not needed.
Listing cost is reduced.
Disadvantages
The costs and problems of statistical analysis are greater.
Estimation procedures are difficult.
Advantages
Drawing of the sample is administratively easy.
It is possible to select a sample in the field without a frame.
Disadvantage
If periodic irregularities are found in the list, a systematic sample may consist only of
similar types.
A type of sample that often leads to biased studies (so it is not recommended) is a
convenience sample. A convenience sample consists only of available members of the
population.
Assessment: