Statistics 8


Statistics

The Nature of
Statistics

• Definition (Statistics)
Statistics is concerned with
• the collection of data,
• their description, and
• their analysis, which often leads to the drawing
of conclusions.
Statistics: The science of collecting, describing,
and interpreting data.

Two areas of
statistics:

Descriptive Statistics: collection, presentation,
and description of sample data.
Inferential Statistics: making decisions and
drawing conclusions about populations.

Introduction to Basic
Terms

Population: A collection, or set, of individuals
or objects or events whose properties are to
be analyzed.
Two kinds of populations: finite or infinite.

Sample: A subset of the population.


Variable: A characteristic about each individual element of
a population or sample.
Data (singular): The value of the variable associated with
one element of a population or sample. This value may
be a number, a word, or a symbol.
Data (plural): The set of values collected for the variable
from each of the elements belonging to the sample.
Experiment: A planned activity whose results yield a set
of data.
Parameter: A numerical value summarizing all the data of
an entire population.
Statistic: A numerical value summarizing the sample data.
Example: A college dean is interested in learning about the
average age of faculty. Identify the basic terms in this situation.

The population is the age of all faculty members at the college.


A sample is any subset of that population. For example, we might
select 10 faculty members and determine their age.
The variable is the “age” of each faculty member.
One data value would be the age of a specific faculty member.
The data would be the set of values in the sample.
The experiment would be the method used to select the ages
forming the sample and determining the actual age of each faculty
member in the sample.
The parameter of interest is the “average” age of all faculty at the
college.
The statistic is the “average” age for all faculty in the sample.
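
The difference between the parameter and the statistic in this example can be sketched in Python. This is only an illustration: the faculty size (200), the age range, and the seed are made-up assumptions, not values from the example.

```python
import random
from statistics import mean

# Hypothetical population: ages of 200 faculty members (made-up data).
rng = random.Random(8)  # arbitrary seed so the run is repeatable
population_ages = [rng.randint(28, 70) for _ in range(200)]

# Parameter: a numerical value summarizing the entire population.
parameter = mean(population_ages)

# Sample: 10 faculty members selected from the population.
sample_ages = rng.sample(population_ages, 10)

# Statistic: a numerical value summarizing only the sample data.
statistic = mean(sample_ages)

print(f"parameter (mean age, all faculty): {parameter:.1f}")
print(f"statistic (mean age, sample of 10): {statistic:.1f}")
```

The two printed values will usually be close but not equal, which is why a statistic is treated as an estimate of the parameter.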
Two kinds of variables:
Qualitative, or Attribute, or Categorical,
Variable: A variable that categorizes or
describes an element of a population.
Note: Arithmetic operations, such as
addition and averaging, are not meaningful
for data resulting from a qualitative
variable.
Quantitative, or Numerical, Variable: A
variable that quantifies an element of a
population.
Note: Arithmetic operations such as addition
and averaging, are meaningful for data resulting
from a quantitative variable.
Quantitative
Data

• Number of TV sets owned by a family
• Number of books in the library
• Time spent on travel to school
• Amount of rainfall
• Weight of an apple

Qualitative
data

• Eye color
• Gender
• Color of an apple
• Taste of an apple
• Smell of an apple
Example: Identify each of the following examples as attribute
(qualitative) or numerical (quantitative) variables.

1. The residence hall for each student in a statistics class.


2. The amount of gasoline pumped by the next 10 customers
at the local NE Mall.
3. The amount of radon in the basement of each of 25 homes
in a new development.
4. The color of the baseball cap worn by each of 20 students.
5. The length of time to complete a mathematics homework
assignment.
6. The state in which each truck is registered when stopped
and inspected at a weigh station.
Example: Identify each of the following as examples
of qualitative or numerical variables:
1. The temperature in Baler, Aurora at 12:00 pm on
any given day.
2. The make of automobile driven by each faculty
member.
3. Whether or not a 6 volt lantern battery is
defective.
4. The weight of a lead pencil.
5. The length of time billed for a long distance
telephone call.
6. The brand of cereal children eat for breakfast.
7. The type of book taken out of the library by an
adult.
Qualitative and quantitative variables may be further
subdivided:

Variable
  Qualitative
    Nominal
    Ordinal
  Quantitative
    Discrete
    Continuous

Nominal Variable: A qualitative variable that
categorizes (or describes, or names) an element of a
population.

Ordinal Variable: A qualitative variable that incorporates
an ordered position, or ranking.

Discrete Variable: A quantitative variable that can assume
a countable number of values. Intuitively, a discrete
variable can assume values corresponding to isolated points
along a line interval. That is, there is a gap between any two
values.

Continuous Variable: A quantitative variable that can
assume an uncountable number of values. Intuitively,
a continuous variable can assume any value along a line
interval, including every possible value between any
two values.
Statistics deals with
numbers

• Need to know nature of numbers collected
– Continuous variables: type of numbers
associated with measuring or weighing; any
value in a continuous interval of measurement.
• Examples:
– Weight of students, height of plants, time to flowering
– Discrete variables: type of numbers that
are counted
• Examples:
– Numbers of boys, girls, insects, plants
Note:
1. In many cases, a discrete and continuous variable
may be distinguished by determining whether the
variables are related to a count or a measurement.
2. Discrete variables are usually associated with
counting. If the variable cannot be further
subdivided, it is a clue that you are probably dealing
with a discrete variable.
3. Continuous variables are usually associated with
measurements. The values of continuous
variables are limited only by your ability to
measure them.
1.4: Data
Collection

• First problem a statistician faces: how
to obtain the data.
• It is important to obtain good, or
representative, data.
• Inferences are made based on
statistics obtained from the data.
• Inferences can only be as good as the
data.
Biased Sampling Method: A sampling method that
produces data that systematically differ from the
sampled population. An unbiased sampling method is one
that is not biased.

Sampling methods that often result in biased samples:


1. Convenience sample: sample selected from elements of
a population that are easily accessible.
2. Volunteer sample: sample collected from those
elements of the population which chose to contribute
the needed information on their own initiative.
Process of data collection:

1. Define the objectives of the survey or experiment.
Example: Estimate the average life of an
electronic component.
2. Define the variable and population of interest.
Example: Length of time for anesthesia to wear off
after surgery.
3. Define the data-collection and data-measuring
schemes. This includes sampling procedures, sample
size, and the data-measuring device (questionnaire,
scale, ruler, etc.).
4. Determine the appropriate descriptive or inferential
data-analysis techniques.
Methods used to collect data:

Experiment: The investigator controls or modifies the
environment and observes the effect on the variable
under study.

Survey: Data are obtained by sampling some of the
population of interest. The investigator does not modify
the environment.

Census: A 100% survey. Every element of the
population is listed. Seldom used: difficult and
time-consuming to compile, and expensive.
Sampling Frame: A list of the elements belonging to
the population from which the sample will be drawn.

Note: It is important that the sampling frame
be representative of the population.

Sample Design: The process of selecting sample
elements from the sampling frame.

Note: There are many different types of sample designs.
Usually they all fit into two categories: judgment
samples and probability samples.
Judgment Samples: Samples that are selected on the basis
of being “typical.”

Items are selected that are representative of the
population. The validity of the results from a judgment
sample reflects the soundness of the collector’s judgment.

Probability Samples: Samples in which the elements to be
selected are drawn on the basis of probability. Each
element in a population has a certain probability of being
selected as part of the sample.
Random Samples: A sample selected in such a way that
every element in the population has an equal probability of
being chosen. Equivalently, all samples of size n
have an equal chance of being selected. Random samples
are obtained either by sampling with replacement from a
finite population or by sampling without replacement from
an infinite population.

Note:
1. Inherent in the concept of randomness: the next result (or occurrence) is
not predictable.
2. Proper procedure for selecting a random sample: use a random number
generator or a table of random numbers.
Example: An employer is interested in the time it takes
each employee to commute to work each morning.
A random sample of 35 employees will be selected and
their commuting time will be recorded.

There are 2712 employees.


Each employee is numbered: 0001, 0002, 0003, etc. up
to 2712.
Using four-digit random numbers, a sample is
identified: 1315, 0987, 1125, etc.
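
The selection in this example can be sketched with Python's standard random number generator in place of a table of random numbers. The seed is an arbitrary assumption, so the identifiers drawn will not match the 1315, 0987, 1125 shown above.

```python
import random

N_EMPLOYEES = 2712   # size of the population, from the example
SAMPLE_SIZE = 35     # size of the sample, from the example

# Number each employee 0001, 0002, ..., 2712 (four-digit labels).
employee_ids = [f"{i:04d}" for i in range(1, N_EMPLOYEES + 1)]

# Arbitrary seed (an assumption) so the same sample is drawn on every run.
rng = random.Random(2024)

# random.sample draws without replacement, so no employee appears twice
# and every set of 35 employees is equally likely to be chosen.
sample_ids = rng.sample(employee_ids, SAMPLE_SIZE)

print(sorted(sample_ids))
```

With a printed table of four-digit random numbers, the same effect is achieved by skipping 0000, any number above 2712, and any number already drawn.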
Organizing and Graphing
Data
Ranking of Donut-eating Profs.
(most to least)
Zingers 308
Honkey-Doorey 251
Calzone 227
Bopsey 213
Googles-boop 199
Pallitto 189
Homer 187
Schnickerson 165
Smuggle 165
Boehmer 151
Levin 148
Queeny 132
Here we have placed the professors into
weight classes and depicted the counts with a
histogram in columns.
[Figure: column histogram, "Weight Class Intervals of Donut-Munching Professors". Vertical axis: Number (0 to 3.5). Horizontal axis: weight class.]

Here is another histogram, depicted
as a bar graph.

[Figure: horizontal bar graph, "Weight Class Intervals of Donut-Munching Professors". Horizontal axis: Number (0 to 3.5). Vertical axis: weight classes 130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+.]
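
The counts behind these charts can be reproduced with a short Python sketch. It assumes the numbers in the ranking are the professors' weights and uses the class intervals named on the chart. The text bars are only a stand-in for the plotted figure.

```python
# Weights taken from the ranking of professors.
weights = {
    "Zingers": 308, "Honkey-Doorey": 251, "Calzone": 227, "Bopsey": 213,
    "Googles-boop": 199, "Pallitto": 189, "Homer": 187, "Schnickerson": 165,
    "Smuggle": 165, "Boehmer": 151, "Levin": 148, "Queeny": 132,
}

# (label, lower bound, upper bound), bounds inclusive; None means open-ended.
classes = [
    ("130-150", 130, 150), ("151-185", 151, 185), ("186-210", 186, 210),
    ("211-240", 211, 240), ("241-270", 241, 270), ("271-310", 271, 310),
    ("311+", 311, None),
]

# Tally how many professors fall into each weight class.
counts = {label: 0 for label, _, _ in classes}
for w in weights.values():
    for label, low, high in classes:
        if w >= low and (high is None or w <= high):
            counts[label] += 1
            break

# Print one text bar per class, with the count and its share of the total.
total = len(weights)
for label, n in counts.items():
    print(f"{label:>8} | {'#' * n} {n} ({n / total:.0%})")
```

The counts give the bar heights for the histogram, and the percentages give the slice sizes for a pie chart of the same classes.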


Pie
Charts:

[Figure: pie chart, "Proportions of Donut-Eating Professors by Weight Class". Legend: 130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+.]

Actually, why not use a donut graph.
Duh!

[Figure: donut chart, "Proportions of Donut-Eating Professors by Weight Class". Legend: 130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+.]

Line Graphs: A Time
Series

[Figure: line graph, "Economic approval". Vertical axis: Approval (0 to 100). Horizontal axis: Month, 1981 to 2001.]

Scatter Plot (Two
variable)

[Figure: scatter plot, "Presidential Approval and Unemployment". Vertical axis: Approval (0 to 100). Horizontal axis: Unemployment (0 to 12). Legend: Approve.]

1. Terminology
Populations &
Samples

• Population: the complete set of individuals,
objects or scores of interest.
– Often too large to sample in its entirety
– It may be real or hypothetical (e.g. the results from
an experiment repeated ad infinitum)

• Sample: A subset of the population.
– A sample may be classified as random (each member
has equal chance of being selected from a population)
or convenience (what’s available).
– Random selection attempts to ensure the sample is
representative of the population.
Variables

• Variables are the quantities measured in
a sample. They may be classified as:
– Quantitative i.e. numerical
• Continuous (e.g. pH of a sample,
patient cholesterol levels)
• Discrete (e.g. number of bacteria colonies in
a culture)
– Categorical
• Nominal (e.g. gender, blood group)
• Ordinal (ranked e.g. mild, moderate or severe
illness). Often ordinal variables are re-coded to
be quantitative.
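
Re-coding an ordinal variable can be sketched in Python as follows. The patient list and the codes 1, 2, 3 are illustrative assumptions. Any increasing set of codes would preserve the same ranking.

```python
from statistics import median

# Ordered categories of the ordinal variable, from least to most severe.
SEVERITY_ORDER = ["mild", "moderate", "severe"]

# Map each category to its rank: mild -> 1, moderate -> 2, severe -> 3.
severity_code = {level: rank for rank, level in enumerate(SEVERITY_ORDER, start=1)}

# Hypothetical observations (made-up data).
patients = ["moderate", "mild", "severe", "mild", "moderate"]

# Re-code each observation as a number.
codes = [severity_code[p] for p in patients]

print(codes)          # numeric codes in the original order
print(median(codes))  # the median depends only on the ordering
```

The codes keep the order of the categories, but the distance between them is arbitrary, so order-based summaries such as the median are safer than the mean.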