
Lesson 1 Introduction To Statistics

Uploaded by

Cheez Nutz
Copyright
© All Rights Reserved

CITM 003
Quantitative Methods
Mr. Melvin Ledesma
Instructor
Introductory Lessons: Definitions

OBJECTIVES
 Define statistics
 Distinguish clearly between
◼ Descriptive and Inferential Statistics
◼ Surveys and Experiments
◼ Retrospective and Prospective Studies
◼ Descriptive and Analytical Surveys
 Define bias
 Describe a clinical trial
Basics of Statistics
Definition: Science of collection, presentation,
analysis, and reasonable interpretation of data.
Statistics provides a rigorous scientific method for gaining insight into
data. For example, suppose we measure the weight of 100 patients in
a study. With so many measurements, simply looking at the data fails
to provide an informative account. However, statistics can give an
instant overall picture of the data through graphical presentation or
numerical summarization, irrespective of the number of data points.
Besides data summarization, another important task of statistics is to
make inferences and predict relationships between variables.
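As a minimal sketch of numerical summarization, the Python below reduces 100 weights to a handful of numbers. The weights are simulated (hypothetical values drawn with a fixed seed), not data from any study, and `summarize` is an illustrative helper name.

```python
import random
import statistics

# Hypothetical data: 100 simulated patient weights in kg, NOT real study data.
rng = random.Random(1)
weights = [round(rng.gauss(70, 12), 1) for _ in range(100)]


def summarize(values):
    """Reduce any number of measurements to a few summary numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = summarize(weights)
for name, value in summary.items():
    print(f"{name:>6}: {value:.1f}")
```

The same six numbers would be produced for 10 or 10,000 measurements, which is the point made above.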
A Taxonomy of Statistics
The Uses of Statistics
Descriptive vs Inferential Statistics
 Descriptive statistics – deals with the
enumeration, organization, and
graphical representation of data.
 Inferential statistics – concerned with
reaching conclusions from incomplete
information—that is, generalizing from
the specific. It uses information taken
from a sample to say something about
an entire population.
Example of Descriptive Statistics
 An example of descriptive statistics is
the decennial census of the USA,
where all residents are requested to
provide information such as age, sex,
race and marital status.
 The collected data can be compiled
and arranged into tables and graphs
that DESCRIBE the characteristics of
the population at a given time.
Example of Inferential Statistics
 An example of inferential statistics is an
opinion poll such as the Gallup Poll,
which attempts to draw inferences as to
the outcome of an election.
 In such a poll, a sample of individuals
(frequently fewer than 2000) is selected,
their preferences are tabulated, and
inferences are made as to how more
than 80 million persons would vote if an
election were held that day.
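A rough sketch of the inferential step, assuming a simple random sample and the normal-approximation (Wald) confidence interval for a proportion. Real polls such as Gallup's use more elaborate designs and weighting, so this is not their method; the counts (780 of 1,500) and the name `proportion_ci` are hypothetical.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Sample proportion and its normal-approximation 95% confidence interval."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    return p_hat, p_hat - z * se, p_hat + z * se


# Hypothetical poll: 1,500 sampled voters, 780 prefer candidate A.
p_hat, low, high = proportion_ci(780, 1500)
print(f"estimate {p_hat:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

The sample describes only 1,500 people; the interval is the inference about the whole electorate.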
Sources of Data
 Surveys and Experiments
◼ Known to be the two fundamental kinds of
investigations.
◼ Data from a survey may represent
observations of events or phenomena over
which few, if any, controls are imposed.
◼ In an experiment, we design a research plan
purposely to impose controls over the
amount of exposure (treatment) to a
phenomenon.
Sources of Data
 Retrospective Studies (case-control
studies)
◼ Gather data from selected cases and
controls to determine differences, if any,
in the exposure to a suspected factor.
◼ The researcher identifies individuals with
a specific disease or condition (cases)
and also identifies a comparable sample
without that disease or condition
(controls).
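To make "differences in exposure" concrete, a case-control analysis can start by computing the proportion exposed in each group. The counts below are hypothetical, and `exposure_proportions` is an illustrative name.

```python
def exposure_proportions(exposed_cases, total_cases, exposed_controls, total_controls):
    """Share of cases and of controls who were exposed, and the difference between them."""
    p_cases = exposed_cases / total_cases
    p_controls = exposed_controls / total_controls
    return p_cases, p_controls, p_cases - p_controls


# Hypothetical counts: 120 of 200 cases exposed, 140 of 400 controls exposed.
p_cases, p_controls, diff = exposure_proportions(120, 200, 140, 400)
print(f"cases: {p_cases:.0%} exposed, controls: {p_controls:.0%} exposed, difference: {diff:.0%}")
```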
Sources of Data
 Prospective Studies (cohort studies)
◼ The researchers enroll a group of healthy
persons (a cohort) and follow them over a
certain period to determine the frequency
with which a disease develops.
◼ The advantage of these studies is that they
permit the accurate estimation of disease
incidence in a population. They make it
possible to include potentially relevant
variables (e.g. age, gender, ethnicity,
occupation) that may be related to the
outcome variable.
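A minimal sketch of incidence estimation in a cohort, assuming everyone enrolled was disease-free at the start and was followed for the whole period (no losses to follow-up). The cohort, its split by age group, and the helper name are hypothetical; the split illustrates how a recorded variable such as age can be related to the outcome.

```python
def cumulative_incidence(new_cases, cohort_size):
    """Share of an initially disease-free cohort that develops the disease during follow-up."""
    if not 0 <= new_cases <= cohort_size:
        raise ValueError("new_cases must be between 0 and cohort_size")
    return new_cases / cohort_size


# Hypothetical cohort followed for 5 years, grouped by a recorded variable (age group).
cohort = {
    "under 50": {"enrolled": 1200, "new_cases": 15},
    "50 and over": {"enrolled": 800, "new_cases": 75},
}

overall = cumulative_incidence(
    sum(g["new_cases"] for g in cohort.values()),
    sum(g["enrolled"] for g in cohort.values()),
)
by_group = {name: cumulative_incidence(g["new_cases"], g["enrolled"]) for name, g in cohort.items()}

print(f"overall: {overall:.2%}")
for name, risk in by_group.items():
    print(f"{name}: {risk:.2%}")
```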
Descriptive vs Analytical Surveys
 Retrospective surveys are usually
descriptive surveys that provide
estimates of a population’s
characteristics.
 Prospective surveys may be descriptive
or analytical. Analytical surveys seek
to determine the degree of association
between a variable and a factor in the
population.
Clinical Trial
 A carefully designed experiment that
is generally considered to be the best
method for evaluating the
effectiveness of a new drug or
treatment method.
 Clinical trials are used extensively to test the
efficacy of new drugs and treatments.
Statistical Description of Data
 Statistics describes a numeric set of
data by its
 Center
 Variability
 Shape
 Statistics describes a categorical set
of data by
 Frequency, percentage or proportion of
each category
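For a categorical variable, the summary is a count per category plus the matching proportion or percentage. A short sketch using the standard library's `collections.Counter`; the blood-type data are hypothetical.

```python
from collections import Counter

# Hypothetical categorical data: blood types of 10 patients.
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "B"]

counts = Counter(blood_types)
n = len(blood_types)
for category, count in counts.most_common():
    print(f"{category:>2}: frequency {count}, proportion {count / n:.2f}, percentage {count / n:.0%}")
```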
Populations and Samples

OBJECTIVES
 Distinguish between
◼ Populations and samples
◼ Parameters and statistics
◼ Various methods of sampling
 Define Random Sample
 Explain why it is important to use random
sampling
Populations and Samples
Definition: A population is a set of persons (or
objects) having a common observable
characteristic. A sample is a subset of a
population.

Note: The way the sample is selected, not its
size, determines whether we may draw
appropriate inferences about the population.
SAMPLING
 A sample is “a smaller (but hopefully representative)
collection of units from a population used to determine
truths about that population” (Field, 2005)
 Why sample?
◼ Resources (time, money) and workload
◼ Gives results with known accuracy that can be
calculated mathematically
 The sampling frame is the list from which the potential
respondents are drawn
◼ Registrar’s office
◼ Class rosters
◼ Must assess sampling frame errors

SAMPLING……
 What is your population of interest?
 To whom do you want to generalize your
results?
◼ All doctors
◼ School children
◼ Indians
◼ Women aged 15-45 years
◼ Other
 Can you sample the entire population?
SAMPLING…….
 3 factors that influence sample
representativeness
 Sampling procedure
 Sample size
 Participation (response)

 When might you sample the entire population?


 When your population is very small
 When you have extensive resources
 When you don’t expect a very high response


SAMPLING BREAKDOWN
Types of Samples
 Probability (Random) Samples
◼ Simple random sample
◼ Systematic random sample
◼ Stratified random sample
◼ Cluster sample

 Non-Probability Samples
◼ Convenience sample
◼ Purposive sample
◼ Quota

Process
 The sampling process comprises several stages:
◼ Defining the population of concern
◼ Specifying a sampling frame, a set of items or events
possible to measure
◼ Specifying a sampling method for selecting items or
events from the frame
◼ Determining the sample size
◼ Implementing the sampling plan
◼ Sampling and data collecting
◼ Reviewing the sampling process
Population definition
 A population can be defined as including all
people or items with the characteristic one
wishes to understand.
 Because there is very rarely enough time
or money to gather information from
everyone or everything in a population, the
goal becomes finding a representative
sample (or subset) of that population.

SAMPLING FRAME
 In the simplest case, it is possible to identify and measure every single item in the
population and to include any one of them in our sample.
However, in the more general case this is not possible. There
is no way to identify all rats in the set of all rats. Where
voting is not compulsory, there is no way to identify which
people will actually vote at a forthcoming election (in advance
of the election).
 As a remedy, we seek a sampling frame which has the
property that we can identify every single element and include
any in our sample.
 The sampling frame must be representative of the population.

PROBABILITY SAMPLING

 A Probability Sampling scheme is one in which every
unit in the population has a chance (greater than
zero) of being selected in the sample, and this
probability can be accurately determined.

 When every element in the population does have the
same probability of selection, this is known as an
'equal probability of selection' (EPS) design. Such
designs are also referred to as 'self-weighting'
because all sampled units are given the same weight.

PROBABILITY SAMPLING…….

Probability sampling includes:


 Simple Random Sampling,
 Systematic Sampling,
 Stratified Random Sampling,
 Cluster Sampling

NON-PROBABILITY SAMPLING
 Any sampling method where some elements of population
have no chance of selection (these are sometimes
referred to as 'out of coverage'/'undercovered'), or
where the probability of selection can't be accurately
determined. It involves the selection of elements based
on assumptions regarding the population of interest,
which forms the criteria for selection. Hence, because
the selection of elements is nonrandom, nonprobability
sampling does not allow the estimation of sampling errors.

 Example: We visit every household in a given street, and
interview the first person to answer the door. In any
household with more than one occupant, this is a
nonprobability sample, because some people are more
likely to answer the door (e.g. an unemployed person who
spends most of their time at home is more likely to
answer than an employed housemate who might be at
work when the interviewer calls) and it's not practical to
calculate these probabilities.
NONPROBABILITY SAMPLING…….
• Nonprobability Sampling includes:
Accidental Sampling, Quota Sampling and
Purposive Sampling.
• In addition, nonresponse effects may
turn any probability design into a
nonprobability design if the
characteristics of nonresponse are not
well understood, since nonresponse
effectively modifies each element's
probability of being sampled.
SIMPLE RANDOM SAMPLING
• Applicable when population is small, homogeneous &
readily available
• All subsets of the frame are given an equal probability.
Each element of the frame thus has an equal probability
of selection.
• It provides for the greatest number of possible samples.
This is done by assigning a number to each unit in the
sampling frame.
• A table of random numbers or a lottery system is used to
determine which units are to be selected.

SIMPLE RANDOM SAMPLING……..
Definition: Can also be thought of as a 'pick a name out of the hat'
technique. Samples are chosen from a population either by
using a random number table or a random number generator.
Each member of the population has an equal, independent
and known chance of being selected.
Advantages
 Easy to implement
 Each member of the population has an equal chance of being
selected
 Free from selection bias
Disadvantages
 If sampling frame large, this method may be impractical.
 A complete list of population may not be available.
 Minority subgroups of interest in population may not be present
in the sample in sufficient numbers for study.
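Python's standard `random` module can play the role of the random number generator. This sketch draws 25 of 100 numbered units without replacement with `random.Random.sample`; the helper name and the seed are illustrative assumptions.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)


frame = list(range(1, 101))  # units numbered 1..100 form the sampling frame
sample = simple_random_sample(frame, 25, seed=42)
print(sorted(sample))
```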
SYSTEMATIC SAMPLING
 Systematic sampling relies on arranging the target
population according to some ordering scheme and then
selecting elements at regular intervals through that
ordered list.
 Systematic sampling involves a random start and then
proceeds with the selection of every kth element from
then onwards. In this case, k=(population size/sample
size).
 It is important that the starting point is not
automatically the first in the list, but is instead
randomly chosen from within the first to the kth
element in the list.
 A simple example would be to select every 10th name
from the telephone directory (an 'every 10th' sample,
also referred to as 'sampling with a skip of 10').
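A sketch of the random-start, every-kth procedure, assuming the list is already arranged in its ordering scheme and using k = N // n (integer division, a simplification when N is not an exact multiple of n). The 1,000-name directory is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Random start among the first k units, then every k-th unit after it."""
    k = len(frame) // n                            # sampling interval ("skip")
    start = random.Random(seed).randint(0, k - 1)  # 0-based position of the random start
    return frame[start::k][:n]


directory = [f"name_{i}" for i in range(1, 1001)]  # hypothetical ordered list of 1,000 names
sample = systematic_sample(directory, 100, seed=5)  # k = 1000 // 100 = 10
print(sample[:3])
```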
SYSTEMATIC SAMPLING……

As described above, systematic sampling is an EPS method,
because all elements have the same probability of selection (in the
example given, one in ten). It is not 'simple random sampling'
because different subsets of the same size have different
selection probabilities: e.g. the set {4,14,24,...,994} has a one-in-ten
probability of selection, but the set {4,13,24,34,...} has zero
probability of selection.

SYSTEMATIC SAMPLING……
ADVANTAGES:
 Sample easy to select
 Suitable sampling frame can be identified easily
 Sample evenly spread over entire reference population

DISADVANTAGES:
 May be biased where the pattern used for the samples
coincides with a pattern in the population.
 Difficult to assess precision of estimate from one survey.

STRATIFIED SAMPLING
Where population embraces a number of distinct
categories, the frame can be organized into separate
"strata." Each stratum is then sampled as an
independent sub-population, out of which individual
elements can be randomly selected.
 Every unit in a stratum has same chance of being
selected.
 Using same sampling fraction for all strata ensures
proportionate representation in the sample.
 Adequate representation of minority subgroups of
interest can be ensured by stratification & varying
sampling fraction between strata as required.
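A sketch of proportionate stratified sampling: the same sampling fraction is applied to every stratum, and a simple random sample is drawn independently within each one. The strata and the 10% fraction are hypothetical, and `stratified_sample` is an illustrative name.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Same sampling fraction in every stratum; SRS drawn independently inside each."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        size = round(len(units) * fraction)  # proportionate allocation
        sample[name] = rng.sample(units, size)
    return sample


# Hypothetical frame: 600 undergraduates and 400 graduate students.
strata = {
    "undergraduate": [f"U{i}" for i in range(600)],
    "graduate": [f"G{i}" for i in range(400)],
}
picked = stratified_sample(strata, fraction=0.10, seed=7)
print({name: len(units) for name, units in picked.items()})  # {'undergraduate': 60, 'graduate': 40}
```

Passing a different fraction per stratum instead of one shared value would give the varied sampling fractions mentioned above for minority subgroups.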
STRATIFIED SAMPLING……
 Finally, since each stratum is treated as an
independent population, different sampling approaches
can be applied to different strata.

Drawbacks of stratified sampling:
 First, a sampling frame of the entire population has to be
prepared separately for each stratum.
 Second, when examining multiple criteria, stratifying
variables may be related to some, but not to others,
further complicating the design, and potentially
reducing the utility of the strata.
 Finally, in some cases (such as designs with a large
number of strata, or those with a specified minimum
sample size per group), stratified sampling can
potentially require a larger sample than would other
methods.
STRATIFIED SAMPLING…….

Draw a sample from each stratum

CLUSTER SAMPLING
 Cluster sampling is an example of 'two-stage sampling':
 First stage a sample of areas is chosen;
 Second stage a sample of respondents within those
areas is selected.
 Population divided into clusters of homogeneous units,
usually based on geographical contiguity.
 Sampling units are groups rather than individuals.
 A sample of such clusters is then selected.
 All units from the selected clusters are studied.

CLUSTER SAMPLING…….
Advantages :
 Cuts down on the cost of preparing a
sampling frame.
 This can reduce travel and other
administrative costs.

Disadvantages:
 Sampling error is higher than for a simple random
sample of the same size.

CLUSTER SAMPLING…….
• Identification of clusters
– List all cities, towns, villages & wards of cities, with
their populations, falling in the target area under study.
– Calculate the cumulative population & divide by 30; this
gives the sampling interval.
– Select a random no. less than or equal to the sampling
interval, having the same no. of digits. This forms the 1st
cluster.
– Random no. + sampling interval = cumulative population
that identifies the 2nd cluster.
– Second cluster + sampling interval = 3rd cluster.
– Last or 30th cluster = 29th cluster + sampling interval.
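The steps above amount to systematic selection with probability proportional to size. A sketch in Python, assuming integer populations listed in a fixed order and an integer sampling interval (cumulative total // 30); the area names and sizes are hypothetical, and `select_clusters` is an illustrative name. An area whose population exceeds the interval can receive more than one cluster.

```python
import random
from bisect import bisect_left
from collections import Counter
from itertools import accumulate


def select_clusters(areas, n_clusters=30, seed=None):
    """Systematic probability-proportional-to-size selection of clusters.

    areas: list of (name, population) in a fixed listing order.
    Returns the area picked at each of the n_clusters selection points.
    """
    cumulative = list(accumulate(pop for _, pop in areas))  # cumulative population
    interval = cumulative[-1] // n_clusters                 # sampling interval
    start = random.Random(seed).randint(1, interval)        # random no. <= interval
    chosen = []
    for i in range(n_clusters):
        point = start + i * interval                        # previous cluster + interval
        # first area whose cumulative population reaches the point
        chosen.append(areas[bisect_left(cumulative, point)][0])
    return chosen


# Hypothetical listing of areas (names and populations are illustrative only).
areas = [("Village A", 1200), ("Town B", 8500), ("Village C", 600),
         ("Ward D", 4300), ("Village E", 2400)]
clusters = select_clusters(areas, seed=3)
print(Counter(clusters))
```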
CLUSTER SAMPLING…….
Two types of cluster sampling methods.
One-stage sampling. All of the elements
within selected clusters are included in
the sample.
Two-stage sampling. A subset of elements
within selected clusters is randomly
selected for inclusion in the sample.
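The two variants differ only in what happens inside the chosen clusters. A sketch with hypothetical schools as clusters and pupils as elements; `cluster_sample` and its parameters are illustrative names.

```python
import random


def cluster_sample(clusters, n_clusters, m_per_cluster=None, seed=None):
    """Stage 1: randomly choose whole clusters.

    One-stage (m_per_cluster=None): keep every element of each chosen cluster.
    Two-stage: draw a simple random sample of m_per_cluster elements inside each chosen cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        units = clusters[name]
        if m_per_cluster is None:
            sample.extend(units)
        else:
            sample.extend(rng.sample(units, m_per_cluster))
    return sample


# Hypothetical frame: 10 schools, each with 30 pupils.
schools = {f"school_{s}": [f"s{s}_pupil_{p}" for p in range(30)] for s in range(10)}
one_stage = cluster_sample(schools, n_clusters=3, seed=11)
two_stage = cluster_sample(schools, n_clusters=3, m_per_cluster=5, seed=11)
print(len(one_stage), len(two_stage))  # 90 15
```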

Non-PROBABILITY SAMPLING

 A Non-Probability Sampling is a
method of selecting units from
population using a subjective (i.e.
nonrandom) method.
 It does not require any study frame.

(1) QUOTA SAMPLING
 The population is first segmented into mutually exclusive sub-
groups, just as in stratified sampling.
 Then judgment used to select subjects or units from each segment
based on a specified proportion.
 For example, an interviewer may be told to sample 200 females and
300 males between the ages of 45 and 60.
 It is this second step which makes the technique one of non-
probability sampling.
 In quota sampling the selection of the sample is non-random.
 For example interviewers might be tempted to interview those who
look most helpful. The problem is that these samples may be biased
because not everyone gets a chance of selection. This non-random
element is its greatest weakness, and quota versus probability sampling has
been a matter of controversy for many years.
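To show why the second step is non-random, this sketch fills each quota with whoever the interviewer happens to meet first. The arrival order, group labels, quota sizes, and the name `quota_sample` are hypothetical.

```python
def quota_sample(arrivals, quotas):
    """Fill each group's quota with the first people met from that group.

    arrivals: (person_id, group) pairs in the order the interviewer meets them.
    quotas: mapping of group -> number of people required.
    """
    remaining = dict(quotas)
    sample = []
    for person, group in arrivals:
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # every quota is full; later arrivals are never considered
    return sample


# Hypothetical arrival order at the interview site.
arrivals = [(1, "male"), (2, "male"), (3, "female"), (4, "male"),
            (5, "male"), (6, "female"), (7, "female")]
print(quota_sample(arrivals, {"female": 2, "male": 3}))  # [1, 2, 3, 4, 6]
```

Person 5 and person 7 are excluded only because of when they arrived, so selection probabilities depend on the interviewer's route and timing rather than on chance.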

(2) CONVENIENCE SAMPLING
 Sometimes known as grab or opportunity sampling or accidental or
haphazard sampling.
 A type of nonprobability sampling which involves the sample being drawn
from that part of the population which is close to hand. That is, readily
available and convenient.
 The researcher using such a sample cannot scientifically make
generalizations about the total population from this sample because it would
not be representative enough.
 For example, if the interviewer were to conduct a survey at a shopping
center early in the morning on a given day, the people that he/she could
interview would be limited to those present at that time. Their views
would not represent those of other members of society in the area, as
they would if the survey were conducted at different times of day and
several times per week.
 This type of sampling is most useful for pilot testing.
 In social science research, snowball sampling is a similar technique, where
existing study subjects are used to recruit more subjects into the sample.

CONVENIENCE SAMPLING…….

◼ Use results that are easy to get

(3) Judgmental sampling or Purposive sampling

 The researcher chooses the sample
based on who they think would be
appropriate for the study. This is used
primarily when there is a limited number
of people that have expertise in the area
being researched.

PANEL SAMPLING
 Method of first selecting a group of participants through a
random sampling method and then asking that group for the
same information again several times over a period of time.
 Therefore, each participant is given the same survey or interview at
two or more time points; each period of data collection is called a
"wave".
 This sampling methodology is often chosen for large-scale or
nation-wide studies in order to gauge changes in the population
with regard to any number of variables, from chronic illness to
job stress to weekly food expenditures.
 Panel sampling can also be used to inform researchers about
within-person health changes due to age or help explain changes
in continuous dependent variables such as spousal interaction.
 There have been several proposed methods of analyzing panel
sample data, including growth curves.
REVIEW:
Introduction to Statistics
Exercise
Identify whether each situation shows descriptive or
inferential statistics.

a) The average age of the students in a statistics
class is 21 years.
b) The chances of winning the California Lottery
are one chance in twenty-two million.
c) There is a relationship between smoking
cigarettes and getting emphysema.
d) From past figures, it is predicted that 39% of
the registered voters in California will vote in
the June primary.
Exercise
Identify the population and the sample:

a) A survey of 1353 American
households found that 18% of the
households own a computer.
b) A recent survey of 2625 elementary
school children found that 28% of the
children could be classified as obese.
c) The average weight of every sixth
person entering the mall within a 3-hour
period was 146 lb.
Exercise:
Identify the sampling method used in each situation

 A sample of 2,000 was sought to estimate the average achievement in science of
fifth graders in a city’s public schools. The average fifth grade enrollment in the
city’s elementary schools is 100 students. Thus, 20 schools were randomly
selected and within each of those schools all fifth graders were tested.
 A researcher has a population of 100 third grade children from a local school
district from which a sample of 25 children is to be selected. Each child’s name
is put on a list, and each child is assigned a number from 1 to 100. Then the
numbers 1 to 100 are written on separate pieces of paper and shuffled. Finally,
the researcher picks 25 slips of paper and the numbers on the paper determine
the 25 participants.
 A sociologist conducts an opinion survey in a major city. Part of the research
plan calls for describing and comparing the opinions of four different ethnic
groups: African Americans, Asian Americans, European Americans, and Native
Americans. For a total sample of 300, the researcher selects 75 participants
from each of the four predetermined subgroups.
Exercise:
Identify the sampling method used in each situation

 Instructors teaching research methods are interested in
knowing what study techniques their students are utilizing.
Rather than assessing all students, the researchers randomly
select 10 students from each of the sections to comprise
their sample.
Exercise:
Describe how you would select people to take your survey in the
following situations with the listed method (when provided):

 Say you wanted to survey professors at KSU
about their publication records. You want to be
sure to include professors from each
department, even the very small ones.
 You want to select a sample of KSU students
using simple random sampling, how would you
do this?
 Say you were interested in sampling students
who deal drugs on campus. What sampling
technique could you use to build this sample?
Basic Definition of Terms in Statistics
 Parameter vs Statistic
 Levels of Measurement
 Qualitative vs Quantitative Data
 Discrete vs Continuous Variable
Parameter vs Statistic
 Parameter – a value that
represents a population

 Statistic – a value that represents
a sample
Steps to tell the difference between a
statistic and a parameter:
 Step 1: Ask yourself, is this a fact
about the whole population? With
small populations, you usually have
a parameter because the groups
are small enough to measure.
Steps to tell the difference between a
statistic and a parameter:

 Step 2: Ask yourself, is this
obviously a fact about a very
large population? If it is, you
have a statistic.
Parameter vs Statistic
 The symbols change as you move
from a statistic to a parameter. Greek
symbols are used for parameters.
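The usual pairings of parameter and statistic symbols are shown below. Notation varies somewhat between textbooks; for proportions, many texts use the Latin p for the population value and \(\hat{p}\) for the sample value.

```latex
\begin{aligned}
&\text{Mean:} && \mu = \frac{1}{N}\sum_{i=1}^{N} x_i \ \text{(parameter)}
&& \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \ \text{(statistic)}\\
&\text{Standard deviation:} && \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2}
&& s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}\\
&\text{Proportion:} && p && \hat{p}
\end{aligned}
```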
Example
 Identify whether the numbers
identified are Statistics or
Parameters.
◼ Of all US Kindergarten teachers, 32%
say that knowing the alphabet is an
essential skill.
◼ Of the 800 US Kindergarten teachers
polled, 24% say that knowing
Exercise 1
 Determine whether the numerical value is a
parameter or a statistic (and explain):
a) A recent survey by the alumni of a major
university indicated that the average salary of
10,000 of its 300,000 graduates was $125,000.
b) The average salary of all assembly-line
employees at a certain car manufacturer is
$33,000.
c) The average late fee for 360 credit card
holders was found to be $56.75
Exercise 2
 For the studies described, identify the
population, sample, population
parameters, and sample statistics:
a) In a USA Today Internet poll, readers
responded voluntarily to the question “Do you
consume at least one caffeinated beverage
every day?”
b) Astronomers typically determine the distance
to a galaxy (a galaxy is a huge collection of
billions of stars) by measuring the distances to
just a few stars within it and taking the mean
(average) of these distance measurements.
Exercise 2
 For the studies described, identify the
population, sample, population
parameters, and sample statistics:
a) In a USA Today Internet poll, readers
responded voluntarily to the question “Do you
consume at least one caffeinated beverage
every day?”
Population: All readers of USA Today
Sample: Volunteers that responded to the survey
Population parameter: % who have at least one caffeinated drink
among all readers of USA Today
Sample statistic: % who have at least one caffeinated drink among
those who responded to the survey
Qualitative or Quantitative Data
Qualitative Data
• Deals with descriptions.
• Data can be observed but not measured.
• Colors, textures, smells, tastes, appearance, beauty, etc.
Quantitative Data
• Deals with numbers.
• Data which can be measured.
• Length, height, area, volume, weight, speed, time, temperature,
humidity, sound levels, cost, members, ages, etc.

Qualitative or Quantitative Data
Qualitative Data
• Red/green, gold frame
• Smells old and musty
• Texture shows brush strokes of oil paint
• Peaceful scene of the country
• Masterful brush strokes
Quantitative Data
• Picture is 10” by 14”
• With frame 14” by 18”
• Weighs 8.5 pounds
• Surface area of painting is 140 sq. in.
• Cost P30,00
Exercise
 Determine whether the data are
qualitative or quantitative:
a) the colors of automobiles on a used car lot
b) the numbers on the shirts of a girls’ soccer
team
c) the number of seats in a movie theater
d) a list of house numbers on your street
e) the ages of a sample of 350 employees of a
large hospital
Levels of Measurement
Variable - any characteristic of an individual or entity. A variable can
take different values for different individuals. Variables can be
categorical or quantitative. Per S. S. Stevens…
• Nominal - Categorical variables with no inherent order or ranking sequence
such as names or classes (e.g., gender). Values may be numerals, but they carry no
numerical meaning (e.g., I, II, III). The only operation that can be applied to Nominal
variables is enumeration.
• Ordinal - Variables with an inherent rank or order, e.g. mild, moderate, severe.
Can be compared for equality, or greater or less, but not how much greater or
less.
• Interval - Values of the variable are ordered as in Ordinal, and additionally,
differences between values are meaningful, however, the scale is not absolutely
anchored. Calendar dates and temperatures on the Fahrenheit scale are examples.
Addition and subtraction, but not multiplication and division are meaningful
operations.
• Ratio - Variables with all properties of Interval plus an absolute, non-arbitrary
zero point, e.g. age, weight, temperature (Kelvin). Addition, subtraction,
multiplication, and division are all meaningful operations.
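A short numeric check of the Fahrenheit example, using the standard conversion K = (F - 32) × 5/9 + 273.15. A ratio of Fahrenheit readings changes when the same temperatures are expressed in kelvins, while a ratio of differences does not. The three temperatures are arbitrary illustrative values.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (a ratio scale with a true zero)."""
    return (f - 32) * 5 / 9 + 273.15


a, b, c = 40.0, 60.0, 80.0

# Ratios of readings: meaningful only on a ratio scale.
ratio_f = c / a
ratio_k = fahrenheit_to_kelvin(c) / fahrenheit_to_kelvin(a)

# Ratios of differences: identical on both scales, since differences are meaningful on an interval scale.
diff_f = (c - b) / (b - a)
k = [fahrenheit_to_kelvin(t) for t in (a, b, c)]
diff_k = (k[2] - k[1]) / (k[1] - k[0])

print(f"80F / 40F: {ratio_f:.2f} in Fahrenheit, {ratio_k:.2f} in kelvins")
print(f"ratio of differences: {diff_f:.2f} (F) vs {diff_k:.2f} (K)")
```

So 80°F is not "twice as hot" as 40°F, even though the Fahrenheit numbers have a 2:1 ratio.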
Exercise
Identify the data set’s level of measurement
(nominal, ordinal, interval, ratio):
a) hair color of women on a high school tennis team
b) numbers on the shirts of a girls’ soccer team
c) ages of students in a statistics class
d) temperatures of 22 selected refrigerators
e) number of milligrams of tar in 28 cigarettes
f) number of pages in your statistics book
g) marital status of the faculty at the local
community college
h) list of 1247 social security numbers
Discrete vs Continuous Variable