0% found this document useful (0 votes)

49 views7 pages

1 The Mathematics Behind Polling

1) Polling aims to estimate the percentage of people in a large population who agree on an issue by surveying a representative sample. 2) Statistically, each person sampled can be viewed as a Bernoulli trial with an unknown probability p of agreeing. Repeated sampling allows estimating p and calculating confidence intervals. 3) For example, a poll of 1000 people that found 675 agreed is a 90% confident estimate that the true population percentage agreeing is between 65-70%.

Uploaded by

Paulo Sabo

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

49 views7 pages

1 The Mathematics Behind Polling

Uploaded by

Paulo Sabo

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 7

1 The Mathematics Behind Polling

1.1 Introduction
One place where we see inferential statistics used every day is in polling. Opin-
ion polls, like it or not, are part of our political system. Polling is also used
in marketing, sales, and entertainment. The intricacies of polling are far too
complicated for us to treat completely. We can, however. understand the basic
ideas behind this discipline and see how probability theory is used in polling.
The view we take is greatly simpli…ed and will necessarily gloss over some prac-
tical di¢ culties. It will, however, make it easier to interpret the kind of poll
results normally reported in the news.
The basic idea is that in a large population of people a certain percentage
will agree on one particular issue. We would like to know what percentage of
people this is. Asking everyone and computing the exact result is out of the
question given the size of the population. We might at least try to estimate
the percentage by choosing a representative sample from the population and
determining the percentage in the sample. Assuming that our sample is truly
representative of the population, the percentage holding that opinion in the
sample should provide a reasonable estimate of the percentage in the entire
population.
Two natural questions might be, what exactly is "a reasonable estimate"
and how con…dent are you in this estimate. Here is where statistics can use
probability theory to quantify the results.

1.2 The Ideas Behind Statistical Estimation.

We will take a very simpli…ed version of statistical estimation and see how
the mathematics works. Remember, we are not going to deal with all the
mathematical and practical issues that come with this method of estimation.
We will just take the simplest possible version of estimation and explore the
logic and mathematics behind it.
We start by assuming we have a very large population; so large that its
actual size overwhelms any particular number we use to sample it. We would
like to estimate the percentage of people in this population who would answer
a speci…c question, "yes." For reasons that will become clear later, we restrict
ourselves to questions where we expect a good number of people will give an
answer of "yes" and a good number will give an answer "no." Let us denote this
unknown percentage as P0 %: We know it exists; it is just that we do not know
what it is.
The …rst thing we do is consider the experiment of choosing one person at
random from the population and asking them the question we are interested in.
What is the probability that they will answer "yes"? If the selection process is
actually random, than any one person is as equally likely to be chosen as any
other. The people who will answer "yes" represent P0 % of that population.

1
That would mean that the probability that the person chosen will answer "yes"
P0
is p0 = 100 : All we have done is convert the percentage to a ratio of the whole
and changed that ratio into a real number between 0 and 1. Our one rule about
our question is that a good number of people will answer of "yes" and a good
number will answer "no." That means that p0 should not be too close to 0 or to
1. Now that we are thinking of a random experiment where a particular event
has probability p0 , we can consider that experiment as a Bernoulli trial where
an answer of "yes" is a success. We still only know that the probability exists,
and we do not know what it is.
Once we have chosen one person from the population, we do not what to
choose them again. So we will not. Now our other overriding assumption
about the population is that its actual size overwhelms any particular number
of people we pick from it. This means that removing one single person from the
population will have no measurable impact on the percentage who would answer
"yes." If we choose a second person, the probability that they will answer "yes"
is still p0 : The same goes for a third, fourth, or …fth. Thus we reasonably
assume that every time we choose a person from the population, the probability
they will answer "yes" is always p0 : That is to say, choosing n people randomly
from the population amounts to repeating the same Bernoulli trial n times.
We know a lot about the probability model that comes from repeating a
Bernoulli trial a number of times. We also have a quick way of computing
probabilities in that model if the number of repetitions is large. Well we can
if we actually know the probability of success in one trial. We do not just yet,
but let us go on.
If we choose a large sample of people, say n = 100, n = 500, n = 1000 or
more, the probability model should be very close to the normal distribution.
Now as usual in a random experiment, anything can happen, but we still know
what to expect. We expect that the number of successes in the sample will be
close to the mean of the experiment, and that it be within one or two standard
deviations of this mean. Unfortunately we do not know this mean or this
standard deviation, but that does not change where the result would be if we
did know them. We expect that the result will end up somewhere in the middle
part of the normal distribution approximating the probability model

2
We have a chart that quanti…es various choices for de…ning the center of the
distribution:

Con…dence Lower limit Upper limit

90% 1:645 S.D. 1:645 S.D.
95% 1:92 S.D. 1:92 S.D.
99% 2:575 S.D. 2:575 S.D.
We can use this chart and say that we are 90% con…dent that the sample mean
will be no more than 1:645 standard deviations away from the actual mean. We
could claim 95% or 99% con…dence by choosing no more than 1:92 or 2:575
standard deviations.
Here is where we draw a inference from a particular sample. We will go
out a collect a random sample of 1000 people and ask them our question. We
will …nd out exactly how many of these people answer "yes." Suppose that this
turns out to be 675 of the 1000 sampled.
If the sample is truly random, this result should re‡ect the entire population
quite closely. Thus we will consider the percentage of people in the sample who
answer "yes" to be a close approximation to the percentage of people in the
entire population who would have answered "yes" if asked. As a ratio to the
whole, the ratio of people in the sample should be a close approximation to the
percentage in the entire population. From the point of view of a Bernoulli trial,
this ratio should be a good approximation of the probability p0 of "success" in
one trial.
675
Thus we will assume that estimate obtained from the sample 0:675 = 1000 is
close enough to the actual probability p0 to use it in its place. We can compute
the expected mean, variance and standard deviation of the sampling experiment
using the formulas:

= np0
2
= np0 (1 p0 )
p
= np0 (1 p0 )

However, since we do not know p0 ; we use the approximation

p0 ' 0:675:

We the compute a sample mean

675
m = ns = 1000 = 675:
1000
A sample variance

d2 = ns(1 s)
= 1000 (0:675) (1 0:675)
= 1000 (0:675) (0:325)
= 219: 38

3
And …nally a sample standard deviation
p p
d = ns(1 s) = 219: 38 = 14: 811:
Since we are 90% con…dent that the sample mean will be no more than 1:645
standard deviations away from the actual mean, we can say we are 90% con…dent
that the actual mean will be no more than 1:645 sample standard deviations
away from the sample mean So we are 90% con…dent that the actual mean
is between
m 1:645d and m 1:645d:
That is to say we get
675 (1:645) (14: 811) 675 + (1:645) (14: 811)
650: 64 699:36::
But we know the relationship between the population mean and the popu-
lation probability, = np0 . Thus
650: 64 1000p0 = 699:36:
So we are 90% con…dent that the actual probability p0 is in the interval
0:65064 p0 0:69936:
Converting this to a percentage and doing a bit of rounding o¤, we have esti-
mated, with 90% con…dence that the percentage of people in the population
who would answer yes to our question is between 65% and 70%. In other
words, the percentage is approximately 65% with a margin of error of 2:5%
and a con…dence level of 90%.

1.3 Examples
Example 1 Suppose you would like to estimate the percentage of people in Ari-
zona that say they enjoy the summer heat. You survey 500 people and …nd
that 267 of them say that they do. This translates into a ratio of the whole
267
of 500 = 0:534 or a percentage of 53%. What is the margin of error in the
estimate if you use a con…dence level of 90%?
First, the sample size is n = 500. We represented the number of people
who answered yes as a ratio of the whole: s = 0:534: This will approximate the
unknown population ratio:
p0 ' s = 0:534:
When we know the population probability, the formulas for the population pa-
rameters are
= np0
2
= np0 (1 p0 )
p
= np0 (1 p0 )

4
We use these and the approximation of p0 to compute a sample mean, a sample
variance, and a sample standard deviation

m = ns = 500 0:534 = 267

d2 = ns(1 s) = 500 0:534 0:466 = 122:82
p p
d = ns(1 s) = 122:82 = 11:082:

So we are 90% con…dent that the actual mean is between

m 1:645d and m 1:645d:

That is,

267 (1:645) (11:082) 267 + (1:645) (11:082)

248:77 285:23:

But = np0 . Thus

248:77 500p0 285:23::
So we are 90% con…dent that the actual probability p0 is in the interval
248:77 285:23:
0:49754 = p0 = 0:570 46:
500 500
Thus the population percentage P0 is in the interval

49% P0 58%:

We round o¤ making sure to widen the interval so that we do not loose any
con…dence in our estimation interval.
The …nal result we obtain is an estimate of 53:5% within an error of 4:4%
and a con…dence of 90%:

Example 2 Suppose we approximate the percentage of people over the age of 30

in Tucson by taking a random sample of people. It turns out that of 5000 people
chosen 3254 are over 30. What result will we obtain using a 90% con…dence
level?
3254
The sample size is n = 5000. The sample gives a ratio of s = 5000 = 0:6508:
We use this to approximate the unknown population ratio

p0 ' s = 0:6508:

in the formulas for the population parameters:

= np0
2
= np0 (1 p0 )
p
= np0 (1 p0 )

5
That allows us to compute a sample mean, a sample variance, and a sample
standard deviation

m = ns = 5000 0:6508 = 3254

d2 = ns(1 s) = 5000 0:6508 0:3412 = 1110:3
p p
d = ns(1 s) = 1110:3 = 33: 321:

As usual, a 90% con…dence means that the actual mean is between

m 1:645d and m 1:645d:

That is,

3254 (1:645) (33: 321) 3254 + (1:645) (33: 321)

3199:2 3308:8:

Using = np0 ,
3199:2 5000p0 3308:8:
So we are 90% con…dent that the actual probability p0 is in the interval
3199:2 3308:8
0:63984 = p0 = 0:66176:
5000 5000
Thus the population percentage P0 is in the interval

63% P0 67%:

We have found an estimate of 65% within an error of 2% and a con…dence

level of 90%:

1.4 Final Comments

This gives a ‡avor of the statistical methods used in polling. Indeed if you look
at news reports of polling results you will see data similar to what we produce.
You will almost see a percentage estimate and bound on the error. Often you
will see the number of people sampled reported as well. The con…dence level is
often left out of news reports, or even if it appears, it is buried in the article or
in small type. This is probably because it is technical information that editors
assume would confuse people not familiar with the language and methods of
statistics.
It is important to realize that the practical practice of polling, especially
political polling, is much more sophisticated than the examples above let on.
One large, and clearly major issue in polling is choosing a representative random
sample. As we have seen, random means unpredictable. Thus a process that
controls the selection of people to be sampled should not single out particular
groups, locations, or ages to be truly random. In polling a random sample
must also be representative. If a political pollster does not limit his polling to

6
citizens of a legal age to vote, the resulting sample is not likely to be represen-
tative of the population he wants to study. As a result as much statistics goes
into the process of selecting a random sample as in analyzing the data collected
from the sample.
A …nal note is that we have only seen two simple examples of statistics in
use. There are many, many more. There are methods, not that dissimilar
from the ones above that apply when the problem limits the size of a sample
or test. These use other mathematical distributions other than the normal
distribution. There are also estimation methods that can be used to sharpen
the results of a complete survey of a large population. Rather than beginning
with a small sample and roughly estimating counts in an entire population, these
techniques take the results of a comprehensive survey of a population and adjust
the raw data to account for errors in the counting and tabulation of the data
collected. The US census bureau uses these techniques to strengthen the quality
of the results they report involving demographic information about the country.
However, there is a long-standing controversy that does not allow them to use
these results on the data used by congress to apportion legislative representation
or the distribution of federal funds to states and local communities. While this
debate often revolves about the veracity of the statistical methods, the true
issue is the perceived advantages that using statistics or not using statistics
might have for one party or the other.

Prepared by: Daniel Madden and Alyssa Keri: May 2009

Table of Contents Sample For Research
80% (5)
Table of Contents Sample For Research
4 pages
Traditional Medicine Research Methodology in Naturopathy & Yoga
100% (2)
Traditional Medicine Research Methodology in Naturopathy & Yoga
124 pages
Describing Sample Size and Sampling Procedure
No ratings yet
Describing Sample Size and Sampling Procedure
17 pages
Strategic and Operational Planning
No ratings yet
Strategic and Operational Planning
12 pages
IMRaD Format
100% (1)
IMRaD Format
20 pages
Types of Research
100% (2)
Types of Research
15 pages
Critical Appraisal of Epidemiological Study
No ratings yet
Critical Appraisal of Epidemiological Study
20 pages
CHM256 - Tutorial 2
No ratings yet
CHM256 - Tutorial 2
3 pages
Rough Work
No ratings yet
Rough Work
41 pages
BUS173 ch15
No ratings yet
BUS173 ch15
183 pages
Eto Talaga Yon
No ratings yet
Eto Talaga Yon
73 pages
管理研究提案
100% (1)
管理研究提案
7 pages
Capstone Project Format - Data Driven Research: Dalubhasaan NG Lungsod NG Lucena
No ratings yet
Capstone Project Format - Data Driven Research: Dalubhasaan NG Lungsod NG Lucena
5 pages
Research Paper Writing
No ratings yet
Research Paper Writing
14 pages
Penerapan SEM
No ratings yet
Penerapan SEM
10 pages
Uantum Mechanics A L A Irac T S - G Q
No ratings yet
Uantum Mechanics A L A Irac T S - G Q
10 pages
Predictive Analytics Exam-December 2019: Exam PA Home Page
No ratings yet
Predictive Analytics Exam-December 2019: Exam PA Home Page
9 pages
Chapter Iii
No ratings yet
Chapter Iii
14 pages
Two Sample Hypothesis Testing
No ratings yet
Two Sample Hypothesis Testing
1 page
Introduction To Evaluation
No ratings yet
Introduction To Evaluation
8 pages
The Impact of Recruitment and Selection On Organizational Culture and Employee Performance
No ratings yet
The Impact of Recruitment and Selection On Organizational Culture and Employee Performance
2 pages
Business - Research.Chapter 4
No ratings yet
Business - Research.Chapter 4
2 pages
Unit 2 Psy 613
No ratings yet
Unit 2 Psy 613
26 pages
Integrating 21 Century Competencies Into A K-12 Curriculum Reform in Macau
No ratings yet
Integrating 21 Century Competencies Into A K-12 Curriculum Reform in Macau
16 pages
Research Objectives and Questions
No ratings yet
Research Objectives and Questions
18 pages
Descriptive Research: From Wikipedia, The Free Encyclopedia
No ratings yet
Descriptive Research: From Wikipedia, The Free Encyclopedia
4 pages
535 Phy-1
No ratings yet
535 Phy-1
5 pages
Submission of Outputs: Action Research Training-Workshop For School Heads and Teachers
No ratings yet
Submission of Outputs: Action Research Training-Workshop For School Heads and Teachers
1 page
Machine Learning: A Probabilistic Perspective: Solutions Manual (Please Do Not Make Publicly Available)
No ratings yet
Machine Learning: A Probabilistic Perspective: Solutions Manual (Please Do Not Make Publicly Available)
127 pages
ADU5301 - Home Assignment
No ratings yet
ADU5301 - Home Assignment
3 pages
FDS Unit3
No ratings yet
FDS Unit3
29 pages
The Two Slit Experiment: 4.1 An Experiment With Bullets
No ratings yet
The Two Slit Experiment: 4.1 An Experiment With Bullets
10 pages
w4 5 ch3 Anno
No ratings yet
w4 5 ch3 Anno
18 pages
Correlation, Probability
No ratings yet
Correlation, Probability
36 pages
Election Result
No ratings yet
Election Result
116 pages
Fast Generation of Deviates For Order Statistics by An Exact Method
No ratings yet
Fast Generation of Deviates For Order Statistics by An Exact Method
9 pages
Probability & Statistics
No ratings yet
Probability & Statistics
43 pages
6.2. One Variable Data
No ratings yet
6.2. One Variable Data
16 pages
1.AlleleFrequencies 0
No ratings yet
1.AlleleFrequencies 0
55 pages
Chapter 4 Part 1
No ratings yet
Chapter 4 Part 1
52 pages
AIML-Unit 3 Notes-Assignment 3
No ratings yet
AIML-Unit 3 Notes-Assignment 3
37 pages
ProbabilityDistributions BRSM SP2022 Lecture3
No ratings yet
ProbabilityDistributions BRSM SP2022 Lecture3
45 pages
Chapter 6. Estiamation
No ratings yet
Chapter 6. Estiamation
65 pages
Statistical Modeling & Intro To Probability
No ratings yet
Statistical Modeling & Intro To Probability
31 pages
CHP 5
No ratings yet
CHP 5
101 pages
2024 Statistics Lecture Notes
No ratings yet
2024 Statistics Lecture Notes
22 pages
Chapter 9
No ratings yet
Chapter 9
20 pages
Estimation
No ratings yet
Estimation
15 pages
Statistics 17 18
No ratings yet
Statistics 17 18
21 pages
Measuring Uncertainty - Unit - 3
No ratings yet
Measuring Uncertainty - Unit - 3
28 pages
Kavya RM
No ratings yet
Kavya RM
10 pages
Z Test
No ratings yet
Z Test
17 pages
Math 227 Chapter 9
No ratings yet
Math 227 Chapter 9
16 pages
The Logic of Statistical Tests of Significance
0% (1)
The Logic of Statistical Tests of Significance
19 pages
Chapter 2
No ratings yet
Chapter 2
37 pages
Chapter 3
No ratings yet
Chapter 3
18 pages
Stats Exam 1 Cheat Sheet
No ratings yet
Stats Exam 1 Cheat Sheet
3 pages
Unit Iii 1
No ratings yet
Unit Iii 1
20 pages
STATS Notes
No ratings yet
STATS Notes
18 pages
Chapter 3-171
No ratings yet
Chapter 3-171
53 pages
Statistics
No ratings yet
Statistics
5 pages
QA Estimation
No ratings yet
QA Estimation
39 pages
Introduction To Probability Theory and S
No ratings yet
Introduction To Probability Theory and S
127 pages
Maricris DLP 1
No ratings yet
Maricris DLP 1
7 pages
Definition of R-WPS Office
No ratings yet
Definition of R-WPS Office
8 pages
Справочник по Гипотезам
No ratings yet
Справочник по Гипотезам
3 pages
Foundations of Statistical Inference
No ratings yet
Foundations of Statistical Inference
22 pages
4 3+–+Interval+Estimates+for+Proportions
No ratings yet
4 3+–+Interval+Estimates+for+Proportions
4 pages
Probability and Hypothesis Testing
No ratings yet
Probability and Hypothesis Testing
31 pages
CH 3 Solutions
No ratings yet
CH 3 Solutions
10 pages
Probability: PSYB07 Gabriel Baylon October 2, 2013
No ratings yet
Probability: PSYB07 Gabriel Baylon October 2, 2013
9 pages
Opinion Polls, Exit Polls and Early Seat Projections: Rajeeva L. Karandikar
No ratings yet
Opinion Polls, Exit Polls and Early Seat Projections: Rajeeva L. Karandikar
55 pages
Statistical Methods
No ratings yet
Statistical Methods
16 pages
Statistical Inference
No ratings yet
Statistical Inference
29 pages
Chapter 1 - Comparing Normal Populations
No ratings yet
Chapter 1 - Comparing Normal Populations
39 pages
Chapter 3
No ratings yet
Chapter 3
57 pages
Question: How Do We Estimate Precision Error?
No ratings yet
Question: How Do We Estimate Precision Error?
38 pages
Stat Chapter 4
No ratings yet
Stat Chapter 4
19 pages
Stat Handout 2
No ratings yet
Stat Handout 2
11 pages
STATS 200: Introduction To Statistical Inference: Lecture 1: Course Introduction and Polling
No ratings yet
STATS 200: Introduction To Statistical Inference: Lecture 1: Course Introduction and Polling
35 pages
Chapter 5 - 7
No ratings yet
Chapter 5 - 7
110 pages
SSCH 4
No ratings yet
SSCH 4
18 pages
The Probabilistic Method: David Arthur
No ratings yet
The Probabilistic Method: David Arthur
7 pages
Sample Size Calculator
No ratings yet
Sample Size Calculator
1 page
Statistics PDF
No ratings yet
Statistics PDF
17 pages
Introduction To Probability Theory and Statistics
No ratings yet
Introduction To Probability Theory and Statistics
127 pages
Module 17 Coin Tossing From A Statistical Perspective
No ratings yet
Module 17 Coin Tossing From A Statistical Perspective
6 pages
Sampling in Statistics
From Everand
Sampling in Statistics
Stephanie Glen
No ratings yet
Chi Squared for Beginners
From Everand
Chi Squared for Beginners
Stephanie Glen
No ratings yet
Statistics II Essentials
From Everand
Statistics II Essentials
Emil Milewski
2.5/5 (1)

1 The Mathematics Behind Polling

Uploaded by

1 The Mathematics Behind Polling

Uploaded by

1 The Mathematics Behind Polling

1.2 The Ideas Behind Statistical Estimation.

Con…dence Lower limit Upper limit

However, since we do not know p0 ; we use the approximation

We the compute a sample mean

m = ns = 500 0:534 = 267

So we are 90% con…dent that the actual mean is between

m 1:645d and m 1:645d:

267 (1:645) (11:082) 267 + (1:645) (11:082)

But = np0 . Thus

Example 2 Suppose we approximate the percentage of people over the age of 30

in the formulas for the population parameters:

m = ns = 5000 0:6508 = 3254

As usual, a 90% con…dence means that the actual mean is between

m 1:645d and m 1:645d:

3254 (1:645) (33: 321) 3254 + (1:645) (33: 321)

We have found an estimate of 65% within an error of 2% and a con…dence

1.4 Final Comments

Prepared by: Daniel Madden and Alyssa Keri: May 2009

You might also like