To Statistical Analysis: Yale Braunstein School of Information
Approximate (!) Schedule
Today
Data, data collection instruments (e.g., surveys)
– Focus is on descriptive statistics
Research design
Sample size, sources of error (maybe)
Thursday
Sample size, sources of error
Measures of central tendency
Demos of Excel & SPSS
Discussion of statistics assignment
Next Tuesday
More on SPSS with lots of examples
Q & A on the assignment
Introduction
We are focusing on “quantitative analysis”
Data
Another Data
Analysis--Introduction
Data Collection Instruments
Questionnaires & surveys
Transaction logs
Experimental observation
Issues in Research Design
Case study vs. statistical sample
What is the universe? (uses, users, etc.)
Example: political debate over “average tax cut” vs. “tax cut for the average family”
Is the sample representative?
Volumes vs. titles in the library
Does correlation imply causality?
Do we need to identify the pathogen?
Controlling for outside factors
Sample Size
How large a sample is needed?
The larger the sample, the more accurate the results (unless the response rate becomes very low)
The larger the sample, the greater the cost and effort
Sample size does NOT depend on the size of the population (as long as the population is much larger than the sample)
Rules of thumb
100 for 95% confidence, 5% tolerance, 90-10 expected split
400 for 95% confidence, 5% tolerance, 50-50 expected split
30–50 in each cell of an n x m table of discrete classes
Exact formula (use with care):
Size = 0.25 * (certainty factor / acceptable error)^2
where the certainty factor = 1.96 for 95% confidence, 2.576 for 99% confidence, and the acceptable error is a fraction (e.g., 0.05 for 5%)
[Alternate approach: hire a statistical consultant.]
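
The formula above can be tried out with a few lines of Python. This is only a minimal sketch, not part of the original slides: the function name `sample_size` and its parameters are made up for illustration. It also assumes that the constant 0.25 is the worst-case value of p * (1 - p), reached at a 50-50 split, so that other expected splits can be plugged in through the standard formula for a proportion.

```python
import math

# "Certainty factors" (z-scores) quoted on the slide.
CERTAINTY_FACTOR = {0.95: 1.96, 0.99: 2.576}


def sample_size(confidence=0.95, acceptable_error=0.05, expected_split=0.5):
    """Estimate the sample size needed to measure a proportion.

    Hypothetical helper for illustration only.
    confidence       -- 0.95 or 0.99 (the two levels given on the slide)
    acceptable_error -- tolerance as a fraction, e.g. 0.05 for 5%
    expected_split   -- expected proportion p; 0.5 reproduces the
                        slide's constant 0.25, since 0.5 * 0.5 = 0.25
    """
    z = CERTAINTY_FACTOR[confidence]
    p = expected_split
    # Assumption: 0.25 on the slide generalizes to p * (1 - p).
    n = p * (1 - p) * (z / acceptable_error) ** 2
    # Round up: a fractional respondent is not possible.
    return math.ceil(n)


if __name__ == "__main__":
    print(sample_size())                      # 50-50 split, 95%, 5%
    print(sample_size(expected_split=0.9))    # 90-10 split, 95%, 5%
    print(sample_size(confidence=0.99))       # 50-50 split, 99%, 5%
```

Under these assumptions the 50-50 case comes out at 385, close to the rule-of-thumb 400. The 90-10 case comes out at 139, so the rule-of-thumb 100 is a rougher approximation. Moving from 95% to 99% confidence raises the 50-50 requirement to 664.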
Sources of Error
The respondent
The investigator
Sampling error
Other