3. Sources and Samples
BUSI 5801 Statistics for Managers Online Module 3

Learning Objectives: Sources and Samples
At the end of this module students will be able to:
• Recognize the basics of sampling concepts and terminology (CO 1)
• Identify what sample information is needed to interpret data (CO 1,4)
• Determine possible sources of sampling limitations or bias (CO 4)

Sample versus Population
• A population represents all of the elements or units (people, companies, countries, etc.) in which you are interested
• A sample generally represents the subset of the population from whom/which you have collected data
• The sample = the who of the data
• We usually employ samples because collecting data from an entire population (called a census) is costly, time consuming or impossible
• We use data from a sample to infer insights into the entire population; this inference is imprecise
• We sacrifice precision (which studying the entire population would give) for practicality (which using a sample gives)
• Parameter: Describes a population (e.g., population mean)
• Statistic: Describes a sample (e.g., sample mean)

Sampling Frame
• A sampling frame is the specific set of units/elements in your population that you want to study
• Ideally, the sampling frame is the same as the population
• But it may be more limited for practical reasons (e.g., all residents of a city versus a list of everyone with an address)
• The sampling frame clarifies from where you will draw your sample

Target Sample
• A target sample is the subset of the sampling frame from whom/which you intend to collect data
• There is a very large number of possible subsets within a sampling frame, both with respect to who and how many
• The particular subset that makes up a sample has implications for the meaning of the analysis
• A biased sample contains an over- or under-representation of some characteristics of the population and may lead to biased results

Randomization
• A randomized sample is most likely to be representative of the population across all characteristics
• A randomized sample minimizes bias but does not eliminate it
• In a randomized sample:
– All elements of the population have an equal probability of being included in the sample
– The elements of the sample are chosen randomly
• Note: no two randomized samples are likely to be exactly the same

Other “Randomized” Sample Designs
• More complicated sample designs may save time or money or avert sampling problems
• Stratified Sampling, Cluster Sampling, Systematic Sampling, Multistage Sampling
• These require dividing or ordering the population into groups based on some characteristics and sampling randomly from the groups/lists
• Because these methods require choosing the characteristics upon which to divide or order the population, some of the benefits of simple randomization may be sacrificed

Sample Size
• In general, the bigger the sample, the better it will represent the population
• The size of the sample affects what can be concluded from the analyses
• The size of the sample usually depends on:
– What we want to do with the data
– Practical limitations (remember: this is why we are not studying a whole population)

Sampling
[Figure: diagram relating Population, Sampling Frame, Target Sample, and Sample]

Sampling Biases
• Broad invitations to participate (voluntary sampling: low, biased response rates)
• Convenience sampling (including only those who are convenient to reach)
• Response biases (providing the “right” answer)
• Non-response bias (portion of sample cannot or will not respond)
• Inattention (long surveys may lead to inattention and low quality results)
• Ineffective, confusing or leading questions

Parameters and Statistics
• A representative characteristic of a population is a parameter
• The same characteristic of a sample is a statistic
• We use sample statistics to estimate the parameters of populations
• Examples: means, standard deviations, correlations, etc.

Recap: 3. Sources and Samples
• Sample data tells us about the sample
• But we often use samples to make inferences about populations of interest
• The subset that you target for your sample can bias your results; randomization is best
• The subset of your target sample that ends up in your final sample will affect your results
• The approach to solicit participation in your study will affect your results
• The methods used to collect data (e.g. surveys) will affect your results
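
The ideas in this module can be illustrated with a small simulation. The sketch below is not part of the course materials. It assumes a made-up population of 10,000 customer spend values, split into a hypothetical "easy to reach" group and a "hard to reach" group, and it uses only Python's standard library. It compares the population mean (a parameter) with sample means (statistics) from a simple random sample and from a convenience sample, and it shows how the spread of sample means changes with sample size.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population (an assumption for illustration only):
# 3,000 "easy to reach" customers with mean spend 120 and
# 7,000 "hard to reach" customers with mean spend 90.
population = (
    [("easy", rng.gauss(120, 20)) for _ in range(3_000)]
    + [("hard", rng.gauss(90, 20)) for _ in range(7_000)]
)
values = [value for _, value in population]

# Parameter: describes the whole population (usually unknown in practice).
population_mean = statistics.fmean(values)


def random_sample_mean(n: int) -> float:
    """Statistic from a simple random sample: every unit is equally likely."""
    return statistics.fmean(rng.sample(values, n))


def convenience_sample_mean(n: int) -> float:
    """Statistic from a convenience sample: only easy-to-reach units."""
    easy = [value for group, value in population if group == "easy"]
    return statistics.fmean(rng.sample(easy, n))


def spread_of_sample_means(n: int, repeats: int = 200) -> float:
    """Standard deviation of the sample mean over repeated random samples."""
    means = [random_sample_mean(n) for _ in range(repeats)]
    return statistics.stdev(means)


if __name__ == "__main__":
    print("population mean (parameter):", round(population_mean, 2))
    print("random sample mean, n=400:", round(random_sample_mean(400), 2))
    print("convenience sample mean, n=400:",
          round(convenience_sample_mean(400), 2))
    print("spread of sample means, n=25:",
          round(spread_of_sample_means(25), 2))
    print("spread of sample means, n=400:",
          round(spread_of_sample_means(400), 2))
```

Under these assumptions, the random-sample mean should land close to the population mean, the convenience-sample mean should overshoot it because the easy-to-reach group spends more, and the spread at n = 400 should be noticeably smaller than at n = 25. Two random samples of the same size will rarely give exactly the same mean, and the exact numbers depend on the seed.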