Sampling Chapter3

Relative error of point estimates
SAMPLING IN R

Richie Cotton
Data Evangelist at DataCamp
Sample size is number of rows
coffee_ratings %>%
  slice_sample(n = 300) %>%
  nrow()

300

coffee_ratings %>%
  slice_sample(prop = 0.25) %>%
  nrow()

334

Various sample sizes
# Whole population
coffee_ratings %>%
  summarize(mean_points = mean(total_cup_points)) %>%
  pull(mean_points)

82.15

# Sample of 10 rows
coffee_ratings %>%
  slice_sample(n = 10) %>%
  summarize(mean_points = mean(total_cup_points)) %>%
  pull(mean_points)

82.82

# Sample of 100 rows
coffee_ratings %>%
  slice_sample(n = 100) %>%
  summarize(mean_points = mean(total_cup_points)) %>%
  pull(mean_points)

82.02

# Sample of 1000 rows
coffee_ratings %>%
  slice_sample(n = 1000) %>%
  summarize(mean_points = mean(total_cup_points)) %>%
  pull(mean_points)

82.16

Relative errors
Population parameter

population_mean <- coffee_ratings %>%
  summarize(mean_points = mean(total_cup_points)) %>%
  pull(mean_points)

Point estimate

sample_mean <- coffee_ratings %>%
  slice_sample(n = sample_size) %>%
  summarize(mean_points = mean(total_cup_points)) %>%
  pull(mean_points)

Relative error as a percentage

100 * abs(population_mean - sample_mean) / population_mean
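
The formula above can be wrapped in a small helper. This is a minimal sketch, not code from the slides: the function name relative_error_pct is hypothetical, and the two inputs are the rounded means shown on the "Various sample sizes" slide (population 82.15, sample of 10 rows 82.82).

```r
# Relative error of a point estimate, as a percentage of the population value.
# relative_error_pct is a hypothetical helper name, not part of any package.
relative_error_pct <- function(population_value, sample_value) {
  100 * abs(population_value - sample_value) / population_value
}

# Rounded values from the slides: population mean and the mean of 10 sampled rows
relative_error_pct(82.15, 82.82)  # about 0.82 (percent)
```

With these inputs the sample of 10 rows is off by less than 1% of the population mean.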

Relative error vs. sample size
ggplot(errors, aes(sample_size, relative_error)) +
  geom_line() +
  geom_smooth(method = "loess")
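
The slides do not show how the errors data frame is built. One possible construction is sketched below, under loud assumptions: the population vector is simulated (it is a stand-in, not the real coffee_ratings data), the range of sample sizes is arbitrary, and base R is used so the sketch runs without extra packages.

```r
set.seed(2024)  # arbitrary seed, only so the sketch is reproducible

# ASSUMPTION: simulated stand-in for coffee_ratings$total_cup_points
population <- rnorm(1338, mean = 82, sd = 2.7)
population_mean <- mean(population)

# One simple random sample (without replacement) per sample size
sample_sizes <- 1:500
relative_errors <- sapply(sample_sizes, function(n) {
  sample_mean <- mean(sample(population, size = n))
  100 * abs(population_mean - sample_mean) / population_mean
})

# Same column names as the ggplot() call on the slide
errors <- data.frame(
  sample_size = sample_sizes,
  relative_error = relative_errors
)
head(errors)
```

Plotting this data frame with the ggplot() call above should show a noisy curve that drops quickly for small sample sizes and then flattens out.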

Let's practice!
Creating a sampling distribution
Same code, different answer
coffee_ratings %>%
  slice_sample(n = 30) %>%
  summarize(mean_cup_points = mean(total_cup_points)) %>%
  pull(mean_cup_points)

83.33

coffee_ratings %>%
  slice_sample(n = 30) %>%
  summarize(mean_cup_points = mean(total_cup_points)) %>%
  pull(mean_cup_points)

82.59

coffee_ratings %>%
  slice_sample(n = 30) %>%
  summarize(mean_cup_points = mean(total_cup_points)) %>%
  pull(mean_cup_points)

82.16

coffee_ratings %>%
  slice_sample(n = 30) %>%
  summarize(mean_cup_points = mean(total_cup_points)) %>%
  pull(mean_cup_points)

82.25

Same code, 1000 times
mean_cup_points_1000 <- replicate(
  n = 1000,
  expr = coffee_ratings %>%
    slice_sample(n = 30) %>%
    summarize(
      mean_cup_points = mean(total_cup_points)
    ) %>%
    pull(mean_cup_points)
)

  [1] 81.65 81.57 82.66 82.27 81.76 81.74 82.71
  [8] 82.20 80.43 82.45 82.29 82.63 82.28 82.11
 [15] 82.14 81.72 81.97 82.58 81.78 82.47 81.73
 [22] 82.78 82.14 82.39 81.69 82.36 82.64 82.68
 [29] 82.56 82.14 82.72 82.43 81.68 82.74 82.80
 [36] 82.12 82.31 81.02 82.83 81.71 82.25 82.11
 [43] 82.76 82.26 81.57 82.00 81.75 81.47 81.99
 [50] 82.68 82.05 82.43 82.40 82.66 80.78 82.43
...
[967] 81.84 83.12 81.54 81.83 82.24 82.36 82.49
[974] 82.05 82.08 81.98 82.45 82.04 81.42 83.06
[981] 81.97 82.65 81.12 82.48 81.64 81.92 81.96
[988] 81.71 81.96 81.78 82.30 81.76 82.46 82.43
[995] 81.95 82.60 81.84 82.78 82.23 82.56

Preparing for plotting
library(tibble)
sample_means <- tibble(
  sample_mean = mean_cup_points_1000
)

# A tibble: 1,000 x 1
   sample_mean
         <dbl>
 1        83.3
 2        82.6
 3        82.2
 4        82.2
 5        81.7
 6        81.6
 7        82.7
 8        82.3
 9        81.8
10        81.7
# ... with 990 more rows

Distribution of sample means for size 30
ggplot(sample_means, aes(sample_mean)) +
  geom_histogram(binwidth = 0.1)

A sampling distribution is a distribution of several replicates of point estimates.

Different sample sizes
[Figure panels: Sample size 6; Sample size 150]
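
To see how the two panels differ, the sampling distribution can be regenerated for each sample size and the spreads compared. The sketch below is illustrative only: the population is simulated rather than taken from coffee_ratings, and sampling_distribution is a hypothetical helper name.

```r
set.seed(1)  # arbitrary seed for reproducibility

# ASSUMPTION: simulated stand-in for coffee_ratings$total_cup_points
population <- rnorm(1338, mean = 82, sd = 2.7)

# Hypothetical helper: replicate the "sample, then take the mean" step
sampling_distribution <- function(sample_size, n_replicates = 1000) {
  replicate(n_replicates, mean(sample(population, size = sample_size)))
}

means_6 <- sampling_distribution(6)
means_150 <- sampling_distribution(150)

# The larger sample size should give a much narrower sampling distribution
sd(means_6)
sd(means_150)
```

Both sets of sample means center on the population mean, but the means from samples of 150 are spread over a much smaller range than those from samples of 6.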

Let's practice!
Approximate sampling distributions
4 dice
library(tidyr)
dice <- expand_grid(
  die1 = 1:6,
  die2 = 1:6,
  die3 = 1:6,
  die4 = 1:6
)

# A tibble: 1,296 x 4
    die1  die2  die3  die4
   <int> <int> <int> <int>
 1     1     1     1     1
 2     1     1     1     2
 3     1     1     1     3
 4     1     1     1     4
 5     1     1     1     5
 6     1     1     1     6
 7     1     1     2     1
 8     1     1     2     2
 9     1     1     2     3
10     1     1     2     4
# ... with 1,286 more rows

Mean roll
dice <- expand_grid(
  die1 = 1:6,
  die2 = 1:6,
  die3 = 1:6,
  die4 = 1:6
) %>%
  mutate(
    mean_roll = (die1 + die2 + die3 + die4) / 4
  )

# A tibble: 1,296 x 5
    die1  die2  die3  die4 mean_roll
   <int> <int> <int> <int>     <dbl>
 1     1     1     1     1      1
 2     1     1     1     2      1.25
 3     1     1     1     3      1.5
 4     1     1     1     4      1.75
 5     1     1     1     5      2
 6     1     1     1     6      2.25
 7     1     1     2     1      1.25
 8     1     1     2     2      1.5
 9     1     1     2     3      1.75
10     1     1     2     4      2
# ... with 1,286 more rows

Exact sampling distribution
ggplot(dice, aes(factor(mean_roll))) +
  geom_bar()
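
The bar heights in this plot are counts out of 1,296 equally likely outcomes, so dividing by 1,296 gives exact probabilities. A minimal sketch follows. It uses base R (expand.grid and table) instead of the tidyverse calls on the slides, purely so that it runs without extra packages.

```r
# All 6^4 = 1296 equally likely outcomes of rolling four dice
dice <- expand.grid(die1 = 1:6, die2 = 1:6, die3 = 1:6, die4 = 1:6)
dice$mean_roll <- rowMeans(dice)

# Exact probability of each possible mean roll
exact_probs <- table(dice$mean_roll) / nrow(dice)

length(exact_probs)  # 21 possible means, from 1 to 6 in steps of 0.25
exact_probs[["3.5"]] # the most likely mean: 146 of the 1296 outcomes
```

The probabilities are symmetric around 3.5, which is also the mean of the distribution.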

The number of outcomes increases fast
outcomes <- tibble(
  n_dice = 1:100,
  n_outcomes = 6 ^ n_dice
)

ggplot(outcomes, aes(n_dice, n_outcomes)) +
  geom_point()

Simulating the mean of four dice rolls
four_rolls <- sample(
  1:6, size = 4, replace = TRUE
)
mean(four_rolls)

Simulating the mean of four dice rolls
sample_means_1000 <- replicate(
  n = 1000,
  expr = {
    four_rolls <- sample(
      1:6, size = 4, replace = TRUE
    )
    mean(four_rolls)
  }
)

sample_means <- tibble(
  sample_mean = sample_means_1000
)

# A tibble: 1,000 x 1
   sample_mean
         <dbl>
 1        4
 2        4.5
 3        2.5
 4        3.75
 5        3.75
 6        4
 7        3
 8        4.75
 9        3.75
10        4.25
# ... with 990 more rows

Approximate sampling distribution
ggplot(sample_means, aes(factor(sample_mean))) +
  geom_bar()
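
One way to judge how good the approximation is: compare the simulated relative frequencies with the exact probabilities from the full enumeration of four dice. This is a sketch under stated assumptions. The seed is arbitrary, and base R is used instead of the tidyverse calls on the slides.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Approximate: 1000 simulated means of four dice rolls
sample_means_1000 <- replicate(
  1000,
  mean(sample(1:6, size = 4, replace = TRUE))
)

# Exact: enumerate all 6^4 outcomes and tabulate their means
rolls <- expand.grid(1:6, 1:6, 1:6, 1:6)
exact <- table(rowMeans(rolls)) / nrow(rolls)

# Tabulate the simulation over the same set of possible means,
# so that values that never occurred still get a frequency of 0
approx <- table(factor(sample_means_1000, levels = names(exact))) / 1000

# Largest gap between simulated frequency and exact probability
max(abs(approx - exact))
```

The gap shrinks as the number of replicates grows, which is why simulation is a practical substitute once enumerating every outcome becomes infeasible.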

Let's practice!
Standard errors and the Central Limit Theorem
Sampling distribution of mean cup points

Consequences of the central limit theorem
Averages of independent samples have approximately normal distributions.
As the sample size increases,
- the distribution of the averages gets closer to being normally distributed, and
- the width of the sampling distribution gets narrower.
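
Both consequences can be checked by simulation, even for a population that is far from normal. The sketch below is illustrative only. The exponential population, the helper names draw_sample_means and skewness, and the number of replicates are all assumptions, not taken from the slides. Skewness near 0 is used as a rough indicator of a symmetric, more normal-looking shape.

```r
set.seed(123)  # arbitrary seed for reproducibility

# ASSUMPTION: a strongly right-skewed simulated population
population <- rexp(10000, rate = 1)

# Hypothetical helpers
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
draw_sample_means <- function(sample_size, n_replicates = 2000) {
  replicate(n_replicates, mean(sample(population, size = sample_size)))
}

sizes <- c(5, 20, 80, 320)
results <- lapply(sizes, draw_sample_means)

clt_summary <- data.frame(
  sample_size = sizes,
  std_dev = sapply(results, sd),       # width of the sampling distribution
  skewness = sapply(results, skewness) # closer to 0 means more symmetric
)
clt_summary
```

Reading down the table, the standard deviation shrinks and the skewness moves toward 0 as the sample size grows.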

Population & sampling distribution means
coffee_ratings %>%
  summarize(
    mean_cup_points = mean(total_cup_points)
  ) %>%
  pull(mean_cup_points)

82.1512

Sample size   Mean sample mean
5             82.1496
20            82.1610
80            82.1496
320           82.1521

Population & sampling distribution standard deviations
coffee_ratings %>%
  summarize(
    sd_cup_points = sd(total_cup_points)
  ) %>%
  pull(sd_cup_points)

2.68686

Sample size   Std dev sample mean
5             1.1929
20            0.6028
80            0.2865
320           0.1304

Population standard deviation over square root sample size
Sample size   Std dev sample mean   Calculation            Result
5             1.1929                2.68686 / sqrt(5)      1.2016
20            0.6028                2.68686 / sqrt(20)     0.6008
80            0.2865                2.68686 / sqrt(80)     0.3004
320           0.1304                2.68686 / sqrt(320)    0.1502
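
The Result column can be reproduced in one vectorized step. This sketch only redoes the arithmetic shown in the table, using the population standard deviation reported on the previous slide. The variable names are illustrative.

```r
# Population standard deviation of total_cup_points, as reported on the slides
population_sd <- 2.68686
sample_sizes <- c(5, 20, 80, 320)

# Population standard deviation divided by the square root of the sample size
standard_errors <- population_sd / sqrt(sample_sizes)
round(standard_errors, 4)  # 1.2016 0.6008 0.3004 0.1502
```

These values can be set against the "Std dev sample mean" column of the table, which holds the standard deviations of the simulated sample means.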

Let's practice!