Sampling Distributions Coursera

- The document discusses sampling distributions and how they can be used to understand the variability of sample statistics in estimating population parameters. It loads real estate data from Ames, Iowa and takes random samples from the full population to estimate properties like the mean living area.
- Code is provided to take 15,000 samples of size 50 from the full population and calculate the mean of each sample to build the sampling distribution for the sample mean. Exercises are included to help understand how sampling distributions are constructed.
- Increasing the sample size decreases the variability in the sampling distribution, providing a more accurate estimate of the population mean.

Uploaded by

rrutayisire

---

title: "Foundations for inference - Sampling distributions"

output: statsr:::statswithr_lab
runtime: shiny
---

<div id="instructions">
Complete all **Exercises**, and submit answers to **Questions** on the Coursera
platform.
</div>

## Getting Started

### Load packages

In this lab we will explore the data using the `dplyr` package and visualize it
using the `ggplot2` package. The data can be found in the companion package for
this course, `statsr`.

Let's load the packages.

```{r load-packages, message=FALSE}

library(statsr)
library(dplyr)
library(shiny)
library(ggplot2)
```

### The data

We consider real estate data from the city of Ames, Iowa. The details of
every real estate transaction in Ames are recorded by the City Assessor's
office. Our particular focus for this lab will be all residential home sales
in Ames between 2006 and 2010. This collection represents our population of
interest. In this lab we would like to learn about these home sales by taking
smaller samples from the full population. Let's load the data.

```{r load-data}
data(ames)
```

We see that there are quite a few variables in the data set, enough to do a
very in-depth analysis. For this lab, we'll restrict our attention to just
two of the variables: the above ground living area of the house in square feet
(`area`) and the sale price (`price`).

We can explore the distribution of areas of homes in the population of home
sales visually and with summary statistics. Let's first create a visualization,
a histogram:

```{r area-hist}
ggplot(data = ames, aes(x = area)) +
  geom_histogram(binwidth = 250)
```

Let's also obtain some summary statistics. Note that we can do this using the
`summarise` function. We can calculate as many statistics as we want using this
function, and just string along the results. Some of the functions below should
be self-explanatory (like `mean`, `median`, `sd`, `IQR`, `min`, and `max`). A
new function here is the `quantile` function, which we can use to calculate
values corresponding to specific percentile cutoffs in the distribution. For
example, `quantile(x, 0.25)` will yield the cutoff value for the 25th percentile
(Q1) in the distribution of `x`. Finding these values is useful for describing the
distribution, as we can use them for descriptions like *"the middle 50% of the
homes have areas between such and such square feet"*.

```{r area-stats}
ames %>%
  summarise(mu = mean(area), pop_med = median(area),
            sigma = sd(area), pop_iqr = IQR(area),
            pop_min = min(area), pop_max = max(area),
            pop_q1 = quantile(area, 0.25),  # first quartile, 25th percentile
            pop_q3 = quantile(area, 0.75))  # third quartile, 75th percentile
```
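To see how `quantile` and `IQR` relate, here is a small sketch on a made-up vector. The name `toy_areas` and its five values are hypothetical and are not taken from the Ames data; the results rely on R's default quantile method.

```{r quantile-sketch}
# Hypothetical toy data (not from ames): five made-up areas in square feet
toy_areas <- c(1000, 1200, 1400, 1600, 1800)

q1 <- quantile(toy_areas, 0.25)  # 25th percentile (Q1)
q3 <- quantile(toy_areas, 0.75)  # 75th percentile (Q3)

unname(q1)                        # 1200
unname(q3)                        # 1600
unname(q3 - q1) == IQR(toy_areas) # TRUE: the IQR is the distance from Q1 to Q3
```

In this toy vector, the middle 50% of the values lie between 1,200 and 1,600 square feet.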

1. Which of the following is **false**?

<ol>
<li> The distribution of areas of houses in Ames is unimodal and right-skewed.
</li>
<li> 50\% of houses in Ames are smaller than 1,499.69 square feet. </li>
<li> The middle 50\% of the houses range between approximately 1,126 square feet
and 1,742.7 square feet. </li>
<li> The IQR is approximately 616.7 square feet. </li>
<li> The smallest house is 334 square feet and the largest is 5,642 square feet.
</li>
</ol>

## The unknown sampling distribution

In this lab we have access to the entire population, but this is rarely the
case in real life. Gathering information on an entire population is often
extremely costly or impossible. Because of this, we often take a sample of
the population and use that to understand the properties of the population.

If we are interested in estimating the mean living area in Ames based on a
sample, we can use the following command to survey the population.

```{r samp1}
samp1 <- ames %>%
  sample_n(size = 50)
```

This command collects a simple random sample of `size` 50 from the `ames` dataset,
which is assigned to `samp1`. This is like going into the City
Assessor's database and pulling up the files on 50 random home sales. Working
with these 50 files would be considerably simpler than working with all 2,930
home sales.
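What "pulling up 50 random files" means can be sketched in base R. This is an illustration only: `toy_pop` is a hypothetical simulated population rather than the Ames data, and the sketch assumes the default behavior of `sample_n`, which samples rows without replacement.

```{r srs-sketch}
# Hypothetical stand-in population (not the Ames data): 2,000 right-skewed "areas".
# No seed is set, so the values differ on every run.
toy_pop <- data.frame(area = round(rlnorm(2000, meanlog = 7.2, sdlog = 0.3)))

# A simple random sample without replacement: draw 50 distinct row numbers,
# then keep only those rows
rows <- sample(nrow(toy_pop), size = 50)
toy_samp <- toy_pop[rows, , drop = FALSE]

nrow(toy_samp)       # 50
anyDuplicated(rows)  # 0, i.e. no row was drawn twice
```

Because whole rows are kept, every variable in the sample can still be referred to by its original name.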

<div id="exercise">
**Exercise**: Describe the distribution of this sample. How does it compare to the
distribution of the population? **Hint:** The `sample_n` function takes a random sample
of observations (i.e. rows) from the dataset, so you can still refer to the variables
in the dataset by the same names. The code you used in the previous exercise will
also be helpful for visualizing and summarizing the sample. However, be careful
not to label values `mu` and `sigma` anymore, since these are sample statistics, not
population parameters. You can customize the labels of any of the statistics to
indicate that these come from the sample.
</div>
```{r samp1-dist}
# type your code for the Exercise here, and Run Document

```

If we're interested in estimating the average living area in homes in Ames
using the sample, our best single guess is the sample mean.

```{r mean-samp1}
samp1 %>%
  summarise(x_bar = mean(area))
```

Depending on which 50 homes you selected, your estimate could be a bit above
or a bit below the true population mean of 1,499.69 square feet. In general,
though, the sample mean turns out to be a pretty good estimate of the average
living area, and we were able to get it by sampling less than 3\% of the
population.

2. Suppose we took two more samples, one of size 100 and one of size 1000. Which
sample size would you expect to provide a more accurate estimate of the population mean?
<ol>
<li> Sample size of 50. </li>
<li> Sample size of 100. </li>
<li> Sample size of 1000. </li>
</ol>

Let's take one more sample of size 50, and view the mean area in this sample:
```{r mean-samp2}
ames %>%
  sample_n(size = 50) %>%
  summarise(x_bar = mean(area))
```

Not surprisingly, every time we take another random sample, we get a different
sample mean. It's useful to get a sense of just how much variability we
should expect when estimating the population mean this way. The distribution
of sample means, called the *sampling distribution*, can help us understand
this variability. In this lab, because we have access to the population, we
can build up the sampling distribution for the sample mean by repeating the
above steps many times. Here we will generate 15,000 samples and compute the
sample mean of each. Note that we are sampling with replacement
(`replace = TRUE`), since sampling distributions are constructed by sampling
with replacement.

```{r loop}
sample_means50 <- ames %>%
  rep_sample_n(size = 50, reps = 15000, replace = TRUE) %>%
  summarise(x_bar = mean(area))

ggplot(data = sample_means50, aes(x = x_bar)) +
  geom_histogram(binwidth = 20)
```

Here we use R to take 15,000 samples of size 50 from the population, calculate
the mean of each sample, and store the results in a data frame called
`sample_means50`, with the sample means in the column `x_bar`. Next, we review
how this set of code works.

<div id="exercise">
**Exercise**: How many elements are there in `sample_means50`? Describe the
sampling distribution, and be sure to specifically note its center. Make sure to
include a plot of the distribution in your answer.
</div>
```{r sampling-dist}
# type your code for the Exercise here, and Run Document

```

## Interlude: Sampling distributions

The idea behind the `rep_sample_n` function is *repetition*. Earlier we took
a single sample of size `n` (50) from the population of all houses in Ames. With
this new function we are able to repeat this sampling procedure `reps` times in
order to build a distribution of a series of sample statistics, which is called the
**sampling distribution**.

Note that in practice one rarely gets to build sampling distributions,
because we rarely have access to data from the entire population.

Without the `rep_sample_n` function, this would be painful. We would have to
manually run the following code 15,000 times
```{r sample-code, eval=FALSE}
ames %>%
  sample_n(size = 50) %>%
  summarise(x_bar = mean(area))
```
as well as store the resulting sample means each time in a separate vector.

Note that for each of the 15,000 times we computed a mean, we did so from a
**different** sample!
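The repetition described above can be written out as an explicit loop. The following is a base R sketch of the idea, not the internals of `rep_sample_n`; `toy_pop`, `n_reps`, and `toy_means` are hypothetical names, the population is simulated rather than taken from the Ames data, and only 1,000 repetitions are used to keep it quick.

```{r loop-sketch}
# Hypothetical stand-in population (not the Ames data); no seed is set,
# so the values differ on every run
toy_pop <- round(rlnorm(2000, meanlog = 7.2, sdlog = 0.3))

n_reps <- 1000                  # number of repeated samples
toy_means <- numeric(n_reps)    # one slot per sample mean

for (i in seq_len(n_reps)) {
  # draw a fresh sample of size 50, with replacement, on every pass
  toy_samp <- sample(toy_pop, size = 50, replace = TRUE)
  toy_means[i] <- mean(toy_samp)
}

length(toy_means)  # 1000: one mean per sample
```

Each element of `toy_means` comes from a different sample of 50 values, which is the bookkeeping that the lab's one-line pipeline takes care of.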

<div id="exercise">
**Exercise**: To make sure you understand how sampling distributions are built, and
exactly what the `rep_sample_n` and `summarise` functions do, try modifying the code to create
a sampling distribution of **25 sample means** from **samples of size 10**, and put
them in a data frame named `sample_means_small`. Print the output. How many
observations are there in this object called `sample_means_small`? What does each
observation represent?
</div>
```{r practice-sampling-dist}
# type your code for the Exercise here, and Run Document

```

3. How many elements are there in this object called `sample_means_small`?

<ol>
<li> 0 </li>
<li> 3 </li>
<li> 25 </li>
<li> 100 </li>
<li> 5,000 </li>
</ol>
```{r sample-means-small}
# type your code for Question 3 here, and Run Document
```

4. Which of the following is **true** about the elements in the sampling
distributions you created?
<ol>
<li> Each element represents a mean square footage from a simple random sample of
10 houses. </li>
<li> Each element represents the square footage of a house. </li>
<li> Each element represents the true population mean of square footage of houses.
</li>
</ol>

## Sample size and the sampling distribution

Mechanics aside, let's return to the reason we used the `rep_sample_n` function: to
compute a sampling distribution, specifically, this one.

```{r hist}
ggplot(data = sample_means50, aes(x = x_bar)) +
  geom_histogram(binwidth = 20)
```

The sampling distribution that we computed tells us much about estimating
the average living area in homes in Ames. Because the sample mean is an
unbiased estimator, the sampling distribution is centered at the true average
living area of the population, and the spread of the distribution
indicates how much variability is induced by sampling only 50 home sales.
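The statement that the sample mean is unbiased can be checked in one line. As a sketch, assume the $n$ observations $x_1, \dots, x_n$ are drawn at random from the population, so that each has expected value $\mu$ (the population mean, labeled `mu` earlier). The symbols $x_i$ and $n$ are notation introduced here for illustration.

$$
E[\bar{x}] = E\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right]
           = \frac{1}{n}\sum_{i=1}^{n} E[x_i]
           = \frac{1}{n} \cdot n\mu
           = \mu
$$

This only tells us where the sampling distribution is centered. It says nothing yet about how spread out it is.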

In the remainder of this section we will work on getting a sense of the effect that
sample size has on our sampling distribution.

<div id="exercise">
**Exercise**: Use the app below to create sampling distributions of means of
`area` from samples of size 10, 50, and 100. Use 5,000 simulations. What does each
observation in the sampling distribution represent? How do the mean, standard
error, and shape of the sampling distribution change as the sample size increases?
How (if at all) do these values change if you increase the number of simulations?
</div>

```{r shiny, echo=FALSE}

shinyApp(
  ui <- fluidPage(

    # Sidebar with inputs for the variable, sample size, and number of samples
    sidebarLayout(
      sidebarPanel(

        selectInput("selected_var",
                    "Variable:",
                    choices = list("area", "price"),
                    selected = "area"),

        numericInput("n_samp",
                     "Sample size:",
                     min = 1,
                     max = nrow(ames),
                     value = 30),

        numericInput("n_sim",
                     "Number of samples:",
                     min = 1,
                     max = 30000,
                     value = 15000)

      ),

      # Show a plot of the generated distribution
      mainPanel(
        plotOutput("sampling_plot"),
        verbatimTextOutput("sampling_mean"),
        verbatimTextOutput("sampling_se")
      )
    )
  ),

  # Define server logic required to draw a histogram
  server <- function(input, output) {

    # create sampling distribution
    sampling_dist <- reactive({
      ames[[input$selected_var]] %>%
        sample(size = input$n_samp * input$n_sim, replace = TRUE) %>%
        matrix(ncol = input$n_samp) %>%
        rowMeans() %>%
        data.frame(x_bar = .)
      #ames %>%
      #  rep_sample_n(size = input$n_samp, reps = input$n_sim, replace = TRUE) %>%
      #  summarise_(x_bar = mean(input$selected_var))
    })

    # plot sampling distribution
    output$sampling_plot <- renderPlot({
      x_min <- quantile(ames[[input$selected_var]], 0.1)
      x_max <- quantile(ames[[input$selected_var]], 0.9)

      ggplot(sampling_dist(), aes(x = x_bar)) +
        geom_histogram() +
        xlim(x_min, x_max) +
        ylim(0, input$n_sim * 0.35) +
        ggtitle(paste0("Sampling distribution of mean ",
                       input$selected_var, " (n = ", input$n_samp, ")")) +
        xlab(paste("mean", input$selected_var)) +
        theme(plot.title = element_text(face = "bold", size = 16))
    })

    # mean of sampling distribution
    output$sampling_mean <- renderText({
      paste0("mean of sampling distribution = ",
             round(mean(sampling_dist()$x_bar), 2))
    })

    # standard error of sampling distribution
    output$sampling_se <- renderText({
      paste0("SE of sampling distribution = ",
             round(sd(sampling_dist()$x_bar), 2))
    })
  },

  options = list(height = 500)
)
```
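The app builds its sampling distribution with a matrix shortcut rather than `rep_sample_n`: it draws all values at once, arranges them so that each row holds one sample, and takes row means. Here is a tiny base R sketch of that idea. `toy_pop`, `draws`, `draw_matrix`, and `toy_means` are hypothetical names, the population is simulated rather than taken from the Ames data, and the sizes are kept very small so the result is easy to inspect.

```{r matrix-sketch}
# Hypothetical stand-in population (not the Ames data); no seed is set
toy_pop <- round(rlnorm(2000, meanlog = 7.2, sdlog = 0.3))

n_samp <- 4   # sample size (the app's "Sample size" input)
n_sim  <- 3   # number of samples (the app's "Number of samples" input)

# Draw every value needed in one call, with replacement
draws <- sample(toy_pop, size = n_samp * n_sim, replace = TRUE)

# Arrange the draws so that each ROW is one sample of size n_samp
draw_matrix <- matrix(draws, ncol = n_samp)

# One mean per row, i.e. one mean per sample
toy_means <- rowMeans(draw_matrix)

dim(draw_matrix)   # 3 4
length(toy_means)  # 3
```

`matrix` fills column by column, so a row is not a run of consecutive draws. Because all draws are independent, each row is still a random sample of size `n_samp`.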

5. It makes intuitive sense that as the sample size increases, the center of the
sampling distribution becomes a more reliable estimate for the true population
mean. Also as the sample size increases, the variability of the sampling
distribution ________.
<ol>
<li> decreases </li>
<li> increases </li>
<li> stays the same </li>
</ol>

<div id="exercise">
**Exercise**: Take a random sample of size 50 from `price`. Using this sample, what
is your best point estimate of the population mean?
</div>
```{r price-sample}
# type your code for this Exercise here, and Run Document

```

<div id="exercise">
**Exercise**: Since you have access to the population, simulate the sampling
distribution for $\bar{x}_{price}$ by taking 5000 samples of size 50 from the
population and computing 5000 sample means. Store these means in a vector called
`sample_means50`. Plot the data, then describe the shape of this sampling
distribution. Based on this sampling distribution, what would you guess the mean
home price of the population to be?
</div>
```{r price-sampling}
# type your code for this Exercise here, and Run Document

```

<div id="exercise">
**Exercise**: Change your sample size from 50 to 150, then compute the sampling
distribution using the same method as above, and store these means in a new vector
called `sample_means150`. Describe the shape of this sampling distribution, and
compare it to the sampling distribution for a sample size of 50. Based on this
sampling distribution, what would you guess to be the mean sale price of homes in
Ames?
</div>
```{r price-sampling-more}
# type your code for this Exercise here, and Run Document

```

* * *

So far, we have only focused on estimating the mean living area in homes in
Ames. Now you'll try to estimate the mean home price.

Note that while you might be able to answer some of these questions using the app,
you are expected to write the required code and produce the necessary plots and
summary statistics. You are welcome to use the app for exploration.
<div id="exercise">
**Exercise**: Take a sample of size 15 from the population and calculate the mean
`price` of the homes in this sample. Using this sample, what is your best point
estimate of the population mean of prices of homes?
</div>
```{r price-sample-small}
# type your code for this Exercise here, and Run Document

```

<div id="exercise">
**Exercise**: Since you have access to the population, simulate the sampling
distribution for $\bar{x}_{price}$ by taking 2000 samples of size 15 from the
population and computing 2000 sample means. Store these means in a vector called
`sample_means15`. Plot the data, then describe the shape of this sampling
distribution. Based on this sampling distribution, what would you guess the mean
home price of the population to be? Finally, calculate and report the population
mean.
</div>
```{r price-sampling-small}
# type your code for this Exercise here, and Run Document

```

<div id="exercise">
**Exercise**: Change your sample size from 15 to 150, then compute the sampling
distribution using the same method as above, and store these means in a new vector
called `sample_means150`. Describe the shape of this sampling distribution, and
compare it to the sampling distribution for a sample size of 15. Based on this
sampling distribution, what would you guess to be the mean sale price of homes in
Ames?
</div>
```{r price-sampling-big}
# type your code for this Exercise here, and Run Document

```

6. Which of the following is false?

<ol>
<li> The variability of the sampling distribution with the smaller sample size
(`sample_means50`) is smaller than the variability of the sampling distribution
with the larger sample size (`sample_means150`). </li>
<li> The means for the two sampling distributions are roughly similar. </li>
<li> Both sampling distributions are symmetric. </li>
</ol>
```{r price-sampling-compare}
# type your code for Question 6 here, and Run Document

```

<div id="license">
This is a derivative of an [OpenIntro](https://www.openintro.org/stat/labs.php)
lab, and is released under an [Attribution-NonCommercial-ShareAlike 3.0 United
States](https://creativecommons.org/licenses/by-nc-sa/3.0/us/) license.
</div>
