Population Sample Parameter


POPULATION, SAMPLE, PARAMETER & STATISTICS

In this post, I explain the difference between a population and a sample.

Population

1- A population is the entire group of people, objects, events, measurements, etc. under consideration or observation, whose members share some common feature. E.g.

• All US voters between the ages of 30 and 40, when studying whether they voted for Congress

• All Apple employees, when studying whether they are satisfied with their annual salary

2- Parameter: A parameter is a numerical value that describes a whole population. E.g.

• The percentage who voted for Congress, found by reaching out to every US voter between the ages of 30 and 40

• The percentage who are satisfied with their salary, found by reaching out to every employee of Apple

Sample

1- A sample is a subset of the population; ideally, its members are randomly selected from the population. E.g.

• Reaching out to a fraction of US voters between the ages of 30 and 40 and checking with them whether they voted for Congress

• Reaching out to a few employees of Apple and finding out whether they are satisfied with their salary

2- Statistic: A statistic is a numerical value computed from a sample and used to represent the population. E.g.

• Instead of reaching out to every US voter between 30 and 40 and checking whether they voted for Congress, we draw a conclusion from the percentage of people in a sample who voted for Congress

• We infer what percentage of employees are satisfied with their salary based on a sample
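
The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of 10,000 employees and the 62% satisfaction rate are made-up assumptions, not real data about any company.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 employees, 1 = satisfied, 0 = not satisfied.
population = [1] * 6200 + [0] * 3800

# Parameter: computed from every member of the population.
parameter = sum(population) / len(population)

# Statistic: computed from a random sample of 200 employees.
sample = random.sample(population, 200)
statistic = sum(sample) / len(sample)

print(f"Parameter (whole population): {parameter:.3f}")
print(f"Statistic (sample of 200):    {statistic:.3f}")
```

The parameter needs all 10,000 answers, while the statistic needs only 200 and usually lands close to the parameter without matching it exactly.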

Population vs Sample & Parameter vs Statistic & Biased vs Unbiased in Statistics

We will cover Population and Sample, Parameter and Statistic, Population Mean and Sample Mean, Biased and Unbiased

Introduction

In this blog, you will learn about these topics in statistics:

• Population & Sample

• Parameter & Statistic

• Population Mean & Sample Mean

• Biased & Unbiased Estimator

Let’s get into action.

Population & Sample

Image source: https://www.omniconvert.com/what-is/sample-size/

Population: The population is the entire group that you want to analyze or make predictions about.

Sample: A sample is a subset of the population (for example, members selected at random from the population). A sample is typically much smaller than the population.

Let’s take a scenario to make the idea of population and sample clearer.

You are doing a voting prediction to analyze/predict which party will get the majority of votes and win the election.

So, your next step is to collect data on which party people voted for. Consider India: there are more than 130 crore (1.3 billion) people, so you can’t ask every one of them how they voted.

Due to constraints of resources, time, and accessibility, collecting data from an entire population is nearly impossible, hence a sample is used. As an analogy, you can think of your sample as an aquarium and your population as the ocean. Your sample is a small portion of the vast ocean that you are attempting to understand.

Coming back to the scenario, you randomly select some people, collect their opinions, and then do the analysis/prediction.

Note: You have to select the people randomly. If you collect information from only one state or district and use it to predict voting for all of India, your prediction/analysis will go wrong, because the data would be biased. We will see “Biased & Unbiased” in a later section.
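
To see why surveying a single state goes wrong, here is a small Python sketch. Everything in it is a made-up assumption for illustration: four hypothetical states (A to D), their voter counts, and the number of supporters of a hypothetical "Party X" in each.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical electorate: four states of 25,000 voters each.
# Each voter is a (state, vote) pair; vote is 1 for Party X, 0 otherwise.
supporters_by_state = {"A": 20_000, "B": 10_000, "C": 7_500, "D": 12_500}
population = []
for state, supporters in supporters_by_state.items():
    population += [(state, 1)] * supporters + [(state, 0)] * (25_000 - supporters)

# True share of Party X voters in the whole population.
true_share = sum(vote for _, vote in population) / len(population)

# Biased approach: survey 1,000 voters, all of them from state A.
state_a = [voter for voter in population if voter[0] == "A"]
one_state_sample = random.sample(state_a, 1_000)
one_state_estimate = sum(vote for _, vote in one_state_sample) / 1_000

# Random approach: survey 1,000 voters drawn from the whole population.
random_sample = random.sample(population, 1_000)
random_estimate = sum(vote for _, vote in random_sample) / 1_000

print(f"True share:         {true_share:.3f}")
print(f"One-state estimate: {one_state_estimate:.3f}")
print(f"Random estimate:    {random_estimate:.3f}")
```

Both surveys ask the same number of people, but the one-state survey inherits that state's preferences, so its estimate sits far from the true share, while the random survey stays close to it.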

When taking samples from a population, there are different approaches.

Sampling With and Without Replacement: Let’s start with an example. You have a basket containing 5 red balls and 4 blue balls. In the first event, you draw a sample of 3 red balls and 2 blue balls and calculate the probability of that outcome.

• If you put the drawn balls back into the basket before the next draw, so that the same ball can be picked again, this is referred to as Sampling With Replacement.

• If you do not put the drawn balls back into the basket before calculating the probability for the next event, this is referred to as Sampling Without Replacement.
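
The basket example can be sketched with Python's standard `random` module, where `random.sample` draws without replacement and `random.choices` draws with replacement. The "first two draws are both red" calculation is an extra illustration added here to show how the two schemes change the probabilities.

```python
import random
from fractions import Fraction

random.seed(2)  # fixed seed so the sketch is repeatable

basket = ["red"] * 5 + ["blue"] * 4

# Without replacement: each ball can be drawn at most once.
without_replacement = random.sample(basket, 5)

# With replacement: each ball goes back before the next draw,
# so the same ball may be drawn again.
with_replacement = random.choices(basket, k=5)

# Probability that the first two draws are both red.
p_two_red_with = Fraction(5, 9) * Fraction(5, 9)      # basket unchanged
p_two_red_without = Fraction(5, 9) * Fraction(4, 8)   # one red ball is gone

print("Without replacement:", without_replacement)
print("With replacement:   ", with_replacement)
print("P(two reds) with replacement:   ", p_two_red_with)
print("P(two reds) without replacement:", p_two_red_without)
```

With replacement, the second draw sees the same 5 red balls out of 9. Without replacement, it sees 4 red balls out of 8, so the probability of a second red ball drops.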

Parameter & Statistic

Parameters

The mean, variance, and standard deviation calculated on population data are known as population parameters. The population mean and population standard deviation are represented by the Greek letters µ and σ respectively. A parameter is a characteristic of a population.

Statistic

The mean (x̅), variance, and standard deviation calculated on sample data are known as sample statistics. A statistic is a characteristic of a sample.

If you are asked to calculate the statistics, you have to calculate x̅, s² (s squared), and s.
Notation of Population and Sample — Image Created by Author

Population Mean & Sample Mean

Population Mean

The mean gives the average of the data. The mean calculated on population data is known as the Population Mean. For a given population, the population mean is a fixed value; it doesn’t vary.
Image Created by Author

Sample Mean

The mean calculated using sample data is known as the Sample Mean. The sample mean varies from one sample to another, and it tends to get closer to the population mean as the sample size increases.

Image Created by Author
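
Here is a minimal Python sketch of the two means. The population of 50,000 heights is simulated and purely hypothetical; the point is only that the population mean is a single number, while each new sample gives a slightly different sample mean.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical population: 50,000 simulated heights in cm.
population = [random.gauss(170, 10) for _ in range(50_000)]

# Population mean: one fixed number for this population.
mu = statistics.fmean(population)

# Sample means: a different value for each sample of 50 people.
sample_means = [
    statistics.fmean(random.sample(population, 50)) for _ in range(5)
]

print(f"Population mean: {mu:.2f}")
for i, x_bar in enumerate(sample_means, start=1):
    print(f"Sample {i} mean:   {x_bar:.2f}")
```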

Biased & Unbiased Estimator

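Biased

If the expected value of your sample statistic (its average over many random samples) is not equal to the population parameter, then the estimator is called Biased. A biased estimate usually tilts toward one side of the data rather than varying randomly around the true value.

Unbiased

If the expected value of your sample statistic is equal to the population parameter, then the estimator is called Unbiased.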
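
A common textbook illustration of bias, not taken from this post, is the sample variance: dividing by n tends to underestimate the population variance, while dividing by n - 1 is correct on average. The sketch below checks this by simulation on a hypothetical population, using `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n - 1).

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is repeatable

# Hypothetical population of 20,000 simulated values.
population = [random.gauss(0, 1) for _ in range(20_000)]
sigma2 = statistics.pvariance(population)  # population variance (parameter)

n, trials = 5, 5_000
biased, unbiased = [], []
for _ in range(trials):
    sample = random.sample(population, n)
    biased.append(statistics.pvariance(sample))   # divides by n
    unbiased.append(statistics.variance(sample))  # divides by n - 1

# Average each estimator over many samples.
mean_biased = statistics.fmean(biased)
mean_unbiased = statistics.fmean(unbiased)

print(f"Population variance:         {sigma2:.3f}")
print(f"Average of n estimates:      {mean_biased:.3f}")
print(f"Average of n - 1 estimates:  {mean_unbiased:.3f}")
```

Any single sample can be off in either direction. What makes the first estimator biased is that its average over many samples stays below the population variance, while the average of the second one settles near it.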

Conclusion

I hope this article helps you understand Population, Sample, Parameter and Statistic, Population Mean, Sample Mean, and Biased and Unbiased Estimators.

Populations, Parameters, and Samples in Inferential Statistics
By Jim Frost

Inferential statistics lets you draw conclusions about populations by using small
samples. Consequently, inferential statistics provides enormous benefits because
typically you can’t measure an entire population.

However, to gain these benefits, you must understand the relationship between
populations, subpopulations, population parameters, samples, and sample statistics.

In this blog post, learn the differences between population vs. sample, parameter vs.
statistic, and how to obtain representative samples using random sampling.

Related post: Difference between Descriptive and Inferential Statistics

Populations
Populations can include people, but other examples include objects, events,
businesses, and so on. In statistics, there are two general types of populations.

Populations can be the complete set of all similar items that exist. For example, the
population of a country includes all people currently within that country. It’s a finite
but potentially large list of members.

However, a population can be a theoretical construct that is potentially infinite in size.
For example, quality improvement analysts often consider all current and future
output from a manufacturing line to be part of a population.

Populations share a set of attributes that you define. For example, the following are
populations:

o Stars in the Milky Way galaxy.
o Parts from a production line.
o Citizens of the United States.

Before you begin a study, you must carefully define the population that you are
studying. These populations can be narrowly defined to meet the needs of your
analysis. For example, adult Swedish women who are otherwise healthy but have
osteoporosis.
Population vs Sample
It’s virtually impossible to measure a whole population completely because populations
tend to be extremely large. Consequently, researchers must measure a subset of the
population for their study. These subsets are known as samples.

Typically, a researcher’s goal is to draw a representative sample from their target
population. A representative sample mirrors the properties of the population. Using
this approach, researchers can generalize the results from their sample to the
population. Performing valid inferential statistics requires a strong relationship
between the population and a sample.

In a later section, you’ll learn about the importance of representative samples and
how to obtain them.

A statistical inference is when you use a sample to infer the properties of the entire
population from which it was drawn. Learn more about making Statistical Inferences.

Learn more in-depth about Populations vs. Samples: Uses and Examples.

Subpopulations can Improve Your Analysis


Subpopulations share additional attributes. For instance, the population of the United
States contains the subpopulations of men and women. You can also subdivide it in
other ways such as region, age, socioeconomic status, and so on. Different studies
that involve the same population can divide it into different subpopulations
depending on what makes sense for the data and the analyses.

Understanding the subpopulations in your study helps you grasp the subject matter
more thoroughly. They can also help you produce statistical models that fit the data
better. Subpopulations are particularly important when they have characteristics that
are systematically different than the overall population. When you analyze your data,
you need to be aware of these deeper divisions. In fact, you can treat the relevant
subpopulations as additional factors in later analyses.

For example, if you’re analyzing the average height of adults in the United States,
you’ll improve your results by including male and female subpopulations because
their heights are systematically different. I’ll cover that example in depth later in this
post!

Parameter vs Statistic
A parameter is a value that describes a characteristic of an entire population, such
as the population mean. Because you can almost never measure an entire
population, you usually don’t know the real value of a parameter. In fact, parameter
values are nearly always unknowable. While we don’t know the value, it definitely
exists.

For example, the average height of adult women in the United States is a parameter
that has an exact value—we just don’t know what it is!

The population mean and standard deviation are two common parameters. In
statistics, Greek symbols usually represent population parameters, such as μ (mu)
for the mean and σ (sigma) for the standard deviation.

A statistic is a characteristic of a sample. If you collect a sample and calculate the
mean and standard deviation, these are sample statistics. Inferential statistics allow
you to use sample statistics to make conclusions about a population. However, to
draw valid conclusions, you must use particular sampling techniques. These
techniques help ensure that samples produce unbiased estimates. Biased estimates
are systematically too high or too low. You want unbiased estimates because they
are correct on average.

In inferential statistics, we use sample statistics to estimate population parameters.
For example, if we collect a random sample of adult women in the United States and
measure their heights, we can calculate the sample mean and use it as an unbiased
estimate of the population mean. We can also perform hypothesis testing on the
sample estimate and create confidence intervals to construct a range that the actual
population value likely falls within. Learn more about Parameters vs Statistics.

The law of large numbers states that as the sample size grows, sample statistics will
converge on the population parameters. Additionally, the standard error of the
mean mathematically describes how larger samples produce more precise
estimates.
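
As a rough illustration of both ideas, the Python sketch below draws samples of increasing size from a simulated population and prints each sample mean next to the standard error of the mean, σ/√n. The population of heights is a hypothetical stand-in, not real data.

```python
import math
import random
import statistics

random.seed(5)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 simulated heights in cm.
population = [random.gauss(165, 7) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

results = []
for n in (10, 100, 1_000, 10_000):
    sample = random.sample(population, n)
    x_bar = statistics.fmean(sample)
    standard_error = sigma / math.sqrt(n)  # standard error of the mean
    results.append((n, x_bar, standard_error))

print(f"Population mean: {mu:.2f}")
for n, x_bar, standard_error in results:
    print(f"n={n:>6}  sample mean={x_bar:7.2f}  standard error={standard_error:.3f}")
```

The standard error shrinks by a factor of about 3.2 (the square root of 10) each time the sample size grows tenfold, which is why larger samples tend to give sample means closer to the population mean.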

Population Parameter    Sample Statistic
Mu (μ)                  Sample mean
Sigma (σ)               Sample standard deviation

Related posts: Measures of Central Tendency and Measures of Variability


Representative Sampling and Simple Random Samples

A sample is a subset of the whole population
In statistics, sampling refers to selecting a subset of a population. After drawing the
sample, you measure one or more characteristics of all items in the sample, such as
height, income, temperature, opinion, etc. If you want to draw conclusions about
these characteristics in the whole population, that goal imposes restrictions on how you
collect the sample. If you use an incorrect methodology, the sample might not
represent the population, which can lead you to erroneous conclusions. Learn more
about Representative Samples.

The most well-known method to obtain an unbiased, representative sample is simple
random sampling. With this method, all items in the population have an equal
probability of being selected. This process helps ensure that the sample includes the
full range of the population. Additionally, all relevant subpopulations should be
incorporated into the sample and represented accurately on average. Simple random
sampling minimizes the bias and simplifies data analysis.

I’ll discuss sampling methodology in more detail in a future blog post, but there are
several crucial caveats about simple random sampling. While this approach
minimizes bias, it does not indicate that your sample statistics exactly equal the
population parameters. Instead, estimates from a specific sample are likely to be a
bit high or low, but the process produces accurate estimates on average.
Furthermore, it is possible to obtain unusual samples with random sampling—it’s just
not the expected result.

Methods for collecting a representative sample include the following:

o Simple random sampling
o Stratified sampling
o Cluster sampling
o Systematic sampling

Additionally, random sampling might sound a bit haphazard and easy to do, neither of
which is true. Simple random sampling assumes that you systematically
compile a complete list of all people or items that exist in the population. You then
randomly select subjects from that list and include them in the sample. It can be a
very cumbersome process.
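
The list-then-select procedure can be sketched in Python as follows. The list of 2,000 member IDs is a hypothetical stand-in for a real, complete list of the population, and the systematic sample is included only as a point of comparison with one of the other methods listed above.

```python
import random

random.seed(6)  # fixed seed so the sketch is repeatable

# Hypothetical complete list of the population (2,000 member IDs).
population_list = [f"person_{i:04d}" for i in range(1, 2001)]

# Simple random sample: every member has the same chance of selection.
simple_random_sample = random.sample(population_list, k=100)

# Systematic sample for comparison: random start, then every 20th member.
step = len(population_list) // 100
start = random.randrange(step)
systematic_sample = population_list[start::step]

print(simple_random_sample[:5])
print(systematic_sample[:5])
```

The code itself is short. The cumbersome part in practice is building a complete and accurate `population_list` in the first place.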

Random sampling can increase the internal and external validity of your study. Learn
more about internal and external validity.

Conversely, convenience sampling does not tend to obtain representative samples.
These samples are much easier to collect but the results are minimally useful.

Let’s bring these concepts to life!

Related post: Sample Statistics Are Always Wrong (to Some Extent)!

Example of a Population with Important Subpopulations


Suppose we’re studying the height of American citizens and let’s further assume that
we don’t know much about the subject. Consequently, we collect a random sample,
measure the heights in centimeters, and calculate the sample mean and standard
deviation. Here is the CSV data file: Heights.

We obtain the following results:


Because we gathered a random sample, we can assume that these sample statistics
are unbiased estimates of the population parameters.

Now, suppose we learn more about the study area and include male and female as
subpopulations. We obtain the following results.

Notice how the single broad distribution has been replaced by two narrower
distributions? The distribution for each gender has a smaller standard deviation than
the single distribution for all adults, which is consistent with the tighter spread around
the means for both men and women in the graph. These results show how the mean
provides more precise estimates when we assess heights by gender. In fact, the
mean for the entire population does not equal the mean for either subpopulation. It’s
misleading!
During this process, we learn that gender is a crucial subpopulation that relates to
height and increases our understanding of the subject matter. In future studies about
height, we can include gender as a predictor variable.
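
The same pattern can be reproduced with a short Python sketch. The heights below are simulated under assumed means and standard deviations; they are not the values from the Heights CSV file, so the printed numbers will differ from the results discussed above.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

# Synthetic heights in cm; these are NOT the values from the Heights CSV file.
heights = {
    "male": [random.gauss(176, 7) for _ in range(500)],
    "female": [random.gauss(162, 7) for _ in range(500)],
}
all_heights = heights["male"] + heights["female"]

# One broad distribution for everyone.
overall_mean = statistics.fmean(all_heights)
overall_sd = statistics.stdev(all_heights)

# Two narrower distributions, one per subpopulation.
group_mean = {g: statistics.fmean(v) for g, v in heights.items()}
group_sd = {g: statistics.stdev(v) for g, v in heights.items()}

print(f"All adults: mean={overall_mean:.1f}  sd={overall_sd:.1f}")
for g in heights:
    print(f"{g:>10}: mean={group_mean[g]:.1f}  sd={group_sd[g]:.1f}")
```

In this simulated data, each group has a smaller standard deviation than the combined data, and the overall mean falls between the two group means without matching either one.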

This example uses a categorical grouping variable (Gender) and a continuous
outcome variable (Heights). When you want to compare distributions of continuous
values between groups like this example, consider using boxplots and individual
value plots. These plots become more useful as the number of groups increases.

This example is intentionally easy to understand but imagine a study about a less
obvious subject. This process helps you gain new insights and produce better
statistical models.

Using your knowledge of populations, subpopulations, parameters, sampling, and
sample statistics, you can draw valuable conclusions about large populations by
using small samples. For more information about how you can test hypotheses about
populations, read my Overview of Hypothesis Tests.
