Sampling Techniques for Data Preprocessing

The document discusses various sampling techniques used in data preprocessing, including Simple Random Sampling, Stratified Sampling, Systematic Sampling, Cluster Sampling, Convenience Sampling, and Multi-Stage Sampling. Each technique is explained with real-world examples, pros, and cons, highlighting their applicability and limitations in data analysis. The importance of sampling is emphasized as a method to efficiently analyze large datasets by selecting representative subsets.

Uploaded by

saadkhancr91

Technical Seminar

Sampling Techniques For Data Preprocessing
Mohammad Israr (1IC21CD001)
Guide: Dr. Sheethal Aji Mani
Why Do We Need Sampling?
Imagine you have a huge bucket of popcorn and want to check if it’s salty
enough. You can’t taste every piece, so you grab a few kernels, taste them,
and decide. That’s sampling—picking a small part of something big to
analyze and make conclusions about the whole.

In data science, datasets can be massive. Instead of processing millions of
rows, we take a smart shortcut by sampling a smaller set that represents the
entire dataset.
Different Sampling Techniques

1. Simple Random Sampling
2. Stratified Sampling
3. Systematic Sampling
4. Cluster Sampling
5. Convenience Sampling
6. Multi-Stage Sampling
1. Simple Random Sampling – The Lucky Draw

You write everyone’s name on a slip, throw them in a hat, and randomly
pick a few.

Major Real-World Example:

Election Polls:
Before elections, polling agencies such as Gallup or Pew Research draw random
samples of voters across different regions to predict the likely outcome. In
practice they usually combine random selection with weighting rather than
relying on a pure lucky draw.
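
The lucky draw can be sketched in a few lines of Python. This is a minimal illustration, not how any polling agency actually works: the `voter_…` names and the `simple_random_sample` helper are made up for the example, and the only library call used is the standard `random.Random.sample`.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical sampling frame: the full list of names in the hat.
names = [f"voter_{i}" for i in range(1, 1001)]

picked = simple_random_sample(names, 10, seed=42)
print(picked)
```

Note that the function needs the complete list up front, which is exactly the "full list of the population" requirement that makes this method hard to apply in practice.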
Pros And Cons Of Simple Random Sampling

✅ Pros:
✔ Unbiased selection (each item has an equal chance of being picked).
✔ Easy to understand and apply.
✔ Works well for small populations.
❌ Cons:
❌ Can be time-consuming and costly for large populations.
❌ Might not cover all subgroups properly.
❌ Requires a full list of the population, which isn’t always available.
2. Stratified Sampling – Dividing the Cake
Now, let’s say you’re making a fruitcake, and you want every slice to have an
equal mix of nuts, raisins, and chocolate chips. Instead of just randomly scooping
ingredients, you make sure each slice has a fair amount of everything.

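Major Real-World Example:

Census-Based Surveys:
Statistical agencies divide the population into strata such as age, income,
gender, and education level, then sample from each stratum so that every group
is fairly represented. (A full census counts everyone, so stratification
applies to the sample surveys built on top of it.)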
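
A minimal sketch of the idea, assuming proportional allocation (the same fraction is drawn from every stratum). The `people` records, the age-group labels, and the `stratified_sample` helper are hypothetical and only illustrate the mechanics.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, key, fraction, seed=None):
    """Sample the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)  # group records by stratum label
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 young, 300 middle-aged, 100 senior people.
people = (
    [{"id": i, "age_group": "18-34"} for i in range(600)]
    + [{"id": 600 + i, "age_group": "35-59"} for i in range(300)]
    + [{"id": 900 + i, "age_group": "60+"} for i in range(100)]
)

sample = stratified_sample(people, key=lambda p: p["age_group"], fraction=0.1, seed=1)
counts = Counter(p["age_group"] for p in sample)
print(counts)  # 60, 30, and 10 people from the three groups
```

Because each group contributes in proportion to its size, the small "60+" group cannot be missed by chance, which can happen with a plain random draw.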
Pros And Cons Of Stratified Sampling
✅ Pros:
✔ Ensures all important groups (strata) are well-represented.
✔ Usually more precise than simple random sampling at the same sample size, as long as the strata differ from one another.
✔ Works well for diverse populations (e.g., age groups, income levels).

❌ Cons:
❌ Requires detailed knowledge about the population (to create proper strata).
❌ More complex and time-consuming to implement.
3. Systematic Sampling – Every 10th Customer
Imagine you own a store and want to survey customers, but instead of
randomly stopping people, you decide, “I’ll talk to every 10th person who
walks in.”

Major Real-World Example:

A factory wants to check product quality. Instead of testing every item, it
checks every 50th item off the assembly line.
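
The "every 50th item" rule can be sketched as follows. The random starting point is an assumption added here (a common way to keep the selection random); the serial numbers and the `systematic_sample` helper are hypothetical.

```python
import random

def systematic_sample(items, step, seed=None):
    """Pick a random start in [0, step), then take every step-th item."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return items[start::step]

# Hypothetical serial numbers of items coming off the assembly line.
line_items = list(range(1, 1001))

checked = systematic_sample(line_items, step=50, seed=7)
print(len(checked))  # 20 items out of 1000
print(checked[:3])
```

Only one random number is needed for the whole sample, which is why this method takes less effort than drawing every item at random.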
Pros And Cons Of Systematic Sampling
✅ Pros:
✔ Easy to use and requires less effort than random sampling.
✔ Spreads the sample evenly across the population.
✔ Works well for large populations.

❌ Cons:
❌ If the population has a hidden repeating pattern that lines up with the sampling interval, it can introduce bias.
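
A small sketch of how such a pattern can mislead. It assumes a made-up production line where exactly every 10th item is defective; with an interval of 50, the sample sees either only defective items or none at all, depending on where it starts.

```python
def defect_rate(items):
    """Share of defective items (1 = defective, 0 = fine)."""
    return sum(items) / len(items)

# Hypothetical line: positions 0, 10, 20, ... are defective.
line = [1 if i % 10 == 0 else 0 for i in range(1000)]

aligned = line[0::50]  # start 0, every 50th item: lands only on defects
offset = line[3::50]   # start 3, every 50th item: never lands on a defect

print(defect_rate(line))     # 0.1 (true rate)
print(defect_rate(aligned))  # 1.0
print(defect_rate(offset))   # 0.0
```

Neither systematic sample comes close to the true rate of 10%, because the interval (50) is a multiple of the pattern length (10).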
4. Cluster Sampling – Picking a Few Schools Instead of Students

Imagine you want to study student habits across a country. Instead of picking
random students, you randomly select entire schools and survey all the students
in those schools.

Major Real-World Example:

WHO’s Health Surveys:
The World Health Organization (WHO) conducts global health surveys by
randomly selecting countries → cities → hospitals and studying all patients in
those hospitals.
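
The school example can be sketched like this: whole clusters are drawn at random, and then everyone inside them is surveyed. The `schools` dictionary and the `cluster_sample` helper are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical country: 20 schools with 30 students each.
schools = {
    f"school_{s}": [f"s{s}_student_{i}" for i in range(30)]
    for s in range(20)
}

chosen, students = cluster_sample(schools, 3, seed=5)
print(chosen)
print(len(students))  # 90 students, all from the 3 chosen schools
```

The randomness applies only to the choice of schools, so the survey team visits 3 locations instead of up to 20.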
Pros And Cons Of Cluster Sampling
✅ Pros:
✔ Cost-effective and time-saving for large populations.
✔ Useful when the population is spread out over a wide area.
✔ Easier to conduct than simple random sampling, because no list of every individual is needed.

❌ Cons:
❌ If clusters are not diverse, results may be biased.
❌ Usually less precise than stratified sampling at the same sample size.
❌ One bad cluster can distort results.
5. Convenience Sampling – The Easy Way Out

This is the “I’ll just ask whoever is nearby” method. It’s like asking only
your family and friends about their favorite movie and assuming that’s
what everyone likes.

Major Real-World Example:

Startup Market Research:
A new coffee shop asks only walk-in customers about their
experience, rather than conducting a city-wide survey.
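
A rough sketch of why this can mislead. All numbers are invented for illustration: the city has 1000 people, 100 of them are existing customers who rate the shop 4 or 5, and everyone else rates it anywhere from 1 to 5.

```python
import random

rng = random.Random(0)

# Hypothetical city: existing customers like the shop, others are mixed.
city = (
    [{"customer": True, "rating": rng.choice([4, 5])} for _ in range(100)]
    + [{"customer": False, "rating": rng.choice([1, 2, 3, 4, 5])} for _ in range(900)]
)

def mean_rating(people):
    return sum(p["rating"] for p in people) / len(people)

# Convenience sample: the first 50 people who walk in (all are customers).
convenience = [p for p in city if p["customer"]][:50]

# Comparison: a simple random sample of the same size from the whole city.
random_sample = rng.sample(city, 50)

print(mean_rating(city))
print(mean_rating(convenience))
print(mean_rating(random_sample))
```

The convenience sample can only contain ratings of 4 or 5, so its average sits well above the city-wide average, while the random sample tends to land much closer to it.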
Pros And Cons Of Convenience Sampling
✅ Pros:
✔ Fast and easy—no complex process required.
✔ Useful for early-stage research (when you just need quick insights).
✔ Low-cost compared to other methods.

❌ Cons:
❌ Highly biased—doesn’t represent the whole population.
❌ Low reliability—results can’t be generalized.
❌ People selected may have similar traits, leading to misleading conclusions.
6. Multi-Stage Sampling – A Mix of Everything
Imagine a singing competition.

1️⃣ First, judges randomly pick some cities (Cluster Sampling).

2️⃣ Then, in each city, they pick contestants from different age groups
(Stratified Sampling).

3️⃣ Finally, they randomly select finalists (Simple Random Sampling).

Major Real-World Example:

A government runs a national survey by randomly picking some provinces, then
districts within those provinces, then individuals within those districts.
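
The province → district → individual example can be sketched as three nested random draws. The `country` structure, its sizes, and the `multi_stage_sample` helper are hypothetical.

```python
import random

def multi_stage_sample(country, n_provinces, n_districts, n_people, seed=None):
    """Randomly pick provinces, then districts in them, then people in those."""
    rng = random.Random(seed)
    sample = []
    for province in rng.sample(sorted(country), n_provinces):         # stage 1
        districts = country[province]
        for district in rng.sample(sorted(districts), n_districts):   # stage 2
            sample.extend(rng.sample(districts[district], n_people))  # stage 3
    return sample

# Hypothetical country: 8 provinces, 5 districts each, 100 people per district.
country = {
    f"province_{p}": {
        f"district_{p}_{d}": [f"person_{p}_{d}_{i}" for i in range(100)]
        for d in range(5)
    }
    for p in range(8)
}

people = multi_stage_sample(country, n_provinces=3, n_districts=2, n_people=10, seed=11)
print(len(people))  # 3 provinces x 2 districts x 10 people = 60
```

Every stage uses a random draw here. If one stage were replaced by a hand-picked choice, the final sample would no longer be random.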
Pros And Cons Of Multi-Stage Sampling
✅ Pros:
✔ Works well for large, complex populations.
✔ Reduces cost and effort compared to a full survey.
✔ Combines different sampling methods for better accuracy.

❌ Cons:
❌ Requires multiple steps, making it time-consuming.
❌ If any stage is not randomized properly, the final sample loses its randomness.
❌ Needs careful planning to avoid errors.
Thank You