Sampling Techniques for Data Preprocessing
1. Simple Random Sampling
You write everyone’s name on a slip, put the slips in a hat, and draw a
few at random.
✅ Pros:
✔ Unbiased selection (each item has an equal chance of being picked).
✔ Easy to understand and apply.
✔ Works well for small populations.
❌ Cons:
❌ Can be time-consuming and costly for large populations.
❌ Might not cover all subgroups properly.
❌ Requires a full list of the population, which isn’t always available.
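The hat draw can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name simple_random_sample, the list of names, and the seed are all made up for the example. It relies on the standard library's random.sample, which draws without replacement so nobody is picked twice.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct items, each with an equal chance of selection."""
    rng = random.Random(seed)         # seeded generator so the draw is repeatable
    return rng.sample(population, k)  # sampling without replacement

# Hypothetical population: the names written on the slips
names = ["Asha", "Ben", "Chen", "Dara", "Eli", "Fatima", "Gus", "Hana"]
picked = simple_random_sample(names, k=3, seed=42)
print(picked)
```

Note that the function needs the full list of names up front, which is exactly the limitation listed in the cons above.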
2. Stratified Sampling – Dividing the Cake
Now, let’s say you’re making a fruitcake, and you want every slice to have an
equal mix of nuts, raisins, and chocolate chips. Instead of just randomly scooping
ingredients, you make sure each slice has a fair amount of everything.
❌ Cons:
❌ Requires detailed knowledge about the population (to create proper strata).
❌ More complex and time-consuming to implement.
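One way to picture the fruitcake idea in code is shown below. It is a sketch under stated assumptions: the helper stratified_sample and the ingredient counts are hypothetical, and it uses proportional allocation (the same fraction from every stratum), which is one common choice among several. Each ingredient type is a stratum, and a simple random draw is made inside each one.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(items, key, fraction, seed=None):
    """Sample the same fraction from every stratum defined by key(item)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)   # group items by stratum label
    sample = []
    for label in sorted(strata):         # fixed order keeps runs repeatable
        members = strata[label]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical bowl of ingredients: 60 nuts, 30 raisins, 10 chocolate chips
ingredients = (
    [("nut", i) for i in range(60)]
    + [("raisin", i) for i in range(30)]
    + [("chip", i) for i in range(10)]
)
cake_slice = stratified_sample(ingredients, key=lambda x: x[0], fraction=0.1, seed=0)
counts = Counter(kind for kind, _ in cake_slice)
print(counts)
```

The grouping step is where the "detailed knowledge about the population" comes in: without a reliable label for every item, the strata cannot be built.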
3. Systematic Sampling – Every 10th Customer
Imagine you own a store and want to survey customers, but instead of
randomly stopping people, you decide, “I’ll talk to every 10th person who
walks in.”
A factory wants to check product quality. Instead of testing every item, they
check every 50th item off the assembly line.
Pros And Cons Of Systematic Sampling
✅ Pros:
✔ Easy to use and requires less effort than random sampling.
✔ Spreads the sample evenly across the population.
✔ Works well for large populations.
❌ Cons:
❌ If there’s a hidden pattern in the population, it can introduce bias.
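The factory example might look like this in Python. It is a rough sketch: systematic_sample and the item IDs are invented for illustration, and the random starting offset is a common refinement rather than something the slides specify.

```python
import random

def systematic_sample(items, step, seed=None):
    """Take every step-th item, starting from a random offset in [0, step)."""
    if step < 1:
        raise ValueError("step must be at least 1")
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start so item 0 is not always chosen
    return items[start::step]

# Hypothetical assembly line: item IDs 1..500, inspect every 50th
production_line = list(range(1, 501))
checked = systematic_sample(production_line, step=50, seed=1)
print(checked)
```

If defects happened to recur on the same cycle as the step (say, a machine that slips every 50 items), this sample could catch all of them or none of them, which is the hidden-pattern risk noted above.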
4. Cluster Sampling – Picking a Few Schools Instead of Students
Imagine you want to study student habits across a country. Instead of picking
random students, you randomly select entire schools and survey all the students
in those schools.
❌ Cons:
❌ If clusters are not diverse, results may be biased.
❌ Usually less precise than stratified sampling for the same sample size.
❌ One bad cluster can distort results.
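A small sketch of the schools example follows. The function cluster_sample, the school names, and the student IDs are all hypothetical. The point it illustrates is that the random draw happens at the level of whole clusters, and then every member of a chosen cluster is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters whole clusters at random and keep every member."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # randomness is over clusters
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical schools mapped to their students
schools = {
    "North High": ["n1", "n2", "n3"],
    "South High": ["s1", "s2"],
    "East High": ["e1", "e2", "e3", "e4"],
    "West High": ["w1", "w2", "w3"],
}
chosen_schools, surveyed = cluster_sample(schools, n_clusters=2, seed=7)
print(chosen_schools, surveyed)
```

Because only a couple of schools end up in the sample, one unusual school carries a lot of weight in the result.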
5. Convenience Sampling – The Easy Way Out
This is the “I’ll just ask whoever is nearby” method. It’s like asking only
your family and friends about their favorite movie and assuming that’s
what everyone likes.
❌ Cons:
❌ Highly biased—doesn’t represent the whole population.
❌ Low reliability—results can’t be generalized.
❌ People selected may have similar traits, leading to misleading conclusions.
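To make the bias concrete, here is a tiny illustration with made-up numbers. The ratings list is invented for the example (it is not real data), and the first five entries stand in for "family and friends" who happen to share the same taste.

```python
from statistics import mean

# Hypothetical ratings (1-10) of one movie; the first five are family and friends
ratings = [9, 9, 8, 9, 10, 4, 5, 3, 6, 5, 4, 6, 5, 3, 4]

convenience = ratings[:5]  # "whoever is nearby"
print("convenience sample mean:", mean(convenience))
print("whole population mean:  ", mean(ratings))
```

In this toy data the convenience sample rates the movie far higher than the group as a whole, so a conclusion drawn from it would not generalize.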
6. Multi-Stage Sampling – A Mix of Everything
Imagine a singing competition.
2️⃣ Then, in each city, they pick contestants from different age groups
(Stratified Sampling).
Real-life example:
● The government does a national survey by picking some provinces, then districts, then
random individuals to participate.
Pros And Cons Of Multi-Stage Sampling
✅ Pros:
✔ Works well for large, complex populations.
✔ Reduces cost and effort compared to a full survey.
✔ Combines different sampling methods for better accuracy.
❌ Cons:
❌ Requires multiple steps, making it time-consuming.
❌ If not done properly, it can lose randomness.
❌ Needs careful planning to avoid errors.
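The national survey example (provinces, then districts, then individuals) can be sketched as nested random draws. Everything named here is an assumption for illustration: the function multi_stage_sample, the province and district labels, and the sizes chosen at each stage.

```python
import random

def multi_stage_sample(country, n_provinces, n_districts, n_people, seed=None):
    """Sample provinces, then districts within them, then people within those."""
    rng = random.Random(seed)
    sample = []
    for province in rng.sample(sorted(country), n_provinces):         # stage 1
        districts = country[province]
        for district in rng.sample(sorted(districts), n_districts):   # stage 2
            for person in rng.sample(districts[district], n_people):  # stage 3
                sample.append((province, district, person))
    return sample

# Hypothetical sampling frame: 4 provinces x 3 districts x 20 residents
country = {
    f"Province {p}": {
        f"District {p}{d}": [f"resident-{p}{d}-{i}" for i in range(20)]
        for d in "abc"
    }
    for p in "ABCD"
}
survey = multi_stage_sample(country, n_provinces=2, n_districts=2, n_people=5, seed=3)
print(len(survey), survey[:3])
```

Only the selected districts need a list of residents, which is where the saving in cost and effort comes from. Each stage still has to be a genuine random draw, or the sample loses its randomness.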
Thank You