C3 Comm213

Chapter 3 discusses the definitions and differences between populations and samples, emphasizing the importance of sampling methods to reduce costs and biases in data collection. It covers various statistical concepts including descriptive and inferential statistics, measures of central tendency and dispersion, and probability distributions. Additionally, it highlights the significance of visualizing data through tools like histograms and box plots for better interpretation and analysis.

Uploaded by

laurabosselet


Chapter 3: Defining population and samples

LO 3.1 Defining pop and samples

Population:
- Group with something in common (parameter)
- Expensive/impossible to get all
- Parameter: characteristic of a population

VS Sample:
- Subset of a population
- Should be representative; focus on statistics = characteristics of a sample
- Used to make inferences (conclusions) about the characteristics of a pop
- Drawn out of the population to reduce cost
-> Inference from the sample may match the population parameter

Descriptive/Summary statistics: measures that describe the visible components of a population or sample
Inferential statistics: measures calculated from a sample and used to draw conclusions about the population
Hypothesis: proposed explanation made on the basis of sample/limited evidence
- Starting point for more investigation
- Uses inferential statistics
- T-test, Z-test, P-test, F-test

LO 3.2: Sampling methods, data reduction, and bias

1. Simple random sampling:

- All elements have an equal chance of being selected into the sample, in the hope of selecting a sample representative of the whole population
- Used when no particular subset matters, or when the data is homogeneous/similar
2. Stratified random sampling:
- Pop is divided into subgroups (strata) based on specific characteristics or attributes
-> calculate the proportion of the population in each group and randomly sample each group to ensure an appropriate number from each group (stratum) is represented
- Ex: ensure representation of both happy/unhappy customers
3. Cluster sampling:
- Divide into groups (clusters), calculate proportions, then select only a few clusters that are relevant + proportionate
- Select a few groups by: geography, time zone
- More efficient and cost-effective
4. Convenience/non-probability sampling
- Based on ease of access and availability; used when resources are limited and budgets are low
- Some bias
-> data already collected = simply subset the data
-> if data needs to be collected = distribute survey, then stop collecting once the target # is reached
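
The first three sampling methods can be sketched in a few lines of Python. This is only a rough illustration, not a prescribed procedure: the customer records, the field names (mood, region), and the three function names are all made up for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Every element has the same chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(population, key, n, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding can make the total differ slightly from n.
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(population, key, n_clusters, seed=None):
    """Randomly pick a few whole clusters and keep every element in them."""
    rng = random.Random(seed)
    clusters = {}
    for item in population:
        clusters.setdefault(key(item), []).append(item)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for c in chosen for item in clusters[c]]

# Hypothetical customers: 80 happy, 20 unhappy, spread evenly over 4 regions.
customers = [
    {"id": i, "mood": "happy" if i < 80 else "unhappy", "region": i % 4}
    for i in range(100)
]

srs = simple_random_sample(customers, 10, seed=1)
strat = stratified_sample(customers, lambda c: c["mood"], 10, seed=1)
clus = cluster_sample(customers, lambda c: c["region"], 2, seed=1)
```

With this data the stratified sample always holds 8 happy and 2 unhappy customers, while the simple random sample may over- or under-represent the unhappy group by chance.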

Data reduction: process of reducing the size of a data set to a more manageable size suitable for a business analysis project
- To retain meaningful information
- Focus on most interesting, critical, and abnormal items
- Speeds up analysis + reduces cost
How? Filtering:
-> Data -> Sort and filter -> Filter: removing rows that aren't interesting/relevant, or choosing the interesting ones
Decision process of using a subset of data: need to consider purpose of analysis, time and cost.
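
Sort-and-filter can be sketched in Python as below. The transaction rows, the field names, and the 100.0 threshold are assumptions chosen for illustration only.

```python
# Hypothetical transactions; field names are assumptions for illustration.
transactions = [
    {"id": 1, "amount": 120.0, "region": "East"},
    {"id": 2, "amount": 15.5, "region": "West"},
    {"id": 3, "amount": 980.0, "region": "East"},
    {"id": 4, "amount": 43.0, "region": "North"},
]

def reduce_data(rows, min_amount):
    """Keep only rows at or above min_amount, sorted from largest to smallest."""
    kept = [r for r in rows if r["amount"] >= min_amount]
    return sorted(kept, key=lambda r: r["amount"], reverse=True)

large = reduce_data(transactions, 100.0)
print([r["id"] for r in large])  # [3, 1]
```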

Bias in business analytics:

- Prejudice in favor of or against
- Intentional vs unintentional
- During data collection, analysis, results
Types:
1. Nonresponse: part of the sample does not answer the survey = potential distortion in results
- Partiality that results when respondents differ from non-respondents
-> to prevent: inform sample about the importance of the survey, or offer an incentive (e.g., sending a gift card)
2. Selection: analyst selected an inappropriate sample
-> when the analyst purposefully selects portions of participants/data that are likely to provide answers aligning with the analyst's belief = not representative, so results can't be generalized
3. Confirmation: favoring info that confirms a pre-existing belief rather than reflecting the truth = inaccurate conclusions. Occurs in data that has already been collected
4. Outlier: extreme values influencing the interpretation of results. Occurs in data that has already been collected

LO 3.3: Understanding statistics:

Probability distributions:
1. Random Variable: quantifies the outcomes of random occurrences
- To measure things that happen by chance
2. Data distribution: shows the values a variable takes and how frequently each occurs
- Tells you what numbers came up and how often each appeared
- Deals with observed data you already have
3. Probability distribution:
- Statistical function that describes the possible values in a population and the likelihood that any given observation (random variable) takes a given value/range
- How probable an outcome is
- Deals with hypothetical/predicted data
- If continuous distribution = probabilities are presented over ranges

Types of numerical data: determine types of probability distribution

1. Discrete data: whole number, finite set of values between 2 observations
Ex: inventory, vehicles
2. Continuous data: any numerical value, infinite set of values
Ex: Weight, currency

Measures of central tendency: describe the center point of a data set; some are susceptible to outliers!
Mean: average = Sum/n - cannot be used for categorical data + susceptible to outliers
Median: midpoint of the data distribution, sorted smallest to highest - cannot be used for categorical data - average of the two middle values if n is even - not susceptible to outliers
Mode: simplest + most common value - most important for categorical data - can be uni/bi/multi-modal
Symmetry = when mean, median, and mode are all equal
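
A quick check of the three measures with Python's standard statistics module. The numbers are made up; the point is how one extreme value moves the mean far more than the median or mode.

```python
import statistics

values = [2, 3, 3, 4, 5]          # hypothetical data
with_outlier = values + [100]     # same data plus one extreme value

mean_plain = statistics.mean(values)            # 3.4
median_plain = statistics.median(values)        # 3
mean_out = statistics.mean(with_outlier)        # 19.5
median_out = statistics.median(with_outlier)    # 3.5 (average of the two middle values)
mode_out = statistics.mode(with_outlier)        # 3

# multimode returns every most-common value, which covers bi/multi-modal data
# and works for categorical values too.
modes = statistics.multimode(["red", "blue", "red", "blue", "green"])  # ['red', 'blue']
```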
Kurtosis: distribution shape + thickness of tails
- Excess kurtosis closer to 0 = normal distribution
- Whether data is more clustered in the peak or the tails
- Works with skewness: helps determine likelihood of an event falling in a tail
- Positive kurtosis: more extreme values (thicker tails), more peaked at center
- Negative kurtosis: fewer extreme values (thinner tails), so flatter around center
Skewness: asymmetry of the distribution
- Skewed right (positive): long tail on the right
- Skewed left (negative): long tail on the left
-> Mean higher than median = implies some outliers are skewing it right
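
Skewness and kurtosis can be computed from central moments. The sketch below uses the population (moment) formulas, skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3; spreadsheet tools may apply sample corrections and report slightly different numbers. The function names and data are illustrative assumptions.

```python
from statistics import fmean, median

def central_moment(values, k):
    """Average of (x - mean)^k over all values."""
    m = fmean(values)
    return fmean([(x - m) ** k for x in values])

def skewness(values):
    """Positive = skewed right, negative = skewed left, 0 = symmetric."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """0 for a normal distribution; positive = heavier tails, negative = flatter."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 10]   # one large value pulls the mean to the right

skew_sym = skewness(symmetric)            # 0.0
skew_right = skewness(right_skewed)       # positive
kurt_sym = excess_kurtosis(symmetric)     # negative: flatter than a normal curve
mean_above_median = fmean(right_skewed) > median(right_skewed)   # True
```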

Measures of dispersion:
Range: Max – Min
- Affected by outliers
Interquartile range (IQR): data split into 4 quartiles to determine shape
- Data should be in sorted order
- Q1: lowest 25% of observations
- Q2: next 25% - from 25% to 50% (= median)
- Q3: median to 75%
- Q4: 75% and above
- "Inter" = implies interest not in all 4 quartiles but in the specific middle section, Q2 and Q3
-> Suppose you have a data set: 24, 25, 25, 25, 26, 27, 28, 28, 29
- Q2 value = median = 26
- Q1 value = median of lower half (24, 25, 25, 25) = (25+25)/2 = 25
- Q3 value = median of upper half (27, 28, 28, 29) = (28+28)/2 = 28
- Interquartile range: Q3 - Q1 = 28 - 25 = 3
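
The median-of-halves calculation can be checked in Python. The function name is an assumption; note that software such as Excel or NumPy typically interpolates between observations instead, which can give slightly different quartiles on other data sets.

```python
from statistics import median

def quartiles_by_halves(values):
    """Q1, Q2, Q3 using the median of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count the median itself is left out of both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

q1, q2, q3 = quartiles_by_halves([24, 25, 25, 25, 26, 27, 28, 28, 29])
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 25.0 26 28.0 3.0
```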
Variance: average squared deviation from the mean
- Measures how individual data points in a dataset differ from the average
- [(x1 - mean)^2 + (x2 - mean)^2 + ... + (xn - mean)^2] / # of x
Standard deviation:
- Square root of variance
- Same unit as data values
Coefficient of variation: σ/μ
- Measure of relative variability by comparing standard deviation to mean
- Useful when comparing datasets with different units or scales
- [SD/MEAN]*100
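
A small numeric check of these three measures, using made-up data. The notes divide by the number of observations, which is the population variance (pvariance in Python's statistics module); the sample variance divides by n - 1 instead and is shown for comparison.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5

mean = statistics.fmean(data)               # 5.0
variance = statistics.pvariance(data)       # 4: sum of squared deviations (32) / 8
sd = statistics.pstdev(data)                # 2.0: square root of the variance
cv_percent = sd / mean * 100                # 40.0

# Sample variance divides by n - 1 rather than n: 32 / 7
sample_variance = statistics.variance(data)
```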

Continuous probability distributions:

Normal (Gaussian) distribution:
- Bell shaped with most data points clustered near the average/mean
- Naturally occurring, symmetric = skewness is 0 and kurtosis is 3 (excess kurtosis 0)
- 68% of data falls within 1 SD of the mean, 95% within 2 SD, 99.7% within 3 SD
- Ex: weight, pop height, pop IQ, shoe size
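
The 68/95/99.7 percentages can be reproduced with NormalDist from Python's standard library. The helper name share_within is an assumption, as is the mean-100, SD-15 scale used in the last line.

```python
from statistics import NormalDist

def share_within(k, mean=0.0, sd=1.0):
    """Share of a normal distribution lying within k SDs of the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

within_1 = share_within(1)   # about 0.683
within_2 = share_within(2)   # about 0.954
within_3 = share_within(3)   # about 0.997

# The shares do not depend on the mean or SD chosen.
within_1_scaled = share_within(1, mean=100, sd=15)
```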

Standard Normal distribution:

- Theoretical distribution: doesn't represent a real distribution; used only to make comparisons between distributions easier + to calculate probabilities of individual observations
- Mean = median = mode = 0
- SD = 1

Z score: standardized values

- Measures number of SD a data point is away from the mean
- Z = (x-mean)/ SD
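
A short Python check of the z-score formula on made-up data. Standardizing every value gives a data set with mean 0 and SD 1, which is what makes different distributions comparable.

```python
from statistics import fmean, pstdev

def z_score(x, mean, sd):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data
mu, sigma = fmean(data), pstdev(data)    # 5.0 and 2.0

z_values = [z_score(x, mu, sigma) for x in data]
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

z_mean = fmean(z_values)    # 0.0
z_sd = pstdev(z_values)     # 1.0
```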

Uniform distribution:
- Probability distribution that describes a continuous random variable where every value within a given range is equally likely to occur
- Flat and constant distribution: no peaks, no tails
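
Because the distribution is flat, the probability of landing in a sub-range is just that sub-range's share of the total width: P(c <= X <= d) = (d - c) / (b - a) for X uniform on [a, b]. A minimal sketch, with an assumed function name:

```python
def uniform_probability(a, b, c, d):
    """P(c <= X <= d) for X uniformly distributed on [a, b]."""
    if b <= a:
        raise ValueError("need a < b")
    # Only the part of [c, d] that overlaps [a, b] carries any probability.
    lo, hi = max(a, c), min(b, d)
    return max(0.0, hi - lo) / (b - a)

p_inside = uniform_probability(0, 10, 2, 5)      # 0.3
p_partial = uniform_probability(0, 10, 8, 15)    # 0.2, only 8 to 10 counts
p_outside = uniform_probability(0, 10, 12, 15)   # 0.0
```
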
LO 3.4: Using Software tools to Create Summary Statistics:

Argument: value that the function uses to perform calculations (IQR exceptions)
Descriptive data: Data -> Data Analysis (on right) -> Descriptive Statistics -> OK = get data
Tableau summary statistics (descriptive statistics): Worksheet -> Show Summary


LO 3.5 Interpreting and Visualizing statistics:
Frequency distribution:
- For numerical data b/c for categorical data we can just count
- Bins, classes, and intervals (categories in numerical data)
- Table uses bins = to list the frequency of various outcomes in a sample
- Symmetrical = expect the 5th/middle bin to contain the most observations
- Skewed right = more observations in the first bins
- Skewed left = more observations in the last bins

Bin        Frequency
[0-50)     65
[50-100)   63
[100-150)  30
[150-200)  8
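
Counting observations per bin can be sketched as below. The amounts are invented for the example and the function name is an assumption; each bin is half-open, so a value of exactly 50 lands in [50-100) and not in [0-50).

```python
def frequency_table(values, start, width, n_bins):
    """Count values in half-open bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:          # values outside the bins are ignored
            counts[i] += 1
    return [
        (f"[{start + i * width}-{start + (i + 1) * width})", c)
        for i, c in enumerate(counts)
    ]

amounts = [12, 49, 50, 75, 99, 100, 149, 150, 30, 60]   # hypothetical data
table = frequency_table(amounts, start=0, width=50, n_bins=4)
for label, count in table:
    print(label, count)
# [0-50) 3
# [50-100) 4
# [100-150) 2
# [150-200) 1
```

Most observations fall in the first two bins, so this small sample is skewed right.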

Histogram:
- Visual representation for frequency distribution
- Size of bins can change the shape
- Shows stats like: mean, sum, count, etc.
- No gaps in between bars unless needed to indicate absence of data points for that interval
- Reference lines for mean and median
- Like a vertical bar chart but bars are replaced by bins
- Y axis: in a bar graph shows a descriptive measure, in a histogram shows the # of observations

Box Plot:
- Shows dispersion in terms of quartiles
- Box: represents the IQR, from the 25th percentile (Q1) to the 75th percentile (Q3)
- Length: reflects the spread of the middle 50% of the data
- Line/whisker: represents the range of data beyond the quartiles (up to Q3 + 1.5 IQR and down to Q1 - 1.5 IQR)
- Median: horizontal bar which divides the data into two equal halves
- Outliers: individual points beyond the whiskers
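
The numbers behind a box plot can be computed without drawing anything. This sketch assumes one common convention: quartiles from statistics.quantiles with the "inclusive" method, and whiskers drawn to the most extreme observations still inside the 1.5 IQR fences. The data set and function name are illustrative.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles, IQR, whisker ends, and outliers for a box plot."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Hypothetical data with one extreme value.
summary = box_plot_summary([24, 25, 25, 25, 26, 27, 28, 28, 29, 45])
print(summary["iqr"], summary["whiskers"], summary["outliers"])
# 3.0 (24, 29) [45]
```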
