Standardization & Probability: Empirical Methodologies & Theory of Science

Uploaded by Sophia Lindholm


Standardization & Probability

Empirical Methodologies & Theory of Science

04.10.2024

What we’re doing today

1. Standardization
   a. What?
   b. Why?
   c. How?
2. Probability
3. Worked example

What do you think standardization is?


N: “How was your kebab?”

R: “It was a 5!”
> Raw scores make little sense
> Standardization helps!

Standardization is the process of establishing and applying a set of uniform criteria, guidelines, or practices to ensure consistency and quality across different entities, processes, or products.
In industry, the goal of standardization is to ensure that products,
services, or processes are reliable, compatible, and understandable.
We can translate that to statistics.

*
In statistics, standardization is a way to transform data so that it
becomes easier to compare different variables, especially when they’re
on different scales or units.
For example, you might have one set of data measured in kilograms and another in centimeters. Standardizing both puts them on the same unit-free scale, which makes them much easier to compare.
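A minimal Python sketch of why standardization removes units. It reuses the ten heights from Ras’ slide in this deck and assumes the population standard deviation (`statistics.pstdev`); `z_scores` is a hypothetical helper name. Converting the same heights from centimeters to meters leaves every z-score unchanged, which is what "putting them on the same scale" buys you.

```python
import statistics

def z_scores(values):
    """Standardize: subtract the mean, divide by the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

# Heights from Ras' slide, in centimeters
heights_cm = [177.0, 159.5, 182.0, 159.9, 170.5, 166.9, 163.0, 152.8, 190.5, 178.3]
# The same heights, converted to meters
heights_m = [h / 100 for h in heights_cm]

z_cm = z_scores(heights_cm)
z_m = z_scores(heights_m)

# Up to floating-point rounding the z-scores are identical: the unit has dropped out
print(all(abs(a - b) < 1e-9 for a, b in zip(z_cm, z_m)))  # True
```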

N: “How was your kebab?”

R: “It was a 5 out of 5!”

*
However!
Even with standardization, data may
require careful interpretation!

• Implicit standards
• Manipulation
• Bad questions or measurements

Why do you think we do standardization?


Imagine you have two types of measurements:

• Kebab quality measured in stars: 0 to 5.
• Spiciness measured in Scoville heat units: 0 to 9 million.

And now you want to know whether your kebab quality depends on the spiciness!

Since kebab quality and spiciness are measured on very different scales,
comparing or analyzing them directly could be misleading.
Standardization helps by transforming these values to a common scale:
one where the mean (average) is zero and the spread (standard deviation) is
one for every variable.
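The slides do not list the spiciness values, so this Python sketch swaps in the heights from Ras’ slide (in cm) as a second variable on a very different scale from the kebab stars. It assumes the population standard deviation; `standardize` is a hypothetical helper name. After standardizing, both variables should report a mean of about 0 and a standard deviation of about 1.

```python
import math
import statistics

def standardize(values):
    """Return z-scores, using the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

kebab_scores = [3.6, 3.4, 3.6, 2.0, 2.8, 3.0, 3.2, 3.0, 3.0, 2.4]  # stars, 0 to 5
heights_cm = [177.0, 159.5, 182.0, 159.9, 170.5, 166.9, 163.0, 152.8, 190.5, 178.3]  # cm

standardized = {
    "kebab scores": standardize(kebab_scores),
    "heights": standardize(heights_cm),
}

for name, z in standardized.items():
    mean_is_zero = math.isclose(statistics.mean(z), 0.0, abs_tol=1e-9)
    sd_is_one = math.isclose(statistics.pstdev(z), 1.0)
    print(name, mean_is_zero, sd_is_one)  # True True for both variables
```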

*Standardization – Why?
1. Comparability: By transforming variables to a common scale, you can compare different datasets or features that originally have different ranges or units.
We want to compare kebab quality and spiciness score!
2. Modeling: Many machine learning (ML) algorithms work better (or only work) when input features are standardized, because this prevents features with large ranges from dominating.
We want to be able to put it into ML algorithms (because that’s everything today)!
3. Normalization of Distributions: Standardizing data puts different distributions on the same scale, which helps identify patterns or anomalies.
We want to see whether e.g. some spicy kebabs are always better!
4. Interpretation of Z-scores: Z-scores tell you how many standard deviations a data point is from the mean. This lets you interpret its relative position within its distribution.
We want to be able to tell whether the Kebabistan kebab is really above average! ;)

Real Life Example

On the right you see my kebab ratings of Nørrebrogade places and an estimated spiciness.
Now, we’ll do some stats!
1. Calculate the mean of the scores.
2. Calculate the difference to the mean for each observation.
3. Calculate the mean of the absolute differences (ignoring the sign). This mean absolute deviation is our rough stand-in for the standard deviation.

Kebab place          Kebab score
Kösem                3.6
Kebabistan           3.4
Dürüm synfonie       3.6
Kebab bar            2.0
Fuldkorn kebab       2.8
Ramo's               3.0
Gaza grill           3.2
Durum bar            3.0
Flamingo             3.0
Berlin Döner Kebab   2.4
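The three steps above, sketched in Python with the standard `statistics` module (the dictionary layout is our own choice). Step 3 reproduces the 0.36 used in these slides; for comparison, the last line computes the exact population standard deviation, which comes out somewhat larger (about 0.48).

```python
import statistics

kebab_scores = {
    "Kösem": 3.6, "Kebabistan": 3.4, "Dürüm synfonie": 3.6, "Kebab bar": 2.0,
    "Fuldkorn kebab": 2.8, "Ramo's": 3.0, "Gaza grill": 3.2, "Durum bar": 3.0,
    "Flamingo": 3.0, "Berlin Döner Kebab": 2.4,
}
scores = list(kebab_scores.values())

# Step 1: mean of the scores
mean = statistics.mean(scores)
# Step 2: difference to the mean for each observation, without sign
abs_diffs = [abs(s - mean) for s in scores]
# Step 3: mean of the absolute differences (mean absolute deviation)
mad = statistics.mean(abs_diffs)
# For comparison: the exact population standard deviation
sd = statistics.pstdev(scores)

print(f"mean = {mean:.2f}")                # mean = 3.00
print(f"mean abs. deviation = {mad:.2f}")  # mean abs. deviation = 0.36
print(f"population SD = {sd:.2f}")         # population SD = 0.48
```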

Ras’ Slide (relevant for the exercise)

• Calculate the difference to the mean for these observations
• Now: Calculate the average of the differences
• (Absolute differences, i.e. without sign)
• I get 9.62 cm

Height (cm)    Distance to mean (|height − 170|)
177.0          |177.0 − 170| = 7.0
159.5          |159.5 − 170| = 10.5
182.0          |182.0 − 170| = 12.0
159.9          |159.9 − 170| = 10.1
170.5          |170.5 − 170| = 0.5
166.9          |166.9 − 170| = 3.1
163.0          |163.0 − 170| = 7.0
152.8          |152.8 − 170| = 17.2
190.5          |190.5 − 170| = 20.5
178.3          |178.3 − 170| = 8.3
Mean: 170.04   Mean: 9.62
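The same calculation for Ras’ heights, as a short Python sketch. The slide subtracts the rounded mean of 170 cm; the sketch compares that with subtracting the exact mean (170.04 cm). With these ten values both give 9.62 cm, because five heights lie below and five above both reference points.

```python
import statistics

heights_cm = [177.0, 159.5, 182.0, 159.9, 170.5, 166.9, 163.0, 152.8, 190.5, 178.3]

mean = statistics.mean(heights_cm)
# As on the slide: distances to the rounded mean of 170 cm, without sign
mad_rounded = statistics.mean(abs(h - 170) for h in heights_cm)
# Same idea, but measured from the exact mean
mad_exact = statistics.mean(abs(h - mean) for h in heights_cm)

print(f"mean height = {mean:.2f} cm")                         # mean height = 170.04 cm
print(f"average distance (170) = {mad_rounded:.2f} cm")       # average distance (170) = 9.62 cm
print(f"average distance (exact mean) = {mad_exact:.2f} cm")  # average distance (exact mean) = 9.62 cm
```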


Ras’ Slide (relevant for the exercise)
• The average height is 175 cm, but people’s heights differ
• Standard deviation: How far are people’s heights from the average
• ...on average?
• (NOT!!! the exact definition, but a useful mnemonic rule)
• Full definition next time
• Approx. ⅔ of the observations lie within ±1 SD (standard deviation) of the mean
• … IF the observations follow a normal distribution (bell curve)
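For the normal-distribution rule of thumb above, a short Python check gives the exact shares. It is our own sketch and relies on the identity P(−k < Z < k) = erf(k / √2) for a standard normal variable Z, computed with `math.erf` from the standard library. About 68% of observations fall within ±1 SD, which is the "approx. ⅔" on the slide.

```python
import math

def prob_within(k):
    """P(-k < Z < k) for a standard normal variable Z, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {prob_within(k):.3f}")
# within ±1 SD: 0.683
# within ±2 SD: 0.954
# within ±3 SD: 0.997
```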

• The mean kebab score is 3 stars, but kebab scores differ!

• Standard deviation: How far are kebab scores from the mean
• ...on average?
• Using this mnemonic, kebab scores lie on average 0.36 stars from the mean (strictly, this is the mean absolute deviation).
• We write σ (sigma) = 0.36.

How do you think we do standardization?


*
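This is a standard normal distribution (mean 0, standard deviation 1).
We want our data on this same scale (important: look at the values on the axis!).
Standardizing shifts and rescales the data; it does not change the shape of the distribution.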

* Standardization – How?
As for everything, we have a formula. For this we need variables.
1. Z is what we want to get out, the Ztandardised Zcore (the z-score).
2. σ (sigma) is the standard deviation.
3. X is the data point.
4. μ (mu) is the mean.
Standardization formula: Z = (X − μ) / σ

Kebab                Kebab score   Standard kebab formula    Standardized value
Kösem                3.6           Z = (3.6 − 3) / 0.36      3.6 → 1.67
Kebabistan           3.4           Z = (3.4 − 3) / 0.36      3.4 → 1.11
Dürüm synfonie       3.6           Z = (3.6 − 3) / 0.36      3.6 → 1.67
Kebab bar            2.0           Z = (2.0 − 3) / 0.36      2.0 → -2.78
Fuldkorn kebab       2.8           Z = (2.8 − 3) / 0.36      2.8 → -0.56
Ramo's               3.0           Z = (3.0 − 3) / 0.36      3.0 → 0.00
Gaza grill           3.2           Z = (3.2 − 3) / 0.36      3.2 → 0.56
Durum bar            3.0           Z = (3.0 − 3) / 0.36      3.0 → 0.00
Flamingo             3.0           Z = (3.0 − 3) / 0.36      3.0 → 0.00
Berlin Döner Kebab   2.4           Z = (2.4 − 3) / 0.36      2.4 → -1.67
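A small Python sketch that reproduces the table with Z = (X − μ) / σ, assuming the deck’s values μ = 3 and σ = 0.36 (the mean absolute deviation stand-in). `z_score` is a hypothetical helper name; results are rounded to two decimals like the table.

```python
MU = 3.0      # mean kebab score from the slides
SIGMA = 0.36  # the deck's sigma (mean absolute deviation used as a stand-in)

kebab_scores = {
    "Kösem": 3.6, "Kebabistan": 3.4, "Dürüm synfonie": 3.6, "Kebab bar": 2.0,
    "Fuldkorn kebab": 2.8, "Ramo's": 3.0, "Gaza grill": 3.2, "Durum bar": 3.0,
    "Flamingo": 3.0, "Berlin Döner Kebab": 2.4,
}

def z_score(x, mu, sigma):
    """Standardization formula: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

z_table = {name: round(z_score(score, MU, SIGMA), 2) for name, score in kebab_scores.items()}

for name, z in z_table.items():
    print(f"{name:<20} {kebab_scores[name]:.1f} -> {z:5.2f}")
```

A positive z-score means the place is rated above the mean; Kebabistan at 1.11 sits a bit more than one σ above it.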


If we plotted this now,

it would look like this:
(And if we did the same to
the Scoville values, they would be
on the same scale and we could
compare them :))

Probability

Probability
I love probability, it’s everywhere.
And it has the power to express uncertainty.

Traditional examples are usually coins, dice, or drawing cards.

But there are many more examples!

Task: Everyone comes up with an example now (30 secs).


*How do we calculate probability?

3. Sample Space (S)
The sample space is the set of all possible outcomes.
E.g., for rolling a six-sided die, the sample space is S = {1, 2, 3, 4, 5, 6}.
4. Event Space (E)
An event is a set of outcomes we are interested in, i.e. a subset of the sample space; here we call that set the event space.
E.g., for a die roll, an event could be rolling an even number (E = {2, 4, 6}).
• In real examples, mapping event spaces can be tricky (as you’ll see in a sec).
5. Probability (P)
Probability is a measure of how likely an event is to occur. When all outcomes are equally likely, it is defined as:
P(E) = Number of outcomes of interest / Total number of possible outcomes
This is called Simple Probability.
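A sketch of simple probability in Python, assuming equally likely outcomes. `simple_probability` is a hypothetical helper, and `fractions.Fraction` keeps the result exact instead of a rounded decimal.

```python
from fractions import Fraction

def simple_probability(event, sample_space):
    """P(E) = outcomes of interest / all possible outcomes (equally likely outcomes assumed)."""
    favourable = [o for o in sample_space if o in event]
    return Fraction(len(favourable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}  # sample space S
even = {2, 4, 6}          # event E: rolling an even number

print(simple_probability(even, die))  # 1/2
print(simple_probability({6}, die))   # 1/6
```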

Example of Simple Probability

Is it my birthday today?
Sample space: {1, 2, …, 365}
Event space: {X}, where X is the day of my birthday
Theoretical Probability: 1 / 365 ≈ 0.0027

*
Conditional Probability
The probability of an event happening given that
another event has already occurred.
It's a way to update our probabilities when we
have additional information.
Task: Everyone comes up with an example of
conditional probability now (30 secs).
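One possible example of conditional probability, sketched in Python by listing all 36 outcomes of two dice (our own example, not from the slides). It uses the standard definition P(A | B) = P(A and B) / P(B): knowing that the first die shows a six raises the chance of a sum of 10 or more from 1/6 to 1/2.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (first, second) results of two dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Simple probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(event, given):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

def high_sum(o):
    """Event A: the two dice sum to 10 or more."""
    return o[0] + o[1] >= 10

def first_six(o):
    """Event B: the first die shows a six."""
    return o[0] == 6

print(prob(high_sum))                  # 1/6
print(cond_prob(high_sum, first_six))  # 1/2
```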

Simple Probability:
What is the chance of rolling a six?
[Figure: one throw with outcomes 1 2 3 4 5 6, the six marked X]
One outcome of interest out of six: P(six) = 1/6 ≈ 0.17

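Conditional Probability:
What is the chance of rolling two sixes?
[Figure: 6 × 6 grid of two throws, the (6, 6) cell marked X]
One outcome of interest out of 6 × 6 = 36: P(two sixes) = 1/36 ≈ 0.028.
Conditional view: P(second six | first six) = 1/6, so P(two sixes) = 1/6 × 1/6 = 1/36.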
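The two-sixes count can be checked by brute force. This Python sketch (our own) enumerates the 36 outcomes with `itertools.product` and compares the count with the multiplication rule 1/6 × 1/6.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all 36 results of two throws
two_sixes = [o for o in outcomes if o == (6, 6)]
p_enumerated = Fraction(len(two_sixes), len(outcomes))

# Multiplication rule: P(first six) * P(second six | first six)
p_multiplied = Fraction(1, 6) * Fraction(1, 6)

print(p_enumerated, p_multiplied)  # 1/36 1/36
```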

A Classic Example: The Birthday Problem

What is the chance that at least two people in here have
the same birthday?
• Not the same age, just the same day ;)
• Can be any day in the year!
Task: Discuss for 3 minutes!

Let’s test! ;)
1. Go to https://www.random.org/integers/ (or scan the QR Code)
2. Make a list of X numbers between 1 and 365, organized in a single column;
X = number of people in this room
3. It should look like this (to the right).
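The random.org experiment can also be simulated. This Python sketch (our own; `share_birthday` and `simulate` are hypothetical names) draws birthdays from 1 to 365 uniformly, ignoring leap years and real-world birthday seasonality, and reports how often a group of a given size contains a duplicate. The group size 23 is only an example; plug in X, the number of people in the room.

```python
import random

def share_birthday(n_people, rng):
    """Draw n birthdays (days 1..365) and report whether any day appears twice."""
    days = [rng.randint(1, 365) for _ in range(n_people)]
    return len(set(days)) < len(days)

def simulate(n_people, trials=20_000, seed=2024):
    """Estimate P(shared birthday) as the share of simulated groups with a duplicate."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(share_birthday(n_people, rng) for _ in range(trials))
    return hits / trials

print(simulate(23))  # should land near 0.507; the exact digits depend on the seed
```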

• How many of you had no duplicate numbers?

• How does that affect your estimate of the minimum
number of people needed for two to share a birthday?

TIME FOR KNIME


Opposite Approach (no shared birthdays)

It's easier to first calculate the opposite, that no one shares a birthday, and then subtract it
from 1 (1 is always 100%, all possibilities). We assume 365 equally likely birthdays and ignore leap years.
Imagine everyone standing in a row. The very first person, A, has no one to share a birthday with
yet. Person B must have a different birthday from person A, so there are 364 days available. The
probability of not sharing a birthday with person A is 364 / 365.
Person C must have a different birthday from both A and B, so there are 363 days left. The
probability of no shared birthday for the third person is 363 / 365.
So we write down 364 / 365 × 363 / 365 × … until we’ve reached the end of the row of people,
in other words, until we’ve reached X people. For X = 23, P(no shared birthday) ≈ 0.4927.
So P(shared birthday for 23 people) = 1 − P(no shared birthday) ≈ 1 − 0.4927 = 0.5073
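The product above can be computed directly. A short Python sketch under the same assumption of 365 equally likely days (`p_no_shared` is a hypothetical helper name); it also searches for the smallest group in which a shared birthday is more likely than not.

```python
def p_no_shared(n_people, days=365):
    """(364/365) * (363/365) * ... : probability that n_people all have different birthdays."""
    p = 1.0
    for k in range(1, n_people):
        p *= (days - k) / days
    return p

p_none = p_no_shared(23)
print(f"P(no shared birthday) = {p_none:.4f}")      # P(no shared birthday) = 0.4927
print(f"P(shared birthday)    = {1 - p_none:.4f}")  # P(shared birthday)    = 0.5073

# Smallest group size where a shared birthday is more likely than not
smallest_group = 1
while 1 - p_no_shared(smallest_group) <= 0.5:
    smallest_group += 1
print(smallest_group)  # 23
```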

Thanks! :)
