Standardization & Probability: Empirical Methodologies & Theory of Science
In statistics, standardization is a way to transform data so that different
variables become easier to compare, especially when they are measured
on different scales or in different units.
For example, you might have one set of data measured in kilograms and
another in centimeters. Standardizing puts both on the same unitless
scale, so you can compare them directly.
04.10.2024 8
However!
Even with standardization, data may
require careful interpretation! Watch out for:
• Implicit standards
• Manipulation
• Bad questions or measurements
And now you want to know whether your kebab quality depends on the spiciness!
Since kebab quality and spiciness are measured on very different scales,
comparing or analyzing them directly could be misleading.
Standardization helps by transforming both to a common scale on which
the mean (average) is zero and the spread (standard deviation) is one.
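A minimal Python sketch of standardizing two variables. The kebab ratings below are invented for illustration (quality out of 10, spiciness in made-up heat units), and the helper name `standardize` is hypothetical. It divides by the population standard deviation (`statistics.pstdev`); some tools use the sample standard deviation instead, which gives slightly different z-scores.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]


# Hypothetical ratings, invented for illustration only
quality = [6.0, 7.5, 8.0, 5.5, 9.0]       # kebab quality, out of 10
spiciness = [500, 1500, 2500, 300, 4000]  # made-up heat units

z_quality = standardize(quality)
z_spiciness = standardize(spiciness)

for name, z in [("quality", z_quality), ("spiciness", z_spiciness)]:
    # Both standardized lists have mean ~0 and standard deviation ~1
    print(f"{name}: mean={fmean(z):+.2f}, sd={pstdev(z):.2f}")
    print("  z-scores:", [round(v, 2) for v in z])
```

Standardizing does not change which kebab ranks highest on each variable; it only re-expresses every value as a distance from its own mean, measured in units of its own standard deviation.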
*Standardization – Why?
1. Comparability: By transforming variables to a common scale, you can compare different
datasets or features that originally have different ranges or units.
We want to compare kebab quality and spiciness score!
2. Modeling: Many machine learning (ML) algorithms work better, or only work at all, when input
features are standardized, because this prevents features with large ranges from dominating.
We want to be able to put it into ML algorithms (because that’s everything today)!
3. Normalization of Distributions: Standardizing puts different distributions on the same scale
(their shapes stay the same), which helps identify patterns or anomalies.
We want to see whether, e.g., some spicy kebabs are always better!
4. Interpretation of Z-scores: Z-scores tell you how many standard deviations a data point
is from the mean. This lets you interpret its relative position within its distribution.
We want to be able to tell whether the Kebabistan kebab is really above average! ;)
Ras’ Slide
(relevant for the exercise)
• The average height is 175 cm, but
people’s heights differ
• Standard deviation: how far people’s
heights are from the average
• ...on average?
• (NOT!!! the exact definition, but a useful
mnemonic rule)
• Full definition next time
• Approx. ⅔ (about 68%) of the observations lie
within ±1 SD (standard deviation) of the mean
• … IF the observations follow a normal
distribution (bell curve)
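The ⅔ rule can be checked with a quick simulation. This Python sketch assumes heights are normally distributed with the slide's mean of 175 cm and a standard deviation of 7 cm. The 7 cm is an assumed value for illustration only. It counts how many simulated people fall within ±1 SD and compares that share with the exact value for a normal distribution, about 0.683.

```python
import math
import random

# Mean height from the slide; the 7 cm standard deviation is an assumed value
MEAN_CM = 175.0
SD_CM = 7.0
N = 100_000

random.seed(42)  # fixed seed so the run is reproducible
heights = [random.gauss(MEAN_CM, SD_CM) for _ in range(N)]

# Share of simulated people within one standard deviation of the mean
share = sum(1 for h in heights if abs(h - MEAN_CM) <= SD_CM) / N

# Exact value for a normal distribution: P(|Z| <= 1) = erf(1 / sqrt(2))
exact = math.erf(1 / math.sqrt(2))

print(f"simulated: {share:.3f}, exact normal: {exact:.3f}")
```

Changing `SD_CM` does not change the share, which is the point of the rule: it is stated in units of standard deviations, not centimeters.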
This is a standard normal distribution
(mean 0, standard deviation 1).
We want our data to look like this
(important: look at the values!).
*Standardization – How?
As for everything, we have a formula. For this we need variables.
1. Z is what we want to get out, the Ztandardised Zcore
2. σ (sigma) is the standard deviation.
3. X is the data point.
4. μ (mu) is the mean.
Standardization Formula: $Z = \frac{X - \mu}{\sigma}$
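A worked example of the standardization formula $Z = (X - \mu)/\sigma$. It uses the heights from Ras' slide (mean μ = 175 cm) and assumes, for illustration only, a standard deviation of σ = 10 cm and a person who is X = 190 cm tall:

$$
Z = \frac{X - \mu}{\sigma} = \frac{190\,\text{cm} - 175\,\text{cm}}{10\,\text{cm}} = \frac{15\,\text{cm}}{10\,\text{cm}} = 1.5
$$

The units cancel, so Z is a pure number: this person is 1.5 standard deviations above the mean height. A negative Z would mean below the mean.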
Probability
I love probability: it’s everywhere.
And it gives us a way to express uncertainty.
Conditional Probability
The probability of an event happening, given that
another event has already occurred.
It's a way to update our probabilities when we
have additional information.
Task: Everyone comes up with an example of
conditional probability now (30 secs).
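Conditional probability can be made concrete by counting. This Python sketch enumerates all outcomes of two fair dice and computes P(A | B) = P(A and B) / P(B) by counting only the outcomes in which B happened. The events A and B and the helper names are made-up examples. Knowing that the first die shows a 6 raises the chance of a sum of at least 10 from 1/6 to 1/2.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die) of two fair dice
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event, given=None):
    """P(event | given) by counting; with given=None it is the plain P(event)."""
    space = [o for o in OUTCOMES if given is None or given(o)]
    favorable = [o for o in space if event(o)]
    return Fraction(len(favorable), len(space))


def high_sum(o):   # event A: the two dice add up to at least 10
    return o[0] + o[1] >= 10


def first_six(o):  # event B: the first die shows a 6
    return o[0] == 6


print("P(A)     =", prob(high_sum))                   # 1/6
print("P(A | B) =", prob(high_sum, given=first_six))  # 1/2
```

Restricting the outcome space to the cases where B occurred is exactly the "update with additional information" described above.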
Simple Probability: What is the chance of rolling a six?
One throw: 1 2 3 4 5 6
X marks the 6: one favorable outcome out of six equally likely ones, so P(six) = 1/6.
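Conditional Probability: What is the chance of rolling two sixes?
Two throws: a 6 × 6 grid of outcomes (first throw 1 to 6 against second throw 1 to 6),
with X marking the single favorable cell (6, 6).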
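Counting the grid gives the answer directly. This Python sketch enumerates the 36 equally likely outcomes of two fair dice and checks that the count agrees with the multiplication rule P(two sixes) = P(first six) · P(second six | first six) = 1/6 · 1/6 = 1/36. Because the throws are independent, the conditional probability of the second six is still the plain 1/6.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first throw, second throw) outcomes
outcomes = list(product(range(1, 7), repeat=2))

# Direct count: only the cell (6, 6) in the grid is favorable
p_two_sixes = Fraction(outcomes.count((6, 6)), len(outcomes))

# Same result via the multiplication rule:
# P(two sixes) = P(first six) * P(second six | first six)
first_six = [o for o in outcomes if o[0] == 6]
p_first = Fraction(len(first_six), len(outcomes))
p_second_given_first = Fraction(
    sum(1 for o in first_six if o[1] == 6), len(first_six)
)

print(p_two_sixes)                     # 1/36
print(p_first * p_second_given_first)  # 1/36
```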
Let’s test! ;)
1. Go to https://www.random.org/integers/ (or scan the QR code)
2. Generate X random integers between 1 and 365, arranged in a single column,
where X is the number of people in this room
3. It should look like this (to the right).
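This exercise looks like the classic birthday problem. That reading is an assumption: the slide does not say what to look for in the list. If the question is whether any number appears twice, this Python sketch computes the exact probability of at least one repeat among n uniform picks from 1 to 365. It also runs a simulation that mimics generating the random.org list many times. The function names are hypothetical.

```python
import random

DAYS = 365  # numbers are drawn from 1..365, as in the random.org exercise


def p_repeat_exact(n):
    """Exact probability that at least two of n uniform picks from 1..DAYS coincide."""
    p_all_different = 1.0
    for k in range(n):
        # the (k+1)-th pick must avoid the k values already drawn
        p_all_different *= (DAYS - k) / DAYS
    return 1.0 - p_all_different


def p_repeat_simulated(n, trials=10_000, seed=0):
    """Estimate the same probability by repeating the list-generation exercise."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        picks = [rng.randint(1, DAYS) for _ in range(n)]
        if len(set(picks)) < n:  # at least one number appeared twice
            hits += 1
    return hits / trials


for n in (10, 23, 30, 50):
    print(n, round(p_repeat_exact(n), 3), round(p_repeat_simulated(n), 3))
```

Plug in the number of people in the room for n. With only 23 picks, the chance of at least one repeated number is already slightly above 50%.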
Thanks! :)