01 ML Basics
Klaus Reygers
1
Exercises
2
What is machine learning? (1)
3
What is machine learning? (2)
“Machine learning is the subfield of computer science that gives computers the ability to
learn without being explicitly programmed” – Wikipedia
“deep” in deep learning: artificial neural nets with many neurons and multiple layers of
nonlinear processing units for feature extraction
5
Multivariate analysis: An early example from particle physics
I Signal: $e^+e^- \rightarrow W^+W^-$, often 4 well separated hadron jets
I Background: $e^+e^- \rightarrow qqgg$, 4 less well separated hadron jets
I Input variables based on jet structure, event shape, . . . ; none by itself gives much separation.
7
Applying ML techniques in other fields, e.g., healthcare
"Healthcare is different - problems are not well posed and the notion of a solution is
often not well-defined and solutions are hard to verify"
Mihaela van der Schaar, ICML 2020: Automated ML and its transformative impact on medicine and healthcare
8
Some successes and unsolved problems in AI
Impressive progress in certain fields:
I Image recognition
I Speech recognition
I Recommendation systems
I Automated translation
I Analysis of medical data
Artificial neural networks have been around for decades. Why did deep learning only take off after 2012?
10
Different modeling approaches
11
Machine learning: The “hello world” problem
12
Machine learning: Image recognition
ImageNet database
I 14 million images, 22,000 categories
I Since 2010, the annual ImageNet Large Scale Visual Recognition Challenge
(ILSVRC): 1.4 million images, 1000 categories
I In 2017, 29 of 38 competing teams achieved an error rate below 5%
13
ImageNet: Large Scale Visual Recognition Challenge
14
Adversarial attack
15
Types of machine learning
16
Books on machine learning
Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning,
free online http://www.deeplearningbook.org/
17
Papers
18
Supervised learning in a nutshell
I Supervised machine learning requires labeled training data, i.e., a training sample where for each event it is known whether it is a signal or background event.
I Each event is characterized by $n$ observables: $\vec{x} = (x_1, x_2, ..., x_n)$ ("feature vector")
"All the impressive achievements of deep learning amount to just curve fitting" – Judea Pearl
20
Classification: Learning decision boundaries
21
Supervised learning: Training, validation, and test sample
22
Supervised learning: Cross validation
Rule of thumb if training data is not expensive:
I Training sample: 50%
I Validation sample: 25%
I Test sample: 25%
Often the test sample is used as the validation sample (the bias is rather small). A simple split can be done with scikit-learn as sketched below.
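A minimal sketch of such a 50/25/25 split (the feature matrix X and labels y are toy placeholders, not data from this lecture):

import numpy as np
from sklearn.model_selection import train_test_split

# toy data standing in for a real training sample
X = np.random.rand(1000, 5)          # 1000 events, 5 features
y = np.random.randint(0, 2, 1000)    # binary labels (signal/background)

# first split off 50% for training, then split the rest into validation and test (25% each)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)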
Cross entropy:
$$E(y(\vec{x}, \vec{w}), t) = -t \log y(\vec{x}, \vec{w}) - (1 - t) \log(1 - y(\vec{x}, \vec{w}))$$
I $t \in \{0, 1\}$
I $y(\vec{x}, \vec{w})$: predicted probability for outcome $t = 1$
I often used in classification (see the numerical sketch below)
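As an illustration (not part of the original slide), the cross entropy of a single event can be evaluated directly from the formula above:

import numpy as np

def cross_entropy(y_pred, t):
    # E(y, t) = -t log(y) - (1 - t) log(1 - y)
    return -(t * np.log(y_pred) + (1 - t) * np.log(1 - y_pred))

print(cross_entropy(0.9, 1))  # small loss: confident prediction, correct label
print(cross_entropy(0.9, 0))  # large loss: confident prediction, wrong label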
24
More on entropy
I Self-information of an event x : I(x ) = − log p(x )
I in units of nats (1 nat = information gained by observing an event of probability 1/e)
25
Hypothesis testing
test statistic
I a (usually scalar) variable which is a function of the data alone and which can be used to test hypotheses
I example: $\chi^2$ w.r.t. a theory curve
26
Neyman-Pearson Lemma
The likelihood ratio
$$t(\vec{x}) = \frac{f(\vec{x}|H_1)}{f(\vec{x}|H_0)}$$
is an optimal test statistic, i.e., it provides the highest "signal efficiency" $1 - \beta$ for a given "background efficiency" $\alpha$. Accept the hypothesis $H_1$ if $t(\vec{x}) > c$.
Two approaches:
1. Estimate the signal and background pdfs and construct a test statistic based on the Neyman-Pearson lemma
2. Determine the decision boundaries directly without approximating the pdfs (linear discriminants, decision trees, neural networks, . . . )
27
Estimating PDFs from Histograms?
$M$ bins per variable in $d$ dimensions: $M^d$ cells → hard to generate enough training data (often not practical for $d > 1$)
In general in machine learning, problems related to a large number of dimensions of the
feature space are referred to as the "curse of dimensionality"
28
Naïve Bayesian Classifier (also called “Projected Likelihood Classification”)
Application of the Neyman-Pearson lemma (ignoring correlations between the xi ):
k-NN classifier:
I Estimates probability density around the input vector
I $p(\vec{x}|S)$ and $p(\vec{x}|B)$ are approximated by the number of signal and background events in the training sample that lie in a small volume around the point $\vec{x}$
I With $k = k_s + k_b$ nearest neighbors around $\vec{x}$, the signal probability is estimated as
$$p_s(\vec{x}) = \frac{k_s(\vec{x})}{k_s(\vec{x}) + k_b(\vec{x})}$$
30
k-Nearest Neighbor Method (2)
Simplest choice for distance measure in feature space
is the Euclidean distance:
R = |~x − ~y |
The k-NN classifier has best performance when the boundary that separates signal and
background events has irregular features that cannot be easily approximated by
parametric learning methods.
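A minimal scikit-learn sketch of a k-NN classifier on toy data (not the notebook code from this lecture); predict_proba corresponds to the ratio $k_s / (k_s + k_b)$ above:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # toy feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # toy signal/background labels

knn = KNeighborsClassifier(n_neighbors=10)    # k = 10, Euclidean distance by default
knn.fit(X, y)
print(knn.predict_proba([[0.5, 0.5]]))        # [p(background), p(signal)] from the k nearest neighbors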
31
Fisher Linear Discriminant
A linear discriminant is simple. It can still be optimal if the amount of training data is limited.
Ansatz for the test statistic:
$$y(\vec{x}) = \sum_{i=1}^{n} w_i x_i = \vec{w}^\intercal \vec{x}$$
Choose the parameters $w_i$ so that the separation between the signal and background distributions is maximal.
Need to define "separation".
Fisher: maximize
$$J(\vec{w}) = \frac{(\tau_s - \tau_b)^2}{\Sigma_s^2 + \Sigma_b^2}$$
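In practice the coefficients need not be derived by hand; scikit-learn's LinearDiscriminantAnalysis provides a linear discriminant. A minimal sketch on toy data (not the lecture notebook):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X_bkg = rng.normal(0.0, 1.0, size=(500, 2))   # toy background sample
X_sig = rng.normal(1.5, 1.0, size=(500, 2))   # toy signal sample with shifted mean
X = np.vstack([X_bkg, X_sig])
y = np.concatenate([np.zeros(500), np.ones(500)])

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_)   # weights w_i of the linear discriminant y(x) = w . x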
32
Fisher Linear Discriminant: Determining the Coefficients wi
33
Linear regression revisited
$$J(\theta | x, y) = \frac{1}{N} \sum_{i=1}^{N} (y_i - f(x_i))^2$$
I model training: optimal parameters
$$\hat{\vec{\theta}} = \arg\min_{\vec{\theta}} J(\vec{\theta})$$
[Figure: data points and fitted straight line; x-axis: Father's height (inches)]
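A minimal sketch of such a straight-line fit with NumPy; the height values below are made up for illustration and are not the data set shown on the slide:

import numpy as np

# hypothetical father/son heights in inches (illustrative only)
father = np.array([62.0, 65.0, 67.5, 70.0, 72.5, 75.0, 78.0])
son    = np.array([64.0, 66.5, 68.0, 69.5, 71.0, 72.0, 74.0])

# least-squares fit of f(x) = theta_1 * x + theta_0, i.e. minimizing J(theta)
theta_1, theta_0 = np.polyfit(father, son, 1)
print(f"slope = {theta_1:.2f}, intercept = {theta_0:.2f}")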
34
Linear regression
I Data: vectors with $p$ components ("features"): $\vec{x} = (x_1, ..., x_p)$
I $n$ observations: $\{\vec{x}_i, y_i\}$, $i = 1, ..., n$
I Prediction for a given vector $\vec{x}$:
$$y = w_0 + w_1 x_1 + w_2 x_2 + ... + w_p x_p \equiv \vec{w}^\intercal \vec{x} \quad \text{where } x_0 := 1$$
I Least-squares solution:
$$\hat{\vec{w}} = (X^\intercal X)^{-1} X^\intercal \vec{y} \quad \text{where } X \in \mathbb{R}^{n \times p}$$
I Ridge regression:
$$C(\vec{w}) = \sum_{i=1}^{n} (\vec{w}^\intercal \vec{x}_i - y_i)^2 + \lambda |\vec{w}|^2$$
I LASSO regression:
$$C(\vec{w}) = \sum_{i=1}^{n} (\vec{w}^\intercal \vec{x}_i - y_i)^2 + \lambda |\vec{w}|$$
LASSO regression tends to give sparse solutions (many components $w_j = 0$). This is why LASSO regression is also called sparse regression.
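A minimal scikit-learn sketch contrasting the two penalties on toy data (note that scikit-learn calls the regularization strength alpha rather than $\lambda$):

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)   # only two features are informative

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)
print(ridge.coef_)   # all coefficients shrunk but typically non-zero
print(lasso.coef_)   # sparse: most coefficients exactly zero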
36
Logistic regression (1)
$$s = w_0 + w_1 x_1 + w_2 x_2 + ... + w_p x_p \equiv \vec{w}^\intercal \vec{x}$$
I Define a function that translates $s$ into a quantity that has the properties of a probability:
$$\sigma(s) = \frac{1}{1 + e^{-s}}$$
I We would like to determine the optimal weights for a given training data set. They
result from the maximum-likelihood principle.
37
Logistic regression (2)
I Consider a feature vector $\vec{x}$. For a given set of weights $\vec{w}$ the model predicts
I a probability $p(1|\vec{w}) = \sigma(\vec{w}^\intercal \vec{x})$ for outcome $y = 1$
I a probability $p(0|\vec{w}) = 1 - \sigma(\vec{w}^\intercal \vec{x})$ for outcome $y = 0$
I The probability $p(y_i|\vec{w})$ defines the likelihood $L_i(\vec{w}) = p(y_i|\vec{w})$ (the likelihood is a function of the parameters $\vec{w}$; the observations $y_i$ are fixed).
I Likelihood for the full data sample ($n$ observations):
$$L(\vec{w}) = \prod_{i=1}^{n} L_i(\vec{w}) = \prod_{i=1}^{n} \sigma(\vec{w}^\intercal \vec{x}_i)^{y_i} \, (1 - \sigma(\vec{w}^\intercal \vec{x}_i))^{1 - y_i}$$
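Maximizing $L(\vec{w})$ is equivalent to minimizing the negative log-likelihood, which is just the summed cross entropy. A minimal numerical sketch on toy data (not from the lecture):

import numpy as np

def sigma(s):
    return 1.0 / (1.0 + np.exp(-s))

def neg_log_likelihood(w, X, y):
    # -log L(w) = -sum_i [ y_i log sigma(w.x_i) + (1 - y_i) log(1 - sigma(w.x_i)) ]
    p = sigma(X @ w)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, -1.0]])   # first column = 1 accounts for the bias w_0
y = np.array([0, 1, 0])
print(neg_log_likelihood(np.array([0.0, 1.0]), X, y))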
39
Example 1 - Probability of passing an exam (logistic regression) (1)
Objective: predict the probability that someone passes an exam based on the number of
hours studying
$$p_\text{pass} = \sigma(s) = \frac{1}{1 + e^{-s}}, \qquad s = w_1 t + w_0, \quad t = \text{number of hours studied}$$
I Data set: preparation time $t$ in hours and whether the exam was passed
Calculate predictions:
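A minimal sketch with made-up study times and outcomes (the actual data set from the slide is not reproduced here):

import numpy as np
from sklearn.linear_model import LogisticRegression

hours  = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])   # hypothetical pass/fail outcomes

clf = LogisticRegression().fit(hours, passed)
w1, w0 = clf.coef_[0, 0], clf.intercept_[0]
t = 3.0   # hours of preparation
print(1.0 / (1.0 + np.exp(-(w1 * t + w0))))   # p_pass = sigma(w1 * t + w0)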
41
Precision and recall
Precision: fraction of correctly classified instances among all instances that obtain a certain class label ("purity").
$$\text{precision} = \frac{TP}{TP + FP}$$
Recall: fraction of positive instances that are correctly classified ("efficiency").
$$\text{recall} = \frac{TP}{TP + FN}$$
42
Example 2: Heart disease data set (logistic regression) (1)
Read data:
import pandas as pd

filename = "https://www.physi.uni-heidelberg.de/~reygers/lectures/2021/ml/data/heart.csv"
df = pd.read_csv(filename)
df
03_ml_basics_log_regr_heart_disease.ipynb
43
Example 2: Heart disease data set (logistic regression) (2)
Define array of labels and feature vectors
y = df['target'].values
X = df[[col for col in df.columns if col!="target"]]
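The notebook code for the following steps is not reproduced on the slides; a minimal sketch of splitting the data and fitting a logistic regression (variable names are assumptions):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)
log_reg = LogisticRegression(max_iter=1000)   # larger max_iter to ensure convergence
log_reg.fit(x_train, y_train)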
44
Example 2: Heart disease data set (logistic regression) (3)
Test predictions on the test data set:
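A minimal sketch of this step, assuming the fitted log_reg and the test arrays from the sketch above (the notebook's actual output is not reproduced here):

from sklearn.metrics import accuracy_score

y_pred = log_reg.predict(x_test)
print(f"accuracy: {accuracy_score(y_test, y_pred):0.2f}")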
45
Example 2: Heart disease data set (logistic regression) (4)
Compare to another classifier using the receiver operating characteristic (ROC) curve.
Let’s take the random forest classifier
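A minimal sketch of such a comparison, assuming the fitted log_reg and the train/test arrays from the previous sketches:

import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, roc_auc_score

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(x_train, y_train)

for name, clf in [("logistic regression", log_reg), ("random forest", rf)]:
    scores = clf.predict_proba(x_test)[:, 1]          # predicted signal probabilities
    fpr, tpr, _ = roc_curve(y_test, scores)
    plt.plot(fpr, tpr, label=f"{name}, AUC = {roc_auc_score(y_test, scores):0.2f}")
plt.xlabel("false positive rate")
plt.ylabel("true positive rate")
plt.legend()
plt.show()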
46
Example 2: Heart disease data set (logistic regression) (5)
Classifiers can be compared with the area under curve (AUC) score. This gives AUC scores: 0.82, 0.83.
[Figure: curves comparing the two classifiers; y-axis: precision]
47
Multinomial logistic regression: Softmax function
In the previous example we considered two classes (0, 1). For multi-class classification, the logistic function can be generalized to the softmax function.
Now consider $k$ classes and let $s_i$ be the score for class $i$: $\vec{s} = (s_1, ..., s_k)$. The softmax function maps the scores to probabilities:
$$\sigma(\vec{s})_i = \frac{e^{s_i}}{\sum_{j=1}^{k} e^{s_j}}$$
The softmax function is often used as the last activation function of a neural network in order to predict probabilities in a classification task.
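As an illustration, a direct NumPy implementation of the softmax function:

import numpy as np

def softmax(s):
    # subtract the maximum score for numerical stability; the result is unchanged
    exp_s = np.exp(s - np.max(s))
    return exp_s / exp_s.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # class probabilities that sum to 1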
48
Example 3: Iris data set (softmax regression) (1)
Iris flower data set
I Introduced in 1936 in a paper by Ronald Fisher
I Task: classify flowers
I Three species: iris setosa, iris virginica, and iris versicolor
I Four features: petal width and length, sepal width and length, in centimeters
03_ml_basics_iris_softmax_regression.ipynb
https://archive.ics.uci.edu/ml/datasets/Iris
https://en.wikipedia.org/wiki/Iris_flower_data_set
49
Example 3: Iris data set (softmax regression) (2)
Get data set
Softmax regression
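The notebook code is not reproduced on the slide; a minimal sketch of these two steps with scikit-learn (the exact options in the notebook may differ):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=42)

# multinomial logistic regression = softmax regression
log_reg = LogisticRegression(multi_class="multinomial", max_iter=1000)
log_reg.fit(x_train, y_train)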
50
Example 3 : Iris data set (softmax regression) (3)
Accuracy and confusion matrix for different classifiers
for clf in [log_reg, kn_neigh, fisher_ld]:
    y_pred = clf.predict(x_test)
    acc = accuracy_score(y_test, y_pred)
    print(type(clf).__name__)
    print(f"accuracy: {acc:0.2f}")
    # confusion matrix: rows: true class, columns: predicted class
    print(confusion_matrix(y_test, y_pred), "\n")

Output:

LogisticRegression
accuracy: 0.96
[[29  0  0]
 [ 0 23  0]
 [ 0  3 20]]

KNeighborsClassifier
accuracy: 0.95
[[29  0  0]
 [ 0 23  0]
 [ 0  4 19]]

LinearDiscriminantAnalysis
accuracy: 0.99
[[29  0  0]
 [ 0 23  0]
 [ 0  1 22]]
51
General remarks on multi-variate analyses (MVAs)
I MVA Methods
I More effective than classic cut-based analyses
I Take correlations of input variables into account
I Pre-processing
I Apply obvious variable transformations and let MVA method do the rest
I Make use of obvious symmetries: if e.g. a particle production process is symmetric in
polar angle θ use | cos θ| and not cos θ as input variable
I It is generally useful to bring all input variables to a similar numerical range (see the sketch below)
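A minimal sketch of such a rescaling with scikit-learn's StandardScaler (toy numbers for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])                    # two features with very different numerical ranges
X_scaled = StandardScaler().fit_transform(X)    # zero mean and unit variance per feature
print(X_scaled)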
52
Example of feature transformation
53
Possible topics for more in-depth talks/discussion
54
Exercise 1: Classification of air showers measured with the MAGIC
telescope
I Cosmic gamma rays (30 GeV - 30 TeV)
I Cherenkov light from air showers
I Background: air showers caused by hadrons
55
Exercise 1: Classification of air showers measured with the MAGIC
telescope
57
Exercise 1: Classification of air showers measured with the MAGIC
telescope
MAGIC data set
https://archive.ics.uci.edu/ml/datasets/magic+gamma+telescope
58
Exercise 1: Classification of air showers measured with the MAGIC
telescope
03_ml_basics_ex_1_magic.ipynb
a) For each variable, create a figure with the distributions for gammas and hadrons overlaid.
b) Create training and test data set. The test data should amount to 50% of the total
data set.
c) Define the logistic regressor and fit the training data
d) Determine the model accuracy and the AUC score
e) Plot the ROC curve (background rejection vs signal efficiency)
59
Exercise 2: Hand-written digit recognition with logistic regression
03_ml_basics_ex_2_mnist_softmax_regression.ipynb
a) Define logistic regressor from scikit-learn and fit data
b) Use classification_report from scikit-learn to determine precision and recall
c) Read in a hand-written digit and classify it. Print the probabilities for each digit.
Determine the digit with the highest probability.
d) (Optional) Create your own hand-written digit with a program like gimp and check what the classifier does.
Hint: You can install required packages on the jupyter hub server like so:
!pip3 install --user pypng
60
Exercise 3: Data preprocessing
61