MACHINE LEARNING IN HIGH ENERGY PHYSICS
LECTURE #1
CLASSIFICATION
Examples:
defining the type of a particle (or decay channel)
Y = {0, 1}: binary classification, 1 is signal, 0 is background
REGRESSION
y ∈ ℝ
Examples:
predicting the price of a house from its location
predicting the number of customers / income
reconstructing the real momentum of a particle
\hat{y} = \frac{1}{k} \sum_{j \in \mathrm{knn}(x)} y_j
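A minimal sketch of this averaging rule with scikit-learn's KNeighborsRegressor (the toy data is made up for illustration):

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy 1-d data: y = x^2 on a grid
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0.0, 1.0, 4.0, 9.0, 16.0])

# Prediction is the plain average of the k nearest targets
reg = KNeighborsRegressor(n_neighbors=2)
reg.fit(X, y)
print(reg.predict([[2.1]]))  # neighbours at x=2 and x=3, so (4 + 9) / 2 = 6.5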
KNN WITH WEIGHTS
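One standard convention (an assumption here, since several weighting schemes exist) is to weight each neighbour inversely to its distance:

\hat{y} = \frac{\sum_{j \in \mathrm{knn}(x)} w_j \, y_j}{\sum_{j \in \mathrm{knn}(x)} w_j}, \qquad w_j = \frac{1}{\rho(x, x_j)}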
COMPUTATIONAL COMPLEXITY
Given that the dimensionality of the space is d and there are n training samples:
training time ~ O(save a link to data)
prediction time ~ O(n × d) for each sample
SPATIAL INDEX: BALL TREE
BALL TREE
training time ~ O(d × n log(n))
prediction time ~ O(d × log(n)) for each sample
Other options exist, e.g. the KD-tree.
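A sketch of how these indexes are used in practice with scikit-learn (the synthetic data is only for illustration; algorithm and n_neighbors are real scikit-learn parameters):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data: n = 1000 samples, d = 5 features
rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Ball tree: built once at training time, then each query costs ~O(d log n)
knn = KNeighborsClassifier(n_neighbors=10, algorithm='ball_tree')
knn.fit(X, y)
print(knn.predict_proba(X[:3])[:, 1])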
OVERVIEW OF KNN
1. Awesomely simple classifier and regressor
2. Gives overly optimistic quality on the training data
3. Quite slow, though optimizations exist
4. Struggles with high-dimensional data
5. Very sensitive to the scale of features
SENSITIVITY TO SCALE OF FEATURES
Euclidean distance:
\rho(x, y)^2 = (x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_d - y_d)^2
If the first feature is rescaled by a factor of 10, its term dominates the distance:
\rho(x, y)^2 \sim 100 \, (x_1 - y_1)^2
Canberra: \rho(x, y) = \sum_i \frac{|x_i - y_i|}{|x_i| + |y_i|}
Cosine metric: \rho(x, y) = \frac{\langle x, y \rangle}{|x| \, |y|}
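A small sketch of the scale problem and the usual fix, standardising features (StandardScaler is the scikit-learn name for this; the numbers are arbitrary):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Feature 1 lives on a much larger scale than feature 2
X = np.array([[1000.0, 0.1],
              [1010.0, 0.9],
              [1500.0, 0.2]])

# Euclidean distance is dominated by the first feature
print(np.linalg.norm(X[0] - X[1]))  # ~10.03, the 0.8 difference barely matters

# After standardisation both features contribute comparably
Xs = StandardScaler().fit_transform(X)
print(np.linalg.norm(Xs[0] - Xs[1]))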
x MINUTES BREAK
RECAPITULATION
1. Statistical ML: problems
2. ML in HEP
3. k nearest neighbours classifier and regressor.
MEASURING QUALITY OF BINARY
CLASSIFICATION
The classifier's output in binary classification is a real-valued variable, so quality should be measured over all possible decision thresholds.
ROC AUC
(AREA UNDER THE ROC CURVE)
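A minimal sketch with scikit-learn (the toy labels and scores are made up for illustration):

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Toy labels and real-valued classifier outputs
y_true = np.array([0, 0, 1, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])

# ROC curve: one (FPR, TPR) point per decision threshold
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(roc_auc_score(y_true, scores))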
LEMMA (NEYMAN–PEARSON):
The best classification quality is provided by the likelihood ratio
\frac{p(y = 1 \mid x)}{p(y = 0 \mid x)}
QDA assumes each class follows a multivariate normal distribution:
p(x \mid y = 1) \sim \mathcal{N}(\mu_1, \Sigma_1)
QDA COMPLEXITY
n samples, d dimensions
training takes O(n d^2 + d^3)
The fitted density of each class is a multivariate Gaussian:
p(x \mid y) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)
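To make the density formula concrete, a sketch that evaluates it by hand and checks the result against scipy.stats.multivariate_normal (the parameters are arbitrary):

import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary 2-d Gaussian parameters
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.7])

# Density evaluated directly from the formula above
d = len(mu)
diff = x - mu
norm = (2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5
density = np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

print(density, multivariate_normal(mu, Sigma).pdf(x))  # the two should match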
QDA
simple decision rule
fast prediction
many parameters to reconstruct in high dimensions
data almost never has a Gaussian distribution
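A usage sketch with scikit-learn's QuadraticDiscriminantAnalysis, on synthetic data where the Gaussian assumption actually holds:

import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Two Gaussian classes with different means and covariances
rng = np.random.RandomState(1)
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=500)
X1 = rng.multivariate_normal([2, 1], [[1.5, -0.4], [-0.4, 0.5]], size=500)
X = np.vstack([X0, X1])
y = np.array([0] * 500 + [1] * 500)

# QDA fits one mean and one covariance matrix per class
qda = QuadraticDiscriminantAnalysis()
qda.fit(X, y)
print(qda.predict_proba(X[:3])[:, 1])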
WHAT ARE THE PROBLEMS WITH THE
GENERATIVE APPROACH?
Generative approach: try to reconstruct p(x, y), then use it to predict.
Real-life distributions can hardly be reconstructed,
especially in high-dimensional spaces.
So we switch to the discriminative approach: estimating p(y | x) directly.
LINEAR DECISION RULE
Decision function is linear:
d(x) = \langle w, x \rangle + w_0
PROPERTIES
For the logistic function \sigma(x) = \frac{1}{1 + e^{-x}}:
1. monotonic, \sigma(x) \in (0, 1)
2. \sigma(x) + \sigma(-x) = 1
3. \sigma'(x) = \sigma(x)(1 - \sigma(x))
4. 2\sigma(x) = 1 + \tanh(x/2)
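These identities are easy to verify numerically; a quick sketch:

import numpy as np

def sigma(x):
    # Logistic function
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3.0, 3.0, 13)
print(np.allclose(sigma(x) + sigma(-x), 1.0))             # property 2
print(np.allclose(2.0 * sigma(x), 1.0 + np.tanh(x / 2)))  # property 4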
LOGISTIC FUNCTION
LOGISTIC REGRESSION
Optimizing the log-likelihood (with probabilities obtained from the logistic function):
d(x) = \langle w, x \rangle + w_0
\mathcal{L} = \frac{1}{N} \sum_{i \in \text{events}} -\ln(p_{y_i}(x_i)) = \frac{1}{N} \sum_i L(x_i, y_i) \to \min
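In practice this optimization is handled by a library; a sketch with scikit-learn's LogisticRegression (which maximizes a regularized version of the same log-likelihood) on synthetic data:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic, roughly linearly separable data
rng = np.random.RandomState(2)
X = rng.normal(size=(500, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Learns w (coef_) and w0 (intercept_)
clf = LogisticRegression()
clf.fit(X, y)
print(clf.coef_, clf.intercept_)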
Exercise: find an expression for L(x_i, y_i) and plot it.
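One possible solution sketch, assuming the convention y ∈ {−1, +1} and p_y(x) = \sigma(y \, d(x)), which gives L = \ln(1 + e^{-y \, d(x)}):

import numpy as np
import matplotlib.pyplot as plt

# Logistic loss as a function of the margin m = y * d(x)
m = np.linspace(-5, 5, 200)
loss = np.log(1 + np.exp(-m))

plt.plot(m, loss)
plt.xlabel('margin y * d(x)')
plt.ylabel('L(x, y)')
plt.title('Logistic loss')
plt.show()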
SCIENTIFIC PYTHON
Matplotlib: for drawing
Pandas: for data manipulation and analysis (based on NumPy)
Scikit-learn: the most popular library for machine learning
Scipy: libraries for science and engineering
Root_numpy: a convenient way to work with ROOT files
THE END