MACHINE LEARNING IN HIGH ENERGY PHYSICS
LECTURE #1

Alex Rogozhnikov, 2015


INTRO NOTES
4 days
two lectures, two practice seminars every day
this is an introductory track on machine learning
kaggle competition!
WHAT IS ML ABOUT?
Inference of statistical dependencies which give us the ability to predict

Data is cheap, knowledge is precious


WHERE IS ML CURRENTLY USED?
Search engines, spam detection
Security: virus detection, DDOS defense
Computer vision and speech recognition
Market basket analysis, customer relationship management (CRM)
Credit scoring, fraud detection
Health monitoring
Churn prediction
... and hundreds more
ML IN HIGH ENERGY PHYSICS
High-level triggers (LHCb trigger system: 40 MHz → 5 kHz)
Particle identification
Tagging
Stripping line
Analysis
Different data is used at different stages
GENERAL NOTION
In supervised learning the training data is represented as a set of pairs (x_i, y_i)

i is the index of an event
x_i is the vector of features available for the event
y_i is the target: the value we need to predict


CLASSIFICATION EXAMPLE
y_i ∈ Y, where Y is a finite set
on the plot: x_i ∈ ℝ², y_i ∈ {0, 1, 2}

Examples:
defining the type of particle (or decay channel)
Y = {0, 1} is binary classification, where 1 is signal and 0 is background
REGRESSION
y ∈ ℝ
Examples:
predicting the price of a house from its position
predicting number of customers / money income
reconstructing real momentum of particle

Why do we need automatic classification/regression?


in applications up to thousands of features
higher quality
much faster adaptation to new problems
CLASSIFICATION BASED ON
NEAREST NEIGHBOURS
Given a training set of objects and their labels {x_i, y_i}, we predict the label for a new observation x:

ŷ = y_j,   j = arg min_i ρ(x, x_i)
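A minimal sketch of this decision rule with NumPy (the array names and the Euclidean metric are illustrative assumptions, not from the slides):

    import numpy as np

    def nn_predict(X_train, y_train, x):
        # label of x = label of its nearest training event
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        j = np.argmin(distances)      # j = arg min_i rho(x, x_i)
        return y_train[j]

    X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
    y_train = np.array([0, 1, 1])
    print(nn_predict(X_train, y_train, np.array([0.2, 0.1])))   # -> 0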
VISUALIZATION OF DECISION RULE
k NEAREST NEIGHBOURS
A better way is to use k neighbours:
p_i(x) = (# of kNN events of class i among the k neighbours) / k

(plots for k = 1, 2, 5, 30)
OVERFITTING
what is the quality of classification on the training dataset when k = 1?

answer: it is ideal (the closest neighbour of an event is the event itself)

quality is lower when k > 1
this doesn't mean k = 1 is the best;
it means we cannot use training events to estimate quality
when a classifier's decision rule is too complex and captures details of the training data that are not relevant to the underlying distribution, we call this overfitting (more details tomorrow)
KNN REGRESSOR
Regression with nearest neighbours is done by averaging the outputs of the neighbours:

ŷ = (1/k) ∑_{j ∈ knn(x)} y_j
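A minimal sketch with scikit-learn (the toy dataset and n_neighbors=5 are illustrative assumptions):

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    X = np.linspace(0, 10, 100).reshape(-1, 1)   # toy 1D data
    y = np.sin(X).ravel()

    # prediction = average of the targets of the 5 nearest neighbours
    knn = KNeighborsRegressor(n_neighbors=5)
    knn.fit(X, y)
    print(knn.predict([[2.5]]))                  # close to sin(2.5)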
KNN WITH WEIGHTS
COMPUTATIONAL COMPLEXITY
Given that the dimensionality of the space is d and there are n training samples:
training time ~ O(save a link to the data)
prediction time ~ O(n × d) for each sample
SPATIAL INDEX: BALL TREE
BALL TREE
training time ~ O(d × n log(n))
prediction time ~ O(log(n) × d) for each sample
Other options exist: KD-tree.
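A minimal sketch of querying neighbours through a ball tree in scikit-learn (the array shapes and k=5 are illustrative assumptions):

    import numpy as np
    from sklearn.neighbors import BallTree

    X = np.random.rand(10000, 3)        # n = 10000 training samples, d = 3
    tree = BallTree(X)                  # built once, ~O(d * n log n)

    X_new = np.random.rand(5, 3)
    dist, ind = tree.query(X_new, k=5)  # ~O(d * log n) per query sample
    print(ind.shape)                    # (5, 5): indices of the 5 neighbours per query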
OVERVIEW OF KNN
1. Awesomely simple classifier and regressor
2. Has overly optimistic quality on the training data
3. Quite slow, though optimizations exist
4. Struggles with high-dimensional data
5. Too sensitive to the scale of features
SENSITIVITY TO SCALE OF FEATURES
Euclidean distance:

ρ(x, y)² = (x_1 − y_1)² + (x_2 − y_2)² + ... + (x_d − y_d)²

Change the scale of the first feature (multiply it by 10):

ρ(x, y)² = (10 x_1 − 10 y_1)² + (x_2 − y_2)² + ... + (x_d − y_d)²

ρ(x, y)² ∼ 100 (x_1 − y_1)²
Scaling of features frequently increases quality.
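A minimal sketch of rescaling features before kNN with scikit-learn (the pipeline and toy data are illustrative assumptions, not from the slides):

    from sklearn.datasets import make_classification
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

    # each feature is shifted to zero mean and unit variance,
    # so no single feature dominates the Euclidean distance
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    model.fit(X, y)
    print(model.score(X, y))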


DISTANCE FUNCTION MATTERS
Minkowski distance: ρ_p(x, y) = ∑_i |x_i − y_i|^p

Canberra: ρ(x, y) = ∑_i |x_i − y_i| / (|x_i| + |y_i|)

Cosine metric: ρ(x, y) = <x, y> / (|x| |y|)
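A minimal sketch of these distances with SciPy (the vectors are illustrative assumptions):

    import numpy as np
    from scipy.spatial import distance

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 0.0, 3.0])

    print(distance.minkowski(x, y, p=3))   # Minkowski distance with p = 3
    print(distance.canberra(x, y))         # sum_i |x_i - y_i| / (|x_i| + |y_i|)
    print(distance.cosine(x, y))           # 1 - <x, y> / (|x| |y|), i.e. cosine distance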
x MINUTES BREAK
RECAPITULATION
1. Statistical ML: problems
2. ML in HEP
3. k nearest neighbours classifier and regressor.
MEASURING QUALITY OF BINARY
CLASSIFICATION
The classifier's output in binary classification is a real-valued variable.

Which classifier is better?

Answer: all of them are identical.
ROC CURVE

These distributions have the same ROC curve:


(the ROC curve is the dependency of passed signal vs passed background)
ROC CURVE DEMONSTRATION
ROC CURVE
Contains important information:
all possible combinations of signal and background efficiencies you may achieve by setting a threshold
Particular values of thresholds (and the initial pdfs) don't matter; the ROC curve doesn't contain this information
ROC curve = information about the order of events:
s s b s b ... b b s b b

Comparison of algorithms should be based on information from the ROC curve
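A minimal sketch of building a ROC curve with scikit-learn (the labels and scores are illustrative assumptions):

    import numpy as np
    from sklearn.metrics import roc_curve

    y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])   # 1 = signal, 0 = background
    scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])

    # fpr = background efficiency, tpr = signal efficiency, one point per threshold
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    print(fpr)
    print(tpr)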
TERMINOLOGY AND CONVENTIONS
fpr = background efficiency = b
tpr = signal efficiency = s


ROC AUC
(AREA UNDER THE ROC CURVE)

ROC AUC = P(x < y), where x and y are the predictions for random background and signal events.
Which classifier is better for triggers?
(they have the same ROC AUC)
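A small numeric check of this interpretation with scikit-learn (the Gaussian scores below are illustrative assumptions):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.RandomState(0)
    bck = rng.normal(0.0, 1.0, size=5000)    # predictions x of background events
    sig = rng.normal(1.0, 1.0, size=5000)    # predictions y of signal events

    auc = roc_auc_score(np.r_[np.zeros(5000), np.ones(5000)], np.r_[bck, sig])
    prob = np.mean(bck[:, None] < sig[None, :])   # P(x < y) estimated over all pairs
    print(auc, prob)                              # the two numbers coincide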
STATISTICAL MACHINE LEARNING
The machine learning we use in practice is based on statistics.
1. Main assumption: the data is generated from a probability distribution p(x, y)
2. Does a distribution of people / pages really exist?
3. In HEP these distributions do exist
OPTIMAL CLASSIFICATION. OPTIMAL BAYESIAN CLASSIFIER
Assuming that we know the real distributions p(x, y), we reconstruct p(y|x) using Bayes' rule:

p(y|x) = p(x, y) / p(x) = p(y) p(x|y) / p(x)

p(y = 1 | x) / p(y = 0 | x) = [p(y = 1) p(x | y = 1)] / [p(y = 0) p(x | y = 0)]

LEMMA (NEYMAN–PEARSON):
The best classification quality is provided by p(y = 1 | x) / p(y = 0 | x)
(the optimal Bayesian classifier)

OPTIMAL BINARY CLASSIFICATION


The optimal Bayesian classifier has the highest possible ROC curve.
Since the classification quality depends only on the order of events, p(y = 1 | x) gives the optimal classification quality too!

p(y = 1 | x) / p(y = 0 | x) = [p(y = 1) p(x | y = 1)] / [p(y = 0) p(x | y = 0)]
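A minimal numeric sketch of this ratio for 1D Gaussian classes with SciPy (the means, widths and priors are illustrative assumptions):

    from scipy.stats import norm

    p_sig, p_bck = 0.5, 0.5               # priors p(y=1), p(y=0)
    sig_pdf = norm(loc=1.0, scale=1.0)    # p(x | y = 1)
    bck_pdf = norm(loc=0.0, scale=1.0)    # p(x | y = 0)

    x = 0.7
    ratio = (p_sig * sig_pdf.pdf(x)) / (p_bck * bck_pdf.pdf(x))
    print(ratio)   # > 1: the event looks more signal-like than background-like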
FISHER'S QDA (QUADRATIC DISCRIMINANT ANALYSIS)
Reconstructing the probabilities p(x | y = 1), p(x | y = 0) from data, assuming these are multidimensional normal distributions:

p(x | y = 0) ∼ N(μ_0, Σ_0)
p(x | y = 1) ∼ N(μ_1, Σ_1)
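A minimal sketch with scikit-learn's quadratic discriminant analysis (the toy Gaussian data is an illustrative assumption; in older scikit-learn versions the class lived in sklearn.qda):

    import numpy as np
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

    rng = np.random.RandomState(0)
    X = np.vstack([rng.normal(0, 1, size=(500, 2)),    # background around mu_0
                   rng.normal(2, 1, size=(500, 2))])   # signal around mu_1
    y = np.r_[np.zeros(500), np.ones(500)]

    qda = QuadraticDiscriminantAnalysis()
    qda.fit(X, y)                          # estimates mu_0, mu_1, Sigma_0, Sigma_1
    print(qda.predict_proba([[1.0, 1.0]]))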
QDA COMPLEXITY
n samples, d dimensions
training takes O(n d² + d³)

computing the covariance matrix: O(n d²)
inverting the covariance matrix: O(d³)

prediction takes O(d²) for each sample
f(x) = 1 / ((2π)^{k/2} |Σ|^{1/2}) exp( −(1/2) (x − μ)^T Σ⁻¹ (x − μ) )
QDA
simple decision rule
fast prediction
many parameters to reconstruct in high dimensions
data almost never has a Gaussian distribution
WHAT ARE THE PROBLEMS WITH
GENERATIVE APPROACH?
Generative approach: try to reconstruct p(x, y), then use it to predict.
Real-life distributions can hardly be reconstructed,
especially in high-dimensional spaces.
So we switch to the discriminative approach: estimating p(y|x) directly.
LINEAR DECISION RULE
The decision function is linear:

d(x) = <w, x> + w_0

d(x) > 0: class +1
d(x) < 0: class −1

This is a parametric model (we find the parameters w, w_0).


FINDING OPTIMAL PARAMETERS
A good initial guess: find w, w_0 such that the classification error is minimal ([true] = 1, [false] = 0):

E = ∑_{i ∈ events} [y_i ≠ sgn(d(x_i))]

Discontinuous optimization (arrrrgh!)
Let's make the decision rule smooth:

p_{+1}(x) = f(d(x))
p_{−1}(x) = 1 − p_{+1}(x)

where f(0) = 0.5, f(x) > 0.5 if x > 0, f(x) < 0.5 if x < 0
LOGISTIC FUNCTION
a smooth step function:

σ(x) = e^x / (1 + e^x) = 1 / (1 + e^{−x})

PROPERTIES
1. monotonic, σ(x) ∈ (0, 1)
2. σ(x) + σ(−x) = 1
3. σ'(x) = σ(x)(1 − σ(x))
4. 2σ(x) = 1 + tanh(x/2)
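A tiny NumPy check of these properties (the test points are illustrative assumptions):

    import numpy as np

    def sigma(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.array([-2.0, 0.0, 1.5])
    print(sigma(x) + sigma(-x))                  # property 2: all ones
    print(2 * sigma(x) - (1 + np.tanh(x / 2)))   # property 4: all zeros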
LOGISTIC FUNCTION
LOGISTIC REGRESSION
Optimizing the log-likelihood (with probabilities obtained via the logistic function):

d(x) = <w, x> + w_0
p_{+1}(x) = σ(d(x))
p_{−1}(x) = σ(−d(x))

ℒ = (1/N) ∑_{i ∈ events} −ln(p_{y_i}(x_i)) = (1/N) ∑_{i ∈ events} L(x_i, y_i) → min

Exercise: find the expression and build a plot for L(x_i, y_i)
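A minimal sketch with scikit-learn (the toy data is an illustrative assumption; working out L(x_i, y_i) by hand is left as the exercise above):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    X = np.vstack([rng.normal(0, 1, size=(500, 2)),
                   rng.normal(2, 1, size=(500, 2))])
    y = np.r_[-np.ones(500), np.ones(500)]       # classes -1 and +1

    clf = LogisticRegression()
    clf.fit(X, y)                                # finds w, w_0 by minimizing the loss above
    print(clf.coef_, clf.intercept_)             # w and w_0
    print(clf.predict_proba([[1.0, 1.0]]))       # [p_{-1}(x), p_{+1}(x)]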

DATA SCIENTIST PIPELINE

1. Experiment in an appropriate high-level language or environment
2. After the experiments are over, implement the final algorithm in a low-level language (C++, CUDA, FPGA)

The second step is not always needed.
SCIENTIFIC PYTHON
NumPy
vectorized computations in python

Matplotlib
for drawing

Pandas
for data manipulation and analysis (based on
NumPy)
SCIENTIFIC PYTHON
Scikit-learn
most popular library for machine learning

Scipy
libraries for science and engineering

root_numpy
a convenient way to work with ROOT files
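A minimal sketch of typical imports for this stack (the aliases are conventional, not prescribed by the slides):

    import numpy as np                   # vectorized computations
    import matplotlib.pyplot as plt      # drawing
    import pandas as pd                  # data manipulation and analysis
    from sklearn.neighbors import KNeighborsClassifier   # machine learning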
THE END
