2 - Basics of Machine Learning

This document provides an introduction to machine learning, including its goals, typical workflow, and some common methods. It discusses how machine learning uses algorithms to find patterns in large amounts of data and make predictions without being explicitly programmed. The document emphasizes the importance of validating models on testing data and cautions that machine learning is best for interpolation and not extrapolation. It also introduces scikit-learn as a popular Python tool for machine learning.


CHAPTER 2

Introduction to Machine Learning

Hervé Gross, PhD - Reservoir Engineer

Advanced Resources and Risk Technology

o “Give computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959)
o Algorithms that use data (“learn”) to make predictions
o No “model” (= a function that maps inputs to outputs) is explicitly given
o Although no model is given, users have many algorithms to choose from
o Validation is always required to assess the quality of our predictions

o “Data science” often means machine learning applied to massive amounts of data

o Goals:
  o analyze and describe data: find trends, clusters, anomalies,
  o produce predictions that improve in quality and reliability as more data becomes available,
  o optimize decisions based on multiple previous experiments

© ADVANCED RESOURCES AND RISK TECHNOLOGY 2

ML always follows the same steps

Data acquisition → Data preparation → Machine Learning (TRAINING) → Model Validation (TESTING) → Predictions / Decisions

Typical machine learning workflow

Data preparation (removing redundant, incomplete, or erroneous data, selecting features…): this is often the most time-consuming step, and it is sometimes iterative.
Model validation: often done lightly, but it is the most important step in the process because it establishes the credibility of the outcomes.
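The workflow steps can be sketched in a few lines of scikit-learn. This is only an illustration under stated assumptions: the iris toy dataset stands in for acquired data, a k-nearest-neighbors classifier stands in for whichever algorithm is chosen, and the names `X_clean`, `test_accuracy`, and `new_sample` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Data acquisition (assumption: a toy dataset replaces real measurements)
X, y = load_iris(return_X_y=True)

# 2. Data preparation: drop incomplete rows, then exact duplicate rows
data = np.column_stack([X, y])
data = data[~np.isnan(data).any(axis=1)]
data = np.unique(data, axis=0)
X_clean, y_clean = data[:, :-1], data[:, -1].astype(int)

# 3. Split into a TRAINING set and a TESTING set (2/3 and 1/3)
X_train, X_test, y_train, y_test = train_test_split(
    X_clean, y_clean, test_size=1 / 3, random_state=0, stratify=y_clean
)

# 4. Machine learning: fit the algorithm on the training set only
model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# 5. Model validation: score the model on data it has never seen
test_accuracy = model.score(X_test, y_test)
print(f"Accuracy on the testing set: {test_accuracy:.2f}")

# 6. Predictions: apply the validated model to a new observation
new_sample = np.array([[5.0, 3.4, 1.5, 0.2]])
prediction = model.predict(new_sample)
print("Predicted class for the new sample:", prediction[0])
```

The testing set is never shown to `fit`, which is what makes the reported accuracy a meaningful check of the model.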


Two families, many methods 
Non-exhaustive list: beware of hype!

UNSUPERVISED LEARNING

o Provide inputs, not outputs
o Find patterns, clusters, anomalies
o Predict similarity when new data is given

o Clustering
  o k-means
  o mixture models
  o hierarchical clustering
o Anomaly detection
o Neural networks
  o Hebbian learning
  o Generative adversarial networks
o Approaches for learning latent variable models, such as
  o Expectation–maximization algorithm (EM)
  o Method of moments
  o Blind signal separation techniques
  o Principal component analysis
  o Independent component analysis
  o Non-negative matrix factorization
  o Singular value decomposition

https://en.wikipedia.org/wiki/Unsupervised_learning

SUPERVISED LEARNING

o Provide inputs and outputs
o Predict outputs when new sets of inputs are given

o Analytical learning
o Artificial neural network
o Backpropagation
o Boosting (meta-algorithm)
o Bayesian statistics
o Case-based reasoning
o Decision tree learning
o Inductive logic programming
o Gaussian process regression
o Group method of data handling
o Kernel estimators
o Learning automata
o Learning classifier systems
o Minimum message length (decision trees, decision graphs, etc.)
o Multilinear subspace learning
o Naive Bayes classifier
o Maximum entropy classifier
o Conditional random field
o Nearest neighbor algorithm
o Probably approximately correct (PAC) learning
o Ripple down rules, a knowledge acquisition methodology
o Symbolic machine learning algorithms
o Subsymbolic machine learning algorithms
o Support vector machines
o Minimum complexity machines (MCM)
o Random forests
o Ensembles of classifiers
o Ordinal classification
o Data pre-processing
o Handling imbalanced datasets
o Statistical relational learning
o Proaftn, a multicriteria classification algorithm
o …

https://en.wikipedia.org/wiki/Supervised_learning

BEWARE OF HYPE

Many ML algorithms derive from the same concepts. Their names are often marketing, and they all falsely claim to solve “all problems”. All algorithms have assumptions, weaknesses, and applicability restrictions. It is important to exercise critical judgment.
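The difference between the two families can be illustrated with one method from each. The sketch below is an assumption-laden example, not a recommendation: it uses synthetic data from `make_blobs` with hand-picked centers, k-means as the unsupervised method, and a decision tree as the supervised method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption): three well-separated groups of 2-D points
X, y = make_blobs(
    n_samples=300,
    centers=[[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]],
    cluster_std=0.5,
    random_state=0,
)

# UNSUPERVISED: only the inputs X are provided; k-means finds the groups
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# The cluster numbering is arbitrary, so agreement with the hidden labels
# is measured with a permutation-invariant score
agreement = adjusted_rand_score(y, cluster_labels)
print(f"Agreement between clusters and hidden labels: {agreement:.2f}")

# SUPERVISED: inputs X and outputs y are both provided during training
classifier = DecisionTreeClassifier(random_state=0).fit(X, y)

# Predict the output for a new input located near the second center
new_point = np.array([[4.8, 5.1]])
print("Predicted label for the new point:", classifier.predict(new_point)[0])
```

Note that k-means never sees `y`; the labels are used here only to check afterwards how well the discovered clusters match the groups that generated the data.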


Applications

o Used widely in computer science and in any field where the data sets of observations are so large that human calibration is impossible:
  o search engines (page ranking, related searches),
  o social networks (suggestions, advertisements),
  o image pattern recognition (classification of features, face detection, identification),
  o natural language processing…

o Scientific uses in every domain of science that has enough observations. ML is sometimes seen as a statistical “black box” where theory-driven models are considered more elegant, but the two approaches can coexist and complement each other.

o Also used in economics, finance, medical diagnosis, opinion management,…

o Not needed when the model is obvious or simple


Applicability : ML is not magic!

o Everything is based on the underlying model that the algorithm forces on the data
  o Interpolation = OK if done well
  o Extremes = difficult
  o Extrapolation = very dangerous (especially if you cannot explain the underlying statistical model)
o Machine learning is just a model; as such, it always needs validation

[Figure: output plotted against input. The training domain is marked along the input axis, and the regions outside it are labeled "extrapolation".]
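The danger of extrapolation can be reproduced with a small experiment. The following sketch rests on assumptions chosen for illustration: the "true" relationship is a sine function, the training domain is the interval [0, 5], and the learner is a decision tree regressor. None of these choices come from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def true_function(x):
    """Hypothetical ground truth used only for this illustration."""
    return np.sin(x)


# Training domain: inputs between 0 and 5
x_train = np.linspace(0.0, 5.0, 200)
model = DecisionTreeRegressor(random_state=0)
model.fit(x_train.reshape(-1, 1), true_function(x_train))


def mean_abs_error(x):
    """Mean absolute difference between predictions and the ground truth."""
    predictions = model.predict(x.reshape(-1, 1))
    return float(np.mean(np.abs(predictions - true_function(x))))


# Interpolation: new inputs inside the training domain
x_inside = np.linspace(0.1, 4.9, 50)
# Extrapolation: new inputs outside the training domain
x_outside = np.linspace(6.0, 10.0, 50)

interp_error = mean_abs_error(x_inside)
extrap_error = mean_abs_error(x_outside)

print(f"Mean absolute error, interpolation: {interp_error:.3f}")
print(f"Mean absolute error, extrapolation: {extrap_error:.3f}")
```

Outside the training domain the tree simply repeats the value it learned at the edge of the data, so its error grows as soon as the true function moves away from that value. Other algorithms fail differently, but none of them has information about the region it never saw.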

The importance of validation

Validation (sometimes called “model assessment”) consists of evaluating the predictive quality of a trained model
• Models are useless (and dangerous!) if they have not been validated
• Always validate

• Methods for validating
  • Holdout methods (split the data into training and testing sets, typically 2/3 and 1/3)
  • N-fold cross-validation (split the data into N subsets, train on N-1 of them, test on the remaining one, and repeat so that each subset serves once as the testing set)
  • Bootstrap: create a new (usually larger) dataset by randomly sampling the data with replacement
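The three validation methods can be sketched with scikit-learn as follows. The dataset (iris) and the classifier (a decision tree) are assumptions made for the example, and scoring the bootstrap model on the rows left out of the resample ("out-of-bag" rows) is one common variant rather than something the slides prescribe.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# Holdout: train on 2/3 of the data, test on the remaining 1/3
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0, stratify=y
)
holdout_score = model.fit(X_train, y_train).score(X_test, y_test)

# N-fold cross-validation with N = 5: each subset is the testing set once
cv_scores = cross_val_score(model, X, y, cv=5)

# Bootstrap: sample row indices with replacement, then test on the rows
# that were never drawn (assumption: out-of-bag evaluation)
rng = np.random.default_rng(0)
n_rows = len(X)
boot_idx = rng.integers(0, n_rows, size=n_rows)
oob_idx = np.setdiff1d(np.arange(n_rows), boot_idx)
bootstrap_score = model.fit(X[boot_idx], y[boot_idx]).score(X[oob_idx], y[oob_idx])

print(f"Holdout accuracy:          {holdout_score:.2f}")
print(f"5-fold accuracies:         {np.round(cv_scores, 2)}")
print(f"Bootstrap (OOB) accuracy:  {bootstrap_score:.2f}")
```

Cross-validation returns one score per fold; the spread of those scores gives a first impression of how much the result depends on the particular split.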

• Validation metrics
  • Accuracy (= bias) and precision (= variability): correlations, Spearman rank correlation, bias matrices, etc.
  • Sensitivity: true positive rate (also called recall, or probability of detection)
  • Specificity: true negative rate
  • Sensitivity vs. specificity = ROC (receiver operating characteristic) curves, which plot sensitivity against 1 − specificity
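Sensitivity, specificity, and the ROC curve can be computed from a handful of predictions. The labels and scores below are made-up values chosen so the arithmetic is easy to follow by hand, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

# Hypothetical ground truth (0 = negative, 1 = positive) and model scores
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.3, 0.6, 0.7, 0.9])

# Turn scores into class predictions with a 0.5 threshold (assumption)
y_pred = (y_score >= 0.5).astype(int)

# For binary labels the confusion matrix unpacks as tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)  # true positive rate, also called recall
specificity = tn / (tn + fp)  # true negative rate

# The ROC curve sweeps the threshold and records (1 - specificity, sensitivity)
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

print(f"Sensitivity: {sensitivity:.2f}")
print(f"Specificity: {specificity:.2f}")
print(f"Area under the ROC curve: {auc:.4f}")
```

With these values, three of the four positives and three of the four negatives are classified correctly, so both sensitivity and specificity equal 0.75. The area under the ROC curve summarizes performance over all thresholds rather than at the single threshold of 0.5.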

Machine learning toolboxes

Toolboxes in (almost) all languages:

https://github.com/josephmisiti/awesome-machine-learning#awesome-machine-learning-

The number of machine learning packages is very large: free and open-source or proprietary, language-specific or not, cross-platform or not, cloud-friendly or not, etc.

For this class, we will use SCIKIT-LEARN in Python:

http://scikit-learn.org/stable/index.html

Scikit-learn

References

Fawcett, Tom (2006). "An Introduction to ROC Analysis". Pattern Recognition Letters. 27 (8): 861–874. doi:10.1016/j.patrec.2005.10.010.
Powers, David M W (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63.
Ting, Kai Ming (2011). Encyclopedia of machine learning. Springer. ISBN 978-0-387-30164-8.
