
BLG 527E Machine Learning

FALL 2023-2024
Prepared by Assoc. Prof. Yusuf Yaslan & Assoc. Prof. Ayşe Tosun

INTRODUCTION
What is machine learning?
• Learning or inferring a functional relationship between a set of
attributes/features and associated response/output/target variables,
so that we can predict the response for any new set of attribute values.
[Rogers and Girolami, A First Course in Machine Learning]
• Machine learning is programming computers to optimize a
performance criterion using example data or past experience. [Alpaydin,
Introduction to Machine Learning]
• The goal of machine learning is to develop methods that can
automatically detect patterns in data, and then to use the uncovered
patterns to predict future data or other outcomes of interest. [Murphy,
Machine Learning]
When is learning used?
• Learning is used when:
• Human expertise does not exist (navigating on Mars),
• Humans are unable to explain their expertise (speech recognition),
• The solution changes in time (routing on a computer network),
• The solution needs to be adapted to particular cases (user biometrics),
• The problem size is too vast for our limited reasoning capabilities (calculating webpage ranks).

Lecture Notes for E. Alpaydın 2010, Introduction to Machine Learning 2e © The MIT Press (V1.0)
What We Talk About When We Talk About "Learning"
• Learning general models from data of particular examples
• Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
• Example in retail: from customer transactions to consumer behavior: people who bought "Blink" also bought "Outliers" (www.amazon.com)
• Build a model that is a good and useful approximation to the data.

Data Mining
• Retail: Market basket analysis, Customer relationship management
(CRM)
• Finance: Credit scoring, fraud detection
• Manufacturing: Control, robotics, troubleshooting
• Medicine: Medical diagnosis
• Telecommunications: Spam filters, intrusion detection
• Bioinformatics: Motifs, alignment
• Web mining: Search engines
• ...
Applications
• Supervised Learning
• Classification
• Regression
• Unsupervised Learning
• Reinforcement Learning

Classification
• Example: Credit scoring
• Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
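A minimal sketch of such a rule-based discriminant in Python; the threshold values and inputs below are invented for illustration, not taken from the slide:

# Hypothetical thresholds; in practice theta1 and theta2 are learned from data
THETA1 = 30_000   # income threshold (assumed value)
THETA2 = 10_000   # savings threshold (assumed value)

def credit_risk(income, savings):
    """IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

print(credit_risk(45_000, 15_000))  # -> low-risk
print(credit_risk(45_000, 5_000))   # -> high-risk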
Classification: Applications
• Also known as pattern recognition
• Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style
• Character recognition: Different handwriting styles
• Speech recognition: Temporal dependency
• Medical diagnosis: From symptoms to illnesses
• Biometrics: Recognition/authentication using physical and/or behavioral characteristics: face, iris, signature, etc.
• ...
Face Recognition
[Figure: training examples of a person and test images, ORL dataset, AT&T Laboratories, Cambridge UK]
Regression
• Example: Price of a used car
• x: car attributes, y: price
• Linear model: y = wx + w0
• General model: y = g(x | θ), where g(·) is the model and θ its parameters
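A minimal sketch of fitting the linear model y = wx + w0 by least squares in NumPy; the used-car data below (age in years vs. price) is invented for illustration:

import numpy as np

# Hypothetical used-car data: x = age in years, y = price
x = np.array([1, 2, 3, 5, 7, 9], dtype=float)
y = np.array([24000, 21500, 19000, 15500, 12000, 9500], dtype=float)

# Solve for theta = (w, w0) minimizing the squared error of y = w*x + w0
A = np.vstack([x, np.ones_like(x)]).T
w, w0 = np.linalg.lstsq(A, y, rcond=None)[0]

print(f"g(x | theta) = {w:.1f} * x + {w0:.1f}")
print("predicted price of a 4-year-old car:", round(w * 4 + w0))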

Regression Applications
• Navigating a car: Angle of the steering wheel
• Kinematics of a robot arm: given a target position (x, y), predict the joint angles α1 = g1(x, y) and α2 = g2(x, y)
• Response surface design


Supervised Learning: Use Cases
• Prediction of future cases: Use the rule to predict the output for
future inputs
• Knowledge extraction: The rule is easy to understand
• Compression: The rule is simpler than the data it explains
• Outlier detection: Exceptions that are not covered by the rule, e.g.,
fraud

Unsupervised Learning
• Learning “what normally happens”
• No output
• Clustering: Grouping similar instances (a minimal sketch follows after this list)
• Example applications
• Customer segmentation in CRM
• Image compression: Color quantization
• Bioinformatics: Learning motifs
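A minimal k-means sketch in pure NumPy, run on two synthetic blobs that stand in for real data (in practice a library implementation such as scikit-learn's KMeans would be used):

import numpy as np

rng = np.random.default_rng(0)
# Two synthetic clusters of 2-D points (stand-in data)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(3.0, 0.5, (50, 2))])

k = 2
centers = X[rng.choice(len(X), k, replace=False)]  # random initial centers
for _ in range(10):                                # fixed iteration budget
    # Assign every point to its nearest center
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    # Move each center to the mean of its assigned points
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print("cluster centers:\n", centers.round(2))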

Reinforcement Learning
• Learning a policy: A sequence of outputs
• No supervised output but delayed reward
• Credit assignment problem
• Game playing
• Robot in a maze
• Multiple agents, partial observability, ...
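A tiny tabular Q-learning sketch to make these ideas concrete: an agent walks a made-up 5-state corridor where only the final state pays a reward, so earlier actions must receive credit through bootstrapped updates. The environment, constants, and hyperparameters are all assumptions for illustration:

import numpy as np

rng = np.random.default_rng(1)
N_STATES, GOAL = 5, 4              # corridor of states 0..4; reward only at state 4
ACTIONS = (-1, +1)                 # move left / move right
Q = np.zeros((N_STATES, len(ACTIONS)))
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate

for episode in range(200):
    s = 0
    while s != GOAL:
        # epsilon-greedy policy with random tie-breaking
        if rng.random() < eps:
            a = int(rng.integers(2))
        else:
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if s_next == GOAL else 0.0   # delayed reward
        # credit assignment: earlier states bootstrap from later value estimates
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))  # 'move right' (column 1) should dominate in every state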

Components of a PR System
A basic pattern classification system contains
• A sensor, preprocessing and feature extraction mechanism (manual or automated)
• Dimensionality reduction step
• A classification (regression, clustering, description) algorithm
• Model selection mechanism (Cross validation or bootstrap)
• A set of examples (training set) already classified or described

Images from https://psi.engr.tamu.edu/courses/
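These components map naturally onto a processing pipeline. A sketch using scikit-learn, which is an assumption of convenience rather than anything the course prescribes: standardization stands in for preprocessing, PCA for the dimensionality reduction step, logistic regression for the classification algorithm, and cross-validation for model selection:

from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)       # stand-in for an already-labeled training set

pipe = Pipeline([
    ("scale", StandardScaler()),        # preprocessing / feature extraction
    ("reduce", PCA(n_components=2)),    # dimensionality reduction step
    ("classify", LogisticRegression()), # classification algorithm
])

# Model selection mechanism: 5-fold cross-validation
scores = cross_val_score(pipe, X, y, cv=5)
print("accuracy per fold:", scores.round(3))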


Pattern recognition: Features and patterns (1)
• A feature is any distinctive aspect, quality, or characteristic
• Features may be symbolic (e.g., color) or numeric (e.g., height)
• The combination of d features is represented as a d-dimensional column vector called a feature vector
• The d-dimensional space defined by the feature vector is called the feature space
• Objects are represented as points in feature space; plotting them this way is called a scatter plot
• A pattern is a composite of traits or features characteristic of an individual
• In classification tasks, a pattern is a pair of variables {x, r} where
• x is a collection of observations or features (feature vector)
• r is the concept behind the observation (label) (sometimes we will use t instead of r)
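A small sketch of these definitions in NumPy, using invented fish measurements: each pattern is a pair {x, r} of a d-dimensional feature vector and its label:

import numpy as np

# d = 2 features: (length, average scale intensity); values are hypothetical
X = np.array([[62.0, 0.31],   # a salmon
              [58.0, 0.28],   # a salmon
              [81.0, 0.74],   # a sea bass
              [77.0, 0.69]])  # a sea bass
r = np.array([0, 0, 1, 1])    # labels: 0 = salmon, 1 = sea bass

print("feature space dimension d =", X.shape[1])
print("first pattern {x, r}:", X[0], "->", r[0])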
Features and patterns (2)
• What makes a “good” feature vector?
• The quality of a feature vector is related to its ability to discriminate
examples from different classes
• Examples from the same class should have similar feature values
• Examples from different classes should have different feature values
Feature Selection (Salmon vs. Sea Bass Recognition Problem)
[Figures: class distributions for length alone, for average scale intensity alone, and for length and average scale intensity combined]

Model Selection (Salmon vs. Sea Bass Recognition Problem)
[Figure: a linear discriminant function (performance: 95.7%) vs. a nonlinear (neural network) function (performance: 99.9%)]
Which model should we use? The answer depends on training vs. test performance, not training performance alone.
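A sketch of why training performance alone is misleading, on invented 1-D data: a flexible degree-9 polynomial (standing in for the nonlinear model) typically beats a straight line on the training set but loses on held-out test points:

import numpy as np

rng = np.random.default_rng(2)
true_f = lambda x: np.sin(2 * np.pi * x)   # hypothetical true relationship
x_train = rng.uniform(0, 1, 15)
y_train = true_f(x_train) + rng.normal(0, 0.2, 15)
x_test = rng.uniform(0, 1, 200)
y_test = true_f(x_test) + rng.normal(0, 0.2, 200)

for degree in (1, 9):                      # a line vs. a flexible polynomial
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")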
Resources: Journals
• Journal of Machine Learning Research
• Machine Learning
• Neural Computation
• Neural Networks
• IEEE Transactions on Neural Networks
• IEEE Transactions on Pattern Analysis and Machine Intelligence
• Annals of Statistics
• Journal of the American Statistical Association
• ...
Resources: Conferences
• International Conference on Machine Learning (ICML)
• European Conference on Machine Learning (ECML)
• Neural Information Processing Systems (NeurIPS, formerly NIPS)
• Uncertainty in Artificial Intelligence (UAI)
• Computational Learning Theory (COLT)
• International Conference on Artificial Neural Networks (ICANN)
• International Conference on AI & Statistics (AISTATS)
• International Conference on Pattern Recognition (ICPR)
• ...

Resources: Reference Books
• S. Rogers and M. Girolami, A First Course in Machine Learning,
Chapman & Hall.
• P. Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of Data, Cambridge University Press.
• E. Alpaydin, Introduction to Machine Learning, MIT Press.
• J. Watt, R. Borhani, A. K. Katsaggelos, Machine Learning Refined, 2nd
Edition, Cambridge University Press.
• M. P. Deisenroth, A. A. Faisal, and C. S. Ong, Mathematics for Machine
Learning, Cambridge University Press.
Resources: Lectures
Yaser S. Abu-Mostafa: http://work.caltech.edu/telecourse.htm

Andrew Ng: http://www.academicearth.org/courses/machine-learning
http://see.stanford.edu/see/lecturelist.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1

Iain Murray and Arno Onken: https://mlpr.inf.ed.ac.uk/2021/

Tom Mitchell: http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml

Anil Jain: http://ocw.korea.edu/ocw/college-of-engineering/introduction-to-pattern-recognition

Nando de Freitas: https://www.cs.ubc.ca/~nando/540-2013/lectures.html

Dmitry Kobak: Introduction to Machine Learning (YouTube)

Gilbert Strang: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning, Spring 2018 (YouTube)
HANDOUT (Tentative)

Grading
• Homeworks: 30% (there will be 3 HWs)
• Term Project: 20% (group project with at most 2-3 members; report and source codes)
• Midterm: 25%
• Final exam: 25%
• Attendance: 70%

Week  Date        Topic                                                  Assignments/Exams
1     15.02.2024  Introduction, mathematical preliminaries               Read https://homepages.inf.ed.ac.uk/sgwater/teaching/general/probability.pdf
                  & Supervised Learning
2     22.02.2024  Linear Regression, Maximum Likelihood Estimation
3     29.02.2024  Maximum Likelihood Estimation,                         HW1 (MLE, MAP) announced; Term Project announced
                  Bayesian Decision Theory
4     07.03.2024  Parametric Methods
5     14.03.2024  Multivariate Methods
6     21.03.2024  Dimensionality Reduction (PCA, LDA, Factor Analysis)
7     28.03.2024  Clustering (K-Means, Hierarchical)                     HW2 (PCA, LDA) announced
8     04.04.2024  Midterm; Clustering (EM)
9     18.04.2024  Nonparametric Methods (Brief Intro),
                  Decision Trees, Random Forest
10    25.04.2024  Linear Discrimination, Multilayer Perceptron           HW3 (MLP) announced
11    02.05.2024  Multilayer Perceptron
12    09.05.2024  VAE, GANs, GNNs
13    16.05.2024  Kernel Machines (SVM)
14    23.05.2024  Assessing and Comparing Classification Algorithms
TBD               FINAL
