Week 01
Machine Learning
Fall 2023
COURSE INFORMATION
Course Number and Title: EC-452 Machine Learning
Credits: 3-0
The three types of machine learning:
• Supervised Learning: labelled data, direct feedback; predict an outcome/the future (e.g., classification)
• Unsupervised Learning: no labels/targets, no feedback; find hidden structure in data
• Reinforcement Learning: decision process, reward system; learn a series of actions
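As a minimal sketch of the supervised case above (labelled data in, a predictive model out), here is a nearest-centroid classifier on made-up one-dimensional data; the data values and class names are purely illustrative:

```python
# Minimal sketch of the supervised-learning loop: labelled training data
# provides the "direct feedback"; the fitted model then predicts labels.

def fit_centroids(xs, ys):
    """Compute the mean feature value per class label."""
    centroids = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(vals) / len(vals)
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

# Labelled training data (illustrative): feature value -> class label
xs = [1.0, 1.2, 0.8, 5.0, 5.5, 4.9]
ys = ["small", "small", "small", "large", "large", "large"]

model = fit_centroids(xs, ys)
print(predict(model, 1.1))  # "small": closest to the "small" centroid
print(predict(model, 5.2))  # "large": closest to the "large" centroid
```

Unsupervised learning would drop the `ys` list entirely and look for structure (e.g., clusters) in `xs` alone.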
Machine Learning Algorithm
• Target: synonymous with outcome, ground truth, output, response variable, dependent variable, (class) label (in classification)
• Output / Prediction: use this to distinguish from targets; here, it means the output from the model
Common Understanding (Jargons)
• Can you identify the features and examples in the following data?
Common Understanding (Jargons)
• Supervised learning:
• Learn a function to map input x (features) to output y (targets)
• Structured data:
• Databases, spreadsheets/csv files
• Unstructured data:
• Features like image pixels, audio signals, text sentences (before DL, extensive feature engineering was required)
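To make the structured/unstructured distinction concrete, here is a small sketch: a structured record already is a feature vector, while unstructured text must first be engineered into one (here via bag-of-words counts over a made-up toy vocabulary; all values are illustrative assumptions):

```python
# Structured data: each column is already a feature.
structured_row = {"age": 34, "income": 52000}     # illustrative record
x_structured = [structured_row["age"], structured_row["income"]]

# Unstructured data: a sentence must be engineered into features first,
# e.g. bag-of-words counts over a fixed (here, hypothetical) vocabulary.
vocab = ["machine", "learning", "data"]

def bag_of_words(sentence, vocab):
    tokens = sentence.lower().split()
    return [tokens.count(word) for word in vocab]

x_text = bag_of_words("Machine learning needs data data", vocab)
print(x_text)  # [1, 1, 2]: one "machine", one "learning", two "data"
```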
Common Understanding (Jargons)
• Unstructured data
Supervised Learning
A Roadmap for Building Machine Learning Systems

[Figure: the machine learning pipeline, in four stages]
• Preprocessing: raw data and labels are prepared via feature extraction and scaling, feature selection, dimensionality reduction (mostly not needed in DL), and sampling; the data is split into a training dataset and a test dataset.
• Learning: a learning algorithm is fit on the training dataset, guided by model selection, cross-validation, performance metrics, and hyperparameter optimization.
• Evaluation: the final model is evaluated on the held-out test dataset and its labels.
• Prediction: the final model predicts labels for new data.
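The outer loop of this roadmap can be sketched end to end on a toy problem. The dataset, the 80/20 split ratio, and the simple threshold "model" below are all illustrative assumptions, not part of the slides:

```python
import random

# Sketch of the roadmap: split raw data into training and test sets,
# learn a simple model on the training set, evaluate on the held-out
# test set, then use the final model for prediction.

random.seed(0)
raw = [(float(i), 1 if i >= 10 else 0) for i in range(20)]  # (feature, label)
random.shuffle(raw)

# Sampling: 80/20 train/test split
split = int(0.8 * len(raw))
train, test = raw[:split], raw[split:]

# Learning: place a decision threshold midway between the class means
mean0 = sum(x for x, y in train if y == 0) / sum(1 for _, y in train if y == 0)
mean1 = sum(x for x, y in train if y == 1) / sum(1 for _, y in train if y == 1)
threshold = (mean0 + mean1) / 2

# Evaluation: accuracy on the unseen test set
predict = lambda x: 1 if x >= threshold else 0
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(accuracy)
```

The key point the figure makes is that the test dataset is never touched during preprocessing or learning; it is reserved for the evaluation stage.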
Supervised Learning (Notation)
Training examples: ⟨x[i], y[i]⟩, i = 1, …, n
Hypothesis: h(x) = ŷ
Classification: h : ℝm → 𝒴, 𝒴 = {1, …, k}
Regression: h : ℝm → ℝ
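The two hypothesis types can be written as function signatures: classification maps an m-dimensional feature vector to one of k discrete labels, regression maps it to a real number. Both bodies below are hypothetical placeholder models, chosen only to make the signatures concrete:

```python
def h_classification(x: list[float]) -> int:
    """h : R^m -> {1, ..., k}; here k = 2, thresholding the first feature."""
    return 2 if x[0] > 0 else 1

def h_regression(x: list[float]) -> float:
    """h : R^m -> R; here a fixed (made-up) linear function w.x + b."""
    w, b = [0.5, -1.0], 2.0
    return sum(wi * xi for wi, xi in zip(w, x)) + b

print(h_classification([3.0, 1.0]))  # 2   (a class label)
print(h_regression([3.0, 1.0]))      # 2.5 (a real value)
```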
Data Representation

Feature vector (a single example): x = [x1, x2, …, xm]ᵀ ∈ ℝm
The i-th training example: x[i] = [x1[i], x2[i], …, xm[i]]ᵀ

Design matrix X ∈ ℝn×m, one training example per row:

      ⎡ x1[1]  x2[1]  ⋯  xm[1] ⎤
  X = ⎢ x1[2]  x2[2]  ⋯  xm[2] ⎥
      ⎢   ⋮      ⋮    ⋱    ⋮   ⎥
      ⎣ x1[n]  x2[n]  ⋯  xm[n] ⎦

m = number of features
n = number of training examples
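The notation can be made concrete with a tiny design matrix (the values are made up): n = 3 training examples as rows, m = 2 features as columns.

```python
# Design matrix X: one training example per row, one feature per column.
X = [[1.0, 4.0],   # x[1]
     [2.0, 5.0],   # x[2]
     [3.0, 6.0]]   # x[3]

n, m = len(X), len(X[0])
print(n, m)                        # 3 2 -> n examples, m features

x_i = X[1]                         # the 2nd training example, x[2]
feature_2 = [row[1] for row in X]  # the 2nd feature across all examples
print(x_i)                         # [2.0, 5.0]
print(feature_2)                   # [4.0, 5.0, 6.0]
```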
Hypothesis Space

Nested view, from largest to smallest:
• The entire hypothesis space
• The hypothesis space a particular learning algorithm category has access to
• The hypothesis space a particular learning algorithm can sample
• A particular hypothesis (i.e., a model/classifier)
Classes of Machine Learning Algorithms
Below are some classes of algorithms that we are going to discuss in
this class:
• Generalized linear models (e.g., logistic regression)
• Support vector machines (e.g., linear SVM, RBF-kernel SVM)
• Artificial neural networks (e.g., multi-layer perceptrons)
• Tree- or rule-based models (e.g., decision trees)
• Graphical models (e.g., Bayesian networks)
• Ensembles (e.g., Random Forest)
• Instance-based learners (e.g., K-nearest neighbors)
Algorithm Categorization Schemes
• Eager vs lazy learners
• Eager learners process the training data immediately.
• Lazy learners defer processing until prediction time, e.g., the nearest neighbor algorithm.
• Batch vs online learning
• In batch learning, the model is learned on the entire set of training examples.
• Online learners, in contrast, learn from one training example at a time.
• In practical applications, it is common to learn a model via batch learning and then update it later using online learning.
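The batch/online distinction can be sketched on a one-parameter model y = w·x fit by least squares. The data, learning rate, and number of passes are illustrative assumptions:

```python
# Batch vs online learning for y = w * x, minimizing squared error.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (x, y) pairs, roughly y = 2x

# Batch learning: use the entire training set at once (closed form here)
w_batch = sum(x * y for x, y in data) / sum(x * x for x, _ in data)

# Online learning: update w one training example at a time (SGD step)
w, lr = 0.0, 0.05
for _ in range(200):                 # several passes for convergence
    for x, y in data:
        grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
        w -= lr * grad

print(round(w_batch, 2), round(w, 2))  # both approach ~2.0
```

This also illustrates the practical pattern from the slide: fit `w_batch` once on the full dataset, then keep applying single-example updates as new data arrives.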
• Parametric vs nonparametric models
• Parametric models are "fixed" models, where we assume a certain functional form for f(x) = y. For example, linear regression with h(x) = w1x1 + … + wmxm + b.
• Nonparametric models are more "flexible" and do not have a prespecified number of parameters. In fact, the number of parameters typically grows with the size of the training set. For example, a decision tree is a nonparametric model, where each decision node (e.g., a binary "True/False" assertion) can be regarded as a parameter.
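The contrast can be sketched on the same toy data (values made up): linear regression keeps a fixed number of parameters (w and b), while 1-nearest-neighbor stores every training example, so its effective parameter count grows with the data:

```python
# Parametric vs nonparametric models on illustrative data.
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2x exactly

# Parametric: fit y = w*x + b by least squares (always 2 parameters)
n = len(train)
mx = sum(x for x, _ in train) / n
my = sum(y for _, y in train) / n
w = sum((x - mx) * (y - my) for x, y in train) / \
    sum((x - mx) ** 2 for x, _ in train)
b = my - w * mx

# Nonparametric: 1-NN keeps the entire training set around
def knn_predict(x_new):
    nearest = min(train, key=lambda pair: abs(pair[0] - x_new))
    return nearest[1]

print(w * 2.5 + b)       # 5.0: the linear model interpolates
print(knn_predict(2.5))  # 4.0: 1-NN returns a stored target value
```

Adding more training pairs leaves (w, b) at two numbers, but enlarges the set 1-NN must store and search.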
Algorithm Categorization Schemes
• Discriminative vs generative
• Generative models (classically) describe methods that model the joint distribution P(X, Y) = P(Y)P(X|Y) = P(X)P(Y|X) for training pairs ⟨x[i], y[i]⟩.
• Discriminative models take a more "direct" approach and model P(Y|X) directly.
• While generative models typically provide more insight and allow sampling from the joint distribution, discriminative models are typically easier to compute and produce more accurate predictions.
• Discriminative modeling is like trying to extract information from text in a foreign language without learning that language.
• Generative modeling is like learning to generate text in that foreign language.
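On a small discrete dataset, both routes to P(Y|X) can be computed by counting. The (weather, play) pairs below are a made-up toy dataset; with enough data the two estimates agree, which is the point of the sketch:

```python
# Generative vs discriminative estimation of P(Y = yes | X = sunny).
data = [("sunny", "yes"), ("sunny", "yes"), ("sunny", "no"),
        ("rainy", "no"), ("rainy", "no"), ("rainy", "yes")]

def prob(pairs, cond):
    return sum(1 for p in pairs if cond(p)) / len(pairs)

# Generative: model P(Y) and P(X|Y), then invert with Bayes' rule
p_yes = prob(data, lambda p: p[1] == "yes")
yes_rows = [p for p in data if p[1] == "yes"]
no_rows = [p for p in data if p[1] == "no"]
p_sunny_given_yes = prob(yes_rows, lambda p: p[0] == "sunny")
p_sunny_given_no = prob(no_rows, lambda p: p[0] == "sunny")
joint_yes = p_yes * p_sunny_given_yes          # P(X=sunny, Y=yes)
joint_no = (1 - p_yes) * p_sunny_given_no      # P(X=sunny, Y=no)
p_yes_given_sunny_gen = joint_yes / (joint_yes + joint_no)

# Discriminative: estimate P(Y|X=sunny) directly from the sunny rows
sunny_rows = [p for p in data if p[0] == "sunny"]
p_yes_given_sunny_disc = prob(sunny_rows, lambda p: p[1] == "yes")

print(p_yes_given_sunny_gen, p_yes_given_sunny_disc)  # both 2/3 here
```

The generative route additionally yields P(X, Y), from which new (weather, play) pairs could be sampled; the discriminative route answers only the conditional question.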