Lecture 02 - Warming-Up and Data and Features - Plain
CS771: Intro to ML
Keep in mind: ML is like an exam
It’s the performance on the D-day which matters
In an exam, our success is measured based on how well we did on the questions in
the test (not on the questions we practiced on)
Likewise, in ML, the success of the learned model is measured based on how well it
predicts/fits the future test data (not the training data). Plus, of course, other
issues such as fairness also matter.
A Loose Taxonomy of ML

ML methods can be loosely divided by the kind of data they learn from:
Learning using labeled data (supervised learning)
Learning using unlabeled data (unsupervised learning)
Reinforcement Learning (RL)

“Labeled” means that, during training, for each input, the corresponding output is
available (i.e., the learner is explicitly told that a cat image is of a cat).

Note: RL doesn’t use “labeled” or “unlabeled” data in the traditional sense! In RL,
an agent learns via its interactions with an environment.

Many other specialized flavors of ML also exist, some of which include:
Semi-supervised Learning
Active Learning
Transfer Learning
Multitask Learning
Imitation Learning (somewhat related to RL)
Zero-Shot Learning
Few-Shot Learning
Continual Learning
A Typical Supervised Learning Workflow

Note: This example is for the problem of binary classification, a supervised
learning problem.

Labeled training data (a set of images, each labeled “cat” or “dog”) is passed
through “feature” extraction, and the ML algorithm (which outputs a “model”) learns
from the extracted features. At test time, a test image goes through the same
“feature” extraction, and the learned cat-vs-dog prediction model produces the
predicted label (cat/dog).

Feature extraction converts raw inputs to a numeric representation that the ML
algo can understand and work with. More on feature extraction later.

Question: Is feature extraction done “manually” as a pre-processing step before the
ML algo starts working? Can’t we “automate” this part? Can’t we “learn” good
features directly from raw inputs?

Indeed. Deep Learning algos do precisely that (feature + model learning).
More on Deep Learning later.
Pic credits: https://www.pinclipart.com/, http://www.pngtree.com
A Typical Unsupervised Learning Workflow

Note: This example is for the problem of data clustering, an unsupervised learning
problem. Here the input is unlabeled data (inputs without any output labels).
Geometric Perspective

Recall that feature extraction converts inputs into a numeric representation.

Basic fact: Inputs in ML problems can often be represented as points or vectors in
some vector space.

Dimensionality Reduction: An unsupervised learning problem. The goal is to compress
the size of each input without losing much information present in the data.
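As a concrete sketch of this geometric view, the snippet below represents inputs as vectors and compresses them with a simple PCA-style projection. All data here is synthetic, purely for illustration:

```python
import numpy as np

# Made-up data: 100 inputs, each a 50-dimensional feature vector
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))

# Center the data, then project onto the top-k right singular vectors
# (this is the basic idea behind PCA, a dimensionality reduction method)
k = 5
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T  # compressed representation: 100 inputs, each now 5-dimensional

print(Z.shape)  # (100, 5)
```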
Perspective as Function Approximation

Supervised Learning (“predict output given input”) can usually be thought of as
learning a function f that maps each input to the corresponding output, e.g., a
function that maps an image to p(label=“cat” | image).

Unsupervised learning can be viewed this way too, but it is harder since we don’t
know the labels in this case.
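As a minimal sketch of this view (not part of the lecture’s methods), below a linear function f is fit from input-output pairs by least squares; the “true” function and data are made up for illustration:

```python
import numpy as np

# Toy supervised data: learn f mapping a 3-D input x to a scalar output y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])           # made-up "true" function
y = X @ true_w + 0.01 * rng.normal(size=200)  # outputs with a little noise

# Approximate f with a linear function f(x) = x @ w, fit by least squares
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Prediction for a new input; should be close to 2 - 1 + 0.5 = 1.5
x_new = np.array([1.0, 1.0, 1.0])
print(x_new @ w)
```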
Data and Features

Features represent the semantics of the inputs. Being able to extract good features
is key to the success of ML algos.

Example (bag-of-words for text data): each sentence is represented as a binary
vector (each feature is a binary value, denoting the presence or absence of a
word). BoW is also called the “unigram” representation.
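A minimal bag-of-words sketch; the sentences and vocabulary below are made up for illustration:

```python
# Tiny made-up corpus
sentences = ["the cat sat", "the dog ran", "a cat and a dog"]

# Build the vocabulary from all words seen in the corpus
vocab = sorted({w for s in sentences for w in s.split()})

# Each sentence -> binary vector: 1 if the word is present, else 0
def bow_vector(sentence):
    words = set(sentence.split())
    return [1 if w in words else 0 for w in vocab]

for s in sentences:
    print(s, "->", bow_vector(s))
```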
Example: Feature Extraction for Image Data

A very simple feature extraction approach for image data is flattening (stacking
all the pixel values into a single long vector).

A histogram of visual patterns is another popular feature extraction method for
images.

Many other manual feature extraction techniques have been developed in the computer
vision and image processing communities (SIFT, HoG, and others).
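A minimal sketch of flattening, plus a crude pixel-intensity histogram, on a made-up image (real visual-pattern histograms are more involved):

```python
import numpy as np

# A made-up 7x7 grayscale "image" of pixel intensities in [0, 255]
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(7, 7))

# Flattening: concatenate the rows into a single 49-dimensional feature vector
x = img.flatten()
print(x.shape)  # (49,)

# A crude histogram feature: counts of pixel intensities in 8 bins
hist, _ = np.histogram(img, bins=8, range=(0, 256))
print(hist.sum())  # 49, since every pixel falls in exactly one bin
```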
Pic credit: cat.uab.cat/Research/object-recognition
Feature Selection

Not all the extracted features may be relevant for learning the model (some may
even confuse the learner).

Feature selection (a step after feature extraction) can be used to identify the
features that matter, and discard the others, for more effective learning.

Example: to predict body-mass index (BMI) from the features {age, gender, height,
weight, eye color}, only height and weight are relevant. (Calculating BMI from this
data doesn’t require ML; this simple example is just to illustrate the idea of
feature selection.)

Feature selection is more common in supervised learning but can also be done for
unsupervised learning.
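One simple way to select features (illustrative, not the only one) is to keep those that correlate strongly with the output; the data and the threshold below are made up:

```python
import numpy as np

# Made-up data: 5 features, but only features 2 and 3 actually drive the output
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 2] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=500)

# Heuristic: keep features whose absolute correlation with y exceeds a threshold
corrs = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(5)])
selected = np.where(corrs > 0.3)[0]
print(selected)  # indices of the features that matter (here, 2 and 3)
```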
Some More Postprocessing: Feature Scaling

Even after feature selection, the features may not be on the same scale.

This can be problematic when comparing two inputs: features that have larger scales
may dominate the result of such comparisons.

It is therefore helpful to standardize the features (e.g., by bringing all of them
onto the same scale, such as between 0 and 1).
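A minimal min-max scaling sketch on made-up data with two features on very different scales:

```python
import numpy as np

# Made-up data: feature 0 (e.g., age in years) vs feature 1 (e.g., income)
X = np.array([[25.0, 50000.0],
              [35.0, 80000.0],
              [45.0, 20000.0]])

# Min-max scaling: bring every feature into the [0, 1] range
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_scaled = (X - X_min) / (X_max - X_min)
print(X_scaled)  # every column now ranges from 0 to 1
```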
[Figure: a deep network, in which the raw input passes through a feature learning
module (one or more layers); the learned features (penultimate layer) then feed
into a classification model learning module.]

Pic an adaptation of the original from: https://deepai.org/
Some Notation/Nomenclature/Convention

Supervised learning requires training data as input-output pairs
{(x_1, y_1), ..., (x_N, y_N)}.

Unsupervised learning requires training data as inputs only {x_1, ..., x_N}.
(RL and other flavors of ML problems also use similar notation.)

Each input x_n is (usually) a vector containing the values of the features (or
attributes, or covariates) that encode properties of the object it represents.
For example, for a 7 × 7 image, x_n can be a 49 × 1 vector of pixel intensities.

The size or length D of the input is commonly known as the data/input
dimensionality or feature dimensionality.

(In supervised learning) Each y_n is the output (or response, or label) associated
with input x_n (and its value is known for the training inputs). The output can be
a scalar, a vector of numbers, or even a structured object (more on this later).
Types of Features and Types of Outputs
Features as well as outputs can be real-valued, binary, categorical, ordinal, etc.
Real-valued: Pixel intensity, house area, house price, rainfall amount, temperature, etc.
Categorical/Discrete: Zipcode, blood group, or any “one from finitely many choices” value
Ordinal: Grade (A/B/C etc.) in a course, or any other type where relative values matter
Often, the features can be of mixed types (some real, some categorical, some ordinal, etc.)
Some Basic Operations on Inputs

Assume each input feature vector x_n to be of size D.

Given N inputs x_1, ..., x_N, their average (or mean) can be computed as
mu = (1/N) * sum_{n=1}^{N} x_n. What does such a “mean” represent?
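The mean computation can be sketched as follows (data made up for illustration):

```python
import numpy as np

# N = 4 made-up inputs, each a D = 3 dimensional feature vector (one per row)
X = np.array([[1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0],
              [2.0, 2.0, 2.0],
              [6.0, 2.0, 6.0]])

# Mean input: mu = (1/N) * sum_n x_n, computed feature-wise
mu = X.mean(axis=0)
print(mu)  # [3. 2. 3.]
```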