Lecture 2: Basics of Machine Learning
CS F425
Dr. Bharat Richhariya, Department of CSIS
BITS Pilani, Pilani Campus
The key components
• The core components of ML problems are:
1. The data we can learn from.
2. A model of how to transform the data.
3. An objective function that quantifies how well the model is doing.
4. An optimization algorithm to adjust the model's parameters.
The key components: Data
• Generally, we are concerned with a collection of examples.
• Each example typically consists of a collection of numerical attributes called features (also known as the inputs, covariates, or independent variables).
• A special feature is designated as the prediction target (sometimes called the label or dependent variable).
• When every example is a fixed-length vector of features, we call that constant length the dimensionality of the data.
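A minimal sketch of this setup in NumPy (the values are illustrative, Iris-like measurements): a dataset is a matrix of fixed-length feature vectors, with one target per example.

```python
import numpy as np

# Each row is one example: a fixed-length vector of numerical features.
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [6.2, 3.4, 5.4, 2.3]])   # features (inputs / covariates)
y = np.array([0, 0, 2])                # prediction target (label) per example
print(X.shape[1])                      # dimensionality of the data: 4
```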
The key components: Data
• The more data we have, the more powerful
models we can train.
• Besides having lots of data and processing it
cleverly, we need the right data.
• A garbage dataset results in a garbage model, i.e., a model with poor predictive performance.
• Failure can also occur when the data does not represent all groups equally or reflects societal prejudices.
The key components: Model
• A model is computational machinery that ingests data of one type and produces predictions.
• In particular, we are interested in statistical models that can be
estimated from data.
• Deep learning is characterized by a set of powerful models that consist of many successive transformations of the data, chained together from top to bottom (hence the name deep learning).
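A minimal sketch of this idea, assuming NumPy; the weights below are random placeholders rather than learned values, and the two-step chain stands in for the many layers of a real deep model.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))      # placeholder weights for the first stage
W2 = rng.normal(size=(3, 1))      # placeholder weights for the second stage

def model(x):
    h = np.maximum(0.0, x @ W1)   # first transformation: linear map + ReLU
    return h @ W2                 # second transformation produces the prediction

x = rng.normal(size=(2, 4))       # two examples with 4 features each
print(model(x).shape)             # (2, 1): one prediction per example
```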
The key components: Objective functions
The concept of learning means improving at some task.
• Q: What constitutes an improvement?
  A: An objective (or loss) function.
  o The objective function is a formal mathematical measure of how good (or bad) our models are.
• Conventionally, the objective function is defined so that lower is better, and is commonly referred to as the loss or cost function.
• Common objective functions:
o Squared error, which is used when predicting numerical values.
o Error rate, which is used for classification problems.
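A minimal sketch of these two losses, assuming NumPy (function names are illustrative):

```python
import numpy as np

def squared_error(y_hat, y):
    # Mean squared difference: used when predicting numerical values.
    return np.mean((y_hat - y) ** 2)

def error_rate(y_hat, y):
    # Fraction of misclassified examples: used for classification.
    return np.mean(y_hat != y)

print(squared_error(np.array([2.5, 0.0]), np.array([3.0, 0.0])))  # 0.125
print(error_rate(np.array([1, 0, 1]), np.array([1, 1, 1])))       # ~0.333
```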
The key components: Objective functions
• The loss function is defined with respect to the model’s parameters and depends upon
the dataset.
• The best values of our model’s parameters are learned by minimizing the loss incurred
on a training set.
• However, doing well on the training data does not guarantee that we will do well on
(unseen) test data.
• Typically we split the available data into two (sometimes three) partitions:
1. Training data for fitting model parameters.
2. Test data, which is held out for evaluation.
• This split lets us report two quantities:
  o Training error: the error on the data on which the model was trained.
  o Test error: the error incurred on an unseen test set.
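A minimal sketch of such a split on synthetic data, assuming NumPy (the sizes and the least-squares model are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # noisy linear targets

X_train, X_test = X[:80], X[80:]   # training partition for fitting parameters
y_train, y_test = y[:80], y[80:]   # held-out test partition for evaluation

# Fit a least-squares line on the training data only.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_error = np.mean((X_train @ w - y_train) ** 2)  # training error
test_error = np.mean((X_test @ w - y_test) ** 2)     # test error
print(train_error, test_error)
```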
The key components: Optimization algorithms
[Figure: loss plotted against the value of a parameter, illustrating descent toward the minimum]
• Given a dataset and its representation, a model, and an objective function, we need
an optimization algorithm.
o An optimization algorithm is an algorithm that is capable of searching for the
best possible parameters for minimizing the loss function.
• The most popular optimization algorithms for neural networks follow an approach called gradient descent: repeatedly update each parameter by a small step in the direction that decreases the loss.
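A minimal gradient-descent sketch on a hand-picked 1-D quadratic loss (purely illustrative, not a neural-network training loop):

```python
# loss(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0            # initial parameter value
lr = 0.1           # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)   # step opposite the gradient to reduce the loss

print(w)  # close to the minimizer w = 3
```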
Example: Iris dataset
[Figure: the Iris dataset, a classic example with 150 samples, four numerical features (sepal and petal length and width), and three species classes]
Supervised Learning
• In supervised learning, each training example pairs input features with a corresponding target; the supervision comes into play when choosing the parameters θ.
Regression
• Predicting the rating that a user will assign to a movie.
• Predicting the length of stay for patients in the hospital.
• Predicting the price of Bitcoin for the next week.
• Predicting the price for which a house will sell.
• When our targets (labels) take on arbitrary values in some range, we call this a regression
problem.
• Our goal is to produce a model whose predictions closely approximate the actual target values.
• We try to learn models that minimize the distance between our predictions and the observed
values.
• We focus on one of two very common losses:
  o L1 loss: $l(y, \hat{y}) = |y - \hat{y}|$
  o L2 (least mean squares) loss: $l(y, \hat{y}) = (y - \hat{y})^2$
Classification
• Our model looks at a feature vector and predicts which category (class), among some discrete set of options, the example belongs to.
• The simplest form of classification is when there are only two classes: binary
classification.
• In regression, we sought a regressor that outputs a real value.
• In classification, we seek a classifier, whose output is the predicted class assignment.
• When we have more than two possible classes, we call the problem multiclass
classification.
• The common loss function for classification problems is called cross-entropy.
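A minimal sketch of cross-entropy for a single example, assuming NumPy and that the model already outputs class probabilities:

```python
import numpy as np

def cross_entropy(probs, label):
    # probs: predicted class probabilities for one example (they sum to 1)
    # label: index of the true class
    return -np.log(probs[label])

print(cross_entropy(np.array([0.7, 0.2, 0.1]), 0))  # low loss: confident, correct
print(cross_entropy(np.array([0.1, 0.2, 0.7]), 0))  # high loss: confident, wrong
```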
Tagging
• Some classification problems do not fit neatly into the
binary or multiclass classification setups.
• The problem of learning to predict classes that are not
mutually exclusive is called multi-label classification.
• Tagging problems are typically best described as multi-
label classification problems.
• For example, the model can say that an image depicts a cat, a dog, a donkey, and a rooster all at once.
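A minimal multi-label sketch, assuming NumPy. One common design scores each tag independently with a sigmoid (rather than a softmax over mutually exclusive classes), so several tags can be active at once; the tag names and scores below are made up.

```python
import numpy as np

tags = ["cat", "dog", "donkey", "rooster"]
scores = np.array([2.1, 0.3, -1.0, 1.5])       # hypothetical model outputs
probs = 1.0 / (1.0 + np.exp(-scores))          # sigmoid per tag, not softmax
predicted = [t for t, p in zip(tags, probs) if p > 0.5]
print(predicted)  # ['cat', 'dog', 'rooster']: tags are decided independently
```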
Search and ranking
• In the field of information retrieval, we want to impose a ranking on a set of
items.
• Web search example: the goal is not just to determine whether a particular
page is relevant for a query, but rather, which one of the plethora of search
results is most relevant for a particular user.
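A toy sketch of the ranking step, with hypothetical page names and relevance scores (a real system would learn the scores):

```python
# Order retrieved pages by a hypothetical relevance score, highest first.
results = [("page_a", 0.31), ("page_b", 0.87), ("page_c", 0.54)]
ranked = sorted(results, key=lambda item: item[1], reverse=True)
print([name for name, _ in ranked])  # ['page_b', 'page_c', 'page_a']
```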
Clustering
Unsupervised Learning
• So far we discussed supervised learning, wherein we feed the model a dataset containing both the features and the corresponding target values.
• Unsupervised learning: training a model to find patterns in an unlabeled dataset.
Some unsupervised learning techniques:
• Clustering: Grouping related examples.
• Principal component analysis (PCA): computing the principal components and using them to perform a change of basis on the data (see the sketch after this list).
• Probabilistic graphical models: Used to describe the root causes of much of the data
that we observe.
• Generative adversarial networks (GANs): These give us a procedural way to synthesize
data, even complicated structured data like images and audio.
• Image segmentation
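A minimal PCA sketch via the SVD, assuming NumPy (the data and dimensions are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

X_centered = X - X.mean(axis=0)    # center each feature
# Right singular vectors of the centered data are the principal directions.
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
k = 2
X_reduced = X_centered @ Vt[:k].T  # change of basis onto the top-k components
print(X_reduced.shape)             # (200, 2)
```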
Atari Games
Reinforcement Learning
• We can cast any supervised learning problem as an RL problem.
• For a classification problem, we could create an RL agent with one action corresponding to each class.
• We could then create an environment that gives a reward exactly equal to the loss function from the original supervised problem.
• RL can also address many problems that supervised learning cannot.
• For example, in supervised learning we expect that the training input comes
associated with the correct label.
• But in RL, we do not assume that for each observation, the environment tells
us the optimal action.
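A toy sketch of this reduction; the dataset, policy, and reward below are made up for illustration, and a real agent would learn its policy from the rewards rather than use a fixed rule.

```python
# Casting binary classification as an RL loop: the agent's action is a class
# prediction, and the environment's reward scores that action.
dataset = [((0.2, 0.9), 1), ((0.8, 0.1), 0)]   # (features, true class)

def agent(features):
    # A trivial fixed policy: predict class 1 when the second feature dominates.
    return 1 if features[1] > features[0] else 0

total_reward = 0
for observation, label in dataset:
    action = agent(observation)            # agent acts on the observation
    reward = 1 if action == label else 0   # environment scores the action
    total_reward += reward
print(total_reward)  # 2
```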
Sources
• https://uvadlc.github.io/lectures/nov2022/lecture2.pdf