
Deep Learning

CS F425
Dr. Bharat Richhariya
Department of CSIS
BITS Pilani, Pilani Campus
Lecture 2
Basics of Machine Learning
The key components
• The core components of ML problems are:
  1. The data we can learn from.
  2. A model of how to transform the data.
  3. A loss function that quantifies the badness of our model.
  4. An algorithm to adjust the model's parameters to minimize the loss.

3
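To make these four components concrete, here is a minimal sketch (not the lecture's code) that maps each of them onto a toy linear-regression problem; the data, model, and learning rate below are illustrative choices.

import numpy as np

# 1. Data: inputs x and targets y we can learn from.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])           # generated by y = 2x + 1

# 2. Model: transforms the data into predictions, here y_hat = w*x + b.
def model(x, w, b):
    return w * x + b

# 3. Loss: quantifies the badness of the model's predictions.
def squared_error(y_hat, y):
    return np.mean((y_hat - y) ** 2)

# 4. Algorithm: adjust the parameters w, b to reduce the loss
#    (one gradient-descent step; more on this in the later slides).
w, b, lr = 0.0, 0.0, 0.1
y_hat = model(x, w, b)
w -= lr * np.mean(2 * (y_hat - y) * x)       # d(loss)/dw
b -= lr * np.mean(2 * (y_hat - y))           # d(loss)/db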
The key components: Data
• Generally, we are concerned with a collection of examples.
• Each example typically consists of a collection of numerical attributes called features (also known as the inputs, covariates, or independent variables).
• A special feature is designated as the prediction target (sometimes called the label or dependent variable).
• We describe the (constant) length of these feature vectors as the dimensionality of the data.

4
The key components: Data
• The more data we have, the more powerful
models we can train.
• Besides having lots of data and processing it
cleverly, we need the right data.
• A garbage dataset results in a garbage model: a model with poor predictive performance.
• Failure can also occur when the data does not
represent all the groups equally and reflects
societal prejudices.

5
The key components: Model
• A model is computational machinery for ingesting data of one type and producing predictions of a possibly different type.
• In particular, we are interested in statistical models that can be
estimated from data.
• Deep learning is characterized by a set of powerful models that consist of many successive transformations of the data, chained together from top to bottom.

6
The key components: Objective
functions
The concept of learning means improving at some task.
• Q: What constitutes an improvement?
A: Objective (or loss) function.
o The objective function is a formal mathematical system that measures how
good (or bad) our models are.
• Conventionally, the objective function is defined so that lower is better, and it is commonly referred to as the loss or cost function.
• Common objective functions:
  o Squared error, which is used to predict numerical values.
  o Error rate, which is used for classification problems.

7
The key components: Objective
functions
• The loss function is defined with respect to the model’s parameters and depends upon
the dataset.
• The best values of our model’s parameters are learned by minimizing the loss incurred
on a training set.
• However, doing well on the training data does not guarantee that we will do well on
(unseen) test data.
• Typically we split the available data into two (sometimes three) partitions:
1. Training data for fitting model parameters.
2. Test data which is held out for evaluation.
• Such a data split lets us report the following two quantities:
  o Training error: The error on the data on which the model was trained.
  o Test error: The error incurred on an unseen test set.
8
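As an illustration of this split, the sketch below uses scikit-learn's train_test_split on synthetic data; the linear model and the made-up dataset are stand-ins, not the lecture's example.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # synthetic features
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# 1. Training data for fitting model parameters; 2. held-out test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
train_error = np.mean((model.predict(X_train) - y_train) ** 2)  # training error
test_error = np.mean((model.predict(X_test) - y_test) ** 2)     # test error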
The key components: Optimization algorithms

[Figure: the loss plotted against the value of a parameter]

• Given a dataset and its representation, a model, and an objective function, we need
an optimization algorithm.
o An optimization algorithm is an algorithm that is capable of searching for the
best possible parameters for minimizing the loss function.
• The most popular optimization algorithms for neural networks follow an approach called gradient descent.
• In the gradient descent approach, at each step:
  o Check, for each parameter, which way the training-set loss would move if that parameter were perturbed by just a small amount.
  o Then update the parameter in the direction that reduces the loss.
9
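The perturbation idea in the bullets above can be written out directly. The sketch below is an illustrative finite-difference version of gradient descent (not the lecture's code): it nudges each parameter slightly, checks which way the loss moves, and steps downhill.

import numpy as np

def loss_fn(params):
    # A stand-in loss with its minimum at params = (3, -1).
    return np.sum((params - np.array([3.0, -1.0])) ** 2)

params, lr, eps = np.zeros(2), 0.1, 1e-6
for step in range(100):
    grad = np.zeros_like(params)
    for i in range(len(params)):
        nudged = params.copy()
        nudged[i] += eps                     # perturb one parameter a small amount
        grad[i] = (loss_fn(nudged) - loss_fn(params)) / eps
    params -= lr * grad                      # update in the loss-reducing direction
print(params)                                # approaches [3., -1.]

In practice, deep learning frameworks compute these gradients analytically with automatic differentiation rather than by perturbation.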
Gradient

10
BITS Pilani
Pilani Campus

Kinds of Machine Learning: Supervised Learning
Supervised Learning
• It addresses the task of predicting targets given inputs.
• The targets (or labels) are denoted by y.
• The input data (features or covariates) are denoted by x.
• Each (input, target) pair is called an example or instance.
• A dataset is a collection of n instances.
• Our goal is to produce a model that maps any input x to a prediction ŷ.

12
Example: Iris dataset

Sepal length | Sepal width | Petal length | Petal width | Species
5.1          | 3.5         | 1.4          | 0.2         | setosa
4.9          | 3.0         | 1.4          | 0.2         | setosa
7.0          | 3.2         | 4.7          | 1.4         | versicolor
6.4          | 3.2         | 4.5          | 1.5         | versicolor
7.2          | 3.6         | 6.1          | 2.5         | virginica
6.5          | 3.2         | 5.1          | 2.0         | virginica

• The first four columns are the features (or covariates); the Species column holds the labels; each row is one instance (example).

13
Supervised Learning
• The supervision comes into play for choosing the parameters θ.

• We (the supervisors) provide the model with a dataset consisting of labeled examples (x_i, y_i), where each example x_i is matched with the correct label y_i.

• In probabilistic terms, we typically are interested in estimating the conditional probability P(y | x).

• The majority of successful applications of machine learning are supervised,


because many problems can be described as estimating the probability of
something unknown given a particular set of available data.
14
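As a small sketch of estimating P(y | x), the code below (assuming scikit-learn is available; the choice of logistic regression is ours, not the lecture's) fits a classifier to the Iris data from the earlier slide and reads off per-class probabilities.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)            # features x, labels y
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(X[:1]))              # estimated P(y | x) for each of the 3 species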
Supervised Learning

15
Supervised Learning

16
Regression
• Predicting the rating that a user will assign to a movie.
• Predicting the length of stay for patients in the hospital.
• Predicting the price of Bitcoin for the next week
• Predicting the price for which a house will sell.
• When our targets (labels) take on arbitrary values in some range, we call this a regression
problem.
• Our goal is to produce a model whose predictions closely approximate the actual target values.
• We try to learn models that minimize the distance between our predictions and the observed
values.
• We focus on one of two very common losses:
  o L1 loss: l(y, ŷ) = |y − ŷ|
  o L2 (least mean squares) loss: l(y, ŷ) = (y − ŷ)²

17
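Written out in code, the two losses are one-liners; this sketch uses NumPy, and the function names are ours.

import numpy as np

def l1_loss(y_hat, y):
    return np.abs(y_hat - y)                 # |y - y_hat|

def l2_loss(y_hat, y):
    return (y_hat - y) ** 2                  # (y - y_hat)^2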
Classification
• Our model looks at a feature vector and then predicts which category (class), among some discrete set of options, the example belongs to.
• The simplest form of classification is when there are only two classes: binary classification.
• In regression, we sought a regressor to output a real value ŷ.
• In classification, we seek a classifier, whose output is the predicted class
assignment.
• When we have more than two possible classes, we call the problem multiclass
classification.
• The common loss function for classification problems is called cross-entropy.
18
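A minimal sketch of cross-entropy for a single example, assuming the model outputs a vector of class probabilities (the numbers below are made up):

import numpy as np

def cross_entropy(probs, label):
    # Negative log of the probability assigned to the true class.
    return -np.log(probs[label])

print(cross_entropy(np.array([0.7, 0.2, 0.1]), 0))   # small loss: confident and correct
print(cross_entropy(np.array([0.1, 0.2, 0.7]), 0))   # large loss: confident and wrong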
Tagging
• Some classification problems do not fit neatly into the
binary or multiclass classification setups.
• The problem of learning to predict classes that are not
mutually exclusive is called multi-label classification.
• Tagging problems are typically best described as multi-
label classification problems.
• For example, given an image containing several animals, the model can say it depicts a cat, a dog, a donkey, and a rooster.

19
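A sketch of the multi-label setup: each tag gets an independent yes/no probability (a sigmoid per tag rather than one softmax over all tags), so several tags can be active at once. The tag names and scores below are illustrative.

import numpy as np

tags = ["cat", "dog", "donkey", "rooster"]
logits = np.array([2.1, 1.3, 0.8, -1.5])     # one score per tag, not mutually exclusive
probs = 1 / (1 + np.exp(-logits))            # sigmoid per tag
predicted = [t for t, p in zip(tags, probs) if p > 0.5]
print(predicted)                             # ['cat', 'dog', 'donkey']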
Search and ranking
• In the field of information retrieval, we want to impose a ranking on a set of
items.

• Web search example: the goal is not just to determine whether a particular
page is relevant for a query, but rather, which one of the plethora of search
results is most relevant for a particular user.

• We care about the ordering of the relevant search results.

• The learning algorithm needs to produce ordered subsets of elements from a


larger set.
20
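In code, this amounts to scoring candidates and ordering the relevant subset; a minimal sketch, where the scoring function is a placeholder for a learned relevance model.

def rank(results, query, score):
    relevant = [r for r in results if score(query, r) > 0]
    return sorted(relevant, key=lambda r: score(query, r), reverse=True)

# Toy usage with a made-up scoring rule: count of overlapping words.
docs = ["deep learning basics", "cooking pasta", "learning to rank"]
overlap = lambda q, d: len(set(q.split()) & set(d.split()))
print(rank(docs, "learning rank", overlap))  # ['learning to rank', 'deep learning basics']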
Recommender Systems
• Recommender systems are another problem setting that is related to
search and ranking.
• The problems are similar insofar as the goal is to display a set of
relevant items to the user.
• The main difference is the emphasis
on personalization to specific users
in the context of recommender
systems.
• Examples: movie, retail products,
music, or news recommendation.
21
Sequence Learning
• Data containing video snippets:
• Each snippet might consist of a different number of frames.
• Our guess of what is going on in each will be better if we consider the previous or
succeeding frames.
• Machine translation: ingesting sentences in some source language and
predicting their translation in another language.
• These problems are instances of sequence learning. They require a model to
either ingest sequences of inputs or to emit sequences of outputs (or both!).
• More sequence transformations:
• Automatic Speech Recognition.
• Text to Speech.

22
BITS Pilani
Pilani Campus

Kinds of Machine Learning: Unsupervised Learning
Unsupervised Image segmentation

24
Clustering

25
Unsupervised Learning
• So far we have discussed supervised learning, wherein we feed the model a dataset containing both the features and the corresponding target values.
• Unsupervised learning: training a model to find patterns in an unlabeled dataset.
Some unsupervised learning techniques:
• Clustering: Grouping related examples.
• Principal component analysis (PCA): The process of computing the principal
components and using them to perform a change of basis on the data.
• Probabilistic graphical models: Used to describe the root causes of much of the data
that we observe.
• Generative adversarial networks (GANs): These give us a procedural way to synthesize
data, even complicated structured data like images and audio.
• Image segmentation
26
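As a small illustration of the clustering technique listed above, the sketch below groups unlabeled synthetic points with scikit-learn's KMeans; the data and the number of clusters are assumptions.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),    # two synthetic groups,
               rng.normal(5, 1, (50, 2))])   # with no labels provided
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)  # grouping related examples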
BITS Pilani
Pilani Campus

Kinds of Machine Learning: Reinforcement Learning
Reinforcement Learning
• Reinforcement Learning (RL) is used to develop an agent that interacts with
an environment and takes actions.
• Applications
• Robotics.
• Dialogue systems.
• AI for video games.
• The behaviour of an RL agent is governed by a policy.
• A policy is just a function that maps from observations (of the
environment) to actions.
• The goal of reinforcement learning is to produce a good policy.
28
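A toy sketch of the observation-to-action loop described above; the policy rule, the environment's response, and the reward are all made-up stand-ins.

import random

def policy(observation):
    # A policy is just a function from observations to actions.
    return "right" if observation > 0 else "left"

random.seed(0)
observation = 0.0
for step in range(5):
    action = policy(observation)             # the agent acts
    observation = random.uniform(-1.0, 1.0)  # the environment responds
    reward = 1.0 if action == "right" else 0.0
    print(step, action, reward)

The goal of RL algorithms is then to improve the policy so that the cumulative reward grows.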
Robot Locomotion

29
Atari Games

30
Reinforcement Learning

31
Reinforcement Learning

32
Reinforcement Learning

33
Reinforcement Learning
• We can cast any supervised learning problem as an RL problem.
• For a classification problem, we could create an RL agent with one action corresponding to each class.
• We could then create an environment which gives a reward that is exactly equal to the loss function from the original supervised problem.
• RL can also address many problems that supervised learning cannot.
• For example, in supervised learning we expect that the training input comes
associated with the correct label.
• But in RL, we do not assume that for each observation, the environment tells
us the optimal action.

34
Sources
• https://uvadlc.github.io/lectures/nov2022/lecture2.pdf

35
