
ML OVERVIEW

CSE 474/574, Fall 2023

Some of the figures are provided by Chris Bishop from his textbook "Pattern Recognition and Machine Learning".

COURSE INFORMATION

Refer to the syllabus in UBLearn


Course Information
• Register for the class on Piazza --- our main resource for discussion and
communication
• https://piazza.com/buffalo/fall2023/cse474574_2/home
• Python knowledge poll in Piazza: https://piazza.com/class/lluz3rjjhb11gg/post/6
• Assignment 0 (will be released on Friday, Sept 08) is due Sept 14 EOD
• TA information is in the Syllabus/UBLearn Course Logistics page


Key points to Remember


 Tests may be in the UBLearn controlled environment and will need a system compatible with that environment, with a
webcam installed. NO exception requests will be granted
 Come to lectures and pay attention. This will definitely make your subsequent tasks easier
 Class participation matters. If you cannot attend the live class, please contact the TAs beforehand.
 Study regularly, as I would love to see each of my students finish this course with flying colors….☺
 Revise via the textbook (immediately). The textbook references are in the syllabus; we will also mention them
again while discussing each topic in class.
 Make a habit of writing with pen and paper (typing does not help much) to make sure you have
understood a topic well
 Any time you are stuck, your first job will be to “define your problem” for us
 Your first contact can be your TA, but please never shy away from just dropping by my office hours (though
scheduling an appointment would help me set an exclusive slot for you). Just do not avoid things and
leave them till the exams.
 We are here to help you learn. So, please help me to help you.

Syllabus is Important……

Failing to follow classroom etiquette may have some untoward
consequences….

What is Machine Learning?
• It is very hard to write programs that solve problems like recognizing a face.
– We don’t know what program to write because we don’t know how our
brain does it.
– Even if we had a good idea about how to do it, the program might be
horrendously complicated.
• Instead of writing a program by hand, we collect lots of examples that specify
the correct output for a given input.
• A machine learning algorithm then takes these examples and produces a
program that does the job.
– The program produced by the learning algorithm may look very different
from a typical hand-written program. It may contain millions of numbers.
– If we do it right, the program works for new cases as well as the ones we
trained it on.
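As a concrete (and purely illustrative) sketch of this idea in Python: we give labelled digit examples to a learning algorithm and let it produce the "program". The dataset, library, and model choice below are assumptions made only for this sketch, not the course's code.

```python
# Minimal sketch: instead of hand-writing rules for digit recognition,
# we give labelled examples to a learning algorithm, which produces a
# "program" that is just a set of learned numbers (the parameters).
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)                       # examples: 8x8 digit images + labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)                                # "learning" = tuning the numbers
print("size of the learned 'program':",
      model.coef_.size + model.intercept_.size, "numbers")
print("accuracy on new cases:", model.score(X_test, y_test))
```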
What is it for?
• Automating automation
• Getting computers to program themselves
• Can we have a system learn the task from the given data, without a human explicitly
setting a set of rules to solve it?
• The future of Computer Science!!!

A classic example of a task that requires machine
learning:

It is very hard to say what makes a 2.
Far better results can be obtained by adopting a machine learning approach in which a large set of
N digits {x1,...,xN} called a training set is used to tune the parameters of an adaptive model. The
categories of the digits in the training set are known in advance, typically by inspecting them
individually and hand-labelling them. We can express the category of a digit using a target vector t,
which represents the identity of the corresponding digit. Suitable techniques for representing
categories in terms of vectors will be discussed later. Note that there is one such target vector t
for each digit image.
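As a small illustrative aside (the code itself is an assumption, not taken from the textbook), one common concrete choice for the target vector t is a 1-of-K ("one-hot") coding, with a 1 in the position of the digit's class and 0 everywhere else:

```python
import numpy as np

def one_hot(label, num_classes=10):
    # Target vector t for a digit: all zeros except a single 1 marking the class
    t = np.zeros(num_classes)
    t[label] = 1.0
    return t

print(one_hot(2))   # the digit "2" -> [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
```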
Why Learn?

• One trivial solution would be to generate and share a set of rules that are found useful
(by humans) to solve the problem.
• But in most real-life applications it is difficult to specify the rules:

How do we generate a set of exhaustive rules to identify a cat in an image?

Some more examples of tasks that are best solved by
using a learning algorithm
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual sequences of credit card transactions
– Unusual patterns of sensor readings in a nuclear power plant or
unusual sound in your car engine.
• Prediction:
– Future stock prices or currency exchange rates
How do we learn in Handwriting Recognition?
• It is very difficult to write a program for such scenarios, in which enumerating the exact rules is nearly impossible.
• What makes a 2 different from a 7?
• How does our brain do it?
• We collect examples of various handwritten digits, which specify the correct output.
• A machine learning algorithm uses these examples and produces a program that can perform the task of predicting the
correct output.
• This is appealing as Data is cheap and abundant (data warehouses, data marts);
• Customer transactions to consumer behavior: People who bought “Blink” also bought “Outliers” (www.amazon.com)
• knowledge is expensive and scarce.
• The ability to categorize correctly new examples that differ from those used for training is known as
generalization. In practical applications, the variability of the input vectors will be such that the training data can comprise
only a tiny fraction of all possible input vectors, and so generalization is a central goal in pattern recognition.
• For most practical applications, the original input variables are typically preprocessed to transform them into some
new space of variables where, it is hoped, the pattern recognition problem will be easier to solve. For instance, in the
digit recognition problem, the images of the digits are typically translated and scaled so that each digit is contained within a box
of a fixed size. This greatly reduces the variability within each digit class, because the location and scale of all the digits are
now the same, which makes it much easier for a subsequent pattern recognition algorithm to distinguish between the different
classes. This pre-processing stage is sometimes also called feature extraction.
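A minimal sketch of such pre-processing in Python (the helper below and its exact steps are illustrative assumptions, not the course's code): translate the digit by cropping to its ink, then scale it into a box of fixed size.

```python
import numpy as np
from scipy.ndimage import zoom

def normalize_digit(img, box=20):
    # img: 2D array of pixel intensities, ink > 0 on a 0 background (assumed format)
    rows = np.any(img > 0, axis=1)
    cols = np.any(img > 0, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    crop = img[r0:r1 + 1, c0:c1 + 1]          # "translate": keep only the digit itself
    # "scale": resize so every digit occupies the same box x box region
    return zoom(crop, (box / crop.shape[0], box / crop.shape[1]), order=1)
```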
A Learning Algorithm is Useful for Many Tasks…
1. Classification: Determine which discrete category the example belongs to


Is this a Panda?
Machine Learning for Recommendation System


Recommender systems: noisy data, commercial pay-off (e.g., Amazon, Netflix).
Machine Learning for Computer Vision

• Computer vision: detection, segmentation, search, etc. In a nutshell, make the computer
understand images and video
What kind of scene? Where are the cars? How far is the building?


Some web-based examples of machine learning
• The web contains a lot of data. Tasks with very big datasets often use
machine learning
– especially if the data is noisy or non-stationary.
• Spam filtering, fraud detection:
– The enemy adapts so we must adapt too.
• Recommendation systems:
– Lots of noisy data. Million dollar prize!
• Information retrieval:
– Find documents or images with similar content.
• Data Visualization:
– Display a huge database in a revealing way
What is Intelligence for?
• Reasoning
– Puzzles, Judgments
• Planning
– Action sequences
• Learning
– Improve with data
• Natural language
• Integrating skills
• Abilities to sense, act

Slide courtesy: Srihari
Machine Learning Models
• Knowledge Representation works with facts/assertions and develops rules
of logical inference. The rules can handle quantifiers. Learning and
uncertainty are usually ignored.
• Expert Systems used logical rules or conditional probabilities provided by
“experts” for specific domains.
• Graphical Models treat uncertainty properly and allow learning (but they
often ignore quantifiers and use a fixed set of variables)
– Set of logical assertions → values of a subset of the variables and local
models of the probabilistic interactions between variables.
– Logical inference → probability distributions over subsets of the
unobserved variables (or individual ones)
– Learning = refining the local models of the interactions.
ML Spectrum: Statistics --------------------- Artificial Intelligence

Statistics:
• Low-dimensional data (e.g. less than 100 dimensions)
• Lots of noise in the data
• There is not much structure in the data, and what structure there is can be represented by a fairly simple model.
• The main problem is distinguishing true structure from noise.

Artificial Intelligence:
• High-dimensional data (e.g. more than 100 dimensions)
• The noise is not sufficient to obscure the structure in the data if we process it right.
• There is a huge amount of structure in the data, but the structure is too complicated to be represented by a simple model.
• The main problem is figuring out a way to represent the complicated structure that allows it to be learned.
A classic example of a task that requires machine
learning:

It is very hard to say what makes a 2.

Overview of Machine Learning Steps
• Gather Training set
• Performing feature extraction is a frequently adopted step to obtain a new data representation.
For example, in the digit recognition task, every digit image is scaled and translated so that each digit fits within
a box of a fixed size. This greatly reduces the variability within each digit class, which helps in the subsequent
steps.

• Estimate a function y(x), which may take a new digit x and predict its label
• In the learning phase, the model is trained on the basis of the training data
• In the test phase, the model is expected to determine the identity of each new digit image
• The ability to correctly categorize these new samples is called generalization
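Putting the steps together as a hypothetical end-to-end sketch (the library, dataset, and model choices below are assumptions for illustration only):

```python
# Gather training set -> pre-process/feature-extract -> learn y(x) -> test generalization
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)                       # gather the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

y_of_x = make_pipeline(StandardScaler(), SVC())           # pre-processing + model = y(x)
y_of_x.fit(X_train, y_train)                              # learning phase
print("generalization (accuracy on unseen digits):",
      y_of_x.score(X_test, y_test))                       # test phase
```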

Types of Machine Learning
• Supervised learning
– Learn to predict the output when given an input vector, based on learning from a training collection
• Who provides the correct answer?
• Classification (if the target is discrete, like handwritten digit recognition) or Regression (if the
target is continuous, like predicting yield in a manufacturing firm); see the sketch after this slide
• Reinforcement learning
– Find suitable action to take in a given situation to maximize payoff
• There is not much information in a payoff signal, and we do not have examples of optimal outputs; they are
expected to be discovered by a process of trial and error.
• Payoff is often delayed (a sequence of actions helps you win the chess game, and the reward or
penalty for every step may be evaluated at the end of the game)
• Credit assignment problem (reward must be attributed to all the moves that led to final win/loss,
but note that just winning the game at the end does not mean that each step was equally wisely
taken)
– Reinforcement learning is an important area that will not be covered in this course.
– Tradeoff between exploration and exploitation
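A tiny illustrative sketch of the two supervised settings referenced above (the data and models are made up for the example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Classification: the target is discrete (class labels)
y_class = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y_class)
print("predicted class:", clf.predict([[3.5]]))

# Regression: the target is continuous (a real-valued quantity)
y_reg = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0])
reg = LinearRegression().fit(X, y_reg)
print("predicted value:", reg.predict([[3.5]]))
```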

Types of Machine Learning
• Unsupervised learning
– Create an internal representation of the input e.g. form clusters; extract features

• Compression
– Inputs are typically vectors
– Goal: deliver an encoder and decoder such that the size of the encoder output is much smaller than the original
input, but the composition of the encoder followed by the decoder is very similar to the original input (see the sketch after this slide)
• Outlier detection
– Inputs are anything
– Goal: select highly unusual cases from a new or given dataset
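A minimal sketch of the compression goal (PCA via the SVD is used here as one simple encoder/decoder pair; the data is random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))                 # inputs are vectors with d = 50 components
X = X - X.mean(axis=0)

k = 5                                          # encoder output is only k numbers (k << d)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:k]                                     # top-k principal directions

codes = X @ W.T                                # encoder: d -> k
X_hat = codes @ W                              # decoder: k -> d, approximate reconstruction
print("mean squared reconstruction error:", np.mean((X - X_hat) ** 2))
```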

Hypothesis Space
• One way to think about a supervised learning machine is as a device that explores a
“hypothesis space”.
– Each setting of the parameters in the machine is a different hypothesis about the function
that maps input vectors to output vectors.
– If the data is noise-free, each training example rules out a region of hypothesis space.
– If the data is noisy, each training example scales the posterior probability of each point in
the hypothesis space in proportion to how likely the training example is given that
hypothesis (see the toy sketch after this slide).
• The art of supervised machine learning is in:
– Deciding how to represent the inputs and outputs
– Selecting a hypothesis space that is powerful enough to represent the relationship
between inputs and outputs but simple enough to be searched.
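A toy numerical sketch of the noisy case (everything here, including the linear hypothesis class and the noise level, is an assumption made up for illustration): each candidate parameter setting is one hypothesis, and every noisy example rescales the hypotheses in proportion to its likelihood.

```python
import numpy as np

hypotheses = np.linspace(-3.0, 3.0, 601)       # hypothesis space: candidate slopes w in y = w*x
posterior = np.ones_like(hypotheses)           # start from a flat prior over hypotheses
sigma = 0.5                                    # assumed observation noise

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]    # noisy (x, y) examples, roughly y = 2x
for x, y in data:
    likelihood = np.exp(-(y - hypotheses * x) ** 2 / (2 * sigma ** 2))
    posterior *= likelihood                    # scale each hypothesis by how likely the example is
    posterior /= posterior.sum()               # renormalise to a distribution

print("most probable hypothesis: w =", hypotheses[np.argmax(posterior)])
```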

Searching for a Hypothesis Space
• The obvious method is to first formulate a loss function and then adjust the parameters to
minimize the loss function.
– This allows the optimization to be separated from the objective function that is being
optimized.
• Squared difference between actual and target real-valued outputs.
• Number of classification errors
– Problematic for optimization because the derivative is not smooth.
• Negative log probability assigned to the correct answer.
– This is usually the right function to use.
– In some cases it is the same as squared error (regression with Gaussian output noise)
– In other cases it is very different (classification with discrete classes needs cross-entropy
error)
Loss Functions
• Squared difference between actual and target real-valued outputs.
• Number of classification errors
– Problematic for optimization because the derivative is not smooth.
• Negative log probability assigned to the correct answer.
– This is usually the right function to use.
– In some cases it is the same as squared error (regression with Gaussian
output noise)
– In other cases it is very different (classification with discrete classes
needs cross-entropy error)
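An illustrative NumPy sketch of the three losses listed above, computed on tiny made-up targets and predictions (the numbers are assumptions; only the formulas matter):

```python
import numpy as np

# Squared difference between actual and target real-valued outputs
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.2, 1.8, 3.5])
squared_error = np.sum((y_pred - y_true) ** 2)

# Number of classification errors (a step function of the predictions: not smooth)
labels = np.array([0, 1, 1, 0])
predicted = np.array([0, 1, 0, 0])
classification_errors = np.sum(labels != predicted)

# Negative log probability assigned to the correct answer (cross-entropy)
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]])  # model's class probabilities
neg_log_prob = -np.sum(np.log(probs[np.arange(len(labels)), labels]))

print(squared_error, classification_errors, neg_log_prob)
```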

In the next 1-2 classes we will revise some required
Math concepts
• Suggested pre-read: Mathematics for Machine Learning, Chapters 2-6

