(Fall 2024) Intro To ML


Introduction to ML

By: ML@B Edu Team


Outline
● Meet your course staff!
● Logistics
● Course introduction
● Intro to ML
Meet your course staff!
Course staff

Saathvik Selvan, Vanessa Teo, Rohan Viswanathan, Derek Xu, Chuyi Shang, Eric Wang

Logistics
● Website: ml.berkeley.edu/decal
● Edstem: https://edstem.org/us/join/bdSzpg
● Gradescope code: PY37RE
● Office hours time/location are TBD
○ Most likely some time on Thursday or Friday, and most likely somewhere in Cory or Soda
● Enrollment codes have been sent out, make sure to use them!
● Syllabus is on the website, make sure to read it for more details and policies!
● We will take attendance starting Wednesday. Everyone is excused today.
● Course communication: make a private post on Edstem!
Brief Outline

Deep Learning Fundamentals
● Intro to ML
● Intro to Neural Networks
● Optimization and Modern Deep Learning
● Representations and Transfer Learning

CV
● Images and Convolutions
● Convolutional Neural Networks
● Advanced CNNs
● Object Detection
● Segmentation
● Advanced Detection and Segmentation

Transformers
● Sequence Modeling
● Transformers
● Vision Transformers
● Multimodality
● Self-supervised Learning

Generative AI
● Variational Autoencoders
● Intro to GANs
● Advanced GANs
● Vector Quantization
● Intro to Diffusion
● Diffusion Applications
Grading Breakdown
This course is graded on a PNP basis. You need a 70% to pass the course. Here is how
the points will be distributed this semester:

● Attendance (10%) — excused just for today


● Weekly Quizzes (20%) — Quiz 1 due next Monday, Sep 16
● Programming Assignment 1 (10%) — Homework 1 due Monday, Sep 23
● Programming Assignment 2 (20%)
● Programming Assignment 3 (20%)
● Programming Assignment 4 (20%)
Course Introduction
What is Computer Vision?
● Computer vision is broadly the subset of AI that deals with images
● It is used to clear checks, deliver mail, drive cars, and create art
● This course is a bootcamp in computer vision as it intersects with deep learning
● Example areas: basic computer vision tasks, 3D vision, large-scale unsupervised learning, text-based image generation
Introduction to ML
What is ML?
● ML is the paradigm of approximating a
function from data
○ A function here is just a set of rules that takes
in an input and spits out some output (like a
label or a predicted value)

● Why ML instead of programming the functions ourselves?


○ Sometimes we can’t possibly understand the patterns in our data, so it is extremely hard to come up
with these rules!
○ ML is fundamentally the process of allowing our data to guide a function’s creation
Challenge: write a function to classify digits?

A hand-coded attempt might count bright pixels along one row of the input image:

    count = 0
    for i in range(10, 30):
        if image[10][i] > 0.5:
            count += 1
    if low_thresh < count < high_thresh:
        return 7

Nope. Is there some way of separating 7's from other digits?


Narrowing in on ML
● Think of it as template creation!
○ When we usually define a function by hand, we have to
specify EVERYTHING
○ With ML, we are going to define a function (with math), but
leave out a few free parameters that will be learned from the
data: these will dictate the exact behavior of the function
○ For now, don’t think about the process of learning good
choices of parameters… that will come later!
● Example:
○ We will define our function to have the form:
if (input < a) -> output1, else -> output2,
and learn the best value of a from our data
○ Here ‘a’ is the free parameter that specifies the exact
behavior of our example function
Note: This function is just a hypothesized function that we hope will work well based on what the data looks like.
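The template idea above can be sketched in code. This is a toy illustration, not from the slides: the dataset, the candidate values, and the helper names are all made up. We "learn" the free parameter a by trying candidates and keeping the one that classifies a small labeled dataset best.

```python
# Toy 1D threshold model: if input < a -> class 0, else class 1.
# The data points and the candidate values for 'a' are invented.
inputs = [0.2, 0.5, 0.8, 1.3, 1.6, 2.0]
labels = [0, 0, 0, 1, 1, 1]

def accuracy(a):
    # Fraction of points the threshold 'a' classifies correctly
    preds = [0 if x < a else 1 for x in inputs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# "Learning" here is just picking the best candidate value of 'a'
best_a = max([0.0, 0.5, 1.0, 1.5, 2.0], key=accuracy)
print(best_a)  # 1.0 separates this toy data perfectly
```

Real ML replaces this brute-force search with an optimization procedure, but the pattern is the same: a fixed function template plus a free parameter learned from data.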

What would a good value for ‘a’ be?

(Figure: data points plotted along the input dimension)

Probably a = 1

Some function choices work better than others, no matter how well you choose your parameters.
Why might this function not work as well?
(Hint: we switched the inequality)
Previously, we had a single point, above which things were blue, and red otherwise. However, this strategy doesn’t really work in 2D…

Now, we might instead hypothesize that a 1D line separates the data, above which all points are blue and below which they are red.

This is our FUNCTION that we are hypothesizing exists: a 1D line of the form

y = mx + b

In this case, our parameters are m (the slope) and b (the intercept/offset).
2D Example
This line

y = -1.09*x + 2.09

ends up being about as good as we can possibly get, if we classify everything above the line as blue and everything below as red.

This isn’t perfect, but again, it’s not really possible to do any better.

Don’t worry for now about how these values were calculated or why it’s the best line.
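The slide’s line can be turned into a classifier directly. A minimal sketch (the sample points are invented; “blue”/“red” follow the slide’s convention):

```python
# Classify 2D points with the line y = -1.09*x + 2.09 from the slide:
# points above the line are "blue", points below are "red".
m, b = -1.09, 2.09

def classify(x, y):
    return "blue" if y > m * x + b else "red"

print(classify(0.0, 3.0))   # above the line -> "blue"
print(classify(2.0, -1.0))  # below the line -> "red"
```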

3D and so on…
This idea continues well beyond 2D. Here, our data is in 3D and we hypothesize that a 2D plane can separate the data: points above it are marked blue, points below it are marked red. This, again, is our function definition.

This can further continue on forever into higher dimensions!

The challenge is that we can’t immediately visualize higher-dimensional data, so it will be difficult to say whether the data will nicely separate along some linear boundary like this or not…

Narrowing in on ML
● The art of ML is the following:
○ What form our function takes → this can be referred to as a model class
○ What specific parts of this function we are allowed to learn → these are our parameters
○ How we learn these parameters to approximate their “best” possible values
■ We will talk about this more later
● Every ML algorithm you will ever learn follows this pattern
○ Describe the generic form of a function with free parameters
○ Use the data to decide what free parameters will work best
● This is super important, PLEASE ASK QUESTIONS IF YOU HAVE THEM,
PLEASE ASK THEM, YOU ARE EXPECTED TO STILL BE SOMEWHAT IN THE
DARK HERE
Taxonomy of ML
● We’ve now got a definition of ML that describes ALL of ML in a way that is broad
enough to capture everything
● The set of problems in ML are super varied and it is often useful to have some
framework for how to classify different types of problems
Types of Machine Learning
Vocab
● Function / Model
○ These terms are used interchangeably
○ These refer to the function template (the “model class”) we have chosen for our problem
● Weights (and Biases)
○ Another way to denote the parameters in ML models that are learned from data
● Hyperparameter
○ This is some non-learnable parameter (like model size, model type, details about training procedure,
etc) that further specifies our overall learnable function
○ We need to manually choose these ourselves before we start learning the learnable parameters
● Loss Function / Cost Function / Risk Function
○ We haven’t introduced these terms yet, but they will come up later; just note that they are the same
(at least for our purposes)
Vocab
● “Feature”
○ This can refer to bits of our data (either the inputs themselves or some representation of them) that
we feed as input to a model
○ Ex: for a house, you might input quantities like its “number of bedrooms”, “number of floors”, “area in
square feet”, “cost of construction” etc. into a model that is trying to predict its price
○ Ex: for an image input, you squish its pixel values into a vector OR extract things like corners, edges,
shapes from it — these are both different “features” of the same image that can be fed into a model!
ML Pipeline
1. Define the Problem
2. Prepare the Data
3. Define the model + loss function
4. Minimize the loss function (train the model)
5. DONE!
Define the Problem
Define the Problem
● What task are you trying to solve with ML?
● What do your inputs look like?
● What should your outputs look like?
● What is our metric for success on a project level? What do we hope to achieve?
Prepare the Data
Data Representation / Preparation
● Collecting the data
○ Don’t take this for granted in the real world… garbage in ⇒ garbage out
● We need to represent our data with numbers
○ We need to go from text –> numbers
○ We need to go from image files –> numbers
○ Every data point needs to be represented with numbers in some way
● Feature Selection / Scaling
○ Finding which parts of the data are important and should be included as inputs to a model
○ May want to rescale some features so they’re all in the same range of values: normalization
● Vectors are one of the most basic and important representations of data
○ Basically take the important numbers and put them all in a vector (1d matrix) in a specific order
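As a sketch of feature vectors and normalization (the house features and values below are hypothetical), min-max scaling rescales each feature into [0, 1] so that large-unit features like area don’t dominate:

```python
# Each house is a feature vector in a fixed order: [bedrooms, floors, area_sqft].
# The numbers are made up for illustration.
houses = [
    [3, 1, 1500],
    [4, 2, 2400],
    [2, 1, 900],
]

def min_max_normalize(data):
    cols = list(zip(*data))      # transpose: one tuple per feature
    lo = [min(c) for c in cols]  # per-feature minimum
    hi = [max(c) for c in cols]  # per-feature maximum
    return [[(v - l) / (h - l) for v, l, h in zip(row, lo, hi)]
            for row in data]

print(min_max_normalize(houses))  # every feature now lies in [0, 1]
```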
Case Study: Representing Labels
● One Hot Labeling
○ One of the most common labeling schemes for multi-class classification
■ Classification is a problem where you want a model to discern between ‘n’ different kinds of inputs,
like the problem of digit recognition
○ Instead of having a label of “4” to indicate the 4th class, make the label look like:
■ [0, 0, 0, 1, 0, 0, 0, … ]
○ In other words, put a 1 in the ith position of an all zeros vector to indicate the ith class
○ This scheme lets us view labels as probability distributions
■ Instead of simply saying that a data point is labeled as class 4 (see example above), we can say that it
has a 100% probability of belonging to class 4 and 0% probability of belonging to any other class
■ This is especially useful since, as we will see next time, our models will output a probability
distribution over classes as well. For example, [0, 0, 0.1, 0.75, 0.15, 0, 0, …] might be an output where
the model thinks that a sample has 10% probability of belonging to class 3, 75% probability for class
4 and 15% for class 5.
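The one-hot scheme is straightforward to write down. A minimal sketch (using 0-indexed classes, unlike the slide’s 1-indexed wording):

```python
# One-hot label for class `label` out of `num_classes` classes (0-indexed):
# a 1 in the position of the true class, zeros everywhere else.
def one_hot(label, num_classes):
    vec = [0] * num_classes
    vec[label] = 1
    return vec

print(one_hot(3, 10))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

Viewed as a probability distribution, this vector assigns probability 1 to the true class and 0 to every other class.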
Augmenting the Data
● We might want more data than we have, what can we do?
● We will find a bunch of transforms that don’t semantically change our data, i.e.,
both an input and its transformed version should have the same label
● Images:
○ We can add noise to images or blur/sharpen them slightly
○ We can rotate images or warp them a little bit
● Text:
○ We can replace some words with known synonyms
● This artificially gives us more examples to use during training
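Two of the image transforms above can be sketched with plain lists. This toy version and its tiny “image” are just for illustration; a real pipeline would use a library such as torchvision:

```python
import random

# A grayscale "image" as a nested list of pixel values in [0, 1]
image = [[0.0, 0.5], [1.0, 0.25]]

def add_noise(img, scale=0.05):
    # Perturb each pixel slightly, clamping back into [0, 1]
    return [[min(1.0, max(0.0, p + random.uniform(-scale, scale)))
             for p in row] for row in img]

def horizontal_flip(img):
    # Mirror each row left-to-right
    return [row[::-1] for row in img]

print(horizontal_flip(image))  # [[0.5, 0.0], [0.25, 1.0]]
```

Each transformed copy keeps the original’s label, artificially enlarging the training set.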
Augmented Data

Note: be very aware of what your data looks like before selecting an augmentation. Is a rotated 9 still a 9?
Partitioning the Data
● In ML, we want to know how well our systems generalize
● We want to see how well these models perform on data they haven’t seen before
○ ML is useless unless it can work for new data in the real world
○ We need to have specific data that we set aside to test generalization with: data our model hasn’t
seen before during its training phase
● We make 3 splits of our data (the ratios of these splits vary):
○ Training data: data used for optimizing the parameters
○ Validation data: data used to diagnose the training stage; to help select the kinds of models and
techniques that perform the best for the current problem
○ Testing data: data used for testing a model’s generalization only at the very end of the process
● There are more advanced ways of doing this (not covered here)
○ Ex. K-fold cross validation
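The three-way split can be sketched as follows (the 80/10/10 ratio is just one common choice, not prescribed by the slides):

```python
import random

def split_data(data, train_frac=0.8, val_frac=0.1, seed=0):
    # Shuffle a copy so the splits aren't biased by the original ordering
    data = data[:]
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])     # the rest is the test set

train, val, test = split_data(list(range(100)))
print(len(train), len(val), len(test))  # 80 10 10
```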
Define the Model and Loss Function
Define the Model
● This is where we define our model
○ We are going to talk a LOT about different kinds of models in this course; don’t worry too much if
you don’t know any of them right now
● Which model can be used to solve our ML problem?
○ Some modeling techniques only work for very specific tasks
● Which model can best capture the structure of our data?
○ You may not know this right away but, for now, take an educated guess!
○ Remember what your model outputs should look like:
■ Will it be a single value, multiple values, images, text, etc?
○ We can also train different models on our training data, test them on the validation data and choose
the best-performing one
■ This is called hyperparameter tuning
○ Don’t spam model tests, quality over quantity
Define the Loss
● Reminder: we want to learn an optimal selection of parameters
○ We need some metric to optimize for
○ Different models will have different ways of learning specific parameters, but ALL of them will try to
optimize some kind of metric/function
● Once you have a model and your data, your job will be to minimize a loss function
○ High loss ⇒ bad parameters, low loss ⇒ good parameters
● Example: Supervised Learning
○ In supervised learning, we want our model’s output to match some labels, both of which are vectors
of the same shape
○ We can define the loss as the Mean Squared Error between our labels and model predictions
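The Mean Squared Error for the supervised example above can be written directly. The label and prediction vectors here are made up for illustration:

```python
# MSE between a label vector and a prediction vector of the same length
def mse(labels, preds):
    return sum((y - p) ** 2 for y, p in zip(labels, preds)) / len(labels)

label = [0, 0, 0, 1, 0]            # one-hot label for class index 3
good  = [0.0, 0.0, 0.1, 0.8, 0.1]  # prediction close to the label
bad   = [0.3, 0.3, 0.2, 0.1, 0.1]  # prediction far from the label
print(mse(label, good), mse(label, bad))  # lower loss = better parameters
```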
Train the Model,
Minimize the Loss
Training
● By now, we have our data, model, and loss function selected
● Now is the phase when we apply different algorithms to select the best
parameters using the training dataset
● Different models have different training procedures
○ These will be covered when we introduce each model
DONE!
Finishing Steps
● Use your testing set to measure your model’s final performance
● If it is a common, widely-available dataset, you can use it to compare your results
against state-of-the-art systems and see how you stack up!
○ MNIST (this is the handwritten digits dataset you saw earlier) is often used for proof-of-concepts
○ ImageNet is a major benchmark for image classification and generation
Generalization in ML
Generalization
● How well a model generalizes can be
characterized by the difference between
its performance on data it has seen vs
not seen
● If a model is made more “complex”, it
might be able to learn more “complex
patterns” but we also risk simply
memorizing the training data instead of
truly learning anything from it
Bias / Variance
● Bias:
○ A tendency towards certain predictions
○ How wrong is the model on average, regardless
of its training data?
● Variance:
○ How sensitive is the model to changes in the
training data?
○ Small changes in dataset → large changes in our
model and its predictions
● A good model needs to be both firm and
flexible: able to capture varying and
complex data yet robust enough to
generalize beyond just the training samples
How different hyperparameter settings (“model complexity”) can
affect generalization
Overfitting and Underfitting
● When we train (or perform hyperparameter tuning), we care about generalization
○ We need to hold out a small segment of our data to test our model with as we train (validation set)
○ We care about the discrepancy between training and validation metrics, as it is a good proxy for the
model’s final generalization capabilities on the unseen test set
● This involves a balance between the bias and variance errors
○ Underfitting: the model performs poorly on both the training and validation data
■ This indicates that you can likely increase model complexity without taking too much of a hit
to generalization performance
■ In terms of bias/variance, this means you have a high bias error
○ Overfitting: the model performs great on the training data but poorly on the validation data
■ This indicates that our model is in some way too complex and has started memorizing instead
of learning; it needs to be scaled down
■ In terms of bias/variance, this means you have a high variance error
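Overfitting can be demonstrated with a toy experiment (not from the slides): fit polynomials of two different complexities to noisy linear data and compare training vs. validation error.

```python
import numpy as np

# Noisy samples from an underlying linear function y = 2x
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2 * x_train + rng.normal(0, 0.1, 10)
x_val = np.linspace(0.05, 0.95, 10)
y_val = 2 * x_val + rng.normal(0, 0.1, 10)

errors = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    val_err = float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))
    errors[degree] = (train_err, val_err)

# The degree-9 polynomial passes through every training point (near-zero
# training error) but typically does worse on the held-out validation data
print(errors)
```

The gap between training and validation error is exactly the diagnostic described above: the degree-9 model has memorized the noise rather than learned the trend.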
Wrapping Up
Important Takeaways
● ML is template creation
● ML Pipeline
○ Define the problem
○ Prepare the data
○ Define the model and a loss function
○ Train the model
○ Report results
● There frequently exists a tradeoff between model complexity and generalization,
known as the bias-variance tradeoff
Contributors
● Slides by Jake Austin and Brian Liu
● Edited by Aryan Jain
