(Fall 2024) Intro To ML
Nope
# Wrapped in a function for completeness (the name is illustrative);
# counts bright pixels along row 10, columns 10 through 29
def looks_like_a_seven(image, low_thresh, high_thresh):
    count = 0
    for i in range(10, 30):
        if image[10][i] > 0.5:
            count += 1
    # Guess "7" when the bright-pixel count falls inside the window
    if low_thresh < count < high_thresh:
        return 7
Some function choices work better than others, no matter how well you choose your parameters.
Why might this function not work as well?
(Hint: we switched the inequality)
Previously, we had a single threshold point, above which
things were blue, and red otherwise. However,
this strategy doesn’t really work in 2D… there, the
natural boundary is a line:
y = mx + b
2D Example
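A minimal sketch of this 2D decision rule, assuming we label a point by which side of the line y = mx + b it falls on (the function name and color labels are illustrative, not from the slides):

def classify_2d(x, y, m, b):
    # The line y = m*x + b splits the plane into two half-planes
    if y > m * x + b:
        return "blue"  # point lies above the line
    return "red"       # point lies on or below the line

# Example: with slope 1 and intercept 0, the point (1, 3) sits above the line
print(classify_2d(1.0, 3.0, m=1.0, b=0.0))  # -> blue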
This idea continues well beyond 2D.
Here, our data is in 3D and we hypothesize that
a 2D plane can separate the data, above which
points are marked blue, below which they are
marked red… and this again is our function
definition.
3D and so on…
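The same rule sketched for 3D (and, with longer vectors, any dimension): a plane can be written as w·x + b = 0, and the sign of w·x + b tells us which side of it a point lies on. The names and the example plane below are illustrative assumptions:

import numpy as np

def classify_nd(point, w, b):
    # The hyperplane w.x + b = 0 splits space into two half-spaces
    if np.dot(w, point) + b > 0:
        return "blue"  # above the plane
    return "red"       # on or below the plane

# Example: the plane z = x + 2y + 1, rewritten as -x - 2y + z - 1 = 0
w, b = np.array([-1.0, -2.0, 1.0]), -1.0
print(classify_nd(np.array([0.0, 0.0, 5.0]), w, b))  # -> blue (above the plane)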
Narrowing in on ML
● The art of ML is the following:
○ What form our function takes → this can be referred to as a model class
○ What specific parts of this function we are allowed to learn → these are our parameters
○ How we learn these parameters to approximate their “best” possible values
■ We will talk about this more later
● Every ML algorithm you will ever learn follows this pattern (see the sketch below)
○ Describe the generic form of a function with free parameters
○ Use the data to decide what free parameters will work best
● This is super important, PLEASE ASK QUESTIONS IF YOU HAVE THEM,
PLEASE ASK THEM, YOU ARE EXPECTED TO STILL BE SOMEWHAT IN THE
DARK HERE
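A minimal sketch of this pattern, assuming a 1D linear model class (the data, names, and the least-squares fit are illustrative assumptions, not something we have covered yet):

import numpy as np

# Model class: the generic form of the function, f(x) = m*x + b
def f(x, m, b):
    return m * x + b

# Parameters: m and b are the parts of the function we are allowed to learn
# Using the data to pick them: here, an ordinary least-squares fit
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.1, 2.9, 5.2, 6.8])
m, b = np.polyfit(xs, ys, deg=1)  # degree-1 polynomial fit: slope, intercept

print(f(4.0, m, b))  # the learned function applied to a new input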
Taxonomy of ML
● We’ve now got a definition of ML that is broad enough to capture all of it
● The problems in ML are super varied, so it is often useful to have some
framework for classifying the different types of problems
Types of Machine Learning
Vocab
● Function / Model
○ These terms are used interchangeably
○ These refer to the function template (the “model class”) we have chosen for our problem
● Weights (and Biases)
○ Another way to denote the parameters in ML models that are learned from data
● Hyperparameter
○ This is some non-learnable parameter (like model size, model type, details about training procedure,
etc) that further specifies our overall learnable function
○ We need to manually choose these ourselves before we start learning the learnable parameters
● Loss Function / Cost Function / Risk Function
○ We haven’t introduced these terms yet, but they will come up later; just note that they are the same
(at least for our purposes)
Vocab
● “Feature”
○ This can refer to bits of our data (either the inputs themselves or some representation of them) that
we feed as input to a model
○ Ex: for a house, you might input quantities like its “number of bedrooms”, “number of floors”, “area in
square feet”, “cost of construction” etc. into a model that is trying to predict its price
○ Ex: for an image input, you squish its pixel values into a vector OR extract things like corners, edges, and
shapes from it; these are both different “features” of the same image that can be fed into a model (see the sketch below)!
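A hedged sketch of the two image-feature options above, using NumPy (the fake image and the crude “edge count” are purely illustrative):

import numpy as np

image = np.random.rand(28, 28)  # stand-in for a real grayscale image, values in [0, 1]

# Option 1: squish all the pixel values into one long vector
pixel_features = image.flatten()  # shape (784,)

# Option 2: extract summary quantities instead, e.g. a rough "edge count":
# the number of spots where horizontally adjacent pixels differ sharply
edges = np.abs(np.diff(image, axis=1)) > 0.5
edge_features = np.array([edges.sum()])  # shape (1,)

# Both are valid feature representations of the same image
print(pixel_features.shape, edge_features)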
ML Pipeline
1. Define the Problem
2. Prepare the Data
3. Define the model + loss function
4. Minimize the loss function (train the model)
5. DONE!
Define the Problem
● What task are you trying to solve with ML?
● What do your inputs look like?
● What should your outputs look like?
● What is your metric for success at the project level? What do you hope to achieve?
Prepare the Data
Data Representation / Preparation
● Collecting the data
○ Don’t take this for granted in the real world… garbage in ⇒ garbage out
● We need to represent our data with numbers
○ We need to go from text → numbers
○ We need to go from image files → numbers
○ Every data point needs to be represented with numbers in some way
● Feature Selection / Scaling
○ Finding which parts of the data are important and should be included as inputs to a model
○ May want to rescale some features so they’re all in the same range of values: normalization
● Vectors are one of the most basic and important representations of data
○ Basically take the important numbers and put them all in a vector (a 1-D array) in a specific order (see the sketch below)
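A minimal sketch of the vector representation and normalization ideas above, reusing the house example from the earlier vocab slide (the numbers and the min-max scaling choice are illustrative assumptions):

import numpy as np

# Each house becomes a vector: [bedrooms, floors, area_sqft, construction_cost]
houses = np.array([
    [3, 2, 1500.0, 200_000.0],
    [4, 1, 2200.0, 310_000.0],
    [2, 1,  900.0, 120_000.0],
])

# The features live on wildly different scales, so rescale each column to [0, 1]
# (min-max normalization; other schemes, like z-scoring, are also common)
mins = houses.min(axis=0)
maxs = houses.max(axis=0)
normalized = (houses - mins) / (maxs - mins)

print(normalized)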
Case Study: Representing Labels
● One Hot Labeling
○ One of the most common labeling schemes for multi-class classification
■ Classification is a problem where you want a model to discern between ‘n’ different kinds of inputs,
like the problem of digit recognition
○ Instead of having a label of “4” to indicate the 4th class, make the label look like:
■ [0, 0, 0, 1, 0, 0, 0, … ]
○ In other words, put a 1 in the ith position of an all-zeros vector to indicate the ith class (sketched in code below)
○ This scheme lets us view labels as probability distributions
■ Instead of simply saying that a data point is labeled as class 4 (see example above), we can say that it
has a 100% probability of belonging to class 4 and a 0% probability of belonging to any other class
■ This is especially useful since, as we will see next time, our models will output a probability
distribution over classes as well. For example, [0, 0, 0.1, 0.75, 0.15, 0, 0, …] might be an output where
the model thinks that a sample has 10% probability of belonging to class 3, 75% probability for class
4 and 15% for class 5.
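A minimal sketch of one-hot labeling with NumPy (class positions are treated as 1-based to match the wording above; that choice is illustrative):

import numpy as np

def one_hot(class_index, num_classes):
    # All-zeros vector with a 1 in the given (1-based) class position
    label = np.zeros(num_classes)
    label[class_index - 1] = 1.0
    return label

print(one_hot(4, 10))  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]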
Augmenting the Data
● We might want more data than we have; what can we do?
● We will find a bunch of transforms that don’t semantically change our data, i.e.,
both an input and its transformed version should have the same label
● Images:
○ We can add noise to images or blur/sharpen them slightly
○ We can rotate images or warp them a little bit
● Text:
○ We can replace some words with known synonyms
● This artificially gives us more examples to use during training (see the sketch below)
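A minimal sketch of two of the image transforms above (the noise level and rotation range are illustrative assumptions; in practice, libraries such as torchvision provide richer augmentation pipelines):

import numpy as np
from scipy.ndimage import rotate

def augment(image, rng):
    # Add a small amount of pixel noise, keeping values in [0, 1]
    noisy = np.clip(image + rng.normal(scale=0.05, size=image.shape), 0.0, 1.0)
    # Rotate by a small random angle (in degrees), keeping the original shape
    angle = rng.uniform(-10, 10)
    return rotate(noisy, angle, reshape=False, mode="nearest")

rng = np.random.default_rng(0)
image = np.random.rand(28, 28)  # stand-in for a real training image
extra_examples = [augment(image, rng) for _ in range(5)]
print(len(extra_examples), extra_examples[0].shape)  # 5 (28, 28)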
Augmented Data