
Machine Learning

Machine Learning Algorithms:


● Supervised Learning.
● Unsupervised Learning.
● Reinforcement learning.

Supervised Learning: Learns from labeled examples to make predictions on new data.

Unsupervised Learning: Learns patterns and structure from unlabeled data.

Reinforcement Learning: Learns by interacting with an environment and receiving rewards or penalties.

Linear Regression:

● Hypothesis Function: hθ(x) = θ₀ + θ₁x (θᵀx in the multivariate case).
● Objective: Fit the data by minimizing the error between predictions and actual values.

Cost Function:
● Measures the squared error between predicted and actual values: J(θ) = (1/(2m)) Σ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)².

Gradient Descent: (Concept is not clear to me)


● Iterative algorithm that minimizes the cost function by repeatedly updating the parameters: θⱼ := θⱼ − α ∂J(θ)/∂θⱼ.

Gradient Descent for multiple variables:

Compute the partial derivative of the cost function with respect to each parameter and update all parameters simultaneously to minimize the error (see the sketch below).
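As a rough illustration (not part of the original notes), the sketch below assumes NumPy and uses hθ(x) = θᵀx with a column of ones for the intercept; the names X, y, theta, and alpha are my own:

    import numpy as np

    def cost(theta, X, y):
        # J(theta) = (1 / (2m)) * sum((h_theta(x) - y)^2), with h_theta(x) = X @ theta
        m = len(y)
        errors = X @ theta - y
        return (errors @ errors) / (2 * m)

    def gradient_descent(X, y, theta, alpha=0.01, iterations=5000):
        # Repeatedly step opposite the gradient; each component is dJ/d(theta_j).
        m = len(y)
        for _ in range(iterations):
            gradient = X.T @ (X @ theta - y) / m
            theta = theta - alpha * gradient
        return theta

    # Toy data: y is roughly 1 + 2x; the column of ones carries the intercept term.
    x = np.linspace(0, 10, 50)
    X = np.column_stack([np.ones_like(x), x])
    y = 1 + 2 * x
    theta = gradient_descent(X, y, np.zeros(2))
    print(theta)  # approaches [1, 2]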

Feature Scaling:
● Standardizes input features so that optimization converges faster.
● Method: Mean Normalization, x' = (x − μ) / (max − min); a sketch follows below.
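A minimal sketch of mean normalization, assuming NumPy and a feature matrix X with one column per feature (the data values are only illustrative):

    import numpy as np

    def mean_normalize(X):
        # x' = (x - mean) / range; dividing by the standard deviation is another common choice.
        mu = X.mean(axis=0)
        spread = X.max(axis=0) - X.min(axis=0)
        return (X - mu) / spread

    X = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0], [1416.0, 2.0]])
    print(mean_normalize(X))  # every feature now lies in a comparable, roughly unit-sized range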

Learning Rate: α

● Determines the size of each step taken to minimize the cost function.
● For sufficiently small α, the cost function decreases on every iteration.
● If α is too large, gradient descent may overshoot the minimum and fail to converge.
Logistic Regression for Classification

● Predicts probabilities for binary outcomes using the sigmoid function.


● Assigns a label based on a threshold (e.g., 0.5).

Normal equation:

● Closed-form solution for the linear regression parameters: θ = (XᵀX)⁻¹ Xᵀ y, with no iteration required.

Hypothesis:

● The hypothesis hθ(x) models the relationship between inputs and output.

● hθ(x) = 1 / (1 + e^(−θᵀx)) (the sigmoid function applied to θᵀx).

Decision Boundary:
● Separates the data classes based on the hypothesis.
● If hθ(x) ≥ 0.5, predict y = 1.
● If hθ(x) < 0.5, predict y = 0 (see the sketch below).
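A minimal sketch of the sigmoid hypothesis and the 0.5 threshold (NumPy assumed; the parameter values are only illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict(theta, X, threshold=0.5):
        # h_theta(x) = sigmoid(theta^T x); output y = 1 when the probability reaches the threshold.
        probabilities = sigmoid(X @ theta)
        return (probabilities >= threshold).astype(int)

    theta = np.array([-3.0, 1.0, 1.0])                 # illustrative parameters: boundary x1 + x2 = 3
    X = np.array([[1.0, 1.0, 1.0], [1.0, 4.0, 2.0]])   # first column of ones for the intercept
    print(predict(theta, X))                           # -> [0 1]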

Advanced Optimization for Logistic Regression:


Advanced optimization methods (e.g., conjugate gradient, BFGS, L-BFGS) aim to minimize the cost function J(θ) more efficiently than plain gradient descent.

One-vs-All Classification:
One-vs-All trains a separate binary classifier for each class and predicts the class whose classifier gives the highest probability.
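A hedged sketch of this strategy, assuming scikit-learn is available (the dataset choice and solver settings are only illustrative, not from the notes):

    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier

    X, y = load_digits(return_X_y=True)                     # 10 classes: digits 0-9
    clf = OneVsRestClassifier(LogisticRegression(max_iter=5000))
    clf.fit(X, y)                                           # fits one binary classifier per digit
    print(clf.predict(X[:5]))                               # each prediction is the most confident class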

Octave:
A high-level language that helps with plotting data and performing matrix operations when prototyping machine learning algorithms.

Neural Networks:

● Neural networks are models inspired by the human brain, designed to process inputs, learn patterns, and make predictions.
● They are capable of learning both linear and non-linear relationships in the data.

Input layer: Takes in the raw data (e.g., an image or a number).

Hidden layers: Process the data using weights and biases (e.g., detecting shape or color).
Output layer: Gives the final prediction.

Non-Linear Hypothesis
Neural networks create non-linear hypotheses, enabling them to handle complex problems like
classifying patterns.

Neurons and Brain:

Algorithms that try to mimic the brain.

Model Representation:
● Receives the raw features.
● Performs transformations using weights, biases, and activation functions.
● Produces the prediction (see the forward-propagation sketch below).
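A rough NumPy sketch of this forward pass for a single hidden layer (the layer sizes and names are illustrative, not from the notes):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, W1, b1, W2, b2):
        a1 = sigmoid(W1 @ x + b1)   # hidden layer: weighted sum of raw features + activation
        a2 = sigmoid(W2 @ a1 + b2)  # output layer: turns hidden activations into the prediction
        return a2

    x = np.array([0.5, 1.2, -0.3])                 # raw input features
    W1, b1 = np.random.randn(4, 3), np.zeros(4)    # 3 inputs -> 4 hidden units
    W2, b2 = np.random.randn(1, 4), np.zeros(1)    # 4 hidden units -> 1 output
    print(forward(x, W1, b1, W2, b2))              # a value in (0, 1)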
Examples and Intuitions:
Binary Classification: Spam vs. Not Spam.
Non-Linear Problems: Solving tasks like XOR/XNOR, where a single linear model fails (AND and OR are linearly separable).
Multi-Class Classification: Recognizing handwritten digits.

Multi-Class Classification:

Multi-class classification is the task of predicting one label from three or more possible classes.

Example: Handwritten digit recognition (classes 0–9).

Cost Function
The cost function measures how well the neural network's predictions match the actual labels.
● Binary Classification: Measures the error for a single output in [0, 1].
● Multi-Class Classification: Sums the error over all K output units, one per class.

Backpropagation Algorithm:
Backpropagation calculates the gradients of the cost function with respect to every weight and bias (see the sketch after this list).
● Perform forward propagation to compute predictions.
● Calculate the errors at the output layer.
● Propagate the error backward through the layers, adjusting weights and biases.
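The sketch below is my own minimal two-layer example of those three steps for one training example, assuming sigmoid activations and a cross-entropy loss (all names and sizes are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, y, W1, b1, W2, b2, alpha=0.1):
        # 1. Forward propagation to compute the prediction.
        a1 = sigmoid(W1 @ x + b1)
        a2 = sigmoid(W2 @ a1 + b2)
        # 2. Error at the output layer (sigmoid output with cross-entropy loss gives a2 - y).
        delta2 = a2 - y
        # 3. Propagate the error backward and adjust weights and biases.
        delta1 = (W2.T @ delta2) * a1 * (1 - a1)
        W2 -= alpha * np.outer(delta2, a1)
        b2 -= alpha * delta2
        W1 -= alpha * np.outer(delta1, x)
        b1 -= alpha * delta1
        return W1, b1, W2, b2

Called in a loop over the training set, each call performs one gradient descent update of the weights.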

Implementation Note - Unrolling Parameters:


● Start with the initial parameter matrices Θ⁽¹⁾, Θ⁽²⁾, Θ⁽³⁾.
● Optimization routines require the parameters as a single vector.
● Unrolling converts the weight matrices and bias vectors into one long column vector (see the sketch below).
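A short sketch of unrolling two weight matrices and restoring them, assuming NumPy (the matrix sizes are illustrative):

    import numpy as np

    Theta1 = np.random.randn(10, 11)   # illustrative layer sizes
    Theta2 = np.random.randn(1, 11)

    # Unroll: flatten each matrix and stack everything into one long vector for the optimizer.
    theta_vec = np.concatenate([Theta1.ravel(), Theta2.ravel()])

    # Roll back up: reshape the slices into matrices whenever propagation needs them.
    Theta1_restored = theta_vec[:Theta1.size].reshape(Theta1.shape)
    Theta2_restored = theta_vec[Theta1.size:].reshape(Theta2.shape)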

Gradient checking
Gradient checking ensures the correctness of backpropagation by comparing analytically
computed gradients to numerical approximations.

d/dθ J(θ) ≈ (J(θ + ε) − J(θ − ε)) / (2ε)
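A numerical version of that check might look like this sketch (the epsilon value and helper name are mine):

    import numpy as np

    def numerical_gradient(J, theta, epsilon=1e-4):
        # Centered difference: dJ/d(theta_i) ~ (J(theta + eps) - J(theta - eps)) / (2 * eps)
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            theta_plus, theta_minus = theta.copy(), theta.copy()
            theta_plus[i] += epsilon
            theta_minus[i] -= epsilon
            grad[i] = (J(theta_plus) - J(theta_minus)) / (2 * epsilon)
        return grad

    # Example: J(theta) = theta_0^2 + 3*theta_1 has gradient [2*theta_0, 3]; at [2, 5] that is [4, 3].
    J = lambda t: t[0] ** 2 + 3 * t[1]
    print(numerical_gradient(J, np.array([2.0, 5.0])))   # close to [4, 3]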


Random Initialization:
● Weights are initialized to small random values to avoid symmetry problems, where all
neurons in a layer learn the same features.
● Random initialization ensures diverse learning paths for different neurons (see the sketch below).
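One common way to sketch this, assuming NumPy (the interval half-width and layer sizes are illustrative):

    import numpy as np

    epsilon_init = 0.12                 # small interval half-width; the exact value is a convention
    # Values drawn uniformly from [-epsilon_init, epsilon_init] break the symmetry between neurons.
    Theta1 = np.random.rand(25, 401) * 2 * epsilon_init - epsilon_init
    Theta2 = np.random.rand(10, 26) * 2 * epsilon_init - epsilon_init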

Summary of Neural Networks:


● Initialize weights and biases.
● Perform forward propagation to calculate predictions.
● Compute cost/loss.
● Use backpropagation to calculate gradients.
● Update weights using gradient descent.

Example:
Neural networks are used for lane detection, object recognition, and path planning in autonomous vehicles.

Advice for Applying Machine Learning:


Evaluating a Hypothesis

● Focuses on understanding the performance of a model using training, validation, and test datasets.
● Overfitting occurs when the model performs well on training data but poorly on unseen data.

Model Selection (a split sketch follows the list):
● Training set: 60%
● Cross-validation set: 20%
● Test set: 20%
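A sketch of that 60/20/20 split, assuming scikit-learn's train_test_split and toy data (all names are mine):

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.randn(100, 5)          # toy feature matrix
    y = np.random.randint(0, 2, 100)     # toy labels

    # First hold out 40% of the data, then split that 40% in half: 60% train / 20% validation / 20% test.
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)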

Diagnosing Bias vs Variance:

High Bias (Underfitting):

● Training set error is high.
● Cross-validation error is also high (close to the training error).

High Variance (Overfitting):

● Training set error is low.
● Cross-validation error is much greater than the training set error.

Regularization :

Helps reduce variance (overfitting) and improves the generalization ability of the model without significantly increasing bias (underfitting).

Learning Curves:

If a learning algorithm is suffering from high bias, getting more training data will not by itself help much; if it is suffering from high variance, more data is likely to help.

Error Metrics:

Precision = True Positives / Predicted Positives

Recall = True Positives / Actual Positives

Trading Off Precision and Recall:

● Lower threshold: increases recall, decreases precision.
● Higher threshold: increases precision, decreases recall.

The F1 score balances precision and recall:

F1 = 2 * (P * R) / (P + R)
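A small sketch computing these three metrics from raw counts (the function name and example numbers are mine):

    def precision_recall_f1(true_positives, false_positives, false_negatives):
        precision = true_positives / (true_positives + false_positives)   # TP / predicted positives
        recall = true_positives / (true_positives + false_negatives)      # TP / actual positives
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    print(precision_recall_f1(true_positives=80, false_positives=20, false_negatives=40))
    # -> (0.8, 0.666..., 0.727...)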

Support Vector Machines (SVMs):


SVM finds the best boundary to separate data into classes.

● SVM tries to find a hyperplane that maximizes the margin between the classes.
● The points closest to the hyperplane are called support vectors (see the sketch below).
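A hedged sketch using scikit-learn's SVC (assumed available; the toy data is mine) that also exposes the support vectors:

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]], dtype=float)
    y = np.array([0, 0, 0, 1, 1, 1])

    clf = SVC(kernel="linear", C=1.0)    # C trades margin width against training errors
    clf.fit(X, y)
    print(clf.support_vectors_)          # the training points closest to the separating hyperplane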

Large Margin Intuition:

● Maximizing the margin improves model robustness.
● It reduces the risk of overfitting.

Clustering:

K-Means Algorithm: Alternates between assigning points to the nearest centroid and
updating centroids.
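A minimal NumPy sketch of those two alternating steps (all names are mine; initialization here is a simple random choice of data points):

    import numpy as np

    def kmeans(X, k, iterations=100):
        # Start from k randomly chosen data points as the initial centroids.
        centroids = X[np.random.choice(len(X), k, replace=False)]
        for _ in range(iterations):
            # Assignment step: each point joins the cluster of its nearest centroid.
            distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = distances.argmin(axis=1)
            # Update step: each centroid moves to the mean of the points assigned to it.
            centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
        return centroids, labels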

Choosing Number of Clusters:

Use the "elbow" method: look for the point where adding more clusters no longer significantly reduces the within-cluster variance.
