Machine Learning Notes
Understanding Data
Numeric Variables
- **Mean**: The average value of a set of numbers.
- **Median**: The middle value when the numbers are sorted in ascending
order.
- **Mode**: The most frequently occurring value in a set of numbers.
Measuring Spread
- **Range**: The difference between the maximum and minimum values.
- **Variance**: The average of the squared differences from the mean.
- **Standard Deviation**: The square root of the variance, representing how
spread out the numbers are from the mean.
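As a quick illustration, all of these statistics can be computed with Python's standard-library `statistics` module (the sample values below are arbitrary):

```python
import statistics

# Illustrative data; any list of numbers works.
values = [2, 4, 4, 4, 5, 5, 7, 9]

print("mean:    ", statistics.mean(values))       # average value -> 5
print("median:  ", statistics.median(values))     # middle value when sorted -> 4.5
print("mode:    ", statistics.mode(values))       # most frequent value -> 4
print("range:   ", max(values) - min(values))     # max minus min -> 7
print("variance:", statistics.pvariance(values))  # mean squared deviation -> 4.0
print("std dev: ", statistics.pstdev(values))     # square root of variance -> 2.0
```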
Review of Distribution
Uniform Distribution
A distribution where all outcomes are equally likely. Each value in the range has
the same probability of occurring.
Normal Distribution
A bell-shaped distribution that is symmetric about the mean. Most of the data
points cluster around the mean, with probabilities tapering off equally on both
sides.
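To make the contrast concrete, here is a small sampling sketch (the distribution parameters are chosen arbitrarily): uniform draws spread evenly across the range, while normal draws cluster around the mean.

```python
import random

random.seed(0)  # reproducible illustration

# Uniform: every value in [0, 10) is equally likely.
uniform_samples = [random.uniform(0, 10) for _ in range(10_000)]

# Normal: values cluster around the mean (5) and taper off symmetrically.
normal_samples = [random.gauss(5, 1) for _ in range(10_000)]

for name, samples in [("uniform", uniform_samples), ("normal", normal_samples)]:
    mean = sum(samples) / len(samples)
    print(f"{name}: mean ~ {mean:.2f}, min ~ {min(samples):.2f}, max ~ {max(samples):.2f}")
```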
Categorical Variables
These are variables that represent categories or groups. They can be nominal
(no natural order, e.g., gender or race) or ordinal (natural order, e.g., levels
of satisfaction).
Lazy Learning
Lazy learning is a type of machine learning where the model generalizes the
training data only when a query is made, rather than during the initial training
phase. The algorithm does not explicitly construct a model; instead, it stores the
training data and performs computations at prediction time.
- **Example**:
If you want to classify a new data point, the algorithm looks at the k nearest
data points in the training set and assigns the most common class among
them; this is the k-nearest neighbours (KNN) approach, sketched below.
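A minimal KNN sketch in Python, with an invented two-cluster dataset:

```python
import math
from collections import Counter

# Toy 2-D training set: (x, y) points with class labels. Values are invented.
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

def knn_classify(query, k=3):
    """Return the majority class among the k nearest training points."""
    # Sort training points by Euclidean distance to the query.
    neighbours = sorted(training, key=lambda item: math.dist(item[0], query))
    top_k_labels = [label for _, label in neighbours[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]

print(knn_classify((2.0, 2.0)))  # expected: "A" (close to the first cluster)
```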
Probabilistic Learning - Naive Bayes Classifier
Naive Bayes is a probabilistic classifier based on applying Bayes' theorem with
strong (naive) independence assumptions between the features.
Bayes' Theorem
Bayes' theorem describes the probability of an event based on prior knowledge
of conditions that might be related to the event. The formula is:
P(A|B) = [P(B|A) × P(A)] / P(B)
Where:
- P(A|B) is the posterior probability of class A given predictor B.
- P(B|A) is the likelihood of predictor B given class A.
- P(A) is the prior probability of class A.
- P(B) is the prior probability of predictor B.
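As a numeric sketch of the formula (all probabilities below are made up for illustration):

```python
# Hypothetical numbers: P(A) = prior, P(B|A) = likelihood, P(B) = evidence.
p_a = 0.01          # prior probability of class A
p_b_given_a = 0.9   # likelihood of predictor B given class A
p_b = 0.05          # prior probability of predictor B

# Bayes' theorem: posterior = likelihood * prior / evidence.
p_a_given_b = (p_b_given_a * p_a) / p_b
print(p_a_given_b)  # 0.18
```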
Joint Probability
The joint probability of two events A and B is the probability that both events
occur. It is denoted as P(A ∩ B) or P(A, B).
- **Example**:
If the probability of it raining (A) is 0.3 and the probability of it being windy (B)
is 0.4, the joint probability P(A ∩ B) depends on the relationship between A and B;
if the two events are independent, P(A ∩ B) = P(A) × P(B) = 0.3 × 0.4 = 0.12.
Conditional Probability
The conditional probability of an event A given that another event B has
occurred is denoted as P(A|B) and is defined as P(A|B) = P(A ∩ B) / P(B).
- **Example**:
The probability that it will rain today given that it rained yesterday can be
calculated if the two events are dependent.
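Tying these pieces together, here is a minimal Naive Bayes sketch; the toy weather dataset and its feature values are invented for illustration:

```python
from collections import Counter, defaultdict

# Toy training data: (weather, windy) -> play. All values are invented.
data = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "no"), "yes"),
    (("rainy", "yes"), "no"),
    (("sunny", "no"), "yes"),
]

# Prior counts P(class) and likelihood counts P(feature value | class).
class_counts = Counter(label for _, label in data)
feature_counts = defaultdict(Counter)  # (feature_index, class) -> Counter of values
for features, label in data:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1

def predict(features):
    """Pick the class maximizing P(class) * product of P(value | class)."""
    best_class, best_score = None, -1.0
    total = sum(class_counts.values())
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(features):
            # Naive independence assumption: multiply per-feature likelihoods.
            score *= feature_counts[(i, label)][value] / count
        if score > best_score:
            best_class, best_score = label, score
    return best_class

print(predict(("sunny", "no")))  # expected: "yes"
```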
Summary
- **Lazy Learning**: Stores data and delays generalization until query time
(e.g., KNN).
- **K-Nearest Neighbour (KNN)**: Classifies based on the majority class of the
nearest k neighbours.
- **Probabilistic Learning (Naive Bayes)**: Uses Bayes' theorem with the
assumption of feature independence to classify data.
- **Bayes' Theorem**: Calculates posterior probabilities to update predictions.
- **Joint Probability**: The probability of two events occurring together.
- **Conditional Probability**: The probability of one event occurring given that
another event has occurred.
Decision trees are a popular method for classification and regression tasks.
They use a tree-like model of decisions and their possible consequences,
including chance event outcomes, resource costs, and utility. Decision trees are
constructed using a divide-and-conquer strategy:
1. **Select the Best Feature**:
- Evaluate each candidate feature (for example, by information gain) and
choose the one that best separates the classes.
2. **Create a Decision Node**:
- Add a node that tests the chosen feature.
3. **Split the Data**:
- Partition the dataset into subsets, one for each value (or range) of the
chosen feature.
4. **Recursive Partitioning**:
- Repeat the process for each subset until one of the stopping conditions is
met, such as:
- All instances in a subset belong to the same class.
- No remaining features to split on.
- A pre-defined maximum tree depth is reached.
5. **Pruning** (Optional):
- Pruning is used to reduce the size of the tree and prevent overfitting by
removing branches that have little importance.
### Example
1. **Root Node**:
- Calculate entropy for the entire dataset.
- Calculate information gain for each feature.
- Choose the feature with the highest information gain, e.g., "Weather".
2. **Split Data**:
- Split data based on "Weather".
- Create branches for "Sunny" and "Rainy".
3. **Sub-Nodes**:
- For each branch, repeat the process:
- Calculate entropy for the subset.
- Calculate information gain for remaining features.
- Choose the best feature, e.g., "Temperature".
4. **Leaf Nodes**:
- Continue until all subsets are pure (only contain one class) or another
stopping criterion is met.
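Since the example leans on entropy and information gain, here is a minimal sketch of both, using the standard definitions Entropy = −Σ p·log2(p) and Gain = parent entropy − weighted child entropy; the toy data mirrors the "Weather" split above:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy: -sum(p * log2(p)) over the class proportions."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(rows, labels, feature):
    """Entropy of the parent minus the weighted entropy of each subset."""
    parent = entropy(labels)
    weighted = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        weighted += (len(subset) / len(labels)) * entropy(subset)
    return parent - weighted

# Toy data mirroring the example: weather -> play? (values invented)
rows = [{"Weather": "Sunny"}, {"Weather": "Sunny"},
        {"Weather": "Rainy"}, {"Weather": "Rainy"}]
labels = ["Yes", "Yes", "No", "No"]
print(information_gain(rows, labels, "Weather"))  # 1.0: a perfect split
```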
Summary
- **Decision Trees**: Use a tree structure to make decisions and classify data
based on features.
- **Divide and Conquer**: Recursively split data into subsets based on the best
feature at each step.
- **Decision Nodes**: Represent decisions based on features.
- **Leaf Nodes**: Represent class labels or values.
- **Algorithms**: Common decision tree algorithms include ID3, C4.5, and CART.
- **Pruning**: Can be applied to prevent overfitting by removing less
important branches.
Decision trees are intuitive and easy to interpret, making them a valuable tool
for both classification and regression tasks in machine learning.
Regression Methods
Regression analysis is a statistical method used to examine the relationship
between a dependent variable and one or more independent variables. It helps
in understanding how the dependent variable changes when any one of the
independent variables is varied, while the other independent variables are held
fixed.
y = a0 + a1x + ε
Where:
- a0 is the intercept of the regression line (the value of y when x = 0).
- a1 is the slope of the regression line, which tells whether the line is
increasing or decreasing.
- ε is the error term.
The strength of the linear relationship between the two variables can be
measured with the Pearson correlation coefficient:
r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² × Σ(yᵢ − ȳ)²]
where:
- r: correlation coefficient
- xᵢ: i-th value of the first dataset X
- x̄: mean of the first dataset X
- yᵢ: i-th value of the second dataset Y
- ȳ: mean of the second dataset Y
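A short sketch, with made-up data, showing how a0, a1, and r fall out of the sums above:

```python
import math

# Arbitrary example data with a roughly linear trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.2, 5.9, 8.1, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope a1 and intercept a0 for y = a0 + a1*x.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
a1 = sxy / sxx
a0 = y_bar - a1 * x_bar

# Pearson correlation coefficient r from the same sums.
syy = sum((y - y_bar) ** 2 for y in ys)
r = sxy / math.sqrt(sxx * syy)

print(f"y = {a0:.2f} + {a1:.2f}x, r = {r:.4f}")  # a strong positive correlation
```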
Multiple Linear Regression
Multiple linear regression extends simple linear regression to two or more
independent variables:
y = a0 + a1x1 + a2x2 + ... + anxn + ε
Summary
- **Regression Analysis**: Examines how a dependent variable changes as the
independent variables vary.
- **Simple Linear Regression**: Fits a straight line y = a0 + a1x + ε to the data.
- **Correlation Coefficient (r)**: Measures the strength and direction of the
linear relationship between two variables.
- **Multiple Linear Regression**: Predicts the dependent variable from several
independent variables.
Neural Networks
Neural networks are computational models inspired by the human brain. They
are designed to recognize patterns, make decisions, and predict outcomes
based on input data.
Biological Motivation
Neural networks are inspired by the structure and function of the human brain,
which consists of interconnected neurons. Each neuron receives input signals,
processes them, and transmits output signals to other neurons. Similarly,
artificial neural networks consist of interconnected nodes (neurons) that
process information in a layered structure.
Perceptron
The perceptron is the simplest type of artificial neural network and serves as
the building block for more complex networks. It consists of a single neuron
with adjustable weights and a bias.
The main components of a perceptron are:
o Input Nodes:
These accept the initial data into the system for further processing. Each input
node contains a real numerical value.
o Weight and Bias:
A weight represents the strength of the connection between units and is
directly proportional to how strongly the associated input neuron influences
the output. The bias can be thought of as the intercept in a linear equation.
o Activation Function:
This final component determines whether the neuron will fire or not. The
activation function can be considered primarily as a step function. Activation
functions introduce non-linearity into the network, enabling it to learn
complex patterns. Common choices include:
o Sign function
o Step function
o Sigmoid function
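A minimal single-neuron perceptron sketch with a step activation; the AND-gate data, learning rate, and epoch count are illustrative choices, and the update is the classic perceptron learning rule:

```python
def step(z):
    """Step activation: fire (1) if the weighted sum exceeds 0, else 0."""
    return 1 if z > 0 else 0

def predict(weights, bias, inputs):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

# Train on the AND function using the perceptron update rule.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias, lr = [0.0, 0.0], 0.0, 0.1

for _ in range(20):  # a few epochs is plenty for this tiny problem
    for inputs, target in data:
        error = target - predict(weights, bias, inputs)
        # Nudge each weight in proportion to its input and the error.
        weights = [w + lr * error * x for w, x in zip(weights, inputs)]
        bias += lr * error

print([predict(weights, bias, x) for x, _ in data])  # expected [0, 0, 0, 1]
```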
Perceptrons are a type of artificial neural network used in machine learning for
binary classification tasks. They are the simplest form of a neural network,
consisting of a single layer of weights that connect input features to the output.
There are several types of perceptrons, each with different characteristics and
applications:
1. **Single-Layer Perceptron**:
- **Definition**: A single-layer perceptron is the most basic form of a neural
network. It consists of a single layer of weights connecting input features to the
output.
- **Applications**: It is used for linearly separable problems, meaning
problems where a single hyperplane can separate the data into two classes.
2. **Multi-Layer Perceptron (MLP)**:
- **Definition**: A perceptron with one or more hidden layers between the
input and output layers, allowing it to learn non-linear decision boundaries.
- **Applications**: Used for problems that are not linearly separable, such as
the XOR problem.
3. **Binary Perceptron**:
- **Definition**: A binary perceptron is a type of single-layer perceptron that
outputs binary results (0 or 1).
- **Applications**: It is used for binary classification tasks where the goal is
to categorize data into two distinct classes.
4. **Multi-Class Perceptron**:
- **Definition**: A perceptron that has been adapted to handle multi-class
classification problems. This is often achieved using a technique such as one-vs-
all (OvA) or one-vs-one (OvO) to extend the binary perceptron.
- **Applications**: Used for classification tasks with more than two classes,
such as categorizing types of animals in an image.
5. **Probabilistic Perceptron**:
- **Definition**: A probabilistic perceptron incorporates probabilistic
methods, such as using a sigmoid or softmax function for the output layer to
provide a probability distribution over possible output classes.
- **Applications**: Used in scenarios where probabilistic interpretation of
the output is beneficial, such as in probabilistic decision-making systems.
6. **Kernel Perceptron**:
- **Definition**: An extension of the perceptron that uses kernel functions to
map input features into a higher-dimensional space, allowing for the
classification of non-linearly separable data.
- **Applications**: Useful in scenarios where the data is not linearly
separable in its original space, similar to the application of support vector
machines (SVMs) with kernel tricks.
Cost Function
The cost function measures the difference between the predicted output and
the actual output. It guides the training process by quantifying the error.
Common cost functions fall into two groups: regression cost functions and
classification cost functions.
Regression Cost Functions
There are three commonly used regression cost functions, which are as follows:
a. Mean Error (ME)
In this type of cost function, the error is calculated for each training example
and the mean of all these errors is taken. The errors from the training data can
be either negative or positive, so while finding the mean they can cancel each
other out and result in a zero mean error even for a poorly performing model.
For this reason, Mean Error is rarely used on its own.
b. Mean Squared Error (MSE)
Mean Squared Error is one of the most commonly used cost function methods.
It improves on the drawback of the Mean Error cost function, as it calculates
the square of the difference between the actual value and the predicted value.
Because of the squaring, errors cannot cancel each other out, and larger errors
are penalized more heavily. MSE is also known as L2 loss:
MSE = (1/N) Σ (yᵢ − ŷᵢ)²
where yᵢ is the actual value and ŷᵢ the predicted value.
c. Mean Absolute Error (MAE)
Mean Absolute Error also overcomes the issue of the Mean Error cost function
by taking the absolute difference between the actual value and the predicted
value; it is also known as L1 loss. It is not strongly affected by noise or
outliers, hence giving better results if the dataset has noise or outliers:
MAE = (1/N) Σ |yᵢ − ŷᵢ|
Classification Cost Functions
Classification models predict discrete outputs, for example 0 or 1, or cat or
dog. The cost function used in a classification problem is known as the
classification cost function, and it differs from the regression cost functions
above. One of the most commonly used loss functions for classification is
cross-entropy loss.
The binary cost function is a special case of categorical cross-entropy, where
there is only one output class, for example classification between red and blue.
To better understand it, suppose there is only a single output variable Y taking
the value 0 or 1, and the model predicts a probability ŷ. The error in binary
classification is calculated as the mean of the cross-entropy over all N training
examples, which means:
Binary cross-entropy = −(1/N) Σ [yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ)]
Categorical cross-entropy is used when instances are allocated to one of more
than two classes. It is computed similarly to the binary cost function and is
designed so that it can be used with multi-class classification where the target
values are one-hot encoded.
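These losses translate directly into code; this sketch uses invented example values:

```python
import math

def mse(actual, predicted):
    """Mean Squared Error (L2 loss): average of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean Absolute Error (L1 loss): average of absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def binary_cross_entropy(actual, predicted):
    """Mean cross-entropy for labels in {0, 1} and predicted probabilities."""
    return -sum(
        a * math.log(p) + (1 - a) * math.log(1 - p)
        for a, p in zip(actual, predicted)
    ) / len(actual)

# Example values (invented): regression targets and binary labels.
print(mse([3.0, 5.0], [2.5, 5.5]))               # 0.25
print(mae([3.0, 5.0], [2.5, 5.5]))               # 0.5
print(binary_cross_entropy([1, 0], [0.9, 0.2]))  # ~ 0.164
```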
Backpropagation Algorithm
1. **Forward Pass**: Calculate the output of the network for a given input.
2. **Compute Loss**: Calculate the error using the cost function.
3. **Backward Pass**: Propagate the error backward through the network to
compute gradients of the cost function with respect to the weights.
4. **Update Weights**: Adjust the weights using the gradients to minimize the
error (often using gradient descent).
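A minimal sketch of these four steps on a single sigmoid neuron; the data and hyperparameters are invented, and a real network applies the same chain rule layer by layer:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [(0.0, 0.0), (1.0, 1.0)]  # (input, target) pairs
w, b, lr = 0.5, 0.0, 1.0

for _ in range(1000):
    for x, target in data:
        # 1. Forward pass: compute the neuron's output.
        y = sigmoid(w * x + b)
        # 2. Compute loss (squared error); its gradient drives the update.
        # 3. Backward pass: chain rule through the loss and the sigmoid.
        dloss_dy = 2 * (y - target)
        dy_dz = y * (1 - y)            # derivative of the sigmoid
        dz_dw, dz_db = x, 1.0
        # 4. Update weights opposite to the gradient (gradient descent).
        w -= lr * dloss_dy * dy_dz * dz_dw
        b -= lr * dloss_dy * dy_dz * dz_db

# Outputs approach the targets [0, 1] as training proceeds.
print([round(sigmoid(w * x + b), 2) for x, _ in data])
```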
Introduction to Deep Learning
Deep learning models, such as convolutional neural networks (CNNs), recurrent
neural networks (RNNs), and transformers, are used to solve complex tasks by
learning hierarchical representations of data.
Summary
- **Neural Networks**: Inspired by the brain, used for pattern recognition and
prediction.
- **Perceptron**: Basic building block of neural networks.
- **Activation Functions**: Introduce non-linearity (e.g., Sigmoid, Tanh, ReLU).
- **Network Models**: Comprise input, hidden, and output layers.
- **Cost Function**: Measures error (e.g., MSE, Cross-Entropy).
- **Backpropagation**: Algorithm for training neural networks.
- **Deep Learning**: Utilizes deep neural networks for complex tasks.