MACHINE LEARNING A TO Z - I
MASTERY FROM A TO Z WITH CODE: PRACTICAL EXAMPLES, ALGORITHMS, AND APPLICATIONS
VANSHIKA
INDEX
Introduction to Machine Learning
• In a nutshell, machine learning
• Key points
• Features of machine learning
Three Sets
• Training set
• Validation set
• Test set
Supervised Learning
• Classification
• Regression
Classification
• Introduction to classification
• Types of classification
• Common classification algorithms
• Types of learners in classification algorithms
• Evaluating classification models
• How classification works
• Applications of classification algorithms
• Implementation
• Understanding classification in data mining
• Steps in building a classification model
• Importance of classification in data mining
Explanation
Regression
• Linear regression
• Cost function for linear regression (squared error)
• Gradient descent or stochastic gradient descent
• Batch gradient descent
• Batch GD vs. mini-batch GD vs. SGD
• Explain briefly batch gradient descent, stochastic gradient descent, and mini-batch gradient descent, and list the pros and cons of each
• Polynomial regression
• Overfitting and underfitting
• Implementation
• How does the gradient descent algorithm work
Overfitting
• Why does overfitting occur?
• How can you detect overfitting?
Underfitting
• Why does underfitting occur?
• How can you detect underfitting?
Unsupervised Learning
• How unsupervised learning works
Introduction to Machine Learning
Key Points
• ML allows computers to learn from data and make decisions like humans.
• It's a subset of AI and relies on statistical techniques and algorithms.
• Input data and outputs are used to train ML models during the learning phase.
• ML requires good-quality data for effective training.
• Different algorithms are used depending on the data and the task.
Types of Data
1. Labeled Data: Includes a target variable for prediction.
2. Unlabeled Data: Has no target variable; the model must discover structure in the data on its own.
Data Splitting
• Training Data: Used to train the model on input-output pairs.
• Validation Data: Used to optimize model hyperparameters during training.
• Testing Data: Evaluates the model's performance on unseen data after training.
Data Preprocessing
• Cleaning and Normalizing: Preparing data for analysis by handling missing values and scaling features.
• Feature Selection/Engineering: Selecting relevant features or creating new ones to improve model performance.
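As a minimal sketch of these preprocessing steps (the variable names and example values are assumptions, not from the text), missing values can be imputed and features scaled with scikit-learn:

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix with one missing value
X = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 180.0]])

# Fill missing values with the column mean
X_clean = SimpleImputer(strategy="mean").fit_transform(X)

# Scale each feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X_clean)
print(X_scaled)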
Data Advantages
• Improved Accuracy: More data allows ML models to learn complex relationships, leading to better predictions.
• Automation: ML automates decision-making and repetitive tasks efficiently.
• Personalization: ML enables personalized experiences for users, increasing satisfaction.
Data Disadvantages
• Bias: Biased data can result in biased predictions and classifications.
Applications of Machine Learning
2. Speech Recognition:
- Empowering smart systems like Alexa and Siri for
seamless interactions.
- Enabling convenient voice-based Google searches and
virtual assistants.
3. Recommender Systems:
- Personalizing services based on user preferences and
search history.
- Examples: YouTube video recommendations,
personalized Netflix movie suggestions.
4. Fraud Detection:
- Efficiently identifying and preventing fraudulent
transactions and activities.
- Providing real-time notifications for suspicious user
behavior.
5. Self-Driving Cars:
- Enabling cars to navigate autonomously without human
intervention.
- Tesla cars as prominent examples of successful
autonomous driving technology.
6. Medical Diagnosis:
- Achieving high accuracy in disease classification and
diagnosis.
- Utilizing machine learning models for detecting human
and plant diseases.
How to Choose a Machine Learning Algorithm
1. Explainability:
• Consider whether your model needs to be
explainable to a non-technical audience.
• Some accurate algorithms, like neural networks,
can be "black boxes," making it challenging to
understand and explain their predictions.
• Simpler algorithms such as kNN, linear
regression, or decision trees offer more
transparency in how predictions are made.
2. In-memory vs. Out-of-memory:
• Determine if your dataset can fit into the
memory (RAM) of your server or computer.
• If it fits in memory, you have a broader range of
algorithms to choose from.
• If not, consider incremental learning algorithms
that can handle data in smaller chunks.
3. Number of Features and Examples:
• Assess the number of training examples and
features in your dataset.
• Some algorithms, like neural networks and
gradient boosting, can handle large datasets with
millions of features.
• Others, like SVM, work best with a more modest number of features and examples.
4. Categorical vs. Numerical Features:
• Identify if your data consists of categorical
features, numerical features, or a mix.
• Certain algorithms require numeric input,
necessitating techniques like one-hot encoding for
categorical data.
5. Nonlinearity of the Data:
• Determine whether your data exhibits linear
separability or can be effectively modeled with
linear techniques.
• Linear models like SVM with linear kernels,
logistic regression, or linear regression are suitable
for linear data.
• Complex, nonlinear data may require deep
neural networks or ensemble algorithms.
6. Training Speed:
• Consider the time allowance for training your
model.
• Some algorithms, like neural networks, are
slower to train, while simpler ones like logistic
regression or decision trees are faster.
• Parallel processing can significantly speed up
certain algorithms like random forests.
7. Prediction Speed:
• Evaluate the speed requirements for generating
predictions, especially if the model will be used in
production.
• Algorithms like SVMs, linear regression, or
logistic regression are fast for prediction.
• Others, like kNN or deep neural networks, can
be slower.
8. Validation Set Testing:
• If unsure about the best algorithm, it's common
to test multiple algorithms on a validation set to
assess their performance.
• The choice of algorithm can be guided by
empirical testing and validation results.
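A minimal sketch of this kind of empirical comparison (the dataset and candidate models are assumptions chosen for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Split the labeled data into training and validation sets
X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Train several candidate algorithms and compare validation accuracy
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "kNN": KNeighborsClassifier(),
}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_val, y_val))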
[Figure: the question each set answers — training set: "Models learn the task"; validation set: "Which model is the best?"; test set: "How good is this model truly?"]
Here are the important key points about the use of three sets (training set, validation set, and test set) in machine learning:
1. Three Sets of Labeled Examples: The labeled data is divided into a training set, a validation set, and a test set, each serving a different purpose.
2. Data Splitting: The split is usually done randomly, so that each set is representative of the overall data.
3. Set Sizes: Most of the data typically goes to the training set, with smaller portions reserved for validation and testing.
8. Model Generalization: Evaluating on the held-out test set estimates how well the model generalizes to unseen data.
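A minimal sketch of such a three-way split using scikit-learn (the 70/15/15 proportions are an assumption for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve out the training set (70% of the data)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=0)

# Split the remainder evenly into validation and test sets (15% each)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))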
2. Unsupervised Learning:
- Introduction: Unsupervised learning deals with unlabeled
data, where the model explores patterns and relationships
within the data on its own.
- Learning Process: The model identifies hidden structures
or clusters in the data without any explicit guidance.
- Common Algorithms: K-Means Clustering, Hierarchical Clustering, Principal Component Analysis (PCA).
[Figure: unsupervised learning — input data and the output structure discovered by the model]
3. Reinforcement Learning:
- Introduction: Reinforcement learning involves an agent
interacting with an environment to achieve a goal.
Here's an example of supervised learning using Python code with the scikit-learn library and a simple linear regression algorithm (the training values below are illustrative):

# Import the necessary libraries
from sklearn import linear_model
import numpy as np

# Sample labeled training data (inputs and target values)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

# Train a linear regression model on the labeled data
model = linear_model.LinearRegression().fit(X, y)

# Make predictions
X_new = np.array([[6]])
prediction = model.predict(X_new)
print(prediction)
Introduction to Classification
- Classification is the process of categorizing data or objects
into predefined classes or categories based on their features
or attributes.
- It falls under supervised machine learning, where an
algorithm is trained on labeled data to predict the class or
category of new, unseen data.
Types of Classification
1. Binary Classification:
- Involves classifying data into two distinct classes or
categories.
- Example: Determining whether a person has a certain
disease or not.
2. Multiclass Classification:
- Involves classifying data into multiple classes or
categories.
- Example: Identifying the species of a flower based on its
characteristics.
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train a Gaussian Naive Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# make predictions
gnb_pred = gnb.predict(X_test)

# print the accuracy
print("Accuracy of Gaussian Naive Bayes: ", accuracy_score(y_test, gnb_pred))

# Train a Decision Tree classifier
dt = DecisionTreeClassifier(random_state=0)
dt.fit(X_train, y_train)

# make predictions
dt_pred = dt.predict(X_test)

# print the accuracy
print("Accuracy of Decision Tree Classifier: ", accuracy_score(y_test, dt_pred))
1. Regression:
- Linear Regression:
- Polynomial Regression:
- Decision Trees for Regression:
2. Classification:
- Logistic Regression:
- Decision Trees for Classification:
- Support Vector Machines (SVM) for Classification:
Aspect             | Classification                         | Regression
Target             | Class labels (e.g., spam or not spam)  | Numeric values (e.g., temperature)
Evaluation Metrics | Accuracy, Precision, Recall, F1-score  | Mean Squared Error, R-squared, etc.
Visualization      | Confusion Matrix, ROC Curve, etc.      | Scatter Plots, Residual Plots, etc.
Polynomial Regression (a type of regression): fits curves to the data to capture more complex relationships, for example modeling stock market price trends.
Explanation
1. Machine Learning:
Machine learning is the process of enabling machines to
learn from data and improve their performance over time
without being explicitly programmed.
- Example: An email spam filter that learns to identify
spam messages based on patterns in the text content.
2. Supervised Learning:
Supervised learning is a type of machine learning where
the algorithm learns from labeled data, which includes input
features and corresponding output labels.
- Example: Training a model to predict housing prices
using historical data where each data point includes
features like square footage and location along with the
actual sale price.
3. Labeled Datasets:
Labeled datasets consist of input data points along with
the correct corresponding output labels, which serve as the
ground truth for training the model.
- Example: A dataset containing images of cats and dogs
along with labels indicating whether each image contains a
cat or a dog.
4. Regression:
Regression is a type of supervised learning where the goal
is to predict continuous numerical values based on input
features.
- Example: Predicting a person's annual income based on
factors such as education level, work experience, and
location.
5. Classification:
Classification is a type of supervised learning where the
goal is to categorize input data into predefined classes or
categories.
- Example: Classifying emails as either spam or not spam
based on the words and phrases contained in the email
content.
6. Linear Regression:
Linear regression is a regression algorithm that aims to
find a linear relationship between input features and the
predicted output.
- Example: Predicting the price of a used car based on its
mileage and age using a straight-line equation.
7. Logistic Regression:
Logistic regression is a classification algorithm that models the probability that an input belongs to a particular class using the sigmoid function.
8. Decision Trees:
Decision trees make predictions by applying a sequence of feature tests, splitting the data into branches until a decision is reached at a leaf.
9. Random Forests:
Random forests combine many decision trees trained on random subsets of the data and aggregate their predictions to improve accuracy and reduce overfitting.
2. Regression:
• Objective : Regression is also a supervised
learning task, but its goal is to predict a
continuous numeric output or target variable. It's
used for tasks like predicting stock prices, house
prices, temperature, and more.
• Output : The output of a regression model is a
continuous value. For example, in predicting house
prices, the output might be a price in dollars.
• Algorithms : Common regression algorithms
include linear regression, polynomial regression,
decision trees, support vector regression (SVR),
and various neural network architectures.
• Evaluation : Regression models are evaluated
using metrics like mean squared error (MSE),
mean absolute error (MAE), root mean squared
error (RMSE), and R-squared (coefficient of
determination).
• Loss Function : Mean squared error (MSE) is a
widely used loss function for training regression
models. It measures the average squared
difference between predicted and actual values.
• Example Application : Predicting house prices
based on features like square footage, number of
bedrooms, and location; forecasting stock prices;
estimating temperature based on historical data.
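As a minimal sketch of evaluating a regression model with these metrics (the actual and predicted values below are assumptions for illustration):

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.5, 9.0])
y_pred = np.array([2.8, 5.4, 7.0, 9.5])

mse = mean_squared_error(y_true, y_pred)    # mean squared error
mae = mean_absolute_error(y_true, y_pred)   # mean absolute error
rmse = np.sqrt(mse)                         # root mean squared error
r2 = r2_score(y_true, y_pred)               # coefficient of determination

print("MSE:", mse, "MAE:", mae, "RMSE:", rmse, "R2:", r2)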
Here's an example of regression using Python code with the scikit-learn library and a simple linear regression algorithm:

# Import the necessary libraries
from sklearn import linear_model
import numpy as np
# Fit a linear regression model on sample data and make a prediction
model = linear_model.LinearRegression().fit(np.array([[1], [2], [3]]), np.array([2, 4, 6]))
print(model.predict(np.array([[4]])))
Regression
• Linear Regression
• Cost Function for Linear Regression (Squared Error)
• Gradient Descent or Stochastic Gradient Descent
Types of Regression
• Linear Regression
• Polynomial Regression
• Decision Tree Regression
• Ridge Regression
• Lasso Regression
• Logistic Regression
❖ LINEAR REGRESSION:
• Cost Function for Linear Regression (Squared Error)
• Gradient descent or stochastic gradient
descent
• Adam algorithm (adaptive moment
estimation)
• Feature scaling
• Batch gradient descent
1. Hypothesis Function (hθ(x)): The hypothesis function represents the linear relationship between the input features (x1 and x2) and the predicted output: hθ(x) = θ0 + θ1·x1 + θ2·x2.
2. θ0, θ1, and θ2: These are the parameters (coefficients) of the linear regression model. θ0 is the intercept (bias term), θ1 represents the weight for the first feature (x1), and θ2 represents the weight for the second feature (x2). These parameters are learned during the training process to minimize the prediction error.
3. x1 and x2: These are the input features or independent variables. A linear regression model can have many features, but in this equation we focus on x1 and x2.
4. Prediction: The equation allows you to make predictions for a given set of input features (x1 and x2). You plug these features into the equation, and the result hθ(x) represents the predicted output.
5. Linear Relationship: Linear regression assumes a linear relationship between the features and the output. This means that the predicted output is a linear combination of the features, and the model tries to find the best linear fit to the data.
6. Training: During the training phase, the model adjusts the parameters (θ0, θ1, and θ2) to minimize the difference between the predicted values (hθ(x)) and the actual target values in the training dataset. This process is typically done using a cost function and optimization techniques like gradient descent.
7. Bias Term (Intercept): θ0 represents the bias term or intercept. It accounts for the constant offset in the prediction, even when the input features are zero.
8. Gradient Descent: Gradient descent is often used to find the optimal values of θ0, θ1, and θ2 by iteratively updating them in the direction that reduces the cost (prediction error).
9. Cost Function: The cost function quantifies how well the model's predictions match the actual target values. The goal is to minimize this cost function during training.
10. Least Squares: In the context of linear regression, the method of least squares is commonly used to find the optimal parameters (θ0, θ1, θ2) by minimizing the sum of squared differences between predicted and actual values.
In summary, linear regression is a simple but powerful algorithm for modeling and predicting continuous numeric values. It relies on finding the best-fit linear relationship between input features and the output, represented by θ0, θ1, and θ2.
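A minimal sketch of this hypothesis function in code (the parameter and feature values are assumptions for illustration):

import numpy as np

# Assumed learned parameters: intercept theta0 and weights theta1, theta2
theta0, theta1, theta2 = 1.0, 0.5, 2.0

# Input features x1 and x2 for one example
x1, x2 = 3.0, 4.0

# Hypothesis: h_theta(x) = theta0 + theta1 * x1 + theta2 * x2
prediction = theta0 + theta1 * x1 + theta2 * x2
print(prediction)  # 1.0 + 1.5 + 8.0 = 10.5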
The mean squared error is defined as:
MSE = (1/n) * Σ (actualᵢ − predictedᵢ)²
Where n is the number of examples and the sum runs over all actual/predicted pairs. In code:

# Squared differences between actual and predicted values
squared_errors = (actual_values - predicted_values) ** 2
# Mean squared error is the average of the squared errors
mse = np.mean(squared_errors)
In this code:
• actual_values represents the actual target values or
ground truth for your dataset.
• predicted_values represents the predicted values
generated by your linear regression model.
• squared_errors calculates the squared difference
between each actual and predicted value.
• mse computes the mean squared error by taking the
average of all the squared errors.
Naive Bayes
3. Independence Assumption:
• Naive Bayes assumes that the features are conditionally independent of one another given the class label, which is what makes the method "naive."
4. Formula:
• The Naive Bayes classification formula can be written
as:
P(y | X) = (P(X | y) * P(y)) / P(X)
Where:
• P(y | X) is the posterior probability of class y given
features X.
• P(X | y) is the likelihood of observing features X given
class y.
• P(y) is the prior probability of class y.
• P(X) is the evidence, the probability of observing
features X across all classes.
5. Classification Process:
• For a new example, the posterior probability P(y | X) is computed for every class, and the class with the highest posterior is chosen as the prediction.
Prior probabilities
• Prior probabilities are like starting points in figuring
out if an email is "spam" or "not spam." They are like
initial guesses about the chances of an email being
spam before we look at the email's content. These
initial guesses are important because they affect our
final decision.
Example
Let's simplify the explanation of Bayes' formula and its application with an example:
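A worked example (the probabilities below are assumptions chosen for illustration): suppose 30% of all emails are spam, the word "offer" appears in 60% of spam emails, and "offer" appears in only 10% of non-spam emails. For an email containing "offer":

P(spam | "offer") = P("offer" | spam) * P(spam) / P("offer")
                  = (0.6 * 0.3) / (0.6 * 0.3 + 0.1 * 0.7)
                  = 0.18 / 0.25
                  = 0.72

Since 0.72 is greater than 0.5, the email would be classified as spam.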
Gradient Descent
• Objective: Gradient descent is a method used to find
the optimal values of parameters (in this case, 'w' and
'b') that minimize a cost function. It's commonly
applied in machine learning to train models like linear
regression.
import numpy as np

# Generate synthetic data: y = 4 + 3x plus Gaussian noise
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Hyperparameters for gradient descent
learning_rate = 0.1
iterations = 1000
Parameters: θ0, θ1
Cost Function: J(θ0, θ1) = (1 / 2m) * Σ_{i=1..m} (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
# Sample data
X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Initial parameter values
theta0 = 0.0
theta1 = 0.0

# Learning rate
alpha = 0.01

# Number of iterations
num_iterations = 1000

# Gradient Descent
for _ in range(num_iterations):
    # Calculate predictions
    predictions = theta0 + theta1 * X
    # Gradients of the cost with respect to theta0 and theta1
    gradient_theta0 = np.mean(predictions - y)
    gradient_theta1 = np.mean((predictions - y) * X)
    # Update the parameters in the direction that reduces the cost
    theta0 -= alpha * gradient_theta0
    theta1 -= alpha * gradient_theta1

print("theta0:", theta0, "theta1:", theta1)
Batch Gradient Descent
Pros:
Stable convergence to the minimum.
Cons:
Slow convergence, especially with large datasets.
Stochastic Gradient Descent (SGD)
Pros:
• Faster training, especially with large datasets.
• Potential to escape local minima due to its
randomness.
Cons:
• Less stable convergence, as it oscillates around the
minimum.
• Final parameters may not be optimal due to
randomness.
• Requires tuning of the learning rate and a well-
designed training schedule.
Mini-batch Gradient Descent
Pros:
• Balances stability and efficiency.
• Utilizes hardware optimizations, especially with GPUs.
• Helps escape local minima better than Batch GD.
Cons:
• May still struggle to escape local minima compared to
SGD.
• The choice of mini-batch size can impact convergence
and may require tuning.
Key Differences:
• Batch GD uses the entire dataset, SGD uses single
instances, and Mini-batch GD uses small random
subsets.
• Batch GD provides stable convergence but is slow,
while SGD is fast but less stable. Mini-batch GD
balances speed and stability.
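A minimal sketch of the mini-batch variant for a one-feature linear model (the data, learning rate, and batch size are assumptions for illustration; setting batch_size to the dataset size gives Batch GD, and setting it to 1 gives SGD):

import numpy as np

# Synthetic data for a one-feature linear model: y ≈ 4 + 3x
rng = np.random.default_rng(0)
X = 2 * rng.random(100)
y = 4 + 3 * X + rng.normal(size=100)

theta0, theta1 = 0.0, 0.0
alpha, epochs, batch_size = 0.05, 100, 16

for _ in range(epochs):
    # Shuffle, then update the parameters once per mini-batch
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        error = (theta0 + theta1 * X[idx]) - y[idx]
        theta0 -= alpha * error.mean()
        theta1 -= alpha * (error * X[idx]).mean()

print(theta0, theta1)  # should approach 4 and 3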
Polynomial Regression
1. Polynomial Regression is a type of regression
analysis used in machine learning and statistics to
model the relationship between a dependent
variable (target) and one or more independent
variables (predictors) when the relationship is not
linear but follows a polynomial form. Here are the
key points to understand about Polynomial
Regression:
The general form of the polynomial regression equation is:
Y = β0 + β1*X + β2*X² + ... + βn*Xⁿ
Where:
• Y is the dependent variable.
• X is the independent variable.
• β0, β1, β2, ..., βn are the coefficients of the polynomial terms.
• n is the degree of the polynomial, which determines the number of terms.
import matplotlib.pyplot as plt

x = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 21, 22]
y = [100, 90, 80, 60, 60, 55, 60, 65, 70, 70, 75, 76, 78, 79, 90, 99, 99, 100]

# Plot the raw data points
plt.scatter(x, y)
plt.show()
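To actually fit a polynomial to these points, one minimal approach (the degree of 3 is an assumption) uses numpy.polyfit:

import numpy as np
import matplotlib.pyplot as plt

x = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 21, 22]
y = [100, 90, 80, 60, 60, 55, 60, 65, 70, 70, 75, 76, 78, 79, 90, 99, 99, 100]

# Fit a degree-3 polynomial and build a callable model from its coefficients
model = np.poly1d(np.polyfit(x, y, 3))

# Plot the data and the fitted curve
line = np.linspace(1, 22, 100)
plt.scatter(x, y)
plt.plot(line, model(line))
plt.show()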
Overfitting
A modeling error in statistics that occurs when a function is too closely aligned to a limited set of data points.
• Overfitting occurs when a machine learning model
learns the training data too well, capturing noise and
irrelevant details.
• It often results in poor generalization to new, unseen
data, as the model is too tailored to the training set.
• Overfit models have excessively complex
representations, with too many parameters or features.
Underfitting
• Underfitting happens when a model is too simple to
capture the underlying patterns in the data.
• It leads to poor performance on both the training and
test data.
• Underfit models may lack the capacity or complexity
needed to represent the data adequately.
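A minimal sketch of detecting overfitting or underfitting by comparing training and test scores (the dataset and model choices are assumptions for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A depth-1 tree tends to underfit; an unlimited-depth tree can overfit
for depth in (1, None):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print("max_depth =", depth,
          "train accuracy:", model.score(X_train, y_train),
          "test accuracy:", model.score(X_test, y_test))

# A large gap between train and test accuracy suggests overfitting;
# low accuracy on both suggests underfitting.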
Clustering algorithms
3. Types of Clustering:
• Hard Clustering: Assigns each data point to a single
cluster exclusively.
• Soft Clustering: Allows data points to belong to
multiple clusters with associated probabilities or
membership scores.
import numpy as np
from sklearn.cluster import KMeans
# Fit k-means with two clusters on sample 2-D points
data = np.array([[1, 2], [1, 4], [10, 2], [10, 4]])
kmeans = KMeans(n_clusters=2).fit(data)
labels = kmeans.labels_
3. Types of Anomalies:
• Point Anomalies: Isolation of individual data points as
anomalies.
• Contextual Anomalies: Anomalies that depend on
context or other data points.
• Collective Anomalies: Groups of data points together
form an anomaly.
print("Original Data:")
print(data)
print("Transformed Data (2 Components):")
print(transformed.data)
Dimensionality Reduction
5. Autoencoders:
code
from sklearn.ensemble import IsolationForest
import numpy as np

# Fit an Isolation Forest on sample data and compute anomaly scores
data = np.random.rand(100, 2)
model = IsolationForest(random_state=0).fit(data)
anomaly_scores = model.decision_function(data)

print("Anomaly Scores:")
print(anomaly_scores)
1. Tree Structure:
• A decision tree is a hierarchy of nodes connected by branches, where internal nodes test features and leaves hold predictions.
2. Root Node:
• The root node is the topmost node; it applies the first feature test to the entire dataset.
4. Leaves:
• Leaf nodes are terminal nodes that hold the final prediction (a class label or a numeric value).
5. Feature Tests:
• Each internal node tests one feature, for example comparing it with a threshold or checking a category.
6. Splitting Criteria:
• Measures such as the Gini Index or Information Gain are used to choose the best split at each node.
7. Recursive Partitioning:
• Decision trees recursively split the data into subsets
based on feature tests.
• This process continues until a stopping condition is
met, such as a maximum depth, a minimum number of
samples per leaf, or a purity threshold.
8. Pruning:
• Pruning removes branches that contribute little predictive value, reducing the tree's size and helping prevent overfitting.
9. Interpretability:
• The learned rules can be read directly from the tree, making decision trees easy to explain to non-technical audiences.
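A minimal sketch tying these ideas together: limiting the depth acts as a simple pre-pruning strategy, and the learned rules can be printed as readable text (the dataset and depth are assumptions for illustration):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# max_depth limits the recursive partitioning (a simple pre-pruning strategy)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Print the learned feature tests and leaf predictions as readable rules
print(export_text(tree, feature_names=iris.feature_names))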
1. Gini Index
2. Information Gain
Gini Index
The Gini Index is a splitting measure used in decision
trees to determine the degree of impurity of a
particular variable when it is randomly chosen . The
degree of Gini Index varies between 0 and 1, where 0
denotes that all elements belong to a certain class or
there exists only one class (pure), and 1 denotes that
the elements are randomly distributed across various
classes (impure) . A Gini Index of 0.5 denotes equally
distributed elements into some classes
Gini Index = 1 − Σ (pᵢ)², where pᵢ is the probability of an element belonging to class i.
For example, for a node containing 6 elements of one class and 4 of the other:
Gini Index = 1 − (6/10)² − (4/10)² = 1 − 0.36 − 0.16 = 0.48
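A minimal sketch of this calculation in code (using the class counts from the example above):

def gini_index(class_counts):
    """Compute the Gini Index from a list of class counts."""
    total = sum(class_counts)
    return 1 - sum((count / total) ** 2 for count in class_counts)

# Node with 6 elements of one class and 4 of the other
print(gini_index([6, 4]))  # ~0.48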
Information Gain
It's a measure of the reduction in uncertainty (or entropy)
achieved by partitioning a dataset based on a particular
attribute or feature. Here's a detailed explanation along with
some key points:
1. Entropy: Entropy is a measure of impurity or disorder in
a dataset. In the context of decision trees, it quantifies the
randomness or unpredictability of the class labels in a
dataset. High entropy means the data is highly disordered,
and low entropy means it's very well-structured.
The formula for calculating entropy in the context of information theory is:
Entropy(S) = − Σ pᵢ * log₂(pᵢ)
Where:
• pᵢ is the proportion of examples in S that belong to class i.
Information Gain for a split on attribute A is then:
Gain(S, A) = Entropy(S) − Σ (|Sv| / |S|) * Entropy(Sv)
Where:
• Sv is the subset of examples for which attribute A takes value v, and |S| is the number of examples in a set.
5. Key Points:
a. High Information Gain: An attribute with high
Information Gain is considered more informative, as it
reduces uncertainty in classifying data points.
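A minimal sketch of computing entropy and information gain for a split (the parent and child class counts are assumptions chosen for illustration):

from math import log2

def entropy(class_counts):
    """Entropy of a node given its class counts."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

# Parent node: 9 positive / 5 negative examples, split into two child nodes
parent = [9, 5]
children = [[6, 2], [3, 3]]

# Information gain = parent entropy minus the weighted entropy of the children
n = sum(parent)
gain = entropy(parent) - sum(sum(child) / n * entropy(child) for child in children)
print(gain)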
Logistic Regression
Logistic regression models the probability of the positive class with the sigmoid function:
p = 1 / (1 + e^(−z)), with z = b0 + b1*x1 + b2*x2 + ... + bn*xn
where:
• z is the linear combination of weights and variables.
• b0 is the intercept or bias term.
• b1, b2, ..., bn are the coefficients (weights) associated with the independent variables x1, x2, ..., xn.
code
In Python, you can use libraries like scikit-learn to perform
Logistic Regression. Below is an example of implementing a
binomial Logistic Regression using scikit-learn:
1. Binomial
# Import the necessary libraries
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample data
X = [[2.5], [3.5], [5.5], [6.7], [8.9], [10.1]]
y = [0, 0, 1, 1, 1, 1]  # 0 represents "Fail", and 1 represents "Pass"

# Split the data, train the model, and evaluate it
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0, stratify=y)
model = LogisticRegression().fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
2. Multinomial
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Sample data with three classes
X = [[2.5, 1], [3.5, 2], [5.5, 2.5], [6.7, 2.8], [8.9, 3.2], [10.1, 3.5]]
y = [0, 1, 2, 1, 2, 0]

# Split the data and fit a multinomial logistic regression model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(multi_class='multinomial', solver='lbfgs').fit(X_train, y_train)
3. Ordinal
# Ordinal logistic regression using the mord library (pip install mord)
from mord import LogisticIT

X = [[2.5], [3.5], [5.5], [6.7], [8.9], [10.1]]
y = [0, 1, 2, 1, 2, 0]
m = LogisticIT().fit(X, y)
Neural Networks
The original motivation behind the development of neural networks was to create software that could emulate the functioning of the human brain.
1. Input Layer:
• The input layer is the first layer of a neural network.
• Its nodes (neurons) represent the features or input
variables of the problem.
• Data is fed into the input layer, and each node
typically corresponds to a specific input feature.
2. Hidden Layers:
• Hidden layers are the intermediate layers between the
input and output layers.
• They perform complex transformations on the input
data.
• The number of hidden layers and the number of
neurons in each layer are design choices and can vary
depending on the problem.
3. Neurons or Nodes:
• Neurons in a layer process information. Each neuron
receives inputs, performs a computation, and produces
an output.
• Neurons in the hidden layers use activation functions
to introduce non-linearity into the network, enabling it
to learn complex patterns.
4. Weights and Biases:
• Each connection between neurons carries a weight, and each neuron has a bias; these are the parameters adjusted during training.
5. Activation Functions:
• Activation functions introduce non-linearity to the
network, enabling it to learn complex functions.
• Common activation functions include ReLU (Rectified
Linear Unit), sigmoid, and tanh.
• The choice of activation function can impact how well
the network learns and converges.
6. Output Layer:
• The output layer produces the final result or prediction
of the neural network.
• The number of neurons in the output layer depends on
the type of problem. For example, in binary
classification, there may be one neuron, while in multi
class classification, there could be multiple neurons.
• The activation function in the output layer depends on
the nature of the problem (e.g., sigmoid for binary
classification, softmax for multi-class classification).
7. Feedforward Process:
• During training and inference, data flows through the
neural network in a forward direction, from the input
layer to the output layer.
• Each layer computes its output based on the input,
weights, bias, and activation function.
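A minimal sketch of one feedforward pass through a single hidden layer (the weights, biases, and layer sizes are assumptions for illustration):

import numpy as np

# One input example with 3 features
x = np.array([0.5, -1.2, 2.0])

# Hypothetical parameters: 3 inputs -> 4 hidden neurons -> 1 output
W1, b1 = np.random.randn(4, 3), np.zeros(4)
W2, b2 = np.random.randn(1, 4), np.zeros(1)

# Hidden layer: linear combination followed by the ReLU activation
hidden = np.maximum(0, W1 @ x + b1)

# Output layer: sigmoid activation for a binary classification output
output = 1 / (1 + np.exp(-(W2 @ hidden + b2)))
print(output)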
8. Backpropagation:
• Backpropagation is the process of updating the
weights of the neural network to minimize the
difference between the predicted output and the actual
target (training data).
• It uses techniques like gradient descent to adjust
weights and biases.
9. Deep Learning:
• Deep neural networks consist of multiple hidden
layers, enabling them to model highly complex
relationships in data.
• Deep learning has shown remarkable success in tasks
such as image recognition, natural language
processing, and reinforcement learning.