Types of Machine Learning Algorithms
Machine learning algorithms are broadly classified into four types based on how they learn from
data:
Supervised Learning
Supervised Learning is a type of machine learning in which the algorithm is trained on historical, labeled data and learns to predict a value.
Historical data means known data from the past (for example, the prices at which houses have been sold previously).
The algorithm learns from labeled data (i.e., input-output pairs).
Labeled data means the desired output is known.
The model is trained using historical data, where each input (features) has a
corresponding correct output (label).
The goal is to learn a mapping function that can make accurate predictions on new,
unseen data.
Key Points
Uses labeled data
Learns from historical data to make predictions.
Used for classification and regression tasks.
Performance is evaluated using metrics (e.g., accuracy, MSE, precision, recall).
Supervised Model
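To make this concrete, here is a minimal sketch of supervised learning using scikit-learn's LinearRegression; the housing numbers below are made up purely for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

# Historical, labeled data: each input (area in sq ft) has a known output (sale price).
X = np.array([[800], [1000], [1200], [1500], [1800]])    # features
y = np.array([150000, 180000, 210000, 260000, 300000])   # labels (past sale prices)

model = LinearRegression()
model.fit(X, y)                  # learn the mapping from features to labels

print(model.predict([[1300]]))   # predict the price of a new, unseen house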
Unsupervised Learning
In Unsupervised Learning, there is no labeled output (such as house prices). Instead, the model
identifies patterns or groups within the data.
For housing data, unsupervised learning can be used for the following (a short code sketch follows this list):
1. Clustering (Grouping Similar Houses)
K-Means Clustering: Groups houses into clusters based on features like area,
number of bedrooms, and location. For example, it can segment houses into
"Luxury," "Affordable," and "Mid-range" categories.
Hierarchical Clustering: Builds a tree-like structure to show relationships
between different house categories.
DBSCAN: Identifies housing price anomalies and clusters based on density.
2. Anomaly Detection (Identifying Outliers)
Helps detect houses with abnormally high or low prices compared to similar
properties.
Techniques: Isolation Forest, One-Class SVM, Autoencoders
3. Dimensionality Reduction (Feature Reduction)
If there are many features (e.g., location, area, amenities), Principal
Component Analysis (PCA) can reduce complexity while preserving
important patterns.
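As a rough illustration of these ideas, here is a minimal sketch using scikit-learn's KMeans and PCA on made-up housing features; the values and cluster labels are purely illustrative.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Unlabeled data: [area (sq ft), bedrooms, distance to city centre (km)]
X = np.array([
    [800,  2, 12], [850,  2, 11], [2500, 5,  3],
    [2600, 5,  2], [1500, 3,  7], [1400, 3,  8],
])

# Clustering: group similar houses (e.g., "affordable", "mid-range", "luxury").
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
print(kmeans.fit_predict(X))          # cluster label assigned to each house

# Dimensionality reduction: compress the 3 features into 2 principal components.
pca = PCA(n_components=2)
print(pca.fit_transform(X).shape)     # (6, 2)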
NumPy is a fundamental library for numerical computing in Python. It provides support for
large, multi-dimensional arrays and matrices, along with a collection of mathematical functions.
Use Cases:
Array operations and fast numerical computation
Linear algebra, statistics, and random number generation
Serving as the foundation for libraries like Pandas and Scikit-Learn
Pandas is a data manipulation and analysis library that provides data structures like Series (1D)
and DataFrame (2D, similar to tables in databases).
Use Cases:
Loading, cleaning, and transforming datasets (CSV, Excel, SQL)
Filtering, grouping, and aggregating data
Handling missing values
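A minimal sketch of a pandas Series and DataFrame, using made-up housing rows:

import pandas as pd

prices = pd.Series([150000, 210000, 300000], name="price")   # 1D labeled data

houses = pd.DataFrame({                                       # 2D, table-like structure
    "area": [800, 1200, 1800],
    "bedrooms": [2, 3, 4],
    "price": [150000, 210000, 300000],
})

print(houses.describe())               # quick summary statistics
print(houses[houses["area"] > 1000])   # filter rows, similar to a database query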
Matplotlib is a powerful plotting library for visualizing data through graphs, charts, and
histograms.
Use Cases:
Line plots, scatter plots, bar charts, and histograms
Visualizing trends, distributions, and model results
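A minimal sketch of a Matplotlib scatter plot; the data points are made up for illustration:

import matplotlib.pyplot as plt

area  = [800, 1000, 1200, 1500, 1800]
price = [150000, 180000, 210000, 260000, 300000]

plt.scatter(area, price)           # scatter plot of price vs. area
plt.xlabel("Area (sq ft)")
plt.ylabel("Price")
plt.title("House price vs. area")
plt.show()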
Seaborn is built on top of Matplotlib and provides more aesthetically pleasing and informative
statistical graphics.
Use Cases:
Statistical plots such as heatmaps, box plots, and pair plots
Visualizing distributions and correlations
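A minimal sketch using Seaborn's built-in "tips" sample dataset:

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")                     # small dataset shipped with Seaborn
sns.histplot(data=tips, x="total_bill", kde=True)   # histogram with a density curve
plt.show()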
OpenCV is an open-source library for image processing and computer vision tasks.
Use Cases:
Reading, transforming, and filtering images
Object and face detection
Video processing
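A minimal sketch of basic OpenCV image processing; "house.jpg" is a hypothetical file name used only for illustration:

import cv2

img = cv2.imread("house.jpg")                  # load an image (BGR format); hypothetical path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # convert to grayscale
edges = cv2.Canny(gray, 100, 200)              # detect edges
cv2.imwrite("house_edges.jpg", edges)          # save the result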
Scikit-Learn is the most widely used machine learning library for building and evaluating
models.
Use Cases:
Classification, regression, and clustering
Data preprocessing and feature scaling
Model selection and evaluation
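A minimal sketch of training and evaluating a model with Scikit-Learn, using its built-in iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)                            # train on the training set
print(accuracy_score(y_test, clf.predict(X_test)))   # evaluate on unseen data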
TensorFlow and PyTorch are the most popular deep learning frameworks for building neural
networks.
Use Cases:
Image and text classification
Building deep learning models (CNNs, RNNs, Transformers)
Large-scale ML model training
Example: Defining a simple neural network using PyTorch
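A minimal sketch of such a network, assuming PyTorch is installed; the layer sizes below are arbitrary and chosen only for illustration.

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    # A small fully connected network: 4 inputs -> 8 hidden units -> 3 outputs.
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 3)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

model = SimpleNet()
print(model(torch.rand(1, 4)))   # forward pass on one random input sample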
NumPy
import numpy as np

# Python List
mylist = [1, 2, 3, 4, 5]
print(mylist)      # Output: [1, 2, 3, 4, 5]
type(mylist)       # list

# NumPy Array
np.array(mylist)   # array([1, 2, 3, 4, 5])
myarr = np.array(mylist)
myarr              # array([1, 2, 3, 4, 5])
type(myarr)        # numpy.ndarray
The np.arange() function in NumPy creates an array with evenly spaced values between a
start and stop value.
It works similarly to Python’s range() but produces a NumPy array instead of a list.
Syntax
import numpy as np
np.arange(start, stop, step, dtype)
import numpy as np

arr = np.arange(10)          # integers from 0 up to (but not including) 10
print(arr)
output
[0 1 2 3 4 5 6 7 8 9]

arr = np.arange(1, 10, 2)    # start at 1, stop before 10, step of 2
print(arr)
output
[1 3 5 7 9]
The np.zeros() function creates an array filled with zeros, which is useful for initializing
arrays in numerical computing, data science, and machine learning. You pass the desired shape of the array.
Syntax
import numpy as np
np.zeros(shape, dtype=float)
shape → The shape of the array (integer for 1D, tuple for multi-dimensional).
dtype → (Optional) The data type of the array elements (default = float).
import numpy as np
arr = np.zeros(5)   # creates a 1D array with 5 zeros
print(arr)
output
[0. 0. 0. 0. 0.]

arr = np.zeros((2, 3), dtype=int)   # 2x3 array of integer zeros
print(arr)
output
[[0 0 0]
 [0 0 0]]

arr = np.zeros((2, 3, 4))   # 3D array: 2 blocks of 3x4 zeros
print(arr)
output
[[[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]]
The np.ones() function creates an array filled with ones, useful for initializing arrays in
numerical computing, data science, and machine learning.
Syntax
import numpy as np
np.ones(shape, dtype=float)
import numpy as np
arr = np.ones(4)   # creates a 1D array with 4 ones
print(arr)
output
[1. 1. 1. 1.]

arr = np.ones((3, 4))   # 3x4 array of ones
print(arr)
output
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

Specifying Data Type (dtype)
arr = np.ones((2, 3), dtype=int)   # integer ones instead of the default float
print(arr)
output
[[1 1 1]
 [1 1 1]]

arr = np.ones((2, 3, 4))   # 3D array: 2 blocks of 3x4 ones
print(arr)
output
[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]
The np.linspace() function creates an array of evenly spaced numbers between a start and a
stop value (both included by default).
Syntax
import numpy as np
np.linspace(start, stop, num, endpoint=True, retstep=False, dtype=None)

import numpy as np
arr = np.linspace(0, 10, 3)   # 3 evenly spaced numbers between 0 and 10
print(arr)
output
[ 0.  5. 10.]
Unlike np.arange(), you specify the number of elements instead of the step size.

arr = np.linspace(1, 10, 5, endpoint=False)   # 5 numbers starting at 1, excluding 10
print(arr)
output
[1.  2.8 4.6 6.4 8.2]

arr = np.linspace(1, 10, 5, dtype=int)   # results truncated to integers
print(arr)
output
[ 1  3  5  7 10]

arr, step = np.linspace(1, 10, 5, retstep=True)   # also return the step size
print("Array:", arr)
print("Step size:", step)
output
Array: [ 1.    3.25  5.5   7.75 10.  ]
Step size: 2.25
The np.eye() function creates an identity matrix, a square matrix with 1s on the diagonal
and 0s everywhere else.
Identity matrices are widely used in linear algebra, machine learning, and deep learning.
Syntax
import numpy as np
np.eye(N, M=None, k=0, dtype=float)
N → Number of rows.
M → (Optional) Number of columns (default = N, creating a square matrix).
k → (Optional) Diagonal offset (0 for main diagonal, positive for upper diagonals,
negative for lower diagonals).
dtype → (Optional) Data type of the output matrix (default = float).
import numpy as np
arr = np.eye(4) # 4×4 identity matrix
print(arr)
output
[[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
arr = np.eye(3, 5)   # 3 rows, 5 columns: ones on the main diagonal
print(arr)
output
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]]

Shifting the Diagonal (k parameter)
arr = np.eye(4, k=1)   # diagonal shifted one position above the main diagonal
print(arr)
output
[[0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [0. 0. 0. 0.]]

arr = np.eye(3, dtype=int)   # integer identity matrix
print(arr)
output
[[1 0 0]
 [0 1 0]
 [0 0 1]]
Sufficient/Insufficient Data
It is often unclear whether a dataset is sufficient. While we have data, we may not
know if it is enough to train a reliable model and make accurate predictions.
When training a Machine Learning model, we typically split the dataset into three parts (a code sketch follows this list):
1. Training Set: Used to train the model by learning patterns from the data.
2. Validation Set: Used to fine-tune the model by selecting the best hyperparameters and
preventing overfitting.
3. Test Set: Used to evaluate the final performance of the trained model on completely
unseen data.
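As a rough sketch, one common way to create the three sets is to call scikit-learn's train_test_split twice (here 60% train, 20% validation, 20% test, on the built-in iris data):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First split off the test set (20% of all data).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Then split the rest into training (75% of 80% = 60%) and validation (20%).
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # 90 30 30 for the 150-row iris dataset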
Differences between Validation and Test Data
1. Purpose: Validation data is used to tune the model and optimize hyperparameters, while
test data is used to evaluate the final model’s performance on unseen data.
2. When Used: The validation set is used during model training to improve performance,
whereas the test set is only used after training is complete to assess how well the model
generalizes.
3. Impact: The validation set helps in selecting the best model by preventing overfitting,
while the test set provides an unbiased estimate of the model’s real-world performance.
Underfitting happens when a model is too simple to learn the patterns in the data, leading to
poor performance on both the training and test data.
Overfitting happens when a model learns too much from the training data, including noise,
making it perform poorly on new data. If possible, collecting more training data helps the model
learn general patterns instead of memorizing specific examples; reducing noise in the training
data also helps.
1. The model performs well on the training data but does not generalize well (high test error).
It has learned patterns, including noise, from the training set and therefore fails on new,
unseen data.
2. Overfitting typically happens when:
the model is too complex relative to the amount of training data (simple data, complex model);
the training data is noisy.
3. Constraining a model to make it simpler and reduce the risk of overfitting is called
regularization.
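A minimal sketch that contrasts underfitting and overfitting by varying model complexity (polynomial degree) on noisy synthetic data; the exact numbers will vary, but the low-degree model should show high error everywhere (underfitting), while the high-degree model should show a much lower training error than test error (overfitting):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 3, 60)).reshape(-1, 1)
y = np.sin(2 * X).ravel() + rng.normal(0, 0.2, 60)   # noisy target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):   # too simple, reasonable, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(degree, round(train_err, 3), round(test_err, 3))   # training vs. test error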
Bias-Variance Tradeoff
The Bias-Variance Tradeoff is the balance between bias (underfitting) and variance (overfitting)
in a machine learning model. It involves two sources of error: bias error and variance error.
A model with high bias is too simple and underfits the data, while a model with high
variance is too complex and overfits. The goal is to find a balance where the model captures
important patterns without memorizing noise, ensuring good generalization to new data.
Techniques such as regularization, cross-validation, feature selection, and ensemble methods
help manage this tradeoff.
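For instance, here is a minimal sketch of using regularization together with cross-validation to manage the tradeoff: Ridge regression with its regularization strength chosen by cross-validation, on scikit-learn's built-in diabetes dataset.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# RidgeCV tries several regularization strengths (alpha) and keeps the best one.
model = RidgeCV(alphas=[0.1, 1.0, 10.0])
print(cross_val_score(model, X, y, cv=5).mean())   # average validation score (R^2)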
High Variance
A model with high variance is overly sensitive to the training data: it learns too much from it,
including noise, and performs poorly on new data (overfitting). This can be fixed with
regularization, more training data, or ensemble methods.
Bias Error
Bias error occurs when a model is too simple to capture the patterns in the data, leading to
underfitting. It results in high error on both the training and test data. To reduce bias, use a more
complex model, add relevant features, or decrease regularization.
Bias is the difference between the average prediction of the model and the correct value it
is trying to predict.
A model with high bias pays very little attention to the training data and oversimplifies the
problem.
This always leads to high error on both training and test data.
Bias error occurs when the algorithm is unable to capture the relevant relationships between
the features and the target output.
Variance Error
Variance in machine learning refers to how much a model's predictions change when it is trained
on different datasets. High variance means the model is too sensitive to small changes in the data,
leading to overfitting, while low variance helps the model generalize better.
Variance indicates how much the estimate of the target function would change if different
training data were used.
A model with high variance pays a lot of attention to the training data, learning even the
noise, and does not generalize to data it has not seen before, which leads to overfitting. This
can be fixed using regularization, more training data, or ensemble methods.
As a result, the model performs well on the training data but has high error on the test data.
Variance can lead to overfitting, in which small fluctuations in the training set are
magnified.