Machine Learning Algorithms

The document provides an overview of machine learning algorithms, including supervised, unsupervised, and reinforcement learning, along with their applications and challenges. It details supervised learning techniques such as regression and classification, and unsupervised learning methods like clustering, specifically K-means clustering. Key concepts such as overfitting, bias, and the importance of data quality are also discussed.

Uploaded by

sanjudxbreddy
Copyright
© All Rights Reserved

MACHINE LEARNING ALGORITHMS

MACHINE LEARNING
ML: systems that improve automatically through experience and the use of data.
ML algorithms include decision trees and neural networks. Trained models serve
as representations of what was learned from the data.
Applications: Netflix recommendations, speech recognition, medical diagnosis,
autonomous vehicles, chatbots, personalized ads, and fraud detection systems.
Problems: Overfitting, where models become too specialized on training data,
leads to poor performance on new data. Bias in training data causes biased
predictions.

TYPES OF MACHINE LEARNING


Supervised learning: uses labelled data. Learns to map inputs to output labels
based on training examples. Eg: linear regression, decision trees etc.
Unsupervised learning: uses unlabelled data. Finds hidden patterns or
structures. Eg: k-means clustering, hierarchical clustering etc.
Reinforcement learning: trial-and-error method. Learns from a system of rewards
and penalties.

SUPERVISED LEARNING: allows machines to learn from labeled data, making
predictions or decisions based on that learning.
1. Regression – works with continuous data
2. Classification – works with discrete data

REGRESSION AND CORRELATION
Correlation: a measure of the strength of a linear relationship between two
quantitative variables (e.g. price, sales). If a change in one variable appears
to be accompanied by a change in the other, the two variables are said to be
correlated.
Causation: one event is the result of the occurrence of the other event.
Correlation alone does not establish causation.
Pearson’s r: measures the strength and direction of the linear relationship
between two continuous variables. It ranges from -1 to +1; for example, a
Pearson correlation coefficient of 0.35 is usually read as a fairly weak
positive linear relationship.
Assumptions:
1. Scale of measurement should be interval or ratio.
2. Variables should be approximately normally distributed.
3. The association should be linear.
4. There should be no outliers in the data.
May not be suitable in situations such as: no correlation, outliers, non-linear
relationships and violation of the assumptions above.
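To make the definition of Pearson's r concrete, here is a minimal sketch in plain Python. The function name `pearson_r` and the price/sales figures are hypothetical, chosen only for illustration. The formula is the standard one: the sum of products of deviations divided by the square root of the product of the two sums of squared deviations.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sum_xy / math.sqrt(ss_x * ss_y)

# Hypothetical data: as price goes up, sales go down.
prices = [10, 12, 14, 16, 18]
sales = [50, 46, 41, 39, 33]
r = pearson_r(prices, sales)
print(f"r = {r:.3f}")  # close to -1: strong negative linear relationship
```

The sign of r gives the direction of the relationship and its magnitude gives the strength. A single outlier can change both sums of squares considerably, which is why assumption 4 matters.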
Regression analysis: an analysis that involves more than one variable,
modelling how a dependent variable changes with one or more independent
variables. It depends on a regression line or curve fitted to the data.
The least squares method is commonly employed to find this best-fit line or
curve. This method minimizes the sum of the squared differences between
observed and predicted values.
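The least squares idea can be sketched for the straight-line case using the standard closed-form solution. The names `fit_line` and `sum_squared_error` and the experience/salary numbers are assumptions made for this example, not part of the notes.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    if ss_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = ss_xy / ss_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_error(xs, ys, slope, intercept):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [32, 35, 41, 44, 48]
slope, intercept = fit_line(years, salary)
best_error = sum_squared_error(years, salary, slope, intercept)
print(f"salary = {intercept:.1f} + {slope:.1f} * years, SSE = {best_error:.2f}")
```

Any other line through the same data, for instance one with a slightly different slope, gives a larger sum of squared errors. That is what "best fit" means here.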

Linear Regression: consists of a predictor (independent) variable and a
dependent variable related linearly to each other.
a) Simple Linear Regression: the value is predicted using a single independent
variable.
b) Multiple Linear Regression: more than one independent variable is used to
predict the value of the dependent variable.
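In equation form, the two variants differ only in the number of independent variables. The notation below is an assumption made for illustration: y is the dependent variable, x (or x_1 ... x_p) the independent variables, the betas are the coefficients to be estimated, and epsilon is the error term.

```latex
% Simple linear regression: one independent variable x
y = \beta_0 + \beta_1 x + \varepsilon

% Multiple linear regression: p independent variables x_1, \dots, x_p
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```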

Applications: market analysis, sales forecasting, salary prediction, sports and
medical research.
Advantages: simple, easy to implement, efficient to train.
Disadvantages: sensitive to outliers, which distort the fitted line. Limited to
linear relations between variables.

CLASSIFICATION: categorizing data into predefined classes or categories.
Assigns labels based on features.
Working:
• classes or categories
• features/attributes
• training data
• classification model
• prediction
Types
1) Binary Classification: 2 class labels. Eg: email spam detection, exam result
(pass/fail)
2) Multi-Class Classification: more than 2 class labels. Eg: image
classification
3) Multi-Label Classification: each example may belong to multiple class labels.
Eg: photo classification
4) Imbalanced Classification: classes are unequally distributed, with a majority
and a minority class. Eg: fraud detection

KNN (k-nearest neighbours algorithm): operates on the principle of proximity,
making predictions or classifications by considering the similarity between
data points. A new point is assigned the category most common among its k
closest training points.
Need: useful for classification problems where the decision boundaries are not
clearly defined or when the dataset does not have a well-defined structure.
Provides a simple yet effective method for identifying the category of a new
data point.
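The proximity principle behind KNN can be sketched in a few lines of plain Python. This is a minimal illustration, assuming Euclidean distance as the similarity measure and a simple majority vote. The function name `knn_predict` and the sample points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training point (assumed metric).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote among those labels.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: two groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.5), (7.0, 6.0), (6.5, 7.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2.0, 2.0)))  # lands among the "A" points
print(knn_predict(points, labels, (6.0, 6.0)))  # lands among the "B" points
```

There is no training step beyond storing the data: all the work happens at prediction time, so the choice of k and of the distance measure largely determines the behaviour.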
UNSUPERVISED LEARNING:
CLUSTERING: groups an unlabelled dataset into clusters based on similarity. The
clustering technique is commonly used for statistical data analysis.
How it works:
1) Prepare the Data: Select the right features for clustering
2) Create Similarity Metrics: Define how similar data points are by comparing
their features.
3) Run the Clustering Algorithm: Apply a clustering algorithm to group the data.
4) Interpret the Results: Analyse the clusters to understand what they
represent.
Types:
• Partitioning Clustering: divides the data into non-hierarchical groups. It
is also known as the centroid-based method. Eg: k-means clustering.
• Density-Based Clustering: connects highly dense areas into clusters, so
clusters of arbitrary shape can form as long as the dense regions can be
connected.
• Distribution Model-Based Clustering: data points are grouped based on the
probability that they belong to a particular distribution, most commonly a
Gaussian distribution. Eg: GMM (Gaussian Mixture Model).
• Hierarchical Clustering: the dataset is divided into clusters to create a
tree-like structure, which is visualised as a dendrogram.
________________________________________________________________
K-MEANS CLUSTERING: unsupervised learning algorithm used to solve clustering
problems in machine learning. It divides the samples into k clusters of roughly
equal variance, assigning each sample to the cluster with the nearest centroid.
Applications: market segmentation, image segmentation, document clustering
and customer segmentation.
Advantages: easy to implement, handles large datasets, easy to understand,
works well with various features.
Limitations: results vary with initial centroid placement, the number of
clusters must be known beforehand, outliers distort clusters.
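The notes do not spell out the procedure, so the following is a sketch of the standard iterative approach (often called Lloyd's algorithm): assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The function name `k_means`, the naive choice of the first k points as starting centroids, and the sample data are all assumptions made for illustration. Practical implementations usually pick starting centroids at random or with a smarter scheme.

```python
import math

def k_means(points, k, max_iters=100):
    """Cluster `points` into k groups; returns (centroids, clusters)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Naive initialisation (assumption): use the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, 2)
print(centroids)
print(clusters)
```

Both starting centroids here come from the same group, yet the algorithm still separates the two groups after a few iterations. On less clear-cut data a poor start can lead to a different and worse result, which is the centroid-placement limitation listed above.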
