ML Notes MAKAUT 7th Sem

Machine Learning: Machine learning is a type of artificial intelligence (AI) that enables computers to learn and make decisions without being explicitly programmed. Instead of following specific instructions, a machine learning system uses algorithms and statistical models to analyze data, identify patterns, and make predictions or decisions.
Categories of Machine Learning: Machine learning is broadly categorized into supervised learning, unsupervised learning, and reinforcement learning (supervised and unsupervised learning are discussed in detail later in these notes).
• In simpler terms, machine learning is like teaching a computer to learn from examples and experiences, allowing it to improve its performance over time. The process involves feeding the system a large amount of data; the machine learning algorithm learns from this data to recognize patterns and make predictions or decisions without being explicitly programmed for each task.
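As a concrete illustration of "learning from examples", here is a minimal sketch (assuming Python with scikit-learn is available; the dataset and model choice are only for demonstration) in which a model is fitted to labelled examples and then makes predictions on data it has never seen:

    # A minimal "learning from examples" sketch using scikit-learn (assumed available).
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # A small labelled dataset: 150 flower samples, 4 features, 3 classes.
    X, y = load_iris(return_X_y=True)

    # Hold out part of the data to check how well the learned patterns generalize.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # "Training" = letting the algorithm find patterns in the examples.
    model = LogisticRegression(max_iter=500)
    model.fit(X_train, y_train)

    # The model now predicts labels for data it was never explicitly programmed for.
    print("Accuracy on unseen data:", model.score(X_test, y_test))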

Artificial Intelligence (AI) vs Machine Learning (ML) vs Deep Learning (DL):
• AI simulates human intelligence to perform tasks and make decisions. ML is a subset of AI that uses algorithms to learn patterns from data. DL is a subset of ML that employs artificial neural networks for complex tasks.
• AI may or may not require large datasets; it can use predefined rules. ML relies heavily on labelled data for training and making predictions. DL requires extensive labelled data and performs exceptionally well with big datasets.
• AI can be rule-based, requiring human programming and intervention. ML automates learning from data and requires less manual intervention. DL automates feature extraction, reducing the need for manual engineering.
• AI can handle various tasks, from simple to complex, across domains. ML specializes in data-driven tasks like classification, regression, etc. DL excels at complex tasks like image recognition, natural language processing, and more.
• AI algorithms can be simple or complex, depending on the application. ML employs various algorithms like decision trees, SVM, and random forests. DL relies on deep neural networks, which can have numerous hidden layers for complex learning.
• AI may require less training time and fewer resources for rule-based systems. ML training time varies with algorithm complexity and dataset size. DL training demands substantial computational resources and time for deep networks.
• AI systems may offer interpretable results based on human rules. ML models can be more or less interpretable depending on the algorithm. DL models are often considered less interpretable due to complex network architectures.
• AI is used in virtual assistants, recommendation systems, and more. ML is applied in image recognition, spam filtering, and other data tasks. DL is utilized in autonomous vehicles, speech recognition, and advanced AI applications.
Future Of ML: The future of machine learning (ML) holds tremendous potential, and several trends and
developments are likely to shape its trajectory. Here are some key aspects of the future of machine
learning:

1. Advancements in Deep Learning: Continued progress in deep learning techniques, architectures, and
training methods is expected. This will enable more sophisticated and efficient models for tasks such as
image recognition, natural language processing, and reinforcement learning.
2. Explainable AI (XAI): As AI and ML systems become more prevalent in critical decision-making
processes, there is a growing demand for models that are transparent and explainable. XAI focuses on
making machine learning models more interpretable and understandable, enhancing trust and
accountability.
3. Automated Machine Learning (AutoML): The development of tools and platforms for automated
machine learning is on the rise. AutoML aims to streamline the machine learning pipeline, automating
tasks such as feature engineering, model selection, and hyperparameter tuning, making ML more
accessible to non-experts.
4. Edge Computing and Federated Learning: The integration of machine learning with edge computing
allows for processing data closer to the source, reducing latency and improving efficiency. Federated
learning, where models are trained across decentralized devices, enables privacy-preserving machine
learning by keeping data localized.
5. AI in Healthcare: Machine learning is expected to play a significant role in advancing healthcare, from
personalized medicine and drug discovery to predictive analytics and medical imaging. ML models can
help in diagnosing diseases, predicting patient outcomes, and optimizing treatment plans.
6. Ethics and Responsible AI: As ML systems are deployed in various applications, there is an increasing
focus on ethical considerations and responsible AI practices. This includes addressing issues of bias,
fairness, accountability, and ensuring that AI technologies benefit society as a whole.
7. Natural Language Processing (NLP) Advances: Progress in natural language processing, including
language understanding, generation, and contextual understanding, will lead to more sophisticated
applications in chatbots, virtual assistants, sentiment analysis, and language translation.
8. Continued Integration with Other Technologies: Machine learning will continue to integrate with other
emerging technologies such as blockchain, augmented reality (AR), virtual reality (VR), and the Internet
of Things (IoT), creating synergies and enabling innovative applications.
9. Quantum Machine Learning: Research in quantum computing may impact machine learning in the
future. Quantum computing has the potential to solve complex problems more efficiently than classical
computers, offering new possibilities for machine learning algorithms.
10. Increased Industry Adoption: Industries across the board are recognizing the value of machine learning
in optimizing processes, improving decision-making, and gaining insights from data. Increased adoption
is expected across sectors such as finance, manufacturing, retail, and more.
What is IoT (Internet of Things)? - IoT, or the Internet of Things, refers to the network of physical
devices embedded with sensors, software, and connectivity features that enable them to collect and
exchange data. These devices can range from everyday objects like household appliances and wearable
devices to industrial machinery. The goal of IoT is to enhance efficiency, automation, and decision-
making by enabling these devices to communicate with each other and with centralized systems.
➢ Connection between IoT and Machine Learning: The connection between IoT and machine
learning lies in the ability to analyze and derive meaningful insights from the massive amounts of
data generated by IoT devices. Here's a simplified explanation:
1. Data Collection from IoT Devices: IoT devices generate vast amounts of data as they collect
information from their surroundings. For example, smart thermostats can collect data on
temperature patterns,
fitness trackers can monitor physical activities, and industrial sensors can track machine
performance.
2. Data Processing and Analysis: Machine learning algorithms are applied to process and analyze
the data collected by IoT devices. These algorithms can identify patterns, trends, and anomalies
within the data, providing valuable insights that might be challenging for traditional
programming to uncover.
3. Smart Decision-Making: With the help of machine learning, IoT systems can make intelligent
decisions based on the analyzed data. For instance, a smart home system might learn a user's
preferences and adjust the thermostat automatically, or an industrial IoT system could predict
equipment failures and
schedule maintenance proactively.
4. Adaptive Systems: Machine learning allows IoT systems to adapt and improve over time. As
more data is collected, the algorithms can continuously learn and refine their models, leading to
better predictions and decisions.
In essence, machine learning enhances the capabilities of IoT systems by making them smarter,
more adaptive, and capable of deriving actionable insights from the wealth of data generated by
IoT devices.
➢ ML in Cyber Security: Machine learning is increasingly applied to cybersecurity to enhance
the detection and prevention of cyber threats. Here's a simplified explanation of how
machine learning is used in cybersecurity:
1. Anomaly Detection: Machine learning algorithms can be trained on normal patterns of
behavior within a computer network. When there is an abnormal deviation from these
patterns, the system can flag it as a potential security threat. For example, unusual login times,
atypical data access patterns, or unexpected network traffic could indicate a cyber-attack.
2. Pattern Recognition: Machine learning algorithms excel at recognizing patterns in large
datasets. In cybersecurity, these algorithms can analyze historical data to identify common
attack patterns or signatures associated with known malware or malicious activities.
3. Malware Detection: Machine learning models can be trained to recognize the characteristics of
malware based on features like file behavior, code analysis, or network behavior. This enables
systems to detect and block new, previously unseen malware based on its similarities to known
malicious patterns.
4. Phishing Detection: Phishing attacks often involve deceptive emails or websites designed to
trick users into revealing sensitive information. Machine learning can analyze the content of
emails, URLs, and other communication to identify phishing attempts by recognizing
suspicious patterns and content.
5. User Behavior Analysis: By monitoring and analyzing user behavior, machine learning can
identify unusual activities that may indicate compromised accounts. For example, if a user
suddenly accesses sensitive data they have never accessed before, it could be a sign of
unauthorized access.
6. Automated Response: Machine learning models can enable automated responses to certain
types of cyber threats. For instance, if a system detects a known pattern of attack, it can
automatically trigger actions such as isolating affected devices, blocking malicious IP
addresses, or updating security
configurations.

7. Adaptive Security: Machine learning systems can adapt and learn from new cyber threats over time. As
the algorithms encounter and analyze new types of attacks, they can improve their ability to detect and
mitigate emerging threats without requiring constant manual updates.
In summary, machine learning in cybersecurity involves training algorithms to recognize normal and
malicious patterns, detect anomalies, and respond to cyber threats in real-time. By leveraging the power of
data analysis and pattern recognition, machine learning enhances the effectiveness of cybersecurity
systems in identifying and addressing a wide range of security issues.
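To make the anomaly-detection idea concrete, below is a minimal sketch (assuming scikit-learn; the "network session" features and values are made up for illustration) that learns normal behaviour and flags deviations from it:

    # Illustrative anomaly detection on synthetic "network session" features.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)

    # Pretend each row is [login_hour, MB_transferred] for one session (synthetic data).
    normal_sessions = np.column_stack([
        rng.normal(10, 2, size=200),    # logins clustered around 10 AM
        rng.normal(50, 10, size=200),   # roughly 50 MB transferred per session
    ])

    # Fit the detector on historical "normal" behaviour.
    detector = IsolationForest(contamination=0.01, random_state=0)
    detector.fit(normal_sessions)

    # Score new sessions: -1 means flagged as anomalous, 1 means looks normal.
    new_sessions = np.array([[10.5, 48.0],    # typical session
                             [3.0, 900.0]])   # 3 AM login moving 900 MB
    print(detector.predict(new_sessions))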

➢ Basic Components of Learning / How Machine Learns


The learning process, whether by a human or a machine, can be divided into four components,
namely, data storage, abstraction, generalization and evaluation. Figure 1.1 illustrates the various
components and the steps involved in the learning process.

1. Data storage: Facilities for storing and retrieving huge amounts of data are an important component of
the learning process. Humans and computers alike utilize data storage as a foundation for advanced
reasoning.
• In a human being, the data is stored in the brain and data is retrieved using electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory and similar devices to store data
and use cables and other technology to retrieve data.
2. Abstraction: The second component of the learning process is known as abstraction.
Abstraction is the process of extracting knowledge about stored data. This involves creating general
concepts about the data as a whole. The creation of knowledge involves the application of known models
and creation of new models.
The process of fitting a model to a dataset is known as training. When the model has been trained, the data
is transformed into an abstract form that summarizes the original information.
3. Generalization: The third component of the learning process is known as generalization.
The term generalization describes the process of turning the knowledge about stored data into a form that
can be utilized for future action. These actions are to be carried out on tasks that are similar, but not identical, to those that have been seen before. In generalization, the goal is to discover those properties of
the data that will be most relevant to future tasks.
4. Evaluation: Evaluation is the last component of the learning process. It is the process of giving feedback
to the user to measure the utility of the learned knowledge. This feedback is then utilized to effect
improvements in the whole learning process.
Supervised Learning vs Unsupervised Learning:
• Supervised learning algorithms are trained using labelled data. Unsupervised learning algorithms are trained using unlabelled data.
• A supervised learning model takes direct feedback to check whether it is predicting the correct output. An unsupervised learning model does not take any feedback.
• A supervised learning model predicts the output. An unsupervised learning model finds hidden patterns in the data.
• In supervised learning, input data is provided to the model along with the output. In unsupervised learning, only input data is provided to the model.
• The goal of supervised learning is to train the model so that it can predict the output when it is given new data. The goal of unsupervised learning is to find hidden patterns and useful insights from the unknown dataset.
• Supervised learning needs supervision to train the model. Unsupervised learning does not need any supervision to train the model.
• Supervised learning can be categorized into Classification and Regression problems. Unsupervised learning can be classified into Clustering and Association problems.
• Supervised learning can be used for those cases where we know the inputs as well as the corresponding outputs. Unsupervised learning can be used for those cases where we have only input data and no corresponding output data.
• A supervised learning model generally produces more accurate results. An unsupervised learning model may give less accurate results as compared to supervised learning.
• Supervised learning is not close to true Artificial Intelligence, as we first train the model on the data and only then can it predict the correct output. Unsupervised learning is closer to true Artificial Intelligence, as it learns in much the same way a child learns daily routine things from experience.
• Supervised learning includes algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision Trees, Naïve Bayes, etc. Unsupervised learning includes algorithms such as K-Means Clustering and the Apriori algorithm.
Supervised learning can be used for two types of problems: Classification and Regression.

Regression Algorithm vs Classification Algorithm:
• In regression, the output variable must be of continuous nature or a real value. In classification, the output variable must be a discrete value.
• The task of a regression algorithm is to map the input value (x) to a continuous output variable (y). The task of a classification algorithm is to map the input value (x) to a discrete output variable (y).
• Regression input data consists of independent variables and a continuous dependent variable. Classification input data consists of independent variables and a categorical dependent variable.
• In regression, we try to find the best-fit line, which can predict the output more accurately. In classification, we try to find the decision boundary, which can divide the dataset into different classes.
• The objective of regression is to predict continuous numerical values. The objective of classification is to predict categorical/class labels.
• Regression algorithms can be used to solve regression problems such as weather prediction, house price prediction, etc. Classification algorithms can be used to solve classification problems such as identification of spam emails, speech recognition, identification of cancer cells, etc.
• Examples of regression algorithms: Linear Regression, Polynomial Regression, Ridge Regression, Lasso Regression, Support Vector Regression (SVR), Decision Trees for Regression, Random Forest Regression, K-Nearest Neighbors (K-NN) Regression, Neural Networks for Regression, etc. Examples of classification algorithms: Logistic Regression, Decision Trees, Random Forest, Support Vector Machines (SVM), K-Nearest Neighbors (K-NN), Naive Bayes, Neural Networks, Multi-layer Perceptron (MLP), etc.
• Regression algorithms can be further divided into Linear and Non-linear Regression. Classification algorithms can be divided into Binary Classifiers and Multi-class Classifiers.

➢ Classification: Naïve Bayes, Logistic Regression, SVM


➢ Regression: Linear Regression
➢ Both: Decision Trees, Random Forest, KNN, Gradient Boosting Algorithms
• K-Nearest Neighbours: K-Nearest Neighbour (K-NN) is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can be easily classified into a well-suited category using the K-NN algorithm.
• K-NN algorithm can be used for Regression as well as for Classification but mostly it is
used for the Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any assumption on
underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and, at the time of classification, performs an action on the dataset.
• The KNN algorithm simply stores the dataset during the training phase, and when it gets new data, it classifies that data into the category that is most similar to the new data.
• Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new image that are most similar to the cat and dog images and, based on the most similar features, place it in either the cat or the dog category.
Need for a K-NN Algorithm: Suppose there are two categories, Category A and Category B, and we have a new data point x1. To which of these categories does this data point belong? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point.
The working of K-NN can be explained with the following steps (a minimal sketch in Python follows the steps):
Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance (the straight-line distance between two points) from the new data point to each point in the dataset.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distances.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
Step-6: Our model is ready.
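The sketch below implements these steps from scratch (assuming Python with NumPy; the toy 2-D points, labels, and K = 3 are made up for illustration):

    # A from-scratch sketch of the K-NN steps above.
    import numpy as np
    from collections import Counter

    # Labelled training points: Category A (0) and Category B (1).
    X_train = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 7], [8, 6]], dtype=float)
    y_train = np.array([0, 0, 0, 1, 1, 1])

    def knn_predict(x_new, K=3):
        # Step 2-3: Euclidean distance to every training point, then keep the K nearest.
        distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
        nearest = np.argsort(distances)[:K]
        # Step 4-5: count the labels among the neighbours and pick the majority category.
        votes = Counter(y_train[nearest])
        return votes.most_common(1)[0][0]

    print(knn_predict(np.array([5.0, 5.0])))   # new data point x1 -> predicted category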
➢ How to select the value of K in the K-NN Algorithm?
There is no particular way to determine the best value for "K", so we need to try several values to find the best one. The most commonly preferred value for K is 5. A very low value of K, such as K=1 or K=2, can be noisy and make the model sensitive to outliers. Larger values of K are more robust to noise, but they can smooth over class boundaries and increase computation.

➢ Advantages of KNN Algorithm:


➢ It is simple to implement.
➢ It is robust to the noisy training data.
➢ It can be more effective if the training data is large.
➢ Disadvantages of KNN Algorithm:
• It always needs to determine the value of K, which may be complex at times.
• The computation cost is high because the distance between the new data point and all the training samples must be calculated.

Decision Trees

o Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-
structured classifier, where internal nodes represent the features of a dataset, branches
represent the decision rules and each leaf node represents the outcome.
o In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make decisions and have multiple branches, whereas Leaf nodes are the outputs of those decisions and do not contain any further branches.
o The decisions or the test are performed on the basis of features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a problem/decision
based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which expands
on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; these final nodes are called leaf nodes (as sketched below).
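Assuming scikit-learn is available (its DecisionTreeClassifier implements a CART-style tree), the hypothetical "job offer" example below walks through these steps in practice; the feature values and labels are invented for illustration:

    # Building a small decision tree on made-up "job offer" data.
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical features: [salary_in_lakhs, distance_km, cab_facility (0/1)].
    X = [[12, 5, 1], [12, 25, 0], [6, 5, 1], [15, 30, 1], [7, 10, 0], [14, 8, 1]]
    y = [1, 0, 0, 1, 0, 1]   # 1 = offer accepted, 0 = declined (made-up labels)

    # criterion="entropy" uses information gain as the Attribute Selection Measure.
    tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
    tree.fit(X, y)

    # Inspect the learned splits (root node, decision nodes, leaf nodes).
    print(export_text(tree, feature_names=["salary", "distance", "cab"]))
    print(tree.predict([[13, 6, 1]]))   # classify a new candidate's offer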
Decision Tree Terminologies

• Root Node: Root node is from where the decision tree starts. It represents the entire dataset,
which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated further after
getting a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
• Branch/Sub Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing the unwanted branches from the tree.
• Parent/Child node: The root node of the tree is called the parent node, and other nodes are
called the child nodes.

Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the diagram:

Advantages of the Decision Tree

• It is simple to understand, as it follows the same process that a human follows while making a decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think about all the possible outcomes for a problem.
• There is less requirement of data cleaning compared to other algorithms.

Disadvantages of the Decision Tree

• The decision tree contains lots of layers, which makes it complex.


• It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
• For more class labels, the computational complexity of the decision tree may increase.

Information Gain: Information gain measures how much a particular feature contributes to reducing uncertainty in predicting the outcome. In decision trees, it helps decide the order in which features are used to split the data.
Information Gain = Entropy(S) − [Weighted Avg. × Entropy(each feature)]
Entropy: Entropy is a measure of disorder or randomness in a set of data. In decision trees, entropy is used to calculate the information gain. Lower entropy implies a more ordered and predictable set of data.
Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)
Gini Index: The Gini index quantifies the impurity or disorder of a set of data points. In decision trees, it is used as a criterion for selecting the best feature to split the data. A lower Gini index indicates a purer and more homogeneous set.
Gini Index = 1 − Σj Pj²
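The short sketch below evaluates these formulas numerically for one hypothetical split (the class counts are made up), so the definitions above can be checked by hand:

    # Entropy, Gini index, and information gain for a made-up binary split.
    import numpy as np

    def entropy(p_yes):
        p_no = 1 - p_yes
        # 0 * log(0) is treated as 0 by convention.
        return sum(-p * np.log2(p) for p in (p_yes, p_no) if p > 0)

    def gini(p_yes):
        p_no = 1 - p_yes
        return 1 - (p_yes ** 2 + p_no ** 2)

    # Parent set S: 9 "yes" and 5 "no" examples (hypothetical counts).
    H_parent = entropy(9 / 14)

    # A feature splits S into two subsets: (6 yes, 2 no) and (3 yes, 3 no).
    weighted = (8 / 14) * entropy(6 / 8) + (6 / 14) * entropy(3 / 6)
    info_gain = H_parent - weighted

    print(f"Entropy(S)={H_parent:.3f}  Gini(S)={gini(9/14):.3f}  InfoGain={info_gain:.3f}")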

Naïve Bayes Classifier Algorithm


• Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
theorem and used for solving classification problems.
• It is mainly used in text classification that includes a high-dimensional training dataset.
• The Naïve Bayes Classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object.
• Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental
analysis, and classifying articles.
The name Naïve Bayes comprises two words, Naïve and Bayes:
Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying it as an apple, without depending on the other features.
Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
• Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine
the probability of a hypothesis with prior knowledge. It depends on the conditional
probability.
• The formula for Bayes' theorem is given as:
P(A|B) = [P(B|A) × P(A)] / P(B), where:
P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the Likelihood: the probability of the evidence given that hypothesis A is true.
P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the Marginal probability: the probability of the evidence.
Advantages of Naïve Bayes Classifier:
• Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
• It can be used for Binary as well as Multi-class Classifications.
• It performs well in Multi-class predictions as compared to the other Algorithms.
• It is the most popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
• Naive Bayes assumes that all features are independent or unrelated, so it cannot
learn the relationship between features.
Applications of Naïve Bayes Classifier:
• It is used for Credit Scoring.
• It is used in medical data classification.
• It can be used in real-time predictions because Naïve Bayes Classifier is an eager
learner.
• It is used in Text classification such as Spam filtering and Sentiment analysis.
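A minimal text-classification sketch is given below (assuming scikit-learn; the four emails and their spam labels are made up), showing how the probabilistic prediction works in practice:

    # A tiny spam-filtering sketch with a multinomial Naive Bayes classifier.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    emails = ["win a free prize now", "lowest price guaranteed win big",
              "meeting agenda for monday", "please review the project report"]
    labels = [1, 1, 0, 0]   # 1 = spam, 0 = not spam (illustrative labels)

    # Bag-of-words features + Naive Bayes, chained into one model.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(emails, labels)

    print(model.predict(["free prize meeting"]))          # predicted class
    print(model.predict_proba(["free prize meeting"]))    # posterior probabilities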

Linear Regression vs Logistic Regression:
• Linear regression is used to predict a continuous dependent variable from a given set of independent variables. Logistic regression is used to predict a categorical dependent variable from a given set of independent variables.
• Linear regression is used for solving regression problems. Logistic regression is used for solving classification problems.
• In linear regression, we predict the value of continuous variables. In logistic regression, we predict the values of categorical variables.
• In linear regression, we find the best-fit line, by which we can easily predict the output. In logistic regression, we find the S-curve, by which we can classify the samples.
• The least squares estimation method is used to estimate the parameters of a linear regression model. The maximum likelihood estimation method is used to estimate the parameters of a logistic regression model.
• The output of linear regression must be a continuous value, such as price, age, etc. The output of logistic regression must be a categorical value, such as 0 or 1, Yes or No, etc.
• In linear regression, the relationship between the dependent variable and the independent variables must be linear. In logistic regression, a linear relationship between the dependent and independent variables is not required.
• In linear regression, there may be collinearity between the independent variables. In logistic regression, there should not be collinearity between the independent variables.
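The contrast can be seen in a short sketch (assuming scikit-learn; the five data points and targets are invented for illustration), where the same feature is used once with a continuous target and once with a categorical one:

    # Contrasting linear and logistic regression on small made-up datasets.
    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    X = np.array([[1], [2], [3], [4], [5]], dtype=float)

    # Regression: predict a continuous value (e.g. a price) -> least-squares best-fit line.
    price = np.array([10.5, 19.8, 30.2, 41.0, 49.5])
    lin = LinearRegression().fit(X, price)
    print("Predicted price for x=6:", lin.predict([[6.0]]))

    # Classification: predict a 0/1 label -> maximum-likelihood S-curve.
    passed = np.array([0, 0, 0, 1, 1])
    log = LogisticRegression().fit(X, passed)
    print("P(class=1) for x=3.5:", log.predict_proba([[3.5]])[0, 1])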
Generalized Linear Models (GLMs)
Generalized Linear Models (GLMs) are a class of statistical models that extend linear regression to
accommodate non-normally distributed response variables and apply to a broader range of data
distributions. While GLMs have roots in traditional statistics, they are also used in machine learning for
various tasks.
1. Linear Predictor: Like linear regression, GLMs have a linear combination of input features, but instead
of predicting the outcome directly, this linear combination is transformed using a link function.
2. Link Function: The link function connects the linear predictor to the mean of the response variable. It
defines the relationship between the linear combination of input features and the expected value of
the response variable.
3. Probability Distribution: GLMs can handle different probability distributions for the response variable.
Common distributions include:
• Gaussian (Normal): For continuous outcomes.
• Binomial: For binary outcomes (success/failure).
• Poisson: For count data.
• And others like Gamma, Tweedie, etc.

Steps in a GLM:
1. Define the Model: Specify the form of the linear predictor and choose a link function based on the
nature of the response variable.
2. Estimate Parameters: Use statistical techniques (usually maximum likelihood estimation) to find the
values of the model parameters that maximize the likelihood of the observed data.
3. Link Function Transformation: Apply the link function to transform the linear combination of features
into a prediction that aligns with the distribution of the response variable.
4. Make Predictions: Use the model to make predictions on new data by applying the learned parameters
and the link function.
Example:
Consider a binary classification problem where you want to predict whether an email is spam or not based
on the length of the email. A GLM for this problem might involve a logistic regression model (link function:
logistic) with a binomial distribution.
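A minimal version of this example is sketched below (assuming the statsmodels library is available; the email lengths and spam labels are made up), following the four steps listed above:

    # A minimal GLM sketch: logit link, binomial family (logistic regression as a GLM).
    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: email length (in words) and whether it was spam (1) or not (0).
    length = np.array([20, 35, 50, 80, 120, 150, 200, 260], dtype=float)
    is_spam = np.array([0, 0, 0, 1, 0, 1, 1, 1])

    # 1. Define the model: linear predictor b0 + b1*length, binomial family, logit link.
    X = sm.add_constant(length)              # adds the intercept column
    model = sm.GLM(is_spam, X, family=sm.families.Binomial())

    # 2. Estimate the parameters by maximum likelihood.
    result = model.fit()
    print(result.params)

    # 3-4. The (inverse) link function turns the linear predictor into a probability.
    new_X = sm.add_constant(np.array([40.0, 220.0]))
    print(result.predict(new_X))             # predicted spam probabilities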
Advantages of GLMs:

• Flexibility in handling different types of data and distributions.


• Interpretability of model parameters.
• Well-established statistical foundations.
Limitations:

• Assumes a linear relationship between features and the linear predictor.


• Sensitive to outliers.

Support Vector Machines


Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems.
However, primarily, it is used for Classification problems in Machine Learning.
The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put the
new data point in the correct category in the future. This best decision boundary
is called a hyperplane.
SVM can be of two types:
Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
The key steps involved in SVM are as follows:
1. Data Preparation: Convert input data into numerical feature vectors and ensure
appropriate scaling or normalization.
2. Training Phase: Given labeled training data, the SVM algorithm seeks to find the optimal
hyperplane that best separates the classes. This involves solving a convex optimization
problem to maximize the margin while minimizing the classification errors.
3. Kernel Trick: SVM can handle non-linearly separable data by employing the kernel trick.
The kernel function transforms the input features into a higher-dimensional space, where
the data may become linearly separable. Commonly used kernel functions include linear,
polynomial, Gaussian (RBF), and sigmoid.
4. Classification or Regression: Once the SVM model is trained, it can be used to predict the
class label or regression value of new, unseen data points. The decision boundary is
determined based on the learned hyperplane and support vectors.
Suppose we have a dataset that has two classes (green (.) and blue (*)), and we want to classify a new data point as either blue or green.
To classify these points, we can draw many decision boundaries, but the question is which one is the best and how do we find it? NOTE: Since we are plotting the data points in a 2-dimensional graph, we call this decision boundary a straight line, but if we have more dimensions, we call this decision boundary a "hyperplane".
The best hyperplane is the one that has the maximum distance from both classes, and this is the main aim of SVM. SVM finds the different hyperplanes that classify the labels correctly and then chooses the one that is farthest from the data points, i.e., the one with the maximum margin.
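The following is a minimal sketch of both cases (assuming scikit-learn; the synthetic blob and circle datasets are generated only for illustration):

    # Linear SVM on separable blobs, and an RBF-kernel SVM on circular (non-linear) data.
    from sklearn.datasets import make_blobs, make_circles
    from sklearn.svm import SVC

    # Linearly separable case: a straight-line (hyperplane) boundary is enough.
    X_lin, y_lin = make_blobs(n_samples=100, centers=2, random_state=6)
    linear_svm = SVC(kernel="linear", C=1.0).fit(X_lin, y_lin)
    print("Number of support vectors (linear case):", len(linear_svm.support_vectors_))

    # Non-linear case: one class forms a ring around the other, so the kernel trick is used.
    X_circ, y_circ = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
    rbf_svm = SVC(kernel="rbf", gamma=2.0, C=1.0).fit(X_circ, y_circ)
    print("Training accuracy (RBF kernel):", rbf_svm.score(X_circ, y_circ))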

Advantages:

1. Effective in High-Dimensional Spaces: SVMs perform well even in cases where the number of features (dimensions) is greater than the number of samples.
2. Versatile: Suitable for both classification and
regression tasks. Can handle linear and non-
linear relationships through the use of different
kernel functions.
3. Robust to Overfitting: SVMs aim to find a
balance between maximizing the margin and minimizing the classification error, making them
less prone to overfitting.
4. Memory Efficient: SVMs use a subset of training points (support vectors) in decision-making,
making them memory-efficient, especially in high-dimensional spaces.
Disadvantages:
• Computationally Intensive: Training SVMs can be computationally intensive, especially on
large datasets. The time complexity is often higher compared to simpler algorithms.
• Sensitive to Noise: SVMs can be sensitive to noisy data and outliers, impacting their
performance if not properly handled.
• Not Suitable for Large Datasets: Training on large datasets can be slow and resource-
intensive.
Applications:
• Image Classification: SVMs are used for tasks such as image recognition and object
classification in computer vision.
• Text Classification: Effective in classifying text documents, spam detection, sentiment
analysis, and topic categorization.
• Biomedical Sciences: Used in areas like bioinformatics for protein structure prediction and
disease classification.
• Finance: SVMs are applied in credit scoring, fraud detection, and stock market prediction.
• Handwriting Recognition: SVMs can be used in optical character recognition systems for
handwriting and printed text.
Non-Linearity and Kernel Methods
1. Non-Linearity: In machine learning, many problems involve relationships between
variables that are not linear. Non-linearity refers to situations where the output is not a
simple linear combination of the input features.
Example: Consider a scenario where you are predicting the price of a house based on its
size and the number of bedrooms. If the relationship between size and price is not strictly
linear (perhaps there's a diminishing return for larger sizes), you need a non-linear model to
capture this complexity.
Methods for Handling Non-Linearity:
• Polynomial Regression: Introducing polynomial terms to capture non-linear
relationships.
• Decision Trees: Recursive partitioning of data into non-linear segments.
• Neural Networks: Layers of interconnected nodes enable the modeling of complex
non-linear relationships.
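As one illustration of the first option, here is a minimal polynomial-regression sketch (assuming scikit-learn; the house sizes and prices are invented to show a diminishing-returns curve):

    # Capturing a non-linear (diminishing-returns) price curve with polynomial regression.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    size = np.array([[50], [80], [120], [160], [200], [250]], dtype=float)   # square metres
    price = np.array([100, 150, 200, 235, 260, 275], dtype=float)            # made-up prices

    # Degree-2 features (size, size^2) let a linear model fit the curved relationship.
    poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    poly_model.fit(size, price)

    print(poly_model.predict([[180.0]]))   # predicted price for a 180 m^2 house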
2. Kernel Methods: Kernel methods are a type of technique used to handle non-linear
problems within linear models. They achieve this by implicitly mapping the input features
into a higher-dimensional space, where the problem might become linearly separable.
Example: In a 2D space, a linear model might not be able to separate classes that are
circular in shape. By applying a kernel function, the data can be implicitly transformed into a
higher-dimensional space where a linear separator can be found.
Kernel Methods in Support Vector Machines (SVM): SVMs use kernel methods to
transform the input data into a higher-dimensional space. This allows SVMs to efficiently
handle non-linear decision boundaries.
Advantages:
• Flexibility: Kernel methods enhance the flexibility of linear models, enabling them to
capture complex relationships.
• Efficiency: The implicit transformation of data through kernels can be computationally
more efficient than explicitly working in high-dimensional spaces.
Challenges:
• Choice of Kernel: The selection of the appropriate kernel is crucial, and different kernels
may perform better for different types of data.
• Computational Complexity: For some kernels, especially in high-dimensional spaces,
computation can be resource-intensive.
Applications:
• Image Recognition: Kernels are often used in image recognition tasks where the
relationships between features can be highly non-linear.
• Bioinformatics: Kernel methods are applied in tasks such as protein structure prediction
and genomic data analysis.
Binary Classification
Binary classification is a type of machine learning task where the goal is to categorize items into one of two
classes or categories. The two classes are typically denoted as the positive class (denoted as 1) and the
negative class (denoted as 0).
Examples: Spam detection (spam or not spam). Disease diagnosis (presence or absence of a disease).
Credit approval (approved or rejected).
Algorithms:
Logistic Regression, Support Vector Machines, Decision Trees, and Random Forests are commonly used for
binary classification.
Evaluation Metrics:
Accuracy, Precision, Recall, F1 Score, and ROC-AUC are common metrics for assessing the performance of
binary classifiers.

Multi-Class Classification: Multi-class classification involves assigning items into one of three or more
classes or categories. Each item can belong to only one class, and the goal is to correctly assign items to
their respective classes.

Examples: Handwritten digit recognition (classifying digits 0-9). Species classification in biology
(classifying animals into different species). News categorization (assigning news articles to categories like
politics, sports, entertainment).
Algorithms:
Logistic Regression (with one-vs-rest or softmax), Support Vector Machines, Decision Trees, Random
Forests, and Neural Networks are commonly used for multi-class classification.

Evaluation Metrics:
Accuracy, Precision, Recall, F1 Score, and confusion matrices can still be used in multi-class classification,
but they may be extended or adapted to handle multiple classes.
One-vs-Rest (OvR) and One-vs-One (OvO) Strategies:

• OvR: Trains a separate classifier for each class, treating it as the positive class and the rest as the
negative class. The final prediction is the class with the highest confidence.
• OvO: Constructs a binary classifier for each pair of classes. The final prediction is the class that
wins the most pairwise comparisons.
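Both strategies are available as simple wrappers in scikit-learn (assumed here; the Iris dataset is used purely as a convenient 3-class example):

    # One-vs-Rest and One-vs-One wrappers around a binary classifier.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

    X, y = load_iris(return_X_y=True)   # 3 classes

    # OvR: one classifier per class (that class vs. all the others) -> 3 binary models.
    ovr = OneVsRestClassifier(LogisticRegression(max_iter=500)).fit(X, y)

    # OvO: one classifier per pair of classes -> 3 pairwise models for 3 classes.
    ovo = OneVsOneClassifier(LogisticRegression(max_iter=500)).fit(X, y)

    print("OvR estimators:", len(ovr.estimators_), " OvO estimators:", len(ovo.estimators_))
    print(ovr.predict(X[:1]), ovo.predict(X[:1]))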

Considerations:
• The choice between binary and multi-class classification depends on the nature of the problem and the
number of classes involved.
• Algorithms designed for binary classification can be extended to handle multi-class problems using
various strategies, as mentioned above.

Parameters: Binary classification vs Multi-class classification
• No. of classes: Binary classification is a classification into two groups, i.e., it classifies objects into at most two classes. Multi-class classification can have any number of classes, i.e., it classifies objects into more than two classes.
• Algorithms used: The most popular algorithms used for binary classification are Logistic Regression, k-Nearest Neighbors, Decision Trees, Support Vector Machine, and Naive Bayes. Popular algorithms that can be used for multi-class classification include k-Nearest Neighbors, Decision Trees, Naive Bayes, Random Forest, and Gradient Boosting.
• Examples: Examples of binary classification include email spam detection (spam or not), churn prediction (churn or not), and conversion prediction (buy or not). Examples of multi-class classification include face classification, plant species classification, and optical character recognition.
Ranking: Ranking in supervised machine learning involves training models to sort data in
an optimal and relevant order based on labeled datasets. The primary goal is to predict
outcomes by assigning a ranking to different data points. Initially deployed in search
engines, ranking algorithms reorder search results to display the most relevant information
to users.
How Ranking Works:
1. Components: Ranking models consist of queries and documents. Queries are input
values (e.g., search terms), and documents are the output or results associated with
the queries.
2. Scoring Documents: For a given query, a function scores each document based on
parameters like relevance. This scoring helps rank documents in order of their
perceived importance or relevance to the query.
3. Machine Learning Algorithm: The learning-to-rank algorithm takes the scores from
the model and uses them to predict future outcomes for new and unseen documents.
Learning to Rank (LTR) has many different applications. Here are some of them:
Search engines: A user types a query into a browser search bar, and the search engine should rank the web pages so that the most relevant results appear in the top positions.
Recommender systems: A movie recommender system chooses which film should be recommended to a user based on an input query.
Ranking types:
1. Pointwise Ranking: Pointwise ranking involves assigning a score or ranking
independently to each item, and the items are then ranked based on their individual
scores.
2. Pairwise Ranking: Pairwise ranking compares items in pairs and assigns a preference
or order between them. The model learns to rank items based on their pairwise
comparisons.
3. Listwise Ranking: Listwise ranking considers the entire list or set of items as a whole
and optimizes the ranking directly for the entire list, taking into account the
relationships and interactions between items.
All of these methods transform the ranking task into a classification or regression problem.
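For example, pointwise ranking can be sketched as a plain regression problem (assuming scikit-learn; the query-document features and relevance grades below are invented): score each document, then sort by the predicted score.

    # A pointwise learning-to-rank sketch: score documents with a regressor, then sort.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    # Hypothetical query-document features: [keyword_matches, click_rate, doc_length_k].
    X = np.array([[3, 0.30, 1.2], [1, 0.05, 0.4], [2, 0.20, 2.5],
                  [0, 0.01, 0.9], [3, 0.40, 0.8], [1, 0.10, 1.5]])
    relevance = np.array([3, 1, 2, 0, 3, 1])   # graded relevance labels (made up)

    # Pointwise ranking treats each document independently as a regression target.
    scorer = GradientBoostingRegressor(random_state=0).fit(X, relevance)

    # Rank new candidate documents for a query by their predicted scores (highest first).
    candidates = np.array([[2, 0.25, 1.0], [0, 0.02, 0.5], [3, 0.35, 0.7]])
    scores = scorer.predict(candidates)
    print("Ranking (best first):", np.argsort(scores)[::-1])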
Unsupervised Learning
Unsupervised learning is a type of machine learning algorithm used to draw inferences from
datasets consisting of input data without labeled responses.
In unsupervised learning algorithms, classification or categorization is not included in the
observations. There are no output values and so there is no estimation of functions. Since
the examples given to the learner are unlabeled, the accuracy of the structure that is output
by the algorithm cannot be evaluated.
The most common unsupervised learning method is cluster analysis, which is used for
exploratory data analysis to find hidden patterns or grouping in data.
Advantages of Unsupervised Learning:
• Unsupervised learning is used for more complex tasks as compared to supervised
learning because, in unsupervised learning, we don't have labeled input data.
• Unsupervised learning is preferable as it is easy to get unlabeled data in comparison
to labeled data.
Disadvantages of Unsupervised Learning
• Unsupervised learning is intrinsically more difficult than supervised learning as it does
not have corresponding output.
• The result of the unsupervised learning algorithm might be less accurate as input data
is not labeled, and algorithms do not know the exact output in advance.
The unsupervised learning algorithm can be further categorized into two types of problem
• Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group. Cluster analysis finds the commonalities between the data objects and categorizes them based on the presence or absence of those commonalities.
• Association: An association rule is an unsupervised learning method which is used for finding relationships between variables in a large database. It determines the sets of items that occur together in the dataset. Association rules make marketing strategies more effective; for example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical example of association rule mining is Market Basket Analysis.

K-Means Clustering Algorithm: K-Means Clustering is an unsupervised learning algorithm that is used to solve clustering problems in machine learning or data science by grouping an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process; if K=2, there will be two clusters, for K=3 there will be three clusters, and so on. It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each data point belongs to only one group, whose members have similar properties.

• It allows us to cluster the data into different groups and a convenient way to discover the categories of
groups in the unlabeled dataset on its own without the need for any training.
• It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this
algorithm is to minimize the sum of distances between the data point and their corresponding clusters.
• The algorithm takes the unlabeled dataset as input, divides the dataset into K clusters, and repeats the process until it finds the best clusters (i.e., until the assignments no longer change). The value of K should be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two tasks:

• Determines the best value for K center points or centroids by an iterative process.
• Assigns each data point to its closest k-center. Those data points which are near to the particular k-
center, create a cluster.

The working of the K-Means algorithm is explained in the below steps:


Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as centroids (they need not be points from the input dataset).
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat Step-3, i.e., reassign each data point to the new closest centroid of its cluster.
Step-6: If any reassignment occurred, go to Step-4; otherwise, go to FINISH.
Step-7: The model is ready. (A short K-Means and Elbow-Method sketch follows at the end of this topic.)
How to choose the value of "K number of clusters" in K-means Clustering?
The performance of the K-means clustering algorithm depends upon highly efficient clusters that it forms.
But choosing the optimal number of clusters is a big task. There are some different ways to find the optimal
number of clusters, but here we are discussing the most appropriate method to find the number of clusters
or value of K.
The Elbow Method is a simple and visual technique to help choose the optimal number of clusters (K) in K-
means clustering. Here's an easy explanation of how to use the Elbow Method Step-by-Step Explanation:

• Run K-means for Different Values of K: Start by running the K-means clustering algorithm for a range of
values of K. You can choose a reasonable range based on your understanding of the data.
• Compute the Sum of Squared Distances (Inertia): For each value of K, compute the sum of squared
distances (inertia or within-cluster sum of squares). This measures how far each point in a cluster is
from the center of that cluster.
• Plot the Elbow Curve: Plot a curve with the number of clusters (K) on the x-axis and the corresponding
sum of squared distances on the y-axis.
• Identify the "Elbow" Point: Examine the plot. The "elbow" is the point where the reduction in the sum
of squared distances starts to slow down, forming an elbow-like bend in the curve.
• Choose the Elbow Point as the Optimal K: The number of clusters (K) corresponding to the elbow point
is considered the optimal choice. At this point, adding more clusters doesn't significantly improve the
clustering quality.
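A minimal K-Means plus Elbow-Method sketch is given below (assuming scikit-learn; the synthetic 2-D blobs stand in for real data):

    # K-Means and the Elbow Method on synthetic 2-D data.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

    # Run K-means for several values of K and record the inertia (within-cluster SSE).
    for k in range(1, 8):
        km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
        print(f"K={k}  inertia={km.inertia_:.1f}")
    # Plotting inertia against K would show an "elbow" near K=4, the true number of blobs.

    # Fit the chosen model and read off the cluster assignments and centroids.
    best = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
    print(best.labels_[:10])
    print(best.cluster_centers_)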
What is Dimensionality Reduction?
• The number of input features, variables, or columns present in a given dataset is known as
dimensionality, and the process to reduce these features is called dimensionality reduction.
• A dataset may contain a huge number of input features in various cases, which makes the predictive modeling task more complicated. Because it is very difficult to visualize or make predictions for a training dataset with a high number of features, dimensionality reduction techniques are required for such cases.
• Dimensionality reduction technique can be defined as, "It is a way of converting the higher dimensions
dataset into lesser dimensions dataset ensuring that it provides similar information." These techniques
are widely used in machine learning for obtaining a better fit predictive model while solving the
classification and regression problems.
• It is commonly used in the fields that deal with high-dimensional data, such as speech recognition,
signal processing, bioinformatics, etc. It can also be used for data visualization, noise reduction, cluster
analysis, etc.

Main Goals of Dimensionality Reduction:


Simplify Data: By reducing the number of dimensions, the data becomes more manageable and easier to
comprehend.
Speed up Computation: High-dimensional data often requires more computational resources.
Dimensionality reduction speeds up algorithms and analyses.

Avoid Overfitting: With fewer dimensions, the risk of overfitting diminishes. The model becomes more
focused on patterns rather than noise.

Visualize Data: It's challenging to visualize data in high dimensions. Dimensionality reduction helps project
data onto lower-dimensional spaces, making visualization feasible.

Kernel Principal Component Analysis (KPCA) is a nonlinear dimensionality reduction technique.

Principal Component Analysis(PCA)


Principal Component Analysis is an unsupervised learning algorithm that is used for dimensionality reduction in machine learning. It is a statistical process that converts the observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation. These new transformed features are called the Principal Components. It is one of the popular tools used for exploratory data analysis and predictive modeling. It is a technique for drawing out strong patterns from a given dataset by reducing the number of dimensions while retaining as much of the variance as possible.

• PCA generally tries to find a lower-dimensional surface onto which the high-dimensional data can be projected.
• PCA works by considering the variance of each attribute, because attributes with high variance show a good split between the classes; retaining them reduces the dimensionality with little loss of information. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels. It is a feature extraction technique, so it retains the important variables and drops the least important ones.
The PCA algorithm is based on mathematical concepts such as variance and covariance, and eigenvalues and eigenvectors.
Some common terms used in PCA algorithm:

• Dimensionality: It is the number of features or variables present in the given dataset. More easily, it is
the number of columns present in the dataset.
• Correlation: It signifies that how strongly two variables are related to each other. Such as if one
changes, the other variable also gets changed. The correlation value ranges from -1 to +1. Here, -1
occurs if variables are inversely proportional to each other, and +1 indicates that variables are directly
proportional to each other.
• Orthogonal: It defines that variables are not correlated to each other, and hence the correlation
between the pair of variables is zero.
• Eigenvectors: If there is a square matrix M and a non-zero vector v, then v is an eigenvector of M if Mv is a scalar multiple of v.
• Covariance Matrix: A matrix containing the covariance between the pair of variables is called the
Covariance Matrix.
Steps for PCA Algorithm:

• Getting the Dataset: Take the input dataset and split it into two parts: X (training set) and Y (validation
set).
• Representing Data Structure: Represent the dataset in a matrix structure (X). Each row corresponds to
data items, and each column corresponds to features. The number of columns is the dimensions of the
dataset.
• Standardizing the Data: Standardize the dataset to give importance to features with high variance. If
feature importance is independent of variance, divide each data item in a column by the standard
deviation of the column, creating a matrix named Z.
• Calculating the Covariance of Z: Calculate the covariance matrix of Z by transposing Z and multiplying it
by Z.
• Calculating Eigenvalues and Eigenvectors: Calculate the eigenvalues and eigenvectors for the
covariance matrix Z. Eigenvectors represent directions of axes with high information, and eigenvalues
are their coefficients.
• Sorting Eigen Vectors: Sort eigenvalues in decreasing order (largest to smallest) and simultaneously sort
the corresponding eigenvectors in matrix P.
• Calculating New Features (Principal Components): Project the standardized data onto the sorted eigenvectors, i.e., multiply Z by the sorted eigenvector matrix P*, to obtain the new feature matrix Z*. Each observation in Z* is a linear combination of the original features, and its columns are uncorrelated. (A short PCA/KPCA sketch follows at the end of this topic.)
• Remove Unimportant Features: Decide which features to keep and remove in the new dataset Z*. Keep
relevant and important features, removing less important ones.
Applications of Principal Component Analysis:
• PCA is mainly used as a dimensionality reduction technique in various AI applications such as computer vision, image compression, etc.
• It can also be used for finding hidden patterns if the data has high dimensions. Some fields where PCA is used are finance, data mining, psychology, etc.
• Linear Limitation of PCA: Traditional Principal Component Analysis (PCA) is powerful for linearly
separable datasets. However, it may not be optimal for non-linear datasets, as it assumes linear
relationships between features.
• Introduction to Kernel PCA: Kernel PCA is a technique designed to handle non-linear datasets. It
extends the capabilities of PCA by incorporating a kernel function, allowing the algorithm to project the
data into a higher-dimensional space where it becomes linearly separable.
• Kernel Functions: Kernel functions (e.g., linear, polynomial, Gaussian) play a key role in KPCA. They map
the input data into a space where complex relationships between data points can be linearly
represented.
• Workflow of KPCA:
➢ Nonlinear Mapping: Apply a chosen kernel function to map the original data into a higher-dimensional
feature space.
➢ PCA in the New Space: Conduct PCA in the transformed space. Now, even though the original data may
not be linearly separable, the transformed data might be.
➢ Capture Nonlinear Relationships: Extract principal components in the higher-dimensional space,
capturing complex and nonlinear relationships between data points.
➢ Applications: Use the principal components for various tasks, such as data visualization, clustering, or
classification, in the transformed space.
➢ Benefits of KPCA:
• Handling Non-Linearity: Effective in scenarios where relationships between features are non-linear.
• Complex Pattern Recognition: Captures intricate patterns and structures in the data that linear
methods might miss.
• Versatility: Can be applied to various types of datasets, especially those with non-linear characteristics.
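As a sketch of this workflow, scikit-learn provides a KernelPCA class; the dataset (two concentric circles) and the kernel parameters below are illustrative choices, not prescribed values.

from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: a classic dataset that is not linearly separable
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

# Ordinary (linear) PCA can only rotate/rescale the axes, so the circles stay entangled
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA with an RBF (Gaussian) kernel implicitly maps the points into a
# higher-dimensional feature space where the two circles become separable
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

print(X_pca.shape, X_kpca.shape)   # both (300, 2), but the KPCA projection untangles the classes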
Matrix Factorization
Matrix factorization is a technique used in various fields, including machine learning and data analysis. At
its core, it involves breaking down a matrix into a product of simpler matrices. This process is valuable for
tasks such as recommendation systems, collaborative filtering, and dimensionality reduction.
➢ Basic Idea: Consider a matrix, say A, with dimensions m×n. The goal of matrix factorization is to
express this matrix as the product of two matrices, B and C, where B has dimensions m×k and C has
dimensions k×n. The parameter k is a user-defined value representing the desired reduced
dimensionality.
➢ Mathematical Representation: A ≈ B × C
Use Cases:
➢ Recommendation Systems: In collaborative filtering, matrix factorization can represent users and
items in a lower-dimensional space, helping make personalized recommendations.
➢ Image Compression: For images represented as matrices, factorizing them can lead to a more
compact representation, saving storage space.
➢ Text Mining: Applied to document-term matrices in natural language processing to discover latent
topics in a collection of documents.
Matrix Factorization Algorithms:
➢ Singular Value Decomposition (SVD): One of the most common methods, SVD decomposes a matrix
into three matrices representing singular values and left and right singular vectors.
➢ Alternating Least Squares (ALS): Often used in collaborative filtering problems, ALS iteratively updates
user and item matrices to minimize the reconstruction error.
➢ Gradient Descent: Optimization algorithms like gradient descent can be applied to minimize the
difference between the original matrix and the product of the factorized matrices.
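The following minimal NumPy sketch illustrates the A ≈ B × C idea using truncated SVD; the matrix contents and the rank k are made-up values for illustration.

import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))              # illustrative 6x4 matrix (e.g., users x items)

# Full SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top-k singular values/vectors (truncated SVD)
k = 2
B = U[:, :k] * s[:k]                # m x k left factor (scaled by the singular values)
C = Vt[:k, :]                       # k x n right factor

A_approx = B @ C                    # rank-k approximation: A is approximately B x C
print(np.linalg.norm(A - A_approx)) # reconstruction error (Frobenius norm)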
Benefits:
• Dimensionality Reduction: Matrix factorization allows the representation of data in a lower-dimensional space, reducing the complexity of the original data.
• Pattern Discovery: By capturing latent features, matrix factorization helps in discovering underlying patterns and relationships in the data.
Matrix Completion
Matrix completion is a technique used in data analysis to fill in or "complete" missing values in a matrix.
Imagine you have a matrix with some entries missing, and you want to predict or estimate those missing
values based on the available information. Matrix completion algorithms are designed to achieve precisely
that.
Key Concepts:
• Incomplete Matrix: Imagine you have a matrix where some entries are unknown or missing. This matrix
could represent various types of data, such as user-item ratings, sensor measurements, or any situation
where not all information is available.
• Objective: The goal of matrix completion is to fill in the missing entries in the matrix accurately. This is
achieved by leveraging the patterns and relationships present in the observed (non-missing) entries.
• Assumption: Matrix completion assumes that the underlying data has some inherent structure or low-
rank property. Low-rank matrices have a reduced number of independent columns or rows, suggesting
that the data can be well-approximated using a smaller number of features.

Steps in Matrix Completion:
• Initialization: Begin with the incomplete matrix, where certain entries are missing.
• Low-Rank Approximation: Assume that the matrix has a low-rank structure. This means that the
matrix can be approximated by a product of two lower-dimensional matrices.
• Optimization: Optimize the low-rank approximation to minimize the difference between the
estimated matrix and the observed entries.
• Prediction: Once the optimization is complete, use the filled-in matrix to predict or estimate the
missing values.
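Below is a simple, illustrative sketch of these steps: it assumes a small ratings-style matrix with missing entries, a rank-2 structure, and plain gradient descent on the observed entries only. The matrix values, rank, learning rate, and iteration count are all arbitrary choices for the example.

import numpy as np

rng = np.random.default_rng(0)

# Incomplete 5x4 ratings-style matrix: np.nan marks the missing entries
A = np.array([[5, 3, np.nan, 1],
              [4, np.nan, np.nan, 1],
              [1, 1, np.nan, 5],
              [1, np.nan, np.nan, 4],
              [np.nan, 1, 5, 4]], dtype=float)
mask = ~np.isnan(A)                       # True where an entry is observed

# Low-rank assumption: A is approximately B @ C with a small rank k
m, n, k = A.shape[0], A.shape[1], 2
B = rng.normal(scale=0.1, size=(m, k))
C = rng.normal(scale=0.1, size=(k, n))

lr = 0.01
for _ in range(5000):
    E = np.where(mask, A - B @ C, 0.0)    # error on the observed entries only
    grad_B, grad_C = E @ C.T, B.T @ E     # gradients of the squared error
    B += lr * grad_B
    C += lr * grad_C

A_filled = B @ C                          # predictions fill in the missing values
print(np.round(A_filled, 2))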

Applications of Matrix Completion:
• Recommendation Systems: Matrix completion is widely used in recommendation systems to predict
user ratings for items (movies, products) based on known ratings.
• Image and Video Recovery: In image and video processing, matrix completion can help recover missing
or corrupted pixels in images or frames.
• Sensor Networks: In sensor networks, where not all sensors may provide readings at all times, matrix
completion can estimate the missing sensor measurements.
• Collaborative Filtering: Collaborative filtering, a technique used in personalized content
recommendations, often involves matrix completion to predict user preferences.
Generative Models: Mixture Models and Latent Factor Models
Generative Models: Generative models are a class of machine learning models that aim to understand and
mimic the underlying distribution of the training data. Once trained, generative models can generate new,
realistic samples that resemble the original data distribution. Two common types of generative models are
mixture models and latent factor models.

Mixture Models
Mixture models assume that the data is generated by a mixture of several underlying probability
distributions. Each component in the mixture represents a different source or process that contributes to
the overall distribution.
Components: Each component has its own parameters (for example, a mean and variance in the Gaussian case) and a weight that determines its contribution to the overall mixture.
Example: Think of a mixture of Gaussian distributions. Each Gaussian component represents a cluster in the
data, and the mixture model can capture complex patterns where data points may come from different
clusters.
Use Cases: Mixture models are used in clustering problems, where the goal is to assign data points to
different clusters based on their probability of belonging to each component.
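As a brief sketch, scikit-learn's GaussianMixture fits such a model with the EM algorithm; the synthetic two-cluster data and the parameter values below are illustrative.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data drawn from two different Gaussian "sources"
X = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(200, 2)),
               rng.normal(loc=5.0, scale=1.5, size=(200, 2))])

# Fit a mixture of two Gaussian components (parameters estimated via EM)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print(gmm.weights_)            # weight (contribution) of each component
print(gmm.means_)              # estimated component means
labels = gmm.predict(X)        # hard assignment: most probable component per point
probs = gmm.predict_proba(X)   # soft assignment: probability of each component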

Latent Factor Models
Latent factor models assume that there are underlying, unobservable factors (latent factors) that
contribute to the observed data. These latent factors help explain the structure and patterns in the data.

Components: Latent factors are hidden variables that are not directly observed but influence the observed
data.
Example: In collaborative filtering for recommendation systems, users and items can be represented by
latent factors. The model aims to discover these latent factors to predict how users might rate items.
Use Cases: Latent factor models are commonly used in recommendation systems, matrix factorization, and
other applications where understanding the underlying factors influencing the data is essential.

➢ Key Takeaways:
Generative Models:
• Aim to understand and model the underlying data distribution.
• Can generate new samples that resemble the training data.
Mixture Models:
• Assume data comes from a mixture of different probability distributions.
• Commonly used for clustering problems.
Latent Factor Models:
• Assume there are hidden factors influencing the observed data.
• Commonly used in recommendation systems and matrix factorization.
Scalable Machine Learning
Scalable machine learning refers to the ability of a machine learning system or algorithm to efficiently
handle and process increasing amounts of data and computational resources as the scale of the problem
grows. In simpler terms, it means that the machine learning solution can effectively scale up its
performance when faced with larger datasets, more complex models, or higher computational demands.
Key Components of Scalable Machine Learning:
• Data Volume: Scalable machine learning systems can handle large and growing datasets. As the
amount of data increases, the system remains effective in training models and making predictions.
• Model Complexity: Scalability also applies to the complexity of the machine learning models. A
scalable system should be able to accommodate more intricate models without a significant drop in
performance.
• Computational Resources: Scalable machine learning algorithms efficiently use available
computational resources. This includes the ability to parallelize computations, distribute tasks
across multiple processors or machines, and make use of specialized hardware when available.
• Performance Consistency: Scalability doesn't just mean handling larger volumes of data; it also
involves maintaining consistent performance. As the system scales, it should still provide reliable
and timely results without becoming prohibitively slow or resource-intensive.
Why Scalability Matters:
• Big Data Challenges: In the era of big data, organizations deal with massive datasets. Scalable
machine learning is crucial for extracting meaningful insights from these vast amounts of
information.
• Complex Models: As machine learning models become more sophisticated, they often require more
computational resources. Scalability ensures that these models can be trained and deployed
efficiently.
• Real-Time Processing: Some applications, such as real-time analytics or online services, require
quick responses. Scalable machine learning allows for timely predictions even when faced with large
workloads.
• Cost Efficiency: Efficient use of resources is vital for cost-effectiveness. Scalable solutions can
leverage resources more effectively, reducing the total cost of computation.

Examples of Scalable Machine Learning:
• Distributed Computing Frameworks: Technologies like Apache Spark and Hadoop enable the
distribution of machine learning tasks across a cluster of machines, enhancing scalability.
• Parallel Processing: Algorithms that can be parallelized, such as stochastic gradient descent, allow
for efficient use of multiple processors, speeding up training on large datasets.
• Cloud Computing: Cloud platforms provide scalable infrastructure for machine learning, allowing
users to scale up or down based on their computational needs.
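As one concrete illustration of scalable training, scikit-learn's SGDClassifier supports incremental learning through partial_fit, so a linear model can be updated one mini-batch at a time without loading the full dataset into memory. The batch_stream generator below is purely an illustrative stand-in for reading chunks of a large dataset.

import numpy as np
from sklearn.linear_model import SGDClassifier

def batch_stream(n_batches=100, batch_size=1_000, n_features=20, seed=0):
    """Illustrative stand-in for reading mini-batches of a large dataset from disk."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_features)                 # hidden "true" linear rule
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = (X @ w > 0).astype(int)
        yield X, y

clf = SGDClassifier()                  # linear model trained by stochastic gradient descent
classes = np.array([0, 1])             # all classes must be declared up front for partial_fit

for X_batch, y_batch in batch_stream():
    # Each call updates the model with one mini-batch; memory use stays roughly constant
    clf.partial_fit(X_batch, y_batch, classes=classes)

X_test, y_test = next(batch_stream(n_batches=1, seed=1))
print(clf.score(X_test, y_test))       # accuracy on a held-out batch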

Semi-Supervised Learning
Semi-supervised learning is a type of machine learning where the model is trained on a dataset that
contains both labeled and unlabeled data. In traditional supervised learning, the model is trained solely on
labeled data, where each input is associated with a corresponding output. However, obtaining labeled data
can be expensive and time-consuming. Semi-supervised learning seeks to leverage both labeled and
unlabeled data to build a more robust and accurate model.
How Semi-Supervised Learning Works:
• Initial Labeled Training: The model is first trained on the available labeled data in a supervised
manner. This helps the model learn from the explicit input-output associations.
• Unlabeled Data Utilization: After the initial training, the model is fine-tuned or further trained on
the unlabeled data. The model generalizes from the patterns observed in the labeled data to make
predictions on the unlabeled instances.
• Semi-Supervised Algorithms: Various algorithms and techniques are designed for semi-supervised
learning, including self-training, co-training, and multi-view learning. These approaches leverage the
unlabeled data in different ways to enhance model performance.
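scikit-learn ships a SelfTrainingClassifier that implements the self-training idea around a base probabilistic classifier; a small sketch on the Iris data follows, where roughly 70% of the labels are hidden (marked -1, scikit-learn's convention for unlabeled points). The masking fraction and the choice of base model are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Pretend most labels are unknown: scikit-learn marks unlabeled points with -1
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.7] = -1      # hide roughly 70% of the labels

# The base supervised model must expose predict_proba for self-training
base = SVC(probability=True, gamma="auto")
model = SelfTrainingClassifier(base).fit(X, y_partial)

print(model.score(X, y))                      # evaluated against the full true labels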

Benefits of Semi-Supervised Learning:
• Cost-Effective: Reduces the cost and effort associated with obtaining labeled data for training, as
unlabeled data is often more readily available.
• Improved Generalization: The additional information from unlabeled data can lead to improved
generalization and better model performance, especially in scenarios with limited labeled data.
• Versatility: Suitable for situations where obtaining labeled data is challenging, such as in medical
imaging, natural language processing, and various other domains.

Challenges:
• Quality of Unlabeled Data: The effectiveness of semi-supervised learning depends on the quality
and representativeness of the unlabeled data.
• Model Sensitivity: The model's performance may be sensitive to the proportion of labeled and
unlabeled data, and the choice of algorithm.

Reinforcement Learning
Reinforcement learning is a type of machine learning where an agent learns how to behave in an
environment by performing actions and receiving feedback in the form of rewards or penalties. The agent
aims to discover the optimal strategy or policy that maximizes cumulative rewards over time. In simpler
terms, it's like training a computer to make decisions by trial and error, figuring out what actions lead to
better outcomes.

Key Concepts:
• Agent: The learner or decision-maker is referred to as the agent. It interacts with the
environment and makes decisions to achieve specific goals.
• Environment: The external system or context in which the agent operates is called the
environment. It can be anything from a game environment to a robotic system or a
simulated world.
• Actions: The set of possible moves or decisions that the agent can make within the
environment. Actions are taken based on the current state.
• States: Representations of the current situation or configuration of the environment. The
agent's actions influence the transition from one state to another.
• Rewards: Numeric values that the agent receives as feedback after taking an action in a
specific state. Rewards indicate the immediate benefit or cost associated with the action.
• Policy: The strategy or set of rules that the agent follows to decide its actions. The goal is to
find the optimal policy that maximizes the cumulative reward.
How Reinforcement Learning Works:
• Initialization: The agent starts in an initial state within the environment.
• Action Selection: The agent selects an action based on its current state, following a certain
policy.
• Environment Interaction: The selected action influences a transition to a new state within
the environment.
• Reward Assignment: The agent receives a reward or penalty based on the action taken and
the new state reached.
• Learning: The agent adjusts its strategy or policy based on the received feedback, aiming to
improve its decision-making over time.
• Iterative Process: The agent continues to interact with the environment, refining its policy
through repeated trial and error.
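A tiny tabular Q-learning sketch illustrates this loop on a made-up one-dimensional corridor environment (states 0–4, with a reward only at the rightmost state). The environment, learning rate, discount factor, exploration rate, and episode count are all illustrative assumptions.

import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))   # Q-table: estimated return per (state, action)
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

def step(state, action):
    """Toy environment: reward 1 only on reaching the rightmost state."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

def choose_action(state):
    if rng.random() < epsilon:                            # explore
        return int(rng.integers(n_actions))
    best = np.flatnonzero(Q[state] == Q[state].max())     # exploit,
    return int(rng.choice(best))                          # ties broken at random

for _ in range(500):                      # episodes of trial and error
    state, done = 0, False
    for _ in range(200):                  # cap the episode length
        action = choose_action(state)
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q towards reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
        if done:
            break

print(Q.argmax(axis=1))   # greedy action per state; states 0-3 should prefer 1 (move right)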
Applications of Reinforcement Learning:
• Game Playing: Training agents to play games like chess, Go, or video games.
• Robotics: Teaching robots to perform complex tasks in real-world environments.
• Autonomous Vehicles: Enabling self-driving cars to make decisions in dynamic traffic
situations.
• Recommendation Systems: Optimizing recommendations to users based on their
interactions.
Inference in Graphical Models
Inference in graphical models, particularly in the context of probabilistic graphical models (PGMs), involves making predictions or estimating unknown variables given observed data and the structure of the model. Graphical models, such as Bayesian networks or Markov random fields, use graphical representations to encode probabilistic relationships among a set of variables.
Here's a brief explanation of inference in graphical models:

• Probabilistic Graphical Models (PGMs): PGMs are a family of statistical models that represent
the dependencies between random variables using a graph structure. Nodes in the graph
represent variables, and edges represent probabilistic dependencies.
• Types of Graphical Models:
o Bayesian Networks (BN): Directed acyclic graphs representing probabilistic dependencies
among variables.
o Markov Random Fields (MRF): Undirected graphs representing dependencies using
pairwise potentials.
• Inference Tasks:
o Marginalization: Computing the marginal distribution of one or more variables by
summing or integrating over other variables.
o Conditioning: Updating the probability distribution based on observed evidence or
conditions.
o Maximum A Posteriori (MAP) Estimation: Finding the most probable values of variables
given evidence.
o Joint Probability Estimation: Computing the joint probability of a set of variables.
• Message Passing: Many inference algorithms in graphical models involve message passing
between nodes in the graph. Algorithms like the Belief Propagation algorithm or the Junction
Tree algorithm use message passing to efficiently compute probabilities.
• Applications:
o Medical Diagnosis: Bayesian networks can be used to model the dependencies among
symptoms and diseases for diagnostic purposes.
o Image Segmentation: Markov random fields can represent spatial dependencies among
pixels for image segmentation tasks.
• Challenges:
o Computational Complexity: Inference in graphical models can be computationally expensive, especially for large and complex graphs.
o Graph Structure: The accuracy of inference often depends on the correct modeling of the underlying dependencies in the graph.
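A small NumPy illustration of the inference tasks listed above (marginalization, conditioning, and MAP estimation) on a toy Rain–Sprinkler–GrassWet network follows; the probability tables are made-up numbers, and a real application would typically use a dedicated library (for example, pgmpy) rather than an explicit joint table.

import numpy as np

# Toy Bayesian network: Rain -> GrassWet <- Sprinkler (all variables binary)
# The conditional probability tables below are illustrative numbers only.
p_rain = np.array([0.8, 0.2])                    # P(R = 0), P(R = 1)
p_sprinkler = np.array([0.6, 0.4])               # P(S = 0), P(S = 1)
p_wet = np.array([[[1.00, 0.00],                 # P(W | R = 0, S = 0)
                   [0.10, 0.90]],                # P(W | R = 0, S = 1)
                  [[0.20, 0.80],                 # P(W | R = 1, S = 0)
                   [0.01, 0.99]]])               # P(W | R = 1, S = 1)

# Joint distribution P(R, S, W) = P(R) * P(S) * P(W | R, S)
joint = p_rain[:, None, None] * p_sprinkler[None, :, None] * p_wet

# Marginalization: P(W) is obtained by summing out R and S
p_w = joint.sum(axis=(0, 1))
print("P(W = 1) =", p_w[1])

# Conditioning: P(R | W = 1) = P(R, W = 1) / P(W = 1)
p_r_given_w1 = joint[:, :, 1].sum(axis=1) / p_w[1]
print("P(R = 1 | W = 1) =", p_r_given_w1[1])

# MAP estimation: the most probable (R, S) configuration given W = 1
r_map, s_map = np.unravel_index(joint[:, :, 1].argmax(), (2, 2))
print("MAP (R, S) given W = 1:", r_map, s_map)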