
AI unit 5

Types of Machine Learning

Machine learning is the branch of Artificial Intelligence that focuses on developing models and algorithms that
let computers learn from data and improve from previous experience without being explicitly programmed for
every task. In simple words, ML teaches the systems to think and understand like humans by learning from the
data.

In this article, we will explore the main types of machine learning algorithms. In general, machine learning trains a system to learn from past experience and improve its performance over time. It makes it possible to generate predictions from massive amounts of data, delivering fast and accurate results that reveal profitable opportunities.

Types of Machine Learning

There are several types of machine learning, each with special characteristics and applications. Some of the
main types of machine learning algorithms are as follows:

1. Supervised Machine Learning

2. Unsupervised Machine Learning

3. Semi-Supervised Machine Learning

4. Reinforcement Learning


1. Supervised Machine Learning

Supervised learning means a model is trained on a “labelled dataset”, one that contains both input and output parameters. In supervised learning, algorithms learn to map inputs to their correct outputs, and both the training and validation datasets are labelled.


Let’s understand it with the help of an example.

Example: Consider a scenario where you have to build an image classifier to differentiate between cats and dogs. If you feed the algorithm a dataset of labelled dog and cat images, the machine will learn to classify a dog versus a cat from these labelled examples. When we input new dog or cat images that it has never seen before, it will use what it has learned to predict whether the image shows a dog or a cat. This is how supervised learning works, and this in particular is image classification.
There are two main categories of supervised learning that are mentioned below:

 Classification

 Regression

Classification

Classification deals with predicting categorical target variables, which represent discrete classes or labels. For
instance, classifying emails as spam or not spam, or predicting whether a patient has a high risk of heart disease.
Classification algorithms learn to map the input features to one of the predefined classes.

Here are some classification algorithms:

 Logistic Regression

 Support Vector Machine

 Random Forest

 Decision Tree

 K-Nearest Neighbors (KNN)

 Naive Bayes
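
To make this concrete, here is a minimal sketch (not from the original text) of the supervised classification workflow using scikit-learn's LogisticRegression; the synthetic dataset and parameter values are illustrative assumptions.

```python
# A minimal classification sketch: train one of the listed algorithms,
# logistic regression, on a labelled dataset and evaluate it.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labelled data: X holds the input features, y the discrete class labels.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)   # learn input -> label
print("accuracy:", clf.score(X_test, y_test))      # check on held-out data
```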

Regression

Regression, on the other hand, deals with predicting continuous target variables, which represent numerical
values. For example, predicting the price of a house based on its size, location, and amenities, or forecasting the
sales of a product. Regression algorithms learn to map the input features to a continuous numerical value.

Here are some regression algorithms:

 Linear Regression

 Polynomial Regression

 Ridge Regression

 Lasso Regression

 Decision tree

 Random Forest
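
As a minimal sketch of regression (not from the original text), the following fits a linear regression to a toy house-size/price dataset; the numbers are illustrative assumptions.

```python
# Predicting a continuous target (price) from a single feature (size).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[50], [80], [120], [160], [200]])              # size in m^2
y = np.array([150_000, 240_000, 330_000, 420_000, 510_000])  # price

reg = LinearRegression().fit(X, y)
print(reg.predict([[100]]))   # estimated price of a 100 m^2 house
```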

Advantages of Supervised Machine Learning

 Supervised Learning models can have high accuracy as they are trained on labelled data.

 The process of decision-making in supervised learning models is often interpretable.

 Pre-trained supervised models can often be reused, which saves time and resources compared with developing new models from scratch.

Disadvantages of Supervised Machine Learning

 It has limitations in recognising patterns and may struggle with unseen or unexpected patterns that are not present in the training data.

 It can be time-consuming and costly, as it relies entirely on labeled data.

 It may generalize poorly to new data.

Applications of Supervised Learning


Supervised learning is used in a wide variety of applications, including:

 Image classification: Identify objects, faces, and other features in images.

 Natural language processing: Extract information from text, such as sentiment, entities, and
relationships.

 Speech recognition: Convert spoken language into text.

 Recommendation systems: Make personalized recommendations to users.

 Predictive analytics: Predict outcomes, such as sales, customer churn, and stock prices.

 Medical diagnosis: Detect diseases and other medical conditions.

 Fraud detection: Identify fraudulent transactions.

 Autonomous vehicles: Recognize and respond to objects in the environment.

 Email spam detection: Classify emails as spam or not spam.

 Quality control in manufacturing: Inspect products for defects.

 Credit scoring: Assess the risk of a borrower defaulting on a loan.

 Gaming: Recognize characters, analyze player behavior, and create NPCs.

 Customer support: Automate customer support tasks.

 Weather forecasting: Make predictions for temperature, precipitation, and other meteorological
parameters.

 Sports analytics: Analyze player performance, make game predictions, and optimize strategies.

2. Unsupervised Machine Learning

Unsupervised learning is a type of machine learning technique in which an algorithm discovers patterns and relationships using unlabeled data. Unlike supervised learning, unsupervised learning doesn’t involve providing the algorithm with labeled target outputs. The primary goal of unsupervised learning is often to discover hidden patterns, similarities, or clusters within the data, which can then be used for various purposes, such as data exploration, visualization, dimensionality reduction, and more.


Let’s understand it with the help of an example.

Example: Consider a dataset that contains information about purchases made at a shop. Through clustering, the algorithm can group customers with similar purchasing behavior, revealing customer segments without predefined labels. This kind of information can help businesses target the right customers as well as identify outliers.

There are two main categories of unsupervised learning that are mentioned below:

 Clustering

 Association

Clustering

Clustering is the process of grouping data points into clusters based on their similarity. This technique is useful
for identifying patterns and relationships in data without the need for labeled examples.

Here are some clustering algorithms (note that the last two listed are dimensionality-reduction techniques commonly used alongside clustering):

 K-Means Clustering algorithm

 Mean-shift algorithm

 DBSCAN Algorithm

 Principal Component Analysis

 Independent Component Analysis
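
For illustration, here is a minimal clustering sketch (not from the original text) using scikit-learn's KMeans; the toy points and the choice k = 2 are assumptions.

```python
# Group unlabelled 2-D points into two clusters based on similarity.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1],    # one natural group
              [8.0, 8.5], [8.3, 8.1], [7.9, 8.4]])   # another group

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)            # cluster index assigned to each point
print(km.cluster_centers_)   # learned cluster centres
```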

Association

Association rule learning is a technique for discovering relationships between items in a dataset. It identifies
rules that indicate the presence of one item implies the presence of another item with a specific probability.

Here are some association rule learning algorithms:

 Apriori Algorithm

 Eclat

 FP-growth Algorithm
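
To show the counting step these algorithms build on, here is a minimal sketch (not from the original text) that measures pairwise itemset support over a handful of made-up transactions.

```python
# Count how often each pair of items occurs together: the 2-itemset
# support pass at the heart of Apriori-style algorithms.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "butter"}, {"bread", "jam"},
    {"bread", "butter", "jam"}, {"milk", "bread"},
]

pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.items():
    print(pair, "support =", count / len(transactions))
```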

Advantages of Unsupervised Machine Learning

 It helps to discover hidden patterns and various relationships between the data.

 Used for tasks such as customer segmentation, anomaly detection, and data exploration.

 It does not require labeled data and reduces the effort of data labeling.

Disadvantages of Unsupervised Machine Learning

 Without labels, it can be difficult to assess the quality of the model’s output.

 Clusters may not be interpretable and may lack meaningful interpretations.

 Extracting meaningful features from raw data often requires additional techniques, such as autoencoders and dimensionality reduction.

Applications of Unsupervised Learning

Here are some common applications of unsupervised learning:

 Clustering: Group similar data points into clusters.

 Anomaly detection: Identify outliers or anomalies in data.

 Dimensionality reduction: Reduce the dimensionality of data while preserving its essential
information.
 Recommendation systems: Suggest products, movies, or content to users based on their historical
behavior or preferences.

 Topic modeling: Discover latent topics within a collection of documents.

 Density estimation: Estimate the probability density function of data.

 Image and video compression: Reduce the amount of storage required for multimedia content.

 Data preprocessing: Help with data preprocessing tasks such as data cleaning, imputation of missing
values, and data scaling.

 Market basket analysis: Discover associations between products.

 Genomic data analysis: Identify patterns or group genes with similar expression profiles.

 Image segmentation: Segment images into meaningful regions.

 Community detection in social networks: Identify communities or groups of individuals with similar
interests or connections.

 Customer behavior analysis: Uncover patterns and insights for better marketing and product
recommendations.

 Content recommendation: Classify and tag content to make it easier to recommend similar items to
users.

 Exploratory data analysis (EDA): Explore data and gain insights before defining specific tasks.

3. Semi-Supervised Learning

Semi-supervised learning is a machine learning approach that sits between supervised and unsupervised learning, so it uses both labelled and unlabelled data. It’s particularly useful when obtaining labeled data is costly, time-consuming, or resource-intensive, for example when labelling requires skilled annotators and relevant resources.

We use these techniques when only a small portion of the data is labeled and the remaining large portion is unlabeled. We can use unsupervised techniques to predict labels and then feed these labels to supervised techniques. This situation is common with image datasets, where usually not all images are labeled.


Let’s understand it with the help of an example.

Example: Consider building a language translation model; obtaining labeled translations for every sentence pair can be resource-intensive. Semi-supervised learning allows the model to learn from both labeled and unlabeled sentence pairs, making it more accurate. This technique has led to significant improvements in the quality of machine translation services.

Types of Semi-Supervised Learning Methods


There are a number of different semi-supervised learning methods each with its own characteristics. Some of the
most common ones include:

 Graph-based semi-supervised learning: This approach uses a graph to represent the relationships
between the data points. The graph is then used to propagate labels from the labeled data points to the
unlabeled data points.

 Label propagation: This approach iteratively propagates labels from the labeled data points to the
unlabeled data points, based on the similarities between the data points.

 Co-training: This approach trains two different machine learning models on different views (feature subsets) of the data. Each model then labels unlabelled examples for the other, so the models learn from each other’s predictions.

 Self-training: This approach trains a machine learning model on the labeled data and then uses the
model to predict labels for the unlabeled data. The model is then retrained on the labeled data and the
predicted labels for the unlabeled data.

 Generative adversarial networks (GANs): GANs are a type of deep learning algorithm that can be
used to generate synthetic data. GANs can be used to generate unlabeled data for semi-supervised
learning by training two neural networks, a generator and a discriminator.
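
As a minimal self-training sketch (not from the original text), the following uses scikit-learn's SelfTrainingClassifier, where unlabelled samples are conventionally marked with the label -1; the toy data is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X = np.array([[0.0], [0.2], [0.9], [1.1], [0.1], [1.0]])
y = np.array([0, 0, 1, 1, -1, -1])   # -1 marks unlabelled samples

# The base classifier is fit on the labelled points; its confident
# predictions on the unlabelled points become pseudo-labels, and the
# model is retrained on the enlarged labelled set.
model = SelfTrainingClassifier(LogisticRegression()).fit(X, y)
print(model.predict([[0.05], [0.95]]))
```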

Advantages of Semi-Supervised Machine Learning

 It leads to better generalization as compared to supervised learning, as it takes both labeled and
unlabeled data.

 Can be applied to a wide range of data.

Disadvantages of Semi-Supervised Machine Learning

 Semi-supervised methods can be more complex to implement compared to other approaches.

 It still requires some labeled data that might not always be available or easy to obtain.

 Low-quality or unrepresentative unlabeled data can degrade model performance.

Applications of Semi-Supervised Learning

Here are some common applications of semi-supervised learning:

 Image Classification and Object Recognition: Improve the accuracy of models by combining a small
set of labeled images with a larger set of unlabeled images.

 Natural Language Processing (NLP): Enhance the performance of language models and classifiers
by combining a small set of labeled text data with a vast amount of unlabeled text.

 Speech Recognition: Improve the accuracy of speech recognition by leveraging a limited amount of
transcribed speech data and a more extensive set of unlabeled audio.

 Recommendation Systems: Improve the accuracy of personalized recommendations by supplementing a sparse set of user-item interactions (labeled data) with a wealth of unlabeled user behavior data.

 Healthcare and Medical Imaging: Enhance medical image analysis by utilizing a small set of labeled
medical images alongside a larger set of unlabeled images.

4. Reinforcement Machine Learning

Reinforcement learning is a method in which an agent learns by interacting with an environment: it produces actions and discovers errors or rewards. Trial and error, together with delayed reward, are the most relevant characteristics of reinforcement learning. In this technique, the model keeps improving its performance using reward feedback to learn behaviors or patterns. These algorithms are tailored to specific problems, e.g., Google’s self-driving car, or AlphaGo, where a bot competes with humans and even with itself to become a better and better Go player. Each time the agent acts, it learns and adds the experience to its knowledge, which becomes its training data. So, the more it learns, the better trained and more experienced it becomes.

Here are some of most common reinforcement learning algorithms:

 Q-learning: Q-learning is a model-free RL algorithm that learns a Q-function, which maps states to
actions. The Q-function estimates the expected reward of taking a particular action in a given state.

 SARSA (State-Action-Reward-State-Action): SARSA is another model-free RL algorithm that learns a Q-function. However, unlike Q-learning, SARSA updates the Q-function for the action that was actually taken, rather than the optimal action.

 Deep Q-learning: Deep Q-learning is a combination of Q-learning and deep learning. Deep Q-learning
uses a neural network to represent the Q-function, which allows it to learn complex relationships
between states and actions.
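
To make the Q-learning update concrete, here is a minimal tabular sketch (not from the original text); the toy environment with 3 states and 2 actions, and the parameter values, are assumptions.

```python
# Tabular Q-learning update:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
import numpy as np

n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9   # learning rate, discount factor

def q_update(s, a, r, s_next):
    """Move Q(s, a) toward the reward plus the discounted best future value."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# One illustrative transition: in state 0, action 1 earns reward 1.0
# and lands the agent in state 2.
q_update(s=0, a=1, r=1.0, s_next=2)
print(Q)
```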


Let’s understand it with the help of examples.

Example: Consider that you are training an AI agent to play a game like chess. The agent explores different moves and receives positive or negative feedback based on the outcome. Reinforcement learning similarly lets robots and other agents learn to perform tasks by interacting with their surroundings.

Types of Reinforcement Machine Learning

There are two main types of reinforcement learning:

Positive reinforcement

 Rewards the agent for taking a desired action.

 Encourages the agent to repeat the behavior.

 Examples: Giving a treat to a dog for sitting, providing a point in a game for a correct answer.

Negative reinforcement

 Removes an undesirable stimulus to encourage a desired behavior.

 Encourages the agent to repeat the behavior that removes the undesirable stimulus.

 Examples: Turning off a loud buzzer when a lever is pressed, avoiding a penalty by completing a task.

Advantages of Reinforcement Machine Learning

 It supports autonomous decision-making and is well-suited for tasks that require learning a sequence of decisions, such as robotics and game playing.

 This technique is preferred for achieving long-term results that are otherwise very difficult to achieve.

 It is used to solve complex problems that cannot be solved by conventional techniques.

Disadvantages of Reinforcement Machine Learning


 Training Reinforcement Learning agents can be computationally expensive and time-consuming.

 Reinforcement learning is not preferable for solving simple problems.

 It needs a lot of data and a lot of computation, which makes it impractical and costly.

Applications of Reinforcement Machine Learning

Here are some applications of reinforcement learning:

 Game Playing: RL can teach agents to play games, even complex ones.

 Robotics: RL can teach robots to perform tasks autonomously.

 Autonomous Vehicles: RL can help self-driving cars navigate and make decisions.

 Recommendation Systems: RL can enhance recommendation algorithms by learning user preferences.

 Healthcare: RL can be used to optimize treatment plans and drug discovery.

 Natural Language Processing (NLP): RL can be used in dialogue systems and chatbots.

 Finance and Trading: RL can be used for algorithmic trading.

 Supply Chain and Inventory Management: RL can be used to optimize supply chain operations.

 Energy Management: RL can be used to optimize energy consumption.

 Game AI: RL can be used to create more intelligent and adaptive NPCs in video games.

 Adaptive Personal Assistants: RL can be used to improve personal assistants.

 Virtual Reality (VR) and Augmented Reality (AR): RL can be used to create immersive and
interactive experiences.

 Industrial Control: RL can be used to optimize industrial processes.

 Education: RL can be used to create adaptive learning systems.

 Agriculture: RL can be used to optimize agricultural operations.

Conclusion

In conclusion, each type of machine learning serves its own purpose and contributes to the development of enhanced data-prediction capabilities, with the potential to transform industries such as data science. Together, these techniques help organizations handle massive data production and manage their datasets.

Decision Tree in Machine Learning

A decision tree in machine learning is a versatile, interpretable algorithm used for predictive modelling. It
structures decisions based on input data, making it suitable for both classification and regression tasks. This
article delves into the components, terminologies, construction, and advantages of decision trees, exploring their
applications and learning algorithms.


A decision tree is a type of supervised learning algorithm that is commonly used in machine learning to model and predict outcomes based on input data. It is a tree-like structure where each internal node tests an attribute, each branch corresponds to an attribute value, and each leaf node represents the final decision or prediction. The decision tree algorithm falls under the category of supervised learning and can be used to solve both regression and classification problems.

Decision Tree Terminologies


There are specialized terms associated with decision trees that denote various components and facets of the tree structure and the decision-making procedure:

 Root Node: The highest node in a decision tree, representing the original choice or feature from which the tree branches.

 Internal Nodes (Decision Nodes): Nodes in the tree whose choices are determined by the values of
particular attributes. There are branches on these nodes that go to other nodes.

 Leaf Nodes (Terminal Nodes): The ends of branches, where final choices or predictions are made. Leaf nodes have no further branches.

 Branches (Edges): Links between nodes that show how decisions are made in response to particular
circumstances.

 Splitting: The process of dividing a node into two or more sub-nodes based on a decision criterion. It
involves selecting a feature and a threshold to create subsets of data.

 Parent Node: A node that is split into child nodes. The original node from which a split originates.

 Child Node: Nodes created as a result of a split from a parent node.

 Decision Criterion: The rule or condition used to determine how the data should be split at a decision
node. It involves comparing feature values against a threshold.

 Pruning: The process of removing branches or nodes from a decision tree to improve its generalisation
and prevent overfitting.

Understanding these terminologies is crucial for interpreting and working with decision trees in machine
learning applications.

How is a Decision Tree formed?

The process of forming a decision tree involves recursively partitioning the data based on the values of different
attributes. The algorithm selects the best attribute to split the data at each internal node, based on certain criteria
such as information gain or Gini impurity. This splitting process continues until a stopping criterion is met, such
as reaching a maximum depth or having a minimum number of instances in a leaf node.
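
A minimal sketch (not from the original text) of this recursive partitioning using scikit-learn; the entropy criterion and the stopping parameters shown are illustrative choices.

```python
# Build a tree that splits on the attribute maximizing information gain
# (criterion="entropy"), with stopping criteria on depth and leaf size.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3,
                              min_samples_leaf=2, random_state=0).fit(X, y)
print(export_text(tree))   # text view of the learned splits
```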

Why Decision Tree?

Decision trees are widely used in machine learning for a number of reasons:

 Decision trees are versatile at simulating intricate decision-making processes because they are interpretable and flexible.

 Their hierarchical structure makes it possible to portray complex choice scenarios that take into account a variety of causes and outcomes.

 Because they provide comprehensible insights into the decision logic, decision trees are especially helpful for tasks involving categorisation and regression.

 They are proficient with both numerical and categorical data, and they can easily adapt to a variety of
datasets thanks to their autonomous feature selection capability.

 Decision trees also provide simple visualization, which helps to comprehend and elucidate the
underlying decision processes in a model.

Decision Tree Approach

Decision tree uses the tree representation to solve the problem in which each leaf node corresponds to a class
label and attributes are represented on the internal node of the tree. We can represent any boolean function on
discrete attributes using the decision tree.
Below are some assumptions that we make while using a decision tree:

 At the beginning, we consider the whole training set as the root.

 Feature values are preferred to be categorical. If the values are continuous, they are discretized prior to building the model.

 On the basis of attribute values, records are distributed recursively.

 We use statistical methods for ordering attributes as the root or an internal node.

A decision tree works on the Sum of Product (SOP) form, also known as Disjunctive Normal Form: each root-to-leaf path is a conjunction (product) of attribute tests, and the tree as a whole is a disjunction (sum) of those paths. In a decision tree, the major challenge is the identification of the attribute for the root node at each level. This process is known as attribute selection. We have two popular attribute selection measures:

1. Information Gain

2. Gini Index

1. Information Gain:

When we use a node in a decision tree to partition the training instances into smaller subsets, the entropy changes. Information gain is a measure of this change in entropy.

 Suppose S is a set of instances,

 A is an attribute,

 Sv is the subset of S for which attribute A takes the value v,

 v represents an individual value that the attribute A can take, and Values(A) is the set of all possible values of A, then

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)

Entropy is the measure of uncertainty of a random variable; it characterizes the impurity of an arbitrary collection of examples. The higher the entropy, the greater the information content. For a set S whose classes i occur with proportions pi:

Entropy(S) = − Σ_i pi · log2(pi)

Example:

For the set X = {a,a,a,b,b,b,b,b}


Total instances: 8
Instances of b: 5
Instances of a: 3
Entropy H(X) = −[(3/8) · log2(3/8) + (5/8) · log2(5/8)] = −[0.375 × (−1.415) + 0.625 × (−0.678)] = 0.531 + 0.424 = 0.954
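
The same calculation in a few lines of Python (added for illustration):

```python
# Entropy of X = {a,a,a,b,b,b,b,b}: 3 of 8 instances are 'a', 5 of 8 are 'b'.
from math import log2

p_a, p_b = 3 / 8, 5 / 8
H = -(p_a * log2(p_a) + p_b * log2(p_b))
print(round(H, 3))   # 0.954
```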

Building a Decision Tree using Information Gain: the essentials

 Start with all training instances associated with the root node

 Use info gain to choose which attribute to label each node with

 Note: No root-to-leaf path should contain the same discrete attribute twice

 Recursively construct each subtree on the subset of training instances that would be classified down
that path in the tree.

 If all positive or all negative training instances remain, label that node “yes” or “no” accordingly.

 If no attributes remain, label with a majority vote of training instances left at that node

 If no instances remain, label with a majority vote of the parent’s training instances.

Example: Now, let us draw a Decision Tree for the following data using Information gain. Training set: 3
features and 2 classes

X Y Z C

1 1 1 I

1 1 0 I

0 0 1 II

1 0 0 II

Here, we have 3 features and 2 output classes. To build a decision tree using information gain, we will take each of the features and calculate the information gain for each feature.

(Figures: information gain computed for splits on feature X, feature Y, and feature Z.)

From these gain calculations, we can see that the information gain is maximum when we split on feature Y, so the best-suited feature for the root node is feature Y. Moreover, when the dataset is split by feature Y, each child contains a pure subset of the target variable, so we don’t need to split any further. The final tree for this dataset is therefore a single split on Y: the branch Y = 1 leads to a leaf labeled I, and the branch Y = 0 leads to a leaf labeled II.
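
The gains can be verified with a short script (added for illustration):

```python
# Information gain of each feature for the 4-row training set above;
# this confirms that splitting on Y gives the maximum gain.
from math import log2
from collections import Counter

rows = [(1, 1, 1, "I"), (1, 1, 0, "I"), (0, 0, 1, "II"), (1, 0, 0, "II")]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(i):
    labels = [r[3] for r in rows]
    g = entropy(labels)
    for v in {r[i] for r in rows}:
        subset = [r[3] for r in rows if r[i] == v]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

for name, i in [("X", 0), ("Y", 1), ("Z", 2)]:
    print(name, round(gain(i), 3))   # X: 0.311, Y: 1.0, Z: 0.0
```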

2. Gini Index

 The Gini Index is a metric that measures how often a randomly chosen element would be incorrectly identified if it were labelled randomly according to the class distribution.

 An attribute with a lower Gini index should therefore be preferred.

 Sklearn supports the “gini” criterion for the Gini Index, and it is the default criterion.

 The formula for the calculation of the Gini Index is given below.

The formula for the Gini Index is given by:

Gini(S) = 1 − Σ_i (pi)²

where pi is the proportion of instances in S that belong to class i.

Gini Impurity

The Gini Index is a measure of the inequality or impurity of a distribution, commonly used in decision trees and other machine learning algorithms. For two classes it ranges from 0 to 0.5, where 0 indicates a pure set (all instances belong to the same class) and 0.5 indicates a maximally impure set (instances are evenly distributed across the classes).

Some additional features and characteristics of the Gini Index are:

 It is calculated by summing the squared probabilities of each outcome in a distribution and subtracting
the result from 1.

 A lower Gini Index indicates a more homogeneous or pure distribution, while a higher Gini Index
indicates a more heterogeneous or impure distribution.

 In decision trees, the Gini Index is used to evaluate the quality of a split by measuring the difference
between the impurity of the parent node and the weighted impurity of the child nodes.

 Compared to other impurity measures like entropy, the Gini Index is faster to compute and more
sensitive to changes in class probabilities.

 One disadvantage of the Gini Index is that it tends to favour splits that create equally sized child nodes,
even if they are not optimal for classification accuracy.

 In practice, the choice between using the Gini Index or other impurity measures depends on the specific
problem and dataset, and often requires experimentation and tuning.
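
For illustration, the formula translates into a small helper (not from the original text):

```python
# Gini Index: 1 minus the sum of squared class probabilities.
def gini(labels):
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(gini(["I", "I", "I", "I"]))    # 0.0 -> pure node
print(gini(["I", "I", "II", "II"]))  # 0.5 -> maximally impure (two classes)
```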

Example of a Decision Tree Algorithm

Forecasting Activities Using Weather Information

 Root node: Whole dataset

 Attribute: “Outlook” (sunny, overcast, rainy).

 Subsets: Sunny, Overcast, and Rainy.

 Recursive Splitting: Divide the sunny subset further, for example according to humidity.

 Leaf Nodes: Activities include “swimming,” “hiking,” and “staying inside.”

Beginning with the entire dataset as the root node of the decision tree:

 Determine the best attribute to split the dataset on, based on information gain, which is calculated by the formula: Information gain = Entropy(parent) − Σ (weight of child × Entropy(child)), where entropy is a measure of the impurity or disorder of a set of examples, and each child’s weight is the fraction of examples that fall into that child node.

 Create a new internal node that corresponds to the best attribute and connects it to the root node. For
example, if the best attribute is “outlook” (which can have values “sunny”, “overcast”, or “rainy”), we
create a new node labeled “outlook” and connect it to the root node.

 Partition the dataset into subsets based on the values of the best attribute. For example, we create three
subsets: one for instances where the outlook is “sunny”, one for instances where the outlook is
“overcast”, and one for instances where the outlook is “rainy”.

 Recursively repeat the previous steps for each subset until all instances in a given subset belong to the same class or no further splitting is possible. For example, if the subset of instances where the outlook is “overcast” contains only instances where the activity is “hiking”, we assign a leaf node labeled “hiking” to this subset. If the subset of instances where the outlook is “sunny” is further split based on the humidity attribute, we repeat the splitting process for this subset.

 Assign a leaf node to each subset that contains instances that belong to the same class. For example, if
the subset of instances where the outlook is “rainy” contains only instances where the activity is “stay
inside”, we assign a leaf node labeled “stay inside” to this subset.
 Make predictions based on the decision tree by traversing it from the root node to a leaf node that
corresponds to the instance being classified. For example, if the outlook is “sunny” and the humidity is
“high”, we traverse the decision tree by following the “sunny” branch and then the “high humidity”
branch, and we end up at a leaf node labeled “swimming”, which is our predicted activity.

Advantages of Decision Tree

 Easy to understand and interpret, making them accessible to non-experts.

 Handle both numerical and categorical data without requiring extensive preprocessing.

 Provides insights into feature importance for decision-making.

 Handle missing values and outliers without significant impact.

 Applicable to both classification and regression tasks.

Disadvantages of Decision Tree

 Disadvantages include the potential for overfitting

 Sensitivity to small changes in data, limited generalization if training data is not representative

 Potential bias in the presence of imbalanced data.

Conclusion

Decision trees, a key tool in machine learning, model and predict outcomes based on input data through a tree-
like structure. They offer interpretability, versatility, and simple visualization, making them valuable for both
categorization and regression tasks. While decision trees have advantages like ease of understanding, they may
face challenges such as overfitting. Understanding their terminologies and formation process is essential for
effective application in diverse scenarios.

Support Vector Machine (SVM) Algorithm

Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or nonlinear
classification, regression, and even outlier detection tasks. SVMs can be used for a variety of tasks, such as text
classification, image classification, spam detection, handwriting identification, gene expression analysis, face
detection, and anomaly detection. SVMs are adaptable and efficient in a variety of applications because they can
manage high-dimensional data and nonlinear relationships.

SVM algorithms are very effective because they try to find the maximum separating hyperplane between the different classes available in the target feature.

Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression. Though it can handle regression problems as well, it is best suited for classification. The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space that separates the data points of different classes in the feature space. The hyperplane is chosen so that the margin between the closest points of different classes is as large as possible. The dimension of the hyperplane depends upon the number of features: if the number of input features is two, the hyperplane is just a line; if the number of input features is three, the hyperplane becomes a 2-D plane. It becomes difficult to imagine when the number of features exceeds three.

Let’s consider two independent variables x1, x2, and one dependent variable which is either a blue circle or a
red circle.
(Figure: linearly separable data points)

From the figure above it’s very clear that there are multiple lines (our hyperplane here is a line because we are
considering only two input features x1, x2) that segregate our data points or do a classification between red and
blue circles. So how do we choose the best line or in general the best hyperplane that segregates our data points?

How does SVM work?

One reasonable choice as the best hyperplane is the one that represents the largest separation or margin between
the two classes.

(Figure: multiple hyperplanes separating the data of two classes)

So we choose the hyperplane whose distance to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane / hard margin. So from the figure above, we choose L2. Now let’s consider a scenario like the one shown below.

(Figure: selecting a hyperplane for data with an outlier)

Here we have one blue ball inside the boundary of the red balls. So how does SVM classify the data? It’s simple: the blue ball among the red ones is an outlier of the blue class. The SVM algorithm has the ability to ignore the outlier and still find the hyperplane that maximizes the margin, so SVM is robust to outliers.
(Figure: the most optimized hyperplane)

So for this type of data, what SVM does is find the maximum margin as with the previous data sets, while adding a penalty each time a point crosses the margin. The margins in these types of cases are called soft margins. When there is a soft margin, the SVM tries to minimize (1/margin + ∧(∑penalty)). Hinge loss is a commonly used penalty: if there are no violations, there is no hinge loss; if there are violations, the hinge loss is proportional to the distance of the violation.

Till now, we were talking about linearly separable data (the group of blue balls and red balls are separable by a straight line / linear line). What to do if the data are not linearly separable?

(Figure: original 1D dataset for classification)

Say our data is as shown in the figure above. SVM solves this by creating a new variable using a kernel. We take a point xi on the line and create a new variable yi as a function of its distance from the origin o. If we plot this, we get something like the figure shown below.

(Figure: mapping the 1D data to 2D so the two classes become separable)

In this case, the new variable y is created as a function of distance from the origin. A non-linear function that
creates a new variable is referred to as a kernel.

Support Vector Machine Terminology

1. Hyperplane: The hyperplane is the decision boundary used to separate the data points of different classes in a feature space. In the case of linear classification, it is a linear equation, i.e., w^T x + b = 0.

2. Support Vectors: Support vectors are the data points closest to the hyperplane, which play a critical role in deciding the hyperplane and the margin.
3. Margin: Margin is the distance between the support vectors and the hyperplane. The main objective of the support vector machine algorithm is to maximize the margin; a wider margin indicates better classification performance.

4. Kernel: The kernel is the mathematical function used in SVM to map the original input data points into high-dimensional feature spaces, so that the hyperplane can be found easily even if the data points are not linearly separable in the original input space. Some of the common kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid.

5. Hard Margin: The maximum-margin hyperplane or the hard margin hyperplane is a hyperplane that
properly separates the data points of different categories without any misclassifications.

6. Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits a soft margin
technique. Each data point has a slack variable introduced by the soft-margin SVM formulation, which
softens the strict margin requirement and permits certain misclassifications or violations. It discovers a
compromise between increasing the margin and reducing violations.

7. C: The regularisation parameter C in SVM balances margin maximisation against misclassification penalties. It decides the penalty for going over the margin or misclassifying data items. A greater value of C imposes a stricter penalty, resulting in a smaller margin and perhaps fewer misclassifications.

8. Hinge Loss: A typical loss function in SVMs is hinge loss. It punishes incorrect classifications or
margin violations. The objective function in SVM is frequently formed by combining it with the
regularisation term.

9. Dual Problem: SVM can be solved via a dual of the optimisation problem, which involves finding the Lagrange multipliers associated with the support vectors. The dual formulation enables the use of kernel tricks and more efficient computation.

Mathematical intuition of Support Vector Machine

Consider a binary classification problem with two classes, labeled as +1 and -1. We have a training dataset
consisting of input feature vectors X and their corresponding class labels Y.

The equation for the linear hyperplane can be written as:

w^T x + b = 0

The vector w represents the normal vector to the hyperplane, i.e., the direction perpendicular to the hyperplane. The parameter b in the equation represents the offset or distance of the hyperplane from the origin along the normal vector w.

The distance between a data point x_i and the decision boundary can be calculated as:

d_i = (w^T x_i + b) / ||w||

where ||w|| represents the Euclidean norm of the weight vector w (the normal vector to the hyperplane).

For a linear SVM classifier:

ŷ = 1 if w^T x + b ≥ 0, and ŷ = 0 if w^T x + b < 0

Optimization:

 For a hard-margin linear SVM classifier:

minimize_{w,b} (1/2) ||w||^2  subject to  t_i (w^T x_i + b) ≥ 1  for i = 1, 2, 3, …, m

The target label for the ith training instance is denoted by the symbol t_i: t_i = −1 for negative instances (when y_i = 0) and t_i = 1 for positive instances (when y_i = 1), because we require a decision boundary that satisfies the constraint t_i (w^T x_i + b) ≥ 1.
 For a soft-margin linear SVM classifier:

minimize_{w,b} (1/2) ||w||^2 + C Σ_{i=1}^{m} ζ_i  subject to  t_i (w^T x_i + b) ≥ 1 − ζ_i  and  ζ_i ≥ 0  for i = 1, 2, 3, …, m

 Dual Problem: SVM can also be solved via the dual of the optimisation problem, which involves finding the Lagrange multipliers associated with the support vectors. The optimal Lagrange multipliers α_i maximize the following dual objective function:

maximize_α  Σ_{i=1}^{m} α_i − (1/2) Σ_{i=1}^{m} Σ_{j=1}^{m} α_i α_j t_i t_j K(x_i, x_j)

where,

 αi is the Lagrange multiplier associated with the ith training sample.

 K(xi, xj) is the kernel function that computes the similarity between two samples xi and xj. It allows
SVM to handle nonlinear classification problems by implicitly mapping the samples into a higher-
dimensional feature space.

 The term ∑αi represents the sum of all Lagrange multipliers.

Once the dual problem has been solved and the optimal Lagrange multipliers have been found, the SVM decision boundary can be described in terms of these multipliers and the support vectors. The training samples with α_i > 0 are the support vectors; the weight vector and bias follow from them:

w = Σ_{i=1}^{m} α_i t_i x_i,  and for any support vector x_s:  t_s (w^T x_s + b) = 1  ⟺  b = t_s − w^T x_s

so the decision function becomes f(x) = Σ_{i=1}^{m} α_i t_i K(x_i, x) + b.

Types of Support Vector Machine

Based on the nature of the decision boundary, Support Vector Machines (SVM) can be divided into two main
parts:

 Linear SVM: Linear SVMs use a linear decision boundary to separate the data points of different
classes. When the data can be precisely linearly separated, linear SVMs are very suitable. This means
that a single straight line (in 2D) or a hyperplane (in higher dimensions) can entirely divide the data
points into their respective classes. A hyperplane that maximizes the margin between the classes is the
decision boundary.

 Non-Linear SVM: Non-Linear SVM can be used to classify data when it cannot be separated into two
classes by a straight line (in the case of 2D). By using kernel functions, nonlinear SVMs can handle
nonlinearly separable data. The original input data is transformed by these kernel functions into a
higher-dimensional feature space, where the data points can be linearly separated. A linear SVM is used
to locate a nonlinear decision boundary in this modified space.

Popular kernel functions in SVM

The SVM kernel is a function that takes low-dimensional input space and transforms it into a higher-dimensional space, i.e., it converts non-separable problems into separable problems. It is mostly useful in non-linear separation problems. Simply put, the kernel performs some extremely complex data transformations and then finds the process to separate the data based on the labels or outputs defined.

Linear: K(x_i, x_j) = x_i^T x_j

Polynomial: K(x_i, x_j) = (γ x_i^T x_j + r)^d

Gaussian RBF: K(x_i, x_j) = exp(−γ ||x_i − x_j||^2)

Sigmoid: K(x_i, x_j) = tanh(γ x_i^T x_j + r)
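
As a minimal sketch (not from the original text), the following compares a linear and an RBF kernel on data that is not linearly separable; the dataset and parameter values (C = 1.0, gamma = "scale") are illustrative.

```python
# Concentric circles cannot be separated by a line; the RBF kernel
# implicitly maps them to a space where they can be.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

for kernel in ["linear", "rbf"]:
    clf = SVC(kernel=kernel, C=1.0, gamma="scale").fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))
```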

Advantages of SVM

 Effective in high-dimensional cases.

 It is memory efficient, as it uses a subset of training points, called support vectors, in the decision function.

 Different kernel functions can be specified for the decision function, and it is possible to specify custom kernels.

Unsupervised Machine Learning

In the previous topic, we learned about supervised machine learning, in which models are trained using labeled data under supervision. But there may be many cases in which we do not have labeled data and need to find the hidden patterns in a given dataset. To solve such cases, we need unsupervised learning techniques.

What is Unsupervised Learning?

As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a training dataset. Instead, the models themselves find hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain while learning new things. It can be defined as:

Unsupervised learning is a type of machine learning in which models are trained using unlabeled dataset and
are allowed to act on that data without any supervision.

Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.

Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of different
types of cats and dogs. The algorithm is never trained upon the given dataset, which means it does not have any
idea about the features of the dataset. The task of the unsupervised learning algorithm is to identify the image features on its own. An unsupervised learning algorithm will perform this task by clustering the image dataset into groups according to the similarities between images.

Why use Unsupervised Learning?

Below are some main reasons which describe the importance of Unsupervised Learning:

o Unsupervised learning is helpful for finding useful insights from data.

o Unsupervised learning is much like how a human learns to think through their own experiences, which makes it closer to real AI.

o Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.

o In the real world, we do not always have input data with corresponding output, and unsupervised learning is needed to solve such cases.

Working of Unsupervised Learning

The working of unsupervised learning can be understood as follows:

Here, we take unlabeled input data, which means it is not categorized and corresponding outputs are not given. This unlabeled input data is fed to the machine learning model in order to train it. First, the model interprets the raw data to find hidden patterns, and then it applies a suitable algorithm such as k-means clustering.

Once it applies the suitable algorithm, the algorithm divides the data objects into groups according to the similarities and differences between the objects.

Types of Unsupervised Learning Algorithm:

The unsupervised learning algorithm can be further categorized into two types of problems:

o Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group. Cluster analysis finds the commonalities between data objects and categorizes them according to the presence and absence of those commonalities.

o Association: An association rule is an unsupervised learning method used for finding relationships between variables in a large database. It determines the sets of items that occur together in the dataset. Association rules make marketing strategy more effective; for example, people who buy item X (say, bread) also tend to purchase item Y (butter/jam). A typical example of an association rule is Market Basket Analysis.

Note: We will learn these algorithms in later chapters.

Unsupervised Learning algorithms:

Below is the list of some popular unsupervised learning algorithms:

o K-means clustering

o KNN (k-nearest neighbors)

o Hierarchical clustering

o Anomaly detection

o Neural Networks

o Principal Component Analysis

o Independent Component Analysis

o Apriori algorithm

o Singular value decomposition

Advantages of Unsupervised Learning

o Unsupervised learning is used for more complex tasks as compared to supervised learning because, in
unsupervised learning, we don't have labeled input data.

o Unsupervised learning is preferable as it is easy to get unlabeled data in comparison to labeled data.
Disadvantages of Unsupervised Learning

o Unsupervised learning is intrinsically more difficult than supervised learning as it does not have
corresponding output.

o The result of the unsupervised learning algorithm might be less accurate as input data is not labeled,
and algorithms do not know the exact output in advance.

Market Basket Analysis in Data Mining

A data mining technique that is used to uncover purchase patterns in any retail setting is known as Market Basket Analysis. Basically, market basket analysis in data mining involves analyzing the combinations of products that are bought together.

It is a technique based on the careful study of purchases made by customers, for example in a supermarket. It identifies patterns of items that customers frequently purchase together; this analysis can help companies promote deals, offers, and sales, and data mining techniques help achieve this analysis task. Example:

 Data mining concepts are used in sales and marketing to provide better customer service, improve cross-selling opportunities, and increase direct-mail response rates.

 Customer retention, in the form of pattern identification and prediction of likely defections, is made possible by data mining.

 Risk assessment and fraud detection also use data mining concepts to identify inappropriate or unusual behavior.

Market basket analysis mainly works with the ASSOCIATION RULE {IF} -> {THEN}.

 IF means Antecedent: An antecedent is an item found within the data

 THEN means Consequent: A consequent is an item found in combination with the antecedent.

Let’s see how the ASSOCIATION RULE {IF} -> {THEN} is used in Market Basket Analysis in Data Mining. For example, customers buying a domain are likely to also need extra plugins/extensions that make things easier for their users.

As we said above, the antecedent is the itemset available in the data, the {IF} component of the rule; in our example, it is the domain.

Likewise, the consequent is the item found in combination with the antecedent, the {THEN} component of the rule; in our example, it is the extra plugins/extensions.

With the help of these rules, we are able to predict customer behavioral patterns, and from this we can create combination offers for products that customers are likely to buy together. That automatically increases the sales and revenue of the company.

With the help of the Apriori algorithm, we can further classify and simplify the itemsets that are frequently bought by consumers.

There are three components in APRIORI ALGORITHM:

 SUPPORT

 CONFIDENCE

 LIFT
Now take an example: suppose 5000 transactions have been made through a popular eCommerce website, and we want to calculate the support, confidence, and lift for two products, say a pen and a notebook. Out of the 5000 transactions, suppose 500 contain a pen, 700 contain a notebook, and 100 contain both (note that the count for “both” cannot exceed the count for either individual item).

SUPPORT: Support is the number of transactions containing the item(s) divided by the total number of transactions:

Support(A, B) = freq(A, B) / N

support(pen) = transactions containing a pen / total transactions = 500/5000 = 10 percent; likewise support(notebook) = 700/5000 = 14 percent, and support(pen, notebook) = 100/5000 = 2 percent.

CONFIDENCE: Confidence measures whether the products sell together rather than only individually. It is calculated as the combined transactions divided by the antecedent’s individual transactions:

Confidence(A -> B) = freq(A, B) / freq(A)

confidence(pen -> notebook) = combined transactions / pen transactions = 100/500 = 20 percent

LIFT: Lift tells how much more often the items are sold together than would be expected if they were independent; it is the confidence divided by the support of the consequent:

Lift(A -> B) = Confidence(A -> B) / Support(B)

lift(pen -> notebook) = 20 percent / 14 percent ≈ 1.43

A lift value below 1 means the combination is not frequently bought together by consumers. In this case the lift is above 1, which shows that the probability of buying both items together is higher than would be expected from their individual sales alone.
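
These calculations can be reproduced in a few lines (added for illustration, using the assumed counts above):

```python
N = 5000
pen, notebook, both = 500, 700, 100   # transactions containing each

support_rule = both / N               # 0.02 -> 2%
support_pen = pen / N                 # 0.10 -> 10%
support_notebook = notebook / N       # 0.14 -> 14%
confidence = both / pen               # 0.20 -> 20%
lift = confidence / support_notebook  # ~1.43: together more often than
print(support_rule, confidence, round(lift, 2))  # independence predicts
```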

With this, we come to an overall view of the Market Basket Analysis in Data Mining and how to calculate the
sales for combination products.

Types of Market Basket Analysis

There are three types of Market Basket Analysis. They are as follow:

1. Descriptive market basket analysis: This sort of analysis looks for patterns and connections in the data that exist between the components of a market basket. This kind of study is mostly used to understand consumer behavior, including which products are purchased in combination and what the most typical item combinations are. Descriptive market basket analysis helps retailers place products in their stores more profitably by revealing which products are frequently bought together.

2. Predictive Market Basket Analysis: Market basket analysis that predicts future purchases based on
past purchasing patterns is known as predictive market basket analysis. Large volumes of data are
analyzed using machine learning algorithms in this sort of analysis in order to create predictions about
which products are most likely to be bought together in the future. Retailers may make data-driven
decisions about which products to carry, how to price them, and how to optimize shop layouts with the
use of predictive market basket research.

3. Differential Market Basket Analysis: Differential market basket analysis analyses two sets of market
basket data to identify variations between them. Comparing the behavior of various client segments or
the behavior of customers over time is a common usage for this kind of study. Retailers can respond to
shifting consumer behavior by modifying their marketing and sales tactics with the help of differential
market basket analysis.

Benefits of Market Basket Analysis


1. Enhanced Customer Understanding: Market basket research offers insights into customer behavior,
including what products they buy together and which products they buy the most frequently. Retailers
can use this information to better understand their customers and make informed decisions.

2. Improved Inventory Management: By examining market basket data, retailers can determine which
products are sluggish sellers and which ones are commonly bought together. Retailers can use this
information to make well-informed choices about what products to stock and how to manage their
inventory most effectively.

3. Better Pricing Strategies: A better understanding of the connection between product prices and
consumer behavior might help merchants develop better pricing strategies. Using this knowledge,
pricing plans that boost sales and profitability can be created.

4. Sales Growth: Market basket analysis can assist businesses in determining which products are most
frequently bought together and where they should be positioned in the store to grow sales. Retailers
may boost revenue and enhance customer shopping experiences by improving store layouts and
product positioning.

Applications of Market Basket Analysis

1. Retail: Market basket research is frequently used in the retail sector to examine consumer buying
patterns and inform decisions about product placement, inventory management, and pricing tactics.
Retailers can utilize market basket research to identify which items are sluggish sellers and which ones
are commonly bought together, and then modify their inventory management strategy accordingly.

2. E-commerce: Market basket analysis can help online merchants better understand the customer buying
habits and make data-driven decisions about product recommendations and targeted advertising
campaigns. The behaviour of visitors to a website can be examined using market basket analysis to
pinpoint problem areas.

3. Finance: Market basket analysis can be used to evaluate investor behaviour and forecast the types of
investment items that investors will likely buy in the future. The performance of investment portfolios
can be enhanced by using this information to create tailored investment strategies.

4. Telecommunications: To evaluate consumer behaviour and make data-driven decisions about which
goods and services to provide, the telecommunications business might employ market basket analysis.
The usage of this data can enhance client happiness and the shopping experience.

5. Manufacturing: To evaluate consumer behaviour and make data-driven decisions about which
products to produce and which materials to employ in the production process, the manufacturing sector
might use market basket analysis. Utilizing this knowledge will increase effectiveness and cut costs.

What is a neural network?

A neural network, or artificial neural network, is a type of computing architecture that is based on a model of
how a human brain functions — hence the name "neural." Neural networks are made up of a collection of
processing units called "nodes." These nodes pass data to each other, just like how in a brain, neurons pass
electrical impulses to each other.

Neural networks are used in machine learning, which refers to a category of computer programs that learn
without definite instructions. Specifically, neural networks are used in deep learning — an advanced type of
machine learning that can draw conclusions from unlabeled data without human intervention. For instance, a
deep learning model built on a neural network and fed sufficient training data could be able to identify items in a
photo it has never seen before.
Neural networks make many types of artificial intelligence (AI) possible. Large language models (LLMs) such
as ChatGPT, AI image generators like DALL-E, and predictive AI models all rely to some extent on neural
networks.

How do neural networks work?

Neural networks are composed of a collection of nodes. The nodes are spread out across at least three layers.
The three layers are:

 An input layer

 A "hidden" layer

 An output layer

These three layers are the minimum. Neural networks can have more than one hidden layer, in addition to the
input layer and output layer.

No matter which layer it is part of, each node performs some sort of processing task or function on whatever
input it receives from the previous node (or from the input layer). Essentially, each node contains a
mathematical formula, with each variable within the formula weighted differently. If the output of applying that
mathematical formula to the input exceeds a certain threshold, the node passes data to the next layer in the
neural network. If the output is below the threshold, no data is passed to the next layer.
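
A minimal sketch (not from the original text) of what a single node computes; the weights, bias, and threshold below are illustrative assumptions.

```python
# One node: a weighted sum of its inputs plus a bias, passed through a
# threshold. It "fires" (passes data onward) only above the threshold.
import numpy as np

def node(inputs, weights, bias, threshold=0.0):
    total = np.dot(inputs, weights) + bias
    return 1 if total > threshold else 0

x = np.array([0.5, 0.2, 0.8])    # outputs from the previous layer
w = np.array([0.4, -0.6, 0.9])   # per-input weights
print(node(x, w, bias=-0.3))     # 1 -> data is passed to the next layer
```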

Imagine that the Acme Corporation has an accounting department with a strict hierarchy. Acme accounting
department employees at the manager level approve expenses below $1,000, directors approve expenses below
$10,000, and the CFO approves any expenses that exceed $10,000. When employees from other departments of
Acme Corp. submit their expenses, they first go to the accounting managers. Any expense over $1,000 gets
passed to a director, while expenses below $1,000 stay at the managerial level — and so on.

The accounting department of the Acme Corp. functions somewhat like a neural network. When employees
submit their expense reports, this is like a neural network's input layer. Each manager and director is like a node
within the neural network.

And, just as one accounting manager may ask another manager for assistance in interpreting an expense report
before passing it along to an accounting director, neural networks can be architected in a variety of ways. Nodes
can communicate in multiple directions.

What are the types of neural networks?

There is no limit on how many nodes and layers a neural network can have, and these nodes can interact in
almost any way. Because of this, the list of types of neural networks is ever-expanding. But, they can roughly be
sorted into these categories:

 Shallow neural networks usually have only one hidden layer

 Deep neural networks have multiple hidden layers

Shallow neural networks are fast and require less processing power than deep neural networks, but they cannot
perform as many complex tasks as deep neural networks.

Below is an incomplete list of the types of neural networks that may be used today:
Perceptron neural networks are simple, shallow networks with an input layer and an output layer.

Multilayer perceptron neural networks add complexity to perceptron networks, and include a hidden layer.

Feed-forward neural networks only allow their nodes to pass information to a forward node.

Recurrent neural networks can go backwards, allowing the output from some nodes to impact the input of
preceding nodes.

Modular neural networks combine two or more neural networks in order to arrive at the output.

Radial basis function neural networks have nodes that use a specific kind of mathematical function called a radial basis function.

Liquid state machine neural networks feature nodes that are randomly connected to each other.

Residual neural networks allow data to skip ahead via a process called identity mapping, combining the output
from early layers with the output of later layers.

What is a transformer neural network?

Transformer neural networks are worth highlighting because they have assumed a place of outsized importance
in the AI models in widespread use today.

First proposed in 2017, transformer models are neural networks that use a technique called "self-attention" to
take into account the context of elements in a sequence, not just the elements themselves. Via self-attention, they
can detect even subtle ways that parts of a data set relate to each other.
This ability makes them ideal for analyzing (for example) sentences and paragraphs of text, as opposed to just
individual words and phrases. Before transformer models were developed, AI models that processed text would
often "forget" the beginning of a sentence by the time they got to the end of it, with the result that they would
combine phrases and ideas in ways that did not make sense to human readers. Transformer models, however,
can process and generate human language in a much more natural way.

Transformer models are an integral component of generative AI, in particular LLMs that can produce text in
response to arbitrary human prompts.
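
For illustration, here is a minimal numpy sketch (not from the original text) of the scaled dot-product self-attention that transformers use; in a real model the Q, K, V matrices come from learned projections, while here the raw sequence is reused for simplicity.

```python
import numpy as np

def self_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V: re-weight each element of a sequence
    by its relevance to every other element."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax per row
    return weights @ V                               # context-aware mixture

x = np.random.rand(4, 8)                 # a sequence of 4 tokens, 8 features
print(self_attention(x, x, x).shape)     # (4, 8)
```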

History of neural networks

Neural networks are actually quite old. The concept of neural networks can be dated to a 1943 mathematical
paper that modeled how the brain could work. Computer scientists began attempting to construct simple neural
networks in the 1950s and 1960s, but eventually the concept fell out of favor. In the 1980s the concept was
revived, and by the 1990s neural networks were in widespread use in AI research.

However, only with the advent of hyper-fast processing, massive data storage capabilities, and access to
computing resources were neural networks able to advance to the point they have reached today, where they can
imitate or even exceed human cognitive abilities. Developments are still being made in this field; one of the
most important types of neural networks in use today, the transformer, dates to 2017.

How does Cloudflare support neural networks?

With locations in more than 330 cities around the world, Cloudflare is in a unique position to offer
computational power to AI developers anywhere with minimal latency. Cloudflare for AI lets developers run AI
tasks on a global network of graphics processing units (GPUs) with no extra setup. Cloudflare also offers cost-
effective cloud storage options for the vast amounts of data required to train neural networks.
