
Unit 5

Uploaded by

vanieswari762002


UNIT 5

https://www.ibm.com/topics/artificial-intelligence

What is AI?
Artificial intelligence, or AI, is technology that enables computers and machines to simulate
human intelligence and problem-solving capabilities.

On its own or combined with other technologies (e.g., sensors, geolocation, robotics) AI can
perform tasks that would otherwise require human intelligence or intervention. Digital assistants,
GPS guidance, autonomous vehicles, and generative AI tools (like OpenAI's ChatGPT) are just
a few examples of AI in the daily news and our daily lives.

As a field of computer science, artificial intelligence encompasses (and is often mentioned together
with) machine learning and deep learning. These disciplines involve the development of AI
algorithms, modeled after the decision-making processes of the human brain, that can ‘learn’ from
available data and make increasingly more accurate classifications or predictions over time.

Artificial intelligence has gone through many cycles of hype, but even to skeptics, the release of
ChatGPT seems to mark a turning point. The last time generative AI loomed this large, the
breakthroughs were in computer vision, but now the leap forward is in natural language processing
(NLP). Today, generative AI can learn and synthesize not just human language but other data types
including images, video, software code, and even molecular structures.

https://cloud.google.com/learn/what-is-artificial-intelligence

What is Artificial Intelligence (AI)?

Artificial intelligence (AI) is a set of technologies that enable computers to perform a variety of
advanced functions, including the ability to see, understand and translate spoken and written
language, analyze data, make recommendations, and more.

AI is the backbone of innovation in modern computing, unlocking value for individuals and
businesses. For example, optical character recognition (OCR) uses AI to extract text and data from
images and documents, turns unstructured content into business-ready structured data, and unlocks
valuable insights.

https://www.techtarget.com/searchenterpriseai/definition/AI-Artificial-Intelligence

What is AI?

Artificial intelligence is the simulation of human intelligence processes by machines, especially
computer systems. Examples of AI applications include expert systems, natural language
processing (NLP), speech recognition and machine vision.

As the hype around AI has accelerated, vendors have scrambled to promote how their products
and services incorporate it. Often, what they refer to as "AI" is a well-established technology such
as machine learning.

AI requires specialized hardware and software for writing and training machine learning
algorithms. No single programming language is used exclusively in AI, but Python, R, Java, C++
and Julia are all popular languages among AI developers.

https://en.wikipedia.org/wiki/Artificial_intelligence

Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines,
particularly computer systems. It is a field of research in computer science that develops and
studies methods and software that enable machines to perceive their environment and
use learning and intelligence to take actions that maximize their chances of achieving defined
goals.[1] Such machines may be called AIs.

Some high-profile applications of AI include advanced web search engines (e.g., Google
Search); recommendation systems (used by YouTube, Amazon, and Netflix); interacting via
human speech (e.g., Google Assistant, Siri, and Alexa); autonomous
vehicles (e.g., Waymo); generative and creative tools (e.g., ChatGPT, Apple Intelligence, and AI
art); and superhuman play and analysis in strategy games (e.g., chess and Go). However, many AI
applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications,
often without being called AI because once something becomes useful enough and common
enough it's not labeled AI anymore."[2][3]

Alan Turing was the first person to conduct substantial research in the field that he called "machine
intelligence".[4] Artificial intelligence was founded as an academic discipline in 1956,[5] by those
now considered the founding fathers of AI: John McCarthy, Marvin Minsky, Nathaniel Rochester,
and Claude Shannon.[6][7] The field went through multiple cycles of optimism,[8][9] followed by
periods of disappointment and loss of funding, known as AI winter.[10][11] Funding and interest
vastly increased after 2012 when deep learning surpassed all previous AI techniques,[12] and after
2017 with the transformer architecture.[13] This led to the AI boom of the early 2020s, with
companies, universities, and laboratories overwhelmingly based in the United States pioneering
significant advances in artificial intelligence.[14]

https://builtin.com/artificial-intelligence

What Is Artificial Intelligence?

Artificial intelligence refers to computer systems that are capable of performing tasks traditionally
associated with human intelligence — such as making predictions, identifying objects, interpreting
speech and generating natural language. AI systems learn how to do so by processing massive
amounts of data and looking for patterns to model in their own decision-making. In many cases,
humans will supervise an AI’s learning process, reinforcing good decisions and discouraging bad
ones, but some AI systems are designed to learn without supervision.

Over time, AI systems improve on their performance of specific tasks, allowing them to adapt to
new inputs and make decisions without being explicitly programmed to do so. In essence, artificial
intelligence is about teaching machines to think and learn like humans, with the goal of automating
work and solving problems more efficiently.

https://www.ibm.com/topics/machine-learning

What is ML?
Machine learning (ML) is a branch of artificial intelligence (AI) and computer science that focuses
on using data and algorithms to enable AI to imitate the way that humans learn, gradually
improving its accuracy.

How does machine learning work?

UC Berkeley breaks out the learning system of a machine learning algorithm into three main
parts.

1. A Decision Process: In general, machine learning algorithms are used to make a prediction
or classification. Based on some input data, which can be labeled or unlabeled, your
algorithm will produce an estimate about a pattern in the data.
2. An Error Function: An error function evaluates the prediction of the model. If there are
known examples, an error function can make a comparison to assess the accuracy of the
model.
3. A Model Optimization Process: If the model can fit better to the data points in the training
set, then weights are adjusted to reduce the discrepancy between the known example and
the model estimate. The algorithm will repeat this iterative “evaluate and optimize”
process, updating weights autonomously until a threshold of accuracy has been met.
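
The three parts above can be sketched in a few lines of Python. This is only an illustration, not code from IBM or UC Berkeley: the one-feature linear model, the learning rate, the tolerance, and the toy data are all assumptions chosen for the example.

```python
def predict(w, b, x):
    # Decision process: the model's estimate for input x.
    return w * x + b

def mean_squared_error(w, b, xs, ys):
    # Error function: average squared gap between estimates and known examples.
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit(xs, ys, lr=0.05, tolerance=1e-6, max_steps=10_000):
    # Model optimization: adjust the weights until the error threshold is met.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(max_steps):
        if mean_squared_error(w, b, xs, ys) < tolerance:
            break
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy labeled data generated from y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit(xs, ys)
print(round(w, 2), round(b, 2))
```

The loop is the "evaluate and optimize" cycle: each pass measures the error and then nudges the weights in the direction that reduces it.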

Common machine learning algorithms

A number of machine learning algorithms are commonly used. These include:

• Neural networks: Neural networks simulate the way the human brain works, with a huge
number of linked processing nodes. Neural networks are good at recognizing patterns and
play an important role in applications including natural language translation, image
recognition, speech recognition, and image creation.
• Linear regression: This algorithm is used to predict numerical values, based on a linear
relationship between different values. For example, the technique could be used to predict
house prices based on historical data for the area.
• Logistic regression: This supervised learning algorithm makes predictions for categorical
response variables, such as “yes/no” answers to questions. It can be used for applications
such as classifying spam and quality control on a production line.
• Clustering: Using unsupervised learning, clustering algorithms can identify patterns in
data so that it can be grouped. Computers can help data scientists by identifying differences
between data items that humans have overlooked.
• Decision trees: Decision trees can be used for both predicting numerical values
(regression) and classifying data into categories. Decision trees use a branching sequence
of linked decisions that can be represented with a tree diagram. One of the advantages of
decision trees is that they are easy to validate and audit, unlike the black box of the neural
network.
• Random forests: In a random forest, the machine learning algorithm predicts a value or
category by combining the results from a number of decision trees.
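
To make the last two items concrete, here is a minimal sketch of how several decision trees can be combined by majority vote. The three "trees" are hand-written rules with invented feature names (`links`, `exclamations`, `known_sender`); a real random forest would learn its trees from random samples of the training data rather than have them written by hand.

```python
from collections import Counter

# Each "tree" is a short, auditable chain of if/else decisions (hypothetical rules).
def tree_a(email):
    return "spam" if email["links"] > 3 else "ham"

def tree_b(email):
    return "spam" if email["exclamations"] > 5 else "ham"

def tree_c(email):
    if email["known_sender"]:
        return "ham"
    return "spam" if email["links"] > 1 else "ham"

def forest_predict(email, trees):
    # Combine the individual tree results by majority vote.
    votes = Counter(tree(email) for tree in trees)
    return votes.most_common(1)[0][0]

trees = [tree_a, tree_b, tree_c]
suspicious = {"links": 4, "exclamations": 1, "known_sender": False}
friendly = {"links": 2, "exclamations": 0, "known_sender": True}
print(forest_predict(suspicious, trees), forest_predict(friendly, trees))
```

Because each tree is a readable sequence of decisions, it is easy to see why a given email was classified the way it was, which is the auditability advantage mentioned above.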

Machine learning methods

Machine learning models fall into three primary categories.

Supervised machine learning

Supervised learning, also known as supervised machine learning, is defined by its use of labeled
datasets to train algorithms to classify data or predict outcomes accurately. As input data is fed
into the model, the model adjusts its weights until it has been fitted appropriately. This occurs as
part of the cross validation process to ensure that the model avoids overfitting or underfitting.
Supervised learning helps organizations solve a variety of real-world problems at scale, such as
classifying spam in a separate folder from your inbox. Some methods used in supervised learning
include neural networks, naïve bayes, linear regression, logistic regression, random forest, and
support vector machine (SVM).
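
The cross validation mentioned above can be illustrated with a k-fold split, where every sample is used for validation exactly once. This is a sketch under assumptions: the helper name `k_fold_indices` is made up for this example, and real projects usually shuffle the data first.

```python
def k_fold_indices(n_samples, k):
    # Spread the samples over k folds as evenly as possible.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    # Each fold takes one turn as the validation set; the rest is training data.
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

splits = list(k_fold_indices(10, 3))
for training, validation in splits:
    print("train:", training, "validate:", validation)
```

A model that scores well on its training folds but poorly on the held-out fold is likely overfitting; one that scores poorly on both is likely underfitting.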

Unsupervised machine learning

Unsupervised learning, also known as unsupervised machine learning, uses machine learning
algorithms to analyze and cluster unlabeled datasets (subsets called clusters). These algorithms
discover hidden patterns or data groupings without the need for human intervention. This method’s
ability to discover similarities and differences in information makes it ideal for exploratory data
analysis, cross-selling strategies, customer segmentation, and image and pattern recognition. It’s
also used to reduce the number of features in a model through the process of dimensionality
reduction. Principal component analysis (PCA) and singular value decomposition (SVD) are two
common approaches for this. Other algorithms used in unsupervised learning include neural
networks, k-means clustering, and probabilistic clustering methods.
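
As an illustration of k-means clustering, the sketch below groups unlabeled 2-D points into two clusters. The points and the choice of starting centroids are assumptions for the example; practical implementations usually pick starting centroids at random and run several restarts.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, centroids, max_iters=100):
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

# Unlabeled toy data with two visible groups (assumed for illustration).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, centroids=[points[0], points[3]])
print(labels)
```

No labels are supplied at any point: the grouping comes only from the distances between the data points.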

Semi-supervised learning

Semi-supervised learning offers a happy medium between supervised and unsupervised learning.
During training, it uses a smaller labeled data set to guide classification and feature extraction from
a larger, unlabeled data set. Semi-supervised learning can solve the problem of not having enough
labeled data for a supervised learning algorithm. It also helps if it’s too costly to label enough data.

Real-world machine learning use cases

Here are just a few examples of machine learning you might encounter every day:

Speech recognition: It is also known as automatic speech recognition (ASR), computer speech
recognition, or speech-to-text, and it is a capability which uses natural language processing (NLP)
to translate human speech into a written format. Many mobile devices incorporate speech
recognition into their systems to conduct voice search—e.g. Siri—or improve accessibility for
texting.

Customer service: Online chatbots are replacing human agents along the customer journey,
changing the way we think about customer engagement across websites and social media
platforms. Chatbots answer frequently asked questions (FAQs) about topics such as shipping, or
provide personalized advice, cross-selling products or suggesting sizes for users. Examples
include virtual agents on e-commerce sites; messaging bots, using Slack and Facebook Messenger;
and tasks usually done by virtual assistants and voice assistants.

Computer vision: This AI technology enables computers to derive meaningful information from
digital images, videos, and other visual inputs, and then take the appropriate action. Powered by
convolutional neural networks, computer vision has applications in photo tagging on social media,
radiology imaging in healthcare, and self-driving cars in the automotive industry.

Recommendation engines: Using past consumption behavior data, AI algorithms can help to
discover data trends that can be used to develop more effective cross-selling strategies.
Recommendation engines are used by online retailers to make relevant product recommendations
to customers during the checkout process.

Robotic process automation (RPA): Also known as software robotics, RPA uses intelligent
automation technologies to perform repetitive manual tasks.

Automated stock trading: Designed to optimize stock portfolios, AI-driven high-frequency
trading platforms make thousands or even millions of trades per day without human intervention.

Fraud detection: Banks and other financial institutions can use machine learning to spot
suspicious transactions. Supervised learning can train a model using information about known
fraudulent transactions. Anomaly detection can identify transactions that look atypical and deserve
further investigation.
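
A very simple form of anomaly detection can be sketched with a robust statistic. The example below flags amounts whose modified z-score (based on the median absolute deviation) exceeds a cutoff. The transaction amounts are invented, and the 3.5 cutoff is a commonly used convention rather than anything prescribed by the source; production fraud systems use far richer features and models.

```python
from statistics import median

def flag_anomalies(amounts, threshold=3.5):
    center = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median([abs(a - center) for a in amounts])
    if mad == 0:
        return []  # no spread to compare against
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [a for a in amounts if 0.6745 * abs(a - center) / mad > threshold]

# Hypothetical card transactions in one currency unit.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 950]
flagged = flag_anomalies(amounts)
print(flagged)
```

The flagged transactions are not proven fraud; they are the atypical ones that deserve further investigation.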

https://cloud.google.com/learn/what-is-machine-learning

What is Machine Learning (ML)?

Today’s enterprises are inundated with data. To drive better business decisions, they have to make
sense of it. But the sheer volume coupled with complexity makes data difficult to analyze using
traditional tools. Building, testing, iterating, and deploying analytical models for identifying
patterns and insights in data eats up employees’ time in a way that scales poorly. Machine learning
can enable an organization to derive insights quickly as data scales.

Machine learning defined

Machine learning is a subset of artificial intelligence that enables a system to autonomously learn
and improve using neural networks and deep learning, without being explicitly programmed, by
feeding it large amounts of data.

Machine learning allows computer systems to continuously adjust and enhance themselves as they
accrue more “experiences.” Thus, the performance of these systems can be improved by providing
larger and more varied datasets to be processed.

Scope of use cases

Machine learning is being used in nearly every industry and business activity. Machine learning
helps the logistics industry optimize shipping and delivery routes, the retail industry personalize
shopping experiences and manage inventory, manufacturers automate factories, and organizations
everywhere secure their systems. When a person uses their voice to query their smartphone or
speaker, machine learning is used to understand the request, and to help find the result. The
scope of use cases for machine learning is vast and constantly expanding.

Importance of machine learning

The rate of data generation is accelerating every day. The world is creating more data every day
than it ever has in its history. It would be nearly impossible to analyze and utilize all that data
without machine learning. As such, machine learning is opening an entirely new realm of what
humans can do with computers and other machines. Machine learning helps businesses with
important functions like fraud detection, identifying security threats, personalization and
recommendations, automated customer service through chatbots, transcription and translation,
data analysis, and more. Machine learning is also driving the exciting innovation of tomorrow,
such as autonomous vehicles, drones, and airplanes, augmented and virtual reality, and robotics.

What is the difference between machine learning, artificial intelligence, and deep learning?

While artificial intelligence (AI) and machine learning (ML) are often used synonymously, they
are not interchangeable terms.

Artificial intelligence is an area of computer science concerned with building computers and
machines that can reason, learn, and act in a way resembling human intelligence, or systems that
involve data whose scale exceeds what humans can analyze. The field includes many different
disciplines including data analytics, statistics, hardware and software engineering, neuroscience,
and even philosophy.

Whereas artificial intelligence is a broad category of computer science, machine learning is an
application of AI that involves training machines to execute a task without being specifically
programmed for it. Machine learning is more explicitly used as a means to extract knowledge from
data through techniques such as neural networks, supervised and unsupervised learning, decision
trees, and linear regression.

Just as machine learning is a subset of artificial intelligence, deep learning is a subset of machine
learning. Deep learning works by training neural networks on sets of data. A neural network is a
model that uses a system of artificial neurons that are computational nodes used to classify and
analyze data. Data is fed into the first layer of a neural network, with each node making a decision,
and then passing that information onto multiple nodes in the next layer. Neural networks with more
than three layers are referred to as “deep neural networks” or “deep learning.” Some modern neural
networks have hundreds or thousands of layers.
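
The layer-by-layer flow described above can be sketched as a forward pass through a tiny network. The weights, biases, and the sigmoid activation are assumptions picked for illustration; a trained network would learn its weights from data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    # Each node computes a weighted sum of its inputs and applies an activation.
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    # The outputs of one layer become the inputs of the next.
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical weights: one hidden layer with two nodes, one output node.
layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = network_forward([1.0, 1.0], layers)
print(output)
```

A deep network follows the same pattern, only with many more entries in `layers`.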

How does machine learning work?

Machine learning works by training algorithms on sets of data to achieve an expected outcome
such as identifying a pattern or recognizing an object. Machine learning is the process of
optimizing the model so that it can predict the correct response based on the training data samples.

Assuming the training data is of high quality, the more training samples the machine learning
algorithm receives, the more accurate the model will become. The algorithm fits the model to the
data during training, in what is called the “fitting process.” If the outcome does not fit the expected
outcome, the algorithm is re-trained again and again until it outputs the accurate response. In
essence, the algorithm learns from the data and reaches outcomes based on whether the input and
response fit with a line, cluster, or other statistical correlation.

Types of machine learning

What is training data in machine learning? It depends on the type of machine learning model being
used.

In broad strokes, there are three kinds of models used in machine learning.

Supervised learning is a machine learning model that uses labeled training data (structured data)
to map a specific feature to a label. In supervised learning, the output is known (such as recognizing
a picture of an apple) and the model is trained on data of the known output. In simple terms, to
train the algorithm to recognize pictures of apples, feed it pictures labeled as apples.
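
The apple example can be sketched with a k-nearest neighbors classifier, one common supervised learning algorithm. Everything concrete here is an assumption for illustration: the two features (weight in grams and a redness score), the sample values, and k = 3. In practice the features would be scaled so that no single one dominates the distance.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # Find the k labeled examples closest to the query and let them vote.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Labeled training data: (weight in grams, redness from 0 to 1) -> fruit name.
train = [
    ((150, 0.90), "apple"), ((170, 0.80), "apple"), ((160, 0.85), "apple"),
    ((120, 0.10), "banana"), ((130, 0.20), "banana"), ((125, 0.15), "banana"),
]
print(knn_predict(train, (155, 0.70)))
print(knn_predict(train, (128, 0.30)))
```

The labels in `train` are what make this supervised: the model can only answer "apple" or "banana" because it was shown examples carrying those labels.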

The most common supervised learning algorithms used today include:

• Linear regression
• Polynomial regression
• K-nearest neighbors
• Naive Bayes
• Decision trees

Unsupervised learning is a machine learning model that uses unlabeled data (unstructured data)
to learn patterns. Unlike supervised learning, the “correctness” of the output is not known ahead
of time. Rather, the algorithm learns from the data without human input (and is thus, unsupervised)
and categorizes it into groups based on attributes. For instance, if the algorithm is given pictures
of apples and bananas, it will work by itself to categorize which picture is an apple and which is a
banana. Unsupervised learning is good at descriptive modeling and pattern matching.

The most common unsupervised learning algorithms used today include:

• Fuzzy c-means
• K-means clustering
• Hierarchical clustering
• Partial least squares

There’s also a mixed approach to machine learning called semi-supervised learning in which only
some data is labeled. In semi-supervised learning, the algorithm must figure out how to organize
and structure the data to achieve a known result. For instance, the machine learning model is told
that the result is a pear, but only some training data is labeled as a pear.

Reinforcement learning is a machine learning model that can be described as “learn by doing”
through a series of trial and error experiments. An “agent” learns to perform a defined task through
a feedback loop until its performance is within a desirable range. The agent receives positive
reinforcement when it performs the task well and negative reinforcement when it performs poorly.
An example of reinforcement learning is when Google researchers taught a reinforcement learning
algorithm to play the game Go. The model, which started with no prior knowledge of Go strategy,
initially made moves at random and “learned” the best moves to make. The algorithm was trained
via positive and negative reinforcement to the point that the machine learning model could beat a
human player at the game.
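
The trial-and-error loop can be sketched with tabular Q-learning on a toy problem far simpler than Go. Everything here is an assumption for illustration: a five-cell corridor, a reward of 1 for reaching the last cell, purely random exploration, and the learning rate and discount values. This toy uses only a positive reward; a negative reward for bad outcomes would be handled by the same update rule.

```python
import random

N_STATES = 5          # cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # the agent tries moves at random
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Feedback loop: move the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

def greedy_policy(q):
    # After training, pick the action with the highest learned value in each cell.
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]

q = train()
print(greedy_policy(q))
```

Although the agent only ever moves at random during training, the learned values end up favoring the move toward the goal in every cell.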

Advantages of machine learning

Pattern recognition

The more data consumed by a machine learning algorithm, the better it gets in finding trends and
patterns in that data. For instance, an ecommerce website might use machine learning to
understand how people shop on their site and use that information to give people better
recommendations or find trend data that can lead to new product opportunities.

Automation

Machine learning and artificial intelligence can take away much of the dull and dreary work from
human workers. Utilities like robotic process automation can perform some of the tedious business
tasks that keep people from performing more meaningful work. Computer vision and object
detection algorithms can help robots pick and pack items from an assembly line. Always-on fraud
detection and threat-assessment machine learning can find security flaws before they become a
problem.

Continuous improvement

Given the right kinds of data, machine learning algorithms will continue to improve to be faster
and more accurate. A good example is the GPT-3 model, which continues to improve how it
generates text.

Disadvantages of machine learning

Bias potential

Machine learning is often only as good as the data it is being fed. If a machine learning algorithm
is fed a biased dataset, it will deliver biased results.

Data acquisition

Machine learning can require a lot of data before it can be useful. As many machine learning use
cases are based on supervised learning, acquiring and cleaning structured data to train the
algorithms is an important first step, which can be difficult if data resides in a variety of siloed
locations within an organization.

Technical expertise required

While machine learning, artificial intelligence, and cloud vendors try to make it as easy as possible
to set up and run machine learning algorithms, organizations often need programmers and data
scientists to understand and utilize the training algorithms and their results.

Resource intensive

Machine learning can be time consuming, requiring a lot of computing resources and employee
hours to begin processing data and achieving results.

https://www.geeksforgeeks.org/ml-machine-learning/

What is Machine Learning?
Machine learning is a branch of artificial intelligence that enables algorithms to uncover hidden
patterns within datasets, allowing them to make predictions on new, similar data without
explicit programming for each task. Traditional machine learning combines data with statistical
tools to predict outputs, yielding actionable insights. This technology finds applications in
diverse fields such as image and speech recognition, natural language processing,
recommendation systems, fraud detection, portfolio optimization, and automating tasks.
For instance, recommender systems use historical data to personalize suggestions. Netflix, for
example, employs collaborative and content-based filtering to recommend movies and TV
shows based on user viewing history, ratings, and genre preferences. Reinforcement learning
further enhances these systems by enabling agents to make decisions based on environmental
feedback, continually refining recommendations.
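
A bare-bones version of collaborative filtering is sketched below. It is not Netflix's system: the users, titles, and ratings are invented, and similarity is computed with cosine similarity over the titles two users have both rated, which is one of several possible simplifications.

```python
import math

# Hypothetical viewing history: user -> {title: rating from 1 to 5}.
ratings = {
    "ana":  {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "ben":  {"Movie A": 4, "Movie B": 5, "Movie D": 5},
    "cara": {"Movie A": 1, "Movie C": 5, "Movie E": 4},
}

def cosine_similarity(u, v):
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = math.sqrt(sum(u[t] ** 2 for t in shared))
    norm_v = math.sqrt(sum(v[t] ** 2 for t in shared))
    return dot / (norm_u * norm_v)

def recommend(user, ratings):
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        # Titles the user has not seen are scored by similar users' ratings.
        for title, rating in other_ratings.items():
            if title not in ratings[user]:
                scores[title] = scores.get(title, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("ana", ratings))
```

Here "ana" rates titles much like "ben" does, so the title that "ben" rated highly is ranked first for her.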
Machine learning’s impact extends to autonomous vehicles, drones, and robots, enhancing their
adaptability in dynamic environments. This approach marks a breakthrough where machines
learn from data examples to generate accurate outcomes, closely intertwined with data mining
and data science.
Difference between Machine Learning and Traditional Programming
The Difference between Machine Learning and Traditional Programming is as follows:

| Machine Learning | Traditional Programming | Artificial Intelligence |
| --- | --- | --- |
| Machine learning is a subset of artificial intelligence (AI) that focuses on learning from data to develop an algorithm that can be used to make a prediction. | In traditional programming, rule-based code is written by the developers depending on the problem statements. | Artificial intelligence involves making the machine as capable as possible, so that it can perform the tasks that typically require human intelligence. |
| Machine learning uses a data-driven approach. It is typically trained on historical data and then used to make predictions on new data. | Traditional programming is typically rule-based and deterministic. It does not have self-learning features like machine learning and AI. | AI can involve many different techniques, including machine learning and deep learning, as well as traditional rule-based programming. |
| ML can find patterns and insights in large datasets that might be difficult for humans to discover. | Traditional programming is totally dependent on the intelligence of developers, so it has very limited capability. | Sometimes AI uses a combination of both data and pre-defined rules, which gives it a great edge in solving complex tasks, with good accuracy, that seem impossible to humans. |
| Machine learning is a subset of AI. It is now used in various AI-based tasks such as chatbots, question answering, self-driving cars, etc. | Traditional programming is often used to build applications and software systems that have specific functionality. | AI is a broad field that includes many different applications, including natural language processing, computer vision, and robotics. |
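
The contrast between hand-written rules and learning from data can be sketched as follows. The spam scenario, the "number of links" feature, and the labeled examples are all invented for illustration.

```python
def rule_based_is_spam(num_links):
    # Traditional programming: the developer fixes the rule by hand.
    return num_links > 5

def learn_threshold(examples):
    # Machine learning: choose the threshold that makes the fewest errors
    # on labeled historical data.
    candidates = sorted({x for x, _ in examples})
    best_t, best_errors = None, len(examples) + 1
    for t in candidates:
        errors = sum((x > t) != label for x, label in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

# Hypothetical history: (number of links in the email, was it spam?).
examples = [(0, False), (1, False), (2, False), (3, True), (4, True), (7, True)]
threshold = learn_threshold(examples)

print("hand-written rule says spam:", rule_based_is_spam(4))
print("learned rule says spam:", 4 > threshold)
```

If the historical data changes, the learned threshold changes with it, whereas the hand-written rule stays fixed until a developer edits the code.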

How machine learning algorithms work

Machine Learning works in the following manner.
A machine learning algorithm works by learning patterns and relationships from data to make
predictions or decisions without being explicitly programmed for each task. Here’s a simplified
overview of how a typical machine learning algorithm works:
1. Data Collection:
First, relevant data is collected or curated. This data could include examples, features, or
attributes that are important for the task at hand, such as images, text, numerical data, etc.
2. Data Preprocessing:
Before feeding the data into the algorithm, it often needs to be preprocessed. This step may
involve cleaning the data (handling missing values, outliers), transforming the data
(normalization, scaling), and splitting it into training and test sets.
3. Choosing a Model:
Depending on the task (e.g., classification, regression, clustering), a suitable machine learning
model is chosen. Examples include decision trees, neural networks, support vector machines,
and more advanced models like deep learning architectures.
4. Training the Model:
The selected model is trained using the training data. During training, the algorithm learns
patterns and relationships in the data. This involves adjusting model parameters iteratively to
minimize the difference between predicted outputs and actual outputs (labels or targets) in the
training data.
5. Evaluating the Model:
Once trained, the model is evaluated using the test data to assess its performance. Metrics such
as accuracy, precision, recall, or mean squared error are used to evaluate how well the model
generalizes to new, unseen data.
6. Fine-tuning:
Models may be fine-tuned by adjusting hyperparameters (parameters that are not directly
learned during training, like learning rate or number of hidden layers in a neural network) to
improve performance.
7. Prediction or Inference:
Finally, the trained model is used to make predictions or decisions on new data. This process
involves applying the learned patterns to new inputs to generate outputs, such as class labels in
classification tasks or numerical values in regression tasks.
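
The steps above can be walked through end to end on a toy problem. This is a sketch under assumptions: the data (hours studied, hours slept, passed or not) is invented, the split is a fixed every-fourth-row holdout rather than a random one, and the model is a simple nearest-centroid classifier, which has no hyperparameters, so the fine-tuning step is skipped.

```python
import math

# 1. Data collection: ((hours studied, hours slept), passed exam?) - hypothetical.
data = [
    ((1.0, 4.0), 0), ((2.0, 5.0), 0), ((1.5, 6.0), 0), ((2.5, 4.5), 0),
    ((6.0, 7.0), 1), ((7.0, 8.0), 1), ((6.5, 6.5), 1), ((8.0, 7.5), 1),
]

# 2. Preprocessing: split into training and test sets, then min-max scale
#    using statistics from the training set only.
test_set = data[::4]
train_set = [row for i, row in enumerate(data) if i % 4 != 0]

def fit_scaler(rows):
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def scale(row, lows, highs):
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))

lows, highs = fit_scaler([x for x, _ in train_set])

# 3-4. Choosing and training a model: store the mean point of each class.
def train_nearest_centroid(samples):
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in by_class.items()
    }

def predict(centroids, features):
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

centroids = train_nearest_centroid([(scale(x, lows, highs), y) for x, y in train_set])

# 5. Evaluation: accuracy on data the model has not seen.
correct = sum(predict(centroids, scale(x, lows, highs)) == y for x, y in test_set)
accuracy = correct / len(test_set)

# 7. Prediction: apply the trained model to a new input.
new_prediction = predict(centroids, scale((5.0, 7.0), lows, highs))
print(accuracy, new_prediction)
```

With only two held-out rows the accuracy figure means little; the point is the order of the steps, not the score.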

Machine Learning lifecycle:

The lifecycle of a machine learning project involves a series of steps that include:
1. Study the Problems:
The first step is to study the problem. This step involves understanding the business problem
and defining the objectives of the model.
2. Data Collection:
When the problem is well-defined, we can collect the relevant data required for the model. The
data could come from various sources such as databases, APIs, or web scraping.
3. Data Preparation:
Once the problem-related data is collected, it is a good idea to check the data properly
and put it in the desired format so that the model can use it to find the hidden patterns.
This can be done in the following steps:
• Data cleaning
• Data transformation
• Exploratory data analysis and feature engineering
• Splitting the dataset for training and testing
4. Model Selection:
The next step is to select the appropriate machine learning algorithm that is suitable for our
problem. This step requires knowledge of the strengths and weaknesses of different algorithms.
Sometimes we use multiple models and compare their results and select the best model as per
our requirements.
5. Model building and Training:
• After selecting the algorithm, we have to build the model.
• In the case of traditional machine learning, building the model is easy: it takes just a few
hyperparameter tunings.
• In the case of deep learning, we have to define the layer-wise architecture along with the input
and output size, number of nodes in each layer, loss function, gradient descent optimizer, etc.
• After that, the model is trained using the preprocessed dataset.
6. Model Evaluation:
Once the model is trained, it can be evaluated on the test dataset to determine its accuracy and
performance using different techniques, such as a classification report, F1 score, precision, recall,
ROC curve, mean squared error, absolute error, etc.
7. Model Tuning:
Based on the evaluation results, the model may need to be tuned or optimized to improve its
performance. This involves tweaking the hyperparameters of the model.
8. Deployment:
Once the model is trained and tuned, it can be deployed in a production environment to make
predictions on new data. This step requires integrating the model into an existing software
system or creating a new system for the model.
9. Monitoring and Maintenance:
Finally, it is essential to monitor the model’s performance in the production environment and
perform maintenance tasks as required. This involves monitoring for data drift, retraining the
model as needed, and updating the model as new data becomes available.
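
Several of the metrics named in the Model Evaluation step can be computed directly from true and predicted labels. The sketch below assumes a binary problem with 1 as the positive class; the label lists are invented for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical test-set labels and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision asks how many predicted positives were correct, recall asks how many actual positives were found, and F1 is the harmonic mean of the two.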

Types of Machine Learning

• Supervised Machine Learning
• Unsupervised Machine Learning
• Reinforcement Machine Learning
1. Supervised Machine Learning:
Supervised learning is a type of machine learning in which the algorithm is trained on the
labeled dataset. It learns to map input features to targets based on labeled training data. In
supervised learning, the algorithm is provided with input features and corresponding output
labels, and it learns to generalize from this data to make predictions on new, unseen data.
There are two main types of supervised learning:
 Regression: Regression is a type of supervised learning where the algorithm learns to
predict continuous values based on input features. The output labels in regression are
continuous values, such as stock prices and housing prices. The different regression
algorithms in machine learning include Linear Regression, Polynomial Regression, Ridge
Regression, Decision Tree Regression, Random Forest Regression, Support Vector
Regression, etc.
 Classification: Classification is a type of supervised learning where the algorithm learns to
assign input data to a specific category or class based on input features. The output labels
in classification are discrete values. Classification algorithms can be binary, where the
output is one of two possible classes, or multiclass, where the output can be one of several
classes. The different classification algorithms in machine learning include Logistic
Regression, Naive Bayes, Decision Tree, Support Vector Machine (SVM), K-Nearest
Neighbors (KNN), etc.
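To make the idea of regression concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The house sizes and prices are invented numbers, and the helper names (fit_line, predict_price) are our own.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: house size (hundreds of sq ft) -> price (thousands).
sizes = [10, 15, 20, 25, 30]
prices = [200, 290, 410, 500, 600]

slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Predict a continuous value for a new, unseen input."""
    return slope * size + intercept


predicted = predict_price(22)
```

The labeled pairs play the role of the training data, and predict_price generalizes from them to an input that was not in the training set.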
2. Unsupervised Machine Learning:
Unsupervised learning is a type of machine learning where the algorithm learns to recognize
patterns in data without being explicitly trained using labeled examples. The goal of
unsupervised learning is to discover the underlying structure or distribution in the data.
There are two main types of unsupervised learning:
 Clustering: Clustering algorithms group similar data points together based on their
characteristics. The goal is to identify groups, or clusters, of data points that are similar to
each other, while being distinct from other groups. Some popular clustering algorithms
include K-means, Hierarchical clustering, and DBSCAN.
 Dimensionality reduction: Dimensionality reduction algorithms reduce the number of
input variables in a dataset while preserving as much of the original information as possible.
This is useful for reducing the complexity of a dataset and making it easier to visualize and
analyze. Some popular dimensionality reduction algorithms include Principal Component
Analysis (PCA), t-SNE, and Autoencoders.
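The clustering idea can be sketched with a small K-means implementation. The points and the choice of starting centroids below are assumptions made for illustration; no labels are given to the algorithm.

```python
def kmeans(points, centroids, iterations=10):
    """Basic K-means on 2-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]

# Assumption: start from one point of each visible group.
centroids, clusters = kmeans(points, [points[0], points[3]])
```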
3. Reinforcement Machine Learning:
Reinforcement learning is a type of machine learning where an agent learns to interact with an
environment by performing actions and receiving rewards or penalties based on its actions. The
goal of reinforcement learning is to learn a policy, which is a mapping from states to actions,
that maximizes the expected cumulative reward over time.
There are two main types of reinforcement learning:
 Model-based reinforcement learning: In model-based reinforcement learning, the agent
learns a model of the environment, including the transition probabilities between states and
the rewards associated with each state-action pair. The agent then uses this model to plan
its actions in order to maximize its expected reward. Some popular model-based
reinforcement learning algorithms include Value Iteration and Policy Iteration.
 Model-free reinforcement learning: In model-free reinforcement learning, the agent
learns a policy directly from experience without explicitly building a model of the
environment. The agent interacts with the environment and updates its policy based on the
rewards it receives. Some popular model-free reinforcement learning algorithms include
Q-Learning, SARSA, and deep reinforcement learning methods such as Deep Q-Networks (DQN).
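The model-free approach can be illustrated with a tabular Q-learning sketch. The environment below (five states in a row, with a reward of 1 for reaching the last state) and the values of ALPHA and GAMMA are hypothetical choices made only for this example.

```python
import random

N_STATES = 5            # states 0..4 in a row; state 4 is the goal
ACTIONS = [0, 1]        # 0 = move left, 1 = move right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)


def step(state, action):
    """Hypothetical environment: deterministic moves, reward 1 on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, seed=0):
    """Learn a Q-table from experience, without building a model of the environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Purely random exploration; Q-learning still learns the greedy values.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# The policy maps each non-goal state to the action with the highest Q-value.
policy = [row.index(max(row)) for row in q_table[:-1]]
```

Here the learned policy is the mapping from states to actions described above: in every non-goal state the agent ends up preferring the action that leads toward the reward.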
Need for machine learning:
Machine learning is important because it allows computers to learn from data and improve their
performance on specific tasks without being explicitly programmed. This ability to learn from
data and adapt to new situations makes machine learning particularly useful for tasks that
involve large amounts of data, complex decision-making, and dynamic environments.
Here are some specific areas where machine learning is being used:
 Predictive modeling: Machine learning can be used to build predictive models that can
help businesses make better decisions. For example, machine learning can be used to predict
which customers are most likely to buy a particular product, or which patients are most
likely to develop a certain disease.
 Natural language processing: Machine learning is used to build systems that can
understand and interpret human language. This is important for applications such as voice
recognition, chatbots, and language translation.
 Computer vision: Machine learning is used to build systems that can recognize and
interpret images and videos. This is important for applications such as self-driving cars,
surveillance systems, and medical imaging.
 Fraud detection: Machine learning can be used to detect fraudulent behavior in financial
transactions, online advertising, and other areas.
 Recommendation systems: Machine learning can be used to build recommendation systems
that suggest products, services, or content to users based on their past behavior and
preferences.
Overall, machine learning has become an essential tool for many businesses and industries, as
it enables them to make better use of data, improve their decision-making processes, and deliver
more personalized experiences to their customers.

Various Applications of Machine Learning

Machine learning is applied in many areas, including the following:
 Automation: Machine learning can run tasks autonomously with little or no human
intervention. For example, robots perform the essential process
steps in manufacturing plants.
 Finance Industry: Machine learning is growing in popularity in the finance industry. Banks
mainly use ML to find patterns in their data and to prevent fraud.
 Government organization: The government makes use of ML to manage public safety and
utilities. Take the example of China with its massive face recognition. The government
uses Artificial intelligence to prevent jaywalking.
 Healthcare industry: Healthcare was one of the first industries to use machine learning
with image detection.
 Marketing: AI is used broadly in marketing thanks to abundant access to data. Before
the age of mass data, researchers developed advanced mathematical tools like Bayesian
analysis to estimate the value of a customer. With the boom of data, the marketing
department relies on AI to optimize customer relationships and marketing campaigns.
 Retail industry: Machine learning is used in the retail industry to analyze customer
behavior, predict demand, and manage inventory. It also helps retailers to personalize the
shopping experience for each customer by recommending products based on their past
purchases and preferences.
 Transportation: Machine learning is used in the transportation industry to optimize routes,
reduce fuel consumption, and improve the overall efficiency of transportation systems. It
also plays a role in autonomous vehicles, where ML algorithms are used to make decisions
about navigation and safety.

https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-ml/#_001
What Is Machine Learning?

Machine learning (ML) is a discipline of artificial intelligence (AI) that provides machines
with the ability to automatically learn from data and past experiences while identifying
patterns to make predictions with minimal human intervention.

Machine learning methods enable computers to operate autonomously without explicit
programming. ML applications are fed with new data, and they can independently learn, grow,
develop, and adapt.

Machine learning derives insightful information from large volumes of data by leveraging
algorithms to identify patterns and learn in an iterative process. ML algorithms use computation
methods to learn directly from data instead of relying on any predetermined equation that may
serve as a model.

The performance of ML algorithms adaptively improves with an increase in the number of
available samples during the ‘learning’ processes. For example, deep learning is a sub-domain of
machine learning that trains computers to imitate natural human traits like learning from examples.
On large datasets, it often performs better than conventional ML algorithms.

While machine learning is not a new concept – dating back to World War II when the Enigma
Machine was used – the ability to apply complex mathematical calculations automatically to
growing volumes and varieties of available data is a relatively recent development.
Today, with the rise of big data, IoT, and ubiquitous computing, machine learning has become
essential for solving problems across numerous areas, such as

 Computational finance (credit scoring, algorithmic trading)
 Computer vision (facial recognition, motion tracking, object detection)
 Computational biology (DNA sequencing, brain tumor detection, drug discovery)
 Automotive, aerospace, and manufacturing (predictive maintenance)
 Natural language processing (voice recognition)
How does machine learning work?

Machine learning algorithms are trained on a training dataset to create a model. As new input data
is introduced to the trained ML algorithm, it uses the developed model to make a prediction.
Types of Machine Learning

Machine learning algorithms can be trained in many ways, with each method having its pros and
cons. Based on these methods and ways of learning, machine learning is broadly categorized into
four main types:

1. Supervised machine learning

This type of ML involves supervision, where machines are trained on labeled datasets and enabled
to predict outputs based on the provided training. The labeled dataset specifies that some input and
output parameters are already mapped. Hence, the machine is trained with the input and
corresponding output. In subsequent phases, the machine is made to predict the outcome using the
test dataset.

For example, consider an input dataset of parrot and crow images. Initially, the machine is trained
to understand the pictures, including the parrot and crow’s color, eyes, shape, and size. Post-
training, an input picture of a parrot is provided, and the machine is expected to identify the object
and predict the output. The trained machine checks for the various features of the object, such as
color, eyes, shape, etc., in the input picture, to make a final prediction. This is the process of object
identification in supervised machine learning.

The primary objective of the supervised learning technique is to map the input variable (a) with
the output variable (b). Supervised machine learning is further classified into two broad categories:

 Classification: This refers to algorithms that address classification problems where
the output variable is categorical; for example, yes or no, true or false, male or
female, etc. Real-world applications of this category are evident in spam detection
and email filtering.

Some known classification algorithms include the Random Forest Algorithm, Decision Tree
Algorithm, Logistic Regression Algorithm, and Support Vector Machine Algorithm.

 Regression: Regression algorithms handle regression problems where the output
variable is continuous and depends on the input variables. The relationship may be
linear, as in linear regression, but it does not have to be. Examples include weather
prediction, market trend analysis, etc.

Popular regression algorithms include the Simple Linear Regression Algorithm, Multivariate
Regression Algorithm, Decision Tree Algorithm, and Lasso Regression.
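The parrot and crow example above can be sketched as a tiny nearest-neighbor classifier. The two numeric features (a greenness score and a beak-curvature score, both between 0 and 1) and all values are hypothetical stand-ins for the color and shape features mentioned in the text.

```python
import math
from collections import Counter

# Hypothetical labeled training set: (greenness, beak curvature) -> label.
training_data = [
    ((0.90, 0.80), "parrot"),
    ((0.80, 0.90), "parrot"),
    ((0.85, 0.70), "parrot"),
    ((0.10, 0.20), "crow"),
    ((0.20, 0.10), "crow"),
    ((0.15, 0.30), "crow"),
]


def predict(features, k=3):
    """Label a new example by majority vote among its k nearest training examples."""
    neighbours = sorted(training_data, key=lambda item: math.dist(item[0], features))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Test inputs that were not part of the training set.
first_bird = predict((0.88, 0.75))
second_bird = predict((0.12, 0.25))
```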

2. Unsupervised machine learning

Unsupervised learning refers to a learning technique that’s devoid of supervision. Here, the
machine is trained using an unlabeled dataset and is enabled to predict the output without any
supervision. An unsupervised learning algorithm aims to group the unsorted dataset based on the
input’s similarities, differences, and patterns.

For example, consider an input dataset of images of a fruit-filled container. Here, the images are
not known to the machine learning model. When we input the dataset into the ML model, the task
of the model is to identify the pattern of objects, such as color, shape, or differences seen in the
input images and categorize them. Upon categorization, the machine then predicts the output as it
gets tested with a test dataset.
Unsupervised machine learning is further classified into two types:

 Clustering: The clustering technique refers to grouping objects into clusters based
on parameters such as similarities or differences between objects. For example,
grouping customers by the products they purchase.

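Some known clustering algorithms include the K-Means Clustering Algorithm, Mean-Shift
Algorithm, and DBSCAN Algorithm. Principal Component Analysis and Independent Component
Analysis are often listed alongside them, although they are dimensionality reduction and
signal separation techniques rather than clustering algorithms.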

 Association: Association learning refers to identifying typical relations between the
variables of a large dataset. It determines the dependency of various data items and
maps associated variables. Typical applications include web usage mining and
market data analysis.

Popular association rule mining algorithms include the Apriori Algorithm, Eclat Algorithm, and
FP-Growth Algorithm.
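Association rules are usually scored with support (how often the items occur together) and confidence (how often the rule holds when its left-hand side occurs). The sketch below computes both for a made-up list of shopping baskets; it illustrates the measures only and is not an implementation of Apriori, Eclat or FP-Growth.

```python
# Hypothetical market-basket data: each set is one customer transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)


bread_butter_support = support({"bread", "butter"})
bread_to_butter = confidence({"bread"}, {"butter"})
butter_to_bread = confidence({"butter"}, {"bread"})
```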

3. Semi-supervised learning

Semi-supervised learning comprises characteristics of both supervised and unsupervised machine
learning. It uses the combination of labeled and unlabeled datasets to train its algorithms. Using
both types of datasets, semi-supervised learning aims to overcome the drawbacks of the two
approaches described above.

Consider an example of a college student. A student learning a concept under a teacher’s
supervision in college is termed supervised learning. In unsupervised learning, a student self-learns
the same concept at home without a teacher’s guidance. Meanwhile, a student revising the concept
after learning under the direction of a teacher in college is a semi-supervised form of learning.

4. Reinforcement learning

Reinforcement learning is a feedback-based process. Here, the AI component automatically takes
stock of its surroundings by trial and error, takes action, learns from experiences, and
improves performance. The component is rewarded for each good action and penalized for every
wrong move. Thus, the reinforcement learning component aims to maximize the rewards by
performing good actions.

Unlike supervised learning, reinforcement learning lacks labeled data, and the agents learn via
experiences only. Consider video games. Here, the game specifies the environment, and each move
of the reinforcement agent defines its state. The agent is entitled to receive feedback via
punishment and rewards, thereby affecting the overall game score. The ultimate goal of the agent
is to achieve a high score.

Reinforcement learning is applied across different fields such as game theory, information theory,
and multi-agent systems. Reinforcement learning is further divided into two types of methods or
algorithms:

 Positive reinforcement learning: This refers to adding a reinforcing stimulus after
a specific behavior of the agent, which makes it more likely that the behavior may
occur again in the future, e.g., adding a reward after a behavior.
 Negative reinforcement learning: Negative reinforcement learning refers to
strengthening a specific behavior that avoids a negative outcome.

Top 5 Machine Learning Applications

Industry verticals handling large amounts of data have realized the significance and value of
machine learning technology. As machine learning derives insights from data in real-time,
organizations using it can work efficiently and gain an edge over their competitors.

Every industry vertical in this fast-paced digital world benefits immensely from machine learning
tech. Here, we look at the top five ML application sectors.
1. Healthcare industry

Machine learning is being increasingly adopted in the healthcare industry, thanks to wearable
devices and sensors such as wearable fitness trackers, smart health watches, etc. All such devices
monitor users’ health data to assess their health in real-time.

Moreover, the technology is helping medical practitioners in analyzing trends or flagging events
that may help in improved patient diagnoses and treatment. ML algorithms even allow medical
experts to predict the lifespan of a patient suffering from a fatal disease with increasing accuracy.

Additionally, machine learning is contributing significantly to two areas:

 Drug discovery: Manufacturing or discovering a new drug is expensive and involves
a lengthy process. Machine learning helps speed up the steps involved in such a
multi-step process. For example, Pfizer uses IBM’s Watson to analyze massive
volumes of disparate data for drug discovery.
 Personalized treatment: Drug manufacturers face the stiff challenge of validating
the effectiveness of a specific drug on a large mass of the population. This is because
the drug works only on a small group in clinical trials and possibly causes side effects
on some subjects.

To address these issues, companies like Genentech have collaborated with GNS Healthcare to
leverage machine learning and simulation AI platforms for innovating biomedical treatments.
ML technology looks for patients’ response markers by analyzing individual
genes, which helps provide targeted therapies to patients.

2. Finance sector

Today, several financial organizations and banks use machine learning technology to tackle
fraudulent activities and draw essential insights from vast volumes of data. ML-derived insights
aid in identifying investment opportunities that allow investors to decide when to trade.
Moreover, data mining methods help cyber-surveillance systems zero in on warning signs of
fraudulent activities, subsequently neutralizing them. Several financial institutions have already
partnered with tech companies to leverage the benefits of machine learning.

For example,

 Citibank has partnered with fraud detection company Feedzai to handle online and
in-person banking frauds.
 PayPal uses several machine learning tools to differentiate between legitimate and
fraudulent transactions between buyers and sellers.
3. Retail sector

Retail websites extensively use machine learning to recommend items based on users’ purchase
history. Retailers use ML techniques to capture data, analyze it, and deliver personalized shopping
experiences to their customers. They also implement ML for marketing campaigns, customer
insights, customer merchandise planning, and price optimization.

According to a September 2021 report by Grand View Research, Inc., the global recommendation
engine market is expected to reach a valuation of $17.30 billion by 2028. Common day-to-day
examples of recommendation systems include:

 When you browse items on Amazon, the product recommendations that you see on
the homepage result from machine learning algorithms. Amazon uses artificial
neural networks (ANN) to offer intelligent, personalized recommendations relevant
to customers based on their recent purchase history, comments, bookmarks, and
other online activities.
 Netflix and YouTube rely heavily on recommendation systems to suggest shows and
videos to their users based on their viewing history.

Moreover, retail sites are also powered with virtual assistants or conversational chatbots that
leverage ML, natural language processing (NLP), and natural language understanding (NLU) to
automate customer shopping experiences.
4. Travel industry

Machine learning is playing a pivotal role in expanding the scope of the travel industry. Rides
offered by Uber, Ola, and even self-driving cars have a robust machine learning backend.

Consider Uber’s machine learning algorithm that handles the dynamic pricing of their rides. Uber
uses a machine learning model called ‘Geosurge’ to manage dynamic pricing parameters. It uses
real-time predictive modeling on traffic patterns, supply, and demand. If you are running late for a
meeting and need to book an Uber in a crowded area, the dynamic pricing model kicks in: you
can still get a ride immediately, but you may need to pay more, for example twice the regular fare.

Moreover, the travel industry uses machine learning to analyze user reviews. User comments are
classified through sentiment analysis based on positive or negative scores. This is used for
campaign monitoring, brand monitoring, compliance monitoring, etc., by companies in the travel
industry.

5. Social media

With machine learning, billions of users can efficiently engage on social media networks. Machine
learning is pivotal in driving social media platforms from personalizing news feeds to delivering
user-specific ads. For example, Facebook’s auto-tagging feature employs image recognition to
identify your friend’s face and tag them automatically. The social network uses ANN to recognize
familiar faces in users’ contact lists and facilitates automated tagging.

Similarly, LinkedIn knows when you should apply for your next role, whom you need to connect
with, and how your skills rank compared to peers. All these features are enabled by machine
learning.

https://www.ibm.com/topics/deep-learning

What is deep learning?

Deep learning is a subset of machine learning that uses multilayered neural networks, called deep
neural networks, to simulate the complex decision-making power of the human brain. Some form
of deep learning powers most of the artificial intelligence (AI) applications in our lives today.

The chief difference between deep learning and machine learning is the structure of the underlying
neural network architecture. “Nondeep,” traditional machine learning models use simple neural
networks with one or two computational layers. Deep learning models use three or more layers—
but typically hundreds or thousands of layers—to train the models.

While supervised learning models require structured, labeled input data to make accurate outputs,
deep learning models can use unsupervised learning. With unsupervised learning, deep learning
models can extract the characteristics, features and relationships they need to make accurate
outputs from raw, unstructured data. Additionally, these models can even evaluate and refine their
outputs for increased precision.

Deep learning is an aspect of data science that drives many applications and services that
improve automation, performing analytical and physical tasks without human intervention. This
enables many everyday products and services—such as digital assistants, voice-enabled TV
remotes, credit card fraud detection, self-driving cars and generative AI.

Types of deep learning models

Deep learning algorithms are incredibly complex, and there are different types of neural networks
to address specific problems or datasets. Here are six. Each has its own advantages and they are
presented here roughly in the order of their development, with each successive model adjusting to
overcome a weakness in a previous model.
One potential weakness across them all is that deep learning models are often “black boxes,”
making it difficult to understand their inner workings and posing interpretability challenges. But
this can be balanced against the overall benefits of high accuracy and scalability.
CNNs

Convolutional neural networks (CNNs or ConvNets) are used primarily in computer vision and
image classification applications. They can detect features and patterns within images and videos,
enabling tasks such as object detection, image recognition, pattern recognition and face
recognition. These networks harness principles from linear algebra, particularly matrix
multiplication, to identify patterns within an image.

CNNs are a specific type of neural network, which is composed of node layers, containing an input
layer, one or more hidden layers and an output layer. Each node connects to another and has an
associated weight and threshold. If the output of any individual node is above the specified
threshold value, that node is activated, sending data to the next layer of the network. Otherwise,
no data is passed along to the next layer of the network.
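The behavior of a single node can be sketched in a few lines. The inputs, weights and threshold below are made-up values, and the rule shown (pass the weighted sum on only when it exceeds the threshold) is a simplified reading of the description above, not the exact activation used by any particular library.

```python
def node_output(inputs, weights, threshold):
    """Weighted sum of the inputs; passed on only if it exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum if weighted_sum > threshold else 0.0


# Hypothetical weights and threshold for one node.
weights = [0.6, 0.4, 0.9]
threshold = 0.5

active = node_output([1.0, 0.5, 0.0], weights, threshold)    # weighted sum 0.8 exceeds 0.5
inactive = node_output([0.2, 0.1, 0.0], weights, threshold)  # weighted sum 0.16 does not
```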

At least three main types of layers make up a CNN: a convolutional layer, pooling layer and fully
connected (FC) layer. For complex uses, a CNN might contain up to thousands of layers, each
layer building on the previous layers. By “convolution”—working and reworking the original
input—detailed patterns can be discovered. With each layer, the CNN increases in its complexity,
identifying greater portions of the image. Earlier layers focus on simple features, such as colors
and edges. As the image data progresses through the layers of the CNN, it starts to recognize larger
elements or shapes of the object until it finally identifies the intended object.
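A minimal sketch of a convolutional layer followed by a pooling layer is given below. The 6x6 image and the 3x3 vertical-edge kernel are invented for illustration; in a real CNN the kernel values are learned during training rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical image: dark left half (0) and bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-written kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)  # 4x4 map, large where the edge is
pooled = max_pool(feature_map)                  # 2x2 summary of the feature map
```

The feature map is large only around the columns where the brightness changes, which is the kind of simple edge feature that earlier layers pick up.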

CNNs are distinguished from other neural networks by their superior performance with image,
speech or audio signal inputs. Before CNNs, manual and time-consuming feature extraction
methods were used to identify objects in images. However, CNNs now provide a more scalable
approach to image classification and object recognition tasks, and process high-dimensional data.
And CNNs can exchange data between layers, to deliver more efficient data processing. While
information might be lost in the pooling layer, this might be outweighed by the benefits of CNNs,
which can help to reduce complexity, improve efficiency and limit risk of overfitting.

There are other disadvantages to CNNs, which are computationally demanding, costing time and
budget and often requiring many graphics processing units (GPUs). They also require highly trained
experts with cross-domain knowledge, and careful testing of configurations and hyperparameters.
RNNs

Recurrent neural networks (RNNs) are typically used in natural language and speech
recognition applications as they use sequential or time-series data. RNNs can be identified by their
feedback loops. These learning algorithms are primarily used when using time-series data to make
predictions about future outcomes. Use cases include stock market predictions or sales forecasting,
or ordinal or temporal problems, such as language translation, natural language processing (NLP),
speech recognition and image captioning. These functions are often incorporated into popular
applications such as Siri, voice search and Google Translate.

RNNs use their “memory” as they take information from prior inputs to influence the current input
and output. While traditional deep neural networks assume that inputs and outputs are independent
of each other, the output of RNNs depends on the prior elements within the sequence. While future
events would also be helpful in determining the output of a given sequence, unidirectional
recurrent neural networks cannot account for these events in their predictions.

RNNs share parameters across each layer of the network and share the same weight parameter
within each layer of the network, with the weights adjusted through the processes of
backpropagation and gradient descent to facilitate learning.
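The "memory" and the shared weights can be seen in a one-unit recurrent network. The sketch below assumes scalar inputs, a tanh activation and hand-picked weights; it only runs the forward pass and does no training.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit RNN; the same weights (w_x, w_h, b) are reused at every time step."""
    h = h0
    states = []
    for x in inputs:
        # The new hidden state depends on the current input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Hypothetical weights; only the first input is non-zero.
states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
silent = rnn_forward([0.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
```

Even though the second and third inputs are zero, the later hidden states stay above zero because they still carry information from the first input.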

RNNs use a backpropagation through time (BPTT) algorithm to determine the gradients, which is
slightly different from traditional backpropagation as it is specific to sequence data. The principles
of BPTT are the same as traditional backpropagation, where the model trains itself by calculating
errors from its output layer to its input layer. BPTT differs from the traditional approach in that
BPTT sums errors at each time step, whereas feedforward networks do not need to sum errors as
they do not share parameters across each layer.

An advantage over other neural network types is that RNNs use both binary data processing and
memory. RNNs can handle multiple inputs and outputs so that, rather than delivering only
one result for a single input, RNNs can produce one-to-many, many-to-one or many-to-many
outputs.
There are also options within RNNs. For example, the long short-term memory (LSTM) network
is superior to simple RNNs by learning and acting on longer-term dependencies.

However, RNNs tend to run into two basic problems, known as exploding gradients and vanishing
gradients. These issues are defined by the size of the gradient, which is the slope of the loss
function along the error curve.

 When the gradient is vanishing and is too small, it continues to become smaller, updating
the weight parameters until they become insignificant—that is: zero (0). When that occurs,
the algorithm is no longer learning.
 Exploding gradients occur when the gradient is too large, creating an unstable model. In
this case, the model weights grow too large, and they will eventually be represented as NaN
(not a number). One solution to these issues is to reduce the number of hidden layers within
the neural network, eliminating some of the complexity in the RNN models.
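Both problems come from multiplying the gradient by a similar factor at every time step. The sketch below is a deliberately simplified scalar picture with assumed factors of 0.5 and 1.5 over 50 time steps; real gradients are matrices, but the effect is the same in spirit.

```python
def gradient_through_time(recurrent_factor, time_steps):
    """Size of a gradient after being scaled by the same factor at every time step."""
    gradient = 1.0
    for _ in range(time_steps):
        gradient *= recurrent_factor
    return gradient


# Assumed factors: below 1 the gradient vanishes, above 1 it explodes.
vanishing = gradient_through_time(0.5, 50)
exploding = gradient_through_time(1.5, 50)
```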
Some final disadvantages: RNNs might also require long training time and be difficult to use on
large datasets. Optimizing RNNs adds complexity when they have many layers and parameters.

Autoencoders and variational autoencoders

Deep learning made it possible to move beyond the analysis of numerical data, by adding the
analysis of images, speech and other complex data types. Among the first class of models to
achieve this were variational autoencoders (VAEs). They were the first deep-learning models to
be widely used for generating realistic images and speech, which empowered deep generative
modeling by making models easier to scale—which is the cornerstone of what we think of
as generative AI.

Autoencoders work by encoding unlabeled data into a compressed representation, and then
decoding the data back into its original form. Plain autoencoders were used for a variety of
purposes, including reconstructing corrupted or blurry images. Variational autoencoders added the
critical ability not just to reconstruct data, but also to output variations on the original data.
This ability to generate novel data ignited a rapid-fire succession of new technologies, from
generative adversarial networks (GANs) to diffusion models, capable of producing ever more
realistic—but fake—images. In this way, VAEs set the stage for today’s generative AI.

Autoencoders are built out of blocks of encoders and decoders, an architecture that also underpins
today’s large language models. Encoders compress a dataset into a dense representation, arranging
similar data points closer together in an abstract space. Decoders sample from this space to create
something new while preserving the dataset’s most important features.

The biggest advantage to autoencoders is the ability to handle large batches of data and show input
data in a compressed form, so the most significant aspects stand out—enabling anomaly detection
and classification tasks. This also speeds transmission and reduces storage requirements.
Autoencoders can be trained on unlabeled data so they might be used where labeled data is not
available. When unsupervised training is used, there is a time savings advantage: deep learning
algorithms learn automatically and gain accuracy without needing manual feature engineering. In
addition, VAEs can generate new sample data for text or image generation.

There are disadvantages to autoencoders. The training of deep or intricate structures can be a drain
on computational resources. And during unsupervised training, the model might overlook the
needed properties and instead simply replicate the input data. Autoencoders might also overlook
complex data linkages in structured data so that they do not correctly identify complex
relationships.

GANs

Generative adversarial networks (GANs) are neural networks that are used both in and outside of
artificial intelligence (AI) to create new data resembling the original training data. This data can
include images that appear to be human faces but are generated, not photographs of real people. The
“adversarial” part of the name comes from the back-and-forth between the two portions of the
GAN: a generator and a discriminator.

 The generator creates something (images, video or audio) and then produces an output
with a twist. For example, a horse can be transformed into a zebra with some degree of
accuracy. The result depends on the input and how well-trained the layers are in the
generative model for this use case.
 The discriminator is the adversary: the generated result (the fake image) is compared
against the real images in the dataset. The discriminator tries to distinguish between the
real and fake images, video or audio.

GANs train themselves. The generator creates fakes while the discriminator learns to spot the
differences between the generator's fakes and the true examples. When the discriminator is able to
flag the fake, then the generator is penalized. The feedback loop continues until the generator
succeeds in producing output that the discriminator cannot distinguish from real examples.
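The feedback loop described above can be made concrete by looking at the two loss values that drive it. The sketch below is an assumption-laden illustration, not a full GAN: the discriminator scores are made-up numbers rather than the output of a trained network, and the generator loss uses the common "non-saturating" form, which the text does not specify.

```python
import math

def discriminator_loss(scores_real, scores_fake):
    """Binary cross-entropy: real samples should score near 1, fakes near 0."""
    real_term = sum(-math.log(s) for s in scores_real) / len(scores_real)
    fake_term = sum(-math.log(1.0 - s) for s in scores_fake) / len(scores_fake)
    return real_term + fake_term

def generator_loss(scores_fake):
    """The generator is penalized when the discriminator scores its fakes low."""
    return sum(-math.log(s) for s in scores_fake) / len(scores_fake)

# Hypothetical discriminator outputs (probability that a sample is real).
early_fakes = [0.1, 0.2]   # fakes are easy to flag
late_fakes = [0.5, 0.5]    # discriminator can no longer tell them apart
real = [0.9, 0.8]

print(generator_loss(early_fakes))
print(generator_loss(late_fakes))
print(discriminator_loss(real, early_fakes))
```

The generator's penalty shrinks as its fakes become harder to flag, which is the signal that pushes it toward more realistic output.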

The prime GAN benefit is creating realistic output that can be difficult to distinguish from the
originals, which in turn may be used to further train machine learning models. Setting up a GAN
to learn is straightforward, since it is trained by using unlabeled data or with minor labeling.
However, the potential disadvantage is that the generator and discriminator might go back-and-
forth in competition for a long time, creating a large system drain. One training limitation is that a
huge amount of input data might be required to obtain a satisfactory output. Another potential
problem is “mode collapse,” when the generator produces a limited set of outputs rather than a
wider variety.

Diffusion models

Diffusion models are generative models that are trained using the forward and reverse diffusion
process of progressive noise-addition and denoising. Diffusion models generate data (most often
images) similar to the data on which they are trained. During training, they gradually add Gaussian
noise to the training data until it is unrecognizable, then learn a reversed “denoising” process that
can synthesize output (usually images) from random noise input.
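The forward (noise-adding) half of this process can be sketched directly. The example below is a minimal illustration under assumptions that are not in the text: a linear noise schedule with 1000 steps and the closed-form sampling rule commonly used for Gaussian diffusion. The learned reverse (denoising) network is not shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """alpha_bar[t]: fraction of the original signal variance left after step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Sample a noised version of x0 at step t in closed form."""
    keep = math.sqrt(alpha_bars[t])
    spread = math.sqrt(1.0 - alpha_bars[t])
    return [keep * value + spread * rng.gauss(0.0, 1.0) for value in x0]

rng = random.Random(0)
alpha_bars = cumulative_signal(linear_beta_schedule(1000))
pixels = [0.8, -0.3, 0.5]          # toy "image" with three values
slightly_noisy = add_noise(pixels, 10, alpha_bars, rng)
almost_pure_noise = add_noise(pixels, 999, alpha_bars, rng)
print(alpha_bars[10], alpha_bars[999])
```

Early steps keep almost all of the original signal, while by the last step almost none remains, which matches the "until it is unrecognizable" description.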

A diffusion model learns to minimize the differences of the generated samples versus the desired
target. Any discrepancy is quantified and the model's parameters are updated to minimize the
loss—training the model to produce samples closely resembling the authentic training data.
Beyond image quality, diffusion models have the advantage of not requiring adversarial training,
which speeds the learning process and also offers close process control. Training is more stable
than with GANs, and diffusion models are not as prone to mode collapse.

But, compared to GANs, diffusion models can require more computing resources to train,
including more fine-tuning. IBM Research® has also discovered that this form of generative AI
can be hijacked with hidden backdoors, giving attackers control over the image creation process
so that AI diffusion models can be tricked into generating manipulated images.

Transformer models

Transformer models combine an encoder-decoder architecture with a text-processing mechanism
and have revolutionized how language models are trained. An encoder converts raw, unannotated
text into representations known as embeddings; the decoder takes these embeddings together with
previous outputs of the model, and successively predicts each word in a sentence.

Using fill-in-the-blank guessing, the encoder learns how words and sentences relate to each other,
building up a powerful representation of language without having to label parts of speech and other
grammatical features. Transformers, in fact, can be pretrained at the outset without a particular
task in mind. After these powerful representations are learned, the models can later be
specialized—with much less data—to perform a requested task.

Several innovations make this possible. Transformers process words in a sentence simultaneously,
enabling text processing in parallel, speeding up training. Earlier techniques including recurrent
neural networks (RNNs) processed words one by one. Transformers also learn the positions of
words and their relationships; this context enables them to infer meaning and disambiguate words
such as “it” in long sentences.
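The text does not name the mechanism that relates words to one another; in transformers it is usually called attention. The sketch below shows scaled dot-product attention in plain Python as an illustration only. The token embeddings are hypothetical, and the learned projection matrices and positional encodings of a real transformer are omitted.

```python
import math

def softmax(values):
    """Turn raw scores into weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence at once."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-D embeddings for a three-word sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights depends only on that word's query and the full set of keys, so the rows can be computed independently of one another. That independence is what allows the parallel processing described above, even though this plain-Python loop runs them one after another.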

By eliminating the need to define a task upfront, transformers made it practical to pretrain language
models on vast amounts of raw text, enabling them to grow dramatically in size. Previously,
labeled data was gathered to train one model on a specific task. With transformers, one model
trained on a massive amount of data can be adapted to multiple tasks by fine-tuning it on a small
amount of labeled task-specific data.

Language transformers today are used for nongenerative tasks such as classification and entity
extraction as well as generative tasks including machine translation, summarization and question
answering. Transformers have surprised many people with their ability to generate convincing
dialog, essays and other content.

Natural language processing (NLP) transformers provide remarkable power since they can run in
parallel, processing multiple portions of a sequence simultaneously, which then greatly speeds
training. Transformers also track long-term dependencies in text, which enables them to
understand the overall context more clearly and create superior output. In addition, transformers
are more scalable and flexible in order to be customized by task.

As to limitations, because of their complexity, transformers require huge computational resources
and a long training time. Also, the training data must be accurately on-target, unbiased and plentiful
to produce accurate results.

https://aws.amazon.com/what-is/deep-learning/

What is Deep Learning?

Deep learning is a method in artificial intelligence (AI) that teaches computers to process data in
a way that is inspired by the human brain. Deep learning models can recognize complex patterns
in pictures, text, sounds, and other data to produce accurate insights and predictions. You can use
deep learning methods to automate tasks that typically require human intelligence, such as
describing images or transcribing a sound file into text.

Why is deep learning important?

Artificial intelligence (AI) attempts to train computers to think and learn as humans do. Deep
learning technology drives many AI applications used in everyday products, such as the following:

 Digital assistants

 Voice-activated television remotes

 Fraud detection

 Automatic facial recognition

It is also a critical component of emerging technologies such as self-driving cars, virtual reality,
and more.

Deep learning models are computer files that data scientists have trained to perform tasks using an
algorithm or a predefined set of steps. Businesses use deep learning models to analyze data and
make predictions in various applications.

What are the uses of deep learning?

Deep learning has several use cases in automotive, aerospace, manufacturing, electronics, medical
research, and other fields. These are some examples of deep learning:

 Self-driving cars use deep learning models to automatically detect road signs and pedestrians.

 Defense systems use deep learning to automatically flag areas of interest in satellite images.

 Medical image analysis uses deep learning to automatically detect cancer cells for medical
diagnosis.

 Factories use deep learning applications to automatically detect when people or objects are within
an unsafe distance of machines.

You can group these various use cases of deep learning into four broad categories—computer
vision, speech recognition, natural language processing (NLP), and recommendation engines.

Computer vision

Computer vision is the computer's ability to extract information and insights from images and
videos. Computers can use deep learning techniques to comprehend images in the same way that
humans do. Computer vision has several applications, such as the following:

 Content moderation to automatically remove unsafe or inappropriate content from image and
video archives

 Facial recognition to identify faces and recognize attributes like open eyes, glasses, and facial hair

 Image classification to identify brand logos, clothing, safety gear, and other image details

Speech recognition

Deep learning models can analyze human speech despite varying speech patterns, pitch, tone,
language, and accent. Virtual assistants such as Amazon Alexa and automatic transcription
software use speech recognition to do the following tasks:

 Assist call center agents and automatically classify calls.

 Convert clinical conversations into documentation in real time.

 Accurately subtitle videos and meeting recordings for a wider content reach.

Natural language processing

Computers use deep learning algorithms to gather insights and meaning from text data and
documents. This ability to process natural, human-created text has several use cases, including in
these functions:

 Automated virtual agents and chatbots

 Automatic summarization of documents or news articles

 Business intelligence analysis of long-form documents, such as emails and forms

 Indexing of key phrases that indicate sentiment, such as positive and negative comments on social
media

Recommendation engines

Applications can use deep learning methods to track user activity and develop personalized
recommendations. They can analyze the behavior of various users and help them discover new
products or services. For example, many media and entertainment companies, such as Netflix,
Fox, and Peacock, use deep learning to give personalized video recommendations.

How does deep learning work?

Deep learning algorithms are neural networks that are modeled after the human brain. For example,
a human brain contains billions of interconnected neurons that work together to learn and process
information. Similarly, deep learning neural networks, or artificial neural networks, are made of
many layers of artificial neurons that work together inside the computer.

Artificial neurons are software modules called nodes, which use mathematical calculations to
process data. Artificial neural networks are deep learning algorithms that use these nodes to solve
complex problems.

What are the components of a deep learning network?

The components of a deep neural network are the following.

Input layer

An artificial neural network has several nodes that input data into it. These nodes make up the
input layer of the system.

Hidden layer

The input layer processes and passes the data to layers further in the neural network. These hidden
layers process information at different levels, adapting their behavior as they receive new
information. Deep learning networks can have hundreds of hidden layers that they use to analyze
a problem from several different angles.

For example, if you were given an image of an unknown animal that you had to classify, you would
compare it with animals you already know. You would look at the shape of its eyes
and ears, its size, the number of legs, and its fur pattern. You would try to identify patterns, such
as the following:

 The animal has hooves, so it could be a cow or deer.

 The animal has cat eyes, so it could be some type of wild cat.

The hidden layers in deep neural networks work in the same way. If a deep learning algorithm is
trying to classify an animal image, each of its hidden layers processes a different feature of the
animal and tries to accurately categorize it.

Output layer

The output layer consists of the nodes that output the data. Deep learning models that output "yes"
or "no" answers have only two nodes in the output layer. On the other hand, those that output a
wider range of answers have more nodes.
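The three kinds of layers described above can be sketched as a tiny forward pass with two output nodes for a "yes"/"no" answer. All weights here are hypothetical fixed numbers chosen for illustration, and the ReLU and softmax functions are assumed choices; a real network would learn its weights during training.

```python
import math

def dense(inputs, weights, biases):
    """One layer: each node computes a weighted sum of its inputs plus a bias."""
    return [sum(w * x for w, x in zip(node_weights, inputs)) + b
            for node_weights, b in zip(weights, biases)]

def relu(values):
    """Keep positive values and replace negative ones with zero."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn raw output scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical fixed weights; a real network would learn these during training.
HIDDEN_W = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]   # 3 inputs -> 2 hidden nodes
HIDDEN_B = [0.0, 0.1]
OUTPUT_W = [[1.0, -1.0], [-1.0, 1.0]]             # 2 hidden -> 2 output nodes
OUTPUT_B = [0.0, 0.0]

def predict(features):
    hidden = relu(dense(features, HIDDEN_W, HIDDEN_B))      # hidden layer
    scores = softmax(dense(hidden, OUTPUT_W, OUTPUT_B))     # output layer
    return dict(zip(("yes", "no"), scores))

print(predict([1.0, 2.0, 3.0]))
```

The input list plays the role of the input layer, and a model with a wider range of answers would simply have more rows in `OUTPUT_W`.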

What is deep learning in the context of machine learning?

Deep learning is a subset of machine learning. Deep learning algorithms emerged in an attempt to
make traditional machine learning techniques more efficient. Traditional machine learning
methods require significant human effort to train the software. For example, in animal image
recognition, you need to do the following:

 Manually label hundreds of thousands of animal images.

 Make the machine learning algorithms process those images.

 Test those algorithms on a set of unknown images.

 Identify why some results are inaccurate.

 Improve the dataset by labeling new images to improve result accuracy.

This process is called supervised learning. In supervised learning, result accuracy improves only
when you have a broad and sufficiently varied dataset. For instance, the algorithm might accurately
identify black cats but not white cats because the training dataset had more images of black cats.
In that case, you would need to label more white cat images and train the machine learning models
once again.

What are the benefits of deep learning over machine learning?

A deep learning network has the following benefits over traditional machine learning.

Efficient processing of unstructured data

Machine learning methods find unstructured data, such as text documents, challenging to process
because the training dataset can have infinite variations. On the other hand, deep learning models
can comprehend unstructured data and make general observations without manual feature
extraction. For instance, a neural network can recognize that these two different input sentences
have the same meaning:

 Can you tell me how to make the payment?

 How do I transfer money?

Hidden relationships and pattern discovery

A deep learning application can analyze large amounts of data more deeply and reveal new insights
for which it might not have been trained. For example, consider a deep learning model that is
trained to analyze consumer purchases. The model has data only for the items you have already
purchased. However, the artificial neural network can suggest new items that you haven't bought
by comparing your buying patterns to those of other similar customers.
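The idea of suggesting items by comparing buying patterns can be illustrated without a neural network. The sketch below substitutes a much simpler technique, nearest-neighbour matching with cosine similarity, for the deep learning model the text describes. The customers, items, and purchase histories are all made up.

```python
import math

# Hypothetical purchase histories: 1 = bought, 0 = not bought.
ITEMS = ["laptop", "mouse", "keyboard", "monitor"]
PURCHASES = {
    "you":   [1, 1, 0, 0],
    "alice": [1, 1, 1, 0],
    "bob":   [0, 0, 0, 1],
}

def cosine_similarity(a, b):
    """Similarity of two purchase vectors, from 0 (nothing shared) to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(customer):
    mine = PURCHASES[customer]
    # Find the other customer whose buying pattern is closest.
    nearest = max((name for name in PURCHASES if name != customer),
                  key=lambda name: cosine_similarity(mine, PURCHASES[name]))
    theirs = PURCHASES[nearest]
    # Suggest items the similar customer bought that this customer has not.
    return [item for item, m, t in zip(ITEMS, mine, theirs) if t and not m]

print(recommend("you"))
```

A deep learning recommender would learn its own representation of customers and items instead of using raw purchase vectors, but the underlying comparison to similar customers is the same.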

Unsupervised learning

Deep learning models can learn and improve over time based on user behavior. They do not require
large variations of labeled datasets. For example, consider a neural network that automatically
corrects or suggests words by analyzing your typing behavior. Let's assume it was trained in the
English language and can spell-check English words. However, if you frequently type non-English
words, such as danke, the neural network automatically learns and autocorrects these words too.

Volatile data processing

Volatile datasets have large variations. One example is loan repayment amounts in a bank. A deep
learning neural network can categorize and sort that data as well, such as by analyzing financial
transactions and flagging some of them for fraud detection.

What are the challenges of deep learning?

As deep learning is a relatively new technology, certain challenges come with its practical
implementation.

Large quantities of high-quality data

Deep learning algorithms give better results when you train them on large amounts of high-quality
data. Outliers or mistakes in your input dataset can significantly affect the deep learning process.
For instance, in our animal image example, the deep learning model might classify an airplane as
a turtle if non-animal images were accidentally introduced in the dataset.

To avoid such inaccuracies, you must clean and process large amounts of data before you can train
deep learning models. The input data preprocessing requires large amounts of data storage
capacity.

Large processing power

Deep learning algorithms are compute-intensive and require infrastructure with sufficient compute
capacity to properly function. Otherwise, they take a long time to process results.

https://cloud.google.com/discover/what-is-deep-learning

What is Deep Learning?

Deep learning is a type of machine learning that uses artificial neural networks to learn from data.
Artificial neural networks are inspired by the human brain, and they can be used to solve a wide
variety of problems, including image recognition, natural language processing, and speech
recognition.

How does deep learning work?

Deep learning works by using artificial neural networks to learn from data. Neural networks are
made up of layers of interconnected nodes, and each node is responsible for learning a specific
feature of the data. Building on our previous example with images – in an image recognition
network, the first layer of nodes might learn to identify edges, the second layer might learn to
identify shapes, and the third layer might learn to identify objects.

As the network learns, the weights on the connections between the nodes are adjusted so that the
network can better classify the data. This process is called training, and it can be done using a
variety of techniques, such as supervised learning, unsupervised learning, and reinforcement
learning.
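The weight adjustment described above can be sketched with the smallest possible network, a single node. The example assumes supervised learning with plain gradient descent on a toy task (logical OR); the learning rate and number of passes are arbitrary illustrative choices.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, learning_rate=1.0, epochs=5000):
    """Fit one sigmoid node by repeatedly nudging its weights to reduce error."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for inputs, target in samples:
            prediction = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = prediction - target          # gradient of cross-entropy loss
            for i, x in enumerate(inputs):
                grad_w[i] += error * x
            grad_b += error
        # Move each weight a small step against its average gradient.
        weights = [w - learning_rate * g / len(samples)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(samples)
    return weights, bias

# Labeled data for logical OR (a supervised-learning toy task).
OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(OR_DATA)

def classify(inputs):
    """Use the trained node to predict 0 or 1 for new inputs."""
    return int(sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias) > 0.5)

print([classify(x) for x, _ in OR_DATA])
```

Once `train` has finished, `classify` plays the role of making predictions with the trained network.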

Once a neural network has been trained, it can be used to make predictions with new data it’s
received.

Deep learning vs. machine learning

Both deep learning and machine learning are branches of artificial intelligence, with machine
learning being a broader term encompassing various techniques, including deep learning. Both
machine learning and deep learning algorithms can be trained on labeled or unlabeled data,
depending on the task and algorithm.

Machine learning and deep learning are both applicable to tasks such as image recognition, speech
recognition, and natural language processing. However, deep learning often outperforms
traditional machine learning in complex pattern recognition tasks like image classification and
object detection due to its ability to learn hierarchical representations of data.

Deep learning applications

Deep learning can be used in a wide variety of applications, including:

 Image recognition: To identify objects and features in images, such as people, animals, places,
etc.
 Natural language processing: To help understand the meaning of text, such as in customer
service chatbots and spam filters.
 Finance: To help analyze financial data and make predictions about market trends.
 Image to text: To extract and convert text found in images, such as in the Google Translate app.

Types of deep learning

There are many different types of deep learning models. Some of the most common types include:

Convolutional neural networks (CNNs)

CNNs are used for image recognition and processing. They are particularly good at identifying
objects in images, even when those objects are partially obscured or distorted.

Deep reinforcement learning

Deep reinforcement learning is used for robotics and game playing. It is a type of machine learning
that allows an agent to learn how to behave in an environment by interacting with it and receiving
rewards or punishments.

Recurrent neural networks (RNNs)

RNNs are used for natural language processing and speech recognition. They are particularly good
at understanding the context of a sentence or phrase, and they can be used to generate text or
translate languages.
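The way an RNN carries context forward from earlier inputs can be sketched with a single recurrent step. This is a deliberately tiny illustration: the state is one number, the weights are hypothetical scalars rather than learned matrices, and tanh is an assumed activation.

```python
import math

def rnn_step(state, value, w_input=0.5, w_state=0.9, bias=0.0):
    """Blend the new input with the previous state (hypothetical scalar weights)."""
    return math.tanh(w_input * value + w_state * state + bias)

def run(sequence):
    state = 0.0                      # empty memory before the first input
    for value in sequence:
        state = rnn_step(state, value)
    return state

# Same values, different order: the final state differs because earlier
# inputs are carried forward through the state.
print(run([1.0, 0.0, 0.0]))
print(run([0.0, 0.0, 1.0]))
```

Because each step depends on the state left by the previous one, the inputs have to be processed in order.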

What are the benefits of using deep learning models?

There are a number of benefits to using deep learning models, including:

 Can learn complex relationships between features in data: This makes them more powerful
than traditional machine learning methods.
 Large dataset training: DL models can be trained on large datasets, which makes them very scalable
and able to learn from a wider range of experiences, making more accurate predictions.
 Data-driven learning: DL models can learn in a data-driven way, requiring less human
intervention to train them, increasing efficiency and scalability. These models learn from data that
is constantly being generated, such as data from sensors or social media.

Challenges of using deep learning models

Deep learning also has a number of challenges, including:

 Data requirements: Deep learning models require large amounts of data to learn from, making it
difficult to apply deep learning to problems where there is not a lot of data available.
 Overfitting: DL models may be prone to overfitting. This means that they can learn the noise in
the data rather than the underlying relationships.
 Bias: These models can potentially be biased, depending on the data they are trained on. This can
lead to unfair or inaccurate predictions. It is important to take steps to mitigate bias in deep learning
models.

https://www.geeksforgeeks.org/introduction-deep-learning/

What is Deep Learning?

Deep learning is the branch of machine learning that is based on
artificial neural network architecture. An artificial neural network or ANN uses layers of
interconnected nodes called neurons that work together to process and learn from the input data.
In a fully connected Deep neural network, there is an input layer and one or more hidden layers
connected one after the other. Each neuron receives input from the previous layer neurons or
the input layer. The output of one neuron becomes the input to other neurons in the next layer
of the network, and this process continues until the final layer produces the output of the
network. The layers of the neural network transform the input data through a series of nonlinear
transformations, allowing the network to learn complex representations of the input data.
Today deep learning has become one of the most popular and visible areas of machine
learning, due to its success in a variety of applications, such as computer vision, natural
language processing, and reinforcement learning.
Deep learning can be used for supervised, unsupervised and reinforcement machine
learning, and it processes each of these in a different way.
 Supervised Machine Learning: Supervised machine learning is the machine
learning technique in which the neural network learns to make predictions or classify data
based on labeled datasets. Here we provide both the input features and the target
variables. The neural network learns to make predictions based on the cost, or error, that
comes from the difference between the predicted and the actual target. The algorithm that
propagates this error back through the layers to adjust the weights is known as
backpropagation. Deep learning algorithms such as convolutional neural networks and
recurrent neural networks are used for many supervised tasks such as image classification and
recognition, sentiment analysis, language translation, etc.
 Unsupervised Machine Learning: Unsupervised machine learning is the machine
learning technique in which the neural network learns to discover the patterns or to cluster
the dataset based on unlabeled datasets. Here there are no target variables, and the machine
has to determine the hidden patterns or relationships within the datasets by itself. Deep learning
algorithms like autoencoders and generative models are used for unsupervised tasks like
clustering, dimensionality reduction, and anomaly detection.
 Reinforcement Machine Learning: Reinforcement machine learning is the machine
learning technique in which an agent learns to make decisions in an environment to
maximize a reward signal. The agent interacts with the environment by taking actions and
observing the resulting rewards. Deep learning can be used to learn policies, or sets of
actions, that maximize the cumulative reward over time. Deep reinforcement learning
algorithms like Deep Q networks and Deep Deterministic Policy Gradient (DDPG) are used
for tasks such as robotics and game playing.
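The reward-driven learning in the last item above can be illustrated with a single Q-learning update. This sketch uses a lookup table instead of a neural network, so it is the tabular simplification of what a Deep Q network approximates. The two-state environment, learning rate, and discount factor are all hypothetical.

```python
def q_update(q_table, state, action, reward, next_state,
             learning_rate=0.5, discount=0.9):
    """Move Q(state, action) toward reward + discounted best future value."""
    best_next = max(q_table[next_state].values())
    target = reward + discount * best_next
    q_table[state][action] += learning_rate * (target - q_table[state][action])
    return q_table[state][action]

# Hypothetical two-state environment; all values start at zero.
q_table = {
    "start": {"left": 0.0, "right": 0.0},
    "goal":  {"left": 0.0, "right": 0.0},
}

# The agent moves right from "start", reaches "goal", and observes reward 1.
first = q_update(q_table, "start", "right", reward=1.0, next_state="goal")
second = q_update(q_table, "start", "right", reward=1.0, next_state="goal")
print(first, second)
```

Each repeated experience moves the stored value for that action closer to the reward it leads to, which is how the agent comes to prefer actions that maximize cumulative reward.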

Artificial neural networks

Artificial neural networks are built on the principles of the structure and operation of human
neurons. They are also known as neural networks or neural nets. An artificial neural network’s input
layer, which is the first layer, receives input from external sources and passes it on to the hidden
layer, which is the second layer. Each neuron in the hidden layer gets information from the
neurons in the previous layer, computes the weighted sum, and then transfers it to the neurons
in the next layer. These connections are weighted, which means that the influence of each input
from the preceding layer is increased or reduced by giving each input a distinct weight.
These weights are then adjusted during the training process to enhance the performance of the
model.
Artificial neurons, also known as units, are found in artificial neural networks. The whole
artificial neural network is composed of these artificial neurons, which are arranged in a series
of layers. The complexity of a neural network depends on the complexity of the
underlying patterns in the dataset, and a layer may have anywhere from a dozen units to millions of
units. Commonly, an artificial neural network has an input layer, an output layer and
hidden layers. The input layer receives data from the outside world which the neural network
needs to analyze or learn about.
After passing through one or more hidden layers, this data is transformed into
valuable data for the output layer. Finally, the output layer provides an output in the form of the
artificial neural network’s response to the data that comes in.
Units are linked to one another from one layer to another in most neural networks. Each
of these links has weights that control how much one unit influences another. The neural
network learns more and more about the data as it moves from one unit to another, ultimately
producing an output from the output layer.

Difference between Machine Learning and Deep Learning:

Machine learning and deep learning are both subsets of artificial intelligence, but there are
many similarities and differences between them.

Machine Learning | Deep Learning
Apply statistical algorithms to learn the hidden patterns and relationships in the dataset. | Uses artificial neural network architecture to learn the hidden patterns and relationships in the dataset.
Can work on the smaller amount of dataset. | Requires the larger volume of dataset compared to machine learning.
Better for the low-label task. | Better for complex task like image processing, natural language processing, etc.
Takes less time to train the model. | Takes more time to train the model.
A model is created by relevant features which are manually extracted from images to detect an object in the image. | Relevant features are automatically extracted from images. It is an end-to-end learning process.
Less complex and easy to interpret the result. | More complex, it works like the black box; interpretations of the result are not easy.
It can work on the CPU or requires less computing power as compared to deep learning. | It requires a high-performance computer with GPU.

Types of neural networks

Deep Learning models are able to automatically learn features from the data, which makes them
well-suited for tasks such as image recognition, speech recognition, and natural language
processing. The most widely used architectures in deep learning are feedforward neural
networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs).
1. Feedforward neural networks (FNNs) are the simplest type of ANN, with a linear flow of
information through the network. FNNs have been widely used for tasks such as image
classification, speech recognition, and natural language processing.
2. Convolutional Neural Networks (CNNs) are designed specifically for image and video recognition
tasks. CNNs are able to automatically learn features from the images, which makes them
well-suited for tasks such as image classification, object detection, and image segmentation.
3. Recurrent Neural Networks (RNNs) are a type of neural network that is able to process
sequential data, such as time series and natural language. RNNs are able to maintain an
internal state that captures information about the previous inputs, which makes them well-
suited for tasks such as speech recognition, natural language processing, and language
translation.

Deep Learning Applications:
The main applications of deep learning AI can be divided into computer vision, natural language
processing (NLP), and reinforcement learning.
1. Computer vision
The first deep learning application is computer vision. In computer vision, deep learning
models can enable machines to identify and understand visual data. Some of the main
applications of deep learning in computer vision include:
 Object detection and recognition: Deep learning models can be used to identify and locate
objects within images and videos, making it possible for machines to perform tasks in areas such as
self-driving cars, surveillance, and robotics.
 Image classification: Deep learning models can be used to classify images into categories
such as animals, plants, and buildings. This is used in applications such as medical imaging,
quality control, and image retrieval.
 Image segmentation: Deep learning models can be used for image segmentation into
different regions, making it possible to identify specific features within images.
2. Natural language processing (NLP):
The second deep learning application is NLP. In NLP, deep learning models can
enable machines to understand and generate human language. Some of the main applications of
deep learning in NLP include:
 Automatic text generation: Deep learning models can learn from a corpus of text, and new
text such as summaries and essays can be automatically generated using these trained models.
 Language translation: Deep learning models can translate text from one language to
another, making it possible to communicate with people from different linguistic
backgrounds.
 Sentiment analysis: Deep learning models can analyze the sentiment of a piece of text,
making it possible to determine whether the text is positive, negative, or neutral. This is
used in applications such as customer service, social media monitoring, and political
analysis.
 Speech recognition: Deep learning models can recognize and transcribe spoken words,
making it possible to perform tasks such as speech-to-text conversion, voice search, and
voice-controlled devices.
3. Reinforcement learning:
In reinforcement learning, deep learning is used to train agents to take actions in an
environment to maximize a reward. Some of the main applications of deep learning in
reinforcement learning include:
 Game playing: Deep reinforcement learning models have been able to beat human experts
at games such as Go, Chess, and Atari.
 Robotics: Deep reinforcement learning models can be used to train robots to perform
complex tasks such as grasping objects, navigation, and manipulation.
 Control systems: Deep reinforcement learning models can be used to control complex
systems such as power grids, traffic management, and supply chain optimization.

Challenges in Deep Learning

Deep learning has made significant advancements in various fields, but there are still some
challenges that need to be addressed. Here are some of the main challenges in deep learning:
1. Data availability: Deep learning requires large amounts of data to learn from. Gathering
enough data for training is a major concern.
2. Computational resources: Training a deep learning model is computationally
expensive because it requires specialized hardware like GPUs and TPUs.
3. Time-consuming: Depending on the computational resources, training on sequential data
can take a very long time, even days or months.
4. Interpretability: Deep learning models are complex and work like a black box, so it is very
difficult to interpret the result.
5. Overfitting: When the model is trained again and again, it becomes too specialized for the
training data, leading to overfitting and poor performance on new data.

Advantages of Deep Learning:

1. High accuracy: Deep Learning algorithms can achieve state-of-the-art performance in
various tasks, such as image recognition and natural language processing.
2. Automated feature engineering: Deep Learning algorithms can automatically discover
and learn relevant features from data without the need for manual feature engineering.
3. Scalability: Deep Learning models can scale to handle large and complex datasets, and can
learn from massive amounts of data.
4. Flexibility: Deep Learning models can be applied to a wide range of tasks and can handle
various types of data, such as images, text, and speech.
5. Continual improvement: Deep Learning models can continually improve their
performance as more data becomes available.
Disadvantages of Deep Learning:
1. High computational requirements: Deep Learning AI models require large amounts of
data and computational resources to train and optimize.
2. Requires large amounts of labeled data: Deep Learning models often require a large
amount of labeled data for training, which can be expensive and time-consuming to acquire.
3. Interpretability: Deep Learning models can be challenging to interpret, making it difficult
to understand how they make decisions.
4. Overfitting: Deep Learning models can sometimes overfit to the training data, resulting in
poor performance on new and unseen data.
5. Black-box nature: Deep Learning models are often treated as black boxes, making it
difficult to understand how they work and how they arrived at their predictions.
https://www.geeksforgeeks.org/clustering-in-machine-learning/

What is Clustering?
The task of grouping data points based on their similarity with each other is called Clustering
or Cluster Analysis. This method falls under the branch of Unsupervised Learning, which
aims at gaining insights from unlabelled data points; that is, unlike supervised learning, we don’t
have a target variable.
Clustering aims at forming groups of homogeneous data points from a heterogeneous dataset.
It evaluates the similarity based on a metric like Euclidean distance, Cosine similarity,
Manhattan distance, etc. and then groups the points with the highest similarity score together.
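The three metrics named above can be sketched in a few lines of Python. This is only an illustration: the function names and the two sample points are assumptions made for the example, not part of the original notes.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine of the angle between the two vectors (1.0 = same direction).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical sample points, chosen only for illustration.
p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))          # 5.0
print(manhattan(p, q))          # 7.0
print(cosine_similarity(p, q))
```

Note that the two distances shrink as points become more similar, while cosine similarity grows, so an algorithm has to be told which direction counts as "closer".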
For example, in the graph given below, we can clearly see three circular clusters
forming on the basis of distance.
The clusters formed need not be circular in shape, however. The shape of clusters
can be arbitrary, and there are many algorithms that work well at detecting arbitrarily shaped
clusters.
For example, in the graph given below, we can see that the clusters formed are not circular in
shape.
Types of Clustering
Broadly speaking, there are 2 types of clustering that can be performed to group similar data
points:
 Hard Clustering: In this type of clustering, each data point either belongs to a cluster
completely or does not belong to it at all. For example, say there are 4 data points and we
have to cluster them into 2 clusters. Each data point will then belong to either cluster 1 or
cluster 2.

Data Points    Clusters
A              C1
B              C2
C              C2
D              C1

 Soft Clustering: In this type of clustering, instead of assigning each data point to a
single cluster, the probability or likelihood of that point belonging to each cluster is
evaluated. For example, say there are 4 data points and we have to cluster them into 2
clusters. We then evaluate the probability of each data point belonging to each of the two
clusters. This probability is calculated for all data points.

Data Points    Probability of C1    Probability of C2
A              0.91                 0.09
B              0.3                  0.7
C              0.17                 0.83
D              1                    0
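As a small illustration of how the two views relate, a soft clustering can be turned into a hard one by giving each point to its most probable cluster. The sketch below uses the probabilities from the soft clustering table above; the variable names are assumptions made for the example.

```python
# Soft assignments copied from the table above.
soft = {
    "A": {"C1": 0.91, "C2": 0.09},
    "B": {"C1": 0.30, "C2": 0.70},
    "C": {"C1": 0.17, "C2": 0.83},
    "D": {"C1": 1.00, "C2": 0.00},
}

# Hard assignment: pick the cluster with the highest probability.
hard = {point: max(probs, key=probs.get) for point, probs in soft.items()}
print(hard)  # {'A': 'C1', 'B': 'C2', 'C': 'C2', 'D': 'C1'}
```

With these numbers the result happens to match the hard clustering table shown earlier. The reverse direction is not possible: a hard assignment carries no information about how confident it is.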
Uses of Clustering
Before we begin with the types of clustering algorithms, we will go through the use cases of
clustering algorithms. Clustering algorithms are mainly used for:
 Market Segmentation – Businesses use clustering to group their customers and use targeted
advertisements to attract a larger audience.
 Market Basket Analysis – Shop owners analyze their sales to figure out which items are
frequently bought together by customers. For example, an often-cited study in the USA
reported that diapers and beer were usually bought together by fathers.
 Social Network Analysis – Social media sites use your data to understand your browsing
behaviour and provide you with targeted friend recommendations or content
recommendations.
 Medical Imaging – Doctors use clustering to find diseased areas in diagnostic images
like X-rays.
 Anomaly Detection – Clustering can be used to find outliers in a real-time data stream or
to flag potentially fraudulent transactions.
 Simplify working with large datasets – Each cluster is given a cluster ID after clustering is
complete. You can then condense an example's entire feature set into its cluster ID.
Clustering is effective when it can represent a complicated case with a straightforward
cluster ID. Using the same principle, clustering data can make complex datasets simpler.
There are many more use cases for clustering, but these are some of the major and common
ones. Moving forward, we will discuss the clustering algorithms that will help
you perform the above tasks.
Types of Clustering Algorithms
At the surface level, clustering helps in the analysis of unstructured data. Graph structure, the
shortest distance, and the density of the data points are a few of the elements that influence
cluster formation. Clustering determines how related objects are based on a metric
called the similarity measure. Similarity measures are easier to define for smaller sets of
features; it gets harder to create them as the number of features increases. Depending on
the type of clustering algorithm used in data mining, several techniques are employed
to group the data from the datasets. This part describes those clustering techniques. The
main types of clustering algorithms are:
1. Centroid-based Clustering (Partitioning methods)
2. Density-based Clustering (Model-based methods)
3. Connectivity-based Clustering (Hierarchical clustering)
4. Distribution-based Clustering
We will be going through each of these types in brief.
1. Centroid-based Clustering (Partitioning methods)
Partitioning methods are the simplest clustering algorithms. They group data points on the
basis of their closeness. Generally, the similarity measure chosen for these algorithms is
Euclidean distance, Manhattan distance or Minkowski distance. The dataset is separated into
a predetermined number of clusters, and each cluster is represented by a vector of values (its
centroid). Each input data point is compared against these vectors and joins the cluster whose
vector it differs from least.
The primary drawback of these algorithms is the requirement that we establish the number of
clusters, “k,” either intuitively or scientifically (using the Elbow Method) before the
algorithm starts allocating the data points. Despite this, it is still the most popular
type of clustering. K-means and K-medoids clustering are some examples of this type of
clustering.
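The assign-then-update loop behind K-means can be sketched as follows. This is a minimal illustration, not a production implementation: the function name `kmeans`, the toy points, and the choice of starting centroids are all assumptions made for the example, and real libraries add smarter initialization and convergence checks.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Alternate between assigning points and moving centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:     # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical toy data with k = 2; two of the points serve as starting centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

Because k and the starting centroids are inputs, different choices can give different results, which is the drawback described above.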
2. Density-based Clustering (Model-based methods)
Density-based clustering, a model-based method, finds groups based on the density of data
points. Unlike centroid-based clustering, which requires the number of clusters to be
predefined and is sensitive to initialization, density-based clustering determines the number of
clusters automatically and is less sensitive to starting positions. These methods are good at
handling clusters of different sizes and shapes, which makes them well suited to datasets with
irregularly shaped clusters. By focusing on local density, they handle both dense and sparse
regions of the data and can distinguish clusters with a variety of shapes.
In contrast, centroid-based clustering, like k-means, has trouble finding arbitrarily shaped
clusters. Because it needs a preset number of clusters and is highly sensitive to the initial
positioning of centroids, its outcomes can vary from run to run. Furthermore, the tendency of
centroid-based approaches to produce spherical or convex clusters restricts their capacity to
handle complicated or irregularly shaped clusters. In conclusion, density-based clustering
overcomes the drawbacks of centroid-based techniques by choosing the number of clusters
automatically, being resilient to initialization, and successfully capturing clusters of various
sizes and shapes. The most popular density-based clustering algorithm is DBSCAN.
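To make the idea of growing clusters from dense regions concrete, here is a compact sketch of the DBSCAN procedure. The function name, the parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a dense point), and the toy data are assumptions made for illustration; the neighbour search is brute force, so it only suits small inputs.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE              # may become a border point later
            continue
        cluster += 1                       # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point: joins, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep growing
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
points = [(1, 1), (1.2, 1.1), (0.9, 1.0), (1.1, 0.8),
          (5, 5), (5.1, 5.2), (4.9, 5.0), (5.2, 4.9),
          (9, 1)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters is never passed in; it falls out of the density parameters, and the isolated point is reported as noise rather than forced into a cluster.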
3. Connectivity-based Clustering (Hierarchical clustering)
Hierarchical clustering is a method for assembling related data points into a hierarchy of
clusters. Each data point is initially treated as a separate cluster; the most similar clusters
are then merged step by step until one large cluster
contains all of the data points.
Think about how you may arrange a collection of items based on how similar they are. Each
object begins as its own cluster at the base of the tree when using hierarchical clustering, which
creates a dendrogram, a tree-like structure. The closest pairings of clusters are then combined
into larger clusters after the algorithm examines how similar the objects are to one another.
When every object is in one cluster at the top of the tree, the merging process has finished.
Exploring various granularity levels is one of the fun things about hierarchical clustering. To
obtain a given number of clusters, you can select to cut the dendrogram at a particular height.
The more similar two objects are within a cluster, the closer they are. It’s comparable to
classifying items according to their family trees, where the nearest relatives are clustered
together and the wider branches signify more general connections. There are 2 approaches for
Hierarchical clustering:
 Divisive Clustering: It follows a top-down approach. We start with all data points in
one big cluster, and this cluster is then divided into smaller groups.
 Agglomerative Clustering: It follows a bottom-up approach. We start with every data
point in its own individual cluster, and these clusters are then merged together until there is
one big cluster with all data points.
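The bottom-up (agglomerative) approach can be sketched as a loop that keeps merging the two closest clusters. In this illustration the distance between clusters is taken as the distance between their closest members (single linkage), which is one common choice among several; the function name, the stopping parameter `n_clusters`, and the toy data are assumptions.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up single-linkage clustering, stopped at n_clusters."""
    clusters = [[p] for p in points]       # every point starts alone

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters ...
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        # ... and merge them (j > i, so deleting j leaves i in place).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical toy data.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
result = agglomerative(points, n_clusters=3)
print(result)
```

Stopping the merges at `n_clusters` plays the same role as cutting the dendrogram at a chosen height; running the loop down to one cluster and recording each merge would give the full tree.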
4. Distribution-based Clustering
In distribution-based clustering, data points are grouped according to their
likelihood of belonging to the same probability distribution (such as a Gaussian, binomial, or
other) within the data. Each cluster has a central point, and a data point is less likely to be
included in a cluster the further it is from that central point; data objects that have a higher
likelihood of being in the cluster are included.
A notable drawback of density and boundary-based approaches is that some algorithms need the
number of clusters to be specified a priori, and most need the form of the clusters to be
defined. At least one tuning parameter or hyper-parameter must be selected, and while
doing so should be simple, getting it wrong could have unanticipated repercussions.
Distribution-based clustering has a definite advantage over proximity and centroid-based
clustering approaches in terms of flexibility, accuracy, and cluster structure. The key issue is
that, to avoid overfitting, many of these methods only work well with simulated or
manufactured data, or when the bulk of the data points do belong to a preset distribution.
The most popular distribution-based clustering algorithm is the Gaussian Mixture Model.
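A Gaussian Mixture Model is usually fitted with the Expectation-Maximization (EM) procedure. The sketch below does this for one-dimensional data only, to keep it short. The function names, the initial guesses for the means, the variance floor, and the toy data are all assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, means, n_iter=50):
    """EM for a 1-D Gaussian mixture; `means` are the initial guesses."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: probability that each component generated each point.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from those
        # probabilities.
        for j in range(k):
            r_j = [r[j] for r in resp]
            n_j = sum(r_j)
            weights[j] = n_j / len(data)
            means[j] = sum(r * x for r, x in zip(r_j, data)) / n_j
            variances[j] = max(
                sum(r * (x - means[j]) ** 2 for r, x in zip(r_j, data)) / n_j,
                1e-6)  # assumed floor: keeps a component from collapsing
    return weights, means, variances, resp

# Hypothetical toy data drawn around two centres.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
w, mu, var, resp = fit_gmm_1d(data, means=[0.0, 6.0])
print(mu)
print(resp[0])
```

Each row of `resp` is a soft assignment: the probabilities that one data point belongs to each component. The fit is only as good as the assumption that the data really are a mixture of Gaussians.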

Applications of Clustering in different fields:

1. Marketing: It can be used to characterize & discover customer segments for marketing
purposes.
2. Biology: It can be used for classification among different species of plants and animals.
3. Libraries: It is used in clustering different books on the basis of topics and information.
4. Insurance: It is used to understand customers and their policies and to identify
fraud.
5. City Planning: It is used to make groups of houses and to study their values based on their
geographical locations and other factors present.
6. Earthquake studies: By learning the earthquake-affected areas we can determine the
dangerous zones.
7. Image Processing: Clustering can be used to group similar images together, classify images
based on content, and identify patterns in image data.
8. Genetics: Clustering is used to group genes that have similar expression patterns and
identify gene networks that work together in biological processes.
9. Finance: Clustering is used to identify market segments based on customer behavior,
identify patterns in stock market data, and analyze risk in investment portfolios.
10. Customer Service: Clustering is used to group customer inquiries and complaints into
categories, identify common issues, and develop targeted solutions.
11. Manufacturing: Clustering is used to group similar products together, optimize production
processes, and identify defects in manufacturing processes.
12. Medical diagnosis: Clustering is used to group patients with similar symptoms or diseases,
which helps in making accurate diagnoses and identifying effective treatments.
13. Fraud detection: Clustering is used to identify suspicious patterns or anomalies in financial
transactions, which can help in detecting fraud or other financial crimes.
14. Traffic analysis: Clustering is used to group similar patterns of traffic data, such as peak
hours, routes, and speeds, which can help in improving transportation planning and
infrastructure.
15. Social network analysis: Clustering is used to identify communities or groups within social
networks, which can help in understanding social behavior, influence, and trends.
16. Cybersecurity: Clustering is used to group similar patterns of network traffic or system
behavior, which can help in detecting and preventing cyberattacks.
17. Climate analysis: Clustering is used to group similar patterns of climate data, such as
temperature, precipitation, and wind, which can help in understanding climate change and
its impact on the environment.
18. Sports analysis: Clustering is used to group similar patterns of player or team performance
data, which can help in analyzing player or team strengths and weaknesses and making
strategic decisions.
19. Crime analysis: Clustering is used to group similar patterns of crime data, such as location,
time, and type, which can help in identifying crime hotspots, predicting future crime trends,
and improving crime prevention strategies.

https://www.subex.com/blog/introduction-to-clustering-in-data-science/
What is Clustering and How it Works?
Clustering is the task of dividing the population or data points into several groups such that data
points in the same groups are similar to other data points in that group and dissimilar to the data
points in other groups. It is basically an assembly of objects based on similarity and dissimilarity
between them.
The Importance of Clustering
Clustering helps in understanding the natural grouping in a dataset. Its goal is to
partition the data into a set of meaningful groups. The quality of the grouping depends
on the method used and on how well it identifies hidden patterns. The biggest advantage of
clustering over classification is that it can adapt to changes in the data and helps single out
useful features that differentiate the groups.

The Usage of Clustering Algorithms in Real World

It is widely used in many applications such as image processing, data analysis, and pattern
recognition.

It can be used in the field of biology to derive animal and plant taxonomies and to identify genes
with similar functionality.

It also helps in information discovery by classifying documents on the web.

It helps marketers to find the distinct groups in their customer base and they can characterize their
customer groups by using purchasing patterns.
Different Types of Clustering Methods
Connectivity-based Clustering (Hierarchical clustering)

Hierarchical Clustering is a method of unsupervised machine learning clustering that begins
with a pre-defined top-to-bottom hierarchy of clusters. It then proceeds to perform a decomposition
of the data objects based on this hierarchy, hence obtaining the clusters.

Centroids-based Clustering (Partitioning methods)

Centroid-based clustering is considered one of the simplest clustering algorithms, yet one of the
most effective ways of creating clusters and assigning data points to them. The intuition behind
centroid-based clustering is that a cluster is characterized and represented by a central vector,
and data points that are in close proximity to these vectors are assigned to the respective clusters.

Distribution-based Clustering

Distribution-based clustering creates and groups data points based on their likelihood of
belonging to the same probability distribution in the data.

Density-based Clustering (Model-based methods)

Density-based clustering methods take density into consideration instead of distances. Clusters are
considered as the densest region in a data space, which is separated by regions of lower object
density, and it is defined as a maximal set of connected points.

Fuzzy Clustering

The general idea about clustering revolves around assigning data points to mutually exclusive
clusters, meaning, a data point always resides uniquely inside a cluster, and it cannot belong to
more than one cluster. Fuzzy clustering methods change this paradigm by assigning a data-point
to multiple clusters with a quantified degree of belongingness metric.
Constraint-based (Supervised Clustering)

The clustering process, in general, is based on the approach that the data can be divided into an
optimal number of “unknown” groups. The underlying stages of all clustering algorithms aim to
find those hidden patterns and similarities without any intervention or predefined conditions.

If you are working with ML algorithms, chances are you will be widely using Clustering.
Clustering is an incredibly useful unsupervised machine learning method that has a wide variety
of applications.

https://www.javatpoint.com/clustering-in-machine-learning

Clustering in Machine Learning

Clustering or cluster analysis is a machine learning technique, which groups the unlabelled
dataset. It can be defined as "A way of grouping the data points into different clusters, consisting
of similar data points. The objects with the possible similarities remain in a group that has less or
no similarities with another group."

It does it by finding some similar patterns in the unlabelled dataset such as shape, size, color,
behavior, etc., and divides them as per the presence and absence of those similar patterns.

It is an unsupervised learning method, hence no supervision is provided to the algorithm, and it
deals with the unlabeled dataset.

After applying this clustering technique, each cluster or group is assigned a cluster ID. An ML
system can use this ID to simplify the processing of large and complex datasets. The clustering
technique is commonly used for statistical data analysis.

Example: Let's understand the clustering technique with the real-world example of a mall. When
we visit any shopping mall, we can observe that things with similar usage are grouped together:
t-shirts are grouped in one section and trousers in another; similarly, in the
vegetable section, apples, bananas, mangoes, etc., are grouped in separate sections, so that we
can easily find things. The clustering technique works in the same way. Another example
of clustering is grouping documents according to topic.

The clustering technique can be widely used in various tasks. Some most common uses of this
technique are:

o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.

Apart from these general uses, clustering is used by Amazon in its recommendation system to
provide recommendations based on past product searches. Netflix also uses this technique
to recommend movies and web series to its users based on their watch history.

The below diagram explains the working of the clustering algorithm. We can see the different
fruits are divided into several groups with similar properties.

Types of Clustering Methods

Clustering methods are broadly divided into Hard clustering (each data point belongs to only one
group) and Soft clustering (a data point can also belong to other groups). There are also
various other approaches to clustering. Below are the main clustering methods used in machine
learning:

1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering

Partitioning Clustering

It is a type of clustering that divides the data into non-hierarchical groups. It is also known as
the centroid-based method. The most common example of partitioning clustering is the K-Means
Clustering algorithm.

In this type, the dataset is divided into a set of k groups, where k defines the number of
pre-defined groups. The cluster centers are chosen so that each data point is closer to the
centroid of its own cluster than to the centroid of any other cluster.
Density-Based Clustering

The density-based clustering method connects the highly-dense areas into clusters, and the
arbitrarily shaped distributions are formed as long as the dense region can be connected. This
algorithm does it by identifying different clusters in the dataset and connects the areas of high
densities into clusters. The dense areas in data space are divided from each other by sparser areas.

These algorithms can face difficulty in clustering the data points if the dataset has varying densities
and high dimensions.

Distribution Model-Based Clustering

In the distribution model-based clustering method, the data is divided based on the probability
that a data point belongs to a particular distribution. The grouping is done by assuming some
distribution, most commonly the Gaussian distribution.

The example of this type is the Expectation-Maximization Clustering algorithm that uses
Gaussian Mixture Models (GMM).
Hierarchical Clustering

Hierarchical clustering can be used as an alternative for the partitioned clustering as there is no
requirement of pre-specifying the number of clusters to be created. In this technique, the dataset is
divided into clusters to create a tree-like structure, which is also called a dendrogram. The
observations or any number of clusters can be selected by cutting the tree at the correct level. The
most common example of this method is the Agglomerative Hierarchical algorithm.

Fuzzy Clustering
Fuzzy clustering is a type of soft method in which a data object may belong to more than one group
or cluster. Each data object has a set of membership coefficients, which express its degree of
membership in each cluster. The Fuzzy C-means algorithm is an example of this type of clustering;
it is sometimes also known as the Fuzzy k-means algorithm.
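The membership coefficients can be illustrated with the membership formula commonly used in Fuzzy C-means, where a point's membership in a cluster falls as its distance to that cluster's center grows relative to its distances to the other centers. The sketch below handles scalar data only and assumes the cluster centers are already known and distinct; the function name, the fuzzifier value m = 2, and the sample numbers are assumptions for illustration.

```python
def fuzzy_memberships(x, centers, m=2.0):
    """Membership of scalar x in each cluster, given fixed centers."""
    dists = [abs(x - c) for c in centers]
    if 0.0 in dists:
        # x sits exactly on a center: full membership there.
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

# Hypothetical example: a point at 2.0 with cluster centers at 1.0 and 5.0.
u = fuzzy_memberships(2.0, centers=[1.0, 5.0])
print(u)  # roughly [0.9, 0.1]
```

The memberships for a point always sum to 1, and a larger m spreads them more evenly across clusters. The full algorithm alternates this step with recomputing the centers as membership-weighted means.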

Clustering Algorithms

Clustering algorithms can be divided based on the models explained above. Many
different clustering algorithms have been published, but only a few are commonly used. The
choice of clustering algorithm depends on the kind of data that we are using. For example, some
algorithms need the number of clusters in the given dataset to be guessed in advance, whereas
others need to find the minimum distance between the observations of the dataset.

Here we are discussing mainly popular Clustering algorithms that are widely used in machine
learning:

1. K-Means algorithm: The k-means algorithm is one of the most popular clustering
algorithms. It classifies the dataset by dividing the samples into different clusters of equal
variances. The number of clusters must be specified in this algorithm. It is fast, requiring
few computations, with a complexity that is linear in the number of samples, O(n).
2. Mean-shift algorithm: Mean-shift algorithm tries to find the dense areas in the smooth
density of data points. It is an example of a centroid-based model, that works on updating
the candidates for centroid to be the center of the points within a given region.
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications
with Noise. It is an example of a density-based model similar to the mean-shift, but with
some remarkable advantages. In this algorithm, the areas of high density are separated by
the areas of low density. Because of this, the clusters can be found in any arbitrary shape.
4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an
alternative to the k-means algorithm or for those cases where K-means may fail. In
GMM, it is assumed that the data points are Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The Agglomerative hierarchical algorithm
performs the bottom-up hierarchical clustering. In this, each data point is treated as a single
cluster at the outset and then successively merged. The cluster hierarchy can be represented
as a tree-structure.
6. Affinity Propagation: It differs from other clustering algorithms in that it does not require
the number of clusters to be specified. In this algorithm, messages are exchanged between pairs
of data points until convergence. It has O(N²T) time complexity (N samples, T iterations),
which is the main drawback of this algorithm.
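As an illustration of the mean-shift idea from item 2 in the list above, the sketch below repeatedly moves a candidate center to the mean of the points inside a window around it, until it stops moving. It uses scalar data and a flat window for brevity; the function name, the `bandwidth` parameter, and the toy data are assumptions made for the example.

```python
def mean_shift_1d(points, bandwidth, tol=1e-6, max_iter=100):
    """Flat-window mean shift on scalars; returns the mode reached from each point."""
    modes = []
    for start in points:
        x = start
        for _ in range(max_iter):
            # Points inside the window centred on the current candidate.
            window = [p for p in points if abs(p - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:   # candidate stopped moving
                break
            x = new_x
        modes.append(x)
    return modes

# Hypothetical toy data around two dense areas.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
modes = mean_shift_1d(points, bandwidth=1.0)
print([round(m, 2) for m in modes])
```

Points that end up at (nearly) the same mode belong to the same cluster, so the number of clusters is determined by the bandwidth rather than given up front.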

Applications of Clustering

Below are some commonly known applications of clustering technique in Machine Learning:

o In Identification of Cancer Cells: The clustering algorithms are widely used for the
identification of cancerous cells. It divides the cancerous and non-cancerous data sets into
different groups.
o In Search Engines: Search engines also work on the clustering technique. The search
result appears based on the closest object to the search query. It does it by grouping similar
data objects in one group that is far from the other dissimilar objects. The accurate result
of a query depends on the quality of the clustering algorithm used.
o Customer Segmentation: It is used in market research to segment the customers based on
their choice and preferences.
o In Biology: It is used in the biology stream to classify different species of plants and
animals using the image recognition technique.
o In Land Use: The clustering technique is used to identify areas of similar land use
in a GIS database. This can be very useful for finding out the purpose for which a particular
piece of land is most suitable.

https://www.techtarget.com/searchenterpriseai/definition/clustering-in-machine-learning

What is clustering in machine learning and how does it work?

Clustering is a data science technique in machine learning that groups similar rows in a data set.
After running a clustering technique, a new column appears in the data set to indicate the group
each row of data fits into best. Since rows of data, or data points, often represent people, financial
transactions, documents or other important entities, these groups tend to form clusters of similar
entities that have several kinds of real-world applications.

Why is clustering important?

Clustering is sometimes referred to as unsupervised machine learning. To perform clustering,
labels for past known outcomes -- a dependent, y, target or label variable -- are generally
unnecessary. For example, when applying a clustering method in a mortgage loan application
process, it's not necessary to know whether the applicants made their past mortgage payments.
Rather, you need demographic, psychographic, behavioral, geographic or other information about
the applicants in a mortgage portfolio. A clustering method will then attempt to group the
applicants based on that information. This method stands in contrast to supervised learning, in
which mortgage default risk for new applicants, for example, can be predicted based on patterns
in data labeled with past default outcomes.

Since clustering generally groups input data, it can be very creative and flexible. Clustering can be
used for data exploration and preprocessing, as well as specific applications. From a technical
perspective, common applications of clustering include the following:

 Data visualization. Data often contains natural groups or segments, and clustering should be
able to find them. Visualizing clusters can be a highly informative data analysis approach.

 Prototypes. Prototypes are data points that represent many other points and help explain data
and models. If a cluster represents a large market segment, then the data point at the cluster
center -- or cluster centroid -- is the prototypical member of that market segment.

 Sampling. Since clustering can define groups in the data, clusters can be used to create
different types of data samples. Drawing an equal number of data points from each cluster in
a data set, for example, can create a balanced sample of the population represented by that data
set.

 Segments for models. Sometimes the predictive performance of supervised models --
regression, decision tree and neural networks, for example -- can be improved by using the
information learned from unsupervised approaches such as clusters. Data scientists might
include clusters as inputs to other models or build separate models for each cluster.
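The sampling idea in the list above can be sketched as follows: once every row has a cluster ID, draw the same number of rows from each cluster. The function name, its parameters, the fixed random seed, and the toy rows are assumptions made for illustration.

```python
import random
from collections import defaultdict

def balanced_sample(rows, cluster_ids, per_cluster, seed=0):
    """Draw the same number of rows from every cluster."""
    by_cluster = defaultdict(list)
    for row, cid in zip(rows, cluster_ids):
        by_cluster[cid].append(row)
    rng = random.Random(seed)          # seeded so the sample is repeatable
    sample = []
    for cid in sorted(by_cluster):
        members = by_cluster[cid]
        # Never ask for more rows than the cluster holds.
        sample.extend(rng.sample(members, min(per_cluster, len(members))))
    return sample

# Hypothetical data: 10 rows, 7 in cluster 0 and 3 in cluster 1.
rows = list(range(10))
cluster_ids = [0] * 7 + [1] * 3
sample = balanced_sample(rows, cluster_ids, per_cluster=2)
print(sample)
```

A plain random sample of 4 rows would tend to favour the larger cluster; drawing per cluster gives the smaller one equal weight.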

For business applications, clustering is a battle-tested tool in market segmentation and fraud
detection. Clustering is also useful for categorizing documents, making product recommendations
and in other applications where grouping entities makes sense.

Types of clustering algorithms

There are many types of clustering algorithms, but K-means and hierarchical clustering are the
most widely available in data science tools.

K-means clustering

In the K-means clustering algorithm, you choose a specific number of clusters to create in the
data and denote that number as k. K can be 3, 10, 1,000 or any other number of clusters, but
smaller numbers work better. The algorithm then makes k clusters, and the center point of each
cluster, or centroid, becomes the mean, or average, value of each variable inside the cluster.
K-means and related approaches -- such as k-medoids for character data or k-prototypes for mixed
numeric and character data -- are fast and work well on large data sets. However, they usually
make simple, spherical clusters of roughly the same size.

Hierarchical clustering

If you're seeking more complex and realistic clusters of different shapes and sizes, and don't want
to pick the k before starting the analysis, hierarchical clustering might be a better choice.
Hierarchical clustering accommodates a divisive approach: start with one big cluster, break that
cluster into smaller ones until each point is in its own cluster and then choose from all the
interesting clustering solutions in between.

Another option is an agglomerative approach, in which each data point starts in its own cluster.
Combine the data points into clusters until all the points are in one big cluster and then choose the
best clusters in between. Unfortunately, hierarchical clustering algorithms tend to be slow or
impossible for big data, so a k still has to be chosen to arrive at the final answer.

One of the hardest parts of clustering is choosing the number of clusters that best suits the data and
application. There are data-driven methods to estimate k, such as the silhouette score and the gap statistic.
These quantitative formulas provide a numeric score that helps choose the best number of clusters.
Domain knowledge can also be used. For example, if a project has enough budget for 10 different
marketing campaigns, commercial concerns dictate that 10 is a good number of clusters. Alternatively,
experienced marketers who have worked in a certain vertical for a long time may know the best number
of segments for the market. Combining quantitative analysis and domain knowledge often works
well, too.
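
As a rough illustration of a data-driven score, the silhouette can be computed directly from its usual definition: for each point, compare the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b). The function name, the sample points and the two candidate labelings are assumptions for this sketch, and it expects at least two clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points; higher means tighter, better-separated clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # by convention a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)

points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
good = silhouette_score(points, [0, 1, 0, 1, 0, 1])  # groups nearby points
bad = silhouette_score(points, [0, 0, 0, 1, 1, 1])   # mixes distant points
print(good, bad)
```

To estimate k, one would cluster the data for several candidate values of k and keep the value with the highest score.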

Cluster profiling: How do you know your clusters are right?

In clustering, answers are usually validated through a technique known as profiling, which
involves naming the clusters. For example, DINKs (dual income, no kids), HINRYs (high income,
not rich yet) and hockey moms are all names that refer to groups of consumers. These names are
usually determined by looking at the centroid -- or prototypical data point -- for each cluster and
ensuring they're logical and different from the other discovered prototypes.

Visualization is also a key aspect of profiling. Clusters can be plotted to ensure they don't overlap
and that their arrangement makes sense. For example, clusters for very different market segments
should appear visually distant in a plot.

Clustering use cases

Clustering has many business applications. Two of these use cases are explained below and
illustrated in Figure 1 and Figure 2 in the graphic titled "Clustering use cases."

Market segment application

For a data set of customers in which each row of data -- or data point -- is a customer, clustering
techniques can be used to create groups of similar customers. Known as market segments, these
customer groups can improve marketing efforts.

Figure 1 uses data pertaining to consumers' income and property value and K-means clustering to
find three larger, roughly circular and similarly sized clusters within that market.

Cluster 1 appears to be a group of affluent consumers who own homes -- perhaps some DINKs.
Cluster 2 likely represents middle-class homeowners -- probably some hockey moms and dads.
Cluster 3 contains higher income consumers who don't appear to own homes -- HINRYs in many
cases.

One of the more common applications of market segments is to optimize the money spent on
marketing. For example, it probably doesn't make sense to send grocery coupons to Clusters 1 and
3 because they're unlikely to use them. On the other hand, premium co-branded credit card offers
are likely wasted on Cluster 2 because they don't want the annual fees. With this knowledge of
market segments, marketers can spend their budgets in a more efficient manner.

https://www.geeksforgeeks.org/association-rule/
Association Rule
Association rule mining finds interesting associations and relationships among large sets of data
items. A rule shows how frequently an itemset occurs in a set of transactions. A typical example is
Market Basket Analysis, one of the key techniques used by large
retailers to show associations between items. It allows retailers to identify relationships between
the items that people frequently buy together. Given a set of transactions, we can find rules that
will predict the occurrence of an item based on the occurrences of other items in the transaction.

TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke

Before we start defining the rule, let us first see the basic definitions.

Support Count ($\sigma$) – Frequency of occurrence of an itemset.
Here $\sigma(\{\text{Milk, Bread, Diaper}\}) = 2$

Frequent Itemset – An itemset whose support is greater than or equal to a minsup threshold.

Association Rule – An implication expression of the form X -> Y, where X and Y
are any 2 itemsets.
Example: {Milk, Diaper}->{Beer}

Rule Evaluation Metrics –

• Support(s) – The number of transactions that include all items in both the {X} and {Y} parts of the
rule, as a percentage of the total number of transactions. It is a measure of how frequently the
collection of items occurs together.
• $\mathrm{Supp}(X \Rightarrow Y) = \sigma(X \cup Y) / |T|$ – It is interpreted as the fraction of transactions that contain both X
and Y.
• Confidence(c) – The ratio of the number of transactions that include all items in both {X} and {Y}
to the number of transactions that include all items in {X}.
• $\mathrm{Conf}(X \Rightarrow Y) = \mathrm{Supp}(X \cup Y) / \mathrm{Supp}(X)$ – It measures how often the items in Y appear in
transactions that also contain the items in X.
• Lift(l) – The lift of the rule X=>Y is the confidence of the rule divided by the expected
confidence, assuming that the itemsets X and Y are independent of each other. The expected
confidence is the support (frequency) of {Y}.
• $\mathrm{Lift}(X \Rightarrow Y) = \mathrm{Conf}(X \Rightarrow Y) / \mathrm{Supp}(Y)$ – A lift value near 1 indicates that X and Y appear together
about as often as expected, greater than 1 means they appear together more than expected
and less than 1 means they appear together less than expected. Greater lift values indicate stronger
association.
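Example – From the above table, {Milk, Diaper}=>{Beer}

$s = \sigma(\{\text{Milk, Diaper, Beer}\}) / |T| = 2/5 = 0.4$

$c = \sigma(\{\text{Milk, Diaper, Beer}\}) / \sigma(\{\text{Milk, Diaper}\}) = 2/3 \approx 0.67$

$l = \mathrm{Supp}(\{\text{Milk, Diaper, Beer}\}) / (\mathrm{Supp}(\{\text{Milk, Diaper}\}) \times \mathrm{Supp}(\{\text{Beer}\})) = 0.4 / (0.6 \times 0.6) \approx 1.11$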
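
The same three numbers can be checked with a short Python sketch over the five transactions in the table. The helper names (`support_count`, `support`, `confidence`, `lift`) are chosen for this example only.

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset):
    # Number of transactions that contain every item of the itemset.
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    return support_count(itemset) / len(transactions)

def confidence(x, y):
    return support(x | y) / support(x)

def lift(x, y):
    return confidence(x, y) / support(y)

x, y = {"Milk", "Diaper"}, {"Beer"}
s, c, l = support(x | y), confidence(x, y), lift(x, y)
print(round(s, 2), round(c, 2), round(l, 2))
```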

Association rules are very useful in analyzing datasets. In supermarkets, the data is collected using bar-code
scanners. Such databases consist of a large number of transaction records
which list all items bought by a customer in a single purchase. So the manager can know if
certain groups of items are consistently purchased together and use this data for adjusting store
layouts, cross-selling and promotions based on statistics.
https://www.javatpoint.com/association-rule-learning

Association Rule Learning

Association rule learning is a type of unsupervised learning technique that checks for the
dependency of one data item on another data item and maps these dependencies so that they can be used
profitably. It tries to find interesting relations or associations among the variables of a dataset,
using different rules to discover these relations in the database.

Association rule learning is one of the very important concepts of machine learning, and it is
employed in Market Basket analysis, Web usage mining, continuous production, etc. Here
market basket analysis is a technique used by various big retailers to discover the associations
between items. We can understand it by taking the example of a supermarket, where
products that are frequently purchased together are placed together.

For example, if a customer buys bread, he is also likely to buy butter, eggs, or milk, so these
products are stored on the same shelf or nearby. Consider the below diagram:

Association rule learning can be divided into three types of algorithms:

1. Apriori
2. Eclat
3. F-P Growth Algorithm

How does Association Rule Learning work?

Association rule learning works on the concept of an if-then statement, such as if A then B.

Here the "if" element is called the antecedent, and the "then" element is called the consequent. A
relationship in which we find an association between two single items is known as
single cardinality. It is all about creating rules, and as the number of items increases, the
cardinality also increases accordingly. So, to measure the associations between thousands of data
items, there are several metrics. These metrics are given below:

o Support
o Confidence
o Lift

Support

Support is how frequently an itemset appears in the dataset. It is defined as the
fraction of the transactions T that contain the itemset X. For an itemset X and T transactions,
it can be written as:

$\mathrm{Supp}(X) = \mathrm{Freq}(X) / T$

Confidence

Confidence indicates how often the rule has been found to be true, that is, how often the items X and
Y occur together in the dataset when the occurrence of X is already given. It is the ratio of the number of
transactions that contain both X and Y to the number of transactions that contain X:

$\mathrm{Conf}(X \Rightarrow Y) = \mathrm{Freq}(X \cup Y) / \mathrm{Freq}(X)$

Lift

It is the strength of any rule, which can be defined by the following formula:

$\mathrm{Lift}(X \Rightarrow Y) = \mathrm{Supp}(X \cup Y) / (\mathrm{Supp}(X) \times \mathrm{Supp}(Y))$

It is the ratio of the observed support to the support expected if X and Y were independent of
each other. It has three possible ranges of values:

o Lift = 1: The occurrences of the antecedent and the consequent are independent of each other.
o Lift > 1: The two itemsets are positively dependent on each other; the larger the value, the
stronger the dependence.
o Lift < 1: One item is a substitute for the other, which means one item has a
negative effect on the occurrence of the other.

Types of Association Rule Learning

Association rule learning can be divided into three algorithms:

Apriori Algorithm

This algorithm uses frequent itemsets to generate association rules. It is designed to work on
databases that contain transactions. This algorithm uses a breadth-first search and a hash tree to
count itemsets efficiently.
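
The level-by-level (breadth-first) idea can be sketched as below. This is a simplified illustration under stated assumptions: it counts candidates with a plain dictionary rather than a hash tree, and the function name, the sample transactions and the minimum support count of 3 are made up for the example.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {frozenset: support count} for every frequent itemset."""
    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {frozenset([item]) for t in transactions for item in t}
    level = {c: n for c, n in count(items).items() if n >= min_support_count}
    frequent = dict(level)
    k = 2
    while level:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c: n for c, n in count(candidates).items() if n >= min_support_count}
        frequent.update(level)
        k += 1
    return frequent

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
frequent = apriori(transactions, min_support_count=3)
print(frequent)
```

The prune step is what keeps the search manageable: an itemset cannot be frequent if any of its subsets is infrequent.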

It is mainly used for market basket analysis and helps to understand the products that can be bought
together. It can also be used in the healthcare field to find drug reactions for patients.

Eclat Algorithm

Eclat stands for Equivalence Class Transformation. This algorithm uses a depth-first
search technique to find frequent itemsets in a transaction database. It typically executes faster
than the Apriori algorithm.
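
A minimal sketch of the depth-first idea, assuming the commonly described vertical layout in which each item maps to the set of transaction IDs that contain it. The function name, the sample transactions and the threshold are hypothetical.

```python
def eclat(transactions, min_support_count):
    """Return {frozenset: support count} using transaction-ID set intersections."""
    # Vertical layout: item -> set of IDs of the transactions containing it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that may extend the current prefix
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_support_count:
                continue
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Depth-first: intersect ID sets to grow this itemset further.
            deeper = [(o, tids & o_tids) for o, o_tids in candidates[i + 1:]]
            extend(itemset, deeper)

    extend(set(), sorted(tidsets.items()))
    return frequent

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
frequent = eclat(transactions, min_support_count=3)
print(frequent)
```

Support is obtained from the size of an intersection, so the transaction list is scanned only once, when the ID sets are built.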

F-P Growth Algorithm

FP in the FP-growth algorithm stands for Frequent Pattern, and it is an improved version of the
Apriori algorithm. It represents the database in the form of a tree structure that is known as a
frequent pattern tree (FP-tree). The purpose of this tree is to extract the most frequent patterns.

Applications of Association Rule Learning

It has various applications in machine learning and data mining. Below are some popular
applications of association rule learning:

o Market Basket Analysis: It is one of the popular examples and applications of association
rule mining. This technique is commonly used by big retailers to determine the association
between items.
o Medical Diagnosis: Association rules can support diagnosis, as they
help in identifying the probability of illness for a particular disease.
o Protein Sequence: Association rules help in determining the synthesis of artificial
proteins.
o It is also used for the Catalog Design and Loss-leader Analysis and many more other
applications.

https://towardsdatascience.com/association-rules-2-aa9a77241654

Association Rules is one of the very important concepts of machine learning being used in market
basket analysis. In a store, all vegetables are placed in the same aisle, all dairy items are placed
together and cosmetics form another set of such groups. Investing time and resources on deliberate
product placements like this not only reduces a customer’s shopping time, but also reminds the
customer of what relevant items (s)he might be interested in buying, thus helping stores cross-sell
in the process. Association rules help uncover all such relationships between items from huge
databases.

To elaborate on this idea: rules do not tie back a user's different transactions over time to identify
relationships. Lists of items with unique transaction IDs (from all users) are studied as one
group. This is helpful in the placement of products on aisles. On the other hand, collaborative filtering
ties back all transactions corresponding to a user ID to identify similarity between users’
preferences. This is helpful in recommending items on e-commerce websites, recommending songs
on Spotify, etc.

Let's now see what an association rule exactly looks like. It consists of an antecedent and a
consequent, both of which are lists of items. Note that implication here is co-occurrence and not
causality. For a given rule, the itemset is the list of all the items in the antecedent and the consequent.

Various metrics are in place to help us understand the strength of association between these two.
Let us go through them all.

1. Support

This measure gives an idea of how frequent an itemset is in all the transactions. Consider itemset1 =
{bread} and itemset2 = {shampoo}. There will be far more transactions containing bread than those
containing shampoo. So as you rightly guessed, itemset1 will generally have a higher support
than itemset2. Now consider itemset1 = {bread, butter} and itemset2 = {bread, shampoo}. Many
transactions will have both bread and butter on the cart but bread and shampoo? Not so much. So
in this case, itemset1 will generally have a higher support than itemset2. Mathematically, support is
the fraction of the total number of transactions in which the itemset occurs.

Value of support helps us identify the rules worth considering for further analysis. For example,
one might want to consider only the itemsets which occur at least 50 times out of a total of 10,000
transactions i.e. support = 0.005. If an itemset happens to have a very low support, we do not have
enough information on the relationship between its items and hence no conclusions can be drawn
from such a rule.

2. Confidence

This measure defines the likelihood of occurrence of the consequent on the cart given that the cart
already has the antecedents. That is, it answers the question: of all the transactions containing, say,
{Captain Crunch}, how many also had {Milk} on them? We can say by common knowledge that
{Captain Crunch} → {Milk} should be a high confidence rule. Technically, confidence is the
conditional probability of occurrence of the consequent given the antecedent.

Let us consider a few more examples before moving ahead. What do you think would be the
confidence for {Butter} → {Bread}? That is, what fraction of transactions having butter also had
bread? Very high i.e. a value close to 1? That’s right. What about {Yogurt} → {Milk}? High again.
{Toothbrush} → {Milk}? Not so sure? Confidence for this rule will also be high since {Milk} is
such a frequent itemset and would be present in every other transaction.

It does not matter what you have in the antecedent for such a frequent consequent. The confidence
for an association rule having a very frequent consequent will always be high.

I will introduce some numbers here to clarify this further.

Total transactions = 100. 10 of them have both milk and toothbrush, 70 have milk but no toothbrush
and 4 have toothbrush but no milk.

Consider the numbers from the figure on the left. Confidence for {Toothbrush} → {Milk} will be
10/(10+4) ≈ 0.7

Looks like a high confidence value. But we know intuitively that these two products have a weak
association and there is something misleading about this high confidence value. Lift is introduced
to overcome this challenge.

Considering just the value of confidence limits our capability to make any business inference.

3. Lift

Lift controls for the support (frequency) of the consequent while calculating the conditional probability
of occurrence of {Y} given {X}. Lift is a very literal term given to this measure. Think of it as the
*lift* that {X} provides to our confidence for having {Y} on the cart. To rephrase, lift is the rise in
probability of having {Y} on the cart with the knowledge of {X} being present over the probability
of having {Y} on the cart without any knowledge about the presence of {X}. Mathematically,

$\mathrm{Lift}(\{X\} \rightarrow \{Y\}) = \mathrm{Confidence}(\{X\} \rightarrow \{Y\}) / \mathrm{Support}(\{Y\})$

In cases where {X} actually leads to {Y} on the cart, the value of lift will be greater than 1. Let us
understand this with an example which will be a continuation of the {Toothbrush} → {Milk} rule.

Probability of having milk on the cart with the knowledge that toothbrush is present (i.e. confidence)
: 10/(10+4) = 0.7

Now to put this number in perspective, consider the probability of having milk on the cart without
any knowledge about toothbrush: 80/100 = 0.8

These numbers show that having toothbrush on the cart actually reduces the probability of having
milk on the cart to 0.7 from 0.8! This will be a lift of 0.7/0.8 ≈ 0.88 (about 0.89 with the unrounded
confidence of 10/14). Now that’s more like the real
picture. A value of lift less than 1 shows that having toothbrush on the cart does not increase the
chances of occurrence of milk on the cart in spite of the rule showing a high confidence value. A
value of lift greater than 1 vouches for high association between {Y} and {X}. The higher the value of
lift, the greater the chances of the customer buying {Y} if they have already bought {X}. Lift is
the measure that will help store managers decide product placements on aisles.
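
The toothbrush and milk arithmetic can be reproduced in a few lines of Python. The variable names are chosen for this sketch; the counts are the ones given in the text.

```python
total = 100            # all transactions
both = 10              # milk and toothbrush
milk_only = 70         # milk but no toothbrush
toothbrush_only = 4    # toothbrush but no milk

support_toothbrush = (both + toothbrush_only) / total
support_milk = (both + milk_only) / total
support_both = both / total

# Confidence: share of toothbrush transactions that also contain milk.
confidence = support_both / support_toothbrush
# Lift: confidence relative to how common milk is overall.
lift = confidence / support_milk

print(round(confidence, 3), round(support_milk, 3), round(lift, 3))
```

Working with the unrounded confidence gives a lift slightly higher than the 0.7/0.8 shortcut, but the conclusion is the same: the value is below 1.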

https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-linear-regression/
What Is Linear Regression?

Linear regression is an algorithm that provides a linear relationship between an independent
variable and a dependent variable to predict the outcome of future events. It is a statistical
method used in data science and machine learning for predictive analysis.

The independent variable is also called the predictor or explanatory variable; it is not affected by
changes in other variables. The dependent variable, however, changes with fluctuations in the
independent variable. The regression model predicts the value of the dependent variable, which is
the response or outcome variable being analyzed or studied.

Thus, linear regression is a supervised learning algorithm that models a mathematical
relationship between variables and makes predictions for continuous or numeric variables such as
sales, salary, age, product price, etc.

This analysis method is advantageous when at least two variables are available in the data, as
observed in stock market forecasting, portfolio management, scientific analysis, etc.

A sloped straight line represents the linear regression model.

Best Fit Line for a Linear Regression Model

In the above figure,

X-axis = Independent variable

Y-axis = Output / dependent variable

Line of regression = Best fit line for a model

Here, a line is plotted that suitably fits all the given data points. Hence, it is called the
‘best fit line.’ The goal of the linear regression algorithm is to find this best fit line seen in the
above figure.

Key benefits of linear regression

Linear regression is a popular statistical tool used in data science, thanks to the several benefits it
offers, such as:

1. Easy implementation

The linear regression model is computationally simple to implement as it does not demand a lot of
engineering overhead, either before the model launch or during its maintenance.

2. Interpretability

Unlike deep learning models (neural networks), linear regression is relatively
straightforward. As a result, this algorithm stands ahead of black-box models that fall short in
explaining which input variable causes the output variable to change.

3. Scalability

Linear regression is not computationally heavy and, therefore, fits well in cases where scaling is
essential. For example, the model can scale well regarding increased data volume (big data).

4. Optimal for online settings

The ease of computation of these algorithms allows them to be used in online settings. The model
can be trained and retrained with each new example to generate predictions in real-time, unlike the
neural networks or support vector machines that are computationally heavy and require plenty of
computing resources and substantial waiting time to retrain on a new dataset. All these factors
make such compute-intensive models expensive and unsuitable for real-time applications.

The above features highlight why linear regression is a popular model to solve real-life machine
learning problems.

Types of Linear Regression with Examples

Linear regression has been a critical driving force behind many AI and data science applications.
This statistical technique is beneficial for businesses as it is a simple, interpretable, and efficient
method to evaluate trends and make future estimates or forecasts.

The types of linear regression models include:

1. Simple linear regression

Simple linear regression reveals the correlation between an independent variable (input) and a
dependent variable (output). Primarily, this regression type describes the following:

• Relationship strength between the given variables.

Example: The relationship between pollution levels and rising temperatures.

• The value of the dependent variable based on the value of the independent variable.

Example: The value of pollution level at a specific temperature.

2. Multiple linear regression

Multiple linear regression establishes the relationship between independent variables (two or
more) and the corresponding dependent variable. Here, the independent variables can be either
continuous or categorical. This regression type helps foresee trends, determine future values, and
predict the impacts of changes.

Example: Consider the task of calculating blood pressure. In this case, height, weight, and amount
of exercise can be considered independent variables. Here, we can use multiple linear regression
to analyze the relationship between the three independent variables and one dependent variable, as
all the variables considered are quantitative.

3. Logistic regression

Logistic regression, also referred to as the logit model, is applicable in cases where there is one
dependent variable and one or more independent variables. The fundamental difference between multiple
and logistic regression is that the target variable in the logistic approach is discrete (binary or an
ordinal value). That is, the dependent variable is finite or categorical: either P or Q (binary
regression) or a range of limited options P, Q, R, or S.

When the target is limited to just two possible outcomes, linear regression is a poor fit because its
predictions are not bounded. Logistic regression addresses this issue as it returns a probability score
between 0 and 1 that shows the chances of any particular event.

Example: One can determine the likelihood of choosing an offer on your website (dependent
variable). For analysis purposes, you can look at various visitor characteristics such as the sites
they came from, count of visits to your site, and activity on your site (independent variables). This
can help determine the probability of certain visitors who are more likely to accept the offer. As a
result, it allows you to make better decisions on whether to promote the offer on your site or not.

Furthermore, logistic regression is extensively used in machine learning algorithms in cases such
as spam email detection, predicting whether a customer will repay a loan, and more.

4. Ordinal regression

Ordinal regression involves one ordinal dependent variable and one or more independent variables,
which can be either ordinal or nominal. It models the relationship between a dependent variable
with multiple ordered levels and one or more independent variables.

For a dependent variable with m categories, (m -1) equations will be created. Each equation has a
different intercept but the same slope coefficients for the predictor variables. Thus, ordinal
regression creates multiple prediction equations for various categories. In machine learning,
ordinal regression refers to ranking learning or ranking analysis computed using a generalized
linear model (GLM).
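
One common way to realize the "(m - 1) equations with shared slopes" idea is the cumulative (proportional-odds) logit model, in which each equation has the form P(Y <= j) = sigmoid(threshold_j - slopes · x). The sketch below assumes that model; the thresholds, the slope and the input value are hypothetical numbers, not fitted values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(x, slopes, thresholds):
    """Probabilities of m ordered categories from m - 1 increasing thresholds."""
    score = sum(b * v for b, v in zip(slopes, x))  # same slopes in every equation
    # One cumulative equation per threshold; the last category closes at 1.
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs

# Four ordered answers (strongly disagree .. strongly agree) need three thresholds.
probs = category_probabilities(x=[0.5], slopes=[0.8], thresholds=[-1.0, 0.0, 1.5])
print([round(p, 3) for p in probs])
```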

Example: Consider a survey where the respondents are supposed to answer as ‘agree’ or
‘disagree.’ In some cases, such responses are of no help as one cannot derive a definitive
conclusion, complicating the generalized results. However, you can observe a natural order in the
categories by adding levels to responses, such as strongly disagree, disagree, agree, and strongly
agree. Ordinal regression thus helps in predicting a dependent variable that has multiple ‘ordered’
categories using independent variables.

5. Multinomial logistic regression

Multinomial logistic regression (MLR) is performed when the dependent variable is nominal with
more than two levels. It specifies the relationship between one dependent nominal variable and
one or more continuous-level (interval, ratio, or dichotomous) independent variables. Here, the
nominal variable refers to a variable with no intrinsic ordering.

Example: Multinomial logit can be used to model the program choices made by school students.
The program choices, in this case, refer to a vocational program, sports program, and academic
program. The choice of type of program can be predicted by considering a variety of attributes,
such as how well the students can read and write on the subjects given, gender, and awards received
by them.

Here, the dependent variable is the choice of programs with multiple levels (unordered). The
multinomial logistic regression technique is used to make predictions in such a case.

https://www.geeksforgeeks.org/ml-linear-regression/

What is Linear Regression?

Linear regression is a type of supervised machine learning algorithm that computes the linear
relationship between the dependent variable and one or more independent features by fitting a
linear equation to observed data.
When there is only one independent feature, it is known as Simple Linear Regression, and when
there is more than one feature, it is known as Multiple Linear Regression.
Similarly, when there is only one dependent variable, it is considered Univariate Linear
Regression, while when there is more than one dependent variable, it is known
as Multivariate Regression.

Why is Linear Regression Important?
The interpretability of linear regression is a notable strength. The model’s equation provides
clear coefficients that elucidate the impact of each independent variable on the dependent
variable, facilitating a deeper understanding of the underlying dynamics. Its simplicity is a
virtue, as linear regression is transparent, easy to implement, and serves as a foundational
concept for more complex algorithms.
Linear regression is not merely a predictive tool; it forms the basis for various advanced models.
Techniques like regularization and support vector machines draw inspiration from linear
regression, expanding its utility. Additionally, linear regression is a cornerstone in assumption
testing, enabling researchers to validate key assumptions about the data.

Types of Linear Regression
There are two main types of linear regression:

Simple Linear Regression
This is the simplest form of linear regression, and it involves only one independent variable and
one dependent variable. The equation for simple linear regression is:
$Y = \beta_0 + \beta_1 X$
where:
• $Y$ is the dependent variable
• $X$ is the independent variable
• $\beta_0$ is the intercept
• $\beta_1$ is the slope

Multiple Linear Regression
This involves more than one independent variable and one dependent variable. The equation
for multiple linear regression is:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$
where:
• $Y$ is the dependent variable
• $X_1, X_2, \dots, X_n$ are the independent variables
• $\beta_0$ is the intercept
• $\beta_1, \beta_2, \dots, \beta_n$ are the slopes

The goal of the algorithm is to find the best-fit line equation that can predict the values based
on the independent variables.
In regression, a set of records with X and Y values is available, and these values are used to learn
a function, so that if you want to predict Y for an unseen X, this learned function can be used. In
regression we have to find the value of Y, so a function is required that predicts a continuous Y
given X as the independent features.

What is the best Fit Line?
Our primary objective while using linear regression is to locate the best-fit line, which implies
that the error between the predicted and actual values should be kept to a minimum. There will
be the least error in the best-fit line.
The best Fit Line equation provides a straight line that represents the relationship between the
dependent and independent variables. The slope of the line indicates how much the dependent
variable changes for a unit change in the independent variable(s).

Linear Regression
Here Y is called a dependent or target variable and X is called an independent variable, also
known as the predictor of Y. There are many types of functions or models that can be used for
regression. A linear function is the simplest type of function. Here, X may be a single feature
or multiple features representing the problem.
Linear regression performs the task of predicting a dependent variable value (y) based on a given
independent variable (x). Hence, the name is Linear Regression. In the figure above, X (input)
is the work experience and Y (output) is the salary of a person. The regression line is the best-fit
line for our model.
We use a cost function to compute the best values in order to get the best-fit line, since
different values for the weights, or coefficients of the line, result in different regression lines.
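
To make the role of the cost function concrete, the sketch below uses mean squared error (a common choice, assumed here because the text does not name one) and a few steps of gradient descent. The experience and salary figures, the learning rate and the function names are invented for illustration.

```python
def mse(xs, ys, b0, b1):
    """Mean squared error of the line y = b0 + b1 * x on the data."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_step(xs, ys, b0, b1, lr):
    """Move b0 and b1 a small step against the gradient of the MSE."""
    n = len(xs)
    residuals = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
    grad_b0 = 2 * sum(residuals) / n
    grad_b1 = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0 - lr * grad_b0, b1 - lr * grad_b1

experience = [1, 2, 3, 4, 5]      # years (hypothetical)
salary = [30, 35, 40, 45, 50]     # thousands (hypothetical)

b0, b1 = 0.0, 0.0
cost_before = mse(experience, salary, b0, b1)
for _ in range(1000):
    b0, b1 = gradient_step(experience, salary, b0, b1, lr=0.01)
cost_after = mse(experience, salary, b0, b1)
print(cost_before, cost_after, b0, b1)
```

Each pair of coefficients defines a different line, and the cost measures how far that line is from the data; the line with the lowest cost is the best-fit line.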

https://www.ibm.com/topics/linear-regression

What is linear regression?

Linear regression analysis is used to predict the value of a variable based on the value of another
variable. The variable you want to predict is called the dependent variable. The variable you are
using to predict the other variable's value is called the independent variable.
This form of analysis estimates the coefficients of the linear equation, involving one or more
independent variables that best predict the value of the dependent variable. Linear regression fits
a straight line or surface that minimizes the discrepancies between predicted and actual output
values. There are simple linear regression calculators that use a “least squares” method to discover
the best-fit line for a set of paired data. You then estimate the value of Y (dependent variable) from
X (independent variable).
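
For a single predictor, the least squares line has a well-known closed form: the slope is the sum of products of deviations divided by the sum of squared deviations of X, and the intercept follows from the means. A minimal sketch, with made-up paired data and a hypothetical function name:

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = least_squares_fit(xs, ys)
predicted = intercept + slope * 6  # estimate Y for a new X value
print(intercept, slope, predicted)
```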

Key assumptions of effective linear regression

Assumptions to be considered for success with linear-regression analysis:

• For each variable: Consider the number of valid cases, mean and standard deviation.
• For each model: Consider regression coefficients, correlation matrix, part and partial
correlations, multiple R, R², adjusted R², change in R², standard error of the estimate,
analysis-of-variance table, predicted values and residuals. Also, consider 95-percent-
confidence intervals for each regression coefficient, variance-covariance matrix, variance
inflation factor, tolerance, Durbin-Watson test, distance measures (Mahalanobis, Cook and
leverage values), DfBeta, DfFit, prediction intervals and case-wise diagnostic information.
• Plots: Consider scatterplots, partial plots, histograms and normal probability plots.
• Data: Dependent and independent variables should be quantitative. Categorical variables,
such as religion, major field of study or region of residence, need to be recoded to binary
(dummy) variables or other types of contrast variables.
• Other assumptions: For each value of the independent variable, the distribution of the
dependent variable must be normal. The variance of the distribution of the dependent
variable should be constant for all values of the independent variable. The relationship
between the dependent variable and each independent variable should be linear and all
observations should be independent.
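
The recoding of a categorical variable into binary (dummy) variables, mentioned in the Data item above, can be sketched as follows. Dropping one category as a reference level is an assumption of this sketch (a common convention), and the function name and the region values are hypothetical.

```python
def dummy_encode(values):
    """Recode a categorical column into 0/1 columns, one per non-reference category."""
    categories = sorted(set(values))
    # Assumption: the alphabetically first category is the reference level.
    reference, others = categories[0], categories[1:]
    rows = [{f"is_{c}": int(v == c) for c in others} for v in values]
    return rows, reference

regions = ["north", "south", "west", "south"]
rows, reference = dummy_encode(regions)
print(reference, rows)
```

A row of all zeros then stands for the reference category, so the new columns are quantitative and can enter the regression equation.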

Examples of linear-regression success

Evaluating trends and sales estimates

You can also use linear-regression analysis to try to predict a salesperson’s total yearly sales (the
dependent variable) from independent variables such as age, education and years of experience.

Analyze pricing elasticity

Changes in pricing often impact consumer behavior — and linear regression can help you analyze
how. For instance, if the price of a particular product keeps changing, you can use regression
analysis to see whether consumption drops as the price increases. What if consumption does not
drop significantly as the price increases? At what price point do buyers stop purchasing the
product? This information would be very helpful for leaders in a retail business.

Assess risk in an insurance company

Linear regression techniques can be used to analyze risk. For example, an insurance company
might have limited resources with which to investigate homeowners’ insurance claims; with linear
regression, the company’s team can build a model for estimating claims costs. The analysis could
help company leaders make important business decisions about what risks to take.

Sports analysis

Linear regression isn’t always about business. It’s also important in sports. For instance, you might
wonder if the number of games won by a basketball team in a season is related to the average
number of points the team scores per game. A scatterplot indicates that these variables are linearly
related. The number of games won and the average number of points scored by the opponent are
also linearly related. These variables have a negative relationship. As the number of games won
increases, the average number of points scored by the opponent decreases. With linear regression,
you can model the relationship of these variables. A good model can be used to predict how many
games teams will win.

https://aws.amazon.com/what-is/logistic-regression/#:~:text=Logistic%20regression%20is%20a%20data,outcomes%2C%20like%20yes%20or%20no.

What is logistic regression?

Logistic regression is a data analysis technique that uses mathematics to find the relationships
between two data factors. It then uses this relationship to predict the value of one of those factors
based on the other. The prediction usually has a finite number of outcomes, like yes or no.

For example, let’s say you want to guess if your website visitor will click the checkout button in
their shopping cart or not. Logistic regression analysis looks at past visitor behavior, such as time
spent on the website and the number of items in the cart. It determines that, in the past, if visitors
spent more than five minutes on the site and added more than three items to the cart, they clicked
the checkout button. Using this information, the logistic regression function can then predict the
behavior of a new website visitor.

Why is logistic regression important?

Logistic regression is an important technique in the field of artificial intelligence and machine
learning (AI/ML). ML models are software programs that you can train to perform complex data
processing tasks without human intervention. ML models built using logistic regression help
organizations gain actionable insights from their business data. They can use these insights for
predictive analysis to reduce operational costs, increase efficiency, and scale faster. For example,
businesses can uncover patterns that improve employee retention or lead to more profitable product
design.

Below, we list some benefits of using logistic regression over other ML techniques.

Simplicity

Logistic regression models are mathematically less complex than other ML methods. Therefore,
you can implement them even if no one on your team has in-depth ML expertise.

Speed

Logistic regression models can process large volumes of data at high speed because they require
less computational capacity, such as memory and processing power. This makes them ideal for
organizations that are starting with ML projects to gain some quick wins.

Flexibility

You can use logistic regression to find answers to questions that have two or more finite outcomes.
You can also use it to preprocess data. For example, you can sort data with a large range of values,
such as bank transactions, into a smaller, finite range of values by using logistic regression. You
can then process this smaller data set by using other ML techniques for more accurate analysis.

Visibility

Logistic regression analysis gives developers greater visibility into internal software processes
than do other data analysis techniques. Troubleshooting and error correction are also easier
because the calculations are less complex.

What are the applications of logistic regression?

Logistic regression has several real-world applications in many different industries.

Manufacturing

Manufacturing companies use logistic regression analysis to estimate the probability of part failure
in machinery. They then plan maintenance schedules based on this estimate to minimize future
failures.

Healthcare

Medical researchers plan preventive care and treatment by predicting the likelihood of disease in
patients. They use logistic regression models to compare the impact of family history or genes on
diseases.

Finance

Financial companies have to analyze financial transactions for fraud and assess loan applications
and insurance applications for risk. These problems are suitable for a logistic regression model
because they have discrete outcomes, like high risk or low risk and fraudulent or not fraudulent.

Marketing

Online advertising tools use the logistic regression model to predict if users will click on an
advertisement. As a result, marketers can analyze user responses to different words and images
and create high-performing advertisements with which customers will engage.

How does regression analysis work?

Logistic regression is one of several different regression analysis techniques that data scientists
commonly use in machine learning (ML). To understand logistic regression, we must first
understand basic regression analysis. Below, we use an example of linear regression analysis to
demonstrate how regression analysis works.

Identify the question

Any data analysis begins with a business question. For logistic regression, you should frame the
question to get particular outcomes:
 Do rainy days impact our monthly sales? (yes or no)

 What type of credit card activity is the customer performing? (authorized, fraudulent, or
potentially fraudulent)

Collect historical data

After identifying the question, you need to identify the data factors that are involved. You will
then collect past data for all factors. For example, to answer the first question shown above, you
could collect the number of rainy days and your monthly sales data for each month in the past three
years.

Train the regression analysis model

You will process the historical data using regression software. The software will process the
different data points and connect them mathematically by using equations. For example, if the
number of rainy days for three months are 3, 5, and 8 and the number of sales in those months are
8, 12, and 18, the regression algorithm will connect the factors with the equation:

Number of Sales = 2*(Number of Rainy Days) + 2

Make predictions for unknown values

For unknown values, the software uses the equation to make a prediction. If you know that it will
rain for six days in July, the software will estimate July’s sale value as 14.
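The rainy-day example can be reproduced with an ordinary least-squares fit. The sketch below uses only the three data points given in the text; the `fit_line` helper is a hypothetical name introduced for illustration, and real regression software would do the same computation on far more data.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


rainy_days = [3, 5, 8]
sales = [8, 12, 18]

slope, intercept = fit_line(rainy_days, sales)
july_estimate = slope * 6 + intercept

print(round(slope, 6), round(intercept, 6), round(july_estimate, 6))
```

Because the three points lie exactly on one line, the fitted slope and intercept match the equation in the text, and six rainy days give the same estimate for July.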

How does the logistic regression model work?

To understand the logistic regression model, let’s first understand equations and variables.

Equations

In mathematics, equations give the relationship between two variables: x and y. You can use these
equations, or functions, to plot a graph along the x-axis and y-axis by putting in different values
of x and y. For instance, if you plot the graph for the function y = 2*x, you will get a straight line
as shown below. Hence this function is also called a linear function.

Variables

In statistics, variables are the data factors or attributes whose values vary. For any analysis, certain
variables are independent or explanatory variables. These attributes are the cause of an outcome.
Other variables are dependent or response variables; their values depend on the independent
variables. In general, logistic regression explores how independent variables affect one dependent
variable by looking at historical data values of both variables.

In our example above, x is called the independent variable, predictor variable, or explanatory
variable because it has a known value. Y is called the dependent variable, outcome variable, or
response variable because its value is unknown.

Logistic regression function

Logistic regression is a statistical model that uses the logistic function in mathematics as the
equation between x and y. The logistic function maps y as a sigmoid function of x:

f(x) = 1 / (1 + e^(-x))

(The logit function is the inverse of the logistic function; it is covered under log odds.)

If you plot this logistic regression equation, you will get an S-curve.
The logistic function returns only values between 0 and 1 for the dependent variable,
irrespective of the values of the independent variable. This is how logistic regression estimates the
value of the dependent variable. Logistic regression methods also model equations between
multiple independent variables and one dependent variable.
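A quick way to see the S-curve behaviour is to evaluate the logistic function at a few points. This is a minimal sketch; the function name `logistic` and the sample inputs are chosen here for illustration only.

```python
import math


def logistic(x):
    """Logistic (sigmoid) function: maps any real x into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Sample inputs, chosen arbitrarily to show both tails and the midpoint
for x in (-6, -2, 0, 2, 6):
    print(x, round(logistic(x), 4))
```

Large negative inputs give values close to 0, large positive inputs give values close to 1, and an input of 0 gives exactly 0.5.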

Logistic regression analysis with multiple independent variables

In many cases, multiple explanatory variables affect the value of the dependent variable. To model
such input datasets, logistic regression formulas combine the different independent variables
linearly before applying the sigmoid function. You can modify the sigmoid function and compute the
final output variable as

y = f(β0 + β1x1 + β2x2 + … + βnxn)

The symbol β represents the regression coefficient. The logit model can reverse calculate these
coefficient values when you give it a sufficiently large experimental dataset with known values of
both dependent and independent variables.

Log odds

The logit model can also determine the ratio of success to failure or log odds. For example, if you
were playing poker with your friends and you won four matches out of 10, your odds of winning
are four sixths, or four out of six, which is the ratio of your success to failure. The probability of
winning, on the other hand, is four out of 10.

Mathematically, your odds in terms of probability are p/(1 - p), and your log odds are
log(p/(1 - p)). You can represent the logistic function as log odds:

log(p/(1 - p)) = β0 + β1x1 + β2x2 + … + βnxn
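The poker example can be checked numerically. The sketch below computes the probability, odds, and log odds for four wins out of 10, and then converts the log odds back into a probability with the logistic function; the variable names are illustrative.

```python
import math

wins, games = 4, 10

probability = wins / games               # four out of 10
odds = probability / (1 - probability)   # successes relative to failures
log_odds = math.log(odds)                # natural logarithm of the odds

# Applying the logistic function to the log odds recovers the probability
recovered = 1.0 / (1.0 + math.exp(-log_odds))

print(probability, round(odds, 4), round(log_odds, 4), round(recovered, 4))
```

The log odds are negative here because losing is more likely than winning; odds of exactly 1 would give log odds of 0.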

What are the types of logistic regression analysis?

There are three approaches to logistic regression analysis based on the outcomes of the dependent
variable.

Binary logistic regression

Binary logistic regression works well for binary classification problems that have only two
possible outcomes. The dependent variable can have only two values, such as yes and no or 0 and
1.

Even though the logistic function calculates a range of values between 0 and 1, the binary
regression model rounds the answer to the closest values. Generally, answers below 0.5 are
rounded to 0, and answers above 0.5 are rounded to 1, so that the logistic function returns a binary
outcome.

Multinomial logistic regression

Multinomial regression can analyze problems that have several possible outcomes as long as the
number of outcomes is finite. For example, it can predict if house prices will increase by 25%,
50%, 75%, or 100% based on population data, but it cannot predict the exact value of a house.

Multinomial logistic regression works by mapping outcome values to different values between 0
and 1. Since the logistic function can return a range of continuous data, like 0.1, 0.11, 0.12, and so
on, multinomial regression also groups the output to the closest possible values.

Ordinal logistic regression

Ordinal logistic regression, or the ordered logit model, is a special type of multinomial regression
for problems in which numbers represent ranks rather than actual values. For example, you would
use ordinal regression to predict the answer to a survey question that asks customers to rank your
service as poor, fair, good, or excellent based on a numerical value, such as the number of items
they purchase from you over the year.

How does logistic regression compare to other ML techniques?

Two other common data analysis techniques are linear regression analysis and deep learning.

Linear regression analysis

As explained above, linear regression models the relationship between dependent and independent
variables by using a linear combination. The linear regression equation is

y = β0X0 + β1X1 + β2X2 + … + βnXn + ε, where β0 to βn are regression coefficients and ε is the error term.

Logistic regression vs. linear regression

Linear regression predicts a continuous dependent variable by using a given set of independent
variables. A continuous variable can have a range of values, such as price or age. So linear
regression can predict actual values of the dependent variable. It can answer questions like "What
will the price of rice be after 10 years?"

Unlike linear regression, logistic regression is a classification algorithm. It cannot predict actual
values for continuous data. It can answer questions like "Will the price of rice increase by 50% in
10 years?"

https://www.geeksforgeeks.org/understanding-logistic-regression/

What is Logistic Regression?

Logistic regression is used for binary classification. It applies the sigmoid function, which
takes the independent variables as input and produces a probability value between 0 and 1.
For example, given two classes, Class 0 and Class 1, if the value of the logistic function for
an input is greater than 0.5 (the threshold value), the input belongs to Class 1; otherwise it
belongs to Class 0. It is referred to as regression because it is an extension of linear
regression, but it is mainly used for classification problems.
Key Points:
 Logistic regression predicts the output of a categorical dependent variable. Therefore, the
outcome must be a categorical or discrete value.
 It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving the exact value as
0 or 1, it gives probabilistic values that lie between 0 and 1.
 In logistic regression, instead of fitting a regression line, we fit an “S”-shaped logistic
function, whose output approaches two limiting values (0 and 1).


Logistic Function – Sigmoid Function

 The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
 It maps any real value into another value within a range of 0 and 1. The output of logistic
regression must stay between 0 and 1 and cannot go beyond these limits, so it forms an
“S”-shaped curve.
 The S-form curve is called the sigmoid function or the logistic function.
 In logistic regression, we use the concept of a threshold value to decide between the
outcomes 0 and 1: values above the threshold are assigned to 1, and values below the
threshold are assigned to 0.

Types of Logistic Regression

On the basis of the categories, Logistic Regression can be classified into three types:
1. Binomial: In binomial logistic regression, the dependent variable can have only two possible
values, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial logistic regression, the dependent variable can have 3 or more
possible unordered values, such as “cat”, “dog”, or “sheep”.
3. Ordinal: In ordinal logistic regression, the dependent variable can have 3 or more possible
ordered values, such as “Low”, “Medium”, or “High”.
Assumptions of Logistic Regression
We will explore the assumptions of logistic regression, as understanding them is important to
ensure that we are applying the model appropriately. The assumptions include:
1. Independent observations: Each observation is independent of the others, meaning that one
observation does not influence another.
2. Binary dependent variable: The dependent variable is assumed to be binary or dichotomous,
meaning it can take only two values. For more than two categories, softmax functions are used.
3. Linear relationship between independent variables and log odds: The relationship
between the independent variables and the log odds of the dependent variable should be
linear.
4. No outliers: There should be no outliers in the dataset.
5. Large sample size: The sample size is sufficiently large.
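The second assumption mentions that softmax functions are used when there are more than two categories. As a rough sketch of what that means, the function below turns one score per class into probabilities that sum to 1. The class labels and scores are hypothetical values chosen for illustration; in a fitted model the scores would come from the learned coefficients.

```python
import math


def softmax(scores):
    """Convert one real-valued score per class into probabilities summing to 1."""
    shift = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


labels = ["cat", "dog", "sheep"]   # hypothetical classes
scores = [2.0, 1.0, 0.1]           # hypothetical per-class scores

probabilities = softmax(scores)
predicted = labels[probabilities.index(max(probabilities))]

for label, probability in zip(labels, probabilities):
    print(label, round(probability, 4))
print("predicted:", predicted)
```

With only two classes, softmax reduces to the sigmoid function applied to the difference of the two scores.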

Terminologies involved in Logistic Regression

Here are some common terms involved in logistic regression:
 Independent variables: The input characteristics or predictor factors used to predict the
dependent variable.
 Dependent variable: The target variable in a logistic regression model, which we are trying
to predict.
 Logistic function: The formula used to represent how the independent and dependent
variables relate to one another. The logistic function transforms the input variables into a
probability value between 0 and 1, which represents the likelihood of the dependent variable
being 1 or 0.
 Odds: The ratio of something occurring to something not occurring. Odds differ from
probability, which is the ratio of something occurring to everything that could
possibly occur.
 Log-odds: The log-odds, also known as the logit function, is the natural logarithm of the
odds. In logistic regression, the log odds of the dependent variable are modeled as a linear
combination of the independent variables and the intercept.
 Coefficient: The logistic regression model’s estimated parameters, which show how the
independent and dependent variables relate to one another.
 Intercept: A constant term in the logistic regression model, which represents the log odds
when all independent variables are equal to zero.
 Maximum likelihood estimation: The method used to estimate the coefficients of the
logistic regression model, which maximizes the likelihood of observing the data given the
model.
How does Logistic Regression work?
The logistic regression model transforms the continuous output of the linear regression function
into a categorical output using a sigmoid function, which maps any real-valued combination of the
independent variables to a value between 0 and 1. This function is known as the logistic
function.
Let the independent input features be:
$$X = \begin{bmatrix} x_{11} & \dots & x_{1m} \\ x_{21} & \dots & x_{2m} \\ \vdots & \ddots & \vdots \\ x_{n1} & \dots & x_{nm} \end{bmatrix}$$
and let the dependent variable Y take only binary values, i.e. 0 or 1:
$$Y = \begin{cases} 0 & \text{if Class 1} \\ 1 & \text{if Class 2} \end{cases}$$
Then apply the multi-linear function to the input variables X:
$$z = \left(\sum_{i=1}^{n} w_i x_i\right) + b$$
Here $x_i$ is the ith observation of X, $w_i = [w_1, w_2, w_3, \cdots, w_m]$ is the weights
or coefficients, and b is the bias term, also known as the intercept. More simply, this can be
written as the dot product of the weights and the inputs, plus the bias:
$$z = w \cdot X + b$$
Everything discussed so far is linear regression.

Sigmoid Function
Now we use the sigmoid function, with z as the input, to find a probability between 0
and 1, i.e. the predicted y.
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
[Figure: Sigmoid function]

The sigmoid function converts continuous input into a probability, i.e. a value
between 0 and 1.
 $\sigma(z)$ tends towards 1 as $z \to \infty$
 $\sigma(z)$ tends towards 0 as $z \to -\infty$
 $\sigma(z)$ is always bounded between 0 and 1
where the probability of belonging to each class can be measured as:
$$P(y=1) = \sigma(z), \qquad P(y=0) = 1 - \sigma(z)$$
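The two steps, a linear score followed by the sigmoid, can be sketched for a small batch of observations. The weights, bias, and feature rows below are hypothetical numbers picked for illustration, not fitted values, and `linear_score` is a helper name introduced here.

```python
import math


def sigmoid(z):
    """Sigmoid function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(weights, features, bias):
    """z = w . x + b for a single observation."""
    return sum(w * x for w, x in zip(weights, features)) + bias


weights = [0.8, -0.4]   # hypothetical coefficients, one per feature
bias = -0.5             # hypothetical intercept
X = [[1.0, 2.0], [3.0, 0.5], [0.0, 0.0]]   # three observations, two features each

for row in X:
    z = linear_score(weights, row, bias)
    p_class_1 = sigmoid(z)
    p_class_0 = 1.0 - p_class_1
    print(row, round(z, 3), round(p_class_1, 3), round(p_class_0, 3))
```

A positive score z gives P(y=1) above 0.5 and a negative score gives P(y=1) below 0.5, which is why a score of zero corresponds to the default decision boundary.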
Logistic Regression Equation
The odds are the ratio of something occurring to something not occurring. Odds differ from
probability, which is the ratio of something occurring to everything that could possibly
occur. So the odds will be:
$$\frac{p(x)}{1 - p(x)} = e^{z}$$
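The step from the sigmoid to the odds can be written out explicitly. The short derivation below uses only the definitions already given and assumes that p(x) denotes σ(z), the predicted probability of class 1:

```latex
\begin{align*}
p(x) &= \sigma(z) = \frac{1}{1 + e^{-z}} \\
1 - p(x) &= \frac{e^{-z}}{1 + e^{-z}} \\
\frac{p(x)}{1 - p(x)} &= \frac{1}{e^{-z}} = e^{z} \\
\ln\!\left(\frac{p(x)}{1 - p(x)}\right) &= z = w \cdot X + b
\end{align*}
```

The last line shows why the log odds are a linear function of the inputs, which is the linearity assumption listed earlier.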

Precision-Recall Tradeoff in Logistic Regression Threshold Setting

Logistic regression becomes a classification technique only when a decision threshold is
brought into the picture. Setting the threshold value is a very important aspect of logistic
regression and depends on the classification problem itself.
The choice of threshold is mainly driven by the values of precision and recall. Ideally, we
want both precision and recall to be 1, but this is seldom the case.
In the case of a precision-recall tradeoff, we use the following arguments to decide upon the
threshold:
1. Low Precision/High Recall: In applications where we want to reduce the number of false
negatives without necessarily reducing the number of false positives, we choose a decision
value that has a low value of precision or a high value of recall. For example, in a cancer
diagnosis application, we do not want any affected patient to be classified as not affected,
even at the cost of some patients being wrongly diagnosed with cancer. This is because the
absence of cancer can be confirmed by further medical tests, but the presence of the
disease cannot be detected in an already rejected candidate.
2. High Precision/Low Recall: In applications where we want to reduce the number of false
positives without necessarily reducing the number of false negatives, we choose a decision
value that has a high value of precision or a low value of recall. For example, if we are
classifying whether customers will react positively or negatively to a personalized
advertisement, we want to be absolutely sure that the customer will react positively,
because otherwise a negative reaction can cause a loss of potential sales from
the customer.
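The effect of the threshold on precision and recall can be seen on a small example. The labels and predicted probabilities below are made-up values for illustration, and treating precision or recall as 1.0 when its denominator is zero is a convention assumed for this sketch.

```python
def precision_recall(y_true, y_prob, threshold):
    """Precision and recall when predicting class 1 for probabilities >= threshold."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 1:
            fn += 1
    # Assumed convention: an undefined ratio (zero denominator) is reported as 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Hypothetical true labels and model probabilities
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.30, 0.35, 0.45, 0.55, 0.70, 0.80, 0.90]

for threshold in (0.3, 0.5, 0.85):
    precision, recall = precision_recall(y_true, y_prob, threshold)
    print(threshold, round(precision, 3), round(recall, 3))
```

Lowering the threshold catches every positive case at the cost of more false positives (the cancer-screening setting), while raising it makes positive predictions more trustworthy but misses more true positives (the advertising setting).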
Differences Between Linear and Logistic Regression
The difference between linear regression and logistic regression is that linear regression
outputs a continuous value that can be anything, while logistic regression predicts the
probability that an instance belongs to a given class.

| Linear Regression | Logistic Regression |
|---|---|
| Used to predict a continuous dependent variable using a given set of independent variables. | Used to predict a categorical dependent variable using a given set of independent variables. |
| Used for solving regression problems. | Used for solving classification problems. |
| Predicts the value of continuous variables. | Predicts the values of categorical variables. |
| Finds the best-fit line. | Finds the S-curve. |
| The least squares estimation method is used to estimate the parameters. | The maximum likelihood estimation method is used to estimate the parameters. |
| The output must be a continuous value, such as price, age, etc. | The output must be a categorical value, such as 0 or 1, Yes or No, etc. |
| Requires a linear relationship between the dependent and independent variables. | Does not require a linear relationship between the dependent and independent variables. |
| There may be collinearity between the independent variables. | There should be little to no collinearity between the independent variables. |

https://www.ibm.com/topics/logistic-regression

What is logistic regression?

Logistic regression estimates the probability of an event occurring, such as voted or didn’t vote,
based on a given data set of independent variables.

This type of statistical model (also known as logit model) is often used for classification and
predictive analytics. Since the outcome is a probability, the dependent variable is bounded between
0 and 1. In logistic regression, a logit transformation is applied on the odds—that is, the probability
of success divided by the probability of failure. This is also commonly known as the log odds, or
the natural logarithm of odds, and this logistic function is represented by the following formulas:

pi = 1/(1 + exp(-(Beta_0 + Beta_1*X_1 + … + Beta_k*X_k)))

logit(pi) = ln(pi/(1-pi)) = Beta_0 + Beta_1*X_1 + … + Beta_k*X_k

In this logistic regression equation, logit(pi) is the dependent or response variable and X is the
independent variable. The beta parameter, or coefficient, in this model is commonly estimated via
maximum likelihood estimation (MLE). This method tests different values of beta through
multiple iterations to optimize for the best fit of log odds. All of these iterations produce the log
likelihood function, and logistic regression seeks to maximize this function to find the best
parameter estimate. Once the optimal coefficient (or coefficients if there is more than one
independent variable) is found, the conditional probability for each observation can be calculated
to yield a predicted probability. For binary classification, a probability less than .5 will
predict 0 while a probability greater than .5 will predict 1. After the model has been computed,
it’s best practice to evaluate how well the model predicts the dependent variable, which is
called goodness of fit. The Hosmer–Lemeshow test is a popular method to assess model fit.
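The idea of testing different beta values over multiple iterations to maximize the log likelihood can be sketched with plain gradient ascent. This is only one possible optimizer, assumed here for illustration (statistical packages typically use more sophisticated methods), and the data, learning rate, and iteration count are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(beta_0, beta_1, xs, ys):
    """Sum of the logged conditional probabilities of the observed outcomes."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(beta_0 + beta_1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit(xs, ys, learning_rate=0.1, iterations=2000):
    """Gradient ascent on the log likelihood for one independent variable."""
    beta_0 = beta_1 = 0.0
    for _ in range(iterations):
        grad_0 = grad_1 = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(beta_0 + beta_1 * x)
            grad_0 += error
            grad_1 += error * x
        beta_0 += learning_rate * grad_0 / len(xs)
        beta_1 += learning_rate * grad_1 / len(xs)
    return beta_0, beta_1


# Hypothetical data: one predictor and a binary outcome with some overlap
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

beta_0, beta_1 = fit(xs, ys)
print("log likelihood at start:", round(log_likelihood(0.0, 0.0, xs, ys), 4))
print("log likelihood after fit:", round(log_likelihood(beta_0, beta_1, xs, ys), 4))
print("predicted class for x = 4.0:", 1 if sigmoid(beta_0 + beta_1 * 4.0) > 0.5 else 0)
```

Each iteration nudges the coefficients in the direction that increases the log likelihood, so the fitted value is higher than the starting value at beta = 0.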

Linear regression vs logistic regression

Both linear and logistic regression are among the most popular models within data science, and
open-source tools, like Python and R, make the computation for them quick and easy.

Linear regression models are used to identify the relationship between a continuous dependent
variable and one or more independent variables. When there is only one independent variable and
one dependent variable, it is known as simple linear regression, but as the number of independent
variables increases, it is referred to as multiple linear regression. For each type of linear regression,
it seeks to plot a line of best fit through a set of data points, which is typically calculated using the
least squares method.

Similar to linear regression, logistic regression is also used to estimate the relationship between a
dependent variable and one or more independent variables, but it is used to make a prediction about
a categorical variable versus a continuous one. A categorical variable can be true or false, yes or
no, 1 or 0, et cetera. The unit of measure also differs from linear regression as it produces a
probability, but the logit function transforms the S-curve into a straight line.

While both models are used in regression analysis to make predictions about future outcomes,
linear regression is typically easier to understand. Linear regression also does not require as
large a sample size, because logistic regression needs an adequate sample to represent values
across all the response categories. Without a larger, representative sample, the model may not
have sufficient statistical power to detect a significant effect.

Types of logistic regression

There are three types of logistic regression models, which are defined based on categorical
response.

 Binary logistic regression: In this approach, the response or dependent variable is

dichotomous in nature—i.e. it has only two possible outcomes (e.g. 0 or 1). Some popular
examples of its use include predicting if an e-mail is spam or not spam or if a tumor is
malignant or not malignant. Within logistic regression, this is the most commonly used
approach, and more generally, it is one of the most common classifiers for binary
classification.
 Multinomial logistic regression: In this type of logistic regression model, the dependent
variable has three or more possible outcomes; however, these values have no specified
order. For example, movie studios want to predict what genre of film a moviegoer is likely
to see to market films more effectively. A multinomial logistic regression model can help
the studio to determine the strength of influence a person's age, gender, and dating status
may have on the type of film that they prefer. The studio can then orient an advertising
campaign of a specific movie toward a group of people likely to go see it.
 Ordinal logistic regression: This type of logistic regression model is leveraged when the
response variable has three or more possible outcomes, but in this case, these values do have
a defined order. Examples of ordinal responses include grading scales from A to F or rating
scales from 1 to 5.
Use cases of logistic regression

Logistic regression is commonly used for prediction and classification problems. Some of these
use cases include:
 Fraud detection: Logistic regression models can help teams identify data anomalies,
which are predictive of fraud. Certain behaviors or characteristics may have a higher
association with fraudulent activities, which is particularly helpful to banking and other
financial institutions in protecting their clients. SaaS-based companies have also started to
adopt these practices to eliminate fake user accounts from their datasets when conducting
data analysis around business performance.
 Disease prediction: In medicine, this analytics approach can be used to predict the
likelihood of disease or illness for a given population. Healthcare organizations can set up
preventative care for individuals that show higher propensity for specific illnesses.
 Churn prediction: Specific behaviors may be indicative of churn in different functions of
an organization. For example, human resources and management teams may want to know
if there are high performers within the company who are at risk of leaving the organization;
this type of insight can prompt conversations to understand problem areas within the
company, such as culture or compensation. Alternatively, the sales organization may want
to learn which of their clients are at risk of taking their business elsewhere. This can prompt
teams to set up a retention strategy to avoid lost revenue.

https://www.geeksforgeeks.org/types-of-regression-techniques/

Types of Regression Techniques

Along with the development of the machine learning domain, regression analysis techniques
have gained popularity and have developed far beyond just y = mx + c. There are several
types of regression techniques, each suited for different types of data and different types of
relationships. The main types of regression techniques are:

1. Linear Regression

2. Polynomial Regression

3. Stepwise Regression

4. Decision Tree Regression

5. Random Forest Regression

6. Support Vector Regression

7. Ridge Regression

8. Lasso Regression

9. ElasticNet Regression

10. Bayesian Linear Regression

Linear Regression

Linear regression is used for predictive analysis. Linear regression is a linear approach for

modeling the relationship between the criterion or the scalar response and the multiple

predictors or explanatory variables. Linear regression focuses on the conditional probability

distribution of the response given the values of the predictors. For linear regression, there is a

danger of overfitting. The formula for linear regression is:

Syntax:

y = θx + b

where,

 θ – It is the model weights or parameters

 b – It is known as the bias.

This is the most basic form of regression analysis and is used to model a linear relationship

between a single dependent variable and one or more independent variables.

Here, a linear regression model is instantiated to fit a linear relationship between input features

(X) and target values (y). This code is used for simple demonstration of the approach.

Python
from sklearn.linear_model import LinearRegression

# Create a linear regression model

model = LinearRegression()

# Fit the model to the data

model.fit(X, y)

# Predict the response for a new data point

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing a linear

regression model for predictive modeling tasks.

Polynomial Regression

This is an extension of linear regression and is used to model a non-linear relationship between
the dependent variable and the independent variables. The syntax remains the same, but the
input variables now also include polynomial (higher-degree) terms of some existing features.
Linear regression alone can only fit a linear model to the data at hand, but with polynomial
features we can fit a non-linear relationship between the target and the input features.

Here is the code for simple demonstration of the Polynomial regression approach.
Python

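from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Create a polynomial regression model: expand the features to degree 2,
# then fit an ordinary linear regression on the expanded features
# (scikit-learn has no PolynomialRegression class)
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

# Fit the model to the data
model.fit(X, y)

# Predict the response for a new data point
y_pred = model.predict(X_new)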

Note: This code demonstrates the basic workflow of creating, training, and utilizing a

Polynomial regression model for predictive modeling tasks.

Stepwise Regression

Stepwise regression builds a regression model by automatically choosing which predictive
variables to include. At each step, a variable is added to or removed from the set of explanatory
variables. The approaches for stepwise regression are forward selection, backward elimination,
and bidirectional elimination.

Here is the code for simple demonstration of the stepwise regression approach.

Python

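from sklearn.feature_selection import SequentialFeatureSelector

from sklearn.linear_model import LinearRegression

from sklearn.pipeline import make_pipeline

# Create a stepwise regression model. scikit-learn has no StepwiseLinearRegression

# class; forward sequential feature selection followed by a linear fit is used

# here as the closest equivalent (direction="backward" gives backward elimination).

model = make_pipeline(
    SequentialFeatureSelector(LinearRegression(), n_features_to_select="auto",
                              direction="forward"),
    LinearRegression(),
)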

# Fit the model to the data

model.fit(X, y)

# Predict the response for a new data point

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Stepwise

regression model for predictive modeling tasks.

Decision Tree Regression

A decision tree is a powerful and popular tool for classification and prediction.

A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an

attribute, each branch represents an outcome of the test, and each leaf node (terminal node)

holds a class label or, for regression, a predicted numeric value. Decision tree regression is a

non-parametric method that uses such a tree to predict a continuous outcome.

Here is the code for simple demonstration of the Decision Tree regression approach.

Python

from sklearn.tree import DecisionTreeRegressor

# Create a decision tree regression model

model = DecisionTreeRegressor()

# Fit the model to the data

model.fit(X, y)

# Predict the response for a new data point

y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and utilizing a Decision

Tree regression model for predictive modeling tasks.

Random Forest Regression

Random Forest is an ensemble technique capable of performing both regression and

classification tasks with the use of multiple decision trees and a technique called Bootstrap and

Aggregation, commonly known as bagging. The basic idea behind this is to combine multiple

decision trees in determining the final output rather than relying on individual decision trees.

Random Forest has multiple decision trees as base learning models. We randomly perform row

sampling and feature sampling from the dataset forming sample datasets for every model. This

part is called Bootstrap.

Here is the code for simple demonstration of the Random Forest regression approach.

Python

from sklearn.ensemble import RandomForestRegressor

# Create a random forest regression model

model = RandomForestRegressor(n_estimators=100)

# Fit the model to the data

model.fit(X, y)
# Predict the response for a new data point

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Random

Forest regression model for predictive modeling tasks.
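
The aggregation step described above (combining many trees rather than relying on one) can be checked directly. The sketch below assumes synthetic data and a small forest of 20 trees, both chosen only for illustration; it averages the predictions of the individual fitted trees and compares the result with the forest's own prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=200)

# Each tree is trained on its own bootstrap sample of the rows.
model = RandomForestRegressor(n_estimators=20, random_state=0)
model.fit(X, y)

X_new = np.array([[1.0, 2.0], [-2.0, 0.5]])

# Aggregation: every fitted tree predicts, and the forest averages the results.
per_tree = np.array([tree.predict(X_new) for tree in model.estimators_])
manual_average = per_tree.mean(axis=0)
forest_pred = model.predict(X_new)

print("per-tree predictions shape:", per_tree.shape)
print("average matches forest:", np.allclose(manual_average, forest_pred))
```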

Support Vector Regression (SVR)

Support vector regression (SVR) is a type of support vector machine (SVM) that is used for

regression tasks. It tries to find a function that best predicts the continuous output value for a

given input value.

SVR can use both linear and non-linear kernels. A linear kernel is a simple dot product between

two input vectors, while a non-linear kernel is a more complex function that can capture more

intricate patterns in the data. The choice of kernel depends on the data’s characteristics and the

task’s complexity.

Here is the code for simple demonstration of the Support vector regression approach.

Python

from sklearn.svm import SVR

# Create a support vector regression model

model = SVR(kernel='linear')

# Fit the model to the data

model.fit(X, y)

# Predict the response for a new data point

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Support

vector regression model for predictive modeling tasks.
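
To make the kernel choice concrete, the following sketch fits SVR with a linear and with an RBF (non-linear) kernel on a sine curve. The data and the use of default hyperparameters are assumptions for illustration, and the scores are computed on the training data only, so they show fit quality rather than generalization.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical non-linear data: one period of a sine wave.
X = np.linspace(0.0, 2.0 * np.pi, 100).reshape(-1, 1)
y = np.sin(X).ravel()

# Same estimator, two different kernels.
linear_svr = SVR(kernel="linear").fit(X, y)
rbf_svr = SVR(kernel="rbf").fit(X, y)

# R^2 on the training data for each kernel.
linear_r2 = linear_svr.score(X, y)
rbf_r2 = rbf_svr.score(X, y)

print(f"linear kernel R^2: {linear_r2:.3f}")
print(f"RBF kernel R^2:    {rbf_r2:.3f}")
```

On data like this, a straight line cannot follow the curve, so the non-linear kernel should score noticeably higher.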

Ridge Regression

Ridge regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity

occurs, least squares estimates are unbiased, but their variances are large, so they may be far from the true values. Ridge is a regularized linear regression model: it

reduces the model complexity by adding an L2 penalty term to the cost function. A degree of

bias is added to the regression estimates, and as a result, ridge regression reduces the standard

errors.

Here is the code for simple demonstration of the Ridge regression approach.

Python
from sklearn.linear_model import Ridge

# Create a ridge regression model

model = Ridge(alpha=0.1)

# Fit the model to the data

model.fit(X, y)

# Predict the response for a new data point

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Ridge

regression model for predictive modeling tasks.

Lasso Regression

Lasso regression is a regression analysis method that performs both variable selection

and regularization. Lasso regression uses soft thresholding. Lasso regression selects only a

subset of the provided covariates for use in the final model.

This is another regularized linear regression model. It also works by adding a penalty term (the L1 norm of the coefficients) to the

cost function, but it tends to zero out some features’ coefficients, which makes it useful for

feature selection.

Here is the code for simple demonstration of the Lasso regression approach.
Python

from sklearn.linear_model import Lasso

# Create a lasso regression model

model = Lasso(alpha=0.1)

# Fit the model to the data

model.fit(X, y)

# Predict the response for a new data point

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Lasso

regression model for predictive modeling tasks.

ElasticNet Regression

Linear regression can overfit and handles collinear data poorly, especially when there are

many features in the dataset and some of them are not relevant to the predictive model.

The model then becomes more complex and predicts poorly on the test set

(overfitting). Such a high-variance model does not generalize to new data. To deal

with these issues, Elastic Net includes both L-2 and L-1 norm regularization to get the benefits of both
Ridge and Lasso at the same time. The resultant model often has better predictive power than Lasso alone.

It performs feature selection and also makes the hypothesis simpler. The modified cost function

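for Elastic-Net Regression, in its standard squared-error form, is given below:

$$J(w) = \frac{1}{m}\left[\sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)^{2} + \lambda_1 \sum_{j=1}^{n}\left|w_j\right| + \lambda_2 \sum_{j=1}^{n} w_j^{2}\right]$$

where,

• w_j represents the weight for the jth feature.

• n is the number of features in the dataset.

• m is the number of training examples, y^(i) is the target of the ith example and ŷ^(i) is the model's prediction for it.

• lambda1 is the regularization strength for the L1 norm.

• lambda2 is the regularization strength for the L2 norm.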

Here is the code for simple demonstration of the Elasticnet regression approach.

Python

from sklearn.linear_model import ElasticNet

# Create an elastic net regression model

model = ElasticNet(alpha=0.1, l1_ratio=0.5)

# Fit the model to the data

model.fit(X, y)

# Predict the response for a new data point

y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and utilizing an Elastic

Net regression model for predictive modeling tasks.
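
The snippet uses `alpha` and `l1_ratio` rather than lambda1 and lambda2. As far as scikit-learn's documented objective goes, `ElasticNet` minimizes (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2, so in that scaling lambda1 corresponds to alpha * l1_ratio and lambda2 to 0.5 * alpha * (1 - l1_ratio). The sketch below, on assumed synthetic data and with the intercept switched off for simplicity, evaluates this objective by hand and checks that random perturbations of the fitted weights do not lower it.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Hypothetical synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

alpha, l1_ratio = 0.1, 0.5
# fit_intercept=False keeps the hand-written objective simple;
# a tight tolerance makes the fitted weights very close to the true minimum.
model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False,
                   tol=1e-10, max_iter=100_000).fit(X, y)


def objective(w):
    """Elastic net cost in scikit-learn's scaling (no intercept)."""
    n_samples = X.shape[0]
    loss = np.sum((y - X @ w) ** 2) / (2 * n_samples)
    l1_penalty = alpha * l1_ratio * np.sum(np.abs(w))
    l2_penalty = 0.5 * alpha * (1 - l1_ratio) * np.sum(w ** 2)
    return loss + l1_penalty + l2_penalty


best = objective(model.coef_)
perturbed = [objective(model.coef_ + rng.normal(scale=0.05, size=5))
             for _ in range(20)]

print("objective at fitted weights:", best)
print("no perturbation is better:", best <= min(perturbed))
```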

Bayesian Linear Regression

As the name suggests, this algorithm is based on Bayes' theorem. For this reason,

the least squares method is not used to obtain a single point estimate of the coefficients of the regression

model. Instead, the technique used here to find the model weights and parameters places a prior on them and relies on

their posterior distribution, which provides an extra stability factor to a regression model

based on this technique.

Here is the code for simple demonstration of the Bayesian Linear regression approach.

Python

from sklearn.linear_model import BayesianRidge

# Create a Bayesian linear regression model (scikit-learn provides it as BayesianRidge)

model = BayesianRidge()

# Fit the model to the data

model.fit(X, y)

# Predict the response for a new data point

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Bayesian

linear regression model for predictive modeling tasks.
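
One practical consequence of working with a posterior distribution is that predictions come with an uncertainty estimate. The sketch below uses scikit-learn's `BayesianRidge` with `predict(..., return_std=True)`; the noisy linear data is an assumption for illustration only.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Hypothetical noisy linear data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.5, size=50)

model = BayesianRidge().fit(X, y)

# One point inside the training range and one far outside it.
X_new = np.array([[5.0], [50.0]])

# return_std=True also returns the standard deviation of the
# predictive distribution for each query point.
y_mean, y_std = model.predict(X_new, return_std=True)

for x, mean, std in zip(X_new.ravel(), y_mean, y_std):
    print(f"x={x:5.1f}  predicted mean={mean:7.2f}  std={std:.3f}")
```

The reported standard deviation is larger for the point far from the training data, which is the kind of information a plain least squares fit does not provide.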

https://www.appier.com/en/blog/5-types-of-regression-analysis-and-when-to-use-them#:~:text=Lasso%20regression%20Like%20ridge%20regression,not%20happen%20with%20ridge%20regression.

Regression analysis is an incredibly powerful machine learning tool used for analyzing data. Here
we will explore how it works, what the main types are and what it can do for your business.

What Is Regression in Machine Learning?

Regression analysis is a way of modeling the relationship between a dependent (target) variable and
one or more independent variables (also known as predictors), so that the target can be predicted. For example, it can be used to
predict the relationship between reckless driving and the total number of road accidents caused
by a driver, or, to use a business example, the effect on sales of spending a certain amount of
money on advertising.

Regression is one of the most common models of machine learning. It differs from classification
models because it estimates a numerical value, whereas classification models identify which
category an observation belongs to.

The main uses of regression analysis are forecasting, time series modeling and finding the cause
and effect relationship between variables.

Why Is It Important?
Regression has a wide range of real-life applications. It is essential for any machine learning
problem that involves continuous numbers. Examples include, but are not limited to:

Financial forecasting (like house price estimates, or stock prices)

Sales and promotions forecasting

Testing automobiles

Weather analysis and prediction

Time series forecasting

As well as telling you whether a significant relationship exists between two or more variables,
regression analysis can give specific details about that relationship. Specifically, it can estimate
the strength of impact that multiple variables will have on a dependent variable. If you change
the value of one variable (price, say), regression analysis should tell you what effect that will
have on the dependent variable (sales).

Businesses can use regression analysis to test the effects of variables as measured on different
scales. With it in your toolbox, you can assess the best set of variables to use when building
predictive models, greatly increasing the accuracy of your forecasting.

Finally, regression analysis is a standard way of solving regression problems in machine learning
using data modeling. By plotting data points on a chart and running the best fit line through
them, you can see each data point’s prediction error: the further away from the line a point
lies, the higher its error of prediction (this best fit line is also known as a regression line).
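
The idea of measuring how far each point lies from the best fit line can be sketched in a few lines. The advertising-spend and sales figures below are made-up numbers used only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: advertising spend (X) and resulting sales (y).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit the best fit (regression) line.
model = LinearRegression().fit(X, y)

# Residual = actual value minus the value on the line.
residuals = y - model.predict(X)

for spend, actual, err in zip(X.ravel(), y, residuals):
    print(f"spend={spend:.1f}  sales={actual:.1f}  residual={err:+.3f}")

# The point with the largest absolute residual is the one the line fits worst.
worst = int(np.argmax(np.abs(residuals)))
print("worst-fitted point index:", worst)
```

For a least squares line with an intercept, the residuals sum to zero, so points above and below the line balance out.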

What Are the Different Types of Regression?

1. Linear regression
One of the most basic types of regression in machine learning, linear regression comprises a
predictor variable and a dependent variable related to each other in a linear fashion. Linear
regression involves the use of a best fit line, as described above.

You should use linear regression when your variables are related linearly, for example when you
are forecasting the effect of increased advertising spend on sales. However, this analysis is
susceptible to outliers, so check for and handle outliers before relying on the fitted line.

2. Logistic regression

Does your dependent variable have a discrete value? In other words, can it only have one of
two values (either 0 or 1, true or false, black or white, spam or not spam, and so on)? In that
case, you might want to use logistic regression to analyze your data.

Logistic regression uses a sigmoid curve to show the relationship between the target and
independent variables. However, caution should be exercised: logistic regression works best
with large data sets that have an almost equal occurrence of values in target variables. The
dataset should not contain a high correlation between independent variables (a phenomenon
known as multicollinearity), as this will create a problem when ranking the variables.
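
A short sketch of the sigmoid relationship: the toy data below (hours studied versus pass or fail) is an assumption for illustration. It fits scikit-learn's `LogisticRegression` and then reproduces the predicted probabilities by applying the sigmoid function to the fitted linear score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical data: hours studied and whether the student passed (1) or not (0).
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# Probability of class 1 as reported by the model ...
proba = clf.predict_proba(X)[:, 1]
# ... and the same quantity computed by hand: sigmoid(w * x + b).
manual = sigmoid(X @ clf.coef_.ravel() + clf.intercept_[0])

print(np.round(proba, 3))
print("matches manual sigmoid:", np.allclose(proba, manual))
```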

3. Ridge regression

If, however, you do have a high correlation between independent variables, ridge regression is
a more suitable tool. It is known as a regularization technique, and is used to reduce the
complexity of the model. It introduces a small amount of bias (known as the ‘ridge regression
penalty’) which, using a bias matrix, makes the model less susceptible to overfitting.

4. Lasso regression

Like ridge regression, lasso regression is another regularization technique that reduces the
model’s complexity. It does so by penalizing the absolute size of the regression coefficients.
This can shrink some coefficient values all the way to zero, which does not happen with ridge
regression.
The advantage? It performs feature selection, letting you build the model from a subset of the features in the dataset.
By only using the required features – and setting the rest to zero – lasso
regression reduces the risk of overfitting.
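
The difference between the two penalties can be seen on a small synthetic example. In the sketch below (data and alpha values are assumptions for illustration), only the first two of five features influence the target; lasso sets the coefficients of the other three to exactly zero, while ridge only makes them small.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Hypothetical data: 5 features, but only the first two affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("lasso coefficients:", np.round(lasso.coef_, 3))
print("ridge coefficients:", np.round(ridge.coef_, 3))

# Features kept by lasso (non-zero coefficients).
selected = np.flatnonzero(lasso.coef_)
print("features selected by lasso:", selected)
```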

5. Polynomial regression

Polynomial regression models a non-linear dataset using a linear model. It is the equivalent of
making a square peg fit into a round hole. It works in a similar way to multiple linear regression
(which is just linear regression but with multiple independent variables), but uses a non-linear
curve. It is used when data points are present in a non-linear fashion.

The model transforms these data points into polynomial features of a given degree, and models
them using a linear model. This involves best fitting them using a polynomial line, which is
curved, rather than the straight line seen in linear regression. However, this model can be prone
to overfitting, so you are advised to analyze the curve towards the end to avoid odd-looking
results.
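
The transformation into polynomial features can be made concrete with scikit-learn's `PolynomialFeatures`. The sketch below, using made-up quadratic data as an assumption, first shows what the expanded features look like and then compares a straight-line fit with a degree-2 fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Each input x becomes the columns [1, x, x^2].
expanded = PolynomialFeatures(degree=2).fit_transform(np.array([[2.0], [3.0]]))
print(expanded)  # [[1. 2. 4.] [1. 3. 9.]]

# Hypothetical non-linear data: y = x^2 on a symmetric range.
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = X.ravel() ** 2

# A straight line cannot follow the curve ...
linear_r2 = LinearRegression().fit(X, y).score(X, y)

# ... but a linear model on the polynomial features can.
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_r2 = poly_model.fit(X, y).score(X, y)

print(f"linear R^2: {linear_r2:.3f}")
print(f"degree-2 R^2: {poly_r2:.3f}")
```

Raising the degree much further would let the curve chase noise in real data, which is the overfitting risk mentioned above.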
