Unit 5
https://www.ibm.com/topics/artificial-intelligence
What is AI?
Artificial intelligence, or AI, is technology that enables computers and machines to simulate
human intelligence and problem-solving capabilities.
On its own or combined with other technologies (e.g., sensors, geolocation, robotics), AI can
perform tasks that would otherwise require human intelligence or intervention. Digital assistants,
GPS guidance, autonomous vehicles, and generative AI tools (like OpenAI's ChatGPT) are just
a few examples of AI in the daily news and our daily lives.
As a field of computer science, artificial intelligence encompasses (and is often mentioned together
with) machine learning and deep learning. These disciplines involve the development of AI
algorithms, modeled after the decision-making processes of the human brain, that can ‘learn’ from
available data and make increasingly more accurate classifications or predictions over time.
Artificial intelligence has gone through many cycles of hype, but even to skeptics, the release of
ChatGPT seems to mark a turning point. The last time generative AI loomed this large, the
breakthroughs were in computer vision, but now the leap forward is in natural language processing
(NLP). Today, generative AI can learn and synthesize not just human language but other data types
including images, video, software code, and even molecular structures.
https://cloud.google.com/learn/what-is-artificial-intelligence
AI is the backbone of innovation in modern computing, unlocking value for individuals and
businesses. For example, optical character recognition (OCR) uses AI to extract text and data from
images and documents, turns unstructured content into business-ready structured data, and unlocks
valuable insights.
https://www.techtarget.com/searchenterpriseai/definition/AI-Artificial-Intelligence
What is AI?
As the hype around AI has accelerated, vendors have scrambled to promote how their products
and services incorporate it. Often, what they refer to as "AI" is a well-established technology such
as machine learning.
AI requires specialized hardware and software for writing and training machine learning
algorithms. No single programming language is used exclusively in AI, but Python, R, Java, C++
and Julia are all popular languages among AI developers.
https://en.wikipedia.org/wiki/Artificial_intelligence
Some high-profile applications of AI include advanced web search engines (e.g., Google
Search); recommendation systems (used by YouTube, Amazon, and Netflix); interacting via
human speech (e.g., Google Assistant, Siri, and Alexa); autonomous
vehicles (e.g., Waymo); generative and creative tools (e.g., ChatGPT, Apple Intelligence, and AI
art); and superhuman play and analysis in strategy games (e.g., chess and Go). However, many AI
applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications,
often without being called AI because once something becomes useful enough and common
enough it's not labeled AI anymore."[2][3]
Alan Turing was the first person to conduct substantial research in the field that he called "machine
intelligence".[4] Artificial intelligence was founded as an academic discipline in 1956,[5] by those
now considered the founding fathers of AI: John McCarthy, Marvin Minsky, Nathaniel Rochester,
and Claude Shannon.[6][7] The field went through multiple cycles of optimism,[8][9] followed by
periods of disappointment and loss of funding, known as AI winter.[10][11] Funding and interest
vastly increased after 2012 when deep learning surpassed all previous AI techniques,[12] and after
2017 with the transformer architecture.[13] This led to the AI boom of the early 2020s, with
companies, universities, and laboratories overwhelmingly based in the United States pioneering
significant advances in artificial intelligence.[14]
https://builtin.com/artificial-intelligence
Over time, AI systems improve on their performance of specific tasks, allowing them to adapt to
new inputs and make decisions without being explicitly programmed to do so. In essence, artificial
intelligence is about teaching machines to think and learn like humans, with the goal of automating
work and solving problems more efficiently.
https://www.ibm.com/topics/machine-learning
What is ML?
Machine learning (ML) is a branch of artificial intelligence (AI) and computer science that focuses
on using data and algorithms to enable AI to imitate the way that humans learn, gradually
improving its accuracy.
How does machine learning work?
UC Berkeley breaks out the learning system of a machine learning
algorithm into three main parts.
1. A Decision Process: In general, machine learning algorithms are used to make a prediction
or classification. Based on some input data, which can be labeled or unlabeled, your
algorithm will produce an estimate about a pattern in the data.
2. An Error Function: An error function evaluates the prediction of the model. If there are
known examples, an error function can make a comparison to assess the accuracy of the
model.
3. A Model Optimization Process: If the model can fit better to the data points in the training
set, then weights are adjusted to reduce the discrepancy between the known example and
the model estimate. The algorithm will repeat this iterative “evaluate and optimize”
process, updating weights autonomously until a threshold of accuracy has been met.
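The three parts above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: the function names, the learning rate, the tolerance, and the toy data (points on the line y = 2x + 1) are all assumptions chosen for the example.

```python
def predict(w, b, x):
    """Decision process: estimate y from x using a straight line."""
    return w * x + b

def mean_squared_error(w, b, xs, ys):
    """Error function: average squared gap between estimates and known examples."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit(xs, ys, lr=0.05, tolerance=1e-6, max_steps=10_000):
    """Optimization: repeat evaluate-and-adjust until the error is below a threshold."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(max_steps):
        if mean_squared_error(w, b, xs, ys) < tolerance:
            break
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical training set generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

The fitted weights should land close to the slope and intercept used to generate the data, which is the "threshold of accuracy" idea in miniature.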
Neural networks: Neural networks simulate the way the human brain works, with a huge
number of linked processing nodes. Neural networks are good at recognizing patterns and
play an important role in applications including natural language translation, image
recognition, speech recognition, and image creation.
Linear regression: This algorithm is used to predict numerical values, based on a linear
relationship between different values. For example, the technique could be used to predict
house prices based on historical data for the area.
Logistic regression: This supervised learning algorithm makes predictions for categorical
response variables, such as “yes/no” answers to questions. It can be used for applications
such as classifying spam and quality control on a production line.
Clustering: Using unsupervised learning, clustering algorithms can identify patterns in
data so that it can be grouped. Computers can help data scientists by identifying differences
between data items that humans have overlooked.
Decision trees: Decision trees can be used for both predicting numerical values
(regression) and classifying data into categories. Decision trees use a branching sequence
of linked decisions that can be represented with a tree diagram. One of the advantages of
decision trees is that they are easy to validate and audit, unlike the black box of the neural
network.
Random forests: In a random forest, the machine learning algorithm predicts a value or
category by combining the results from a number of decision trees.
Supervised learning, also known as supervised machine learning, is defined by its use of labeled
datasets to train algorithms to classify data or predict outcomes accurately. As input data is fed
into the model, the model adjusts its weights until it has been fitted appropriately. This occurs as
part of the cross validation process to ensure that the model avoids overfitting or underfitting.
Supervised learning helps organizations solve a variety of real-world problems at scale, such as
classifying spam in a separate folder from your inbox. Some methods used in supervised learning
include neural networks, naïve bayes, linear regression, logistic regression, random forest, and
support vector machine (SVM).
Unsupervised learning, also known as unsupervised machine learning, uses machine learning
algorithms to analyze and cluster unlabeled datasets (subsets called clusters). These algorithms
discover hidden patterns or data groupings without the need for human intervention. This method’s
ability to discover similarities and differences in information makes it ideal for exploratory data
analysis, cross-selling strategies, customer segmentation, and image and pattern recognition. It’s
also used to reduce the number of features in a model through the process of dimensionality
reduction. Principal component analysis (PCA) and singular value decomposition (SVD) are two
common approaches for this. Other algorithms used in unsupervised learning include neural
networks, k-means clustering, and probabilistic clustering methods.
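As a rough sketch of how k-means clustering groups unlabeled data, the example below alternates between assigning points to the nearest centroid and moving each centroid to the mean of its cluster. The points and the choice of starting centroids are made up for illustration; real implementations usually pick starting centroids more carefully.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - c) ** 2 for a, c in zip(p, centroid))
                         for centroid in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled 2-D data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
```

No labels are supplied anywhere; the two groups emerge only from the distances between points.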
Semi-supervised learning
Semi-supervised learning offers a happy medium between supervised and unsupervised learning.
During training, it uses a smaller labeled data set to guide classification and feature extraction from
a larger, unlabeled data set. Semi-supervised learning can solve the problem of not having enough
labeled data for a supervised learning algorithm. It also helps if it’s too costly to label enough data.
Here are just a few examples of machine learning you might encounter every day:
Speech recognition: Also known as automatic speech recognition (ASR), computer speech
recognition, or speech-to-text, this capability uses natural language processing (NLP)
to translate human speech into a written format. Many mobile devices incorporate speech
recognition into their systems to conduct voice search (e.g., Siri) or improve accessibility for
texting.
Customer service: Online chatbots are replacing human agents along the customer journey,
changing the way we think about customer engagement across websites and social media
platforms. Chatbots answer frequently asked questions (FAQs) about topics such as shipping, or
provide personalized advice, cross-selling products or suggesting sizes for users. Examples
include virtual agents on e-commerce sites; messaging bots on Slack and Facebook Messenger;
and tasks usually done by virtual assistants and voice assistants.
Computer vision: This AI technology enables computers to derive meaningful information from
digital images, videos, and other visual inputs, and then take the appropriate action. Powered by
convolutional neural networks, computer vision has applications in photo tagging on social media,
radiology imaging in healthcare, and self-driving cars in the automotive industry.
Recommendation engines: Using past consumption behavior data, AI algorithms can help to
discover data trends that can be used to develop more effective cross-selling strategies.
Recommendation engines are used by online retailers to make relevant product recommendations
to customers during the checkout process.
Robotic process automation (RPA): Also known as software robotics, RPA uses intelligent
automation technologies to perform repetitive manual tasks.
Fraud detection: Banks and other financial institutions can use machine learning to spot
suspicious transactions. Supervised learning can train a model using information about known
fraudulent transactions. Anomaly detection can identify transactions that look atypical and deserve
further investigation.
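One simple way to picture anomaly detection is a z-score check: flag any transaction whose amount sits far from the average of the rest. The sketch below assumes made-up transaction amounts and an arbitrary threshold of two standard deviations; production fraud systems use far richer features and models.

```python
from statistics import mean, stdev

def flag_anomalies(amounts, threshold=2.0):
    """Return amounts lying more than `threshold` standard deviations from the mean."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return []
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# Hypothetical card transactions: seven ordinary purchases and one outlier.
amounts = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 23.0, 950.0]
suspicious = flag_anomalies(amounts)
print(suspicious)
```

Flagged transactions are candidates for further investigation, not confirmed fraud.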
https://cloud.google.com/learn/what-is-machine-learning
Today’s enterprises are inundated with data. To drive better business decisions, they have to make
sense of it. But the sheer volume coupled with complexity makes data difficult to analyze using
traditional tools. Building, testing, iterating, and deploying analytical models for identifying
patterns and insights in data eats up employees’ time in a way that scales poorly. Machine learning
can enable an organization to derive insights quickly as data scales.
Machine learning defined
Machine learning is a subset of artificial intelligence that enables a system to autonomously learn
and improve using neural networks and deep learning, without being explicitly programmed, by
feeding it large amounts of data.
Machine learning allows computer systems to continuously adjust and enhance themselves as they
accrue more “experiences.” Thus, the performance of these systems can be improved by providing
larger and more varied datasets to be processed.
Artificial intelligence is an area of computer science concerned with building computers and
machines that can reason, learn, and act in a way resembling human intelligence, or systems that
involve data whose scale exceeds what humans can analyze. The field includes many different
disciplines including data analytics, statistics, hardware and software engineering, neuroscience,
and even philosophy.
Just as machine learning is a subset of artificial intelligence, deep learning is a subset of machine
learning. Deep learning works by training neural networks on sets of data. A neural network is a
model that uses a system of artificial neurons that are computational nodes used to classify and
analyze data. Data is fed into the first layer of a neural network, with each node making a decision,
and then passing that information onto multiple nodes in the next layer. Neural networks with more
than three layers are referred to as “deep neural networks,” and training them is called “deep learning.” Some modern neural
networks have hundreds or thousands of layers.
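The layer-by-layer flow described above can be sketched as a forward pass through a tiny network. Everything here is illustrative: the weights and biases are made-up values rather than trained ones, and the sigmoid activation is just one common choice.

```python
import math

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each node takes a weighted sum of all inputs, adds its bias, and applies the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Feed data into the first layer and pass each layer's output on to the next."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 3 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
output = network_forward([1.0, 2.0], layers)
print(output)
```

Adding more entries to `layers` makes the network deeper; training would then consist of adjusting the weights and biases, which this sketch leaves out.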
Assuming the training data is of high quality, the more training samples the machine learning
algorithm receives, the more accurate the model will become. The algorithm fits the model to the
data during training, in what is called the “fitting process.” If the outcome does not fit the expected
outcome, the algorithm is re-trained again and again until it outputs the accurate response. In
essence, the algorithm learns from the data and reaches outcomes based on whether the input and
response fit with a line, cluster, or other statistical correlation.
In broad strokes, there are three kinds of models used in machine learning.
Supervised learning is a machine learning model that uses labeled training data (structured data)
to map a specific feature to a label. In supervised learning, the output is known (such as recognizing
a picture of an apple) and the model is trained on data of the known output. In simple terms, to
train the algorithm to recognize pictures of apples, feed it pictures labeled as apples.
Common supervised learning algorithms include:
Linear regression
Polynomial regression
K-nearest neighbors
Naive Bayes
Decision trees
Unsupervised learning is a machine learning model that uses unlabeled data (unstructured data)
to learn patterns. Unlike supervised learning, the “correctness” of the output is not known ahead
of time. Rather, the algorithm learns from the data without human input (and is thus, unsupervised)
and categorizes it into groups based on attributes. For instance, if the algorithm is given pictures
of apples and bananas, it will work by itself to categorize which picture is an apple and which is a
banana. Unsupervised learning is good at descriptive modeling and pattern matching.
Common unsupervised learning algorithms include:
Fuzzy means
K-means clustering
Hierarchical clustering
Partial least squares
There’s also a mixed approach to machine learning called semi-supervised learning in which only
some data is labeled. In semi-supervised learning, the algorithm must figure out how to organize
and structure the data to achieve a known result. For instance, the machine learning model is told
that the result is a pear, but only some training data is labeled as a pear.
Reinforcement learning is a machine learning model that can be described as “learn by doing”
through a series of trial and error experiments. An “agent” learns to perform a defined task through
a feedback loop until its performance is within a desirable range. The agent receives positive
reinforcement when it performs the task well and negative reinforcement when it performs poorly.
An example of reinforcement learning is when Google researchers taught a reinforcement learning
algorithm to play the game Go. The model, which had no prior knowledge of the rules of Go,
simply moved pieces at random and “learned” the best moves to make. The algorithm was trained
via positive and negative reinforcement to the point that the machine learning model could beat a
human player at the game.
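Go is far too large for a short example, but the same feedback loop can be sketched with tabular Q-learning on a toy problem. The corridor environment, reward values, and hyperparameters below are assumptions invented for illustration: an agent starts at position 0, earns a reward for reaching position 4, and pays a small penalty for every other step.

```python
import random

N_STATES = 5         # positions 0..4 on a line; reaching position 4 ends the episode
ACTIONS = (-1, +1)   # step left or step right

def step(state, action):
    """Environment: move the agent and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01   # positive reinforcement at the goal, small penalty otherwise
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values purely by trial and error with random exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(len(ACTIONS))
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q = train()
# Greedy policy: in each non-terminal state, pick the action with the highest learned value.
policy = [ACTIONS[row.index(max(row))] for row in q[:-1]]
print(policy)
```

The agent is never told that moving right is correct; it discovers that from the rewards alone.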
The more data consumed by a machine learning algorithm, the better it gets in finding trends and
patterns in that data. For instance, an ecommerce website might use machine learning to
understand how people shop on their site and use that information to give people better
recommendations or find trend data that can lead to new product opportunities.
Automation
Machine learning and artificial intelligence can take away much of the dull and dreary work from
human workers. Utilities like robotic process automation can perform some of the tedious business
tasks that keep people from performing more meaningful work. Computer vision and object
detection algorithms can help robots pick and pack items from an assembly line. Always-on fraud
detection and threat-assessment machine learning can find security flaws before they become a
problem.
Continuous improvement
Given the right kinds of data, machine learning algorithms will continue to improve to be faster
and more accurate. A good example is the GPT-3 model, whose text generation has continued
to improve.
Machine learning is often only as good as the data it is being fed. If a machine learning algorithm
is fed a biased dataset, it will deliver biased results.
Data acquisition
Machine learning can require a lot of data before it can be useful. As many machine learning use
cases are based on supervised learning, acquiring and cleaning structured data to train the
algorithms is an important first step, which can be difficult if data resides in a variety of siloed
locations within an organization.
While machine learning, artificial intelligence, and cloud vendors try to make it as easy as possible
to set up and run machine learning algorithms, organizations often need programmers and data
scientists to understand and utilize the training algorithms and their results.
Resource intensive
Machine learning can be time consuming, requiring a lot of computing resources and employee
hours to begin processing data and achieving results.
https://www.geeksforgeeks.org/ml-machine-learning/
What is Machine Learning?
Machine learning is a branch of artificial intelligence that enables algorithms to uncover hidden
patterns within datasets, allowing them to make predictions on new, similar data without
explicit programming for each task. Traditional machine learning combines data with statistical
tools to predict outputs, yielding actionable insights. This technology finds applications in
diverse fields such as image and speech recognition, natural language processing,
recommendation systems, fraud detection, portfolio optimization, and automating tasks.
For instance, recommender systems use historical data to personalize suggestions. Netflix, for
example, employs collaborative and content-based filtering to recommend movies and TV
shows based on user viewing history, ratings, and genre preferences. Reinforcement learning
further enhances these systems by enabling agents to make decisions based on environmental
feedback, continually refining recommendations.
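A bare-bones version of collaborative filtering can be sketched as follows: find the user whose ratings are most similar to yours, then suggest what they rated and you have not seen. The users, titles, and ratings are hypothetical, and cosine similarity over shared titles is one simple similarity measure among several; this is not a description of Netflix's actual system.

```python
import math

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ana":  {"Drama A": 5, "Comedy B": 1, "Thriller C": 4},
    "ben":  {"Drama A": 4, "Comedy B": 2, "Thriller C": 5, "Drama D": 5},
    "cara": {"Drama A": 1, "Comedy B": 5, "Comedy E": 4},
}

def cosine_similarity(a, b):
    """Similarity of two users, computed over the titles both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = math.sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)

def recommend(user):
    """Suggest titles rated by the most similar user that `user` has not rated yet."""
    others = [(cosine_similarity(ratings[user], ratings[o]), o)
              for o in ratings if o != user]
    _, nearest = max(others)
    return sorted(t for t in ratings[nearest] if t not in ratings[user])

print(recommend("ana"))
```

Content-based filtering would instead compare attributes of the titles themselves, such as genre, rather than comparing users.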
Machine learning’s impact extends to autonomous vehicles, drones, and robots, enhancing their
adaptability in dynamic environments. This approach marks a breakthrough where machines
learn from data examples to generate accurate outcomes, closely intertwined with data mining
and data science.
Difference between Machine Learning and Traditional Programming
The differences between machine learning, traditional programming, and artificial intelligence are as follows:
Machine Learning: ML can find patterns and insights in large datasets that might be difficult for humans to discover.
Traditional Programming: Traditional programming is totally dependent on the intelligence of developers. So, it has very limited capability.
Artificial Intelligence: Sometimes AI uses a combination of both data and pre-defined rules, which gives it a great edge in solving complex tasks with good accuracy which seem impossible to humans.
https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-ml/#_001
What Is Machine Learning?
Machine learning (ML) is a discipline of artificial intelligence (AI) that provides machines
with the ability to automatically learn from data and past experiences while identifying
patterns to make predictions with minimal human intervention.
Machine learning derives insightful information from large volumes of data by leveraging
algorithms to identify patterns and learn in an iterative process. ML algorithms use computation
methods to learn directly from data instead of relying on any predetermined equation that may
serve as a model.
While machine learning is not a new concept – dating back to World War II when the Enigma
Machine was used – the ability to apply complex mathematical calculations automatically to
growing volumes and varieties of available data is a relatively recent development.
Today, with the rise of big data, IoT, and ubiquitous computing, machine learning has become
essential for solving problems across numerous areas.
Machine learning algorithms are molded on a training dataset to create a model. As new input data
is introduced to the trained ML algorithm, it uses the developed model to make a prediction.
Types of Machine Learning
Machine learning algorithms can be trained in many ways, with each method having its pros and
cons. Based on these methods and ways of learning, machine learning is broadly categorized into
four main types:
1. Supervised machine learning
This type of ML involves supervision, where machines are trained on labeled datasets and enabled
to predict outputs based on the provided training. The labeled dataset specifies that some input and
output parameters are already mapped. Hence, the machine is trained with the input and
corresponding output. In subsequent phases, the machine is asked to predict the outcome for a
test dataset.
For example, consider an input dataset of parrot and crow images. Initially, the machine is trained
to understand the pictures, including the parrot and crow’s color, eyes, shape, and size. Post-
training, an input picture of a parrot is provided, and the machine is expected to identify the object
and predict the output. The trained machine checks for the various features of the object, such as
color, eyes, shape, etc., in the input picture, to make a final prediction. This is the process of object
identification in supervised machine learning.
The primary objective of the supervised learning technique is to map the input variable (a) with
the output variable (b). Supervised machine learning is further classified into two broad categories,
classification and regression.
Some known classification algorithms include the Random Forest Algorithm, Decision Tree
Algorithm, Logistic Regression Algorithm, and Support Vector Machine Algorithm.
Popular regression algorithms include the Simple Linear Regression Algorithm, Multivariate
Regression Algorithm, Decision Tree Algorithm, and Lasso Regression.
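The parrot-and-crow example can be sketched as a small classification task. The code below uses k-nearest neighbors, which is not one of the algorithms listed above but is short enough to show the idea of learning from labeled examples. The two features (share of green plumage, body length in centimeters) and all the values are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (share of green plumage, body length in cm).
TRAINING = [
    ((0.90, 33.0), "parrot"),
    ((0.80, 35.0), "parrot"),
    ((0.85, 30.0), "parrot"),
    ((0.10, 47.0), "crow"),
    ((0.05, 50.0), "crow"),
    ((0.15, 45.0), "crow"),
]

def knn_predict(sample, k=3):
    """Label a sample by majority vote among its k nearest labeled examples."""
    nearest = sorted(TRAINING, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((0.88, 32.0)))
print(knn_predict((0.10, 48.0)))
```

In practice the features would be scaled to comparable ranges first, since here body length dominates the distance.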
2. Unsupervised machine learning
Unsupervised learning refers to a learning technique that’s devoid of supervision. Here, the
machine is trained using an unlabeled dataset and is enabled to predict the output without any
supervision. An unsupervised learning algorithm aims to group the unsorted dataset based on the
input’s similarities, differences, and patterns.
For example, consider an input dataset of images of a fruit-filled container. Here, the images are
not known to the machine learning model. When we input the dataset into the ML model, the task
of the model is to identify the pattern of objects, such as color, shape, or differences seen in the
input images and categorize them. Upon categorization, the machine then predicts the output as it
gets tested with a test dataset.
Unsupervised machine learning is further classified into two types, clustering and association:
Clustering: The clustering technique refers to grouping objects into clusters based
on parameters such as similarities or differences between objects. For example,
grouping customers by the products they purchase.
Some known clustering algorithms include the K-Means Clustering Algorithm, Mean-Shift
Algorithm, and DBSCAN Algorithm. Principal Component Analysis and Independent Component
Analysis are related unsupervised techniques, used for dimensionality reduction and signal
separation rather than clustering.
Popular association rule learning algorithms include the Apriori Algorithm, Eclat Algorithm, and
FP-Growth Algorithm.
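Association rule algorithms such as Apriori rest on two basic measures, support and confidence. The sketch below computes both for a made-up set of shopping baskets; it shows only the measures, not the candidate-pruning search that Apriori, Eclat, or FP-Growth add on top.

```python
# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "milk"}))        # 3 of 5 baskets
print(confidence({"bread"}, {"milk"}))   # 3 of the 4 bread baskets also contain milk
```

A rule such as "bread implies milk" is usually kept only if both its support and its confidence clear chosen minimum thresholds.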
3. Semi-supervised learning
4. Reinforcement learning
Unlike supervised learning, reinforcement learning lacks labeled data, and the agents learn via
experiences only. Consider video games. Here, the game specifies the environment, and each move
of the reinforcement agent defines its state. The agent is entitled to receive feedback via
punishment and rewards, thereby affecting the overall game score. The ultimate goal of the agent
is to achieve a high score.
Reinforcement learning is applied across different fields such as game theory, information theory,
and multi-agent systems. Reinforcement learning is further divided into two types of methods or
algorithms:
Industry verticals handling large amounts of data have realized the significance and value of
machine learning technology. As machine learning derives insights from data in real-time,
organizations using it can work efficiently and gain an edge over their competitors.
Every industry vertical in this fast-paced digital world benefits immensely from machine learning
tech. Here, we look at the top five ML application sectors.
1. Healthcare industry
Machine learning is being increasingly adopted in the healthcare industry, thanks to wearable
devices and sensors such as fitness trackers and smart health watches. These devices
monitor users’ health data to assess their health in real-time.
Moreover, the technology is helping medical practitioners in analyzing trends or flagging events
that may help in improved patient diagnoses and treatment. ML algorithms even allow medical
experts to predict the lifespan of a patient suffering from a fatal disease with increasing accuracy.
Companies like Genentech have collaborated with GNS Healthcare to
leverage machine learning and simulation AI platforms to develop new biomedical treatments.
ML technology looks for patients’ response markers by analyzing individual
genes, which helps provide targeted therapies to patients.
2. Finance sector
Today, several financial organizations and banks use machine learning technology to tackle
fraudulent activities and draw essential insights from vast volumes of data. ML-derived insights
aid in identifying investment opportunities that allow investors to decide when to trade.
Moreover, data mining methods help cyber-surveillance systems zero in on warning signs of
fraudulent activities, subsequently neutralizing them. Several financial institutions have already
partnered with tech companies to leverage the benefits of machine learning.
For example,
Citibank has partnered with fraud detection company Feedzai to handle online and
in-person banking frauds.
PayPal uses several machine learning tools to differentiate between legitimate and
fraudulent transactions between buyers and sellers.
3. Retail sector
Retail websites extensively use machine learning to recommend items based on users’ purchase
history. Retailers use ML techniques to capture data, analyze it, and deliver personalized shopping
experiences to their customers. They also implement ML for marketing campaigns, customer
insights, customer merchandise planning, and price optimization.
According to a September 2021 report by Grand View Research, Inc., the global recommendation
engine market is expected to reach a valuation of $17.30 billion by 2028. Common day-to-day
examples of recommendation systems include:
When you browse items on Amazon, the product recommendations that you see on
the homepage result from machine learning algorithms. Amazon uses artificial
neural networks (ANN) to offer intelligent, personalized recommendations relevant
to customers based on their recent purchase history, comments, bookmarks, and
other online activities.
Netflix and YouTube rely heavily on recommendation systems to suggest shows and
videos to their users based on their viewing history.
Moreover, retail sites are also powered with virtual assistants or conversational chatbots that
leverage ML, natural language processing (NLP), and natural language understanding (NLU) to
automate customer shopping experiences.
4. Travel industry
Machine learning is playing a pivotal role in expanding the scope of the travel industry. Rides
offered by Uber, Ola, and even self-driving cars have a robust machine learning backend.
Consider Uber’s machine learning algorithm that handles the dynamic pricing of their rides. Uber
uses a machine learning model called ‘Geosurge’ to manage dynamic pricing parameters. It uses
real-time predictive modeling on traffic patterns, supply, and demand. If you are running late for a
meeting and need to book an Uber in a crowded area, the dynamic pricing model kicks in, and you
can get an Uber ride immediately but would need to pay twice the regular fare.
Moreover, the travel industry uses machine learning to analyze user reviews. User comments are
classified through sentiment analysis based on positive or negative scores. This is used for
campaign monitoring, brand monitoring, compliance monitoring, etc., by companies in the travel
industry.
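To make the idea of positive and negative scores concrete, here is a deliberately simple lexicon-based scorer. It is a hand-written rule set, not a trained model: the word lists are hypothetical, and a machine learning approach would instead learn word weights from labeled reviews.

```python
# Hypothetical word lists; a trained model would learn these weights from labeled reviews.
POSITIVE = {"great", "clean", "friendly", "comfortable"}
NEGATIVE = {"dirty", "rude", "late", "noisy"}

def sentiment_score(review):
    """Count positive words minus negative words in a review."""
    words = [w.strip(".,!?").lower() for w in review.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def classify(review):
    """Map the score to a coarse sentiment label."""
    score = sentiment_score(review)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("Great room, friendly staff!"))
print(classify("The flight was late and the crew was rude."))
```

Aggregating these labels across many reviews is what feeds the brand and campaign monitoring mentioned above.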
5. Social media
With machine learning, billions of users can efficiently engage on social media networks. Machine
learning is pivotal in driving social media platforms from personalizing news feeds to delivering
user-specific ads. For example, Facebook’s auto-tagging feature employs image recognition to
identify your friend’s face and tag them automatically. The social network uses ANN to recognize
familiar faces in users’ contact lists and facilitates automated tagging.
Similarly, LinkedIn knows when you should apply for your next role, whom you need to connect
with, and how your skills rank compared to peers. All these features are enabled by machine
learning.
https://www.ibm.com/topics/deep-learning
The chief difference between deep learning and machine learning is the structure of the underlying
neural network architecture. “Nondeep,” traditional machine learning models use simple neural
networks with one or two computational layers. Deep learning models use three or more layers—
but typically hundreds or thousands of layers—to train the models.
While supervised learning models require structured, labeled input data to make accurate outputs,
deep learning models can use unsupervised learning. With unsupervised learning, deep learning
models can extract the characteristics, features and relationships they need to make accurate
outputs from raw, unstructured data. Additionally, these models can even evaluate and refine their
outputs for increased precision.
Deep learning is an aspect of data science that drives many applications and services that
improve automation, performing analytical and physical tasks without human intervention. This
enables many everyday products and services—such as digital assistants, voice-enabled TV
remotes, credit card fraud detection, self-driving cars and generative AI.
Deep learning algorithms are incredibly complex, and there are different types of neural networks
to address specific problems or datasets. Here are six. Each has its own advantages and they are
presented here roughly in the order of their development, with each successive model adjusting to
overcome a weakness in a previous model.
One potential weakness across them all is that deep learning models are often “black boxes,”
making it difficult to understand their inner workings and posing interpretability challenges. But
this can be balanced against the overall benefits of high accuracy and scalability.
CNNs
Convolutional neural networks (CNNs or ConvNets) are used primarily in computer vision and
image classification applications. They can detect features and patterns within images and videos,
enabling tasks such as object detection, image recognition, pattern recognition and face
recognition. These networks harness principles from linear algebra, particularly matrix
multiplication, to identify patterns within an image.
CNNs are a specific type of neural network, which is composed of node layers, containing an input
layer, one or more hidden layers and an output layer. Each node connects to another and has an
associated weight and threshold. If the output of any individual node is above the specified
threshold value, that node is activated, sending data to the next layer of the network. Otherwise,
no data is passed along to the next layer of the network.
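The threshold rule described above can be sketched in a few lines. This is a minimal illustration, not how any particular library implements a node: the function name `node_output` and all input, weight and threshold values are assumptions chosen for the example.

```python
def node_output(inputs, weights, threshold):
    """Return the weighted sum if it exceeds the threshold, else None (nothing is passed on)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total if total > threshold else None


# Hypothetical values, chosen only for illustration.
active = node_output([1.0, 0.5], [0.6, 0.4], threshold=0.5)  # weighted sum is 0.6 + 0.2
silent = node_output([0.2, 0.1], [0.6, 0.4], threshold=0.5)  # weighted sum is 0.12 + 0.04
```

In the first call the weighted sum is above the threshold, so the node passes its value to the next layer; in the second it stays silent. Real networks usually replace the hard threshold with a smooth activation function so that the weights can be trained with gradients.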
At least three main types of layers make up a CNN: a convolutional layer, pooling layer and fully
connected (FC) layer. For complex uses, a CNN might contain up to thousands of layers, each
layer building on the previous layers. By “convolution”—working and reworking the original
input—detailed patterns can be discovered. With each layer, the CNN increases in its complexity,
identifying greater portions of the image. Earlier layers focus on simple features, such as colors
and edges. As the image data progresses through the layers of the CNN, it starts to recognize larger
elements or shapes of the object until it finally identifies the intended object.
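As a rough sketch of the first two layer types, the example below slides a small kernel over an image (the convolutional layer) and then keeps the largest value in each 2x2 window (the pooling layer). The image, the hand-picked edge-detecting kernel and the function names are assumptions for illustration; a trained CNN learns its kernel values from data.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products.

    As in most deep learning libraries, the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


# Hypothetical 5x5 image: dark columns (0) on the left, bright columns (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, kernel)  # 4x4 map, non-zero only where the edge sits
pooled = max_pool2x2(feature_map)        # 2x2 summary of the feature map
```

The feature map is non-zero only in the column where the brightness changes, which is the "simple feature" an early layer would pick up. Pooling shrinks the map while keeping the strongest response, which is where the information loss mentioned in the text comes from.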
CNNs are distinguished from other neural networks by their superior performance with image,
speech or audio signal inputs. Before CNNs, manual and time-consuming feature extraction
methods were used to identify objects in images. However, CNNs now provide a more scalable
approach to image classification and object recognition tasks, and process high-dimensional data.
And CNNs can exchange data between layers, to deliver more efficient data processing. While
information might be lost in the pooling layer, this might be outweighed by the benefits of CNNs,
which can help to reduce complexity, improve efficiency and limit risk of overfitting.
CNNs have other disadvantages. They are computationally demanding, costing time and budget and
often requiring many graphics processing units (GPUs). They also require highly trained
experts with cross-domain knowledge, and careful testing of configurations and
hyperparameters.
RNNs
Recurrent neural networks (RNNs) are typically used in natural language and speech
recognition applications as they use sequential or time-series data. RNNs can be identified by their
feedback loops. These learning algorithms are primarily used when using time-series data to make
predictions about future outcomes. Use cases include stock market predictions or sales forecasting,
or ordinal or temporal problems, such as language translation, natural language processing (NLP),
speech recognition and image captioning. These functions are often incorporated into popular
applications such as Siri, voice search and Google Translate.
RNNs use their “memory” as they take information from prior inputs to influence the current input
and output. While traditional deep neural networks assume that inputs and outputs are independent
of each other, the output of RNNs depends on the prior elements within the sequence. While future
events would also be helpful in determining the output of a given sequence, unidirectional
recurrent neural networks cannot account for these events in their predictions.
RNNs share parameters across time steps: within each layer, the same weight parameters are reused
at every step of the sequence. The weights are adjusted through the processes of
backpropagation and gradient descent as the network learns.
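A minimal sketch of the "memory" and weight sharing described above, using a single recurrent unit. The function name `rnn_forward`, the `tanh` activation and the weight values are assumptions for illustration; practical RNNs use weight matrices and learn them during training.

```python
import math


def rnn_forward(inputs, w_in, w_rec, bias, h0=0.0):
    """Run a one-unit recurrent layer; the same weights are reused at every time step."""
    h = h0
    states = []
    for x in inputs:
        # The new state depends on the current input and on the previous state.
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


# Two hypothetical sequences that end with the same inputs but start differently.
states_a = rnn_forward([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.5, bias=0.0)
states_b = rnn_forward([0.0, 0.0, 0.0], w_in=1.0, w_rec=0.5, bias=0.0)
```

Although both sequences end with the input 0.0, the final states differ, because the first sequence still carries a trace of its earlier input. That carried-over state is what the text calls memory.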
RNNs use a backpropagation through time (BPTT) algorithm to determine the gradients, which is
slightly different from traditional backpropagation as it is specific to sequence data. The principles
of BPTT are the same as traditional backpropagation, where the model trains itself by calculating
errors from its output layer to its input layer. BPTT differs from the traditional approach in that
BPTT sums errors at each time step, whereas feedforward networks do not need to sum errors as
they do not share parameters across each layer.
An advantage over other neural network types is that RNNs use both binary data processing and
memory. RNNs can handle multiple inputs and outputs, so that rather than delivering only
one result for a single input, RNNs can produce one-to-many, many-to-one or many-to-many
outputs.
There are also options within RNNs. For example, the long short-term memory (LSTM) network
is superior to simple RNNs by learning and acting on longer-term dependencies.
However, RNNs tend to run into two basic problems, known as exploding gradients and vanishing
gradients. These issues are defined by the size of the gradient, which is the slope of the loss
function along the error curve.
When the gradient is vanishing, it is too small and keeps shrinking, so the updates to
the weight parameters become insignificant, approaching zero (0). When that occurs,
the algorithm is no longer learning.
Exploding gradients occur when the gradient is too large, creating an unstable model. In
this case, the model weights grow too large, and they will eventually be represented as NaN
(not a number). One solution to these issues is to reduce the number of hidden layers within
the neural network, eliminating some of the complexity in the RNN models.
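Both problems can be seen in a deliberately simplified setting. Assume a linear recurrence h_t = w * h_(t-1), which is a simplification and not a full RNN: the gradient of the last state with respect to the first is then w multiplied by itself once per time step. The function name and values below are assumptions for illustration.

```python
def gradient_through_time(recurrent_weight, steps):
    """Gradient of h_T with respect to h_0 for the linear recurrence h_t = w * h_(t-1)."""
    grad = 1.0
    for _ in range(steps):
        grad *= recurrent_weight  # one factor per time step, as in backpropagation through time
    return grad


vanishing = gradient_through_time(0.5, 50)  # factor below 1: the gradient shrinks toward zero
exploding = gradient_through_time(1.5, 50)  # factor above 1: the gradient grows without bound
```

With a factor below 1 the gradient from early time steps becomes negligible, and with a factor above 1 it grows until the weights become unstable. Shorter sequences or fewer layers mean fewer factors in the product, which is one reason reducing depth helps.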
Some final disadvantages: RNNs might also require long training time and be difficult to use on
large datasets. Optimizing RNNs adds complexity when they have many layers and parameters.
Autoencoders and variational autoencoders
Deep learning made it possible to move beyond the analysis of numerical data, by adding the
analysis of images, speech and other complex data types. Among the first class of models to
achieve this were variational autoencoders (VAEs). They were the first deep-learning models to
be widely used for generating realistic images and speech. This empowered deep generative
modeling by making models easier to scale, which is the cornerstone of what we think of
as generative AI.
Autoencoders work by encoding unlabeled data into a compressed representation, and then
decoding the data back into its original form. Plain autoencoders were used for a variety of
purposes, including reconstructing corrupted or blurry images. Variational autoencoders added the
critical ability not just to reconstruct data, but also to output variations on the original data.
This ability to generate novel data ignited a rapid-fire succession of new technologies, from
generative adversarial networks (GANs) to diffusion models, capable of producing ever more
realistic—but fake—images. In this way, VAEs set the stage for today’s generative AI.
Autoencoders are built out of blocks of encoders and decoders, an architecture that also underpins
today’s large language models. Encoders compress a dataset into a dense representation, arranging
similar data points closer together in an abstract space. Decoders sample from this space to create
something new while preserving the dataset’s most important features.
The biggest advantage to autoencoders is the ability to handle large batches of data and show input
data in a compressed form, so the most significant aspects stand out—enabling anomaly detection
and classification tasks. This also speeds transmission and reduces storage requirements.
Autoencoders can be trained on unlabeled data so they might be used where labeled data is not
available. When unsupervised training is used, there is a time savings advantage: deep learning
algorithms learn automatically and gain accuracy without needing manual feature engineering. In
addition, VAEs can generate new sample data for text or image generation.
There are disadvantages to autoencoders. The training of deep or intricate structures can be a drain
on computational resources. And during unsupervised training, the model might overlook the
needed properties and instead simply replicate the input data. Autoencoders might also overlook
complex data linkages in structured data, so that they do not correctly identify complex
relationships.
GANs
Generative adversarial networks (GANs) are neural networks that are used both in and outside of
artificial intelligence (AI) to create new data resembling the original training data. These can
include images that appear to be human faces but are generated rather than taken of real people.
The “adversarial” part of the name comes from the back-and-forth between the two portions of the
GAN: a generator and a discriminator.
The generator creates something (images, video or audio) and then produces an output
with a twist. For example, a horse can be transformed into a zebra with some degree of
accuracy. The result depends on the input and how well-trained the layers are in the
generative model for this use case.
The discriminator is the adversary, where the generative result (fake image) is compared
against the real images in the dataset. The discriminator tries to distinguish between the
real and fake images, video or audio.
GANs train themselves. The generator creates fakes while the discriminator learns to spot the
differences between the generator's fakes and the true examples. When the discriminator is able to
flag the fake, then the generator is penalized. The feedback loop continues until the generator
succeeds in producing output that the discriminator cannot distinguish.
The prime GAN benefit is creating realistic output that can be difficult to distinguish from the
originals, which in turn may be used to further train machine learning models. Setting up a GAN
to learn is straightforward, since they are trained by using unlabeled data or with minor labeling.
However, the potential disadvantage is that the generator and discriminator might go back-and-
forth in competition for a long time, creating a large system drain. One training limitation is that a
huge amount of input data might be required to obtain a satisfactory output. Another potential
problem is “mode collapse,” when the generator produces a limited set of outputs rather than a
wider variety.
Diffusion models
Diffusion models are generative models that are trained using the forward and reverse diffusion
process of progressive noise-addition and denoising. Diffusion models generate data, most often
images, similar to the data on which they are trained. During training, they gradually add
Gaussian noise to the training data until it’s unrecognizable, then learn a reversed “denoising”
process that can synthesize output (usually images) from random noise input.
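The forward (noise-adding) half of this process can be sketched as follows. The update rule is one common formulation, the variance-preserving step used in DDPM-style models, and is an assumption here because the text does not spell out a formula. The constant noise schedule, the data values and the function names are also hypothetical.

```python
import math
import random


def forward_diffusion(x0, betas, rng):
    """Add Gaussian noise step by step: x_t = sqrt(1 - beta_t) * x_(t-1) + sqrt(beta_t) * noise."""
    x = list(x0)
    for beta in betas:
        keep, noise_scale = math.sqrt(1.0 - beta), math.sqrt(beta)
        x = [keep * v + noise_scale * rng.gauss(0.0, 1.0) for v in x]
    return x


def signal_fraction(betas):
    """Scale factor applied to the original data after all noising steps."""
    keep = 1.0
    for beta in betas:
        keep *= 1.0 - beta
    return math.sqrt(keep)


rng = random.Random(0)                      # fixed seed so the sketch is repeatable
betas = [0.02] * 500                        # hypothetical constant noise schedule
x0 = [5.0] * 1000                           # hypothetical "training data": 1000 identical values
noisy = forward_diffusion(x0, betas, rng)   # looks like samples from a standard normal
remaining = signal_fraction(betas)          # how much of the original scale is left
```

After enough steps almost none of the original signal remains and the values are close to pure Gaussian noise. The reverse process, which this sketch does not implement, is the part a neural network is trained to perform.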
A diffusion model learns to minimize the differences of the generated samples versus the desired
target. Any discrepancy is quantified and the model's parameters are updated to minimize the
loss—training the model to produce samples closely resembling the authentic training data.
Beyond image quality, diffusion models have the advantage of not requiring adversarial training,
which speeds the learning process and also offers close process control. Training is more stable
than with GANs and diffusion models are not as prone to mode collapse.
But, compared to GANs, diffusion models can require more computing resources to train,
including more fine-tuning. IBM Research® has also discovered that this form of generative AI
can be hijacked with hidden backdoors, giving attackers control over the image creation process
so that AI diffusion models can be tricked into generating manipulated images.
Transformer models
Using fill-in-the-blank guessing, the encoder learns how words and sentences relate to each other,
building up a powerful representation of language without having to label parts of speech and other
grammatical features. Transformers, in fact, can be pretrained at the outset without a particular
task in mind. After these powerful representations are learned, the models can later be
specialized—with much less data—to perform a requested task.
Several innovations make this possible. Transformers process words in a sentence simultaneously,
enabling text processing in parallel, speeding up training. Earlier techniques including recurrent
neural networks (RNNs) processed words one by one. Transformers also learned the positions of
words and their relationships—this context enables them to infer meaning and disambiguate words
such as “it” in long sentences.
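The text does not name the mechanism, but transformers are generally described as relating words to one another through self-attention. The sketch below is a stripped-down version under stated assumptions: it uses the token vectors directly as queries, keys and values, with no learned projection matrices, no positional encoding and made-up two-dimensional vectors.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product attention with queries = keys = values = vectors."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:  # each token's row is independent, so rows could be computed in parallel
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        all_weights.append(weights)
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return outputs, all_weights


# Hypothetical embeddings: the first two tokens are similar, the third is different.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` says how strongly one token draws on every other token. The first token puts more weight on the similar second token than on the third, which is the kind of relationship that lets a model link a word such as "it" to the word it refers to.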
By eliminating the need to define a task upfront, transformers made it practical to pretrain language
models on vast amounts of raw text, enabling them to grow dramatically in size. Previously,
labeled data was gathered to train one model on a specific task. With transformers, one model
trained on a massive amount of data can be adapted to multiple tasks by fine-tuning it on a small
amount of labeled task-specific data.
Language transformers today are used for nongenerative tasks such as classification and entity
extraction as well as generative tasks including machine translation, summarization and question
answering. Transformers have surprised many people with their ability to generate convincing
dialog, essays and other content.
Natural language processing (NLP) transformers provide remarkable power since they can run in
parallel, processing multiple portions of a sequence simultaneously, which then greatly speeds
training. Transformers also track long-term dependencies in text, which enables them to
understand the overall context more clearly and create superior output. In addition, transformers
are more scalable and flexible, so they can be customized by task.
Deep learning is a method in artificial intelligence (AI) that teaches computers to process data in
a way that is inspired by the human brain. Deep learning models can recognize complex patterns
in pictures, text, sounds, and other data to produce accurate insights and predictions. You can use
deep learning methods to automate tasks that typically require human intelligence, such as
describing images or transcribing a sound file into text.
Artificial intelligence (AI) attempts to train computers to think and learn as humans do. Deep
learning technology drives many AI applications used in everyday products, such as the following:
Digital assistants
Fraud detection
Deep learning models are computer files that data scientists have trained to perform tasks using an
algorithm or a predefined set of steps. Businesses use deep learning models to analyze data and
make predictions in various applications.
Deep learning has several use cases in automotive, aerospace, manufacturing, electronics, medical
research, and other fields. These are some examples of deep learning:
Self-driving cars use deep learning models to automatically detect road signs and pedestrians.
Defense systems use deep learning to automatically flag areas of interest in satellite images.
Medical image analysis uses deep learning to automatically detect cancer cells for medical
diagnosis.
Factories use deep learning applications to automatically detect when people or objects are within
an unsafe distance of machines.
You can group these various use cases of deep learning into four broad categories—computer
vision, speech recognition, natural language processing (NLP), and recommendation engines.
Computer vision
Computer vision is the computer's ability to extract information and insights from images and
videos. Computers can use deep learning techniques to comprehend images in the same way that
humans do. Computer vision has several applications, such as the following:
Content moderation to automatically remove unsafe or inappropriate content from image and
video archives
Facial recognition to identify faces and recognize attributes like open eyes, glasses, and facial hair
Image classification to identify brand logos, clothing, safety gear, and other image details
Speech recognition
Deep learning models can analyze human speech despite varying speech patterns, pitch, tone,
language, and accent. Virtual assistants such as Amazon Alexa and automatic transcription
software use speech recognition to do the following tasks:
Accurately subtitle videos and meeting recordings for a wider content reach.
Natural language processing
Computers use deep learning algorithms to gather insights and meaning from text data and
documents. This ability to process natural, human-created text has several use cases, including in
these functions:
Indexing of key phrases that indicate sentiment, such as positive and negative comments on social
media
Recommendation engines
Applications can use deep learning methods to track user activity and develop personalized
recommendations. They can analyze the behavior of various users and help them discover new
products or services. For example, many media and entertainment companies, such as Netflix,
Fox, and Peacock, use deep learning to give personalized video recommendations.
Deep learning algorithms are neural networks that are modeled after the human brain. For example,
a human brain contains billions of interconnected neurons that work together to learn and process
information. Similarly, deep learning neural networks, or artificial neural networks, are made of
many layers of artificial neurons that work together inside the computer.
Artificial neurons are software modules called nodes, which use mathematical calculations to
process data. Artificial neural networks are deep learning algorithms that use these nodes to solve
complex problems.
Input layer
An artificial neural network has several nodes that input data into it. These nodes make up the
input layer of the system.
Hidden layer
The input layer processes and passes the data to layers further in the neural network. These hidden
layers process information at different levels, adapting their behavior as they receive new
information. Deep learning networks can have hundreds of hidden layers that they can use to analyze
a problem from several different angles.
For example, if you were given an image of an unknown animal that you had to classify, you would
compare it with animals you already know. You would look at the shape of its eyes
and ears, its size, the number of legs, and its fur pattern. You would try to identify patterns, such
as the following:
The animal has cat eyes, so it could be some type of wild cat.
The hidden layers in deep neural networks work in the same way. If a deep learning algorithm is
trying to classify an animal image, each of its hidden layers processes a different feature of the
animal and tries to accurately categorize it.
Output layer
The output layer consists of the nodes that output the data. Deep learning models that output "yes"
or "no" answers have only two nodes in the output layer. On the other hand, those that output a
wider range of answers have more nodes.
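The three layer types can be put together in a small forward pass. This is a sketch under stated assumptions: the weights are hand-set rather than trained, and the ReLU and softmax functions are common choices that the text does not name. The output layer has two nodes, one per answer.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: each node computes a weighted sum of all inputs plus a bias."""
    return [sum(x * w for x, w in zip(inputs, row)) + b for row, b in zip(weights, biases)]


def relu(values):
    """Pass positive values through and replace negative values with zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn the output nodes' raw scores into probabilities that sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> one hidden layer -> output layer."""
    hidden = relu(dense(features, hidden_w, hidden_b))
    return softmax(dense(hidden, output_w, output_b))


# Hypothetical, hand-set weights: 2 input nodes, 3 hidden nodes, 2 output nodes.
features = [1.0, 2.0]
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
hidden_b = [0.0, 0.0, 0.0]
output_w = [[1.0, -1.0, 0.5], [-1.0, 1.0, 0.5]]
output_b = [0.0, 0.0]

probs = forward(features, hidden_w, hidden_b, output_w, output_b)
answer = ["yes", "no"][probs.index(max(probs))]
```

A model with more possible answers would simply have more rows in `output_w`, one per output node.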
Deep learning is a subset of machine learning. Deep learning algorithms emerged in an attempt to
make traditional machine learning techniques more efficient. Traditional machine learning
methods require significant human effort to train the software. For example, in animal image
recognition, you need to do the following:
This process is called supervised learning. In supervised learning, result accuracy improves only
when you have a broad and sufficiently varied dataset. For instance, the algorithm might accurately
identify black cats but not white cats because the training dataset had more images of black cats.
In that case, you would need to label more white cat images and train the machine learning models
once again.
A deep learning network has the following benefits over traditional machine learning.
Machine learning methods find unstructured data, such as text documents, challenging to process
because the training dataset can have infinite variations. On the other hand, deep learning models
can comprehend unstructured data and make general observations without manual feature
extraction. For instance, a neural network can recognize that these two different input sentences
have the same meaning:
A deep learning application can analyze large amounts of data more deeply and reveal new insights
for which it might not have been trained. For example, consider a deep learning model that is
trained to analyze consumer purchases. The model has data only for the items you have already
purchased. However, the artificial neural network can suggest new items that you haven't bought
by comparing your buying patterns to those of other similar customers.
Unsupervised learning
Deep learning models can learn and improve over time based on user behavior. They do not require
large variations of labeled datasets. For example, consider a neural network that automatically
corrects or suggests words by analyzing your typing behavior. Let's assume it was trained in the
English language and can spell-check English words. However, if you frequently type non-English
words, such as danke, the neural network automatically learns and autocorrects these words too.
Volatile datasets have large variations. One example is loan repayment amounts in a bank. A deep
learning neural network can categorize and sort that data as well, such as by analyzing financial
transactions and flagging some of them for fraud detection.
As deep learning is a relatively new technology, certain challenges come with its practical
implementation.
Large quantities of high-quality data
Deep learning algorithms give better results when you train them on large amounts of high-quality
data. Outliers or mistakes in your input dataset can significantly affect the deep learning process.
For instance, in our animal image example, the deep learning model might classify an airplane as
a turtle if non-animal images were accidentally introduced in the dataset.
To avoid such inaccuracies, you must clean and process large amounts of data before you can train
deep learning models. The input data preprocessing requires large amounts of data storage
capacity.
Deep learning algorithms are compute-intensive and require infrastructure with sufficient compute
capacity to properly function. Otherwise, they take a long time to process results.
https://cloud.google.com/discover/what-is-deep-learning
Once a neural network has been trained, it can be used to make predictions with new data it’s
received.
Both deep learning and machine learning are branches of artificial intelligence, with machine
learning being a broader term encompassing various techniques, including deep learning. Both
machine learning and deep learning algorithms can be trained on labeled or unlabeled data,
depending on the task and algorithm.
Machine learning and deep learning are both applicable to tasks such as image recognition, speech
recognition, and natural language processing. However, deep learning often outperforms
traditional machine learning in complex pattern recognition tasks like image classification and
object detection due to its ability to learn hierarchical representations of data.
Image recognition: To identify objects and features in images, such as people, animals, places,
etc.
Natural language processing: To help understand the meaning of text, such as in customer
service chatbots and spam filters.
Finance: To help analyze financial data and make predictions about market trends
Text to image: Convert text into images, such as in the Google Translate app.
CNNs are used for image recognition and processing. They are particularly good at identifying
objects in images, even when those objects are partially obscured or distorted.
Deep reinforcement learning is used for robotics and game playing. It is a type of machine learning
that allows an agent to learn how to behave in an environment by interacting with it and receiving
rewards or punishments.
RNNs are used for natural language processing and speech recognition. They are particularly good
at understanding the context of a sentence or phrase, and they can be used to generate text or
translate languages.
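The reward-driven learning described above for deep reinforcement learning can be illustrated with tabular Q-learning on a toy problem. This is a simplification and an assumption: deep reinforcement learning replaces the table with a neural network, and the five-position corridor environment, the hyperparameters and the function names are hypothetical.

```python
import random

N_STATES = 5        # positions 0..4; reaching position 4 ends the episode
ACTIONS = (-1, +1)  # action 0 steps left, action 1 steps right


def step(state, action):
    """Toy environment: reward 1.0 only when the agent reaches the last position."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by interacting with the environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore at random sometimes (and on ties); otherwise take the best-known action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal positions: 0 means left, 1 means right.
policy = [0 if left > right else 1 for left, right in q_table[:-1]]
```

After training, the greedy policy steps right from every position, because moving toward the rewarded end has the higher learned value. The agent is never told this rule; it infers it from the rewards it receives.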
Can learn complex relationships between features in data: This makes them more powerful
than traditional machine learning methods.
Large dataset training: This makes them very scalable, and able to learn from a wider range of
experiences, making more accurate predictions.
Data-driven learning: DL models can learn in a data-driven way, requiring less human
intervention to train them, increasing efficiency and scalability. These models learn from data that
is constantly being generated, such as data from sensors or social media.
Data requirements: Deep learning models require large amounts of data to learn from, making it
difficult to apply deep learning to problems where there is not a lot of data available.
Overfitting: DL models may be prone to overfitting. This means that they can learn the noise in
the data rather than the underlying relationships.
Bias: These models can potentially be biased, depending on the data they are trained on. This can
lead to unfair or inaccurate predictions. It is important to take steps to mitigate bias in deep learning
models.
https://www.geeksforgeeks.org/introduction-deep-learning/
Deep learning is the branch of machine learning that is based on
artificial neural network architecture. An artificial neural network or ANN uses layers of
interconnected nodes called neurons that work together to process and learn from the input data.
In a fully connected Deep neural network, there is an input layer and one or more hidden layers
connected one after the other. Each neuron receives input from the previous layer neurons or
the input layer. The output of one neuron becomes the input to other neurons in the next layer
of the network, and this process continues until the final layer produces the output of the
network. The layers of the neural network transform the input data through a series of nonlinear
transformations, allowing the network to learn complex representations of the input data.
Today Deep learning AI has become one of the most popular and visible areas of machine
learning, due to its success in a variety of applications, such as computer vision, natural
language processing, and Reinforcement learning.
Deep learning AI can be used for supervised, unsupervised and reinforcement machine
learning, and it processes data differently in each case.
Supervised Machine Learning: Supervised machine learning is the machine
learning technique in which the neural network learns to make predictions or classify data
based on labeled datasets. Here we provide both the input features and the target
variables. The neural network learns from the cost, or error, that comes from the difference
between the predicted and the actual target; propagating this error back through the network
to update the weights is known as backpropagation. Deep learning algorithms like convolutional
neural networks and recurrent neural networks are used for many supervised tasks, such as image
classification and recognition, sentiment analysis and language translation.
Unsupervised Machine Learning: Unsupervised machine learning is the machine
learning technique in which the neural network learns to discover patterns or to cluster
the dataset based on unlabeled datasets. Here there are no target variables, so the machine
has to determine the hidden patterns or relationships within the datasets by itself. Deep learning
algorithms like autoencoders and generative models are used for unsupervised tasks like
clustering, dimensionality reduction, and anomaly detection.
Reinforcement Machine Learning: Reinforcement machine learning is the machine
learning technique in which an agent learns to make decisions in an environment to
maximize a reward signal. The agent interacts with the environment by taking actions and
observing the resulting rewards. Deep learning can be used to learn policies, that is, rules
for choosing actions, that maximize the cumulative reward over time. Deep reinforcement learning
algorithms like Deep Q networks and Deep Deterministic Policy Gradient (DDPG) are used
for tasks like robotics and game playing.
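For the supervised case, learning from the error between prediction and target can be sketched with the smallest possible network: a single linear neuron trained by gradient descent. With only one neuron, backpropagation reduces to one application of the chain rule. The data, learning rate, epoch count and function name are assumptions for illustration.

```python
def train_linear_neuron(samples, learning_rate=0.05, epochs=2000):
    """Fit prediction = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, target in samples:
            error = (w * x + b) - target  # difference between predicted and actual target
            grad_w += 2 * error * x / n   # derivative of the mean squared error w.r.t. w
            grad_b += 2 * error / n       # derivative of the mean squared error w.r.t. b
        w -= learning_rate * grad_w       # step against the gradient to reduce the error
        b -= learning_rate * grad_b
    return w, b


# Hypothetical labeled dataset generated from target = 2 * x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_linear_neuron(samples)
```

The learned weight and bias end up very close to the values used to generate the labels. A deep network follows the same loop, with the gradient passed backward through every layer instead of through one neuron.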
Machine learning | Deep learning
Applies statistical algorithms to learn the hidden patterns and relationships in the dataset. | Uses artificial neural network architecture to learn the hidden patterns and relationships in the dataset.
Takes less time to train the model. | Takes more time to train the model.
What is Clustering?
The task of grouping data points based on their similarity with each other is called Clustering
or Cluster Analysis. This method is defined under the branch of Unsupervised Learning, which
aims at gaining insights from unlabelled data points, that is, unlike supervised learning we don’t
have a target variable.
Clustering aims at forming groups of homogeneous data points from a heterogeneous dataset.
It evaluates similarity based on a metric such as Euclidean distance, cosine similarity or
Manhattan distance, and then groups the points with the highest similarity together.
For example, in the graph given below, we can clearly see that there are 3 circular clusters
forming on the basis of distance.
The clusters formed need not be circular in shape. The shape of clusters
can be arbitrary. There are many algorithms that work well at detecting arbitrarily shaped
clusters.
For example, in the graph given below, we can see that the clusters formed are not circular in
shape.
Types of Clustering
Broadly speaking, there are 2 types of clustering that can be performed to group similar data
points:
Hard Clustering: In this type of clustering, each data point either belongs to a cluster
completely or does not belong to it at all. For example, let’s say there are 4 data points and
we have to cluster them into 2 clusters. Each data point will belong to either cluster 1 or cluster 2.
Data point   Cluster
A            C1
B            C2
C            C2
D            C1
Soft Clustering: In this type of clustering, instead of assigning each data point to a
single cluster, the probability or likelihood of that point belonging to each cluster is
evaluated. For example, let’s say there are 4 data points and we have to cluster them into 2
clusters. We evaluate the probability of each data point belonging to each of the two clusters.
This probability is calculated for all data points.
Data point   P(cluster 1)   P(cluster 2)
A            0.91           0.09
B            0.3            0.7
C            0.17           0.83
D            1              0
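The two kinds of assignment are related: a soft assignment becomes a hard one if each point is given to its most probable cluster. The sketch below uses the probabilities listed above for points A to D; the function name `harden` is an assumption for illustration.

```python
# Soft assignments from the example above: (probability of cluster 1, probability of cluster 2).
soft_assignments = {
    "A": (0.91, 0.09),
    "B": (0.30, 0.70),
    "C": (0.17, 0.83),
    "D": (1.00, 0.00),
}


def harden(soft):
    """Assign each point to the cluster with the highest probability."""
    hard = {}
    for point, probabilities in soft.items():
        best = max(range(len(probabilities)), key=lambda i: probabilities[i])
        hard[point] = f"C{best + 1}"
    return hard


hard_assignments = harden(soft_assignments)
```

Applied to these values, the result matches the hard clustering example: A and D fall in C1, B and C in C2. The soft version keeps extra information, for instance that B is a less certain member of its cluster than D.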
Uses of Clustering
Before we turn to the types of clustering algorithms, we will go through the use cases of
clustering algorithms. Clustering algorithms are mainly used for:
Market Segmentation – Businesses use clustering to group their customers and use targeted
advertisements to attract more audience.
Market Basket Analysis – Shop owners analyze their sales to figure out which items are
frequently bought together by customers. For example, according to one study in the USA,
diapers and beer were often bought together by fathers.
Social Network Analysis – Social media sites use your data to understand your browsing
behaviour and provide you with targeted friend recommendations or content
recommendations.
Medical Imaging – Doctors use Clustering to find out diseased areas in diagnostic images
like X-rays.
Anomaly Detection – Clustering can be used to find outliers in a real-time data stream or to
flag potentially fraudulent transactions.
Simplify working with large datasets – Each cluster is given a cluster ID after clustering is
complete. You can then condense an example’s entire feature set into its cluster ID.
Clustering is effective when it can represent a complicated case with a straightforward
cluster ID. Using the same principle, clustering data can make complex datasets simpler.
There are many more use cases for clustering, but these are some of the major and common
ones. Moving forward, we will discuss clustering algorithms that will help
you perform the above tasks.
Types of Clustering Algorithms
At the surface level, clustering helps in the analysis of unstructured data. Graphing, the shortest
distance, and the density of the data points are a few of the elements that influence cluster
formation. Clustering is the process of determining how related objects are, based on a metric
called the similarity measure. Similarity measures are easier to define for smaller sets of
features; they become harder to create as the number of features increases. Depending on
the type of clustering algorithm used in data mining, different techniques are employed
to group the data. The main types of clustering algorithms are:
1. Centroid-based Clustering (Partitioning methods)
2. Density-based Clustering (Model-based methods)
3. Connectivity-based Clustering (Hierarchical clustering)
4. Distribution-based Clustering
We will be going through each of these types in brief.
1. Centroid-based Clustering (Partitioning methods)
Partitioning methods are the simplest clustering algorithms. They group data points on the
basis of their closeness. Generally, the similarity measure chosen for these algorithms is
Euclidean distance, Manhattan distance or Minkowski distance. The dataset is separated into
a predetermined number of clusters, and each cluster is represented by a vector of values
(its centroid). Each data point is compared with these vectors and joins the cluster whose
vector it is closest to. The primary drawback of these algorithms is that we must establish
the number of clusters, "k," either intuitively or scientifically (using the Elbow Method)
before the algorithm starts allocating the data points. Despite this, it is still the most
popular type of clustering. K-means and K-medoids clustering are examples of this type of
clustering.
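A minimal sketch of the centroid-based idea is Lloyd's algorithm for k-means, shown here in plain Python. It is an illustration under simple assumptions, not a production implementation: the starting centroids are passed in by the caller (so k is the number of centroids supplied), Euclidean distance is used, and the toy points are made up.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate an assignment step and a centroid update step."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
                continue
            new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

Because the result depends on the starting centroids, practical implementations usually run several initializations and keep the best one.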
2. Density-based Clustering (Model-based methods)
Density-based clustering, a model-based method, finds groups based on the density of data
points. Contrary to centroid-based clustering, which requires that the number of clusters be
predefined and is sensitive to initialization, density-based clustering determines the number of
clusters automatically and is less sensitive to initial positions. These methods are good at
handling clusters of different sizes and forms, which makes them well suited for datasets with
irregularly shaped or overlapping clusters. By focusing on local density, they manage both dense
and sparse data regions and can distinguish clusters with a variety of shapes.
In contrast, centroid-based clustering, like k-means, has trouble finding arbitrarily shaped
clusters. Because the number of clusters must be preset and the method is highly sensitive to
the initial positioning of centroids, the outcomes can vary. Furthermore, the tendency of
centroid-based approaches to produce spherical or convex clusters restricts their capacity to
handle complicated or irregularly shaped clusters. In conclusion, density-based clustering
overcomes the drawbacks of centroid-based techniques by automatically determining the number
of clusters, being resilient to initialization, and successfully capturing clusters of various
sizes and forms. The most popular density-based clustering algorithm is DBSCAN.
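The density idea can be sketched with a small, unoptimized version of DBSCAN. The parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a core point, counting the point itself) follow common usage; the data and the parameter values are made up for illustration.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        # Brute-force region query; the point itself is included.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)      # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1             # provisional noise; may later become a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster    # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # j is itself a core point, so keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note that the number of clusters (two here) is never passed in; it falls out of the density parameters, and the isolated point is reported as noise.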
3. Connectivity-based Clustering (Hierarchical clustering)
A method for assembling related data points into hierarchical clusters is called hierarchical
clustering. Each data point is initially taken into account as a separate cluster, which is
subsequently combined with the clusters that are the most similar to form one large cluster that
contains all of the data points.
Think about how you may arrange a collection of items based on how similar they are. Each
object begins as its own cluster at the base of the tree when using hierarchical clustering, which
creates a dendrogram, a tree-like structure. The closest pairings of clusters are then combined
into larger clusters after the algorithm examines how similar the objects are to one another.
When every object is in one cluster at the top of the tree, the merging process has finished.
Exploring various granularity levels is one of the fun things about hierarchical clustering. To
obtain a given number of clusters, you can choose to cut the dendrogram at a particular height.
The more similar two objects are within a cluster, the closer they are. It’s comparable to
classifying items according to their family trees, where the nearest relatives are clustered
together and the wider branches signify more general connections. There are 2 approaches for
Hierarchical clustering:
Divisive Clustering: It follows a top-down approach. Here we consider all data points to be
part of one big cluster, and then this cluster is divided into smaller groups.
Agglomerative Clustering: It follows a bottom-up approach. Here we consider each data
point to be its own cluster, and then these clusters are merged step by step until one big
cluster contains all data points.
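The bottom-up (agglomerative) approach can be sketched as follows. This version assumes single linkage (distance between the closest members of two clusters), which is only one of several linkage choices, and stops when a requested number of clusters remains, which corresponds to cutting the dendrogram at a chosen level. The points are hypothetical.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up clustering with single linkage; returns (clusters, merge history)."""
    clusters = [[i] for i in range(len(points))]   # every point starts as its own cluster
    history = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        x, y = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        history.append((clusters[x], clusters[y], linkage(clusters[x], clusters[y])))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters, history

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 20.0)]
clusters, history = agglomerative(points, n_clusters=2)
print(clusters)                 # point indices grouped into two clusters
for left, right, dist in history:
    print(left, right, round(dist, 2))   # the merges, in dendrogram order
```

The merge history is exactly the information a dendrogram draws: which clusters were joined and at what distance.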
4. Distribution-based Clustering
Using distribution-based clustering, data points are grouped according to their
propensity to fall into the same probability distribution (such as a Gaussian, binomial, or other)
within the data. Data objects that have a higher likelihood of belonging to a cluster are
included in it. Every cluster has a central point, and a data point is less likely to be
included in a cluster the further it is from that point.
A notable drawback of density and boundary-based approaches is the need to specify the
number of clusters a priori for some algorithms, and, for most algorithms, the form of the
clusters. At least one tuning parameter or hyperparameter must be selected, and while
doing so should be simple, getting it wrong could have unanticipated repercussions.
Distribution-based clustering has a definite advantage over proximity and centroid-based
clustering approaches in terms of flexibility, accuracy, and cluster structure. The key issue is
that these methods are prone to overfitting: many work well only with simulated or
synthetic data, or when most of the data points clearly belong to a predefined distribution.
The most popular distribution-based clustering algorithm is the Gaussian Mixture Model (GMM).
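As a rough illustration of distribution-based clustering, the sketch below fits a two-component Gaussian mixture to one-dimensional data with the EM algorithm. Everything specific here is an assumption made for the example: the data, the starting parameters, the fixed number of iterations and the small variance floor.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM; returns (means, variances, weights, resp)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each component generated each point.
        resp = []
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its softly assigned points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)   # floor keeps a component from collapsing
            weights[j] = nj / len(data)
    return means, variances, weights, resp

# Hypothetical data drawn around 1 and around 5.
data = [0.8, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.2]
means, variances, weights, resp = em_gmm_1d(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print([round(m, 3) for m in means])
print([round(r[0], 3) for r in resp])   # soft membership of each point in component 0
```

The responsibilities in `resp` are a soft clustering: each point gets a probability for every component rather than a single label.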
https://www.subex.com/blog/introduction-to-clustering-in-data-science/
What is Clustering and How it Works?
Clustering is the task of dividing the population or data points into several groups such that data
points in the same groups are similar to other data points in that group and dissimilar to the data
points in other groups. It is basically an assembly of objects based on similarity and dissimilarity
between them.
The Importance of Clustering
Clustering helps in understanding the natural grouping in a dataset. The goal is to
partition the data into a set of meaningful groups. The quality of the grouping depends
on the method used and on how well it identifies hidden patterns. The biggest advantage of
clustering over classification is that it can adapt to changes and helps single out useful
features that differentiate the groups.
It can be used in the field of biology, by deriving animal and plant taxonomies, identifying genes
with the same capabilities.
It helps marketers to find the distinct groups in their customer base and they can characterize their
customer groups by using purchasing patterns.
Different Types of Clustering Methods
Connectivity-based Clustering (Hierarchical clustering)
Hierarchical clustering builds a tree-like hierarchy of clusters, either bottom-up (agglomerative)
or top-down (divisive).
Centroid-based Clustering
Centroid-based clustering is considered one of the simplest clustering algorithms, yet an
effective way of creating clusters and assigning data points to them. The intuition behind
centroid-based clustering is that a cluster is characterized and represented by a central vector,
and data points that are in close proximity to these vectors are assigned to the respective clusters.
Distribution-based Clustering
Distribution-based clustering groups data points based on their likelihood of
belonging to the same probability distribution in the data.
Density-based Clustering
Density-based clustering methods take density into consideration instead of distances. Clusters are
considered to be the densest regions in a data space, separated by regions of lower object
density, and a cluster is defined as a maximal set of connected points.
Fuzzy Clustering
The general idea of clustering revolves around assigning data points to mutually exclusive
clusters, meaning that a data point always resides in exactly one cluster and cannot belong to
more than one. Fuzzy clustering methods change this paradigm by assigning a data point
to multiple clusters, each with a quantified degree of membership.
Constraint-based (Supervised Clustering)
The clustering process, in general, is based on the approach that the data can be divided into an
optimal number of "unknown" groups. The underlying stage of all clustering algorithms is to
find those hidden patterns and similarities without any intervention or predefined conditions.
Constraint-based clustering, in contrast, partitions the data subject to user-specified constraints.
If you are working with ML algorithms, chances are you will be widely using Clustering.
Clustering is an incredibly useful unsupervised machine learning method that has a wide variety
of applications.
https://www.javatpoint.com/clustering-in-machine-learning
Clustering or cluster analysis is a machine learning technique, which groups the unlabelled
dataset. It can be defined as "A way of grouping the data points into different clusters, consisting
of similar data points. The objects with the possible similarities remain in a group that has less or
no similarities with another group."
It does it by finding some similar patterns in the unlabelled dataset such as shape, size, color,
behavior, etc., and divides them as per the presence and absence of those similar patterns.
After applying this clustering technique, each cluster or group is provided with a cluster ID. An ML
system can use this ID to simplify the processing of large and complex datasets. The clustering
technique is commonly used for statistical data analysis.
Example: Let's understand the clustering technique with the real-world example of a shopping mall.
When we visit a mall, we can observe that things with similar usage are grouped together:
t-shirts are in one section and trousers in another; similarly, in the produce section,
apples, bananas, mangoes, etc., are grouped separately so that we can easily find things.
The clustering technique works in the same way. Another example of clustering is grouping
documents according to topic.
The clustering technique can be widely used in various tasks. Some most common uses of this
technique are:
o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.
Apart from these general usages, clustering is used by Amazon in its recommendation system to
provide recommendations based on past product searches. Netflix also uses this technique
to recommend movies and web series to its users based on their watch history.
The below diagram explains the working of the clustering algorithm. We can see the different
fruits are divided into several groups with similar properties.
Types of Clustering Methods
The main clustering methods used in machine learning are:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
Partitioning Clustering
It is a type of clustering that divides the data into non-hierarchical groups. It is also known as
the centroid-based method. The most common example of partitioning clustering is the
K-Means Clustering algorithm.
In this type, the dataset is divided into a set of k groups, where k is the pre-defined number of
groups. The cluster centers are chosen so that each data point is closer to the centroid of its
own cluster than to the centroid of any other cluster.
Density-Based Clustering
The density-based clustering method connects highly dense areas into clusters, and
arbitrarily shaped clusters can be formed as long as the dense regions can be connected. The
dense areas in the data space are separated from each other by sparser areas.
These algorithms can face difficulty in clustering the data points if the dataset has varying densities
and high dimensions.
Distribution Model-Based Clustering
In the distribution model-based clustering method, the data is divided based on the probability
that a data point belongs to a particular distribution. The grouping is done by assuming some
distribution, most commonly the Gaussian distribution.
An example of this type is the Expectation-Maximization clustering algorithm, which uses
Gaussian Mixture Models (GMM).
Hierarchical Clustering
Hierarchical clustering can be used as an alternative to partitioning clustering, as there is no
requirement to pre-specify the number of clusters to be created. In this technique, the dataset is
divided into clusters to create a tree-like structure, which is also called a dendrogram. Any
number of clusters can be obtained by cutting the tree at the appropriate level. The
most common example of this method is the Agglomerative Hierarchical algorithm.
Fuzzy Clustering
Fuzzy clustering is a type of soft method in which a data object may belong to more than one group
or cluster. Each data object has a set of membership coefficients, which express its degree of
membership in each cluster. The Fuzzy C-means algorithm is an example of this type of clustering;
it is sometimes also known as the Fuzzy k-means algorithm.
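The membership coefficients can be illustrated with the membership update commonly used in Fuzzy C-means, in which a point's membership in a cluster depends on its distance to that cluster's center relative to its distances to all centers. The sketch keeps the centers fixed (a full algorithm would alternate this step with recomputing the centers), and the fuzzifier value m = 2 and the data are assumptions chosen for the example.

```python
import math

def fuzzy_memberships(points, centers, m=2.0):
    """Membership of each point in each cluster for fixed centers (one FCM step)."""
    power = 2.0 / (m - 1.0)
    table = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        zeros = [d == 0.0 for d in dists]
        if any(zeros):
            # A point sitting exactly on a center belongs fully to that center.
            row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
        else:
            row = [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]
        table.append(row)
    return table

points = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fuzzy_memberships(points, centers)
for p, row in zip(points, u):
    print(p, [round(v, 3) for v in row])
# (1.0, 0.0) [0.9, 0.1]
# (2.0, 0.0) [0.5, 0.5]
# (3.0, 0.0) [0.1, 0.9]
```

Each row sums to 1, and the point halfway between the two centers belongs equally to both clusters.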
Clustering Algorithms
Clustering algorithms can be divided according to the models explained above. Many
clustering algorithms have been published, but only a few are commonly used. The
choice of clustering algorithm depends on the kind of data we are using. For example, some
algorithms require the number of clusters in the given dataset to be guessed in advance, whereas
others work by finding the minimum distance between the observations of the dataset.
Here we discuss the popular clustering algorithms that are widely used in machine
learning:
1. K-Means algorithm: The k-means algorithm is one of the most popular clustering
algorithms. It classifies the dataset by dividing the samples into different clusters of equal
variances. The number of clusters must be specified in this algorithm. It is fast with fewer
computations required, with the linear complexity of O(n).
2. Mean-shift algorithm: Mean-shift algorithm tries to find the dense areas in the smooth
density of data points. It is an example of a centroid-based model, that works on updating
the candidates for centroid to be the center of the points within a given region.
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications
with Noise. It is an example of a density-based model similar to the mean-shift, but with
some remarkable advantages. In this algorithm, the areas of high density are separated by
the areas of low density. Because of this, the clusters can be found in any arbitrary shape.
4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an
alternative to the k-means algorithm, or for those cases where K-means may fail. In
GMM, it is assumed that the data points are Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The Agglomerative hierarchical algorithm
performs the bottom-up hierarchical clustering. In this, each data point is treated as a single
cluster at the outset and then successively merged. The cluster hierarchy can be represented
as a tree-structure.
6. Affinity Propagation: It is different from other clustering algorithms as it does not require
the number of clusters to be specified. In this algorithm, messages are exchanged between pairs
of data points until convergence. It has O(N²T) time complexity, where N is the number of data
points and T is the number of iterations, which is the main drawback of this algorithm.
Applications of Clustering
Below are some commonly known applications of clustering technique in Machine Learning:
o In Identification of Cancer Cells: The clustering algorithms are widely used for the
identification of cancerous cells. It divides the cancerous and non-cancerous data sets into
different groups.
o In Search Engines: Search engines also work on the clustering technique. The search
result appears based on the closest object to the search query. It does it by grouping similar
data objects in one group that is far from the other dissimilar objects. The accurate result
of a query depends on the quality of the clustering algorithm used.
o Customer Segmentation: It is used in market research to segment the customers based on
their choice and preferences.
o In Biology: It is used in the biology stream to classify different species of plants and
animals using the image recognition technique.
o In Land Use: The clustering technique is used to identify areas of similar land use
in a GIS database. This can be very useful for finding out which purpose a particular
piece of land is most suitable for.
https://www.techtarget.com/searchenterpriseai/definition/clustering-in-machine-learning
Clustering is a data science technique in machine learning that groups similar rows in a data set.
After running a clustering technique, a new column appears in the data set to indicate the group
each row of data fits into best. Since rows of data, or data points, often represent people, financial
transactions, documents or other important entities, these groups tend to form clusters of similar
entities that have several kinds of real-world applications.
Since clustering generally groups input data, it can be very creative and flexible. Clustering can be
used for data exploration and preprocessing, as well as specific applications. From a technical
perspective, common applications of clustering include the following:
Data visualization. Data often contains natural groups or segments, and clustering should be
able to find them. Visualizing clusters can be a highly informative data analysis approach.
Prototypes. Prototypes are data points that represent many other points and help explain data
and models. If a cluster represents a large market segment, then the data point at the cluster
center -- or cluster centroid -- is the prototypical member of that market segment.
Sampling. Since clustering can define groups in the data, clusters can be used to create
different types of data samples. Drawing an equal number of data points from each cluster in
a data set, for example, can create a balanced sample of the population represented by that data
set.
For business applications, clustering is a battle-tested tool in market segmentation and fraud
detection. Clustering is also useful for categorizing documents, making product recommendations
and in other applications where grouping entities makes sense.
There are many types of clustering algorithms, but K-means and hierarchical clustering are the
most widely available in data science tools.
K-means clustering
In the K-means clustering algorithm, choose a specific number of clusters to create in the data and
denote that number as k. K can be 3, 10, 1,000 or any other number of clusters, but smaller numbers
work better. The algorithm then makes k clusters, and the center point of each cluster, or centroid,
becomes the mean, or average, value of each variable inside the cluster. K-means and related
approaches -- such as k-medoids for character data or k-prototypes for mixed numeric and
character data -- are fast and work well on large data sets. However, they usually make simple,
spherical clusters of roughly the same size.
Hierarchical clustering
If you're seeking more complex and realistic clusters of different shapes and sizes, and don't want
to pick the k before starting the analysis, hierarchical clustering might be a better choice.
Hierarchical clustering accommodates a divisive approach: start with one big cluster, break that
cluster into smaller ones until each point is in its own cluster and then choose from all the
interesting clustering solutions in between.
Another option is an agglomerative approach, in which each data point starts in its own cluster.
Combine the data points into clusters until all the points are in one big cluster and then choose the
best clusters in between. Unfortunately, hierarchical clustering algorithms tend to be slow or
impossible for big data, so a k still has to be chosen to arrive at the final answer.
One of the hardest parts of clustering is choosing the number of clusters that best suits the data and
application. There are data-driven methods to estimate k, such as silhouette score and gap statistics.
These quantitative formulas provide a numeric score that helps choose the best number of clusters.
Domain knowledge can also be used. For example, if a project has enough budget for 10 different
marketing campaigns, commercial concerns dictate that 10 is a good number of clusters; or
experienced marketers who have worked in a certain vertical for a long time may know the best
number of segments for the market. Combining quantitative analysis and domain knowledge often
works well, too.
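The silhouette score mentioned above can be computed by hand for small data sets. In the sketch below, a point's silhouette is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. The data and the two candidate labelings are hypothetical, and the code assumes at least two clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1..1, higher is better)."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)          # common convention for single-point clusters
            continue
        a = sum(own) / len(own)         # cohesion: mean distance within own cluster
        b = min(                        # separation: mean distance to nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])   # groups nearby points together
bad = silhouette_score(points, [0, 1, 0, 1])    # pairs up distant points
print(round(good, 3), round(bad, 3))
```

To estimate k, the same score would be computed for the clusterings produced with several values of k, and the value with the highest score preferred.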
In clustering, answers are usually validated through a technique known as profiling, which
involves naming the clusters. For example, DINKs (dual income, no kids), HINRYs (high income,
not rich yet) and hockey moms are all names that refer to groups of consumers. These names are
usually determined by looking at the centroid -- or prototypical data point -- for each cluster and
ensuring they're logical and different from the other discovered prototypes.
Visualization is also a key aspect of profiling. Clusters can be plotted to ensure they don't overlap
and that their arrangement makes sense. For example, clusters for very different market segments
should appear visually distant in a plot.
Clustering has many business applications. Two of these use cases are explained below and
illustrated in Figure 1 and Figure 2 in the graphic titled "Clustering use cases."
For a data set of customers in which each row of data -- or data point -- is a customer, clustering
techniques can be used to create groups of similar customers. Known as market segments, these
customer groups can improve marketing efforts.
Figure 1 uses data pertaining to consumers' income and property value and K-means clustering to
find three larger, roughly circular and similarly sized clusters within that market.
Cluster 1 appears to be a group of affluent consumers who own homes -- perhaps some DINKs.
Cluster 2 likely represents middle-class homeowners -- probably some hockey moms and dads.
Cluster 3 contains higher income consumers who don't appear to own homes -- HINRYs in many
cases.
One of the more common applications of market segments is to optimize the money spent on
marketing. For example, it probably doesn't make sense to send grocery coupons to Clusters 1 and
3 because they're unlikely to use them. On the other hand, premium co-branded credit card offers
are likely wasted on Cluster 2 because they don't want the annual fees. With this knowledge of
market segments, marketers can spend their budgets in a more efficient manner.
https://www.geeksforgeeks.org/association-rule/
Association Rule
Association rule mining finds interesting associations and relationships among large sets of data
items. An association rule shows how frequently an itemset occurs in a transaction. A typical
example is Market Basket Analysis. Market Basket Analysis is one of the key techniques used by
large retailers to show associations between items. It allows retailers to identify relationships
between the items that people buy together frequently. Given a set of transactions, we can find
rules that will predict the occurrence of an item based on the occurrences of other items in the
transaction.
TID Items
1 Bread, Milk
Before we start defining the rule, let us first see the basic definitions.
Support Count ($\sigma$) – Frequency of occurrence of an itemset.
Here $\sigma(\{\text{Milk, Bread, Diaper}\}) = 2$
Support: $s = 2/5 = 0.4$
Confidence: $c = 2/3 = 0.67$
Lift: $l = 0.4/(0.6 \times 0.6) = 1.11$
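The three values above can be reproduced with a few lines of code. Since only the first row of the transaction table is shown here, the data set below is an assumption: five hypothetical transactions, and the rule {Milk, Diaper} → {Beer}, chosen so that the results agree with 0.4, 0.67 and 1.11.

```python
# Assumed (hypothetical) basket data; only the first row appears in the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    return support_count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    return support_count(antecedent | consequent) / support_count(antecedent)

def lift(antecedent, consequent):
    return support(antecedent | consequent) / (support(antecedent) * support(consequent))

x, y = {"Milk", "Diaper"}, {"Beer"}      # assumed rule: {Milk, Diaper} -> {Beer}
print(support_count({"Milk", "Bread", "Diaper"}))   # 2
print(round(support(x | y), 2))                     # 0.4
print(round(confidence(x, y), 2))                   # 0.67
print(round(lift(x, y), 2))                         # 1.11
```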
Association rules are very useful in analyzing datasets. The data is collected using bar-code
scanners in supermarkets. Such databases consist of a large number of transaction records,
each listing all items bought by a customer in a single purchase. The manager can thus see
whether certain groups of items are consistently purchased together and use this information
for adjusting store layouts, cross-selling, and promotions.
https://www.javatpoint.com/association-rule-learning
Association rule learning is a type of unsupervised learning technique that checks for the
dependency of one data item on another data item and maps accordingly so that it can be more
profitable. It tries to find some interesting relations or associations among the variables of dataset.
It is based on different rules to discover the interesting relations between variables in the database.
Association rule learning is one of the very important concepts of machine learning, and it is
employed in market basket analysis, web usage mining, continuous production, etc. Here,
market basket analysis is a technique used by various big retailers to discover the associations
between items. We can understand it by taking the example of a supermarket, where
products that are frequently purchased together are placed together.
For example, if a customer buys bread, he is also likely to buy butter, eggs, or milk, so these
products are stored on the same shelf or nearby. Consider the below diagram:
Association rule learning can be divided into three types of algorithms:
1. Apriori
2. Eclat
3. F-P Growth Algorithm
How does Association Rule Learning work?
Association rule learning works on the concept of if-then statements, such as if A then B.
Here the if element is called the antecedent, and the then statement is called the consequent.
A relationship in which we can find an association between two single items is known as
single cardinality. It is all about creating rules, and if the number of items increases, then
cardinality also increases accordingly. So, to measure the associations between thousands of data
items, there are several metrics. These metrics are given below:
o Support
o Confidence
o Lift
Support
Support is the frequency of A, or how frequently an itemset appears in the dataset. It is defined as
the fraction of the transactions T that contain the itemset X. If Freq(X) is the number of
transactions containing X and T is the total number of transactions, it can be written as:
$$\mathrm{Supp}(X) = \frac{\mathrm{Freq}(X)}{T}$$
Confidence
Confidence indicates how often the rule has been found to be true, that is, how often the items X
and Y occur together in the dataset when the occurrence of X is already given. It is the ratio of
the number of transactions that contain X and Y to the number of records that contain X:
$$\mathrm{Confidence}(X \to Y) = \frac{\mathrm{Freq}(X, Y)}{\mathrm{Freq}(X)}$$
Lift
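It is the ratio of the observed support measure and expected support if X and Y are independent of
each other:
$$\mathrm{Lift}(X \to Y) = \frac{\mathrm{Supp}(X, Y)}{\mathrm{Supp}(X) \times \mathrm{Supp}(Y)}$$
It has three possible values:
o Lift = 1: the occurrences of the antecedent and the consequent are independent of each other.
o Lift > 1: the two itemsets are positively associated; the larger the value, the stronger the
dependence.
o Lift < 1: the two itemsets are negatively associated; the presence of one makes the other
less likely.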
Apriori Algorithm
This algorithm uses frequent itemsets to generate association rules. It is designed to work on
databases that contain transactions. This algorithm uses a breadth-first search and a hash tree to
count candidate itemsets efficiently.
It is mainly used for market basket analysis and helps to understand the products that can be bought
together. It can also be used in the healthcare field to find drug reactions for patients.
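The level-by-level (breadth-first) search of Apriori can be sketched as below. This is a simplified illustration: it counts support by scanning the transactions directly instead of using a hash tree, it stops at frequent itemsets without generating rules, and the transactions and the 0.6 support threshold are assumptions for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset that meets min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical basket data.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
frequent = apriori(transactions, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what makes the search tractable: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded before their support is ever counted.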
Eclat Algorithm
Eclat stands for Equivalence Class Transformation. This algorithm uses a depth-first
search technique to find frequent itemsets in a transaction database. It generally executes
faster than the Apriori Algorithm.
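A minimal sketch of the depth-first idea follows. It assumes the usual vertical data layout for Eclat, in which each item is mapped to the set of transaction IDs containing it, so that the support of a larger itemset is obtained by intersecting these sets. The transactions and the minimum count of 3 are hypothetical.

```python
def eclat(transactions, min_count):
    """Depth-first frequent-itemset mining over vertical transaction-id lists."""
    # Vertical layout: item -> set of ids of the transactions that contain it.
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Intersect with the remaining items' id sets to go one level deeper.
            deeper = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    deeper.append((other, shared))
            extend(itemset, deeper)

    start = sorted(
        ((item, tids) for item, tids in tidlists.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return frequent

# Hypothetical basket data.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
frequent = eclat(transactions, min_count=3)
for itemset, count in frequent.items():
    print(itemset, count)
```

Unlike the level-wise search of Apriori, each branch here is followed to its full depth before the next item is tried, and the database is scanned only once to build the id sets.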
F-P Growth Algorithm
F-P growth stands for Frequent Pattern growth, and it is an improved version of the
Apriori Algorithm. It represents the database in the form of a tree structure known as a
frequent pattern tree (FP-tree). The purpose of this tree is to extract the most frequent patterns.
Applications of Association Rule Learning
Association rule learning has various applications in machine learning and data mining. Below are
some popular applications of association rule learning:
o Market Basket Analysis: It is one of the popular examples and applications of association
rule mining. This technique is commonly used by big retailers to determine the association
between items.
o Medical Diagnosis: Association rules can support diagnosis, as they help in identifying
the probability of illness for a particular disease.
o Protein Sequence: The association rules help in determining the synthesis of artificial
Proteins.
o It is also used for the Catalog Design and Loss-leader Analysis and many more other
applications.
https://towardsdatascience.com/association-rules-2-aa9a77241654
Association Rules is one of the very important concepts of machine learning being used in market
basket analysis. In a store, all vegetables are placed in the same aisle, all dairy items are placed
together and cosmetics form another set of such groups. Investing time and resources on deliberate
product placements like this not only reduces a customer’s shopping time, but also reminds the
customer of what relevant items (s)he might be interested in buying, thus helping stores cross-sell
in the process. Association rules help uncover all such relationships between items from huge
databases.
To elaborate on this idea: rules do not tie a user's different transactions together over time to
identify relationships. Lists of items with unique transaction IDs (from all users) are studied as
one group. This is helpful in the placement of products on aisles. On the other hand, collaborative
filtering ties together all transactions corresponding to a user ID to identify similarity between
users' preferences. This is helpful in recommending items on e-commerce websites, recommending
songs on Spotify, etc.
Let us now see what an association rule exactly looks like. It consists of an antecedent and a
consequent, both of which are a list of items. Note that implication here is co-occurrence and not
causality. For a given rule, itemset is the list of all the items in the antecedent and the consequent.
Various metrics are in place to help us understand the strength of association between these two.
Let us go through them all.
1. Support
This measure gives an idea of how frequent an itemset is in all the transactions. Consider itemset1 =
{bread} and itemset2 = {shampoo}. There will be far more transactions containing bread than those
containing shampoo. So as you rightly guessed, itemset1 will generally have a higher support
than itemset2. Now consider itemset1 = {bread, butter} and itemset2 = {bread, shampoo}. Many
transactions will have both bread and butter on the cart but bread and shampoo? Not so much. So
in this case, itemset1 will generally have a higher support than itemset2. Mathematically, support is
the fraction of the total number of transactions in which the itemset occurs.
Value of support helps us identify the rules worth considering for further analysis. For example,
one might want to consider only the itemsets which occur at least 50 times out of a total of 10,000
transactions i.e. support = 0.005. If an itemset happens to have a very low support, we do not have
enough information on the relationship between its items and hence no conclusions can be drawn
from such a rule.
2. Confidence
This measure defines the likeliness of occurrence of consequent on the cart given that the cart
already has the antecedents. That is to answer the question — of all the transactions containing say,
{Captain Crunch}, how many also had {Milk} on them? We can say by common knowledge that
{Captain Crunch} → {Milk} should be a high confidence rule. Technically, confidence is the
conditional probability of occurrence of consequent given the antecedent.
Let us consider a few more examples before moving ahead. What do you think would be the
confidence for {Butter} → {Bread}? That is, what fraction of transactions having butter also had
bread? Very high i.e. a value close to 1? That’s right. What about {Yogurt} → {Milk}? High again.
{Toothbrush} → {Milk}? Not so sure? Confidence for this rule will also be high since {Milk} is
such a frequent itemset and would be present in every other transaction.
It does not matter what you have in the antecedent for such a frequent consequent. The confidence
for an association rule having a very frequent consequent will always be high.
I will introduce some numbers here to clarify this further.
Total transactions = 100. 10 of them have both milk and toothbrush, 70 have milk but no toothbrush
and 4 have toothbrush but no milk.
Using the numbers above, confidence for {Toothbrush} → {Milk} will be
10/(10+4) ≈ 0.71
Looks like a high confidence value. But we know intuitively that these two products have a weak
association and there is something misleading about this high confidence value. Lift is introduced
to overcome this challenge.
Considering just the value of confidence limits our capability to make any business inference.
3. Lift
Lift controls for the support (frequency) of consequent while calculating the conditional probability
of occurrence of {Y} given {X}. Lift is a very literal term given to this measure. Think of it as the
*lift* that {X} provides to our confidence for having {Y} on the cart. To rephrase, lift is the rise in
probability of having {Y} on the cart with the knowledge of {X} being present over the probability
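of having {Y} on the cart without any knowledge about presence of {X}. Mathematically,
$\mathrm{Lift}(X \rightarrow Y) = \dfrac{\mathrm{Confidence}(X \rightarrow Y)}{\mathrm{Support}(Y)} = \dfrac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)\,\mathrm{Support}(Y)}$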
In cases where {X} actually leads to {Y} on the cart, value of lift will be greater than 1. Let us
understand this with an example which will be continuation of the {Toothbrush} → {Milk} rule.
Probability of having milk on the cart with the knowledge that toothbrush is present (i.e. confidence): 10/(10+4) ≈ 0.71
Now to put this number in perspective, consider the probability of having milk on the cart without
any knowledge about toothbrush: 80/100 = 0.8
These numbers show that having toothbrush on the cart actually reduces the probability of having
milk on the cart to about 0.71 from 0.8! This will be a lift of 0.714/0.8 ≈ 0.89. Now that’s more like the real
picture. A value of lift less than 1 shows that having toothbrush on the cart does not increase the
chances of occurrence of milk on the cart in spite of the rule showing a high confidence value. A
value of lift greater than 1 vouches for high association between {Y} and {X}. The higher the value of
lift, the greater the chance that a customer who has already bought {X} will also buy {Y}. Lift is
the measure that will help store managers decide product placements in the aisles.
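The three metrics can be checked with a few lines of Python. This is a minimal sketch that uses only the counts from the toothbrush and milk example above; the variable names are illustrative, and the values are left unrounded.

```python
# Counts taken from the toothbrush/milk example in the text.
total = 100
both = 10              # transactions with milk and toothbrush
milk_only = 70         # milk but no toothbrush
toothbrush_only = 4    # toothbrush but no milk

# Support: fraction of all transactions that contain the itemset.
support_toothbrush = (both + toothbrush_only) / total
support_milk = (both + milk_only) / total
support_both = both / total

# confidence(X -> Y) = support(X and Y) / support(X)
confidence = support_both / support_toothbrush
# lift(X -> Y) = confidence(X -> Y) / support(Y)
lift = confidence / support_milk

print(f"support(milk) = {support_milk:.2f}")
print(f"confidence    = {confidence:.3f}")
print(f"lift          = {lift:.3f}")
```

A lift below 1 here matches the intuition in the text: knowing that a toothbrush is on the cart does not make milk more likely.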
https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-linear-regression/
What Is Linear Regression?
The independent variable is also the predictor or explanatory variable that remains unchanged due
to the change in other variables. However, the dependent variable changes with fluctuations in the
independent variable. The regression model predicts the value of the dependent variable, which is
the response or outcome variable being analyzed or studied.
Thus, linear regression is a supervised learning algorithm that simulates a mathematical
relationship between variables and makes predictions for continuous or numeric variables such as
sales, salary, age, product price, etc.
This analysis method is advantageous when at least two variables are available in the data, as
observed in stock market forecasting, portfolio management, scientific analysis, etc.
Linear regression is a popular statistical tool used in data science, thanks to the several benefits it
offers, such as:
1. Easy implementation
The linear regression model is computationally simple to implement as it does not demand a lot of
engineering overheads, neither before the model launch nor during its maintenance.
2. Interpretability
Unlike deep learning models (neural networks), linear regression is relatively
straightforward. As a result, this algorithm stands ahead of black-box models that fall short in
justifying which input variable causes the output variable to change.
3. Scalability
Linear regression is not computationally heavy and, therefore, fits well in cases where scaling is
essential. For example, the model can scale well regarding increased data volume (big data).
The ease of computation of these algorithms allows them to be used in online settings. The model
can be trained and retrained with each new example to generate predictions in real-time, unlike the
neural networks or support vector machines that are computationally heavy and require plenty of
computing resources and substantial waiting time to retrain on a new dataset. All these factors
make such compute-intensive models expensive and unsuitable for real-time applications.
The above features highlight why linear regression is a popular model to solve real-life machine
learning problems.
Types of Linear Regression with Examples
Linear regression has been a critical driving force behind many AI and data science applications.
This statistical technique is beneficial for businesses as it is a simple, interpretable, and efficient
method to evaluate trends and make future estimates or forecasts.
1. Simple linear regression
Simple linear regression reveals the correlation between a dependent variable (output) and an
independent variable (input). Primarily, this regression type describes the following:
The value of the dependent variable is based on the value of the independent variable.
2. Multiple linear regression
Multiple linear regression establishes the relationship between independent variables (two or
more) and the corresponding dependent variable. Here, the independent variables can be either
continuous or categorical. This regression type helps foresee trends, determine future values, and
predict the impacts of changes.
Example: Consider the task of calculating blood pressure. In this case, height, weight, and amount
of exercise can be considered independent variables. Here, we can use multiple linear regression
to analyze the relationship between the three independent variables and one dependent variable, as
all the variables considered are quantitative.
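The blood pressure example can be sketched with ordinary least squares in NumPy. The data below is synthetic and the "true" coefficients are made up purely to generate it; nothing here comes from a real study. The point is only to show three independent variables being fitted against one dependent variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic, made-up predictors: height (cm), weight (kg), exercise (hours/week).
height = rng.normal(170, 10, n)
weight = rng.normal(70, 12, n)
exercise = rng.uniform(0, 10, n)

# Assumed relationship, used only to generate the toy blood pressure values.
true_coef = np.array([0.05, 0.4, -1.5])
true_intercept = 90.0
X = np.column_stack([height, weight, exercise])
bp = true_intercept + X @ true_coef + rng.normal(0, 2.0, n)

# Add a column of ones so the intercept is estimated together with the slopes.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, bp, rcond=None)

print("estimated intercept:", beta[0])
print("estimated coefficients (height, weight, exercise):", beta[1:])
```

The estimated slopes should land close to the values used to generate the data, which is what "analyzing the relationship" amounts to in this setting.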
3. Logistic regression
Logistic regression, also referred to as the logit model, is applicable in cases where there is one
dependent variable and one or more independent variables. The fundamental difference between multiple
and logistic regression is that the target variable in the logistic approach is discrete (binary or an
ordinal value). That is, the dependent variable is finite or categorical: either P or Q (binary
regression) or a range of limited options P, Q, R, or S.
Linear regression is a poor fit when the variable value is limited to just two possible outcomes,
because its predictions are unbounded. Logistic regression addresses this issue as it can return a
probability score that shows the chances of any particular event.
Example: One can determine the likelihood of choosing an offer on your website (dependent
variable). For analysis purposes, you can look at various visitor characteristics such as the sites
they came from, count of visits to your site, and activity on your site (independent variables). This
can help determine the probability of certain visitors who are more likely to accept the offer. As a
result, it allows you to make better decisions on whether to promote the offer on your site or not.
Furthermore, logistic regression is extensively used in machine learning algorithms in cases such
as spam email detection, predicting whether a customer's loan will be approved, and more.
4. Ordinal regression
Ordinal regression involves one ordinal dependent variable and one or more independent variables,
which can either be ordinal or nominal. It models the relationship between a dependent variable
with multiple ordered levels and one or more independent variables.
For a dependent variable with m categories, (m -1) equations will be created. Each equation has a
different intercept but the same slope coefficients for the predictor variables. Thus, ordinal
regression creates multiple prediction equations for various categories. In machine learning,
ordinal regression refers to ranking learning or ranking analysis computed using a generalized
linear model (GLM).
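The "(m - 1) equations with different intercepts and shared slopes" idea can be sketched with a proportional-odds (cumulative logit) model, which is one common form of ordinal regression. That choice, the function names, and the numeric values below are assumptions for illustration, not something the text specifies.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ordinal_probs(x, slope, intercepts):
    """Category probabilities under a proportional-odds model.

    P(Y <= j | x) = sigmoid(intercepts[j] - slope * x) for j = 0 .. m-2,
    so m ordered categories need m-1 intercepts that share one slope.
    """
    cumulative = sigmoid(np.asarray(intercepts) - slope * x)
    cumulative = np.append(cumulative, 1.0)    # P(Y <= last category) = 1
    return np.diff(cumulative, prepend=0.0)    # P(Y = j) from successive differences

# Hypothetical values: 4 ordered categories, hence 3 increasing intercepts.
probs = ordinal_probs(x=1.0, slope=0.8, intercepts=[-1.0, 0.5, 2.0])
print(probs, probs.sum())
```

Each of the three intercepts defines one cumulative equation, and the four category probabilities fall out as differences between neighbouring equations.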
Example: Consider a survey where the respondents are supposed to answer as ‘agree’ or
‘disagree.’ In some cases, such responses are of no help as one cannot derive a definitive
conclusion, complicating the generalized results. However, you can observe a natural order in the
categories by adding levels to responses, such as strongly disagree, disagree, agree, and strongly
disagree. Ordinal regression thus helps in predicting the dependent variable having ‘ordered’
multiple categories using independent variables.
5. Multinomial logistic regression
Multinomial logistic regression (MLR) is performed when the dependent variable is nominal with
more than two levels. It specifies the relationship between one dependent nominal variable and
one or more continuous-level (interval, ratio, or dichotomous) independent variables. Here, the
nominal variable refers to a variable with no intrinsic ordering.
Example: Multinomial logit can be used to model the program choices made by school students.
The program choices, in this case, refer to a vocational program, sports program, and academic
program. The choice of type of program can be predicted by considering a variety of attributes,
such as how well the students can read and write on the subjects given, gender, and awards received
by them.
Here, the dependent variable is the choice of programs with multiple levels (unordered). The
multinomial logistic regression technique is used to make predictions in such a case.
https://www.geeksforgeeks.org/ml-linear-regression/
Linear Regression
Here Y is called a dependent or target variable and X is called an independent variable also
known as the predictor of Y. There are many types of functions or modules that can be used for
regression. A linear function is the simplest type of function. Here, X may be a single feature
or multiple features representing the problem.
Linear regression predicts a dependent variable value (y) based on a given
independent variable (x). Hence, the name is Linear Regression. In the figure above, X (input)
is the work experience and Y (output) is the salary of a person. The regression line is the best-fit
line for our model.
We utilize the cost function to compute the best values in order to get the best fit line since
different values for weights or the coefficient of lines result in different regression lines.
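As a rough illustration of how a cost function picks out the best-fit line, the sketch below uses mean squared error as the cost (an assumption; the text does not name a specific cost) and made-up experience/salary numbers. It compares the least-squares line against an arbitrary alternative.

```python
import numpy as np

def mse_cost(w, b, x, y):
    """Mean squared error of the line y_hat = w * x + b."""
    return np.mean((w * x + b - y) ** 2)

# Made-up data: years of experience vs. salary (in thousands).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([35.0, 42.0, 50.0, 55.0, 63.0])

# Closed-form least-squares estimates for a single feature.
w_best = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b_best = y.mean() - w_best * x.mean()

print("best-fit weight and intercept:", w_best, b_best)
print("cost of best-fit line:", mse_cost(w_best, b_best, x, y))
print("cost of another line (w=5, b=30):", mse_cost(5.0, 30.0, x, y))
```

Any other choice of weight and intercept gives a cost at least as large as the best-fit line, which is what "best" means here.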
https://www.ibm.com/topics/linear-regression
For each variable: Consider the number of valid cases, mean and standard deviation.
For each model: Consider regression coefficients, correlation matrix, part and partial
correlations, multiple R, R², adjusted R², change in R², standard error of the estimate,
analysis-of-variance table, predicted values and residuals. Also, consider 95-percent-
confidence intervals for each regression coefficient, variance-covariance matrix, variance
inflation factor, tolerance, Durbin-Watson test, distance measures (Mahalanobis, Cook and
leverage values), DfBeta, DfFit, prediction intervals and case-wise diagnostic information.
Plots: Consider scatterplots, partial plots, histograms and normal probability plots.
Data: Dependent and independent variables should be quantitative. Categorical variables,
such as religion, major field of study or region of residence, need to be recoded to binary
(dummy) variables or other types of contrast variables.
Other assumptions: For each value of the independent variable, the distribution of the
dependent variable must be normal. The variance of the distribution of the dependent
variable should be constant for all values of the independent variable. The relationship
between the dependent variable and each independent variable should be linear and all
observations should be independent.
https://aws.amazon.com/what-is/logistic-regression/#:~:text=Logistic%20regression%20is%20a%20data,outcomes%2C%20like%20yes%20or%20no.
Logistic regression is a data analysis technique that uses mathematics to find the relationships
between two data factors. It then uses this relationship to predict the value of one of those factors
based on the other. The prediction usually has a finite number of outcomes, like yes or no.
For example, let’s say you want to guess if your website visitor will click the checkout button in
their shopping cart or not. Logistic regression analysis looks at past visitor behavior, such as time
spent on the website and the number of items in the cart. It determines that, in the past, if visitors
spent more than five minutes on the site and added more than three items to the cart, they clicked
the checkout button. Using this information, the logistic regression function can then predict the
behavior of a new website visitor.
Logistic regression is an important technique in the field of artificial intelligence and machine
learning (AI/ML). ML models are software programs that you can train to perform complex data
processing tasks without human intervention. ML models built using logistic regression help
organizations gain actionable insights from their business data. They can use these insights for
predictive analysis to reduce operational costs, increase efficiency, and scale faster. For example,
businesses can uncover patterns that improve employee retention or lead to more profitable product
design.
Below, we list some benefits of using logistic regression over other ML techniques.
Simplicity
Logistic regression models are mathematically less complex than other ML methods. Therefore,
you can implement them even if no one on your team has in-depth ML expertise.
Speed
Logistic regression models can process large volumes of data at high speed because they require
less computational capacity, such as memory and processing power. This makes them ideal for
organizations that are starting with ML projects to gain some quick wins.
Flexibility
You can use logistic regression to find answers to questions that have two or more finite outcomes.
You can also use it to preprocess data. For example, you can sort data with a large range of values,
such as bank transactions, into a smaller, finite range of values by using logistic regression. You
can then process this smaller data set by using other ML techniques for more accurate analysis.
Visibility
Logistic regression analysis gives developers greater visibility into internal software processes
than do other data analysis techniques. Troubleshooting and error correction are also easier
because the calculations are less complex.
Manufacturing
Manufacturing companies use logistic regression analysis to estimate the probability of part failure
in machinery. They then plan maintenance schedules based on this estimate to minimize future
failures.
Healthcare
Medical researchers plan preventive care and treatment by predicting the likelihood of disease in
patients. They use logistic regression models to compare the impact of family history or genes on
diseases.
Finance
Financial companies have to analyze financial transactions for fraud and assess loan applications
and insurance applications for risk. These problems are suitable for a logistic regression model
because they have discrete outcomes, like high risk or low risk and fraudulent or not fraudulent.
Marketing
Online advertising tools use the logistic regression model to predict if users will click on an
advertisement. As a result, marketers can analyze user responses to different words and images
and create high-performing advertisements with which customers will engage.
Logistic regression is one of several different regression analysis techniques that data scientists
commonly use in machine learning (ML). To understand logistic regression, we must first
understand basic regression analysis. Below, we use an example of linear regression analysis to
demonstrate how regression analysis works.
Any data analysis begins with a business question. For logistic regression, you should frame the
question to get particular outcomes:
Do rainy days impact our monthly sales? (yes or no)
What type of credit card activity is the customer performing? (authorized, fraudulent, or
potentially fraudulent)
Collect historical data
After identifying the question, you need to identify the data factors that are involved. You will
then collect past data for all factors. For example, to answer the first question shown above, you
could collect the number of rainy days and your monthly sales data for each month in the past three
years.
You will process the historical data using regression software. The software will process the
different data points and connect them mathematically by using equations. For example, if the
number of rainy days for three months are 3, 5, and 8 and the number of sales in those months are
8, 12, and 18, the regression algorithm will connect the factors with the equation:
$\text{number of sales} = 2 \times (\text{number of rainy days}) + 2$
For unknown values, the software uses the equation to make a prediction. If you know that it will
rain for six days in July, the software will estimate July’s sales value as 14.
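The three data points in this example lie exactly on a straight line, so a least-squares fit recovers it. A minimal sketch with NumPy follows; `np.polyfit` is just one convenient way to do the fit, not necessarily what any particular regression software uses.

```python
import numpy as np

rainy_days = np.array([3.0, 5.0, 8.0])
sales = np.array([8.0, 12.0, 18.0])

# Fit a degree-1 polynomial (a straight line) by least squares.
# polyfit returns coefficients from the highest degree down: slope, then intercept.
slope, intercept = np.polyfit(rainy_days, sales, deg=1)

# Predict sales for a month with six rainy days.
july_estimate = slope * 6 + intercept
print(slope, intercept, july_estimate)
```

Up to floating-point rounding, the fitted slope and intercept are both 2, and the July estimate agrees with the value quoted in the text.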
To understand the logistic regression model, let’s first understand equations and variables.
Equations
In mathematics, equations give the relationship between two variables: x and y. You can use these
equations, or functions, to plot a graph along the x-axis and y-axis by putting in different values
of x and y. For instance, if you plot the graph for the function y = 2*x, you will get a straight line
as shown below. Hence this function is also called a linear function.
Variables
In statistics, variables are the data factors or attributes whose values vary. For any analysis, certain
variables are independent or explanatory variables. These attributes are the cause of an outcome.
Other variables are dependent or response variables; their values depend on the independent
variables. In general, logistic regression explores how independent variables affect one dependent
variable by looking at historical data values of both variables.
In our example above, x is called the independent variable, predictor variable, or explanatory
variable because it has a known value. Y is called the dependent variable, outcome variable, or
response variable because its value is unknown.
Logistic regression is a statistical model that uses the logistic function (the inverse of the logit
function) as the equation between x and y. The logistic function maps y as a sigmoid function of x:
$y = \frac{1}{1 + e^{-x}}$
If you plot this logistic regression equation, you will get an S-curve as shown below.
As you can see, the logistic function returns only values between 0 and 1 for the dependent variable,
irrespective of the values of the independent variable. This is how logistic regression estimates the
value of the dependent variable. Logistic regression methods also model equations between
multiple independent variables and one dependent variable.
In many cases, multiple explanatory variables affect the value of the dependent variable. To model
such input datasets, logistic regression formulas assume a linear relationship between the different
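independent variables. You can modify the sigmoid function and compute the final output variable
as
$y = f(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)$
where f is the sigmoid function.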
The symbol β represents the regression coefficient. The logit model can reverse calculate these
coefficient values when you give it a sufficiently large experimental dataset with known values of
both dependent and independent variables.
Log odds
The logit model can also determine the ratio of success to failure (the odds) and its logarithm (the
log odds). For example, if you were playing poker with your friends and you won four matches out
of 10, your odds of winning are four sixths, or four to six, which is the ratio of your success to
failure. The probability of winning, on the other hand, is four out of 10.
Mathematically, your odds in terms of probability are p/(1 - p), and your log odds are log(p/(1 - p)).
You can represent the logistic function as log odds as shown below:
$\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$
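The poker example can be worked through numerically. The sketch below uses only the numbers from the text (four wins in ten games) and Python's standard library; it also shows that applying the logistic function to the log odds returns the original probability.

```python
import math

wins, games = 4, 10
p = wins / games              # probability of winning
odds = p / (1 - p)            # ratio of wins to losses
log_odds = math.log(odds)     # natural log of the odds (the logit of p)

# Applying the logistic (sigmoid) function to the log odds recovers p.
p_back = 1 / (1 + math.exp(-log_odds))
print(p, odds, log_odds, p_back)
```

Because the probability is below one half, the odds are below 1 and the log odds are negative.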
There are three approaches to logistic regression analysis based on the outcomes of the dependent
variable.
Binary logistic regression
Binary logistic regression works well for binary classification problems that have only two
possible outcomes. The dependent variable can have only two values, such as yes and no or 0 and
1.
Even though the logistic function calculates a range of values between 0 and 1, the binary
regression model rounds the answer to the closest values. Generally, answers below 0.5 are
rounded to 0, and answers above 0.5 are rounded to 1, so that the logistic function returns a binary
outcome.
Multinomial logistic regression
Multinomial regression can analyze problems that have several possible outcomes as long as the
number of outcomes is finite. For example, it can predict if house prices will increase by 25%,
50%, 75%, or 100% based on population data, but it cannot predict the exact value of a house.
Multinomial logistic regression works by mapping outcome values to different values between 0
and 1. Since the logistic function can return a range of continuous data, like 0.1, 0.11, 0.12, and so
on, multinomial regression also groups the output to the closest possible values.
Ordinal logistic regression
Ordinal logistic regression, or the ordered logit model, is a special type of multinomial regression
for problems in which numbers represent ranks rather than actual values. For example, you would
use ordinal regression to predict the answer to a survey question that asks customers to rank your
service as poor, fair, good, or excellent based on a numerical value, such as the number of items
they purchase from you over the year.
Two common data analysis techniques that logistic regression is often compared with are linear regression analysis and deep learning.
As explained above, linear regression models the relationship between dependent and independent
variables by using a linear combination. The linear regression equation is
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon$
Linear regression predicts a continuous dependent variable by using a given set of independent
variables. A continuous variable can have a range of values, such as price or age. So linear
regression can predict actual values of the dependent variable. It can answer questions like "What
will the price of rice be after 10 years?"
Unlike linear regression, logistic regression is a classification algorithm. It cannot predict actual
values for continuous data. It can answer questions like "Will the price of rice increase by 50% in
10 years?"
https://www.geeksforgeeks.org/understanding-logistic-regression/
Sigmoid Function
Now we use the sigmoid function, where the input is z and the output is a probability between 0
and 1, i.e. the predicted y.
$\sigma(z) = \frac{1}{1 + e^{-z}}$
Sigmoid function
As the figure above shows, the sigmoid function converts continuous input data into
a probability, i.e. a value between 0 and 1.
$\sigma(z)$ tends towards 1 as $z \to \infty$
$\sigma(z)$ tends towards 0 as $z \to -\infty$
$\sigma(z)$ is always bounded between 0 and 1
where the probability of belonging to a class can be measured as:
$P(y=1) = \sigma(z), \quad P(y=0) = 1 - \sigma(z)$
Logistic Regression Equation
The odds are the ratio of something occurring to something not occurring. This differs from
probability, which is the ratio of something occurring to everything that could possibly
occur. So the odds will be:
$\frac{p(x)}{1 - p(x)} = e^{z}$
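The properties listed above are easy to confirm numerically. This is a small sketch with NumPy; the sample values of z are arbitrary.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
p1 = sigmoid(z)      # P(y = 1)
p0 = 1.0 - p1        # P(y = 0)
odds = p1 / p0       # should agree with exp(z)

print(p1)
print(np.allclose(odds, np.exp(z)))
```

Large negative inputs give probabilities near 0, large positive inputs give probabilities near 1, and the odds match e^z as the equation states.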
https://www.ibm.com/topics/logistic-regression
This type of statistical model (also known as logit model) is often used for classification and
predictive analytics. Since the outcome is a probability, the dependent variable is bounded between
0 and 1. In logistic regression, a logit transformation is applied on the odds—that is, the probability
of success divided by the probability of failure. This is also commonly known as the log odds, or
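the natural logarithm of odds, and this logistic function is represented by the following formulas:
$\mathrm{logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$
$p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}$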
In this logistic regression equation, logit(pi) is the dependent or response variable and x is the
independent variable. The beta parameter, or coefficient, in this model is commonly estimated via
maximum likelihood estimation (MLE). This method tests different values of beta through
multiple iterations to optimize for the best fit of log odds. All of these iterations produce the log
likelihood function, and logistic regression seeks to maximize this function to find the best
parameter estimate. Once the optimal coefficient (or coefficients if there is more than one
independent variable) is found, the conditional probabilities for each observation can be calculated,
logged, and summed together to yield a predicted probability. For binary classification, a
probability less than .5 will predict 0 while a probability greater than .5 will predict 1. After the
model has been computed, it’s best practice to evaluate how well the model predicts the
dependent variable, which is called goodness of fit. The Hosmer–Lemeshow test is a popular
method to assess model fit.
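The idea of trying successive values of beta to maximize the log likelihood can be sketched with plain gradient ascent. This is only one simple way to carry out maximum likelihood estimation (statistical packages typically use faster iterative solvers), and the data, learning rate, and iteration count below are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(beta, X, y):
    p = sigmoid(X @ beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Made-up data: one feature, binary outcome, classes overlap on purpose.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 1, 0, 1, 0, 1, 1])
X = np.column_stack([np.ones_like(x), x])   # column of ones for the intercept

beta = np.zeros(2)
start_ll = log_likelihood(beta, X, y)
for _ in range(5000):
    gradient = X.T @ (y - sigmoid(X @ beta))   # gradient of the log likelihood
    beta += 0.01 * gradient                    # small step uphill

final_ll = log_likelihood(beta, X, y)
predictions = (sigmoid(X @ beta) >= 0.5).astype(int)   # threshold at .5
print(beta, start_ll, final_ll, predictions)
```

The log likelihood rises from its starting value as beta is updated, and the final probabilities are turned into class predictions with the .5 threshold described above.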
Both linear and logistic regression are among the most popular models within data science, and
open-source tools, like Python and R, make the computation for them quick and easy.
Linear regression models are used to identify the relationship between a continuous dependent
variable and one or more independent variables. When there is only one independent variable and
one dependent variable, it is known as simple linear regression, but as the number of independent
variables increases, it is referred to as multiple linear regression. For each type of linear regression,
it seeks to plot a line of best fit through a set of data points, which is typically calculated using the
least squares method.
Similar to linear regression, logistic regression is also used to estimate the relationship between a
dependent variable and one or more independent variables, but it is used to make a prediction about
a categorical variable versus a continuous one. A categorical variable can be true or false, yes or
no, 1 or 0, et cetera. The unit of measure also differs from linear regression as it produces a
probability, but the logit function transforms the S-curve into a straight line.
While both models are used in regression analysis to make predictions about future outcomes,
linear regression is typically easier to understand. Linear regression also does not require as large
a sample size, because logistic regression needs an adequate sample to represent values across all the
response categories. Without a larger, representative sample, the model may not have sufficient
statistical power to detect a significant effect.
There are three types of logistic regression models, which are defined based on categorical
response.
Logistic regression is commonly used for prediction and classification problems. Some of these
use cases include:
Fraud detection: Logistic regression models can help teams identify data anomalies,
which are predictive of fraud. Certain behaviors or characteristics may have a higher
association with fraudulent activities, which is particularly helpful to banking and other
financial institutions in protecting their clients. SaaS-based companies have also started to
adopt these practices to eliminate fake user accounts from their datasets when conducting
data analysis around business performance.
Disease prediction: In medicine, this analytics approach can be used to predict the
likelihood of disease or illness for a given population. Healthcare organizations can set up
preventative care for individuals that show higher propensity for specific illnesses.
Churn prediction: Specific behaviors may be indicative of churn in different functions of
an organization. For example, human resources and management teams may want to know
if there are high performers within the company who are at risk of leaving the organization;
this type of insight can prompt conversations to understand problem areas within the
company, such as culture or compensation. Alternatively, the sales organization may want
to learn which of their clients are at risk of taking their business elsewhere. This can prompt
teams to set up a retention strategy to avoid lost revenue.
https://www.geeksforgeeks.org/types-of-regression-techniques/
As the machine learning domain has developed, regression analysis techniques have gained
popularity and grown far beyond just y = mx + c. There are several
types of regression techniques, each suited for different types of data and different types of
1. Linear Regression
2. Polynomial Regression
3. Stepwise Regression
4. Decision Tree Regression
5. Random Forest Regression
6. Support Vector Regression
7. Ridge Regression
8. Lasso Regression
9. ElasticNet Regression
Linear Regression
Linear regression is used for predictive analysis. Linear regression is a linear approach for
modeling the relationship between the criterion or the scalar response and the multiple
distribution of the response given the values of the predictors. For linear regression, there is a
Syntax:
y = θx + b
where,
This is the most basic form of regression analysis and is used to model a linear relationship
Here, a linear regression model is instantiated to fit a linear relationship between input features
(X) and target values (y). This code is used for simple demonstration of the approach.
Python
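# X, y: training features and targets; X_new: rows to predict (assumed to be defined elsewhere)
from sklearn.linear_model import LinearRegression
model = LinearRegression()  # ordinary least squares
model.fit(X, y)  # learn the coefficients from the training data
y_pred = model.predict(X_new)  # predict the response for new rows
Note: This code demonstrates the basic workflow of creating, training, and utilizing a linear regression model.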
Polynomial Regression
This is an extension of linear regression and is used to model a non-linear relationship between
the dependent variable and independent variables. Here as well syntax remains the same but
now in the input variables we include some polynomial or higher degree terms of some already
existing features as well. Linear regression was only able to fit a linear model to the data at
hand but with polynomial features, we can easily fit some non-linear relationship between the
Here is the code for simple demonstration of the Polynomial regression approach.
Python
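from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())  # add degree-2 terms, then fit linearly
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and utilizing a polynomial regression model. scikit-learn has no PolynomialRegression class, so the model is built from PolynomialFeatures followed by LinearRegression.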
Stepwise Regression
Stepwise regression fits a regression model by choosing its predictor variables through an
automatic procedure. At each step, a variable is added to or removed from the set of explanatory
variables. The approaches for stepwise regression are forward selection, backward elimination,
Python
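from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
# scikit-learn has no StepwiseLinearRegression class; forward selection with
# SequentialFeatureSelector is used here as a stand-in (an assumption, not the original API).
selector = SequentialFeatureSelector(LinearRegression(), direction="forward")
model = make_pipeline(selector, LinearRegression())
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and utilizing a stepwise regression model.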
Decision Tree Regression
A Decision Tree is a powerful and popular tool for classification and prediction.
A Decision tree is a flowchart-like tree structure, where each internal node denotes a test on an
attribute, each branch represents an outcome of the test, and each leaf node (terminal node)
holds a class label. Decision tree regression is a non-parametric method that uses this structure
to predict a continuous outcome.
Here is the code for simple demonstration of the Decision Tree regression approach.
Python
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and utilizing a Decision Tree regression model.
Random Forest Regression
Random Forest is an ensemble technique that performs regression and
classification tasks with the use of multiple decision trees and a technique called Bootstrap and
Aggregation, commonly known as bagging. The basic idea behind this is to combine multiple
decision trees in determining the final output rather than relying on individual decision trees.
Random Forest has multiple decision trees as base learning models. We randomly perform row
sampling and feature sampling from the dataset forming sample datasets for every model. This
Here is the code for simple demonstration of the Random Forest regression approach.
Python
from sklearn.ensemble import RandomForestRegressor
# Build a forest of 100 trees
model = RandomForestRegressor(n_estimators=100)
model.fit(X, y)
# Predict the response for a new data point
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and using a random
forest regression model.
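The bagging idea described above (sample rows with replacement, train one tree per sample, average the outputs) can be sketched by hand. This is an illustration under stated assumptions, not how RandomForestRegressor is implemented internally: the data is synthetic, there is only one feature so feature sampling is omitted, and the number of trees (25) is arbitrary.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic training and test data around y = sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=200)
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)
y_test = np.sin(X_test.ravel())

trees = []
for seed in range(25):
    # Bootstrap: draw as many rows as the dataset has, with replacement.
    rows = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeRegressor(random_state=seed).fit(X[rows], y[rows]))

# Aggregation: average the predictions of all trees.
per_tree = np.array([t.predict(X_test) for t in trees])   # shape (25, 50)
bagged = per_tree.mean(axis=0)

mse_single = np.mean((per_tree - y_test) ** 2, axis=1).mean()  # average error of one tree
mse_bagged = np.mean((bagged - y_test) ** 2)                   # error of the averaged output
print(mse_single, mse_bagged)
```

The squared error of an average can never exceed the average of the individual squared errors, which is the reason for combining trees instead of relying on a single one.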
Support Vector Regression (SVR)
Support vector regression (SVR) is a type of support vector machine (SVM) that is used for
regression tasks. It tries to find a function that best predicts the continuous output value for a
given input.
SVR can use both linear and non-linear kernels. A linear kernel is a simple dot product between
two input vectors, while a non-linear kernel is a more complex function that can capture more
intricate patterns in the data. The choice of kernel depends on the data’s characteristics and the
task’s complexity.
Here is the code for simple demonstration of the Support vector regression approach.
Python
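from sklearn.svm import SVR
model = SVR(kernel="rbf")  # kernel choice is illustrative
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and using a support
vector regression model.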
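The effect of the kernel choice discussed above can be seen on data that is clearly non-linear. The sketch below assumes scikit-learn's SVR with default settings apart from the kernel. The sine-wave data is synthetic and chosen only to make the difference visible.

```python
import numpy as np
from sklearn.svm import SVR

# One full period of a sine wave: a pattern a straight line cannot follow.
X = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
y = np.sin(X.ravel())

linear_svr = SVR(kernel="linear").fit(X, y)
rbf_svr = SVR(kernel="rbf").fit(X, y)

# R^2 on the training data for each kernel.
linear_r2 = linear_svr.score(X, y)
rbf_r2 = rbf_svr.score(X, y)
print(linear_r2, rbf_r2)
```

On data that is close to linear the simpler linear kernel is usually sufficient, so the comparison should be repeated on the data at hand.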
Ridge Regression
Ridge regression is a technique for analyzing multiple regression data that suffer from
multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but
their variances are large, so they may be far from the true values. Ridge is a regularized
linear regression model: it reduces model complexity by adding an L2 penalty term to the cost
function. A degree of bias is added to the regression estimates, and as a result, ridge
regression reduces the standard errors.
Here is the code for simple demonstration of the Ridge regression approach.
Python
from sklearn.linear_model import Ridge
model = Ridge(alpha=0.1)
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and using a ridge
regression model.
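To see what the penalty term does under multicollinearity, here is a sketch assuming NumPy and scikit-learn. The two nearly identical columns are synthetic, the intercept is left out to keep the algebra short, and alpha = 1.0 is an arbitrary choice. The closed form w = (X^T X + alpha * I)^(-1) X^T y is the standard ridge solution without an intercept.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Two almost identical columns: a simple case of multicollinearity.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=100)

alpha = 1.0

# Ordinary least squares: no penalty.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge in closed form: the alpha * I term is the penalty.
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(2), X.T @ y)

# The same fit through scikit-learn, without an intercept to match the formula.
w_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_

print(w_ols, w_ridge, w_sklearn)
```

The ridge coefficients have a smaller norm than the least squares ones. That shrinkage is the added bias, and it is what makes the estimates less sensitive to the noise in nearly duplicated columns.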
Lasso Regression
Lasso regression is a regression analysis method that performs both variable selection
and regularization. Lasso regression uses soft thresholding and selects only a
subset of the available features for the final model.
This is another regularized linear regression model. It also adds a penalty term (the L1 norm
of the coefficients) to the cost function, but it tends to zero out some features' coefficients,
which makes it useful for feature selection.
Here is the code for simple demonstration of the Lasso regression approach.
Python
from sklearn.linear_model import Lasso
model = Lasso(alpha=0.1)
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and using a lasso
regression model.
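The zeroing-out behaviour described above can be checked on data where only a few features matter. This is a sketch assuming scikit-learn, with synthetic data in which only columns 0 and 2 influence the target. The alpha values are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Ten features, of which only columns 0 and 2 affect the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# Indices of the features whose coefficient is not exactly zero.
lasso_kept = np.flatnonzero(lasso.coef_)
ridge_kept = np.flatnonzero(ridge.coef_)
print(lasso_kept, ridge_kept)
```

Lasso keeps only the informative columns, while ridge shrinks the irrelevant coefficients towards zero without removing them.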
ElasticNet Regression
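Linear regression can overfit and copes poorly with collinear data, especially when the
dataset has many features and some of them are not relevant to the predictive model.
The extra features make the model more complex and its predictions on the test set
inaccurate (overfitting). Such a high-variance model does not generalize to new data. To deal
with these issues, ElasticNet includes both L2 and L1 norm regularization to get the benefits of
both Ridge and Lasso at the same time. The resulting model often predicts better than Lasso
alone when features are correlated. It performs feature selection and also makes the hypothesis
simpler. The modified cost function, in its standard form, is
$$J(w) = \frac{1}{m}\left[\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 + \lambda_1\sum_{j=1}^{n}\lvert w_j\rvert + \lambda_2\sum_{j=1}^{n} w_j^2\right]$$
where $m$ is the number of training examples, $n$ is the number of features, $\hat{y}_i$ is the
prediction for example $i$, $w_j$ is the weight of the $j$-th feature, and $\lambda_1$ and
$\lambda_2$ are the regularization strengths of the L1 and L2 penalties.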
Here is the code for simple demonstration of the Elasticnet regression approach.
Python
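from sklearn.linear_model import ElasticNet
model = ElasticNet(alpha=0.1, l1_ratio=0.5)  # parameter values are illustrative
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and using an ElasticNet
regression model.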
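How the two penalties are mixed can be illustrated with scikit-learn's ElasticNet, where alpha sets the overall strength and l1_ratio sets the share of the L1 penalty. This is a sketch on synthetic data, and the parameter values are arbitrary. scikit-learn's parameterization differs in scaling from a cost function written with two separate strengths.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Ten features, of which only columns 0 and 2 affect the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=200)

# l1_ratio=1.0 keeps only the L1 penalty, so the fit matches Lasso.
pure_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# l1_ratio=0.5 mixes the L1 and L2 penalties.
mixed = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(mixed.coef_.round(2))
```

Values of l1_ratio closer to 0 make the model behave more like Ridge, and values closer to 1 make it behave more like Lasso.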
Bayesian Linear Regression
As the name suggests, this algorithm is based on Bayes' theorem. For this reason, the least
squares method is not used to determine the coefficients of the regression
model. Instead, the model weights and parameters are found from the posterior distribution of
the features' weights, which gives the regression model extra stability.
Here is the code for simple demonstration of the Bayesian Linear regression approach.
Python
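from sklearn.linear_model import BayesianRidge
model = BayesianRidge()  # scikit-learn's Bayesian linear model; it has no BayesianLinearRegression class
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and using a Bayesian
linear regression model.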
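Because the weights have a posterior distribution, a Bayesian model can report how uncertain each prediction is. The sketch below assumes scikit-learn's BayesianRidge as one concrete Bayesian linear model, which is a choice made for this example. The data is synthetic.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Training inputs lie between -1 and 1.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = 2 * X.ravel() + rng.normal(scale=0.3, size=50)

model = BayesianRidge().fit(X, y)

# One query inside the training range and one far outside it.
X_new = np.array([[0.0], [10.0]])
y_mean, y_std = model.predict(X_new, return_std=True)
print(y_mean, y_std)
```

The standard deviation is larger for the point far from the training data, reflecting the greater uncertainty about the weights' effect there.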
https://www.appier.com/en/blog/5-types-of-regression-analysis-and-when-to-use-
them#:~:text=Lasso%20regression%20Like%20ridge%20regression,not%20happen%20with
%20ridge%20regression.
Regression analysis is an incredibly powerful machine learning tool used for analyzing data. Here
we will explore how it works, what the main types are and what it can do for your business.
Regression analysis is a way of modeling the relationship between a dependent (target) variable
and one or more independent variables (also known as predictors) so that future values can be
predicted. For example, it can be used to model the relationship between reckless driving and
the total number of road accidents caused by a driver, or, to use a business example, the
effect on sales of spending a certain amount of money on advertising.
Regression is one of the most common models of machine learning. It differs from classification
models because it estimates a numerical value, whereas classification models identify which
category an observation belongs to.
The main uses of regression analysis are forecasting, time series modeling and finding the cause
and effect relationship between variables.
Why Is It Important?
Regression has a wide range of real-life applications. It is essential for any machine learning
problem that involves continuous numbers. Examples include, but are not limited to:
Testing automobiles
As well as telling you whether a significant relationship exists between two or more variables,
regression analysis can give specific details about that relationship. Specifically, it can estimate
the strength of impact that multiple variables will have on a dependent variable. If you change
the value of one variable (price, say), regression analysis should tell you what effect that will
have on the dependent variable (sales).
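The price and sales example above can be sketched as follows, assuming scikit-learn. All numbers are synthetic: sales are generated as 500 - 20 * price + 3 * ads plus noise, so the fitted coefficients should land near -20 and 3.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known relationship (illustrative only).
rng = np.random.default_rng(0)
price = rng.uniform(5, 15, size=200)
ads = rng.uniform(0, 100, size=200)
sales = 500 - 20 * price + 3 * ads + rng.normal(scale=5, size=200)

model = LinearRegression().fit(np.column_stack([price, ads]), sales)
price_effect, ads_effect = model.coef_

# Raise the price by one unit while holding advertising fixed.
before = model.predict([[10.0, 50.0]])[0]
after = model.predict([[11.0, 50.0]])[0]
change = after - before
print(price_effect, ads_effect, change)
```

Each coefficient is the estimated change in the dependent variable for a one-unit change in that variable with the others held constant.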
Businesses can use regression analysis to test the effects of variables as measured on different
scales. With it in your toolbox, you can assess the best set of variables to use when building
predictive models, greatly increasing the accuracy of your forecasting.
Finally, regression analysis is the standard way of solving regression problems in machine
learning using data modeling. By plotting data points on a chart and running the best fit line
through them, you can see each data point's prediction error: the further away from the line
they lie, the higher their error of prediction (this best fit line is also known as a regression line).
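The best fit line and the distance of each point from it can be computed with NumPy alone. The five data points below are made up for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 9.0, 8.0, 10.0])

# Least-squares straight line through the points.
slope, intercept = np.polyfit(x, y, deg=1)

# Residual: vertical distance of each point from the line.
residuals = y - (slope * x + intercept)

# The point with the largest absolute residual is predicted worst.
worst = int(np.argmax(np.abs(residuals)))
print(slope, intercept, residuals, worst)
```

Here the third point (index 2) lies furthest from the line, so it has the highest error of prediction.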
1. Linear regression
One of the most basic types of regression in machine learning, linear regression comprises a
predictor variable and a dependent variable related to each other in a linear fashion. Linear
regression involves the use of a best fit line, as described above.
You should use linear regression when your variables are related linearly, for example, if you
are forecasting the effect of increased advertising spend on sales. However, this analysis is
sensitive to outliers, so data sets that are likely to contain them (big data sets in
particular) should be screened before it is used.
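The sensitivity to outliers can be shown with a single corrupted value. This is a NumPy-only sketch on made-up data: ten points lying exactly on y = 2x, with the last one replaced by 100.

```python
import numpy as np

x = np.arange(10, dtype=float)
y_clean = 2.0 * x                  # points lie exactly on y = 2x
y_outlier = y_clean.copy()
y_outlier[-1] = 100.0              # one corrupted observation (the true value is 18)

slope_clean, intercept_clean = np.polyfit(x, y_clean, deg=1)
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, deg=1)
print(slope_clean, slope_outlier)
```

One bad point out of ten is enough to more than triple the fitted slope, because least squares penalizes large errors quadratically.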
2. Logistic regression
Does your dependent variable have a discrete value? In other words, can it only have one of
two values (either 0 or 1, true or false, black or white, spam or not spam, and so on)? In that
case, you might want to use logistic regression to analyze your data.
Logistic regression uses a sigmoid curve to show the relationship between the target and
independent variables. However, caution should be exercised: logistic regression works best
with large data sets that have an almost equal occurrence of values in target variables. The
dataset should not contain a high correlation between independent variables (a phenomenon
known as multicollinearity), as this will create a problem when ranking the variables.
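The sigmoid curve mentioned above can be reproduced by hand from a fitted model. This is a sketch assuming scikit-learn's LogisticRegression with default settings. The binary data is synthetic and the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative binary data: the chance of label 1 rises with x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
labels = (x + rng.normal(scale=1.5, size=200) > 5).astype(int)

clf = LogisticRegression().fit(x.reshape(-1, 1), labels)

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# The predicted probability of class 1 is the sigmoid of a linear function of x.
grid = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
z = clf.coef_[0, 0] * grid + clf.intercept_[0]
by_hand = sigmoid(z)
by_model = clf.predict_proba(grid.reshape(-1, 1))[:, 1]
print(by_hand, by_model)
```

The output is a probability between 0 and 1, which is then thresholded (commonly at 0.5) to choose between the two classes.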
3. Ridge regression
If, however, you do have a high correlation between independent variables, ridge regression is
a more suitable tool. It is known as a regularization technique, and is used to reduce the
complexity of the model. It introduces a small amount of bias (known as the ‘ridge regression
penalty’) which, using a bias matrix, makes the model less susceptible to overfitting.
4. Lasso regression
Like ridge regression, lasso regression is another regularization technique that reduces the
model's complexity. It does so by penalizing the absolute size of the regression coefficients.
This can shrink some coefficient values all the way to zero, which does not happen with ridge
regression.
The advantage? It performs feature selection, letting you select a set of features from the
dataset to build the model. By only using the required features, and setting the coefficients
of the rest to zero, lasso regression helps to avoid overfitting.
5. Polynomial regression
Polynomial regression models a non-linear dataset using a linear model. It is the equivalent of
making a square peg fit into a round hole. It works in a similar way to multiple linear regression
(which is just linear regression but with multiple independent variables), but fits a non-linear
curve. It is used when the data points follow a non-linear pattern.
The model transforms these data points into polynomial features of a given degree, and models
them using a linear model. This involves fitting them with a polynomial curve rather than the
straight line seen in linear regression. However, this model can be prone
to overfitting, so you are advised to inspect the curve towards its ends, where high-degree
fits tend to produce odd-looking results.
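The overfitting risk can be examined by fitting several degrees to the same small sample. This is a sketch assuming scikit-learn. The data is a synthetic noisy sine wave, and the degrees (1, 3, 12) are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Fifteen noisy training points and a dense, noise-free test grid.
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, size=15)).reshape(-1, 1)
y_train = np.sin(2 * np.pi * x_train.ravel()) + rng.normal(scale=0.2, size=15)
x_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test = np.sin(2 * np.pi * x_test.ravel())

train_mse, test_mse = {}, {}
for degree in (1, 3, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse[degree] = mean_squared_error(y_train, model.predict(x_train))
    test_mse[degree] = mean_squared_error(y_test, model.predict(x_test))
    print(degree, train_mse[degree], test_mse[degree])
```

The training error keeps falling as the degree rises, so it cannot be used on its own to choose the degree. Comparing it with the error on held-out points is what reveals an overfitted curve.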