
UNIT 4

Machine Learning
Machine Learning is the field of study that gives computers the capability to learn without being explicitly
programmed. ML is one of the most exciting technologies one comes across. As is evident from the name,
it gives the computer the ability that makes it more similar to humans: the ability to learn. Machine
learning is actively used today, perhaps in many more places than one would expect.

Machine learning is the subfield of AI in which we try to improve the decision-making power of
intelligent agents. An agent has a performance element that decides what actions to take and a learning
element that modifies the performance element so that it makes better decisions. The design of the learning
element is affected by the following three major factors:
1) Which components of the performance element are to be learned.
2) What feedback is available to learn these components.
3) What representation method is used for the components.
The following are some ways of learning most often used in machines:
(A) Logical learning (B) Inductive learning (C) Deductive learning.
Logical Learning: In this process a new concept or solution is derived through the use of similar known
concepts. We use this type of learning when solving problems on an exam, where previously learned
examples serve as a guide, or when we learn to drive a truck using our knowledge of car driving.

Inductive Learning: This technique requires the use of inductive inference, a form of inference that is not
logically valid but is useful. We use inductive learning when we formulate a general concept after seeing a
number of instances or examples of the concept, e.g., when we learn the concept of colour or sweet
taste after experiencing the sensations associated with several objects.

Deductive Learning: This is performed through a sequence of deductive inference steps using known
facts. From the known facts, new facts or relationships are logically derived. E.g.: if we have the
information that the weather is Hot and Humid, then we can infer that it may also Rain. Another example:
let
P → Q and Q → R; then we can infer that P → R.
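This chaining of implications can be sketched in a few lines of Python. The rules and facts below are a toy illustration of forward chaining by modus ponens, not a general inference engine:

```python
# A minimal sketch of deductive inference by forward chaining.
# Rules are (premise, conclusion) pairs; facts is the set of known truths.
rules = [("P", "Q"), ("Q", "R")]  # P -> Q, Q -> R
facts = {"P"}

changed = True
while changed:
    changed = False
    for premise, conclusion in rules:
        if premise in facts and conclusion not in facts:
            facts.add(conclusion)  # modus ponens: derive a new fact
            changed = True

print(facts)  # {'P', 'Q', 'R'} -- R follows from P, matching P -> R
```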

Features of Machine learning


 Machine learning is a data-driven technology. A large amount of data is generated by organizations on a
daily basis, and by noting relationships in the data, organizations make better decisions.
 A machine can learn by itself from past data and automatically improve.
 From the given dataset, it detects various patterns in the data.
 For big organizations branding is important, and it becomes easier to target a relatable
customer base.
 It is similar to data mining because it also deals with huge amounts of data.

Applications of Machine Learning

 Image Recognition
 Speech Recognition
 Recommender Systems
 Fraud Detection
 Self- Driving Cars
 Medical Diagnosis
 Stock Market Trading
 Virtual Try On

Key Differences Between Artificial Intelligence (AI) and Machine Learning (ML)

Sl.No. | ARTIFICIAL INTELLIGENCE | MACHINE LEARNING
1. | AI stands for Artificial Intelligence, where intelligence is defined as the ability to acquire and apply knowledge. | ML stands for Machine Learning, which is defined as the acquisition of knowledge or skill.
2. | AI is the broader family consisting of ML and DL as its components. | Machine Learning is the subset of Artificial Intelligence.
3. | The aim is to increase the chance of success, not accuracy. | The aim is to increase accuracy; it does not care about the chance of success.
4. | AI aims to develop an intelligent system capable of performing a variety of complex jobs and decision-making. | ML attempts to construct machines that can accomplish only the jobs for which they have been trained.
5. | It works as a computer program that does smart work. | Here, the machine takes data and learns from the data.
6. | The goal is to simulate natural intelligence to solve complex problems. | The goal is to learn from data on a certain task to maximize performance on that task.
7. | AI has a very broad variety of applications. | The scope of machine learning is constrained.
8. | AI is decision-making. | ML allows systems to learn new things from data.
9. | It is developing a system that mimics humans to solve problems. | It involves creating self-learning algorithms.
10. | AI will go for finding the optimal solution. | ML will go for a solution whether it is optimal or not.
11. | AI leads to intelligence or wisdom. | ML leads to knowledge.
12. | AI is a broader family consisting of ML and DL as its components. | ML is a subset of AI.
13. | Three broad categories of AI are: 1. Artificial Narrow Intelligence (ANI), 2. Artificial General Intelligence (AGI), 3. Artificial Super Intelligence (ASI). | Three broad categories of ML are: 1. Supervised Learning, 2. Unsupervised Learning, 3. Reinforcement Learning.
14. | AI can work with structured, semi-structured, and unstructured data. | ML can work with only structured and semi-structured data.
15. | AI's key uses include Siri, customer service via chatbots, Expert Systems, Machine Translation such as Google Translate, intelligent humanoid robots such as Sophia, and so on. | The most common uses of machine learning include Facebook's automatic friend suggestions, Google's search algorithms, banking fraud analysis, stock price forecasts, online recommender systems, and so on.
16. | AI refers to the broad field of creating machines that can simulate human intelligence and perform tasks such as understanding natural language, recognizing images and sounds, making decisions, and solving complex problems. | ML is a subset of AI that involves training algorithms on data to make predictions, decisions, and recommendations.
17. | AI is a broad concept that includes various methods for creating intelligent machines, including rule-based systems, expert systems, and machine learning algorithms. AI systems can be programmed to follow specific rules, make logical inferences, or learn from data using ML. | ML focuses on teaching machines how to learn from data without being explicitly programmed, using algorithms such as neural networks, decision trees, and clustering.
18. | AI systems can be built using both structured and unstructured data, including text, images, video, and audio. AI algorithms can work with data in a variety of formats, and they can analyze and process data to extract meaningful insights. | In contrast, ML algorithms require large amounts of structured data to learn and improve their performance. The quality and quantity of the data used to train ML algorithms are critical factors in determining the accuracy and effectiveness of the system.
19. | AI is a broader concept that encompasses many different applications, including robotics, natural language processing, speech recognition, and autonomous vehicles. AI systems can be used to solve complex problems in various fields, such as healthcare, finance, and transportation. | ML, on the other hand, is primarily used for pattern recognition, predictive modeling, and decision making in fields such as marketing, fraud detection, and credit scoring.
20. | AI systems can be designed to work autonomously or with minimal human intervention, depending on the complexity of the task. AI systems can make decisions and take actions based on the data and rules provided to them. | In contrast, ML algorithms require human involvement to set up, train, and optimize the system. ML algorithms require the expertise of data scientists, engineers, and other professionals to design and implement the system.
Types of Machine Learning
Machine learning is a subset of AI, which enables the machine to automatically learn from data,
improve performance from past experiences, and make predictions. Machine learning contains a set
of algorithms that work on a huge amount of data. Data is fed to these algorithms to train them, and on the
basis of training, they build the model & perform a specific task.

These ML algorithms help to solve different business problems like Regression, Classification,
Forecasting, Clustering, and Associations, etc.

Based on the methods and way of learning, machine learning is divided into mainly four types, which are:

1. Supervised Machine Learning


2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

1. Supervised Machine Learning


Supervised machine learning is based on supervision. It means in the supervised learning technique, we
train the machines using the "labelled" dataset, and based on the training, the machine predicts the output.
Here, the labelled data specifies that some of the inputs are already mapped to the output. More precisely,
we can say: first, we train the machine with the input and corresponding output, and then we ask the
machine to predict the output using the test dataset.

Let's understand supervised learning with an example. Suppose we have an input dataset of cat and dog
images. So, first, we will train the machine to understand the images, using features such as the shape
and size of the tail of a cat and a dog, the shape of the eyes, colour, height (dogs are taller, cats are smaller), etc.
After completion of training, we input the picture of a cat and ask the machine to identify the object and
predict the output. Now, the machine is well trained, so it will check all the features of the object, such as
height, shape, colour, eyes, ears, tail, etc., and find that it's a cat. So, it will put it in the Cat category. This
is the process of how the machine identifies the objects in Supervised Learning.

The main goal of the supervised learning technique is to map the input variable(x) with the output
variable(y). Some real-world applications of supervised learning are Risk Assessment, Fraud Detection,
Spam filtering, etc.
Categories of Supervised Machine Learning

Supervised machine learning can be classified into two types of problems, which are given below:

 Classification
 Regression

a) Classification

Classification algorithms are used to solve the classification problems in which the output variable is
categorical, such as "Yes" or No, Male or Female, Red or Blue, etc. The classification algorithms
predict the categories present in the dataset. Some real-world examples of classification algorithms are
Spam Detection, Email filtering, etc.

Some popular classification algorithms are given below:

 Random Forest Algorithm


 Decision Tree Algorithm
 Logistic Regression Algorithm
 Support Vector Machine Algorithm
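For illustration, the sketch below trains one of these classifiers on a tiny invented dataset, assuming scikit-learn is available; the features, labels, and numbers are hypothetical:

```python
# A minimal classification sketch with scikit-learn (toy, invented data).
from sklearn.linear_model import LogisticRegression

# Features: [height_cm, tail_length_cm]; labels: 0 = cat, 1 = dog
X = [[25, 30], [23, 28], [60, 35], [55, 33], [24, 29], [58, 34]]
y = [0, 0, 1, 1, 0, 1]

clf = LogisticRegression()
clf.fit(X, y)                    # learn from labelled examples
print(clf.predict([[26, 31]]))   # predict the category of a new animal
```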

b) Regression

Regression algorithms are used to solve regression problems, in which there is a relationship between
input and output variables and the output variable is continuous. They are used to predict continuous
values, such as market trends, weather, etc.

Some popular Regression algorithms are given below:

 Simple Linear Regression Algorithm


 Multivariate Regression Algorithm
 Decision Tree Algorithm
 Lasso Regression
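A similar minimal sketch for regression, again with invented data and assuming scikit-learn is available:

```python
# A minimal regression sketch with scikit-learn (toy, invented data).
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]    # input variable, e.g. years of experience
y = [30, 35, 40, 45, 50]         # continuous output, e.g. salary in $1000s

reg = LinearRegression()
reg.fit(X, y)
print(reg.coef_, reg.intercept_) # slope ~5, intercept ~25
print(reg.predict([[6]]))        # ~[55.] for an unseen input
```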

Advantages and Disadvantages of Supervised Learning

Advantages:

 Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of
objects.
 These algorithms are helpful in predicting the output on the basis of prior experience.

Disadvantages:

 These algorithms are not able to solve complex tasks.


 It may predict the wrong output if the test data is different from the training data.
 It requires lots of computational time to train the algorithm.

Applications of Supervised Learning

Some common applications of Supervised Learning are given below:

 Image Segmentation:

Supervised Learning algorithms are used in image segmentation. In this process, image classification is
performed on different image data with pre-defined labels.
 Medical Diagnosis:

Supervised algorithms are also used in the medical field for diagnosis purposes. This is done by using medical
images and past data labelled with disease conditions. With such a process, the machine can
identify diseases for new patients.

 Fraud Detection –

Supervised Learning classification algorithms are used for identifying fraud transactions, fraud customers,
etc. It is done by using historic data to identify the patterns that can lead to possible fraud.

 Spam detection –

In spam detection & filtering, classification algorithms are used. These algorithms classify an email as spam
or not spam. The spam emails are sent to the spam folder.

 Speech Recognition –

Supervised learning algorithms are also used in speech recognition. The algorithm is trained with voice
data, and various identifications can be done using the same, such as voice-activated passwords, voice
commands, etc.

2. Unsupervised Machine Learning


Unsupervised learning is different from the Supervised learning technique; as its name suggests, there is
no need for supervision. It means, in unsupervised machine learning, the machine is trained using the
unlabeled dataset, and the machine predicts the output without any supervision.

In unsupervised learning, the models are trained with the data that is neither classified nor labelled, and the
model acts on that data without any supervision.

The main aim of the unsupervised learning algorithm is to group or categorize the unsorted dataset
according to the similarities, patterns, and differences. Machines are instructed to find the hidden
patterns in the input dataset.

Let's take an example to understand it more precisely; suppose there is a basket of fruit images, and we
input it into the machine learning model. The images are totally unknown to the model, and the task of the
machine is to find the patterns and categories of the objects.

So the machine will discover patterns and differences on its own, such as colour difference and shape difference,
and predict the output when it is tested with the test dataset.

Categories of Unsupervised Machine Learning

Unsupervised Learning can be further classified into two types, which are given below:

 Clustering
 Association

1) Clustering

The clustering technique is used when we want to find the inherent groups from the data. It is a way to
group the objects into a cluster such that the objects with the most similarities remain in one group and
have fewer or no similarities with the objects of other groups. An example of the clustering algorithm is
grouping the customers by their purchasing behaviour.
Some of the popular clustering algorithms are given below:

 K-Means Clustering algorithm


 Mean-shift algorithm
 DBSCAN Algorithm
 Principal Component Analysis
 Independent Component Analysis
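As a sketch of clustering in practice, the example below groups invented customer records with K-Means, assuming scikit-learn is available:

```python
# A minimal K-Means clustering sketch (unlabelled toy data).
from sklearn.cluster import KMeans

# Customer data: [annual_spend, visits_per_month] -- no labels given
X = [[200, 2], [220, 3], [210, 2], [900, 12], [950, 11], [920, 13]]

km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)
print(km.labels_)           # e.g. [0 0 0 1 1 1]: two purchasing-behaviour groups
print(km.cluster_centers_)  # the centre of each discovered group
```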

2) Association

Association rule learning is an unsupervised learning technique, which finds interesting relations among
variables within a large dataset. The main aim of this learning algorithm is to find the dependency of one
data item on another data item and map those variables accordingly so that it can generate maximum
profit. This algorithm is mainly applied in Market Basket analysis, Web usage mining, continuous
production, etc.

Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat, FP-growth
algorithm.
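Algorithms such as Apriori search for such rules efficiently; the two quantities they score, support and confidence, can be computed directly, as the toy sketch below does for an invented basket dataset and the hypothetical rule {bread} → {butter}:

```python
# A toy sketch of association-rule mining: computing support and
# confidence for the rule {bread} -> {butter} over market baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

n = len(transactions)
both = sum(1 for t in transactions if {"bread", "butter"} <= t)
bread = sum(1 for t in transactions if "bread" in t)

support = both / n          # how often bread and butter co-occur: 2/4 = 0.5
confidence = both / bread   # P(butter | bread): 2/3 ~ 0.67
print(support, confidence)
```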

Advantages and Disadvantages of Unsupervised Learning Algorithm

Advantages:

 These algorithms can be used for more complicated tasks than supervised ones because they work on
unlabeled datasets.
 Unsupervised algorithms are preferable for various tasks as getting the unlabeled dataset is easier as
compared to the labelled dataset.

Disadvantages:

 The output of an unsupervised algorithm can be less accurate, as the dataset is not labelled and the
algorithms are not trained with the exact output beforehand.
 Working with Unsupervised learning is more difficult as it works with the unlabelled dataset that does not
map with the output.

Applications of Unsupervised Learning

 Network Analysis: Unsupervised learning is used for identifying plagiarism and copyright issues in
document network analysis of text data for scholarly articles.
 Recommendation Systems: Recommendation systems widely use unsupervised learning techniques for
building recommendation applications for different web applications and e-commerce websites.
 Anomaly Detection: Anomaly detection is a popular application of unsupervised learning, which can
identify unusual data points within the dataset. It is used to discover fraudulent transactions.
 Singular Value Decomposition: Singular Value Decomposition or SVD is used to extract particular
information from the database. For example, extracting information of each user located at a particular
location.

3. Semi-Supervised Learning
Semi-Supervised learning is a type of Machine Learning algorithm that lies between Supervised and
Unsupervised machine learning. It represents the intermediate ground between Supervised (With
Labelled training data) and Unsupervised learning (with no labelled training data) algorithms and uses the
combination of labelled and unlabeled datasets during the training period.

Although Semi-supervised learning is the middle ground between supervised and unsupervised learning
and operates on data that has a few labels, it mostly consists of unlabeled data. Because labels are
costly to obtain, for corporate purposes a dataset may have only a few of them. This setting differs from both
supervised and unsupervised learning, which are based on the presence and absence of labels, respectively.

To overcome the drawbacks of supervised learning and unsupervised learning algorithms, the
concept of Semi-supervised learning is introduced. The main aim of semi-supervised learning is to
effectively use all the available data, rather than only labelled data as in supervised learning. Initially,
similar data is clustered with an unsupervised learning algorithm, and this clustering then helps to label the
unlabeled data. This is done because labelled data is comparatively more expensive to acquire
than unlabeled data.

We can imagine these algorithms with an example. Supervised learning is where a student is under the
supervision of an instructor at home and college. Further, if that student is self-analysing the same concept
without any help from the instructor, it comes under unsupervised learning. Under semi-supervised
learning, the student has to revise himself after analyzing the same concept under the guidance of an
instructor at college.

Advantages and disadvantages of Semi-supervised Learning

Advantages:

 It is simple and easy to understand the algorithm.


 It is highly efficient.
 It is used to solve drawbacks of Supervised and Unsupervised Learning algorithms.

Disadvantages:

 Iteration results may not be stable.


 We cannot apply these algorithms to network-level data.
 Accuracy is low.

4. Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent (a software
component) automatically explores its surroundings by hit and trial, taking actions, learning from
experiences, and improving its performance. The agent gets rewarded for each good action and gets
punished for each bad action; hence the goal of a reinforcement learning agent is to maximize the rewards.

In reinforcement learning, there is no labelled data like supervised learning, and agents learn from their
experiences only.

The reinforcement learning process is similar to that of a human being; for example, a child learns various things
by experiences in his day-to-day life. An example of reinforcement learning is to play a game, where the
Game is the environment, moves of an agent at each step define states, and the goal of the agent is to get a
high score. Agent receives feedback in terms of punishment and rewards.

Due to its way of working, reinforcement learning is employed in different fields such as Game theory,
Operations Research, Information theory, and multi-agent systems.

A reinforcement learning problem can be formalized using Markov Decision Process(MDP). In MDP,
the agent constantly interacts with the environment and performs actions; at each action, the environment
responds and generates a new state.
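As a minimal sketch of this agent-environment loop, the Q-learning example below (a standard RL algorithm; the five-state corridor environment and reward of +1 at the last state are invented for illustration) learns a policy that always moves right toward the reward:

```python
# A minimal Q-learning sketch on a 5-state corridor MDP (invented toy).
# The agent starts at state 0; reaching state 4 gives reward +1.
import random

random.seed(0)
n_states, actions = 5, [-1, +1]          # move left / move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != 4:
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), n_states - 1)   # environment transition
        r = 1.0 if s2 == 4 else 0.0             # reward feedback
        # Q-update: move estimate toward reward + discounted future value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
        s = s2

# Expected learned policy: move right (+1) from every state
print([max(actions, key=lambda a: Q[(s, a)]) for s in range(4)])
```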

Categories of Reinforcement Learning

Reinforcement learning is categorized mainly into two types of methods/algorithms:


 Positive Reinforcement Learning: Positive reinforcement learning specifies increasing the tendency that
the required behaviour would occur again by adding something. It enhances the strength of the behaviour of
the agent and positively impacts it.
 Negative Reinforcement Learning: Negative reinforcement learning works exactly opposite to the positive
RL. It increases the tendency that the specific behaviour would occur again by avoiding the negative
condition.

Real-world Use cases of Reinforcement Learning

 Video Games:
RL algorithms are very popular in gaming applications, where they are used to attain super-human performance.
Popular game-playing systems that use RL algorithms are AlphaGo and AlphaGo Zero.
 Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed that how to use RL in
computer to automatically learn and schedule resources to wait for different jobs in order to minimize
average job slowdown.
 Robotics:
RL is widely being used in Robotics applications. Robots are used in the industrial and manufacturing area,
and these robots are made more powerful with reinforcement learning. There are different industries that
have their vision of building intelligent robots using AI and Machine learning technology.
 Text Mining
Text-mining, one of the great applications of NLP, is now being implemented with the help of
Reinforcement Learning by Salesforce company.

Advantages and Disadvantages of Reinforcement Learning

Advantages

 It helps in solving complex real-world problems which are difficult to be solved by general techniques.
 The learning model of RL is similar to the learning of human beings; hence highly accurate results can be
obtained.
 Helps in achieving long term results.

Disadvantage

 RL algorithms are not preferred for simple problems.


 RL algorithms require huge data and computations.
 Too much reinforcement learning can lead to an overload of states, which can weaken the results.
 The curse of dimensionality limits reinforcement learning for real physical systems.

Decision Tree Classification Algorithm


 Decision Tree is a Supervised learning technique that can be used for
both classification and Regression problems, but mostly it is preferred for
solving Classification problems. It is a tree-structured classifier, where
internal nodes represent the features of a dataset, branches represent
the decision rules and each leaf node represents the outcome.
 In a Decision tree, there are two nodes, which are the Decision Node and
Leaf Node. Decision nodes are used to make any decision and have
multiple branches, whereas Leaf nodes are the output of those decisions
and do not contain any further branches.
 The decisions or tests are performed on the basis of features of the
given dataset.
 It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
 It is called a decision tree because, similar to a tree, it starts with the root
node, which expands on further branches and constructs a tree-like
structure.
 In order to build a tree, we use the CART algorithm, which stands for
Classification and Regression Tree algorithm.
 A decision tree simply asks a question, and based on the answer (Yes/No),
it further splits the tree into subtrees.
(Figure: the general structure of a decision tree.)

Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.

Why use Decision Trees?


There are various algorithms in Machine learning, so choosing the best
algorithm for the given dataset and problem is the main point to
remember while creating a machine learning model. Below are the two
reasons for using the Decision tree:
 Decision Trees usually mimic human thinking ability while making a
decision, so they are easy to understand.
 The logic behind the decision tree can be easily understood because it
shows a tree-like structure.

Decision Tree Terminologies


 Root Node: Root node is from where the decision tree starts. It represents
the entire dataset, which further gets divided into two or more
homogeneous sets.
 Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.
 Splitting: Splitting is the process of dividing the decision node/root node
into sub-nodes according to the given conditions.
 Branch/Sub Tree: A tree formed by splitting the tree.
 Pruning: Pruning is the process of removing the unwanted branches from
the tree.
 Parent/Child node: The root node of the tree is called the parent node,
and other nodes are called the child nodes.

How does the Decision Tree algorithm Work?


In a decision tree, for predicting the class of the given dataset, the
algorithm starts from the root node of the tree. This algorithm
compares the values of root attribute with the record (real dataset)
attribute and, based on the comparison, follows the branch and jumps
to the next node.
For the next node, the algorithm again compares the attribute value
with the other sub-nodes and moves further. It continues the process
until it reaches a leaf node of the tree. The complete process can be
better understood using the algorithm below:
 Step-1: Begin the tree with the root node, say S, which contains the
complete dataset.
 Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).
 Step-3: Divide S into subsets that contain the possible values for the best
attribute.
 Step-4: Generate the decision tree node, which contains the best attribute.
 Step-5: Recursively make new decision trees using the subsets of the
dataset created in Step-3. Continue this process until a stage is reached
where you cannot classify the nodes any further; call the final node a
leaf node.

Example: Suppose there is a candidate who has a job offer and wants
to decide whether he should accept the offer or not. To solve this
problem, the decision tree starts with the root node (the Salary attribute,
chosen by ASM). The root node splits further into the next decision node
(distance from the office) and one leaf node based on the
corresponding labels. The next decision node further splits into one
decision node (Cab facility) and one leaf node. Finally, the decision
node splits into two leaf nodes (Accepted offer and Declined offer).
(Figure: decision tree for the job-offer example.)
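A rough sketch of this example in code, assuming scikit-learn is available. The salary and distance values are invented, and criterion="entropy" makes the tree use information gain as its attribute selection measure:

```python
# A minimal decision-tree sketch with scikit-learn (toy, invented data).
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [salary_lakhs, distance_km]; label: 1 = accept offer, 0 = decline
X = [[10, 5], [10, 40], [4, 5], [12, 10], [3, 30], [11, 35]]
y = [1, 0, 0, 1, 0, 0]

tree = DecisionTreeClassifier(criterion="entropy")  # information gain as ASM
tree.fit(X, y)
print(export_text(tree, feature_names=["salary", "distance"]))  # learned rules
print(tree.predict([[9, 8]]))  # classify a new offer
```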

Attribute Selection Measures


While implementing a Decision tree, the main issue that arises is how to
select the best attribute for the root node and for the sub-nodes. To
solve such problems there is a technique called the Attribute
Selection Measure, or ASM. Using this measurement, we can easily select
the best attribute for the nodes of the tree. There are two popular
techniques for ASM, which are:
 Information Gain
 Gini Index

1. Information Gain:
 Information gain is the measurement of changes in entropy after the
segmentation of a dataset based on an attribute.
 It calculates how much information a feature provides us about a class.
 According to the value of information gain, we split the node and build
the decision tree.
 A decision tree algorithm always tries to maximize the value of
information gain, and a node/attribute having the highest information gain
is split first. It can be calculated using the below formula:

Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]

Entropy: Entropy is a metric to measure the impurity in a given
attribute. It specifies the randomness in the data. Entropy can be calculated as:
Entropy(S) = −P(yes) log₂ P(yes) − P(no) log₂ P(no)

Where,
 S = the set of samples
 P(yes) = probability of yes
 P(no) = probability of no

2. Gini Index:
 Gini index is a measure of impurity or purity used while creating a
decision tree in the CART(Classification and Regression Tree) algorithm.
 An attribute with a low Gini index should be preferred over one with a
high Gini index.
 It only creates binary splits, and the CART algorithm uses the Gini index
to create binary splits.
 The Gini index can be calculated using the formula below:
Gini Index = 1 − Σⱼ (Pⱼ)²
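As a worked sketch, the snippet below computes both measures for the 14-day weather dataset used in the Naïve Bayes example later in this unit (10 Yes / 4 No overall; splitting on Outlook gives Overcast 5/0, Rainy 2/2, Sunny 3/2):

```python
# A worked sketch: entropy, information gain, and Gini index for the
# 14-day weather dataset (10 Yes / 4 No) split on the Outlook attribute.
from math import log2

def entropy(p_yes, p_no):
    return -sum(p * log2(p) for p in (p_yes, p_no) if p > 0)

S = entropy(10/14, 4/14)                       # ~0.863 for the full set
children = [(5, entropy(5/5, 0/5)),            # Overcast: pure, entropy 0
            (4, entropy(2/4, 2/4)),            # Rainy: maximally impure, entropy 1
            (5, entropy(3/5, 2/5))]            # Sunny: ~0.971
weighted = sum(n/14 * e for n, e in children)  # weighted average child entropy
print("Information Gain =", S - weighted)      # ~0.23

print("Gini Index =", 1 - ((10/14)**2 + (4/14)**2))  # ~0.41 for the full set
```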

Pruning: Getting an Optimal Decision tree


Pruning is a process of deleting the unnecessary nodes from a tree in
order to get the optimal decision tree.
A too-large tree increases the risk of overfitting, and a small tree may
not capture all the important features of the dataset. Therefore, a
technique that decreases the size of the learning tree without reducing
accuracy is known as Pruning. There are mainly two types of tree
pruning technology used:

 Cost Complexity Pruning


 Reduced Error Pruning.

Advantages of the Decision Tree


 It is simple to understand as it follows the same process that a human
follows while making a decision in real life.
 It can be very useful for solving decision-related problems.
 It helps to think about all the possible outcomes for a problem.
 There is less requirement of data cleaning compared to other algorithms.

Disadvantages of the Decision Tree


 The decision tree contains lots of layers, which makes it complex.
 It may have an overfitting issue, which can be resolved using the Random
Forest algorithm.
 For more class labels, the computational complexity of the decision tree
may increase.
Statistical Machine Learning
 Statistical machine learning involves using statistical techniques to
develop models that can learn from data and make predictions or
decisions. Statistical machine learning merges the computational
efficiency and adaptability of machine learning algorithms with statistical
inference and modeling capabilities.
 By employing statistical methods, we can extract significant patterns,
relationships, and insights from intricate datasets, thereby promoting the
effectiveness of machine learning algorithms.
 Statistics is the science that allows us to collect, analyze, interpret,
present, and organize data. It provides a robust set of tools for
understanding patterns and trends, and making inferences and predictions
based on data. When we're dealing with large datasets, statistics helps us
understand and summarize the data, allowing us to make sense of complex
phenomena.
 Constructing machine learning models: Statistics provides the methodologies and
principles for creating models in machine learning. For instance, the linear
regression model leverages the statistical method of least squares to estimate the
coefficients.

As a result, a solid understanding of statistics not only allows us to better


construct and validate machine learning models, but also enables us to
interpret their outputs in a meaningful and useful way.
Key statistical concepts

Probability

Probability theory is of utmost importance in machine learning as it provides the foundation for modeling
uncertainty and making probabilistic predictions. How could we quantify the likelihood of different
outcomes, events, or simply numerical values? Probability helps with that! In addition, Probability
distributions are especially important in machine learning and make all the magic happen.

Descriptive Statistics

Descriptive statistics enable us to understand the characteristics and properties of datasets. They help us
summarize and visualize data, identify patterns, detect outliers, and gain initial insights that inform
subsequent modeling and analysis.

Measure of Central Tendency

The mean, median, and mode provide valuable insights into the central or representative values of a
dataset. In machine learning, they aid in data preprocessing by assisting with the imputation of missing
values and identifying potential outliers.
Variance and Standard Deviation

Variance and standard deviation quantify the spread or dispersion of data points around the central
tendency. They serve as indicators of data consistency and variability in machine learning.

These measures are useful for feature selection or dimensionality reduction, identifying features with
low variance that contribute little information.
Measure of Spread

Range, interquartile range, and percentiles are measures of spread that offer insights into the distribution of
data values. They are particularly valuable in outlier detection, as they help identify and address outliers
that can greatly influence model training and predictions.
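A minimal sketch of these descriptive measures, using Python's standard library on a small invented sample:

```python
# Descriptive statistics on a small invented sample.
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(data))       # central tendency: mean = 5.0
print(statistics.median(data))     # median = 4.5
print(statistics.mode(data))       # mode = 4
print(statistics.pvariance(data))  # spread: population variance = 4.0
print(statistics.pstdev(data))     # population standard deviation = 2.0
print(max(data) - min(data))       # range = 7
```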

Sampling

Machine learning models are trained based on sampled data. If the samples are not carefully selected, the
reliability of our models becomes uncertain. Ideally, we aim to choose representative subsets of data from
larger populations.

Estimation

Estimation techniques are crucial in machine learning for determining unknown population parameters
based on sample data. They allow us to estimate model parameters, evaluate model performance, and
make predictions about unseen data.

The most common estimation method used in machine learning is Maximum Likelihood (ML) estimation,
which finds the estimator of an unknown parameter by maximizing the likelihood function.
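As a sketch, the snippet below estimates a coin's bias from invented Bernoulli samples by maximizing the log-likelihood numerically over a grid; for this model the closed-form ML estimate is simply the sample mean:

```python
# A minimal Maximum Likelihood sketch for a Bernoulli parameter p.
# Likelihood L(p) = p^heads * (1-p)^tails is maximised at heads/n.
import numpy as np

samples = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])  # invented coin flips, 1 = heads

p_grid = np.linspace(0.01, 0.99, 99)
log_lik = samples.sum() * np.log(p_grid) \
          + (len(samples) - samples.sum()) * np.log(1 - p_grid)
p_hat = p_grid[np.argmax(log_lik)]   # numerical maximisation of the likelihood

print(p_hat, samples.mean())         # both ~0.7: the closed-form ML estimate
```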

Statistical Machine Learning Techniques


Linear Regression

Linear regression is a term commonly encountered in the statistical literature, but it is more than just that.
It is also seen as a supervised learning algorithm that captures the connection between a dependent
variable and independent variables.

Logistic Regression

Logistic regression is a statistical classification algorithm that estimates the probability of categorical
outcomes based on independent variables. By applying a logistic function, it predicts the occurrence of a
particular class.

Decision Trees

Decision trees are versatile algorithms that use statistics to split data based on features, creating a tree-like
structure for classification or regression. They are intuitive, interpretable, and handle categorical and
numerical data.

Random Forest

Random Forest is an ensemble learning method that improves prediction accuracy by combining multiple
decision trees. It employs sampling to randomly select subsets of features and data for building the trees.
The predictions of these individual trees are then aggregated to make the final prediction.
Support Vector Machines (SVM)

SVM is a powerful algorithm that can be used for classification and regression tasks. It uses statistical
principles to create a boundary between different groups of data points, making it easier to tell them apart.
By optimizing this boundary, SVM reduces the chances of making mistakes and improves overall
classification performance.

K-Nearest Neighbors (KNN)

KNN is a simple yet effective algorithm used for classifying data points based on the majority vote of their
nearest neighbors. It is suitable for both classification and regression problems and does not require an
explicit training phase (it is a lazy learner).
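A minimal pure-Python sketch of KNN on invented 2-D points:

```python
# A minimal K-Nearest Neighbors sketch (invented 2-D points).
from collections import Counter

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]

def knn_predict(x, k=3):
    # Sort stored examples by squared distance to x; no training step needed.
    nearest = sorted(train, key=lambda p: (p[0][0]-x[0])**2 + (p[0][1]-x[1])**2)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]   # majority vote of the k neighbours

print(knn_predict((2, 2)))  # 'A'
print(knn_predict((6, 5)))  # 'B'
```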

Learning with complete data


Learning with complete data refers to the process of training machine learning models when the dataset
contains no missing values, and all the relevant information needed for the task is available for every
instance. In this scenario, each data point in the dataset has complete information on all the features or
variables considered in the analysis. This is an idealized situation that simplifies the training and
evaluation of machine learning models.

Naïve Bayes Classifier Algorithm


 Naïve Bayes algorithm is a supervised learning algorithm, which is based
on Bayes theorem and used for solving classification problems.
 It is mainly used in text classification that includes a high-dimensional
training dataset.
 Naïve Bayes Classifier is one of the simplest and most effective
classification algorithms; it helps in building fast machine
learning models that can make quick predictions.
 It is a probabilistic classifier, which means it predicts on the basis of
the probability of an object.
 Some popular examples of the Naïve Bayes algorithm are spam filtration,
Sentiment analysis, and classifying articles.

Why is it called Naïve Bayes?


The Naïve Bayes algorithm comprises two words, Naïve and
Bayes, which can be described as:
 Naïve: It is called Naïve because it assumes that the occurrence of a
certain feature is independent of the occurrence of other features. For example,
if a fruit is identified on the basis of colour, shape, and taste, then a red,
spherical, and sweet fruit is recognized as an apple. Hence each feature
individually contributes to identifying it as an apple, without depending
on the others.
 Bayes: It is called Bayes because it depends on the principle of Bayes'
Theorem.
Bayes' Theorem:
 Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is
used to determine the probability of a hypothesis with prior knowledge. It
depends on the conditional probability.
 The formula for Bayes' theorem is given as:

P(A|B) = [P(B|A) · P(A)] / P(B)

Where,

P(A|B) is Posterior probability: Probability of hypothesis A on the observed


event B.

P(B|A) is Likelihood probability: Probability of the evidence given that the
hypothesis is true.

P(A) is Prior Probability: Probability of hypothesis before observing the


evidence.

P(B) is Marginal Probability: Probability of Evidence.

Working of Naïve Bayes' Classifier:


Working of Naïve Bayes' Classifier can be understood with the help of the
below example:

Suppose we have a dataset of weather conditions and a corresponding target
variable "Play". Using this dataset, we need to decide whether we should
play or not on a particular day according to the weather conditions. To solve
this problem, we need to follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.

Problem: If the weather is sunny, then the Player should play or not?

Solution: To solve this, first consider the below dataset:


Outlook Play

0 Rainy Yes

1 Sunny Yes

2 Overcast Yes
3 Overcast Yes

4 Sunny No

5 Rainy Yes

6 Sunny Yes

7 Overcast Yes

8 Rainy No

9 Sunny No

10 Sunny Yes

11 Rainy No

12 Overcast Yes

13 Overcast Yes

Frequency table for the Weather Conditions:

Weather | Yes | No
Overcast | 5 | 0
Rainy | 2 | 2
Sunny | 3 | 2
Total | 10 | 4

Likelihood table for the weather conditions:

Weather | No | Yes | P(Weather)
Overcast | 0 | 5 | 5/14 = 0.35
Rainy | 2 | 2 | 4/14 = 0.29
Sunny | 2 | 3 | 5/14 = 0.35
All | 4/14 = 0.29 | 10/14 = 0.71 | 14/14 = 1.00

Applying Bayes' theorem:

P(Yes|Sunny)= P(Sunny|Yes)*P(Yes)/P(Sunny)

P(Sunny|Yes)= 3/10= 0.3

P(Sunny)= 0.35

P(Yes)=0.71

So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60


P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)

P(Sunny|No) = 2/4 = 0.5

P(No)= 0.29

P(Sunny)= 0.35

So P(No|Sunny)= 0.5*0.29/0.35 = 0.41

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).

Hence, on a Sunny day, the Player can play the game.
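The same calculation can be reproduced in code from the 14-row table above; the sketch below computes the exact posterior (0.6, which the rounded hand calculation approximates as 0.60):

```python
# Reproducing the Naive Bayes calculation from the 14-row Outlook table.
outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play    = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
           "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

n = len(play)
p_yes = play.count("Yes") / n                  # P(Yes) = 10/14 ~ 0.71
p_sunny = outlook.count("Sunny") / n           # P(Sunny) = 5/14 ~ 0.36
sunny_given_yes = sum(1 for o, p in zip(outlook, play)
                      if o == "Sunny" and p == "Yes") / play.count("Yes")  # 3/10

posterior = sunny_given_yes * p_yes / p_sunny  # P(Yes | Sunny) = 0.6
print(posterior)
```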

Advantages of Naïve Bayes Classifier:

 Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
 It can be used for Binary as well as Multi-class Classifications.
 It performs well in Multi-class predictions as compared to the other Algorithms.
 It is the most popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:

 Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship
between features.

Applications of Naïve Bayes Classifier:

 It is used for Credit Scoring.


 It is used in medical data classification.
 It can be used in real-time predictions because Naïve Bayes Classifier is an eager learner.
 It is used in Text classification such as Spam filtering and Sentiment analysis.

Learning with Hidden Data


Learning with hidden data refers to situations in machine learning where some of the variables or
information are not directly observable, yet they play a crucial role in the underlying structure of the data.
This hidden or latent data could represent unobservable factors, missing values, or unmeasured variables.
The challenge is to infer or estimate these hidden variables in order to make better predictions or
understand the structure of the data.

Common scenarios involving hidden data include:

1. Missing Data:
o Sometimes, certain variables in a dataset may have missing values. Learning with hidden
data involves developing methods to handle and impute these missing values to make the
most accurate predictions.
2. Latent Variables:
o In many models, there are latent variables that are not directly observable but influence the
observed data. For example, in clustering problems, the latent variable might represent the
cluster assignment of each data point.
3. Unobservable Factors:
o Hidden data can also represent factors that are not directly measurable but have a
significant impact on the observed data. These factors may include underlying trends,
patterns, or characteristics that are not apparent in the given data.

The Expectation-Maximization (EM) Algorithm

The EM algorithm is a commonly used approach for learning with hidden data. It allows for the estimation
of model parameters in the presence of latent variables or missing data. Its steps are:

1. Initialization:
o Initialize the model parameters randomly or based on some prior knowledge.

2. Expectation (E-step):
o Estimate the expected values of the latent variables given the observed data and the current
parameter estimates. Compute the posterior distribution of the hidden variables.

3. Maximization (M-step):
o Update the model parameters by maximizing the expected log-likelihood obtained from the E-step.
This involves finding parameter values that increase the likelihood of the observed data given the
estimated latent variable distribution.

4. Iteration:
o Repeat the E-step and M-step iteratively until the algorithm converges. At each iteration, the model
parameters are refined, leading to improved estimates of both observable and hidden data.
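A minimal sketch of these four steps for a two-component 1-D Gaussian mixture, where the hidden variable is each point's unobserved component; the data is synthetic and the initialization is deliberately crude:

```python
# A minimal EM sketch: fitting a two-component 1-D Gaussian mixture.
# The hidden variable is each point's (unobserved) component membership.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])  # toy data

mu = np.array([x.min(), x.max()])    # 1. Initialization (crude but serviceable)
pi, sigma = np.array([0.5, 0.5]), np.array([1.0, 1.0])

for _ in range(50):                  # 4. Iterate E and M steps to convergence
    # 2. E-step: posterior responsibility of each component for each point
    dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
    resp = dens / dens.sum(axis=1, keepdims=True)
    # 3. M-step: re-estimate parameters from the expected assignments
    nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / len(x)

print(mu)   # ~[0, 5]: recovered means of the hidden components
```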
