Vision & Mission

Uploaded by

chr60629

VISION & MISSION

Institute Vision:

Aspire to be a leading institute in Professional Education by Creating
technocrats to Propel Societal Transformation through Inventions and
innovations.

Institute Mission:

To impart technology integrated active learning environment that nurtures
the technical & life skills.

To enhance scientific temper through active research leading to innovations
& sustainable environment.

To create responsible citizens with highest ethical standards.

Department Vision:

Expand as a centre of excellence in the field of electrical engineering
through industrial and academic research by training the learners for global
acceptance.

Department Mission:

To work with commitment for the improvement of quality teaching.

To conduct creative research by addressing the needs of industry
and society.

To develop professional practice among the learners to encourage
lifelong learning, teamwork and leadership.

PROGRAM EDUCATIONAL OBJECTIVES

Program Educational Objectives:

PEO 1. Graduates shall have technical knowledge and skills in the area of

Electrical and Electronics engineering to fulfil the needs of industry and

society.

PEO 2. Graduates will have research capabilities to achieve success in their

chosen field with team work.

PEO 3. Graduates shall be successful engineers with lifelong learning, right

attitude and Ethics.

PROGRAM OUTCOMES & PROGRAM SPECIFIC

OUTCOMES:

Program outcomes:

PO1: Engineering knowledge:

Apply the knowledge of mathematics, science, engineering fundamentals,

and an engineering specialization to the solution of complex engineering
problems.

PO2: Problem analysis:

Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.

PO3: Design/development of solutions:

Design solutions for complex engineering problems and design system

components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and
environmental considerations.

PO4: Conduct investigations of complex problems:

Use research-based knowledge and research methods including design of

experiments, analysis and interpretation of data, and synthesis of the information
to provide valid conclusions.

PO5: Modern tool usage:

Create, select, and apply appropriate techniques, resources, and modern

engineering and IT tools including prediction and modelling to complex
engineering activities with an understanding of the limitations.

PO6: The engineer and society:

Apply reasoning informed by the contextual knowledge to assess societal,

health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.

PO7: Environment and sustainability:

Understand the impact of the professional engineering solutions in societal

and environmental contexts, and demonstrate the knowledge of, and need for
sustainable development.

PO8: Ethics:

Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.

PO9: Individual and team work:

Function effectively as an individual, and as a member or leader in diverse

teams, and in multidisciplinary settings.

PO10: Communication:

Communicate effectively on complex engineering activities with the

engineering community and with society at large, such as, being able to
comprehend and write effective reports and design documentation, make
effective presentations, and give and receive clear instructions.

PO11: Project management and finance:

Demonstrate knowledge and understanding of the engineering and
management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.

PO12: Life-long learning:

Recognize the need for, and have the preparation and ability to engage in
independent and lifelong learning in the broadest context of technological
change.

PSO1: Power electronics:

An ability to develop working models in the area of power electronics with
sound theoretical background.

PSO2: Power systems:

An ability to promote the use of renewable energy sources in the area of
power systems.

PSO3: Control systems:

An ability to focus on different control techniques in the field of electrical
and electronics engineering.

Index
1. Introduction to Machine Learning
o What is Machine Learning?
o Wellsprings of Machine Learning
o Varieties of Machine Learning
2. Boolean Functions in Machine Learning
o Boolean Algebra
o Diagrammatic Representations
o Classes of Boolean Functions
3. Version Spaces
o Version Spaces and Mistake Bounds
o Version Graphs
o Learning as Search of a Version Space
o Candidate Elimination Method
4. Neural Networks
o Threshold Logic Units
o Linear Machines
o Training Feedforward Networks by Backpropagation
o Synergies with Knowledge-Based Methods
5. Statistical Learning
o Statistical Decision Theory
o Gaussian Distributions
o Conditionally Independent Binary Components
o Learning Belief Networks
o Nearest-Neighbor Methods
6. Decision Trees
o Definitions
o Supervised Learning of Univariate Decision Trees
o Avoiding Overfitting
7. Inductive Logic Programming (ILP)
o Notation and Definitions
o A Generic ILP Algorithm
o Example
o Inducing Recursive Programs
o Relationships with Decision Trees

Table of Figures
1. Figure 1: An AI System
2. Figure 2: An Input-Output Function
3. Figure 3: Implementing the Version Space
4. Figure 4: A Version Graph for Terms
5. Figure 5: A Threshold Logic Unit (TLU)
6. Figure 6: Weight Space
7. Figure 7: The Two-Dimensional Gaussian Distribution
8. Figure 8: A Decision Tree
9. Figure 9: A Decision Tree with Subtree Replication
10. Figure 10: Sufficient, Necessary, and Consistent Programs

Introduction to Machine Learning

Machine Learning (ML) is a subset of artificial intelligence (AI) focused on

building systems that can learn from data and make decisions without being
explicitly programmed for every possible scenario. Rather than following static
instructions, machine learning algorithms enable systems to identify patterns and
adapt to new inputs. This adaptive learning is crucial in today’s data-driven world,
where traditional programming methods would struggle to handle the
complexities of massive and dynamic datasets.

Fig 1 An AI System

SASI INSTITUTE OF TECHNOLOGY & ENGINEERING Page | 3

1.1.1 What is Machine Learning?

Machine learning is the field of study that gives computers the ability to learn
without being explicitly programmed. It involves designing algorithms and
models that can learn patterns from data and make predictions or decisions based
on that learning.

• Key Characteristics:

o Learning from Data: The essence of ML is the ability to derive

knowledge from data. For example, a model might be trained on past
weather data to predict future weather patterns.

o Generalization: ML models should be able to apply the patterns

they've learned to new, unseen data.

o Adaptability: Models can improve over time as they are exposed to

more data.

• Types of Machine Learning:

o Supervised Learning: The model is trained on a labeled dataset

(i.e., data with known outcomes) to learn to predict the output from
input data.

o Unsupervised Learning: The model is trained on data without

labels, aiming to discover underlying patterns or structures in the
data.

o Reinforcement Learning: The model learns by interacting with an

environment and receiving feedback through rewards or penalties.

• Applications of ML: It is used in a variety of fields, such as healthcare

(e.g., disease diagnosis), finance (e.g., fraud detection), marketing (e.g.,
customer segmentation), and autonomous driving.


1.1.2 Wellsprings of Machine Learning

Machine learning has its origins in several foundational fields, including

statistics, computer science, optimization theory, and cognitive science. These
disciplines laid the groundwork for creating algorithms that can find patterns in
data and improve themselves with experience.

• History:

o Early work in machine learning began with attempts to simulate

human intelligence and cognitive processes.

o Statistical methods in the mid-20th century began to be applied to

machine learning problems, paving the way for modern supervised
learning techniques.

o The rise of powerful computers and the availability of vast amounts

of data in the 21st century accelerated the development and adoption
of machine learning techniques.

• Key Influences:

o Artificial Intelligence (AI): ML is a core subfield of AI, focusing

on systems that learn and adapt autonomously.

o Statistics: Concepts such as probability, regression, and hypothesis

testing form the basis of many ML algorithms.

o Optimization: Many ML algorithms involve finding optimal

solutions or minimizing loss functions, drawing heavily from
optimization theory.

o Cognitive Science: Early ML systems were inspired by the human

brain’s ability to learn from experience and make decisions.


1.1.3 Varieties of Machine Learning

Machine learning can be categorized based on the type of data used and the
learning process. The main categories are:

1. Supervised Learning:

o Definition: A model is trained on labeled data (input-output pairs).

The goal is to learn the mapping from inputs to outputs.

o Common Algorithms: Linear regression, decision trees, support

vector machines (SVM), neural networks.

o Applications: Image classification, spam email detection, medical

diagnoses.

2. Unsupervised Learning:

o Definition: The model is given data without explicit labels and must
find patterns, groupings, or structures within the data on its own.

o Common Algorithms: K-means clustering, hierarchical clustering,

principal component analysis (PCA), autoencoders.

o Applications: Market segmentation, anomaly detection,

dimensionality reduction.

3. Reinforcement Learning:

o Definition: An agent learns by interacting with an environment and

receiving feedback in the form of rewards or penalties.

o Key Concepts: Exploration vs. exploitation, reward signals,

Markov Decision Processes (MDPs).

o Applications: Robotics, game playing (e.g., AlphaGo), autonomous

driving.


4. Semi-supervised and Self-supervised Learning:

o Definition: These methods lie between supervised and unsupervised

learning. Semi-supervised learning uses a small amount of labeled
data along with a large amount of unlabeled data. Self-supervised
learning generates its own labels from the input data.

1.2 Learning Input-Output Functions

Machine learning can be viewed as a process of learning a function that maps
inputs to outputs. This section explores how this function is learned and
how different types of inputs and outputs are handled.

1.2.1 Types of Learning

Learning in machine learning can be broadly classified into different categories

based on the way the model interacts with the data:

Fig 2. An Input-Output Function


1. Supervised Learning:

o The most straightforward approach, where the system learns from

input-output pairs and generalizes this mapping to unseen data.

o Examples: Image classification, medical diagnosis, financial

prediction.

2. Unsupervised Learning:

o Involves learning patterns or structure from input data without any

labeled outputs. The focus is on uncovering hidden patterns,
groupings, or relationships in the data.

o Examples: Customer segmentation, clustering, anomaly detection.

3. Reinforcement Learning:

o Involves an agent interacting with an environment, learning to make

decisions based on feedback in the form of rewards or penalties.

o Examples: Robotics, self-learning AI agents in games, autonomous

vehicles.

4. Transfer Learning:

o Involves leveraging knowledge learned from one task to improve

learning on a related task. This is particularly useful when data is
scarce for a new problem but abundant for similar problems.

5. Few-shot Learning:

o A type of learning where the model is expected to learn from only a

few examples.


1.2.2 Input Vectors

An input vector refers to the representation of the input data fed into a machine
learning model. Each data point can be viewed as a vector of features or variables,
which are the dimensions of the data that the model uses to make predictions or
classifications.

• Feature Engineering: The process of selecting, transforming, or creating

new features from raw data that improve the performance of the model.

o Examples: Extracting numerical features from text (e.g., word

frequencies), normalizing features to scale them, creating
categorical features.

• Feature Selection: The process of identifying and using the most relevant
features while ignoring redundant or irrelevant ones.

• Dimensionality Reduction: Techniques like PCA or t-SNE that reduce the

number of features in a dataset to make models more efficient.
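
As a rough illustration of the feature-engineering steps listed above, the sketch below turns a short text into a word-frequency vector and rescales a numeric feature to the range 0 to 1. The function names, the vocabulary, and the sample values are hypothetical and chosen only for this example.

```python
from collections import Counter


def word_frequencies(text, vocabulary):
    """Count how often each vocabulary word appears in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical inputs, used only to show the shape of an input vector.
vector = word_frequencies("free offer free prize", ["free", "offer", "meeting"])
scaled = min_max_normalize([10.0, 20.0, 40.0])
print(vector, scaled)
```

Min-max scaling is only one of several normalization choices; standardizing to zero mean and unit variance is another common option.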

1.2.3 Outputs

The output of a machine learning model is what the model predicts or produces
after being trained on input data. Depending on the task, outputs can take many
forms:

• Regression Tasks: Outputs are continuous values, such as predicting house

prices based on features like location, size, etc.

• Classification Tasks: Outputs are discrete categories, such as labeling

emails as spam or not spam.

• Clustering Tasks: Outputs are group assignments, where similar data

points are grouped together (e.g., customer segments in marketing).


• Reinforcement Learning: The output is a series of actions that maximize
the cumulative reward in an environment.

In supervised learning, the output is typically compared to the true label (or
ground truth) to calculate the model’s performance, usually via loss functions like
Mean Squared Error (MSE) or Cross-Entropy.
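
The two loss functions named above can be sketched in a few lines. This is a minimal version that assumes plain Python lists, binary 0/1 labels for the cross-entropy case, and a small clipping constant `eps` (an assumption added here to avoid taking the logarithm of zero).

```python
import math


def mean_squared_error(targets, predictions):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels given predicted P(label = 1)."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])   # (1 + 4) / 2
bce = binary_cross_entropy([1, 0], [0.9, 0.2])
print(mse, bce)
```

MSE is the usual choice for regression outputs, while cross-entropy suits classification outputs that are probabilities.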


Boolean Functions in Machine Learning
In machine learning, Boolean functions are used to describe decision-making
processes, particularly in classification problems, feature selection, and the
construction of decision trees. These functions map input features to outputs that
are often binary (0 or 1, true or false). Understanding how to represent and work
with Boolean functions is crucial for designing models that deal with binary
classification tasks, decision rules, and logic-based models.

2.1 Representation

2.1.1 Boolean Algebra in Machine Learning

In machine learning, Boolean algebra helps represent logical relationships

between features and outcomes. For example, a binary classifier might use
Boolean operations to combine features and determine the class label (e.g., yes
or no).

• Logical Operations in ML:

o AND (⋀): Represents an intersection or conjunction of conditions.

For example, "if age > 50 AND income > 30k, then the prediction is
‘yes’".

o OR (⋁): Represents a disjunction where the condition is true if at

least one of the features is satisfied. E.g., "if age > 50 OR income >
30k, then predict ‘yes’".

o NOT (¬): Reverses a condition. E.g., "if NOT (age > 50), then
predict ‘no’".

• Use in Decision Trees: Boolean algebra simplifies decision trees by using

basic operations to combine conditions. Each decision node in a decision
tree might involve an AND/OR combination of features.


• Simplification: Boolean algebra is used to simplify decision rules in ML
models, allowing for the creation of more compact and efficient decision-
making rules (e.g., simplifying a series of logical conditions in decision
tree pruning).
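
One way to check a Boolean simplification is to compare the original and simplified rules on every input combination. The sketch below does this for the identity (A AND B) OR (A AND NOT B) = A; the rule itself is a hypothetical example, not one taken from a specific model.

```python
from itertools import product


def original_rule(a, b):
    """(a AND b) OR (a AND NOT b)."""
    return (a and b) or (a and not b)


def simplified_rule(a, b):
    """The same rule after simplification: it depends on a only."""
    return a


# Exhaustively compare both rules on all four input combinations.
equivalent = all(
    original_rule(a, b) == simplified_rule(a, b)
    for a, b in product([False, True], repeat=2)
)
print(equivalent)
```

Exhaustive checking is only practical for a small number of binary conditions, since the number of combinations doubles with each added input.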

2.1.2 Diagrammatic Representations in Machine Learning

In machine learning, diagrammatic representations of Boolean functions are

crucial for visualizing decision-making processes, especially when explaining or
debugging models.

• Truth Tables: In the context of binary classification, truth tables can

represent all possible combinations of input features and their associated
output. For a classifier that takes two binary features, a truth table could
show all possible feature combinations and the corresponding predicted
outcome.

• Decision Trees: A decision tree can be viewed as a series of Boolean

decisions applied to features. Each decision node tests whether a certain
condition holds (e.g., "Is feature x > threshold?"), and the tree branches
according to the answer (true or false). The decision rules can be
represented as a series of Boolean expressions.

• Logic Gates: In certain models, such as neural networks or rule-based
classifiers, Boolean logic gates (AND, OR, NOT) can be used as building
blocks for more complex decisions. For example, a perceptron (the
simplest form of a neural network) computes a weighted sum of its inputs
and compares it with a threshold; with suitable weights it behaves like
an AND, OR, or NOT gate.

• Karnaugh Maps (K-Maps): Though typically used in hardware design,
K-maps can also help simplify Boolean expressions for machine learning
models by reducing the complexity of decision rules. They are practical
mainly when a rule involves only a handful of binary conditions.
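
A truth table for a classifier with two binary features can be generated by enumerating every input combination. In this sketch the classifier rule (predict 1 when the first feature is set and the second is not) is a hypothetical stand-in for a trained model.

```python
from itertools import product


def classifier(x1, x2):
    """Hypothetical rule: predict 1 when x1 is set and x2 is not."""
    return int(x1 == 1 and x2 == 0)


def truth_table(func, n_inputs):
    """List every 0/1 input combination together with the function's output."""
    return [(bits, func(*bits)) for bits in product([0, 1], repeat=n_inputs)]


table = truth_table(classifier, 2)
for bits, output in table:
    print(bits, "->", output)
```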
2.2 Classes of Boolean Functions in Machine Learning

In machine learning, Boolean functions help define different types of decision

boundaries and classification rules. These functions can be categorized into
different forms based on how they are used to classify data points.

2.2.1 Terms and Clauses in Decision Rules

In decision rule-based classifiers, a literal is an individual condition or
comparison (e.g., “age > 50”) or its negation. A term is a conjunction (AND)
of literals (e.g., “age > 50 AND income > 30k”), while a clause is a
disjunction (OR) of literals (e.g., “age > 50 OR income > 30k”).

• Example: The decision rule "age > 50 AND income > 30k" is a term made
of two literals, (age > 50) and (income > 30k), combined with the
AND operator.

2.2.2 Disjunctive Normal Form (DNF) Functions

A Boolean function is in Disjunctive Normal Form (DNF) if it is expressed as

an OR (disjunction) of AND (conjunction) terms. In machine learning, DNF is
useful for rule-based classifiers where we have multiple conditions that lead to
a positive outcome.

• Example: In a binary classification problem, a DNF function could

represent a rule like:

o "Predict ‘yes’ if (age > 50 AND income > 30k) OR (age <= 50 AND
income > 40k)."

DNF is highly relevant for rule-based learning. In a decision tree, for
example, each root-to-leaf path is a conjunction of feature tests, and the
paths that end in a positive leaf together form a DNF expression.
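
The DNF rule from the example can be written directly as an OR of two AND terms. This sketch assumes that "30k" and "40k" mean 30,000 and 40,000, and that the prediction is ‘no’ whenever no term fires; neither assumption is stated in the rule itself.

```python
def predict_dnf(age, income):
    """Evaluate the example DNF rule: an OR over two AND terms."""
    term_1 = age > 50 and income > 30_000    # first conjunction
    term_2 = age <= 50 and income > 40_000   # second conjunction
    # Assumption: predict 'no' when neither term is satisfied.
    return "yes" if (term_1 or term_2) else "no"


print(predict_dnf(60, 35_000))  # first term fires
print(predict_dnf(40, 35_000))  # neither term fires
print(predict_dnf(40, 45_000))  # second term fires
```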

2.2.3 Conjunctive Normal Form (CNF) Functions

A Boolean function is in Conjunctive Normal Form (CNF) if it is expressed as
an AND (conjunction) of OR (disjunction) clauses. CNF is less common in typical
machine learning algorithms, but it can be useful for certain forms of logical
rule-based classification, especially when using satisfiability solvers (as in
constraint satisfaction problems).

• Example: A CNF might represent a rule like:

o "Predict ‘no’ if (age > 50 OR income > 30k) AND (age <= 50 OR
income <= 20k)."

2.2.4 Decision Lists

Decision lists are a sequence of ordered rules used for classification. Each rule in
a decision list is a Boolean expression, and the output is determined by the first
matching rule.

• Example: A decision list might look like:

o "If age > 50, predict ‘yes’."

o "If income > 40k, predict ‘yes’."

o "Otherwise, predict ‘no’."

Decision lists are useful in situations where there is a priority among rules or
when conditions are complex.
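
A decision list can be sketched as an ordered list of (condition, label) pairs that is scanned until the first condition matches. The dictionary-based example format and the reading of "40k" as 40,000 are assumptions made for this illustration.

```python
def decision_list_predict(example, rules, default):
    """Return the label of the first rule whose condition matches the example."""
    for condition, label in rules:
        if condition(example):
            return label
    return default  # the final "otherwise" rule


# The rules from the example above, in priority order.
RULES = [
    (lambda e: e["age"] > 50, "yes"),
    (lambda e: e["income"] > 40_000, "yes"),
]

print(decision_list_predict({"age": 60, "income": 10_000}, RULES, "no"))
print(decision_list_predict({"age": 30, "income": 50_000}, RULES, "no"))
print(decision_list_predict({"age": 30, "income": 10_000}, RULES, "no"))
```

Because evaluation stops at the first match, reordering the rules can change the prediction whenever two rules assign different labels.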

2.2.5 Symmetric and Voting Functions

In some machine learning tasks, such as ensemble learning (e.g., Random

Forests or Boosting), voting functions aggregate the outputs of several
classifiers.

• Symmetric Functions: These are Boolean functions that produce the same
output for any permutation of their input variables. In machine learning,
this can be useful when dealing with symmetric data, such as in ensemble
learning, where the order of the classifiers doesn’t matter.


• Voting Functions: These involve taking a vote from multiple classifiers.
For example, in a majority voting scheme, the most frequent output from
multiple classifiers (each making Boolean predictions) is chosen as the
final decision.
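
Majority voting over Boolean predictions is itself a symmetric function, which the following sketch checks by evaluating it on every ordering of the same votes. Breaking ties in favour of 0 is an assumption made here; other tie rules are equally valid.

```python
from itertools import permutations


def majority_vote(votes):
    """Return 1 if more than half of the 0/1 votes are 1, otherwise 0."""
    return 1 if sum(votes) > len(votes) / 2 else 0


votes = (1, 0, 1)  # hypothetical outputs of three classifiers
# A symmetric function gives the same result for every ordering of its inputs.
is_symmetric_here = all(
    majority_vote(order) == majority_vote(votes) for order in permutations(votes)
)
print(majority_vote(votes), is_symmetric_here)
```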

2.2.6 Linearly Separable Functions

A Boolean function is linearly separable if there exists a hyperplane (or line in

two dimensions) that separates the inputs into two classes. Linear classifiers
(e.g., Perceptrons) can be used to model these functions.

• Example: A linearly separable function might involve a classification

problem where a decision boundary can separate the positive and negative
instances in feature space. In such cases, a linear model like Logistic
Regression or Support Vector Machines (SVM) can perform well.
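
For Boolean functions of two inputs, linear separability can be explored by searching a small grid of weights and thresholds for a unit that reproduces the truth table. The grid of candidate values and the convention that the output is 1 when the weighted sum is at least the threshold are assumptions of this sketch.

```python
from itertools import product


def find_separator(table, candidates=(-2, -1, 0, 1, 2)):
    """Search a small grid of (w1, w2, theta) for a line that matches the table."""
    for w1, w2, theta in product(candidates, repeat=3):
        if all(
            (1 if w1 * x1 + w2 * x2 >= theta else 0) == y
            for (x1, x2), y in table.items()
        ):
            return w1, w2, theta
    return None  # nothing on this grid separates the two classes


AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print(find_separator(AND) is not None)  # AND is linearly separable
print(find_separator(XOR))              # XOR is not, so the search fails
```

A failed grid search is not a proof in general, but XOR is a standard example of a function that no single line can separate, so no grid would succeed for it.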


Using Version Spaces for Learning
In machine learning, Version Spaces are a conceptual framework used to
describe the set of hypotheses (models) consistent with a set of training examples.
These hypotheses are iteratively refined as more examples are encountered, and
the space of possible models narrows down. The Candidate Elimination
Method is one specific algorithm that uses this framework to incrementally
eliminate hypotheses that do not fit the observed data.

Fig 3 Implementing the Version Space


3.1 Version Spaces and Mistake Bounds

Version Spaces:

• Definition: A Version Space is the set of all hypotheses that are consistent
with a given set of training examples. It is essentially the collection of all
possible models that could explain the observed data.

• Learning Process: As more training examples are provided, the version

space is updated. If a hypothesis is inconsistent with a new training
example, it is removed from the version space. Over time, the version space
shrinks until only one hypothesis (or a small set) remains, which is used
for prediction.

• Mistake Bounds: The mistake bound is a theoretical concept used to

measure how many mistakes (incorrect predictions) a learning algorithm
might make before arriving at the correct hypothesis. In the context of
version spaces, the mistake bound can be used to determine the number of
incorrect predictions the algorithm will make before converging to a
hypothesis that correctly classifies all future examples.

Mistake Bound Analysis:

The mistake bound often depends on factors like:

• Size of the version space: The number of possible hypotheses that could
fit the data.

• The nature of the training data: How noisy the data is and how well the
hypotheses generalize to unseen data.

• The learning algorithm used: For example, some algorithms might

converge faster than others, reducing the mistake bound.


For a learner whose predictions are always consistent with the examples seen
so far, and whose hypothesis set contains the correct hypothesis, the mistake
bound limits how many mistakes can occur before the algorithm identifies the
correct model.
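
One concrete way to see a mistake bound is the halving strategy, named here as an illustration rather than as the method this section prescribes: predict with the majority vote of the current version space, then discard every hypothesis that disagrees with the true label. Each mistake removes at least half of the version space, so the number of mistakes is at most log2 of the number of hypotheses. The threshold hypotheses and the example stream below are hypothetical.

```python
import math


def make_threshold(k):
    """Hypothesis h_k: label x as 1 when x >= k."""
    return lambda x: 1 if x >= k else 0


def halving_run(hypotheses, target, stream):
    """Predict by majority vote, count mistakes, and prune the version space."""
    version_space = list(hypotheses)
    mistakes = 0
    for x in stream:
        votes_for_one = sum(h(x) for h in version_space)
        prediction = 1 if 2 * votes_for_one >= len(version_space) else 0
        truth = target(x)
        if prediction != truth:
            mistakes += 1
        # Keep only hypotheses consistent with the revealed label.
        version_space = [h for h in version_space if h(x) == truth]
    return mistakes, len(version_space)


hypotheses = [make_threshold(k) for k in range(9)]  # 9 candidate hypotheses
target = make_threshold(5)                          # assumed correct hypothesis
mistakes, remaining = halving_run(hypotheses, target, [4, 6, 5, 0, 7, 3, 2, 1])
bound = math.log2(len(hypotheses))
print(mistakes, remaining, bound)
```

The bound only holds when the correct hypothesis is in the initial set and the labels are noise-free.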

3.2 Version Graphs

Fig 4. A Version Graph for Terms

• Version Graphs are a more structured way to represent version spaces.

Rather than storing all hypotheses in an unorganized manner, a version
graph organizes hypotheses based on their generality and specificity.

• Nodes: Each node in a version graph represents a hypothesis (a possible

model).


• Edges: An edge between two nodes indicates that one hypothesis is more
specific or general than the other. If one hypothesis is a generalization
(broader) of another, it is connected to the more specific hypothesis.

• Version Graphs for Learning: These graphs help visualize the space of
possible hypotheses and make it easier to reason about the relationships
between hypotheses. By organizing hypotheses this way, learners can
explore the hypothesis space more efficiently. They can navigate the
version graph to eliminate inconsistent hypotheses based on new training
data and gradually refine the hypothesis set.

• Graph Structure: The version graph typically has a hierarchical structure

where:

o The most general hypotheses (the least restrictive) are at the top.

o The most specific hypotheses (the most restrictive) are at the bottom.

Version graphs are particularly useful in inductive learning where the goal is to
identify the most specific hypothesis that still explains the training data.

3.3 Learning as Search of a Version Space

In this approach, learning is framed as a search process through the version

space. The goal of the learner is to efficiently explore the version space to find a
hypothesis that best fits the training data.

Search Process:

• Initial Step: Start with a broad version space that includes all hypotheses.

• Refinement: As each training example is processed, the learner eliminates

hypotheses that are inconsistent with the new example.


• Convergence: Over time, the version space narrows down, eventually
converging to a hypothesis (or a small set of hypotheses) that correctly
classifies future examples.

In practice, this search is done by using algorithms like the Candidate

Elimination Method, which is designed to update the version space based on
new examples.

Challenges in Search:

• Exploration vs. Exploitation: In some cases, searching the version space

may involve trade-offs between exploring new hypotheses and exploiting
the current best hypothesis.

• Efficiency: Searching a large version space can be computationally

expensive. Therefore, methods like pruning or using heuristics to narrow
down the search space are important.

3.4 The Candidate Elimination Method

The Candidate Elimination Method is a specific algorithm used to search and

refine the version space. It operates in a way that progressively narrows down the
version space based on the training data, with the goal of eventually identifying
the best hypothesis.

How it Works:

1. Initialization: Start with two sets of hypotheses:

o S: The set of most specific hypotheses (initially, this is the
hypothesis that covers no examples, i.e., classifies every example as
negative).

o G: The set of most general hypotheses (initially, this is the
hypothesis that covers all examples, i.e., classifies every example as
positive).


2. Processing each training example:

o For each example, update the S and G sets:

▪ For a positive example, remove from G any hypothesis that does
not cover it, and generalize the hypotheses in S just enough to
cover it.

▪ For a negative example, remove from S any hypothesis that
covers it, and specialize the hypotheses in G just enough to
exclude it.

3. Refinement:

o Specific Hypotheses: If a hypothesis in S does not fit the new

training example, it is generalized to be consistent with the example.

o General Hypotheses: If a hypothesis in G does not fit, it is

specialized (made more specific) to be consistent with the example.

4. Convergence: Over time, as more examples are processed, the sets S and
G converge. The S set becomes more general, and the G set becomes more
specific, ultimately narrowing the version space toward a hypothesis that
fits the training data.
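
The steps above can be sketched for a simple hypothesis language: conjunctions over discrete attributes, where each position holds a required value or "?" (any value). This is a minimal sketch under several assumptions: the data is noise-free, S stays a single hypothesis (which holds for this conjunctive language), and the attributes, domains, and examples are hypothetical.

```python
def covers(h, x):
    """True if hypothesis h labels example x as positive."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))


def at_least_as_general(h1, h2):
    """True if h1 covers every example that h2 covers (None covers nothing)."""
    return all(a == "?" or a == b or b is None for a, b in zip(h1, h2))


def candidate_elimination(examples, domains):
    n = len(domains)
    S = (None,) * n    # most specific boundary: covers no example
    G = [("?",) * n]   # most general boundary: covers every example
    for x, is_positive in examples:
        if is_positive:
            # Drop general hypotheses that miss the positive example,
            # then minimally generalize S so that it covers x.
            G = [g for g in G if covers(g, x)]
            S = tuple(xv if sv is None else (sv if sv == xv else "?")
                      for sv, xv in zip(S, x))
        else:
            # Minimally specialize each member of G that covers the
            # negative example, keeping candidates that stay above S.
            candidates = []
            for g in G:
                if not covers(g, x):
                    candidates.append(g)
                    continue
                for i, values in enumerate(domains):
                    if g[i] != "?":
                        continue
                    for v in values:
                        h = g[:i] + (v,) + g[i + 1:]
                        if v != x[i] and at_least_as_general(h, S):
                            candidates.append(h)
            candidates = list(dict.fromkeys(candidates))  # drop duplicates
            # Keep only the maximally general candidates.
            G = [g for g in candidates
                 if not any(o != g and at_least_as_general(o, g)
                            for o in candidates)]
    return S, G


# Hypothetical attributes: size, color, shape.
DOMAINS = [("small", "large"), ("red", "blue"), ("circle", "square")]
EXAMPLES = [
    (("large", "red", "circle"), True),
    (("small", "red", "circle"), True),
    (("large", "blue", "circle"), False),
    (("large", "red", "square"), False),
]
S, G = candidate_elimination(EXAMPLES, DOMAINS)
print(S, G)
```

After the first three examples the boundaries still differ (S requires a red circle, G requires only red); the fourth example forces G down to S, at which point the version space contains a single hypothesis.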

Advantages:

• Efficient: This method helps in narrowing down the hypothesis space

quickly by systematically eliminating inconsistent hypotheses.

• Clear Decision Boundaries: Since the method works by maintaining

general and specific hypotheses, it provides clear boundaries for
classification.

Disadvantages:

• Requires Consistent Data: The Candidate Elimination Method assumes

that the data is consistent (i.e., there exists a hypothesis that correctly
classifies all examples). If the data is noisy or inconsistent, the method may
fail to converge.
• Computational Complexity: The method can be computationally
expensive for large hypothesis spaces, as it requires comparing multiple
hypotheses with each new training example.

Summary of Concepts:

• Version Space: The set of all hypotheses consistent with the training data.

• Mistake Bounds: The theoretical limit on how many mistakes an

algorithm might make before converging to the correct hypothesis.

• Version Graph: A hierarchical structure representing the relationships

between hypotheses in a version space.

• Learning as Search: Learning involves searching through the version

space to find the correct hypothesis.

• Candidate Elimination Method: An algorithm for refining the version

space, iterating over the training examples to eliminate inconsistent
hypotheses and converge on the best model.

Together, these concepts provide a formal framework for inductive learning,

helping algorithms efficiently search for hypotheses that explain the observed
data. The Candidate Elimination Method, in particular, is foundational for
concept learning tasks in machine learning.


4. Neural Networks
In machine learning, Neural Networks (NNs) are a powerful class of models
designed to recognize patterns by learning from data. They consist of
interconnected units (neurons), inspired by biological neural networks, that work
together to solve tasks like classification, regression, and more complex tasks
such as steering a van. The sections below start with the simplest unit, the
threshold logic unit, and how it is trained.

4.1 Threshold Logic Units (TLUs)

A Threshold Logic Unit (TLU) is a type of artificial neuron used in early neural
network models. It acts as a basic building block for more complex neural
networks.

Fig 5. A Threshold Logic Unit (TLU)


4.1.1 Definitions and Geometry

• Definition: A TLU takes a set of inputs, applies weights to them, sums
them up, and then applies a threshold function to decide whether the output
should be 0 or 1 (binary classification). Mathematically, the output $y$ of
a TLU can be described as:

$$y = \text{step}(w_1 x_1 + w_2 x_2 + \dots + w_n x_n - \theta)$$

Where:

o $x_1, x_2, \dots, x_n$ are the input features.

o $w_1, w_2, \dots, w_n$ are the weights associated with the inputs.

o $\theta$ is the threshold.

• Geometry: The decision boundary of a TLU is a hyperplane in the input

space. This means that the function performed by the TLU can be
visualized geometrically as separating the input space into two regions: one
where the output is 1 and another where the output is 0.
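
The definition and its geometry can be illustrated with a TLU that implements AND. The weights (1, 1) and threshold 1.5 are hand-picked assumptions, and the step function is taken to output 1 when its argument is zero or positive; with these values the line x1 + x2 = 1.5 is the decision boundary.

```python
def tlu_output(weights, theta, inputs):
    """Return 1 if the weighted sum reaches the threshold theta, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum - theta >= 0 else 0


AND_WEIGHTS, AND_THETA = (1.0, 1.0), 1.5  # hand-picked values for AND
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Only (1, 1) lies on the positive side of the line x1 + x2 = 1.5.
outputs = [tlu_output(AND_WEIGHTS, AND_THETA, p) for p in POINTS]
print(list(zip(POINTS, outputs)))
```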

4.1.2 Special Cases of Linearly Separable Functions

• Linearly Separable Functions: A function is linearly separable if the data

points can be separated into two classes by a straight line (in 2D) or a
hyperplane (in higher dimensions). For linearly separable data, a TLU can
be trained to correctly classify all examples.

• Example: In a binary classification problem where the input data can be

separated with a straight line (such as classifying points above or below a
line), a single TLU can learn the decision boundary effectively.


4.1.3 Error-Correction Training of a TLU

• Training: A simple approach to training a TLU is the error-correction
rule. For each training example, the TLU's predicted output is compared
with the target output. The weights are updated based on the error, typically
using a simple rule like:

$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \eta (t - o) x_i$$

Where:

o $t$ is the target output.

o $o$ is the predicted output.

o $\eta$ is the learning rate.

o $x_i$ is the $i$-th input feature.

Because $t - o$ is zero for a correctly classified example, the weights change
only when the TLU makes a mistake.
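A minimal sketch of error-correction training follows, assuming the update rule stated above. The function names, the choice to treat the threshold as an extra weight on a constant input of -1, the learning rate, and the OR data set are illustrative assumptions.

```python
def predict(w, theta, x):
    """TLU output with the assumed convention step(z) = 1 when z >= 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else 0

def train_tlu(examples, eta=0.1, epochs=50):
    """Error-correction training; examples is a list of (inputs, target) pairs."""
    n = len(examples[0][0])
    w, theta = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in examples:
            o = predict(w, theta, x)
            if o != t:
                errors += 1
                for i in range(n):
                    w[i] += eta * (t - o) * x[i]
                # The threshold acts like a weight on a constant input of -1.
                theta -= eta * (t - o)
        if errors == 0:          # every example classified correctly
            break
    return w, theta

# Hypothetical, linearly separable data: logical OR.
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, theta = train_tlu(or_data)
predictions = [predict(w, theta, x) for x, _ in or_data]
```

Because OR is linearly separable, the loop stops as soon as one pass over the data produces no errors.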

4.1.4 Weight Space

• The weight space is the multi-dimensional space where each point

represents a set of weights for the TLU. During training, the weight vector
is adjusted iteratively, moving through this space towards the optimal
weights that minimize error.

Fig 6. Weight Space


4.1.5 The Widrow-Hoff Procedure

• The Widrow-Hoff rule, also known as the delta rule or least-mean-squares
(LMS) rule, is a gradient descent method for updating the weights of the
TLU. It adjusts the weights in the direction that reduces the squared error
between the target and the unit's weighted sum, that is, the output before
thresholding.
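As a rough sketch of the delta rule, the snippet below performs gradient descent on the squared error of the linear (unthresholded) output. The function name, the bias handled as a weight on a constant input of 1, the learning rate, and the toy data generated from t = 2x + 1 are assumptions for illustration.

```python
def widrow_hoff(examples, eta=0.05, epochs=500):
    """Delta-rule training on the linear output s = w . x (no threshold)."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)              # last weight is a bias on a constant input 1
    for _ in range(epochs):
        for x, t in examples:
            xa = list(x) + [1.0]
            s = sum(wi * xi for wi, xi in zip(w, xa))
            # Gradient step on 0.5 * (t - s)^2 with respect to each weight.
            for i in range(n + 1):
                w[i] += eta * (t - s) * xa[i]
    return w

# Hypothetical data generated from t = 2*x + 1.
data = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0), ((3.0,), 7.0)]
w = widrow_hoff(data)
```

Unlike the error-correction rule, the weights here keep moving even for correctly classified examples, because the update depends on the size of the error in the weighted sum rather than on the thresholded output.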

4.1.6 Training a TLU on Non-Linearly Separable Training Sets

• Non-linearly separable data refers to cases where no straight line or
hyperplane can separate the data into two classes. In these cases, no single
TLU can classify every training example correctly, and the error-correction
procedure does not settle on a final weight vector. This limitation led to the
development of more complex architectures like multi-layer networks.

4.2 Linear Machines

• Linear Machines are models that separate data using linear decision
boundaries. They generalize the single TLU to more than two classes: the
machine computes one weighted sum per class and assigns the input to the
class with the largest sum. Like perceptrons, they can classify linearly
separable data but struggle with non-linearly separable data, which leads to
the development of more complex neural network models.

4.3 Networks of TLUs

A network of TLUs refers to a collection of TLUs arranged in layers to solve more
complex problems.

4.3.1 Motivation and Examples

• The motivation behind networks of TLUs is to tackle more complex, non-

linear problems. By stacking multiple TLUs into layers, networks can
create complex decision boundaries, enabling the classification of non-
linearly separable data.


• Example: An XOR function, which is not linearly separable, can be solved
by a simple network of TLUs.
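One possible network of three TLUs that computes XOR is sketched below. The particular weights and thresholds are hand-picked assumptions, not values taken from the text: one hidden unit behaves like OR, the other like AND, and the output unit fires when OR is on and AND is off.

```python
def tlu(weights, inputs, theta):
    """Threshold unit with the assumed convention step(z) = 1 when z >= 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) - theta >= 0 else 0

def xor_net(a, b):
    h_or = tlu((1, 1), (a, b), 0.5)           # fires when at least one input is 1
    h_and = tlu((1, 1), (a, b), 1.5)          # fires only when both inputs are 1
    return tlu((1, -1), (h_or, h_and), 0.5)   # OR and not AND

xor_outputs = [xor_net(a, b) for a in (0, 1) for b in (0, 1)]
print(xor_outputs)
```

No single TLU can produce this output pattern, but the hidden layer re-maps the inputs so that the output unit faces a linearly separable problem.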

4.3.2 Madalines

• Madalines (short for "Many Adalines") are neural networks built from
several Adaline (Adaptive Linear Element) units. An Adaline is similar to a
TLU, but its weights are trained on the linear weighted sum rather than on
the thresholded output. These networks were used to solve problems that
single-unit networks couldn't handle.

4.3.3 Piecewise Linear Machines

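• These machines generalize linear machines by keeping several weight
vectors (a bank of summing units) for each class. An input is assigned to the
class whose bank contains the largest weighted sum, so the decision
boundaries are built from several linear segments rather than a single
hyperplane.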

4.3.4 Cascade Networks

• Cascade Networks are networks in which the units are ordered and each
unit receives the original inputs together with the outputs of all earlier
units. This approach allows the network to build complex decision
boundaries step by step.

4.4 Training Feedforward Networks by Backpropagation

4.4.1 Notation

• Feedforward Networks: These networks are composed of layers of

neurons, where each layer is fully connected to the next one, and
information flows in one direction (from input to output).

• Notation: The input to each neuron is represented as a vector, and weights

are associated with the connections between neurons. The output of a
neuron is computed as a weighted sum of its inputs, followed by an
activation function.


4.4.2 The Backpropagation Method

• Backpropagation is the key algorithm used for training multi-layer neural

networks. It computes the gradient of the error with respect to each weight
by applying the chain rule of calculus, propagating the error backwards
from the output layer to the input layer.

4.4.3 Computing Weight Changes in the Final Layer

• The weights in the final layer of a network are adjusted based on the error
between the predicted output and the target. The weight updates are
proportional to the error gradient and the input values to the layer.

4.4.4 Computing Changes to the Weights in Intermediate Layers

• For intermediate layers, the weight updates depend on the error from the
subsequent layer, multiplied by the derivative of the activation function.
This allows the network to learn from both the direct and indirect
contributions of neurons to the final output.
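The final-layer and intermediate-layer computations described in 4.4.3 and 4.4.4 can be sketched for a tiny network with two inputs, two sigmoid hidden units, and one sigmoid output. The sigmoid activation, the squared-error loss E = 0.5 (t - o)^2, the weight values, and all function names are assumptions chosen for the illustration; the last lines compare one backpropagated gradient with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W1, W2, x):
    """W1: one weight row per hidden unit (last entry is a bias); W2: output weights."""
    xa = x + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(row, xa))) for row in W1]
    ha = h + [1.0]
    o = sigmoid(sum(w * v for w, v in zip(W2, ha)))
    return xa, ha, o

def backprop(W1, W2, x, t):
    """Gradients of E = 0.5 * (t - o)^2 with respect to W1 and W2."""
    xa, ha, o = forward(W1, W2, x)
    delta_o = (o - t) * o * (1 - o)           # error signal at the final layer
    gW2 = [delta_o * v for v in ha]           # proportional to the layer's inputs
    gW1 = []
    for j, _ in enumerate(W1):
        # Error from the next layer times the derivative of the activation.
        delta_h = delta_o * W2[j] * ha[j] * (1 - ha[j])
        gW1.append([delta_h * v for v in xa])
    return gW1, gW2

def loss(W1, W2, x, t):
    return 0.5 * (t - forward(W1, W2, x)[2]) ** 2

# Hypothetical weights and a single training example.
W1 = [[0.5, -0.3, 0.1], [-0.2, 0.4, 0.0]]
W2 = [0.7, -0.6, 0.2]
x, t = [1.0, 0.5], 1.0
gW1, gW2 = backprop(W1, W2, x, t)

# Finite-difference check on one hidden-layer weight.
eps = 1e-6
W1p = [row[:] for row in W1]
W1p[0][1] += eps
numeric = (loss(W1p, W2, x, t) - loss(W1, W2, x, t)) / eps
```

A training step would then move each weight against its gradient, for example `W2[k] -= eta * gW2[k]` for some learning rate `eta`.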

4.4.5 Variations on Backprop

• Stochastic Gradient Descent (SGD): A popular variant of

backpropagation where weights are updated after processing each
individual training example, instead of after processing the entire dataset.

• Mini-batch Gradient Descent: A hybrid approach where weights are

updated after processing small batches of data, balancing computational
efficiency and convergence speed.

4.4.6 An Application: Steering a Van

• Backpropagation can be used in applications like autonomous driving,
where a neural network might be trained to steer a van by processing
sensory inputs (such as camera images or lidar data) and outputting
steering commands. This involves mapping inputs to desired outputs
(steering angles) through a multi-layer neural network.

4.5 Synergies Between Neural Networks and Knowledge-Based Methods

• Neural networks and knowledge-based methods can work together to

enhance learning. Knowledge-based systems use explicit rules and
domain-specific knowledge to guide decision-making, while neural
networks can learn patterns from data. By combining both, we can create
more robust models that leverage both learned data and predefined
knowledge.

Summary of Key Concepts:

1. Threshold Logic Units (TLUs): Basic building blocks of neural networks

that classify based on linear thresholds.

2. Linear Machines: Classifiers that separate data using linear decision

boundaries.

3. Networks of TLUs: Multi-layer architectures that allow solving non-linear

problems.

4. Backpropagation: The algorithm for training multi-layer neural networks

by adjusting weights based on error gradients.

5. Applications: Neural networks can be used for complex tasks like

classification, regression, and control systems (e.g., steering a van).

6. Synergy with Knowledge-Based Methods: Neural networks and

knowledge-based methods can complement each other to improve model
performance.


Neural networks, especially with techniques like backpropagation, have become
the foundation for many modern machine learning tasks due to their ability to
handle complex, non-linear problems.


5. Statistical Learning
Statistical learning is a framework in machine learning where models are trained
based on statistical theory. The goal is to make predictions about unknown data
based on observed data. Statistical methods offer a robust approach to dealing
with uncertainty and variability in real-world data. The main methods discussed
in this section are Statistical Decision Theory, Belief Networks, and Nearest-
Neighbor Methods.

5.1 Using Statistical Decision Theory

Statistical Decision Theory provides a framework for decision-making under

uncertainty. It aims to model and optimize the decision-making process by
considering potential outcomes, their associated probabilities, and the costs or
benefits of those outcomes.

5.1.1 Background and General Method

The general method of Statistical Decision Theory involves:

1. Defining the decision problem: The problem is modeled with a set of

possible actions and outcomes.

2. Assigning probabilities to the possible outcomes of each decision.

3. Assigning costs or utilities to each possible outcome, based on the

decision made.

4. Choosing the decision that maximizes the expected utility or minimizes

the expected loss.

This is especially useful in situations where we have incomplete knowledge about

the data or the environment, and we aim to make the best decision given the
uncertainty.
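The four steps can be illustrated with a small expected-loss calculation. The states, probabilities, loss values, and the function name `best_action` below are made-up assumptions for the sketch.

```python
def best_action(posteriors, loss):
    """
    posteriors: {state: P(state | evidence)}
    loss[action][state]: cost of taking the action when the state is true
    """
    expected = {
        a: sum(posteriors[s] * loss[a][s] for s in posteriors)
        for a in loss
    }
    # Choose the action with the smallest expected loss.
    return min(expected, key=expected.get), expected

# Hypothetical maintenance decision.
posteriors = {"faulty": 0.2, "ok": 0.8}
loss = {
    "repair": {"faulty": 1.0, "ok": 1.0},
    "ignore": {"faulty": 10.0, "ok": 0.0},
}
action, expected = best_action(posteriors, loss)
```

Here repairing has expected loss 1.0 and ignoring has expected loss 2.0, so repairing is preferred even though the fault is unlikely.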
5.1.2 Gaussian (or Normal) Distributions

A Gaussian distribution (also known as a normal distribution) is a probability
distribution commonly used in statistical learning because of its useful
properties, such as its symmetry, and because the central limit theorem makes it
a good approximation for sums of many independent random effects.

Fig 7. The Two-Dimensional Gaussian Distribution

• Properties of Gaussian distributions:

o It is fully characterized by its mean ($\mu$) and variance ($\sigma^2$).

o The probability density function (PDF) is given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

o Bell curve shape: The distribution has a peak at the mean and tails
that extend towards infinity.

Gaussian distributions are widely used in modeling real-world data, especially

when the data exhibits variability around a central value. In many machine
learning models (e.g., Gaussian Naive Bayes), it’s assumed that the features
follow a Gaussian distribution.
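The one-dimensional density can be evaluated directly with the standard library. The function name `gaussian_pdf` and the sample values are assumptions for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a one-dimensional Gaussian with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

peak = gaussian_pdf(0.0, 0.0, 1.0)      # value at the mean of a standard normal
left = gaussian_pdf(-1.0, 0.0, 1.0)
right = gaussian_pdf(1.0, 0.0, 1.0)     # equals left by symmetry
```

A classifier such as Gaussian Naive Bayes would evaluate this density once per feature and per class, using the mean and variance estimated from the training examples of that class.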

5.1.3 Conditionally Independent Binary Components

In some learning problems, we assume that the features (or variables) are
conditionally independent given the target class. This assumption is central to
models like Naive Bayes classifiers.

• Conditional Independence: The assumption is that, given the class label,
the individual features do not influence each other. Mathematically, for
binary features $X_1, X_2, \dots, X_n$ and class $Y$, this assumption can be
written as:

$$P(X_1, X_2, \dots, X_n \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y)$$

This simplifies the model and makes it computationally feasible, though it may
not always be true in practice. Nevertheless, the simplicity of this assumption
often leads to good performance, especially when the features are not strongly
dependent.
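A minimal sketch of a classifier built on this assumption is shown below. The class names, the prior and conditional probabilities, and the function name are invented for illustration; in practice these probabilities would be estimated from training data.

```python
def naive_bayes_posterior(x, priors, cond):
    """
    x: tuple of 0/1 feature values.
    priors: {label: P(Y = label)}.
    cond[label][i]: P(X_i = 1 | Y = label).
    """
    scores = {}
    for label, prior in priors.items():
        p = prior
        # Conditional independence: multiply the per-feature probabilities.
        for xi, p1 in zip(x, cond[label]):
            p *= p1 if xi == 1 else (1.0 - p1)
        scores[label] = p
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

# Hypothetical two-feature problem.
priors = {"spam": 0.5, "ham": 0.5}
cond = {"spam": [0.8, 0.6], "ham": [0.1, 0.3]}
post = naive_bayes_posterior((1, 1), priors, cond)
```

With n binary features, the model needs only n conditional probabilities per class instead of a table with 2^n entries, which is what makes the approach computationally feasible.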

5.2 Learning Belief Networks

Belief Networks (also known as Bayesian Networks) are a type of probabilistic

graphical model used to represent the conditional dependencies between variables
in a compact form. These networks consist of nodes (representing variables) and
directed edges (representing dependencies).


• Learning Belief Networks involves:

o Modeling the joint probability distribution of a set of variables, which
means choosing the network structure and estimating the conditional
probabilities attached to each node from data.

o Using Bayes' theorem to update beliefs as new evidence is observed.

o Inference: Computing the posterior probabilities of certain variables
given observed data.

Belief networks are powerful tools for handling uncertainty and for building
models where multiple variables interact in complex ways. They are used in areas
such as decision support systems, diagnostics, and pattern recognition.

5.3 Nearest-Neighbor Methods

Nearest-Neighbor Methods are a class of algorithms used for classification and

regression. They work by comparing new data points to the most similar, or
"nearest," data points in the training set and making predictions based on the
known outcomes of those neighbors.

Key Points of Nearest-Neighbor Methods:

• k-Nearest Neighbors (k-NN): A widely used method where the class or

value of a new data point is predicted based on the majority class (for
classification) or the average value (for regression) of the k nearest data
points in the training set.

o Distance Metric: The proximity between points is typically
measured using distance metrics like Euclidean distance:

$$\text{dist}(x, x') = \sqrt{\sum_{i=1}^{n} (x_i - x'_i)^2}$$

Other distance metrics can be used depending on the type of data, such as
Manhattan distance or cosine distance.

o Choosing k: The number of neighbors (k) is a critical parameter. A

small value of k makes the model sensitive to noise, while a large
value can make the model overly smooth and less sensitive to local
patterns.

• Advantages:

o Simple and intuitive.

o Non-parametric: It makes no assumptions about the underlying data

distribution.

o Works well for both classification and regression tasks.

• Disadvantages:

o Computationally expensive, especially for large datasets, because it

requires calculating distances to all training points.

o Sensitive to the choice of distance metric and the scaling of features.
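A small k-NN classifier using Euclidean distance and majority voting might look like the sketch below. The function names and the toy training points are assumptions, and the brute-force sort over all training points reflects the computational cost noted above.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(train, query, k=3):
    """train: list of (point, label) pairs; returns the majority label of the k nearest."""
    neighbors = sorted(train, key=lambda pl: euclidean(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-cluster data.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]
label = knn_classify(train, (1.1, 1.0), k=3)
```

For regression, the vote would be replaced by the average of the neighbors' target values.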

Summary of Key Concepts

1. Statistical Decision Theory:

o A framework for making decisions under uncertainty by modeling

actions, outcomes, and probabilities.

2. Gaussian Distributions:

o A common probability distribution in statistical learning, used for

modeling continuous data with symmetry around a mean.

3. Conditionally Independent Binary Components:

o Assumption of independence between features given the class,
central to models like Naive Bayes.

4. Belief Networks:

o Graphical models that represent the probabilistic relationships

between variables.

5. Nearest-Neighbor Methods:

o A non-parametric approach to classification and regression based on

the similarity between new data points and training data.

These statistical learning methods provide the theoretical foundation for many
machine learning algorithms, from basic classifiers to sophisticated probabilistic
models. They help guide decision-making in uncertain environments, model
complex dependencies, and make predictions based on observed data.


6. Decision Trees
Decision Trees are one of the most popular and interpretable machine learning
algorithms. They are used for both classification and regression tasks and work
by partitioning the feature space into subsets and making predictions based on the
majority class or average value of the data points in each subset. A decision tree
is structured as a tree, where each internal node represents a test (or decision) on
a feature, each branch represents the outcome of that test, and each leaf node
represents a class label or a continuous value.

Fig 8 A Decision Tree

6.1 Definitions

A decision tree is a flowchart-like structure where:

• Nodes represent tests on attributes (features of the data).

• Edges (branches) represent the outcomes of those tests.


• Leaves represent a decision or classification label (for classification tasks)
or a predicted value (for regression tasks).

Important Terminology:

• Root Node: The topmost node in a decision tree, where the first decision
is made.

• Internal Nodes: Nodes that represent decision tests based on input

features.

• Leaf Nodes: Terminal nodes that assign a class label or output a predicted
value.

• Branches: Edges connecting nodes, representing possible outcomes of the

test.

6.2 Supervised Learning of Univariate Decision Trees

Univariate decision trees use a single feature (attribute) at each decision node to
split the data. This makes the tree interpretable, as each decision only considers
one feature at a time.

6.2.1 Selecting the Type of Test

When building a decision tree, the first step is to decide what type of test to use
at each node. Tests can involve:

• Threshold tests for continuous features (e.g., "Is age > 30?").

• Categorical tests for discrete features (e.g., "Is the color red?").

The choice of tests influences the structure of the tree and how well it generalizes
to unseen data.


6.2.2 Using Uncertainty Reduction to Select Tests

The goal of a decision tree is to reduce uncertainty (or entropy) at each node.
One popular criterion to decide how to split the data at each node is the
Information Gain (or reduction in entropy).

• Entropy is a measure of uncertainty or impurity in the dataset.

o For classification, the entropy $H(S)$ of a dataset $S$ is defined as:

$$H(S) = -\sum_{i=1}^{k} p_i \log_2 p_i$$

where $p_i$ is the proportion of examples in $S$ with class label $i$.

• Information Gain is the reduction in entropy achieved by splitting the

dataset based on a particular attribute.

o It is calculated as the difference between the entropy of the original

set and the weighted sum of the entropies of the subsets created by
the split.

The attribute that maximizes Information Gain is chosen for the test at the
current node.
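Entropy and Information Gain can be computed as in the sketch below. The attribute names, the four-row data set, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i) over the class proportions in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the full set minus the weighted entropy of the subsets split on attr."""
    n = len(labels)
    subsets = {}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(y)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Hypothetical data: "windy" determines the label, "sky" carries no information.
rows = [{"windy": "yes", "sky": "sun"}, {"windy": "yes", "sky": "rain"},
        {"windy": "no", "sky": "sun"}, {"windy": "no", "sky": "rain"}]
labels = ["stay", "stay", "play", "play"]
gains = {a: information_gain(rows, labels, a) for a in ("windy", "sky")}
best = max(gains, key=gains.get)
```

Splitting on "windy" leaves two pure subsets (gain of 1 bit), while splitting on "sky" leaves the class mix unchanged (gain of 0), so "windy" would be chosen for the test at this node.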

6.2.3 Non-Binary Attributes

Decision trees can handle both binary (true/false) and non-binary (multiple
categories) attributes. For non-binary attributes, a test could involve comparing
the attribute to several possible values or ranges. The splitting criteria can be
generalized by using multi-way splits instead of just binary splits.

6.3 Networks Equivalent to Decision Trees

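Certain types of networks, such as feedforward networks of threshold units, can
represent the same decision-making process as decision trees. For example, a tree
over Boolean features computes a function that can be written in disjunctive
normal form, which a two-layer network can implement with one unit per path to
a positive leaf and an OR unit that combines them. These networks might have
additional complexity or flexibility, but conceptually they can achieve similar
outcomes.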

Fig 9. A Decision Tree with Subtree Replication

• Decision Trees can be viewed as shallow neural networks that are

specifically designed for interpretable decision-making.

• Some methods like Madalines (Multiple Adaptive Linear Elements)

attempt to bridge the gap between decision trees and neural networks by
using a combination of threshold units to replicate the decision-making
process of a tree.

6.4 Overfitting and Evaluation

Overfitting occurs when a model learns too much from the training data,
capturing noise and irregularities instead of generalizable patterns. This leads to
poor performance on unseen data.


6.4.1 Overfitting

Overfitting happens when the decision tree becomes too complex, splitting the
data into many small subsets that are too specific to the training data. While this
can give near-perfect accuracy on the training set, the model performs poorly on
new data.

• Signs of Overfitting: The model has high accuracy on training data but
low accuracy on validation/test data.

6.4.2 Validation Methods

To evaluate the performance of a decision tree and combat overfitting, various

validation methods are used:

• Cross-validation: Splitting the data into multiple subsets (folds) and

training and testing the model on different combinations of these folds.

• Holdout Method: Splitting the dataset into training and testing sets and
using the testing set to evaluate the model.

• Bootstrap Sampling: Randomly sampling from the training set with
replacement to build multiple models, and testing each model on the
examples that were left out of its sample.

6.4.3 Avoiding Overfitting in Decision Trees

Several techniques can be used to prevent overfitting in decision trees:

• Pruning: Reducing the size of the tree after it has been grown, removing
branches that do not provide significant predictive value.

• Limiting tree depth: Restricting the maximum depth of the tree to prevent
excessive complexity.

• Minimum samples per leaf: Setting a minimum number of data points

required in a leaf node to prevent the tree from creating overly specific
rules for small subsets of the data.
6.4.4 Minimum-Description Length Methods

The Minimum-Description Length (MDL) principle is a way to balance model

complexity with accuracy. The idea is to select the tree that minimizes the total
description length (the number of bits needed to describe both the tree and the
data). This is closely related to Occam’s Razor, where simpler models are
preferred if they perform similarly to more complex ones.

6.5 The Problem of Replicated Subtrees

In decision trees, replicated subtrees occur when the same sequence of tests has
to be repeated under several different branches of the tree, as in Fig 9. This
redundancy makes the tree larger than necessary and leaves fewer training
examples for each copy. Identifying and eliminating such replicated subtrees helps
reduce the tree's complexity.

6.6 The Problem of Missing Attributes

A common issue in real-world datasets is missing attribute values. Decision trees

can handle missing data in several ways:

• Imputation: Replacing missing values with estimates, such as the mean or

median value for continuous attributes or the most common value for
categorical attributes.

• Handling Missing Values During Splitting: When splitting data at a
node, decision trees can handle an example with a missing value by sending it
down the most common branch, or down every branch with a weight
proportional to the fraction of training examples that take that branch.


6.7 Comparisons

• Advantages of Decision Trees:

o Interpretability: Easy to understand and visualize.

o Non-parametric: No assumptions about the underlying data

distribution.

o Can handle both classification and regression tasks.

• Disadvantages of Decision Trees:

o Overfitting: Susceptible to overfitting, especially with deep trees.

o Instability: Small changes in the data can result in a very different

tree.

o Bias: Can be biased toward features with more levels or continuous

attributes.

Summary

• Decision Trees are powerful models for classification and regression that
partition the feature space based on tests.

• Key techniques such as information gain, pruning, and cross-validation

are essential for training robust decision trees.

• Overfitting is a critical challenge, and methods like pruning and limiting

tree depth can help mitigate it.

• Decision trees are also susceptible to issues such as replicated subtrees

and missing attributes, but these can be addressed with proper techniques.

By understanding these key components and strategies, decision trees can be

effectively applied to a variety of machine learning problems.
7. Inductive Logic Programming (ILP)
Inductive Logic Programming (ILP) is a subfield of machine learning that
focuses on learning logic-based models, such as first-order logic rules, from
examples. Unlike traditional machine learning algorithms that typically operate
with propositional (flat) data, ILP operates with relational (structured) data, where
learning takes place over sets of objects and their relationships.

ILP combines elements of inductive learning (learning from examples) with

logic programming, allowing for the induction of rules that generalize over
structured data. The output of an ILP system is typically a set of logical rules that
can explain or predict unseen examples based on the provided input data.

7.1 Notation and Definitions

To better understand ILP, it’s essential to familiarize oneself with the notation and
definitions used in logic programming and inductive learning:

• Literals: A literal is a basic statement or its negation. For example,

human(X) is a literal, and ¬human(X) represents the negation.

• Atoms: An atom is a basic relation or predicate applied to arguments. For

instance, likes(john, pizza) is an atom.

• Clauses: A clause is a disjunction (OR) of literals, which can be read as a
single logical rule. A Horn clause is a special type of clause with at most one
positive literal. The rules used in ILP consist of a head (the positive literal)
and a body (a conjunction of literals), written head :- body.

• Background Knowledge: This is the set of predefined facts or rules that

the ILP system has access to during learning.


• Positive and Negative Examples: In ILP, examples are given in terms of
positive and negative instances. A positive example satisfies the concept
(target) being learned, whereas a negative example does not.

Key Components of ILP:

1. Training Data: Examples represented in a logical form, such as sets of

facts or tuples.

2. Hypotheses: The learned rules or models that generalize the patterns in the
data.

3. Background Knowledge: Domain-specific facts or rules that provide

context to the learning task.

4. Target Concept: The concept or relationship that the ILP system is tasked
to learn, typically expressed as a logical rule.

7.2 A Generic ILP Algorithm

A generic ILP algorithm typically follows these steps:

Fig 10 Sufficient, Necessary, and Consistent Programs


1. Input:

o A set of positive examples: Instances where the target concept is

true.

o A set of negative examples: Instances where the target concept is

false.

o Background knowledge: Domain-specific facts and rules that help

guide the search for hypotheses.

2. Hypothesis Space: The space of potential hypotheses consists of logical

rules that describe the target concept. Each hypothesis is a logic clause that
relates to the given examples and background knowledge.

3. Inductive Search: The system searches for generalizations of the positive

examples, starting from the most specific hypotheses (that only cover a
small number of examples) and gradually generalizing. The algorithm uses
various search strategies, such as breadth-first search, depth-first
search, or beam search, to explore the hypothesis space.

4. Refinement: The candidate hypotheses are refined iteratively by adding or
removing literals. Adding a literal to the body of a clause makes the clause
more specific, so it covers fewer examples and can exclude negative ones.
Removing a literal, or adding a new clause to the program, makes the
hypothesis more general so that it covers more positive examples.

5. Termination: The process stops when an optimal hypothesis is found, or

when a stopping condition is met (e.g., no further improvements can be
made or the hypothesis reaches a predefined level of complexity).

6. Output: The final learned rule or set of rules that describe the target
concept, such as a set of Horn clauses.


Example of a Generic ILP Algorithm:

For example, in the context of learning to predict whether a person is a "parent",

the input could include background knowledge about family relationships (e.g.,
mother(X, Y) means X is the mother of Y), positive examples (e.g., parent(john)),
and negative examples (e.g., ¬parent(mary)).

The ILP system could generate hypotheses such as:

• parent(X) :- mother(X, Y). This hypothesis suggests that if X is a mother

of someone (Y), then X is a parent. The system can then refine this
hypothesis by exploring more examples and relationships, such as
considering fathers or additional background knowledge.

7.3 An Example

To better illustrate how ILP works, consider an example where the goal is to learn
a rule for classifying animals based on their attributes. Suppose the system is
provided with background knowledge about different animal species and their
features (e.g., has_wings(X) means X has wings, flies(X) means X flies, etc.),
along with positive and negative examples of animals (e.g., eagle is a positive
example, dog is a negative example).

Step-by-Step Example:

1. Positive Examples: flies(eagle), flies(sparrow).

2. Negative Examples: ¬flies(dog), ¬flies(cat).

3. Background Knowledge:

o has_wings(X) means X has wings. Here has_wings(eagle) and
has_wings(sparrow) hold, while has_wings(dog) and has_wings(cat) do not.

o flies(X) means X flies.

An ILP system might induce the rule:

• flies(X) :- has_wings(X). This rule states that if an animal has wings, it
can fly. It covers both positive examples and neither negative example.
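Whether a candidate rule such as flies(X) :- has_wings(X) fits the examples can be checked by counting which examples it covers. The sketch below uses Python sets to stand in for logical facts; the facts themselves, the function name, and the use of the terms sufficient, necessary, and consistent (borrowed from the caption of Fig 10) are assumptions for illustration.

```python
# Hypothetical background facts and examples.
has_wings = {"eagle", "sparrow"}
positives = {"eagle", "sparrow"}     # animals known to fly
negatives = {"dog", "cat"}           # animals known not to fly

def rule_flies(x):
    """Candidate rule: flies(X) :- has_wings(X)."""
    return x in has_wings

covered_pos = {x for x in positives if rule_flies(x)}
covered_neg = {x for x in negatives if rule_flies(x)}

sufficient = covered_pos == positives    # covers every positive example
necessary = not covered_neg              # covers no negative example
consistent = sufficient and necessary
```

If the negatives included a winged animal that does not fly, the rule would cover a negative example, and the system would need to add a further literal to the body to exclude it.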

7.4 Inducing Recursive Programs

One of the most powerful aspects of ILP is its ability to induce recursive logic.
This is particularly useful when learning tasks involve hierarchical or recursive
relationships, such as in natural language processing or reasoning tasks.

For instance, consider a recursive rule:

• ancestor(X, Y) :- parent(X, Y).

• ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).

In this case, an ancestor of Y is either a direct parent of Y or a parent of some Z
that is itself an ancestor of Y. The second clause can be applied repeatedly, so the
rule covers grandparents, great-grandparents, and so on. The recursive structure
allows the system to generalize over chains of relationships of any length.
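The two ancestor clauses translate almost directly into a recursive Python function. The parent facts and names below are hypothetical, and the `seen` set is an added safeguard against cycles in the data that the logic rule itself does not contain.

```python
# Hypothetical parent facts as (parent, child) pairs.
parent = {("ann", "bob"), ("bob", "cid"), ("cid", "dee")}

def ancestor(x, y, seen=None):
    """
    ancestor(X, Y) :- parent(X, Y).
    ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
    """
    seen = seen or set()
    if (x, y) in parent:                       # base clause
        return True
    for p, z in parent:                        # recursive clause
        if p == x and z not in seen:
            if ancestor(z, y, seen | {z}):
                return True
    return False

print(ancestor("ann", "dee"))
```

The call above follows the chain ann, bob, cid, dee, which requires applying the recursive clause twice before the base clause succeeds.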

7.5 Choosing Literals to Add

In ILP, the process of choosing which literals to add to a rule is crucial for refining
hypotheses. Literals can be added based on their utility in increasing the
hypothesis’s explanatory power. Some strategies for choosing literals include:

• Entropy-based measures: Where literals that reduce uncertainty the most

are preferred.

• Greedy search: Adding literals that maximize information gain or reduce

error in the current hypothesis.


Choosing literals effectively involves balancing complexity (keeping the model
simple) with accuracy (fitting the data well).
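One well-known information-based score for candidate literals is the gain measure used by the FOIL system. The sketch below is a simplified version that counts covered examples rather than variable bindings; the function name and the example counts are assumptions, and the text above does not say which measure a particular ILP system uses.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """
    p0, n0: positive / negative examples covered before adding the literal.
    p1, n1: positive / negative examples covered after adding it.
    """
    if p1 == 0:
        return 0.0
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    # Reward purity gained, weighted by the positives that remain covered.
    return p1 * (after - before)

# Hypothetical candidates starting from a clause covering 4 positives and 4 negatives.
gain_a = foil_gain(p0=4, n0=4, p1=3, n1=0)   # excludes all negatives, loses one positive
gain_b = foil_gain(p0=4, n0=4, p1=4, n1=3)   # keeps all positives, excludes one negative
```

Under this score the first literal is preferred, because it removes every negative example while keeping most of the positives.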

7.6 Relationships Between ILP and Decision Tree Induction

ILP and decision tree induction share similarities in that both are used for
supervised learning tasks, but they differ in their approach and output.

• Decision Trees: Decision trees learn a series of tests on feature values and
generate a tree structure to make predictions.

• ILP: ILP, in contrast, generates logical rules or Horn clauses that describe
patterns in the data. These rules are more general than decision tree splits,
as they can represent more complex relationships.

However, the core similarity is that both ILP and decision trees search for patterns
in data and output rules that can be used to classify new instances.

Summary

Inductive Logic Programming (ILP) is a powerful framework for learning logical

rules from structured data. Key features include:

• Relational data: ILP works with structured data, where examples are not
just individual instances but can involve relationships between entities.

• Logic-based rules: The output of ILP is typically a set of logical rules that
explain patterns in the data.

• Recursive rules: ILP is capable of learning recursive and hierarchical

relationships, making it suitable for more complex tasks.


• Connection with decision trees: While decision trees are simpler, ILP can
represent more complex patterns through logical rules.

ILP is especially useful in domains where background knowledge and

structured data are available, such as bioinformatics, natural language
processing, and knowledge discovery.

Conclusion
Machine learning has established itself as a pivotal field in artificial intelligence,
empowering systems to learn from data and make decisions independently. By
distinguishing between types of learning, such as supervised, unsupervised, and
reinforcement learning, we can understand how different approaches suit a wide
range of applications, from predictive modeling to complex decision-making.

Boolean functions and version spaces illustrate machine learning’s logical
foundations, where algorithms form structured rules and iteratively refine
hypotheses. Neural networks, particularly with advanced training techniques like
backpropagation, have demonstrated exceptional capability in capturing
complex, non-linear relationships, making them suitable for tasks that require
deep pattern recognition.

Statistical learning methods offer robust tools for handling data variability and
uncertainty, relying on probabilistic models and inference techniques that
optimize decision-making under uncertain conditions. Moreover, the
interpretability of models like decision trees and the logical structure of inductive
logic programming (ILP) provide transparency in predictions and are invaluable
in applications where understanding the model’s decision process is crucial.

Overall, the adaptability of machine learning makes it indispensable across
diverse fields, allowing systems to learn continuously and respond to new
information. This foundation supports further advancements and opens up
possibilities for sophisticated, adaptive, and efficient AI-driven solutions across
industries.
