Vision & Mission
Institute Vision:
Institute Mission:
Department Vision:
Department Mission:
PEO 1. Graduates shall have technical knowledge and skills in the area of
society.
OUTCOMES:
Program outcomes:
PO8: Ethics:
PO10: Communication:
PO12: Life-long learning: Recognize the need for, and have the preparation and ability to engage in, independent and life-long learning in the broadest context of technological change.
Table of Figures
1. Figure 1: An AI System
2. Figure 2: An Input-Output Function
3. Figure 3: Implementing the Version Space
4. Figure 4: A Version Graph for Terms
5. Figure 5: A Threshold Logic Unit (TLU)
6. Figure 6: Weight Space
7. Figure 7: The Two-Dimensional Gaussian Distribution
8. Figure 8: A Decision Tree
9. Figure 9: A Decision Tree with Subtree Replication
10. Figure 10: Sufficient, Necessary, and Consistent Programs
Introduction to Machine Learning
Figure 1: An AI System
Machine learning is the field of study that gives computers the ability to learn
without being explicitly programmed. It involves designing algorithms and
models that can learn patterns from data and make predictions or decisions based
on that learning.
• Key Characteristics:
• History:
• Key Influences:
Machine learning can be categorized based on the type of data used and the
learning process. The three main categories are:
1. Supervised Learning:
o Definition: The model is trained on labeled data, learning a mapping from inputs to known outputs.
2. Unsupervised Learning:
o Definition: The model is given data without explicit labels and must
find patterns, groupings, or structures within the data on its own.
3. Reinforcement Learning:
o Definition: An agent learns by interacting with an environment, receiving rewards or penalties for its actions and adjusting its behavior to maximize cumulative reward.
Beyond these three main categories, two further paradigms are often distinguished:
4. Transfer Learning:
o Definition: Knowledge learned on one task is reused to speed up or improve learning on a related task.
5. Few-shot Learning:
o Definition: The model must generalize from only a handful of labeled examples per class.
An input vector refers to the representation of the input data fed into a machine
learning model. Each data point can be viewed as a vector of features or variables,
which are the dimensions of the data that the model uses to make predictions or
classifications.
• Feature Selection: The process of identifying and using the most relevant
features while ignoring redundant or irrelevant ones.
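As a minimal sketch (with hypothetical feature names), a data point can be encoded as a numeric vector whose columns are its features, and a crude feature-selection pass can then drop uninformative dimensions:

    import numpy as np

    # Each row is one data point; each column is a feature (one dimension
    # of the input vector). The features here are hypothetical:
    # age, income, and a constant field.
    X = np.array([
        [25.0, 30_000.0, 1.0],
        [52.0, 45_000.0, 1.0],
        [47.0, 28_000.0, 1.0],
    ])

    # Drop features with near-zero variance: a constant column carries
    # no information the model can use to discriminate between points.
    variances = X.var(axis=0)
    X_selected = X[:, variances > 1e-8]
    print(X_selected.shape)  # (3, 2): the constant third column is gone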
1.2.3 Outputs
The output of a machine learning model is what the model predicts or produces
after being trained on input data. Depending on the task, outputs can take many
forms:
In supervised learning, the output is typically compared to the true label (or
ground truth) to calculate the model’s performance, usually via loss functions like
Mean Squared Error (MSE) or Cross-Entropy.
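As a small sketch, both loss functions named above can be computed directly from model outputs and ground-truth labels:

    import numpy as np

    def mean_squared_error(y_true, y_pred):
        # Average squared difference between predictions and ground truth.
        return np.mean((y_true - y_pred) ** 2)

    def cross_entropy(y_true, p_pred, eps=1e-12):
        # Binary cross-entropy; eps guards against log(0).
        p = np.clip(p_pred, eps, 1 - eps)
        return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    y_true = np.array([1.0, 0.0, 1.0])
    y_pred = np.array([0.9, 0.2, 0.7])
    print(mean_squared_error(y_true, y_pred))
    print(cross_entropy(y_true, y_pred))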
2.1 Representation
o NOT (¬): Reverses a condition. E.g., "if NOT (age > 50), then
predict ‘no’".
• Example: A decision rule like "age > 50 AND income > 30k" represents
a term (age > 50) and another term (income > 30k), combined with the
AND operator.
o "Predict ‘yes’ if (age > 50 AND income > 30k) OR (age <= 50 AND
income > 40k)."
DNF is highly relevant for models that use rule-based learning, like decision
trees, where each path in the tree can be seen as a conjunction of features.
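A minimal sketch of the DNF rule above as executable code (thresholds and feature names are taken from the example):

    def predict(age, income):
        # DNF: a disjunction (OR) of conjunctions (ANDed terms).
        if (age > 50 and income > 30_000) or (age <= 50 and income > 40_000):
            return "yes"
        return "no"

    print(predict(age=55, income=35_000))  # yes: first conjunction matches
    print(predict(age=40, income=42_000))  # yes: second conjunction matches
    print(predict(age=40, income=25_000))  # no: neither conjunction matches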
o "Predict ‘no’ if (age > 50 OR income > 30k) AND (age <= 50 OR
income <= 20k)."
Decision lists are a sequence of ordered rules used for classification. Each rule in
a decision list is a Boolean expression, and the output is determined by the first
matching rule.
Decision lists are useful in situations where there is a priority among rules or
when conditions are complex.
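A small sketch of a decision list, using hypothetical rules in the spirit of the earlier examples; rules are tried in order, and the first one whose condition matches determines the output:

    # Each rule is (condition, label); the last rule is a catch-all default.
    decision_list = [
        (lambda x: x["age"] > 50 and x["income"] > 30_000, "yes"),
        (lambda x: x["income"] > 40_000, "yes"),
        (lambda x: True, "no"),  # default: fires if nothing above matched
    ]

    def classify(x):
        for condition, label in decision_list:
            if condition(x):
                return label  # the first matching rule wins

    print(classify({"age": 55, "income": 35_000}))  # yes
    print(classify({"age": 30, "income": 20_000}))  # no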
• Symmetric Functions: These are Boolean functions that produce the same
output for any permutation of their input variables. In machine learning,
this can be useful when dealing with symmetric data, such as in ensemble
learning, where the order of the classifiers doesn’t matter.
Version Spaces:
• Definition: A Version Space is the set of all hypotheses that are consistent
with a given set of training examples. It is essentially the collection of all
possible models that could explain the observed data.
• Size of the version space: The number of possible hypotheses that could
fit the data.
• The nature of the training data: How noisy the data is and how well the
hypotheses generalize to unseen data.
• Version Graphs for Learning: These graphs help visualize the space of
possible hypotheses and make it easier to reason about the relationships
between hypotheses. By organizing hypotheses this way, learners can
explore the hypothesis space more efficiently. They can navigate the
version graph to eliminate inconsistent hypotheses based on new training
data and gradually refine the hypotheses set.
o The most general hypotheses (the least restrictive) are at the top.
o The most specific hypotheses (the most restrictive) are at the bottom.
Version graphs are particularly useful in inductive learning where the goal is to
identify the most specific hypothesis that still explains the training data.
Search Process:
• Initial Step: Start with a broad version space that includes all hypotheses.
Challenges in Search:
How it Works:
3. Refinement:
4. Convergence: Over time, as more examples are processed, the sets S and G converge toward each other: S is generalized just enough to cover each new positive example, and G is specialized just enough to exclude each negative example, until the two boundaries meet on a refined hypothesis that fits the training data.
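A minimal sketch of this candidate-elimination process for conjunctive hypotheses over discrete attributes; '?' matches any value, the two-attribute toy data is hypothetical, and S is kept as a single hypothesis for simplicity:

    def matches(h, x):
        return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

    def generalize(h, x):
        # Minimally generalize S so it covers a positive example.
        return tuple(hv if hv == xv else '?' for hv, xv in zip(h, x))

    def specialize(g, x, domains):
        # Minimally specialize g so it no longer covers a negative example.
        out = []
        for i, gv in enumerate(g):
            if gv == '?':
                out += [g[:i] + (v,) + g[i+1:] for v in domains[i] if v != x[i]]
        return out

    domains = [('sunny', 'rainy'), ('warm', 'cold')]
    S = None              # most specific boundary (starts empty)
    G = [('?', '?')]      # most general boundary (matches everything)

    for x, positive in [(('sunny', 'warm'), True), (('rainy', 'cold'), False)]:
        if positive:
            S = x if S is None else generalize(S, x)
            G = [g for g in G if matches(g, x)]
        else:
            G = ([s for g in G if matches(g, x) for s in specialize(g, x, domains)]
                 + [g for g in G if not matches(g, x)])
            G = [g for g in G if matches(g, S)]  # keep only those covering S

    print("S:", S)  # ('sunny', 'warm')
    print("G:", G)  # [('sunny', '?'), ('?', 'warm')]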
Advantages:
Disadvantages:
Summary of Concepts:
• Version Space: The set of all hypotheses consistent with the training data.
A Threshold Logic Unit (TLU) is a type of artificial neuron used in early neural
network models. It acts as a basic building block for more complex neural
networks.
A TLU computes a weighted sum of its inputs and fires when that sum reaches a threshold:

y = 1 if w1*x1 + w2*x2 + ... + wn*xn ≥ θ, and y = 0 otherwise

Where:
• x1, ..., xn are the inputs, w1, ..., wn are the weights, and θ is the threshold.
• The Widrow-Hoff rule, also known as the delta rule, is a gradient descent method for updating the weights of the TLU. It adjusts weights in the direction that minimizes the error between the predicted and target outputs (a minimal sketch follows after this list).
• Linear Machines are models that separate data using linear decision
boundaries. These include linear classifiers like perceptrons, which can
be used to classify linearly separable data. However, they struggle with
non-linearly separable data, which leads to the development of more
complex neural network models.
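A minimal sketch of a TLU trained with the Widrow-Hoff (delta) rule on a linearly separable toy problem (logical AND); the learning rate and epoch count are arbitrary choices:

    import numpy as np

    # Toy task: learn logical AND, which is linearly separable.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([0, 0, 0, 1], dtype=float)

    w = np.zeros(2)   # weights
    b = 0.0           # bias (a learnable stand-in for the threshold)
    eta = 0.1         # learning rate

    for epoch in range(200):
        for x, target in zip(X, t):
            y_lin = x @ w + b        # linear output, before thresholding
            error = target - y_lin
            # Delta rule: gradient descent on the squared error.
            w += eta * error * x
            b += eta * error

    # The TLU thresholds the learned linear function to produce 0/1 outputs.
    print([int(x @ w + b >= 0.5) for x in X])  # [0, 0, 0, 1]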
4.3.2 Madalines
• These machines are neural networks that use piecewise linear activation
functions. They can approximate any continuous function by combining
several linear segments.
4.4.1 Notation
• The weights in the final layer of a network are adjusted based on the error
between the predicted output and the target. The weight updates are
proportional to the error gradient and the input values to the layer.
• For intermediate layers, the weight updates depend on the error from the
subsequent layer, multiplied by the derivative of the activation function.
This allows the network to learn from both the direct and indirect
contributions of neurons to the final output.
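A minimal sketch of these update rules in a two-layer network trained on XOR (layer sizes, learning rate, and epoch count are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = rng.normal(size=(2, 4))
    b1 = np.zeros(4)
    W2 = rng.normal(size=(4, 1))
    b2 = np.zeros(1)
    eta = 0.5

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(5000):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)
        y = sigmoid(h @ W2 + b2)
        # Final layer: error times the derivative of the activation.
        delta2 = (y - t) * y * (1 - y)
        # Intermediate layer: error from the layer above, propagated back
        # through W2 and scaled by the local activation derivative.
        delta1 = (delta2 @ W2.T) * h * (1 - h)
        # Updates are proportional to the gradient and each layer's inputs.
        W2 -= eta * h.T @ delta2
        b2 -= eta * delta2.sum(axis=0)
        W1 -= eta * X.T @ delta1
        b1 -= eta * delta1.sum(axis=0)

    print(np.round(y.ravel(), 2))  # should approach [0, 1, 1, 0]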
In some learning problems, we assume that the features (or variables) are
conditionally independent given the target class. This assumption is central to
models like Naive Bayes classifiers.
This simplifies the model and makes it computationally feasible, though it may
not always be true in practice. Nevertheless, the simplicity of this assumption
often leads to good performance, especially when the features are not strongly
dependent.
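A minimal Naive Bayes sketch over categorical features, with a hypothetical toy data set; the score for a class is P(class) times the product of per-feature likelihoods, which is exactly the conditional-independence assumption at work:

    from collections import Counter, defaultdict

    # Hypothetical training data: (features, class label).
    data = [
        ({"outlook": "sunny", "wind": "weak"}, "yes"),
        ({"outlook": "sunny", "wind": "strong"}, "no"),
        ({"outlook": "rainy", "wind": "weak"}, "yes"),
        ({"outlook": "rainy", "wind": "strong"}, "no"),
        ({"outlook": "sunny", "wind": "weak"}, "yes"),
    ]

    class_counts = Counter(label for _, label in data)
    feat_counts = defaultdict(Counter)  # (feature, class) -> value counts
    for x, label in data:
        for feat, value in x.items():
            feat_counts[(feat, label)][value] += 1

    def score(x, label, alpha=1.0):
        # P(label) * product of P(value | label); alpha is Laplace
        # smoothing, and "+ alpha * 2" assumes each feature takes one of
        # two values in this toy data.
        p = class_counts[label] / len(data)
        for feat, value in x.items():
            counts = feat_counts[(feat, label)]
            p *= (counts[value] + alpha) / (sum(counts.values()) + alpha * 2)
        return p

    x = {"outlook": "sunny", "wind": "strong"}
    print(max(class_counts, key=lambda c: score(x, c)))  # 'no'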
Belief networks are powerful tools for handling uncertainty and for building
models where multiple variables interact in complex ways. They are used in areas
such as decision support systems, diagnostics, and pattern recognition.
• Advantages:
• Disadvantages:
2. Gaussian Distributions:
4. Belief Networks:
5. Nearest-Neighbor Methods:
These statistical learning methods provide the theoretical foundation for many
machine learning algorithms, from basic classifiers to sophisticated probabilistic
models. They help guide decision-making in uncertain environments, model
complex dependencies, and make predictions based on observed data.
6.1 Definitions
Important Terminology:
• Root Node: The topmost node in a decision tree, where the first decision
is made.
• Leaf Nodes: Terminal nodes that assign a class label or output a predicted
value.
Univariate decision trees use a single feature (attribute) at each decision node to
split the data. This makes the tree interpretable, as each decision only considers
one feature at a time.
When building a decision tree, the first step is to decide what type of test to use
at each node. Tests can involve:
• Threshold tests for continuous features (e.g., "Is age > 30?").
• Categorical tests for discrete features (e.g., "Is the color red?").
The choice of tests influences the structure of the tree and how well it generalizes
to unseen data.
The goal of a decision tree is to reduce uncertainty (or entropy) at each node.
One popular criterion to decide how to split the data at each node is the
Information Gain (or reduction in entropy).
The attribute that maximizes Information Gain is chosen for the test at the
current node.
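Concretely, the Information Gain of a test is the entropy of the parent node minus the size-weighted entropy of the subsets the test produces. A minimal sketch with hypothetical labels:

    import math
    from collections import Counter

    def entropy(labels):
        # H = -sum(p * log2(p)) over the class proportions.
        n = len(labels)
        return -sum((c / n) * math.log2(c / n)
                    for c in Counter(labels).values())

    def information_gain(parent, children):
        # Parent entropy minus the size-weighted entropy of the children.
        n = len(parent)
        remainder = sum(len(c) / n * entropy(c) for c in children)
        return entropy(parent) - remainder

    parent = ["yes", "yes", "yes", "no", "no", "no"]
    # Subsets produced by a test such as "Is age > 30?".
    left, right = ["yes", "yes", "yes"], ["no", "no", "no"]
    print(entropy(parent))                          # 1.0 (maximal uncertainty)
    print(information_gain(parent, [left, right]))  # 1.0 (a perfect split)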
Decision trees can handle both binary (true/false) and non-binary (multiple
categories) attributes. For non-binary attributes, a test could involve comparing
the attribute to several possible values or ranges. The splitting criteria can be
generalized by using multi-way splits instead of just binary splits.
Overfitting occurs when a model learns too much from the training data,
capturing noise and irregularities instead of generalizable patterns. This leads to
poor performance on unseen data.
Overfitting happens when the decision tree becomes too complex, splitting the
data into many small subsets that are too specific to the training data. While this
results in perfect accuracy on the training set, the model performs poorly on new
data.
• Signs of Overfitting: The model has high accuracy on training data but
low accuracy on validation/test data.
Common ways to detect and prevent overfitting include (see the sketch after this list):
• Holdout Method: Splitting the dataset into training and testing sets and using the testing set to evaluate the model.
• Pruning: Reducing the size of the tree after it has been grown, removing
branches that do not provide significant predictive value.
• Limiting tree depth: Restricting the maximum depth of the tree to prevent
excessive complexity.
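As a sketch of how these controls look in practice, using scikit-learn's decision tree with arbitrary toy settings: a holdout split exposes overfitting, while max_depth and cost-complexity pruning (ccp_alpha) constrain the tree:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    # Holdout method: keep a test set aside to measure generalization.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # An unconstrained tree can grow until it memorizes the training set.
    full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    # Depth limits and pruning trade training accuracy for generalization.
    pruned = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.01,
                                    random_state=0).fit(X_tr, y_tr)

    for name, model in [("full", full), ("pruned", pruned)]:
        print(name, model.score(X_tr, y_tr), model.score(X_te, y_te))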
In decision trees, replicated subtrees can occur when the same subset of data is
processed by multiple branches of the tree. This redundancy can be inefficient
and unnecessary. Identifying and eliminating such replicated subtrees helps
reduce the tree's complexity.
Summary
• Decision Trees are powerful models for classification and regression that
partition the feature space based on tests.
To better understand ILP, it’s essential to familiarize oneself with the notation and
definitions used in logic programming and inductive learning:
2. Hypotheses: The learned rules or models that generalize the patterns in the
data.
4. Target Concept: The concept or relationship that the ILP system is tasked
to learn, typically expressed as a logical rule.
6. Output: The final learned rule or set of rules that describe the target
concept, such as a set of Horn clauses.
7.3 An Example
To better illustrate how ILP works, consider an example where the goal is to learn
a rule for classifying animals based on their attributes. Suppose the system is
provided with background knowledge about different animal species and their
features (e.g., has_wings(X) means X has wings, flies(X) means X flies, etc.),
along with positive and negative examples of animals (e.g., eagle is a positive
example, dog is a negative example).
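A minimal sketch of the coverage test at the heart of this example, with hypothetical background facts; a candidate rule such as flies(X) :- has_wings(X) is kept only if it covers the positive examples and excludes the negative ones:

    # Background knowledge: the set of facts known about each animal.
    background = {
        "eagle": {"has_wings"},
        "dog": set(),
    }
    positives = {"eagle"}   # animals that fly
    negatives = {"dog"}     # animals that do not

    def covers(rule_body, animal):
        # A clause body covers an animal if all of its literals hold.
        return rule_body <= background[animal]

    def consistent(rule_body):
        # The rule must cover every positive and no negative example.
        return (all(covers(rule_body, a) for a in positives)
                and not any(covers(rule_body, a) for a in negatives))

    # Candidate hypothesis: flies(X) :- has_wings(X)
    print(consistent({"has_wings"}))  # True: covers eagle, excludes dog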
Step-by-Step Example:
3. Background Knowledge:
One of the most powerful aspects of ILP is its ability to induce recursive logic.
This is particularly useful when learning tasks involve hierarchical or recursive
relationships, such as in natural language processing or reasoning tasks.
In ILP, the process of choosing which literals to add to a rule is crucial for refining hypotheses. Literals are added based on their utility in increasing the hypothesis's explanatory power; a common strategy is to prefer the literal that most increases coverage of positive examples while excluding negatives.
ILP and decision tree induction share similarities in that both are used for
supervised learning tasks, but they differ in their approach and output.
• Decision Trees: Decision trees learn a series of binary tests on features and
generate a tree structure to make predictions.
• ILP: ILP, in contrast, generates logical rules or Horn clauses that describe
patterns in the data. These rules are more general than decision tree splits,
as they can represent more complex relationships.
However, the core similarity is that both ILP and decision trees search for patterns
in data and output rules that can be used to classify new instances.
Summary
• Relational data: ILP works with structured data, where examples are not
just individual instances but can involve relationships between entities.
• Logic-based rules: The output of ILP is typically a set of logical rules that
explain patterns in the data.
Conclusion
Machine learning has established itself as a pivotal field in artificial intelligence,
empowering systems to learn from data and make decisions independently. By
distinguishing between types of learning, such as supervised, unsupervised, and
reinforcement learning, we can understand how different approaches suit a wide
range of applications, from predictive modeling to complex decision-making.
Boolean functions and version spaces illustrate machine learning’s logical
foundations, where algorithms form structured rules and iteratively refine
hypotheses. Neural networks, particularly with advanced training techniques like
backpropagation, have demonstrated exceptional capability in capturing
complex, non-linear relationships, making them suitable for tasks that require
deep pattern recognition.
Statistical learning methods offer robust tools for handling data variability and
uncertainty, relying on probabilistic models and inference techniques that
optimize decision-making under uncertain conditions. Moreover, the
interpretability of models like decision trees and the logical structure of inductive
logic programming (ILP) provide transparency in predictions and are invaluable
in applications where understanding the model’s decision process is crucial.
Overall, the adaptability of machine learning makes it indispensable across
diverse fields, allowing systems to learn continuously and respond to new
information. This foundation supports further advancements and opens up
possibilities for sophisticated, adaptive, and efficient AI-driven solutions across
industries.