
MODULE I ARTIFICIAL INTELLIGENCE 9

Introduction, AI problems, foundations of AI and history of AI. Intelligent agents: agents
and environments, the concept of rationality, the nature of environments, structure of
agents, problem solving agents, problem formulation.

(a) Intelligence - The ability to apply knowledge in order to perform better in an environment.
(b) Artificial Intelligence - The study and construction of agent programs that perform well in a given environment, for a given agent architecture.
(c) Agent - An entity that takes action in response to percepts from an environment.
(d) Rationality - The property of a system which does the "right thing" given what it knows.
(e) Logical Reasoning - A process of deriving new sentences from old, such that the new sentences are necessarily true if the old ones are true.

Four Approaches of Artificial Intelligence:
➢ Acting humanly: the Turing test approach.
➢ Thinking humanly: the cognitive modelling approach.
➢ Thinking rationally: the laws of thought approach.
➢ Acting rationally: the rational agent approach.

FUTURE OF ARTIFICIAL INTELLIGENCE
• Transportation: Although it could take a decade or more to perfect them, autonomous cars will one day ferry us from place to place.
• Manufacturing: AI-powered robots work alongside humans to perform a limited range of tasks like assembly and stacking, and predictive-analysis sensors keep equipment running smoothly.
• Healthcare: In the comparatively AI-nascent field of healthcare, diseases are diagnosed more quickly and accurately, drug discovery is sped up and streamlined, virtual nursing assistants monitor patients, and big data analysis helps to create a more personalized patient experience.
• Education: Textbooks are digitized with the help of AI, early-stage virtual tutors assist human instructors, and facial analysis gauges the emotions of students to help determine who is struggling or bored and better tailor the experience to their individual needs.
• Media: Journalism is harnessing AI, too, and will continue to benefit from it. Bloomberg uses Cyborg technology to help make quick sense of complex financial reports. The Associated Press employs the natural language abilities of Automated Insights to produce 3,700 earnings report stories per year, nearly four times more than in the recent past.
• Customer Service: Last but hardly least, Google is working on an AI assistant that can place human-like calls to make appointments at, say, your neighborhood hair salon. In addition to words, the system understands context and nuance.
AGENTS AND THEIR TYPES

An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.
• Human sensors: eyes, ears, and other organs.
• Human actuators: hands, legs, mouth, and other body parts.
• Robotic sensors: microphones, cameras, and infrared range finders.
• Robotic actuators: motors, displays, speakers, etc.

An agent can be:
Human Agent: A human agent has eyes, ears, and other organs which work as sensors, and hands, legs, and the vocal tract which work as actuators.
Robotic Agent: A robotic agent can have cameras, infrared range finders, and NLP for sensors, and various motors for actuators.
Software Agent: A software agent can have keystrokes and file contents as sensory input, act on those inputs, and display output on the screen.

Hence the world around us is full of agents such as thermostats, cell phones, and cameras; even we ourselves are agents. Before moving forward, we should first know about sensors, effectors, and actuators.
Sensor: A sensor is a device which detects a change in the environment and sends the information to other electronic devices. An agent observes its environment through sensors.
Actuators: Actuators are the components of machines that convert energy into motion. The actuators are responsible for moving and controlling a system. An actuator can be an electric motor, gears, rails, etc.
Effectors: Effectors are the devices which affect the environment. Effectors can be legs, wheels, arms, fingers, wings, fins, and display screens.
MODULE II SEARCHING 9

Searching for solutions, uninformed search strategies - breadth first search, depth first
search. Search with partial information (heuristic search): greedy best first search, A*
search. Game playing: adversarial search, games, minimax algorithm, optimal decisions
in multiplayer games, alpha-beta pruning, evaluation functions, cutting off search.
Uninformed Search Algorithms:

The search algorithms in this section have no additional information on the goal node
other than the one provided in the problem definition. The plans to reach the goal
state from the start state differ only by the order and/or length of actions.
Uninformed search is also called blind search. These algorithms can only generate
the successors and differentiate between goal states and non-goal states.

The following uninformed search algorithms are discussed in this section.


1. Depth First Search
2. Breadth First Search
3. Uniform Cost Search
Each of these algorithms will have:
 A problem graph, containing the start node S and the goal node G.
 A strategy, describing the manner in which the graph will be traversed to
get to G.
 A fringe, which is a data structure used to store all the possible states
(nodes) that you can go from the current states.
 A tree, that results while traversing to the goal node.
 A solution plan, which is the sequence of nodes from S to G.
Depth First Search:
Depth-first search (DFS) is an algorithm for traversing or searching tree or graph
data structures. The algorithm starts at the root node (selecting some arbitrary node
as the root node in the case of a graph) and explores as far as possible along each
branch before backtracking. It uses a last-in, first-out strategy and hence is
implemented using a stack.

Example:
Question. Which solution would DFS find to move from node S to node G if run on
the graph below?
Solution. The equivalent search tree for the above graph is as follows. As DFS
traverses the tree “deepest node first”, it would always pick the deeper branch until
it reaches the solution (or it runs out of nodes, and goes to the next branch). The
traversal is shown in blue arrows.

Path: S -> A -> B -> C -> G

Let d = the depth of the search tree (the number of levels of the search tree), and let b^i = the number of nodes in level i, where b is the branching factor.

Time complexity: equivalent to the number of nodes traversed in DFS, 1 + b + b^2 + ... + b^d = O(b^d).
Space complexity: equivalent to how large the fringe can get, O(b * d).
Completeness: DFS is complete if the search tree is finite, meaning for a given
finite search tree, DFS will come up with a solution if it exists.
Optimality: DFS is not optimal, meaning the number of steps in reaching the
solution, or the cost spent in reaching it, may be high.
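
The stack-based strategy can be made concrete with a short program. Below is a minimal Python sketch of DFS; the adjacency list is a hypothetical stand-in for the example graph (the original figure is not reproduced in these notes), chosen so that DFS finds the path S -> A -> B -> C -> G:

def dfs(graph, start, goal):
    # Fringe is a stack (LIFO): the most recently added node is expanded first.
    stack = [(start, [start])]
    visited = set()
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # Push successors in reverse so the first-listed child is expanded first.
        for child in reversed(graph.get(node, [])):
            if child not in visited:
                stack.append((child, path + [child]))
    return None

graph = {'S': ['A', 'D'], 'A': ['B'], 'B': ['C'], 'C': ['G'], 'D': ['G']}
print(dfs(graph, 'S', 'G'))  # ['S', 'A', 'B', 'C', 'G']

Because the fringe is a stack, the deepest newly discovered node is always expanded next, which is exactly the "deepest node first" behaviour described above.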
Breadth First Search:
Breadth-first search (BFS) is an algorithm for traversing or searching tree or graph
data structures. It starts at the tree root (or some arbitrary node of a graph,
sometimes referred to as a ‘search key’), and explores all of the neighbor nodes at
the present depth prior to moving on to the nodes at the next depth level. It is
implemented using a queue.

Example:
Question. Which solution would BFS find to move from node S to node G if run on
the graph below?

Solution. The equivalent search tree for the above graph is as follows. As BFS
traverses the tree “shallowest node first”, it would always pick the shallower branch
until it reaches the solution (or it runs out of nodes, and goes to the next branch).
The traversal is shown in blue arrows.

Path: S -> D -> G


Let s = the depth of the shallowest solution and b^i = the number of nodes in level i.

Time complexity: equivalent to the number of nodes traversed in BFS until the shallowest solution, O(b^s).
Space complexity: equivalent to how large the fringe can get, O(b^s).
Completeness: BFS is complete, meaning for a given search tree, BFS will come up
with a solution if it exists.

Optimality: BFS is optimal as long as the costs of all edges are equal.
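
A minimal Python sketch of BFS follows, using the same hypothetical adjacency list as in the DFS sketch above; the only structural change is that the fringe is now a FIFO queue instead of a stack, so shallower nodes are expanded first:

from collections import deque

def bfs(graph, start, goal):
    # Fringe is a queue (FIFO): nodes are expanded in the order discovered.
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                queue.append((child, path + [child]))
    return None

graph = {'S': ['A', 'D'], 'A': ['B'], 'B': ['C'], 'C': ['G'], 'D': ['G']}
print(bfs(graph, 'S', 'G'))  # ['S', 'D', 'G'] - the shallowest solution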

Uniform Cost Search:

UCS is different from BFS and DFS because here the costs come into play. In other
words, traversing via different edges might not have the same cost. The goal is to
find a path where the cumulative sum of costs is the least.

Cost of a node is defined as:


cost(node) = cumulative cost of all nodes from root
cost(root) = 0
Example:
Question. Which solution would UCS find to move from node S to node G if run on
the graph below?

Solution. The equivalent search tree for the above graph is as follows. The cost of
each node is the cumulative cost of reaching that node from the root. Based on the
UCS strategy, the path with the least cumulative cost is chosen. Note that due to the
many options in the fringe, the algorithm explores most of them so long as their cost
is low, and discards them when a lower-cost path is found; these discarded
traversals are not shown below. The actual traversal is shown in blue.
Path: S -> A -> B -> G
Cost: 5

Let C* = the cost of the optimal solution and ε = the minimum arc cost.

Then the effective depth is roughly C*/ε.

Time complexity: O(b^(C*/ε)). Space complexity: O(b^(C*/ε)).
Advantages:
 UCS is complete if the number of states is finite and there is no loop with
zero weight.
 UCS is optimal only if there is no negative edge cost.
Disadvantages:
 Explores options in every “direction”.
 No information on goal location.
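
A minimal Python sketch of UCS is given below. The edge weights are hypothetical (the original figure is not reproduced in these notes), chosen so that the cheapest path is S -> A -> B -> G with cost 5; the fringe is a priority queue ordered by the cumulative cost g(n):

import heapq

def ucs(graph, start, goal):
    # Priority queue ordered by cumulative path cost.
    frontier = [(0, start, [start])]
    best = {start: 0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, cost
        for child, w in graph.get(node, []):
            new_cost = cost + w
            if new_cost < best.get(child, float('inf')):
                best[child] = new_cost
                heapq.heappush(frontier, (new_cost, child, path + [child]))
    return None, float('inf')

graph = {'S': [('A', 1), ('D', 3)], 'A': [('B', 2)],
         'B': [('G', 2)], 'D': [('G', 6)]}
print(ucs(graph, 'S', 'G'))  # (['S', 'A', 'B', 'G'], 5)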

Informed Search Algorithms:

Here, the algorithms have information on the goal state, which helps in more
efficient searching. This information is obtained by something called a heuristic.
In this section, we will discuss the following search algorithms.
1. Greedy Search
2. A* Tree Search
3. A* Graph Search
Search Heuristics: In an informed search, a heuristic is a function that estimates
how close a state is to the goal state. Examples include the Manhattan distance, the
Euclidean distance, etc. (The smaller the distance, the closer the goal.) Different
heuristics are used in the different informed algorithms discussed below.
Greedy Search:

In greedy search, we expand the node closest to the goal node. The “closeness” is
estimated by a heuristic h(x).

Heuristic: A heuristic h is defined as-


h(x) = Estimate of distance of node x from the goal node.
The lower the value of h(x), the closer the node is to the goal.

Strategy: Expand the node closest to the goal state, i.e. expand the node with the
lowest h value.

Example:
Question. Find the path from S to G using greedy search. The heuristic value h of
each node is given below the name of the node.

Solution. Starting from S, we can traverse to A(h=9) or D(h=5). We choose D, as it


has the lower heuristic cost. Now from D, we can move to B(h=4) or E(h=3). We
choose E with a lower heuristic cost. Finally, from E, we go to G(h=0). This entire
traversal is shown in the search tree below, in blue.

Path: S -> D -> E -> G


Advantage: Works well with informed search problems, with fewer steps to reach a
goal.
Disadvantage: Can turn into unguided DFS in the worst case.
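
A minimal Python sketch of greedy best-first search follows. The heuristic values mirror the worked example (A=9, D=5, B=4, E=3, G=0; h(S)=7 is taken from the A* table later in this section), and the edges are assumed from the traversal described above:

import heapq

def greedy(graph, h, start, goal):
    # Priority queue ordered by the heuristic h(n) alone.
    frontier = [(h[start], start, [start])]
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for child in graph.get(node, []):
            if child not in visited:
                heapq.heappush(frontier, (h[child], child, path + [child]))
    return None

graph = {'S': ['A', 'D'], 'D': ['B', 'E'], 'E': ['G'], 'A': [], 'B': []}
h = {'S': 7, 'A': 9, 'D': 5, 'B': 4, 'E': 3, 'G': 0}
print(greedy(graph, h, 'S', 'G'))  # ['S', 'D', 'E', 'G']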

A* Tree Search:

A* Tree Search, commonly known simply as A* Search, combines the strengths of
uniform-cost search and greedy search. In this search, the evaluation function is the
sum of the cost used in UCS, denoted by g(x), and the heuristic used in greedy
search, denoted by h(x). The summed cost is denoted by f(x) = g(x) + h(x).

Heuristic: The following points should be noted with respect to heuristics in A*
search.
 Here, h(x) is called the forward cost and is an estimate of the distance of
the current node from the goal node.
 And, g(x) is called the backward cost and is the cumulative cost of a
node from the root node.
 A* search is optimal only when for all nodes, the forward cost for a node
h(x) underestimates the actual cost h*(x) to reach the goal. This property
of A* heuristic is called admissibility.

Admissibility: h(x) ≤ h*(x) for every node x, i.e. the heuristic never overestimates the true cost to the goal.

Strategy: Choose the node with the lowest f(x) value.

Example:
Question. Find the path to reach from S to G using A* search.

Solution. Starting from S, the algorithm computes g(x) + h(x) for all nodes in the
fringe at each step, choosing the node with the lowest sum. The entire work is shown
in the table below.

Note that in the fourth set of iterations, we get two paths with equal summed cost
f(x), so we expand them both in the next set. The path with a lower cost on further
expansion is the chosen path.

Path                   h(x)   g(x)    f(x)
S                      7      0       7
S -> A                 9      3       12
S -> D                 5      2       7
S -> D -> B            4      2+1=3   7
S -> D -> E            3      2+4=6   9
S -> D -> B -> C       2      3+2=5   7
S -> D -> B -> E       3      3+1=4   7
S -> D -> B -> C -> G  0      5+4=9   9
S -> D -> B -> E -> G  0      4+3=7   7

Path: S -> D -> B -> E -> G


Cost: 7
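
The table above can be reproduced with a short program. Below is a minimal Python sketch of A* tree search; the edge costs and heuristic values are read off the worked table (for example, S -> D costs 2 and h(D) = 5):

import heapq

def a_star(graph, h, start, goal):
    # Frontier ordered by f(n) = g(n) + h(n).
    frontier = [(h[start], 0, start, [start])]
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for child, w in graph.get(node, []):
            g2 = g + w
            heapq.heappush(frontier, (g2 + h[child], g2, child, path + [child]))
    return None, float('inf')

graph = {'S': [('A', 3), ('D', 2)], 'D': [('B', 1), ('E', 4)],
         'B': [('C', 2), ('E', 1)], 'C': [('G', 4)], 'E': [('G', 3)]}
h = {'S': 7, 'A': 9, 'D': 5, 'B': 4, 'E': 3, 'C': 2, 'G': 0}
print(a_star(graph, h, 'S', 'G'))  # (['S', 'D', 'B', 'E', 'G'], 7)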

A* Graph Search:
 A* tree search works well, except that it takes time re-exploring branches
it has already explored. In other words, if the same node is expanded twice
in different branches of the search tree, A* search might explore both of
those branches, thus wasting time.
 A* Graph Search, or simply Graph Search, removes this limitation by
adding one rule: do not expand the same node more than once.
 Heuristic. Graph search is optimal only when the forward cost between
two successive nodes A and B, given by h(A) - h(B), is less than or equal
to the backward cost between those two nodes, g(A -> B). This property of
the graph search heuristic is called consistency.

Consistency: h(A) - h(B) ≤ g(A -> B) for every pair of successive nodes A and B.

Example:
Question. Use graph search to find the path from S to G in the following graph.
Solution. We solve this question in much the same way as the last question, but in
this case we keep track of the nodes explored so that we do not re-explore them.

Path: S -> D -> B -> E -> G


Cost: 7
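
A minimal sketch of the graph-search variant follows; compared with the a_star function above, the only change is a closed set that prevents any node from being expanded more than once (safe when the heuristic is consistent):

import heapq

def a_star_graph(graph, h, start, goal):
    frontier = [(h[start], 0, start, [start])]
    closed = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in closed:
            continue  # already expanded via a path that is no worse
        closed.add(node)
        for child, w in graph.get(node, []):
            if child not in closed:
                g2 = g + w
                heapq.heappush(frontier, (g2 + h[child], g2, child, path + [child]))
    return None, float('inf')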

MODULE III SUPERVISED LEARNING 9


Introduction, Different types of learning, Linear regression, Logistic regression,
Gradient Descent: Introduction, Stochastic Gradient Descent, Subgradients, Stochastic
Gradient Descent for risk minimization, Support Vector Machines: Hard SVM, Soft
SVM, Optimality conditions, Duality, Kernel trick, Implementing Soft SVM with Kernels,
Decision Trees: Decision Tree algorithms, Random forests

Gradient Descent is an iterative optimization process that searches for an objective


function’s optimum value (Minimum/Maximum). It is one of the most used methods
for changing a model’s parameters in order to reduce a cost function in machine
learning projects.
The primary goal of gradient descent is to identify the model parameters that provide
the maximum accuracy on both training and test datasets. In gradient descent, the
gradient is a vector pointing in the direction of the function's steepest rise at
a particular point. The algorithm gradually descends towards lower values of the
function by moving in the opposite direction of the gradient, until it reaches the
minimum of the function.
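
In symbols, the standard gradient descent update rule, with parameters θ, learning rate α, and cost function J(θ), is:

    θ ← θ - α ∇J(θ)

Each step moves the parameters a small distance in the direction opposite to the gradient.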
Types of Gradient Descent:
Typically, there are three types of Gradient Descent:
1. Batch Gradient Descent
2. Stochastic Gradient Descent
3. Mini-batch Gradient Descent
In this article, we will be discussing Stochastic Gradient Descent (SGD).
Stochastic Gradient Descent (SGD):
Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent algorithm
that is used for optimizing machine learning models. It addresses the computational
inefficiency of traditional Gradient Descent methods when dealing with large datasets
in machine learning projects.
In SGD, instead of using the entire dataset for each iteration, only a single random
training example (or a small batch) is selected to calculate the gradient and update
the model parameters. This random selection introduces randomness into the
optimization process, hence the term "stochastic" in Stochastic Gradient Descent.
The advantage of using SGD is its computational efficiency, especially when dealing
with large datasets. By using a single example or a small batch, the computational
cost per iteration is significantly reduced compared to traditional Gradient Descent
methods that require processing the entire dataset.
Stochastic Gradient Descent Algorithm
 Initialization: Randomly initialize the parameters of the model.
 Set Parameters: Determine the number of iterations and the learning rate
(alpha) for updating the parameters.
 Stochastic Gradient Descent Loop: Repeat the following steps until the
model converges or reaches the maximum number of iterations:
o Shuffle the training dataset to introduce randomness.
o Iterate over each training example (or a small batch) in
the shuffled order.
o Compute the gradient of the cost function with respect
to the model parameters using the current training
example (or batch).
o Update the model parameters by taking a step in the
direction of the negative gradient, scaled by the learning
rate.
o Evaluate the convergence criteria, such as the change in the
cost function between successive iterations.
 Return Optimized Parameters: Once the convergence criteria are met or
the maximum number of iterations is reached, return the optimized model
parameters.
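To make the loop concrete, here is a minimal NumPy sketch of SGD for linear regression with squared error; the synthetic data, learning rate, and epoch count are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # 200 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)    # noisy targets

w = np.zeros(3)
alpha = 0.01                                   # learning rate
for epoch in range(20):
    for i in rng.permutation(len(X)):          # shuffle each epoch
        xi, yi = X[i], y[i]
        grad = 2 * (xi @ w - yi) * xi          # gradient on ONE example
        w -= alpha * grad                      # step against the gradient
print(w)  # approaches [2.0, -1.0, 0.5]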
In SGD, since only one sample from the dataset is chosen at random for each
iteration, the path taken by the algorithm to reach the minima is usually noisier
than that of the typical Gradient Descent algorithm. But this does not matter much:
the exact path taken by the algorithm is irrelevant as long as we reach the minimum,
and SGD does so with a significantly shorter training time.
The path taken by Batch Gradient Descent is shown below:

[Figure: batch gradient optimization path]

A path taken by Stochastic Gradient Descent looks as follows:

[Figure: stochastic gradient optimization path]

One thing to be noted is that, as SGD is generally noisier than typical Gradient
Descent, it usually takes a higher number of iterations to reach the minima because
of the randomness in its descent. Even though it requires more iterations to reach
the minima than typical Gradient Descent, each iteration is computationally much
less expensive. Hence, in most scenarios, SGD is preferred over Batch Gradient
Descent for optimizing a learning algorithm.
Difference between Stochastic Gradient Descent and Batch Gradient Descent
The comparison between Stochastic Gradient Descent (SGD) and Batch Gradient
Descent is as follows:

Aspect                     | Stochastic Gradient Descent (SGD)                                                 | Batch Gradient Descent
Dataset Usage              | Uses a single random sample or a small batch of samples at each iteration.       | Uses the entire dataset (batch) at each iteration.
Computational Efficiency   | Computationally less expensive per iteration, as it processes fewer data points. | Computationally more expensive per iteration, as it processes the entire dataset.
Convergence                | Faster convergence due to frequent updates.                                      | Slower convergence due to less frequent updates.
Noise in Updates           | High noise due to frequent updates with a single or few samples.                 | Low noise as it updates parameters using all data points.
Stability                  | Less stable as it may oscillate around the optimal solution.                     | More stable as it converges smoothly towards the optimum.
Memory Requirement         | Requires less memory as it processes fewer data points at a time.                | Requires more memory to hold the entire dataset.
Update Frequency           | Frequent updates make it suitable for online learning and large datasets.        | Less frequent updates make it suitable for smaller datasets.
Initialization Sensitivity | Less sensitive to initial parameter values due to frequent updates.              | More sensitive to initial parameter values.

Advantages of Stochastic Gradient Descent


 Speed: SGD is faster than other variants of Gradient Descent such as
Batch Gradient Descent and Mini-Batch Gradient Descent since it uses only
one example to update the parameters.
 Memory Efficiency: Since SGD updates the parameters for each training
example one at a time, it is memory-efficient and can handle large
datasets that cannot fit into memory.
 Avoidance of Local Minima: Due to the noisy updates in SGD, it has the
ability to escape from local minima and can converge towards a global minimum.
Disadvantages of Stochastic Gradient Descent
 Noisy updates: The updates in SGD are noisy and have a high variance,
which can make the optimization process less stable and lead to
oscillations around the minimum.
 Slow Convergence: SGD may require more iterations to converge to the
minimum since it updates the parameters for each training example one at
a time.
 Sensitivity to Learning Rate: The choice of learning rate can be critical in
SGD since using a high learning rate can cause the algorithm to overshoot
the minimum, while a low learning rate can make the algorithm converge
slowly.
 Less Accurate: Due to the noisy updates, SGD may not converge to the
exact global minimum and can result in a suboptimal solution. This can be
mitigated by using techniques such as learning rate scheduling and
momentum-based updates.

MODULE IV UNSUPERVISED LEARNING 9

Nearest Neighbour: k-nearest neighbour, Curse of dimensionality, Clustering: Linkage-
based clustering algorithms, k-means algorithm, Spectral clustering, Dimensionality
reduction: Principal Component Analysis, Random projections, Compressed sensing

MODULE V COMPUTATIONAL LEARNING THEORY AND DEEP NEURAL NETWORKS 9

Statistical Learning Framework: PAC learning, Agnostic PAC learning, Bias-complexity
tradeoff, No free lunch theorem, VC dimension, Structural risk minimization, Adaboost,
Foundations of Deep Learning: DNN, CNN, RNN, Autoencoders

Parameter          | CLASSIFICATION                                                                        | CLUSTERING
Type               | Used for supervised learning                                                          | Used for unsupervised learning
Basic              | Process of classifying the input instances based on their corresponding class labels | Grouping the instances based on their similarity without the help of class labels
Need               | Has labels, so a training and testing dataset is needed to verify the model created  | No need for a training and testing dataset
Complexity         | More complex as compared to clustering                                                | Less complex as compared to classification
Example Algorithms | Logistic regression, Naive Bayes classifier, Support vector machines, etc.           | k-means clustering algorithm, Fuzzy c-means clustering algorithm, Gaussian (EM) clustering algorithm, etc.

2. KNN is one of the most basic yet essential classification algorithms in machine
learning. It belongs to the supervised learning domain and finds intense application in
pattern recognition, data mining, and intrusion detection.
It is widely applicable in real-life scenarios since it is non-parametric, meaning it does
not make any underlying assumptions about the distribution of data (as opposed to
other algorithms such as GMM, which assume a Gaussian distribution of the given
data). We are given some prior data (also called training data), which classifies
coordinates into groups identified by an attribute.
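
As a concrete illustration, here is a minimal scikit-learn sketch of KNN classification; the iris dataset and k = 5 are arbitrary choices for the example:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each test point is labelled by a majority vote of its 5 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # fraction of correct test predictions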

3. Clustering consists of grouping certain objects that are similar to each other; it can
be used to decide if two items are similar or dissimilar in their properties. In a data
mining sense, the similarity measure is a distance with dimensions describing object
features. That means that if the distance between two data points is small then there is
a high degree of similarity between the objects, and vice versa. The similarity
is subjective and depends heavily on the context and application.
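
A small Python sketch of the distance-as-similarity idea follows; the data values are made up, with two tight pairs of points that any distance-based clusterer should group together:

import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.1], [0.9, 1.0], [8.0, 8.2], [7.9, 8.1]])
# Small Euclidean distance => high similarity.
print(np.linalg.norm(points[0] - points[1]))
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
print(labels)  # the two nearby pairs share a label, e.g. [0 0 1 1]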
4. Multiple dimensions are hard to think in, impossible to visualize, and, due to the
exponential growth of the number of possible values with each dimension, complete
enumeration of all subspaces becomes intractable with increasing dimensionality. This
problem is known as the curse of dimensionality.

5. In multivariate statistics, spectral clustering techniques make use of the spectrum


(eigenvalues) of the similarity matrix of the data to perform dimensionality reduction
before clustering in fewer dimensions.
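
A minimal scikit-learn sketch of spectral clustering follows, using the classic two-moons dataset, where plain k-means struggles but the spectrum of the nearest-neighbour similarity graph separates the clusters:

from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
labels = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                            random_state=0).fit_predict(X)
print(labels[:10])  # cluster assignments for the first ten points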

6. The Curse of Dimensionality significantly impacts machine learning algorithms in


various ways. It leads to increased computational complexity, longer training times, and
higher resource requirements.

7. Random projection is a dimension reduction tool. “Projection” means that the


technique projects the data from a high-dimensional space to a lower-dimensional space,
and “Random” means the projection matrix is randomly generated.
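
A minimal scikit-learn sketch of random projection follows; the dimensions (10,000 in, 50 out) are illustrative:

import numpy as np
from sklearn.random_projection import GaussianRandomProjection

X = np.random.RandomState(0).rand(100, 10000)   # high-dimensional data
# The projection matrix is randomly generated, as described above.
X_low = GaussianRandomProjection(n_components=50, random_state=0).fit_transform(X)
print(X.shape, '->', X_low.shape)  # (100, 10000) -> (100, 50)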

8. PAC (Probably Approximately Correct) learning is a framework used for mathematical


analysis. A PAC Learner tries to learn a concept (approximately correct) by selecting a
hypothesis from a set of hypotheses that has a low generalization error.

9. The bias-variance tradeoff implies that as we increase the complexity of a model, its
variance increases and its bias decreases. Conversely, as we decrease the model's
complexity, its variance decreases but its bias increases.

10.

Comparison Attribute                  | Feed-forward Neural Networks                                        | Recurrent Neural Networks
Signal flow direction                 | Forward only                                                        | Bidirectional
Delay introduced                      | No                                                                  | Yes
Complexity                            | Low                                                                 | High
Neuron independence in the same layer | Yes                                                                 | No
Speed                                 | High                                                                | Slow
Commonly used for                     | Pattern recognition, speech recognition, and character recognition | Language translation, speech-to-text conversion, and robotic control

11. The CNN model has a fast implementation time and requires fewer parameters for
training while model performance is maintained. The DNN model, although fast in
execution, requires the most parameters for training, and model performance is
compromised with lower accuracy.
12. Artificial intelligence (AI) is the overarching system. Machine learning is a subset of
AI. Deep learning is a subfield of machine learning, and neural networks make up the
backbone of deep learning algorithms.

13.b. The advancements in Data Science and Machine Learning have made it possible
for us to solve several complex regression and classification problems. However, the
performance of all these ML models depends on the data fed to them. Thus, it is
imperative that we provide our ML models with an optimal dataset. Now, one might
think that the more data we provide to our model, the better it becomes; however, this
is not the case. If we feed our model a dataset with an excessively large number of
features/columns, we run into the problem of overfitting, wherein the model
starts being influenced by outlier values and noise. This is called the Curse of
Dimensionality.

Dimensionality Reduction is a statistical/ML-based technique wherein we try to


reduce the number of features in our dataset and obtain a dataset with an optimal
number of dimensions.
One of the most common ways to accomplish Dimensionality Reduction is Feature
Extraction, wherein we reduce the number of dimensions by mapping a higher
dimensional feature space to a lower-dimensional feature space. The most popular
technique of Feature Extraction is Principal Component Analysis (PCA).
Dimensionality reduction is a technique that reduces the number of features or
variables in a dataset, while preserving the essential information or structure. It can
help you optimize your model performance by improving the speed, accuracy, and
interpretability of your data analysis.

What is Predictive Modeling: Predictive modeling is a probabilistic process that


allows us to forecast outcomes, on the basis of some predictors. These predictors are
basically features that come into play when deciding the final result, i.e. the outcome
of the model.
Dimensionality reduction is the process of reducing the number of features (or
dimensions) in a dataset while retaining as much information as possible. This can be
done for a variety of reasons, such as to reduce the complexity of a model, to improve
the performance of a learning algorithm, or to make it easier to visualize the data.
There are several techniques for dimensionality reduction, including principal
component analysis (PCA), singular value decomposition (SVD), and linear discriminant
analysis (LDA). Each technique uses a different method to project the data onto a
lower-dimensional space while preserving important information.
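
As a concrete illustration, here is a minimal scikit-learn sketch of PCA-based dimensionality reduction on the digits dataset; the 95% variance threshold is an arbitrary choice for the example:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)             # 64 features per image
pca = PCA(n_components=0.95)                    # keep 95% of the variance
X_reduced = pca.fit_transform(X)
print(X.shape, '->', X_reduced.shape)           # far fewer dimensions
print(pca.explained_variance_ratio_.sum())      # ~0.95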

14. b. To overcome the curse of dimensionality, you can consider the following
strategies:

Dimensionality Reduction Techniques:

 Feature Selection: Identify and select the most relevant features from the
original dataset while discarding irrelevant or redundant ones. This reduces the
dimensionality of the data, simplifying the model and improving its efficiency.

 Feature Extraction: Transform the original high-dimensional data into a lower-


dimensional space by creating new features that capture the essential
information. Techniques such as Principal Component Analysis (PCA) and t-
distributed Stochastic Neighbor Embedding (t-SNE) are commonly used for
feature extraction.

Data Preprocessing:

 Normalization: Scale the features to a similar range to prevent certain features


from dominating others, especially in distance-based algorithms.

 Handling Missing Values: Address missing data appropriately through imputation


or deletion to ensure robustness in the model training process.

Feature Selection and Dimensionality Reduction


1. Feature Selection: SelectKBest is used to select the top k features based
on a specified scoring function (f_classif in this case). It selects the features
that are most likely to be related to the target variable.
2. Dimensionality Reduction: PCA (Principal Component Analysis) is then
used to further reduce the dimensionality of the selected features. It
transforms the data into a lower-dimensional space while retaining as much
variance as possible.
Training the classifiers
1. Training Before Dimensionality Reduction: Train a Random Forest
classifier (clf_before) on the original scaled features (X_train_scaled)
without dimensionality reduction.
2. Evaluation Before Dimensionality Reduction: Make predictions
(y_pred_before) on the test set (X_test_scaled) using the classifier trained
before dimensionality reduction, and calculate the accuracy
(accuracy_before) of the model.
3. Training After Dimensionality Reduction: Train a new Random Forest
classifier (clf_after) on the reduced feature set (X_train_pca) after
dimensionality reduction.
4. Evaluation After Dimensionality Reduction: Make predictions
(y_pred_after) on the test set (X_test_pca) using the classifier trained after
dimensionality reduction, and calculate the accuracy (accuracy_after) of the
model.
The accuracy before dimensionality reduction is 0.8745, while the accuracy
after dimensionality reduction is 0.9236. This improvement indicates that the
dimensionality reduction technique (PCA in this case) helped the model generalize
better to unseen data.
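
The original code for this walkthrough is not included in these notes; the sketch below is a plausible Python reconstruction using scikit-learn. The dataset (breast cancer), k = 15 for SelectKBest, and 5 PCA components are assumptions, so the printed accuracies will not match the quoted figures exactly:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hypothetical dataset; the walkthrough above does not say which one was used.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Baseline: train and evaluate before dimensionality reduction.
clf_before = RandomForestClassifier(random_state=42).fit(X_train_scaled, y_train)
accuracy_before = accuracy_score(y_test, clf_before.predict(X_test_scaled))

# SelectKBest keeps the features most related to the target (f_classif),
# then PCA compresses them further while retaining as much variance as possible.
selector = SelectKBest(f_classif, k=15)
X_train_sel = selector.fit_transform(X_train_scaled, y_train)
X_test_sel = selector.transform(X_test_scaled)
pca = PCA(n_components=5)
X_train_pca = pca.fit_transform(X_train_sel)
X_test_pca = pca.transform(X_test_sel)

# Train and evaluate after dimensionality reduction.
clf_after = RandomForestClassifier(random_state=42).fit(X_train_pca, y_train)
accuracy_after = accuracy_score(y_test, clf_after.predict(X_test_pca))
print(accuracy_before, accuracy_after)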

What are Autoencoders?


Autoencoders are a specialized class of algorithms that can learn efficient
representations of input data with no need for labels. They are a class of artificial
neural networks designed for unsupervised learning. Learning to compress and
effectively represent input data without specific labels is the essential principle of an
autoencoder. This is accomplished using a two-fold structure that consists of an
encoder and a decoder. The encoder transforms the input data into a reduced-dimensional
representation, which is often referred to as the "latent space" or "encoding". From that
representation, the decoder rebuilds the initial input. The process of encoding and
decoding forces the network to learn meaningful patterns and essential features of
the data.
Architecture of Autoencoder in Deep Learning
The general architecture of an autoencoder includes an encoder, decoder, and
bottleneck layer.
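
A minimal Keras sketch of this encoder-bottleneck-decoder structure follows; the 32-unit latent space and 5 training epochs are illustrative choices:

from tensorflow import keras
from tensorflow.keras import layers

(x_train, _), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0

inputs = keras.Input(shape=(784,))
encoded = layers.Dense(32, activation='relu')(inputs)       # encoder -> latent space
decoded = layers.Dense(784, activation='sigmoid')(encoded)  # decoder -> reconstruction
autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# The input doubles as the target: the network learns to reproduce it.
autoencoder.fit(x_train, x_train, epochs=5, batch_size=256)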
What is Machine Learning?

Machine Learning is a subset of Artificial Intelligence, and Deep Learning is an important
part of its broader family, which includes deep neural networks, deep belief networks,
and recurrent neural networks. In Deep Learning, there are three fundamental neural
network architectures that perform well on different types of data: FFNN, RNN, and CNN.


Deep Neural Networks (DNNs)

Deep Neural Networks (DNNs) are typically Feed Forward Networks (FFNNs), in which
data flows from the input layer to the output layer without going backward; the links
between the layers are one way, in the forward direction, and they never touch a node
again.

The outputs are obtained by supervised learning with datasets of some information based
on "what we want", through back propagation. It is like going to a restaurant where the
chef gives you an idea of the ingredients of your meal: FFNNs work the same way, in that
you experience the flavor of those specific ingredients while eating, but just after
finishing your meal you forget what you have eaten. If the chef serves you a meal with
the same ingredients again, you cannot recognize the ingredients; you have to start from
scratch, as you have no memory of it. But the human brain does not work like that.

Recurrent Neural Network (RNN)

A Recurrent Neural Network (RNN) addresses this issue; it is an FFNN with a time twist.
This neural network is not stateless: it has connections between passes and connections
through time. RNNs are a class of artificial neural network where connections between
nodes form a directed graph along a sequence, with links from a layer back to previous
layers, allowing information to flow back into earlier parts of the network; thus each
step depends on past events, allowing information to persist.

In this way, RNNs can use their internal state (memory) to process sequences of inputs.
This makes them applicable to tasks such as unsegmented, connected handwriting
recognition or speech recognition. But they work not only on the information you feed
them now but also on related information from the past, which means that whatever you
feed in and train the network on matters: feeding it "chicken" then "egg" may give a
different output than "egg" then "chicken". RNNs also suffer from the vanishing (or
exploding) gradient / long-term dependency problem, where information rapidly gets lost
over time. Strictly speaking, it is the weights, not the neurons, that carry information
from the past, and when gradients shrink towards zero (or explode towards enormous
values) the earlier states stop being informative.


Long Short Term Memory (LSTM)

Thankfully, breakthroughs like Long Short Term Memory (LSTM) do not have this problem!
LSTMs are a special kind of RNN, capable of learning long-term dependencies, which makes
them good at remembering things that have happened in the past and at finding patterns
across time so that their next guesses make sense. LSTMs broke records for improved
machine translation, language modeling and multilingual language processing.
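
A minimal Keras sketch of an LSTM-based sequence classifier follows; the input shape (sequences of 20 steps with 8 features each) and the layer sizes are illustrative assumptions:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(20, 8)),
    layers.LSTM(16),                        # internal state carries context across time steps
    layers.Dense(1, activation='sigmoid'),  # binary label for the whole sequence
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()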

Convolutional Neural Network (CNN)

Next comes the Convolutional Neural Network (CNN, or ConvNet), a class of deep neural
networks most commonly applied to analyzing visual imagery. Other applications include
video understanding, speech recognition and natural language understanding. LSTMs
combined with Convolutional Neural Networks (CNNs) have also improved automatic image
captioning, like the captions seen on Facebook. Thus you can see that RNNs help us more
with sequence processing and predicting the next step, whereas CNNs help us with visual
analysis.
