AIML
Uninformed Search:
The search algorithms in this section have no information about the goal node other
than the one provided in the problem definition. The plans to reach the goal state
from the start state differ only in the order and/or length of actions. Uninformed
search is also called blind search. These algorithms can only generate the successors
and distinguish the goal state from non-goal states.
Depth First Search (DFS):
Example:
Question. Which solution would DFS find to move from node S to node G if run on
the graph below?
Solution. The equivalent search tree for the above graph is as follows. As DFS
traverses the tree “deepest node first”, it would always pick the deeper branch until
it reaches the solution (or it runs out of nodes, and goes to the next branch). The
traversal is shown in blue arrows.
d = the depth of the search tree = the number of levels of the search tree.
n^i = the number of nodes in level i.
Time complexity: equivalent to the number of nodes traversed in DFS, i.e. O(n^d).
Space complexity: equivalent to how large the fringe can get, i.e. O(n × d).
Completeness: DFS is complete if the search tree is finite, meaning for a given
finite search tree, DFS will come up with a solution if it exists.
Optimality: DFS is not optimal; the solution it finds may not have the fewest steps
or the lowest cost.
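A minimal Python sketch of DFS follows; the graph, node names, and goal test are illustrative assumptions, not the figure from the example above.

def dfs(graph, start, goal):
    """Depth-first search using an explicit stack (LIFO fringe).

    graph: dict mapping a node to a list of its successors.
    Returns one path from start to goal, or None if no path exists."""
    stack = [(start, [start])]          # each entry: (node, path taken so far)
    visited = set()
    while stack:
        node, path = stack.pop()        # LIFO: always expand the deepest node first
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for successor in graph.get(node, []):
            stack.append((successor, path + [successor]))
    return None

# Hypothetical graph, for illustration only
graph = {'S': ['A', 'D'], 'A': ['B'], 'D': ['B', 'E'],
         'B': ['C', 'E'], 'E': ['G'], 'C': ['G']}
print(dfs(graph, 'S', 'G'))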
Breadth First Search:
Breadth-first search (BFS) is an algorithm for traversing or searching tree or graph
data structures. It starts at the tree root (or some arbitrary node of a graph,
sometimes referred to as a ‘search key’), and explores all of the neighbor nodes at
the present depth prior to moving on to the nodes at the next depth level. It is
implemented using a queue.
Example:
Question. Which solution would BFS find to move from node S to node G if run on
the graph below?
Solution. The equivalent search tree for the above graph is as follows. As BFS
traverses the tree “shallowest node first”, it would always pick the shallower branch
until it reaches the solution (or it runs out of nodes, and goes to the next branch).
The traversal is shown in blue arrows.
Time complexity: equivalent to the number of nodes traversed in BFS until the
shallowest solution, i.e. O(n^s), where s is the depth of the shallowest solution.
Space complexity: equivalent to how large the fringe can get, i.e. O(n^s).
Completeness: BFS is complete, meaning for a given search tree, BFS will come up
with a solution if it exists.
Optimality: BFS is optimal as long as the costs of all edges are equal.
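A comparable BFS sketch, again on a hypothetical graph; the only change from the DFS sketch above is that the fringe is a FIFO queue.

from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search using a FIFO queue as the fringe."""
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()    # FIFO: expand the shallowest node first
        if node == goal:
            return path
        for successor in graph.get(node, []):
            if successor not in visited:
                visited.add(successor)
                queue.append((successor, path + [successor]))
    return None

# Using the same hypothetical graph as the DFS sketch, bfs(graph, 'S', 'G')
# returns the shallowest path it finds.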
Uniform Cost Search (UCS):
UCS is different from BFS and DFS because here the costs come into play. In other
words, traversing via different edges might not have the same cost. The goal is to
find a path where the cumulative sum of costs is the least.
Solution. The equivalent search tree for the above graph is as follows. The cost of
each node is the cumulative cost of reaching that node from the root. Based on the
UCS strategy, the path with the least cumulative cost is chosen. Note that due to the
many options in the fringe, the algorithm explores most of them so long as their cost
is low, and discards them when a lower-cost path is found; these discarded
traversals are not shown below. The actual traversal is shown in blue.
Path: S -> A -> B -> G
Cost: 5
Time and space complexity: in the worst case, UCS expands every node whose path cost
is at most the optimal cost C*, giving O(b^(1 + ⌊C*/ε⌋)), where b is the branching
factor and ε is the minimum edge cost.
Advantages:
UCS is complete as long as the state space is finite and there is no cycle of
zero-cost edges.
UCS is optimal as long as no edge has a negative cost.
Disadvantages:
Explores options in every “direction”.
No information on goal location.
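A minimal UCS sketch using a priority queue keyed on cumulative path cost; the weighted graph below is an illustrative assumption, not the example above.

import heapq

def ucs(graph, start, goal):
    """Uniform-cost search: always expand the fringe node with the lowest
    cumulative cost g(n). graph maps a node to (successor, edge_cost) pairs."""
    fringe = [(0, start, [start])]          # (cumulative cost, node, path)
    explored = set()
    while fringe:
        cost, node, path = heapq.heappop(fringe)
        if node == goal:
            return path, cost
        if node in explored:
            continue
        explored.add(node)
        for successor, edge_cost in graph.get(node, []):
            heapq.heappush(fringe, (cost + edge_cost, successor, path + [successor]))
    return None, float('inf')

# Hypothetical weighted graph
graph = {'S': [('A', 1), ('G', 12)], 'A': [('B', 3), ('C', 1)],
         'B': [('D', 3)], 'C': [('D', 1), ('G', 2)], 'D': [('G', 3)]}
print(ucs(graph, 'S', 'G'))   # -> (['S', 'A', 'C', 'G'], 4)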
Informed Search:
Here, the algorithms have information on the goal state, which helps in more
efficient searching. This information is obtained by something called a heuristic.
In this section, we will discuss the following search algorithms.
1. Greedy Search
2. A* Tree Search
3. A* Graph Search
Search Heuristics: In an informed search, a heuristic is a function that estimates
how close a state is to the goal state. For example – Manhattan distance, Euclidean
distance, etc. (The smaller the distance, the closer the goal.) Different heuristics are used in
different informed algorithms discussed below.
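As a small illustration of these two heuristics, assuming states are (x, y) grid points (the coordinates below are hypothetical):

def manhattan(p, q):
    # Sum of absolute coordinate differences: a common grid-world heuristic
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def euclidean(p, q):
    # Straight-line distance between the two points
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

print(manhattan((0, 0), (3, 4)))   # 7
print(euclidean((0, 0), (3, 4)))   # 5.0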
Greedy Search:
In greedy search, we expand the node closest to the goal node. The “closeness” is
estimated by a heuristic h(x).
Strategy: Expand the node closest to the goal state, i.e. the node with the lowest
h value.
Example:
Question. Find the path from S to G using greedy search. The heuristic value h of
each node is given below the name of the node.
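A sketch of greedy best-first search, where the fringe is ordered purely by the heuristic h(x); the graph structure and h table it would be called with are assumptions for illustration.

import heapq

def greedy_search(graph, h, start, goal):
    """Greedy best-first search: always expand the node with the lowest h value.

    graph maps a node to its successors; h maps a node to its heuristic value."""
    fringe = [(h[start], start, [start])]
    visited = set()
    while fringe:
        _, node, path = heapq.heappop(fringe)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for successor in graph.get(node, []):
            heapq.heappush(fringe, (h[successor], successor, path + [successor]))
    return None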
A* Tree Search:
A* tree search combines the strengths of uniform-cost search and greedy search. Its
evaluation function f(x) = g(x) + h(x) sums the backward cost used in UCS and the
forward cost used in greedy search.
Here, h(x) is called the forward cost and is an estimate of the distance of
the current node from the goal node.
And, g(x) is called the backward cost and is the cumulative cost of a
node from the root node.
A* search is optimal only when, for all nodes, the forward cost h(x) never
overestimates the actual cost h*(x) to reach the goal. This property of the
A* heuristic is called admissibility.
Admissibility: 0 ≤ h(x) ≤ h*(x).
Example:
Question. Find the path to reach from S to G using A* search.
Solution. Starting from S, the algorithm computes g(x) + h(x) for all nodes in the
fringe at each step, choosing the node with the lowest sum. The entire work is shown
in the table below.
Note that in the fourth set of iterations, we get two paths with equal summed cost
f(x), so we expand them both in the next set. The path with a lower cost on further
expansion is the chosen path.
Path | h(x) | g(x) | f(x)
S | 7 | 0 | 7
S -> A | 9 | 3 | 12
S -> D | 5 | 2 | 7
… (intermediate expansions) …
(path ending at G) | 0 | 4+3=7 | 7
A* Graph Search:
A* tree search works well, except that it takes time re-exploring the
branches it has already explored. In other words, if the same node is
expanded twice in different branches of the search tree, A* search might
explore both of those branches, thus wasting time.
A* Graph Search, or simply Graph Search, removes this limitation by
adding this rule: do not expand the same node more than once.
Heuristic. Graph search is optimal only when the forward cost between
two successive nodes A and B, given by h(A) – h(B), is less than or equal
to the backward (edge) cost between those two nodes, g(A -> B). This property of
the graph search heuristic is called consistency.
Consistency: h(A) – h(B) ≤ g(A -> B).
Example:
Question. Use graph search to find the path from S to G in the following graph.
Solution. We solve this question in much the same way as the last question, but in
this case we keep track of explored nodes so that we do not re-explore them.
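A sketch of A* graph search combining both ideas: nodes are ordered by f(x) = g(x) + h(x), and an explored set prevents re-expanding the same node. The graph and heuristic it would be called with are hypothetical.

import heapq

def a_star(graph, h, start, goal):
    """A* graph search. graph maps a node to (successor, edge_cost) pairs,
    h maps a node to its heuristic estimate of the distance to the goal."""
    fringe = [(h[start], 0, start, [start])]    # (f = g + h, g, node, path)
    explored = set()
    while fringe:
        f, g, node, path = heapq.heappop(fringe)
        if node == goal:
            return path, g
        if node in explored:                    # graph-search rule: expand each node once
            continue
        explored.add(node)
        for successor, edge_cost in graph.get(node, []):
            g_new = g + edge_cost
            heapq.heappush(fringe, (g_new + h[successor], g_new,
                                    successor, path + [successor]))
    return None, float('inf')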
One thing to be noted is that, as SGD is generally noisier than typical Gradient
Descent, it usually takes a higher number of iterations to reach the minima because
of the randomness in its descent. Even though it requires a higher number of
iterations to reach the minima than typical Gradient Descent, it is still
computationally much less expensive than typical Gradient Descent. Hence, in most
scenarios, SGD is preferred over Batch Gradient Descent for optimizing a learning
algorithm.
Difference between Stochastic Gradient Descent & batch Gradient Descent
The comparison between Stochastic Gradient Descent (SGD) and Batch Gradient
Descent is as follows:
Aspect | Stochastic Gradient Descent (SGD) | Batch Gradient Descent
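A minimal NumPy sketch contrasting the two update rules on linear regression; the data shapes, learning rate, and epoch counts are illustrative assumptions. Batch gradient descent makes one update per pass over all samples, while SGD updates after every single sample.

import numpy as np

def batch_gradient_descent(X, y, lr=0.01, epochs=100):
    """One weight update per epoch, using the gradient over the whole dataset."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)       # gradient over all samples
        w -= lr * grad
    return w

def stochastic_gradient_descent(X, y, lr=0.01, epochs=100):
    """One weight update per sample, using the gradient of a single example."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            grad = X[i] * (X[i] @ w - y[i])     # noisy single-sample gradient
            w -= lr * grad
    return w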
2. KNN is one of the most basic yet essential classification algorithms in machine
learning. It belongs to the supervised learning domain and finds intense application in
pattern recognition, data mining, and intrusion detection.
It is widely applicable in real-life scenarios since it is non-parametric, meaning it does
not make any underlying assumptions about the distribution of data (as opposed to
other algorithms such as GMM, which assume a Gaussian distribution of the given
data). We are given some prior data (also called training data), which classifies
coordinates into groups identified by an attribute.
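A from-scratch sketch of the KNN idea just described; the training points, labels, and value of k are made-up illustrations.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points
    (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical training data: two groups of 2-D points
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array(['red', 'red', 'red', 'blue', 'blue', 'blue'])
print(knn_predict(X_train, y_train, np.array([2, 2])))   # -> 'red'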
3. Clustering consists of grouping objects that are similar to each other; it can
be used to decide whether two items are similar or dissimilar in their properties. In a Data
Mining sense, the similarity measure is a distance with dimensions describing object
features. This means that if the distance between two data points is small, there is
a high degree of similarity between the objects, and vice versa. The similarity
is subjective and depends heavily on the context and application.
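A compact k-means sketch as one concrete instance of distance-based clustering; the choice of k-means, the stopping rule, and the data it would be run on are illustrative assumptions.

import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Cluster the rows of X into k groups by alternating between assigning each
    point to its nearest centroid and moving each centroid to its points' mean."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # distance from every point to every centroid -> label of nearest centroid
        labels = np.argmin(np.linalg.norm(X[:, None, :] - centroids, axis=2), axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids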
4. Multiple dimensions are hard to think in, impossible to visualize, and, due to the
exponential growth of the number of possible values with each dimension, complete
enumeration of all subspaces becomes intractable with increasing dimensionality. This
problem is known as the curse of dimensionality.
9. The bias-variance tradeoff implies that as we increase the complexity of a model, its
variance increases and its bias decreases. Conversely, as we decrease the model's
complexity, its variance decreases but its bias increases.
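A small NumPy sketch of this tradeoff; the data-generating function, noise level, and polynomial degrees are illustrative assumptions. A degree-1 fit underfits (high bias), while a high-degree fit chases the noise (high variance), which typically shows up as low training error but higher test error.

import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)   # noisy samples
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)                              # noise-free target

for degree in (1, 3, 9):      # low, moderate, and high model complexity
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")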
10. Neuron independence in the same layer: Yes / No.
11. With fast implementation time, the CNN model requires fewer parameters for
training, and model performance is maintained. The DNN model executes faster but
requires the most parameters for training, and model performance is compromised,
with lower accuracy.
12. Artificial intelligence (AI) is the overarching system. Machine learning is a subset of
AI. Deep learning is a subfield of machine learning, and neural networks make up the
backbone of deep learning algorithms.
13.b. The advancements in Data Science and Machine Learning have made it possible
for us to solve several complex regression and classification problems. However, the
performance of all these ML models depends on the data fed to them. Thus, it is
imperative that we provide our ML models with an optimal dataset. Now, one might
think that the more data we provide to our model, the better it becomes; however, this
is not the case. If we feed our model an excessively large dataset (with a large
number of features/columns), it gives rise to the problem of overfitting, wherein the model
starts getting influenced by outlier values and noise. This is called the Curse of
Dimensionality.
14. b. To overcome the curse of dimensionality, you can consider the following
strategies:
Feature Selection: Identify and select the most relevant features from the
original dataset while discarding irrelevant or redundant ones. This reduces the
dimensionality of the data, simplifying the model and improving its efficiency
(see the sketch after this list).
Data Preprocessing:
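A hedged NumPy sketch of the feature-selection idea above; the scoring rule (absolute correlation with the target) and the dataset are illustrative choices, not a prescribed method.

import numpy as np

def select_top_features(X, y, n_keep):
    """Keep the n_keep columns of X with the highest absolute correlation with y."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    keep = np.argsort(scores)[::-1][:n_keep]     # indices of the most relevant features
    return X[:, keep], keep

# Hypothetical data: 100 samples, 10 features, only the first two drive y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, 100)
X_reduced, kept = select_top_features(X, y, n_keep=2)
print(kept)    # expected to pick features 0 and 1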
part of its broader family, which includes deep neural networks, deep belief networks,
and recurrent neural networks.² Mainly, in Deep Learning there are three fundamental
architectures of neural network that perform well on different types of data, which are
described below.
Deep Neural Networks (DNNs) are typically Feed Forward Networks (FFNNs) in
which data flows from the input layer to the output layer without going backward,³ and
the links between the layers are one-way, in the forward direction only; they never
form a loop. The outputs are obtained by supervised learning on datasets labelled with
‘what we want’, through back-propagation. Imagine you go to a restaurant and the chef
gives you an idea of the ingredients in your meal. FFNNs work in the same way: you
taste those specific ingredients while eating, but just after finishing your meal you
forget what you have eaten. If the chef gives you a meal with the same ingredients
again, you cannot recognize the ingredients; you have to start from scratch, as you
have no memory of it. But the human brain doesn't work like that.
A Recurrent Neural Network (RNN) addresses this issue: it is an FFNN with a time
twist. This neural network isn't stateless; it has connections between passes and
connections through time. RNNs are a class of artificial neural network where connections
between nodes form a directed graph along a sequence, featuring links from a layer back
to previous layers that allow information to flow back into earlier parts of the network,
so the model depends on past events, allowing information to persist.
In this way, RNNs can use their internal state (memory) to process sequences of inputs.
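A minimal NumPy sketch of this idea of internal state: a single recurrent step mixes the current input with the previous hidden state. The weight shapes, sizes, and tanh nonlinearity are illustrative assumptions.

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    # The new hidden state depends on both the current input x_t and the
    # previous hidden state h_prev, which is how the network carries memory.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b)

# Hypothetical sizes: 4-dimensional inputs, 3-dimensional hidden state
rng = np.random.default_rng(0)
W_xh, W_hh, b = rng.normal(size=(4, 3)), rng.normal(size=(3, 3)), np.zeros(3)
h = np.zeros(3)
for x_t in rng.normal(size=(5, 4)):    # process a sequence of 5 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b)
print(h)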
This makes them applicable to tasks such as handwriting recognition or speech
recognition. They work not only on the information you feed them but also on related
information from the past, which means that what you feed the network and how you
train it matters: feeding it ‘chicken’ then ‘egg’ may give a different output than
‘egg’ then ‘chicken’. RNNs also have problems like vanishing (or exploding) gradients,
where information rapidly gets lost over time. Actually, it is the weight that gets
lost when it reaches a value of 0 or 1,000,000, not the neuron. But in that case the
previous state won't be very informative, as its information has effectively been lost.
Thankfully, breakthroughs like Long Short-Term Memory (LSTM) networks don't have this
problem! LSTMs are a special kind of RNN capable of learning long-term dependencies,
which makes them smart at remembering things that have happened in the past and at
finding patterns across time so that their next guesses make sense. LSTMs broke
records in fields such as Natural Language Processing.
Next comes the Convolutional Neural Network (CNN, or ConvNet), which is a class of
deep neural networks most commonly applied to analyzing visual imagery. Convolutional
Neural Networks (CNNs) improved automatic image captioning, like the captioning seen
on Facebook. Thus you can see that RNNs help more with sequential data processing,
while CNNs are geared toward visual data.