
GITA AUTONOMOUS COLLEGE, BHUBANESWAR

(Affiliated to BPUT, Odisha)

QUESTION BANK
B.TECH - Computer Science and Engineering (5th Sem)    Subject: Machine Learning

10 Explain k-mode clustering with a suitable example.

KModes clustering is one of the unsupervised Machine Learning algorithms that is used to
cluster categorical variables.

You might be wondering why we need KModes clustering when we already have KMeans.

KMeans uses mathematical measures (distance) to cluster continuous data: the lesser the distance, the more similar the data points are, and centroids are updated by taking the mean.
But for categorical data points we cannot calculate a distance, so we go for the KModes algorithm, which uses the dissimilarities (total mismatches) between the data points: the lesser the dissimilarities, the more similar the data points are. It uses modes instead of means.
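
As a hedged illustration (not part of the original answer), the sketch below clusters a made-up categorical dataset with the third-party Python package kmodes; the column meanings and parameter choices are assumptions for the example.

# A minimal sketch using the third-party kmodes package (pip install kmodes).
# The categorical data below (hair colour, eye colour, country) is invented for illustration.
import numpy as np
from kmodes.kmodes import KModes

data = np.array([
    ["blonde",   "amber", "USA"],
    ["brunette", "gray",  "Canada"],
    ["red",      "green", "Canada"],
    ["black",    "hazel", "India"],
    ["brunette", "amber", "USA"],
    ["black",    "gray",  "India"],
])

# Two clusters; dissimilarity = number of mismatching attribute values per pair.
km = KModes(n_clusters=2, init="Huang", n_init=5, verbose=0)
labels = km.fit_predict(data)

print(labels)                   # cluster assignment for each row
print(km.cluster_centroids_)    # the mode (most frequent value per column) of each cluster

Each centroid here is a vector of modes rather than means, which is exactly the change that makes the algorithm suitable for categorical data.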
11 Explain DBSCAN clustering algorithm.
Density-Based Clustering Algorithms

Density-Based Clustering refers to unsupervised learning methods that identify distinctive groups/clusters in the data, based on the idea that a cluster in data space is a contiguous region of high point density, separated from other such clusters by contiguous regions of low point density.
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a base algorithm for density-based clustering. It can discover clusters of different shapes and sizes from a large amount of data that contains noise and outliers.

The DBSCAN algorithm uses two parameters:

• minPts: The minimum number of points (a threshold) clustered together for a region to be considered dense.
• eps (ε): A distance measure that will be used to locate the points in the neighborhood of any point.
These parameters can be understood if we explore two concepts called Density Reachability and Density Connectivity.

Reachability in terms of density establishes a point to be reachable from another if it lies within a particular distance (eps) from it.
Connectivity, on the other hand, involves a transitivity-based chaining approach to determine whether points are located in a particular cluster. For example, points p and q could be connected if p->r->s->t->q, where a->b means b is in the neighborhood of a.
There are three types of points after the DBSCAN clustering is complete:
• Core — This is a point that has at least m points within distance n from itself.
• Border — This is a point that has at least one Core point at a distance n.
• Noise — This is a point that is neither a Core nor a Border, and it has fewer than m points within distance n from itself.
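
As a rough sketch (the dataset and parameter values are assumptions, not from the original answer), scikit-learn's DBSCAN can be run on the two-moons toy data, where eps and min_samples play the roles of the distance n and the threshold m described above.

# A minimal scikit-learn sketch; two interleaving half-moons are a shape that
# density-based clustering handles well.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

db = DBSCAN(eps=0.3, min_samples=5)   # eps ~ neighbourhood radius, min_samples ~ minPts
labels = db.fit_predict(X)

print(set(labels))                    # cluster ids; -1 marks noise points
print((labels == -1).sum(), "points labelled as noise")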

12 What is dimensionality reduction? Write PCA algorithm.

What is Dimensionality Reduction?


The number of input features, variables, or columns present in a given dataset is known as its dimensionality, and the process of reducing these features is called dimensionality reduction.

In many cases a dataset contains a huge number of input features, which makes the predictive modeling task more complicated. Because it is very difficult to visualize or make predictions for a training dataset with a high number of features, dimensionality reduction techniques are required in such cases.

A dimensionality reduction technique can be defined as "a way of converting a higher-dimensional dataset into a lower-dimensional dataset while ensuring that it provides similar information." These techniques are widely used in machine learning for obtaining a better-fitting predictive model while solving classification and regression problems.

It is commonly used in fields that deal with high-dimensional data, such as speech recognition, signal processing, bioinformatics, etc. It can also be used for data visualization, noise reduction, cluster analysis, etc.

Steps for PCA algorithm


1. Getting the dataset
Firstly, we need to take the input dataset and divide it into two subparts X and Y, where X is the training set and Y is the validation set.
2. Representing data into a structure
Now we will represent our dataset in a structure, such as a two-dimensional matrix of the independent variable X. Here each row corresponds to a data item and each column corresponds to a feature. The number of columns is the dimensionality of the dataset.
3. Standardizing the data
In this step, we will standardize our dataset. In a particular column, the features with high variance are more important compared to the features with lower variance. If the importance of features is independent of the variance of the feature, then we will divide each data item in a column by the standard deviation of the column. Here we will name the resulting matrix Z.
4. Calculating the covariance of Z
To calculate the covariance of Z, we will take the matrix Z and transpose it. After the transpose, we will multiply it by Z. The output matrix will be the covariance matrix of Z.
5. Calculating the eigenvalues and eigenvectors
Now we need to calculate the eigenvalues and eigenvectors for the resultant covariance matrix of Z. The eigenvectors of the covariance matrix are the directions of the axes with high information, and the coefficients of these eigenvectors are defined as the eigenvalues.
6. Sorting the eigenvectors
In this step, we will take all the eigenvalues and sort them in decreasing order, i.e. from largest to smallest, and simultaneously sort the eigenvectors accordingly in the matrix P of eigenvalues. The resultant matrix will be named P*.
7. Calculating the new features or principal components
Here we will calculate the new features. To do this, we will multiply the P* matrix by Z. In the resultant matrix Z*, each observation is a linear combination of the original features, and each column of the Z* matrix is independent of the others.
8. Removing less important features from the new dataset
The new feature set is now available, so we decide what to keep and what to remove: we keep only the relevant or important features in the new dataset, and the unimportant features are removed.
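
The following NumPy sketch walks through the steps above on a randomly generated toy matrix; the variable names Z, P_star and Z_star mirror the description, and keeping two components in the last step is an arbitrary choice for illustration.

# A minimal NumPy sketch of the PCA steps described above.
import numpy as np

X = np.random.rand(100, 5)                   # steps 1-2: dataset as a (rows x features) matrix

Z = (X - X.mean(axis=0)) / X.std(axis=0)     # step 3: standardize each column -> Z
cov = Z.T @ Z / (Z.shape[0] - 1)             # step 4: covariance matrix of Z

eigvals, eigvecs = np.linalg.eigh(cov)       # step 5: eigenvalues and eigenvectors
order = np.argsort(eigvals)[::-1]            # step 6: sort eigenvalues in decreasing order
P_star = eigvecs[:, order]                   #         and the eigenvectors accordingly

Z_star = Z @ P_star                          # step 7: new features (principal components)
Z_reduced = Z_star[:, :2]                    # step 8: keep only the most important components

print(Z_reduced.shape)                       # (100, 2)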

13 Explain different terms used in Reinforcement learning.

Terms used in Reinforcement Learning


o Agent(): An entity that can perceive/explore the environment and act upon it.
o Environment(): A situation in which an agent is present or surrounded by. In RL, we assume a stochastic environment, which means it is random in nature.
o Action(): Actions are the moves taken by an agent within the environment.
o State(): A situation returned by the environment after each action taken by the agent.
o Reward(): Feedback returned to the agent from the environment to evaluate the action of the agent.
o Policy(): A strategy applied by the agent to decide the next action based on the current state.
o Value(): The expected long-term return with the discount factor, as opposed to the short-term reward.
o Q-value(): Mostly similar to the value, but it takes one additional parameter, the current action (a).

14 Explain the approaches of Reinforcement learning.

There are mainly three ways to implement reinforcement learning in ML, which are:

1. Value-based:
The value-based approach is about finding the optimal value function, which is the maximum value at a state under any policy. Therefore, the agent expects the long-term return at any state(s) under policy π.
2. Policy-based:
The policy-based approach is to find the optimal policy for the maximum future rewards without using the value function. In this approach, the agent tries to apply such a policy that the action performed at each step helps to maximize the future reward.
The policy-based approach has mainly two types of policy (see the sketch after this list):
o Deterministic: The same action is produced by the policy (π) at any state.
o Stochastic: In this policy, probability determines the produced action.
3. Model-based: In the model-based approach, a virtual model is created for the environment, and the agent explores that environment to learn it. There is no particular solution or algorithm for this approach because the model representation is different for each environment.
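
As referenced in the policy-based item above, the following small sketch contrasts a deterministic and a stochastic policy; the states, actions and probabilities are made up for illustration.

# A minimal sketch of the two kinds of policy; all values are invented.
import random

# Deterministic policy: the same action is always produced for a given state.
deterministic_policy = {"s0": "RIGHT", "s1": "RIGHT", "s2": "LEFT"}

# Stochastic policy: an action is sampled from a per-state probability distribution.
stochastic_policy = {
    "s0": {"LEFT": 0.2, "RIGHT": 0.8},
    "s1": {"LEFT": 0.5, "RIGHT": 0.5},
    "s2": {"LEFT": 0.9, "RIGHT": 0.1},
}

def act(policy, state, stochastic=False):
    if stochastic:
        probs = policy[state]
        return random.choices(list(probs), weights=list(probs.values()))[0]
    return policy[state]

print(act(deterministic_policy, "s0"))                  # always RIGHT
print(act(stochastic_policy, "s0", stochastic=True))    # RIGHT about 80% of the time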

15 What is reinforcement learning? Explain its detailed concepts.


o Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action the agent gets positive feedback, and for each bad action the agent gets negative feedback or a penalty.
o In Reinforcement Learning, the agent learns automatically using feedback, without any labeled data, unlike supervised learning.
o Since there is no labeled data, the agent is bound to learn by its experience only.
o RL solves a specific type of problem where decision making is sequential and the goal is long-term, such as game playing, robotics, etc.
o The agent interacts with the environment and explores it by itself. The primary goal of an agent in reinforcement learning is to improve its performance by getting the maximum positive rewards.
o The agent learns through a process of hit and trial, and based on that experience, it learns to perform the task in a better way. Hence, we can say that "Reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within it." How a robotic dog learns the movement of its arms is an example of reinforcement learning.
o It is a core part of Artificial Intelligence, and all AI agents work on the concept of reinforcement learning. Here we do not need to pre-program the agent, as it learns from its own experience without any human intervention.
o Example: Suppose there is an AI agent present within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing some actions; based on those actions, the state of the agent changes, and it also receives a reward or penalty as feedback.
o The agent keeps doing these three things (take an action, change state or remain in the same state, and get feedback), and by doing these actions, it learns and explores the environment.
o The agent learns which actions lead to positive feedback or rewards and which actions lead to negative feedback or penalty. As a positive reward the agent gets a positive point, and as a penalty it gets a negative point.


16 Explain the concept of the Bellman equation in reinforcement learning with an example.
According to the Bellman Equation, the long-term reward for a given action is equal to the reward from the current action combined with the expected reward from the future actions taken at the following time steps.
Let's take an example:
Here we have a maze, which is our environment, and the sole goal of our agent is to reach the trophy state (R = 1), i.e. get a good reward, and to avoid the fire state because it would be a failure (R = -1), i.e. a bad reward.

Fig: Without Bellman Equation

What happens without the Bellman Equation?

Initially, we will give our agent some time to explore the environment and let it figure out a path to the goal. As soon as it reaches its goal, it will trace its steps back to its starting position and mark the value of every state that leads towards the goal as V = 1.
The agent will face no problem until we change its starting position, because then it will not be able to find a path towards the trophy state, since the value of all the states is equal to 1. So, to solve this problem, we should use the Bellman Equation:

V(s) = max_a [ R(s, a) + γ·V(s′) ]

State (s): the current state where the agent is in the environment.
Next state (s′): after taking action (a) at state (s), the agent reaches s′.
Value (V): a numeric representation of a state which helps the agent find its path; V(s) here means the value of the state s.
Reward (R): the treat which the agent gets after performing an action (a).
• R(s): reward for being in the state s
• R(s, a): reward for being in the state s and performing an action a
• R(s, a, s′): reward for being in a state s, taking an action a and ending up in s′
e.g. a good reward can be +1, a bad reward can be -1, no reward can be 0.
Action (a): the set of possible actions that can be taken by the agent in the state (s), e.g. (LEFT, RIGHT, UP, DOWN).
Discount factor (γ): determines how much the agent cares about rewards in the distant future relative to those in the immediate future. It has a value between 0 and 1. A lower value encourages short-term rewards while a higher value promises long-term reward.

Fig: Using Bellman Equation

The max denotes the most optimal action among all the actions that the agent can take in a particular state, which can lead to the reward after repeating this process at every consecutive step.
For example:
• The state to the left of the fire state (V = 0.9) can go UP, DOWN or RIGHT but NOT LEFT because of the wall (not accessible). Among all these available actions, the maximum value for that state comes from the UP action.
• The current starting state of our agent can choose any random action, UP or RIGHT, since both lead towards the reward with the same number of steps.
By using the Bellman equation, our agent will calculate the value of every state except for the trophy and the fire state (V = 0); they cannot have values since they are the end of the maze.
So, after making such a plan, our agent can easily accomplish its goal by just following the increasing values.
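
A small illustrative sketch of repeatedly applying V(s) = max_a [R(s, a) + γ·V(s′)] on a made-up 3x4 grid maze follows; the positions of the trophy, fire and wall cells and the reward of entering them are assumptions for the example, not taken from the figure.

# Value iteration with the Bellman equation on a tiny, made-up grid maze.
GAMMA = 0.9
ROWS, COLS = 3, 4
TROPHY, FIRE, WALL = (0, 3), (1, 3), (1, 1)      # terminal / inaccessible cells (assumed layout)
REWARD = {TROPHY: 1.0, FIRE: -1.0}               # reward received when entering a cell
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]       # UP, DOWN, LEFT, RIGHT

V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}

def neighbours(s):
    for dr, dc in MOVES:
        nxt = (s[0] + dr, s[1] + dc)
        if 0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS and nxt != WALL:
            yield nxt

for _ in range(50):                              # sweep until the values settle
    for s in V:
        if s in (TROPHY, FIRE, WALL):
            continue                             # end states and the wall keep V = 0
        V[s] = max(REWARD.get(n, 0.0) + GAMMA * V[n] for n in neighbours(s))

for r in range(ROWS):
    print(["%5.2f" % V[(r, c)] for c in range(COLS)])

After convergence, the values increase along every path towards the trophy, so the agent can reach the goal from any starting cell simply by moving to the neighbouring state with the highest value.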

17 Explain different types of activation function used in Artificial Neural Networks.


What is Activation Function?

An activation function is simply the function used to get the output of a node. It is also known
as a Transfer Function.

Why we use Activation functions with Neural Networks?

It is used to determine the output of a neural network, e.g. yes or no. It maps the resulting
values into a range such as 0 to 1 or -1 to 1 (depending upon the function).

The Activation Functions can be basically divided into 2 types-

1. Linear Activation Function

2. Non-linear Activation Functions

Linear or Identity Activation Function

As you can see, the function is a line, i.e. linear. Therefore, the output of the function is not
confined to any range.
Fig: Linear Activation Function

Equation : f(x) = x

Range : (-infinity to infinity)

It doesn't help with the complexity or various parameters of the usual data that is fed to the
neural networks.

Non-linear Activation Function

The non-linear activation functions are the most used activation functions. Non-linearity helps
to make the graph look something like this:
Fig: Non-linear Activation Function

It makes it easy for the model to generalize or adapt to a variety of data and to differentiate
between the outputs.

The main terminologies needed to understand non-linear functions are:

Derivative or Differential: the change along the y-axis w.r.t. the change along the x-axis. It is also known as the slope.

Monotonic function: a function which is either entirely non-increasing or entirely non-decreasing.
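
The answer above only names the linear and non-linear families; as a hedged illustration, the NumPy sketch below adds sigmoid, tanh and ReLU, which are common concrete examples of non-linear activation functions (they are not listed in the original text).

# A minimal NumPy sketch of the identity function and a few common non-linear activations.
import numpy as np

def linear(x):      # identity: f(x) = x, range (-inf, +inf)
    return x

def sigmoid(x):     # squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):        # squashes values into (-1, 1)
    return np.tanh(x)

def relu(x):        # 0 for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for fn in (linear, sigmoid, tanh, relu):
    print(fn.__name__, np.round(fn(x), 3))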
18 Explain the concept of CNN with an example.
Convolutional Neural Network
Convolutional Neural Networks are a special kind of neural network mainly used for image classification, clustering of images and object recognition. These deep networks enable the unsupervised construction of hierarchical image representations. To achieve the best accuracy, deep convolutional neural networks are preferred over any other neural network.
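
As a hedged sketch (the question bank does not prescribe an architecture), a small convolutional network for classifying 28x28 grayscale images into 10 classes could be written with Keras as follows; the layer sizes are illustrative only.

# A minimal Keras CNN sketch for 10-class image classification.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),    # learn local image features
    layers.MaxPooling2D((2, 2)),                     # downsample the feature maps
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),          # class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, epochs=5) once image data is loaded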
19 Explain the concept of RNN with an example.
Recurrent Neural Network
Recurrent neural networks are yet another variation of feed-forward networks. Here each of the neurons present in the hidden layer receives an input with a specific delay in time. The recurrent neural network mainly accesses the preceding information of existing iterations. For example, to guess the succeeding word in any sentence, one must have knowledge of the words that were previously used. It not only processes the inputs but also shares the weights across time, and it does not let the size of the model increase with an increase in the input size. However, the problems with this recurrent neural network are that it has a slow computational speed, it does not consider any future input for the current state, and it has trouble remembering information from far in the past. A small code sketch follows the list of applications below.

Applications:
o Machine Translation
o Robot Control
o Time Series Prediction
o Speech Recognition
o Speech Synthesis
o Time Series Anomaly Detection
o Rhythm Learning
o Music Composition
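
For illustration only (the vocabulary size, sequence length and layer widths below are assumptions), a minimal Keras sketch of a recurrent network for next-word prediction could look like this:

# A minimal Keras RNN sketch: predict the next word id from the previous SEQ_LEN word ids.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN = 1000, 10

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN,)),
    layers.Embedding(VOCAB_SIZE, 32),                 # map word ids to dense vectors
    layers.SimpleRNN(64),                             # hidden state is carried across time steps
    layers.Dense(VOCAB_SIZE, activation="softmax"),   # probability of each possible next word
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()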
20 Explain different methods of ensemble learning.
There are 3 most common ensemble learning methods in machine learning. These are as follows:
o Bagging
o Boosting
o Stacking
However, we will mainly discuss stacking in this topic.
1. Bagging
Bagging is a method of ensemble modeling which is primarily used to solve supervised machine learning problems. It is generally completed in two steps, as follows:
o Bootstrapping: A random sampling method that is used to derive samples from the data using the replacement procedure. In this method, random data samples are first fed to the primary model, and then a base learning algorithm is run on the samples to complete the learning process.
o Aggregation: A step that involves combining the output of all the base models and, based on their output, predicting an aggregate result with greater accuracy and reduced variance.
Example: In the Random Forest method, predictions from multiple decision trees are ensembled in parallel. Further, in regression problems we use an average of these predictions to get the final output, whereas in classification problems the most voted class is taken as the predicted class.
2. Boosting
Boosting is an ensemble method that enables each member to learn from the preceding member's mistakes and make better predictions for the future. Unlike the bagging method, in boosting all base learners (weak learners) are arranged in a sequential format so that they can learn from the mistakes of their preceding learner. In this way, all weak learners get turned into strong learners and make a better predictive model with significantly improved performance.
3. Stacking
Stacking is one of the popular ensemble modeling techniques in machine learning. Various weak learners are ensembled in a parallel manner in such a way that, by combining them with a meta learner, we can make better predictions for the future.
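
As an illustrative scikit-learn sketch (the particular estimators and the synthetic dataset are assumptions, not part of the original answer), the three methods can be compared side by side:

# Bagging, boosting and stacking on a synthetic classification dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              StackingClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "bagging":  BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    "stacking": StackingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("tree", DecisionTreeClassifier(random_state=0))],
        final_estimator=LogisticRegression(),         # the meta learner
    ),
}

for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")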
