Aiml Ut2 QB Solution
1. Why is feature engineering important in model building? List out some of the
techniques used for feature engineering.
Answer:
i. Imputation: Missing values are a typical problem in machine learning data sets and affect how
machine learning algorithms behave. Imputation is the process of replacing missing data with
statistical estimates of the missing values, producing a complete data set that can be used to train
machine learning models.
ii. One-hot encoding: A process by which categorical data is converted into a form that the
machine learning algorithm understands so it can make better predictions.
iii. Bag of words: A counting algorithm that calculates how many times a word is repeated in a
document. It can be used to determine similarities and differences in documents for such
applications as search and document classification.
iv. Automated feature engineering: This technique pulls out useful and meaningful features
using a framework that can be applied to any problem. Automated feature engineering
enables data scientists to be more productive by allowing them to spend more time on other
components of machine learning. This technique also allows citizen data scientists to do
feature engineering using a framework-based approach.
v. Binning: Binning, or grouping data, is key to preparing numerical data for machine learning.
This technique can be used to replace a column of numbers with categorical values
representing specific ranges.
vi. N-grams: Sequences of n consecutive items (such as words) used as features; they help predict
the next item in a sequence. In sentiment analysis, n-gram features help capture the sentiment of
a text or document.
vii. Feature crosses: A way to combine two or more categorical features into one. This
technique is particularly useful when certain features together denote a property better than
they do by themselves.
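For illustration, here is a minimal sketch of a few of these techniques (imputation, one-hot encoding, binning, and a feature cross) using pandas; the column names and values are made up:

```python
import pandas as pd

# Toy data with a missing value, a categorical column and a numeric column
# (all names and values are hypothetical)
df = pd.DataFrame({
    "age":    [25.0, None, 47.0, 35.0],
    "city":   ["Pune", "Mumbai", "Pune", "Delhi"],
    "income": [30000, 52000, 61000, 45000],
})

# i. Imputation: replace the missing age with the column mean
df["age"] = df["age"].fillna(df["age"].mean())

# ii. One-hot encoding: turn the categorical 'city' column into 0/1 columns
df = pd.concat([df, pd.get_dummies(df["city"], prefix="city")], axis=1)

# v. Binning: replace the raw income with three categorical ranges
df["income_bin"] = pd.cut(df["income"], bins=3, labels=["low", "mid", "high"])

# vii. Feature cross: combine two categorical features into a single feature
df["city_income"] = df["city"] + "_" + df["income_bin"].astype(str)

print(df)
```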
2. Why is an exploratory data analysis important? What are the components of EDA?
Answer:
EDA makes it simple to understand the structure of a dataset, which makes data modelling easier.
The primary goal of EDA is to make the data ‘clean’, that is, free of redundancies. It helps
identify incorrect data points so that they can be readily removed and the data cleaned.
Furthermore, it helps us understand the relationships between the variables, giving us a broader
view of the data and allowing us to build on it by leveraging those relationships. It also aids in
evaluating the dataset’s statistical measures.
Outliers or abnormal occurrences in a dataset can have an impact on the accuracy of machine
learning models. The dataset might also contain some missing or duplicate values. EDA may be
used to eliminate or resolve all of the dataset’s undesirable qualities.
i. Data Collection: Data collection is an essential part of exploratory data analysis. It refers
to the process of finding and loading data into our system. Good, reliable data can be
found on various public sites or bought from private organizations. Some reliable sites for
data collection are Kaggle, Github, Machine Learning Repository, etc.
ii. Data Cleaning: Data cleaning refers to the process of removing unwanted variables and
values from your dataset and getting rid of any irregularities in it. Such anomalies can
disproportionately skew the data and hence adversely affect the results. Some steps that
can be done to clean data are:
Removing missing values, outliers, and unnecessary rows/ columns.
Re-indexing and reformatting our data.
iii. Univariate Analysis: In Univariate Analysis, you analyze data of just one variable. A
variable in your dataset refers to a single feature/ column. You can do this either with
graphical or non-graphical means by finding specific mathematical values in the data.
Some visual methods include:
Histograms: The frequency of data is represented with rectangle bars.
Box-plots: Here the information is represented in the form of boxes.
iv. Bivariate Analysis: Here, you use two variables and compare them. This way, you can
find how one feature affects the other. It is done with scatter plots, which plot individual
data points, or correlation matrices, which show the correlation between pairs of variables
as colour hues. You can also use boxplots.
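A minimal sketch of these components in Python (the file name "housing.csv" and the column names "price" and "area" are assumptions for illustration):

```python
import pandas as pd
import matplotlib.pyplot as plt

# i. Data collection: load a dataset (file name and columns are hypothetical)
df = pd.read_csv("housing.csv")

# ii. Data cleaning: drop duplicate rows and rows with missing values
df = df.drop_duplicates().dropna()

# iii. Univariate analysis: summary statistics, histogram and box plot of one column
print(df["price"].describe())
df["price"].plot(kind="hist")
plt.show()
df["price"].plot(kind="box")
plt.show()

# iv. Bivariate analysis: scatter plot and correlation between two columns
df.plot(kind="scatter", x="area", y="price")
plt.show()
print(df[["area", "price"]].corr())
```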
3. What are the various methods to plot the dataset?
Answer:
i. Bar Graph: A bar graph is a graph that presents categorical data with rectangle-shaped
bars. The heights or lengths of these bars are proportional to the values that they
represent. The bars can be vertical or horizontal. A vertical bar graph is sometimes called
a column graph.
ii. Line Graph: It displays a sequence of data points as markers. The points are ordered
typically by their x-axis value. These points are joined with straight line segments. A line
graph is used to visualize a trend in data over intervals of time.
iii. Pie Chart: A pie chart is a circular statistical graphic. To illustrate numerical proportion,
it is divided into slices. In a pie chart, the arc length of each slice is proportional to the
quantity it represents; the central angle and area of each slice are proportional as well. It is
named after a sliced pie.
iv. Histogram: A histogram is an approximate representation of the distribution of
numerical data. The data is divided into non-overlapping intervals called bins or
buckets. A rectangle is erected over each bin whose height is proportional to the number of
data points in the bin. Histograms give a feel of the density of the distribution of the
underlying data.
v. Area Chart: It is represented by the area between the lines and the axis. The area is
proportional to the amount it represents.
vi. Dot Graph: A dot graph consists of data points plotted as dots on a graph. There are two
types of these:
The Wilkinson Dot Graph: In this dot graph, the local displacement is used to
prevent the dots on the plot from overlapping.
Cleveland Dot Graph: This is a scatterplot-like chart that displays data
vertically in a single dimension.
vii. Scatter Plot: It is a type of plot using Cartesian coordinates to display values for two
variables for a set of data. It is displayed as a collection of points. Their position on the
horizontal axis determines the value of one variable. The position on the vertical axis
determines the value of the other variable.
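A short matplotlib sketch of several of these plot types on made-up data:

```python
import matplotlib.pyplot as plt

# Made-up data for illustration
categories = ["A", "B", "C"]
counts = [10, 24, 17]
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]

fig, axes = plt.subplots(2, 3, figsize=(12, 6))

axes[0, 0].bar(categories, counts)            # i.   bar graph
axes[0, 1].plot(x, y, marker="o")             # ii.  line graph
axes[0, 2].pie(counts, labels=categories)     # iii. pie chart
axes[1, 0].hist(y, bins=3)                    # iv.  histogram
axes[1, 1].fill_between(x, y)                 # v.   area chart
axes[1, 2].scatter(x, y)                      # vii. scatter plot

plt.tight_layout()
plt.show()
```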
4. Explain the steps involved in cleaning and preparing the data.
Answer:
6. Explain the bracketing methods.
Answer:
Bracketing methods determine successively smaller intervals (brackets) that contain a root.
When the interval is small enough, then a root has been found. They generally use the
intermediate value theorem, which asserts that if a continuous function has values of opposite
signs at the end points of an interval, then the function has at least one root in the interval.
Therefore, they require starting with an interval such that the function takes opposite signs at the
end points of the interval.
i. Bisection method: The simplest root-finding algorithm is the bisection method. Let f be
a continuous function, for which one knows an interval [a, b] such that f(a) and f(b) have
opposite signs (a bracket). Let c = (a + b)/2 be the middle of the interval (the midpoint, or
the point that bisects the interval). Then either f(a) and f(c), or f(c) and f(b), have opposite
signs, and the size of the interval has been halved. Although the bisection method
is robust, it gains one and only one bit of accuracy with each iteration. Other methods,
under appropriate conditions, can gain accuracy faster.
ii. False position (regula falsi): The false position method, also called the regula falsi
method, is similar to the bisection method, but instead of using bisection search's middle
of the interval it uses the x-intercept of the line that connects the plotted function values
at the endpoints of the interval, that is:
c = (a·f(b) − b·f(a)) / (f(b) − f(a))
False position is similar to the secant method, except that, instead of retaining the last two
points, it makes sure to keep one point on either side of the root. The false position
method can be faster than the bisection method and, unlike the secant method, will never
diverge.
iii. ITP method: The ITP method is the only known method that brackets the root with the
same worst-case guarantees as the bisection method while also guaranteeing superlinear
convergence to the root of smooth functions, as the secant method does. It is also the only
known method guaranteed to outperform the bisection method on average for any
continuous distribution on the location of the root. It does so by keeping track of both the
bracketing interval and the minmax interval, within which any point converges as fast as
the bisection method. The construction of the queried point c follows three steps:
interpolation, truncation, and projection onto the minmax interval.
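A minimal sketch of the false-position bracketing update in Python (the tolerance, iteration cap, and the example function are assumptions):

```python
def false_position(f, a, b, tol=1e-8, max_iter=100):
    """Find a root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    c = a
    for _ in range(max_iter):
        # x-intercept of the line through (a, f(a)) and (b, f(b))
        c = (a * f(b) - b * f(a)) / (f(b) - f(a))
        if abs(f(c)) < tol:
            break
        # Keep one endpoint on either side of the root
        if f(a) * f(c) < 0:
            b = c
        else:
            a = c
    return c

# Example: root of x^3 - x - 2 between 1 and 2 (approximately 1.5214)
print(false_position(lambda x: x**3 - x - 2, 1, 2))
```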
7. Explain the bisection method.
Answer:
In the bisection method, if f(a)·f(b) < 0, an estimate for the root of the equation f(x) = 0 can
be found as the average of a and b:
x_i = (a + b)/2
Upon evaluating f(x_i), the next iteration would be to set either a = x_i or b = x_i such that for
the next iteration the root x_{i+1} is between a and b. The following describes an algorithm for the
bisection method given a < b, f(x), ε_s, and a maximum number of iterations:
Step 1: Evaluate f(a) and f(b) to ensure that f(a)·f(b) < 0. Otherwise, exit with an error.
Step 2: Calculate the value of the root in iteration i as x_i = (a + b)/2, then check which of the following
applies:
i. If f(x_i) = 0, then the root has been found and the error ε_r = 0. Exit.
ii. If f(x_i)·f(a_i) < 0, then for the next iteration, x_{i+1} is bracketed between a_i and x_i. The
value of ε_r = |(x_{i+1} − x_i)/x_{i+1}|.
iii. If f(x_i)·f(b_i) < 0, then for the next iteration, x_{i+1} is bracketed between x_i and b_i. The
value of ε_r = |(x_{i+1} − x_i)/x_{i+1}|.
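A minimal sketch of this algorithm in Python (the stopping tolerance ε_s, iteration cap, and example function are assumptions):

```python
def bisection(f, a, b, eps_s=1e-6, max_iter=100):
    """Bisection root finding following the steps above (eps_s is the relative error target)."""
    # Step 1: make sure the root is bracketed
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    x_prev = a
    for _ in range(max_iter):
        # Step 2: estimate the root as the midpoint of the current bracket
        x = (a + b) / 2
        if f(x) == 0:
            return x                      # case i: exact root found, eps_r = 0
        if f(x) * f(a) < 0:
            b = x                         # case ii: root bracketed between a and x
        else:
            a = x                         # case iii: root bracketed between x and b
        eps_r = abs((x - x_prev) / x)     # relative approximate error
        if eps_r < eps_s:
            return x
        x_prev = x
    return x

# Example: root of x^2 - 2 between 1 and 2 (sqrt(2) ≈ 1.41421)
print(bisection(lambda x: x**2 - 2, 1, 2))
```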
8. Explain the method of steepest descent (gradient descent).
Answer:
An algorithm for finding the nearest local minimum of a function which presupposes that the
gradient of the function can be computed. The method of steepest descent, also called the
gradient descent method, starts at a point 𝑃0 and, as many times as needed, moves from 𝑃𝑖 to
𝑃𝑖+1 by minimizing along the line extending from 𝑃𝑖 in the direction of −∇𝑓(𝑃𝑖 ), the local
downhill gradient.
When applied to a 1-dimensional function f(x), the method takes the form of iterating
x_i = x_{i−1} − ε f′(x_{i−1})
from a starting point x_0, for some small ε > 0, until a fixed point is reached. For example, for the
function f(x) = x³ − 2x² + 2 with ε = 0.1, the iteration converges to the local minimum at x = 4/3
from either of the starting points x_0 = 2 and x_0 = 0.01.
This method has the severe drawback of requiring a great many iterations for functions which
have long, narrow valley structures.
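A minimal sketch of this 1-D iteration in Python, applied to the example function above (the tolerance and iteration cap are assumptions):

```python
def steepest_descent_1d(f_prime, x0, eps=0.1, tol=1e-8, max_iter=10000):
    """Iterate x_i = x_{i-1} - eps * f'(x_{i-1}) until an (approximate) fixed point."""
    x = x0
    for _ in range(max_iter):
        x_new = x - eps * f_prime(x)
        if abs(x_new - x) < tol:          # fixed point reached
            return x_new
        x = x_new
    return x

# f(x) = x^3 - 2x^2 + 2, so f'(x) = 3x^2 - 4x; the local minimum is at x = 4/3
f_prime = lambda x: 3 * x**2 - 4 * x
print(steepest_descent_1d(f_prime, x0=2))     # converges to ~1.3333
print(steepest_descent_1d(f_prime, x0=0.01))  # converges to ~1.3333
```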
9. List the algorithms used for non-linear dimensionality reduction.
Answer:
i. Kernel PCA: Kernel PCA is a non-linear dimensionality reduction technique that uses
kernels. It can also be considered as the non-linear form of normal PCA. Kernel PCA
works well with non-linear datasets where normal PCA cannot be used efficiently.
ii. t-distributed Stochastic Neighbor Embedding (t-SNE): This is also a non-linear
dimensionality reduction method mostly used for data visualization. In addition to that, it
is widely used in image processing and NLP. The Scikit-learn documentation
recommends using PCA or Truncated SVD before t-SNE if the number of features in the
dataset is more than 50.
iii. Multidimensional Scaling (MDS): MDS is another non-linear dimensionality reduction
technique that tries to preserve the distances between instances while reducing the
dimensionality of non-linear data. There are two types of MDS algorithms: metric and
non-metric. The MDS() class in Scikit-learn implements both, with the metric
hyperparameter set to True (for the metric type) or False (for the non-metric type).
iv. Isometric mapping (Isomap): This method performs non-linear dimensionality
reduction through isometric mapping and can be seen as an extension of MDS or Kernel
PCA. It connects each instance to its nearest neighbors, computes the curved (geodesic)
distances along these connections, and reduces dimensionality while preserving them. The
number of neighbors to consider for each point can be specified through the n_neighbors
hyperparameter of the Isomap() class, which implements the Isomap algorithm in Scikit-learn.
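A short Scikit-learn sketch of the four techniques on a toy non-linear dataset (the S-curve data and the hyperparameter values are assumptions):

```python
from sklearn.datasets import make_s_curve
from sklearn.decomposition import KernelPCA
from sklearn.manifold import TSNE, MDS, Isomap

# Toy non-linear 3-D dataset, reduced to 2 dimensions by each technique
X, _ = make_s_curve(n_samples=500, random_state=0)

X_kpca   = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)           # i.   Kernel PCA
X_tsne   = TSNE(n_components=2, random_state=0).fit_transform(X)              # ii.  t-SNE
X_mds    = MDS(n_components=2, metric=True, random_state=0).fit_transform(X)  # iii. MDS (metric)
X_isomap = Isomap(n_components=2, n_neighbors=10).fit_transform(X)            # iv.  Isomap

for name, Z in [("Kernel PCA", X_kpca), ("t-SNE", X_tsne), ("MDS", X_mds), ("Isomap", X_isomap)]:
    print(name, Z.shape)  # each result is (500, 2)
```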