
AI/ML UT2 QB SOLUTION

1. Why is feature engineering important in model building? List out some of the
techniques used for feature engineering.

Answer:

Feature engineering helps represent the underlying problem to predictive models more effectively, increasing the model's accuracy on unseen data. Feature engineering selects the most useful predictor variables for the model, which consists of predictor variables and an outcome variable. Effective feature engineering implies:

 Higher efficiency of the model
 Easier algorithms that fit the data
 Easier for algorithms to detect patterns in the data
 Greater flexibility of the features

Feature engineering techniques include:

i. Imputation: A typical problem in machine learning is missing values in the data sets, which affects the way machine learning algorithms work. Imputation is the process of replacing missing data with statistical estimates of the missing values, which produces a complete data set to use to train machine learning models.
ii. One-hot encoding: A process by which categorical data is converted into a form that the
machine learning algorithm understands so it can make better predictions.
iii. Bag of words: A counting algorithm that calculates how many times a word is repeated in a
document. It can be used to determine similarities and differences in documents for such
applications as search and document classification.
iv. Automated feature engineering: This technique pulls out useful and meaningful features
using a framework that can be applied to any problem. Automated feature engineering
enables data scientists to be more productive by allowing them to spend more time on other
components of machine learning. This technique also allows citizen data scientists to do
feature engineering using a framework-based approach.
v. Binning: Binning, or grouping data, is key to preparing numerical data for machine learning.
This technique can be used to replace a column of numbers with categorical values
representing specific ranges.
vi. N-grams: Help predict the next item in a sequence. In sentiment analysis, the n-gram model
helps analyze the sentiment of the text or document.
vii. Feature crosses: A way to combine two or more categorical features into one. This
technique is particularly useful when certain features together denote a property better than
they do by themselves.
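
A minimal illustrative sketch of three of the techniques above (imputation, one-hot encoding and binning), using pandas and scikit-learn; the toy DataFrame and its column names are assumptions made purely for illustration, not part of the original answer.

# Sketch: imputation, one-hot encoding and binning on a toy DataFrame
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "age": [25, None, 47, 33],                     # numeric column with a missing value
    "city": ["Pune", "Mumbai", "Pune", "Delhi"],   # categorical column
})

# Imputation: replace the missing age with the column mean
df[["age"]] = SimpleImputer(strategy="mean").fit_transform(df[["age"]])

# One-hot encoding: expand the categorical column into 0/1 indicator columns
df = pd.get_dummies(df, columns=["city"])

# Binning: replace exact ages with categorical ranges
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 45, 100],
                         labels=["young", "middle", "senior"])
print(df)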
2. Why is an exploratory data analysis important? What are the components of EDA?

Answer:

EDA makes it simple to comprehend the structure of a dataset, making data modelling easier.
The primary goal of EDA is to make data ‘clean’ implying that it should be devoid of
redundancies. It aids in identifying incorrect data points so that they may be readily removed and
the data cleaned. Furthermore, it aids us in comprehending the relationship between the
variables, providing us with a broader view of the data and allowing us to expand on it by
leveraging the relationship between the variables. It also aids in the evaluation of the dataset’s
statistical measurements.

Outliers or abnormal occurrences in a dataset can have an impact on the accuracy of machine
learning models. The dataset might also contain some missing or duplicate values. EDA may be
used to eliminate or resolve all of the dataset’s undesirable qualities.

Steps Involved in Exploratory Data Analysis

i. Data Collection: Data collection is an essential part of exploratory data analysis. It refers
to the process of finding and loading data into our system. Good, reliable data can be
found on various public sites or bought from private organizations. Some reliable sites for
data collection are Kaggle, Github, Machine Learning Repository, etc.

ii. Data Cleaning: Data cleaning refers to the process of removing unwanted variables and
values from your dataset and getting rid of any irregularities in it. Such anomalies can
disproportionately skew the data and hence adversely affect the results. Some steps that
can be done to clean data are:
 Removing missing values, outliers, and unnecessary rows/columns.
 Re-indexing and reformatting our data.

iii. Univariate Analysis: In Univariate Analysis, you analyze data of just one variable. A
variable in your dataset refers to a single feature/column. You can do this either with
graphical or non-graphical means by finding specific mathematical values in the data.
Some visual methods include:
 Histograms: The frequency of data is represented with rectangle bars.
 Box-plots: Here the information is represented in the form of boxes.

iv. Bivariate Analysis: Here, you use two variables and compare them. This way, you can find how one feature affects the other. It is done with scatter plots, which plot individual data points, or with correlation matrices that show the correlations as hues. You can also use box plots.
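
The steps above can be sketched in a few lines of Python; the file name and column names below are placeholders, not taken from the original answer.

# Sketch: a minimal EDA pass with pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")                   # data collection (hypothetical file)
df = df.drop_duplicates().dropna()             # data cleaning: duplicates, missing rows

print(df.describe())                           # univariate: summary statistics
df["feature_a"].plot.hist(bins=20)             # univariate: histogram of one column
plt.show()

df.plot.scatter(x="feature_a", y="feature_b")  # bivariate: scatter plot
print(df[["feature_a", "feature_b"]].corr())   # bivariate: correlation matrix
plt.show()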
3. What are the various methods to plot the dataset?

Answer:

i. Bar Graph: A bar graph is a graph that presents categorical data with rectangle-shaped
bars. The heights or lengths of these bars are proportional to the values that they
represent. The bars can be vertical or horizontal. A vertical bar graph is sometimes called
a column graph.
ii. Line Graph: It displays a sequence of data points as markers. The points are ordered
typically by their x-axis value. These points are joined with straight line segments. A line
graph is used to visualize a trend in data over intervals of time.
iii. Pie Chart: A pie chart is a circular statistical graphic. To illustrate numerical proportion,
it is divided into slices. In a pie chart, for every slice, each of its arc lengths is
proportional to the amount it represents. The central angles, and area are also
proportional. It is named after a sliced pie.
iv. Histogram: A histogram is an approximate representation of the distribution of
numerical data. The data is divided into non-overlapping intervals called bins or
buckets. A rectangle is erected over a bin whose height is proportional to the number of
data points in the bin. Histograms give a feel of the density of the distribution of the
underlying data.
v. Area Chart: It is represented by the area between the lines and the axis. The area is
proportional to the amount it represents.
vi. Dot Graph: A dot graph consists of data points plotted as dots on a graph. There are two
types of these:
 The Wilkinson Dot Graph: In this dot graph, the local displacement is used to
prevent the dots on the plot from overlapping.
 Cleveland Dot Graph: This is a scatterplot-like chart that displays data
vertically in a single dimension.
vii. Scatter Plot: It is a type of plot using Cartesian coordinates to display values for two
variables for a set of data. It is displayed as a collection of points. Their position on the
horizontal axis determines the value of one variable. The position on the vertical axis
determines the value of the other variable.
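
A short matplotlib sketch of some of the plot types listed above; the data values are made up purely for illustration.

# Sketch: bar graph, line graph, histogram and scatter plot with matplotlib
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].bar(["A", "B", "C"], [3, 7, 5])                  # bar graph
axes[0, 1].plot([1, 2, 3, 4], [2, 4, 3, 6], marker="o")     # line graph
axes[1, 0].hist(np.random.randn(500), bins=20)              # histogram
axes[1, 1].scatter(np.random.rand(50), np.random.rand(50))  # scatter plot
plt.tight_layout()
plt.show()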
4. Explain the steps involved in cleaning and preparing the data.

Answer:

i. Remove duplicate or irrelevant observations: Remove unwanted observations from
your dataset, including duplicate observations or irrelevant observations. This can make
analysis more efficient and minimize distraction from your primary target, as well as
creating a more manageable and more performant dataset.
ii. Fix structural errors: Structural errors are when you measure or transfer data and notice
strange naming conventions, typos, or incorrect capitalization. These inconsistencies can
cause mislabelled categories or classes. For example, you may find “N/A” and “Not
Applicable” both appear, but they should be analysed as the same category.
iii. Filter unwanted outliers: Often, there will be one-off observations where, at a glance,
they do not appear to fit within the data you are analysing. This step is needed to
determine the validity of that number. If an outlier proves to be irrelevant for analysis or
is a mistake, consider removing it.
iv. Handle missing data: There are a couple of ways to deal with missing data. As a first
option, you can drop observations that have missing values. As a second option, you can
impute missing values based on other observations. As a third option, you might alter the
way the data is used to effectively navigate null values.
v. Validate and QA: At the end of the data cleaning process, you should be able to answer
these questions as a part of basic validation:
 Does the data make sense?
 Does the data follow the appropriate rules for its field?
 Does it prove or disprove your working theory, or bring any insight to light?
 Can you find trends in the data to help you form your next theory?
 If not, is that because of a data quality issue?
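
The steps above map onto a few pandas operations; in this sketch the input file and the column names "status" and "price" are assumptions for illustration only.

# Sketch: data cleaning and preparation with pandas
import pandas as pd

df = pd.read_csv("raw_data.csv")                  # hypothetical input file

df = df.drop_duplicates()                         # 1. remove duplicate observations

df["status"] = df["status"].replace(              # 2. fix structural errors
    {"N/A": "Not Applicable", "n/a": "Not Applicable"})

q1, q3 = df["price"].quantile([0.25, 0.75])       # 3. filter outliers using the IQR rule
iqr = q3 - q1
df = df[df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

df["price"] = df["price"].fillna(df["price"].median())   # 4. impute missing data

assert df["price"].notna().all()                  # 5. basic validation / QA check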

5. Compare constrained and unconstrained optimization techniques.

Answer:
6. Explain the bracketing methods.

Answer:

Bracketing methods determine successively smaller intervals (brackets) that contain a root.
When the interval is small enough, then a root has been found. They generally use the
intermediate value theorem, which asserts that if a continuous function has values of opposite
signs at the end points of an interval, then the function has at least one root in the interval.
Therefore, they require starting with an interval such that the function takes opposite signs at the
end points of the interval.

i. Bisection method: The simplest root-finding algorithm is the bisection method. Let f be
a continuous function for which one knows an interval [a, b] such that f(a) and f(b) have
opposite signs (a bracket). Let c = (a + b)/2 be the middle of the interval (the midpoint, or
the point that bisects the interval). Then either f(a) and f(c), or f(c) and f(b), have opposite
signs, and the size of the interval has been halved. Although the bisection method
is robust, it gains one and only one bit of accuracy with each iteration. Other methods,
under appropriate conditions, can gain accuracy faster.

ii. False position (regula falsi): The false position method, also called the regula falsi
method, is similar to the bisection method, but instead of using bisection search's middle
of the interval it uses the x-intercept of the line that connects the plotted function values
at the endpoints of the interval, that is:

c = (a f(b) − b f(a)) / (f(b) − f(a))

False position is similar to the secant method, except that, instead of retaining the last two
points, it makes sure to keep one point on either side of the root. The false position
method can be faster than the bisection method and will never diverge like the secant
method can; a short illustrative sketch of it appears after this answer.

iii. ITP method: The ITP method is the only known method to bracket the root with the
same worst-case guarantees as the bisection method while guaranteeing superlinear
convergence to the root of smooth functions, as the secant method does. It is also the only
known method guaranteed to outperform the bisection method on average for any
continuous distribution on the location of the root. It does so by keeping track of both the
bracketing interval as well as the minmax interval in which any point therein converges
as fast as the bisection method. The construction of the queried point c follows three
steps: interpolation, truncation and then projection onto the minmax interval.
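
As a short illustration of the false position method from item ii above, the following is a minimal sketch assuming f is continuous and f(a), f(b) have opposite signs; the tolerance and iteration limit are arbitrary choices.

# Sketch: false position (regula falsi) root finding
def false_position(f, a, b, tol=1e-8, max_iter=100):
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    c = a
    for _ in range(max_iter):
        # x-intercept of the line through (a, f(a)) and (b, f(b))
        c = (a * f(b) - b * f(a)) / (f(b) - f(a))
        if abs(f(c)) < tol:
            break
        if f(a) * f(c) < 0:    # keep the endpoints that still bracket the root
            b = c
        else:
            a = c
    return c

# Example: the root of x^2 - 2 (i.e. sqrt(2)) bracketed in [1, 2]
print(false_position(lambda x: x**2 - 2, 1.0, 2.0))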
7. Explain the bisection method.

Answer:

In the bisection method, if 𝑓(𝑎)𝑓(𝑏) < 0 , an estimate for the root of the equation 𝑓(𝑥) = 0 can
be found as the average of a and b:

𝑥𝑖 = (𝑎 + 𝑏)/2
Upon evaluating 𝑓(𝑥𝑖), the next iteration would be to set either 𝑎 = 𝑥𝑖 or 𝑏 = 𝑥𝑖 such that for
the next iteration the root 𝑥𝑖+1 is between a and b. The following describes an algorithm for the
bisection method given 𝑎 < 𝑏, 𝑓(𝑥), 𝜀𝑠 , and maximum number of iterations:

Step 1: Evaluate 𝑓(𝑎) and 𝑓(𝑏) to ensure that 𝑓(𝑎)𝑓(𝑏) < 0. Otherwise, exit with an error.
Step 2: Calculate the value of the root in iteration i as 𝑥𝑖 = (𝑎 + 𝑏)/2. Check which of the following
applies:

i. If 𝑓(𝑥𝑖) = 0, then the root has been found, the value of the error 𝜀𝑟 = 0. Exit.
ii. If 𝑓(𝑥𝑖)𝑓(𝑎𝑖) < 0, then for the next iteration, 𝑥𝑖+1 is bracketed between 𝑎𝑖 and 𝑥𝑖. The
value of 𝜀𝑟 = |(𝑥𝑖+1 − 𝑥𝑖)/𝑥𝑖+1|.
iii. If 𝑓(𝑥𝑖)𝑓(𝑏𝑖) < 0, then for the next iteration, 𝑥𝑖+1 is bracketed between 𝑥𝑖 and 𝑏𝑖. The
value of 𝜀𝑟 = |(𝑥𝑖+1 − 𝑥𝑖)/𝑥𝑖+1|.

Step 3: Set 𝑖 = 𝑖 + 1. If i reaches the maximum number of iterations or if 𝜀𝑟 ≤ 𝜀𝑠, then the
iterations are stopped. Otherwise, return to step 2 with the new interval 𝑎𝑖+1 and 𝑏𝑖+1.
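
A minimal Python sketch of the algorithm above; the stopping test compares the relative approximate error against the chosen tolerance, as in Step 3.

# Sketch: bisection method with a relative-error stopping criterion
import math

def bisection(f, a, b, eps_s=1e-6, max_iter=100):
    if f(a) * f(b) >= 0:
        raise ValueError("f(a)f(b) must be negative: the root is not bracketed")
    x_prev = a
    x = a
    for _ in range(max_iter):
        x = (a + b) / 2                  # Step 2: midpoint of the current bracket
        if f(x) == 0:
            return x                     # exact root found, error is zero
        if f(a) * f(x) < 0:
            b = x                        # root lies between a and x
        else:
            a = x                        # root lies between x and b
        eps_r = abs((x - x_prev) / x)    # relative approximate error
        if eps_r <= eps_s:
            return x                     # Step 3: tolerance reached
        x_prev = x
    return x

# Example: the root of cos(x) - x bracketed in [0, 1]
print(bisection(lambda x: math.cos(x) - x, 0.0, 1.0))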
8. Explain the steepest descent method.

Answer:

The method of steepest descent, also called the gradient descent method, is an algorithm for
finding the nearest local minimum of a function; it presupposes that the gradient of the function
can be computed. The method starts at a point 𝑃0 and, as many times as needed, moves from 𝑃𝑖 to
𝑃𝑖+1 by minimizing along the line extending from 𝑃𝑖 in the direction of −∇𝑓(𝑃𝑖), the local
downhill gradient.

When applied to a 1-dimensional function f(x), the method takes the form of iterating

𝑥𝑖 = 𝑥𝑖−1 − 𝜖 𝑓′(𝑥𝑖−1 )

from a starting point 𝑥0 for some small 𝜖 > 0 until a fixed point is reached. For example, for the
function 𝑓(𝑥) = 𝑥³ − 2𝑥² + 2 with 𝜖 = 0.1 and starting points 𝑥0 = 2 and 0.01, the iterations
converge toward the function's local minimum.

This method has the severe drawback of requiring a great many iterations for functions which
have long, narrow valley structures.
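
The one-dimensional iteration above can be written directly in Python; this sketch assumes the derivative is supplied by hand and uses the example function and step size mentioned in the answer.

# Sketch: steepest (gradient) descent in one dimension
def steepest_descent_1d(df, x0, eps=0.1, tol=1e-8, max_iter=10000):
    x = x0
    for _ in range(max_iter):
        x_new = x - eps * df(x)      # step along the negative gradient
        if abs(x_new - x) < tol:     # stop once a fixed point is (nearly) reached
            return x_new
        x = x_new
    return x

# f(x) = x^3 - 2x^2 + 2 has derivative f'(x) = 3x^2 - 4x; starting from x0 = 2
# the iterates approach the local minimum of f near x = 4/3.
print(steepest_descent_1d(lambda x: 3 * x**2 - 4 * x, x0=2.0, eps=0.1))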
9. List the algorithms used for non-linear dimensionality reduction.

Answer:

i. Kernel PCA: Kernel PCA is a non-linear dimensionality reduction technique that uses
kernels. It can also be considered as the non-linear form of normal PCA. Kernel PCA
works well with non-linear datasets where normal PCA cannot be used efficiently.
ii. t-distributed Stochastic Neighbor Embedding (t-SNE): This is also a non-linear
dimensionality reduction method mostly used for data visualization. In addition to that, it
is widely used in image processing and NLP. The Scikit-learn documentation
recommends using PCA or Truncated SVD before t-SNE if the number of features
in the dataset is more than 50.
iii. Multidimensional Scaling (MDS): MDS is another non-linear dimensionality reduction
technique that tries to preserve the distances between instances while reducing the
dimensionality of non-linear data. There are two types of MDS algorithms: metric and
non-metric. The MDS() class in Scikit-learn implements both, by setting the metric
hyperparameter to True (for the metric type) or False (for the non-metric type).
iv. Isometric mapping (Isomap): This method performs non-linear dimensionality
reduction through Isometric mapping. It is an extension of MDS or Kernel PCA. It
connects each instance by calculating the curved or geodesic distance to its nearest
neighbors and reduces dimensionality. The number of neighbors to consider for each
point can be specified through the n_neighbors hyperparameter of the Isomap() class,
which implements the Isomap algorithm in Scikit-learn.
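
A minimal scikit-learn sketch of two of the techniques above (Kernel PCA and t-SNE) on a toy non-linear dataset; the hyperparameter values are illustrative choices only.

# Sketch: Kernel PCA and t-SNE on the two-moons dataset
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.manifold import TSNE

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

# Kernel PCA with an RBF kernel: the non-linear form of normal PCA
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15).fit_transform(X)

# t-SNE: a non-linear embedding mostly used for visualization
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_kpca.shape, X_tsne.shape)   # both (200, 2)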

10. Write short notes on the following:


i. PCA: Principal Component Analysis (PCA) is a statistical procedure that uses an
orthogonal transformation that converts a set of correlated variables to a set of
uncorrelated variables. PCA is the most widely used tool in exploratory data analysis and
in machine learning for predictive models. Moreover, PCA is an unsupervised statistical
technique used to examine the interrelations among a set of variables. It is also known as
a general factor analysis where regression determines a line of best fit.

ii. LDA: Linear Discriminant Analysis, also called Normal Discriminant Analysis or
Discriminant Function Analysis, is a dimensionality reduction technique that is commonly
used for supervised classification problems. It is used for modelling differences in groups,
i.e. separating two or more classes, and it projects the features from a higher-dimensional
space into a lower-dimensional space. For example, suppose we have two classes and we
need to separate them efficiently. Classes can have multiple features, and using only a
single feature to classify them may result in some overlapping, so we keep increasing the
number of features to achieve proper classification.
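
Both short notes can be illustrated with scikit-learn; this sketch uses the Iris dataset purely as a convenient example and is not part of the original answer.

# Sketch: PCA (unsupervised) versus LDA (supervised) on the Iris dataset
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)             # 150 samples, 4 features, 3 classes

# PCA ignores the class labels and projects onto directions of maximum variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# LDA uses the labels and projects onto directions that best separate the classes
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)               # both (150, 2)
print(pca.explained_variance_ratio_)          # variance captured by each principal component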
