Interview Preparing - ML Draft

1 MACHINE LEARNING TOPICS

1.1 PARAMETRIC ML ALGORITHMS


A learning model that summarizes data with a set of parameters of fixed size
(independent of the number of training examples) is called a parametric model, e.g., LR,
LDA, the Perceptron, Naïve Bayes, and simple NNs.

1.2 NON-PARAMETRIC ML ALGORITHMS


Nonparametric methods are good when you have a lot of data and no prior
knowledge, and when you don’t want to worry too much about choosing just the right
features, e.g., KNN, DT, and SVM.

1.3 SUPERVISED MACHINE LEARNING
Examples for regression applications:
 Analyze the effect of marketing, pricing, and promotions on the sales of a product.
 Forecast sales by analyzing the company’s monthly sales for the past few years.
 Predict how house prices change as house sizes increase.
 Calculate causal relationships between parameters in biological systems.

Examples for classification applications:


 Sentiment Analysis
 Email Spam Classification
 Document Classification
 Image Classification

1.3.1 Linear Regression


A supervised machine learning algorithm used to predict values within a continuous
range, rather than classifying them into categories. It calculates the error iteratively until
reaching the most accurate line, the one with the minimum error value (that is, the minimum
total distance between the line and all points).
Types:
1. Univariate Linear Regression — the basic form, with a single input variable.
2. Multivariate Linear Regression — the more complex form of linear regression. In higher
dimensions, where there is more than one input (X), the fitted line becomes a plane or a hyperplane.
$Y(X) = p_0 + p_1 X_1 + p_2 X_2 + \dots + p_n X_n$
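A minimal sketch of multivariate linear regression with scikit-learn; the toy data below is invented purely to illustrate the API:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: two input features (X1, X2) and one continuous target.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([5.1, 4.9, 11.2, 10.8, 15.0])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # p0 and [p1, p2] from the equation above
print(model.predict([[6.0, 6.0]]))    # predicted Y for a new point
```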

1.3.2 Ridge Regression (L2 regularization)


Ridge regression adds an L2 penalty on the coefficient weights to the least-squares
objective. It is used in high-variance problems to make the model simpler.
 The linear regression discussed above uses OLS (Ordinary Least Squares) to fit the
output values.
 The complexity of the ML model can also be reduced via ridge regression.
 All the coefficients are shrunk in ridge regression, as the weights are pushed toward
smaller values.
 If lambda is too large, it is also possible to over-smooth (high bias).
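A minimal sketch of the shrinkage effect, assuming scikit-learn and synthetic data; note that scikit-learn calls the lambda penalty alpha:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha plays the role of lambda

# Ridge coefficients are pushed toward smaller values than the OLS ones.
print(ols.coef_)
print(ridge.coef_)
```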

1.3.3 Lasso Regression (L1 regularization)


Lasso (Least Absolute Shrinkage and Selection Operator) regression is another
widely used linear ML regression method; it penalizes the L1-norm of the coefficients.
 Lasso shrinks the regression coefficients and can drive some of them exactly to zero,
which helps the model generalize across datasets.
 Besides ML, the lasso algorithm is also used for regression in Data Mining.
 ML experts opt for the lasso regression algorithm when there is high multicollinearity in
the given dataset.

Multicollinearity in the dataset means independent variables are highly related to each
other, and a small change in the data can cause a large change in the regression
coefficients.
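A minimal sketch of lasso’s sparsity, again assuming scikit-learn and synthetic data in which only the first two of eight features matter:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
# Only the first two features influence this synthetic target.
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.3, size=100)

lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)  # the six irrelevant coefficients are driven exactly to zero
```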

1.3.4 Logistic Regression


Logistic regression is a supervised classification algorithm. It differs from linear
regression in that the dependent (output) variable is a discrete category or class, not a
continuous value.

The logistic function (sigmoid function) is an S-shaped curve used to discriminate data
across classes. It maps any real input to a value between 0 and 1.
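A minimal sketch of the sigmoid and of logistic regression on invented hours-studied data, assuming scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # The logistic function squashes any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # about [0.018, 0.5, 0.982]

# Toy binary classification: hours studied -> pass (1) / fail (0).
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[2.0]]))  # class probabilities for a new student
```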

1.3.5 Naive Bayes

Naïve Bayes is a powerful and simple supervised machine learning algorithm. It
assumes that the value of a particular feature is independent of the value of any other
feature, given the class variable.
For example:
A fruit may be considered to be an apple if it is red, round, and about 10 cm in
diameter.
Features: Color, roundness, and diameter.
1. Define two classes ($C_Y$ and $C_N$) that correspond to Apple = Yes and Apple = No.
2. Compute the probability of $C_Y$ given the observation x:
$p(C_Y \mid x) = p(\text{Apple}=\text{Yes} \mid \text{red}, \text{round}, \approx 10\,\text{cm})$
3. Compute the probability of $C_N$ given the observation x:
$p(C_N \mid x) = p(\text{Apple}=\text{No} \mid \text{red}, \text{round}, \approx 10\,\text{cm})$
4. Discover which conditional probability is larger: if $p(C_Y \mid x) > p(C_N \mid x)$, then it is an apple.
5. Compute the class-conditional likelihood as the product of per-feature probabilities:
$p(x \mid C_Y) = p(\text{red} \mid \text{Apple}=\text{Yes}) \cdot p(\text{round} \mid \text{Apple}=\text{Yes}) \cdot p(\approx 10\,\text{cm} \mid \text{Apple}=\text{Yes})$

To calculate p(red | Apple = Yes), you are asking: “What is the probability of having a
red-colored object, given that we know it is an apple?”
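A tiny sketch of the arithmetic in steps 2 through 5; every probability below is a made-up number for illustration, whereas in practice the prior and per-feature likelihoods would be estimated from training data:

```python
# Hypothetical prior and per-feature likelihoods for the "apple" class.
p_yes = 0.6              # p(Apple = Yes)
p_red_yes = 0.8          # p(red | Apple = Yes)
p_round_yes = 0.9        # p(round | Apple = Yes)
p_10cm_yes = 0.7         # p(~10 cm | Apple = Yes)

# Hypothetical prior and likelihoods for the "not apple" class.
p_no = 0.4
p_red_no = 0.3
p_round_no = 0.5
p_10cm_no = 0.2

# Naive Bayes: multiply the class prior by the per-feature likelihoods,
# treating features as conditionally independent given the class.
score_yes = p_yes * p_red_yes * p_round_yes * p_10cm_yes
score_no = p_no * p_red_no * p_round_no * p_10cm_no

print("apple" if score_yes > score_no else "not an apple")
```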

1.3.6 Decision Tree


A decision tree is a popular supervised learning algorithm that can be used for
classification and regression problems. Decision trees can explain why a specific
prediction was made by traversing the tree. It resembles a flowchart and is easy to interpret
because it breaks down a data set into smaller and smaller subsets while building the
associated decision tree.

Entropy
It is the measure of the amount of uncertainty and randomness in a set of data for
the classification task. Entropy is maximized when all outcomes have equal probabilities;
an entropy of zero means that there is no randomness for this attribute.
 Entropy for one attribute: $E(S) = -\sum_i p(x_i) \log_2 p(x_i)$
 Entropy for two attributes (the weighted entropy of T after partitioning on attribute x): $E(T, x) = \sum_{c \in x} P(c)\,E(c)$
 $\mathrm{Gain}(T, x) = E(T) - E(T, x)$

Information gain
It is used for ranking the attributes or features on which to split at a given node in the tree.
 Information gain = (entropy of the distribution before the split) − (entropy of the
distribution after it).
 It defines how much information a feature provides about a class.
 The feature with the highest information gain is used for the first split.
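A short sketch of these formulas in NumPy, using the classic play-tennis split on a hypothetical “outlook” attribute as toy data:

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of a label array, in bits.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# 14 examples (9 yes / 5 no), partitioned into 3 branches by "outlook".
parent = np.array([1] * 9 + [0] * 5)
branches = [np.array([1, 1, 0, 0, 0]),      # sunny: 2 yes / 3 no
            np.array([1, 1, 1, 1]),         # overcast: 4 yes
            np.array([1, 1, 1, 0, 0])]      # rainy: 3 yes / 2 no

after = sum(len(b) / len(parent) * entropy(b) for b in branches)
print(round(entropy(parent) - after, 3))  # information gain, about 0.247
```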

Decision Tree Regression


The core algorithm for building decision trees is called ID3; it employs a top-down,
greedy search through the space of possible branches with no backtracking. The ID3
algorithm can be used to construct a decision tree for regression by replacing Information
Gain with Standard Deviation Reduction.

The standard deviation reduction is based on the decrease in standard deviation after a
dataset is split on an attribute. Constructing a decision tree is all about finding the attribute
that returns the highest standard deviation reduction (i.e., the most homogeneous
branches).
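A short NumPy sketch of standard deviation reduction, with invented target values:

```python
import numpy as np

def std_reduction(parent, branches):
    # Parent standard deviation minus the size-weighted branch deviations.
    n = len(parent)
    weighted = sum(len(b) / n * np.std(b) for b in branches)
    return np.std(parent) - weighted

# Toy continuous targets, split into two branches by a candidate attribute.
parent = np.array([25.0, 30.0, 46.0, 45.0, 52.0, 23.0, 43.0, 35.0, 38.0, 46.0])
left = np.array([25.0, 30.0, 23.0, 35.0, 38.0])
right = np.array([46.0, 45.0, 52.0, 43.0, 46.0])

print(std_reduction(parent, [left, right]))  # higher means a better split
```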

1.3.7 Random Forest


1.3.8 Support Vector Machines
SVM is a supervised learning model that can be a linear or non-linear classifier. SVM
is also called a “Large Margin Classifier” because the algorithm seeks the hyperplane with
the largest margin, that is, the largest distance to the nearest sample points.

Support Vector Regression (SVR)


Support Vector Regression (SVR) uses the same principle as SVM, but for regression
problems. The problem of regression is to find a function that approximates mapping from
an input domain to real numbers on the basis of a training sample. There are a few important
parameters that you should be aware of before proceeding further:

Kernel
A kernel helps us find a hyperplane in the higher dimensional space without
increasing the computational cost.
Hyperplane
This is basically a separating line between two data classes in SVM. But in Support
Vector Regression, this is the line that will be used to predict the continuous output.
$Y = wx + b$ (equation of the hyperplane)
$-a < Y - (wx + b) < +a$
Decision Boundary
A decision boundary can be thought of as a demarcation line (for simplification) on
one side of which lie positive examples and on the other side lie the negative examples. On
this very line, the examples may be classified as either positive or negative. This same
concept of SVM will be applied in Support Vector Regression as well. The equations of
decision boundary become:
$wx + b = +a$
$wx + b = -a$
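A minimal SVR sketch assuming scikit-learn; its epsilon parameter corresponds to the tube half-width a in the equations above, and the noisy sine data is synthetic:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0.0, 5.0, size=(40, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)

# Points inside the tube |y - f(x)| < epsilon contribute no loss.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print(model.predict([[2.5]]))  # prediction for a new input
```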
1.3.9 K-Nearest Neighbor

K-Nearest Neighbors Regression



1.4 UNSUPERVISED MACHINE LEARNING

1.4.1 K-Means Algorithm
K-means clustering is an unsupervised machine learning technique. The main goal of
the algorithm is to group the data observations into k clusters, where each observation
belongs to the cluster with the nearest mean. A cluster’s center is the centroid.

Examples of applications include
 Customer segmentation.
 Image segmentation and compression.
 Recommendation systems.
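A minimal k-means sketch with scikit-learn on invented 2-D points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy observations forming two rough groups.
X = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
              [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each observation
print(km.cluster_centers_)  # the centroids (cluster means)
```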

1.4.2 Hierarchical clustering

1.4.3 Anomaly detection

1.4.4 Principal Component Analysis

1.4.5 Kernel PCA

1.4.6 LLE
 t-SNE
 Independent Component Analysis
 Singular Value Decomposition
https://www.javatpoint.com/unsupervised-machine-learning

1.5 DEEP LEARNING MODELS


Artificial neural networks are collections of interconnected “neurons” (called nodes)
that work together to transform input data into output data. Each node applies a mathematical
transformation to the data it receives and has an activation function that defines the node’s
output for a given set of inputs. The activation function of the last layer can be changed
(for example, to ReLU or a linear function) to turn a neural network into a regression model.

Perceptron
A perceptron is a single-neuron model that was the forerunner of neural networks. It
is similar to linear regression: each neuron has its own bias and slope (weights).

Backpropagation
Backpropagation is an algorithm for training neural networks that have many layers.
It works in two phases:
 Propagation of the inputs through the neural network to the final layer (called the
feedforward pass).
 An error value is then calculated by comparing the desired output with the actual
output for each output neuron in the network. The error is propagated backward through
the weights of the network (adjusting the weights), beginning with the output neurons,
through the hidden layers, and on to the input layer, as a function of each weight’s
contribution to the error.
Backpropagation continues to be an important aspect of neural network learning.
With faster and cheaper computing resources, it continues to be applied to larger and denser
networks.
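A minimal NumPy sketch of both phases for a single-hidden-layer network on the toy XOR problem; the layer sizes, learning rate, and iteration count are arbitrary illustrative choices:

```python
import numpy as np

# Toy XOR problem: 4 samples, 2 features, binary target.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)  # hidden layer, 3 nodes
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)  # output layer
lr = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    # Feedforward phase: propagate the inputs to the final layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward phase: the error flows from the output layer toward the
    # input layer, and each weight moves by its contribution to the error.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```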
Types of neural networks.
 Multilayer perceptron (MLP): A class of feed-forward artificial neural networks
(ANNs). It is useful in classification problems where inputs are assigned a class. It also
works in regression problems for a real-valued quantity like a house price prediction.
 Convolutional neural network (CNN): Takes an input as an image. It is useful for image
recognition problems like facial recognition.

 Recurrent neural network (RNN): It is suitable for inputs like audio and languages. It
can be used in applications like speech recognition and machine translation.
 Hybrid neural network: Covers more complex neural networks, for example, in
autonomous cars that require processing images and working with radar data.

Examples for applications:


 Object detection, tracking, and image and video analysis by using a Convolutional Neural
Network (CNN)
 Natural language processing tasks like speech recognition and machine translation by
using a recurrent neural network (RNN)
 Autonomous cars and robots.

1.6 ASSOCIATION MODELS

1.6.1 Apriori algorithm

1.6.2 Eclat

1.7 MACHINE LEARNING PROCEDURE

1.7.1 Gradient Descent

1.7.2 Genetic algorithms

1.7.3 Bagging

1.7.4 Boosting

1.7.5 Regularization

1.7.6 Hyperparameter tuning

1.7.7 Ensembles of multiple models

1.7.8 Grid Search

1.8 TRADEOFFS AND GOTCHAS

1.8.1 Relative advantages and disadvantages

1.8.2 Bias and variance

1.8.3 Overfitting and underfitting

1.8.4 Vanishing/exploding gradients

1.8.5 Missing data

1.8.6 Data leakage

MACHINE LEARNING ALGORITHMS AND LIBRARIES QUESTIONS

P1. You’re trying to classify images of cats and dogs. Plotting the images in some
transformed 2-dimensional feature space reveals the following pattern (on the left). In
some other space, images of dogs and wolves show a different pattern (on the right).

 What model would you use to classify cats vs. dogs, and what would you use for dogs vs.
wolves? Why?

P2. I’m trying to fit a single hidden layer neural network to a given dataset, and I find that the
weights are oscillating a lot over training iterations (varying wildly, often swinging between
positive and negative values). What parameter do I need to tune to address this issue?
P3. When training a support vector machine, what value are you optimizing for?
P4. Lasso regression uses the L1-norm of coefficients as a penalty term, while ridge
regression uses the L2-norm. Which of these regularization methods is more likely to
result in sparse solutions, where one or more coefficients are exactly zero?
P5. When training a 10-layer neural net using backpropagation, I find that the weights for the
top 3 layers are not changing at all! The next few layers (4-6) are changing, but very
slowly. What’s going on and how do I fix this?
P6. I’ve found some data about wheat-growing regions in Europe that includes annual rainfall
(R, in inches), mean altitude (A, in meters), and wheat output (O, in kg/km²). A rough
analysis and some plots make me believe that output is related to the square of rainfall
and the log of altitude: $O = \beta_0 + \beta_1 R^2 + \beta_2 \log_e(A)$
P7. Can I fit the coefficients (β) in my model to the data using linear regression?
