
ML - Interview Prep

The document discusses different machine learning algorithms including supervised algorithms like support vector machines, regression, naive bayes, random forest classifier and unsupervised algorithms like PCA and clustering. It also covers concepts like overfitting, underfitting, cross-validation, loss functions and cost functions.


ML_Interview

Wednesday, June 28, 2023 11:19 PM

• Define Machine Learning
○ A field that focuses on developing algorithms and models
○ that enable computers to learn and make predictions or decisions
○ without being explicitly programmed.
• Machine Learning algorithms:
○ Supervised:
§ Define
□ machine learning algorithms that require labeled
training data
§ supervised learning algorithms
□ Support Vector Machines
® Facts on SVM
◊ It finds a hyperplane of dimension (number of features - 1)
◊ With two features, Height and Weight, the hyperplane is defined by the equation:
◊ w1 * Height + w2 * Weight - b = 0
◊ We need to find the values of w1, w2, and b that define the hyperplane.
◊ To find the hyperplane with the maximum margin, we minimize ||w|| subject to the constraint:
◊ yi * (w · xi - b) >= 1 for all data points, with yi in {-1, +1}
◊ The support vectors lie exactly on one of the two margin hyperplanes:
◊ w · xi - b = 1 or w · xi - b = -1
◊ (A minimal code sketch follows below.)
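A minimal Python sketch of the above, assuming scikit-learn is available; the height/weight points, labels, and parameter choices are made up for illustration only:

```python
# Fit a linear SVM on toy data and read off the hyperplane parameters w1, w2, b.
import numpy as np
from sklearn.svm import SVC

# Toy data: [Height, Weight] with labels y in {-1, +1} (illustrative values)
X = np.array([[150, 50], [155, 55], [160, 54],   # class -1
              [175, 80], [180, 85], [185, 90]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3)   # large C approximates a hard margin
clf.fit(X, y)

w1, w2 = clf.coef_[0]               # weights of the separating hyperplane
b = -clf.intercept_[0]              # so the hyperplane is w1*Height + w2*Weight - b = 0

# Support vectors satisfy y_i * (w · x_i - b) ≈ 1 (they lie on the margins)
margins = y * (X @ clf.coef_[0] - b)
print(w1, w2, b, margins)
```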
□ Regression
® It deals with continuous data
® It predicts the relationship the data represents
® It is used to find a mathematical function that explains
® the relationship between the independent variables and the dependent variable
® Types:
◊ Logistic regression:
} z = β0 + β1*X1 + β2*X2 + ... +
βn*Xn
} Sigmoid function
– P(Y=1|X) = 1 / (1 + exp(-z))
◊ Multinomial logistic regression
} Softmax function
– P(Y=k|X) = softmax(z)_k = exp(z_k) / (Σ_i exp(z_i))
} (a NumPy sketch of both functions follows below)
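A small NumPy sketch of the two link functions above; the coefficient and input values are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    """P(Y=1|X) for binary logistic regression: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Class probabilities for multinomial logistic regression."""
    z = z - np.max(z)                  # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

# Example: z = beta0 + beta1*X1 + beta2*X2 with illustrative coefficients
beta = np.array([0.5, 1.2, -0.7])      # [beta0, beta1, beta2] (made-up values)
x = np.array([1.0, 2.0, 3.0])          # [1, X1, X2], with a constant 1 for the bias term
print(sigmoid(beta @ x))               # binary class probability
print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities over 3 classes
```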
□ Naive Bayes
® Facts on Naive Bayes
◊ It applies Bayes' theorem with a "naive" conditional independence assumption
◊ Given the class y, the features are assumed to be independent of one another
◊ e.g. the presence or absence of feature x1 is treated as independent of feature x2, given y
◊ (A small sketch of the Gaussian variant follows below.)
® The different naive Bayes classifiers
◊ Bernoulli, multinomial, Gaussian
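A quick sketch of the Gaussian variant using scikit-learn (assumed installed); the toy data is made up:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0, 2.1], [1.2, 1.9], [3.0, 3.5], [3.2, 3.8]])
y = np.array([0, 0, 1, 1])

model = GaussianNB()                       # models each feature as Gaussian per class
model.fit(X, y)
print(model.predict([[1.1, 2.0]]))         # predicted class for a new record
print(model.predict_proba([[1.1, 2.0]]))   # per-class posterior, features treated as independent
```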
□ Random forest Classifier:
® Facts:
◊ It utilizes ensemble learning, combining multiple decision trees to make its prediction
® Workings:
◊ Random sampling (bootstrapping)
} Randomly sample, with replacement, a subset of the original dataset for each decision tree
} Ensure it is the same size as the original dataset
} It may therefore contain duplicate records
◊ Randomly select a subset of the features for node splitting
◊ Decision tree construction
} Construct each decision tree by recursively splitting nodes
} Split thresholds that maximize information gain are chosen
} Classification criteria:
– Gini impurity = 1 - (p0^2 + p1
^2)
– Entropy = -p0 * log2(p0) - p1
* log2(p1)
} Regression:
– MSE = (1/n) * Σ(y - ŷ)^2

◊ Voting and aggregation of the individual trees' outputs for the final prediction (a sketch of the split criteria follows below)
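A minimal NumPy sketch of the split criteria listed above (Gini impurity, entropy, and MSE); function names are illustrative:

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity = 1 - (p0^2 + p1^2) for a binary node."""
    p1 = np.mean(labels)
    p0 = 1.0 - p1
    return 1.0 - (p0 ** 2 + p1 ** 2)

def entropy(labels):
    """Entropy = -p0*log2(p0) - p1*log2(p1), with 0*log(0) treated as 0."""
    p1 = np.mean(labels)
    p0 = 1.0 - p1
    terms = [p * np.log2(p) for p in (p0, p1) if p > 0]
    return -sum(terms)

def mse(y, y_hat):
    """Regression criterion: MSE = (1/n) * Σ(y - ŷ)^2."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

print(gini_impurity([0, 0, 1, 1]))   # 0.5 (maximally impure binary node)
print(entropy([0, 0, 1, 1]))         # 1.0 bit
print(mse([3.0, 5.0], [2.5, 5.5]))   # 0.25
```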
□ K-nearest Neighbor Algorithm
□ Neural Networks.
○ Unsupervised
§ Define
□ machine learning algorithms that don't involve a labeled training data set
□ There is no dependent variable, Y
§ Unsupervised Learning Algorithms:
□ PCA (Principal Component Analysis)
® Facts:
◊ A dimensionality reduction technique
◊ It discards directions (components) with little variation or variance
◊ It ensures the most informative feature combinations are used in the model
® Procedure
◊ Standardize the features
◊ Build the covariance matrix of the standardized features
◊ Find its eigenvectors and eigenvalues
◊ Keep the leading eigenvectors whose eigenvalues explain roughly the top 80% of the variance: the principal components
◊ Project the standardized dataset onto those principal components
◊ (A sketch of this procedure follows below.)
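A NumPy sketch of this PCA procedure; the 80% variance cutoff follows the note above and is a common, but not universal, choice:

```python
import numpy as np

def pca_project(X, variance_to_keep=0.80):
    # 1. Standardize the features
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigenvectors and eigenvalues (eigh: the covariance matrix is symmetric)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]              # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Keep the leading components explaining ~80% of the total variance
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = np.searchsorted(explained, variance_to_keep) + 1
    components = eigvecs[:, :k]                    # the principal components
    # 5. Project the standardized data onto those components
    return X_std @ components

X = np.random.default_rng(0).normal(size=(100, 5))
print(pca_project(X).shape)
```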
□ Clustering
® Facts:
◊ Groups records into clusters
◊ Similar records are in the same cluster
◊ Dissimilar records are not in the same
cluster
® Types:
◊ K means clustering
} Process:
– Choose k and randomly initialize the k centroids
– Assignment:
w Assign each data point to the nearest centroid
– Update centroids:
w Recalculate each centroid as the mean of the points assigned to its cluster
– Finally:
w Repeat the assignment and update steps
w until the centroids converge (a code sketch follows the clustering list below)

◊ Hierarchical clustering
◊ Density-based clustering
◊ Fuzzy clustering, etc.
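A minimal NumPy sketch of the K-means steps listed above; the data, the function name, and the seed are illustrative, and empty-cluster handling is omitted for brevity:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: randomly initialize the k centroids from the data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: update each centroid as the mean of its cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):   # converged
            break
        centroids = new_centroids
    return centroids, labels

X = np.vstack([np.random.default_rng(1).normal(0, 1, (50, 2)),
               np.random.default_rng(2).normal(5, 1, (50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```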

□ Anomaly Detection,
□ Neural Networks and Latent Variable Models.
○ Reinforcement learning
• Cross-Validation
○ Split all data into
§ Training
□ Overfitting:
® Fact:
◊ The model learns the training dataset too
well.
◊ The model does poorly on the testing
data
® Solution:
◊ Increasing training dataset
◊ Cross-Validation
□ Underfitting:
® Fact:
◊ Fails to capture patterns in the data:
◊ Poor performance on both training and
testing
® Solution:
◊ Increase training data
◊ Hyperparameter tuning
◊ Increase model complexity
◊ Add more features
□ Loss Function
® Facts:
◊ It involves calculating the disparity or error between the predicted value and the actual value, for a single record or data point.
® Examples of loss functions:
◊ Mean-Squared Error (MSE):
} Facts:
– Used in regression tasks
} Loss for a single record = (predicted value - actual value)^2
} Averaged over n records: MSE = (1/n) * Σ(predicted value - actual value)^2
◊ Hinge loss:
} Facts:
– It is used to train maximum-margin classifiers such as SVMs
} Hinge Loss = max(0, 1 - y * f(x))
– y = -1 or 1 (true label)
– f(x) = predicted score
□ Cost Functions
® It aggregates the loss over all data points, e.g. as their mean or sum (see the sketch below)
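A short NumPy sketch of the loss vs. cost distinction above; the function names are illustrative:

```python
import numpy as np

def squared_error(y_true, y_pred):
    """Per-record regression loss: (predicted - actual)^2."""
    return (y_pred - y_true) ** 2

def hinge_loss(y_true, score):
    """Per-record classifier loss: max(0, 1 - y * f(x)), with y in {-1, +1}."""
    return max(0.0, 1.0 - y_true * score)

def cost(losses):
    """The cost function aggregates the per-record losses (here, their mean)."""
    return float(np.mean(losses))

print(squared_error(3.0, 2.5))     # 0.25 for one record
print(hinge_loss(+1, 0.3))         # 0.7 (the point is inside the margin)
print(cost([squared_error(3.0, 2.5), squared_error(5.0, 5.5)]))  # 0.25
```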
§ Testing
§ Validation set
○ The dataset is divided into k equally sized folds.
○ Iteratively, for each fold:
§ The model is trained on the combination of k-1 folds
(training set).
§ The trained model is evaluated on the remaining fold
(testing set).
○ The performance metrics (e.g., accuracy, precision, recall) are
recorded for each iteration.
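A k-fold cross-validation sketch using scikit-learn (assumed installed); the generated dataset and the choice of a random forest as the model are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=0)    # k equally sized folds

scores = []
for train_idx, test_idx in kf.split(X):
    model = RandomForestClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])                # train on the k-1 folds
    preds = model.predict(X[test_idx])                   # evaluate on the held-out fold
    scores.append(accuracy_score(y[test_idx], preds))    # record the metric per iteration

print(np.mean(scores), np.std(scores))
```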
• Bias in Machine Learning
○ Biases are systematic errors or inconsistencies in the dataset
○ The inconsistencies are not mutually exclusive
○ They can intertwine and affect each other
• F1 score
○ Facts:
§ Evaluates the performance and accuracy of classification
models
§ It combines precision and recall for its accuracy value
○ Components:
§ Precision
□ Out of the ones predicted to be true
□ Which did it get right
□ Precision = TP / (TP + FP)
§ Recall
□ Out of the ones that were actually true
□ Which did it get right
□ Recall = TP / (TP + FN)
○ F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
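A small sketch computing precision, recall, and F1 from confusion counts; the counts are made up:

```python
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)   # of the predicted positives, how many were right
    recall = tp / (tp + fn)      # of the actual positives, how many were found
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=40, fp=10, fn=20))   # precision 0.8, recall ~0.667 -> F1 ~0.727
```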
• Neural Network:
○ It is a computational model inspired by the brain
○ It consists of interconnected artificial neurons, organized in layers
○ They take in inputs and process them through a series of mathematical operations
○ Through this process it can learn patterns from data and perform classification and regression tasks
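A tiny NumPy sketch of the idea: inputs flow through layers of neurons, each applying a weighted sum and a non-linearity. The weights here are random and untrained, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])                    # input features

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)     # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)     # output layer: 1 neuron

hidden = np.maximum(0, W1 @ x + b1)               # ReLU activation
output = 1 / (1 + np.exp(-(W2 @ hidden + b2)))    # sigmoid output for classification
print(output)
```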
• Ensemble learning
○ Facts:
§ It involves combining several models to create a more
powerful and robust model.
§ It also helps a model balance between bias and variance
□ The bias-variance tradeoff involves a model's ability to fit the training data (low bias) while also generalizing to unseen data (low variance)
§ Example of Ensemble learning
□ Voting:
® Independent base learners predict the class or
value
® For regression, we compute the average for the final prediction
® For classification, we compute the mode (majority vote) for the final value
□ Bagging
® It is voting where each base learner is trained on a random sample (and a random subset of features) of the dataset
□ Boosting:
® It involves a sequence of base learners
® Subsequent learners focus on the records the previous model got wrong
® To get the aggregate prediction, a weighted combination of the model outputs can be used (a sketch of all three styles follows below)
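A small scikit-learn sketch of the three ensemble styles above; the model choices, the generated data, and the training-set scores are purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, BaggingClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Voting: independent base learners, mode of their class predictions
voting = VotingClassifier([("lr", LogisticRegression(max_iter=1000)),
                           ("nb", GaussianNB()),
                           ("dt", DecisionTreeClassifier())], voting="hard")

# Bagging: voting over base learners trained on random bootstrap samples
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Boosting: a sequence of learners, each focusing on previously misclassified records
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("voting", voting), ("bagging", bagging), ("boosting", boosting)]:
    print(name, model.fit(X, y).score(X, y))   # training-set accuracy, for illustration only
```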
• Exploratory Data Analysis (EDA)
○ FACT
§ there is no specific way that lets us know which ML
algorithm to use
§ If the target is discrete, we use classification models, like SVM
§ If the target is continuous, we use regression models
○ EDA
§ Data collection and inspection
□ We classify our features as continuous or categorical
§ Data visualization
§ Perform descriptive statistics
□ Mean, median, std, variance
§ Data cleaning and preprocessing
□ Handle missing data
□ Handle Outliers:
® Facts:
◊ Outliers are data points that significantly deviate from the majority of data points
◊ They can occur due to
} Rare events
} Error or data corruption
® Detect outliers
◊ Z-score:
} Z-scores beyond a certain
threshold are outliers: 2 or 3
◊ IQR (Q1 = 25th percentile, Q3 = 75th percentile, t typically 1.5):
} Outliers < Q1 - (t * IQR)
} Outliers > Q3 + (t * IQR)
} (see the sketch after this EDA list)
◊ Data visualization
} Box plots
} Histogram
} Scatter plot
® Handling outliers:
◊ Replace outlier with median, mode
◊ Transform feature: log transformation
◊ Create an additional binary feature to
indicate outlier
◊ Remove the outlier
□ Encoding of categorical data
§ ML algorithm selection
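A NumPy sketch of the z-score and IQR outlier rules described above; the data is made up, and the thresholds (2 for z-scores, 1.5 for the IQR) are common defaults rather than the only choices:

```python
import numpy as np

x = np.array([10, 12, 11, 13, 12, 11, 10, 95], dtype=float)   # 95 is an obvious outlier

# Z-score rule: flag points more than 2 standard deviations from the mean
z = (x - x.mean()) / x.std()
z_outliers = x[np.abs(z) > 2]

# IQR rule: flag points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
iqr_outliers = x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)]

print(z_outliers, iqr_outliers)   # both rules flag 95 here
```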
• Recommender System
○ Fact:
§ Predicts user interest
§ Recommend products based on predicted interest
○ Data used:
§ User ratings
§ Search engine queries
§ Purchase histories
○ Collaborative filtering vs Content-based filtering
Collaborative filtering Content-based filtering
Leverages the preferences and It focusses of utilizing the
behaviors of similar users to make features and characteristics
recommendations of items and user
preferences
○ Only Effective with large data on Doesn’t require large data
Leverages the preferences and It focusses of utilizing the
behaviors of similar users to make features and characteristics
recommendations of items and user
preferences
○ Only Effective with large data on Doesn’t require large data
user-item interaction on user-item interaction
Can capture complex user Content-Based Filtering
preferences and unexpected item focuses on matching item
associations features with user
preferences.
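A toy NumPy sketch of the collaborative-filtering idea: predict a user's rating for an unseen item from the ratings of similar users. The ratings matrix and the cosine-similarity weighting are made up for illustration:

```python
import numpy as np

# Rows = users, columns = items; 0 means "not rated yet" (made-up values)
ratings = np.array([[5, 4, 0, 1],
                    [4, 5, 0, 1],
                    [1, 1, 5, 4],
                    [0, 1, 4, 5]], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

target_user, target_item = 0, 2             # predict user 0's rating for item 2
sims = np.array([cosine(ratings[target_user], ratings[u]) for u in range(len(ratings))])
sims[target_user] = 0.0                     # exclude the user themselves

# Weighted average of other users' ratings for the item, weighted by similarity
mask = ratings[:, target_item] > 0
pred = (sims[mask] @ ratings[mask, target_item]) / sims[mask].sum()
print(round(pred, 2))
```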

• Normality of a dataset:
○ Fact:
§ We check whether the data is normally distributed
○ Tests
§ Histogram:
□ From the histogram plot, we can check for symmetric
bell shape
§ Quantile-Quantile plot:
□ plot the theoretical quantiles (x-axis) against the observed sample quantiles (y-axis)
□ Points falling along a straight line suggest normality
§ Shapiro-Wilk Test:
□ W = W statistic
□ Wc = critical value
□ Small values of W indicate departure from normality
□ If W < Wc (equivalently, p-value < α), we reject the null hypothesis of normality.
□ If W ≥ Wc (p-value ≥ α), we fail to reject the null hypothesis of normality (a sketch follows below)
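A sketch of these normality checks using SciPy (assumed installed). SciPy's shapiro reports a p-value, which is compared to a significance level such as 0.05 rather than comparing W to a tabulated Wc directly:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=200)   # illustrative sample

w_stat, p_value = stats.shapiro(sample)
if p_value < 0.05:                               # equivalently, W below its critical value
    print(f"W={w_stat:.3f}: reject normality")
else:
    print(f"W={w_stat:.3f}: fail to reject normality")

# Q-Q check: theoretical normal quantiles vs. observed quantiles should be roughly linear
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(r)   # r close to 1 indicates an approximately straight Q-Q line
```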
