ML Interview Questions
- Overfitting: Model performs well on training data but poorly on unseen data.
- Underfitting: Model performs poorly on both training and test data, failing to capture the underlying patterns.
# Example metrics
training_error = 0.1 # Example training error
validation_error = 0.3 # Example validation error
1. Overfitting:
Overfitting happens when the model learns not only the underlying pattern but
also the noise and details of the training data. This leads to poor generalization
on new, unseen data.
# Signs of Overfitting:
- High accuracy on the training data but low accuracy on the test data.
- A large gap between training and validation errors.
# Causes:
- A model that is too complex (e.g., too many features or too many
parameters).
- Insufficient training data relative to the model complexity.
# Code Example:
Here’s an example using Ridge regression to address overfitting.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_regression
# Synthetic regression data (illustrative parameters)
X, y = make_regression(n_samples=100, n_features=20, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Ridge adds L2 regularization to penalize large coefficients and reduce overfitting
ridge_model = Ridge(alpha=1.0).fit(X_train, y_train)
# Predictions
train_pred = ridge_model.predict(X_train)
test_pred = ridge_model.predict(X_test)
# Compare errors: a large train/test gap signals overfitting
print("Train MSE:", mean_squared_error(y_train, train_pred))
print("Test MSE:", mean_squared_error(y_test, test_pred))
2. Underfitting:
Underfitting occurs when the model is too simple to capture the underlying
patterns in the data. It results in poor performance on both the training and test
datasets.
# Signs of Underfitting:
- High error on both the training and validation/test datasets.
- The model fails to capture the complexity of the data.
# Causes:
- A model that is too simple (e.g., too few features or low model
complexity).
- Insufficient training time or iterations (in neural networks).
# Code Example:
Here’s an example using a polynomial regression to solve underfitting.
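(A minimal sketch; the synthetic dataset and polynomial degree below are illustrative assumptions.)
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error
# Non-linear data that a plain straight-line model would underfit
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X.ravel() ** 2 + np.random.normal(0, 0.5, 100)
# A degree-2 polynomial transformation lets the linear model capture the curve
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)
print("Train MSE:", mean_squared_error(y, poly_model.predict(X)))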
In this case, using a polynomial transformation helps the model capture more
complex patterns and reduce underfitting.
Summary:
- Overfitting: Model performs well on training data but poorly on test data.
- Fixes: Regularization, simpler models, more data, cross-validation.
- Underfitting: Model performs poorly on both training and test data.
- Fixes: Increase model complexity, train longer, better feature
engineering.
- Loss Function: Measures how well a model's predictions match the actual
outcomes (error for one training example).
- Cost Function: The average of loss functions across all training examples.
- Use in Project: You used loss and cost functions to evaluate the performance
of your machine learning models during training.
- Code Example:
import numpy as np
def mean_squared_error(y_true, y_pred):
    # Cost function: average of the squared errors across all examples
    return np.mean((y_true - y_pred) ** 2)
# Example usage
y_true = np.array([1, 2, 3])
y_pred = np.array([1, 2, 4])
print(mean_squared_error(y_true, y_pred))  # Output: 0.333...
3) Regression Models:
- Models used to predict a continuous outcome based on independent variables (e.g., linear regression, ridge regression). Note that logistic regression, despite its name, is a supervised classification model.
1. Label Encoding
- Description: Each category is assigned a unique integer label. This
method is suitable for ordinal data where the categories have an inherent order.
- Example:
- Colors: Red = 0, Green = 1, Blue = 2
2. One-Hot Encoding
- Description: Converts each category into a new binary column (0s and
1s). This method is useful for nominal data, where categories do not have an order.
- Example:
- Colors: Red = [1, 0, 0], Green = [0, 1, 0], Blue = [0, 0, 1]
3. Binary Encoding
- Description: Each category is first converted into an integer, then
that integer is converted into binary code. This method is efficient for high
cardinality categorical variables.
- Example:
- Colors: Red = 0 (00), Green = 1 (01), Blue = 2 (10) →
- Red = [0, 0], Green = [0, 1], Blue = [1, 0]
4. Frequency Encoding
- Description: Each category is replaced with the frequency of its
occurrence in the dataset. This can help retain some information about the
distribution of categories.
- Example:
- If Red appears 10 times, Green 5 times, and Blue 15 times:
- Red = 10, Green = 5, Blue = 15
6. Ordinal Encoding
- Description: Similar to label encoding, but it specifically considers
the ordinal nature of the categorical data. Each category is assigned a rank based
on its order.
- Example:
- Sizes: Small = 1, Medium = 2, Large = 3
7. Hash Encoding
- Description: Categories are hashed into a fixed number of dimensions,
which helps handle high cardinality without creating too many columns. However, it
may lead to collisions.
- Example:
- Colors: Using a hash function that generates a fixed number of binary
columns.
8. Count Encoding
- Description: Similar to frequency encoding, but the categories are
replaced with their count in the dataset.
- Example:
- Red = 10, Green = 5, Blue = 15
9. Custom Encoding
- Description: Users can define their encoding scheme based on domain
knowledge, often combining several of the above methods.
- Example: Assigning specific numerical values based on business logic.
These encoding techniques play a crucial role in preparing data for machine
learning models, helping ensure that algorithms can effectively learn from the data
provided.
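As a rough illustration (the color data and column names are made up), here is how label, one-hot, and frequency encoding can be done with pandas and scikit-learn:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
colors = pd.DataFrame({"color": ["Red", "Green", "Blue", "Red"]})
# Label encoding: each category becomes an integer
colors["color_label"] = LabelEncoder().fit_transform(colors["color"])
# One-hot encoding: each category becomes its own 0/1 column
one_hot = pd.get_dummies(colors["color"], prefix="color")
# Frequency encoding: replace each category with how often it appears
colors["color_freq"] = colors["color"].map(colors["color"].value_counts())
print(colors.join(one_hot))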
Boosting:
- Goal: Reduce bias (error) and create a strong model by focusing on mistakes.
- Method: Models are trained sequentially, with each new model focusing
on correcting the errors made by the previous ones. The final prediction is a
weighted combination of all models.
- Popular Algorithms: AdaBoost, Gradient Boosting, XGBoost.
- Key Benefit: Increases accuracy by focusing on the hardest-to-predict
examples.
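A minimal sketch of boosting with scikit-learn's GradientBoostingClassifier (the synthetic dataset and hyperparameters are illustrative assumptions):
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Trees are added sequentially; each new tree fits the errors of the current ensemble
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
gb.fit(X_train, y_train)
print("Test accuracy:", gb.score(X_test, y_test))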
---
8) percentage-->
The percentage is a way of expressing a number as a fraction of 100. It’s
used to compare relative proportions or to quantify parts of a whole in a
standardized way. For example, 40% means 40 out of every 100.
---
9) variance-->
Variance measures the spread of data points around the mean in a dataset.
It’s calculated as the average of the squared deviations from the mean and gives
insight into data variability. A higher variance indicates more spread-out data.
---
10) standard deviation-->
Standard deviation is the square root of the variance. It quantifies the
amount of variation or dispersion in a dataset, showing how much individual data
points typically deviate from the mean.
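A quick NumPy sketch of both quantities (the data values are illustrative):
import numpy as np
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
# Variance: average squared deviation from the mean
variance = np.mean((data - data.mean()) ** 2)
# Standard deviation: square root of the variance
std_dev = np.sqrt(variance)
print(variance, std_dev)  # 4.0 2.0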
---
11) Hypothesis testing-->
Hypothesis testing is a statistical method used to determine if there is
enough evidence in a sample to infer a condition about the larger population. It
involves making an initial assumption (null hypothesis), collecting data, and then
determining whether the data provides enough reason to reject this assumption.
---
12) Types of hypothesis testing-->
Common types include:
- T-tests: Compare means between groups (e.g., two-sample t-test).
- ANOVA (Analysis of Variance): Compare means among three or more groups.
- Chi-Square Test: Test relationships between categorical variables.
- Z-tests: Often used for large samples to compare means.
- Non-parametric Tests: Tests like the Mann-Whitney or Wilcoxon for non-
normal data.
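A minimal sketch of a two-sample t-test with SciPy (the two samples are synthetic assumptions):
import numpy as np
from scipy import stats
rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=5, size=30)
group_b = rng.normal(loc=53, scale=5, size=30)
# Null hypothesis: the two groups have equal means
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# If p < 0.05, reject the null hypothesis at the 5% significance level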
---
13) When do we do hypothesis testing and why?
Hypothesis testing is used to make informed decisions based on sample data.
It helps to confirm or refute assumptions about population parameters and determine
statistical significance. For example, testing if a new treatment is effective or
if there's a relationship between two variables.
---
14) difference between SQL and pandas: which is used for which purposes in data scientist roles/projects?
- SQL: Used for querying and managing structured data in relational
databases. It's essential for extracting, filtering, and aggregating large datasets
directly from databases.
- Pandas: A Python library for data manipulation and analysis, ideal for in-
memory data processing. It’s commonly used for complex data transformations,
analysis, and feature engineering in data science workflows.
Data scientists use SQL for database operations and Pandas for more detailed
data analysis and processing.
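As a rough illustration (the table, column names, and data are made up), here is the same aggregation expressed in pandas, with the equivalent SQL shown as a comment:
import pandas as pd
# Equivalent SQL, run directly against a database table called sales:
#   SELECT region, SUM(amount) AS total_sales FROM sales GROUP BY region;
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "amount": [100, 200, 150, 50],
})
total_sales = sales.groupby("region")["amount"].sum()
print(total_sales)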
---
15) what is time series
A time series is a sequence of data points collected or recorded at regular
time intervals. Examples include stock prices, weather data, and sales figures over
time. Time series analysis helps identify trends, seasonality, and patterns in
data.
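A small pandas sketch (the dates and values are synthetic assumptions):
import numpy as np
import pandas as pd
# Daily observations recorded at regular intervals
dates = pd.date_range("2024-01-01", periods=90, freq="D")
sales = pd.Series(np.random.default_rng(0).normal(100, 10, size=90), index=dates)
# A 7-day rolling mean smooths out noise and exposes the underlying trend
trend = sales.rolling(window=7).mean()
print(trend.tail())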
---
16) random oversampling and undersampling?
- Oversampling: Increases the representation of minority classes by
duplicating instances to balance the class distribution in datasets.
- Undersampling: Reduces the representation of majority classes by removing
instances to balance class distribution.
Both techniques are used to address class imbalance in classification
problems.
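A minimal sketch of both techniques using sklearn.utils.resample (the imbalanced toy dataset is an assumption):
import pandas as pd
from sklearn.utils import resample
# Imbalanced toy dataset: 90 majority-class rows (0) vs 10 minority-class rows (1)
df = pd.DataFrame({"feature": range(100), "label": [0] * 90 + [1] * 10})
majority = df[df.label == 0]
minority = df[df.label == 1]
# Random oversampling: duplicate minority rows until the classes are balanced
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=42)
balanced_over = pd.concat([majority, minority_up])
# Random undersampling: drop majority rows until the classes are balanced
majority_down = resample(majority, replace=False, n_samples=len(minority), random_state=42)
balanced_under = pd.concat([majority_down, minority])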
---
17) de-duplication in random forest?
Random Forest is inherently resilient to duplicates since it’s an ensemble
method that averages results from many decision trees, each using random samples of
data. However, pre-processing steps like removing duplicates can help improve model
performance and efficiency in some cases.
18) differences?
1) difference between Random Forest and XGBoost?
   1) Model building: Random Forest uses ensemble learning with independently built decision trees; XGBoost uses sequential ensemble learning, with each tree correcting the errors of the previous ones.
   2) Optimization approach: Random Forest makes predictions by averaging individual tree outputs; XGBoost employs gradient boosting to minimize a loss function and improve accuracy iteratively.
   3) Handling unbalanced datasets: Random Forest can struggle a bit; XGBoost handles it like a pro.
   4) Ease of tuning: Random Forest is simple and straightforward; XGBoost requires more practice but offers higher accuracy.
   5) Adaptability to distributed computing: Random Forest works well with multiple machines; XGBoost needs more coordination but can handle large datasets efficiently.
   6) Handling large datasets: Random Forest can handle them but may slow down with very large data; XGBoost is built for speed, perfect for big datasets.
   7) Predictive accuracy: Random Forest is good, but not always the most precise; XGBoost offers superior accuracy, especially in tough situations.
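A minimal sketch contrasting the two models (assumes the xgboost package is installed; the dataset and hyperparameters are illustrative):
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Random Forest: independently built trees, predictions averaged (bagging)
rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
# XGBoost: trees built sequentially, each correcting the previous ones (boosting)
xgb = XGBClassifier(n_estimators=200, learning_rate=0.1, random_state=42).fit(X_train, y_train)
print("Random Forest accuracy:", rf.score(X_test, y_test))
print("XGBoost accuracy:", xgb.score(X_test, y_test))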
4) difference between linear and logistic regression?
   1) Linear Regression (Regression) is a supervised regression model that predicts a continuous value; Logistic Regression (Classification) is a supervised classification model that predicts a class (e.g., whether a tumour is benign or malignant).
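A minimal sketch of both (the tiny datasets are made-up assumptions):
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
X = np.array([[1], [2], [3], [4], [5], [6]])
# Linear regression predicts a continuous value
y_cont = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0])
lin = LinearRegression().fit(X, y_cont)
print(lin.predict([[7]]))  # continuous output
# Logistic regression predicts a class label (e.g., benign = 0 vs malignant = 1)
y_cls = np.array([0, 0, 0, 1, 1, 1])
log_reg = LogisticRegression().fit(X, y_cls)
print(log_reg.predict([[7]]), log_reg.predict_proba([[7]]))  # class and probability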
19) Assumptions in linear regression?
   1. There is a linear relationship between the dependent and independent variables.
   2. The bias (average residual error) is very low.
   3. There is little or no multicollinearity among the independent variables.
22) Ensemble techniques?
   1) Bagging --- Random Forest
   2) Boosting --- XGBoost
For a binary classification, the Gini index for a node is calculated as:
   Gini = 1 - (p^2 + q^2)
where:
   (p) is the probability of one class,
   (q) is the probability of the other class (for binary classes, q = 1 - p).
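A small sketch computing the Gini index of a binary label column with pandas (the labels are a made-up example):
import pandas as pd
labels = pd.Series([1, 0, 1, 1, 0, 1, 1, 0])
p = labels.mean()  # probability of class 1
q = 1 - p          # probability of class 0
gini = 1 - (p ** 2 + q ** 2)
print(gini)  # 0 = pure node, 0.5 = maximally mixed (binary case)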
Q) What is meant by entropy? How will you find the entropy of a column of a dataset?------->Code
   Entropy measures the degree of disorder (impurity) in the data; a code sketch is shown below.
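A small sketch (the column values are a made-up example) computing the entropy of a dataset column:
import numpy as np
import pandas as pd
column = pd.Series(["yes", "no", "yes", "yes", "no", "yes"])
# Entropy = -sum(p_i * log2(p_i)) over the class probabilities in the column
probs = column.value_counts(normalize=True)
entropy = -np.sum(probs * np.log2(probs))
print(entropy)  # 0 = perfectly pure; 1 = maximally disordered for two classes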
Q) CART Algorithm?
Q)