VIVA
1. What is Bagging?
● Answer: Bagging (bootstrap aggregating) involves training multiple models (usually of
the same type) on different subsets of the training data, created by bootstrapping
(sampling with replacement). The final prediction is made by averaging (for regression)
or voting (for classification) across all the models.
● Advantages:
○ Reduces overfitting.
○ Improves model stability and accuracy.
● Disadvantages:
○ Can be computationally expensive.
○ Does little to reduce bias, so it may not help when the base model underfits
(i.e., has high bias).
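A minimal sketch with scikit-learn (the dataset and variable names are illustrative, not part of the original notes):

```python
# Bagging: 50 decision trees, each fit on a bootstrap sample of the training rows.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
bag.fit(X_train, y_train)
print("Test accuracy:", bag.score(X_test, y_test))  # majority vote across the 50 trees
```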
2. What is Random Forest?
● Answer: Random Forest is an ensemble method built on bagging: many decision trees
are trained on bootstrap samples, each split considers only a random subset of the
features (which decorrelates the trees), and the trees' predictions are averaged or
voted upon.
● Advantages:
○ Handles large datasets well.
○ Robust to overfitting and noise.
● Disadvantages:
○ Can be computationally intensive.
○ Less interpretable compared to a single decision tree.
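A minimal scikit-learn sketch (dataset and names are illustrative):

```python
# Random Forest: each tree sees a bootstrap sample of rows and a random
# subset of features at every split.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
```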
3. What is Boosting?
● Answer: Boosting is an ensemble technique that trains models sequentially. Each new
model corrects the errors made by the previous one, focusing on difficult-to-classify
instances.
● Advantages:
○ Often improves performance and accuracy.
○ Can turn weak learners into strong learners.
● Disadvantages:
○ Prone to overfitting, especially with noisy data.
○ Can be computationally expensive due to sequential training.
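A minimal sketch using AdaBoost, one common boosting algorithm (dataset and names are illustrative):

```python
# AdaBoost: each successive weak learner (a shallow tree by default) upweights
# the training instances the previous learners misclassified.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

boost = AdaBoostClassifier(n_estimators=50, random_state=42)
boost.fit(X_train, y_train)
print("Test accuracy:", boost.score(X_test, y_test))
```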
4. What is an Ensemble of Machine Learning Algorithms (Ensemble
Classifier)?
● Answer: Ensemble methods combine the predictions of multiple models (either the
same type or different types) to improve overall accuracy. Common methods include
bagging, boosting, and stacking.
● Advantages:
○ Can outperform individual models by reducing bias and variance.
○ More robust to overfitting and can handle complex data.
● Disadvantages:
○ More computationally demanding.
○ Difficult to interpret due to multiple models being combined.
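A minimal sketch of a voting ensemble that combines different model types (dataset and names are illustrative):

```python
# Voting ensemble: three different model types, combined by majority (hard) vote.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier()),
    ("knn", KNeighborsClassifier()),
], voting="hard")
ensemble.fit(X_train, y_train)
print("Test accuracy:", ensemble.score(X_test, y_test))
```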
5. What is Exploratory Data Analysis (EDA)?
● Answer: EDA is the process of analyzing data sets to summarize their main
characteristics, often using visual methods (such as histograms and box plots) and
statistical methods (mean, median, variance).
● Advantages:
○ Helps in understanding the underlying data patterns and structures.
○ Identifies potential issues such as outliers, missing values, or correlations.
● Disadvantages:
○ Can be time-consuming.
○ Requires domain knowledge to interpret the results accurately.
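A minimal pandas sketch (the toy dataset is hypothetical):

```python
import pandas as pd

# Hypothetical toy dataset, just for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, None, 29],
    "income": [40000, 52000, 88000, 95000, 61000, 58000, 430000],  # note the outlier
})

print(df.describe())    # count, mean, std, min, quartiles (median = 50%), max
print(df.isna().sum())  # missing values per column
print(df.corr())        # pairwise correlations
df.hist()               # histograms per column (requires matplotlib)
```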
6. What are Association Rules?
● Answer: Association rules are used to identify relationships between items in
transaction data, typically measured by support (how often an itemset appears) and
confidence (how often the rule holds when its antecedent appears). The goal is to
discover items that frequently occur together.
● Advantages:
○ Useful for market basket analysis and recommendation systems.
○ Helps in identifying patterns and relationships in data.
● Disadvantages:
○ May lead to a large number of trivial rules.
○ Rules can be computationally expensive to generate.
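A hand-computed sketch of support and confidence for one candidate rule (the transactions are made up for illustration):

```python
# Toy transactions; evaluate the rule {bread} -> {milk}.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

n = len(transactions)
support_bread_milk = sum({"bread", "milk"} <= t for t in transactions) / n  # 2/4
support_bread = sum("bread" in t for t in transactions) / n                 # 3/4

confidence = support_bread_milk / support_bread  # P(milk | bread) = 0.67
print(f"support={support_bread_milk:.2f}, confidence={confidence:.2f}")
```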
7. What is the Apriori Algorithm?
● Answer: The Apriori algorithm is used to find frequent itemsets in a dataset and
generate association rules based on minimum support and confidence thresholds. It
prunes the search using the Apriori property: every subset of a frequent itemset must
itself be frequent.
● Advantages:
○ Simple and easy to understand.
○ Can be applied to large datasets with sparse transactions.
● Disadvantages:
○ Can be slow for very large datasets.
○ Needs a lot of memory to store candidate itemsets.
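A minimal sketch assuming the third-party mlxtend library is installed (the one-hot transaction table is made up for illustration):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded transactions: rows = baskets, columns = items.
df = pd.DataFrame({
    "bread":  [1, 1, 1, 0],
    "milk":   [1, 0, 1, 1],
    "butter": [0, 1, 1, 0],
}).astype(bool)

frequent = apriori(df, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```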
Steps in Feature Engineering
1. Data Collection
● Gather raw data from sources such as databases, APIs, or flat files.
2. Data Cleaning
● Handle missing values, duplicates, and inconsistent entries.
3. Feature Transformation
● Rescale or transform features (e.g., standardization, normalization, log transforms)
so they are suitable for modeling.
5. Feature Creation
● Generate new features from existing ones (e.g., interaction terms, date components,
aggregations).
● Example: Create new features like price per unit from total price and
quantity.
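A minimal pandas sketch of that example (the order data is hypothetical):

```python
import pandas as pd

# Hypothetical order data.
orders = pd.DataFrame({"total_price": [10.0, 24.0, 7.5], "quantity": [2, 6, 3]})

# Derive a new feature from two existing ones.
orders["price_per_unit"] = orders["total_price"] / orders["quantity"]
print(orders)
```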
6. Feature Selection
● Select the most relevant features using techniques like correlation analysis, chi-square
tests, or model-based methods.
● Eliminate irrelevant or redundant features.
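A minimal sketch of model-free selection with a chi-square test (dataset and names are illustrative):

```python
# Keep the two features with the highest chi-square scores against the target.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)
print("Kept feature indices:", selector.get_support(indices=True))
```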
7. Outlier Detection and Handling
● Identify and handle outliers using methods like Z-scores or the IQR (Interquartile
Range) rule.
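A minimal sketch of the IQR rule (the values are made up for illustration):

```python
import pandas as pd

s = pd.Series([40000, 52000, 88000, 95000, 61000, 58000, 430000])  # hypothetical incomes

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]
print(outliers)  # only the 430000 entry is flagged
```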
8. Dimensionality Reduction
● Use techniques like PCA (Principal Component Analysis) to reduce the number of
features while retaining most of the information.
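A minimal PCA sketch with scikit-learn (dataset and names are illustrative):

```python
# Project the 4 iris features onto 2 principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```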
9. Encoding for High Cardinality Features
● Handle features with many unique categories using target encoding or frequency
encoding.
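A minimal frequency-encoding sketch in pandas (the column is hypothetical):

```python
import pandas as pd

# Hypothetical high-cardinality categorical column.
df = pd.DataFrame({"city": ["Pune", "Mumbai", "Pune", "Delhi", "Pune", "Mumbai"]})

# Frequency encoding: replace each category with its relative frequency.
freq = df["city"].value_counts(normalize=True)
df["city_freq"] = df["city"].map(freq)
print(df)
```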
10. Time-Based Features
● Create time-based features like day of the week, lag features, or moving averages for
time-dependent data.
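A minimal pandas sketch covering all three (the sales series is hypothetical):

```python
import pandas as pd

# Hypothetical daily sales series.
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=7, freq="D"),
    "sales": [100, 120, 90, 130, 110, 95, 140],
})

df["day_of_week"] = df["date"].dt.dayofweek        # date component (Mon = 0)
df["sales_lag_1"] = df["sales"].shift(1)           # lag feature
df["sales_ma_3"] = df["sales"].rolling(3).mean()   # 3-day moving average
print(df)
```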
11. Data Splitting
● Split the data into training, validation, and test sets to evaluate model performance.
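A minimal two-stage split with scikit-learn (the split sizes are illustrative):

```python
# First carve out a test set, then split the remainder into train and validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=42)
print(len(X_train), len(X_val), len(X_test))  # 90 / 30 / 30
```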
These steps help improve the quality of the data and enhance model performance.