ML QB Answers

Machine learning question bank for gtu chapter 1 to 4

Uploaded by

ankittiwari4841

[Note:

1. This document does not contain all 25 answers.

2. All answers are written as short points.]

3. Write short note on Reinforcement learning.


- A type of machine learning where an agent learns to make decisions by
performing actions in an environment so as to maximize cumulative reward.
- Trial-and-error approach: the agent learns from its experience and adjusts its
behaviour accordingly.
- No labelled (supervised) data is needed; feedback comes only as rewards. It is
a separate paradigm from both supervised and unsupervised learning.
Key Components:
- Agent: The decision-maker
- Environment: world where the agent operates
- State: current situation of the agent
- Action: choices the agent can make
- Reward: signal from the environment that indicates the success or failure of an action
How it Works:
- Initialization: The agent starts in an initial state.
- Action: The agent takes an action.
- Reward: The environment returns a reward (and the next state) based on the current state and the action.
- Repeat: Continue the action-reward loop until a termination condition is reached.
Goal: Maximize cumulative reward
Examples: Playing games, robotics, recommendation systems.
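
The agent-environment loop above can be sketched with tabular Q-learning, one common reinforcement learning algorithm. This is only an illustrative sketch: the corridor environment, the `step` and `train` functions, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions made up for the example, not part of the question bank.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells (states 0..4).
# The agent starts in cell 0; the episode ends when it reaches cell 4.
N_STATES = 5
ACTIONS = [-1, +1]            # index 0 = move left, index 1 = move right
GOAL = N_STATES - 1

def step(state, action):
    """Environment: return (next_state, reward, done) for one action."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] = current estimate of long-term reward
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False                      # Initialization
        while not done:
            # Action: explore with probability epsilon, otherwise exploit
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])   # Reward
            # Learn from experience (Q-learning update rule)
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][a] += alpha * (reward + future - q[state][a])
            state = next_state                      # Repeat until termination
    return q

q_table = train()
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:GOAL]]
print(policy)   # ['right', 'right', 'right', 'right']
```

The learned policy moves right in every cell because only the right end of the corridor gives a reward.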

4. Explain Key elements of Machine Learning. Explain various function approximation methods.
Key Elements:
- Data: The raw material that ML algorithms learn from. (e.g., CSV, images,
text).
- Features: The relevant attributes or characteristics extracted from the
data that are used for learning.
- Algorithm: The mathematical method used to learn patterns from the
data.
- Model: The output of the learning process, which represents the learned
patterns.
- Evaluation: The process of assessing the model's performance on a
separate dataset.
Function Approximation
- A core task in ML: the goal is to learn a function that maps inputs to outputs.
Linear Models
- Linear Regression: Fits a linear equation to the data.
- Logistic Regression: Used for classification problems, predicts the
probability of belonging to a class.
Non-Linear Models
- Decision Trees: Tree structures, each node represents a test on a feature,
each leaf represents a class or a predicted value.
- Random Forests: An ensemble of decision trees whose predictions are combined for improved accuracy.
- Support Vector Machines (SVMs): Find the maximum-margin hyperplane that separates the
data into classes; they become non-linear when used with a kernel.
- Neural Networks: Complex models inspired by the human brain,
consisting of interconnected nodes (neurons).
- Deep Learning: Neural networks with many layers, capable
of learning complex patterns.
- Kernel Methods: Implicitly map the data into a higher-dimensional space where it can
become linearly separable.
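
As a small illustration of function approximation with a linear model, the sketch below fits a straight line by ordinary least squares. The function name `fit_line` and the data points are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

def predict(x):
    """The learned function: maps a new input to a predicted output."""
    return slope * x + intercept

print(slope, intercept, predict(10.0))
```

The learned pair (slope, intercept) is the model; `predict` is the approximated function applied to unseen inputs.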

7. Explain any two important machine learning libraries in python.


1. Scikit-learn:
- Overview: Scikit-learn is a popular open-source Python library for
machine learning. It provides a simple interface for building and training
various machine learning models.
- Key Features:
- Classification: Logistic regression, SVMs, decision trees, random forests,
etc.
- Regression: Linear regression, ridge regression, lasso regression, etc.
- Clustering: K-means, hierarchical clustering, DBSCAN, etc.
- Dimensionality reduction: PCA, t-SNE, etc.
- Model selection: Cross-validation, grid search, hyperparameter
tuning.
- Preprocessing: Data cleaning, normalization, feature scaling, etc.
2. TensorFlow:
- Overview: TensorFlow is a flexible, high-performance open-source
platform for machine learning. It is particularly well-suited for deep
learning tasks.
- Key Features:
- Deep neural networks: Convolutional neural networks (CNNs),
recurrent neural networks (RNNs), etc.
- Tensor manipulation: Efficient operations on multi-dimensional
arrays.
- Automatic differentiation: Automatic calculation of gradients for
optimization.
- Deployment: Deploy models to various platforms, including mobile
devices and servers.
8. Define the following machine learning concepts.
a. Regression: Regression is a machine learning task that involves predicting a
continuous numerical value. For example, predicting house prices based on
features like size, location, and number of bedrooms.
b. Classification: Classification is a machine learning task that involves
predicting a categorical value. It is used to categorize data into discrete classes.
For example, classifying emails as spam or not spam, or images as cats or dogs.
c. Clustering: Clustering is an unsupervised machine learning task that involves grouping
similar data points together. It is used to discover patterns or structures within the
data. For example, clustering customers based on their purchasing behavior.
d. Training Data: Training data is a dataset used to train a machine learning
model. It consists of input features and corresponding target values.
e. Test Data: Test data is a dataset used to evaluate the performance of a
trained machine learning model.
f. Function Approximation: Function approximation is the task of learning a function that
maps inputs to outputs. It involves constructing a model that can accurately predict outputs
for new inputs based on the patterns learned from the training data.

g. Overfitting, Underfitting, and Perfect Fit


- Overfitting: A model is said to be overfitting when it performs well on
the training data but poorly on the testing data.
- Underfitting: A model is said to be underfitting when it performs poorly
on both the training and testing data.
- Perfect Fit: A model is said to have a perfect fit when it perfectly predicts
the training data. However, a perfect fit does not guarantee good
performance on new data, as it may indicate overfitting.
h. Cost Function: A cost function measures how far a machine learning
model's predictions are from the true values (for example, mean squared error).
A lower cost means a better fit, so the goal of training a machine learning
model is to minimize the cost function.
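
As one example of a cost function, the sketch below computes the mean squared error (MSE) for two sets of predictions. The values and the names `mse`, `close_predictions`, and `poor_predictions` are made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0]
close_predictions = [2.5, 5.0, 7.5]   # small errors
poor_predictions = [1.0, 1.0, 1.0]    # large errors

print(mse(y_true, close_predictions))
print(mse(y_true, poor_predictions))
```

The predictions that are closer to the true values give the lower cost, which is what training tries to achieve.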
9. Explain the flow diagram of machine learning procedure
- Data Collection: Gather relevant data
- Data Preprocessing: Clean and preprocess the data
- Feature Selection: Select the most relevant features
- Split Data: Split the data into training and testing sets
- Model Selection: Select an appropriate model according to the data
- Model Training: Train the model on the training split
- Model Evaluation: Evaluate the performance of the model on the testing split
- Model Deployment: Deploy the model in the production environment
- Iteration and Improvement: ………
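
The "Split Data" step can be sketched as a shuffled split. The helper `split_rows`, the 80/20 ratio, and the seed are assumptions for illustration; libraries such as Scikit-learn provide their own splitting utilities.

```python
import random

def split_rows(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)     # shuffle so the split is not ordered
    n_test = round(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]

dataset = list(range(10))                 # stand-in for 10 data rows
train_rows, test_rows = split_rows(dataset)
print(len(train_rows), len(test_rows))    # 8 2
```

The model is trained only on `train_rows`; `test_rows` is held back for the evaluation step.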

10. Issues in Machine Learning


1. Data Quality: Insufficient, noisy, or biased data can reduce performance.
2. Overfitting and Underfitting: refer Q8(g)
3. Interpretability: Complex models such as deep neural networks are difficult to
interpret, so it is challenging to understand how they make decisions.
4. Scalability: Handling large datasets and complex models can be
computationally expensive.
5. Bias: Biased data or models lead to unfair decisions and outcomes.
6. Privacy, Ethics: Collecting and using personal data raises privacy
concerns.

11. Types of Data in Machine Learning


1. Numerical Data: Continuous, Discrete
2. Categorical Data: Nominal, Ordinal
3. Text Data: Unstructured, structured
4. Image Data: Unstructured data that represents visual information.
5. Audio Data: Unstructured data that represents sound information.
6. Time Series Data: Sequential data where observations are recorded at
specific time intervals.
Examples:
- Numerical Data: Age, income, temperature
- Categorical Data: Gender, country, color
- Text Data: Product reviews, news articles, social media posts
- Image Data: Photographs, medical scans, satellite images
- Audio Data: Speech recordings, music files, sound effects
- Time Series Data: Stock prices, temperature readings, sensor data

14. Explain the interpretation and comparison of Box Plot.


Box Plots: A graphical representation of the distribution of a dataset. They
show the five-number summary: minimum, 1st quartile (Q1),
median (Q2), 3rd quartile (Q3), and maximum.
Components:
- Median (Q2): middle value of the dataset
- Q1 & Q3: together with the median, values that divide the dataset into four equal parts
- IQR: The difference between Q3 and Q1 (the length of the box)
- Whiskers: Lines extending from the box to the smallest and largest values that are not outliers
- Outliers: Data points that lie outside the whiskers (commonly more than 1.5 × IQR below Q1 or above Q3)
Interpretation:
- Median: indicates the central tendency of the data.
- IQR: represents the spread of the middle 50% of the data. A longer box means
larger spread, a shorter box smaller spread.
- Whiskers: indicate the range of the data, excluding outliers.
- Outliers: …….
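Comparison: Draw the box plots of the groups side by side on a common scale, then
compare the medians (central tendency), the box lengths (spread), the whisker
lengths (range), the position of the median within the box (skewness), and the
number of outliers.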
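
The numbers behind a box plot can be computed with Python's standard `statistics` module. The sample below is made up, and the 1.5 × IQR rule used for outliers is the common convention, assumed here; note that different quartile methods can give slightly different Q1 and Q3.

```python
import statistics

data = [2, 4, 5, 7, 8, 9, 11, 12, 40]     # made-up sample

# Quartiles (the "inclusive" method treats the data as the whole population)
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers end at the most extreme values that are not outliers
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(q1, median, q3, iqr)                # 5.0 8.0 11.0 6.0
print(whisker_low, whisker_high, outliers)  # 2 12 [40]
```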

15. Write difference: a. Predictive and Descriptive Model. b. Lazy vs Eager Learner
Predictive Models:
- Purpose: Predict future outcomes or values based on historical data.
- Focus: Predicting new, unseen data points.
- Examples: Regression models, classification models, time series models.
Descriptive Models:
- Purpose: Understand and summarize existing data.
- Focus: Describing patterns, relationships, and trends in the data.
- Examples: Clustering models, dimensionality reduction techniques.
Key Differences:

Feature     Predictive Model                          Descriptive Model
Purpose     Predict future outcomes                   Describe existing data
Focus       New, unseen data                          Patterns in the data
Examples    Regression, classification, time series   Clustering, dimensionality reduction

Lazy vs. Eager Learners


Lazy Learners:
- Learn on the fly: Do not build a model until a prediction is requested for new data.
- Store data: Store all training data.
- Prediction: Make predictions based on the similarity between the new
data point and the stored training data.
- Examples: k-nearest neighbors (k-NN), instance-based learning.
Eager Learners:
- Build a model beforehand: Construct a model from the training data
before making predictions.
- Generalize: Learn general patterns from the training data.
- Prediction: Use the learned model to make predictions on new data.
- Examples: Decision trees, neural networks, support vector machines.
Key Differences:

Feature        Lazy Learner                     Eager Learner
Learning       On-the-fly                       Pre-built model
Data Storage   All training data                Model parameters
Prediction     Similarity-based                 Model-based
Examples       k-NN, instance-based learning    Decision trees, neural networks, SVMs
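
The contrast can be sketched in code. Here k-NN stands in for the lazy learner and a simple nearest-centroid classifier stands in for the eager learner; the nearest-centroid model, the function names, and the data points are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Made-up 2-D training points with class labels
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
         ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B")]

def knn_predict(query, data, k=3):
    """Lazy: no training step; all stored data is searched at prediction time."""
    nearest = sorted(data, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def fit_centroids(data):
    """Eager: summarise the training data into one centroid per class up front."""
    sums = {}
    for (x, y), label in data:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def centroid_predict(query, centroids):
    """Prediction uses only the stored model parameters (the centroids)."""
    return min(centroids, key=lambda label: math.dist(query, centroids[label]))

centroids = fit_centroids(train)          # model is built before any query
print(knn_predict((2.0, 2.0), train))     # A
print(centroid_predict((6.0, 6.0), centroids))   # B
```

The lazy learner must keep `train` around for every prediction, while the eager learner only needs `centroids` once it has been fitted.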

19. Explain K-fold and Leave-one-out cross-validation.


K-Fold and Leave-One-Out Cross-Validation
Cross-validation is a technique used to evaluate the performance of a machine
learning model by splitting the dataset into multiple folds, then repeatedly training
the model on some folds and testing it on the fold that was held out.
K-Fold Cross-Validation:
Process:
1. Divide the dataset into k equal-sized folds.
2. For each fold:
- Use the remaining k-1 folds for training.
- Use the current fold for testing.
- Evaluate the model's performance on the testing set.
3. Calculate the average performance across all k folds.
Advantages:
- Provides a more accurate estimate of the model's performance
compared to a single train-test split.
- Can be used for various evaluation metrics.
Disadvantages:
- Can be computationally expensive for large datasets and large values of
k.
Leave-One-Out Cross-Validation (LOOCV):
Process:
- Use all but one data point for training and the remaining data
point for testing (this is k-fold with k equal to the number of data points).
- Repeat this process for each data point in the dataset.
- Calculate the average performance across all iterations.

Advantages:
- Uses almost all the data for training in every iteration, which suits
small datasets.
- No random splitting is involved, so the result is deterministic.
Disadvantages:
- Computationally expensive for large datasets, since the model is trained once per data point.
- The estimate can have high variance, so k-fold cross-validation is usually preferred for larger datasets.
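
Both procedures can be sketched with one helper, since leave-one-out is the special case where k equals the number of data points. The names `k_fold_indices` and `cross_validate`, the mean-predictor "model", and the data are assumptions for illustration; the data is not shuffled here to keep the folds easy to follow.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_validate(ys, k, fit, score):
    """Average the score over k folds; fit and score are supplied by the caller."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(ys), k):
        model = fit([ys[i] for i in train_idx])
        scores.append(score(model, [ys[i] for i in test_idx]))
    return sum(scores) / len(scores)

# Hypothetical "model": always predict the mean of the training targets
fit_mean = lambda train_ys: sum(train_ys) / len(train_ys)
mse = lambda mean, test_ys: sum((y - mean) ** 2 for y in test_ys) / len(test_ys)

ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
three_fold_mse = cross_validate(ys, k=3, fit=fit_mean, score=mse)
loocv_mse = cross_validate(ys, k=len(ys), fit=fit_mean, score=mse)  # leave-one-out
print(three_fold_mse, loocv_mse)
```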

[19 to 23 questions are over dosed_ narcotics are danger for health]

24. Explain Silhouette width and its meaning in cluster.


- A metric used to evaluate the quality of clustering results.
- Measures how similar a data point is to its own cluster compared to
other clusters.
- A higher silhouette width means the data points are well clustered; a lower value suggests the clustering may not be optimal.
Calculation (for each data point):
- a = average distance to the other points in the same cluster
- b = average distance to the points in the nearest different cluster
- Silhouette coefficient s = (b - a) / max(a, b)
Interpretation:
- Silhouette coefficient: A value between -1 and 1.
- Near +1: The data point is far from the nearest different cluster, indicating
good clustering.
- Near -1: The data point is closer to the nearest different cluster than to its
own cluster, indicating poor clustering.
- Near 0: The data point is on the boundary between two clusters.
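
A minimal sketch of the per-point calculation, using the standard definition s = (b - a) / max(a, b) with Euclidean distance. The function name `silhouette`, the 1-D points, and the labels are made up for illustration.

```python
import math

def silhouette(i, points, labels):
    """Silhouette coefficient of point i given the cluster labels."""
    own = labels[i]
    # a: mean distance to the other points in the same cluster
    same = [math.dist(points[i], p)
            for j, p in enumerate(points) if labels[j] == own and j != i]
    if not same:
        return 0.0                        # common convention for a one-point cluster
    a = sum(same) / len(same)
    # b: smallest mean distance to the points of any other cluster
    b = min(
        sum(math.dist(points[i], p) for j, p in enumerate(points) if labels[j] == c)
        / labels.count(c)
        for c in set(labels) if c != own
    )
    return (b - a) / max(a, b)

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good_labels = [0, 0, 1, 1]                # natural grouping
bad_labels = [0, 1, 1, 1]                 # point at 1.0 put in the far cluster

print(silhouette(0, points, good_labels))   # close to +1
print(silhouette(1, points, bad_labels))    # negative
```

The silhouette width of a whole clustering is usually taken as the average of this value over all points.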

25. Write short note on Ensemble Methods.


Ensemble methods combine multiple machine learning models to improve
overall performance.
Common Ensemble Methods:
1. Bagging (Bootstrap Aggregating):
- Creates multiple models by training them on different bootstrap
samples of the training data.
- Combines the predictions of these models using averaging (for
regression) or voting (for classification).
- Example: Random Forest.
2. Boosting:
- Iteratively trains models, focusing on data points that were
misclassified by previous models.
- Weights the predictions of each model based on its performance.
- Examples: AdaBoost, Gradient Boosting Machine (GBM).
3. Stacking:
- Trains multiple base models on the same data.
- Combines the predictions of these models using a meta-model,
which learns to weigh the predictions of the base models.
Advantages:
- Improve accuracy
- Reduce overfitting
- Increase robustness
Disadvantages:
- Computationally expensive
- Less interpretable
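
Bagging can be sketched with a very simple base model. Everything below is an assumption for illustration: the 1-D threshold "stump" (which presumes class 1 lies at larger x values), the data, and the names `fit_stump`, `bagging_fit`, and `bagging_predict`.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Base model: pick the threshold t minimising errors of 'predict 1 if x > t'."""
    best_t, best_err = None, None
    for t, _ in sample:
        err = sum((x > t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(data, n_models=15, seed=0):
    """Train each base model on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]   # sampling with replacement
        models.append(fit_stump(sample))
    return models

def bagging_predict(models, x):
    """Combine the base models by majority vote (classification)."""
    votes = Counter(int(x > t) for t in models)
    return votes.most_common(1)[0][0]

# Made-up 1-D data: (x, class)
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
        (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]

models = bagging_fit(data)
print(bagging_predict(models, 0.5), bagging_predict(models, 9.5))   # 0 1
```

For regression, the vote in `bagging_predict` would be replaced by an average of the base models' outputs.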
