
Machine Learning Questions and Answers

Q.1: What are different performance metrics used for evaluating the performance of machine learning models?

Performance Metrics:
1. Accuracy:
Measures the fraction of correct predictions out of total predictions.
Accuracy = Correct Predictions / Total Predictions
Suitable for datasets where classes are balanced.

2. Precision:
Indicates the proportion of positive predictions that are actually correct.
Precision = True Positives / (True Positives + False Positives)
Important for cases where false positives are costly (e.g., spam detection).

3. Recall (Sensitivity):
Measures the model's ability to identify all actual positive samples.
Recall = True Positives / (True Positives + False Negatives)
Useful in medical diagnosis where missing a positive case is critical.

4. F1-Score:
Combines precision and recall into a single metric using their harmonic mean.
F1 = 2 * (Precision * Recall) / (Precision + Recall)
Best for imbalanced datasets.

5. ROC-AUC (Receiver Operating Characteristic - Area Under Curve):
Evaluates the trade-off between true positive rate (recall) and false positive rate.
Higher AUC indicates better performance.

Confusion Matrix:
A 2x2 matrix for binary classification problems (rows: predicted positive/negative, columns: actual positive/negative):
[ TP FP ]
[ FN TN ]
- True Positives (TP): Correctly predicted positives.
- False Positives (FP): Incorrectly predicted positives.
- True Negatives (TN): Correctly predicted negatives.
- False Negatives (FN): Missed positives (predicted as negative).

Interpretation:
- Helps identify model strengths and weaknesses.
- Example: If false positives are high, precision may be low. If false negatives are high, recall may be poor.
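A minimal sketch of computing these metrics with scikit-learn; the labels, predictions, and probability scores below are made-up illustrative values, not results from a real model:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]                  # actual labels (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                  # hard predictions
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]  # predicted probabilities

print(accuracy_score(y_true, y_pred))    # correct / total
print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(recall_score(y_true, y_pred))      # TP / (TP + FN)
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_prob))     # AUC is computed from scores, not hard labels
# Note: scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
print(confusion_matrix(y_true, y_pred))
```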

-------------------------------------------
Q.2: Describe the steps in the gradient descent algorithm and its application in linear regression.

Steps in Gradient Descent:
1. Initialize Parameters:
- Randomly initialize weights (w) and bias (b).

2. Compute the Cost Function:
- Use a loss function like Mean Squared Error (MSE) for regression.
J(w, b) = (1/n) * Σ(y_i - ŷ_i)^2

3. Calculate Gradients:
- Derivatives of the loss function w.r.t. parameters.
∂J/∂w = -(2/n) Σ(y_i - ŷ_i)x_i
∂J/∂b = -(2/n) Σ(y_i - ŷ_i)

4. Update Parameters:
- Adjust w and b using the learning rate (α):
w = w - α * ∂J/∂w, b = b - α * ∂J/∂b

5. Iterate:
- Repeat steps 2–4 until convergence or a predefined number of iterations.

Application in Linear Regression:
- Objective: Minimize MSE to find the best-fitting line for a dataset (y = wx + b).
- Gradient descent adjusts w and b iteratively to reduce error, as sketched below.
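To make the steps concrete, here is a minimal NumPy sketch of batch gradient descent for y = wx + b; the synthetic data, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

# Synthetic data from a noisy line with true w = 3, b = 2 (illustrative).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, 100)

w, b = 0.0, 0.0            # step 1: initialize parameters
alpha, n = 0.01, len(x)    # learning rate and sample count

for _ in range(1000):                         # step 5: iterate
    y_hat = w * x + b                         # predictions
    # Step 2's MSE cost is implicit; steps 3-4 use its gradients.
    dw = -(2 / n) * np.sum((y - y_hat) * x)   # step 3: ∂J/∂w
    db = -(2 / n) * np.sum(y - y_hat)         # step 3: ∂J/∂b
    w -= alpha * dw                           # step 4: update w
    b -= alpha * db                           # step 4: update b

print(w, b)  # should approach the true values 3 and 2
```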

-------------------------------------------
Q.3: What are the hyperparameters in a decision tree algorithm, and how do they affect the model?

Key Hyperparameters:
1. Max Depth:
- Limits the depth of the tree.
- Effect: Controls overfitting (deep trees) and underfitting (shallow trees).

2. Min Samples Split:
- Minimum number of samples required to split a node.
- Effect: Larger values make splits less frequent, reducing overfitting.

3. Min Samples Leaf:
- Minimum number of samples in a leaf node.
- Effect: Prevents overly small leaves, reducing variance.

4. Max Features:
- Number of features considered for splitting at each node.
- Effect: Introduces randomness, improving generalization.

5. Criterion:
- Metric for splitting (e.g., Gini Index or Entropy).
- Effect: Determines quality of splits.

6. Max Leaf Nodes:
- Limits the total number of leaf nodes.
- Effect: Controls tree size and complexity.
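A minimal sketch of how these hyperparameters map onto scikit-learn's DecisionTreeClassifier; the dataset and the specific values chosen are illustrative, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(
    max_depth=4,           # 1. limits tree depth
    min_samples_split=10,  # 2. samples required to split a node
    min_samples_leaf=5,    # 3. minimum samples per leaf
    max_features="sqrt",   # 4. features considered at each split
    criterion="gini",      # 5. split-quality metric ("gini" or "entropy")
    max_leaf_nodes=20,     # 6. cap on total leaf nodes
    random_state=0,
)
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())  # resulting size of the tree
```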

-------------------------------------------
Q.4: Describe the steps involved in implementing the KNN algorithm.

Steps:
1. Load Dataset:
- Obtain a labeled dataset with features and target labels.

2. Select k:
- Decide the number of nearest neighbors.

3. Compute Distances:
- Measure the distance between the test point and all training points using metrics like Euclidean distance.

4. Identify Neighbors:
- Sort distances and select the k nearest points.

5. Vote or Average:
- For classification: Use majority voting among k neighbors.
- For regression: Use the average value of k neighbors.

6. Predict:
- Assign the final label or value to the test point.
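A from-scratch sketch of these steps for classification; the tiny dataset and the choice of k are illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # Step 3: Euclidean distance from the test point to every training point.
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Step 4: indices of the k nearest training points.
    nearest = np.argsort(dists)[:k]
    # Steps 5-6: majority vote among the k neighbors (classification).
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Step 1: a toy labeled dataset; step 2: k chosen as 3.
X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [7, 7], [6, 7]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([2, 2]), k=3))  # expected: 0
```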

-------------------------------------------
Q.5: Describe the steps involved in implementing the K-Means algorithm.

Steps:
1. Initialize Centroids:
- Randomly assign k initial centroids.

2. Assign Clusters:
- Compute distances between each data point and all centroids.
- Assign each point to the nearest centroid.

3. Update Centroids:
- Calculate the new centroid for each cluster as the mean of all points in that cluster.

4. Repeat:
- Iterate steps 2–3 until centroids stabilize or a maximum number of iterations is reached.

5. Output:
- Final cluster assignments and centroids.
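A from-scratch sketch of these steps; the dataset, k, and iteration cap are illustrative, and empty clusters are not handled for brevity:

```python
import numpy as np

def kmeans(X, k=2, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):                # step 4: repeat until stable
        # Step 2: assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its cluster.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids                  # step 5: output

X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
print(kmeans(X, k=2))
```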

-------------------------------------------
Q.6: Differentiate between supervised and unsupervised learning.

| Feature      | Supervised Learning                    | Unsupervised Learning                   |
|--------------|----------------------------------------|-----------------------------------------|
| Input Data   | Labeled data                           | Unlabeled data                          |
| Goal         | Predict outcomes (X → Y)               | Identify hidden patterns                |
| Algorithms   | Linear Regression, SVM, Random Forest  | K-Means, PCA, DBSCAN                    |
| Output       | Predictions (classes or values)        | Clusters or latent representations      |
| Applications | Fraud detection, medical diagnosis     | Market segmentation, anomaly detection  |
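A minimal sketch contrasting the two settings in scikit-learn: a supervised model fits on features and labels (X, y), while an unsupervised model fits on features alone; the toy data is illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]])
y = np.array([0, 0, 1, 1])            # labels available: supervised setting

clf = LogisticRegression().fit(X, y)  # learns the X -> Y mapping
print(clf.predict([[2.0, 1.0]]))      # predicted class for a new point

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # no labels used
print(km.labels_)                     # discovered cluster assignments
```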
