ML Questions Answers
Q.1: What are the different performance metrics used for evaluating the performance of machine learning models?
Performance Metrics:
1. Accuracy:
Measures the fraction of correct predictions out of total predictions.
Accuracy = Correct Predictions / Total Predictions
Suitable for datasets where classes are balanced.
2. Precision:
Indicates the proportion of positive predictions that are actually correct.
Precision = True Positives / (True Positives + False Positives)
Important for cases where false positives are costly (e.g., spam detection).
3. Recall (Sensitivity):
Measures the model's ability to identify all actual positive samples.
Recall = True Positives / (True Positives + False Negatives)
Useful in medical diagnosis where missing a positive case is critical.
4. F1-Score:
Combines precision and recall into a single metric using their harmonic mean.
F1 = 2 * (Precision * Recall) / (Precision + Recall)
Better suited than accuracy to imbalanced datasets, because it ignores true negatives.
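The four formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function name classification_metrics and the counts in the example are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from the four outcome counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced data: 10 actual positives, 990 actual negatives.
m = classification_metrics(tp=5, fp=5, fn=5, tn=985)
print(m)  # accuracy 0.99, precision 0.5, recall 0.5, f1 0.5
```

In this made-up example accuracy looks excellent at 0.99 even though the model finds only half of the positives, which is why precision, recall and F1 are more informative when classes are imbalanced.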
Confusion Matrix:
A 2x2 matrix for binary classification problems (rows: predicted class, columns: actual class):
[ TP FP ]
[ FN TN ]
- True Positives (TP): Correctly predicted positives.
- False Positives (FP): Incorrectly predicted positives.
- True Negatives (TN): Correctly predicted negatives.
- False Negatives (FN): Missed positives (predicted as negative).
Interpretation:
- Helps identify model strengths and weaknesses.
- Example: If false positives are high, precision may be low. If false negatives are high, recall may be poor.
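As a rough sketch, the four cells can be counted directly from lists of actual and predicted labels. The helper name confusion_counts and the sample labels are hypothetical; the printout follows the [ TP FP ] / [ FN TN ] layout used above, which differs from the layout some libraries use.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1   # correctly predicted positive
        elif predicted == positive:
            fp += 1   # predicted positive, actually negative
        elif actual == positive:
            fn += 1   # missed positive
        else:
            tn += 1   # correctly predicted negative
    return tp, fp, fn, tn


# Hypothetical labels, 1 = positive and 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"[ {tp} {fp} ]")  # [ 3 1 ]
print(f"[ {fn} {tn} ]")  # [ 1 3 ]
```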
-------------------------------------------
Q.2: Describe the steps in the gradient descent algorithm and its application in linear regression.
3. Calculate Gradients:
- Derivatives of the loss function w.r.t. parameters.
∂J/∂w = -(2/n) Σ (y_i - ŷ_i) x_i
4. Update Parameters:
- Adjust w and b using the learning rate (α):
w = w - α * ∂J/∂w, b = b - α * ∂J/∂b
5. Iterate:
- Repeat steps 2–4 until convergence or a predefined number of iterations.
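The update rule can be sketched for one-feature linear regression as follows. Several details here are assumptions, not taken from the notes: the loss is taken to be mean squared error, J = (1/n) Σ (y_i - ŷ_i)^2, which gives ∂J/∂b = -(2/n) Σ (y_i - ŷ_i); w and b start at zero; and the function name, learning rate, iteration count and data are hypothetical.

```python
def fit_linear_regression(xs, ys, alpha=0.05, n_iters=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    n = len(xs)
    w, b = 0.0, 0.0  # assumed initialization
    for _ in range(n_iters):
        # Predictions and residuals (y_i - y_hat_i) for the current w, b.
        errors = [y - (w * x + b) for x, y in zip(xs, ys)]
        # Gradients of the loss with respect to w and b.
        grad_w = -(2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = -(2 / n) * sum(errors)
        # Move the parameters against the gradient.
        w -= alpha * grad_w
        b -= alpha * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
print(round(w, 4), round(b, 4))  # 2.0 1.0
```

This sketch stops after a fixed number of iterations; a convergence test on the change in loss would serve equally well. If α is too large the updates diverge, and if it is too small convergence is slow.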
-------------------------------------------
Q.3: What are the hyperparameters in a decision tree algorithm, and how do they affect the model?
Key Hyperparameters:
1. Max Depth:
- Limits the depth of the tree.
- Effect: Controls model complexity. Trees that are too deep tend to overfit; trees that are too shallow tend to underfit.
4. Max Features:
- Number of features considered for splitting at each node.
- Effect: Introduces randomness into split selection, which can improve generalization.
5. Criterion:
- Metric for splitting (e.g., Gini Index or Entropy).
- Effect: Determines quality of splits.
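To make the criterion concrete, the sketch below computes Gini impurity and entropy for a node and the size-weighted impurity of a candidate split. It uses the standard definitions (Gini = 1 - Σ p_k^2, entropy = -Σ p_k log2 p_k); the function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: -sum of p * log2(p) over the classes present."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_impurity(left, right, impurity):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)


# Hypothetical node with 5 samples of class "a" and 5 of class "b".
parent = ["a"] * 5 + ["b"] * 5
left = ["a"] * 4 + ["b"]
right = ["a"] + ["b"] * 4

print(gini(parent), entropy(parent))  # 0.5 1.0
print(round(weighted_impurity(left, right, gini), 4))  # 0.32
```

A split is better the more it lowers the weighted impurity relative to the parent node; here the Gini impurity drops from 0.5 to 0.32.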
-------------------------------------------
Q.4: Describe the steps involved in implementing the KNN algorithm.
Steps:
1. Load Dataset:
- Obtain a labeled dataset with features and target labels.
2. Select k:
- Decide the number of nearest neighbors.
3. Compute Distances:
- Measure the distance between the test point and all training points using metrics like Euclidean distance.
4. Identify Neighbors:
- Sort distances and select the k nearest points.
5. Vote or Average:
- For classification: Use majority voting among k neighbors.
- For regression: Use the average value of k neighbors.
6. Predict:
- Assign the final label or value to the test point.
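The steps above can be sketched in plain Python. This is a brute-force illustration under stated assumptions: Euclidean distance via math.dist (Python 3.8+), a hypothetical function name knn_predict, and made-up data. When votes tie, this sketch returns the label that appears first among the nearest neighbors.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3, task="classification"):
    """Predict a label or value for `query` from its k nearest training points."""
    # Step 3: distance from the query to every training point.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Step 4: sort by distance and keep the k nearest targets.
    distances.sort(key=lambda pair: pair[0])
    neighbors = [y for _, y in distances[:k]]
    # Step 5: majority vote for classification, mean for regression.
    if task == "classification":
        return Counter(neighbors).most_common(1)[0][0]
    return sum(neighbors) / len(neighbors)


# Hypothetical two-class dataset.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (2, 2), k=3))  # A
print(knn_predict(train_X, train_y, (6, 5), k=3))  # B
```

KNN has no training phase; each prediction scans the whole training set, so the cost of a query grows with the size of the dataset.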
-------------------------------------------
Q.5: Describe the steps involved in implementing the K-Means algorithm.
Steps:
1. Initialize Centroids:
- Randomly assign k initial centroids.
2. Assign Clusters:
- Compute distances between each data point and all centroids.
- Assign each point to the nearest centroid.
3. Update Centroids:
- Calculate the new centroid for each cluster as the mean of all points in that cluster.
4. Repeat:
- Iterate steps 2–3 until centroids stabilize or a maximum number of iterations is reached.
5. Output:
- Final cluster assignments and centroids.
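A minimal sketch of these steps in plain Python follows. The notes do not say how the initial centroids are chosen, so picking k distinct data points at random is an assumption, as are keeping the old centroid when a cluster becomes empty, the function name kmeans, and the sample points.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (tuples of numbers) into k groups."""
    rng = random.Random(seed)
    # Step 1: assumed initialization, k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = []
    for _ in range(max_iters):
        # Step 2: assign each point to the index of its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[j])  # assumed: keep old centroid
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Step 5: final cluster assignments and centroids.
    return assignments, centroids


# Hypothetical data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
assignments, centroids = kmeans(points, k=2)
print(assignments, centroids)
```

Which cluster receives index 0 depends on the random initialization, and on less clearly separated data different seeds can lead to different final clusterings.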
-------------------------------------------
Q.6: Differentiate between supervised and unsupervised learning.