
Machine Learning Lab Assignment

Author: Lab Author
Date: 2024-11-25

Instructions

This lab assignment consists of multiple tasks aimed at applying and understanding
concepts from the provided slides. Complete the tasks using Python and libraries such
as NumPy, pandas, matplotlib, and scikit-learn. Include plots, metrics, and
explanations for your findings. Submit your code and a report summarizing your
results.

Task 1: Debugging Regularized Linear Regression

Objective: Explore the effects of regularization in linear regression and debug common
issues.

Steps:

1. Dataset Preparation:

◦ Create a synthetic dataset with features X and target y using a polynomial function (e.g., y = 3x^2 + 2x + 1 + ε, where ε is Gaussian noise).

◦ Split the dataset into 70% training and 30% test sets.

2. Regularized Linear Regression:

◦ Implement or use Ridge Regression (from scikit-learn) to train the model.

◦ Experiment with different values of the regularization parameter λ (e.g., λ = 0, 0.1, 1, 10, 100).

◦ Plot the training error and test error as a function of λ.

3. Analysis:

◦ Identify underfitting and overfitting regions from the plot.

◦ Discuss how λ affects the weights w_j and the model’s ability to generalize.
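The steps above can be sketched as follows; the dataset, seed, and sample sizes are illustrative choices, not prescribed by the assignment (note that scikit-learn calls λ `alpha`). Plotting the two error lists against λ with matplotlib completes the task.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = 3 * x**2 + 2 * x + 1 + rng.normal(0, 1, size=200)  # y = 3x^2 + 2x + 1 + noise

# Polynomial features so Ridge can capture the quadratic relationship
X = np.column_stack([x, x**2])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

lambdas = [0, 0.1, 1, 10, 100]
train_errors, test_errors = [], []
for lam in lambdas:
    model = Ridge(alpha=lam).fit(X_train, y_train)  # alpha is λ in scikit-learn
    train_errors.append(mean_squared_error(y_train, model.predict(X_train)))
    test_errors.append(mean_squared_error(y_test, model.predict(X_test)))

for lam, tr, te in zip(lambdas, train_errors, test_errors):
    print(f"lambda={lam:>5}: train MSE={tr:.3f}, test MSE={te:.3f}")
```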

Task 2: Training, Validation, and Test Set Splits

Objective: Understand the importance of data splits in model evaluation.

Steps:

1. Dataset Splitting:

◦ Use the dataset provided in the slides:
Size (sq ft): [2104, 1600, 2400, 1416, 3000, 1985, 1534, 1427, 1380, 1494]
Price (k$): [400, 330, 369, 232, 540, 300, 315, 199, 212, 243]

◦ Split the dataset into: 60% training set, 20% validation set, 20% test set.

◦ Display the resulting splits.

2. Linear Regression Model:

◦ Train a linear regression model on the training set.

◦ Compute the Mean Squared Error (MSE) on the training, validation, and test
sets.

◦ Compare the errors across the three subsets.

3. Analysis:

◦ Discuss why the validation set is critical for model selection.

◦ Explain the significance of keeping the test set separate from training and
validation.
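A minimal sketch of the split and evaluation, using the slide data; the shuffle seed and the 6/2/2 index split are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

size = np.array([2104, 1600, 2400, 1416, 3000, 1985, 1534, 1427, 1380, 1494])
price = np.array([400, 330, 369, 232, 540, 300, 315, 199, 212, 243])

# Shuffle once, then slice 60% / 20% / 20% (6 / 2 / 2 of the 10 examples)
rng = np.random.default_rng(0)
idx = rng.permutation(len(size))
train_idx, val_idx, test_idx = idx[:6], idx[6:8], idx[8:]

X = size.reshape(-1, 1)
model = LinearRegression().fit(X[train_idx], price[train_idx])

errors = {
    name: mean_squared_error(price[i], model.predict(X[i]))
    for name, i in [("train", train_idx), ("val", val_idx), ("test", test_idx)]
}
print(errors)
```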

Task 3: Model Selection with Polynomial Regression

Objective: Select the best polynomial model for a given dataset using validation error.

Steps:

1. Dataset Preparation:

◦ Use the same dataset from Task 2 or generate a synthetic dataset with non-linear patterns.

2. Polynomial Regression:

◦ Train polynomial regression models of degree d = 1, 2, ..., 10.

◦ Compute the training error and validation error for each degree d.

3. Visualization:

◦ Plot the training error and validation error as a function of d.

◦ Identify the degree that minimizes the validation error.

4. Analysis:

◦ Discuss the concepts of underfitting and overfitting based on the plot.

◦ Justify the importance of validation error in choosing the polynomial degree.
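One possible sketch using the synthetic-data option: the sine target, noise level, and 80/40 split are illustrative assumptions. The two error lists can then be plotted against degree d.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(120, 1))
y = np.sin(3 * x).ravel() + rng.normal(0, 0.1, size=120)  # non-linear target

x_train, y_train = x[:80], y[:80]
x_val, y_val = x[80:], y[80:]

degrees = range(1, 11)
train_err, val_err = [], []
for d in degrees:
    model = make_pipeline(PolynomialFeatures(d), LinearRegression()).fit(x_train, y_train)
    train_err.append(mean_squared_error(y_train, model.predict(x_train)))
    val_err.append(mean_squared_error(y_val, model.predict(x_val)))

best_degree = list(degrees)[int(np.argmin(val_err))]
print("best degree by validation error:", best_degree)
```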

Task 4: Effect of Regularization on Bias-Variance Tradeoff

Objective: Examine the impact of regularization on bias and variance.

Steps:

1. Synthetic Dataset:

◦ Generate a dataset with 100 examples and a true relationship y = 2x + 3 + ε, where ε is Gaussian noise.

2. Ridge Regression:

◦ Train Ridge Regression models with λ = 0, 0.01, 0.1, 1, 10, 100.


◦ Compute the training error, validation error, and weights w_j for each λ.

3. Analysis:

◦ Plot the errors as a function of λ.

◦ Plot the magnitude of weights (|w_j|) as a function of λ.

◦ Discuss how increasing λ affects the bias-variance tradeoff.
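A sketch of the experiment; the 70/30 split, noise level, and seed are illustrative choices. Plotting `val_mse` and `weight` from each result against λ gives the two required figures.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 2 * x + 3 + rng.normal(0, 1, size=100)  # true relationship y = 2x + 3 + noise

X = x.reshape(-1, 1)
X_train, y_train = X[:70], y[:70]
X_val, y_val = X[70:], y[70:]

lambdas = [0, 0.01, 0.1, 1, 10, 100]
results = []
for lam in lambdas:
    model = Ridge(alpha=lam).fit(X_train, y_train)
    results.append({
        "lambda": lam,
        "train_mse": mean_squared_error(y_train, model.predict(X_train)),
        "val_mse": mean_squared_error(y_val, model.predict(X_val)),
        "weight": abs(model.coef_[0]),  # |w| shrinks as lambda grows
    })

for r in results:
    print(r)
```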

Task 5: Neural Network Model Selection

Objective: Choose the best neural network architecture based on cross-validation error.

Steps:

1. Dataset Preparation:

◦ Create a synthetic dataset with multiple input features and a non-linear target relationship.

2. Neural Network Architectures:

◦ Define three neural network architectures:


▪ Architecture 1: 25 input units → 15 hidden units → 1 output unit

▪ Architecture 2: 20 input units → 12, 12 hidden units → 1 output unit

▪ Architecture 3: 32 input units → 16, 8, 4 hidden units → 1 output unit

3. Training and Validation:

◦ Train each architecture on the training set and evaluate on the validation
set.

◦ Compute the training and validation errors for each architecture.

4. Analysis:

◦ Select the architecture with the lowest validation error.

◦ Discuss why it is important to choose the architecture based on validation error and not training error.
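A minimal sketch with scikit-learn's MLPRegressor. Note one assumption: the input size is fixed by the synthetic dataset (5 features here), so only the hidden-layer shapes from the three candidate architectures vary; the single output unit is implicit in MLPRegressor.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, size=300)  # non-linear target

X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]

# Hidden-layer shapes from the three candidate architectures
architectures = {"arch1": (15,), "arch2": (12, 12), "arch3": (16, 8, 4)}
val_errors = {}
for name, hidden in architectures.items():
    model = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000, random_state=0)
    model.fit(X_train, y_train)
    val_errors[name] = mean_squared_error(y_val, model.predict(X_val))

best = min(val_errors, key=val_errors.get)
print("validation errors:", val_errors)
print("selected architecture:", best)
```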

Task 6: Real-World Debugging

Objective: Debug and improve a poorly performing machine learning model.

Steps:

1. Scenario:

◦ You are given a model with the following errors: Training Error: 10, Validation Error: 40, Test Error: 42.

◦ The large gap between training and validation error indicates overfitting.

2. Debugging:

◦ Suggest three corrective actions to address overfitting (e.g., regularization, adding more data, reducing model complexity).

3. Implementation:

◦ Implement one of these actions (e.g., increase λ) and re-train the model.

◦ Compute the new training, validation, and test errors.

4. Analysis:

◦ Compare the errors before and after applying the corrective action.

◦ Discuss the effectiveness of your approach.
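One way to sketch the "increase λ" corrective action: since no model is actually supplied, this uses a synthetic stand-in where a high-degree polynomial overfits, then re-trains with a larger λ. The degree, noise level, and λ values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the overfitting scenario
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=(60, 1))
y = x.ravel() ** 2 + rng.normal(0, 0.2, size=60)

x_train, y_train = x[:30], y[:30]
x_val, y_val = x[30:], y[30:]

def errors(lam):
    """Train a degree-12 polynomial with Ridge(alpha=lam); return (train, val) MSE."""
    model = make_pipeline(PolynomialFeatures(12), Ridge(alpha=lam)).fit(x_train, y_train)
    return (mean_squared_error(y_train, model.predict(x_train)),
            mean_squared_error(y_val, model.predict(x_val)))

before = errors(1e-9)  # almost no regularization: prone to overfitting
after = errors(1.0)    # corrective action: increased lambda
print("before (train, val):", before)
print("after  (train, val):", after)
```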

Submission Instructions

1. Submit your Python code in a Jupyter Notebook or Python script.

2. Include a PDF report summarizing:


◦ Key results (tables, plots, metrics, etc.).

◦ Explanations and analyses for each task.

3. Ensure all plots are labeled and interpretations are included.

4. Submit your work by the deadline.
