Machine Learning Lab Assignment: Instructions
Instructions
This lab assignment consists of multiple tasks aimed at applying and understanding
concepts from the provided slides. Complete the tasks using Python and libraries such
as NumPy, pandas, matplotlib, and scikit-learn. Include plots, metrics, and
explanations for your findings. Submit your code and a report summarizing your
results.
Objective: Explore the effects of regularization in linear regression and debug common
issues.
Steps:
1. Dataset Preparation:
◦ Create a synthetic dataset with features X and target y using a polynomial function
(e.g., y = 3x^2 + 2x + 1 + ε, where ε is Gaussian noise).
◦ Split the dataset into 70% training and 30% test sets.
2. Analysis:
◦ Discuss how λ affects the weights w_j and the model’s ability to generalize.
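One possible starting point for this task is sketched below, assuming scikit-learn's Ridge (whose alpha parameter plays the role of λ) on degree-8 polynomial features. The sample size, noise level, polynomial degree, λ grid, and random seed are illustrative assumptions and are not specified by the assignment.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data: y = 3x^2 + 2x + 1 + Gaussian noise (size and noise std are assumptions).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 3 * X[:, 0] ** 2 + 2 * X[:, 0] + 1 + rng.normal(0, 2.0, size=200)

# 70% training / 30% test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit a deliberately flexible model for several values of lambda.
results = {}
for lam in [0.01, 1.0, 100.0, 10000.0]:
    model = make_pipeline(
        PolynomialFeatures(degree=8, include_bias=False),
        StandardScaler(),
        Ridge(alpha=lam),
    )
    model.fit(X_train, y_train)
    w = model.named_steps["ridge"].coef_
    results[lam] = (
        np.linalg.norm(w),                                    # size of the weights
        mean_squared_error(y_train, model.predict(X_train)),  # training MSE
        mean_squared_error(y_test, model.predict(X_test)),    # test MSE
    )

for lam, (w_norm, mse_train, mse_test) in results.items():
    print(f"lambda={lam:>8}: ||w||={w_norm:.3f}  train MSE={mse_train:.3f}  test MSE={mse_test:.3f}")
```

The printed weight norm and the two errors per λ give the raw material for the discussion: the norm shrinks as λ grows, while the test error shows whether the extra shrinkage helps or hurts generalization on this particular data.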
Steps:
1. Dataset Splitting:
◦ Use the dataset provided in the slides:
Size (sq ft): [2104, 1600, 2400, 1416, 3000, 1985, 1534, 1427, 1380, 1494]
Price (k$): [400, 330, 369, 232, 540, 300, 315, 199, 212, 243]
◦ Split the dataset into a 60% training set, a 20% validation set, and a 20% test set.
◦ Display the resulting splits.
◦ Compute the Mean Squared Error (MSE) on the training, validation, and test
sets.
2. Analysis:
◦ Explain the significance of keeping the test set separate from training and
validation.
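The split and the error computation above could be sketched as follows. The choice of a plain linear regression model, the two-stage use of train_test_split, and the random seed are assumptions made for illustration; the assignment text does not name the model to fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Dataset from the slides: house size (sq ft) and price (k$).
size = np.array(
    [2104, 1600, 2400, 1416, 3000, 1985, 1534, 1427, 1380, 1494], dtype=float
).reshape(-1, 1)
price = np.array([400, 330, 369, 232, 540, 300, 315, 199, 212, 243], dtype=float)

# Stage 1: 6 of the 10 examples (60%) go to training, 4 are held back.
X_train, X_rest, y_train, y_rest = train_test_split(
    size, price, train_size=6, random_state=0
)
# Stage 2: split the held-back 4 into 2 validation (20%) and 2 test (20%).
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=2, random_state=0
)

splits = {
    "train": (X_train, y_train),
    "validation": (X_val, y_val),
    "test": (X_test, y_test),
}
for name, (X_part, y_part) in splits.items():
    print(name, X_part.ravel(), y_part)

# Fit on the training set only, then report the MSE on every split.
model = LinearRegression().fit(X_train, y_train)
mse = {
    name: mean_squared_error(y_part, model.predict(X_part))
    for name, (X_part, y_part) in splits.items()
}
print(mse)
```

With only two examples each in the validation and test sets, the reported errors vary considerably with the random seed, which is worth noting in the analysis.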
Objective: Select the best polynomial model for a given dataset using validation error.
Steps:
1. Dataset Preparation:
◦ Use the same dataset from Task 2 or generate a synthetic dataset with non-linear
patterns.
2. Polynomial Regression:
◦ Compute the training error and validation error for each degree d.
3. Visualization:
4. Analysis:
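A minimal sketch of degree selection by validation error is shown below, assuming a synthetic non-linear dataset. The target function, the range of degrees (1 to 10), the 70/30 split, and the seed are illustrative assumptions. Plotting is left out; the two error lists can be passed to matplotlib as they are.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic non-linear data (the target function is an assumption).
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(X[:, 0]) + 0.5 * X[:, 0] + rng.normal(0, 0.2, size=120)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=1
)

degrees = list(range(1, 11))
train_err, val_err = [], []
for d in degrees:
    # Polynomial regression of degree d fitted by ordinary least squares.
    model = make_pipeline(
        PolynomialFeatures(degree=d, include_bias=False), LinearRegression()
    )
    model.fit(X_train, y_train)
    train_err.append(mean_squared_error(y_train, model.predict(X_train)))
    val_err.append(mean_squared_error(y_val, model.predict(X_val)))

# Select the degree with the lowest validation error.
best_degree = degrees[int(np.argmin(val_err))]
print("best degree by validation error:", best_degree)
```

Training error can only decrease (or stay flat) as d grows because each model contains the previous one, so the degree is chosen from the validation curve rather than the training curve.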
Steps:
1. Synthetic Dataset:
◦ Generate a dataset with 100 examples and a true relationship y = 2x + 3 + ε, where ε
is Gaussian noise.
2. Ridge Regression:
3. Analysis:
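To illustrate the ridge step on this dataset, the sketch below uses the closed-form solution for a single feature with an unpenalized intercept, i.e. it minimizes sum((y - w*x - b)^2) + λ*w^2. The helper name ridge_fit, the noise level, the x range, and the λ grid are hypothetical choices; the slides may scale the penalty differently (for example by 1/(2m)).

```python
import numpy as np

# 100 examples from y = 2x + 3 + Gaussian noise (x range and noise std are assumptions).
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 2 * x + 3 + rng.normal(0, 1.0, size=100)


def ridge_fit(x, y, lam):
    """Closed-form ridge fit for one feature; the intercept b is not penalized."""
    x_mean, y_mean = x.mean(), y.mean()
    xc, yc = x - x_mean, y - y_mean
    # Centering removes the intercept; lam in the denominator shrinks the slope.
    w = (xc @ yc) / (xc @ xc + lam)
    b = y_mean - w * x_mean
    return w, b


slopes = {}
for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    w, b = ridge_fit(x, y, lam)
    slopes[lam] = w
    mse = np.mean((y - (w * x + b)) ** 2)
    print(f"lambda={lam:>7}: w={w:.3f}  b={b:.3f}  MSE={mse:.3f}")
```

At λ = 0 this reduces to ordinary least squares and should recover a slope near 2; larger λ pulls the slope toward zero and raises the error on the same data.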
Objective: Choose the best neural network architecture based on cross-validation error.
Steps:
1. Dataset Preparation:
◦ Create a synthetic dataset with multiple input features and a non-linear target
relationship.
◦ Train each architecture on the training set and evaluate on the validation
set.
3. Analysis:
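The architecture comparison could be sketched with scikit-learn's MLPRegressor as below. The candidate hidden-layer sizes, the synthetic target, the single train/validation split, and the training settings are assumptions for illustration. A k-fold variant (for example with cross_val_score) would match the "cross-validation" wording of the objective more closely.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Three input features and a non-linear target (both are assumptions).
rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(400, 3))
y = (
    np.sin(X[:, 0])
    + X[:, 1] ** 2
    - X[:, 0] * X[:, 2]
    + rng.normal(0, 0.1, size=400)
)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=3
)

# Candidate architectures, given as tuples of hidden-layer widths.
architectures = [(4,), (16,), (16, 16), (64, 32)]
val_errors = {}
for hidden in architectures:
    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000, random_state=3),
    )
    model.fit(X_train, y_train)  # train on the training set only
    val_errors[hidden] = mean_squared_error(y_val, model.predict(X_val))

# Pick the architecture with the lowest validation error.
best_architecture = min(val_errors, key=val_errors.get)
print(val_errors)
print("selected architecture:", best_architecture)
```

Because network training depends on the random initialization, repeating the loop with a few different random_state values gives a sense of how stable the selection is.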
Steps:
1. Scenario:
◦ You are given a model with the following errors:
Training Error: 10
Validation Error: 40
Test Error: 42
◦ The large gap between training and validation error indicates overfitting.
2. Debugging:
◦ Implement one of these actions (e.g., increase λ) and re-train the model.
3. Analysis:
◦ Compare the errors before and after applying the corrective action.
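The before/after comparison might look like the sketch below, which takes "increase λ" as the corrective action. It does not reproduce the scenario's numbers (10, 40, 42); instead it builds a hypothetical overfitting setup from a degree-12 polynomial on 20 training points. The dataset, the degree, and both λ values are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Small noisy dataset so that a flexible model can overfit (hypothetical setup).
rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.3, size=40)

# 20 training, 10 validation, 10 test examples.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=20, random_state=4
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=10, random_state=4
)
splits = {
    "train": (X_train, y_train),
    "validation": (X_val, y_val),
    "test": (X_test, y_test),
}


def fit_and_score(lam):
    """Fit degree-12 ridge regression with penalty lam; return the MSE per split."""
    model = make_pipeline(
        PolynomialFeatures(degree=12, include_bias=False),
        StandardScaler(),
        Ridge(alpha=lam),
    )
    model.fit(X_train, y_train)
    return {
        name: mean_squared_error(t, model.predict(f))
        for name, (f, t) in splits.items()
    }


err_before = fit_and_score(1e-4)  # almost no regularization
err_after = fit_and_score(1.0)    # corrective action: larger lambda

for name in splits:
    print(f"{name:>10}: before={err_before[name]:.3f}  after={err_after[name]:.3f}")
```

A larger λ always raises the training error for ridge regression; whether the validation and test errors drop, and by how much, depends on the data and is the point to examine in the comparison.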
Submission Instructions