0% found this document useful (0 votes)
61 views2 pages

ML 1 Project

This document provides instructions for Lab 5, which involves implementing and analyzing multiple regression methods on a machine learning benchmark dataset. Students must use nearest neighbor regression, linear regression, regression forest, and Gaussian process regression models. Results should be presented in a 3000-word PDF report including code. Models will be evaluated on accuracy, complexity, and sensitivity to hyperparameters. A toy problem must also be designed and used to validate the implementations. The SARCOS robotics dataset on Moodle will be used, with the task being to predict motor torque from other variables.

Uploaded by

Chaitanya Sanga
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
61 views2 pages

ML 1 Project

This document provides instructions for Lab 5, which involves implementing and analyzing multiple regression methods on a machine learning benchmark dataset. Students must use nearest neighbor regression, linear regression, regression forest, and Gaussian process regression models. Results should be presented in a 3000-word PDF report including code. Models will be evaluated on accuracy, complexity, and sensitivity to hyperparameters. A toy problem must also be designed and used to validate the implementations. The SARCOS robotics dataset on Moodle will be used, with the task being to predict motor torque from other variables.

Uploaded by

Chaitanya Sanga
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 2

Lab 5

(Deadline 18:00 10/01/2020)

Task
You will be provided with a machine learning benchmark dataset (details below). The task
focuses on the implementation and critical analysis of multiple regression methods: You should
implement, evaluate, and analyse each of the following algorithms:
● Nearest neighbour
● Linear regression
● Regression forest
● Gaussian process

The results should be presented as a report with a 3000 word limit in a ​pdf document​. Please
also include your code as well (Jupyter workbook and/or normal .py files).

Mark scheme
This project is worth 60% of your marks for the unit:
● 10 marks for the method,
● 20 marks for designing and validating on a toy problem (see below),
● 20 marks for the experiments,
● 10 marks for the analysis,
for a total of 60 marks.

If a piece of work is submitted after the submission date (and no extension has been explicitly
granted by the Director of Studies), the maximum possible mark will be 40% of the full mark. If
work is submitted more than five working days after the submission date, the student will receive
zero marks.

Data set
Included on Moodle is the SARCOS data set, a regression problem where the task is to predict
the torque of one motor of a robotic arm given physical joint details. Specifically, position,
velocity and acceleration for 7 degrees of freedom.

The data set is provided as one csv file - you are responsible for splitting it for train/test and
hyperparameter learning. Each row is an exemplar, and each column a feature. Your task is to
predict the last column (#22) given the first 21 columns. Be aware that you will not want to use
all of the exemplars when training a Gaussian process as it will get too slow.
Suggested document structure
Please structure your report sensibly. Below is a suggested structure which we would strongly
suggest you follow.

Section 1: Introduction
Explain the problem.

Section 2: Methods
Explain regression forests and Gaussian process regression. Include a discussion of
why (or why not) you would use these algorithms. Note that you do not need to discuss
nearest neighbours or linear regression.

Section 3: Validation on a toy problem


Design a toy problem and provide a discussion on validating your implementations on
this problem. Please explain the rationale of your toy problem design and how it does, or
does not, demonstrate that you have implemented your algorithm correctly.

A toy problem is used to debug, test and validate a machine learning algorithm; it will
usually have the following properties:
● Low dimensional (2 or 3 dimensions), so it can be visualised.
● Synthetic, so the correct answer is known and there is no noise.
● Designed so that the behaviour of the algorithms being tested is predictable, so
you will know if something is wrong.
● Very simple, e.g. you could type it into a calculator in a few seconds.

Note that a single toy problem can cover all four algorithms.

Section 4: Experiments and analysis


Please report your experimental results. This could include
1) Your hyperparameter selection strategies.
2) Performance comparison of different algorithms in terms of accuracy,
computational complexity, etc.
3) Analysis: How the results are influenced by different hyper-parameter choices?
Why does one algorithm perform better than the others on the given dataset?

Source
The data set was originally obtained from ​http://gaussianprocess.org/gpml/data/

You might also like