TMLS20 Machine Learning Coursework-1
Niklas Lavesson
[email protected]
January 13, 2020
1 Introduction
This document provides information about the mandatory and optional coursework for the course
TMLS20 Machine Learning. This document is updated frequently. Students need to make sure
to download the latest version from the course homepage in the learning management system.
The document is to be considered frozen for the duration of the course. That is,
students can assume that the version of the document downloaded when the course has started
is applicable until the course officially ends. For mandatory coursework, refer to Section 2 and
Section 3. For optional coursework, refer to Section 4 and Section 5.
2 Assignments
It is possible to use the following programming languages and environments for the assignments:
Jupyter Notebook with Python or Swift Playground Book. Students need to ensure that the
source code compiles with, or can be interpreted by, Python 3.8+ or Swift 5+. Additional
programming languages may be supported, but it is always the responsibility of the student to
ensure that the selected programming environment is accepted by the examiner. Python source may
depend on the following libraries only: default installation libraries, scikit-learn, numpy, and
pandas. Swift source may depend on the Foundation library only.
Submission Format
Assignments containing more than one file must be compressed and archived using Zip format.
Students must ensure that the archive can be decompressed on Unix compatible systems (Linux
variants or BSD variants including Darwin). The source code must be documented clearly and
concisely. A README file with complete compilation and running instructions is required. If the
source code is embedded in a Jupyter Notebook or Swift Playground Book (as it should be), the
need for instructions is minimal.
Data Set Format
Data sets must conform to the ARFF standard or the TMLS20 Machine Learning Data Set
Standard described in this document. Alternatively, if Python is used for development, it is
possible to use datasets that can be loaded by scikit-learn utility functions. For the TMLS20
Machine Learning Data Set Standard, data sets are stored as comma separated files with two
header rows. The top header row (the first line in the file) provides the list of features (sometimes
referred to as attributes or variables), including potential target features. The bottom header
row (the second line in the file) provides the type for each listed feature. The following types are
available: n (nominal), r (real). The last feature represents the default target. The following
file is an example of a data set with five real input features and one nominal target feature:
The file includes one data instance, classified as yes. The second to last feature has a missing
value. Any white space excluding end-of-line must be skipped by a data reader. The comma
symbol is used to separate features. The period symbol is used before fractional digits. Students
should expect the examiner to test assignment submissions with data sets unavailable to the
students but which adhere to one of the standards above.
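As a minimal sketch of a reader for this format, the following Python code parses a hypothetical file that matches the description above: five real input features, one nominal target, and a single instance classified as yes with a missing second-to-last value. The feature names (f1 to f5, play), the values, and the choice to encode a missing value as an empty field are assumptions for illustration; the standard as described here does not fix them. The function name read_tmls20 is also hypothetical.

```python
import csv
import io

# Hypothetical file content consistent with the description above.
# Assumption: a missing value is written as an empty field.
EXAMPLE = """f1, f2, f3, f4, f5, play
r, r, r, r, r, n
1.5, 2.0, 0.25, 3.0, , yes
"""


def read_tmls20(stream):
    """Parse a two-header-row comma separated file into (names, types, rows)."""
    # Remove all white space inside every field; csv.reader splits on commas.
    cleaned = [["".join(field.split()) for field in row]
               for row in csv.reader(stream)]
    # Ignore lines that contain no data at all.
    lines = [row for row in cleaned if any(row)]
    names, types, records = lines[0], lines[1], lines[2:]
    if len(names) != len(types):
        raise ValueError("the two header rows differ in length")
    rows = []
    for record in records:
        if len(record) != len(names):
            raise ValueError("instance does not match the header length")
        parsed = []
        for value, kind in zip(record, types):
            if value == "":
                parsed.append(None)  # missing value (assumed encoding)
            elif kind == "r":
                parsed.append(float(value))  # real feature
            elif kind == "n":
                parsed.append(value)  # nominal feature kept as text
            else:
                raise ValueError(f"unknown feature type: {kind!r}")
        rows.append(parsed)
    return names, types, rows


names, types, rows = read_tmls20(io.StringIO(EXAMPLE))
target_name = names[-1]  # the last feature is the default target
```

Passing an open file object instead of io.StringIO works the same way. Missing values are returned as None so that the caller decides how to handle them.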
Assignment 3 (1.5 credits)
The aim of this assignment is to implement a multi-layer feed-forward neural network with
backpropagation for classification tasks. It should be possible for the user to specify (in the code)
the number of hidden layers and the number of neurons in each hidden layer. Choose at least
three benchmark datasets from a public repository and four hyperparameters in order to perform
parameter tuning to optimize predictive performance (accuracy) for each data set. The source
code should include justifications for the choice of hyperparameters as well as the interval and
step size used for parameter tuning.
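One possible shape for such a network is sketched below using numpy only. It is an illustration under stated assumptions, not a reference solution: the class name MLP, the sigmoid hidden units, the softmax output with cross-entropy loss, full-batch gradient descent, and the synthetic two-class data are all choices made for this example and are not prescribed by the assignment.

```python
import numpy as np


class MLP:
    """Feed-forward network: sigmoid hidden layers, softmax output."""

    def __init__(self, n_inputs, hidden_sizes, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        # hidden_sizes sets both the number of hidden layers and their widths.
        sizes = [n_inputs, *hidden_sizes, n_classes]
        self.weights = [rng.normal(0.0, 1.0 / np.sqrt(a), size=(a, b))
                        for a, b in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(b) for b in sizes[1:]]

    def _forward(self, X):
        """Return the activations of every layer, input included."""
        activations = [X]
        last = len(self.weights) - 1
        for i, (W, b) in enumerate(zip(self.weights, self.biases)):
            z = activations[-1] @ W + b
            if i == last:
                z = z - z.max(axis=1, keepdims=True)  # numerical stability
                e = np.exp(z)
                a = e / e.sum(axis=1, keepdims=True)  # softmax
            else:
                a = 1.0 / (1.0 + np.exp(-z))  # sigmoid
            activations.append(a)
        return activations

    def fit(self, X, y, epochs=500, learning_rate=0.5):
        onehot = np.eye(self.biases[-1].size)[y]
        for _ in range(epochs):
            acts = self._forward(X)
            # Softmax with cross-entropy: output error is prediction - target.
            delta = (acts[-1] - onehot) / len(X)
            for i in reversed(range(len(self.weights))):
                grad_W = acts[i].T @ delta
                grad_b = delta.sum(axis=0)
                if i > 0:
                    # Backpropagate through the sigmoid derivative a * (1 - a),
                    # using the weights before they are updated.
                    delta = (delta @ self.weights[i].T) * acts[i] * (1.0 - acts[i])
                self.weights[i] -= learning_rate * grad_W
                self.biases[i] -= learning_rate * grad_b
        return self

    def predict(self, X):
        return self._forward(X)[-1].argmax(axis=1)


# Synthetic, well separated two-class data (an assumption for the demo).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

model = MLP(n_inputs=2, hidden_sizes=[8, 4], n_classes=2).fit(X, y, epochs=1000)
accuracy = float((model.predict(X) == y).mean())
```

In this sketch the tunable hyperparameters are hidden_sizes, epochs, learning_rate, and seed. A tuning run could loop over an interval of values for each of them and record the accuracy on held-out data rather than on the training data used here.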
3 Project
The main deliverable for the project is the project report. Project reports should be prepared and
typeset in LaTeX or Word using the IEEE conference proceedings template. The recommended
length of a report is four pages but students are allowed up to six pages, excluding references.
For the project, it is possible to use any freely available open source libraries and software
platforms. It is also possible to use any data set format. However: if the project depends on
other data set formats or any additional software and libraries compared to what is accepted for
assignments, students may be asked to book an appointment with the examiner after submission
to demonstrate compilation and running of the code for the project using their own computer
and equipment.
Students are recommended to work in pairs on projects. For pair projects, a section entitled
Disclosure of Contribution must be included in the project report. In that section, the students
clarify the individual contributions of each student. Both students need to submit identical files
for examination in pair projects.
4 Laboratory Exercises
Exercise 1 – Instance-based Learning
Implement the K-Nearest Neighbor algorithm for classification and regression from scratch and
verify that you achieve comparable results to the scikit-learn implementation of the algorithm,
using different K values, for various standard datasets available through scikit-learn. Use
cross-validation to compute average performance scores. Use accuracy (for classification) and
mean squared error (for regression) to compute performance scores.
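A from-scratch starting point for this exercise could look like the sketch below, which uses only the Python standard library. The function names knn_predict and cross_validate, the Euclidean distance, the unweighted vote and mean, and the toy data are assumptions made for illustration; the exercise itself does not prescribe them.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k, task="classification"):
    """Predict one query point from its k nearest training points."""
    # Rank training indices by Euclidean distance to the query.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    neighbours = [train_y[i] for i in order[:k]]
    if task == "classification":
        # Majority vote; ties go to the label seen first among the neighbours.
        return Counter(neighbours).most_common(1)[0][0]
    return sum(neighbours) / len(neighbours)  # mean target for regression


def cross_validate(X, y, k, folds=5, task="classification", seed=0):
    """Average accuracy (classification) or MSE (regression) over the folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    scores = []
    for f in range(folds):
        test_idx = indices[f::folds]  # every folds-th shuffled index
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        preds = [knn_predict(train_X, train_y, X[i], k, task) for i in test_idx]
        if task == "classification":
            score = sum(p == y[i] for p, i in zip(preds, test_idx)) / len(test_idx)
        else:
            score = sum((p - y[i]) ** 2 for p, i in zip(preds, test_idx)) / len(test_idx)
        scores.append(score)
    return sum(scores) / len(scores)


# Toy data: two clusters on a line (an assumption for the demo).
X = [[float(v)] for v in range(10)] + [[float(v)] for v in range(100, 110)]
labels = ["a"] * 10 + ["b"] * 10
targets = [row[0] for row in X]

label = knn_predict(X, labels, [1.4], k=3)
value = knn_predict(X, targets, [1.5], k=2, task="regression")
mean_accuracy = cross_validate(X, labels, k=3, folds=5)
```

To verify the results, the same data and K values can be passed to scikit-learn's KNeighborsClassifier and KNeighborsRegressor with n_neighbors set to K, for example via cross_val_score. Small differences can arise from tie-breaking and from how the folds are drawn.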
5 Seminars
Seminar 1 – Experiments in ML
This seminar focuses on empirical machine learning and the use of experiments to explore topics
and to advance the field. The idea is to discuss the motivation for experimentation in computer
science in general and machine learning in particular. The seminar should bring up discussions
on maturity and quality of published results from machine learning experiments, the need to
perform scientific experiments in machine learning, and the overarching question concerning
whether experiments are relevant to computer science as a discipline.
Learning outcomes addressed: i) Demonstrate the ability to plan and conduct machine
learning experiments and to describe algorithmic performance and behavior through analysis of
experimental results, ii) Demonstrate the ability to evaluate algorithms and algorithm parameter
configurations for a concrete task.
Seminar 2 – Explainable AI
This seminar focuses on the area of explainable artificial intelligence (XAI) and, more broadly:
fairness, accountability, explainability, and ethics (FATE) in artificial intelligence and machine
learning. The idea is to discuss the motivation for XAI, including potential trade-offs with
other important factors to consider when implementing AI and machine learning in real-world
Learning outcomes addressed: i) Demonstrate knowledge of the machine learning area of
research, ii) Demonstrate the ability to suggest a suitable machine learning approach for a problem
or real-world challenge, iii) Demonstrate the ability to motivate the potential costs and benefits
of machine learning application for a given context.