TMLS20 Machine Learning Coursework-1

TMLS20 Machine Learning

Coursework

Niklas Lavesson
[email protected]
January 13, 2020

1 Introduction
This document provides information about the mandatory and optional coursework for the course
TMLS20 Machine Learning. This document is updated frequently. Students need to make sure
to download the latest version from the course homepage in the learning management system.
The document is to be considered frozen or locked for the duration of the course. That is,
students can assume that the version of the document downloaded when the course has started
is applicable until the course officially ends. For mandatory coursework, refer to Section 2 and
Section 3. For optional coursework, refer to Section 4 and Section 5.

2 Assignments
It is possible to use the following programming languages and environments for the assignments:
Jupyter Notebook¹ with Python or Swift Playground Book². Students need to ensure that the
source code compiles with, or can be interpreted by, Python 3.8+ or Swift 5+. Additional
programming languages may be supported, but it is always the responsibility of the student to ensure
that the selected programming environment is accepted by the examiner. Python source may
depend on the following libraries only: default installation libraries, scikit-learn, numpy, and
pandas. Swift source may depend on the Foundation library only.

Submission Format
Assignments containing more than one file must be compressed and archived using Zip format.
Students must ensure that the archive can be decompressed on Unix compatible systems (Linux
variants or BSD variants including Darwin). The source code must be documented clearly and
concisely. A README file with complete compilation and running instructions is required. If the
source code is embedded in a Jupyter Notebook or Swift Playground Book (as it should be), the
need for instructions is minimal.
1 https://jupyter.org
2 https://developer.apple.com/documentation/swift_playgrounds/

Data Set Format
Data sets must conform to the ARFF standard or the TMLS20 Machine Learning Data Set
Standard described in this document. Alternatively, if Python is used for development, it is
possible to use datasets that can be loaded by scikit-learn utility functions. For the TMLS20
Machine Learning Data Set Standard, data sets are stored as comma separated files with two
header rows. The top header row (the first line in the file) provides the list of features (sometimes
referred to as attributes or variables), including potential target features. The bottom header
row (the second line in the file) provides the type for each listed feature. The following types are
available: n (nominal), r (real). The last feature represents the default target. The following
file is an example of a data set with five real input features and one nominal target feature:
a,b,c,d,e,f
r,r,r,r,r,n
0.1,0.7,218.3,17,?,yes

The file includes one data instance, classified as yes. The second to last feature has a missing
value. Any white space excluding end-of-line must be skipped by a data reader. The comma
symbol is used to separate features. The period symbol is used before fractional digits. Students
should expect the examiner to test assignment submissions with data sets unavailable to the
students but which adhere to one of the standards above.
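
The format described above can be read with a few lines of standard-library Python. The following is a minimal sketch, not a reference reader: the function name read_tmls20 is hypothetical, blank lines are skipped as an assumption, and the "?" symbol is treated as a missing value based on the example file above.

```python
import io

def read_tmls20(stream):
    """Parse a TMLS20-style data set from a text stream.

    Returns (names, types, rows). Real values become float, nominal values
    stay as strings, and '?' (missing value) becomes None.
    """
    def split(line):
        # Remove all white space inside each field: the standard says that
        # white space other than end-of-line must be skipped by a reader.
        return ["".join(field.split()) for field in line.split(",")]

    # Assumption: blank lines carry no data and can be ignored.
    lines = [ln for ln in stream.read().splitlines() if ln.strip()]
    names, types = split(lines[0]), split(lines[1])
    if len(names) != len(types) or any(t not in ("n", "r") for t in types):
        raise ValueError("malformed header rows")

    rows = []
    for line in lines[2:]:
        fields = split(line)
        if len(fields) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(fields)}")
        row = []
        for value, ftype in zip(fields, types):
            if value == "?":
                row.append(None)          # missing value
            elif ftype == "r":
                row.append(float(value))  # real-valued feature
            else:
                row.append(value)         # nominal feature
        rows.append(row)
    return names, types, rows

example = "a,b,c,d,e,f\nr,r,r,r,r,n\n0.1, 0.7,218.3,17,?,yes\n"
names, types, rows = read_tmls20(io.StringIO(example))
print(names[-1], rows[0])  # default target name and the single instance
```

The last entry of names is the default target, so a caller can separate inputs from the target by slicing each row.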

Assignment 1 (1.5 credits)

The aim of the assignment is to implement from scratch a) a Naive Bayes learning algorithm for
classification tasks, b) a cross-validation test, and c) a plot of the average ROC curve of a 10-fold
cross-validation test. The submitted code should demonstrate the 10-fold cross-validation average
area under the ROC curve (AUC) value and the ROC plot for Naive Bayes on three different versions of
the iris dataset. This dataset is directly available via scikit-learn. The dataset contains
three classes (categories). For each version, remove one class (50 instances) while keeping the 100
instances from the two remaining classes. There are a number of different ways to implement
Naive Bayes. The source code should include motivations for the design choices taken.
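
As an illustration of the evaluation part only, the sketch below shows one possible way to create folds and to compute ROC points and the area under the curve for a single fold. It is not the required solution. The function names are hypothetical, the folds are not stratified, labels are assumed to be encoded as 1 (positive) and 0 (negative), a higher score is assumed to mean "more likely positive", and each evaluated set is assumed to contain both classes. The Naive Bayes learner and the averaging of ROC curves across folds are left out.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate)."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score and sweep the decision threshold downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Handle tied scores together so that ties yield one diagonal segment.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

folds = k_fold_indices(100, 10)  # 100 instances, as in each two-class iris version
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(labels, scores)))
```

In a full solution, each fold in turn serves as the test set while the remaining nine folds are used for training.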

Assignment 2 (1.5 credits)

The aim of the assignment is to implement a decision tree induction algorithm for classification
tasks and to demonstrate that it works as expected. The algorithm shall process data sets
according to an approved standard (see above). It must be able to handle real-valued and
nominal features. The algorithm does not need to handle missing values or real-valued target
features (regression tasks). The student chooses whether to use information entropy or Gini
impurity as the split criterion. To calculate binary splits for real-valued features, the following rule
must be applied: an instance with a feature value lower than the mean feature value follows the
left edge from the split node while all other instances follow the right edge from the split node.
Demonstrate that the algorithm works as expected on three classification data sets: Iris³, Wine⁴,
and one additional data set of your own choice from the UCI machine learning repository.
3 http://archive.ics.uci.edu/ml/datasets/Iris
4 http://archive.ics.uci.edu/ml/datasets/Wine
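
The mean-based split rule and the two split criteria in Assignment 2 can be sketched as follows. This is an illustrative fragment under stated assumptions, not a complete tree induction algorithm: all function names are hypothetical, and scoring a candidate split by the size-weighted impurity of its two branches is one common choice rather than something the assignment prescribes.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Information entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mean_split(values, labels):
    """Split labels by the rule from the assignment text.

    Instances with a value lower than the mean go left, all others go right.
    """
    mean = sum(values) / len(values)
    left = [y for x, y in zip(values, labels) if x < mean]
    right = [y for x, y in zip(values, labels) if x >= mean]
    return mean, left, right

def weighted_impurity(left, right, criterion=gini):
    """Size-weighted impurity of the two branches (empty branches are skipped)."""
    n = len(left) + len(right)
    return sum(len(part) / n * criterion(part) for part in (left, right) if part)

values = [1.0, 2.0, 3.0, 6.0]
labels = ["a", "a", "b", "b"]
mean, left, right = mean_split(values, labels)
print(mean, left, right, weighted_impurity(left, right))
```

A tree builder would typically evaluate this score for every candidate feature at a node and split on the feature with the lowest value.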

Assignment 3 (1.5 credits)
The aim of this assignment is to implement a multi-layer feed-forward neural network with back-
propagation for classification tasks. It should be possible for the user to specify (in the code)
the number of hidden layers and the number of neurons in each hidden layer. Choose at least
three benchmark datasets from a public repository and four hyperparameters in order to perform
parameter tuning to optimize predictive performance (accuracy) for each data set. The source
code should include justifications for the choice of hyperparameters as well as the interval and
step size used for parameter tuning.
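
One possible way to make the interval and step size explicit in the code is to generate the tuning grid from them. The sketch below is only an assumption about how this could be organised: the function name grid_values, the hyperparameter names, and the intervals are hypothetical and carry no recommendation.

```python
from itertools import product

def grid_values(low, high, step):
    """Values low, low + step, ... up to and including high."""
    count = int(round((high - low) / step)) + 1
    # Rounding removes floating-point noise such as 0.30000000000000004.
    return [round(low + i * step, 10) for i in range(count)]

# Hypothetical hyperparameters with hypothetical intervals and step sizes.
grid = {
    "learning_rate": grid_values(0.1, 0.5, 0.2),
    "hidden_neurons": grid_values(4, 8, 4),
}

# Every combination of values is one configuration to train and evaluate.
configs = [dict(zip(grid, combo)) for combo in product(*grid.values())]
print(len(configs), configs[0])
```

Each configuration would then be used to train the network and to measure accuracy, for example with cross-validation, on each data set.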

3 Project
The main deliverable for the project is the project report. Project reports should be prepared and
typeset in LaTeX or Word using the IEEE conference proceedings template⁵. The recommended
length of a report is four pages but students are allowed up to six pages, excluding references.
For the project, it is possible to use any freely available open source libraries and software
platforms. It is also possible to use any data set format. However: if the project depends on
other data set formats or any additional software and libraries compared to what is accepted for
assignments, students may be asked to book an appointment with the examiner after submission
to demonstrate compilation and running of the code for the project using their own computer
and equipment.
Students are recommended to work in pairs on projects. For pair projects, a section entitled
Disclosure of Contribution must be included in the project report. In that section, the students
clarify the individual contributions of each student. Both students need to submit identical files
for examination in pair projects.

Machine Learning Project (3 credits)

The project should be of sufficient size and complexity. You should describe, in the project
report, which activities were necessary to perform and the approximate time to perform each
activity. The total time of a project of sufficient size and complexity is 160 hours (1.5 credits are
roughly equivalent to one week of full-time study, which is 40 hours, so 3 credits equal 80 hours,
and two students thus have 160 hours in total). Note that the total time includes the time required to
study a topic and write the report.
The aim of the project is to choose a machine learning task, identify an appropriate learning
problem, identify a reasonable model type and learning algorithm, and to choose a systematic
approach to evaluate a solution of a real-world problem. The task, learning problem, model,
evaluation procedure, and learning algorithm should be described and justified in the report.
The student group is encouraged to choose an application of interest (natural language
processing, computer vision, data mining, pattern recognition, etc.). Kaggle describes a variety of
competitions that can be used as is or elaborated upon as inspiration.
A project should demonstrate the results of an independent investigation into an advanced
machine learning topic or application. In most cases, the project report could be used as a
preliminary study before taking on a Master’s thesis.
5 https://www.ieee.org/conferences/publishing/templates.html

4 Laboratory Exercises
Exercise 1 – Instance-based Learning
Implement the K-Nearest Neighbor algorithm for classification and regression from scratch and
verify that you achieve comparable results to the scikit-learn implementation of the algorithm,
using different K values, for various standard datasets available through scikit-learn. Use
cross-validation to compute average performance scores. Use accuracy (for classification) and
mean squared error (for regression) to compute performance scores.
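
The core prediction step of K-Nearest Neighbor can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance, majority voting for classification, and an unweighted mean for regression. The function name and the task parameter are hypothetical, and math.dist requires Python 3.8+.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k, task="classification"):
    """Predict the target of `query` from its k nearest training instances."""
    # Rank training instances by Euclidean distance to the query.
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    nearest = [y for _, y in ranked[:k]]
    if task == "classification":
        # Majority vote among the k nearest neighbors.
        return Counter(nearest).most_common(1)[0][0]
    # Regression: unweighted mean of the neighbors' target values.
    return sum(nearest) / len(nearest)

train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
classes = ["a", "a", "b", "b"]
targets = [1.0, 2.0, 10.0, 12.0]
print(knn_predict(train_x, classes, (0.2, 0.1), k=3))
print(knn_predict(train_x, targets, (0.2, 0.1), k=2, task="regression"))
```

Sorting all training instances for every query is simple but slow on large data sets; it is used here only to keep the sketch short.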

Exercise 2 – K-Means Clustering

Implement the K-Means Clustering algorithm from scratch and evaluate your solution, using
different K values, for various standard datasets available through scikit-learn. Use cross-
validation to compute average performance scores. Search scientific literature to find a suitable
evaluation measure by which to evaluate your solution.
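
The iterative procedure behind K-Means (often called Lloyd's algorithm) alternates between assigning points to the nearest centroid and recomputing the centroids. The sketch below is a minimal illustration, not a tuned implementation. The function name is hypothetical, and using the first k points as initial centroids is a simplifying assumption made to keep the example deterministic; random or smarter initialization is more common in practice.

```python
import math

def k_means(points, k, iterations=100):
    """Cluster points (tuples of floats) into k groups.

    Returns (centroids, clusters), where clusters[i] holds the points
    assigned to centroids[i] in the last completed assignment step.
    """
    centroids = [tuple(p) for p in points[:k]]  # naive initialization (assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the assignment can no longer change
        centroids = new_centroids
    return centroids, clusters

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
```

The result depends on the initial centroids, which is one reason to repeat the procedure with several initializations when evaluating a solution.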

Exercise 3 – Hyperparameter Tuning

Use the scikit-learn implementation of Random Forests and pick at least two
hyperparameters to optimize. Choose three datasets from a public repository. Perform a systematic
hyperparameter tuning to optimize performance for each dataset using a suitable performance
measure.

Exercise 4 – Generating Explanations with LIME

Use LIME to generate explanations for one natural language processing task and one image
recognition task.

Exercise 5 – Reinforcement Learning

Description will be available soon

5 Seminars
Seminar 1 – Experiments in ML
This seminar focuses on empirical machine learning and the use of experiments to explore topics
and to advance the field. The idea is to discuss the motivation for experimentation in computer
science in general and machine learning in particular. The seminar should bring up discussions
on maturity and quality of published results from machine learning experiments, the need to
perform scientific experiments in machine learning, and the overarching question concerning
whether experiments are relevant to computer science as a discipline.
Learning outcomes addressed: i) Demonstrate the ability to plan and conduct machine
learning experiments and to describe algorithmic performance and behavior through analysis of
experimental results; ii) Demonstrate the ability to evaluate algorithms and algorithm parameter
configurations for a concrete task.

Seminar 2 – Explainable AI
This seminar focuses on the area of explainable artificial intelligence (XAI) and, more broadly:
fairness, accountability, explainability, and ethics (FATE) in artificial intelligence and machine
learning. The idea is to discuss the motivation for XAI, including potential trade-offs with
other important factors to consider when implementing AI and machine learning in real-world
applications.
Learning outcomes addressed: i) Demonstrate knowledge of the machine learning area of
research; ii) Demonstrate the ability to suggest a suitable machine learning approach for a problem
or real-world challenge; iii) Demonstrate the ability to motivate the potential costs and benefits
of machine learning application for a given context.

Seminar 3 – Lifelong and Transfer Learning

Description will be available soon

Seminar 4 – Machine Learning for Manufacturing

Description will be available soon

Seminar 5 – The Data-driven Industry and Society

Description will be available soon
