(1122) AI Assignment2
2024/05/07
Classification is a general class of problems: many real-world
applications can be formulated as classification problems. For binary
classification, a well-known metric is the F1-score
$$\mathit{F1} = \frac{2PR}{P + R},$$
which is the harmonic mean of precision
$$P = \frac{TP}{TP + FP}$$
and recall
$$R = \frac{TP}{TP + FN},$$
where TP, FP, and FN are the numbers of true positives, false positives,
and false negatives, respectively.
This can be extended to multi-class classification, where it is known as
the macro F1-score
$$\mathit{MacroF1} = \frac{1}{m} \sum_{i} \mathit{F1}_i,$$
where $m$ is the number of classes and $\mathit{F1}_i$ is the F1-score
obtained by regarding the $i$-th class as positive.
Write a program that solves a classification problem on the provided
dataset using the machine learning methods covered in this course. Use the
above metrics to evaluate the methods.
Implementation
The training and testing datasets are stored in train.csv and test.csv,
respectively. Each line holds one example, consisting of the input
vector 𝒙 followed by the label 𝑦.
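One possible way to read such a file is sketched below. It assumes comma-separated numeric fields with the label in the last column and no quoting, none of which the handout states explicitly. `Example`, `parseLine`, and `loadCsv` are hypothetical names.

```cpp
#include <exception>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct Example {
    std::vector<double> x;  // input vector
    int y = 0;              // label
};

// Parse one line, assuming the layout "x1,x2,...,xd,y" with the label last.
// Returns false if there are fewer than two fields or a field does not
// start with a number.
bool parseLine(const std::string& line, Example& out) {
    std::vector<double> fields;
    std::stringstream ss(line);
    std::string cell;
    while (std::getline(ss, cell, ',')) {
        try {
            fields.push_back(std::stod(cell));
        } catch (const std::exception&) {
            return false;
        }
    }
    if (fields.size() < 2) return false;
    out.y = static_cast<int>(fields.back());
    fields.pop_back();
    out.x = fields;
    return true;
}

// Read every parseable line of a file; other lines (e.g. a header) are skipped.
std::vector<Example> loadCsv(const std::string& path) {
    std::vector<Example> data;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        Example e;
        if (parseLine(line, e)) data.push_back(e);
    }
    return data;
}
```

For example, `loadCsv("train.csv")` would return the training examples under these assumptions. If the actual file uses a different delimiter or column order, only `parseLine` needs to change.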
The following machine learning methods must be included:
1. decision tree,
2. linear classifier,
3. nearest neighbor, and
4. a modification of the above methods or of other existing machine
learning methods.
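As an illustration of the simplest of these methods, the following is a minimal 1-nearest-neighbor sketch. The use of squared Euclidean distance, the single neighbor, and the names `squaredDistance` and `predictNearest` are assumptions made for the example; the course material may define the method differently.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Squared Euclidean distance over the shared dimensions of a and b.
double squaredDistance(const std::vector<double>& a,
                       const std::vector<double>& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        double diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// 1-nearest-neighbor: return the label of the closest training example.
// Returns -1 if the training set is empty (a sentinel assumed here).
int predictNearest(const std::vector<std::vector<double>>& trainX,
                   const std::vector<int>& trainY,
                   const std::vector<double>& query) {
    double best = std::numeric_limits<double>::infinity();
    int label = -1;
    for (std::size_t i = 0; i < trainX.size() && i < trainY.size(); ++i) {
        double d = squaredDistance(trainX[i], query);
        if (d < best) {
            best = d;
            label = trainY[i];
        }
    }
    return label;
}
```

Comparing squared distances gives the same nearest neighbor as comparing distances, so the square root can be skipped.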
Separate your implementation into different files. Output the predicted
labels to result_train.csv and result_test.csv. Each line contains only an
integer as the prediction.
Analysis
Your report should clearly state all the operators and parameters you used
in each method.
Compare the performance of the above methods in terms of
1. solution quality, and
2. running time.
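Running time can be measured in several ways; one option is a small wrapper around `std::chrono::steady_clock`, sketched below. The helper name `elapsedMs` is hypothetical, and a single run is only a rough measurement, so averaging several runs may give more stable numbers.

```cpp
#include <chrono>

// Run any callable once and return the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsedMs(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Training and prediction can be timed separately by wrapping each in its own lambda, for example `elapsedMs([&] { /* train the model here */ })`.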
Describe your design and discuss your findings in this assignment.
Requirements
1. Write your program in C or C++. You will receive no score if you use
other programming languages. Team members can share code but are
responsible for checking it.
2. Each student should write the report on their own, without sharing it
with others, including team members.
3. You have to turn in your source code and a report for the assignment.
Do not turn in executable files. You will receive a score of zero if the
code cannot be compiled or does not produce correct results.
4. Upload your files in a zip file in the format: ML_StudentID.zip, where
StudentID is your student ID.
5. The due date is 2024/05/28. Each day of delay incurs a penalty of 20
points.
6. Plagiarism is prohibited without exception. Work identified as
plagiarism will receive a score of zero for the assignment. This includes
using any natural language processing technique or large language model.