7641 Assignment 1
Supervised Learning
1 Assignment Weight
The assignment is worth 15% of the total points.
Read everything below carefully, as this assignment has changed from term to term.
2 Objective
The purpose of this project is to explore techniques in supervised learning. It is important to realize that
understanding an algorithm or technique requires understanding how it behaves empirically under a variety of
circumstances. As such, rather than implement each of the algorithms, you will be asked to experiment with
them and compare their performance. This is quite involved and also possibly quite different from what you
are used to; however, it is central and in many ways the essence of supervised learning.
3 Procedure
First, you should design two interesting classification problems. For the purposes of this assignment, a classification problem is just a set of training examples and a set of test examples. You can download data, use data from your own research, or make up your own. Be careful about the datasets you choose, though: you'll need to explain why they are interesting, use them in later assignments, and have a deep understanding of them.
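For example, a minimal sketch of turning a downloaded dataset into training and test sets with pandas and scikit-learn might look like the following; the file data.csv and its label column are hypothetical placeholders for whatever data you choose:

    # Minimal sketch: turning a downloaded dataset into train/test sets.
    # "data.csv" and its "label" column are hypothetical placeholders.
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("data.csv")
    X = df.drop(columns=["label"])   # feature columns
    y = df["label"]                  # class labels

    # A stratified split keeps class proportions similar in both sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42
    )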
After selecting two interesting classification problems, you will go through the process of exploring the data,
tuning the algorithms you’ve learned about, and writing a thorough analysis of your findings. You need not
implement any learning algorithm yourself; however, you must participate in the journey of exploring, tuning,
and analyzing. Concretely, this means:
• You may program in any language you wish and are allowed to use any library, as long as it was not
written specifically to solve this assignment.
• TAs must be able to recreate your experiments on a standard Linux machine if necessary.
• The analysis you provide in the report is paramount.
You should experiment with five learning algorithms on each dataset (a brief scikit-learn sketch follows the list). They are:
• Decision Trees. Be sure to use some form of pruning. You are not required to use information gain to split attributes (for example, the Gini index is sometimes used instead), but you should describe whatever criterion you do use.
• Neural Networks. You may use networks of nodes with as many layers as you like and any activation
function you see fit.
• Boosted Decision Trees. As with decision trees, you will want to use some form of pruning. Since you are using boosting, you can afford to be much more aggressive with your pruning.
• Support Vector Machines. Make sure to try at least two different kernel functions.
• k-Nearest Neighbors. Be sure to experiment with different values of k.
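As a rough, non-authoritative sketch (assuming scikit-learn), the five algorithms might be set up as below; every hyperparameter value shown is a placeholder for you to tune, not a recommendation:

    # Sketch only: scikit-learn instantiations of the five algorithms.
    # Every hyperparameter value below is a placeholder to be tuned, not a recommendation.
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier

    classifiers = {
        # Cost-complexity pruning (ccp_alpha) is one pruning option; Gini is the default split criterion.
        "decision_tree": DecisionTreeClassifier(criterion="gini", ccp_alpha=1e-3),
        # One hidden layer with ReLU; layer sizes and activation are entirely up to you.
        "neural_net": MLPClassifier(hidden_layer_sizes=(32,), activation="relu", max_iter=500),
        # Boosting over aggressively pruned (shallow) trees.
        # (Older scikit-learn versions call this parameter base_estimator.)
        "boosted_trees": AdaBoostClassifier(
            estimator=DecisionTreeClassifier(max_depth=1), n_estimators=100
        ),
        # Try at least two kernels, e.g. RBF and linear.
        "svm_rbf": SVC(kernel="rbf"),
        "svm_linear": SVC(kernel="linear"),
        # Vary k and see what happens.
        "knn": KNeighborsClassifier(n_neighbors=5),
    }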
Each algorithm is described in detail in your textbook, the assigned readings on Canvas, and on the internet.
Instead of implementing the algorithms yourself, you should use libraries that do this for you and make sure to
provide proper attribution. Also, note that you'll need to do some fiddling to obtain good results and graphs, and this might require you to modify these libraries in various ways. At a minimum, your written analysis should include:
• A description of your classification problems, and why you feel they are interesting. Think hard about
this. To be interesting the problems should be non-trivial on the one hand, but capable of admitting
comparisons and analysis of the various algorithms on the other. Avoid the mistake of working on the largest, most complicated, and messiest dataset you can find. The key is to be interesting and clear; there are no points for hairy and complex.
• The training and testing error rates you obtained by running the various learning algorithms on your problems. At the very least you should include graphs that show performance on both training and test data as a function of training set size (note that this implies you need to design a classification problem that has more than a trivial amount of data) and, for the algorithms that are iterative, as a function of training time/iterations. Both of these kinds of graphs are referred to as learning curves. (A scikit-learn sketch for generating such curves appears after this list.)
• Graphs for each algorithm showing training and testing error rates as a function of selected hyperparameter
ranges. This type of graph is referred to as a model complexity graph (also sometimes validation curve).
Please experiment with more than one hyperparameter and make sure the results and subsequent analysis
you provide are meaningful.
• Analyses of your results. Why did you get the results you did? Compare and contrast the different algorithms. What sort of changes might you make to each of those algorithms to improve performance? How fast were they in terms of wall clock time? Iterations? Would cross validation help? How much performance was due to the problems you chose? Which algorithm performed best? How do you define best? Be creative and think of as many questions, and as many answers, as you can.
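One possible way, and certainly not the only one, to generate the numbers behind learning curves and model complexity curves is scikit-learn's learning_curve and validation_curve helpers. In the sketch below, the decision tree, the max_depth sweep, and the synthetic stand-in data are all illustrative assumptions:

    # Sketch: producing learning-curve and model-complexity-curve data for one classifier.
    # The classifier, the swept hyperparameter, and the synthetic data are illustrative only.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import learning_curve, validation_curve
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)  # stand-in data
    clf = DecisionTreeClassifier(random_state=0)

    # Learning curve: cross-validated score as a function of training set size.
    sizes, tr_scores, va_scores = learning_curve(
        clf, X, y, train_sizes=np.linspace(0.1, 1.0, 8), cv=5, scoring="accuracy"
    )
    train_error = 1.0 - tr_scores.mean(axis=1)
    val_error = 1.0 - va_scores.mean(axis=1)

    # Model complexity ("validation") curve: score as a function of one hyperparameter.
    depth_range = np.arange(1, 21)
    tr2, va2 = validation_curve(
        clf, X, y, param_name="max_depth", param_range=depth_range, cv=5, scoring="accuracy"
    )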
Some of the many libraries you may find useful:
Machine learning:
• scikit-learn (python)
• Weka (java)
• e1071/nnet/randomForest (R)
• ML toolbox (matlab)
• tensorflow/pytorch (python)
Plotting:
• matplotlib (python)
• seaborn (python)
• yellowbrick (python)
• ggplot2 (R)
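For reference, a minimal matplotlib sketch of a learning-curve plot might look like the following; the sizes, train_error, and val_error arrays are assumed to come from the learning_curve sketch above:

    # Sketch: plotting training vs. validation error against training set size.
    # `sizes`, `train_error`, and `val_error` are assumed to come from the earlier learning_curve sketch.
    import matplotlib.pyplot as plt

    plt.figure()
    plt.plot(sizes, train_error, marker="o", label="training error")
    plt.plot(sizes, val_error, marker="o", label="validation error")
    plt.xlabel("Training set size")
    plt.ylabel("Error rate")
    plt.title("Learning curve (decision tree)")
    plt.legend()
    plt.savefig("learning_curve_dt.png", dpi=150)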
4 Submission Details
You must submit:
• A file named README.txt containing instructions for running your code (see note below)
• A file named yourgtaccount-analysis.pdf containing your writeup (GT account is what you log in with,
not your all-digits ID)
Note: we need to be able to get to your code and your data. Providing entire libraries isn't necessary when a URL would suffice; however, you should at least provide any files you needed to modify, along with enough supporting explanation that we can reproduce your results on a standard Linux machine.
5 Rescoring Criteria
When your assignment is scored, you will receive feedback explaining your errors and successes in some level of
detail. This feedback is for your benefit, both on this assignment and on future assignments; internalizing it is considered part of your learning goals.
If you are convinced that your score is in error in light of the feedback, you may request a rescore within a week
of the score and feedback being returned to you. A rescore request is only valid if it includes an explanation of
where the grader made an error.
It is important to note that because we consider your ability to internalize feedback a learning goal, we also
assess it. This ability is considered 10 percent of each assignment. We default to assigning you full credit. If
you request a rescore and do not receive at least 5 points as a result of the request, you will lose those 10 points.
6 Plagiarism
You are expected to cite any sources you draw on; the Williams College guide to citation basics is a good starting point [Col]. In particular, "if you use an author's specific word or words, you must place those words within quotation marks and you must credit the source" [Wis]. It is good style to use quotations sparingly. Obviously, you cannot quote other people's assignments and assume that is acceptable. Speaking of acceptable, citing is not a get-out-of-jail-free card: you cannot copy text willy-nilly, cite it all, and then claim it's not plagiarism just because you cited it. Too many quotes of more than, say, two sentences will be considered plagiarism and a terminal lack of academic originality.
Your README file will include pointers to any code and libraries you used.
If we catch you...
We report all suspected cases of plagiarism to the Office of Student Integrity. Students who are under investigation are not allowed to drop the course in question, and the consequences can be severe, ranging from a lowered grade to expulsion from the program.
References
[Col] Williams College. Citing Your Sources: Citing Basics. URL: https://libguides.williams.edu/citing.
[Wis] University of Wisconsin - Madison. Quoting and Paraphrasing. URL: https://writing.wisc.edu/handbook/assignments/quotingsources.
Original assignment description written by Charles Isbell. Updated for Spring 2024 by John Mansfield and Theodore LaGrow. Modified for LaTeX by John Mansfield.