
CMSC 25025 / Stat 37601: Machine Learning and Large-Scale Data Analysis

Syllabus, Spring 2024

Instructor:

Yali Amit, [email protected], Office hours: Mondays 3-4PM

Course Assistants:

Yuhan Philip Liu: [email protected], Office hours: Mondays 5-6.

Kulunu Dharmakeerthi: [email protected], Office hours: Friday 2-3.

Office hours will be held via Zoom.

Course Overview

This course is an introduction to machine learning and statistics. The course presents
motivation, methods, implementation and some supporting theory for several types of data
analysis, including classification and regression, clustering, unsupervised feature learning, and
multi-layer networks. The main objective of the course is for students to gain an understanding
of and experience with some essential statistical machine learning methodology and practice.
The course will also touch on social impacts of the use of machine learning.

The course will not follow a textbook closely. However, the following book contains some of the
course material: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by
T. Hastie, R. Tibshirani, and J. Friedman, Springer, 2nd edition. The book is available at:

https://web.stanford.edu/~hastie/ElemStatLearn/index.html
What you will need to succeed in this course:

Some degree of:
  o Programming maturity (Python)
  o Exposure to linear algebra and multivariable calculus (essential)
  o Exposure to basic algorithmic ideas and methods
  o Exposure to basic statistical ideas and methods

Homework 1 will be posted on Monday, March 18 at 5PM and is due Tuesday, March 26 at 2PM.
It provides a good indication of the level of math and programming needed in the course. If
you have difficulty with this homework, this course may not be for you.

Course Structure
Classes will be in person. The slides for each class will be posted before class.

Office hours will be held on Zoom.

Grading
Assignments: Assignments will include a mix of problem solving and data analysis (coding).
Python will be the course programming language, and all programming assignments must be
handed in as Jupyter notebooks, including cell outputs.

Quizzes

We will hold two 20-minute multiple-choice quizzes during regular class hours; you will need
to have access to Canvas, either on a laptop or on your phone.

Exams

Midterm: April 25.

Final: We will assign a final project.


The four components of your grade will be given the following weights (I will drop the lowest
scoring homework):

HW 40%, Quizzes 10%, Midterm 20%, Final Project 30%
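
As an illustration only (not the official formula), here is a minimal Python sketch of how the
stated weights and the drop-lowest-homework rule could combine into a course total; all scores
below are hypothetical and assumed to be on a 0-100 scale:

# Hypothetical sketch: drop the lowest homework score, average each component,
# then combine the averages with the stated weights
# (HW 40%, Quizzes 10%, Midterm 20%, Final Project 30%).
def course_grade(hw_scores, quiz_scores, midterm, final_project):
    hw_kept = sorted(hw_scores)[1:]                  # drop the lowest-scoring homework
    hw_avg = sum(hw_kept) / len(hw_kept)
    quiz_avg = sum(quiz_scores) / len(quiz_scores)
    return 0.40 * hw_avg + 0.10 * quiz_avg + 0.20 * midterm + 0.30 * final_project

# Example with made-up scores:
print(course_grade([85, 92, 78, 88, 95], [80, 90], midterm=84, final_project=91))  # 88.6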

Policy on Assignments and Projects

Assignments will be due one week after they are posted, submitted online on Gradescope by 2PM,
before the start of class.

Collaboration on homework assignments (other than HW1) with fellow students is encouraged.
However, such collaboration should be clearly acknowledged by listing the names of the
students with whom you have had any discussions concerning the problem. You may not share
written work or code; after discussing a problem with others, you must write up the solution
yourself.
Tentative Schedule: Blue - supervised learning, Red - unsupervised learning, Purple - both.

Week 1: Introduction
        Clustering, K-means, Spectral Clustering

Week 2: PCA
        Classification/Bayes classifier
        Generative models for classification: class-conditional Gaussian models, different and
        same covariances
        Quiz 1 - Thursday March 28th

Week 3: Generative models: Mixture models and EM
        Discriminative models for linear classifiers and optimization methods

Week 4: Perceptrons
        SVMs
        From linear classifiers to kernel-based SVMs
        Sparse coding/Dictionary learning (?)
        Quiz 2 - Thursday April 11

Week 5: Multilayer perceptrons/Back propagation for SGD
        Convolutional neural networks

Week 6: Recurrent Neural Networks
        Language models and word embeddings
        Midterm - April 25
        Final Project assigned

Week 7: Transformer methods in language models

Week 8: From EM to Variational Autoencoders
        Other generative models with deep networks

Week 9: Other unsupervised methods - contrastive learning