
CS 726

Advanced Machine Learning


Course Overview
Sunita Sarawagi
Spring 2025
Scope of the course
Learning to represent, generate, and reason about objects:
○ High-dimensional x = {x1, ..., xn}; the space of x is large
○ Inter-dependent components
Examples:
○ Image
○ Video
○ Time-series
○ Text
Examples of high dimensional spaces

A 1024 x 1024 RGB image is very high-dimensional: it lives in a 1024 * 1024 * 3 ≈ 3 million dimensional real space
Words in a sentence

"If you ask a question, you are a fool only once. If you do not ask, you are a fool forever."

Assume a vocabulary size of 50K.

With each word one-hot encoded, this sentence of 25 words lives in a 25 * 50K = 1.25 million dimensional discrete space
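As a quick sanity check of these sizes, a minimal sketch in Python (the 1024x1024 image, 50K vocabulary, and 25-word sentence are the numbers quoted on these slides):

```python
# Dimensionality of the example objects from these slides.

# A 1024 x 1024 RGB image: one real value per pixel per channel.
image_dims = 1024 * 1024 * 3
print(image_dims)        # 3_145_728, i.e. roughly 3 million real dimensions

# A 25-word sentence with a 50K vocabulary, one-hot encoding each word.
sentence_dims = 25 * 50_000
print(sentence_dims)     # 1_250_000, i.e. 1.25 million discrete dimensions
```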
Different task settings
Given training data D, train a model M that can be used for

● Generation
○ Unconditional: Generate a sample X that is representative of D
○ Conditional: Given an input prompt X, generate a likely sample Y.

● Density estimation:
○ What is the probability that a given sample X comes from the training distribution D?

● Other forms of reasoning:


○ Causality, counterfactual reasoning, recourse on predictions.
Text to text generation
● Write a poem

● Translation

● Text-to-tree generation
Translation
Input: x Predicted sequence: y

• Each token in the output is a random variable, and there is inter-dependence among the output tokens.

• We want to output a probability with the output translation, and not just produce one translation.

• We cannot predict the whole sentence in one shot but need to decompose it into parts.
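One standard way to do this decomposition is the autoregressive chain rule, P(y | x) = prod_t P(y_t | y_1, ..., y_{t-1}, x), which also yields a probability for the whole translation. A minimal sketch, where `next_token_probs` is a hypothetical stand-in for a trained translation model:

```python
import numpy as np

def next_token_probs(x_tokens, y_prefix, vocab_size=5):
    # Hypothetical stand-in for a trained translation model: returns a
    # distribution over the next output token given the input x and the
    # output prefix generated so far. A real model (e.g. a transformer)
    # would go here.
    seed = abs(hash((tuple(x_tokens), tuple(y_prefix)))) % 2**32
    p = np.random.default_rng(seed).random(vocab_size)
    return p / p.sum()

def sequence_log_prob(x_tokens, y_tokens):
    # Chain rule: log P(y | x) = sum_t log P(y_t | y_1..y_{t-1}, x).
    log_p = 0.0
    for t, y_t in enumerate(y_tokens):
        p = next_token_probs(x_tokens, y_tokens[:t])
        log_p += np.log(p[y_t])
    return log_p

# Probability of one candidate translation y for an input x (token ids).
print(sequence_log_prob(x_tokens=[3, 1, 4], y_tokens=[2, 0, 1]))
```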
Text to image generation
● Imagen
● Stable diffusion
Topics for Generation
Goal: Output a distribution P_θ(y | x) over a structured output y = (y1, ..., yn), optionally conditioned on an input x.
● Representation/Modeling: Form of P_θ; how to represent P(y) over a high-dimensional y for easy learnability and efficient inference.
● Training or learning: How to parameterize the distribution and learn the parameters.
● Inference: How to efficiently generate?
Key insight from the course

Decompose high-dimensional objects into smaller, manageable sub-parts.
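As one concrete illustration (an assumed toy setting, not from the slides): factoring a joint distribution over n binary variables into local conditionals, here a simple Markov-chain factorization, shrinks the representation from 2^n numbers to a handful per variable:

```python
import numpy as np

n = 30   # number of binary variables x1, ..., xn

# Storing the full joint P(x1, ..., xn) needs one number per configuration.
full_table_entries = 2 ** n                 # ~1.07 billion numbers

# A Markov-chain factorization P(x) = P(x1) * prod_i P(x_i | x_{i-1})
# needs one 2-entry table plus (n - 1) tables of shape 2 x 2.
chain_entries = 2 + (n - 1) * 4             # 118 numbers
print(full_table_entries, chain_entries)

# Evaluating P(x) under the factorization is just a product of local terms.
p_x1 = np.array([0.6, 0.4])                 # P(x1); illustrative values
p_cond = np.full((n - 1, 2, 2), 0.5)        # P(x_i | x_{i-1}); illustrative values
x = np.random.randint(0, 2, size=n)         # some configuration
p = p_x1[x[0]]
for i in range(1, n):
    p *= p_cond[i - 1, x[i - 1], x[i]]
print(p)
```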
Representation
● With observed variables

● With latent variables

Can we make the dependency graph simpler via factorization?


Representation
● Represent the rate of change of a random
variable (stochastic differential equations)
Learning
● How to parameterize the joint distribution for
sample-efficient learning

● How to efficiently learn the parameters θ of the distribution
○ Training data (conditional): D = {(x1, y1), ..., (xN, yN)}
○ Training data (unconditional): D = {x1, x2, ..., xN}
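A minimal sketch of maximum-likelihood learning of θ from unconditional data, using a one-parameter Bernoulli model as an assumed toy example:

```python
import numpy as np

# Unconditional training data D = {x1, ..., xN}: binary outcomes of a coin.
D = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def neg_log_likelihood(theta, data):
    # NLL of a Bernoulli(theta) model: -sum_i log P(x_i; theta).
    return -np.sum(data * np.log(theta) + (1 - data) * np.log(1 - theta))

# The maximum-likelihood estimate minimizes the NLL; for a Bernoulli model
# it has the closed form theta* = sample mean of D.
theta_mle = D.mean()
print(theta_mle, neg_log_likelihood(theta_mle, D))
```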
Adapting trained distributions

● In-context learning for regression, time-series, and language tasks
● Parameter-efficient fine-tuning
Inference
● Given an x, how to efficiently find the most likely y1, ..., yn: MAP inference.
● How to generate multiple representative examples from the estimated model: Sampling
○ Generate examples that are representative of the distribution
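A toy sketch contrasting the two inference modes. Here `step_probs` is a hypothetical per-step conditional standing in for a trained model, and greedy decoding is only a cheap approximation to exact MAP inference:

```python
import numpy as np

rng = np.random.default_rng(0)

def step_probs(prefix, vocab_size=5):
    # Hypothetical conditional P(y_t | y_1..y_{t-1}); a trained model would go here.
    p = np.arange(1, vocab_size + 1, dtype=float) ** (len(prefix) + 1)
    return p / p.sum()

def greedy_decode(length):
    # Greedy decoding: pick the locally most likely token at each step
    # (a cheap approximation to MAP inference).
    y = []
    for _ in range(length):
        y.append(int(np.argmax(step_probs(y))))
    return y

def sample_decode(length):
    # Sampling: draw each token from its conditional distribution, so repeated
    # calls give multiple representative sequences.
    y = []
    for _ in range(length):
        p = step_probs(y)
        y.append(int(rng.choice(len(p), p=p)))
    return y

print(greedy_decode(5))                        # always the same sequence
print([sample_decode(5) for _ in range(3)])    # diverse sequences
```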
Density estimation
Given D = {x1, x2, ..., xN}, learn P(x) so that, given a new x, we can efficiently calculate the probability of x.

Applications: Out-of-distribution detection, outlier detection, classification

Density estimator
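A minimal sketch of the idea, assuming a single-Gaussian density estimator (real applications would use richer models): fit P(x) to D, then score a new x by its log-density, with very low values flagging outliers or out-of-distribution samples:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Training data D = {x1, ..., xN}, here 2-dimensional points for illustration.
D = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(500, 2))

# Fit the density estimator: P(x) = N(x; mu, Sigma).
mu = D.mean(axis=0)
Sigma = np.cov(D, rowvar=False)
density = multivariate_normal(mean=mu, cov=Sigma)

# Score new points: a very low log-density suggests an outlier or an
# out-of-distribution sample.
print(density.logpdf([0.1, -0.2]))   # near the training data: relatively high
print(density.logpdf([8.0, 8.0]))    # far from the training data: very low
```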
Course contents

Representation of P(X) or P(Y|X)


● Probabilistic graphical models: Bayesian Networks and Markov
Random Fields
○ Exact, efficient, but limited capacity
○ But, important to understand them to build a framework for
probabilistic reasoning
○ Intuitive and easy to incorporate prior knowledge and biases
○ Special Graphical models
■ Gaussian processes: special structure that allows trivial computation of marginals
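To make the "trivial marginals" point concrete (a sketch with an assumed RBF kernel): under a zero-mean GP prior, the marginal over function values at any finite set of inputs X is simply the Gaussian N(0, K(X, X)), with no integration required:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    # Squared-exponential (RBF) kernel k(a, b) for 1-D inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

# Marginal of a zero-mean GP prior at a finite set of inputs X:
# the function values f(X) are jointly N(0, K(X, X)).
X = np.array([0.0, 0.5, 1.0])
K = rbf_kernel(X, X)
print(K)       # covariance matrix of the 3-dimensional Gaussian marginal

# Marginal at a single input x: the 1-D Gaussian N(0, k(x, x)).
print(rbf_kernel(np.array([0.3]), np.array([0.3]))[0, 0])   # variance = 1.0
```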
Representation (continued)

● Deep latent variable models:
○ VAEs, GANs, discrete diffusion models – the technology behind the latest image generation models such as Imagen
● Representation via variable transformation: Normalizing flows
● Stochastic differential equations: P(Y|X) where X is time and the distribution is represented via its rate of change → continuous-time diffusion models
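A 1-D sketch of the variable-transformation idea behind normalizing flows (the affine transform is an illustrative toy): push a simple base variable z through an invertible map f and obtain the density of x = f(z) from the change-of-variables formula log p_x(x) = log p_z(f^{-1}(x)) - log |f'(f^{-1}(x))|:

```python
import numpy as np

# Base density: standard normal over z.
def log_p_z(z):
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

# Invertible transform x = f(z) = a * z + b (a toy 1-D "flow").
a, b = 2.0, 1.0
f_inv = lambda x: (x - b) / a        # z = f^{-1}(x)

def log_p_x(x):
    # Change of variables: log p_x(x) = log p_z(f^{-1}(x)) - log |f'(z)|,
    # and for the affine map f'(z) = a everywhere.
    return log_p_z(f_inv(x)) - np.log(abs(a))

# Exact density of x under the transformed distribution, no sampling needed.
print(log_p_x(1.0))
```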
Course contents
Learning
● Parameterization (model architectures for efficient learning)
○ Feature-based like in CRFs
○ Deep neural methods, e.g., transformers
● Training algorithms
○ Maximum likelihood learning
○ Generalized Expectation Maximization: Variational Autoencoders, diffusion models for images
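A tiny numerical check of the evidence lower bound (ELBO) that underlies generalized EM and VAEs, log p(x) >= E_q[log p(x|z)] - KL(q(z|x) || p(z)), on an assumed toy model with a single binary latent variable:

```python
import numpy as np

# Assumed toy model with one binary latent z:
p_z = np.array([0.5, 0.5])            # prior p(z)
p_x1_given_z = np.array([0.9, 0.2])   # likelihood p(x = 1 | z)

# Exact evidence for the observation x = 1.
log_p_x = np.log(np.sum(p_z * p_x1_given_z))

# Any approximate posterior q(z | x) gives a lower bound on log p(x):
# ELBO(q) = E_q[log p(x | z)] - KL(q(z | x) || p(z)).
q = np.array([0.7, 0.3])
elbo = np.sum(q * np.log(p_x1_given_z)) - np.sum(q * np.log(q / p_z))

print(log_p_x, elbo)   # ELBO <= log p(x); equality when q is the true posterior
```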
Learning (continued)
● Advanced topics from deep learning:
○ In-context learning in foundation models
○ Parameter-efficient fine-tuning
○ Model editing
Course contents

Inference
● Boolean queries on conditional inference
● Marginalization queries: P(Xi), max_x P(x)
○ Sum-product and max-product Inference in Graphical Models
● Sampling
○ Classical methods of sampling in tractable models: forward sampling, importance weighted sampling, Markov Chain Monte Carlo (MCMC) sampling
○ Recent methods usable in deep learning: Monte-Carlo with Langevin dynamics
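A minimal sketch of (unadjusted) Langevin-dynamics sampling for a 1-D standard-normal target, whose score is d/dx log p(x) = -x; the step size and burn-in below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_p(x):
    # Score of the standard-normal target: d/dx log p(x) = -x.
    return -x

# Unadjusted Langevin dynamics:
# x_{t+1} = x_t + (eps / 2) * grad log p(x_t) + sqrt(eps) * noise
eps, steps = 0.1, 10_000
x = 5.0                               # start far from the target mode
samples = []
for _ in range(steps):
    x = x + 0.5 * eps * grad_log_p(x) + np.sqrt(eps) * rng.standard_normal()
    samples.append(x)

samples = np.array(samples[1_000:])   # drop burn-in
print(samples.mean(), samples.std())  # close to 0 and 1 for the N(0, 1) target
```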
Inference (Continued)
● Inference challenges in modern LLMs (a special Bayesian network)
○ Limitations of greedy decoding
○ Sampling multiple generations
○ Grammar constrained decoding
○ Speculative decoding

● Other forms of Inference
○ Causal effects
○ Algorithmic recourse
Who should take the course
● Students who are interested in doing research in machine learning
● Students who want to learn to think about learning from a probabilistic
perspective in the context of modern deep learning
● Students who want to model learning tasks in a manner that cuts across
applications.
○ The course will cite applications in NLP, vision, time-series, event sequences, and speech
when relevant but it is not primarily about any of these applications.
Mode of running the course
● Two 85-minute slots per week
● SAFE/Moodle quiz on the material covered in the prior week
○ 20 minute duration at a pre-announced time.
○ Grading will be based on the top n-2 out of n quizzes. No compensation for missed quizzes.
○ First quiz on Jan 15th on probability and ML basics
● All materials will be uploaded on Moodle, announcements via Moodle,
questions on Moodle or [email protected]
○ Forum for each topic for discussions and questions.
Evaluation
Approximate credit structure
• 15% In-class Quizzes
• 20% 4-6 graded programming and paper homeworks (in teams of 3)
• 25% Mid-semester exam
• 35% End semester exam
• 3% Scribing
• 2% Attendance and class participation

Course calendar: https://www.cse.iitb.ac.in/~sunita/cs726/
