
CS231n: Deep Learning

for Computer Vision

Lecture 1 – Part 2 – Overview

Fei-Fei Li, Ehsan Adeli Lecture 1 - 1 April 2, 2024


Instructors

Fei-Fei Li Ehsan Adeli



Co-Instructors

Zane Durante Ruohan Zhang Chen Wang



Today’s agenda

● A brief history of computer vision


● CS231n overview



CS231n overview

● Deep Learning Basics


● Perceiving and Understanding the Visual World
● Generative and Interactive Visual Intelligence
● Human-Centered Applications and Implications



Deep Learning Basics
• Image Classification: A core task in Computer Vision
• Linear Classifier
• Regularization & Optimization
• Neural Networks

cat

This image by Nikita is licensed under CC-BY 2.0
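
The linear classifier named on the Deep Learning Basics slides can be sketched in a few lines of NumPy. This is a minimal illustration, not course code: the 32x32x3 input size, the ten classes, and the random (untrained) weights `W` and bias `b` are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes = 10                  # assumed number of categories
image = rng.random((32, 32, 3))   # assumed stand-in for an RGB image

x = image.reshape(-1)             # flatten the pixels into a vector of length 3072
W = rng.normal(scale=0.01, size=(num_classes, x.size))  # untrained weights
b = np.zeros(num_classes)         # one bias per class

scores = W @ x + b                # one score per class
predicted_class = int(np.argmax(scores))  # index of the highest-scoring class
```

With random weights the prediction is meaningless; training means choosing `W` and `b` so that the correct class receives the highest score.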



Perceiving and Understanding the Visual World

Tasks Models



Tasks Beyond Image Classification

Classification: CAT (no spatial extent)
Semantic Segmentation: GRASS, CAT, TREE, SKY (no objects, just pixels)
Object Detection: DOG, DOG, CAT (multiple objects)
Instance Segmentation: DOG, DOG, CAT (multiple objects)

This image is CC0 public domain
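
One way to see how these four tasks differ is to write down what each one outputs. The sketch below is illustrative only: the tiny 4x6 image, the box coordinates, and the rectangular placeholder masks are assumptions, not data from the slide.

```python
import numpy as np

CLASSES = ["grass", "cat", "tree", "sky", "dog"]  # toy label vocabulary
H, W = 4, 6                                       # tiny image size, for illustration

# Classification: one label for the whole image, no spatial extent.
classification = "cat"

# Semantic segmentation: one class index per pixel, no notion of separate objects.
semantic_map = np.zeros((H, W), dtype=np.int64)   # index 0 means every pixel starts as "grass"
semantic_map[0, :] = CLASSES.index("sky")
semantic_map[2:, 1:3] = CLASSES.index("cat")

# Object detection: one (label, box) pair per object,
# with box = (x_min, y_min, x_max, y_max) in pixels.
detections = [("dog", (0, 1, 2, 3)), ("dog", (2, 1, 4, 3)), ("cat", (4, 2, 6, 4))]

def box_to_mask(box):
    """Build a rectangular boolean mask; real instance masks follow object outlines."""
    x0, y0, x1, y1 = box
    mask = np.zeros((H, W), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask

# Instance segmentation: one (label, per-pixel mask) pair per object.
instances = [(label, box_to_mask(box)) for label, box in detections]
```

Note that the two dogs share a single class index in `semantic_map`-style output, while detection and instance segmentation keep them as separate entries.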



Tasks Beyond Image Classification

Video Classification (Running? Jumping?)
Multimodal Video Understanding
Visualization & Understanding



Models Beyond Multi-Layer Perceptron

Illustration of LeCun et al. 1998 from CS231n 2017 Lecture 1

Convolutional neural network



Models Beyond Multi-Layer Perceptron

Recurrent neural network
Attention mechanism / Transformers




Beyond 2D Recognition



Beyond 2D Recognition: Self-supervised Learning



Beyond 2D Recognition: Generative Modeling

Style Transfer



Beyond 2D Recognition: Generative Modeling

“Teddy bears working on new AI research underwater with 1990s technology”

DALL-E 2



Beyond 2D Recognition: Vision Language Models

Yasunaga, Michihiro, et al. "Retrieval-augmented multimodal language modeling." arXiv preprint arXiv:2211.12561 (2022).

Contrastive pre-training in CLIP. The blue squares are the pairs for which we want to optimize the similarity. Image derived from https://github.com/openai/CLIP
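
The caption's "pairs for which we want to optimize the similarity" are the diagonal entries of an image-by-text similarity matrix. The NumPy sketch below shows a symmetric cross-entropy objective of the kind commonly described for CLIP-style contrastive training. The function names, the random stand-in embeddings, and the temperature of 0.07 are assumptions for illustration, not values taken from the slide.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over an (N, N) similarity matrix.

    Row i of image_emb and row i of text_emb are assumed to be a matched pair,
    so the targets are the diagonal entries (the "blue squares").
    """
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature
    diag = np.arange(len(logits))
    loss_image_to_text = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_text_to_image = -log_softmax(logits, axis=0)[diag, diag].mean()
    return (loss_image_to_text + loss_text_to_image) / 2

rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 8))   # stand-in for image encoder outputs

# Perfectly aligned "text" embeddings versus deliberately mismatched ones.
matched = contrastive_loss(image_emb, image_emb)
mismatched = contrastive_loss(image_emb, image_emb[::-1])
```

Here `matched` is smaller than `mismatched`, because the loss rewards high similarity on the diagonal and low similarity everywhere else.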



Beyond 2D Recognition: 3D Vision

Choy et al., 3D-R2N2: Recurrent Reconstruction Neural Network (2016)

Zhou et al., 3D Shape Generation and Completion through Point-Voxel Diffusion (2021)

Gkioxari et al., “Mesh R-CNN”, ICCV 2019



Beyond 2D Recognition: Embodied Intelligence

Li et al., BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation (2022)

Mandlekar and Xu et al., Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations (2020)



2018 Turing Award for deep learning
The Turing Award, the most prestigious technical award, is given for major contributions of lasting importance to computing.

Geoffrey Hinton, Yoshua Bengio, Yann LeCun


This image is CC0 public domain This image is CC0 public domain This image is CC0 public domain



IEEE PAMI Longuet-Higgins Prize
The award recognizes ONE computer vision paper from ten years ago with significant impact on computer vision research.

At CVPR 2019, it was awarded to the 2009 original ImageNet paper

That’s Fei-Fei

Logistics

Lectures

- Tuesdays and Thursdays from 12:00 PM to 1:20 PM at NVIDIA Auditorium

- Lectures will not be streamed on Zoom but will be broadcast live via Panopto

- Slides will be posted on the course website shortly before each lecture

- All lectures will be recorded and uploaded to Canvas after the lecture under the “Panopto Course Videos” tab



Course website [http://cs231n.stanford.edu/] - Refresh!



Friday Discussion Sections
6 discussion sections, Fridays 12:30-1:20 PM, NVIDIA Auditorium
04/05 Python / NumPy Review Session
04/12 Backprop Review Session
04/19 Final Project Overview and Guidelines
04/26 PyTorch / TensorFlow Review Session
05/03 Midterm Review Session
05/10 RNNs & Transformers

Hands-on tutorials, with more practical details than the main lecture

Check Canvas for the Zoom link for the discussion sessions! Recordings will be available on Canvas.

This Friday: Python / NumPy / Colab


Ed

For questions about assignments, the final project, the midterm, logistics, etc., use Ed!

Access: Canvas -> Deep Learning for Computer Vision -> Ed Discussion

SCPD students: Use your @stanford.edu address to register for Ed; contact scpd-
[email protected] for help.



Office Hours
We'll be hosting both in-person and remote office hours (starting week 2).
- Location
  - In-person: Huang Basement; look for CS231n signs, and check the course website and Canvas
  - Remote: Zoom, with QueueStatus to set up queues
    - Please see Canvas or Ed for the QueueStatus link
    - TAs will use QueueStatus to admit students to their Zoom meeting rooms for 1-1 conversations when it’s your turn
- The office hour schedule is on the course website
- Ehsan's office hours are over Zoom
  - Please contact me by email and explain your point of discussion. I will set up 15-minute meetings.



Overview on communication
Course Website: http://cs231n.stanford.edu/
- Syllabus, lecture slides, links to assignment downloads, etc.
Ed:
- Use this for most communication with course staff
- Ask questions about homework, grading, logistics, etc.
- Use private questions only if posting publicly would violate the honor code
Mailing list
- [email protected]
Gradescope:
- For turning in homework and receiving grades
Canvas:
- For watching recorded lectures
- For watching recorded discussion sessions



Assignments
All assignments will be completed using Google Colab

Assignment 1: Will be out Friday 4/5, due 4/19 by 11:59 PM

- K-Nearest Neighbor
- Linear classifiers: SVM, Softmax
- Two-layer neural network
- Image features



Grading
All assignments, coding and written portions, will be submitted via Gradescope.

An auto-grading system:
- A consistent grading scheme
- Public tests
  - Students see results of public tests immediately
- Private tests
  - Generalizations of the public tests to thoroughly test your implementation



Grading
3 Assignments: 10% + 20% + 15% = 45%
In-Class Midterm Exam: 20%
Course Project: 35%
- Project Proposal: 1%
- Milestone: 2%
- Final Project Report: 29%
- Poster & Poster Session: 3%

Participation Extra Credit: up to 3%
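
The weights above can be checked and combined with a short Python sketch. Two things here are assumptions rather than stated policy: that the 10% / 20% / 15% split maps to assignments 1, 2, and 3 in that order, and that participation extra credit is simply added on top of the weighted total. The function and key names are hypothetical.

```python
# Weights in percent, taken from the grading breakdown above.
# Assumption: 10% / 20% / 15% correspond to assignments 1, 2, 3 in order.
WEIGHTS = {
    "assignment_1": 10,
    "assignment_2": 20,
    "assignment_3": 15,
    "midterm": 20,
    "project_proposal": 1,
    "project_milestone": 2,
    "project_report": 29,
    "project_poster": 3,
}
MAX_EXTRA_CREDIT = 3  # participation extra credit is capped at 3%

def course_grade(scores, extra_credit=0.0):
    """Return a percentage, given each component's score as a fraction in [0, 1].

    Assumption: extra credit is added to the weighted total after capping.
    """
    total = sum(weight * scores[name] for name, weight in WEIGHTS.items())
    return total + min(extra_credit, MAX_EXTRA_CREDIT)

perfect = {name: 1.0 for name in WEIGHTS}
full_marks = course_grade(perfect)
with_bonus = course_grade(perfect, extra_credit=5)
```

The eight weights sum to 100, and the four project components sum to the stated 35%.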


Late policy
- 4 free late days – use up to 2 late days per assignment
- Afterwards, 25% off per day late
- No late days for project report
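
The late policy can be read as a small calculation. The sketch below encodes one possible interpretation, and the course staff's reading is the authoritative one. It assumes that free late days are applied first (at most 2 per assignment, limited by the remaining budget), that each further late day removes 25 percentage points of the assignment's score, and that the score cannot drop below zero. The function name and signature are hypothetical, and the project report is out of scope because it accepts no late days.

```python
def apply_late_policy(days_late, free_days_left,
                      max_free_per_assignment=2, penalty_per_day=0.25):
    """Return (free_days_used, score_multiplier) for one assignment.

    Assumed interpretation: free days are spent first, then each remaining
    late day costs 25% of the assignment's score, floored at zero.
    """
    free_days_used = min(days_late, free_days_left, max_free_per_assignment)
    penalized_days = days_late - free_days_used
    multiplier = max(0.0, 1.0 - penalty_per_day * penalized_days)
    return free_days_used, multiplier

budget = 4  # free late days available at the start of the quarter
used, multiplier = apply_late_policy(days_late=3, free_days_left=budget)
budget -= used  # 2 free days spent; the third day is penalized
```

Under this reading, an assignment submitted three days late with a full budget uses two free days and keeps 75% of its score.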



Collaboration policy
We follow the Stanford Honor Code and the CS Department Honor Code – read them!
● Rule 1: Don’t look at solutions or code that are not your own; everything you submit should be your own work
● Rule 2: Don’t share your solution code with others; however, discussing ideas or general strategies is fine and encouraged
● Rule 3: Indicate in your submissions anyone you worked with
Turning in something late / incomplete is better than violating the honor code



Prerequisites
Proficiency in Python
- All class assignments will be in Python (and use NumPy)
- Later in the class, you will be using PyTorch and TensorFlow
- A Python tutorial is available on the course website
College Calculus, Linear Algebra
CS229 (Machine Learning) is no longer required



Optional textbook resources
- Deep Learning
  - by Goodfellow, Bengio, and Courville
  - Free online version
- Mathematics of deep learning
  - Chapters 5, 6, and 7 are useful for understanding vector calculus and continuous optimization
  - Free online version
- Dive into deep learning
  - An interactive deep learning book with code, math, and discussions, based on the NumPy interface
  - Free online version



Learning objectives
Formalize computer vision applications into tasks
- Formalize inputs and outputs for vision-related problems
- Understand what data and computational requirements you need to train a model
Develop and train vision models
- Learn to code, debug, and train convolutional neural networks.
- Learn how to use software frameworks like PyTorch and TensorFlow

Gain an understanding of where the field is and where it is headed
- What new research has come out in the last 0-5 years?
- What are open research challenges?
- What ethical and societal considerations should we weigh before deployment?



Why should you take this class?
Become a vision researcher (an incomplete list of conferences)
- Get involved with vision research at Stanford: apply using this form.
- CVPR 2024 conference
- ECCV 2024 conference
Become a vision engineer in industry (an incomplete list of industry teams)
- Perception team at Google AI, Vision at Google Cloud
- Vision at Meta AI
- Vision at Amazon AWS
- Nvidia, Apple, Microsoft, OpenAI, Salesforce, ……
Apply computer vision to solve problems in other fields of science & engineering
General interest
Syllabus

Deep Learning Basics
- Data-driven approaches
- Linear classification & kNN
- Loss functions
- Optimization
- Backpropagation
- Multi-layer perceptrons
- Neural Networks

Convolutional Neural Networks
- Convolutions
- PyTorch / TensorFlow
- Activation functions
- Batch normalization
- Transfer learning
- Data augmentation
- Momentum / RMSProp / Adam
- Architecture design

Computer Vision Applications
- RNNs / Attention / Transformers
- Image captioning
- Object detection and segmentation
- Style transfer
- Video understanding
- Generative models
- Self-supervised learning
- Vision and Language
- 3D vision
- Robot learning
- Human-centered AI
- Fairness & ethics



Next time: Image classification with Linear Classifiers
k-nearest neighbor
Linear classification

Plot created using Wolfram Cloud



Thank you!



We will return in 10 minutes



We will move to Zoom,
I will email you with instructions



The Stanford Institute for Human-Centered AI (HAI) recently
celebrated its fifth anniversary and, to commemorate this
achievement, is producing documentary-style videos
featuring its senior scholars. Fei-Fei,
as co-founder and Denning co-director of HAI, will be featured
prominently. To capture the essence of Fei-Fei's contributions
and insights, a film crew will be present in Fei-Fei's class on April
2 to capture some b-roll footage. While the primary focus of the
filming will be on Fei-Fei, there is a possibility that some of you
might appear in the film as well. If you would like to opt out,
please see the production crew at the back of the room.