EE382M.20: System-on-Chip (SoC) Design Lecture 1

EE382M.20:
System-on-Chip (SoC) Design

Lecture 1 – Project Overview

Andreas Gerstlauer
Electrical and Computer Engineering
University of Texas at Austin

Lecture 1: Outline
• Marketing requirements
• Market focus, product description
• Cost metrics, product features

• Product requirements
• Deep learning
• Hardware acceleration

• Project description
• Deep/Convolutional Neural Networks (DNNs/CNNs)
• Object recognition
• You Only Look Once (YOLO) CNN
• Hardware and software development tasks

EE382M.20: SoC Design, Lecture 1 © 2018 A. Gerstlauer 2


Market Focus

• Visual object recognition
• Computer vision for drones, self-driving cars, home automation, …
➢ Camera-based automotive driver assistance systems (ADAS)
Figure source: Jonathan Hui

➢ What problem are we trying to solve?
• Standard camera for
– Collision avoidance
– Lane tracking/keeping
– Traffic sign recognition
– …
• Detect, locate and classify objects in video stream
– Bounding boxes
– Types of objects


Competition

• Mobileye (an Intel company)
• http://mobileye.com
• Custom ASIC/SoC solution

• Movidius (an Intel company)
• https://www.movidius.com/
• Custom ASIC/SoC solution

• NVIDIA DRIVE PX
• https://www.nvidia.com/en-us/self-driving-cars
• ARM+GPU (Tegra) based solution
• Used by Tesla


Product Description

• Visual object recognition SoC


• Deliver hardware + software intellectual property (IP)

• Cost metrics
• Real-time: frames per second (FPS), reaction time
• Detection accuracy: mean average precision (mAP)
• Power/thermal: W and operating temperature (°C)
• Cost: $ or die area (mm²)

• Product features
• Supported image resolutions
• Supported detection classes
• Flexibility: dynamic, over-the-air reprogramming/updating


Product Requirements
• High detection accuracy → deep learning
• Convolutional Neural Network (CNN)
• Trained on large image data set
• Very computationally intensive

• High frame rate, low power → hardware acceleration
• Key/dominating computational kernels
• Convolutions and matrix operations
• General matrix-matrix multiplication (GEMM)

• Flexibility → software support
• Standard embedded Linux environment
• Software optimizations for performance and power


Object Detection using Deep Learning


• Classification vs. detection

[Figure: image classification (global feature) vs. object detection (classification + localization)]

• Convolutional neural networks (CNNs) widely used for image classification
• Sliding windows of different size/shape + CNN-based classification for brute-force, naïve object detection
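The brute-force approach can be sketched as follows. This is a minimal illustration, not the course's code: the image is a plain 2D list, and `toy_classifier` is a hypothetical stand-in for a CNN classifier that simply scores mean brightness. The point is that every window position and shape triggers one full classifier evaluation, which is what makes the naïve scheme so expensive.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Image = List[List[float]]


def sliding_windows(img_w: int, img_h: int,
                    sizes: List[Tuple[int, int]],
                    stride: int) -> Iterator[Box]:
    """Yield every window of each (width, height) that fits inside the image."""
    for win_w, win_h in sizes:
        for y in range(0, img_h - win_h + 1, stride):
            for x in range(0, img_w - win_w + 1, stride):
                yield (x, y, win_w, win_h)


def detect(image: Image,
           classify: Callable[[Image], Tuple[str, float]],
           sizes: List[Tuple[int, int]],
           stride: int,
           threshold: float) -> List[Tuple[Box, str, float]]:
    """Classify every window; keep those scoring at or above the threshold."""
    img_h, img_w = len(image), len(image[0])
    detections = []
    for (x, y, w, h) in sliding_windows(img_w, img_h, sizes, stride):
        crop = [row[x:x + w] for row in image[y:y + h]]
        label, score = classify(crop)  # one classifier evaluation per window
        if score >= threshold:
            detections.append(((x, y, w, h), label, score))
    return detections


def toy_classifier(crop: Image) -> Tuple[str, float]:
    """Hypothetical stand-in for a CNN: score is the mean pixel value."""
    pixels = [p for row in crop for p in row]
    return ("bright", sum(pixels) / len(pixels))
```

Even a 4×4 image with a single 2×2 window shape and stride 1 already produces 9 classifier calls; the count grows with image size and with every additional window shape.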


Traditional Neural Networks


[Figure: neuron. Source: Stanford CS231n]

➢ Deep Neural Networks (DNNs) with many hidden layers
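As a rough sketch of what a single neuron and a stack of fully connected layers compute: each neuron forms a weighted sum of its inputs plus a bias and passes it through an activation function. The sigmoid activation and all function names here are assumptions chosen for illustration; the slide does not fix them.

```python
import math
from typing import Callable, List, Tuple


def sigmoid(z: float) -> float:
    """Logistic activation (an assumed choice for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs: List[float], weights: List[float], bias: float,
           activation: Callable[[float], float] = sigmoid) -> float:
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def dense_layer(inputs: List[float], weight_rows: List[List[float]],
                biases: List[float]) -> List[float]:
    """One fully connected layer: one neuron per row of weights."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs: List[float],
            layers: List[Tuple[List[List[float]], List[float]]]) -> List[float]:
    """Feed the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = dense_layer(activations, weight_rows, biases)
    return activations
```

A deep network in this sense is simply a longer `layers` list, so the cost of inference is dominated by the per-layer weighted sums.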



Convolutional Neural Networks (CNNs)


• Different types of layers
● Convolutional: convolutions with trainable filters
● Rectified linear units (ReLU): elementwise function, usually combined with the convolutional layer
● Pooling: non-linear down-sampling
● Fully connected: traditional neural networks

[Figure: digit recognition CNN (image classification)]

➢ Fully convolutional network (FCN): no fully connected layer
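Two of the layer types above are simple enough to sketch directly. ReLU replaces each negative value with zero, and max pooling (one common form of non-linear down-sampling, assumed here) keeps the largest value in each window. The 2×2 window with stride 2 is an illustrative default, not something the slide specifies.

```python
from typing import List

FeatureMap = List[List[float]]


def relu(feature_map: FeatureMap) -> FeatureMap:
    """Elementwise max(0, v) over a 2D feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map: FeatureMap, size: int = 2, stride: int = 2) -> FeatureMap:
    """Down-sample by taking the maximum of each size x size window."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, height - size + 1, stride):
        pooled_row = []
        for j in range(0, width - size + 1, stride):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled
```

Applied in sequence, `max_pool(relu(fm))` turns a 4×4 map into a 2×2 map, which is how a CNN shrinks its intermediate data between convolutional stages.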



Convolution Operations
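Input map sliding window $x_{c,i,j}$ ($c = 0 \dots C-1$)

Filter kernel $w_{0,c,f_1,f_2}$ of size $F_1 \times F_2$ ($f_1 = 0 \dots F_1-1$, $f_2 = 0 \dots F_2-1$)

Output element $y_{0,i,j} = \sum_{c=0}^{C-1} \sum_{f_1=0}^{F_1-1} \sum_{f_2=0}^{F_2-1} w_{0,c,f_1,f_2} \, x_{c,i+f_1,j+f_2}$

N filters and output maps: filter kernel $w_{N-1,c,f_1,f_2}$ of size $F_1 \times F_2$

Output element $y_{N-1,i,j} = \sum_{c=0}^{C-1} \sum_{f_1=0}^{F_1-1} \sum_{f_2=0}^{F_2-1} w_{N-1,c,f_1,f_2} \, x_{c,i+f_1,j+f_2}$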
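A direct implementation of the convolution makes the loop structure explicit. This sketch assumes unit stride and no padding, stores the input as `x[c][i][j]` and the filters as `w[n][c][f1][f2]`, and names the output `y[n][i][j]`; these layout and naming choices are assumptions for illustration.

```python
from typing import List

Tensor3 = List[List[List[float]]]
Tensor4 = List[List[List[List[float]]]]


def conv2d(x: Tensor3, w: Tensor4) -> Tensor3:
    """Direct convolution: x[c][i][j], w[n][c][f1][f2] -> y[n][i][j].

    Assumes stride 1 and no padding, so each output map is
    (H - F1 + 1) x (W - F2 + 1).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    N, F1, F2 = len(w), len(w[0][0]), len(w[0][0][0])
    H_out, W_out = H - F1 + 1, W - F2 + 1
    y = [[[0.0] * W_out for _ in range(H_out)] for _ in range(N)]
    for n in range(N):                  # one output map per filter
        for i in range(H_out):          # slide the window down
            for j in range(W_out):      # slide the window across
                acc = 0.0
                for c in range(C):      # sum over all input channels
                    for f1 in range(F1):
                        for f2 in range(F2):
                            acc += w[n][c][f1][f2] * x[c][i + f1][j + f2]
                y[n][i][j] = acc
    return y
```

The six nested loops perform N · H_out · W_out · C · F1 · F2 multiply-accumulate operations, which is why convolution is the dominating kernel and the natural target for hardware acceleration.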


Casting Convolutions as GEMM


[Figure: Conv-BN-ReLU layer mapping an input map stack (Hin × Win, Cin = 3) to an output map stack (Hout × Wout, Cout = 4)]

1. Convert input map
• Image-to-columns
• Sliding window order
• One column per window
• Concatenate columns across map stacks


2. Convert filter kernels
• One row per filter stack


3. Perform GEMM
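The three steps can be sketched in plain Python. This is an illustrative version assuming unit stride and no padding, with the input stored as `x[c][i][j]` and the filters as `w[n][c][f1][f2]`; the function names are hypothetical and the memory layout in Darknet may differ.

```python
from typing import List

Matrix = List[List[float]]


def im2col(x, F1: int, F2: int) -> Matrix:
    """Step 1: unroll x[c][i][j] into a (C*F1*F2) x (H_out*W_out) matrix.

    Each column holds one sliding window; the channels of the input map
    stack are concatenated along the rows.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    H_out, W_out = H - F1 + 1, W - F2 + 1
    rows = []
    for c in range(C):
        for f1 in range(F1):
            for f2 in range(F2):
                rows.append([x[c][i + f1][j + f2]
                             for i in range(H_out) for j in range(W_out)])
    return rows


def filters_to_rows(w) -> Matrix:
    """Step 2: flatten each filter stack w[n] into one row (c, f1, f2 order)."""
    return [[w_n[c][f1][f2]
             for c in range(len(w_n))
             for f1 in range(len(w_n[0]))
             for f2 in range(len(w_n[0][0]))]
            for w_n in w]


def gemm(a: Matrix, b: Matrix) -> Matrix:
    """Step 3: plain matrix-matrix multiplication a @ b."""
    inner, cols = len(b), len(b[0])
    return [[sum(a_row[k] * b[k][q] for k in range(inner)) for q in range(cols)]
            for a_row in a]


def conv2d_gemm(x, w):
    """Convolution as GEMM; result reshaped to y[n][i][j]."""
    F1, F2 = len(w[0][0]), len(w[0][0][0])
    H_out = len(x[0]) - F1 + 1
    W_out = len(x[0][0]) - F2 + 1
    flat = gemm(filters_to_rows(w), im2col(x, F1, F2))  # N x (H_out*W_out)
    return [[row[i * W_out:(i + 1) * W_out] for i in range(H_out)]
            for row in flat]
```

The unrolled input matrix duplicates overlapping pixels, so this trades extra memory for a single regular matrix product, which maps well onto optimized GEMM libraries and hardware matrix units.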


Object Recognition (1)


• Region based (Fast/Faster/Mask R-CNN)
• Fast versions by sharing convolutional layers
• Common feature extraction for region proposal and classification


Object Recognition (2)

• Single Shot MultiBox Detector (SSD)
• Slide window of fixed size and shape
• Detect both bounding box and class within window
• Predict likelihood of different box/class combinations


Object Recognition (3)


• You Only Look Once (YOLO)
• Don’t slide window, predict for all possible boxes/classes
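To make "predict for all possible boxes/classes" concrete, the sketch below follows the formulation of the original YOLO paper, which is an assumption here since the slides do not say which YOLO version the course uses. In that formulation the image is divided into an S×S grid, and each cell predicts B boxes (x, y, w, h, confidence) plus C class probabilities in a single network pass. Later YOLO versions use anchor boxes and a different encoding, and Darknet's actual memory layout is not reproduced.

```python
from typing import List, Tuple


def yolo_output_size(S: int, B: int, C: int) -> int:
    """Number of values predicted in one pass: S*S cells, each with
    B boxes of 5 values (x, y, w, h, confidence) plus C class scores."""
    return S * S * (B * 5 + C)


def decode_cell(row: int, col: int, S: int,
                box: Tuple[float, float, float, float, float],
                class_probs: List[float]):
    """Convert one cell's box prediction to image-relative coordinates.

    Assumes x, y are offsets within the cell (0..1) and w, h are already
    relative to the whole image, as in the original YOLO formulation.
    """
    x, y, w, h, confidence = box
    center_x = (col + x) / S
    center_y = (row + y) / S
    best_class = max(range(len(class_probs)), key=lambda k: class_probs[k])
    score = confidence * class_probs[best_class]  # class-specific confidence
    return (center_x, center_y, w, h), best_class, score
```

With the paper's S = 7, B = 2 and C = 20, `yolo_output_size(7, 2, 20)` gives 1470 values, all produced by one forward pass instead of one classifier call per window.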


You Only Look Once (YOLO)


https://pjreddie.com/darknet/yolo/

• Default implementation on top of Darknet
• General open-source CNN framework/library in C
• Also available for other deep learning frameworks
• PyTorch, Caffe2 [Facebook], TensorFlow [Google]

Project Description

• HW/SW co-design of an embedded SoC
• Low-power YOLO/Darknet implementation
• ARM-based target platform
– ARM Cortex-A9 processor, memory components, I/O devices
– Custom hardware accelerators
– Interconnected via standard system buses or memory/cache interfaces
• Virtual and physical prototyping
– SystemC TLM-based virtual platform model (QEMU ARM simulator)
– ARM- and Xilinx FPGA-based prototyping board (Zynq-7000)

➢ Lab and project in teams
➢ Max 10 teams for 10 boards


Project Objectives and Activities


• Project objective:
• Implement the YOLO/Darknet code on an ARM-based SoC while meeting the performance, area and power metrics.
• Project activities:
• Profile the YOLO/Darknet software implementation to determine performance bottlenecks
• Optimize the YOLO/Darknet software (fixed-point operation)
• Partition the software into components that will run on the ARM processor and on hardware accelerators
• Synthesize accelerators into Verilog for gate-level implementation
• Co-simulate and prototype the HW/SW implementation
• Estimate timing, area and power metrics and validate against product requirements


Development Tasks
• ARM software development
• Compile and profile YOLO/Darknet on ARM board
• Convert floating-point to fixed-point code and check mAP
• Compile and profile fixed-point YOLO on ARM board
• Optimize software on dual-core ARM platform
• Develop hardware abstraction layer (HAL) and I/O handler
• Develop interrupt handler & driver (Linux kernel module)
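The floating-point to fixed-point conversion listed above can be illustrated with a small sketch. It assumes a signed 16-bit format with 8 fractional bits and saturation on overflow; the format the project actually uses is not specified in these notes, and the helper names are hypothetical.

```python
from typing import List


def to_fixed(value: float, frac_bits: int, total_bits: int = 16) -> int:
    """Quantize a float to a signed fixed-point integer, saturating on overflow."""
    scaled = int(round(value * (1 << frac_bits)))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def to_float(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int) -> int:
    """Multiply two fixed-point values; shift right to restore the scale."""
    return (a * b) >> frac_bits


def fixed_dot(xs: List[int], ws: List[int], frac_bits: int) -> int:
    """Dot product with a wide accumulator and a single final rescale."""
    acc = 0
    for x, w in zip(xs, ws):
        acc += x * w  # products carry 2*frac_bits fractional bits
    return acc >> frac_bits
```

Accumulating at full width and rescaling once, as in `fixed_dot`, loses less precision than rescaling after every multiply. Whatever format is chosen, detection accuracy (mAP) needs to be rechecked afterwards, since quantization error depends on the value ranges of the weights and activations.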

• Hardware development on FPGA


• Hardware accelerators (synthesize fixed-point code)
• Interface to ARM board and on-chip bus
• Memory/cache interfaces (optional DRAM controller)
• Interrupt logic, clocking & reset
• Debug, diagnostics


Xilinx Tincy YOLO on Zynq

TincyYOLO demo at NIPS’17

• Starting from Tiny YOLO
• Smaller CNN model for constrained devices
• HW/SW co-optimizations
• HW acceleration
• SW optimizations
https://t.co/ffkZgMRmwM