1 td19 Cadence
• Audio pre- and post-processing
• Voice trigger
• Noise reduction, audio encode and decode

• Image and vision pre-/post-processing
• AI at the edge
• SLAM, SGM
• CNN, RNN …

• Always-on sensor processing
• Multi-purpose
• IoT, consumer and industrial
• inference at the edge

• Baseband processing
• Radar, lidar, communications
• 5G/4G, UE, V2X, IoT and Infrastructure

• High-performance DSPs, NPUs, CPUs
• Application-specific data types
• Custom ISA, special functions
[Figure: chart over 2006 to 2018, vertical axis 50 to 250. Callouts: #1 audio DSP choice; audio software packages; ecosystem +partners; figures shown: 130 and 300+]
➢ 3D Audio ➢ Flexibility
OpenCL
OpenCV based Imaging Library (lib “XI”)
BIFL library
Vision + AI Enhanced
15 © 2019 Cadence Design Systems, Inc. Confidential
Tensilica® Products – Fusion G
• General signal processing
• Noise/echo cancellation
• Radar processing
• Low-end image processing
• Surround sound
• Motor control
• Baseband processing
• Sensor fusion
Increase in Compute Requirement
• Higher resolution
• Move from traditional to AI
Latency-Critical Applications
• AR/VR
• Automotive
Higher Power Efficiency
• Battery life
• Bandwidth constraints
Throughput
• Sparse compute hardware engine avoids multiplications by zero (activations and weights)
• Significantly higher throughput and lower power than traditional dense compute engines
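The zero-skipping idea in the Throughput bullets can be illustrated in software. This is only a sketch of the principle, not the hardware engine's design: the function name `sparse_dot` and the MAC counter are assumptions introduced for illustration.

```python
def sparse_dot(activations, weights):
    """Dot product that skips any pair where either operand is zero.

    Returns (result, macs_issued) so the saving over a dense engine is visible.
    """
    acc = 0
    macs_issued = 0
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:
            continue  # product would be zero, so no multiply-accumulate is issued
        acc += a * w
        macs_issued += 1
    return acc, macs_issued


# Hypothetical data: most activation/weight pairs contain at least one zero.
activations = [0, 3, 0, 0, 2, 5, 0, 1]
weights = [4, 0, 7, 0, 6, 0, 2, 3]

result, macs_issued = sparse_dot(activations, weights)
dense_macs = len(activations)  # a dense engine would issue one MAC per pair
print(result, macs_issued, dense_macs)
```

The result matches the dense dot product, while the number of multiply-accumulates tracks only the pairs in which both operands are non-zero.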
Bandwidth
• Compressor/decompressor hardware block avoids moving zero-valued data (activations and weights)
• Efficient memory management
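One common way to avoid moving zeros is to send a presence bitmask plus the packed non-zero values. The slides do not describe the actual compressed format, so the bitmask layout and the names `compress` and `decompress` below are assumptions chosen to show the principle.

```python
def compress(values):
    """Split a row into a presence bitmask and the packed non-zero values."""
    mask = [1 if v != 0 else 0 for v in values]
    packed = [v for v in values if v != 0]
    return mask, packed


def decompress(mask, packed):
    """Rebuild the original row by re-inserting zeros where the mask bit is 0."""
    it = iter(packed)
    return [next(it) if bit else 0 for bit in mask]


# Hypothetical sparse row of activations or weights.
row = [0, 0, 9, 0, 4, 0, 0, 0]
mask, packed = compress(row)
restored = decompress(mask, packed)
print(mask, packed, restored)
```

Only the two non-zero values and one bit per element cross the memory interface in this sketch, which is where the bandwidth saving would come from on sparse data.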
Programmable
• Integrated Vision P6 provides the flexibility to run any layer the hardware does not support efficiently
• Integrated Vision P6 provides extensibility through TIE
Standalone
• Efficient convolution & fully connected layer support through high MAC occupancy rate
• Integrated VPU supports commonly used non-convolution layers
➢ Optimizer enables:
➢ Efficient memory management (e.g. avoiding unnecessary round trips, local memory partitioning and buffering)
[Diagram labels: Optimizer and Code Generation; NN Library with Meta Information]
• Clock and data gating are implemented; logic and memory bus gating are available as options
• Loop buffer can reduce instruction memory accesses
• ISA acceleration reduces cycle counts
Rich and optimized DSP libraries for basic and advanced DSP functions
– Vector fixed/floating-point, real/complex support
– Library source code available
Note: To cover the complete library routine list, all optional configurations have to be selected
Category | Functions
Math | Scalar and vector for: arc cosine/sine/tangent, cosine/sine/tangent/cotangent, hyperbolic sine/cosine/tangent, sigmoid, logarithm, anti-logarithm, exponential, square root, reciprocal, reciprocal square root
Complex | Magnitude, phase, conjugate, exponent, combined cosine and sine, normalization, division, polar to cartesian and cartesian to polar conversion
FIR Filters | Convolution, auto/cross correlation, interpolation, decimation, polynomial fitting/interpolation
IIR Filters | Biquad, lattice block IIR
FFT | Complex, real, FFT/IFFT, DFT
Vector | Product, sum, magnitude, reciprocal, division, transcendental, peak, mean
Matrix | Multiply, transpose/Hermitian, Cholesky/QR decomposition and recursion, determinant, inverse, linear equation solution
Communication Systems | CRC, de-spreading, modulation, slicer, convolutional encoding, bit manipulation, PRBS generation, space time coding
• Available in LX or NX pipelines
– Trade clock speed vs extreme flexibility
• Up to 5 Coremarks
– High performance multi-issue controllers
XI-Library: Accelerates Commonly used OpenCV functionality
• Tile-based processing chains functions in local memory to use memory bandwidth efficiently
• Specific kernel sizes, data types and modes are part of the API to avoid excessive checking inside the function for every tile
• Uses more efficient data structures for images and tiles instead of OpenCV structures that waste local memory
• In most cases, functions work on image planes to avoid frequent deinterleaving of interleaved image data
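The tile-based chaining described above can be sketched as follows. This is not the XI-Library API: `tiles`, `process_tiled`, `brighten` and `threshold` are hypothetical names, list slicing stands in for DMA transfers, and only point-wise kernels are used so that no tile borders (halos) are needed.

```python
def tiles(height, width, tile_h, tile_w):
    """Yield (row, col, h, w) rectangles that cover the image."""
    for r in range(0, height, tile_h):
        for c in range(0, width, tile_w):
            yield r, c, min(tile_h, height - r), min(tile_w, width - c)


def process_tiled(image, kernels, tile_h, tile_w):
    """Run a chain of point-wise kernels tile by tile.

    Each tile is copied in once, passed through every kernel while it is
    "local", and copied out once, so external traffic does not grow with
    the length of the chain.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    transfers = 0
    for r, c, h, w in tiles(height, width, tile_h, tile_w):
        local = [row[c:c + w] for row in image[r:r + h]]  # stands in for DMA in
        transfers += 1
        for kernel in kernels:
            local = [[kernel(p) for p in row] for row in local]
        for i in range(h):
            out[r + i][c:c + w] = local[i]  # stands in for DMA out
        transfers += 1
    return out, transfers


def brighten(p):
    return min(p + 10, 255)


def threshold(p):
    return 255 if p >= 128 else 0


# Hypothetical 6x8 single-plane image processed in 4x4 tiles.
image = [[(x * 16 + y * 8) % 256 for x in range(8)] for y in range(6)]
tiled, transfers = process_tiled(image, [brighten, threshold], tile_h=4, tile_w=4)
reference = [[threshold(brighten(p)) for p in row] for row in image]
print(tiled == reference, transfers)
```

Running the two kernels as separate full-image passes would move every tile in and out twice; chaining them while the tile is local keeps it at one transfer in and one out per tile. Neighborhood kernels such as filters would additionally need overlapping tile borders, which this sketch leaves out.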
Host
• Graph Executor:
- Decides whether to execute a graph, sub-graph or layer
- Selects the full layer kernel
- Handles tile & DMA management*
- Handles data/weight rearrangement
• XRP sends commands to DSP
DSP
• NN HAL Command Interpreter
- Interprets commands from XRP to full layer kernel
• Full layer kernel
- Allocates local memory
- Loops over tiles, DMAs data, invokes library
Vision DSP
• XI CNN Library
- Optimized library using Google’s Quantization scheme
Source: https://developer.android.com/ndk/guides/neuralnetworks/index.html
* Not needed with Vision DSP
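The quantization scheme referenced for the XI CNN Library is, in the Android Neural Networks documentation cited above, an affine mapping of the form real_value = scale * (quantized_value - zero_point) on 8-bit tensors. The sketch below shows that mapping only; the helper names and the example range are assumptions, and it says nothing about how the library implements it internally.

```python
def choose_params(rmin, rmax, qmin=0, qmax=255):
    """Pick scale and zero point so that [rmin, rmax] maps onto [qmin, qmax]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # the range must contain 0.0
    scale = (rmax - rmin) / (qmax - qmin)        # assumes rmax > rmin
    zero_point = int(round(qmin - rmin / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to an unsigned 8-bit code, saturating at the limits."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """real_value = scale * (quantized_value - zero_point)"""
    return scale * (q - zero_point)


# Hypothetical tensor range of [-1.0, 3.0].
scale, zero_point = choose_params(-1.0, 3.0)
q_zero = quantize(0.0, scale, zero_point)
q_one = quantize(1.0, scale, zero_point)
print(zero_point, q_zero, q_one, dequantize(q_one, scale, zero_point))
```

Because the zero point is an integer code, the real value 0.0 is represented exactly, which matters for zero padding and for the zero-skipping hardware paths described earlier.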
Software Status
XI-Lib
• First release: 8.0.1, released with RI.1; available with XPG
• Optimized for Vision Q7: Oct 2019; Matrix and FP functions also being added in Nov 2019
XNNC
• XNNC 1.5 for baseline 256 MAC available now
• 512 MAC: estimates for ResNet50 performance available; working on other networks
• XNNC release in Jan 2020: support for various networks with 512 MAC
ANN
• Planning to align with Android Q release
• Baseline 256 MAC for Android Q: to be released by end of August 2019
• 512 MAC plan: plan in August 2019, after we release Vision P6 for Android Q by end of August 2019
OpenCL
• Available now; released with 2019.1