Cadence Tensilica Product Overview

Tensilica Days Hannover


September 2019
Tensilica® Products

Common Tools, Models, Debug, Trace

Product families: HiFi, Vision, Fusion, DNA, ConnX, Controllers/Custom ISAs

HiFi
• Audio pre- and post-processing
• Voice trigger
• Noise reduction, audio encode and decode

Vision
• Image and vision pre-/post-processing
• AI at the edge
• SLAM, SGM
• CNN, RNN …

Fusion
• Always-on sensor processing
• Multi-purpose
• IoT, consumer and industrial

DNA (standalone AI processor)
• Inference at the edge

ConnX
• Baseband processing
• Radar, lidar, communications
• 5G/4G, UE, V2X, IoT and Infrastructure

Controllers/Custom ISAs
• High-performance DSPs, NPUs, CPUs
• Application-specific data types
• Custom ISA, special functions

Broad Range of Application-Specific DSPs Custom


Xtensa® Processor with Automated User-Defined Customization (TIE)
2 © 2019 Cadence Design Systems, Inc. Confidential
Tensilica® Products

Common Tools, Models, Debug, Trace

HiFi
• HiFi 5 (+ VFPU / NN)
• HiFi 4 (+ VFPU)
• HiFi 3z (+ VFPU)
• HiFi 3 (+ VFPU)
• HiFi Mini

Vision
• Vision Q7 (+ VFPU)
• Vision Q6 (+ VFPU)
• Vision P6 (+ VFPU)
• Vision C5 (+ VFPU)

Fusion
• Fusion G6 (+ VFPU)
• Fusion G3 (+ VFPU)
• Fusion F1 (+ VFPU)

DNA (standalone AI processor)
• DNA100

ConnX
• B20 (+ VFPU / SPX)
• B10 (+ VFPU / SPX)
• BBE64EP (+ VFPU)
• BBE32EP (+ VFPU)
• BBE16EP (+ VFPU)

Controllers/Custom ISAs
• RISC CPU (+ FPU), to ~5 Coremarks
• Custom CPU, user-defined ISA

Broad Range of Application-Specific DSPs, plus Custom

Xtensa® Processor with Automated User-Defined Customization (TIE)
Automated Tool, ISS, Model, RTL, and EDA Script Generation

Inputs
• Base Processor: dozens of templates for many common applications
• Pre-Verified Options: off-the-shelf DSPs, interfaces, peripherals, debug, etc.
• Optional Customization: create your own instructions, data types, registers, interfaces

Tensilica® IP and Customer IP: iterate in minutes!

Complete Hardware Design
• Pre-verified synthesizable RTL
• EDA scripts
• Test suite…

Advanced Software Tools
• IDE
• C/C++ compiler
• Debugger
• ISS simulator
• SystemC® models
• DSP code libraries


Automated Tool, ISS, Model, RTL, and EDA Script Generation

Xtensa Processor Generator Outputs


Tensilica® IP and Customer IP: iterate in minutes!

Hardware
• RTL and EDA scripts
• Synthesis
• Block Place & Route
• Verification
• Chip Integration / Co-verification
• To Fab / FPGA

System Modeling / Design
• Instruction Set Simulator (ISS)
• Fast Function Simulator (TurboXim)
• XTSC SystemC System Modeling
• XTMP C-based System Modeling
• Pin Level co-simulation
• System Development

Software Tools
• Xplorer IDE: Graphical User Interface to all tools
• GNU Software Toolkit (Assembler, Linker, Debugger, Profiler)
• Xtensa C/C++ (XCC) Compiler
• C Software Libraries
• Operating Systems
• Software Development


Easy DSP Software Development with Xtensa Xplorer
• High-performance optimizing C/C++ compiler
• Tools “know” your processor configuration
• Cleanly map C/C++ to SIMD & VLIW with no assembly
• Launch on ISS, SystemC, RTL, FPGA, or silicon
• Extensive software DSP library & examples
• Code coverage, profiling, PC trace, multi-core support
• Familiar Eclipse-based GUI
• 3rd-party JTAG debug and real-time trace
Benchmark and Analyze
Use Xplorer to Interactively Check Performance As You Build
• Check boxes to select pre-designed options
• Profile your software to see the critical loops and what operations are used most
• Analyze performance bottlenecks with pipeline view
• Instantly view PPA impact as you select options


Tensilica® Products - HiFi

HiFi family: HiFi 5 (+ VFPU / NN), HiFi 4 (+ VFPU), HiFi 3z (+ VFPU), HiFi 3 (+ VFPU), HiFi 2/EP, HiFi Mini
HiFi DSPs for Audio, Voice and Speech Applications

HiFi DSP family
• #1 audio DSP IP: the #1 audio DSP choice
• Audio ecosystem: 130+ ecosystem partners
• Digital audio software: 300+ audio software packages
  [Chart: vertical axis 50 to 250, horizontal axis 2006 to 2018]
• HiFi licensees worldwide: 105+ licensees
• HiFi DSP shipments: 1B HiFi DSPs shipping annually


HiFi DSPs in Key Markets

Main Functions HiFi Advantage

➢ Front end processing ➢ High Performance

➢ Audio/Speech codecs ➢ Low Power/Energy

➢ Keyword/AI Speech ➢ Software Ecosystem

➢ Post Processing ➢ Ease of Programming

➢ 3D Audio ➢ Flexibility



300+ Codecs and Audio/Voice Enhancement Packages
Stereo Audio Voice Multi-Channel Enhancement Enhancement Enhancement
AAC G.711 AAC, HE-AAC (Plus) Accusonus dbx-tv Müller-BBM
WMA Pro, FLAC Focus-MD, Focus- Total Technology m|klang® ANC, ASD
HE-AAC G.722
MPEG-H DNR DTS NXP Software
HE-AAC Plus G.723.1 Alango TruSurround LifeVibes
MP3 G.726 Dolby VCP8, VEP TruVolume VoiceExperience
FLAC G.729AB MS10, MS11,MS12 Arkamys TruDialog QNX
WMA 9 Dolby AC-4 ImmerseU TruTools ANC
AMR-NB
Dolby Atmos Sound Staging TruGaming QSound
WMA 10 Pro AMR-WB
Digital AC-3 and Plus AM3D WOW XT microQ
REAL Audio 8, 9, 10 EVS
DDCE, True HD Diesel Power, Zirene Fortemedia mQSynth, mQ3D
Ogg Vorbis GSM-HR Pro Logic II/IIx Audyssey iS620, iS700, iS800 MQFX
AMR WB+ GSM-FR Dolby Mobile 3+ Dynamic Volume and Harman Qvoice
SBC Bluetooth GSM-EFR EQ Clari-Fi Retune DSP
DS1, DAa2, DAx3
Cadence Hillcrest Labs Beamform, AEC
BSAC AAC-ELD Dolby Digital Live
Sample Rate Freespace Rubidium
DAB Opus DTS Kronoton
Converter Speech Processing
DAB+ mSBC HD Master Audio PDM=>PCM HDSX Sensory
DRM (xHE-AAC) CVSD Express Converter Malaspina TrulyHandsFree 3.0
APE Skype SILK Transcoder, Neo:6 Conexant VoiceBoost Sonic Emotion
Maxim
AEC Broadcast, DMP, M6 AudioSmart Absolute 3D
Cywee Dynamic Speaker SPL
LEC Neural Surround
Sensor Fusion Hub Management Vitalizer
DTS Interactive
Cyberon MightWorks Waves Audio
Headphone:X Call Solution+SRE
Cspotter MAXXAudio,Voice, Nx
Tensilica® Products - Vision

Vision Recent Customer Success

Customer design 1
• Vision P5 & P6 DSP
• Integrated in AI Processing Unit (P60) [1]
• 35+ smartphone designs [2]

Customer design 2
• Vision P5 & P6 DSP
• For 3D sensing, HMI, AR/VR apps
• 100+ smartphone designs [3]

Customer design 3
• Vision P6 used as Image Recognition Processor; quad-core configuration
• Up to 1024 GOPs & 3.8x more power efficiency than CPUs
• Vision P6 enables ISO26262 ASIL D certification

Customer design 4
• Hi3559AV100, Hi3519AV100 new-generation intelligent video processor
• Hi3559A (4x P6), Hi3519A (1x P6)
• High-end surveillance, professional camera, drone camera, extreme sports motion DV

Customer design 5
• Vision P5 DSP [5]
• Creates high-resolution 3D images in real time based on advanced RF technology
• Targeted for smart home, automotive, smart retail, robotics markets

Customer design 6
• GW5400, the World’s First Automotive Smart Viewing Camera Processor
• Vision P5
• Zero airflow environment, smart rear-view mirror, backup-camera

Source: [1], [2], [3], [4], [5]
Vision Product Comparison Chart

Use case
• Vision P6: Vision and low-end AI (up to 256 GMAC/sec)
• Vision Q6: Vision and low-end AI (up to 384 GMAC/sec), maximizing MHz
• Vision Q7: Vision and low-end AI (up to 768 GMAC/sec), maximizing MHz, ISA improvements for SLAM

MACs (higher MAC = higher compute)
• 8x8: 256 (Vision P6 / Q6); 256/512 optional (Vision Q7)
• 8x16: 128 (Vision P6 / Q6); 128 (Vision Q7)
• 16x16: 64 (Vision P6 / Q6); 64/128 optional (Vision Q7)

Vector Floating Point Unit
• 16b half precision: 32-way SIMD, optional (Vision P6 / Q6); 2x 32-way SIMD, optional (Vision Q7)
• 32b single precision: 16-way SIMD, optional (Vision P6 / Q6); 2x 16-way SIMD, optional (Vision Q7)

Max SIMD width
• 64-way 8-bit (Vision P6 / Q6); 64-way 8-bit (Vision Q7)

SuperGather (Scatter Gather)
• Yes (Vision P6 / Q6); Yes (Vision Q7)


Tensilica: Comprehensive Vision Software Solutions
Full ecosystem of software frameworks and compilers for all vision programming styles

Legend: User Code; Cadence Compiler / Tool; Cadence SW library / Runtime; Cadence Tensilica DSP
*Not supported on Vision Q7

Programming styles (user code)
• OpenCL
• Halide
• Embedded C/C++
• OpenVx Graph*

Compilers and tools
• OpenCL Compiler (LLVM)
• Halide Compiler
• Xtensa C/C++ Compiler (XCC & LLVM)
• OpenVx Toolkit*

Runtimes and libraries
• OpenCL Runtime
• OpenCL BIFL library
• Tile Manager
• OpenVx Runtime*
• OpenCV based Imaging Library (lib “XI”)
• DMA Manager (libidma)

Operating system
• XTOS (Single Thread) or XOS (Multithread) or Commercial RTOS

Hardware
• Vision P6 / Q6 / Q7 DSP (Vision + AI Enhanced)
• Linux Host CPU (Xtensa or Other), running OpenVx/OpenCL Host Code
Tensilica® Products – Fusion G

Trends in Multiple Markets Driving the Need for Fusion G Family

More floating-point use…
• Algorithms and codecs in floating point
• Time-to-market benefits
• Out-of-box performance
• Easy to program and optimize

Multi-purpose…
• Run multiple algorithms on one ISA
• Fixed- and floating-point DSP
• Support for multiple data types
• Efficient real-time control


Target Applications for Fusion G Family

• General Signal Processing
• Noise/Echo Cancellation
• Radar Processing
• Surround Sound
• Low-End Image Processing
• Motor Control
• Baseband Processing
• Sensor Fusion


Tensilica® Products – DNA (AI inference)

AI Inference at the Edge Processing Needs

Scalability
• IoT, Mobile, Surveillance, AR/VR, Automotive
• <0.5 TMAC to 100s of TMAC

Increase in Compute Requirement
• Higher resolution
• Move from traditional to AI

Latency Critical Applications
• AR/VR
• Automotive

Higher Power Efficiency
• Battery life
• Bandwidth constraints

Flexibility
• Continuously adapting/changing market needs


Tensilica® DNA 100 Key Features
Scalability
• Supports 256, 512, 1K, 2K Physical 8b MAC configurations
• Single click configuration select

Throughput
• Sparse compute hardware engine avoids multiply by 0 (activation & weights)
• Significantly higher throughput and lower power compared to traditional dense compute engines

Bandwidth
• Compressor/Decompressor hardware block avoids data movement of 0 (activation & weights)
• Efficient memory management

Programmable
• Integrated Vision P6 provides flexibility to support any hardware unsupported layer efficiently
• Integrated Vision P6 provides extensibility through TIE

Standalone
• Efficient convolution & fully connected layer support through high MAC occupancy rate
• Integrated VPU supports commonly used non-convolution layers
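
The sparse-compute idea in the Throughput bullets can be illustrated with a small software model. This is only a sketch under assumptions: the slide does not describe how the DNA 100 hardware schedules work, and `sparse_dot_i8` and `sparse_dot_result` are hypothetical names. It shows an 8-bit dot product that issues a multiply-accumulate only when both the activation and the weight are non-zero, and counts how many MACs were actually needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of the modelled dot product: the accumulated sum and the number
 * of multiply-accumulates that were actually issued. */
typedef struct {
    int32_t sum;
    size_t  macs_issued;
} sparse_dot_result;

/* Hypothetical software model (not the DNA 100 implementation): skip any
 * 8-bit MAC whose activation or weight is zero, because that product
 * contributes nothing to the sum. */
sparse_dot_result sparse_dot_i8(const int8_t *act, const int8_t *wgt, size_t n)
{
    sparse_dot_result r = { 0, 0 };
    assert(n == 0 || (act != NULL && wgt != NULL));
    for (size_t i = 0; i < n; ++i) {
        if (act[i] == 0 || wgt[i] == 0) {
            continue;                      /* zero operand: no MAC needed */
        }
        r.sum += (int32_t)act[i] * (int32_t)wgt[i];
        r.macs_issued++;
    }
    return r;
}
```

With eight element pairs of which only three have both operands non-zero, the model issues three MACs instead of eight while producing the same sum as a dense loop.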



Tensilica Neural Network Compiler
Flow
• NN Framework (Caffe, TensorFlow) produces the Trained Model
• Tensilica Neural Network Compiler:
  – NN Parser
  – Float to Fixed Point Conversion
  – Cadence NN Optimizer
  – Optimizer and Code Generation
  – NN Library with Meta Information
• Output: optimized target-specific code for Tensilica products

Key Features
➢ Support for DNA 100, Vision Q6, Vision C5, Vision P6 DSPs
➢ Support for Windows and Red Hat Linux
➢ Custom Layer
➢ Analyzer enables:
  ➢ Prioritizing accuracy vs performance
  ➢ Min/Max range control to guide quantization
➢ Optimizer enables:
  ➢ Efficient memory management (e.g. avoiding unnecessary roundtrips, local memory partitioning and buffering)
  ➢ Tile & DMA management
  ➢ Graph transformation (e.g. fusion, elimination)
➢ NN library showcasing processor-specific optimized functions
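
To make the "Min/Max range control to guide quantization" point concrete, here is a minimal sketch of a float to fixed-point conversion driven by a min/max range. It assumes an asymmetric 8-bit affine mapping (scale plus zero point); the slide does not say which mapping the Tensilica compiler uses, and `quant_params`, `choose_quant_params`, `quantize_u8` and `dequantize_u8` are illustrative names, not part of any Cadence API.

```c
#include <assert.h>
#include <stdint.h>

/* Parameters of an assumed 8-bit affine quantization. */
typedef struct {
    float   scale;       /* real-value step per integer code */
    int32_t zero_point;  /* integer code that represents 0.0f */
} quant_params;

/* Round to nearest, halves away from zero, without needing libm. */
static long round_nearest(float v)
{
    return (long)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Derive scale and zero point from a user-supplied [min, max] range. */
quant_params choose_quant_params(float range_min, float range_max)
{
    quant_params q;
    assert(range_min < range_max);
    /* Widen the range so that 0.0 is always exactly representable. */
    if (range_min > 0.0f) range_min = 0.0f;
    if (range_max < 0.0f) range_max = 0.0f;
    q.scale = (range_max - range_min) / 255.0f;
    q.zero_point = (int32_t)round_nearest(-range_min / q.scale);
    return q;
}

/* Map a float to an 8-bit code, clamping values outside the range.
 * Assumes x / scale fits in a long. */
uint8_t quantize_u8(float x, quant_params q)
{
    long code = round_nearest(x / q.scale) + q.zero_point;
    if (code < 0)   code = 0;
    if (code > 255) code = 255;
    return (uint8_t)code;
}

/* Map an 8-bit code back to its approximate real value. */
float dequantize_u8(uint8_t code, quant_params q)
{
    return q.scale * (float)((int32_t)code - q.zero_point);
}
```

A tighter min/max range gives a smaller step (better resolution) at the cost of clamping outliers, which is the accuracy trade-off the range control exposes.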



Tensilica® Products - ConnX

Cadence Tensilica DSPs - ConnX Family
ConnX BBE & ConnX B for complex vector computation
• Radar / Comms / OFDM domain-optimized DSP with rich feature set
  – Optimal for many classes of algorithm
  – FFT, filters, matrix operations, math functions
  – Native support for complex and real datatypes
  – In fixed point or optional vector floating point
• Scalable solution – 5 family members BBE16/32/64EP & B10/20
  – A family of DSPs with source code compatibility
  – Wide choice of PPA points for multiple applications with same source and tools
  – The same ISA is implemented in SIMD8, SIMD16 or SIMD32
  – 16-bit real and complex values (datapath is 128b, 256b, 512b)
  – 32-bit real and complex values on B10/20 (datapath is 256b, 512b)
• Datapath-width-tuned SIMD & VLIW architecture + local memory access
  – L1 access is 128b, 256b, 512b
  – Two L1 memory accesses per cycle allowed – up to 1024b / cycle
  – Optional integrated DMA engine has its own AXI4 master port
  – With access to L1 memories and can run “in the background”

ConnX
• Baseband processing
• Radar, lidar, communications
• 5G/4G, UE, V2X, IoT and Infrastructure

Automated User-Defined Customization
Xtensa® Processor Generator
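
The datapath figures on this slide follow from simple arithmetic, sketched below. The reading is an assumption on my part: each SIMD lane is treated as one 16-bit real element, so SIMD8, SIMD16 and SIMD32 give 128b, 256b and 512b, and two 512b L1 accesses give the quoted 1024b per cycle. The function names are hypothetical.

```c
#include <assert.h>

/* Datapath width in bits for a number of SIMD lanes and an element width.
 * Assumes one real element per lane. */
unsigned datapath_bits(unsigned simd_lanes, unsigned element_bits)
{
    assert(simd_lanes > 0 && element_bits > 0);
    return simd_lanes * element_bits;
}

/* Peak L1 bits moved per cycle for a given access width and number of
 * accesses allowed per cycle. */
unsigned l1_bits_per_cycle(unsigned access_bits, unsigned accesses_per_cycle)
{
    assert(access_bits > 0 && accesses_per_cycle > 0);
    return access_bits * accesses_per_cycle;
}
```

For example, `datapath_bits(32, 16)` gives 512 and `l1_bits_per_cycle(512, 2)` gives 1024, matching the slide.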



Highlights of ConnX DSP enhanced performance
BBE16/32/64EP and B10/20

Optimal DSP engines

• Highly efficient MAC/ALU usage with broad DSP ISA

Broad range of DSPs buildable from a single platform

• Code compatible for ease of portability for performance scalability


• Scale to meet your needs: 16, 32, 64 and 128 MACs…
• Easily select useful options with click boxes
• Software tools and verification are automatic

Optimized for low power/energy

• Clock and data gating implemented, plus logic and memory bus gating are also options
• Loop buffer can reduce instruction memory accesses
• ISA acceleration reduces cycle counts

Limited area and power increase as frequency is increased



Libraries for ConnX DSPs

Rich and optimized DSP libraries for basic and advanced DSP functions
– Vector fixed/floating-point, real/complex support
– Library source code available
Note: To cover the complete library routine list, all optional configurations have to be selected

Category: Functions
• Math: Scalar and vector for: arc cosine/sine/tangent, cosine/sine/tangent/cotangent, hyperbolic sine/cosine/tangent, sigmoid, logarithm, anti-logarithm, exponential, square root, reciprocal, reciprocal square root
• Complex: Magnitude, phase, conjugate, exponent, combined cosine and sine, normalization, division, polar to cartesian and cartesian to polar conversion
• FIR Filters: Convolution, auto/cross correlation, interpolation, decimation, polynomial fitting/interpolation
• IIR Filters: Biquad, lattice block IIR
• FFT: Complex, real, FFT/IFFT, DFT
• Vector: Product, sum, magnitude, reciprocal, division, transcendental, peak, mean
• Matrix: Multiply, transpose/Hermitian, Cholesky/QR decomposition and recursion, determinant, inverse, linear equation solution
• Communication Systems: CRC, de-spreading, modulation, slicer, convolutional encoding, bit manipulation, PRBS generation, space time coding


Tensilica® Products – Controllers and Customisation

Base RISC Capabilities

• Xtensa defines a powerful RISC ISA

• Available in LX or NX pipelines
– Trade clock speed vs extreme flexibility

• Can be used for tiny control tasks

• Can be used for high performance general processing

• Good debug, trace features



Configurability – features and performance

• High degree of configurability


– Base RISC ISA – tune to likely requirements (MUL, DIV, interrupt, debug, trace ..)
– Supports optional single and double precision Floating point
– Caches or TCM

• Can have user-defined extensions through TIE language


– Could be new calculation operations
– Register files, addressing modes
– Interfaces to HW engines

• Up to 5 Coremarks
– High performance multi-issue controllers



Differentiate with a custom ISA using “TIE”, automatic tools support
Tensilica Instruction Extension

• Simple Verilog-like language, where you can define…
  – Input/output queues and ports
  – Custom register files
  – Fast lookup tables and local memories
  – Simple single-cycle or multi-cycle instructions
  – SIMD for vectorization
  – FLIX (VLIW) for grouping parallel operations into one instruction

Example: I/O Queue adder (inA + inB → outC)

Create an addq instruction with three 256-bit queues and an “add” operation:

    queue inA 256 in
    queue inB 256 in
    queue outC 256 out
    operation addq {} {in inA, in inB, out outC} {
        assign outC = inA + inB;
    }

High throughput without using system bus:
64 bytes in and 32 bytes out per operation
Virtually unlimited bandwidth
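
For readers who want to check results from such an instruction in software, the following is a rough C reference model of a 256-bit add. It is a sketch under assumptions: the 256-bit value is stored as four 64-bit words with `w[0]` least significant, and the result wraps at 256 bits, as a Verilog-style assignment to a 256-bit signal would. `u256` and `addq_model` are illustrative names. The type is 32 bytes, so two inputs and one output match the "64 bytes in and 32 bytes out" figure.

```c
#include <assert.h>
#include <stdint.h>

/* 256-bit value as four 64-bit words; w[0] is the least significant
 * (an assumed layout for this model only). */
typedef struct { uint64_t w[4]; } u256;

static_assert(sizeof(u256) == 32, "u256 must be 32 bytes (256 bits)");

/* Reference model of outC = inA + inB on 256-bit operands.
 * The carry out of the top word is discarded, so the sum wraps. */
u256 addq_model(u256 a, u256 b)
{
    u256 c;
    uint64_t carry = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t s  = a.w[i] + carry;       /* add incoming carry */
        uint64_t c1 = (s < carry);          /* overflow from that add */
        c.w[i] = s + b.w[i];                /* add the second operand */
        uint64_t c2 = (c.w[i] < s);         /* overflow from that add */
        carry = c1 | c2;                    /* at most one can be set */
    }
    return c;
}
```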

Example: Popcount acceleration

Create a pop_count instruction that counts the ones in a 32-bit register by adding the bits together.
This simple Verilog-like code is all it takes to create both the pre-verified adder RTL (175 gates) and the instruction:

    operation pop_count {out AR co, in AR ci}{}{
        wire [3:0] a0 = ci[0] + ci[1] + ci[2] + ci[3] + ci[4] + ci[5] + ci[6] + ci[7];
        wire [3:0] a1 = ci[8] + ci[9] + ci[10] + ci[11] + ci[12] + ci[13] + ci[14] + ci[15];
        wire [3:0] a2 = ci[16] + ci[17] + ci[18] + ci[19] + ci[20] + ci[21] + ci[22] + ci[23];
        wire [3:0] a3 = ci[24] + ci[25] + ci[26] + ci[27] + ci[28] + ci[29] + ci[30] + ci[31];
        wire [5:0] sum = a0 + a1 + a2 + a3;
        assign co = {26'b0, sum};
    }

Best hand-coded ASM using standard instructions takes >10 cycles.
This simple instruction takes it down to just one cycle, for a 10x speedup.

Easily add custom acceleration into the processor ISA with full tools support
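
A plain C reference model is a convenient way to check a custom instruction like this against known inputs. The sketch below mirrors the structure of the TIE code (four per-byte partial sums, then a total); `pop_count_ref` and `sum_bits8` are hypothetical helper names, and nothing here is Xtensa-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Sum the eight bits of x starting at bit position lo.
 * Corresponds to one of the a0..a3 wires in the TIE example. */
static unsigned sum_bits8(uint32_t x, unsigned lo)
{
    unsigned s = 0;
    for (unsigned i = 0; i < 8; ++i) {
        s += (x >> (lo + i)) & 1u;
    }
    return s;   /* 0..8, which is why the TIE wires are 4 bits wide */
}

/* Reference model of pop_count: number of set bits in a 32-bit word. */
uint32_t pop_count_ref(uint32_t ci)
{
    unsigned a0 = sum_bits8(ci, 0);
    unsigned a1 = sum_bits8(ci, 8);
    unsigned a2 = sum_bits8(ci, 16);
    unsigned a3 = sum_bits8(ci, 24);
    unsigned sum = a0 + a1 + a2 + a3;   /* 0..32, fits the 6-bit sum wire */
    assert(sum <= 32u);
    return (uint32_t)sum;
}
```

The loop-based model takes many cycles per call on a standard core, which is the cost the single-cycle custom instruction removes.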
Vision Q7



Tensilica Vision Q7 DSP: 6th-Generation Vision and AI DSP

Follow-on to the Vision Q6 DSP for vision and AI:

• Achieves 1.7 GHz** / 1.55 GHz* peak frequency
• Up to 1.7X higher TOPS in the same area
• Delivers up to 2.06** TOPS / 1.82* TOPS
• Up to 2X performance for vision/AI applications, including floating-point performance
• Up to 2X GMAC/mm2 and GFLOPS/mm2

* 16nm process with OD
** 1.7GHz in 7nm with OD
Vision Q7 DSP performance is relative to the Vision Q6 DSP


Software Ecosystem



XI-Library: Accelerates Commonly used OpenCV functionality

XI-Library
• Tile-based processing for chaining of functions in local memory to efficiently use memory bandwidth
• Specific kernel sizes, data types and modes as part of API to avoid excessive checking inside the function for every tile
• Uses more efficient data structures for images and tiles instead of OpenCV structures that waste local memory
• In most cases, function works on image planes to avoid frequent deinterleaving of interleaved image data

XI Library Deliverables
• XI Library source code workspace: Delivered with XPG
• XI Library performance
• XI Library user’s guide

XI-Library Example Functions
• Image Proc Modules: Convolution functions, Geometric Functions,…
• Features Modules: Feature Detection, Feature Descriptor
• Motion Analysis & background analysis
• Core modules: Binary elements, bitwise operation, vector operations…
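
The tile-based chaining described above can be sketched in portable C. This is not the XI Library API: `process_image_tiled`, `tile_add_sat` and `tile_threshold` are hypothetical names, the tile size is arbitrary, a small stack buffer stands in for DSP local memory, and `memcpy` stands in for DMA transfers. The point it illustrates is that each tile is brought in once, passed through two kernels while it stays local, and written back once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { TILE_W = 8, TILE_H = 4 };   /* arbitrary tile size for the sketch */

/* Kernel 1: add a constant to every pixel, saturating at 255. */
static void tile_add_sat(uint8_t *t, int n, uint8_t k)
{
    for (int i = 0; i < n; ++i) {
        int v = t[i] + k;
        t[i] = (uint8_t)(v > 255 ? 255 : v);
    }
}

/* Kernel 2: binary threshold, 255 if pixel >= thr, else 0. */
static void tile_threshold(uint8_t *t, int n, uint8_t thr)
{
    for (int i = 0; i < n; ++i) {
        t[i] = (uint8_t)(t[i] >= thr ? 255 : 0);
    }
}

/* Process a single-plane 8-bit image tile by tile, chaining both kernels
 * while the tile sits in the local buffer. */
void process_image_tiled(const uint8_t *src, uint8_t *dst,
                         int width, int height, uint8_t k, uint8_t thr)
{
    uint8_t local[TILE_W * TILE_H];   /* stand-in for DSP local memory */
    assert(src != NULL && dst != NULL && width > 0 && height > 0);

    for (int ty = 0; ty < height; ty += TILE_H) {
        int th = (height - ty < TILE_H) ? height - ty : TILE_H;
        for (int tx = 0; tx < width; tx += TILE_W) {
            int tw = (width - tx < TILE_W) ? width - tx : TILE_W;

            /* Copy the tile in, row by row (memcpy stands in for DMA). */
            for (int y = 0; y < th; ++y) {
                memcpy(&local[y * tw], &src[(ty + y) * width + tx], (size_t)tw);
            }

            /* Chain the kernels without going back to system memory. */
            tile_add_sat(local, tw * th, k);
            tile_threshold(local, tw * th, thr);

            /* Copy the finished tile out. */
            for (int y = 0; y < th; ++y) {
                memcpy(&dst[(ty + y) * width + tx], &local[y * tw], (size_t)tw);
            }
        }
    }
}
```

Both kernels here are pointwise, so no tile overlap is needed; neighbourhood kernels such as filters would need the tile to carry an extra border, which this sketch leaves out.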



Detailed Description of XI Library Functions (a few examples)

Each entry lists the sub-module and example functions (subset).

Image Proc module
“Image processing” module contains functions related to geometrical image transformations, scaling, 2D convolution, edge detection
• Convolution functions: xiBilaterfilter, xiBoxfilter, xiCannyedge
• Geometric transformation: xiResizeBilinear, xiTranspose, xiWarpAffine
• Structural analysis: xiConnectedComponents
• Miscellaneous transforms: xiCvtColor, xiIntegral, xiIntegralSqr

Features module
Feature detection and description functions are contained in the “features2D” module
• Feature detection: xiCornerHarris, xiFAST
• Feature descriptor: xiBRIEF

Video/Motion Analysis module
“Video” module contains functions related to video processing.
• Motion analysis: xiAccumulateWeighted, xiMeanShift, xiOpticalFlow_TrackPoint
• Background subtraction: xiBS_MOG

Core module
“Core” module includes low-level support and helper functions
• Utility functions: xiErrStr, xiCopytile, xiFillTile
• Unary element wise operations: xiAbs, xiBitwiseNot, xiClip, …
• Binary element wise operations with scalar value: xiAbsdiffScalar, xiAddScalar, xiBitwiseAndScalar, …
• Binary element wise operations: xiAbsdiff, xiAdd, xiBitwiseOr, …
• Reduction operations: xiCountNonZero, xiGatherLocationsEQ, xiMaxLoc, …
• Vector operations: xiMagnitude, xiPhase, xiPolarToCart, …
• Miscellaneous operations: xiExtractChannel, xiLUT, xiMerge2/xiMerge3, ...


Example of Lane Departure Warning: Work done at Cadence
Use of XI-Lib



Android Neural Network (ANN) API
Support for Android 8.1 Oreo ANN release

Host
• Graph Executor decides to
- execute graph/sub-graph/layer
- Selects full layer kernel
- Tile & DMA management*
- Data/weight rearrangement
• XRP sends commands to DSP

DSP
• NN HAL Command Interpreter
- Interprets commands from XRP to full layer kernel
• Full layer kernel
- Allocates local memory
- Loops over tiles, DMAs data, invokes library
• XI CNN Library
- Optimized library using Google’s Quantization scheme

* Not needed with Vision DSP

Source: https://developer.android.com/ndk/guides/neuralnetworks/index.html
Software Status



Vision and AI Software Status: Vision Q7
Status by component (current release; next release/comment)

SW Pkg
• Current: Example SW package is available today with XPG; released with RI.1

Various SW Kernels
• Current: All SW kernels listed in the P6 vs Q7 comparison (17, 18, 19) are available today. They are all optimized for Vision Q7

XI-Lib
• Current: First release 8.0.1, released with RI.1; available with XPG
• Next: Optimized for Vision Q7: Oct 2019. Also adding Matrix and FP functions in Nov 2019

XNNC
• Current: XNNC 1.5 for baseline 256 MAC available now
• Next: 512 MAC: estimates for Resnet50 performance available; working on other networks. XNNC release Jan 2020: support for various networks with 512 MAC

ANN
• Current: Planning to align with Android Q release
• Next: Baseline 256 MAC for Android Q: to be released by end of August 2019. 512 MAC plan: plan in August 2019, after we release Vision P6 for Android Q by end of August 2019

OpenCL
• Current: Available now; released with 2019.1

Halide
• End of Aug 2019