
DWF13 Euf Net T0645


November 2013
• The baseband market trends and requirements
• QorIQ Qonverge B4860 block diagram overview
• e6500 and SC3900 cores
• Memory system and interconnect
• CPRI
• MAPLE B3
• Data Path Architecture ( DPAA )
• Power Management
• Q&A

Higher Capacities and Data Rates
Smartphone Density
• 32x increase per km2 by 2015
Internet over Mobile
• 70% of mobile data by 2014
(Source: Bell Labs, Apr-11)

Global LTE Macro Base Station Deployments to Reach 1.5 Million by 2015
(In-Stat, Sep-11)

Global Capital Expenditures by Wireless Carriers for 4G LTE Infrastructure Gear will Reach $36.1 Billion by 2015
(iSuppli Research, Jan-12)

Wireless Standards & Topologies Evolution


• Multi-standard support required (LTE, LTE-Advanced,
3G-HSPA, TD-SCDMA, GSM)
• Move to heterogeneous networks; Macro, RRH, Metrocell,
Picocell & Cloud RAN architectures

Rapidly Evolving Macro Infrastructure


• Drive for increased processing & performance
• Need to leverage spectrum availability
• Desire for efficient energy consumption
• Continual strive for increased efficiency and lower costs

Connectivity
• Coverage: Urban, highways and rural
• Spectral efficiency: Radio and network performance
• Multi-standard: Supports variety of users
• Reliability: Zero down time

Capacity
• Users: Hundreds of active users
• Throughputs: Over 1Gbps data rate
• Scalable/Modular: Sectors, antennas, users…
• Active Antenna, MIMO: Improved QoS

Lowering Costs
• Space: Miniaturization and consolidation of equipment
• Low Impact: Power & Cost
• Future Proof: Easy upgrades, SDR
• Complete solutions: Ease of development, faster time to market

[Figure: requirements diagram with the labels High Throughputs & Coverage, Multi Standard & SDR, Many Active Users, Cost, and Energy Efficiency]

• Next generation, e6500 Dual-Thread Power Architecture® cores offer highest CoreMark/Watt with AltiVec technology for dramatic L2 scheduling acceleration
• Next generation, SC3900 StarCore™ provides 2x DSP performance compared to competitive offerings
• 20GHz of Programmable Performance
• Smart hardware acceleration for Layer 1, 2, Control and Transport allows for best in class performance, power and cost
• Large scale SoC integration allows for simpler programming models and easier load balancing
• Integrated, Rich I/O including backhaul & antenna interfaces provides flexibility, interoperability and reduces overall system cost

3 sector, 20 MHz LTE with 5 major components vs. 3 sector, 20 MHz LTE on a single SoC
[Figure: a discrete design (Layer-1 DSPs, a multicore MPU for Layer-2/3 transport, maintenance and control, an sRIO switch, CPRI antenna links, Flash and DDR memories) next to a single B4860 SoC with CPRI antenna links, 10 Gbps and 1 Gbps Ethernet PHYs for backhaul, sRIO, I2C, SPI, UART, Flash, DDR3 and power]
B4860 SoC
• 4X Cost Reduction
• 3X Power Reduction

High Density Baseband Solution
• LTE/LTE-A solution: base station on a chip; 20 MHz, 3-sector, 24x24 antennas; 1.4 Gbps aggregated throughput
• LTE-Advanced solution: 60 MHz sector on a chip; 16 antennas; 1.8 Gbps aggregated throughput
• WCDMA solution: base station on a chip; 5 MHz, up to 6 cells, 12x12 antennas; 318 Mbps aggregated throughput
• TD-SCDMA solution: base station on a chip; 32 carriers with a single device
[Figure: centre labels: Supports large cells; Supports multiple Radio Access Technologies; Supports LTE, LTE-A, WCDMA, TD-SCDMA, GSM; Multi-mode Support]

• Industry leading performance solution for base stations
• Based on advanced Power Architecture, StarCore, CoreNet and MAPLE technologies
• First SoC in 28nm technology node for Wireless Infrastructure
• Supports the most advanced mobile wireless standards

High Performance
• 64-bit Power Architecture core
• Dual threads provide 1.7 times the performance of a single thread
• Clustered L2 cache allowing strict allocation or full sharing
• 128b AltiVec SIMD unit
Large Memory Space
• 40-bit real address
• Terabyte physical address
Increase Productivity
• Core Virtualization
─ Hypervisor
─ Logical to Real Address Translation
Energy Efficiency
• Drowsy: core, cluster, AltiVec

[Figure: four dual-thread e6500 cores, each with an AltiVec unit, PMC and 32K L1 caches, sharing a 2MB 16-way L2 cache (4 banks); CoreNet interface with a 40-bit address bus and 256-bit read and write data busses; CoreNet double data processor port]
[Chart: Core Performance, CoreMark™ per Watt: e500 core processor vs. e6500* (2 thread), labelled "2.4x" and "Industry's Highest CoreMark per Watt". *Based on simulation]

StarCore SC3900 Flexible Vector Processor
• High DSP performance without compromising flexibility
• Step function in performance over previous generation
─ 8 instructions per cycle
o Up to 8 data lanes vector in a single instruction (SIMD8)
─ 38.4 GMACS per core @1.2 GHz & 1.2 Tbps memory bandwidth per core
• State-of-the-art support for control code with Branch Prediction
• Fully featured Memory Management Unit and Logical to Real Address Translation

StarCore SC3900 FVP Clusters
• Six SC3900 Cores
• Clustering two SC3900 under a 2MB, multi-banked L2 cache
• High bandwidth accelerator ports (up to 1Tbps per cluster)
• Hardware support for memory coherency between L1, L2 caches and the main memory

[Figure: two SC3900 FVP cores with 32K L1 caches and a high speed baseband accelerators interface, sharing a 2MB 16-way L2 cache (4 banks) attached to the CoreNet coherent fabric]
[Chart: BDTI highest speed score: Freescale SC3900 at 1.2GHz, 37,460 (BDTIsimMark2000™); Texas Instruments C66x at 1.5GHz, 20,030 (BDTImark2000™)]

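The per-core peak figures above can be cross-checked with simple arithmetic. The sketch below assumes the 32 MACs per cycle and the 1024-bit per cycle data bus quoted on the "SC3900 Main Features" slide; the function names are illustrative only, and the device total is a theoretical peak, not a measured result.

```python
# Back-of-the-envelope check of the SC3900 peak figures quoted in the deck.
CLOCK_HZ = 1.2e9                  # core clock from the slide ("@1.2 GHz")
MACS_PER_CYCLE = 32               # "32xMACs per cycle" (SC3900 Main Features slide)
DATA_BUS_BITS_PER_CYCLE = 1024    # "Data bus - up to 1024 bit/cycle"
CORES = 6                         # "Six SC3900 Cores"


def peak_gmacs(clock_hz, macs_per_cycle):
    """Peak multiply-accumulates per second, in units of 1e9."""
    return clock_hz * macs_per_cycle / 1e9


def peak_tbps(clock_hz, bits_per_cycle):
    """Peak core-to-cache bandwidth in terabits per second."""
    return clock_hz * bits_per_cycle / 1e12


per_core_gmacs = peak_gmacs(CLOCK_HZ, MACS_PER_CYCLE)
per_core_tbps = peak_tbps(CLOCK_HZ, DATA_BUS_BITS_PER_CYCLE)
device_gmacs = per_core_gmacs * CORES   # theoretical peak across all six cores

print(f"{per_core_gmacs:.1f} GMACS per core")
print(f"{per_core_tbps:.2f} Tbps per core")
print(f"{device_gmacs:.1f} GMACS per device (peak)")
```

Both per-core results agree with the slide's rounded values of 38.4 GMACS and 1.2 Tbps.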
DPAA Control/Transport Hardware Accelerators
• FMAN (Frame Manager): >20 Gbps aggregate throughput; Parse, Classify, Distribute
• BMAN (Buffer Manager): Manages buffer pools for accelerators and network interfaces
• QMAN (Queue Manager): Simplified sharing of network interfaces and hardware accelerators by multiple cores
• RMAN (Rapid IO Manager): Seamless mapping sRIO to DPAA
• SEC (Security): SNOW-3G, Kasumi, ZUC, IPSec, AES, DES, MD5, SHA-1/2…
Callouts: integrated vector processing unit for accelerating L2 scheduling; security acceleration for offload of transport functionality; saving CPU cycles for higher value work

MAPLE-B Layer-1 Hardware Accelerators
• Standards support: LTE, WCDMA, WiMAX, GSM and LTE-Advanced
• Throughputs: Very high throughputs enabling low processing latencies
• Programming: Simple API
• Multimode operations: LTE, LTE-A, WCDMA
• Advanced MiMO: Innovative MiMO Equalizer for improved spectral efficiency and reduced processing latencies compared to conventional techniques
• Streaming: Direct streaming to/from antenna without core intervention
• Internal Embedded Flows: Internal embedded flows for PUSCH/PDSCH Uplink/Downlink processing without core intervention
Callouts: MAPLE-B innovative Layer-1 acceleration; work distribution between cores and accelerators; completely offloads extensive baseband algorithms

Advanced I/O: Glueless Interfaces to Antenna & Backhaul
• 2x 10 GbE XFI/KR, 4x 1 GbE/2.5 GbE: Classify-Parse-Distribute; Timing Synchronization IEEE1588v2
• 8x CPRI v4.2: High-speed, industry-standard antenna interfaces at 9.8G
• 2x sRIO V2.1: 2 controllers with 8 lanes, 5G
• PCIe v2.0: Quad lanes at 5G
• IFC: Modern NOR/NAND flash controller & legacy ASIC connectivity
• Other Peripherals: SPI, I2C, USB, UART, eMMC, etc
• Advanced Debug/Tracing: JTAG, Aurora (debug/trace)
Callouts: backhaul networking, delivering line-rate at smallest packet sizes; advanced real-time tracing and monitoring; standard, high speed antenna interfaces; sRIO & PCIe for multichip connectivity

• 64-bit Power Architecture
• e5500 core features plus:
− Two threads per core (SMT)
− Dual load/store units, one per thread
• Shared L2 in cluster of 4 cores (8 threads per cluster)
− 2048KB 16-way, 4 Banks
− High-performance eLink bus between core Ld/St and instruction fetch units
• Power
− Drowsy core
− Power Mgt Unit
− Wait-on-reservation instruction
• Enhanced MP Performance
− Accelerated Atomic Operations
− Optimized Barrier Instructions
− Fast intra-cluster sharing
• AltiVec SIMD Unit
• CoreNet BIU
− 256-bit Din and Dout data busses
• 36-bit Real Address
− 256 GByte physical addr. space
• Hardware Table Walk
• LRAT
− Logical to Real Addr. translation mechanism for improved hypervisor performance

• Each thread: Superscalar, seven-issue, out-of-order execution/in-order completion, Branch units with a 512-entry, 4-way set associative Branch Target/History
• Execution units: 1 Load/Store Unit per thread, 2 Simple integer per thread, 1 Complex for integer Multiply & Divide, 1 Floating-point Unit, AltiVec
• 64 TLB SuperPages, 1024-entry 4K Pages, 36-bit Physical Address

• Cluster consists of 2x SC3900 under large and fast shared memory
− Multibank Cache of 2 MB
− AXI based accelerators coupling port (45-90 GBps)
• Advanced bus architecture
− Out of order transaction completion
− Deep pipeline
− Full MESI+L HW Coherency
• Advanced L1 memory subsystem architecture
− 36bit address towards memory (64GB space)
− 32 Kbyte L1 Instruction cache
− Streaming data paths for read and write
 32 Kbyte read only Data Cache
• 8 way, 128B line, streaming PLRU
 1 Kbyte Store/Gather Buffer
• Advanced debug and profiling support
− Rich event monitoring
− Multi-core tracing
• Improved core peripherals
− All core peripherals are SoC accessible
− Low latency interrupt support

[Figure: FVP Cluster (Kibo), 2-core cluster + MAPLE port: two SC3900 sub-systems, each with an SC3900 FVP core, I-cache, D-cache and an AXI port to MAPLE, sharing an L2 cache (banks 0-3) attached to the SoC fabric]
[Figure: SC3900 sub-system: SC3900 core with 32 Kbyte instruction cache, 32 Kbyte read data cache, 1 Kbyte write data cache, address translation MMU, EPIC, timer, task protection, and debug support (OCE, DPU); 256-bit read, 256-bit read and 512-bit write paths; debug and interrupts interface]
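The L1 data cache parameters listed above (32 Kbyte, 8 way, 128B line, 36-bit address) determine how an address would split into tag, set index and line offset. The sketch below assumes conventional power-of-two set indexing; the deck does not describe the actual SC3900 indexing scheme, so the helper names and the split itself are illustrative only.

```python
# Cache geometry implied by the slide's L1 data cache parameters.
CACHE_BYTES = 32 * 1024    # "32 Kbyte read only Data Cache"
WAYS = 8                   # "8 way"
LINE_BYTES = 128           # "128B line"
ADDRESS_BITS = 36          # "36bit address towards memory (64GB space)"

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)
OFFSET_BITS = LINE_BYTES.bit_length() - 1      # log2 of the line size
INDEX_BITS = NUM_SETS.bit_length() - 1         # log2 of the number of sets
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS


def split_address(addr):
    """Split an address into (tag, set index, byte offset).

    Assumes plain modulo indexing, which is a textbook convention and
    not something the slide states about the real hardware.
    """
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 36 bits")
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(NUM_SETS, OFFSET_BITS, INDEX_BITS, TAG_BITS)
print(split_address(0x1234))
```

With these parameters the cache has 32 sets, and a 36-bit address covers 2^36 bytes, which matches the 64GB space on the slide.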

• SC3900 has a fully cache based memory:
 No constraints due to internal memory sizes and rigid allocations
 No DMA management and scheduling overheads
 Smaller internal memories required as only used code/data is allocated

• In addition, the SC3900 supports:


 Lock/Unlock of DDR space on the L2 Cache – M2 behavior
 Partition of the L2 Cache to several orthogonal L2 caches
 Cache equivalent, DMA operations
• Tightly coupled accelerator port - Ultra High bandwidth and low latency

• I/O Stashing and Intervention support

• Cache Management Engine per core

• MAPLE/CPRI read/write coherent accesses to clusters L2 caches
− Provides tight coupling of MAPLE/CPRI to the DSP cores
− Provides high BW (>76GBps per cluster) and low latency
− Significantly reduces DDR bus load and coherency traffic
− Parallel access to multiple SC3900 clusters
• MAPLE/CPRI accesses to DDRs directly via CoreNet fabric
• Target selection is based on MAPLE MMU

[Figure: FVP Cluster (Kibo), 2-core cluster + MAPLE port: CPRI and the MAPLE B3 baseband accelerator connect through AXI ports to two SC3900 sub-systems (FVP core, I-cache, D-cache), which share an L2 cache (banks 0-3) attached to the SoC fabric]

SC3900 Main Features:
• 4 symmetric DMUs in the DALU
− 32xMACs per cycle
− 16 FLOPs/cycle Floating Point support
− 4xSIMD8/vector support
• High memory bandwidth
− Program bus – 256 bit/cycle
− Data bus – up to 1024 bit/cycle
• Address & integer unit
− 2 Load/Store units
− General Integer Processing Unit (IPU)
− Support multiply and shift
• Large register files
− For both AGU and DALU
• Enhanced compiler support
− Improved predication
− Enhanced prediction mechanisms
• Enriched Instruction Set
− New binary & syntax
• Improved Multi-core Debug and trace features

[Figure: SC3900 core block diagram: PCU (fetch, program control, BRU, 128-entry BTB, 8 loop registers); AGU (2 LSUs, IPU, 32 address registers); DALU (64 data registers, 4 DMUs, each with 8xMAC and 2xFMAD); debug; buses to caches/memory: PAB 32 bits, PDB 256 bits, XABA and XABB 32 bits, XDBRA and XDBRB 256 bits, XDBWA and XDBWB 256 bits]

• SC3900 is optimized to efficiently handle Baseband PHY Layer
processing
• PHY layer processing can be divided into three categories:
− Computation intensive DSP code (mainly MAC intensive)
− Data manipulation and less intensive DSP code
− Control code
• Each one of the categories is non-negligible in processing
requirements
• There is no clear boundary separation
• SC3900 accelerates all types of Baseband L1 processing

• SC3900 provides vector processor capability by increasing the number of execution units and widening the whole data-path accordingly
− Up to 32 MACs per cycle
− 64 dedicated data registers of 40 bits each
− Up to 1024 bits (128 B) of core-to-L1 Data Cache throughput per cycle
 Strong and flexible cache based data streaming abilities

• The SC3900's optimized datapath leads to high MAC utilization

• Performance:
− SC3900 is 3.5x-4x better than SC3850 in intensive DSP code

• “Data manipulation” stands for many different functions in Baseband Layer 1, for example:
− Data preparation before/after intensive kernels
 Ex: data re-ordering, matrix transpose, pack/unpack
− Less regular kernels or serial/cyclic kernels with low parallelism
 Ex: QR Decomposition, IIR, Interleaver, encoder.

• SC3900 architecture addresses “Data manipulation” by different means:
− Data-path flexibility: This is the essence of the “Flexible Vector Processor”
 Register file flexibility: Each unit can read/write any register
 Execution unit flexibility: Each unit can run different and independent instructions
− Rich and flexible instruction set
 Efficient instruction set with broad support for different data types and sizes
 New powerful instructions specific to data manipulation

• Performance:
− SC3900 is 2x-3x better than SC3850 in “Data Manipulation”

• SC3900 control code efficiency
− L1 control functions are tightly integrated with the arithmetic intensive SW
− Useful for running scheduling functions that are control intensive

• Control code performance is affected by two main aspects:
− Core and compiler efficiency in typical control code constructs
− Memory system efficiency

• Both have been addressed on the SC3900. A few examples:
− Ability to flatten decision trees using multiple predicates
− Third AGU unit with multiply and shift, for address calculation and to boost control code performance
− Full support for non-aligned memory access without penalty
− Larger, clustered 2MB L2 cache to keep the program close to the core

• Robust and flexible instruction set
− Significant improvement over previous generations
− Instructions are highly flexible and fit a wide range of DSP operations
− For example, the MAC instruction is defined to support:
 Single precision 16bx16b, mixed-precision 16bx32b and double precision 32bx32b
 Saturated and non-saturated arithmetic
 SIMD and dot-product
 Real x Real, Real x Complex, Complex x Complex, Complex x Conjugate

• Uniform and consistent instruction set
− All instructions can use every register and support all data types and all addressing modes
− Grouping restrictions removed
 All instructions can be grouped together
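To make the MAC variants listed above concrete, here is a small generic model of saturated versus non-saturated accumulation and of a complex-by-conjugate MAC. It is a sketch of the arithmetic modes only: the 40-bit accumulator width is borrowed from the deck's "64 dedicated data registers of 40bit each", and the function names are hypothetical, not SC3900 instructions or intrinsics.

```python
ACC_BITS = 40                          # assumed accumulator width
ACC_MAX = (1 << (ACC_BITS - 1)) - 1
ACC_MIN = -(1 << (ACC_BITS - 1))


def saturate(value):
    """Clamp to the accumulator range (saturated arithmetic)."""
    return max(ACC_MIN, min(ACC_MAX, value))


def wrap(value):
    """Two's-complement wrap-around (non-saturated arithmetic)."""
    value &= (1 << ACC_BITS) - 1
    return value - (1 << ACC_BITS) if value > ACC_MAX else value


def mac(acc, a, b, saturating=True):
    """Real x real multiply-accumulate: acc + a * b."""
    result = acc + a * b
    return saturate(result) if saturating else wrap(result)


def cmac_conj(acc, a, b, saturating=True):
    """Complex x conjugate MAC: acc + a * conj(b), values as (re, im) pairs."""
    fix = saturate if saturating else wrap
    re = acc[0] + a[0] * b[0] + a[1] * b[1]
    im = acc[1] + a[1] * b[0] - a[0] * b[1]
    return fix(re), fix(im)


def dot(xs, ys):
    """Dot product built from repeated saturating MACs."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


print(mac(ACC_MAX, 1, 1), mac(ACC_MAX, 1, 1, saturating=False))
print(cmac_conj((0, 0), (1, 2), (3, 4)))
print(dot([1, 2, 3], [4, 5, 6]))
```

The first print shows the difference between the two modes: the saturating result stays at the maximum, while the wrapping result flips to the minimum.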

• The Instruction set definition is based on deep analysis of baseband
requirements and MAPLE™ offloaded functionality
• Powerful application specific instructions are introduced in SC3900
(patents pending) – few examples:
 Maximum/Peak search
• Find maximum value and index between 4 words and previous results
• Up to 4 Maxsearch (total of 20 elements) per cycle
 Filter and correlation dedicated instructions
• Support for both complex and real filters
 FFT/DFT highly optimized kernel and instructions
 Specialized load/store instructions
• Support matrix transposed & manipulation (2x2, 2x4, 4x4, 2x8, 4x8, 8x8)
 Bit manipulation
• Dedicated instruction for scrambling, puncturing, interleaver
 Reciprocal (1/x), 1/Square Root and Log approximation instructions
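The maximum/peak search bullet describes an operation that compares four new words against a previous result and keeps both the value and its index. A minimal software model of that behaviour could look like the sketch below; the function names and the tie-breaking rule (the earliest index wins) are assumptions, since the deck does not give the instruction's exact semantics.

```python
def max_search_step(prev_max, prev_idx, words, base_idx):
    """Compare up to four new words with the previous result.

    Returns the running maximum and its index. A strict ">" keeps the
    earlier index on ties, which is an assumption of this sketch.
    """
    best_val, best_idx = prev_max, prev_idx
    for offset, word in enumerate(words):
        if word > best_val:
            best_val, best_idx = word, base_idx + offset
    return best_val, best_idx


def peak_search(samples, group=4):
    """Find (maximum value, index) by chaining 4-word search steps."""
    if not samples:
        raise ValueError("samples must not be empty")
    best_val, best_idx = samples[0], 0
    for base in range(0, len(samples), group):
        best_val, best_idx = max_search_step(
            best_val, best_idx, samples[base:base + group], base)
    return best_val, best_idx


print(peak_search([3, 9, 2, 7, 11, 11, 0, 5, 4]))
```

One step handles 4 words plus 1 previous result, so four such steps touch 20 elements, which is presumably how the slide arrives at its per-cycle figure.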

• Instead of using a conventional L3 cache, the B4860 has a CoreNet
Platform Cache (CPC) — 512KB for each of the two DRAM
controllers.

• The platform cache can function as:


− L3 cache, or
− Scratchpad memory, optimizing traffic to main memory and providing
transient storage for sharing data among the DSPs and CPUs.

• Fully coherent with the other caches in the SoC

• Data and code sharing between cores on the same cluster
 Low L2-cache access latency
 Increased cores utilization
 Reduce DDR traffic
 Lower power

• Simplified SoC interconnect


 Reduced area
 Lower power
 Higher performance

• Full hardware coherency


 Within the cluster
 SoC level

• B4860 has two main fabrics
 CoreNet fabric
 AXI fabric

• CoreNet fabric is the major fabric
 Connects the CPU cluster, DSP clusters, OCeaN, FM and other blocks to the CPC/DDR memories
 42.5GB/s of raw bandwidth per cluster

• AXI fabric
 Connects the MAPLE units and CPRI to the SC3900 clusters and to the CoreNet fabric
 Allows for high throughput, low latency transfers in the Layer 1 sub-system

• Coherent fabric
 Maintains coherence among all the CPU and DSP caches and memories.
 Reduces multicore software development effort
 Easier software partitioning and upgrade
• High data bandwidth buses: 256-bit at 667 MHz
 42.5 GB/s of raw bandwidth per cluster
• Performance features
 Parallel accesses
 Deep pipeline
 Out-of-order completion
 Inter-processor communication
• Stashing
• PAMU - Peripheral Access Management Unit

[Figure: B4860 block diagram: SC3900 FVP cores at 1.2GHz with 32 KB I/D caches and shared 2MB L2 caches, an e6500 dual-thread CPU cluster at 1.8GHz with 32 KB I/D caches and a shared L2 cache, the MAPLE B3 baseband accelerator and the PAMU, all attached to the CoreNet coherency fabric at 667MHz, with two 512KB L3 caches in front of two 64-bit DDR3 1.866GHz DRAM controllers; steps 1-3 are marked on the fabric]
• Step 1 - Initiator (Core/Accelerator/IO) requests data from CoreNet fabric
• Step 2 - CoreNet broadcasts the request to relevant initiators/targets that might have the data
• Step 3 - An initiator/target which has the latest data responds and the data reaches the requestor (and optionally from memory) without a need for SW intervention

[Figure: B4860 block diagram with steps 1-3 marked on the CoreNet coherency fabric]
• Step 1 - Initiator (Core/Accelerator/IO) informs the CoreNet fabric on its intention to update the memory
• Step 2 - CoreNet broadcasts the request to find the relevant caches that might hold old copies of the data
• Step 3 - Initiators which hold an old version of the data invalidate it and the write data is written by the requestor

[Figure: B4860 block diagram with steps 1-3 marked on the CoreNet coherency fabric]
• Step 1 - Initiator (Core/Accelerator/IO) informs the CoreNet fabric on its intent to write data to a specific cache (or cache and memory)
• Step 2 - CoreNet broadcasts the request to find the relevant caches that might hold old copies of the data
• Step 3 - The data is written only to the designated target(s)
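The three step sequences in this part of the deck (coherent read, write with invalidation, and write to a designated cache) can be mimicked with a very small software model. This is a toy sketch for intuition only: it ignores cache states, evictions and timing, it is not the CoreNet protocol, and every name in it (`ToyFabric`, `read`, `write`, `stash`) is hypothetical.

```python
class ToyFabric:
    """Toy coherent fabric: one backing memory plus a private cache per agent."""

    def __init__(self, agents):
        self.memory = {}                                # address -> value
        self.caches = {name: {} for name in agents}     # agent -> {address: value}

    def read(self, requester, addr):
        """Coherent read: snoop the other caches, fall back to memory."""
        own = self.caches[requester]
        if addr in own:
            return own[addr]
        # Steps 1-2: the request is broadcast to agents that might hold the line.
        for name, cache in self.caches.items():
            if name != requester and addr in cache:
                value = cache[addr]     # Step 3: a holder supplies the latest data
                break
        else:
            value = self.memory.get(addr, 0)   # no holder: memory responds
        own[addr] = value
        return value

    def write(self, requester, addr, value):
        """Coherent write: other copies are invalidated, then the requester writes."""
        for name, cache in self.caches.items():
            if name != requester:
                cache.pop(addr, None)
        self.caches[requester][addr] = value

    def stash(self, addr, value, targets, to_memory=False):
        """Targeted write: data lands only in the designated cache(s)."""
        for name, cache in self.caches.items():
            if name not in targets:
                cache.pop(addr, None)   # stale copies elsewhere are dropped
        for name in targets:
            self.caches[name][addr] = value
        if to_memory:
            self.memory[addr] = value


fabric = ToyFabric(["dsp0", "cpu0", "cpu1"])
fabric.memory[0x100] = 1
print(fabric.read("cpu0", 0x100))
fabric.write("dsp0", 0x100, 7)
print(fabric.read("cpu1", 0x100))
```

In this model the second read is served from the writer's cache while memory still holds the old value, which is the point of step 3 in the read sequence: no software intervention is needed to find the latest copy.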

• B4860 instantiates two OCeaN DMAs
 Transfer data between PCIe/sRIO and the device memories
 Transfer data between two locations in the memory

• Each DMA includes the following features
 Eight channels
 Advanced chaining and striding capabilities
 Priority support
 Can be activated by an external signal

• The CPRI complex enables communication among radio devices over a CPRI
bus.
• The CPRI complex is designed to support the CPRI V4.2 specification and
can be configured to support several air interface standards, including
WiMAX, LTE, and WCDMA.
• The complex supports up to 8 CPRI links (4 pairs) with each link configurable
as a master or slave port.
• Up to 9.8Gbps per lane
• Each CPRI link supports three types of service access points (SAPs):
• IQ samples for antenna transferred through the SAP IQ Interface
• CPRI frames synchronized by the SAP synchronization interface
• CPRI link control and management (C&M) data transferred between
SAPs in both CPRI master and slave ports.
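As a rough guide to what "up to 9.8Gbps per lane" means for IQ traffic, the sketch below derives the IQ payload of one link from the CPRI frame structure: the 9830.4 Mbit/s line rate is 16 times the 614.4 Mbit/s base rate, 8b/10b coding removes a fifth of it, and one of the 16 words in each basic frame is the control word. The LTE sample rate and the 15-bit sample width are assumptions chosen for illustration, not figures from this deck.

```python
# All rates are kept as integers in kbit/s to avoid floating-point rounding.
CPRI_BASE_RATE_KBPS = 614_400      # CPRI line bit rate option 1 (614.4 Mbit/s)
LINE_RATE_MULTIPLIER = 16          # 16 x 614.4 Mbit/s = 9830.4 Mbit/s ("9.8G")

LTE20_SAMPLE_RATE_KSPS = 30_720    # assumption: 20 MHz LTE carrier at 30.72 Msps
SAMPLE_WIDTH_BITS = 15             # assumption: 15-bit I and 15-bit Q samples


def line_rate_kbps(multiplier=LINE_RATE_MULTIPLIER):
    """Raw serial line rate of one CPRI link."""
    return CPRI_BASE_RATE_KBPS * multiplier


def iq_payload_kbps(multiplier=LINE_RATE_MULTIPLIER):
    """Line rate minus 8b/10b overhead and the control word (1 of 16 words)."""
    return line_rate_kbps(multiplier) * 8 * 15 // (10 * 16)


def antenna_carrier_kbps(sample_rate_ksps=LTE20_SAMPLE_RATE_KSPS,
                         width_bits=SAMPLE_WIDTH_BITS):
    """Bit rate of one antenna-carrier stream (I and Q samples)."""
    return sample_rate_ksps * 2 * width_bits


def max_antenna_carriers(multiplier=LINE_RATE_MULTIPLIER):
    """Whole antenna-carrier streams that fit in one link's IQ payload."""
    return iq_payload_kbps(multiplier) // antenna_carrier_kbps()


print(line_rate_kbps(), iq_payload_kbps(), max_antenna_carriers())
```

Under these assumptions one 9.8G link carries eight 20 MHz LTE antenna-carriers; narrower samples or IQ compression would change that number.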

Programmable System Interface (PSIF)
Central programmable control for:
• Tasks Scheduling
• Efficient DMAing from/to SoC and internally
• Flexible processing flow allows multiple standards support
• BD parsing and job configurations
• Interrupts handling
• Internal Embedded Data Flows

Processing Elements (PE-s)
• Highly efficient HW implementation of computationally intensive baseband algorithms
• Lego like concept allowing:
• Fast solution derivation (Macro to Femto)
• Use of algorithms commonality between technologies (LTE, WCDMA, CDMA, WiMAX)

[Figure: SoC/FVP clusters connect to the PSIF, which drives processing elements PE1, PE2 ... PEN]

MAPLE-B3
• LTE/LTE-Advanced, HSPA/WCDMA, WiMAX and Multi-Mode acceleration solution.

• LTE/LTE-A acceleration R.10/R.11 compliant:
‒ PUSCH acceleration including Cancelation, Flexible MIMO Equalization, iDFT, De-Modulator, De-Scrambler, DINTLV, UCI decoding, HARQ, FEC decoding.
‒ PDSCH acceleration including full PDSCH/PMCH from FEC to IFFT, internal RS generation, multiplexing of PBCH, PSS, SSS at RE mapping.
‒ FFT/DFT and vector multiplication acceleration for PUSCH, PCFICH, PHICH, PDCCH, PRACH, Sounding and general purpose use

• WCDMA/HSPA acceleration R.10/R.11 compliant:
‒ HSDPA FEC encoding
‒ HSUPA, WCDMA FEC decoding
‒ Downlink Chip Rate acceleration
‒ Uplink Chip Rate acceleration with flexible scheme addressing:
 Low latency control channels processing
 Flexible Interference Cancelation, Grouping, Pre-Despreading data channels processing
 Flexible Path Searcher and RACH correlations

[Figure: MAPLE-B3 block diagram: DMA x12, RISC x12, data RAM, instruction RAM, and processing elements EQPE2 x2, PUPE2 x2, eTVPE2 x2, PDPE2 x2, eFTPE2 x8, DEPE x2, CRPE-ULB2, TCPE, CRPE-DL2, CRPE-ULF2 and CRCPE]

PUSCH Processing
From Antenna -> Guard Removal, FFT, Cyclic Prefix Removal -> Channel Estimation, SNR -> MMSE Equalizer -> IDFT + FC -> QAM De-Mapper -> De-Scramble -> De-Interleave -> Rate DeMatch -> HARQ combine -> Turbo Decode -> CRC24b -> Transport Block Assembly -> CRC24a -> To Layer-2
Side outputs: DTX test, decoded RI/ACK bits, CQI/PMI controls

PDSCH Processing
From Layer-2 -> CRC24a -> Code Block Segment -> CRC24b -> Turbo Encoder -> HARQ Rate Match -> Scrambling -> QAM Mapper -> Layer Mapper -> MIMO Precode -> Physical Resource Block Mapper (fed by Downlink Reference Symbol Generation) -> Guard Insertion, IFFT, Cyclic Prefix -> To Antenna

[Legend: each block is marked as MAPLE-PE, MAPLE Embedded Flow or SC3900 Core. Reference: 3GPP TS36.211/212/213]

• Acceleration of frame/packet processing
− Network protocols (Layer-1 to 4)
− Standard algorithmics (security and content processing)
• Classification and Distribution of data flows among cores and software partitions
− Load balancing through parse/classify/distribute
− Load spreading through queues shared among multiple consumers
• Abstract and manage efficiently Intercore communications and the access to shared resources (NW interfaces, HW-Accelerators, Buffers, Queues)
− More sophisticated approach compared to basic BD/buffer list
• Scalability and Portability
− Across ‘any’ mix of cores, accelerators and device boundaries
− Across device generations

[Figure: cores with L2$, D$ and I$ on the CoreNet™ fabric, connected to the Frame Manager (Ethernet interfaces), SEC, Queue Manager, Buffer Manager, RMan (SRIO interfaces) and memory]

• Hardware managed inter-block queuing
− Class based prioritization
− Large number of queues
− Universal: between all blocks
• Lock free, low software overhead queuing

• Hardware buffer management
− Hardware blocks acquire and release buffers without software intervention
• Lock free, low software overhead buffer pool management for hardware use

[Figure: cores (D$/I$), HW accelerators and network interfaces exchanging work through hardware queues, with a Buffer Manager]
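The division of labour described above, with one block handing out buffers and another ordering work by class, can be pictured with a small software analogy. The sketch below models only the interface: the lock-free behaviour on the device comes from hardware, and the class names, method names and the strict-priority policy are all assumptions made for illustration, not the QMan or BMan programming model.

```python
from collections import deque


class BufferPool:
    """Software stand-in for a hardware-managed pool of free buffers."""

    def __init__(self, pool_id, buffer_addresses):
        self.pool_id = pool_id
        self._free = deque(buffer_addresses)

    def acquire(self):
        """Take a free buffer, or None when the pool is depleted."""
        return self._free.popleft() if self._free else None

    def release(self, address):
        """Return a buffer to the pool once its consumer is done."""
        self._free.append(address)

    def available(self):
        return len(self._free)


class PriorityQueues:
    """Software stand-in for class-based queues between blocks."""

    def __init__(self, num_classes):
        self._queues = [deque() for _ in range(num_classes)]

    def enqueue(self, priority, frame):
        self._queues[priority].append(frame)

    def dequeue(self):
        """Serve class 0 first (strict priority is an assumption here)."""
        for queue in self._queues:
            if queue:
                return queue.popleft()
        return None


pool = BufferPool(pool_id=0, buffer_addresses=[0x1000, 0x2000])
queues = PriorityQueues(num_classes=2)

buf = pool.acquire()                 # producer takes a buffer...
queues.enqueue(1, ("bulk", buf))     # ...and queues a low-priority frame
queues.enqueue(0, ("control", pool.acquire()))

kind, address = queues.dequeue()     # consumer sees the control frame first
pool.release(address)                # and returns the buffer afterwards
print(kind, hex(address), pool.available())
```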

Frame Manager is responsible for preprocessing and moving Ethernet packets into and out of the datapath
• Parsing
• Packet parsing at wire speed
• Supports standard protocols parsing and identification by HW (VLAN/IP/UDP/TCP/SCTP/PPPoE/PPP/MPLS/GRE/IPSec …)
• Supports non-standard UDF header parsing for custom protocols
• Classification / Distribution
• Coarse classification based on key generation, hash and exact match lookup
• Supports aggregated speed of 20Gbps, 30Mpps@667MHz
• Lookups configured by user, can be chained
• Classification result is frame queue ID, storage profile and policing profile.
• Ingress Policing
• Two rate, three colour marking algorithm (RFC 2698 & 4115)
• Up to 256 internal profiles
• General
• Supports offline PCD on frames extracted from QMan
• Supports “Independent” mode (works without BMan & QMan, using a BD ring model)
• Per port egress rate limiting
• Statistics & Multicast support
• Support for IEEE1588 through HW timestamping

[Figure: Frame Manager with parse, classify, distribute stage and buffers, serving two 1G/2.5G/10G ports and four 1G/2.5G ports]
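The ingress policing bullet refers to a two-rate, three-colour marker. A minimal sketch of the colour-blind mode described in RFC 2698 is shown below: two token buckets, one refilled at the committed rate and one at the peak rate, decide whether a packet is green, yellow or red. This is an illustration of the algorithm only, not the Frame Manager's implementation or configuration interface, and the class and parameter names are hypothetical.

```python
class TwoRateThreeColorMarker:
    """Colour-blind two-rate three-colour marker in the style of RFC 2698."""

    def __init__(self, cir, cbs, pir, pbs):
        # Rates are in bytes per second, burst sizes in bytes.
        self.cir, self.cbs = cir, cbs      # committed rate and burst size
        self.pir, self.pbs = pir, pbs      # peak rate and burst size
        self.tc = cbs                      # committed bucket starts full
        self.tp = pbs                      # peak bucket starts full
        self.last = 0.0                    # time of the previous packet

    def mark(self, size, now):
        """Return "green", "yellow" or "red" for a packet of `size` bytes."""
        if now < self.last:
            raise ValueError("time must not go backwards")
        elapsed = now - self.last
        self.last = now
        # Refill both buckets, capped at their burst sizes.
        self.tc = min(self.cbs, self.tc + self.cir * elapsed)
        self.tp = min(self.pbs, self.tp + self.pir * elapsed)
        if self.tp < size:
            return "red"                   # exceeds the peak rate
        if self.tc < size:
            self.tp -= size
            return "yellow"                # within peak, above committed
        self.tp -= size
        self.tc -= size
        return "green"                     # within the committed rate


marker = TwoRateThreeColorMarker(cir=1000, cbs=1500, pir=2000, pbs=3000)
for arrival in (0.0, 0.0, 0.0, 1.5):
    print(arrival, marker.mark(1500, arrival))
```

A classifier would typically forward green packets, mark or deprioritise yellow ones, and drop red ones, but that policy choice is outside the marker itself.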

• Dual-AMC form factor
• Standalone operation or pluggable into open-top standard MicroTCA chassis
• 2x DDR3
− 4GB Dual rank 64b/72b, 1.866GHz w/ ECC
− 2GB 64b/72b, 1.866GHz w/ ECC
• Ethernet
− Up to 6x 1G/2.5G SGMII
− Up to 2x 10G XFI/XAUI
• CPRI v4.2 – up to 8 ports at 9.8G
• sRIO v2.1 – up to two four-lane ports at 5G supporting x4/x2/x1 modes
• PCIe v2.0 – one four-lane port at 5G supporting x4/x2/x1 modes
• AMC connector for HSSI expansions
• SFP+ – two optical transceivers
• NOR, NAND, I2C & SPI FLASH memories
• USB, UART
• JTAG & Aurora interfaces