Introduction www.ti.com

– Compressed textures PVR-TC1, PVR-TC2, ETC1
– Programmable support for all YUV formats
• Resolution support:
– Frame buffer maximum size = 2048 x 2048
– Texture maximum size = 2048 x 2048
• Texture filtering:
– Bilinear, trilinear, anisotropic
– Independent minimum and maximum control
• Antialiasing:
– 4x multisampling
– Up to 16x full scene anti-aliasing
– Programmable sample positions
• Indexed primitive list support
– Bus mastered
• Programmable vertex DMA
• Render to texture:
– Including twiddled formats
– Auto MipMap generation
• Multiple on-chip render targets (MRT).
Note: Performance is limited when the on-chip memory is not available.

5.1.3 Universal Scalable Shader Engine (USSE) – Key Features


The USSE is the engine core of the POWERVR SGX architecture and supports a broad range of
instructions.
• Single programming model:
– Multithreaded with 16 simultaneous execution threads and up to 64 simultaneous data instances
– Zero-cost swapping in, and out, of threads
– Cached program execution model
– Dedicated pixel processing instructions
– Dedicated video encode/decode instructions
• SIMD execution unit supporting operations in:
– 32-bit IEEE float
– 2-way 16-bit fixed point
– 4-way 8-bit integer
– 32-bit bit-wise (logical only)
• Static and dynamic flow control:
– Subroutine calls
– Loops
– Conditional branches
– Zero-cost instruction predication
• Procedural geometry:
– Allows generation of primitives
– Effective geometry compression
– High-order surface support
• External data access:
– Permits reads from main memory using cache
– Permits writes to main memory
– Data fence facility
– Dependent texture reads

Graphics Accelerator (SGX) SPRUH73H – October 2011 – Revised April 2013
Copyright © 2011–2013, Texas Instruments Incorporated

5.1.4 Unsupported Features


There are no unsupported SGX530 features for this device.


5.2 Integration

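[Block diagram: the GFX subsystem connects to the L3 Fast interconnect through a master port and a
slave port, and sends the THALIAIRQ interrupt to the MPU subsystem. The PRCM supplies
pd_gfx_gfx_l3_gclk (SYSCLK and MEMCLK) from CORE_CLKOUTM4 (200 MHz), and pd_gfx_gfx_fclk (CORECLK)
from a mux of CORE_CLKOUTM4 (input 0) and PER_CLKOUTM2 (192 MHz, input 1) followed by a /1, /2 divider.]

Figure 5-1. SGX530 Integration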

5.2.1 SGX530 Connectivity Attributes


The general connectivity attributes of the SGX530 are shown in the following table.

Table 5-1. SGX530 Connectivity Attributes


Attributes             Type
Power domain           GFX Domain
Clock domain           SGX_CLK
Reset signals          SGX_RST
Idle/Wakeup signals    Smart Idle, Initiator Standby
Interrupt request      THALIAIRQ (GFXINT) to MPU Subsystem
DMA request            None
Physical address       L3 Fast slave port

5.2.2 SGX530 Clock and Reset Management


The SGX530 uses separate functional and interface clocks. The SYSCLK is the clock for the slave
interface and runs at the L3F frequency. The MEMCLK is the clock for the memories and master interface
and also runs at the L3F frequency. The CORECLK is the functional clock. It can be sourced from either
the L3F clock (CORE_CLKOUTM4) or from the 192 MHz PER_CLKOUTM2 and can optionally be divided
by 2.
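
The source and divider choice described above can be sketched as a small host-side calculation. This is
only an illustration: the enum names and the function sgx_coreclk_hz are hypothetical, and the PRCM
register fields that actually select the source and divider are not shown in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Source frequencies as stated in this section. */
#define CORE_CLKOUTM4_HZ 200000000u /* L3F clock */
#define PER_CLKOUTM2_HZ  192000000u

/* Hypothetical selector names, not PRCM register field values. */
enum sgx_fclk_src { SGX_SRC_CORE_CLKOUTM4, SGX_SRC_PER_CLKOUTM2 };

/* Returns the resulting CORECLK frequency in Hz for a source and a divider of 1 or 2. */
uint32_t sgx_coreclk_hz(enum sgx_fclk_src src, unsigned divider)
{
    assert(divider == 1u || divider == 2u);
    uint32_t src_hz = (src == SGX_SRC_PER_CLKOUTM2) ? PER_CLKOUTM2_HZ
                                                    : CORE_CLKOUTM4_HZ;
    return src_hz / divider;
}
```

With these inputs, the four possible CORECLK rates are 200 MHz, 100 MHz, 192 MHz, and 96 MHz.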

Table 5-2. SGX530 Clock Signals


Clock signal                  Max Freq    Reference / Source               Comments
SYSCLK (Interface clock)      200 MHz     CORE_CLKOUTM4                    pd_gfx_gfx_l3_gclk, from PRCM
MEMCLK (Memory clock)         200 MHz     CORE_CLKOUTM4                    pd_gfx_gfx_l3_gclk, from PRCM
CORECLK (Functional clock)    200 MHz     PER_CLKOUTM2 or CORE_CLKOUTM4    pd_gfx_gfx_fclk, from PRCM


5.2.3 SGX530 Pin List


The SGX530 module does not include any external interface pins.


5.3 Functional Description

5.3.1 SGX Block Diagram


The SGX subsystem is based on the POWERVR® SGX530 core from Imagination Technologies. The
architecture uses programmable and hard coded pipelines to perform various processing tasks required in
2D, 3D, and video processing. The SGX architecture comprises the following elements:
• Coarse grain scheduler
– Programmable data sequencer (PDS)
– Data master selector (DMS)
• Vertex data master (VDM)
• Pixel data master (PDM)
• General-purpose data master
• USSE
• Tiling coprocessor
• Pixel coprocessor
• Texturing coprocessor
• Multilevel cache
Figure 5-2 shows a block diagram of the SGX cores.

Figure 5-2. SGX Block Diagram

[Block diagram of the POWERVR SGX530: the vertex data master, pixel data master, and general-purpose
data master feed the coarse-grain scheduler (programmable data sequencer and data master selector),
which drives the universal scalable shader engine (USSE). The USSE works with the tiling coprocessor,
pixel coprocessor, texturing coprocessor, and multilevel cache. A power management control register
block, the MMU, and the SOCIF and BIF interfaces connect the core to the L3 interconnect.]

5.3.2 SGX Elements Description


The coarse grain scheduler (CGS) is the main system controller for the POWERVR SGX architecture. It
consists of two stages, the DMS and the PDS. The DMS processes requests from the data masters and
determines which tasks can be executed given the resource requirements. The PDS then controls the
loading and processing of data on the USSE.
There are three data masters in the SGX core:
• The VDM is the initiator of transform and lighting processing within the system. The VDM reads an
input control stream, which contains triangle index data and state data. The state data indicates the
PDS program, size of the vertices, and the amount of USSE output buffer resource available to the
VDM. The triangle data is parsed to determine unique indices that must be processed by the USSE.
These are grouped together according to the configuration provided by the driver and presented to the
DMS.
• The PDM is the initiator of rasterization processing within the system. Each pixel pipeline processes
pixels for a different half of a given tile, which allows for optimum efficiency within each pipe due to
locality of data. The PDM determines the amount of resource required within the USSE for each task,
merges this with the state address, and issues a request to the DMS for execution on the USSE.
• The general-purpose data master responds to events within the system (such as end of a pass of
triangles from the ISP, end of a tile from the ISP, end of render, or parameter stream breakpoint
event). Each event causes either an interrupt to the host or synchronized execution of a program on
the PDS. The program may or may not cause a subsequent task to be executed on the USSE.
The USSE is a user-programmable processing unit. Although general in nature, its instructions and
features are optimized for three types of task: processing vertices (vertex shading), processing pixels
(pixel shading), and video/imaging processing.
The multilevel cache is a 2-level cache consisting of two modules: the main cache and the
mux/arbiter/demux/decompression unit (MADD). The MADD is a wrapper around the main cache module
designed to manage and format requests to and from the cache, as well as providing Level 0 caching for
texture and USSE requests. The MADD can accept requests from the PDS, USSE, and texture address
generator modules. Arbitration, as well as any required texture decompression, is performed between
the three data streams.
The texturing coprocessor performs texture address generation and formatting of texture data. It receives
requests from either the iterators or USSE modules and translates these into requests in the multilevel
cache. Data returned from the cache are then formatted according to the texture format selected, and sent
to the USSE for pixel-shading operations.
To process pixels in a tiled manner, the screen is divided into tiles and arranged as groups of tiles by the
tiling coprocessor. An inherent advantage of tiling architecture is that a large amount of vertex data can be
rejected at this stage, thus reducing the memory storage requirements and the amount of pixel processing
to be performed.
The pixel coprocessor is the final stage of the pixel-processing pipeline and controls the format of the final
pixel data sent to the memory. It supplies the USSE with an address into the output buffer, and the USSE
then returns the relevant pixel data. The address order is determined by the frame buffer mode. The pixel
coprocessor contains a dithering and packing function.

Chapter 6

Interrupts

This section describes the interrupts for the device.

Topic

6.1 Functional Description
6.2 Basic Programming Model
6.3 ARM Cortex-A8 Interrupts
6.4 PWM Events
6.5 Interrupt Controller Registers


6.1 Functional Description


The interrupt controller processes incoming interrupts by masking and priority sorting to produce the
interrupt signals for the processor to which it is attached. Figure 6-1 shows the top-level view of interrupt
processing.

NOTE: FIQ is not available on general-purpose (GP) devices.

Figure 6-1. Interrupt Controller Block Diagram

[Block diagram: for each interrupt of bank p, the incoming interrupt and the software interrupt
(ISR_SETp) form the interrupt input status (ITRp), which is masked by MIRp. A priority comparator
passes interrupts whose priority is above THRESHOLD. ILRq (PRIORITY, FIQNIRQ) provides the interrupt
priority and the IRQ/FIQ steering into PENDING_IRQp and PENDING_FIQp. Separate IRQ and FIQ priority
sorters, controlled by the NEWIRQAGR and NEWFIQAGR new agreement bits, produce the active interrupt
number, spurious flag, and priority (SIR_IRQ, IRQ_PRIORITY, SIR_FIQ, FIQ_PRIORITY) and drive the IRQ
and FIQ inputs of the processor.]


6.1.1 Interrupt Processing

6.1.1.1 Input Selection


The INTC supports only level-sensitive incoming interrupt detection. A peripheral asserting an interrupt
maintains it until software has handled the interrupt and instructed the peripheral to deassert the interrupt.
A software interrupt is generated if the corresponding bit in the MPU_INTC.INTC_ISR_SETn register is set
(register bank number n = [0,1,2,3] for the MPU subsystem INTC; 128 incoming interrupt lines are
supported). The software interrupt clears when the corresponding bit in the
MPU_INTC.INTC_ISR_CLEARn register is written. This feature is typically used for software debugging.
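
As an illustration of how a line number relates to a register bank, the following host-side sketch
models the four software interrupt banks in ordinary memory. It assumes that line m occupies bit
(m mod 32) of bank (m div 32), which follows from 128 lines spread over four 32-bit banks but is not
stated explicitly here. The structure and function names are hypothetical, and no real register is
accessed.

```c
#include <assert.h>
#include <stdint.h>

#define INTC_NUM_LINES 128u
#define INTC_NUM_BANKS 4u

/* Host-side stand-in for the software interrupt status of banks 0..3 (not real MMIO). */
struct intc_swint_model { uint32_t isr[INTC_NUM_BANKS]; };

/* Assumed mapping: 32 interrupt lines per bank, lowest line in bit 0. */
unsigned intc_bank_of(unsigned line) { assert(line < INTC_NUM_LINES); return line / 32u; }
uint32_t intc_bit_of(unsigned line)  { return (uint32_t)1u << (line % 32u); }

/* Models writing the line's bit to INTC_ISR_SETn. */
void intc_swint_set(struct intc_swint_model *m, unsigned line)
{
    m->isr[intc_bank_of(line)] |= intc_bit_of(line);
}

/* Models writing the line's bit to INTC_ISR_CLEARn. */
void intc_swint_clear(struct intc_swint_model *m, unsigned line)
{
    m->isr[intc_bank_of(line)] &= ~intc_bit_of(line);
}
```

Under this assumption, interrupt line 37 is bit 5 of bank 1.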

6.1.1.2 Masking

6.1.1.2.1 Individual Masking


Detection of interrupts on each incoming interrupt line can be enabled or disabled independently by the
MPU_INTC.INTC_MIRn interrupt mask register. In response to an unmasked incoming interrupt, the INTC
can generate one of two types of interrupt requests to the processor:
• IRQ: low-priority interrupt request
• FIQ: fast interrupt request (Not available on General Purpose (GP) devices)
The type of interrupt request is determined by the MPU_INTC.INTC_ILRm[0] FIQNIRQ bit (m= [0,127]).
The current incoming interrupt status before masking is readable from the MPU_INTC.INTC_ITRn register.
After masking and IRQ/FIQ selection, and before priority sorting is done, the interrupt status is readable
from the MPU_INTC.INTC_PENDING_IRQn and MPU_INTC.INTC_PENDING_FIQn registers.

6.1.1.2.2 Priority Masking


To enable faster processing of high-priority interrupts, a programmable priority masking threshold is
provided (the MPU_INTC.INTC_THRESHOLD[7:0] PRIORITYTHRESHOLD field). This priority threshold
allows preemption by higher-priority interrupts; all interrupts whose priority is lower than or equal to
the threshold are masked. However, priority 0 can never be masked by this threshold; a priority threshold
of 0 is treated the same way as a threshold of 1. The PRIORITY and PRIORITYTHRESHOLD field values can be
set between 0x0 and 0x7F; 0x0 is the highest priority and 0x7F is the lowest priority. When priority
masking is not necessary, a priority threshold value of 0xFF disables the priority threshold mechanism.
This value is also the reset default for backward compatibility with previous versions of the INTC.
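
The threshold rules above can be written as a single predicate. This is a sketch of the behavior as
described in this section, not of the hardware implementation; the function name is hypothetical, and
threshold values between 0x80 and 0xFE are not covered by the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INTC_THRESHOLD_DISABLED 0xFFu
#define INTC_PRIORITY_LOWEST    0x7Fu

/* Returns true if an interrupt with this priority is blocked by the threshold.
 * A lower number means a higher priority. */
bool intc_priority_masked(uint8_t priority, uint8_t threshold)
{
    assert(priority <= INTC_PRIORITY_LOWEST);
    if (threshold == INTC_THRESHOLD_DISABLED)
        return false;              /* mechanism disabled (reset default) */
    if (threshold == 0u)
        threshold = 1u;            /* priority 0 can never be masked */
    return priority >= threshold;  /* lower or equal priority is masked */
}
```

For example, with a threshold of 5, priorities 0 to 4 still reach the processor, while priorities 5 and
numerically above are held off.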

6.1.1.3 Priority Sorting


A priority level (0 being the highest) is assigned to each incoming interrupt line. Both the priority level and
the interrupt request type are configured by the MPU_INTC.INTC_ILRm register. If more than one
incoming interrupt with the same priority level and interrupt request type occurs simultaneously, the
highest-numbered interrupt is serviced first. When one or more unmasked incoming interrupts are detected,
the INTC separates them into IRQ and FIQ using the corresponding MPU_INTC.INTC_ILRm[0] FIQNIRQ bit.
The result is placed in INTC_PENDING_IRQn or INTC_PENDING_FIQn. If no other interrupts are currently
being processed, the INTC asserts IRQ/FIQ and starts the priority computation. Priority sorting for IRQ
and FIQ can execute in parallel. Each IRQ/FIQ priority sorter determines the highest priority interrupt
number. The resulting interrupt number is placed in the corresponding MPU_INTC.INTC_SIR_IRQ[6:0]
ACTIVEIRQ field or MPU_INTC.INTC_SIR_FIQ[6:0] ACTIVEFIQ field. The value is preserved until the
corresponding MPU_INTC.INTC_CONTROL NEWIRQAGR or NEWFIQAGR bit is set. Once the interrupting peripheral
device has been serviced and the incoming interrupt deasserted, the user must write to the appropriate
NEWIRQAGR or NEWFIQAGR bit to indicate to the INTC that the interrupt has been handled. If there are any
pending unmasked incoming interrupts for this interrupt request type, the INTC restarts the appropriate
priority sorter; otherwise, the IRQ or FIQ interrupt line is deasserted.
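
The selection rule of one priority sorter can be sketched on the host as a linear scan. The structure
intc_line and the function intc_sort are hypothetical names for a software model; the hardware sorter
is not implemented this way, and the sketch only reproduces the stated result (highest priority wins,
and the highest-numbered line wins a tie).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INTC_NUM_LINES 128

/* Per-line state of a host-side model; field names are illustrative. */
struct intc_line {
    bool    pending;   /* unmasked and asserted, for one request type (IRQ or FIQ) */
    uint8_t priority;  /* 0 = highest priority */
};

/* Returns the interrupt number to service, or -1 if nothing is pending. */
int intc_sort(const struct intc_line lines[INTC_NUM_LINES])
{
    assert(lines);
    int best = -1;
    for (int n = 0; n < INTC_NUM_LINES; n++) {
        if (!lines[n].pending)
            continue;
        /* "<=" lets a later (higher-numbered) line win a priority tie. */
        if (best < 0 || lines[n].priority <= lines[best].priority)
            best = n;
    }
    return best;
}
```

In this model, the returned number corresponds to what the ACTIVEIRQ or ACTIVEFIQ field would hold
after sorting.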


6.1.2 Register Protection


If the MPU_INTC.INTC_PROTECTION[0] PROTECTION bit is set, access to the INTC registers is
restricted to the supervisor mode. Access to the MPU_INTC.INTC_PROTECTION register is always
restricted to privileged mode. For more information, see Section 6.5.1.7, INTC_PROTECTION Register
(offset = 4Ch) [reset = 0h].

6.1.3 Module Power Saving


The INTC provides an auto-idle function in its three clock domains:
• Interface clock
• Functional clock
• Synchronizer clock
The interface clock auto-idle power-saving mode is enabled if the MPU_INTC.INTC_SYSCONFIG[0]
AUTOIDLE bit is set to 1. When this mode is enabled and there is no activity on the bus interface, the
interface clock is disabled internally to the module, thus reducing power consumption. When there is new
activity on the bus interface, the interface clock restarts without any latency penalty. After reset, this mode
is disabled by default.

The functional clock auto-idle power-saving mode is enabled if the MPU_INTC.INTC_IDLE[0] FUNCIDLE bit
is set to 0. When this mode is enabled and there is no active interrupt (IRQ or FIQ interrupt being
processed or generated) and no pending incoming interrupt, the functional clock is disabled internally to
the module, thus reducing power consumption. When a new unmasked incoming interrupt is detected, the
functional clock restarts and the INTC processes the interrupt. If this mode is disabled, the interrupt
latency is reduced by one cycle. After reset, this mode is enabled by default.

The synchronizer clock allows external asynchronous interrupts to be resynchronized before they are
masked. The synchronizer input clock has an auto-idle power-saving mode that is enabled if the
MPU_INTC.INTC_IDLE[1] TURBO bit is set to 1. If the auto-idle mode is enabled, the standby power is
reduced, but the IRQ or FIQ interrupt latency increases from four to six functional clock cycles. This
feature can be enabled dynamically according to the requirements of the device. After reset, this mode
is disabled by default.

6.1.4 Error Handling


The following accesses will cause an error:
• Privilege violation (attempt to access PROTECTION register in user mode or any register in user mode
if Protection bit is set)
• Unsupported commands
The following accesses will not cause any error response:
• Access to a non-decoded address
• Write to a read-only register

6.1.5 Interrupt Handling


The IRQ/FIQ interrupt generation takes four INTC functional clock cycles (plus or minus one cycle) if the
MPU_INTC.INTC_IDLE[1] TURBO bit is set to 0. If the TURBO bit is set to 1, the interrupt generation
takes six cycles, but power consumption is reduced while waiting for an interrupt. These latencies can be
reduced by one cycle by disabling functional clock auto-idle (MPU_INTC.INTC_IDLE[0] FUNCIDLE bit set
to 1), but power consumption is increased, so the benefit is minimal.
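
The latency figures above combine as in the following sketch, which returns the nominal cycle count
only (the text allows plus or minus one cycle). The function name is hypothetical; the two parameters
stand for the TURBO and FUNCIDLE bit values.

```c
#include <assert.h>
#include <stdbool.h>

/* Nominal IRQ/FIQ generation latency in INTC functional clock cycles. */
unsigned intc_nominal_latency_cycles(bool turbo, bool funcidle)
{
    unsigned cycles = turbo ? 6u : 4u;  /* TURBO = 1: synchronizer auto-idle enabled */
    if (funcidle)
        cycles -= 1u;                   /* FUNCIDLE = 1: functional clock auto-idle disabled */
    assert(cycles >= 3u && cycles <= 6u);
    return cycles;
}
```

With the reset defaults described in Section 6.1.3 (TURBO = 0, FUNCIDLE = 0), the nominal latency is
four cycles.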
To minimize interrupt latency when an unmasked interrupt occurs, the IRQ or FIQ interrupt is generated
before priority sorting completion. The priority sorting takes 10 functional clock cycles, which is less than
the minimum number of cycles required for the MPU to switch to the interrupt context after reception of the
IRQ or FIQ event.
Any read of the MPU_INTC.INTC_SIR_IRQ or MPU_INTC.INTC_SIR_FIQ register during the priority
sorting process stalls until priority sorting is complete and the relevant register is updated. However, the
delay between the interrupt request being generated and the interrupt service routine being executed is
such that priority sorting always completes before the MPU_INTC.INTC_SIR_IRQ or
MPU_INTC.INTC_SIR_FIQ register is read.


6.2 Basic Programming Model

6.2.1 Initialization Sequence


1. Program the MPU_INTC.INTC_SYSCONFIG register: If necessary, enable the interface clock
autogating by setting the AUTOIDLE bit.
2. Program the MPU_INTC.INTC_IDLE register: If necessary, disable functional clock autogating or
enable synchronizer autogating by setting the FUNCIDLE bit or TURBO bit accordingly.
3. Program the MPU_INTC.INTC_ILRm register for each interrupt line: Assign a priority level and set the
FIQNIRQ bit for an FIQ interrupt (by default, interrupts are mapped to IRQ and priority is 0x0 [highest]).
4. Program the MPU_INTC.INTC_MIRn register: Enable interrupts (by default, all interrupt lines are
masked). NOTE: The MPU_INTC.INTC_MIR_SETn and MPU_INTC.INTC_MIR_CLEARn registers are provided to
simplify masking and unmasking of individual lines. Writing directly to the MPU_INTC.INTC_MIRn
register remains possible for backward compatibility.
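
The four steps can be walked through on a host-side model, as sketched below. Several things here are
assumptions and not taken from this section: the structure layout (it is not the register map), the
mask polarity (a MIR bit of 1 means masked), and the effect of INTC_MIR_CLEARn (clearing the mask bit
of the written line). The priority and FIQ/IRQ selection of a line are kept as separate fields so that
no ILR bit position other than FIQNIRQ needs to be assumed. The choices made in steps 1 and 2 are
examples only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define INTC_NUM_LINES     128u
#define SYSCONFIG_AUTOIDLE (1u << 0)  /* INTC_SYSCONFIG[0] */
#define IDLE_FUNCIDLE      (1u << 0)  /* INTC_IDLE[0] */
#define IDLE_TURBO         (1u << 1)  /* INTC_IDLE[1] */

/* Host-side model only; this is NOT the INTC register layout. */
struct intc_model {
    uint32_t sysconfig;
    uint32_t idle;
    uint8_t  priority[INTC_NUM_LINES];  /* PRIORITY of each ILRm */
    bool     fiq[INTC_NUM_LINES];       /* FIQNIRQ of each ILRm */
    uint32_t mir[4];                    /* assumed: bit = 1 means masked */
};

/* Reset defaults named in the text: IRQ, priority 0x0, all lines masked. */
void intc_model_reset(struct intc_model *m)
{
    memset(m, 0, sizeof *m);
    for (unsigned b = 0; b < 4u; b++)
        m->mir[b] = 0xFFFFFFFFu;
}

/* Steps 1 to 4 for a single interrupt line. */
void intc_model_init_line(struct intc_model *m, unsigned line,
                          uint8_t priority, bool fiq)
{
    assert(line < INTC_NUM_LINES && priority <= 0x7Fu);
    m->sysconfig |= SYSCONFIG_AUTOIDLE;                     /* step 1 (example choice) */
    m->idle &= ~(IDLE_FUNCIDLE | IDLE_TURBO);               /* step 2 (keep reset defaults) */
    m->priority[line] = priority;                           /* step 3 */
    m->fiq[line] = fiq;
    m->mir[line / 32u] &= ~((uint32_t)1u << (line % 32u));  /* step 4: unmask the line */
}
```

On hardware, step 4 would be a single write to the appropriate INTC_MIR_CLEARn register; the model
applies the resulting state change directly.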

6.2.2 INTC Processing Sequence


After the INTC_MIRn and INTC_ILRm registers are configured to enable and assign priorities to incoming
interrupts, each interrupt is processed as described in the following steps. The IRQ and FIQ processing
sequences are very similar; where they differ, the FIQ variant is shown after a '/' character in the
code below.
1. One or more unmasked incoming interrupts (M_IRQ_n signals) are received and IRQ or FIQ outputs
(IRQ/FIQ) are not currently asserted.
2. If the INTC_ILRm[0] FIQNIRQ bit is cleared to 0, the MPU_INTC_IRQ output signal is generated. If the
FIQNIRQ bit is set to 1, the MPU_INTC_FIQ output signal is generated.
3. The INTC performs the priority sorting and updates the INTC_SIR_IRQ[6:0] ACTIVEIRQ
/INTC_SIR_FIQ[6:0] ACTIVEFIQ field with the current interrupt number.
4. During priority sorting, if the IRQ/FIQ is enabled at the host processor side, the host processor
automatically saves the current context and executes the ISR as follows.
The ARM host processor automatically performs the following actions in pseudo code:
    LR = PC + 4                        /* return link */
    SPSR = CPSR                        /* Save CPSR before execution */
    CPSR[5] = 0                        /* Execute in ARM state */
    CPSR[7] = 1                        /* Disable IRQ */
    CPSR[8] = 1                        /* Disable Imprecise Data Aborts */
    CPSR[9] = CP15_reg1_EEbit          /* Endianness on exception entry */
    if interrupt == IRQ then
        CPSR[4:0] = 0b10010            /* Enter IRQ mode */
        if high vectors configured then
            PC = 0xFFFF0018
        else
            PC = 0x00000018            /* execute interrupt vector */
    else if interrupt == FIQ then
        CPSR[4:0] = 0b10001            /* Enter FIQ mode */
        CPSR[6] = 1                    /* Disable FIQ */
        if high vectors configured then
            PC = 0xFFFF001C
        else
            PC = 0x0000001C            /* execute interrupt vector */
    endif


5. The ISR saves the remaining context, identifies the interrupt source by reading the
ACTIVEIRQ/ACTIVEFIQ field, and jumps to the relevant subroutine handler as follows:

CAUTION
The code in steps 5 and 7 is assembly code compatible with ARM
architecture V6 and V7. It was developed for the Texas Instruments Code
Composer Studio tool set. It is a draft version that has been tested only in an
emulated environment.

; INTC_SIR_IRQ/INTC_SIR_FIQ register address
INTC_SIR_IRQ_ADDR/INTC_SIR_FIQ_ADDR .word 0x48200040/0x48200044
; ACTIVEIRQ bit field mask to get only the bit field
ACTIVEIRQ_MASK .equ 0x7F
_IRQ_ISR/_FIQ_ISR:
    ; Save the critical context
    STMFD SP!, {R0-R12, LR}            ; Save working registers and the Link register
    MRS R11, SPSR                      ; Save the SPSR into R11
    ; Get the number of the highest priority active IRQ/FIQ
    LDR R10, INTC_SIR_IRQ_ADDR/INTC_SIR_FIQ_ADDR
    LDR R10, [R10]                     ; Get the INTC_SIR_IRQ/INTC_SIR_FIQ register
    AND R10, R10, #ACTIVEIRQ_MASK      ; Apply the mask to get the active IRQ number
    ; Jump to relevant subroutine handler
    LDR PC, [PC, R10, lsl #2]          ; PC base address points to this instruction + 8
    NOP                                ; To index the table by the PC
    ; Table of handler start addresses
    .word IRQ0handler                  ; For IRQ0 of BANK0
    .word IRQ1handler
    .word IRQ2handler
6. The subroutine handler executes code specific to the peripheral generating the interrupt by handling
the event and deasserting the interrupt condition at the peripheral side.
; IRQ0 subroutine
IRQ0handler:
    ; Save working registers
    STMFD SP!, {R0-R1}
    ; Now read-modify-write the peripheral module status register
    ; to de-assert the M_IRQ_0 interrupt signal
    ; De-Assert the peripheral interrupt
    MOV R0, #0x7                       ; Mask for 3 flags
    LDR R1, MODULE0_STATUS_REG_ADDR    ; Get the address of the module Status Register
    STR R0, [R1]                       ; Clear the 3 flags
    ; Restore working registers
    LDMFD SP!, {R0-R1}
    ; Jump to the end part of the ISR
    B IRQ_ISR_end/FIQ_ISR_end
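
For readers more familiar with C, the table-driven dispatch of step 5 can be sketched as follows. This
is a host-side illustration and not a replacement for the assembly above: the value of INTC_SIR_IRQ is
passed in as a parameter instead of being read from 0x48200040, context saving is omitted, and all
function and variable names (including the example handler) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ACTIVEIRQ_MASK 0x7Fu   /* INTC_SIR_IRQ[6:0] ACTIVEIRQ */
#define INTC_NUM_LINES 128u

typedef void (*intc_handler_t)(void);

/* Table of handler entry points, indexed by interrupt number. */
static intc_handler_t intc_handlers[INTC_NUM_LINES];

/* Hypothetical example handler: counts how often line 0 was serviced. */
unsigned irq0_count = 0;
void irq0_handler(void) { irq0_count++; }

void intc_set_handler(unsigned line, intc_handler_t h)
{
    assert(line < INTC_NUM_LINES);
    intc_handlers[line] = h;
}

/* sir_value: the word read from INTC_SIR_IRQ (or INTC_SIR_FIQ).
 * Calls the registered handler, if any, and returns the active interrupt number. */
unsigned intc_dispatch(uint32_t sir_value)
{
    unsigned line = (unsigned)(sir_value & ACTIVEIRQ_MASK);
    if (intc_handlers[line] != 0)
        intc_handlers[line]();
    return line;
}
```

As in the assembly version, the upper bits of the register value are masked off before the table is
indexed, so the index always stays within the 128 entries.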
