Talk Gatech DSP Compilation 2000

The document discusses the process of compiling DSP programs. It involves 6 main steps: 1) scheduling, 2) functional unit assignment, 3) register move scheduling, 4) register allocation, 5) instruction selection and code generation, and 6) register spill handling. It provides examples of scheduling and assigning operations to functional units for a 6th order IIR filter program. It also describes resolving output dependency and cross-path constraints through register moves and introducing pseudo-registers.


DSP compilation

Weidong Shi
Supervisor: Dr. Kenneth Mackenzie
YAMACRAW PROJECT

Compiling Procedure

1. Scheduling
Constraints:
- precedence
- functional unit capacity
- communication capacity*
- output dependency*
* optional constraint

Functional Unit Assignment

Register Move Scheduling
Constraints:
- functional unit capacity
- communication capacity

7. Final Instruction Selection & Code Generation
- branch scheduling
- prologue
- epilogue
- loop counter
- all other necessary instructions

Pipelining Memory Data Access
Retrieved data (produced results) are immediately used (stored) after being available.

Algorithms have been developed for all steps except step 5, which has to be done by hand.

Scheduling

Simple DSP Specification

#M1 multiplier <opname, pipeline cycle, execution cycle>
M1 = functional_unit { ops: <intmul,1,2>, <floatmul,1,3>; }
D1 = functional_unit { ops: <ldint,1,5>, <stint,1,1>, <intadd,1,1>; }

#shared cross path from register file B
PcrossB = path {
  connections: <RegFileB,M1>, <RegFileB,L1>, <RegFileB,S1>, <RegFileB,D1>;
  capacity: 1;
}

[Figure: block diagram of the DSP showing the functional units (M1, L1, S1, D1, M2, L2, D2, ...), the two register files RegFileA and RegFileB, and Memory.]

The number of functional units and the shared data paths constitute architecture constraints that can be considered in scheduling.
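As a rough illustration, the specification above could be held in memory as plain records. This is only a sketch: the class names `Op`, `FunctionalUnit`, and `Path` and the helper `units_supporting` are hypothetical and are not part of the original tool; only the unit names, operation names, and cycle counts come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    # <opname, pipeline cycle, execution cycle> as written in the specification
    name: str
    pipeline_cycles: int
    execution_cycles: int


@dataclass
class FunctionalUnit:
    name: str
    ops: dict  # opname -> Op


@dataclass
class Path:
    name: str
    connections: list  # (source register file, destination unit) pairs
    capacity: int


def make_unit(name, *ops):
    """Build a functional unit from a list of supported operations."""
    return FunctionalUnit(name, {op.name: op for op in ops})


# Values copied from the specification text; the Python layout is an assumption.
M1 = make_unit("M1", Op("intmul", 1, 2), Op("floatmul", 1, 3))
D1 = make_unit("D1", Op("ldint", 1, 5), Op("stint", 1, 1), Op("intadd", 1, 1))
PcrossB = Path(
    "PcrossB",
    [("RegFileB", unit) for unit in ("M1", "L1", "S1", "D1")],
    capacity=1,
)


def units_supporting(opname, units):
    """Return the names of the units that can execute the given operation."""
    return [unit.name for unit in units if opname in unit.ops]
```

A scheduler could then ask `units_supporting("intadd", [M1, D1])` to obtain the candidate resource set of an integer addition node.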

Flow Graph - 2nd Order IIR

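[Figure: flow graph of the 2nd-order IIR filter, from input x[n] to output y[n], with numbered nodes, delay elements (D), and coefficients b0, b1, b2, a1.]

#control variables used for scheduling
In Node: 1
Out Node: 8
IPB Lower Bound: 3
IPB Upper Bound: 5
T: 14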

Determine Node Resource Set

Assign a set of functional units to each node for allocation.

N1 (X): Integer Multipliers, Sn1 = {M1, M2}
N2 (+): Integer Adders, Sn2 = {L1, S1, L2, S2, D1, D2}
Each node is indexed by a resource s from its set (s ∈ Sn1 for N1, s ∈ Sn2 for N2) and a time step t ∈ T.

Insert new nodes to represent communication operations. A set of data paths is assigned to each of these nodes for allocation.

N1 (X): Integer Multipliers, Sn1 = {M1, M2}
N3: Data Paths, Sn3 = {M1xRegA, M2xRegB}
N4: Data Paths, Sn4 = {RegAxL1, PcrossB, RegAxS1, PcrossA, ...}
N2 (+): Integer Adders, Sn2 = {L1, S1, L2, S2, D1, D2}

Simplified flow graph: node resource sets are reduced, and nodes whose resource sets impose no constraint are deleted.

N1 (X): Integer Multipliers, Sn1 = {M1, M2}
N3: Data Paths, Sn3 = {M1xRegA, M2xRegB}
N4: Data Paths, Sn4 = {PcrossA, PcrossB, PwithinA, PwithinB}
N2 (+): Integer Adders, Sn2 = {L1, S1, L2, S2, D1, D2}

Represent Constraints
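Precedence Constraint

[Figure: multiplier node N1 (X) feeding adder node N2 (+).]

$$\sum_{s \in S_{n2}} \sum_{t \in T} t \, N_{2,s,t} \;-\; \sum_{s \in S_{n1}} \sum_{t \in T} t \, N_{1,s,t} \;\ge\; \sum_{s \in S_{n1}} \sum_{t \in T} C_s \, N_{1,s,t} \;-\; D \cdot IPB$$

C_s: execution cycle of functional unit s
N_{i,s,t} = 1 when node Ni is assigned to resource s at time t; D is the number of delays on the edge.

Consistency Constraint

[Figure: N1 (X) feeds N4, which feeds N2 (+). Integer Multipliers Sn1 = {M1, M2}; Data Paths Sn4 = {PcrossA, PcrossB, PwithinA, PwithinB}; Integer Adders Sn2 = {L1, S1, L2, S2, D1, D2}.]

$$\sum_{t \in T} N_{1,M1,t} = \sum_{t \in T} N_{4,PcrossA,t} + \sum_{t \in T} N_{4,PwithinA,t}$$

$$\sum_{t \in T} N_{1,M2,t} = \sum_{t \in T} N_{4,PcrossB,t} + \sum_{t \in T} N_{4,PwithinB,t}$$

$$\sum_{t \in T} N_{4,PcrossA,t} + \sum_{t \in T} N_{4,PwithinB,t} = \sum_{t \in T} N_{2,L2,t} + \sum_{t \in T} N_{2,S2,t} + \sum_{t \in T} N_{2,D2,t}$$

$$\sum_{t \in T} N_{4,PcrossB,t} + \sum_{t \in T} N_{4,PwithinA,t} = \sum_{t \in T} N_{2,L1,t} + \sum_{t \in T} N_{2,S1,t} + \sum_{t \in T} N_{2,D1,t}$$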

Represent Constraints
Functional Unit Constraint

$$\sum_{i \in Q_t} N_{i,M1,t} \le 1 + 2\,(1 - IPB_i)$$

Q_t: set of nodes with M1 assigned; repeat for each IPB and time t.

$$\sum_{i \in Q_t} N_{i,M2,t} \le 1 + 2\,(1 - IPB_i)$$

Q_t: set of nodes with M2 assigned; repeat for each IPB and time t.

...

Communication Capacity Constraint

$$\sum_{i \in Q_t} N_{i,PcrossA,t} \le 1 + 2\,(1 - IPB_i)$$

Q_t: set of communication nodes with PcrossA assigned; repeat for each IPB and time t.

$$\sum_{i \in Q_t} N_{i,PcrossB,t} \le 1 + 2\,(1 - IPB_i)$$

Q_t: set of communication nodes with PcrossB assigned; repeat for each IPB and time t.

...

Objective Function
Minimize IPB (iteration period bound).
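To make the objective concrete, here is a minimal brute-force sketch rather than the ILP formulation itself. It assumes the usual modulo-scheduling reading of the constraints: for an edge from u to v with D delays, start(v) - start(u) >= C_u - D * IPB, and no two nodes may occupy the same unit in the same time slot modulo IPB. The three-node graph, its cycle counts, and all function names are hypothetical.

```python
from itertools import product


def find_schedule(nodes, edges, ipb, horizon):
    """Search exhaustively for a (unit, start time) choice per node.

    nodes: {name: {"cycles": execution cycles, "units": candidate units}}
    edges: (producer, consumer, delays) triples
    Returns {name: (unit, start)} or None when no schedule fits this IPB.
    """
    names = list(nodes)
    choices = [
        [(unit, t) for unit in nodes[n]["units"] for t in range(horizon)]
        for n in names
    ]
    for combo in product(*choices):
        start = {n: t for n, (_, t) in zip(names, combo)}
        # Precedence: consumer waits for the producer, relaxed by delays * IPB.
        if any(start[v] - start[u] < nodes[u]["cycles"] - d * ipb
               for u, v, d in edges):
            continue
        # Capacity: one node per unit per time slot modulo IPB.
        slots = [(unit, t % ipb) for unit, t in combo]
        if len(set(slots)) == len(slots):
            return dict(zip(names, combo))
    return None


def minimize_ipb(nodes, edges, lower, upper, horizon):
    """Return (smallest feasible IPB, schedule) within [lower, upper]."""
    for ipb in range(lower, upper + 1):
        schedule = find_schedule(nodes, edges, ipb, horizon)
        if schedule is not None:
            return ipb, schedule
    return None


# Hypothetical example: two multiplies share M1 and feed one add,
# whose result returns to N1 through one delay element.
NODES = {
    "N1": {"cycles": 2, "units": ["M1"]},
    "N3": {"cycles": 2, "units": ["M1"]},
    "N2": {"cycles": 1, "units": ["L1", "S1"]},
}
EDGES = [("N1", "N2", 0), ("N3", "N2", 0), ("N2", "N1", 1)]
best = minimize_ipb(NODES, EDGES, lower=1, upper=5, horizon=6)
```

Exhaustive search is only workable for toy graphs; it is used here because it keeps the two constraint checks visible, whereas the slides encode the same conditions as linear constraints for a solver.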

Pipelining Data Access

Pipelining data access to minimize register use.

MPY reg1, rega, reg5
ADD regb, reg2, rege
MPY reg3, reg8, reg6
||ADD regc, 4, regf
ADD reg4, regd, reg8

Hypothetical machine:
- 2 cycle load, 1 cycle add, 2 cycle mul
- two register files, with one load/store unit for each register file
Values to be stored: reg5, reg6, rege, regf
Loaded values: reg1, reg2, reg3, reg4, rega, regb, regc, regd

[Figure: reservation table with Register Source, Register Dest, and LD/ST columns for each register file. LD/ST entries: LD reg1, ST reg5 for the first file; LD rega, ST rege, ST regf for the second file.]

LD reg2 conflicts with ST reg5, and LD regb conflicts with ST rege. Insert a new time slot and move instructions.

Pipelining Data Access

[Figure: updated reservation table. LD/ST entries: LD reg1, ST reg5, LD reg2, ST reg6 for the first file; LD rega, ST rege, LD regb, ST regf for the second file.]

ST reg5 conflicts with LD reg1, and ST regf conflicts with LD regb. ST reg5 replaces LD reg1. Insert a new time slot to take LD reg1 and move instructions.

[Figure: updated reservation table. LD/ST entries: ST reg5, LD reg1, LD rega, ST rege, LD reg2, LD regb, ST regf, ST reg6.]

ST rege conflicts with LD rega, and ST reg6 conflicts with LD reg2. ST rege replaces LD rega. Insert a new time slot to take LD rega and move instructions.

Pipelining Data Access

[Figure: updated reservation table with Register Source, Register Dest, and LD/ST columns for each register file.]

ST regf conflicts with LD rega. ST regf replaces LD rega. Insert a new time slot to take LD rega and move instructions.

[Figure: updated reservation table. LD/ST entries: ST reg5, ST rege, ST regf, LD reg1, ST reg6, LD reg2, LD regb, LD rega.]

ST reg6 conflicts with LD reg1. ST reg6 replaces LD reg1. Insert a new time slot to take LD reg1 and move instructions.

Pipelining Data Access

[Figure: final reservation table. Loads: LD reg1, LD reg2, LD reg3, LD reg4, LD rega, LD regb, LD regc, LD regd. Stores: ST reg5, ST reg6, ST rege, ST regf.]

After pipelining, loaded values (produced values) are immediately used (saved). There is less cycle overhead if only the loads or only the stores are pipelined.
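The conflict detection behind these steps can be sketched as follows. This is an illustration under stated assumptions, not the author's routine: each load is ideally issued `load_latency` cycles before its first use, each store is ideally issued in the cycle its value becomes available, slots wrap around the loop length, and one register file has a single LD/ST unit. The register names and cycle numbers in the example are hypothetical and do not reproduce the tables above.

```python
from collections import defaultdict


def ideal_slots(loads, stores, load_latency, loop_len):
    """Map each loop time slot to the LD/ST operations that want it.

    loads:  {register: cycle in which the loaded value is first used}
    stores: {register: cycle in which the produced value becomes available}
    """
    slots = defaultdict(list)
    for reg, use_cycle in loads.items():
        # Issue the load just early enough for the value to arrive at its use.
        slots[(use_cycle - load_latency) % loop_len].append(("LD", reg))
    for reg, ready_cycle in stores.items():
        # Store the value as soon as it exists so its register is freed.
        slots[ready_cycle % loop_len].append(("ST", reg))
    return dict(slots)


def conflicts(slots):
    """Return the slots requested by more than one LD/ST operation."""
    return {cycle: ops for cycle, ops in slots.items() if len(ops) > 1}


# Hypothetical 4-cycle loop on one register file with a 2-cycle load.
wanted = ideal_slots(
    loads={"r1": 2, "r2": 3},
    stores={"r5": 1, "r6": 3},
    load_latency=2,
    loop_len=4,
)
clashes = conflicts(wanted)
```

Each reported clash is a point where one operation keeps the slot and a new time slot has to be inserted for the other, as in the captions above.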

Allocation of Loop Carried Registers

Logical Register Live Range
ADD reg1, reg2, reg3
||MPY reg4, x, reg5
MV reg5, reg6
||ADD reg6, reg9, reg4
LD reg7
||LD reg8
LD reg2
||LD reg9
ADD reg7, reg8, reg9
||ADD reg3, x, reg1

[Figure: live ranges of logical registers reg1 to reg9 across the loop cycles.]

Hypothetical machine with 6 physical registers. LD: 2 execution cycles; ADD: 1 execution cycle; MPY: 2 execution cycles.

Allocation of Loop Carried Registers

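Physical registers are allocated from left to right for each cycle.

[Figure: live ranges of reg1 to reg9 over cycles 1 to 5, each labeled with its allocated physical register.]

Logical register:   reg1  reg2  reg3  reg4  reg5  reg6  reg7  reg8  reg9
Physical register:  A     B     A     C     D     E     B     F     F

Cycle  regA        regB  regC  regD  regE  regF
1      reg1, reg3  reg2  reg4  reg5  reg6  reg9
3                  reg7                    reg8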
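The left-to-right policy can be sketched as a first-fit search over live ranges. This is an assumption-laden illustration: live ranges are given as sets of loop cycles (a loop-carried range simply contains cycles from both ends of the loop, such as {5, 1}), logical registers are visited in order of their first live cycle, and the live ranges in the example are hypothetical rather than read off the slide.

```python
def allocate(live_ranges, physical):
    """First-fit allocation of physical registers, scanning left to right.

    live_ranges: {logical register: set of cycles in which it is live}
    physical:    physical register names in left-to-right order
    Returns {logical register: physical register}.
    """
    busy = {p: set() for p in physical}
    result = {}
    # Visit logical registers in order of the cycle where they become live.
    for reg in sorted(live_ranges, key=lambda r: (min(live_ranges[r]), r)):
        for p in physical:
            if not busy[p] & live_ranges[reg]:
                busy[p] |= live_ranges[reg]
                result[reg] = p
                break
        else:
            # No register is free for the whole live range: a spill is needed.
            raise RuntimeError(f"no free physical register for {reg}")
    return result


# Hypothetical live ranges in a 5-cycle loop.
LIVE = {
    "reg1": {1, 2},
    "reg2": {1},
    "reg3": {3, 4},
    "reg7": {3, 4, 5},
}
mapping = allocate(LIVE, ["A", "B", "C"])
```

In this example reg1 and reg3 end up sharing A, and reg2 and reg7 share B, which mirrors the kind of reuse shown in the table.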

Register Spill
There is a cost associated with each physical register allocation. Always allocate the physical register with the minimal cost. Sometimes a spill is inevitable, namely when every allocation has a positive cost.

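[Figure: live ranges of reg1 to reg5 over physical registers A to D; the allocation of reg5 is still open (?).]

Cost of allocating A, B, C, D to logical register 5

Register  Cost
A         0
B         reg2 spill cost + reg5 spill cost
C         min(reg3 spill cost, reg5 spill cost)
D         infinity

[Figure: a second live-range example over physical registers C and D; the allocation of reg5 is still open (?).]

Cost of allocating A, B, C, D to logical register 5

Register  Cost
A         0
B         reg2 spill cost
C         min(reg3 spill cost, reg5 spill cost)
D         infinity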
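The selection rule itself is small enough to sketch directly. The function name `choose_register` and the numeric spill costs below are hypothetical; only the rule (pick the minimal cost, and spill when that cost is positive) comes from the slide.

```python
import math


def choose_register(costs):
    """Pick the physical register with the minimal allocation cost.

    costs: {physical register: cost of giving it to the logical register}
    Returns (register, cost, spill_needed); a positive cost means a spill.
    """
    register = min(costs, key=costs.get)
    cost = costs[register]
    return register, cost, cost > 0


# Hypothetical spill costs for the logical registers involved.
SPILL = {"reg2": 3, "reg3": 2, "reg5": 4}

# A free register exists, so no spill is required.
free_case = choose_register({
    "A": 0,
    "B": SPILL["reg2"],
    "C": min(SPILL["reg3"], SPILL["reg5"]),
    "D": math.inf,
})

# Every candidate has a positive cost, so the cheapest spill is taken.
spill_case = choose_register({
    "A": 5,
    "B": SPILL["reg2"],
    "C": min(SPILL["reg3"], SPILL["reg5"]),
    "D": math.inf,
})
```

Using infinity for a register that must not be taken keeps the rule uniform: such a register can never win the minimum unless no finite option exists.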

Register Spill
Each register spill produces at least one extra LD and one extra ST. These new data access operations have to be pipelined before the register allocation routine is executed again.

Pipelining Data Access

Flow Graph-6th Order IIR

[Figure: flow graph of the 6th-order IIR filter with adder (+), multiplier (X), and delay (D) nodes, numbered N0 to N17 and N19 to N24.]

Scheduling (6th Order IIR )

      T1          T2                T3               T4         T5              T6
MUL   19^1 22^1   12^-2 21^1        6^-2 15^0        3^-2 4^-2  11^0 20^1       7^-1 16^0
ADD   8^-3        0^-3 5^-3 17^-2   1^-3 10^-2 24^0             2^-4 9^-2 23^1  13^-2 14^0

In each entry X^y, X is the operation number in the flow graph and y is the iteration number.

Functional Unit Assignment (6th Order IIR )

Assignment 1: this one has many cross-path violations.

      T1     T2     T3     T4     T5     T6
M1    19^1   12^-2  6^-2   3^-2   11^0   7^-1
M2    22^1   21^1   15^0   4^-2   20^1   16^0
L1    8^-3   0^-3   1^-3          2^-4   13^-2
L2           5^-3   10^-2         9^-2   14^0
S1           17^-2  24^0          23^1

Assignment 2: only one cross-path violation (one of the best cases).

      T1     T2     T3     T4     T5     T6
M1    22^1   21^1   6^-2   4^-2   20^1   16^0
M2    19^1   12^-2  15^0   3^-2   11^0   7^-1
[Rows L1, L2, S1, S2: the placement of the ADD operations is not recoverable from the extracted table.]

Communication cost can be minimized when functional units are assigned to operations. Note that assignment 2 does not have a better cycle count than assignment 1, because only one cross-path constraint violation is left for assignment 1 after register moves are inserted to deal with output dependency. The rest of this presentation is based on assignment 1.

New Flow Graph With Pseudo-registers

[Figure: the 6th-order IIR flow graph (nodes N0 to N24) with a pseudo-register, numbered up to reg50, attached to each edge.]

Resolve Violation of Output Dependency and Cross-path Constraint by Register Move

Problem: Output dependency is violated at reg2, reg5, reg6, reg14, reg9, reg15, reg19, reg20, and reg23 (shown as red dashed lines in the previous flow graph).
Solution: Old values are moved to new pseudo-registers (small blue ovals) before they are overwritten.

Problem: At time T3, both operation 10 and operation 15 load values from register file A via the cross-path.
Solution: The value used by operation 10 is pre-fetched to register file B by a register move.
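A cross-path check of this kind can be sketched as a per-cycle count. The sketch assumes that units whose names end in 1 sit on the register file A side and units ending in 2 on the B side (consistent with PcrossB connecting RegFileB to M1, L1, S1, and D1), and that each cross path carries one operand per cycle. The operation list is hypothetical; it only mimics the situation at T3 described above.

```python
from collections import Counter


def side(unit):
    """Return the register-file side of a unit: 'M1' -> 'A', 'L2' -> 'B'."""
    return "A" if unit.endswith("1") else "B"


def cross_path_violations(ops, capacity=1):
    """Find cycles where a cross path is asked to carry too many operands.

    ops: (cycle, unit, source register-file side of each operand) triples
    Returns {(cycle, source side): number of cross-path reads} for overloads.
    """
    use = Counter()
    for cycle, unit, sources in ops:
        for src in sources:
            if src != side(unit):
                # Operand lives in the other register file: needs the cross path.
                use[(cycle, src)] += 1
    return {key: count for key, count in use.items() if count > capacity}


# Hypothetical cycle 3: an add on L2 and a multiply on M2 both read file A.
OPS = [
    (3, "L2", ["A", "B"]),
    (3, "M2", ["A", "B"]),
    (4, "M1", ["A", "A"]),
]
overloads = cross_path_violations(OPS)
```

Moving one of the two file A operands into file B beforehand, as the solution above describes, would change its source side to "B" and clear the overload.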

Schedule Register Moves (6th Order IIR )

[Figure: reservation table over time slots T1 to T8 and functional units M1, M2, L1, L2, S1, S2, D1, D2, holding the scheduled operations and the register moves MV1 to MV10.]

Move   Src    Dest
MV1    reg2   reg37
MV2    reg5   reg43
MV3    reg6   reg44
MV4    reg14  reg45
MV5    reg9   reg46
MV6    reg15  reg47
MV7    reg19  reg48
MV8    reg20  reg49
MV9    reg23  reg50
MV10   reg2   reg39

Coalesce redundant register moves (only ten moves are left after this is done). Schedule the register MVs, inserting new time slots if necessary. The most restrictive register moves are scheduled first.
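One way to read "most restrictive first" is to place the moves with the narrowest legal window before the others. The sketch below takes that reading as an assumption: each move has an inclusive window of cycles in which it may execute, each cycle has a number of idle units that can execute a move, and a move that finds no idle unit is reported so that a new time slot can be inserted for it. The move windows and free-slot counts are hypothetical.

```python
def schedule_moves(moves, free_slots):
    """Greedy placement of register moves, narrowest window first.

    moves:      {move name: (earliest cycle, latest cycle)}, inclusive
    free_slots: {cycle: number of idle units able to execute a move}
    Returns (placement, unplaced); unplaced moves need a new time slot.
    """
    free = dict(free_slots)
    placement, unplaced = {}, []
    by_window = sorted(moves, key=lambda m: (moves[m][1] - moves[m][0], m))
    for name in by_window:
        earliest, latest = moves[name]
        for cycle in range(earliest, latest + 1):
            if free.get(cycle, 0) > 0:
                free[cycle] -= 1
                placement[name] = cycle
                break
        else:
            unplaced.append(name)
    return placement, unplaced


# Hypothetical windows: MV1 and MV3 can only run in cycle 2.
MOVES = {"MV1": (2, 2), "MV2": (1, 3), "MV3": (2, 2)}
FREE = {1: 1, 2: 1, 3: 0}
placed, pending = schedule_moves(MOVES, FREE)
```

Here MV3 is left pending because cycle 2 has a single idle unit, which corresponds to the case where a new time slot must be inserted.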

Assembly 1 - loop kernel (6th Order IIR )

MPY .m1 reg2, reg35, reg12
||MPY .m2 reg39, reg30, reg14
||ADD .l1 reg46, reg21, reg22
||MV .l2x reg23, reg50
||MV .s1x reg19, reg48

MPY .m1 reg37, reg33, reg10
||MPY .m2 reg39, reg36, reg13
||ADD .l1x reg0, reg18, reg1
||ADD .l2x reg8, reg22, reg23
||ADD .s1 reg44, reg47, reg16
||MV .d1 reg2, reg37

MV .l1x reg2, reg39
||MV .s1 reg15, reg47

MPY .m1 reg2, reg26, reg4
||MPY .m2 reg34, reg39, reg11
||ADD .l1 reg1, reg3, reg2
||ADD .l2x reg43, reg16, reg17
||ADD .s1x reg45, reg7, reg15

MPY .m1 reg2, reg25, reg3
||MPY .m2 reg31, reg2, reg8
||MV .l2x reg14, reg45

MV .l1x reg5, reg43
||MV .l2x reg6, reg44

MPY .m1 reg2, reg27, reg5
||MPY .m2 reg39, reg29, reg7
||ADD .l1 reg37, reg50, reg24
||ADD .l2x reg17, reg4, reg18
||ADD .s1x reg12, reg13, reg19

MPY .m1 reg2, reg32, reg9
||MPY .m2 reg39, reg28, reg6
||ADD .l1 reg10, reg49, reg21
||ADD .l2 reg11, reg48, reg20
||MV .s1 reg9, reg46
||MV .s2x reg20, reg49

Eight-cycle assembly based on the previous reservation table.

Assembly 2 (pipelining data access) - loop kernel (6th Order IIR )

[Figure: memory layout. The coefficient arrays are addressed through reg51 and reg52, the input array through reg53, and the output array through reg55.]

MPY .m1 reg2, reg35, reg12
||MPY .m2 reg39, reg30, reg14
||ADD .l1 reg46, reg21, reg22
||MV .l2x reg23, reg50
||MV .s1x reg19, reg48
||LDH .d1 *reg51, reg25
||LDH .d2 *reg52, reg31

MPY .m2 reg39, reg36, reg13
||ADD .l1x reg0, reg18, reg1
||ADD .l2x reg8, reg22, reg23
||ADD .s1 reg44, reg47, reg16
||MV .d1 reg2, reg37

MV .l1x reg2, reg39
||MV .s1 reg15, reg47
||MPY .m1 reg37, reg33, reg10
||LDH .d1 *+reg51[1], reg27
||LDH .d2 *+reg52[1], reg29
||SUB .s2 reg54, 1

MPY .m1 reg2, reg26, reg4
||MPY .m2 reg34, reg39, reg11
||ADD .l1 reg1, reg3, reg2
||ADD .l2x reg43, reg16, reg17
||ADD .s1x reg45, reg7, reg15
||LDH .d1 *+reg51[2], reg32
||LDH .d2 *+reg52[2], reg28
||B .s2

LDH .d1 *+reg51[3], reg35
||LDH .d2 *+reg52[3], reg30

MPY .m1 reg2, reg25, reg3
||MPY .m2 reg31, reg2, reg8
||MV .l2x reg14, reg45
||LDH .d2 *+reg52[4], reg36
||LDH .d1 *reg53++, reg0

MV .l1x reg5, reg43
||MV .l2x reg6, reg44
||LDH .d1 *+reg51[4], reg33

MPY .m1 reg2, reg27, reg5
||MPY .m2 reg39, reg29, reg7
||ADD .l1 reg37, reg50, reg24
||ADD .l2x reg17, reg4, reg18
||ADD .s1x reg12, reg13, reg19
||LDH .d1 *+reg51[5], reg26
||LDH .d2 *+reg52[5], reg34

MPY .m1 reg2, reg32, reg9
||MPY .m2 reg39, reg28, reg6
||ADD .l1 reg10, reg49, reg21
||ADD .l2 reg11, reg48, reg20
||MV .s1 reg9, reg46
||MV .s2x reg20, reg49
||STH .d2 reg24, *reg55++

Pipeline data access (loads and stores) to minimize register use. Retrieved data (produced results) are immediately used (stored) after being available.

Assembly 3 (register allocation) - loop kernel (6th Order IIR )

Logical to physical register map (x: no physical register assigned):

reg0   A0     reg14  B3     reg28  B0     reg42  x
reg1   A0     reg15  A6     reg29  B0     reg43  B7
reg2   A12    reg16  A14    reg30  B0     reg44  A14
reg3   A7     reg17  B5     reg31  B7     reg45  A9
reg4   A0     reg18  B13    reg32  A3     reg46  A3
reg5   A10    reg19  A0     reg33  A5     reg47  A5
reg6   B0     reg20  B8     reg34  B5     reg48  B2
reg7   B6     reg21  A1     reg35  A2     reg49  A11
reg8   B3     reg22  A2     reg36  B1     reg50  A3
reg9   A1     reg23  B1     reg37  A13    reg51  A4
reg10  A8     reg24  A2     reg38  x      reg52  B4
reg11  B6     reg25  A14    reg39  B12    reg53  A15
reg12  A2     reg26  A6     reg40  x      reg54  B14
reg13  B13    reg27  A10    reg41  x      reg55  B15

MPY .m1 A12, A2, A2
||MPY .m2 B12, B0, B3
||ADD .l1 A3, A1, A2
||MV .l2x B1, A3
||MV .s1x A0, B2
||LDH .d1 *A4, A14
||LDH .d2 *B4, B7

MPY .m2 B12, B1, B13
||ADD .l1x A0, B13, A0
||ADD .l2x B3, A2, B1
||ADD .s1 A14, A5, A14
||MV .d1 A12, A13

MV .l1x A12, B12
||MV .s1 A6, A5
||MPY .m1 A13, A5, A8
||LDH .d1 *+A4[1], A10
||LDH .d2 *+B4[1], B0
||SUB .s2 B14, 1

MPY .m1 A12, A6, A0
||MPY .m2 B5, B12, B6
||ADD .l1 A0, A7, A12
||ADD .l2x B7, A14, B5
||ADD .s1x A9, B6, A6
||LDH .d1 *+A4[2], A3
||LDH .d2 *+B4[2], B0
||B .s2

LDH .d1 *+A4[3], A2
||LDH .d2 *+B4[3], B0

MPY .m1 A12, A14, A7
||MPY .m2 B7, A12, B3
||MV .l2x B3, A9
||LDH .d2 *+B4[4], B1
||LDH .d1 *A15++, A0

MV .l1x A10, B7
||MV .l2x B0, A14
||LDH .d1 *+A4[4], A5

MPY .m1 A12, A10, A10
||MPY .m2 B12, B0, B6
||ADD .l1 A13, A3, A2
||ADD .l2x B5, A0, B13
||ADD .s1x A2, B13, A0
||LDH .d1 *+A4[5], A6
||LDH .d2 *+B4[5], B5

MPY .m1 A12, A3, A1
||MPY .m2 B12, B0, B0
||ADD .l1 A8, A11, A1
||ADD .l2 B6, B2, B8
||MV .s1 A1, A3
||MV .s2x B8, A11
||STH .d2 A2, *B15++

Nine-cycle loop kernel after physical register allocation. Note that the prologue and epilogue are not shown.
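Once the register map is known, rewriting the pseudo-register assembly into physical registers is a textual substitution. The helper below is a minimal sketch with a hypothetical name, `rename_registers`; the five map entries and the two instructions are taken from the slides, and any register missing from the map is left unchanged.

```python
import re


def rename_registers(line, reg_map):
    """Replace each pseudo-register (regN) in an assembly line by its mapping."""
    def swap(match):
        name = match.group(0)
        # Leave registers without a mapping untouched.
        return reg_map.get(name, name)

    # \b keeps reg2 from matching inside reg27 or reg24.
    return re.sub(r"\breg\d+\b", swap, line)


# A few entries of the logical-to-physical map used in Assembly 3.
REG_MAP = {
    "reg2": "A12",
    "reg27": "A10",
    "reg5": "A10",
    "reg24": "A2",
    "reg55": "B15",
}

mpy = rename_registers("MPY .m1 reg2, reg27, reg5", REG_MAP)
sth = rename_registers("||STH .d2 reg24, *reg55++", REG_MAP)
```

The word-boundary pattern matters because several pseudo-register names are prefixes of others, for example reg2 and reg24.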
