Menu

[r976]: / apl2002.tex  Maximize  Restore  History

Download this file

246 lines (186 with data), 10.7 kB

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
%% LyX 1.1 created this file. For more info, see http://www.lyx.org/.
%% Do not edit unless you really know what you are doing.
\documentclass[twocolumn]{article}
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
\makeatletter
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
\providecommand{\LyX}{L\kern-.1667em\lower.25em\hbox{Y}\kern-.125emX\@}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% User specified LaTeX commands.
\usepackage{times}
\usepackage{a4}
\makeatother
\begin{document}
\title{ Vector Pascal an array language for Multimedia code}
\author{Paul Cockshott, University of Glasgow, Imaging Faraday Partnership}
\maketitle
\subsection*{Introduction}
The introduction of SIMD instruction sets\cite{IA32 }\cite{AMD}\cite{Peleg97}\cite{Intel00}
to Personal Computers potentially provides substantial performance increases,
but the ability of most programmers to harness this performance is held back
by two factors. The first is the limited availability of compilers that make
effective use of these instructionsets in a machine independent manner. This
remains the case despite the research efforts to develop compilers for multi-media
instructionsets\cite{Cheong97}\cite{Leupers99}\cite{Krall00}\cite{Sreraman00}.
The second is the fact that most popular programming languages were designed
on the word at a time model of the classic von Neuman computer rather than
the SIMD model.
\subsection*{Data parallelism in Vector Pascal}
Vector Pascal incorporates Iverson's approach to data parallelism. Its aim is
to provide a notation that allows the natural and elegant expression of data
parallel algorithms within a base language that is already familiar to a considerable
body of programmers and combine this with modern compilation techniques.
\subsubsection*{Assignment maps}
Standard Pascal allows assignement of whole arrays. Vector Pascal extends this
to allow consistent use of mixed rank expressions on the right hand side of
an assignment. Given
\textbf{r0:real; r1:array{[}0..7{]} of real; r2:array{[}0..7,0..7{]} of real}
then we can write
%\begin{enumerate}
\textbf{r1:= r2{[}3{]}; \{ in standard Pascal \}}
\textbf{r1:= 1/2; \{ 0.5 to each element of r1 \}}
\textbf{r2:= r1{*}3; \{ 1.5 to every element of r2\}}
\textbf{r2[3]:= r2[4]+r1 ; \{ self explanatory \}}
%\end{enumerate}
The variable on the left hand side of an assignment defines an array context
within which expressions on the right hand side are evaluated. Each array context
has a rank given by the number of dimensions of the array on the left hand side.
A scalar variable has rank 0. Variables occuring in expressions with an array
context of rank \emph{r} must have \emph{r} or fewer dimensions. The \emph{n}
bounds of any \emph{n} dimensional array variable, with \( n\leq r \) occuring
within an expression evaluated in an array context of rank \emph{r} must match
with the rightmost \emph{n} bounds of the array on the left hand side of the
assignment statement.
Where a variable is of lower rank than its array context, the variable is replicated
to fill the array context. Because
the rank of any assignment is constrained by the variable on the left hand side,
no temporary arrays, other than machine registers, need be allocated to store
the intermediate array results of expressions.
Maps are implicitly and promiscously defined on both monadic operators and unary functions.
If \textbf{f} is a function or unary operator
mapping from type \textbf{r} to type \textbf{t} then if \textbf{x} is an array
of \textbf{r} then \textbf{a:=f(x)} assigns an array of \textbf{t} such that
\textbf{a{[}i{]}=f(x{[}i{]})}.
Functions can return any data type whose size is known at compile time, including arrays and records.
A consistent copying semantics is used.
\subsubsection*{Operator Reduction}
Maps take operators and arrays and deliver array results. The {\em reduction }
abstraction takes a dyadic operator and an array and returns a scalar result. It is denoted
by the
functional form \textbackslash{}. Thus if a is an array, \textbackslash{}+a denotes
the sum over the array. More generally \( \setminus \Phi x \) for some dyadic
operator \( \Phi \) means \( x_{0}\Phi (x_{1}\Phi ..(x_{n}\Phi \iota )) \)
where \( \iota \) is the identity element for the operator and the type.
%Thus
%we can write \textbackslash{}+ for \( \sum \), \textbackslash{}{*} for \( \prod \)
%etc.
The dot product of two vectors can thus be written as :
\textbf{x:=\textbackslash{}+(y{*}z);}
Reduction is always performed along
the last array dimension of its argument.
\subsubsection*{Implicit indices}
Assignment maps and reductions involve implicit indices. It can be useful to have access to these.
\paragraph{iota\label{iota}}
The syntactic form \ {\bf iota }{\em i } returns the {\em i}th current implicit index.
%\paragraph{Examples}
Thus given the definitions
\textbf{ v1: array{[}1..3{]}of integer; v2: array{[}0..4{]} of
integer;}
the program fragment
\textbf{v1:=iota 0; }
\textbf{v2:=iota 0 {*}2;}
would set v1 and v2 as follows:
\begin{verbatim}
v1= 1 2 3
v2= 0 2 4 6 8
\end{verbatim}
\paragraph{trans}
The syntactic form {\bf trans\index{trans}}{\em x} transposes a vector matrix, or tensor. It achieves
this by cyclic rotation of the implicit indices\index{indices}\index{implicit indices}.
Thus if \textbf{trans} \emph{e} is evaluated in a context with implicit indices
\textbf{iota} \emph{0}.. \textbf{iota} \emph{n }
then the expression e is evaluated in a context with implicit indices
\textbf{iota}'\emph{0}.. \textbf{iota}'\emph{n}
where
\textbf{iota}'\emph{x} = \textbf{iota} ( (\emph{x+1})\textbf{mod} \emph{n+1})
It should be noted that transposition is generalised to arrays of rank greater
than 2.
\paragraph{Unary operations}
The unary operators supported are \textbf{+, -, {*}, /,
div, not, round, sqrt, sin, cos, tan, abs, ln, ord, chr, succ, pred} and \textbf{@}.
The dyadic operators \textbf{+, -, {*}, /, div} are all extended to unary context
by the insertion of an identity element under the operation.
For sets the notation \textbf{-s} means the
complement of the set \textbf{s}.
\paragraph{Dyadic Operations}
Dyadic operators supported are \textbf{+, +:, -:, -, {*}, /, div, mod , {*}{*},
pow, <, >, >=, <=, =, <>, shr, shl, and, or, in, min, max}. All of these are consistently
extended to operate over arrays. The operators {*}{*}, pow denote exponentiation
and raising to an integer power as in ISO Extended Pascal.
The operators +: and -: exist to support saturated
arithmetic on bytes as supported by the MMX instructionset.
\subsection*{Implementation}
The Vector Pascal compiler is implemented in Java. It uses the ILCG\cite{Cockshott00}(Intermediate Language for Code Generators)
portable code generator generator system.
%A Vector Pascal program is translated into
%an abstract semantic tree implemented as a Java datastructure. The tree is passed
%to a machine generated Java class corresponding to the code generator for the
%target machine.
Code generator classes currently exist for the Intel 486, Pentium
with MMX, and P3 and also the AMD K6.
%Output is assembler code which is assembled
%using the NASM assembler and linked using the gcc loader.
Vector Pascal currently
runs under Windows98 , Windows2000 and Linux. Separate compilation using Turbo
Pascal style units is supported. C calling conventions allow use of
existing libraries.
The code generators follow the pattern matching approach described in\cite{Aho}\cite{Cattel80}and
\cite{graham80}, and are automatically generated from machine specifications
written in ILCG .
% ILCG is a strongly
%typed language which supports vector data types and the mapping of operators over vectors.
%It is well suited to describing SIMD instructionsets. The code generator
%classes export from their interfaces details about the degree of parallelism
%supported for each data-type. This is used by the front end compiler to iterate
%over arrays longer than those supported by the underlying machine. Where supported
%parallelism is unitary, this defaults to iteration over the whole array.
Selection of target machines is by a compile time switch which causes the appropriate
code generator class to be dynamically loaded.
\subsection*{Conclusions}
Vector Pascal provides a new approach to providing a programming environment
for multi-media instructionsets. It borrows abstraction mechanisms that have
a long history of sucessfull use in interpretive programming languages, combining
these with modern compiler techniques to target SIMD instructionsets. It provides
a uniform source language that can target multiple different processors without
the programmer having to think about the target machine. Use of Java as the
implementation language aids portability of the compiler accross operating systems.
Work is underway to
port the BLAS library to Vector Pascal, and to develop an IDE and literate programming
system for it.
\begin{thebibliography}{10} \small
\bibitem{AMD}Advanced Micro Devices, 3DNow! Technology Manual, 1999.
\bibitem{Aho}Aho, A.V., Ganapathi, M, TJiang S.W.K., Code Generation Using Tree Matching
and Dynamic Programming, ACM Trans, Programming Languages and Systems 11, no.4,
1989, pp.491-516.
\bibitem{Cattel80}Cattell R. G. G., Automatic derivation of code generators from machine descriptions,
ACM Transactions on Programming Languages and Systems, 2(2), pp. 173-190, April
1980.
\bibitem{Cheong97}Cheong, G., and Lam, M., An Optimizer for Multimedia Instruction Sets, 2nd SUIF
Workshop, Stanford University, August 1997.
\bibitem{Cockshott00}Cockshott, Paul, Direct Compilation of High Level Languages for Multi-media
Instruction-sets, Department of Computer Science, University of Glasgow, Nov
2000.
\bibitem{graham80} Graham, S.L., Table Driven Code Generation, IEEE Computer, Vol 13, No. 8,
August 1980, pp 25..37.
\bibitem{IA32 }Intel, Intel Architecture Software Developers Manual Volumes 1 and 2, 1999.
\bibitem{Intel00}Intel, Willamette Processor Software Developer's Guide, February, 2000.
\bibitem{Krall00}Krall, A., and Lelait, S., Compilation Techniques for Multimedia Processors,
International Journal of Parallel Programming, Vol. 28, No. 4, pp 347-361, 2000.
\bibitem{Leupers99}Leupers, R., Compiler Optimization for Media Processors, EMMSEC 99/Sweden 1999.
\bibitem{Peleg97}Peleg, A., Wilke S., Weiser U., Intel MMX for Multimedia PCs, Comm. ACM, vol
40, no. 1 1997.
\bibitem{Sreraman00}Srereman, N., and Govindarajan, G., A Vectorizing Compiler for Multimedia Extensions,
International Journal of Parallel Programming, Vol. 28, No. 4, pp 363-400, 2000.
%
\end{thebibliography}
\end{document}
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.