
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, VOL. 38, NO. 3, MARCH 1991 297

Fast Algorithm and Implementation of 2-D Discrete Cosine Transform
Nam Ik Cho, Student Member, IEEE, and Sang Uk Lee, Member, IEEE

Manuscript received April 25, 1990. This paper was recommended by Associate Editor T. R. Hsing. The authors are with the Department of Control and Instrumentation Engineering, Seoul National University, Seoul 151-742, Korea. IEEE Log Number 9041758. 0098-4094/91/0300-0297$01.00 © 1991 IEEE

Abstract - In this paper, a new algorithm for the fast computation of a 2-D discrete cosine transform (DCT) is presented. It is shown that the N x N DCT, where $N = 2^m$, can be computed using only N 1-D DCT's and additions, instead of using 2N 1-D DCT's, as in the conventional row-column approach. Hence the total number of multiplications for the proposed algorithm is only half of that required for the row-column approach, and is also less than that of most of other fast algorithms, while the number of additions is almost comparable to that of others. It is also shown that only N/2 1-D DCT modules are required for the hardware parallel implementation of the proposed algorithm. Thus the number of actual multipliers being used is only a quarter of that required for the conventional approach.

I. INTRODUCTION

Since DCT approaches the statistically optimal Karhunen-Loeve transform (KLT) for highly correlated signals, it is widely used in digital signal processing, especially for speech and image data compression [1], [2]. Thus many algorithms and VLSI architectures for the fast computation of DCT have been proposed [3]-[10]. For the fast computation of 2-D DCT, the conventional approach is the row-column method. This method requires 2N 1-D DCT's for the computation of the N x N DCT. However, for hardware parallel implementation of the conventional approach, a complicated matrix transposition architecture as well as 2N 1-D DCT modules is required. Thus for more efficient computation or parallel implementation of the 2-D DCT, the algorithms that work directly on the 2-D data set have been introduced [11]-[16]. The most efficient 2-D DCT algorithm appeared in the literature is the direct polynomial approach proposed by Duhamel [16], in which the number of multiplications is reduced to 50% of the conventional approach. On the other hand, the algorithms in [13] and [14] require 75%, and the indirect approach using the polynomial transform FFT and rotation proposed by Vetterli [15] requires between 50% and 75% of the conventional approach. More recently, a fast algorithm for the 4 x 4 DCT is proposed [17]. This algorithm is restricted to the size of 4 x 4 transform, because the derivation is very complicated to be generalized to any $2^m \times 2^m$ cases, where m is a positive integer. Hence it is useful for the computation of larger size transforms only by incorporating with the recursive 2-D DCT technique [14]. In the case of the 4 x 4 DCT, it requires the same number of multiplications as in [15] and [16]. However, for the larger size transforms, the 2-D recursive technique employing the 4 x 4 DCT [17] requires more multiplications than those of [15] and [16].

In this paper, a new fast 2-D DCT algorithm, which may be viewed as a modification and generalization of the 4 x 4 DCT [17], is proposed. It will be shown that the proposed algorithm requires only N 1-D DCT's for the computation of N x N DCT, where $N = 2^m$. Hence the number of multiplications required for the proposed algorithm is only half of that required for the conventional approach, which is, in fact, the same number of multiplications as reported in [16]. However, as compared with Duhamel's algorithm [16], the proposed algorithm has advantage in that the computation structure is highly regular and systematic, and only real arithmetic is required. Also, we shall show that N/2 1-D DCT modules are sufficient for the hardware parallel implementation of the algorithm. Hence the number of actual multipliers being used in hardware implementation is a quarter of that required for the conventional approach. Thus the proposed 2-D DCT algorithm is very suitable for the VLSI implementation.

The rest of the paper is organized as follows. In Section II, we will introduce a new fast 2-D DCT algorithm along with an example for 8 x 8 DCT. Also, the examples for 4 x 4 DCT, 8 x 8 inverse DCT (IDCT), and 4 x 4 IDCT are provided. The comparison of the number of multiplications and additions with other fast algorithms [14]-[16] is also given in this section. In Section III, we shall discuss the parallel implementation of the algorithm. Finally, in Section IV, we give conclusions.

II. THE FAST ALGORITHM AND ITS PARALLEL IMPLEMENTATION

In this section, we shall describe a new fast algorithm for 2-D DCT that requires only half the number of multiplications compared to the conventional row-column method. Also, we shall provide examples for 8 x 8 and 4 x 4 DCT's. The examples for IDCT are also given.

A. A Fast 2-D DCT Algorithm

For a given 2-D data sequence $\{x_{ij}: i,j = 0,1,\ldots,N-1\}$, the 2-D DCT sequence $\{Y_{mn}: m,n = 0,1,\ldots,N-1\}$ is given by

$$Y_{mn} = \frac{4}{N^2}\,u(m)\,u(n)\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{ij}\cos\frac{(2i+1)m}{2N}\pi\,\cos\frac{(2j+1)n}{2N}\pi \tag{1}$$

where

$$u(m) = \begin{cases} 1/\sqrt{2}, & m = 0, \\ 1, & \text{otherwise.} \end{cases}$$

We will neglect the scale factor $4u(m)u(n)/N^2$ for convenience. Then, let us define a denormalized form of $Y_{mn}$ as

$$y_{mn} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{ij}\cos\frac{(2i+1)m}{2N}\pi\,\cos\frac{(2j+1)n}{2N}\pi \tag{2.a}$$

i.e.,

$$Y_{mn} = \frac{4}{N^2}\,u(m)\,u(n)\,y_{mn}. \tag{2.b}$$

The main idea behind the 4 x 4 algorithm proposed in [17] is that the 4 x 4 DCT can be decomposed into four separate four-point 1-D DCT's by using the following relation:

$$\cos\frac{(2i+1)m}{2N}\pi\,\cos\frac{(2j+1)n}{2N}\pi = \frac{1}{2}\left(\cos\frac{(2i+1)m+(2j+1)n}{2N}\pi + \cos\frac{(2i+1)m-(2j+1)n}{2N}\pi\right). \tag{3}$$

In this paper, we shall make use of the above relation for the computation of the N x N DCT, where $N \ge 8$. Using the relation in (3), the N x N DCT can be separated into two transforms as given by

$$y_{mn} = \frac{1}{2}\left\{\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{ij}\cos\frac{(2i+1)m+(2j+1)n}{2N}\pi + \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{ij}\cos\frac{(2i+1)m-(2j+1)n}{2N}\pi\right\}, \quad \text{for } m,n = 0,1,2,\ldots,N-1. \tag{4}$$

For convenience, by defining new transforms $A_{mn}$ and $B_{mn}$ as

$$A_{mn} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{ij}\cos\frac{(2i+1)m+(2j+1)n}{2N}\pi \tag{5.a}$$

$$B_{mn} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{ij}\cos\frac{(2i+1)m-(2j+1)n}{2N}\pi \tag{5.b}$$

$y_{mn}$ can be rewritten as

$$y_{mn} = \frac{1}{2}\,(A_{mn} + B_{mn}). \tag{6}$$

Now, we shall show that $A_{mn}$ and $B_{mn}$ can be expressed in terms of N 1-D DCT's by some data ordering and manipulations, so that the N x N DCT can be obtained from N separate 1-D DCT's. It is noted that the condition for the kernels of the transforms in (5) to be equivalent to that of 1-D DCT's is that $\{(2i+1)m \pm (2j+1)n\}$ should be expressed as $(2i+1)$ multiplied by some integer. To satisfy such a condition, we can see that $(2j+1)$ should be either a multiple of $(2i+1)$ modulo 2N or 2N minus a multiple of $(2i+1)$ modulo 2N, i.e.,

$$(2j+1) = p(2i+1) \bmod 2N \tag{7.a}$$

or

$$(2j+1) = 2N - p(2i+1) \bmod 2N \tag{7.b}$$

where p is an odd integer ranging from 1 to N - 1, because the value of p out of this range yields the same value of j as one of those produced by the p in the range. The relations in (7) are equivalent to

$$j = pi + (p-1)/2 \bmod N \tag{8.a}$$

or

$$j = N - 1 - pi - (p-1)/2 \bmod N, \quad \text{for } p = 1,3,\ldots,N-1. \tag{8.b}$$

It can be easily shown that when i ranges from 0 to N - 1, the N sequences for j obtained by (8) are mutually different. Thus the 2-D input data set can be grouped into N different data sets, each of whose indexes satisfies the relations in (8). Then, we can see that the kernel of the transforms for each of these data sets is equivalent to that of 1-D DCT. To distinguish each of the sequences of j obtained by (8.a) or (8.b) for p = 1,3,5,...,N-1, let us denote them as

$$j(p;a) = pi + (p-1)/2 \bmod N \tag{9.a}$$

or

$$j(p;b) = N - 1 - pi - (p-1)/2 \bmod N, \quad \text{for } p = 1,3,\ldots,N-1, \text{ and } i = 0,1,2,\ldots,N-1. \tag{9.b}$$

That is, for given p, $\{j(p;a): i = 0,1,\ldots,N-1\}$ is the sequence of j obtained by (8.a) and $\{j(p;b): i = 0,1,\ldots,N-1\}$ is the sequence of j obtained by (8.b). Hence, by grouping the 2-D input sequence $\{x_{ij}: i,j = 0,1,2,\ldots,N-1\}$ into N 1-D sequences $\{x_{i\,j(p;a)}: i = 0,1,2,\ldots,N-1\}$ and $\{x_{i\,j(p;b)}: i = 0,1,2,\ldots,N-1\}$ for p = 1,3,5,...,N-1, the 1-D transforms in (5) can be expressed as sum of 1-D DCT's. We will denote these 1-D data sequences by $R_p^a$ and $R_p^b$, respectively. Then, they can be expressed as

$$R_p^a = \{x_{i\,j(p;a)}: i = 0,1,2,\ldots,N-1,\; j(p;a) = pi + (p-1)/2 \bmod N\} \tag{10.a}$$

$$R_p^b = \{x_{i\,j(p;b)}: i = 0,1,2,\ldots,N-1,\; j(p;b) = N-1-pi-(p-1)/2 \bmod N\}, \quad \text{for } p = 1,3,5,\ldots,N-1. \tag{10.b}$$

However, for the proof of which we are in pursuit, it is necessary to know the exact result of $pi + (p-1)/2$ divided by N, while only the remainder of the division can be perceived from (10). In other words, we need to know the quotient of the division as well as the remainder. Hence, by introducing a new integer sequence $q_{pi}$, which is a quotient of $pi + (p-1)/2$ divided by N, we can rewrite (10) (without "mod") as

$$R_p^a = \{x_{i\,j(p;a)}: i = 0,1,2,\ldots,N-1,\; j(p;a) = pi + (p-1)/2 - Nq_{pi}\} \tag{11.a}$$

$$R_p^b = \{x_{i\,j(p;b)}: i = 0,1,2,\ldots,N-1,\; j(p;b) = N-1-pi-(p-1)/2 + Nq_{pi}\}, \quad \text{for } p = 1,3,5,\ldots,N-1. \tag{11.b}$$

As an example, for N = 8, since p has the value of 1, 3, 5, and 7, the 8 x 8 2-D data set can be grouped into

$$R_1^a = \{x_{00}, x_{11}, x_{22}, x_{33}, x_{44}, x_{55}, x_{66}, x_{77}\} \tag{12.a}$$

$$R_1^b = \{x_{07}, x_{16}, x_{25}, x_{34}, x_{43}, x_{52}, x_{61}, x_{70}\} \tag{12.b}$$

$$R_3^a = \{x_{01}, x_{14}, x_{27}, x_{32}, x_{45}, x_{50}, x_{63}, x_{76}\} \tag{12.c}$$

$$R_3^b = \{x_{06}, x_{13}, x_{20}, x_{35}, x_{42}, x_{57}, x_{64}, x_{71}\} \tag{12.d}$$

$$R_5^a = \{x_{02}, x_{17}, x_{24}, x_{31}, x_{46}, x_{53}, x_{60}, x_{75}\} \tag{12.e}$$

$$R_5^b = \{x_{05}, x_{10}, x_{23}, x_{36}, x_{41}, x_{54}, x_{67}, x_{72}\} \tag{12.f}$$

$$R_7^a = \{x_{03}, x_{12}, x_{21}, x_{30}, x_{47}, x_{56}, x_{65}, x_{74}\} \tag{12.g}$$

$$R_7^b = \{x_{04}, x_{15}, x_{26}, x_{37}, x_{40}, x_{51}, x_{62}, x_{73}\} \tag{12.h}$$

where the quotient sequences $q_{pi}$'s for p = 1, 3, 5, and 7 are

$$q_{1i} = \{0,0,0,0,0,0,0,0\} \tag{13.a}$$

$$q_{3i} = \{0,0,0,1,1,2,2,2\} \tag{13.b}$$

$$q_{5i} = \{0,0,1,2,2,3,4,4\} \tag{13.c}$$

$$q_{7i} = \{0,1,2,3,3,4,5,6\}. \tag{13.d}$$

Next, for each p, let us define the partial sums of (5.a) and (5.b) over the data sets $R_p^a$ and $R_p^b$ as

$$T_p^a(m,n) = \sum_{x_{ij}\in R_p^a} x_{ij}\cos\frac{(2i+1)m+(2j+1)n}{2N}\pi \tag{15.a}$$

$$T_p^b(m,n) = \sum_{x_{ij}\in R_p^b} x_{ij}\cos\frac{(2i+1)m+(2j+1)n}{2N}\pi \tag{15.b}$$

$$S_p^a(m,n) = \sum_{x_{ij}\in R_p^a} x_{ij}\cos\frac{(2i+1)m-(2j+1)n}{2N}\pi \tag{15.c}$$

$$S_p^b(m,n) = \sum_{x_{ij}\in R_p^b} x_{ij}\cos\frac{(2i+1)m-(2j+1)n}{2N}\pi. \tag{15.d}$$

Then, from (6), $y_{mn}$ can be rewritten as

$$y_{mn} = \sum_{\substack{p=1 \\ (p:\,\text{odd})}}^{N-1} \frac{1}{2}\left\{T_p^a(m,n) + T_p^b(m,n) + S_p^a(m,n) + S_p^b(m,n)\right\}. \tag{16}$$

Thus, in order to show that $y_{mn}$ is the summation of 1-D DCT's, it remains to show that $T_p^a(m,n)$, $T_p^b(m,n)$, $S_p^a(m,n)$, and $S_p^b(m,n)$ can be expressed in terms of 1-D DCT's. In order to do so, by substituting the relation in (11.a) into (15.a), we have

$$T_p^a(m,n) = \sum_{i=0}^{N-1} x_{i\,j(p;a)}\cos\frac{(2i+1)m + \{p(2i+1) - 2Nq_{pi}\}n}{2N}\pi$$

but it can be separated into two cases where n is even or odd, i.e.,

$$T_p^a(m,n) = \sum_{i=0}^{N-1} x_{i\,j(p;a)}\cos\frac{(2i+1)(m+np)}{2N}\pi, \quad \text{when } n \text{ is even} \tag{17.a}$$

$$T_p^a(m,n) = \sum_{i=0}^{N-1} (-1)^{q_{pi}}\,x_{i\,j(p;a)}\cos\frac{(2i+1)(m+np)}{2N}\pi, \quad \text{when } n \text{ is odd.} \tag{17.b}$$

Also, substitution of the relation in (11.b) into (15.b) leads to

$$T_p^b(m,n) = \sum_{i=0}^{N-1} x_{i\,j(p;b)}\cos\frac{(2i+1)(m-np) + 2N(1+q_{pi})n}{2N}\pi$$

but this is also expressed separately depending on n, i.e.,

$$T_p^b(m,n) = \sum_{i=0}^{N-1} x_{i\,j(p;b)}\cos\frac{(2i+1)(m-pn)}{2N}\pi, \quad \text{when } n \text{ is even} \tag{18.a}$$

$$T_p^b(m,n) = -\sum_{i=0}^{N-1} (-1)^{q_{pi}}\,x_{i\,j(p;b)}\cos\frac{(2i+1)(m-np)}{2N}\pi, \quad \text{when } n \text{ is odd.} \tag{18.b}$$

In the same way, by substituting (11.a) into (15.c), we have

$$S_p^a(m,n) = \sum_{i=0}^{N-1} x_{i\,j(p;a)}\cos\frac{(2i+1)(m-np)}{2N}\pi, \quad \text{when } n \text{ is even} \tag{19.a}$$

$$S_p^a(m,n) = \sum_{i=0}^{N-1} (-1)^{q_{pi}}\,x_{i\,j(p;a)}\cos\frac{(2i+1)(m-np)}{2N}\pi, \quad \text{when } n \text{ is odd.} \tag{19.b}$$
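The grouping in (11)-(13) can be checked numerically. The following Python sketch is an illustration only and is not part of the original paper; the function name `group_indices` and its return layout are hypothetical choices. It builds the index pairs of each data set $R_p^a$ and $R_p^b$ together with the quotient sequences $q_{pi}$, assuming N is a power of two.

```python
def group_indices(N):
    """Build the N data sets of (11) and the quotient sequences q_pi.

    Returns (groups, quotients) where
      groups[(p, "a")] / groups[(p, "b")] is the list of (i, j) index pairs
      of R_p^a / R_p^b, and quotients[p] is [q_p0, ..., q_p(N-1)].
    Illustrative helper; N is assumed to be a power of two.
    """
    groups, quotients = {}, {}
    for p in range(1, N, 2):                      # p = 1, 3, ..., N-1
        # q_pi is the quotient of (p*i + (p-1)/2) divided by N
        q = [(p * i + (p - 1) // 2) // N for i in range(N)]
        # (11.a): j(p;a) = p*i + (p-1)/2 - N*q_pi
        ja = [p * i + (p - 1) // 2 - N * q[i] for i in range(N)]
        # (11.b): j(p;b) = N - 1 - p*i - (p-1)/2 + N*q_pi
        jb = [N - 1 - p * i - (p - 1) // 2 + N * q[i] for i in range(N)]
        groups[(p, "a")] = list(zip(range(N), ja))
        groups[(p, "b")] = list(zip(range(N), jb))
        quotients[p] = q
    return groups, quotients


if __name__ == "__main__":
    groups, quotients = group_indices(8)
    for key in sorted(groups):
        print(key, groups[key])
    print(quotients)
```

For N = 8 the printed index pairs can be compared against (12) and (13); the N data sets together cover every position of the N x N input exactly once, which is what allows the 2-D sums in (5) to be split into N one-dimensional sums.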

Also, by substituting (11.b) into (15.d), we have

$$S_p^b(m,n) = \sum_{i=0}^{N-1} x_{i\,j(p;b)}\cos\frac{(2i+1)(m+pn)}{2N}\pi, \quad \text{when } n \text{ is even} \tag{20.a}$$

$$S_p^b(m,n) = -\sum_{i=0}^{N-1} (-1)^{q_{pi}}\,x_{i\,j(p;b)}\cos\frac{(2i+1)(m+np)}{2N}\pi, \quad \text{when } n \text{ is odd.} \tag{20.b}$$

Then, by substituting (17)-(20) into (16), we can express $y_{mn}$ as

$$y_{mn} = \sum_{\substack{p=1 \\ (p:\,\text{odd})}}^{N-1} \frac{1}{2}\left[\sum_{i=0}^{N-1}\left(x_{i\,j(p;a)} + x_{i\,j(p;b)}\right)\cos\frac{(2i+1)(m+np)}{2N}\pi + \sum_{i=0}^{N-1}\left(x_{i\,j(p;a)} + x_{i\,j(p;b)}\right)\cos\frac{(2i+1)(m-np)}{2N}\pi\right], \quad \text{when } n \text{ is even} \tag{21.a}$$

$$y_{mn} = \sum_{\substack{p=1 \\ (p:\,\text{odd})}}^{N-1} \frac{1}{2}\left[\sum_{i=0}^{N-1}(-1)^{q_{pi}}\left(x_{i\,j(p;a)} - x_{i\,j(p;b)}\right)\cos\frac{(2i+1)(m+np)}{2N}\pi + \sum_{i=0}^{N-1}(-1)^{q_{pi}}\left(x_{i\,j(p;a)} - x_{i\,j(p;b)}\right)\cos\frac{(2i+1)(m-np)}{2N}\pi\right], \quad \text{when } n \text{ is odd.} \tag{21.b}$$

But, it can be seen that

$$\sum_{i=0}^{N-1}\left(x_{i\,j(p;a)} + x_{i\,j(p;b)}\right)\cos\frac{(2i+1)(m+np)}{2N}\pi \tag{22.a}$$

and

$$\sum_{i=0}^{N-1}\left(x_{i\,j(p;a)} + x_{i\,j(p;b)}\right)\cos\frac{(2i+1)(m-np)}{2N}\pi \tag{22.b}$$

correspond to one of 1-D DCT's of data sequence $\{x_{i\,j(p;a)} + x_{i\,j(p;b)}\}$ depending on m and n. That is, by defining

$$f_{pl} = \sum_{i=0}^{N-1}\left(x_{i\,j(p;a)} + x_{i\,j(p;b)}\right)\cos\frac{(2i+1)l}{2N}\pi \tag{23}$$

we can see that (22.a) and (22.b) correspond to one of $f_{pl}$ or $-f_{pl}$ for some l = 0,1,2,...,N-1. In the case of (21.b), by defining

$$g_{pl} = \sum_{i=0}^{N-1}(-1)^{q_{pi}}\left(x_{i\,j(p;a)} - x_{i\,j(p;b)}\right)\cos\frac{(2i+1)l}{2N}\pi \tag{24}$$

it can be seen that

$$\sum_{i=0}^{N-1}(-1)^{q_{pi}}\left(x_{i\,j(p;a)} - x_{i\,j(p;b)}\right)\cos\frac{(2i+1)(m+np)}{2N}\pi \tag{25.a}$$

and

$$\sum_{i=0}^{N-1}(-1)^{q_{pi}}\left(x_{i\,j(p;a)} - x_{i\,j(p;b)}\right)\cos\frac{(2i+1)(m-np)}{2N}\pi \tag{25.b}$$

correspond to one of $\pm g_{pl}$ for some l = 0,1,2,...,N-1.

Hence, for the computation of the N x N DCT sequence $\{y_{mn}: m,n = 0,1,2,\ldots,N-1\}$, we need only $\{f_{pl}: l = 0,1,2,\ldots,N-1\}$ and $\{g_{pl}: l = 0,1,2,\ldots,N-1\}$ for p = 1,3,5,...,N-1. This implies that the computation of N x N DCT requires only the computation of N 1-D DCT's.

Now, after $f_{pl}$ and $g_{pl}$ are obtained, let us discuss the additions and other operations required for the computation of N x N DCT. From (21) and the definitions in (23) and (24), it is seen that $y_{mn}$'s are expressed in terms of the summation of $f_{pl}$'s and $g_{pl}$'s. In order to see the relationships that exist between $y_{mn}$ and $f_{pl}$'s, $g_{pl}$'s, for some arbitrarily chosen m and n, let us give some examples for N = 8 as follows:

$$y_{30} = \tfrac{1}{2}\,(f_{13} + f_{13} + f_{33} + f_{33} + f_{53} + f_{53} + f_{73} + f_{73}) \tag{26.a}$$

$$y_{52} = \tfrac{1}{2}\,(f_{17} + f_{13} - f_{35} + f_{31} - f_{51} + f_{55} - f_{73} - f_{77}). \tag{26.b}$$

In the above example, addition operation in terms of $f_{pl}$'s and $g_{pl}$'s for computing $y_{mn}$'s looks complicated. However, we shall show that the addition operation can be implemented by butterfly stages as in ordinary fast algorithms for the discrete Fourier transform (DFT) and DCT. Now, in the case n is even, it can be seen that

$$\cos\frac{(2i+1)(m - n(N-p))}{2N}\pi = \pm\cos\frac{(2i+1)(m+np)}{2N}\pi \tag{27}$$

which implies that if

$$\sum_{i=0}^{N-1}\left(x_{i\,j(p;a)} + x_{i\,j(p;b)}\right)\cos\frac{(2i+1)(m+np)}{2N}\pi = \pm f_{pl} \tag{28.a}$$

for some l = 0,1,2,...,N-1, then

$$\sum_{i=0}^{N-1}\left(x_{i\,j(N-p;a)} + x_{i\,j(N-p;b)}\right)\cos\frac{(2i+1)(m - n(N-p))}{2N}\pi = \pm f_{(N-p)l}. \tag{28.b}$$

In the same way, for n odd, it can be shown that if

$$\sum_{i=0}^{N-1}(-1)^{q_{pi}}\left(x_{i\,j(p;a)} - x_{i\,j(p;b)}\right)\cos\frac{(2i+1)(m+np)}{2N}\pi = \pm g_{pl} \tag{29.a}$$

then

$$\sum_{i=0}^{N-1}(-1)^{q_{(N-p)i}}\left(x_{i\,j(N-p;a)} - x_{i\,j(N-p;b)}\right)\cos\frac{(2i+1)(m - n(N-p))}{2N}\pi = \pm g_{(N-p)(N-l)}. \tag{29.b}$$

These relationships reveal that $f_{pl}$ always appears with $\pm f_{(N-p)l}$ in (26), allowing us to form a butterfly stage. In the case of $g_{pl}$, since it appears with $\pm g_{(N-p)(N-l)}$, we can also form a butterfly stage. For the example of N = 8, (26) can be rewritten as follows:

$$y_{30} = \tfrac{1}{2}\{(f_{13} + f_{73}) + (f_{13} + f_{73}) + (f_{33} + f_{53}) + (f_{33} + f_{53})\} \tag{30.a}$$

$$y_{52} = \tfrac{1}{2}\{(f_{17} - f_{77}) + (f_{13} - f_{73}) - (f_{35} - f_{55}) + (f_{31} - f_{51})\} \tag{30.b}$$

$$y_{34} = \tfrac{1}{2}\{(f_{17} + f_{77}) + (f_{11} + f_{71}) - (f_{31} + f_{51}) - (f_{37} + f_{57})\} \tag{30.c}$$

$$y_{26} = \tfrac{1}{2}\{(0 + 0) + (f_{14} - f_{74}) - (f_{34} - f_{54}) - (f_{30} - f_{50})\} \tag{30.d}$$

$$y_{41} = \tfrac{1}{2}\{(g_{15} + g_{73}) + (g_{13} - g_{75}) + (g_{37} + g_{51}) + (g_{31} - g_{57})\} \tag{30.e}$$

$$y_{03} = \tfrac{1}{2}\{(g_{13} - g_{75}) + (g_{13} - g_{75}) - (g_{37} + g_{51}) - (g_{37} + g_{51})\} \tag{30.f}$$

$$y_{35} = \tfrac{1}{2}\{(0 + g_{70}) + (g_{12} + g_{76}) - (g_{32} + g_{56}) - (g_{34} - g_{54})\} \tag{30.g}$$

$$y_{57} = \tfrac{1}{2}\{-(g_{14} + g_{74}) + (g_{12} - g_{76}) + (g_{36} + g_{52}) - (g_{30} + 0)\}. \tag{30.h}$$

Based on the formations shown above, as an example, the signal flow graph for an 8 x 8 DCT algorithm is shown in Fig. 1. The signal flow graph is separated into three parts for convenience, i.e., Fig. 1(a) is the signal flow graph from $x_{ij}$'s to $f_{pl}$'s and $g_{pl}$'s, Fig. 1(b) is from $f_{pl}$'s to $y_{mn}$'s, where n is even, and Fig. 1(c) is from $g_{pl}$'s to $y_{mn}$'s, where n is odd. From Fig. 1(a), it is seen that the 8 x 8 DCT requires only 8 1-D DCT's. From Fig. 1(b) and (c), it is also seen that the addition operations after the 1-D DCT stages can be implemented in butterfly form. The example for a 4 x 4 DCT is also shown in Fig. 2.

Fig. 1. The signal flow graph for 8 x 8 DCT. (a) Signal flow graph from $x_{ij}$ to $f_{pl}$ and $g_{pl}$. (b) Signal flow graph from $f_{pl}$ to $y_{mn}$ where n is even. (c) Signal flow graph from $g_{pl}$ to $y_{mn}$ where n is odd. Broken lines represent transfer factors -1 and full lines represent unity transfer factor. ○ represents adders and → with 1/2 represents multiplication by 1/2, which is equivalent to shift operation.

In Figs. 1 and 2, since the multiplications by one-half are equivalent to shift operations, the multiplications are required only for the computation of 1-D DCT's. Consequently, the number of multiplications required for N x N DCT is equivalent to that for N 1-D DCT's.
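The construction of (21)-(24) can be exercised end to end with a short numerical experiment. The Python sketch below is illustrative and not taken from the paper: the names `dct1d`, `fold`, `fast_dct2d`, and `direct_dct2d` are hypothetical, the 1-D DCT is evaluated by its defining sum rather than by a fast algorithm such as [6] or [7], and the post-additions are written as plain sums instead of the butterfly stages of Fig. 1. It computes the denormalized $y_{mn}$ of (2.a) from exactly N one-dimensional DCT's and compares the result with the direct double sum.

```python
import math


def dct1d(seq):
    """Denormalized 1-D DCT: out[l] = sum_i seq[i] * cos((2i+1) l pi / 2N)."""
    N = len(seq)
    return [sum(s * math.cos((2 * i + 1) * l * math.pi / (2 * N))
                for i, s in enumerate(seq)) for l in range(N)]


def fold(k, N):
    """Return (sign, l) with cos((2i+1)k pi/2N) = sign * cos((2i+1)l pi/2N).

    l lies in 0..N; l == N means the cosine is zero for every i.
    """
    k = abs(k) % (4 * N)          # the kernel is even and 4N-periodic in k
    if k > 2 * N:
        k = 4 * N - k             # reflection about 2N keeps the sign
    if k > N:
        return -1, 2 * N - k      # reflection about N flips the sign
    return 1, k


def fast_dct2d(x):
    """y_mn of (2.a) for an N x N list of lists x, using N 1-D DCT's."""
    N = len(x)
    f, g = {}, {}
    for p in range(1, N, 2):
        q = [(p * i + (p - 1) // 2) // N for i in range(N)]
        ja = [(p * i + (p - 1) // 2) % N for i in range(N)]
        xa = [x[i][ja[i]] for i in range(N)]           # data set R_p^a
        xb = [x[i][N - 1 - ja[i]] for i in range(N)]   # data set R_p^b
        f[p] = dct1d([a + b for a, b in zip(xa, xb)])                       # (23)
        g[p] = dct1d([(-1) ** q[i] * (xa[i] - xb[i]) for i in range(N)])    # (24)
    y = [[0.0] * N for _ in range(N)]
    for m in range(N):
        for n in range(N):
            table = f if n % 2 == 0 else g             # (21.a) or (21.b)
            acc = 0.0
            for p in range(1, N, 2):
                for k in (m + n * p, m - n * p):
                    sign, l = fold(k, N)
                    if l < N:
                        acc += sign * table[p][l]
            y[m][n] = 0.5 * acc
    return y


def direct_dct2d(x):
    """Reference: the double sum of (2.a), evaluated directly."""
    N = len(x)
    c = [[math.cos((2 * i + 1) * m * math.pi / (2 * N)) for i in range(N)]
         for m in range(N)]
    return [[sum(x[i][j] * c[m][i] * c[n][j]
                 for i in range(N) for j in range(N))
             for n in range(N)] for m in range(N)]
```

The helper `fold` plays the role of the statement that each kernel corresponds to one of $\pm f_{pl}$ or $\pm g_{pl}$: it reduces the integer $m \pm np$ to an index l in 0..N-1 and a sign, and reports the zero terms that appear as "0" in (30). Only the calls to `dct1d` involve cosine multiplications; everything after them is additions and a final halving.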

Fig. 2. The signal flow graph for 4 x 4 DCT. (Input ordering: $x_{00}, x_{11}, x_{22}, x_{33}$; $x_{03}, x_{12}, x_{21}, x_{30}$; $x_{01}, x_{10}, x_{23}, x_{32}$; $x_{02}, x_{13}, x_{20}, x_{31}$.)

B. Signal Flow Graph for IDCT

In the case of orthogonal transforms, if the scale factor is not taken into account, then the signal flow graph for the inverse transform is just the inverse of that for the forward transform. Similarly, in the proposed algorithm, the signal flow graph for the denormalized IDCT can be obtained by simply inverting the forward DCT. However, we need some modifications if we take into account the scale factor as shown in (1). As can be seen from the proposed signal flow graph for the forward transform in Figs. 1 or 2, there are some nodes that do not have their pairs. But this problem can be easily solved by multiplying the node variables by 2 when the direction of the flow is inverted. For example, in Fig. 1(b), it is seen that two nodes in the line of $f_{50}$ and $f_{70}$ do not have their pairs, and thus the node variables should be multiplied by 2 to keep the inverse flow correctly. Also, due to the scale factor required for 1-D IDCT, the zeroth input to 1-D IDCT should be multiplied by one-half, which is equivalent to shift operation. However, in the case of 1-D IDCT's for $g_{pl}$'s, the multiplications by one-half for $g_{p0}$'s and the multiplications by 2 for the node variables are cancelled out by each other. The signal flow graphs for the 8 x 8 and 4 x 4 IDCT's are shown in Figs. 3 and 4, respectively. If one wants to maintain the scale factor for output $x_{ij}$'s correctly, it is necessary to multiply every node variable by one-half or to divide the input $y_{mn}$'s by $N^2/2$. However, in the case of fixed point computation, the former approach is better. In summary, when the output of the forward transform is used as input to these signal flow graphs for IDCT's shown in Figs. 3 and 4, they generate the same data sequence as the input to the forward transform.

C. Comparison with Other Fast 2-D DCT Algorithms

In this section, we will compare the number of multiplications and additions with those of other fast algorithms [14]-[16]. Let the number of multiplications and additions required for the proposed algorithm be M and A, respectively. Previously, we have shown that only N 1-D DCT's are required for the computation of the N x N DCT, and thus the number of multiplications is only half of that required for the conventional row-column approach. In implementing the 1-D DCT's, we may use any existing 1-D DCT algorithms. When the highly efficient 1-D DCT algorithms proposed by Lee [6] or Hou [7] are used in the 1-D DCT computation, which require $(N/2)\log_2 N$ multiplications for an N-point 1-D DCT, the required number of multiplications for the N x N DCT is given by

$$M = (N^2/2)\log_2 N. \tag{31}$$

On the other hand, the required number of additions is the summation of those required for N 1-D DCT's and those for other additions as shown in Fig. 1. In other words, as can be seen from Figs. 1 or 2, we need additions for $(1 + \log_2 N)$ butterfly stages and for the 1-D DCT stage. However, at the last stage, it is observed that the additions for $y_{0j}$'s, $y_{i0}$'s and $y_{(N/2)(N/2)}$ are not required. Also, we can see that $g_{p0}$'s and $f_{p0}$'s, except for $f_{10}$ and $f_{30}$, do not require butterfly pairs. Thus the number of additions, except for those required for 1-D DCT stages, is shown to be $N^2(1 + \log_2 N) - 2N - (N-2)$. Since the number of additions required for 1-D DCT is $(3N/2)\log_2 N - N + 1$ [6], [7], the total number of additions required for the N x N DCT is given by

$$A = \tfrac{5}{2}N^2\log_2 N - 2N + 2. \tag{32}$$

Thus, with the computations according to (31) and (32), we can compare the number of multiplications and additions with those required for other fast 2-D DCT algorithms, such as [14]-[16]. The results are summarized in Table I. It is seen that the proposed algorithm requires the same number of multiplications as in [16], which is the least of all and only one half of that required for the conventional approach, while the number of additions is almost comparable to that of other algorithms.

III. PARALLEL IMPLEMENTATION OF THE PROPOSED ALGORITHM

For VLSI or hardware parallel implementation of an algorithm, reducing the number of multipliers is very important, because they occupy a large area of the chip. Also important considerations are regularity, modularity in the computation structure, and the complexity of data access scheme. In this context, we first describe an implementation scheme which reduces the number of multipliers being used for parallel implementation, and then discuss the problems such as modularity, regularity, and data access scheme of the architecture.

It was shown that the number of multiplications required for the proposed algorithm is equivalent to that required for N 1-D DCT's. Also, it seems that N 1-D DCT modules are required to compute N x N DCT in parallel. However, we
Fig. 3. The signal flow graph for 8 x 8 IDCT. (a) Signal flow graph from $y_{mn}$ to $f_{pl}$, where n is even. (b) Signal flow graph from $y_{mn}$ to $g_{pl}$, where n is odd. (c) Signal flow graph from $f_{pl}$ and $g_{pl}$ to $x_{ij}$.

TABLE I
COMPARISON OF THE NUMBER OF MULTIPLICATIONS AND ADDITIONS

Number of multiplications

Size      Conventional   [14]     [15]     [16]     Proposed
4x4       32             24       16       16       16
8x8       192            144      104      96       96
16x16     1024           768      568      512      512
32x32     5120           3840     2840     2560     2560

Number of additions

Size      Conventional   [14]     [15]     [16]     Proposed
4x4       72             72       70       68       74
8x8       464            464      462      484      466
16x16     2592           2592     2558     2531     2530
32x32     13376          13376    12950    12578    12738
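The "Conventional" and "Proposed" columns of Table I follow directly from the operation counts quoted in Section II-C: $(N/2)\log_2 N$ multiplications and $(3N/2)\log_2 N - N + 1$ additions per N-point 1-D DCT [6], [7], plus $N^2(1+\log_2 N) - 2N - (N-2)$ further additions for the proposed algorithm. The following Python sketch, with hypothetical function names and assuming N is a power of two, reproduces those columns as a consistency check.

```python
def dct1d_counts(N):
    """(multiplications, additions) of one N-point 1-D DCT as in [6], [7]."""
    log2n = N.bit_length() - 1          # exact log2 for a power of two
    return (N // 2) * log2n, (3 * N // 2) * log2n - N + 1


def row_column_counts(N):
    """Conventional row-column approach: 2N 1-D DCT's."""
    mults, adds = dct1d_counts(N)
    return 2 * N * mults, 2 * N * adds


def proposed_counts(N):
    """Proposed algorithm: N 1-D DCT's plus the butterfly additions."""
    log2n = N.bit_length() - 1
    mults, adds = dct1d_counts(N)
    other_adds = N * N * (1 + log2n) - 2 * N - (N - 2)
    return N * mults, N * adds + other_adds


if __name__ == "__main__":
    for N in (4, 8, 16, 32):
        print(N, row_column_counts(N), proposed_counts(N))
```

The multiplication count of the proposed algorithm is half that of the row-column approach for every N, because only the number of 1-D DCT's changes, while the addition counts differ only by the extra butterfly stages.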

shall show that N/2 1-D DCT modules are sufficient by the use of multiplexers and demultiplexers. In Fig. 1(a), it is seen that the results for $f_{pl}$ and $g_{pl}$ remain the same even if the order of addition and 1-D DCT operation is reversed. Hence the signal flow graph in Fig. 1(a) is equivalent to Fig. 5, in which the order of addition and 1-D DCT operation is reversed for four of the data sets $R_p^a$ and $R_p^b$. Now, by using the multiplexers and demultiplexers as shown in Fig. 6, we can reduce the number of 1-D DCT modules to N/2. That is, in the upper part of Fig. 6, while the additions for one pair of data sets are in progress, the 1-D DCT's for another pair can be started. Then, the results of the additions are sent to the 1-D DCT processors. This is possible because the 1-D DCT architectures in [6] and [7] allow us to perform the multiple processing in parallel and pipelined environment. More specifically, since the data move successively in the pipelined structure, we can start the computation of next input data immediately after the current input data. Thus it is noted that the 1-D DCT module need not run at twice the speed, yielding almost the same computation time as compared to that of implementation with N 1-D DCT modules. Similarly, in the lower part of Fig. 6, while the additions for one pair of data sets are being performed, the 1-D DCT's for another pair can also be started. Then, the results of additions are sent to 1-D DCT modules through the multiplexer. The example for the parallel implementation of 4 x 4 DCT is also shown in Fig. 7. From Figs. 6 and 7, it is seen that the number of 1-D DCT modules required for N x N DCT is N/2, which is a quarter of that required for the conventional approach.

Fig. 4. The signal flow graph for the 4 x 4 IDCT.

Fig. 5. Alternate signal flow graph of Fig. 1(a).

Fig. 6. Implementation of Fig. 1(a) with multiplexers and demultiplexers.

Fig. 7. Parallel implementation of the 4 x 4 DCT algorithm.

If we consider modularity and regularity, the proposed implementation scheme has advantage over other fast algorithms such as [15] and [16], in which the polynomial transform and the complex arithmetic are required. Hence, the proposed algorithm is believed to be more suitable for the VLSI implementation than other 2-D FDCT algorithms in terms of the number of multipliers, modularity, and regularity. However, there are other problems that should be addressed in the VLSI implementation. For example, like most of other fast algorithms, the proposed algorithm requires more complicated data access scheme and computation structure than the simple matrix-vector (row-column) approach [18]. But the matrix-vector approach results in the largest chip area. There are always trade-offs between the chip area and computation time, and hence there exist many variations in the implementations. Conclusively, it is very difficult to determine an appropriate criterion for the VLSI implementation of the algorithms. However, this problem is beyond the scope of this paper.

IV. CONCLUSIONS

In this paper, a fast algorithm for the 2-D DCT is proposed. It is shown that the N x N DCT is obtained from N 1-D DCT's with some additions and shift operations. Thus the total number of multiplications required for the N x N DCT is N times that for the 1-D DCT, which is only half of that required for the conventional row-column approach. Hence the number of multiplications is the same as that of previously reported algorithm [16], which is known to be the best in terms of the number of multiplications, while the number of additions is comparable to others. However, the proposed algorithm has advantages that it has regular and systematic structure, and requires only real arithmetic, while the algorithm in [16] requires complex arithmetic. Also, in this paper, for the purpose of reducing the hardware complexity for the parallel implementation, an alternative scheme is described with slight increase in time complexity. The proposed scheme requires N/2 1-D DCT modules, while the direct implementation requires N 1-D DCT modules. Since there are always trade-offs between the chip area and computation time, it is very difficult to compare the performance of the implementation of the fast algorithms. However, considering only the hardware complexity, the proposed algorithm is advantageous in that it requires very small number of multiplications and has regular and systematic structure compared to other fast algorithms.

Finally, another important aspect in a VLSI implementation is the precision of the algorithm, i.e., the amount of errors due to the fixed-point implementation. This problem is currently under investigation.

REFERENCES

[1] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Trans. Commun., vol. COM-23, pp. 90-93, Jan. 1974.
[2] N. Ahmed and K. R. Rao, Orthogonal Transforms for Digital Signal Processing. New York: Springer-Verlag, 1975.
[3] W. H. Chen, C. H. Smith, and S. C. Fralick, "A fast computational algorithm for discrete cosine transform," IEEE Trans. Commun., vol. COM-25, pp. 1004-1009, Nov. 1977.
[4] M. J. Narashimha and A. M. Peterson, "On the computation of the discrete cosine transform," IEEE Trans. Commun., vol. COM-26, pp. 934-936, June 1978.
[5] M. D. Wagh and H. Ganesh, "A new algorithm for the discrete cosine transform of arbitrary number of points," IEEE Trans. Comput., vol. C-29, pp. 269-277, Apr. 1980.
[6] B. G. Lee, "A new algorithm to compute the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1243-1245, Dec. 1984.
[7] H. S. Hou, "A fast recursive algorithm for computing the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 1455-1461, Oct. 1987.
[8] N. I. Cho and S. U. Lee, "DCT algorithms for VLSI parallel implementation," IEEE Trans. Acoust., Speech, Signal Processing, vol. 38, pp. 121-127, Jan. 1990.
[9] M. Vetterli and H. Nussbaumer, "Simple FFT and DCT algorithms with reduced number of operations," Signal Process., vol. 6, pp. 267-278, Aug. 1984.
[10] P. Duhamel and H. H'Mida, "New $2^n$ DCT algorithms suitable for VLSI implementation," in Proc. ICASSP'87, pp. 1805-1808, 1987.
[11] J. Makhoul, "A fast cosine transform in one and two dimensions," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 27-34, Feb. 1980.
[12] F. A. Kamangar and K. R. Rao, "Fast algorithms for the 2-D discrete cosine transform," IEEE Trans. Comput., vol. C-31, pp. 899-906, Sept. 1982.
[13] M. A. Haque, "A two-dimensional fast cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 1532-1539, Dec. 1985.
[14] C. Ma, "A fast recursive two dimensional cosine transform," Intelligent Robots and Computer Vision: Seventh in a Series, David P. Casasent, Ed., in Proc. SPIE 1002, pp. 541-548, 1988.
[15] M. Vetterli, "Fast 2-D discrete cosine transform," in Proc. ICASSP'85, Mar. 1985.
[16] P. Duhamel and C. Guillemot, "Polynomial transform computation of 2-D DCT," in Proc. ICASSP'90, pp. 1515-1518, Apr. 1990.
[17] N. I. Cho and S. U. Lee, "A fast 4 x 4 DCT algorithm for the recursive 2-D DCT," IEEE Trans. Acoust., Speech, Signal Processing, submitted for publication.
[18] M.-T. Sun, T.-C. Chen, and A. M. Gottlieb, "VLSI implementation of a 16 x 16 discrete cosine transform," IEEE Trans. Circuits Syst., vol. 36, pp. 610-617, Apr. 1989.

Nam Ik Cho (S'86) received the B.S. and M.S. degrees from Seoul National University, Seoul, Korea, in 1986 and 1988, respectively, in control and instrumentation engineering. He is currently working toward the Ph.D. degree at Seoul National University. His research interest is in digital signal processing, including adaptive filtering and VLSI implementation.

Sang Uk Lee (S'75-M'79) received the B.S. degree from Seoul National University in 1973, the M.S. degree from Iowa State University in 1976, and the Ph.D. degree from the University of Southern California, Los Angeles, in 1980, all in electrical engineering. In 1980, he was with General Electric, Lynchburg, VA, and in 1981 he joined the M/A-COM Research Center, Rockville, MD. He is now with the Department of Control and Instrumentation at Seoul National University, where he is an Associate Professor. His current research interests are in the areas of image and speech signal processing, including VLSI and neural computing. Dr. Lee is a member of Phi Kappa Phi.
