Fast Algorithm
Abstract - In this paper, a new algorithm for the fast computation of a 2-D discrete cosine transform (DCT) is presented. It is shown that the N x N DCT, where N = 2^m, can be computed using only N 1-D DCT's and additions, instead of using 2N 1-D DCT's, as in the conventional row-column approach. Hence the total number of multiplications for the proposed algorithm is only half of that required for the row-column approach, and is also less than that of most other fast algorithms, while the number of additions is almost comparable to that of others. It is also shown that only N/2 1-D DCT modules are required for the hardware parallel implementation of the proposed algorithm. Thus the number of actual multipliers being used is only a quarter of that required for the conventional approach.

I. INTRODUCTION

... for the larger size transforms, the 2-D recursive technique employing the 4 x 4 DCT [17] requires more multiplications than those of [15] and [16].

In this paper, a new fast 2-D DCT algorithm, which may be viewed as a modification and generalization of the 4 x 4 DCT [17], is proposed. It will be shown that the proposed algorithm requires only N 1-D DCT's for the computation of the N x N DCT, where N = 2^m. Hence the number of multiplications required for the proposed algorithm is only half of that required for the conventional approach, which is, in fact, the same number of multiplications as reported in [16]. However, as compared with Duhamel's algorithm [16], the proposed algorithm has the advantage that the computation structure is highly regular and systematic, and only real
298 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, VOL. 38, NO. 3, MARCH 1991
where

$$u(m) = \begin{cases} \ldots, & m = 0, \\ \ldots, & \text{otherwise.} \end{cases}$$

We will neglect the scale factor 4u(m)u(n)/N^2 for convenience. Then, let us define a denormalized form of y_mn as

...

$$B_{mn} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{ij}\cos\frac{(2i+1)m-(2j+1)n}{2N}\pi \tag{5.b}$$

y_mn can be rewritten as

...

Now, we shall show that A_mn and B_mn can be expressed in terms of N 1-D DCT's by some data ordering and manipulations, so that the N x N DCT can be obtained from N separate 1-D DCT's. It is noted that the condition for the kernels of the transforms in (5) to be equivalent to that of 1-D DCT's is that {(2i+1)m + (2j+1)n} should be expressed as (2i+1) multiplied by some integer. To satisfy such a condition, we can see that (2j+1) should be either a multiple of (2i+1) modulo 2N or 2N minus a multiple of (2i+1) modulo 2N, i.e.,

$$(2j+1) = p(2i+1) \bmod 2N \tag{7.a}$$

or

...

$$R_p^a = \{x_{ij(p;a)} : i = 0,1,2,\ldots,N-1,\ j(p;a) = pi + (p-1)/2 \bmod N\}. \tag{10.a}$$

$$R_p^b = \{x_{ij(p;b)} : i = 0,1,2,\ldots,N-1,\ j(p;b) = N-1-pi-(p-1)/2 \bmod N\},$$

However, for the proof we are pursuing, it is necessary to know the exact result of pi + (p-1)/2 divided by N, while only the remainder of the division can be read off from (10). In other words, we need to know the quotient of the division as well as the remainder. Hence, by introducing a new integer sequence q_pi, which is the quotient of pi + (p-1)/2 divided by N, we can rewrite (10) (without
CHO AND LEE: FAST ALGORITHM 299
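The index mappings j(p;a) and j(p;b) of (10) can be enumerated directly. The following is a minimal sketch, assuming p runs over the odd integers 1, 3, ..., N-1; the function name `index_sets` and the variable names are illustrative and do not come from the paper. It checks that the N sets together cover every input sample exactly once and that each pair satisfies the congruence (2j+1) = +/- p(2i+1) mod 2N.

```python
def index_sets(size):
    """Map each odd p in 1..size-1 to (R_p^a, R_p^b) as lists of (i, j) pairs.

    j(p;a) = p*i + (p-1)/2 mod size and j(p;b) = size-1 - j(p;a), as in (10).
    """
    sets = {}
    for p in range(1, size, 2):
        set_a, set_b = [], []
        for i in range(size):
            j = (p * i + (p - 1) // 2) % size
            set_a.append((i, j))
            set_b.append((i, size - 1 - j))
        sets[p] = (set_a, set_b)
    return sets


size = 8
sets = index_sets(size)

# Every (i, j) of the size x size input appears in exactly one set.
covered = sorted(pair for set_a, set_b in sets.values() for pair in set_a + set_b)
is_partition = covered == [(i, j) for i in range(size) for j in range(size)]

# Each pair obeys (2j+1) = p(2i+1) mod 2N (a-sets) or its negative (b-sets).
congruent = all(
    (2 * j + 1 - p * (2 * i + 1)) % (2 * size) == 0
    for p, (set_a, _) in sets.items() for i, j in set_a
) and all(
    (2 * j + 1 + p * (2 * i + 1)) % (2 * size) == 0
    for p, (_, set_b) in sets.items() for i, j in set_b
)

print(is_partition, congruent)
print(sets[7][0])  # (i, j) pairs of R_7^a for N = 8
```

Because N is a power of two, every odd (2i+1) is invertible modulo 2N, which is why the a-sets and b-sets for odd p cover each row exactly once.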
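$$R_7^a = \{x_{03}, x_{12}, x_{21}, x_{30}, x_{47}, x_{56}, x_{65}, x_{74}\} \tag{12.g}$$

$$R_7^b = \{x_{04}, x_{15}, x_{26}, x_{37}, x_{40}, x_{51}, x_{62}, x_{73}\} \tag{12.h}$$

...

$$T_p^a(m,n) = \sum_{x_{ij} \in R_p^a} x_{ij}\cos\frac{(2i+1)m+(2j+1)n}{2N}\pi \tag{15.a}$$

...

$$S_p^a(m,n) = \sum_{x_{ij} \in R_p^a} x_{ij}\cos\frac{(2i+1)m-(2j+1)n}{2N}\pi \tag{15.c}$$

$$S_p^b(m,n) = \sum_{x_{ij} \in R_p^b} x_{ij}\cos\frac{(2i+1)m-(2j+1)n}{2N}\pi \tag{15.d}$$

...

$$T_p^a(m,n) = \begin{cases} \displaystyle\sum_{i=0}^{N-1} x_{ij(p;a)}\cos\frac{(2i+1)(m+np)}{2N}\pi, & \text{when } n \text{ is even,} \qquad \text{(18.a)} \\[2ex] \displaystyle\sum_{i=0}^{N-1} (-1)^{q_{pi}}\, x_{ij(p;a)}\cos\frac{(2i+1)(m+np)}{2N}\pi, & \text{when } n \text{ is odd.} \qquad \text{(18.b)} \end{cases}$$

$$\cdots\ \sum_{i=0}^{N-1} x_{ij(p;b)}\cos\frac{(2i+1)(m-pn)}{2N}\pi,\ \cdots$$

In the same way, by substituting (11.a) into (15.c), we have

$$S_p^a(m,n) = \begin{cases} \displaystyle\sum_{i=0}^{N-1} x_{ij(p;a)}\cos\frac{(2i+1)(m-np)}{2N}\pi, & \text{when } n \text{ is even,} \qquad \text{(19.a)} \\[2ex] \displaystyle\sum_{i=0}^{N-1} (-1)^{q_{pi}}\, x_{ij(p;a)}\cos\frac{(2i+1)(m-np)}{2N}\pi, & \text{when } n \text{ is odd.} \qquad \text{(19.b)} \end{cases}$$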
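The reduction of the 2-D kernel over the set R_p^a to a 1-D DCT-type kernel in i alone can be checked numerically. The sketch below assumes the set definition j(p;a) = pi + (p-1)/2 mod N with quotient q_pi, and compares the sum with kernel cos(((2i+1)m +/- (2j+1)n) pi / 2N) against the sum with kernel cos((2i+1)(m +/- np) pi / 2N) weighted by (-1)^(q_pi * n), which is 1 for even n and (-1)^q_pi for odd n. All function and variable names are illustrative.

```python
import math


def column_and_quotient(size, p, i):
    """Return (j, q) with p*i + (p-1)/2 = q*size + j, i.e. j(p;a) and q_pi."""
    q, j = divmod(p * i + (p - 1) // 2, size)
    return j, q


def kernel_sum_2d(x, size, p, m, n, sign):
    """Sum over R_p^a of x_ij * cos(((2i+1)m + sign*(2j+1)n) * pi / (2*size))."""
    total = 0.0
    for i in range(size):
        j, _ = column_and_quotient(size, p, i)
        angle = ((2 * i + 1) * m + sign * (2 * j + 1) * n) * math.pi / (2 * size)
        total += x[i][j] * math.cos(angle)
    return total


def kernel_sum_1d(x, size, p, m, n, sign):
    """The same quantity written with a kernel that depends on i only."""
    total = 0.0
    for i in range(size):
        j, q = column_and_quotient(size, p, i)
        factor = (-1) ** (q * n)  # +1 when n is even, (-1)**q when n is odd
        angle = (2 * i + 1) * (m + sign * n * p) * math.pi / (2 * size)
        total += factor * x[i][j] * math.cos(angle)
    return total


size = 8
x = [[float((3 * i + 5 * j) % 7 - 3) for j in range(size)] for i in range(size)]

# sign = +1 is the T-type (sum) kernel, sign = -1 the S-type (difference) kernel.
worst = max(
    abs(kernel_sum_2d(x, size, p, m, n, s) - kernel_sum_1d(x, size, p, m, n, s))
    for p in range(1, size, 2)
    for m in range(size)
    for n in range(size)
    for s in (1, -1)
)
print(worst < 1e-9)
```

The agreement follows from 2j+1 = p(2i+1) - 2 q_pi N, so the angle changes only by a multiple of pi that depends on q_pi and n.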
... when n is odd.

and

$$\sum_{i=0}^{N-1} \left(x_{ij(p;a)} + x_{ij(p;b)}\right)\cos\frac{(2i+1)(m-np)}{2N}\pi \tag{22.b}$$

correspond to one of the 1-D DCT's of the data sequence {x_ij(p;a) + x_ij(p;b)}, depending on m and n. That is, by defining

$$f_{pl} = \sum_{i=0}^{N-1} \left(x_{ij(p;a)} + x_{ij(p;b)}\right)\cos\frac{(2i+1)l}{2N}\pi \tag{23}$$

we can see that (22.a) and (22.b) correspond to one of f_pl or -f_pl for some l = 0, 1, 2, ..., N-1. In the case of (21.b), by defining

...

In the above example, the addition operation in terms of the f_pl's and g_pl's for computing the y_mn's looks complicated. However, we shall show that the addition operation can be implemented by butterfly stages as in ordinary fast algorithms for the discrete Fourier transform (DFT) and DCT. Now, in the case n is even, it can be seen that

$$\cos\frac{(2i+1)(m-n(N-p))}{2N}\pi = \pm\cos\frac{(2i+1)(m+np)}{2N}\pi \tag{27}$$

which implies that if
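$$\cdots\ \cos\frac{(2i+1)(m+np)}{2N}\pi = g_{pl} \tag{29.a}$$

then

$$\sum_{i=0}^{N-1} (-1)^{q_{pi}} \left(x_{ij(N-p;a)} - x_{ij(N-p;b)}\right) \cdots$$

...

$$\cdots + (f_{33} + f_{53})\} \tag{30.a}$$

$$y_{26} = \tfrac{1}{2}\{(0 + 0) + (f_{14} - f_{74}) - (f_{34} - f_{54}) - (f_{30} - f_{50})\} \tag{30.d}$$

$$y_{03} = \tfrac{1}{2}\{(g_{13} - g_{75}) + (g_{13} - g_{75}) - (g_{37} + g_{51}) - (g_{37} + g_{51})\} \tag{30.f}$$

$$y_{35} = \tfrac{1}{2}\{(0 + g_{70}) + (g_{12} + g_{76}) \cdots$$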
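The way the y_mn are assembled from the f_pl and g_pl can be sketched end to end. This is a minimal illustration, not the paper's signal flow graph: it assumes the denormalized transform y_mn = sum_i sum_j x_ij cos((2i+1)m pi/2N) cos((2j+1)n pi/2N), takes g_pl to be the 1-D DCT of (-1)^q_pi (x_ij(p;a) - x_ij(p;b)), and replaces the butterfly stages with a plain table lookup that folds any index m +/- np back onto 0..N-1 using the cosine symmetries. The names `dct1d`, `extended`, `dct2d_via_n_1d_dcts`, and `dct2d_direct` are hypothetical.

```python
import math


def dct1d(seq):
    """Denormalized 1-D DCT: out[l] = sum_i seq[i] * cos((2i+1) * l * pi / (2N))."""
    size = len(seq)
    return [
        sum(v * math.cos((2 * i + 1) * l * math.pi / (2 * size)) for i, v in enumerate(seq))
        for l in range(size)
    ]


def extended(coeffs, index):
    """Kernel sum at any integer index, folded onto 0..N-1 by cosine symmetry."""
    size = len(coeffs)
    index %= 4 * size              # the kernel has period 4N in the index
    if index > 2 * size:           # index -> 4N - index leaves the cosine unchanged
        index = 4 * size - index
    sign = 1.0
    if index > size:               # index -> 2N - index flips the sign
        index, sign = 2 * size - index, -1.0
    if index == size:              # cos((2i+1) * pi / 2) = 0 for every i
        return 0.0
    return sign * coeffs[index]


def dct2d_via_n_1d_dcts(x):
    """2-D DCT from N 1-D DCTs: one f and one g sequence per odd p < N."""
    size = len(x)
    f, g = {}, {}
    for p in range(1, size, 2):
        sums, diffs = [], []
        for i in range(size):
            q, ja = divmod(p * i + (p - 1) // 2, size)
            jb = size - 1 - ja
            sums.append(x[i][ja] + x[i][jb])
            diffs.append((-1) ** q * (x[i][ja] - x[i][jb]))
        f[p] = dct1d(sums)         # used for even n
        g[p] = dct1d(diffs)        # used for odd n
    y = [[0.0] * size for _ in range(size)]
    for m in range(size):
        for n in range(size):
            table = f if n % 2 == 0 else g
            y[m][n] = 0.5 * sum(
                extended(table[p], m + n * p) + extended(table[p], m - n * p)
                for p in range(1, size, 2)
            )
    return y


def dct2d_direct(x):
    """Reference: the double sum evaluated term by term."""
    size = len(x)
    c = [[math.cos((2 * i + 1) * k * math.pi / (2 * size)) for i in range(size)]
         for k in range(size)]
    return [[sum(x[i][j] * c[m][i] * c[n][j] for i in range(size) for j in range(size))
             for n in range(size)] for m in range(size)]


x = [[float((3 * i + 5 * j) % 7 - 3) for j in range(8)] for i in range(8)]
fast, ref = dct2d_via_n_1d_dcts(x), dct2d_direct(x)
max_err = max(abs(fast[m][n] - ref[m][n]) for m in range(8) for n in range(8))
print(max_err < 1e-9)
```

Only the calls to `dct1d` involve multiplications by cosines; there are N/2 values of p and two sequences per p, so N 1-D DCT's in total. The zeros returned by `extended` correspond to the 0 terms that appear in the expressions for y_mn.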
Fig. 3. The signal flow graph for the 8 x 8 IDCT. (a) Signal flow graph from y_mn to f_pl, where n is even. (b) Signal flow graph from y_mn to g_pl, where n is odd. (c) Signal flow graph from f_pl and g_pl to x_ij.
TABLE I
COMPARISON OF THE NUMBER OF MULTIPLICATIONS AND ADDITIONS

Number of multiplications
            Conventional   Other fast algorithms    Proposed
Size        algorithm      [14]    [15]    [16]     algorithm
4 x 4            32          24      16      16         16
8 x 8           192         144     104      96         96
16 x 16        1024         768     568     512        512
32 x 32        5120        3840    2840    2560       2560

Number of additions
            Conventional   Other fast algorithms    Proposed
Size        algorithm      [14]    [15]    [16]     algorithm
4 x 4            72          72      70      68         74
8 x 8           464         464     462     484        466
16 x 16        2592        2592    2558    2531       2530
32 x 32       13376       13376   12950   12578      12738
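The multiplication counts for the conventional and proposed algorithms in Table I follow from the number of 1-D DCT's each one uses. The sketch below assumes an N-point 1-D DCT that costs (N/2) log2 N multiplications and (3N/2) log2 N - N + 1 additions; these per-transform counts are an assumption not stated in this section, chosen because they reproduce the conventional row-column entries of the table. The function names are illustrative.

```python
def log2_exact(size):
    """log2 of a power of two, computed without floating point."""
    return size.bit_length() - 1


def mults_1d(size):
    """Assumed multiplications of a fast size-point 1-D DCT."""
    return (size // 2) * log2_exact(size)


def adds_1d(size):
    """Assumed additions of a fast size-point 1-D DCT."""
    return 3 * size // 2 * log2_exact(size) - size + 1


sizes = [4, 8, 16, 32]
conventional_mults = [2 * n * mults_1d(n) for n in sizes]  # 2N 1-D DCT's (row-column)
proposed_mults = [n * mults_1d(n) for n in sizes]          # N 1-D DCT's
conventional_adds = [2 * n * adds_1d(n) for n in sizes]

print(conventional_mults)  # [32, 192, 1024, 5120]
print(proposed_mults)      # [16, 96, 512, 2560]
print(conventional_adds)   # [72, 464, 2592, 13376]
```

The addition count of the proposed algorithm is not modeled here, since it also depends on the pre-addition and butterfly stages.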
... shall show that N/2 1-D DCT modules are sufficient by the use of multiplexers and demultiplexers. In Fig. 1(a), it is seen that the results for f_pl and g_pl remain the same even if the order of addition and 1-D DCT operation is reversed. Hence the signal flow graph in Fig. 1(a) is equivalent to Fig. 5, in which the order of addition and 1-D DCT operation is reversed for the data sets Rt;, R!, Rq, and R$. Now, by using the multiplexers and demultiplexers as shown in Fig. 6, we can reduce the number of 1-D DCT modules to N/2. That is, in the upper part of Fig. 6, while the additions for R: and R! are in progress, 1-D DCT's for R! and Rb can be started. Then, the results of additions for Rf and Rf are sent to 1-D

Fig. 4. The signal flow graph for the 4 x 4 IDCT.
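The claim that the order of addition and 1-D DCT operation can be reversed rests on the linearity of the 1-D DCT. A minimal sketch, using made-up data and a hypothetical `dct1d` helper, compares adding (or subtracting) two sequences before the transform with combining their transforms afterwards:

```python
import math


def dct1d(seq):
    """Denormalized 1-D DCT: out[l] = sum_i seq[i] * cos((2i+1) * l * pi / (2N))."""
    size = len(seq)
    return [
        sum(v * math.cos((2 * i + 1) * l * math.pi / (2 * size)) for i, v in enumerate(seq))
        for l in range(size)
    ]


# Two illustrative data sequences standing in for a pair of input sets.
a = [1.0, -2.0, 0.5, 3.0]
b = [0.25, 4.0, -1.5, 2.0]

add_then_dct = dct1d([u + v for u, v in zip(a, b)])
dct_then_add = [u + v for u, v in zip(dct1d(a), dct1d(b))]

sub_then_dct = dct1d([u - v for u, v in zip(a, b)])
dct_then_sub = [u - v for u, v in zip(dct1d(a), dct1d(b))]

gap = max(
    abs(p - q)
    for p, q in zip(add_then_dct + sub_then_dct, dct_then_add + dct_then_sub)
)
print(gap < 1e-12)
```

Either ordering needs the same transforms of the same length; the choice only affects when each DCT module can start, which is what the multiplexed scheme exploits.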
Fig. 5. Alternate signal flow graph of Fig. 1(a).

Fig. 6. Implementation of Fig. 1(a) with multiplexers and demultiplexers.
... the total number of multiplications required for the N x N DCT is N times that for the 1-D DCT, which is only half of that required for the conventional row-column approach. Hence the number of multiplications is the same as that of the previously reported algorithm [16], which is known to be the best in terms of the number of multiplications, while the number of additions is comparable to others. However, the proposed algorithm has the advantages that it has a regular and systematic structure and requires only real arithmetic, while the algorithm in [16] requires complex arithmetic. Also, in this paper, for the purpose of reducing the hardware complexity of the parallel implementation, an alternative scheme is described with a slight increase in time complexity. The proposed scheme requires N/2 1-D DCT modules, while the direct implementation requires N 1-D DCT modules. Since there are always trade-offs between the chip area and computation time, it is very difficult to compare the performance of the implementations of the fast algorithms. However, considering only the hardware complexity, the proposed algorithm is advantageous in that it requires a very small number of multiplications and has a regular and systematic structure compared to other fast algorithms.

Finally, another important aspect in a VLSI implementation is the precision of the algorithm, i.e., the amount of error due to the fixed-point implementation. This problem is currently under investigation.

REFERENCES

[1] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Trans. Commun., vol. COM-23, pp. 90-93, Jan. 1974.
[2] N. Ahmed and K. R. Rao, Orthogonal Transforms for Digital Signal Processing. New York: Springer-Verlag, 1975.
[3] W. H. Chen, C. H. Smith, and S. C. Fralick, "A fast computational algorithm for discrete cosine transform," IEEE Trans. Commun., vol. COM-25, pp. 1004-1009, Nov. 1977.
[4] M. J. Narashimha and A. M. Peterson, "On the computation of the discrete cosine transform," IEEE Trans. Commun., vol. COM-26, pp. 934-936, June 1978.
[5] M. D. Wagh and H. Ganesh, "A new algorithm for the discrete cosine transform of arbitrary number of points," IEEE Trans. Comput., vol. C-29, pp. 269-277, Apr. 1980.
[6] B. G. Lee, "A new algorithm to compute the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1243-1245, Dec. 1984.
[7] H. S. Hou, "A fast recursive algorithm for computing the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 1455-1461, Oct. 1987.
[8] N. I. Cho and S. U. Lee, "DCT algorithms for VLSI parallel implementation," IEEE Trans. Acoust., Speech, Signal Processing, vol. 38, pp. 121-127, Jan. 1990.
[9] M. Vetterli and H. Nussbaumer, "Simple FFT and DCT algorithms with reduced number of operations," Signal Process., vol. 6, pp. 267-278, Aug. 1984.
[10] P. Duhamel and H. H'Mida, "New 2^n DCT algorithms suitable for VLSI implementation," in Proc. ICASSP'87, pp. 1805-1808, 1987.
[11] J. Makhoul, "A fast cosine transform in one and two dimensions," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 27-34, Feb. 1980.
[12] F. A. Kamangar and K. R. Rao, "Fast algorithms for the 2-D discrete cosine transform," IEEE Trans. Comput., vol. C-31, pp. 899-906, Sept. 1982.
[13] M. A. Haque, "A two-dimensional fast cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 1532-1539, Dec. 1985.
[14] C. Ma, "A fast recursive two dimensional cosine transform," Intelligent Robots and Computer Vision: Seventh in a Series, David P. Casasent, Ed., in Proc. SPIE 1002, pp. 541-548, 1988.
[15] M. Vetterli, "Fast 2-D discrete cosine transform," in Proc. ICASSP'85, Mar. 1985.
[16] P. Duhamel and C. Guillemot, "Polynomial transform computation of 2-D DCT," in Proc. ICASSP'90, pp. 1515-1518, Apr. 1990.
[17] N. I. Cho and S. U. Lee, "A fast 4 x 4 DCT algorithm for the recursive 2-D DCT," IEEE Trans. Acoust., Speech, Signal Processing, submitted for publication.
[18] M.-T. Sun, T.-C. Chen, and A. M. Gottlieb, "VLSI implementation of a 16 x 16 discrete cosine transform," IEEE Trans. Circuits Syst., vol. 36, pp. 610-617, Apr. 1989.

Nam Ik Cho (S'86) received the B.S. and M.S. degrees from Seoul National University, Seoul, Korea, in 1986 and 1988, respectively, in control and instrumentation engineering. He is currently working toward the Ph.D. degree at Seoul National University.
His research interest is in digital signal processing, including adaptive filtering and VLSI implementation.

Sang Uk Lee (S'75-M'79) received the B.S. degree from Seoul National University in 1973, the M.S. degree from Iowa State University in 1976, and the Ph.D. degree from the University of Southern California, Los Angeles, in 1980, all in electrical engineering.
In 1980, he was with General Electric, Lynchburg, VA, and in 1981 he joined the M/A-COM Research Center, Rockville, MD. He is now with the Department of Control and Instrumentation at Seoul National University, where he is an Associate Professor. His current research interests are in the areas of image and speech signal processing, including VLSI and neural computing.
Dr. Lee is a member of Phi Kappa Phi.