LIEN Chun-hung, LAI Eugene, and CHANG Wen-ching
(Dept. of Electrical Engineering, Tamkang University Taipei Chinese 25137)
Fast Computing Scheme of DCT Coefficients for Image Processing
LIEN Chun-hung, LAI Eugene, and CHANG Wen-ching
(Dept. of Electrical Engineering, Tamkang University Taipei Chinese 25137)
Because of the importance of the discrete cosine transform (DCT)in the field of image processing,various algorithms and architectures for 2-D DCT processor design have been proposed. In this paper, a novel fast computing mechanism for 2-D 8×8 DCT and quantization for JPEG or MPEG codec is presented. By an effective judging mechanism for an 8×8 image block, the algorithm will adjust DCT computing time depending on the distribution of an image block. This algorithm costs a few adders, shifters and comparators, but it reduces significantly the number of DCT computing times which dominates the computing performance of image processing. The simulation result shows that the proposed algorithm could save more DCT calculation than conventional DCT and integer DCT, and when quantization parameter is large, such as 32, the performance of proposed algorithm is better than that of small one.
discrete cosine transform; fast computing scheme; image processing; 2-D DCT
The discrete cosine transform (DCT)plays an important role in many image and video processing applications[1], especially in the mobile environment where the limited computing power or battery supply is a significant concern. In mobile or portable applications, the cost of multiplications is typically much more expensive than other operations such as additions, subtractions and binary shifts.
There are many papers discussing the fast implementation of the 1D and 2-D DCT. One category of papers use the method based on the decomposition[2-5].Another group of papers use the so-called direct method which has lower computation complexity but leads to slightly irregular structures[6-9]. Recently,several systolic implementations of DCT processors have been proposed[10-11]. These architectures have a high throughput rate and parallel output.
In this paper, we will present a new algorithm and its corresponding architecture with low arithmetic hardware complexity for fast computation of 2-D 8×8 DCT. The kernel architecture only consists of several adders, shifters and comparators.
Figure 1 Block diagram of an image encoder
The DCT is a transform that can reduce the spatial redundancy and is known to have better energy compaction performance than other transforms[1]. The definition of one dimensional DCT is shown below as Eq. (1)[12]:
The inputted 8×8 pixel blocks are transformed by 2-D DCT to generate 8×8 DCT coefficients, which are quantized for compression. The block diagram is shown in Fig. 1. If we define x(n,m), 0≤n≤7,0 ≤ m ≤ 7 , as pixel-values in an 8×8 pixel block before the DCT, the 2-D 8×8 DCT coefficients F(k,l),0≤k≤7,0≤l≤7, can be computed by Eq. (3).
After the DCT, the DCT coefficients are quantized. These quantized DCT coefficients are scanned in a zig-zag scanning order, as shown in Fig. 2.The zig-zag scan converts the 2-D 8×8 DCT coefficients into a one-dimensional sequence in an approximately ascending spatial frequency order. Since many high-frequency DCT coefficients will be quantized to zeros, there is usually a long stream of zeros at the end of the sequence. This is represented by an end-of-block (EOB)symbol after the last nonzero coefficient to indicate that after this position, all the DCT coefficients in the block are zeros. Fig. 3 shows the distribution of EOB for different quantization parameters. From Fig. 3, we can see that the EOB position decreases when the quantization parameter increases because more DCT coefficients are quantized to zero.
Figure 2 Zig-zag scanning order
This result shows that when the step size of quantization is large, we can be more aggressive in setting the high-frequency DCT coefficients to zeros.
Figure 3 Distribution of EOB for Lena
Depending on the characteristic of the EOB location, we can estimate the EOB location in advance by judging the complexity of input image. If we know the EOB location in advance, we can just calculate several DCT coefficients and neglect high frequency part. The steps shown below exhibit how our proposed algorithm saves the computational effort in DCT.
(3)After the SAD of an 8x8 block is found,compare it with each threshold value to determine which interval the block belongs. The different interval represents the different pixel distribution of an 8x8 block, and the lower threshold value represents the smoother distribution of a block.
(4)From Fig. 4, by judging which interval the block belongs to, the number of calculation times of DCT and quantization can be determined.
Figure 4 Block diagram of the proposed algorithm
If the block is with low complexity, we may just calculate some DCT coefficients, but if the block is with high complexity, we should calculate most of the DCT coefficients by the judgment of our algorithm.
In this paper, 3 quantization values are used for testing the proposed algorithm, and the values are 8, 16 and 32, respectively. In order to evaluate the performance of the proposed algorithm, we use the accumulated times of addition/subtraction,multiplication and shift/round. To compare with other operations of DCT and quantization, the times of multiplication must be particularly mentioned due to its high complexity and long processing time. We use the integer DCT proposed by Ref. [13], which plays an important role in H.263, for the comparison. This integer DCT already reduces much computing complexity from the conventional DCT.
PSNR is also adopted for determining the loss of image quality since the image quality may be reduced in the proposed algorithm, where PSNR is defined as Eq (7).
Table 1 Simulation result of Lena with quantization parameter of 8
where MSE(mean square error)is shown below:
where Xiis the original pixel and Xi' is the pixel after processed by the proposed algorithm.
The PSNR results of Conventional DCT, Integer DCT and Proposed Algorithm are 36.752 dB, 36.901 dB and 36.898 dB respectively.
Table 1 is the simulation result of several algorithms under quantization parameter of 8 and the sample image is the commonly used Lena. It shows that our proposed algorithm saves most in multiplication among these 3 algorithms.
If we use larger quantization parameter, such as 16 or 32, as shown in Fig. 5, it is obvious that our algorithm saves more computing power in multiplication. From the simulation result of PSNR, we also find that the image quality does not degrade significantly. Fig. 6 shows the percentage of saved multiplication with 3 different quantization parameters of 8, 16 and 32 by using the commonly used 4 samples.
Figure 6 Percentage of saved multiplication vs.3 quantization parameters
When quantization parameter is large, the proposed algorithm can save more in multiplication. In addition to the quantization parameter, the distribution of the image also affects the performance of the proposed algorithm. If the content of an image is complicated, the proposed algorithm may save less in multiplication than the simple one.
In this paper, we propose a new algorithm to save the computation in DCT and quantization. Based on the simulation result, the proposed algorithm can substantially reduce the calculation complexity of 2-D 8x8 DCT. The performance of the proposed algorithm is also concerned with the content of image sample.
Besides, based on the simulation result, the degradation of image quality in this algorithm is negligible. But if the image quality is an issue, we can easily adjust the threshold value for obtaining high output image quality. The architecture of the proposed algorithm is effective and practical by using adder,shifter and comparator.
[1]RAO K R, YIP P. Discrete Cosine Transform- Algorithms,Advantage, Applications[M]. New York: Academic Press,1990.
[2]AHMED N. NATARAJSN T, Rao K R. Discrete cosine transform[J]. IEEE Trans Computers, 1974, 23(1): 90-93.
[3]LI W. A new algorithm to compute the DCT and its inverse.IEEE Trans Signal Processing, 1991, 39(6): 1305-1313.
[4]CHAITALI C, JOSEPH J. Systolic architectures for the computation of the discrete Hartley and the discrete cosine transforms based on prime factor decomposition[J]. IEEE Trans Computers, 1990, 39(11): 1359-1368.
[5]SHING C C, KA L H. Fast algorithms for computing the discrete cosine transform[J]. IEEE Trans Circ Syst, 1993,4 4:185-190.
[6]LIU K J R, CHIU C T, KOLAGOTAL K. Optimal unified architecture for the real-time computation of time-recursive discrete sinusoidal transforms[J]. IEEE Trans Circ Syst for Video Tech, 1994, 4(2): 168-180.
[7]CHO N I, LEE S U. Fast algorithm and implementation of 2-D discrete cosine transform[J]. IEEE Trans Circuits System. 1991, 38: 297-305.
[8]DUHAMEL P, GUILLEMOT C. Polynomial transform computation of 2-D DCT[C]//Proc Int Conf Acoustics,Speech, Signal Processing. Albuquerque, NM, USA: IEEE,1990: 1515-1518.
[9]WU H R, PAOLOUI F J. A two-dimensional fast cosine transform algorithm based on Hou’s approach[J]. IEEE Trans Signal Processing, 1991, 39(2): 544-546.
[10]WEIZHEN M. A novel systolic array implementation of DCT DWT and DFT[C]// IEEE Region 10 Conference on Computer and Communication Systems. Guangzhou. IEEE,1990: 211-215.
[11]CHANG L W, WU M C. A unified systolic array for discrete cosine and sine transform[J]. IEEE Trans Signal Processing, 1991, 39: 192-194.
[12]DIMITROV V S, JULLIEN G A, MILLER W C , A new DCT algorithm based on encoding algebraic integers[C]//Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, [S.l]: [s.n], 1998:1377-1380.
[13]SHIN Kuei-tsong, TSAI Chia-yang, HANG Hsueh-min.Real-time implementation of H.263+ using TI TMS320C6201 digital signal processor[C]//Proceedings of the 2003 International Symposium on Circuits and Systems,[S.l]: IEEE, 2003, 2: 900-903.
編 輯 張 俊
2009- 03- 24;
2010- 01- 26
連俊宏(1976- ),男,博士,主要從事影像壓縮及影像處理相關(guān)算法、架構(gòu)及軟硬件設(shè)計方面的研究.
針對影像處理的快速二維離散余弦轉(zhuǎn)換算法
連俊宏,賴友仁,張文清
(淡江大學(xué)電機(jī)工程學(xué)系 中國 臺北 25137)
由于離散余弦轉(zhuǎn)換在影像處理領(lǐng)域之重要性與日俱增,且消耗許多處理器運(yùn)算時間,所以眾多快速二維離散余弦轉(zhuǎn)換算法不斷被發(fā)表。該文提出一個應(yīng)用于JPEG及MPEG圖像處理的快速二維8×8離散余弦轉(zhuǎn)換算法,該算法主要運(yùn)用基本的累加及移位運(yùn)算,快速評估8×8影像區(qū)塊的復(fù)雜程度,可調(diào)整離散余弦轉(zhuǎn)換參數(shù)的計算數(shù)量。該算法只需花費(fèi)少量硬件成本,如比較器、加法器、移位器,便可有效降低離散余弦轉(zhuǎn)換運(yùn)算時間,且在模擬結(jié)果顯示所提出的算法與傳統(tǒng)及整數(shù)離散余弦轉(zhuǎn)換相比,可達(dá)到較快的運(yùn)算速度,且在量化系數(shù)較大的情況下,可得到更好效果。
散余弦轉(zhuǎn)換; 快速轉(zhuǎn)換算法; 影像處理; 二維離散余弦轉(zhuǎn)換
TN301.6
A
10.3969/j.issn.1001-0548.2010.05.010
Figure 5 Percentage of saved multiplication of integer DCT and proposed algorithm when using Lena
date:2009 – 03 – 24;Revised date:2010 – 01 – 26
Biofraphy:LIEN Chun-hung was born in 1976, male, doctor, his research interests include algorithms, architecture, and software/hardware co-design of video compression and video processing systems.