A Fast Algorithm For YCbCr To RGB Conversion


1490 IEEE Transactions on Consumer Electronics, Vol. 53, No. 4, NOVEMBER 2007

A Fast Algorithm for YCbCr to RGB Conversion


Yang Yang, Peng Yuhua, and Liu Zhaoguang

Abstract: YCbCr to RGB conversion with precision reduction is required in many fixed-point applications, e.g. 24-bit YCbCr data is converted to 16-bit RGB data for LCD display. We propose rules for selecting the optimal fixed-point shift and addition operations to replace the floating-point multiplication, so that time consumption is reduced while image quality is maintained. In other words, we show how to find the most economical expression for a given quantization error. Comparison with the conventional methods shows its validity. The algorithm is designed for DSPs but is not restricted to them; at the end of the paper its validity on other platforms such as a general PC is also discussed, with satisfactory results.

Index Terms: YCbCr, RGB, Conversion, Fixed DSPs

I. INTRODUCTION

RGB, YUV and YCbCr [1] are common color models. YUV and YCbCr are adopted in video coding and transmission, while RGB is adopted for display, so conversion between them is unavoidable.
There are several different standards in use. Without loss of generality, we focus on ITU-R Recommendation BT.601-5 [2]. According to this standard, the relationship between RGB and YCbCr is as follows:

\[
\begin{bmatrix} Y \\ Cb \\ Cr \end{bmatrix} =
\begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.331 & 0.499 \\ 0.499 & -0.418 & -0.0813 \end{bmatrix}
\times
\begin{bmatrix} R \\ G \\ B \end{bmatrix} +
\begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix}
\]

\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 1.402 \\ 1 & -0.344 & -0.714 \\ 1 & 1.772 & 0 \end{bmatrix}
\times
\begin{bmatrix} Y \\ Cb - 128 \\ Cr - 128 \end{bmatrix}
\tag{1}
\]

As the only difference between YCbCr and YUV is the dc component 128 while the conversion matrix is the same, we focus on YCbCr in the sequel. According to the standard [2], the range of each component of YCbCr and RGB is given in Table I.

TABLE I
COLOR SPACE OF YCBCR AND RGB
        RGB                  YCbCr
        R     G     B        Y     Cb    Cr
MAX     255   255   255      235   240   240
MIN     0     0     0        16    16    16

There are several conventional methods, including hardware and software methods. Based on a comparison of them, this study lays down a way to find the most economical expression for any given quantization error, which is more appropriate for fixed-point DSPs. It also achieves satisfactory results in other settings, such as on a general PC.

II. CONVENTIONAL METHODS

A. Conventional YCbCr to RGB Conversion Algorithms
1) Direct Method
The direct method is a floating-point library routine followed by quantization according to matrix (1). Its advantage is that it needs no extra memory, while its disadvantage is the large number of floating-point multiplications.
YCbCr to RGB conversion often has to be done on fixed-point DSPs. On these embedded systems, the floating-point operations are converted to fixed-point shift and addition operations. For an 8-bit fixed-point routine, the floating-point formula R=Y+1.402×(Cr-128) is approximated by
R=Y+(Cr-128)+((Cr-128)>>2)+((Cr-128)>>3)+((Cr-128)>>6)+((Cr-128)>>7)
2) Lookup Table Method
The Look-up Table (LUT) method is one of the most efficient methods, especially for embedded systems, so other methods are often compared against LUT to show their efficiency [3]. There are two kinds of LUT methods for YCbCr to RGB conversion: once-check and twice-check LUT. For the 24-bit case, the once-check LUT needs no calculation and just looks up an index table that is 6×256×256×256=100663296 bytes long, which is impractical for embedded systems. For the twice-check LUT, there are two series of tables, corresponding to the higher and lower 4 bits of the YCbCr and RGB data respectively, with 6 tables in each series, so the index table is only 2×6×16×16×16=49152 bytes. First the 8-bit data is split into two 4-bit parts, then the two tables are looked up, and finally the two results are added to get the final result. Compared with the once-check LUT, the twice-check LUT needs extra splitting, shift, and addition operations, so it is relatively slower, but its memory occupancy is much smaller.
Compared with the direct method, LUT needs much less calculation, so it is a practical algorithm and more efficient than any other software algorithm. However, its serious disadvantage is that the table consumes a great deal of memory.
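For comparison with the table-based methods, the direct method of Section II.A.1 and its 8-bit shift-and-add approximation can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names `red_direct` and `red_shift_add`, the rounding, and the clamping to 0..255 are assumptions. The unshifted (Cr-128) term is included so that the coefficient sums to 1 + 1/4 + 1/8 + 1/64 + 1/128 = 1.3984375, close to 1.402 in matrix (1).

```python
def red_direct(y, cr1):
    """Direct method: floating-point R from matrix (1), then quantization.

    Rounding to nearest and clamping to 0..255 are assumptions.
    """
    r = y + 1.402 * (cr1 - 128)
    return max(0, min(255, int(round(r))))


def red_shift_add(y, cr1):
    """Shift-and-add approximation of R for an 8-bit fixed-point routine.

    Python's >> on negative integers is an arithmetic (floor) shift, which
    matches the usual DSP behaviour; in C this is implementation-defined.
    """
    c = cr1 - 128
    r = y + c + (c >> 2) + (c >> 3) + (c >> 6) + (c >> 7)
    return max(0, min(255, r))


if __name__ == "__main__":
    # Largest difference between the two methods over the Table I ranges.
    worst = max(
        abs(red_direct(y, cr) - red_shift_add(y, cr))
        for y in range(16, 236)
        for cr in range(16, 241)
    )
    print("largest difference in R:", worst)
```

Each shift truncates, so the two functions can differ by a few code values; how large a difference is acceptable depends on how many output bits are kept.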
This work was supported by Nature Science Foundation of Shandong Province, China, under Grant Z2004G01 and also supported by scientific and technological project of Shandong Province, China, 2005, under Grant 2005GG3201117.
The authors are with the School of Information Science and Engineering, Shandong University, China. (Phone: 86-531-88361589; e-mail: yangyang@mail.sdu.edu.cn).
Contributed Paper
Manuscript received July 16, 2007 0098 3063/07/$20.00 © 2007 IEEE

3) Hardware Method
This method uses an operational amplifier and a resistor network, or more recently an ASIC, to realize the YCbCr to RGB conversion [5]. Compared with software algorithms, the hardware method is fast in most cases, as it needs just 1 or 2 clock cycles. The disadvantage is that it is not as convenient as a software algorithm. There are many kinds of YCbCr and RGB
formats, so it is rather difficult to find a universal resistor network; an ASIC, meanwhile, costs more and is harder to deploy, especially on platforms such as DSPs or PCs, without changing the original circuits.

III. PROPOSED FAST ALGORITHM FOR YCBCR TO RGB CONVERSION

A. Analysis of YCbCr to RGB Conversion on DSPs
Converting floating-point operations to fixed-point shift and addition operations is a basic method. As there are usually many shift and addition combinations corresponding to one floating-point operation, we should find the most economical one for any given quantization error.
If the YCbCr data uses 8 bits per component and the converted RGB data still uses 8 bits, the quantization error should be within 0.5 bits. If the converted RGB data uses fewer than 8 bits, the permitted quantization error can be larger. For example, if the converted RGB data uses 6 bits, the quantization error can be enlarged to 2 bits; the quantization step is then 4, namely (-2, 2). The RGB value after conversion is the same anywhere within the permitted quantization step, which gives us the freedom to find a more efficient conversion formula.
Fixed-point DSPs are the most commonly used [6]. There are two ways to carry out floating-point multiplication on them. One is to use assembly language. There are special-purpose routine packages for multiplication, which may obtain high efficiency in theory, but such is not the case in practice. One problem is that formula (1) contains fractional coefficients, so the data has to be shifted to integers before and after the calculation, and these shift operations greatly reduce the efficiency. The other problem is that assembly language is tedious for programmers and its portability between platforms is poor.
The other way is to use the C language and let the compiler handle the conversion between fixed-point and floating-point data automatically. The advantage is that programming is convenient and portability is good, so the code is easier to move across platforms. The disadvantage is that the efficiency of C compilers is not high enough.
Based on the discussion above, we propose a fast algorithm for YCbCr to RGB conversion on fixed-point DSPs. It is written in C and obtains the same or better efficiency than assembly language and the other methods; time consumption is reduced while the precision is kept.

B. Steps of Fast Algorithm
Step 1) Determine the precision of the YCbCr and RGB data,
Step 2) Determine the requested quantization error and the permitted quantization step,
Step 3) From the above and the maximum value of the input data, calculate the permitted range of the YCbCr coefficients in the conversion formula,
Step 4) Find the best expression in the above range. In other words, find the number that needs the fewest shift and addition operations,
Step 5) Convert this binary number into a shift and addition combination to get the final algorithm.

C. Example
Take the commonly used 16-bit LCD as an example. The YCbCr data is 24 bits, and the bit numbers for the R, G, and B components are 5, 6, and 5 respectively (denoted R5, G6 and B5 in the sequel). The steps of our algorithm are as follows.
Step 1) Determine that the precision of the YCbCr data is Y8, Cb8, Cr8 and the precision of the RGB data is R5, G6, B5.
Step 2) Determine that the requested quantization error is 3 bits for R5, 2 bits for G6 and 3 bits for B5, namely the permitted quantization step of R is 8, of G is 4 and of B is 8.
Step 3) Consider the formula
R=Y+(1.402+X)×(Cr-128);
X is the variable we are seeking. The range of Cr is 16~240 (see Table I), so the range of Cr-128 is -112~112. Taking the minimum and maximum values -112 and 112 separately, we get the equations
R=Y-(1.402+X)×112;
R=Y+(1.402+X)×112;
The permitted quantization step of R is 8, so the permitted error of R is (-4, 4). X is the only variable above, and we get
-4<112×X<4;
Namely,
-0.035714<X<0.035714
So the permitted range of the coefficient of component R is (1.402-0.035714, 1.402+0.035714), namely (1.366286, 1.437714).
Step 4) Seek the best expression in the above range. As
-112≤Cr-128≤112;
Cr-128 takes 7 bits, so a shift operation cannot exceed 6 bits. Under these conditions, we look for numbers with 6 fractional bits in the range (1.366286, 1.437714). The results are 1.375, 1.390625, 1.40625, 1.421875 and 1.4375, and their binary representations are as follows:
1.375=1+1/4+1/8
1.390625=1+1/4+1/8+1/64
1.40625=1+1/4+1/8+1/32
1.421875=1+1/4+1/8+1/32+1/64
1.4375=1+1/4+1/8+1/16
Of these, the number needing the fewest shifts and additions is 1.375; equivalently, it has the fewest 1 bits in its binary representation.
Step 5) Convert 1.375 into a shift and addition combination to get the final algorithm.
As 1.375=1+1/4+1/8, namely 1.375=1+(1>>2)+(1>>3), the final algorithm is
R=Y+Cr+(Cr>>2)+(Cr>>3);
Cr=Cr1-128; Cr1 is the original data, from 16 to 240.
Applying the same procedure to the G and B components gives the following group of fast formulas:
R=Y+Cr+(Cr>>2)+(Cr>>3);
G=Y-((Cb>>2)+(Cb>>4)+(Cb>>5))-((Cr>>1)+(Cr>>3)+(Cr>>4)+(Cr>>5));
B=Y+Cb+(Cb>>1)+(Cb>>2);
Cr=Cr1-128; Cb=Cb1-128; Cr1 and Cb1 are the original data, with the range 16-240.
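Steps 3 to 5 and the resulting R5G6B5 conversion can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the authors' implementation: the names `shift_add_candidates` and `ycbcr_to_rgb565`, the clamping to 0..255, and the packing of the result into one 16-bit word are all hypothetical choices. In the blue channel the unshifted Cb term is included, so that the coefficient is 1 + 1/2 + 1/4 = 1.75, close to 1.772 in matrix (1).

```python
from fractions import Fraction


def shift_add_candidates(coeff, half_step, max_input=112, frac_bits=6):
    """Steps 3 and 4: list coefficients with `frac_bits` fractional bits that
    lie strictly inside coeff +/- half_step / max_input, fewest terms first.

    Each candidate is (value, shifts); a shift of 0 is the unshifted input,
    a shift of k stands for the term (input >> k).
    """
    coeff = Fraction(str(coeff))
    tol = Fraction(half_step) / max_input
    scale = 1 << frac_bits
    found = []
    for n in range(1, 2 * scale):  # coefficients below 2
        value = Fraction(n, scale)
        if coeff - tol < value < coeff + tol:
            shifts = [frac_bits - k
                      for k in range(frac_bits, -1, -1) if (n >> k) & 1]
            found.append((float(value), shifts))
    return sorted(found, key=lambda c: len(c[1]))


def ycbcr_to_rgb565(y, cb1, cr1):
    """Step 5 applied to all channels, then truncation to 5/6/5 bits.

    Clamping and the 16-bit packing order are assumptions of this sketch.
    Python's >> on negative integers is an arithmetic (floor) shift.
    """
    cb, cr = cb1 - 128, cr1 - 128
    r = y + cr + (cr >> 2) + (cr >> 3)
    g = (y - ((cb >> 2) + (cb >> 4) + (cb >> 5))
           - ((cr >> 1) + (cr >> 3) + (cr >> 4) + (cr >> 5)))
    b = y + cb + (cb >> 1) + (cb >> 2)
    r, g, b = (max(0, min(255, v)) for v in (r, g, b))
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


if __name__ == "__main__":
    for value, shifts in shift_add_candidates(1.402, 4):
        print(value, shifts)
    print(hex(ycbcr_to_rgb565(128, 128, 128)))
```

Calling `shift_add_candidates(1.402, 4)` lists the five candidates of Step 4 with 1.375 first, and `shift_add_candidates(1.772, 4)` puts 1.75 first. With an 8-bit target, where the permitted error is 0.5, `shift_add_candidates(1.402, 0.5)` leaves only 1.40625 = 1 + 1/4 + 1/8 + 1/32, which shows how a tighter error budget forces a longer expression.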
Our algorithm is further illustrated in Fig. 1. On the x axis are the input Cr values, 8 bits long, near the maximum of the input range; on the y axis are the output R values after conversion. Results without quantization are still 8 bits long, while results with quantization keep only the high 5 bits and set the remaining bits to zero. Fig. 1 shows that, without quantization, the direct floating-point results and our non-truncated results differ slightly (red and green curves), whereas with quantization the direct floating-point results and our truncated results are the same (light blue and blue curves, which in fact overlap).

Fig.1. The quantization curve of the input and output data

IV. PERFORMANCE ANALYSIS

A. On DSPs
We compare the direct method, the Twice-Check Lookup Table method and our algorithm for the conversion of Y8Cb8Cr8 to R5G6B5 on different DSPs.
1) On OMAP5910
We run the experiment on the TI OMAP5910 [7] with CCS2.20, using the C language. The results for different optimization levels of the C compiler are shown in Table II.

TABLE II
COMPARISONS OF SOFTWARE ALGORITHMS ON OMAP5910
Optimized   Our algorithm   Twice-Check LUT   Direct float-point   Direct fixed-point
level       (clock)         (clock)           (clock)              (clock)
O1          39              32                1404                 95
O2          39              29                1404                 95
O3          29              25                1357                 68

At the highest optimization level, O3, the direct floating-point method needs 1,357 clock cycles and the direct fixed-point method needs 68 clock cycles, but our algorithm needs only 29 clock cycles; its time consumption is only 1/46.79 and 1/2.3448 of theirs, respectively. The efficiency of our algorithm is almost the same as that of the Twice-Check Lookup Table, but it saves a large amount of memory.

2) On DM642
We also run the experiment on another multimedia processor, the DM642 [8]. Using the O3 optimization level with the speed-first option, the direct floating-point method needs 7,417 clock cycles and the direct fixed-point method needs 51 clock cycles, but our algorithm needs only 22 clock cycles; its time consumption is only 1/337.14 and 1/2.3182 of theirs. Using the O3 optimization level with the speed-ordinary option, the direct floating-point method needs 8,675 clock cycles and the direct fixed-point method needs 57 clock cycles, but our algorithm needs only 25 clock cycles; its time consumption is only 1/347 and 1/2.28 of theirs. The results of these methods are identical.
In conclusion, we compare the different methods used on DSPs in Table III. The hardware method is the most efficient, but its difficulty and costs are the highest. The LUT method needs extra memory, and the direct method lacks efficiency. From this table we conclude that our algorithm fits DSP applications well.

TABLE III
COMPARISON OF DIFFERENT ALGORITHMS
             Time       Difficulty   Costs   Extra Memory
Direct way   Longest    Low          Low     Small
LUT          Small      High         Low     Big
Hardware     Smallest   Highest      High    None
Our          Small      Low          Low     Small

B. On General PC
As a general PC is a convenient platform for simulating other platforms, we also run the experiment on a general PC for the conversion of Y8Cb8Cr8 to R8G8B8. Following the proposed algorithm, we obtain the formulas
R = Y + Cr + (Cr >> 2) + (Cr >> 3) + (Cr >> 5);
G = Y - ((Cb >> 2) + (Cb >> 4) + (Cb >> 5)) - ((Cr >> 1) + (Cr >> 3) + (Cr >> 4) + (Cr >> 5));
B = Y + Cb + (Cb >> 1) + (Cb >> 2) + (Cb >> 6);
We run the test on several video sequences; the results are shown in Table IV. We define the error as the difference between our method and the direct floating-point method. From the table we conclude that the largest error is just 2, and that it occurs in no more than 0.02% of the samples in these standard sequences. This is acceptable, so the algorithm is also fit for a general PC.

TABLE IV
ERROR DISTRIBUTION OF OUR ALGORITHMS ON GENERAL PC
Video format      Test Sequence   Errors=0 (%)   Errors=1 (%)   Errors=2 (%)   Errors>2 (%)
qcif 4:2:0 300f   foreman         97.2284        2.76071        0.01078493     0
                  hall            96.3518        3.64587        0.00230166     0
                  container       96.3518        3.64587        0.00230166     0
cif 4:2:0 300f    foreman         97.0006        2.98379        0.01551649     0
                  hall            95.9145        4.08206        0.00341632     0
                  container       97.73047       2.25783        0.01169573     0
Condition: P4 2.6G, 512M DDRAM, Windows 2000 professional, VC++6.0

V. CONCLUSION
In summary, the new algorithm has a clear advantage on fixed-point DSPs such as the OMAP5910 and DM642 platforms, and it can also be extended to other embedded systems. The new algorithm allows the programmer to use the C language to achieve the same efficiency as assembly language while guaranteeing the correctness of the result.
Although this method is aimed at DSPs and embedded systems, its application is not restricted to them, as we have shown on a general PC. Generally speaking, it suits YCbCr to RGB color conversion where memory is tightly constrained and hardware methods are impractical.

REFERENCES
[1] Poynton, Charles A., Digital video and HDTV: algorithms and interfaces, San Francisco, Amsterdam, Boston: Morgan Kaufmann Publishers, 2003, pp. 313-321.
[2] ITU-R Recommendation BT.601-5: Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios
[3] Webb, J.L.H., "Efficient table access for reversible variable-length decoding", IEEE Trans. Circuits and Systems for Video Technology, Vol. 11, Issue 8, Aug. 2001, pp. 981-985.
[4] Tang Guowei, Li Hong, Li Jinghui, YUV Video Player based on Table-Searching Method, Journal of Daqing Petroleum Institute, China, Vol. 29, No. 2, 2005
[5] Rui Wang, VLSI Design of Universal Color Conversion Circuit, in Proc. IEEE, Radio Science, Aug 2004, pp. 269-272.
[6] Inacio, C., Ombres, D., The DSP decision: fixed point or floating? IEEE Spectrum, Sept. 1996, Vol. 33, Issue 9, pp. 72-74
[7] Texas Instruments, OMAP1510 Multimedia Processor Technical Reference Manual, 2002
[8] Texas Instruments, TMS320DM642 Video/Imaging Fixed-Point Digital Signal Processor Data Manual, 2003

Yang Yang received the B.S. degree in 2004 from Shandong University, Jinan, China. He is now pursuing the Ph.D. degree with the School of Information Science and Engineering, Shandong University. His current research interests include digital image processing and digital filter design.

Peng Yuhua received a B.Eng. degree in the Department of Information and Control Engineering, Xi'an Jiaotong University in 1988, and received the M.D. and Ph.D. degrees in the School of Information at the same University in 1991 and 1994 respectively. From Mar. 2001 to Mar. 2002, she worked as a visiting professor for one year in the Faculty of Design and Technology at the University of Central Lancashire, United Kingdom, and took part in a project of EEC 4th Framework in the Industrial and Materials Technologies Program. She has been a professor at Shandong University since 1999, in the School of Information Science and Engineering. Her research interests include multi-scale analysis, image processing, wavelet transform and its applications in engineering, digital signal processing and analysis, biomedical image processing, microwave engineering, and video encoding and decoding.

Liu Zhaoguang received B.Eng. and M.Phil. degrees in 1998 and 2001, respectively, from the Xi'an University of Technology. He is currently pursuing the Ph.D. degree at Shandong University. His research interests include scalable video coding and video transcoding.
