A CCITT compatible coding algorithm for digital recording of moving images


Signal Processing: Image Communication 2 (1990) 155-169 (Vol. 2, No. 2, August 1990)
Elsevier

A CCITT COMPATIBLE CODING ALGORITHM FOR DIGITAL RECORDING OF MOVING IMAGES

F. PEREIRA

Instituto Superior Técnico, Lisbon, Portugal

L. CONTIN and M. QUAGLIA

Centro Studi e Laboratori Telecomunicazioni, Via Guglielmo Reiss Romoli, 274, I-10148 Torino, Italy

P. DELICATI

Università La Sapienza, Roma, Italy

Received 22 January 1990
Revised 19 April 1990

Abstract. This paper describes the work carried out in CSELT with the aim of providing a sensible solution to the problem of recording moving images on digital storage media. The starting point for the proposed algorithm has been the CCITT H.261 Reference Model, future standard for synchronous transmissions at p × 64 kbit/s (p = 1,..., 30). The algorithm provides all the facilities required for recording purposes.

Keywords. Digital storage media, recording facilities, CCITT H.261 algorithm.

0923-5965/90/$03.50 © 1990 Elsevier Science Publishers B.V.

1. Introduction

Real video images and sound are powerful tools for providing a more natural interface when presenting many different kinds of information in applications such as education, training and maintenance, entertainment, advertising and so on. Publishers and advertisers are ready to prepare multimedia documents exploiting all the possibilities of the multimedia concept; since late 1987, when the David Sarnoff Research Center presented its implementation of a universal all-digital medium, the expectation of low-cost, compatible, high-performance systems has increased every day.

The obstacle to meeting the third requirement (high performance) is still represented by moving video images: the CD is certainly one of the most important physical media that experts have in mind as a widespread support for interactive multimedia documents, and it allows a throughput of only about 1.2 Mbit/s. At such a bitrate the known algorithms for moving image coding do not yet provide fully satisfactory image quality in all shooting conditions. It must, moreover, be taken into account that the coding algorithm must be able to accommodate features such as reverse playback, random access, etc., already present in normal analogue recording devices, and possibly some new ones.

Concerning the cost of the equipment, low prices are possible when mass production is reached and different applications can make use of the same integrated components available on the market. To fulfil the described requirements, together with the appealing prospect of compatibility, ISO has started, in the MPEG subgroup, the work on the definition of the standard coding algorithm to be used for recording moving images on digital storage media.


This paper describes the work carried out in CSELT with the aim of providing a sensible solution to the problem. The starting point for the proposed coding algorithm has been the Reference Model (release RM8), which was defined in CCITT by the 'Specialists Group on Coding for Visual Telephony' [4]. The reasons for this choice are manifold, but two of them are of basic importance: to have the maximum commonality between the equipment used in telecommunications and consumer environments and, even more conclusive, the fact that nobody has so far presented a different algorithm performing better than it.

The original contributions described in the paper are:
- The modification of the RM8 scheme; the main effect of this is to make the temporal coding nearly symmetric, to allow a good motion rendition in reverse playback.
- The use of a new method for the fast convergence of still images.

The other important feature included in the coding algorithm, as it really provides improved performance in several video sequences, is the global motion compensation, which is implemented according to the method described in [1].

In the following, the proposed coding algorithm derived from RM8 is described, with particular emphasis on the features suitable for future developments, and a few simulation results are presented.

2. The CCITT Reference Model

The Reference Model represents the convergence point reached by the CCITT 'Specialists Group on Coding for Visual Telephony' after a four-year effort to find, via computer simulation, the best algorithm to code moving pictures at fixed bitrates with the typical requirement, arising from the telecommunications field, of a relatively low delay introduced by the coding and decoding process.

The coding technique is based on a hybrid DPCM/transform coder that allows bitrates in the range p × 64 kbit/s (p = 1, 2, ..., 30) and accepts video sources according to the Common Intermediate Format (CIF). The CIF is characterized by a spatial resolution of 360 pixels per line and 288 lines per image (non-interlaced format) and a picture rate of 29.97 Hz.

2.1. The layered image structure

In RM the redundancy reduction is achieved by means of a 'block coding' technique. Using a 'bottom-up' description, the structure of an image is the following:

1. Block
A block is formed by partitioning the images into square non-overlapping matrices of pixels, either of luminance or of chrominance. The dimensions of the blocks are 8 × 8.

2. Macro Block (MB)
Four contiguous luminance blocks (in a 2 × 2 arrangement) together with the two spatially corresponding chrominance blocks form a Macro Block (MB).

3. Group Of Blocks (GOB)
To make the coding-parameter control easier, macro blocks are arranged into rectangular matrices with dimensions 3 (vertical) by 11 (horizontal).

4. Image
As already mentioned, the spatial format of the images is CIF, i.e., an orthogonal pattern of 360 pixels by 288 lines for the luminance (Y) and 180 pixels by 144 lines for the two colour-difference components (CB and CR). Since 360 is not a multiple of 16 (the MB dimension), four columns on the left-hand side and as many on the right have been discarded, giving a total width of 352 pixels for Y and 176 pixels for CB and CR. This spatial format contains 6 (vertical) × 2 (horizontal) GOBs.
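The block, macro block and GOB counts quoted in the layered description can be checked with a little arithmetic. The following is only an illustrative sketch: the function name `layered_structure` and the dictionary keys are hypothetical, and the only inputs are the dimensions stated in the text (8 × 8 blocks, 2 × 2 luminance blocks per MB, 3 × 11 MBs per GOB, a 352 × 288 luminance picture).

```python
# Layered image structure arithmetic, using the dimensions quoted in the text.
BLOCK = 8                    # block side in pixels
MB_LUMA = 2 * BLOCK          # a macro block covers 16 x 16 luminance pixels
GOB_ROWS, GOB_COLS = 3, 11   # macro blocks per GOB (vertical, horizontal)


def layered_structure(width=352, height=288):
    """Count macro blocks and GOBs in a luminance picture of the given size."""
    if width % MB_LUMA or height % MB_LUMA:
        raise ValueError("picture size must be a multiple of the MB size")
    mb_cols, mb_rows = width // MB_LUMA, height // MB_LUMA
    if mb_cols % GOB_COLS or mb_rows % GOB_ROWS:
        raise ValueError("macro blocks do not tile into whole GOBs")
    return {
        "mb_cols": mb_cols,
        "mb_rows": mb_rows,
        "gob_cols": mb_cols // GOB_COLS,
        "gob_rows": mb_rows // GOB_ROWS,
        "mbs_per_gob": GOB_ROWS * GOB_COLS,
        "blocks_per_mb": 4 + 2,          # four luminance + two chrominance
        "total_mbs": mb_cols * mb_rows,
    }


if __name__ == "__main__":
    print(layered_structure())
```

With the default 352 × 288 picture this gives 22 × 18 macro blocks arranged as 2 (horizontal) × 6 (vertical) GOBs, in agreement with the text; a width of 360 is rejected because it is not a multiple of 16, which is the reason given for discarding eight columns.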


2.2. The coding algorithm

The generic architecture of the coding algorithm is known as a hybrid DPCM/transform coder. Prediction is performed in the temporal direction, i.e., a coded image is used to predict the following one. The prediction errors, still having some spatial statistical dependence within each image, are decorrelated using a two-dimensional transformation.

The key elements of the coding process are the following:

1. Motion Compensation (MC)
The MC is applied only to MBs that have changed significantly. The displacement estimation is achieved by a block matching technique using a search window of ±7 pixels in the previous coded frame.

2. Coding loop
A simple DPCM loop, working in the temporal dimension, i.e., interframe, followed by a Discrete Cosine Transform (DCT), can be identified as the kernel of the configuration. The DCT coefficients are uniformly quantised using one of a predefined set of quantiser steps.

3. Variable Length Coding (VLC)
VLC is applied after the quantisation of the DCT coefficients. A zig-zag path is used to scan the matrices of DCT coefficients.

4. Buffer control
To smooth the inherent instability and to control the bit production, a buffer is used. Depending on its fullness, the step size of the quantiser is adjusted, allowing a fast recovery from a surplus in bit production.

3. The coding algorithm for digital recording of moving images

The definition of a standard for the coding of moving image information on digital storage media is an important target in the International Organisation for Standardisation (ISO) MPEG Group. This coding algorithm must be prepared to provide not only Normal Video Playback but also Reverse Video Playback, Fast Forward Video Playback, Fast Reverse Video Playback, Random Access and High Quality Still Mode.

Among the known algorithms that could be considered as suitable candidates, the RM8 previously described, which is now in the process of being converted into a CCITT Recommendation (H.261), appears to be the best one for the reasons indicated in the introduction.

In the following, we describe the variations and additions brought to the RM8 coding algorithm in order to make it able to provide all the facilities required for the recording of moving images on digital storage media.

3.1. The codec structure

The codec structure is presented in Figs. 1 and 2 and combines intra/interframe coding with global and local motion compensation and trans- form coding. In order to improve the H.261 algorithm performance some new coding tools have been introduced.

1. Global motion compensation
The motion compensation is improved by using local and global displacement estimations simultaneously. The global motion compensation acts over the whole frame and improves the prediction for the interframe coding. Two special cases of motion are considered for the moment: panning and zoom. In the panning compensation, a panning vector is transmitted that indicates the displacement that must be applied to the previous decoded frame before the normal interframe coding. In the zoom compensation, a zoom factor is transmitted that specifies the expansion or compression to be applied to the previous decoded frame before the normal interframe coding. The panning vector and the zoom factor are included in the Picture Header; when there is no panning vector/zoom factor, the motion compensation is identical to that of the H.261 algorithm.

[Fig. 1. The coder. Block labels recoverable from the diagram: video in, resolution, intra/inter, coding control, transform, weighting and quantisation, variable length coding, multiplexer, buffer, storage medium, inverse weight/quant, inverse transform, frame prediction, global/local displacement estimation, side information.]

[Fig. 2. The decoder. Block labels recoverable from the diagram: storage medium, demultiplexer, inverse weight/quant, inverse transform, video out, side information.]

2. Weighting of transform coefficients
Transform coefficients are weighted before quantisation [1]; at the decoder the inverse process is performed. This weighting operation improves the final subjective quality of the coded images.

3. Noise filtering
Noise filtering is applied to all the macroblock classes except the intracoded one; this operation yields noticeable subjective improvements since it is in fact a dithering operation.

4. Resolution control
Since, with the present state of the art, there is no serious chance of representing the video signal with CCIR 601 resolution and reaching the target quality using bitrates around 1.2 Mbit/s, spatial and temporal subsampling appear unavoidable; the chosen solution is spatial CIF resolution, which means 352 × 288 pels for luminance and 176 × 144 pels for chrominance at 25 Hz; the selected algorithm must therefore have a reduction factor of about 20 to reach the target bitrates.

One of the additional features required of the coding algorithm is the possibility of increasing the quality of particular (still) pictures, selected during the coding phase, by using an amount of information which is not read during normal play; this is called High Quality Still Mode (HQSM). In the High Quality Still Mode the resolution may increase up to CCIR 601 resolution (704 × 288 × 2 for luminance); the strategy for increasing the spatial resolution will be explained when describing this facility.


5. Frame prediction
The frame prediction accuracy is of fundamental importance for the efficiency of the video coding algorithm. Besides the introduction of the global displacement estimation, the separation between the local displacement estimation and the filtering has been introduced. This feature implies the introduction of four new macroblock classes and the corresponding change of the VLC tables.

6. Coding control: quantisation step control
In the Reference Model, the quantisation step is determined by the buffer fullness in order to guarantee a fixed output bitrate. For recording purposes the bitrate constraints are not so stringent (there is a peak bitrate limitation), and this allows the available bitrate to be distributed depending on the image activity, trying to absorb at least short-term variations.

The quantisation step is computed for each GOB based on the value of the parameter 'Excess' defined as the difference between the number of bits produced after the beginning of the trans- mission and the number of bits that could be transmitted during the same time on a fixed chan- nel working at the bitrate initially agreed [3].
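The definition of 'Excess' translates directly into code. The sketch below computes it exactly as defined above; the mapping from Excess to a quantisation step is not given in this section (it is referred to [3]), so `quantiser_step`, its linear rule, the constants `base_step` and `bits_per_step`, and the 1 to 31 step range are all hypothetical placeholders chosen only to show where such a rule would plug in.

```python
import math


def excess(bits_produced, elapsed_seconds, agreed_bitrate):
    """Bits produced so far minus what a fixed channel could have carried."""
    return bits_produced - agreed_bitrate * elapsed_seconds


def quantiser_step(excess_bits, base_step=8, bits_per_step=50_000,
                   min_step=1, max_step=31):
    """HYPOTHETICAL linear rule: coarser steps when Excess grows.

    The real mapping is defined in reference [3], not here.
    """
    step = base_step + math.floor(excess_bits / bits_per_step)
    return max(min_step, min(max_step, step))


if __name__ == "__main__":
    e = excess(bits_produced=1_000_000, elapsed_seconds=1.0,
               agreed_bitrate=900_000)
    print(e, quantiser_step(e))
```

A positive Excess means the coder is ahead of the agreed bitrate and should quantise more coarsely; a negative one leaves room for finer steps. Evaluating the rule once per GOB matches the granularity stated in the text.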

3.2. The temporal hierarchical coding

As the Reference Model implements a pure interframe coding, it is impossible to provide all the required recording facilities previously indicated; this fact justifies the necessity of breaking the temporal coding correlation by introducing recovering points. The introduction of these recovering points is very delicate, since it is important to avoid quality breaks at these moments while providing adequate fast playback modes without prohibitive hardware and computational costs. These arguments justify the introduction of a more complex and dynamic temporal coding structure called the 'Three-level temporal hierarchical structure' (Fig. 3).

[Fig. 3. The temporal hierarchical coding. Legend: A - pure intraframe coding; B - interframe coding without MC and intrablock coding; C - identical to B (the same frame is also intraframe coded); D - interframe coding. The diagram also marks the spacings N1 and N2.]

This temporal structure considers four frame coding modes:
- Mode A. Pure intraframe coding for the whole frame;
- Mode B. Normal Reference Model coding excluding global and local motion compensation and intrablock coding;
- Mode C. Identical to B; it is used for the frames that are coded with Mode A (these frames are coded twice);
- Mode D. Normal Reference Model coding with global motion compensation.

The present temporal structure has two defining parameters:
- N1, the number of Mode B coded frames between two Mode A coded frames;
- N2, the number of Mode D coded frames between two Mode B/A coded frames.

By changing these parameters, it is possible to obtain different temporal structures, from the pure periodic intraframe coding or the pure CCITT H.261 scheme to other more complex schemes. It is also possible to change the coding mode of a specific frame inside the periodic temporal structure, since the frame class is clearly indicated in the Picture Header of the H.261 bitstream. This allows the temporal coding structure to be dynamically adapted, with an adequate criterion, to the image activity characteristics.
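To make the roles of N1 and N2 concrete, the periodic structure can be enumerated frame by frame. This is a sketch under one reading of the definitions above: each period opens with a frame coded in both Mode A and Mode C, every Mode A or Mode B frame is followed by N2 Mode D frames, and N1 Mode B frames sit between consecutive Mode A frames. The function name `frame_modes` and the label "A+C" are illustrative, not taken from the paper.

```python
def frame_modes(n1, n2, periods=1):
    """List the coding mode of each frame over `periods` Mode A periods."""
    if n1 < 0 or n2 < 0 or periods < 0:
        raise ValueError("n1, n2 and periods must be non-negative")
    one_period = []
    for segment in range(n1 + 1):
        # The first segment starts with the doubly coded A/C frame,
        # the remaining N1 segments start with a Mode B frame.
        one_period.append("A+C" if segment == 0 else "B")
        one_period.extend(["D"] * n2)
    return one_period * periods


if __name__ == "__main__":
    print(" ".join(frame_modes(1, 2, periods=2)))
    print(len(frame_modes(3, 5)), "frames per period for N1 = 3, N2 = 5")
```

Under this reading one period spans (N1 + 1) × (N2 + 1) frames, so N1 = 0 and N2 = 0 degenerates to pure periodic intraframe coding.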

As the diverse coding modes are characterized by different compression factors, it is important to avoid noticeable quality variations in time by attributing to each coding mode a quantisation step privilege:
- Mode A. The quantisation step of each GOB is identical to the quantisation step of the corresponding GOB in the previous Mode C coded frame;
- Mode B. The quantisation step computed by the quantisation step control is diminished by 2;
- Mode C. The quantisation step computed by the quantisation step control is diminished depending on the average value of the quantisation step in the previous 10 frames;
- Mode D. No privileges (normal quantisation step control).

These quantisation privileges smooth the picture quality, avoiding noticeable quality breaks at the recovering points. The simulation results demonstrated that the coding of Mode A and Mode B frames is decisive for the overall picture quality.
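The four privileges can be written as a small dispatch on the frame mode. The sketch below is hedged in two places: the text does not say how the Mode C reduction is derived from the 10-frame average, so it is passed in as the caller-supplied parameter `mode_c_reduction`, and the lower bound `min_step=1` is an assumption. All names are illustrative.

```python
def average_recent(steps, window=10):
    """Average quantisation step over the last `window` frames."""
    recent = list(steps)[-window:]
    if not recent:
        raise ValueError("no previous frames to average")
    return sum(recent) / len(recent)


def privileged_step(mode, control_step, prev_mode_c_step=None,
                    mode_c_reduction=0, min_step=1):
    """Apply the per-mode quantisation privilege to one GOB."""
    if mode == "A":
        if prev_mode_c_step is None:
            raise ValueError("Mode A needs the step of the previous Mode C GOB")
        return prev_mode_c_step              # copy the Mode C step
    if mode == "B":
        return max(min_step, control_step - 2)
    if mode == "C":
        # The reduction rule itself is NOT specified in the text.
        return max(min_step, control_step - mode_c_reduction)
    if mode == "D":
        return control_step                  # no privilege
    raise ValueError(f"unknown frame mode: {mode!r}")


if __name__ == "__main__":
    print(privileged_step("B", 10), privileged_step("D", 10))
```

A caller would typically compute `average_recent(...)` over the recent frames and turn it into a reduction with whatever rule the encoder adopts; that rule is outside what this section specifies.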

3.3. The recording facilities

The designed coding algorithm must provide not only Normal Video Playback but also the additional functions required for recording purposes. In the following these facilities are briefly analysed.

3.3.1. Normal Video Playback (NVP)

This feature is implemented by the normal sequential decoding of the bitstream. It is important to avoid noticeable quality variations due to the introduced recovering points.

3.3.2. Normal Reverse Playback (NRP)

The dynamic temporal hierarchical structure that is presented allows multiple choices for the implementation of this facility, depending on the required refresh frequency.

Low refresh frequency
This option is essentially characterized by the fact that supplemental memory is not needed. The feature is implemented by decoding only the Mode A frames and the Mode B frames (if N1 is greater than zero) and repeating each one N2 + 1 times; if N1 is zero only Mode A frames (pure intracoded) are used, while for N1 greater than zero Mode B frames are probably also used. The pure reverse processing, where one Mode B frame is obtained by subtracting the corresponding differences from the previous Mode B or Mode A frame, is possible due to the introduction of the Mode C frames, which maintain the temporal chain. The process is not ideal because the image resulting from Mode C is not equal to that resulting from Mode A; however, the simulations have shown that reasonable results may be expected. The use of this option for the NRP implementation will depend on the value of N2; note that there is no delay.

Normal refresh frequency
This option requires supplemental memory. The feature is implemented by decoding all the frames between two Mode B frames (or a Mode B and a Mode A frame). If a reverse playback quality identical to the normal playback quality is requested, it is necessary to buffer all the frames and display them in reverse order; if a lower refresh frequency is acceptable, it is possible to buffer one frame out of 2 or 3, saving memory and repeating the display of each frame a convenient number of times. The delay depends on the N2 value.
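The two reverse playback options differ only in which frames are kept and how often each is shown. The following sketch works on frame labels rather than pixel data; `low_refresh_order` and `reverse_display_order` are hypothetical helper names, and the decoding itself is assumed to have been done already.

```python
def low_refresh_order(anchor_frames, n2):
    """Low refresh: show only Mode A/B frames, newest first, N2 + 1 times each."""
    order = []
    for frame in reversed(anchor_frames):
        order.extend([frame] * (n2 + 1))
    return order


def reverse_display_order(segment, keep_every=1):
    """Normal refresh: buffer one decoded frame out of `keep_every`.

    `segment` holds the frames decoded forward between two anchor frames.
    Each buffered frame is repeated so that the display lasts as long as
    the original segment.
    """
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    order = []
    for start in reversed(range(0, len(segment), keep_every)):
        repeats = min(keep_every, len(segment) - start)
        order.extend([segment[start]] * repeats)
    return order


if __name__ == "__main__":
    frames = list(range(6))
    print(reverse_display_order(frames))       # full quality
    print(reverse_display_order(frames, 2))    # half the buffer memory
```

With `keep_every=1` every frame is buffered and the reverse quality equals the forward quality; larger values trade refresh frequency for memory, as described in the text.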

3.3.3. Fast Forward Playback (FFP)

The Fast Forward Playback implementation has, for this dynamic temporal hierarchical structure, many possibilities depending on the values of N1 and N2. As we must respect the limits of the medium data rate, the FFP implementation will depend on the maximum number of bits per frame, since for this feature we use the recovering frames (Mode A frames), which are the frames with the lowest compression factor; the ISO requirements speak of a speed-up factor between 8 and 10. If we consider an average bitrate of 900 kbit/s, a burstiness factor (frame level) of 3 and a maximum medium data rate of 1.2 Mbit/s, we may conclude that it is possible to read at most 1.11 peak frames per original second for a speed-up factor of 10 and 1.39 peak frames per original second for a speed-up factor of 8 (average values). The solution will result from combining these values with the N1 and N2 choices, which will probably depend on the image activity characteristics. For example, N1 equal to 3 and N2 equal to 5 gives a speed-up factor of 8 if we repeat each Mode A frame 3 times at 25 Hz or 6 times at 50 Hz; in this situation we must read 1.04 peak frames per original second, which is below the indicated limit value.
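The figures quoted for Fast Forward Playback can be reproduced with the short calculation below. It is a reconstruction rather than the authors' stated derivation: it assumes a 25 Hz source, takes a peak frame to cost the burstiness factor times the average frame, and takes one Mode A frame per period of (N1 + 1) × (N2 + 1) frames. The function names are illustrative.

```python
def peak_frames_readable(avg_bitrate, burstiness, medium_rate,
                         frame_rate, speed_up):
    """Peak frames the medium can deliver per second of original material."""
    peak_frame_bits = burstiness * avg_bitrate / frame_rate
    return medium_rate / peak_frame_bits / speed_up


def mode_a_frames_per_original_second(n1, n2, frame_rate):
    """Mode A frames contained in one second of original material."""
    return frame_rate / ((n1 + 1) * (n2 + 1))


def speed_up_factor(n1, n2, repeats, frame_rate, display_rate):
    """Speed-up when each Mode A frame is displayed `repeats` times."""
    period_frames = (n1 + 1) * (n2 + 1)
    return period_frames * display_rate / (frame_rate * repeats)


if __name__ == "__main__":
    for s in (10, 8):
        limit = peak_frames_readable(900e3, 3, 1.2e6, 25, s)
        print(f"speed-up {s}: at most {limit:.2f} peak frames per original s")
    print(f"N1 = 3, N2 = 5 needs "
          f"{mode_a_frames_per_original_second(3, 5, 25):.2f}")
    print(speed_up_factor(3, 5, 3, 25, 25), speed_up_factor(3, 5, 6, 25, 50))
```

Under these assumptions the limits come out at about 1.11 and 1.39 peak frames per original second, and the N1 = 3, N2 = 5 example needs about 1.04 with a speed-up factor of 8 at either display rate, matching the values in the text.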

3.3.4. Fast Reverse Playback (FRP)

For Fast Reverse Playback all the comments made for Fast Forward Playback are still valid. The solution for the implementation of this facility may be the same, reading the data, of course, in reverse order.

3.3.5. Random Access (RA)

Random Access is the feature that allows direct access to a target frame. This feature must use the address table to reach the recovering point (Mode A frame) nearest to the target frame and proceed from that point until the requested frame is displayed. The maximum Random Access time depends on the burstiness factor (frame level), on the temporal frequency and, for the hierarchical coding, on the N1 and N2 values. The hierarchical temporal coding has a maximum Random Access time (MRAT) of about

$$\mathrm{MRAT} = \bigl\{\bigl[\operatorname{int}\bigl((N_1 + 1)/2\bigr) + 2\bigr] \times BF + (N_2 + 1)\bigr\} \times (1/f_q) \quad \text{(s)},$$

where BF is the burstiness factor (frame level) and f_q is the temporal frequency. A pessimistic case is considered here, in which all the Mode A, B and C frames are peak frames.
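The MRAT expression is easy to evaluate for candidate (N1, N2) pairs. The sketch below implements the formula as printed, reading int(·) as truncation of a non-negative value; the function and parameter names are illustrative.

```python
def max_random_access_time(n1, n2, burstiness, frame_rate):
    """Maximum Random Access time in seconds, per the MRAT formula.

    MRAT = {[int((N1 + 1) / 2) + 2] * BF + (N2 + 1)} / fq
    """
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    peak_frames = (n1 + 1) // 2 + 2      # Mode A/B/C frames, all at peak size
    return (peak_frames * burstiness + (n2 + 1)) / frame_rate


if __name__ == "__main__":
    for n1, n2 in ((0, 11), (3, 5), (4, 4)):
        t = max_random_access_time(n1, n2, burstiness=3, frame_rate=25)
        print(f"N1 = {n1}, N2 = {n2}: MRAT = {t:.2f} s")
```

With a burstiness factor of 3 at 25 Hz, the combinations (0, 11) and (3, 5) both evaluate to 0.72 s and (4, 4) to 0.68 s; these particular pairings are used here only as examples and are not results reported in the paper.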

3.3.6. Compatibility with the CCITT H.261 Algorithm

As the presented algorithm is based on the CCITT H.261 Reference Model, the codec structure is completely compatible; however, due to the introduction of some new coding tools and of the temporal hierarchical scheme, some problems appear in the decoding of a hierarchically coded signal with an H.261 codec. The situation may be summarized as follows:
- there is 100% capability to decode H.261 coded signals, considering that all the frames will be Mode D frames;
- the decoding of hierarchically coded signals by an H.261 codec will only be possible with the introduction of the Picture Header analysis related to the frame coding mode detection and the disabling of the new coding tools.

3.3.7. High Quality Still Mode (HQSM)

One of the additional features required of the coding algorithm is the possibility of increasing the quality of particularly interesting (still) pictures, making use of a certain amount of information which is recorded on the medium but is not read during normal play; this High Quality Still Mode (HQSM) may simply produce an image with the same spatial resolution but with a better rendition of the details, or even a picture with a higher resolution (e.g., according to CCIR 601), depending on the amount of information used.

This feature has been implemented in a flexible way, since it is possible to choose the strategy for increasing the quality by defining:
- the successive levels of resolution, choosing between
  > level 1: CIF at 25 Hz,
  > level 2: CIF with double horizontal resolution at 25 Hz,
  > level 3: level 2 at 50 Hz;
- the maximum number of frame codings in each intermediate resolution level.

During the High Quality Still Mode coding, a quantisation step convergence process is implemented that optimizes the coding efficiency. All the Still Mode frames are Mode D coded.

The convergence process

The convergence process described here is a particular case of a more general convergence process [2] for the bitrate and quality optimization of the transmission of fixed pictures with a hybrid scheme. The general process admits not only artificially fixed pictures (e.g., data bases) but also pictures resulting from a fixed camera and affected by the camera noise. The method automatically detects the fixed image and codes it in an optimal way, considering that the sequence may again become a normal moving sequence.

For the specific case considered here the automatic detection is not necessary, since the pictures that must be submitted to the High Quality Still Mode are specifically selected during the coding.

- The quantisation step strategy

The necessity of a special quantisation step strategy for still image coding is motivated by the inefficient progress of the coding if we use the quantisation steps resulting from the normal quantisation step control. These quantisation steps, which result essentially from the necessity of respecting an agreed average bitrate, are normally too close or even equal for the same GOB in two consecutive frame codings, leading to the repeated coding of the accumulated arithmetic and quantisation errors, without any image quality improvement.

The purpose of the convergence process is to distribute the available bits in an optimal way, taking into account that the agreed average bitrate must always be respected.

In the convergence process two quantisation steps act on each GOB:
- Real Quantisation Step (RQS) is at each moment the minimum quantisation step already used for the coding of each GOB. The process finishes when the RQS is the minimum QS for all the GOBs.
- Coding Quantisation Step (CQS) is the value actually used during the coding in the convergence process. If the available bitrate does not allow an RQS decrease for the present GOB, the CQS assumes a political value that overcomes the errors and maintains full compatibility without wasting bits. For this case the choice of the political QS value is not as critical as it is when coding fixed pictures embedded in moving sequences, since there we must be prepared to code a scene cut/movement while overcoming at the same time the errors and the camera noise. It has been concluded from simulations that, if the average bitrate control allows an RQS decrease for a GOB, the optimal quantisation step evolution is to make the CQS equal to half the RQS; the RQS is updated in this case.

In order to guarantee a uniform image quality, the RQS of a GOB may only be decremented when the difference between the actual RQS and the maximum RQS in the frame is below a selected threshold. The convergence process is mainly based on a continuous interaction between RQS and CQS, which leads the coding process; the coding is stopped when the agreed number of bits for the HQSM is reached.

- The convergence process with resolution improvements

The resolution may be increased in one or two steps, as explained above. At each resolution change the picture is interpolated and filtered to obtain the first prediction for the coding at the new resolution level. Starting from the default level (352 × 288 pixels for luminance and 176 × 144 pixels for chrominance), we may first double the luminance and chrominance horizontal resolution and afterwards the temporal resolution, or make the two resolution increases simultaneously. The resolution increases follow these rules:
- the initial resolution level is always the default resolution level (level 1);
- the transition to a higher resolution level is made when the average RQS (frame level) reaches a threshold value or when the agreed maximum number of frames for an intermediate resolution level is reached;
- the resolution level changes may be implemented with or without memory in the quantisation step convergence process (note that the subjective impact of the two options is not the same):
  > with memory: the quantisation step convergence process acts as described above throughout the High Quality Still Mode coding; at each resolution level change every GOB is assigned the RQS of the corresponding GOB in the previous resolution level. This option, which is more efficient in the long term, may have a negative subjective impact if the number of bits available for the HQSM is not enough to code each GOB at least once in the final resolution level.
  > without memory: within each resolution level, the quantisation step convergence process acts as described above. In the first coding of each new resolution level, the RQS and the CQS values are made equal for each GOB and are determined only by the quantisation step control, subject however to an upper threshold; the quality improvement is very smooth.

The described convergence process guarantees an image quality improvement obtained in a smooth way in terms of bitrate, signal-to-noise ratio and image subjective impact.
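The interaction between RQS and CQS, and the rule for moving to a higher resolution level, can be sketched per GOB as follows. Several points are assumptions rather than statements from the paper: the 'political' value is simply passed in by the caller, halving uses integer division with a lower bound of 1, and 'reaches a threshold' is read as the average RQS falling to or below it. All names are illustrative.

```python
def code_gob_pass(rqs, frame_max_rqs, bitrate_allows_decrease,
                  political_qs, uniformity_threshold, min_qs=1):
    """Return (cqs, new_rqs) for one GOB in one still-mode coding pass."""
    # Uniform quality: only refine a GOB that is close to the coarsest one.
    uniform_enough = (frame_max_rqs - rqs) < uniformity_threshold
    if bitrate_allows_decrease and uniform_enough and rqs > min_qs:
        cqs = max(min_qs, rqs // 2)      # CQS is half the RQS
    else:
        cqs = political_qs               # caller-chosen "political" value
    return cqs, min(rqs, cqs)            # RQS = smallest step used so far


def should_raise_resolution(gob_rqs, rqs_threshold, frames_coded, max_frames):
    """Decide whether to move to the next resolution level."""
    average_rqs = sum(gob_rqs) / len(gob_rqs)
    return average_rqs <= rqs_threshold or frames_coded >= max_frames


if __name__ == "__main__":
    rqs = 16
    for _ in range(3):
        cqs, rqs = code_gob_pass(rqs, frame_max_rqs=rqs,
                                 bitrate_allows_decrease=True,
                                 political_qs=20, uniformity_threshold=4)
        print(cqs, rqs)
```

In the 'with memory' option the RQS values would be carried across a resolution change, whereas in the 'without memory' option they would be reinitialised from the quantisation step control; that choice sits in the caller and is not modelled here.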

4. Results

I. Normal coding

To analyse the performance of the temporal hierarchical coding algorithm presented in Section 3.2, we have coded the sequences 'Table Tennis' and 'Diva' at 900 kbit/s, using some of the more interesting combinations of N1 and N2; Figs. 4-9 show the luminance signal-to-noise ratio (dB) for the studied cases.
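The paper does not state how its luminance signal-to-noise ratio is computed. A common choice for 8-bit video is the peak form, 10 log10(255² / MSE), and the sketch below assumes that definition; it should be read as one plausible way to obtain curves of this kind, not as the measure actually used by the authors. The function name and the `peak` parameter are illustrative.

```python
import math


def luminance_snr_db(original, coded, peak=255):
    """Peak signal-to-noise ratio (dB) between two luminance frames.

    Frames are sequences of rows of pixel values. ASSUMED definition:
    10 * log10(peak**2 / mean squared error).
    """
    squared_error = 0
    count = 0
    for row_o, row_c in zip(original, coded):
        for o, c in zip(row_o, row_c):
            squared_error += (o - c) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty frames")
    if squared_error == 0:
        return math.inf                  # identical frames
    return 10 * math.log10(peak ** 2 / (squared_error / count))


if __name__ == "__main__":
    a = [[10, 20], [30, 40]]
    b = [[11, 21], [31, 41]]
    print(f"{luminance_snr_db(a, b):.2f} dB")
```

Evaluating such a function once per decoded frame against the source frame yields a per-frame curve like those plotted in the figures.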

[Figs. 4-9. Luminance signal-to-noise ratio (dB) versus frame number at 900 kbit/s. Fig. 4: Reference Model, 'Table Tennis'. Fig. 5: Reference Model, 'Diva'. Fig. 6: N1 = 0, N2 = 11, 'Table Tennis'. Fig. 7: N1 = 0, N2 = 11, 'Diva'. Fig. 8: N1 = 4, N2 = 4, 'Table Tennis'. Fig. 9: N1 = 4, N2 = 4, 'Diva'.]

The presented results suggest the following comments:
- The differences in the luminance signal-to-noise ratio between the Reference Model and the Temporal Hierarchical Coding described here are essentially due to the variable bitrate approach and the global motion compensation; this is particularly evident in the clear zoom and the panning of the sequence 'Table Tennis'.

- The picture quality of the presented algorithm depends on the image activity, since the implemented variable bitrate coding can absorb only short-term variations; this can be seen by comparing the 'Table Tennis' and 'Diva' results.

- The picture quality is not the same for all the frame coding modes; in fact, the A and B modes introduce breaking points not only in temporal correlation but also in picture quality. The quantisation step privileges result from the search, on these frames, for the optimum trade-off between the bits spent and the picture quality. The picture quality of the mode A frames is critical, since it seriously affects the quality of all the remaining frames.


- The ideal combination of N1 and N2 depends on the image activity and also on the performance required for the recording facilities, essentially the fast modes and the random access.

2. High Quality Still Mode

The performance of the HQSM has been tested using the sequence 'Still Flower' coded at 900 kbit/s; this sequence has 25 moving frames (coded with N1 = 0 and N2 = 11), the last one being coded with the HQSM. The results are presented in Figs. 10-12 and suggest the following remarks:

- The convergence process improves the coding efficiency (in Fig. 10 the LSNR is measured over the CIF matrices). The initial differences are due to a different QS for the first image.

- The convergence with memory requires the coding of all the GOB's in the final resolution level in order to avoid the negative subjective impact of a frame with different qualities in the various GOB's, while the convergence without memory guarantees a very smooth quality improvement, reaching a globally uniform quality more rapidly after the resolution level changes.

- The convergence process with resolution improvements and memory seems more efficient in the long term than the convergence process with resolution improvements and without memory. Fig. 11 presents the LSNR evolution for these two possibilities (resolution level 3) using 1.45 Mbits, since this is the value that corresponds to the situation where all the GOB's are coded once in the final resolution level. Fig. 12 presents the LSNR evolution until the quantisation step is 1 for all the GOB's (resolution level 3); note that the coding evolution is not the same: the coding without memory expends 3.8 Mbits and reaches an LSNR of 45.4 dB, while the coding with memory expends 4.4 Mbits but reaches an LSNR of 48.25 dB (in Figs. 11 and 12 the LSNR is measured over the CCIR matrices).

Fig. 10. With/without convergence, Res. Level = 1. Luminance signal-to-noise ratio (dB) versus frame, 'Still Flower', 900 kbit/s + 270 kbits.

Fig. 11. Res. Level 3, with/without memory (1.45 Mbits). Luminance signal-to-noise ratio (dB) versus frame, 'Still Flower', 900 kbit/s + 1.45 Mbits.

Fig. 12. Res. Level 3, Conv. until QS = 1 for all GOB's. Luminance signal-to-noise ratio (dB) versus frame, 'Still Flower', 900 kbit/s.
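The trade-off quoted above for resolution level 3 can be made explicit with a little arithmetic. The sketch below only reworks the figures given in the text; the helper `seconds_at_channel_rate` is a hypothetical name, and converting bits into time assumes that the still-picture bits are delivered at the same 900 kbit/s used for the moving frames, with 1 Mbit taken as 1000 kbit.

```python
CHANNEL_RATE_KBIT_S = 900.0  # coding bitrate used in the experiments


def seconds_at_channel_rate(mbits, rate_kbit_s=CHANNEL_RATE_KBIT_S):
    """Time needed to deliver `mbits` (1 Mbit = 1000 kbit) at the given rate."""
    return mbits * 1000.0 / rate_kbit_s


# Values quoted in the text for convergence until QS = 1 (resolution level 3).
without_memory = {"mbits": 3.8, "lsnr_db": 45.4}
with_memory = {"mbits": 4.4, "lsnr_db": 48.25}

extra_mbits = with_memory["mbits"] - without_memory["mbits"]
gain_db = with_memory["lsnr_db"] - without_memory["lsnr_db"]

print(f"extra bits with memory: {extra_mbits:.1f} Mbits")
print(f"LSNR gain with memory:  {gain_db:.2f} dB")
for label, mbits in [("all GOB's coded once", 1.45),
                     ("without memory, QS = 1", without_memory["mbits"]),
                     ("with memory, QS = 1", with_memory["mbits"])]:
    print(f"{label}: {seconds_at_channel_rate(mbits):.2f} s at 900 kbit/s")
```

In other words, the with-memory option spends 0.6 Mbits more to gain 2.85 dB, and under the stated rate assumption the 1.45 Mbits needed to code every GOB once in the final resolution level correspond to roughly 1.6 s.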


5. Conclusions

The basic architecture of the presented algorithm is that of the CCITT H.261 algorithm; this choice is motivated by the performance of this architecture, which is so far the most promising, and also by the need to increase the compatibility between all the video applications.

The improvements introduced in the coding scheme try to cope with the new requirements imposed by the nature of the image sequences, which differ from the well-known videotelephone images. Another motivation was the need to provide all the required recording facilities, among them the reverse modes and the random access.

In order to fulfil all the requirements, many solutions have been examined, from the simplest one (introduction of a periodic intracoded frame) to others that are more sophisticated. Special attention has been dedicated to the resolution of the reverse processing problem in order to obtain an almost symmetric (in time) hybrid coding. The modifications motivated by this feature decrease the coding efficiency; however, this decrease is acceptable, as may be observed in the results presented for the luminance signal-to-noise ratio. Another consequence is the need, for normal playback, to duplicate the memory for the decoded image.

The other extension introduced in the algorithm concerns the high quality coding of still pictures. The results show a better performance when using the presented convergence process; since this high quality scheme affects only the coding control and not the H.261 compatibility, it seems worthwhile to consider its introduction in the final recommended algorithm.

References

[1] C. Herpel, D. Hepper and D. Westerkamp, "Adaptation and improvement of CCITT Reference Model 8 video coding for digital storage media applications", Image Communication, Vol. 2, No. 2, August 1990, pp. 171-185.

[2] F. Pereira and L. Masera, "High Quality Still Picture Mode embedded into a hybrid coding scheme", Picture Coding Symposium, Cambridge, U.S.A., March 1990.

[3] F. Pereira and M. Quaglia, "Extension of the CCITT visual communication coding algorithm for operation in ATM networks", Image Commun. J., 1990.

[4] CCITT SG XV, Draft Recommendation H.261, Tokyo Meeting, October 1989.
