An Efficient Protocol Architecture for Error-Resilient MPEG-2 Video Communications over ATM Networks

IEEE TRANSACTIONS ON BROADCASTING, VOL. 45, NO. 1, MARCH 1999 129


Pedro Cuenca, Antonio Garrido, Francisco Quiles and Luis Orozco-Barbosa

Abstract- MPEG-2 video communication over ATM networks is one of the most active research areas in the field of computer communications. During the transmission of a variable bit rate video signal over an ATM network, cells are inevitably exposed to delays, errors and losses due to the statistical multiplexing used in these networks. These phenomena affect the quality of the video signal and, without adequate measures to control the propagation of the impairments, the quality of the service may fall below acceptable levels. In this paper, we study the impact of cell losses on the quality of an MPEG-2 video sequence encoded in a variable bit rate mode. We introduce a set of control mechanisms at different levels of the protocol architecture to be used in MPEG-2-based video communications systems using ATM networks as their underlying transmission mechanism. Our results, obtained with different video sequences, show that a structured set of control mechanisms effectively improves video quality by compensating for the loss of cells carrying VBR MPEG-2 video streams. We argue that a structured set of error-resilient protocol mechanisms is needed in order to build video systems able to cope with the cell losses encountered in computer communications systems.

Index Terms- MPEG-2 Video, Video Quality, ATM Networks, Cell Losses, Error Resilient Mechanisms.

I. INTRODUCTION

Recent developments in the areas of video coding and compression techniques are enabling the deployment of computer-based video communications systems. However, one of the main challenges that remains is the design and deployment of protocol architectures able to cope with the stringent requirements of video communications. To be effective, a video communication system requires not only powerful processors and high-speed networks, but also resource control and error recovery mechanisms able to properly manage the system resources and cope with system errors. The design of an integrated protocol architecture grouping the different control mechanisms has to take into account the characteristics of the various system elements, from the application down to the transmission mechanism. A clear understanding of the characteristics and functions of the different layers is essential for the design of effective integrated protocol architectures.

Pedro Cuenca, Antonio Garrido and Francisco Quiles are with the Departamento de Informática, Universidad de Castilla-La Mancha, 02071, Albacete, SPAIN (e-mail: {pcuenca, antonio, paco}@info-ab.uclm.es). Luis Orozco-Barbosa is with the School of Information Technology and Engineering, University of Ottawa, 161 Louis Pasteur St, Ottawa, Ontario K1N 6N5, CANADA (e-mail: [email protected]). Publisher Item Identifier: S 0018-9316(99)02628-1.

In this work, we describe a set of protocol mechanisms to be used in the design and development of an MPEG-2 video communications system using ATM networks as the underlying communications support (figure 1).

Figure 1. Protocol architecture.

We organize this paper by introducing the control mechanisms one by one, following the path from the encoder to the decoder via the ATM network. In section II, we start by briefly describing the characteristics of the MPEG-2 video coding scheme. A trace of a video stream is given to illustrate the statistical characteristics of a typical MPEG-2 video stream. In section III, we present an analysis of the perceptual impact of cell losses on the quality of MPEG-2 video sequences. In section IV, we introduce an Adaptive Hierarchical Coding Scheme. This scheme is fully compatible with the MPEG-2 specifications [14]. In section V, we propose a novel Hierarchical Packing Scheme for the transmission of MPEG-2 video streams over ATM. Other packing schemes have been proposed in the literature [10], but none of them considers the use of different priority levels for the transport of ATM cells. In section VI, we explore the effect of MPEG-2 VBR video traffic on the performance of an ATM network model in terms of cell loss rate and cell loss distributions. In particular, we explore how the temporal properties of MPEG-2 VBR video traffic affect network performance when many traffic streams are multiplexed onto a single channel. The use of Cell Discarding Mechanisms to manage the buffer occupancy of the switch is also considered, in order to increase the error resilience of the MPEG-2 video communications system. Finally, in

0018-9316/99$10.00 © 1999 IEEE


section VII, we present an algorithm able to select the best error concealment strategy according to the characteristics of the video sequence and the particular piece of information being lost. Our overall results show that, when used together with the hierarchical encoding scheme and packing strategy introduced earlier, this scheme may considerably improve the performance of the system, i.e., the video quality. Section VIII concludes the paper.

II. MPEG-2 VIDEO FRAME STRUCTURE

In the MPEG-2 coding scheme, three types of frames have been defined. Each type uses a slightly different coding scheme. I-frames use only intra-frame coding, based on the discrete cosine transform and entropy coding. P-frames use a coding algorithm similar to that of I-frames, but with the addition of motion compensation with respect to the previous I- or P-frame. B-frames are similar to P-frames, except that the motion compensation is done with respect to the previous I- or P-frame, the next I- or P-frame, or an interpolation between them (see figure 2).

Typically, I-frames require more bits than P-frames, and B-frames have the lowest bandwidth requirement. The different ways of coding frames result in different traffic characteristics for the different frame types. At the encoder side, the frames are arranged in a deterministic periodic sequence, e.g. “IBBPBBPBBPBBPBBI”, called a Group of Pictures (GOP). This pattern is repeated periodically until the whole stream is compressed. The pattern is defined by the number and types of P- and B-frames between successive I-frames. The transmitted sequence is a permutation of the original sequence, e.g. “IPBBPBBPBBPBBIBB”, because each reference frame must reach the decoder before the B-frames that depend on it.
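The permutation from display order to transmission order can be illustrated with a short sketch. It assumes the simple rule that every I- or P-frame is sent ahead of the B-frames that precede it in display order; the function name `transmission_order` is introduced here for illustration only and is not part of any MPEG-2 tool.

```python
def transmission_order(display_order: str) -> str:
    """Reorder a string of frame types from display order to transmission order."""
    sent = []        # frames in the order they are transmitted
    pending_b = []   # B-frames waiting for their future reference frame
    for frame in display_order:
        if frame == "B":
            pending_b.append(frame)
        else:
            # An I- or P-frame is a reference: send it first, then the
            # B-frames displayed before it that are predicted from it.
            sent.append(frame)
            sent.extend(pending_b)
            pending_b.clear()
    sent.extend(pending_b)  # trailing B-frames with no later reference
    return "".join(sent)


print(transmission_order("IBBPBBPBBPBBPBBI"))  # IPBBPBBPBBPBBIBB
```

Applied to the display-order example of the text, the sketch reproduces the transmitted sequence quoted there.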

Figure 2. Example of MPEG-2 temporal structure.

The GOP plays a major role in the source statistics of an MPEG-2 video stream, as it fixes the periodic nature of the MPEG-2 video source. Figure 3 shows the statistics of a representative segment of traffic generated from the sequence flower garden encoded using a VBR MPEG-2 encoder. The quantization factor, Q, has been set to 20 and the frame rate to 30 frames/s. From the figure, the I-frames can be easily identified. Starting from the left, every peak appearing at regular intervals of about 500 ms corresponds to an I-frame.

Figure 3. Traffic generated for Flower Garden sequence.

III. PERCEPTUAL IMPACT OF CELL LOSSES OVER VIDEO QUALITY

The fact that the MPEG-2 video coding scheme uses compression techniques makes any MPEG-2-based video communications application very vulnerable to losses. Figure 4 shows the effect over a decoded digital image as a result of losing five ATM cells. The digital video sequence was encoded with a (video) slice corresponding to a horizontal strip of the image. In the absence of any error propagation control mechanism, the loss of each cell causes the loss of information up to the next resynchronization point (e.g. slice headers). This is referred to as spatial loss propagation. Obviously, the amount of data actually lost will depend on the relative position of the lost information within the slice.

Figure 4. Spatial Propagation Phenomenon.

Due to the predictive nature of the MPEG-2 algorithm, when losses occur in a reference picture (an intra-coded I-frame or a predictive P-frame), the impairment remains until the next intra-coded picture is received. The impairment thus propagates across several pictures; this is known as temporal loss propagation. Figure 5 illustrates this phenomenon for the loss of five ATM cells conveying parts of an I-frame. As shown in the figure, all B- and P-frames within the GOP are affected.
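Temporal loss propagation can be made concrete with a simplified dependency model. The sketch below is an assumption-laden illustration, not the procedure used in the paper: a damaged B-frame affects only itself, while a damaged I- or P-frame affects every later frame up to (but excluding) the next I-frame, plus the B-frames displayed immediately before it, since those are predicted from it. The name `affected_frames` is hypothetical.

```python
from typing import List


def affected_frames(gop_display_order: str, damaged: int) -> List[int]:
    """Return display-order indices of frames impaired by a loss in frame `damaged`."""
    if gop_display_order[damaged] == "B":
        return [damaged]  # B-frames are never used as references

    affected = {damaged}
    # B-frames displayed just before the damaged reference use it as
    # their future reference, so they are impaired as well.
    i = damaged - 1
    while i >= 0 and gop_display_order[i] == "B":
        affected.add(i)
        i -= 1
    # Every following frame depends, directly or indirectly, on the damaged
    # reference until the next I-frame refreshes the prediction chain.
    j = damaged + 1
    while j < len(gop_display_order) and gop_display_order[j] != "I":
        affected.add(j)
        j += 1
    return sorted(affected)


gop = "IBBPBBPBBPBBPBBI"
print(len(affected_frames(gop, 0)))  # 15: the whole GOP up to the next I-frame
print(affected_frames(gop, 1))       # [1]: a B-frame loss stays local
```

Under this model, a loss in the first I-frame impairs all fifteen frames of the GOP, which matches the behavior described for figure 5.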

These phenomena affect the quality of the video signal and without adequate measures to control the propagation of the impairments the quality of the services may fall below acceptable levels [4].

Figure 5. Temporal Propagation Phenomenon.

IV. ADAPTIVE HIERARCHICAL CODING SCHEME

In this section, we present the first measure to control the propagation of the impairments, an Adaptive Hierarchical Coding Scheme. This scheme splits the MPEG-2 video bitstream into two sub-bitstreams. The split takes into account the relevance of the different pieces of information in the original MPEG-2 encoded bitstream. This scheme offers advantages over similar hierarchical schemes proposed in the past [11], [13], [17]. Under this scheme, low frequency coefficients, together with other relevant information, are transmitted at a high priority level, also called the base layer. High frequency coefficients and other less important information are transmitted at a lower priority level, also called the enhancement layer. If parts of the enhancement layer are lost, they are simply replaced by zeros and the image is reconstructed using the base layer and the dummy enhancement layer. The quality of the resulting image, though somewhat distorted, may be acceptable. This scheme can be easily adapted to transmission networks that support different priority levels, such as ATM-based networks. By using this scheme, correct transmission of the most important information of the video signal can, to some extent, be guaranteed. Furthermore, the base layer can be designed so that it provides by itself a minimum acceptable image quality in situations in which the enhancement layer is completely lost.

In this scheme, the base layer contains all the headers and all the control information at the macroblock level, such as motion vectors, macroblock type, motion type and relative address of the macroblock in the slice, as well as the DCT coefficient of each block encoded as intra. The base layer also contains all the nonzero DCT coefficients, if any, up to the point indicated by the breakpoint. The remaining nonzero DCT coefficients, up to the end of block (EOB), form part of the enhancement layer. If the number of nonzero coefficients in a block is lower than the

number specified by the breakpoint, an EOB marker is inserted at the end of the base layer, leaving the enhancement layer empty. Sequence, GOP, picture, slice and end-of-sequence headers are also part of the enhancement layer (see figure 6). The replication of these headers into the enhancement layer is the only overhead added to the bitstream. In the case of errors or losses, this information is used to resynchronize the two partitions within the limits of the affected slice. The Adaptive Hierarchical Coding Scheme offers advantages over similar schemes proposed in the past [17], [13]. In [17], the overhead associated with the hierarchical coding scheme presented therein amounts to 20%; this high overhead is due to code words added to the two substreams. In [13], a more efficient implementation is described. However, the associated overhead is still high. For instance, an overhead of 9% is introduced when applying the scheme proposed therein to the flower garden video stream. This has been achieved by including all headers at the beginning of the sequence, but not at each slice, since only the slice headers are needed to maintain proper synchronization.
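The splitting rule for a single block can be sketched as follows. This is a minimal illustration under stated assumptions: a block is represented simply as the list of its nonzero coefficients in scan order, the string `"EOB"` stands in for the end-of-block code, a block with exactly `breakpoint` nonzero coefficients is treated like a shorter one, and the names `split_block` and `EOB` are hypothetical. Headers and macroblock control information are not modeled.

```python
from typing import List, Tuple

EOB = "EOB"  # placeholder for the end-of-block marker


def split_block(nonzero_coeffs: List[int], breakpoint: int) -> Tuple[list, list]:
    """Split one block's nonzero DCT coefficients between the two layers."""
    base = list(nonzero_coeffs[:breakpoint])         # up to the breakpoint
    enhancement = list(nonzero_coeffs[breakpoint:])  # the remaining coefficients
    if enhancement:
        # The block continues in the enhancement layer, which closes it.
        enhancement.append(EOB)
    else:
        # Not enough coefficients to reach the breakpoint: the EOB marker
        # goes into the base layer and the enhancement layer stays empty.
        base.append(EOB)
    return base, enhancement


print(split_block([9, 4, -2, 1, 1, -1, 1], 5))  # ([9, 4, -2, 1, 1], [-1, 1, 'EOB'])
print(split_block([9, 4], 5))                   # ([9, 4, 'EOB'], [])
```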

Figure 6. Information in each partition.

Defining a constant breakpoint for the whole video sequence implies giving priority treatment to B- and P-frames over I-frames. This is because B-type blocks have a larger number of zero-valued coefficients than P- and I-type blocks, and P-type blocks contain a larger number of zero-valued coefficients than I-type blocks. Under our scheme, a different breakpoint is used for each of the three types of frames. Breakpoint corrective factors between 0 and 1 are used for P- and B-frames. Once a breakpoint has been assigned to I-frames, a reduced breakpoint is used to encode P-type blocks, and an even lower corrective factor is applied when encoding B-type blocks. In this way, the number of DCT coefficients of a block that must go into the base layer can be adapted in a more equitable way. With this adaptive system of hierarchical encoding, the traffic distribution between both partitions can be adapted more efficiently.
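The per-frame-type breakpoint can be sketched as a simple scaling. The corrective factors 0.75 (P) and 0.25 (B) are the values used in the experiments reported in this section; the rounding rule and the lower bound of one coefficient are assumptions made here for illustration, as is the function name `adaptive_breakpoint`.

```python
# Corrective factors per frame type; I-frames use the assigned breakpoint as is.
CORRECTIVE_FACTOR = {"I": 1.0, "P": 0.75, "B": 0.25}


def adaptive_breakpoint(i_breakpoint: int, frame_type: str) -> int:
    """Scale the I-frame breakpoint by the corrective factor of the frame type."""
    scaled = i_breakpoint * CORRECTIVE_FACTOR[frame_type]
    # Assumption: round to the nearest integer and keep at least one coefficient.
    return max(1, round(scaled))


print([adaptive_breakpoint(5, t) for t in "IPB"])  # [5, 4, 1]
```

With an I-frame breakpoint of five, this sketch sends up to five coefficients per I-type block, four per P-type block and one per B-type block to the base layer.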

Figure 7 shows the results obtained for the Adaptive Hierarchical Coding Scheme when applied to the flower garden


and mobile calendar video sequences. The bitstream was encoded several times using different breakpoint values. The corrective factors were set to 0.75 and 0.25 for P- and B-type blocks, respectively. The results show the efficiency of the Adaptive Hierarchical Coding Scheme. In the case of the flower garden and mobile calendar sequences, the overhead (OH) introduced by our scheme is less than 1.8% regardless of the Q factor being used, in contrast to the 20% and 9% values reported in [17] and [13] for the same sequence. It can also be observed that, by varying the breakpoint, we can vary the amount of video data to be included in each layer.

Figure 7. Traffic distribution and overhead versus breakpoint. (a) Flower Garden sequence. (b) Mobile Calendar sequence.

The main goal when applying this two-layer coding scheme is to find the optimum breakpoint. The optimum breakpoint is defined as the breakpoint value that maximizes the amount of video data included in the enhancement layer (LP) while the quality of the image reconstructed using only the base layer (HP) remains acceptable. For the sequences studied, it was observed that a breakpoint of five could yield acceptable quality (see figure 8). For instance, for the flower garden and mobile calendar sequences using a breakpoint of five, the enhancement layer accounts for approximately 59%, 46%, 30%, 21% and 13% (in the case of flower garden), and 62%, 52%, 36%, 26% and 17% (in the case of mobile calendar), of the total video data when encoded with a Q factor set to 8, 12, 20, 28 and 40, respectively. This traffic can be discarded in the intermediate buffers of the ATM network before high priority information starts to be discarded.
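The definition of the optimum breakpoint can be read as a small search. Since a larger breakpoint moves more coefficients into the base layer, the enhancement-layer share shrinks as the breakpoint grows, so the optimum is the smallest breakpoint whose base-layer-only reconstruction is still acceptable. The sketch below assumes that acceptability is available as a predicate; in the paper it is judged subjectively, so `base_layer_acceptable` and `optimum_breakpoint` are hypothetical names and the threshold in the usage line is only a stand-in.

```python
from typing import Callable, Iterable, Optional


def optimum_breakpoint(
    candidates: Iterable[int],
    base_layer_acceptable: Callable[[int], bool],
) -> Optional[int]:
    """Return the smallest candidate breakpoint with acceptable base-layer quality."""
    for breakpoint in sorted(candidates):
        if base_layer_acceptable(breakpoint):
            # Smallest acceptable breakpoint = largest enhancement-layer share.
            return breakpoint
    return None  # no candidate yields acceptable quality


# Stand-in predicate: pretend quality becomes acceptable from breakpoint 5 onwards.
print(optimum_breakpoint(range(1, 28, 2), lambda bp: bp >= 5))  # 5
```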

Figure 8. Subjective video quality for different breakpoints (panels: breakpoint = 2 and breakpoint = 5) in Flower Garden and Mobile Calendar video sequences.

V. HIERARCHICAL PACKING SCHEME

In this section, we propose a novel Hierarchical Packing Scheme for the transmission over ATM of hierarchical VBR MPEG-2 video streams such as those presented above. The proposed scheme addresses the following problem: when no provisions are taken to properly pack a hierarchically encoded video stream, the loss of a high priority ATM cell results in the complete loss of a group of cells. In other words, a cell loss translates into the loss of a partial MPEG-2 slice, where a slice is usually a complete line of macroblocks in the image (Spatial Propagation Phenomenon). This is due to the fact that the slice headers are used as the basic resynchronization points in the MPEG-2 video


signal. The macroblocks forming a slice contain information coded differentially with respect to preceding macroblocks. This implies that when a macroblock is lost, all the macroblocks that follow, up to the end of the current slice, lose their predictors and must be discarded. The structure of the slice defined by the MPEG-2 standard does not facilitate efficient concealment of the lost information, as the loss of several macroblocks in the image translates into partial losses of slices, depending on the piece of information lost. The loss of several consecutive macroblocks in a row of the same frame reduces the effectiveness of error concealment mechanisms [5]. Temporal or spatial error concealment mechanisms cannot count on neighboring macroblocks to reconstruct the lost macroblocks by interpolating motion vectors or pixel information. In order to limit the spatial impairment propagation due to the loss of a cell, we have implemented a packing scheme for ATM cells supporting hierarchically encoded video streams [6], [8].

Figure 9. The proposed hierarchical packing scheme. High priority cell: 5-byte header, 1-byte sequence number (SN), 9-bit field, 7-bit field, 45-byte payload (overhead 4.2%). Low priority cell: 5-byte header, 1-byte SN, 47-byte payload (overhead 0%).

The proposed packing scheme increases the ability of the video decoder to resynchronize at the cell level. The problem of efficiently packing video streams into ATM cells lies in the differential coding scheme used to encode the macroblocks. Information such as motion vectors, the DC coefficients of intra macroblocks and the macroblock location in the slice is coded with reference to the previous macroblock. The scheme presented herein is robust in the presence of transmission errors, since each cell carries enough information to decode a macroblock without using the information carried by other cells. This packetization scheme eliminates the spatial propagation of the impairment that is present when using a direct packetization method (see figure 4).

Our packing scheme is shown in figure 9. In our scheme two fields per cell are added to the high priority bitstream packing. The overhead introduced by the scheme consists of a field of 9 bits and a second field of 7 bits. These two fields provide the information necessary to make a cell self-contained.

The 9-bit field specifies the position of the first complete macroblock in the cell. Since the payload can have up to 360 bits, nine bits are required to encode any position in the payload. The 7-bit field contains the absolute position of the macroblock before the first complete macroblock in the cell, that is, the absolute address of the macroblock starting in the previous cell and finishing in the current cell. For the cell to be self-contained and independent of the information contained in other cells, the predictors for motion vectors and for the DC coefficients of intra macroblocks need to be reset for the first complete macroblock in a cell. Under this packetization scheme, the overall overhead consists of the two fields used in each cell as well as the predictors for each new macroblock in a cell. The introduction of two fields in each cell represents an overhead of 4.2% (two bytes out of the 48-byte cell payload), while the amount of overhead introduced by the predictors depends on the characteristics of the sequence.
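The two per-cell fields and their cost can be illustrated with a short sketch. The bit layout chosen here (9-bit offset in the most significant bits, followed by the 7-bit macroblock address, big-endian) is an assumption, since the text fixes only the field widths; the names `pack_hp_fields` and `unpack_hp_fields` are hypothetical.

```python
from typing import Tuple

ATM_PAYLOAD_BYTES = 48   # payload of a 53-byte ATM cell
SN_BYTES = 1             # sequence number carried in every cell
FIELD_BYTES = 2          # 9-bit field + 7-bit field
HP_VIDEO_BYTES = ATM_PAYLOAD_BYTES - SN_BYTES - FIELD_BYTES  # 45 bytes = 360 bits


def pack_hp_fields(first_mb_bit_offset: int, prev_mb_address: int) -> bytes:
    """Pack the 9-bit offset and the 7-bit macroblock address into two bytes."""
    if not 0 <= first_mb_bit_offset <= HP_VIDEO_BYTES * 8:
        raise ValueError("offset must address a bit of the 360-bit payload")
    if not 0 <= prev_mb_address < 2 ** 7:
        raise ValueError("macroblock address must fit in 7 bits")
    word = (first_mb_bit_offset << 7) | prev_mb_address
    return word.to_bytes(FIELD_BYTES, "big")


def unpack_hp_fields(fields: bytes) -> Tuple[int, int]:
    """Recover (first_mb_bit_offset, prev_mb_address) from the two bytes."""
    word = int.from_bytes(fields[:FIELD_BYTES], "big")
    return word >> 7, word & 0x7F


print(unpack_hp_fields(pack_hp_fields(123, 45)))             # (123, 45)
print(round(100 * FIELD_BYTES / ATM_PAYLOAD_BYTES, 1), "%")  # 4.2 %
```

The last line reproduces the 4.2% figure as two bytes out of the 48-byte cell payload.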

For the enhancement layer, a direct packetization method is used, i.e., the bitstream is divided into segments of 47 bytes that are inserted in the ATM cell payload. In this case there is no overhead associated with the packetization of the enhancement layer. We have adopted this scheme since rapid synchronization is not needed at this level. The resynchronization of the low priority sub-bitstream is performed upon receiving the next slice header. In other words, the overhead introduced by the packing scheme is limited to that used for the transport of the high priority sub-bitstream.

Under the proposed packing scheme, the receiving end can make use of the information conveyed by a cell following a lost cell. This is done by discarding the information contained in the incoming cell payload up to the point indicated by the 9-bit field. The receiving end can obtain the absolute position in the slice of the first complete macroblock by using the 7-bit field and the relative address contained in this macroblock. In this way, the first macroblock in the cell is provided with the information needed to decode it, so that the remaining macroblocks in the cell can still be decoded. Using this scheme, we limit the effects of a cell loss to the level of one cell. The decoder is able to resynchronize without having to wait for the next slice header. In the event of losing a low priority cell, the decoder reconstructs the video signal by reading only the high priority sub-bitstream until both sub-bitstreams are resynchronized. In this case, a minimum level of quality is guaranteed by the high priority sub-bitstream. A cell loss is detected at the receiving end using the cell sequence number. For the transmission of the video signal over the ATM network, we assume the use of a single ATM VC for both low and high priority.
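The receiver-side behavior described above can be sketched in two small steps: detecting a gap in the sequence numbers, and skipping the leading part of the payload of the first cell received after the gap. The sketch assumes that the 1-byte sequence number wraps modulo 256 and represents the payload as a string of bits for readability; both choices, and the function names, are illustrative assumptions.

```python
SN_MODULUS = 256  # assumption: the 1-byte sequence number wraps around


def cells_lost(expected_sn: int, received_sn: int) -> int:
    """Number of cells missing between the expected and the received cell."""
    return (received_sn - expected_sn) % SN_MODULUS


def usable_payload_bits(payload_bits: str, first_mb_bit_offset: int,
                        after_loss: bool) -> str:
    """Return the part of a high priority payload that can be decoded."""
    if after_loss:
        # The bits before the offset complete a macroblock that started in
        # the lost cell, so they are discarded; decoding restarts at the
        # first complete macroblock of this cell.
        return payload_bits[first_mb_bit_offset:]
    return payload_bits


print(cells_lost(expected_sn=255, received_sn=1))            # 2 (cells 255 and 0)
print(usable_payload_bits("110010", 4, after_loss=True))     # 10
```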

Figure 10 shows in detail the overhead (OH) required when the Hierarchical Packing Scheme is applied to the MPEG-2 ayersroc and hook video sequences, for different bitstream split levels and different Q factors. The overhead introduced by the implementation of our Hierarchical Packing plus the Adaptive


Hierarchical Coding Scheme, for the optimum breakpoint, is approximately 4%, 5%, 7%, 8% and 9% (in the case of ayersroc), and 3%, 4%, 5%, 7% and 8% (in the case of hook), of the total video data when encoded with a Q factor set to 8, 12, 20, 28 and 40, respectively.

Figure 10. Traffic distribution and overhead versus breakpoint. (a) Ayersroc sequence. (b) Hook sequence.

Figure 11 shows the perceptual impact on the video quality of the proposed Hierarchical Packing Scheme when applied to the ayersroc and hook video sequences. The effectiveness of this scheme translates into better picture quality in the presence of losses. The propagation of the impairment is better controlled by using the Hierarchical Packing Scheme. It is worth mentioning that, on top of the Adaptive Hierarchical Coding Scheme and the Hierarchical Packing Scheme proposed in this study, error/loss concealment schemes can be put in place in the decoder. The main purpose of error concealment schemes is to reduce to a minimum the impact of cell losses on the image quality. Since error/loss concealment techniques regenerate most of the information by means of interpolation, the loss of multiple consecutive macroblocks in a row (as caused by the Spatial Propagation Phenomenon) reduces their effectiveness: the lost macroblocks have no neighboring macroblocks from which to interpolate motion vectors for temporal concealment or pixel information for spatial concealment. Therefore, the implementation of our Hierarchical Packing Scheme is of great help in increasing the error resilience of the protocol architecture.

Figure 11. Perceptual impact on the video quality of the Hierarchical Packing Scheme in Ayersroc, Hook and Martin video sequences.

VI. PRIORITY MECHANISMS IN ATM SWITCHES

In order to achieve several levels of loss performance (QoS requirements), buffer management strategies can be implemented at ATM switches. In particular, Cell Discarding Mechanisms (CDM) can be implemented at an ATM switch [4]. Under these mechanisms, the information carried by high priority (HP) cells is considered by the switches to be more important than the information carried by low priority (LP) cells. Hence, CDMs with two priority classes can be used to reduce the cell loss rate for HP cells at the expense of higher loss rates for LP cells.

Several cell discarding mechanisms have been proposed in the literature. Most of them are variations of two basic schemes: Push-Out [15] and Partial Buffer Sharing (PBS) [18]. Push-Out algorithms come into effect when the buffer of the ATM switch is full. When a low priority cell arrives at a full buffer, it is dropped. When a high priority cell arrives at a full buffer and there is a lower priority cell in the buffer, the lower priority cell is “pushed out” and replaced by the high priority cell. If a high priority cell arrives at a full buffer and there is no lower priority


cell in the buffer, the arriving cell is dropped. Under these algorithms, two main replacement strategies can be defined: FIFO and LIFO, as shown in figure 12. Push-Out algorithms drop cells only when the buffer is full. However, the cost of the algorithm is high: each time a cell must be dropped, the buffer has to be scanned to determine which cell to discard.
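The Push-Out rules can be sketched as a small buffer model. The interpretation of the two replacement strategies is an assumption made here: FIFO replacement pushes out the low priority cell closest to the head of the queue, and LIFO replacement pushes out the most recently queued one. The class name `PushOutBuffer` and the placement of the arriving high priority cell at the tail are likewise illustrative choices.

```python
from collections import deque
from typing import Optional


class PushOutBuffer:
    """Finite buffer with two priority classes and Push-Out cell discarding."""

    def __init__(self, capacity: int, replacement: str = "LIFO") -> None:
        self.capacity = capacity
        self.replacement = replacement
        self.queue = deque()  # holds "HP" / "LP" labels, head at index 0

    def arrive(self, priority: str) -> Optional[str]:
        """Offer a cell; return the class of the cell dropped, or None."""
        if len(self.queue) < self.capacity:
            self.queue.append(priority)
            return None
        if priority == "LP":
            return "LP"  # low priority cell arriving at a full buffer
        # High priority cell at a full buffer: scan for a low priority victim.
        positions = range(len(self.queue))
        if self.replacement == "LIFO":
            positions = reversed(positions)
        for i in positions:
            if self.queue[i] == "LP":
                del self.queue[i]
                self.queue.append("HP")
                return "LP"
        return "HP"  # buffer holds only high priority cells


buf = PushOutBuffer(capacity=3, replacement="LIFO")
for p in ("LP", "HP", "LP"):
    buf.arrive(p)
print(buf.arrive("HP"), list(buf.queue))  # LP ['LP', 'HP', 'HP']
```

The linear scan in `arrive` is the cost mentioned in the text: a victim has to be searched for every time a high priority cell meets a full buffer.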

Figure 12. Push-Out mechanism.

Under the simplest of the PBS mechanisms, the buffer is partitioned into two regions by a threshold T, which represents the proportion of the buffer shared by cells of both classes. Cells of the low priority class are admitted to the buffer if its occupancy is less than T. Otherwise, these cells are dropped. High priority cells can enter the buffer up to its maximum size, denoted by B. Once a cell of either class is admitted to the buffer, it cannot be discarded. Partial Buffer Sharing requires much simpler buffer management than the Push-Out approach, but its performance is highly dependent on the choice of the threshold and on the proportion of the traffic in each priority class. The structure of the PBS scheme is shown in figure 13.
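The PBS admission rule reduces to a single comparison per arriving cell, which the following sketch makes explicit. The function name `pbs_admit` is hypothetical; the numeric values in the usage lines (B = 500 cells, T = 350 cells, i.e. 70% of the buffer) are those used in the simulations of this section.

```python
def pbs_admit(priority: str, occupancy: int, threshold: int, capacity: int) -> bool:
    """Decide whether an arriving cell enters a Partial Buffer Sharing buffer."""
    if priority == "HP":
        return occupancy < capacity   # high priority may use the whole buffer
    return occupancy < threshold      # low priority only below the threshold T


print(pbs_admit("LP", occupancy=349, threshold=350, capacity=500))  # True
print(pbs_admit("LP", occupancy=350, threshold=350, capacity=500))  # False
print(pbs_admit("HP", occupancy=499, threshold=350, capacity=500))  # True
```

No scan of the buffer is needed, which is why PBS is cheaper to implement than Push-Out.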

Figure 13. Partial buffer sharing.

Recently, the Triggered Threshold priority scheme was proposed in [12] (figure 14). Under this scheme, two thresholds are used: a high threshold TH and a low threshold TL. According to its occupancy, the buffer of an ATM switch can be in one of two possible states: the heavy state or the light state. When the buffer is in the light state, all arriving cells are stored in the buffer if the current queue length is smaller than TH, and only the high priority arriving cells are admitted if the current queue length reaches TH. When the buffer is in the heavy state, all arriving cells are stored in the buffer if the current queue length is smaller than the low threshold TL, and only the high priority arriving cells are stored if the current queue length reaches TL. Initially, the buffer is in the light state. When the queue length reaches the high threshold TH, the buffer changes to the heavy state. Once in the heavy state, it changes back to the light state only if the queue length falls below the low threshold TL. The implementation of this mechanism is as simple as that of PBS.
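The two-state behavior of the Triggered Threshold scheme can be sketched as a small state machine with hysteresis. This is an illustrative reading of the rules above, not the implementation of [12]: the state is updated from the current queue length on every arrival, and the class name `TriggeredThreshold` is hypothetical. The thresholds in the usage lines (70% and 30% of a 500-cell buffer) are those used in the simulations of this section.

```python
class TriggeredThreshold:
    """Admission control with a high threshold TH and a low threshold TL."""

    def __init__(self, capacity: int, high: int, low: int) -> None:
        self.capacity = capacity
        self.high = high
        self.low = low
        self.state = "light"  # the buffer starts in the light state

    def admit(self, priority: str, occupancy: int) -> bool:
        """Update the buffer state and decide whether the cell is stored."""
        if self.state == "light" and occupancy >= self.high:
            self.state = "heavy"   # queue length reached TH
        elif self.state == "heavy" and occupancy < self.low:
            self.state = "light"   # queue length fell below TL
        if priority == "HP":
            return occupancy < self.capacity
        # Low priority cells are admitted below TH (light) or below TL (heavy).
        limit = self.high if self.state == "light" else self.low
        return occupancy < limit


tt = TriggeredThreshold(capacity=500, high=350, low=150)
print(tt.admit("LP", 349))  # True  (light state, below TH)
print(tt.admit("LP", 350))  # False (TH reached, buffer becomes heavy)
print(tt.admit("LP", 200))  # False (still heavy: 200 is not below TL)
print(tt.admit("LP", 149))  # True  (below TL, buffer becomes light again)
```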

Figure 14. Triggered threshold mechanism.

In this section, simulation experiments are conducted to study the performance of the different cell discarding mechanisms, in terms of cell loss rate (CLR), when used in an ATM switch supporting hierarchically encoded VBR MPEG-2 video traffic (as generated by our proposals) from a number of independent video sources. The ATM network is modeled as depicted in figure 15. The ATM switch is modeled as a finite buffer with capacity B (in cells) and one server with service rate C, which is determined by the output link capacity. The service rate of the ATM switch is adjusted to obtain a desired level of load. We define the load of the network as the ratio of the aggregate arrival rate from all multiplexed sources to the service rate. A FIFO service discipline is assumed. The four cell discarding mechanisms previously described are used to decide how cells are added to the buffer based on their priority class. The thresholds were set to 70% of the buffer size for PBS, and to 70% and 30% of the buffer size for the triggered threshold mechanism. The buffer size was set to 500 cells. The input to the switch consists of several real prioritized MPEG-2 VBR video sources. The frames arrive at a rate of 30 frames/second. The input video is smoothed within a frame so that the cells of a frame are uniformly spaced. It is assumed that the video data in each frame are packetized into HP and LP ATM cells as described for the hierarchical packing. Three different video sequences are considered, namely flower, tennis and mobile. They have been encoded using a Q factor equal to 20 and a breakpoint equal to 5.
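Two elements of the simulation setup lend themselves to a short worked example: adjusting the service rate to obtain a desired load, and spacing the cells of a frame uniformly over the frame period. The sketch below is a minimal illustration; the function names and the arrival rate used in the usage lines are hypothetical and are not taken from the experiments.

```python
from typing import List


def service_rate_for_load(aggregate_arrival_rate: float, load: float) -> float:
    """Service rate C (cells/s) such that load = aggregate arrival rate / C."""
    return aggregate_arrival_rate / load


def smoothed_cell_times(frame_start: float, cells_in_frame: int,
                        frame_rate: float = 30.0) -> List[float]:
    """Emission times (s) of the cells of one frame, uniformly spaced."""
    spacing = (1.0 / frame_rate) / cells_in_frame
    return [frame_start + k * spacing for k in range(cells_in_frame)]


# Hypothetical aggregate arrival rate of 85,000 cells/s at a target load of 0.85.
print(round(service_rate_for_load(85_000.0, 0.85)))  # 100000
print(smoothed_cell_times(0.0, 3))                   # three cells within 1/30 s
```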


To show the influence of the GOP (Group of Pictures) pattern on the cell losses, we study two multiplexing schemes. In the first scheme, called High Source Alignment, all sources start the GOP at the same time, i.e. all sources transmit their I-, P- and B-frames during the same time intervals. Under the second alternative, referred to as Low Source Alignment, the starting times of the GOPs of the N multiplexed streams are shifted by a time period equal to T a p, i.e., a minimum overlapping of the I-frames is achieved. The two scenarios under consideration, high and low source alignment, represent the worst and best case scenarios. The cell loss rates for any other scenario should lie between the ones obtained in this study.

Figure 15. ATM network simulation environment (N real video sequences with prioritized traffic, OC-3 links at 155.520 Mb/s, an ATM switch implementing one of the cell discarding mechanisms: no cell discarding, Push-Out FIFO, Push-Out LIFO, PBS or Triggered Threshold, and N video displays).

From the results shown in figures 16 and 17, a number of conclusions can be drawn. First, when a number of VBR MPEG-2 video sources are fed into an ATM switch, the cell loss rates are very sensitive to the alignment between the GOP structures of the sources. Cell loss performance improves when video streams are multiplexed with low source alignment and deteriorates when they are multiplexed with high source alignment. For the low source alignment scenario, no HP cell losses were reported for any of the CDMs (figure 16). This is due to the fact that under the low alignment scenario a minimum overlapping of the I-frames (i.e., of the peak rates of the different sources) is achieved. Moreover, since every video frame is split into an HP and an LP partition, HP cells are interleaved with LP cells during the transmission of every video frame, and the CDMs can secure a better service for the HP cells by discarding LP cells. Under the high source alignment scenario (figure 17), it is clear that all control schemes can effectively reduce the loss rate of the high priority cells as compared to the case in which no cell discarding scheme is applied (no CDMs). Figures 16 and 17 also show that all schemes exhibit higher low priority cell loss rates with respect to the no-CDM case. It is clear that the schemes manage to control the loss rates of high priority cells at the expense of discarding low priority cells. Therefore, the perceptual impact of cell losses on the video quality is lower when using CDMs than when not using any CDM. The CDMs thus offer better performance for the high priority cells, demonstrating the effectiveness of the proposed protocol architecture.

[Figure 16 plots: cell loss rate versus load, low source alignment, 5 sources, buffer size = 500 cells. Panels: cell loss rate in HP cells (a) and cell loss rate in LP cells.]

Figure 18 shows the cell loss distributions in terms of loss burst length and distance between bursts for all system configurations under the high source alignment scenario and a system load of 0.85. Since error concealment techniques regenerate most of the lost information by means of interpolation, the length of a loss burst and the distance between losses in high-priority cells are important measures when evaluating cell loss for applications that are particularly sensitive to it, such as MPEG-2 compressed video. Bursty losses and short distances between bursts reduce the effectiveness of the proposed error concealment mechanisms. These results clearly demonstrate that all CDMs are able to reduce the number of high-priority cells lost back-to-back during periods of heavy load. As we will see in the following section, the effectiveness of the error concealment schemes can be severely diminished if cell losses are highly correlated. This is particularly true when the receiving end reconstructs the missing information by using the surrounding information. Most burst losses of high-priority cells were shorter than five cells. Under all CDMs, shorter bursts of high-priority cell losses were obtained than for the system configuration with no CDM in place. Similar conclusions can be drawn for the distance between bursts: all CDMs present longer distances between bursts of high-priority cell losses than the configuration with no CDM in place. Therefore, the perceptual impact of cell losses on the video quality is lower with CDMs than without them, demonstrating the effectiveness of the proposed protocol architecture.
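The two measures discussed above can be extracted from a per-cell loss trace as sketched below. This is a minimal illustration under stated assumptions: the trace format (one boolean per HP cell, `True` meaning lost), the function names `burst_stats` and `empirical_cdf`, and the definition of the distance between bursts as the number of delivered cells separating two consecutive bursts are all choices made here, not details given in the paper.

```python
def burst_stats(lost):
    """Return (burst_lengths, distances) for a loss trace.

    lost: iterable of booleans, True when the corresponding HP cell was lost.
    burst_lengths: length of each run of consecutive losses.
    distances: number of delivered cells between consecutive bursts (assumed definition).
    """
    bursts, distances = [], []
    run = 0            # length of the loss burst currently being scanned
    gap = 0            # delivered cells seen since the last burst ended
    seen_burst = False
    for is_lost in lost:
        if is_lost:
            if run == 0 and seen_burst:
                distances.append(gap)   # a new burst starts: record the gap before it
            run += 1
            gap = 0
        else:
            if run > 0:
                bursts.append(run)      # the current burst has just ended
                seen_burst = True
                run = 0
            gap += 1
    if run > 0:
        bursts.append(run)              # trace ended in the middle of a burst
    return bursts, distances


def empirical_cdf(samples, x):
    """Fraction of samples that are less than or equal to x."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s <= x) / len(samples)


trace = [bool(v) for v in [0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0]]
bursts, distances = burst_stats(trace)
print(bursts, distances)           # [2, 1, 3] [3, 1]
print(empirical_cdf(bursts, 2))    # share of bursts of at most 2 cells
```

Evaluating `empirical_cdf` over a range of `x` values gives curves of the kind plotted in figure 18.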

[Figure 17 plots: cell loss rate versus load, high source alignment, 5 sources, buffer size = 500 cells.]

Figure 17. Cell loss rate in ATM switches implementing CDMs in high source alignment. (a) HP cells. (b) LP cells.


[Figure 18 plots: CDFs for high source alignment, 5 sources, buffer size = 500 cells, load = 0.85. (a) X = loss burst length in HP cells (in cells). (b) X = distance between bursts in HP cells (in cells).]

Figure 18. Cell loss distributions in ATM switches implementing CDMs. (a) Loss burst length. (b) Distance between bursts.

VII. DYNAMIC ERROR CONCEALMENT ALGORITHM

Several error concealment techniques have been proposed in the literature to repair the degradations produced in the transmission of the video signal [1][7][9][16]. The main purpose of error concealment schemes is to limit the impact of cell losses on the image quality. These schemes make use of the redundancy present in the image in the frequency, spatial and temporal domains: part of the lost information can be recovered by means of interpolation in each of these domains. The use of error concealment techniques alone may provide acceptable results for cell loss probabilities not exceeding [value missing in the source]. In the case of higher cell loss probabilities, error concealment techniques must be used together with hierarchical coding schemes to obtain satisfactory results [5][8]. In this section we propose an algorithm able to select the best error concealment strategy according to the characteristics of the video sequence and the particular pieces of information being lost.

Once the effect of cell loss is limited by the schemes previously implemented, we propose a dynamic algorithm for error concealment in HP cells. Different approaches have been proposed for information loss concealment in the frequency, space and time domains [1]. We propose an algorithm that combines these methods, using the most adequate technique for each situation. Spatial concealment has the advantage of producing smooth and consistent edges, which is more favorable from the point of view of the human eye, but concealment in frequency and time can be implemented more simply than spatial concealment.

The most difficult scheme to implement in real time is spatial concealment. If the number of macroblocks to be recovered using a scheme in the space domain exceeds a specific threshold, it is very unlikely that all the operations can be performed within a frame period. However, in most cases not all lost macroblocks need to be recovered by a scheme operating in the space domain. For instance, a macroblock with little motion can be better recovered in the time domain. Moreover, if the macroblock texture is not very fine or detailed, concealment in the frequency domain can be more than enough. An error concealment scheme operating in the space domain is required when the macroblock to be recovered is placed in a very complex context, exhibits high motion, or has no motion vectors available.

DYNAMIC CONCEALMENT ALGORITHM

IF (I-PICTURE)
    IF (NUM_MB_LOST > Threshold1)
        FREQUENCY CONCEALMENT
    ELSE
        IF (TEXTURE > Threshold2)
            SPATIAL CONCEALMENT
        ELSE
            FREQUENCY CONCEALMENT
ELSE
    IF (MOTION > Threshold3 OR NUM_MB_INTRA > Threshold4)
        SPATIAL CONCEALMENT
    ELSE
        TEMPORAL CONCEALMENT

Figure 19. Dynamic concealment algorithm.

In the following, we define a dynamic algorithm that selects the most suitable error concealment domain according to the characteristics of the information to be recovered. The algorithm mainly makes use of the time domain, as it is the most suitable in most cases. To handle scene changes and high motion, a detector is implemented at the image and macroblock levels. The detector recognizes these situations by using the statistical distribution of the motion vectors and the number of intra macroblocks in the image. If the variance of the motion vectors of an image does not exceed a specific threshold, there is no high level of motion in the image. If the number of intra macroblocks in the current inter-coded image does not exceed a specific threshold, a scene change is most likely absent. Temporal concealment is used in these cases; otherwise the scheme decides whether to use frequency or spatial concealment. If the number of macroblocks to be reconstructed is too high, frequency concealment is used, since spatial concealment may cause delay problems; otherwise spatial concealment is used (see figure 19).
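The decision logic of figure 19 can be sketched in a few lines of code. The following is one possible reading of the figure, not the authors' implementation: the function and field names are introduced here, the threshold values in the usage example are hypothetical (the paper does not give them), motion is measured as the variance of a list of motion vector magnitudes, and a picture with no motion vectors available is treated as requiring spatial concealment, following the discussion above.

```python
from dataclasses import dataclass
from statistics import pvariance


@dataclass
class Thresholds:
    """Hypothetical container for Threshold1..Threshold4 of figure 19."""
    max_lost_mbs: int        # Threshold1: lost macroblocks above which spatial is too slow
    texture: float           # Threshold2: texture level above which frequency is not enough
    motion_variance: float   # Threshold3: motion vector variance indicating high motion
    intra_mbs: int           # Threshold4: intra macroblock count indicating a scene change


def select_concealment(is_i_picture, num_mb_lost, texture,
                       motion_magnitudes, num_mb_intra, t):
    """Return 'frequency', 'spatial' or 'temporal' for the lost information."""
    if is_i_picture:
        # Too many macroblocks to repair spatially within a frame period.
        if num_mb_lost > t.max_lost_mbs:
            return "frequency"
        # Fine or detailed texture calls for spatial concealment.
        return "spatial" if texture > t.texture else "frequency"
    # Inter-coded picture: no motion vectors is treated like high motion (assumption).
    if len(motion_magnitudes) == 0:
        return "spatial"
    motion = pvariance(motion_magnitudes)
    if motion > t.motion_variance or num_mb_intra > t.intra_mbs:
        return "spatial"
    return "temporal"


# Usage with purely illustrative threshold values.
t = Thresholds(max_lost_mbs=20, texture=0.5, motion_variance=4.0, intra_mbs=30)
print(select_concealment(True, 50, 0.9, [], 0, t))                 # frequency
print(select_concealment(False, 5, 0.1, [1.0, 1.2, 0.9], 3, t))    # temporal
print(select_concealment(False, 5, 0.1, [0.0, 10.0, 0.0, 10.0], 3, t))  # spatial
```

In a real decoder the thresholds would have to be tuned to the frame period and the processing power available, since the purpose of the first test is to keep spatial concealment within the real-time budget.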

The dynamic algorithm we propose provides better results than the use of a single scheme. We present quantitative results, obtained with the MPQM video quality metric [2][3], for different error concealment techniques applied to the base layer of the hook and martin video sequences (see figure 20). The MPQM metric has been shown to behave consistently with human judgements according to the quality scale that is often used for subjective testing in the engineering community [2][3] (see table 1).

Table 1. Quality scale.

These results show the efficiency of the dynamic algorithm in comparison with others proposed in previous works. Since the hook sequence exhibits high motion, the detector at the macroblock level indicates, in most cases, that a spatial concealment scheme is the most suitable. For this reason, in this sequence spatial concealment performs better than temporal concealment. On the other hand, the martin sequence exhibits low motion, so the detector at the macroblock level indicates, in most cases, that a temporal concealment scheme is the most suitable, and temporal concealment performs better than spatial concealment. In both cases, the dynamic algorithm selects the most suitable error concealment domain according to the characteristics of the video sequence.

We also present subjective results for the flower garden video sequence. Figure 21 shows the effectiveness of the proposed dynamic algorithm. In this case, the area with the flowers has been concealed using a temporal-based scheme, because the motion of the surrounding areas is low. However, the macroblocks lost in the areas surrounding the tree have been concealed using a spatial-based scheme, due to the higher level of motion.

[Figure 20 plots: video quality using only the base layer versus CLR in HP cells, for the Martin and Hook sequences; legend entries recoverable from the source include Temporal and Original.]

Figure 20. Quantitative video quality comparison of different error concealment techniques. (a) Hook sequence. (b) Martin sequence.

Finally, figures 22 and 23 present the overall results in terms of video quality (MPQM metric) when using the different control schemes introduced in this paper. From these results, it is clear that without adequate measures to control the propagation of the impairments, the quality of the video sequence falls below acceptable levels, even at very low network loads. From the overall evaluation of the proposed protocol architecture, it is clear that the control mechanisms introduced herein effectively limit the degradation of the quality of MPEG-2 video streams transmitted through an ATM network. Our global results also show that, when used together with the Adaptive Hierarchical Coding Scheme and the Hierarchical Packing Scheme, Dynamic Concealment may considerably improve the performance of the system, i.e., the video quality.

In summary, our results show that the use of control mechanisms at various levels of the protocol stack is essential for the deployment of error-resilient video applications.

[Figure 21 images, panel labels as recoverable: (b) image with 1% HP cell losses without [...]; (c) image with 1% HP cell losses with the proposed packing scheme; (e) temporal concealment; dynamic concealment.]

Figure 21. Subjective video quality comparison for different error concealment techniques in Flower Garden sequence.

VIII. CONCLUSIONS

In this work, we have introduced an integrated set of protocol and error concealment mechanisms particularly suited to support MPEG-2 video communications. These mechanisms are used to limit the degradation of the quality of MPEG-2 video streams when transmitted through an ATM network in the presence of losses. Our results show that the use of control mechanisms at various levels of the protocol stack is essential for the deployment of error-resilient video applications. In particular, avoiding the simultaneous transmission of I-frames by two or more video sources has a large impact on system performance.

Acknowledgments

We thank Mr. C. J. Van den Branden and Mr. O. Verscheure for providing us with the quality of service metric (MPQM) to evaluate the quality of the video sequences. This work was supported by the Ministry of Education of Spain under CICYT project TIC97-0897-C04-02 and the Natural Sciences and Engineering Council of Canada under grant number OGPIN 334.

[Figure 22 plots: total video quality versus load for the Flower Garden sequence under high source alignment (HSA) and low source alignment (LSA); legend entries recoverable from the source include Hierarchical Packing Scheme and Dynamic Concealment.]

Figure 22. Global video quality results in Flower Garden sequence. (a) High Source Alignment. (b) Low Source Alignment.

REFERENCES

[1] S. Aign and K. Fazel, “Temporal & Spatial Error Concealment Techniques for Hierarchical MPEG-2 Video Coder”, Proc. of IEEE ICC’95, pp. 1778-1783.

[2] C. J. Van den Branden, “A Working Spatio-Temporal Model of the Human Visual System for Image Restoration and Quality Assessment Applications”, Proc. of IEEE ICASSP’96, Atlanta, May 1996, pp. 2293-2296.

[3] C. J. Van den Branden and O. Verscheure, “Perceptual Quality Measure Using a Spatio-Temporal Model of Human Visual System”, Proc. of SPIE, Vol. 2668, San Jose, January 1996, pp. 450-461.

[4] P. Cuenca, A. Garrido, F. Quiles and L. Orozco-Barbosa, “Performance Evaluation of Cell Discarding Mechanisms for the Distribution of VBR MPEG-2 Video over ATM Networks”, IEEE Transactions on Broadcasting, Vol. 44, No. 2, June 1998, pp. 206-215.

[5] P. Cuenca, T. Olivares, A. Garrido, F. Quiles and L. Orozco-Barbosa, “Techniques to Increase MPEG-2 Error Resilience in the VBR Video Transmission over ATM Networks”, Proc. of IEEE ICC’98, pp. 869-873.

[6] P. Cuenca, L. Orozco-Barbosa, L. Wang, A. Garrido and F. Quiles, “Packing Scheme for Layered Coding MPEG-2 Video Transmission over ATM Based Networks”, Proc. IEEE ATM’97 Workshop, pp. 168-177.

[7] P. Cuenca, A. Garrido, F. Quiles, L. Orozco-Barbosa, T. Olivares and M. Lozano, “Dynamic Error Concealment Technique for Hierarchical Coding MPEG-2 Video Transmission over ATM Based Networks”, Proc. of IEEE PACRIM’97, pp. 912-915.

[8] P. Cuenca, A. Garrido, F. Quiles and L. Orozco-Barbosa, “Some Proposals to Improve Error Resilience in the MPEG-2 Video Transmission over ATM Networks”, Proc. IEEE INFOCOM’98, pp. 668-675.

[Figure 23 plots: total video quality versus load for the Mobile Calendar sequence under high source alignment (HSA) and low source alignment (LSA); legend entries recoverable from the source: Coding Quality, Adaptive Hierarchical Coding Scheme, Hierarchical Packing Scheme, Dynamic Concealment.]

Figure 23. Global video quality results in Mobile Calendar sequence. (a) High Source Alignment. (b) Low Source Alignment.

[9] M. Ghanbari and V. Seferidis, “Cell-Loss Concealment in ATM Video Codecs”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 3, No. 3, June 1993, pp. 238-247.

[10] M. Ghanbari and C. J. Hughes, “Packing Coded Video Signals into ATM Cells”, IEEE/ACM Transactions on Networking, Vol. 1, No. 5, October 1993, pp. 505-509.

[11] M. Ghanbari, “Two-Layer Coding of Video Signals for VBR Networks”, IEEE Journal on Selected Areas in Communications, Vol. 7, No. 5, June 1989, pp. 771-781.

[12] T-Y Huang and Y. H. Shu, “Performance Analysis of a New Cell Discarding Scheme in ATM Networks”, Proc. IEEE ICC’97, pp. 205-209.

[13] M. R. Ismail, I. E. Lambadaris, M. Devetsikiotis and A. R. Kaye, “Modelling Prioritized MPEG Video Using TES and a Frame Spreading Strategy for Transmission in ATM Networks”, Proc. IEEE INFOCOM’95, pp. 762-769.

[14] ISO/IEC 13818-2, “Generic Coding of Moving Picture and Associated Audio”, MPEG-2 Draft International Standard, 1994.

[15] A. Y. M. Lin and J. A. Silvester, “Priority Queueing Strategies and Buffer Allocation Protocols for Traffic Control at an ATM Integrated Broadband Switching System”, IEEE Journal on Selected Areas in Communications, Vol. 9, No. 9, December 1991, pp. 1524-1536.

[16] W. Luo and M. El Zarki, “Analysis of Error Concealment Schemes for MPEG-2 Video Transmission over ATM based Networks”, Proc. of SPIE’95, Vol. 2501, pp. 1358-1368.

[17] P. Pancha and M. El Zarki, “MPEG Coding for Variable Bit Rate Video Transmission”, IEEE Communications Magazine, Vol. 32, No. 5, May 1994, pp. 54-66.

[18] D. W. Petr and V. S. Frost, “Nested threshold cell discarding for ATM overload control: optimization under cell loss constraints”, Proc. of IEEE INFOCOM ’9, pp. 1403-1412.
