Scalable Video Model 3.0 - ITU


Scalable Video Model 3.0

Thomas Wiegand

[email protected]

T. Wiegand: Scalable Video Model 3.0

slide 2

Outline

§ Introduction

§ Motion-compensated temporal filtering (MCTF)

§ MCTF extension of H.264/MPEG-4 AVC

§ Spatial Scalability

§ SNR Scalability

§ Combined scalability

§ Summary


Introduction

§ Recent advances in MCTF-based video coding
  • Incorporation of motion compensation into the lifting steps of a temporal filter bank
  • Lifting scheme is invertible
  • Motion compensation with any motion model possible to incorporate
    – Variable block-size motion compensation
    – Sub-sample accurate motion vectors

§ Realization as an extension of H.264/MPEG-4 AVC
  • Highly efficient motion model of H.264/MPEG-4 AVC
  • Lifting steps are similar to motion compensation in B slices
  • Block-based residual coding
  • Open-loop structure of the analysis filter bank offers the possibility to efficiently incorporate scalability


MCTF using the Lifting Representation

Obtaining the high-pass (prediction residual) pictures:

$H_k = S_{2k+1} - P(S_{2k})$

Obtaining the low-pass pictures:

$L_k = S_{2k} + U(S_{2k+1} - P(S_{2k})) = \tfrac{1}{2} S_{2k} + U(S_{2k+1})$, using $U(P(s)) = s/2$

1. When the update step is removed, the presented structure is the same as an open-loop version of a hybrid video codec.

2. When the motion compensation in P( ) and U( ) is not mutually invertible, the low-pass band may contain unwanted signal components.
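The lifting equations can be tried out on a toy example. The following is a minimal sketch, not the codec's implementation: it assumes zero motion, so the motion-compensated operators reduce to P(s) = s and U(h) = h/2, and it treats each picture as a short list of samples. The function names are illustrative only.

```python
# Minimal sketch of one lifting stage (assumption: zero motion, so the
# motion-compensated operators reduce to P(s) = s and U(h) = h / 2).

def predict(even_picture):
    """P: prediction of the odd picture from the even picture."""
    return list(even_picture)

def update(highpass_picture):
    """U: chosen so that U(P(s)) = s / 2."""
    return [h / 2 for h in highpass_picture]

def analyze(even_picture, odd_picture):
    """Return (L_k, H_k) for one picture pair (S_2k, S_2k+1)."""
    high = [o - p for o, p in zip(odd_picture, predict(even_picture))]
    low = [e + u for e, u in zip(even_picture, update(high))]
    return low, high

def synthesize(low, high):
    """Invert the lifting steps in reverse order."""
    even_picture = [l - u for l, u in zip(low, update(high))]
    odd_picture = [h + p for h, p in zip(high, predict(even_picture))]
    return even_picture, odd_picture

s_even, s_odd = [10, 20, 30], [12, 18, 33]
low, high = analyze(s_even, s_odd)
print(high)  # [2, -2, 3]
print(low)   # [11.0, 19.0, 31.5]
print(synthesize(low, high) == (s_even, s_odd))  # True
```

The synthesis simply undoes the update step and then the prediction step, which is why the scheme stays invertible whatever operators P and U are used.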


Motion model of H.264/MPEG-4 AVC

§ Variable block-size motion model

§ Quarter-sample accurate motion vectors

§ Reference picture selection for each macroblock partition

§ Selection of predictive and bi-predictive coding for each macroblock partition

• Predictive: One list of reference pictures - mP0 and rP0

• Bi-predictive: Two lists of reference pictures - mP0, rP0, mP1, and rP1

[Figure: macroblock modes 16x16, 16x8, 8x16, 8x8; sub-macroblock modes 8x8, 8x4, 4x8, 4x4; additional modes Direct and Intra]


Prediction and Update operators

§ Adaptive switching between the lifting representations of the Haar and the 5/3 spline filters on a block basis

§ Incorporation of multiple reference picture capabilities

§ Incorporation of intra macroblock modes

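Transform            Prediction / Update       (w0, w1)
Identity transform   Intra mode / no update    (0, 0)
Haar wavelet         Predictive / update       (2, 0) or (0, 2)
5/3 spline wavelet   Bi-predictive / update    (1, 1)

$P(S[x, 2k]) = \tfrac{1}{2} \left( w_0 \cdot S[x + m_{P0},\, 2k - 2 r_{P0}] + w_1 \cdot S[x + m_{P1},\, 2k + 2 + 2 r_{P1}] \right)$

$U(H[x, k]) = \tfrac{1}{4} \left( w_0 \cdot H[x + m_{U0},\, k + r_{U0}] + w_1 \cdot H[x + m_{U1},\, k - 1 - r_{U1}] \right)$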
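To make the role of the weights concrete, here is a small sketch of the two operators for one-dimensional pictures with integer motion vectors. It follows a reading of the slide's table and formulas in which (w0, w1) is (0, 0) for intra, (2, 0) or (0, 2) for predictive and (1, 1) for bi-predictive blocks. The border clamping, the function names and the sample data are assumptions made for illustration.

```python
# Sketch of the weighted prediction and update operators (1-D pictures,
# integer motion; border clamping is an assumption, not from the slides).

WEIGHTS = {
    "intra": (0, 0),             # identity transform, no update
    "predictive_list0": (2, 0),  # Haar, reference from list 0
    "predictive_list1": (0, 2),  # Haar, reference from list 1
    "bi_predictive": (1, 1),     # 5/3 spline
}

def sample(picture, x):
    """Read a sample, clamping x to the picture borders."""
    return picture[min(max(x, 0), len(picture) - 1)]

def predict_sample(S, x, k, w0, w1, m_p0=0, r_p0=0, m_p1=0, r_p1=0):
    """P = 1/2 (w0 S[x + mP0, 2k - 2 rP0] + w1 S[x + mP1, 2k + 2 + 2 rP1])."""
    total = 0.0
    if w0:
        total += w0 * sample(S[2 * k - 2 * r_p0], x + m_p0)
    if w1:
        total += w1 * sample(S[2 * k + 2 + 2 * r_p1], x + m_p1)
    return total / 2

def update_sample(H, x, k, w0, w1, m_u0=0, r_u0=0, m_u1=0, r_u1=0):
    """U = 1/4 (w0 H[x + mU0, k + rU0] + w1 H[x + mU1, k - 1 - rU1])."""
    total = 0.0
    if w0:
        total += w0 * sample(H[k + r_u0], x + m_u0)
    if w1:
        total += w1 * sample(H[k - 1 - r_u1], x + m_u1)
    return total / 4

S = [[10, 20, 30], [0, 0, 0], [30, 40, 50]]   # pictures S_0, S_1, S_2
print(predict_sample(S, 1, 0, *WEIGHTS["predictive_list0"]))  # 20.0
print(predict_sample(S, 1, 0, *WEIGHTS["bi_predictive"]))     # 30.0
print(predict_sample(S, 1, 0, *WEIGHTS["intra"]))             # 0.0
```

With the Haar weights the prediction is a plain copy of the reference sample and the update is half of the high-pass sample, which matches the condition U(P(s)) = s/2.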


Derivation of Update Operators

§ Design goal: derive motion vectors and reference picture indices that can be fed to the unchanged H.264/MPEG-4 AVC motion compensation while keeping coding efficiency as high as possible

§ For each 4x4 luma block B4x4 in the picture U(Hk), derive mU0, rU0, mU1, and rU1 as follows:
  1. Evaluate all mP0 and mP1 that point into B4x4
  2. Select those mP0 and mP1 that use the maximum number of samples of B4x4 for reference
  3. Set mU0 = –mP0 and mU1 = –mP1
  4. Set rU0 and rU1 to point to those pictures into which MC is conducted using mP0 and mP1, respectively
  5. Harmonize the derived mU0, rU0, mU1, and rU1 with the H.264/MPEG-4 AVC syntax

[Figure: block B4x4 in picture U(Hk), shown together with pictures X and Y]
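Steps 1 to 3 of the derivation can be sketched as follows for a single reference list. This is a simplified illustration under stated assumptions: every prediction block is 4x4, motion vectors are in integer samples (the codec uses quarter-sample accuracy), ties are resolved in favour of the first candidate, and the reference index handling of steps 4 and 5 is left out. All names are hypothetical.

```python
# Sketch: derive the update motion vector of one 4x4 block by inverting the
# prediction vector that references the most samples of that block.

def overlap(ax, ay, bx, by, size=4):
    """Number of samples shared by two size x size blocks at (ax, ay), (bx, by)."""
    width = max(0, min(ax, bx) + size - max(ax, bx))
    height = max(0, min(ay, by) + size - max(ay, by))
    return width * height

def derive_update_mv(target_xy, prediction_blocks, size=4):
    """prediction_blocks: list of ((block_x, block_y), (mv_x, mv_y)).

    Returns the inverted vector (-mv_x, -mv_y) of the best candidate, or None
    if no prediction vector points into the target block (no update)."""
    tx, ty = target_xy
    best_mv, best_area = None, 0
    for (px, py), (mx, my) in prediction_blocks:
        # Region of the reference picture that this prediction block reads.
        area = overlap(tx, ty, px + mx, py + my, size)
        if area > best_area:
            best_mv, best_area = (mx, my), area
    if best_mv is None:
        return None
    return (-best_mv[0], -best_mv[1])

candidates = [((4, 4), (1, 0)), ((8, 4), (-3, 1)), ((0, 0), (0, 0))]
print(derive_update_mv((4, 4), candidates))  # (-1, 0)
```

In this example the first candidate covers 12 samples of the target block, the second 9 and the third none, so the first vector is selected and inverted.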


Temporal Coding Structure

§ Group of N0 input pictures partitioned into two sets: Set 1: NA (0 < NA < N0), Set 2: NB = N0 – NA
§ Set 1: pictures Ak, Set 2: pictures Bk
§ Pictures Hk are spatially shift-aligned with pictures Bk
§ Pictures Lk are spatially shift-aligned with pictures Ak

[Figure: three example partitionings: NA = NB, NA = 2NB, 2NA = NB]


Temporal Decomposition Structure

[Figure: temporal decomposition structure, with the GOP boundary marked]
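As a rough illustration of such a decomposition, the sketch below counts the subband pictures per level. It assumes the dyadic case only (NA = NB at every stage) and a GOP size that is a power of two. The function name and the H/L level labels are chosen to match the L3, H3, H2, H1 naming used later in the slides.

```python
# Sketch: number of subband pictures per level for a dyadic decomposition
# (assumption: NA = NB at every stage, GOP size is a power of two).

def dyadic_subbands(gop_size):
    counts = {}
    level, remaining = 0, gop_size
    while remaining > 1:
        level += 1
        remaining //= 2               # half of the pictures become high-pass
        counts[f"H{level}"] = remaining
    counts[f"L{level}"] = remaining   # low-pass pictures left after the last stage
    return counts

print(dyadic_subbands(8))  # {'H1': 4, 'H2': 2, 'H3': 1, 'L3': 1}
```

Dropping the H1 pictures halves the frame rate, dropping H1 and H2 quarters it, and so on, which is how the scheme provides temporal scalability.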


Impacts on H.264/MPEG-4 AVC

§ Syntax is not changed
  • High-pass pictures and motion fields are coded as B pictures
  • Low-pass pictures are coded as I or P pictures

§ Decoding process
  • Motion-compensated update with extended bit-depth
  • Derivation process for update motion fields
    – Inversion of prediction motion fields
    – Block-wise motion compatible with H.264/MPEG-4 AVC
  • Without update: H.264/MPEG-4 AVC compatible coder with open-loop control

§ Encoding process
  • Lagrangian methods as in the H.264/MPEG-4 AVC test model
  • Cascading of quantization parameters according to the scaling factors of the analysis filter bank


Simulation Results for the MCTF Extension

[Figure: Mobile, CIF, 30 Hz. Y-PSNR [dB] (axis from 26 to 37) versus bit rate [kbit/s] (axis from 0 to 1728). Curves: H.264/MPEG-4 AVC IBBP..., MCTF extension, MCTF extension w/o update]


Simulation Results for the MCTF Extension

[Figure: Football, CIF, 30 Hz. Y-PSNR [dB] (axis from 27 to 40) versus bit rate [kbit/s] (axis from 0 to 2304). Curves: H.264/MPEG-4 AVC IBBP..., MCTF extension, MCTF extension w/o update]


SNR-Scalable Extension

§ Layered representation of subband pictures
  • Residual coding of the quantization error between the original subband pictures and their base layer reconstruction
  • Efficient for coarse-grained SNR scalability layers

[Figure: block diagram of the SNR-scalable coder. A group of pictures is decomposed by motion-compensated temporal filtering (MCTF) into temporal subband pictures. SNR base layer (layer 0): transform, scaling / quantization, entropy coding, inverse scaling / inverse transform, MC / intra prediction. SNR enhancement layers (layer 1 to layer N-1): transform, scaling / quantization, entropy coding, with inverse scaling / inverse transform feeding the next layer]
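The layering idea, where each enhancement layer codes the quantization error left by the layers below it, can be sketched on a single subband sample. This is an illustration only: it uses a plain round-to-nearest scalar quantizer and arbitrary step sizes, skips the transform, prediction and entropy coding stages, and all function names are hypothetical.

```python
import math

# Sketch: layered coding of one sample. Each layer quantizes the error that
# remains after reconstructing all lower layers (uniform rounding quantizer
# is an assumption made for illustration).

def quantize(value, step):
    return math.floor(value / step + 0.5)

def encode_layers(sample, steps):
    """Return the quantization levels, one per layer (base layer first)."""
    levels, reconstruction = [], 0.0
    for step in steps:
        level = quantize(sample - reconstruction, step)
        levels.append(level)
        reconstruction += level * step
    return levels

def decode_layers(levels, steps):
    """Reconstruct from however many layers were received."""
    return sum(level * step for level, step in zip(levels, steps))

steps = [16, 8, 4]                       # base layer, then two enhancement layers
levels = encode_layers(37.0, steps)
print(levels)                            # [2, 1, -1]
print(decode_layers(levels[:1], steps))  # 32
print(decode_layers(levels[:2], steps))  # 40
print(decode_layers(levels, steps))      # 36
```

A decoder that receives only the base layer still gets a valid, coarser reconstruction, and each further layer halves the maximum error in this example.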


Adjustment of the Quantization Parameter

§ Enhancement Layer
§ Difference between original and reconstructed subband pictures after decoding the base layer (or a previous enhancement layer)
§ Difference pictures are coded using the residual picture syntax

§ Small performance losses if the quantization parameter is decreased by a value of 6 from one layer to the next
§ Doubling of the bit-rate (approximately)

Subband picture             L3        H3        H2        H1
Base layer coding           QPL3      QPH3      QPH2      QPH1
Enhancement layer coding    QPL3-∆    QPH3-∆    QPH2-∆    QPH1-∆
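The choice of a QP difference of 6 corresponds to halving the quantization step size, because the H.264/MPEG-4 AVC step size doubles for every increase of 6 in QP. The sketch below shows this mapping. The base-layer QP values in the example are made up for illustration and are not taken from the slides.

```python
# Sketch: H.264/MPEG-4 AVC quantization step size as a function of QP, and
# the enhancement-layer QPs obtained with a difference of 6.

QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]  # step sizes for QP 0..5

def qstep(qp):
    """Step size doubles for every increase of 6 in QP."""
    return QSTEP_BASE[qp % 6] * 2 ** (qp // 6)

def enhancement_qps(base_qps, delta=6):
    """Lower every subband QP by delta for the next SNR layer."""
    return {name: qp - delta for name, qp in base_qps.items()}

base = {"L3": 28, "H3": 31, "H2": 34, "H1": 37}   # hypothetical base-layer QPs
enh = enhancement_qps(base)
print(enh)                                        # {'L3': 22, 'H3': 25, 'H2': 28, 'H1': 31}
print(qstep(base["L3"]), qstep(enh["L3"]))        # 16.0 8.0
```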


Impacts on H.264/MPEG-4 AVC

§ Syntax
  • Additional slice type for coding of residual pictures
  • Re-use of parts of the macroblock syntax: CBP, residual coding

§ Decoding process
  • Importing data from subordinate SNR layers
  • Motion compensation after adding up the residual signals

§ Encoding process
  • Consider entire range of rate points
  • Carefully trade off bit-rates for motion and residual data
  • Experience shows close to single-layer efficiency for all supported bit-rates


Results for Layered SNR-Scalability

[Figure: Mobile, CIF, 30 Hz. Y-PSNR [dB] (axis from 26 to 37) versus bit rate [kbit/s] (axis from 0 to 1728). Curves: H.264/MPEG-4 AVC IBBP..., MCTF extension, SNR-scalable MCTF extension]


Results for Layered SNR-Scalability

[Figure: Football, CIF, 30 Hz. Y-PSNR [dB] (axis from 26 to 40) versus bit rate [kbit/s] (axis from 0 to 2304). Curves: H.264/MPEG-4 AVC IBBP..., MCTF extension, SNR-scalable MCTF extension]


Spatial Scalability

§ Lowest spatial resolution is H.264/MPEG-4 AVC conforming: MCTF without update step

§ Oversampled pyramid with independent MCTF for each resolution: QCIF, CIF, 4CIF, 16CIF, ...

§ Switchable prediction mechanisms for conveying information from lower spatial layer to the next
  • Motion information prediction by defining two additional macroblock modes and a switchable motion vector prediction
  • Prediction of residual information (high-pass signals) using the upsampled residual signal of the lower resolution layer
  • Intra prediction using the reconstructed signal of the lower resolution layer


Spatial Prediction of Data

§ Upsample partitioning and motion vectors and use them for prediction (keep list 0, list 1, bi-predictive and reference indices information)

§ Block-wise upsample residual picture using bi-linear filter

§ Upsample reconstructed intra macroblock using half-pel filter of H.264/MPEG-4 AVC

[Figure: upsampling of the macroblock partitioning between spatial layers. Labels: Direct, 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4 and Intra for the partitioning being upsampled; 16x16, 16x8, 8x16, 8x8 and Intra-BL for the result]
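The first bullet can be illustrated with a short sketch of motion upsampling between two spatial layers. It assumes a resolution ratio of 2 in each dimension (for example QCIF to CIF), scales positions, partition sizes and motion vectors by that factor, and keeps the list usage and reference indices unchanged. The `Partition` class and its field names are hypothetical.

```python
from dataclasses import dataclass

# Sketch: upsample one partition of the lower spatial layer by a factor of 2.
# Motion vectors stay in the same sub-sample units, so they are doubled too.

@dataclass
class Partition:
    x: int
    y: int
    width: int
    height: int
    motion: dict   # list number (0 or 1) -> ((mv_x, mv_y), reference index)

def upsample_partition(p, factor=2):
    motion = {
        lst: ((factor * mv[0], factor * mv[1]), ref_idx)   # reference index kept
        for lst, (mv, ref_idx) in p.motion.items()
    }
    return Partition(factor * p.x, factor * p.y,
                     factor * p.width, factor * p.height, motion)

low = Partition(x=8, y=0, width=8, height=8, motion={0: ((3, -2), 1)})
print(upsample_partition(low))
# Partition(x=16, y=0, width=16, height=16, motion={0: ((6, -4), 1)})
```

An 8x8 partition of the lower layer thus becomes a 16x16 partition of the next layer, and a bi-predictive partition keeps both of its list entries.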


Results for Layered Combined Scalability: City

[Figure: Y-PSNR [dB] (axis from 30 to 39) at six test points, comparing H.264/MPEG-4 AVC IBBP... and SVM 3.0. Test points: QCIF 15 Hz at 64 kbit/s; QCIF 15 Hz at 128 kbit/s; CIF 30 Hz at 256 kbit/s; CIF 30 Hz at 512 kbit/s; 4CIF 30 Hz at 1024 kbit/s; 4CIF 60 Hz at 2048 kbit/s]


Results for Layered Combined Scalability: Crew

[Figure: Y-PSNR [dB] (axis from 30 to 38) at six test points, comparing H.264/MPEG-4 AVC IBBP... and SVM 3.0. Test points: QCIF 15 Hz at 96 kbit/s; QCIF 15 Hz at 192 kbit/s; CIF 30 Hz at 384 kbit/s; CIF 30 Hz at 750 kbit/s; 4CIF 30 Hz at 1500 kbit/s; 4CIF 60 Hz at 3000 kbit/s]


Fine Granular Scalability

§ Layered representation of subband pictures
  • Non-scalable base layer representation
    – Motion information
    – Base layer representation of intra and residual data
    – Minimal acceptable reconstruction quality
  • Quality-scalable enhancement layer representations
    – Residue between original and base layer representation
    – FGS packets can be truncated at arbitrary points
    – FGS packet: the refinement signal corresponds to a bisection of the quantization step size
    – Refinement signals are directly coded in the transform domain
    – Single inverse transform at the decoder side


Fine Granular Scalability

§ Coding of enhancement layer packets
  • Re-quantization of transform coefficients (similar to Redmond HHI proposal)
  • Usage of CABAC contexts as specified in H.264/MPEG-4 AVC with minor modifications
  • 6 CABAC contexts added
  • Modification of the transmission order of transform coefficient levels
  • Differentiation between significant and non-significant transform coefficient levels
  • Non-significant transform coefficient levels are basically coded using CABAC
  • For significant transform coefficients, only a refinement symbol (-1, 0, 1) is transmitted
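Why a refinement symbol from (-1, 0, 1) is enough when the quantization step size is bisected can be seen from a small numerical sketch. It assumes a plain round-to-nearest quantizer, which is simpler than the quantizer actually used in the codec, and the function names are illustrative.

```python
import math

# Sketch: refining a quantization level when the step size is halved.
# With round-to-nearest quantization the new level is 2 * old level plus a
# symbol in {-1, 0, 1} (assumption: uniform quantizer without dead zone).

def quantize(coefficient, step):
    return math.floor(coefficient / step + 0.5)

def refine(coefficient, level, step):
    """Return (refinement symbol, level at step / 2)."""
    fine_level = quantize(coefficient, step / 2)
    return fine_level - 2 * level, fine_level

coefficient, step = 37.0, 16
level = quantize(coefficient, step)                 # 2, reconstruction 32
symbol, fine = refine(coefficient, level, step)
print(symbol, fine, fine * step / 2)                # 1 5 40.0
symbol, finer = refine(coefficient, fine, step / 2)
print(symbol, finer, finer * step / 4)              # -1 9 36.0
```

The decoder only needs the symbol: it doubles the level it already has, adds the symbol and applies the halved step size.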


Fine Granular Scalability

§ Three scans for coding of transform coefficients
  • 1. non-significant transform coefficients of significant blocks
  • 2. refinement symbols for already significant coefficients
  • 3. coefficients of non-significant blocks

§ Scanning pattern for each of the three scans
  • Scanning from low to high frequency bands (zig-zag)
    – Mapping of 8x8 blocks onto four 4x4 blocks
  • Inside each band: first luma then chroma in raster scan

§ Coding of transform coefficient levels
  • Non-significant coefficients
    – Coded Block Pattern, Coded Block Bit, (Delta QP, Transform Size)
    – CABAC symbols (SIG, LAST, SIGN, ABS)
  • Significant coefficients
    – Refinement symbol (-1, 0, +1)
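The assignment of coefficients to the three scans can be sketched as below. The sketch assumes that each block is given as a list of base-layer levels already in zig-zag order, that a coefficient is significant when its base-layer level is non-zero, and that a block is significant when it contains at least one such coefficient. It visits the coefficients band by band and ignores the luma/chroma ordering; the function name is hypothetical.

```python
# Sketch: sort coefficient positions into the three FGS scans.
# blocks: list of blocks, each a list of base-layer levels in zig-zag order.

def classify_coefficients(blocks):
    scans = {1: [], 2: [], 3: []}
    significant_block = [any(levels) for levels in blocks]
    bands = len(blocks[0]) if blocks else 0
    for band in range(bands):                      # low to high frequency
        for b, levels in enumerate(blocks):        # blocks in raster order
            if not significant_block[b]:
                scans[3].append((b, band))         # coefficient of a non-significant block
            elif levels[band] == 0:
                scans[1].append((b, band))         # non-significant, significant block
            else:
                scans[2].append((b, band))         # refinement of a significant coefficient
    return scans

blocks = [[3, 0, -1, 0], [0, 0, 0, 0]]
print(classify_coefficients(blocks))
# {1: [(0, 1), (0, 3)], 2: [(0, 0), (0, 2)], 3: [(1, 0), (1, 1), (1, 2), (1, 3)]}
```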


Simulation Results: City

[Figure: City. Y-PSNR [dB] (axis from 30 to 39) versus bit-rate [kbit/s] (axis from 0 to 2048). Curves: QCIF15, CIF7.5, CIF15, CIF30, 4CIF15, 4CIF30, 4CIF60; Redmond test points marked]


Simulation Results: Crew

[Figure: Crew. Y-PSNR [dB] (axis from 32 to 37) versus bit-rate [kbit/s] (axis from 0 to 3072). Curves: QCIF 15Hz, CIF 7.5Hz, CIF 15Hz, CIF 30Hz, 4CIF 15Hz, 4CIF 30Hz, 4CIF 60Hz; Redmond test points marked]


Summary

§ Straightforward MCTF extension of H.264/MPEG-4 AVC
  • Addition of motion-compensated update steps
  • Usage of H.264/MPEG-4 AVC tools as specified in the standard

§ Scalability
  • Layered representation of subband pictures
  • Addition of a slice type for coding residual signals
  • Layered SNR scalability or FGS
  • Usage of the residual coding tools of H.264/MPEG-4 AVC
  • One MC loop only when restricting prediction from base-layer reconstructed signals to intra
  • Temporal scalability is inherently provided by the MCTF scheme
  • Spatial scalability using an oversampled pyramid structure and independent temporal decomposition in each spatial layer with inter-layer prediction mechanisms