Scalable Video Model 3.0 - ITU
-
Upload
khangminh22 -
Category
Documents
-
view
0 -
download
0
Transcript of Scalable Video Model 3.0 - ITU
T. Wiegand: Scalable Video Model 3.0
slid
e 2
Outline
§ Introduction
§ Motion-compensated temporal filtering (MCTF)
§ MCTF extension of H.264/MPEG-4 AVC
§ Spatial Scalability
§ SNR Scalability
§ Combined scalability
§ Summary
T. Wiegand: Scalable Video Model 3.0
slid
e 3
Introduction
§ Recent advances in MCTF-based video coding• Incorporation of motion compensation into the lifting steps of a
temporal filter bank• Lifting scheme is invertible• Motion compensation with any motion model possible to
incorporate– Variable block-size motion compensation– Sub-sample accurate motion vectors
§ Realization as an extension of H.264/MPEG-4 AVC• Highly efficient motion model of H.264/MPEG-4 AVC• Lifting steps are similar to motion compensation in B slices• Block-based residual coding• Open-loop structure of the analysis filter bank offers the
possibility to efficiently incorporate scalability
T. Wiegand: Scalable Video Model 3.0
slid
e 4
MCTF using the Lifting Representation
Obtaining the high-pass (prediction residual) picturesHk = S2k+1 - P(S2k)
Lk = S2k + U(S2k+1 - P(S2k)) = ½ S2k + U(S2k+1)Obtaining the low-pass pictures
U(P(s)) = s/2
1.When the update step is removed, the presented structure is the same as an open-loop version of a hybrid video codec
2.When the MC in P( ) and U( ) are not invertible against each other, the low-pass band may contain unwanted signal components
T. Wiegand: Scalable Video Model 3.0
slid
e 5
Motion model of H.264/MPEG-4 AVC
§ Variable block-size motion model
§ Quarter-sample accurate motion vectors
§ Reference picture selection for each macroblock partition
§ Selection of predictive and bi-predictive coding for each macroblockpartition
• Predictive: One list of reference pictures - mP0 and rP0
• Bi-predictive: Two lists of reference pictures - mP0, rP0, mP1, and rP1
16x16 16x88x16 8x8
8x8 8x44x8 4x4
Macroblock modes
Sub-macroblock modes
Direct
Direct
Intra
T. Wiegand: Scalable Video Model 3.0
slid
e 6
Prediction and Update operators
§ Adaptive switching between the lifting representations of the Haarand the 5/3 spline filters on a block-basis
§ Incorporation of multiple reference picture capabilities
§ Incorporation of intra macroblock modes
Identity transform
Haar wavelet
5/3 spline wavelet
Transform
20
Intra mode / no update00
Predictive / update 02
Bi-predictive / update11
Prediction / Updatew1w0
( )( ) )]1,[],[(],[
)]222,[]22,[(]2,[
11100041
11100021
−−+⋅+++⋅=
+++⋅+−+⋅=
UUUU
PPPP
rkHwrkHwkH
rkSwrkSwkSmxmxxU
mxmxxP
T. Wiegand: Scalable Video Model 3.0
slid
e 7
Derivation of Update Operators
§ Design goal: derive motion vectors and reference picture indicesthat can be input to H.264/MPEG-4 AVC motion compensation without having to change it at highest coding efficiency
§ For each 4x4 luma block B4x4 in the picture U(Hk), derive mU0, rU0, mU1, and rU1 as follows1. Evaluate all mP0 and mP1 that point into B4x4
2. Select those mP0 and mP1 that use maximum number of samples for reference out of B4x4
3. Set mU0 = –mP0 and mU1 = –mP1
4. Set rU0 and rU1 to point to those pictures into which MC is conducted using mP0 and mP1, respectively
5. Harmonize derived mU0, rU0, mU1, and rU1with H.264/MPEG-4 AVC syntax
B4x4
Picture Y
Picture X
Picture U(Hk)
T. Wiegand: Scalable Video Model 3.0
slid
e 8
Temporal Coding Structure
§Group of N0 input pictures partitioned into two sets: Set 1: NA (0 < NA < N0), Set 2: NB = N0 – NA§Set 1: pictures Ak, Set 2: pictures Bk§Pictures Hk are spatially shift-aligned with pictures Bk§Pictures Lk are spatially shift-aligned with pictures Ak
NA = NB NA = 2NB 2NA = NB
T. Wiegand: Scalable Video Model 3.0
slid
e 10
Impacts on H.264/MPEG-4 AVC
§ Syntax is not changed• High-pass pictures and motion fields are coded as B pictures• Low-pass pictures are coded as I or P pictures
§ Decoding process• Motion-compensated update with extended bit-depth• Derivation process for update motion fields
– Inversion of prediction motion fields– Block-wise motion compatible with the H.264/MPEG-4 AVC
• Without update: H.264/MPEG-4 AVC compatible coder with open-loop control
§ Encoding process• Lagrangian methods as in the H.264/MPEG-4 AVC test model• Cascading of quantization parameters according to the scaling
factors of the analysis filter bank
T. Wiegand: Scalable Video Model 3.0
slid
e 11
Simulation Results for the MCTF Extension
Mobile CIF 30Hz
26
27
28
29
30
31
32
33
34
35
36
37
0 192 384 576 768 960 1152 1344 1536 1728bit rate [kbit/s]
Y-P
SN
R [
dB
]
H.264/MPEG-4 AVC IBBP...
MCTF extension
MCTF extension w/o update
T. Wiegand: Scalable Video Model 3.0
slid
e 12
Simulation Results for the MCTF Extension
Football CIF 30Hz
27
28
29
30
31
32
33
34
35
36
37
38
39
40
0 256 512 768 1024 1280 1536 1792 2048 2304bit rate [kbit/s]
Y-P
SN
R [
dB
]
H.264/MPEG-4 AVC IBBP...
MCTF extension
MCTF extension w/o update
T. Wiegand: Scalable Video Model 3.0
slid
e 13
SNR-Scalable Extension
§ Layered representation of subband pictures• Residual coding of the quantization error between the original subband
pictures and their base layer reconstruction• Efficient for coarse grains of scalable SNR layers
Transform,Scal. / Quant.
EntropyCoding
Inv. Scaling ,Inv. Transform
MC / IntraPrediction
+
++
-
-
Transform,Scal. / Quant.
EntropyCoding
Inv. Scaling,Inv. Transform+-
Transform,Scal. / Quant.
EntropyCoding
SNR Base Layer (Layer 0)
SNR Enhancement Layer (Layer 1)
SNR Enhancement Layer (Layer N-1)
General
Decomposition
of a Group of
Pictures using
Motion-Comp.
Temporal
Filtering
(MCTF) ...
Group of Pictures
.
.
.
Tem
pora
l sub
band
Pic
ture
s
.
.
.
Transform,Scal. / Quant.
EntropyCoding
Inv. Scaling ,Inv. Transform
MC / IntraPrediction
+
++
-
-
Transform,Scal. / Quant.
EntropyCoding
Inv. Scaling,Inv. Transform+-
Transform,Scal. / Quant.
EntropyCoding
SNR Base Layer (Layer 0)
SNR Enhancement Layer (Layer 1)
SNR Enhancement Layer (Layer N-1)
General
Decomposition
of a Group of
Pictures using
Motion-Comp.
Temporal
Filtering
(MCTF) ...
Group of Pictures
.
.
.
Tem
pora
l sub
band
Pic
ture
s
.
.
.
T. Wiegand: Scalable Video Model 3.0
slid
e 14
Adjustment of the Quantization Parameter
§ Enhancement Layer§ Difference between original and reconstructed subband pictures
after decoding the base layer (or a previous enhancement layer)§ Difference pictures are coded using the residual picture syntax
§ Small performance losses if the quantization parameter is decreased by a value of 6 from one layer to the next§ Doubling of the bit-rate (approximately)
L3H3 H2 H1
Base LayerCoding
QPL3 QPH3 QPH2 QPH1
EnhancementLayer Coding
QPL3-∆ QPH3-∆ QPH2-∆ QPH1-∆
T. Wiegand: Scalable Video Model 3.0
slid
e 15
Impacts on H.264/MPEG-4 AVC
§ Syntax• Additional slice type for coding of residual pictures• Re-use of parts of the macroblock syntax: CBP, residual coding
§ Decoding process• Importing data from subordinate SNR layers• Motion-compensation after adding up the residual signals
§ Encoding process• Consider entire range of rate points• Carefully trade-off bit-rates for motion and residual data• Experience shows closed to single layer efficiency for all
supported bit-rates
T. Wiegand: Scalable Video Model 3.0
slid
e 16
Results for Layered SNR-ScalabilityMobile CIF 30Hz
26
27
28
29
30
31
32
33
34
35
36
37
0 192 384 576 768 960 1152 1344 1536 1728bit rate [kbit/s]
Y-P
SN
R [
dB
]
H.264/MPEG-4 AVC IBBP...
MCTF extension
SNR-scalable MCTF extension
T. Wiegand: Scalable Video Model 3.0
slid
e 17
Results for Layered SNR-ScalabilityFootball CIF 30Hz
2627282930313233
34353637383940
0 256 512 768 1024 1280 1536 1792 2048 2304bit rate [kbit/s]
Y-P
SN
R [
dB
]
H.264/MPEG-4 AVC IBBP...
MCTF extension
SNR-scalable MCTF extension
T. Wiegand: Scalable Video Model 3.0
slid
e 18
Spatial Scalability
§ Lowest spatial resolution is H.264/MPEG-4 AVC conforming: MCTF without update step
§ Oversampled pyramid with independent MCTF for each resolution: QCIF, CIF, 4CIF, 16CIF, ...
§ Switchable prediction mechanisms for conveying information from lower spatial layer to the next• Motion information prediction by defining two additional
macroblock modes and a switchable motion vector prediction• Prediction of residual information (high-pass signals) using the
upsampled residual signal of the lower resolution layer• Intra prediction using the reconstructed signal of the lower
resolution layer
T. Wiegand: Scalable Video Model 3.0
slid
e 19
Spatial Prediction of Data§ Upsample partitioning and motion vectors and use them for prediction
(keep list 0, list 1, bi-predictive and reference indices information)
§ Block-wise upsample residual picture using bi-linear filter
§ Upsample reconstructed intra macroblock using half-pel filter of H.264/MPEG-4 AVC
16x1616x16
16x1616x16
Intra-BLIntra-BL
Intra-BLIntra-BL16x16 16x8
8x16 8x8
8x8Direct,16x16,16x8,8x16
8x4
4x8 4x4Intra
T. Wiegand: Scalable Video Model 3.0
slid
e 20
Results for Layered Combined ScalabilityCity
30
31
32
33
34
35
36
37
38
39
QCIF 15Hz64 kbit/s
QCIF 15Hz128 kbit/s
CIF 30Hz256 kbit/s
CIF 30Hz512 kbit/s
4CIF 30Hz1024 kbit/s
4CIF 60Hz2048 kbit/s
Y-P
SN
R [
dB
]
H.264/MPEG-4 AVC IBBP...
SVM 3.0
T. Wiegand: Scalable Video Model 3.0
slid
e 21
Results for Layered Combined ScalabilityCrew
30
31
32
33
34
35
36
37
38
QCIF 15Hz96 kbit/s
QCIF 15Hz192 kbit/s
CIF 30Hz384 kbit/s
CIF 30Hz750 kbit/s
4CIF 30Hz1500 kbit/s
4CIF 60Hz3000 kbit/s
Y-P
SN
R [
dB
]
H.264/MPEG-4 AVC IBBP...
SVM 3.0
T. Wiegand: Scalable Video Model 3.0
slid
e 22
Fine Granular Scalability
§ Layered representation of subband pictures• Non-scalable base layer representation
– Motion information– Base layer representation of intra and residual data– Minimal acceptable reconstruction quality
• Quality scalable enhancement layer representations– Residue between original and base layer representation– FGS packets can be truncated at arbitrarily points– FGS packet: refinement signals corresponds to a bisection of
the quantization step size– Refinement signals are directly coded in the transform
domain– Single inverse transform at the decoder side
T. Wiegand: Scalable Video Model 3.0
slid
e 23
Fine Granular Scalability
§ Coding of enhancement layer packets• Re-quantization of transform coefficients (similar to Redmond
HHI proposal)• Usage of CABAC contexts as specified in H.264/MPEG-4 AVC
with minor modifications• 6 CABAC contexts added• Modification of the transmission order of transform coefficient
levels• Differentiation between significant and non-significant transform
coefficient level• Non-significant transform coefficient levels are basically coded
using CABAC• For significant transform coefficients, only a refinement symbol
(-1, 0, 1) is transmitted
T. Wiegand: Scalable Video Model 3.0
slid
e 24
Fine Granular Scalability
§ Three scan for coding of transform coefficients• 1. non-significant transform coefficients of significant blocks• 2. refinement symbols for already significant coefficients• 3. coefficients of non-significant blocks
§ Scanning pattern for each of the three scans• Scanning from low to high frequency bands (zig-zag)
– Mapping of 8x8 blocks onto four 4x4 blocks• Inside each band: first luma then chroma in raster scan
§ Coding of transform coefficient levels• Non-significant coefficients
– Coded Block Pattern, Coded Block Bit, (Delta QP, Transform Size)– CABAC symbols (SIG, LAST, SIGN, ABS)
• Significant coefficients Refinement symbol (-1, 0, +1)
T. Wiegand: Scalable Video Model 3.0
slid
e 25
Simulation Results: CityCity
30
31
32
33
34
35
36
37
38
39
0 256 512 768 1024 1280 1536 1792 2048bit-rate [kbit/s]
Y-P
SN
R [
dB
]
QCIF15
CIF7.5
CIF15CIF30
4CIF15
4CIF30
4CIF60Redmond test points
T. Wiegand: Scalable Video Model 3.0
slid
e 26
Simulation Results: CrewCrew
32
33
34
35
36
37
0 256 512 768 1024 1280 1536 1792 2048 2304 2560 2816 3072bit-rate [kbit/s]
Y-P
SN
R [
dB
]
QCIF 15HzCIF 7.5HzCIF 15HzCIF 30Hz4CIF 15Hz4CIF 30Hz4CIF 60HzRedmond test points
T. Wiegand: Scalable Video Model 3.0
slid
e 27
Summary
§ Straightforward MCTF extension of H.264/MPEG-4 AVC• Addition of motion-compensated update steps• Usage of H.264/MPEG-4 AVC tools as specified in the standard
§ Scalability• Layered representation of subband pictures• Adding of a slice type for coding residual signals• Layered SNR scalability or FGS• Usage of the residual coding tools of H.264/MPEG-4 AVC• One MC loop only when restricting prediction from base-layer
reconstructed signals to intra• Temporal scalability is inherently provided by the MCTF scheme• Spatial scalability using a oversampled pyramid structure and
independent temporal decomposition in each spatial layer with inter-layer prediction mechanisms