
Low-Complexity Forward Error Correction and Modulation for Optical Communication

by

Masoud Barakatain

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Graduate Department of Electrical and Computer Engineering
University of Toronto

© Copyright 2021 by Masoud Barakatain

Abstract

Low-Complexity Forward Error Correction and Modulation for Optical Communication

Masoud Barakatain

Doctor of Philosophy

Graduate Department of Electrical and Computer Engineering

University of Toronto

2021

A novel low-complexity architecture for forward error correction (FEC) in optical com-

munication is proposed. The architecture consists of an inner soft-decision low-density

parity check (LDPC) code concatenated with an outer hard-decision staircase or zipper

code. The inner code is tasked with reducing the bit error probability below the level

that allows the outer code to deliver on the stringent output bit error rate required in

optical communication. A hardware-friendly quasi-cyclic construction is adopted for the

inner codes.

The concatenated code is optimized by minimizing the estimated data-flow at the

decoder. A method is developed to obtain complexity-optimized inner-code ensembles.

A key feature emerging from this optimization is that it pays to leave some inner codeword

bits completely uncoded, thereby greatly reducing the decoding complexity. The trade-off

between performance and complexity of the designed codes is characterized by a Pareto

frontier. In binary modulation, up to 71% reduction in complexity is achieved compared

to previously existing designs.

Higher-order modulation via multilevel coding (MLC) is compared with bit-interleaved

coded modulation (BICM) from a performance-versus-complexity standpoint. In both

approaches, complexity-optimized error-reducing LDPC inner codes are designed for con-

catenation with an outer hard-decision code, for various modulation orders. Code designs

for MLC are shown to provide significant advantages relative to designs for BICM over the

entire performance-complexity tradeoff space, for a range of modulation orders. Codes


designed for MLC can operate with 78% less complexity, or provide up to 1.2 dB coding

gain compared to designs for BICM.

A multi-rate and channel-adaptive inner-code architecture is also proposed. A tool is

developed to optimize low-complexity rate- and channel-configurable concatenated FEC

schemes via an MLC architecture. Compared to previously existing FEC schemes, up to

63% reduction in decoding complexity, or up to 0.6 dB coding gain is obtained.

Code designs for MLC in combination with four-dimensional signal constellations are

also considered. The design method is generalized to obtain complexity-optimized non-

binary LDPC codes to concatenate with outer zipper codes. Gains of up to 1 dB over

the conventional schemes are reported.

The possibility of using a novel class of nonlinear codes in FEC design is also inves-

tigated.


To my wonderful family, Amir, Ziba, and Maryam,

and to the love of my life, Zhino.

In memory of Arash and all the precious lives lost

in the downing of flight PS752.


Acknowledgements

I would like to express my sincere gratitude to my supervisor, Prof. Frank R. Kschischang

for his support and guidance throughout my studies. Frank is an amazing scientist with

an in-depth knowledge in various fields of research. Without his brilliant insights and

ideas, his encouraging words, and his patience with me, this work would have never been

possible. He is an excellent teacher, a great mentor, and a wonderful human being. He

has been and will always be a source of inspiration in my life and in my scientific work.

It has been an honour to be his student.

I would like to acknowledge the following colleagues for their contribution to this

work. I thank Georg Bocherer and Diego Lentner with whom Frank and I collaborated

in a fruitful project, some results of which are presented in Chapter 4 of this thesis.

I appreciate Alvin Sukmadji’s work on zipper codes and his insights in designing the

concatenation schemes presented in this work. I thank Felix Frey and Sebastian Stern

for the collaborative work that resulted in Chapter 6 of this thesis. I also thank Yury

Polyanskiy and Hajir Roozbehani for the discussions on non-linear codes that helped us

in the development of the work presented in Chapter 7 of this thesis.

I would also like to thank Prof. Laurent Schmalen and Dr. Vahid Aref who were

my supervisors when I was an intern at Nokia Bell Labs in Stuttgart, Germany. Their

knowledge and insight helped me a lot in broadening my understanding of this field.

I would like to thank the following friends and colleagues. I am grateful to my seniors,

Christian, Lei, Chunpo, Chris and Siddarth, for welcoming me to this group and helping

me settle and get started with my research. I thank Amir, Reza, Foad, and Kaveh, my

colleagues during most of my studies, for their friendship and the interesting scientific

discussions. I also thank Susanna, Bo, Qun, Saber, and Mohannad, my newer colleagues,

for keeping the office a warm and welcoming place to work. Thank you all for creating a

very positive and collaborative work environment over the years.

I have been blessed in life with having many wonderful friends. I would like to thank

Atena and Saman, two of my oldest friends, who were my support system when I came to

Canada. To them and to all my great friends, including Soheil, Amirreza, Alborz, Rozhin,

Peter, Sajjad, Amin, and Shiva: you should know that your support and encouragement

was worth more than I can express on paper.

Finally, I would like to thank my amazing family. I am grateful to my parents, Amir

and Ziba, for providing me with the best possible care and education every step of the

way. I thank my sister, Maryam, for her kindness and support. And I wholeheartedly

thank the love of my life, my best friend, Zhino, for her patience, encouragement, and

unconditional love; I could not have asked for a better partner on this journey.


Contents

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Assumptions and Figures of Merit . . . . . . . . . . . . . . . . . . . . . . 4

1.2.1 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.2 Figures of Merit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Background 7

2.1 Staircase Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2 Zipper Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.3 LDPC Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.3.1 EXIT Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.3.2 Quasi-Cyclic LDPC Codes . . . . . . . . . . . . . . . . . . . . . . 12

2.4 Code Concatenation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.5 Coded Modulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.5.1 Multi-Level Coding . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.5.2 Bit-Interleaved Coded Modulation . . . . . . . . . . . . . . . . . . 16

3 Low-Complexity Concatenated LDPC-Staircase Codes 17

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.2 The Inner-Code Structure . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2.1 Code Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2.2 Ensemble Parameterization . . . . . . . . . . . . . . . . . . . . . 19

3.2.3 Complexity Measure . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.3 Complexity-optimized Design . . . . . . . . . . . . . . . . . . . . . . . . 21

3.3.1 EXIT chart analysis . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.3.2 Code Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.3.3 Practical Considerations . . . . . . . . . . . . . . . . . . . . . . . 26

3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27


3.4.1 Pareto Frontier . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.4.2 Two Design Examples . . . . . . . . . . . . . . . . . . . . . . . . 29

3.4.3 Comparison to Other Works . . . . . . . . . . . . . . . . . . . . . 31

3.4.4 Quasi-Cyclic-Structured Inner Codes . . . . . . . . . . . . . . . . 32

3.4.5 Concatenated LDPC-Zipper Structure . . . . . . . . . . . . . . . 34

3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4 Low-Complexity Concatenated FEC for Higher-Order Modulation 37

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.2 Concatenated Code Description . . . . . . . . . . . . . . . . . . . . . . . 38

4.3 MLC Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.3.1 Coded-Modulation Description . . . . . . . . . . . . . . . . . . . 39

4.3.2 Inner-Code Description . . . . . . . . . . . . . . . . . . . . . . . . 41

4.3.3 Ensemble Optimization . . . . . . . . . . . . . . . . . . . . . . . . 41

4.4 BICM Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4.4.1 Coded-Modulation Description . . . . . . . . . . . . . . . . . . . 44

4.4.2 Inner-Code Description . . . . . . . . . . . . . . . . . . . . . . . . 45

4.4.3 Ensemble Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.4.4 Ensemble Optimization . . . . . . . . . . . . . . . . . . . . . . . . 48

4.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.5.1 Design for 28% OH . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.5.2 Design for 25% OH . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.5.3 Design Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

5 Low-Complexity Rate- and Channel-Configurable Concatenated Codes 62

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.2 Concatenated Code Description . . . . . . . . . . . . . . . . . . . . . . . 64

5.3 Inner-Code Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

5.4 Ensemble Optimization and Code Construction . . . . . . . . . . . . . . 67

5.4.1 Reference Complexities . . . . . . . . . . . . . . . . . . . . . . . . 67

5.4.2 Configurable Inner-Code Optimization . . . . . . . . . . . . . . . 68

5.4.3 Code Optimization Via Differential Evolution . . . . . . . . . . . 70

5.4.4 Code Construction . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77


6 Complexity-Optimized Non-Binary Coded Modulation for Four-Dimensional

Constellations 78

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

6.2 Four-Dimensional Signal Constellations . . . . . . . . . . . . . . . . . . . 79

6.3 Four-Dimensional Set-Partitioning . . . . . . . . . . . . . . . . . . . . . . 81

6.4 Concatenated Non-Binary FEC Architecture . . . . . . . . . . . . . . . . 82

6.5 Ensemble Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

6.5.1 Empirical Density Evolution . . . . . . . . . . . . . . . . . . . . . 83

6.5.2 BER Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

6.5.3 Differential Evolution . . . . . . . . . . . . . . . . . . . . . . . . . 85

6.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

6.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

7 Low-Density Nonlinear-Check Codes 90

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

7.2 ERC Limit and Nonlinear Codes . . . . . . . . . . . . . . . . . . . . . . 91

7.3 LDNC Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

7.3.1 Code Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

7.3.2 Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

7.3.3 Message-Passing Decoding . . . . . . . . . . . . . . . . . . . . . . 95

7.3.4 Efficient Message Computation . . . . . . . . . . . . . . . . . . . 97

7.4 Error-Reducing Performance Results . . . . . . . . . . . . . . . . . . . . 98

7.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

8 Conclusion and Topics of Future Research 101

Bibliography 104


List of Tables

3.1 Quantifying Finite Interleaving Loss . . . . . . . . . . . . . . . . . . . . . 27

4.1 An example of degree distributions of various types, for m = 3. . . . . . . 46

4.2 Statistics of the simulation results shown in Fig. 4.5.3 (14 inner iterations) 60

4.3 Statistics of the simulation results shown in Fig. 4.5.3 (12 inner iterations) 60

6.1 Bit-level capacities of the 4D constellations at their respective operating

points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

7.1 List of all distinct check functions for dc = 7. . . . . . . . . . . . . . . . . 95


List of Figures

2.1 Staircase code structure. Information bits fill the white part of the blocks

and the parity bits fill the rest. The block B0 is initialized with all-zeros. 8

2.2 A typical zipper-code framework (left) and its representation of a staircase

code (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.3 The two common approaches to coded modulation. . . . . . . . . . . . . 15

3.1 Tanner graph of an LDPC inner code, consisting of some degree-zero vari-

able nodes (uncoded components) and a coded component. The rectangle

labeled by Π represents an edge permutation. The VN and CN degree

distributions are to be designed. . . . . . . . . . . . . . . . . . . . . . . . 19

3.2 The elementary EXIT functions used for designing a rate-8/9 code en-

semble in Section 3.4.2, Example 1. The EXIT function of the resulting

optimized ensemble is compared with that of the (3, 27)-regular LDPC

ensemble. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.3 The (Es/N0, ηin) Pareto frontiers of the inner code in the proposed design,

compared with the benchmark design of [1], at 15%, 20%, and 25% OHs. 28

3.4 Simulated inner-code BERs on bits passed to the outer code, sampled from

the complexity-optimized ensembles, for designs at 20% OH. The mid-

point on each BER curve (highlighted by an ‘o’) is the code operational

point, i.e, the SNR for which the inner code is designed to achieve Pout ≤ psc. 29

3.5 The BER on information nodes of different degrees in the ensemble of

Example 1 and the BER on bits passed to the outer code, denoted by Pout.

The degree distribution on the information nodes is 0.1665 + 0.0223x +

0.3919x3 + 0.4193x4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30


3.6 NCG and η comparisons of the proposed concatenated design and other

soft-decision FEC schemes, at 20% OH. Decoders using a flooding (resp.,

layered) decoding schedule are denoted with Fl (resp. La). For the pro-

posed codes (denoted as “prop.”), the inner decoding algorithm (MS or

SP) is specified. Block length 30000 is considered for the designs with QC-

structured inner codes. The following abbreviations are used in describing

the referenced codes. BCH: Bose—Ray-Chaudhuri—Hocquenghem, UEP:

Unequal Error Protection, RS: Reed-Solomon, CC: Convolutional Code,

SpC: Spatially Coupled. . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.7 NCG and η comparisons of the QC constructions of the designed concate-

nated FEC, at 20% OH, under layered (La) and flooding (Fl) schedules. . 33

3.8 The (Es/N0, ηin) Pareto frontiers of the designed concatenated LDCP-

zipper FEC, at 20% OH, compared with the LDCP-staircase design and

the benchmark design of [1]. . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.9 NCG and η comparisons of the proposed concatenated design and other

soft-decision FEC schemes, at 20% OH. The concatenated design with an

outer zipper code, the NCG-η Pareto frontier of which is the top left curve,

outperforms other designs by a wide margin. . . . . . . . . . . . . . . . . 35

4.1 The encoder and the decoder in the MLC scheme. Here, m = log2M

denotes the number of bits per PAM symbol. . . . . . . . . . . . . . . . . 39

4.2 The encoder and the decoder in the BICM scheme. Here, m = log2M

denotes the number of bits per PAM symbol. . . . . . . . . . . . . . . . . 40

4.3 Inner-code ensemble considered for the BICM scheme . . . . . . . . . . . 45

4.4 Performance-complexity comparisons of optimized codes for MLC and

BICM using 64-QAM, compared with the design in [2]. The number of

decoding iterations required by each designed code is indicated. At the

overall 28% OH of these schemes, CSL = 15.0 dB. . . . . . . . . . . . . . 52

4.5 Simulated decoder outputs of inner codes for designs at 28% OH with 64-

QAM. The mid-point on each BER curve (highlighted by an ‘o’) is the

code operational point, i.e, the SNR for which the inner code is designed

to achieve Pout ≤ P tout. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.6 Performance-complexity comparisons of the obtained optimized codes for

MLC and the BICM of various orders at 25% overall OH, compared with

the designs in [3]. The number of decoding iterations required by each

designed code is indicated. . . . . . . . . . . . . . . . . . . . . . . . . . . 54


4.7 Achievable information rate for 16- and 64-QAM modulations compared

to the unconstrained Shannon capacity. The operational point of the de-

signed concatenated code is also shown and compared to that of [4]. . . . 56

4.8 The interleaving and placement of bits into the real buffer of the outer

decoder per chunk, for the FEC parameters discussed in Sec 4.5.3. . . . . 58

4.9 BER simulations for the designed concatenate LDPC-zipper FEC scheme. 59

5.1 The encoder and the decoder in the configurable FEC scheme. Here,

m = log2M denotes the number of bits per PAM symbol. . . . . . . . . . 64

5.2 Designed configurable FEC schemes, denoted by the connected marks.

Each mark is an operating point and its complexity score is indicated on

its label. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.3 Performance-versus-complexity comparison between the designed config-

urable FEC schemes and those of [5]. The FEC rate, in bits per symbol,

of each operating point is indicated on its label. . . . . . . . . . . . . . . 76

6.1 Constellation-constrained capacities in bit/symbol versus the SNR. The

inset shows the 2D projection of the corresponding signal constellations. . 80

6.2 Illustration of first (left) and second (right) partitioning steps of the D4-

based constellations in one polarization. . . . . . . . . . . . . . . . . . . 81

6.3 The proposed concatenated FEC architecture for DP transmission over

the AWGN channel. The alphabet field sizes are denoted below their

corresponding stages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

6.4 The BER on bits passed to the outer code. The constellation capacities

are indicated by the vertical lines. The horizontal line denotes the outer-

code threshold. Here, the solid curves denote the non-binary designs and

the dotted and dashed curves denote their binary counterparts: TS-BICM

indicates the performance two-stage BICM-based scheme of [6] and 1D-

MLC indicates performance of the scheme of Sec. 4.3. . . . . . . . . . . . 88

7.1 The block diagram of a scheme that achieves the ERC Limit. . . . . . . . 91

7.2 Factor graph representation of an LDNC ensemble. Information- and

check-node degrees are denoted by dv and dc, respectively. Here, a degree-

dc CN is connected to dc information nodes. The rectangle labelled Π

represents an edge permutation. . . . . . . . . . . . . . . . . . . . . . . . 93

7.3 Encoding operation at a nonlinear check node. . . . . . . . . . . . . . . . 94


7.4 A typical CN of degree dc. Node y is set to denote the function the CN

performs on the information nodes. . . . . . . . . . . . . . . . . . . . . . 95

7.5 Binary computation tree for obtaining q(x) when dc = 7. The messages

are passed from the bottom up. . . . . . . . . . . . . . . . . . . . . . . . 97

7.6 BER curves in regular ensemble with dv = 4 and dc = 7 with various

check functions, plotted versus number of decoding iterations. The codes

are simulated at 0.5 dB above their (error-free) constrained Shannon limit. 98

7.7 BER curves of regular dv = 4, dc = 7 LDNC ensembles with three check

functions, plotted in a wide range of SNRs. All decoders perform 4 de-

coding iterations. The error-free constrained Shannon limit is at 1.92 dB

SNR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99


List of acronyms

2D two-dimensional

4D four-dimensional

AIR achievable information rate

APP a posteriori probability

AWGN additive white Gaussian noise

BCH Bose-Chaudhuri-Hocquenghem

BER bit error rate

BICM bit-interleaved coded modulation

BRGC binary reflected Gray code

CN check node

CSL constrained Shannon limit

DP-QAM dual-polarization quadrature amplitude modulation

DP dual-polarization

DPS degree partition and sort

ERC error-reducing code

EXIT extrinsic information transfer

FEC forward error correction

HD hard-decision

LDGM low-density generator-matrix

LDMC low-density majority-check

LDNC low-density nonlinear-check

LDPC low-density parity-check


LLR log-likelihood ratio

LSB least significant bit

MET multi-edge type

MLC multi-level coding

MS min-sum

MSB most significant bit

NCG net coding gain

OH overhead

OTN optical transport network

PAM pulse amplitude modulation

QAM quadrature amplitude modulation

QC quasi-cyclic

RS Reed-Solomon

SD soft-decision

SNR signal-to-noise ratio

SP sum-product

SQP sequential quadratic programming

VN variable node


Chapter 1

Introduction

1.1 Motivation

This thesis is about the design of low-complexity forward error correction (FEC) ar-

chitectures for applications with high throughput, as needed, for example, in optical

communication systems. We develop methods to obtain FEC and modulation schemes

in which the primary focus is on minimizing the complexity of decoding information bits.

We also aim at understanding the performance-complexity trade-offs of the FEC schemes

with various modulation formats.

Optical communication systems mandate a bit error rate (BER) of less than 10−15

on bits delivered to the customer. To meet this stringent requirement, and with the ever

increasing demand for throughput, channel coding has become an essential component

of the optical transport networks (OTNs), and the study of efficient and low-complexity

FEC schemes for optical communication is an active area of research; see [7–11] and

references therein.

Early FEC scheme proposals for OTNs (ITU-T G.975.1 [12] for example) used Reed-

Solomon (RS) codes and Bose-Chaudhuri-Hocquenghem (BCH) codes, both algebraic

codes, as the FEC components. These codes achieve very good performance and their

algebraic structure, and their syndrome based decoding, enable the decoding procedure

to be performed in a single pass with a low complexity. More recently, algebraic-based

product-like codes, such as staircase codes [13] and zipper codes [14], have been adopted

in OTNs. These codes are low-complexity hard-decision (HD) FEC solutions that can

operate at a gap of only ∼0.5 dB from their information-theoretic limits.

As the demand for throughput in OTNs increases, researchers increasingly specify

the use of soft-decision (SD) codes, i.e., codes that can make use of probabilistic symbol

reliabilities. The difference between an SD and an HD code lies in the number of decoder


inputs per received symbol, required for decoding. For example, for a binary input

channel, an HD decoder requires a single-bit quantization of the channel output, while

an SD decoder requires a softer quantization of the channel output, as a measure of the

reliability of the received symbol. Hence, HD codes are fundamentally weaker than SD

codes. Examples of modern SD codes are turbo codes, low-density parity-check (LDPC)

codes, and spatially-coupled codes [15]. At a similar overhead (OH) and signal-to-noise

ratio (SNR), SD codes can achieve coding gains of ∼1–2 dB, or more, relative to the HD

codes used in earlier OTN proposals [16].

The excellent performance of SD codes comes, however, at the expense of a signifi-

cantly increased decoding complexity. A comparison of the implementations of soft- and

hard-decision decoders shows that SD decoders typically consume an order of magni-

tude more power than HD decoders [17–20] operating at comparable throughputs. As

estimated in [21, 22], with a pure SD FEC approach, the decoder component would

be responsible for about 16%–35% of the total power consumption in coherent optical

transmission systems, higher power consumption than any other component of the net-

work. In short-reach optical networks, where throughputs as high as 1 Tb/s are being

considered [23], the FEC power consumption is an even bigger concern [24]. If the en-

ergy consumption per decoded bit does not decrease for future OTN designs, the FEC

component becomes increasingly energy hungry and difficult to cool.

In this work, we study various aspects of code design for optical communication,

with a particular focus on obtaining architectures that attain low decoding complexity,

the specific measure of which is defined in Sec. 1.2.2. We consider an FEC architecture

consisting of an SD LDPC code concatenated with an HD staircase or zipper code. We

aim to take advantage of the best of both worlds: the superior performance of SD FEC

schemes and the low-complexity decoding of the HD FEC schemes.

In our designs, the inner LDPC code is tasked with matching the channel and the

rest of the FEC components to the outer code, reducing the BER on bits delivered to it

below its threshold. That, in turn, enables the outer code to take the BER further down,

below 10−15, as required by OTNs. For example, a typical pre-FEC BER is ∼10−2. We

then task the inner code with reducing the BER to ∼10−3, after which the outer code

takes over and brings the BER to below 10−15. With this approach, the bulk of the error-

correction is carried by the outer HD code with very low decoding complexity. The overall

FEC complexity is then dominated by that of the SD decoder. While not considered in

this work, coarsely-quantized LDPC decoding algorithms [25–28] or soft-aided decoding

algorithms for staircase or zipper codes [11,29–31] can be adopted to further reduce the

complexity or improve the performance of the proposed scheme, respectively.


In Chapter 3 we study the design of low-complexity LDPC code architectures. A key

feature that emerges from our design is that it pays to have some of the channel symbols

bypass the inner code, thereby greatly reducing a significant portion of the decoding

complexity. We then develop an optimization routine and obtain low-complexity LDPC

codes for various system specifications that, under the considered measures, significantly

outperform all previously existing SD FEC schemes and do so with up to 71% reduction in

decoding complexity. In this work, a hardware-friendly quasi-cyclic (QC) construction is

adopted for the inner codes, which can realize an energy-efficient decoder implementation,

and even further complexity reductions via a layered message-passing decoder schedule.

The explosive growth in demand to achieve increased transmission rates in optical

communication (currently at a compound annual rate of 48% [23]) has given impetus to the study of FEC schemes in combination with higher-order modulation to increase the spectral efficiency; see [8, 32, 33] and references therein. In Chapter 4 we study the

design of low-complexity concatenated codes both in a bit-interleaved coded modulation

(BICM) scheme and a multi-level coding (MLC) scheme. We obtain code designs that

handily outperform the previously existing schemes, with up to 60% reduction in decoding

complexity. More importantly, by a clever choice of FEC architecture, we obtain code

designs via the MLC scheme that provide significant advantages relative to designs via the BICM scheme over the entire performance-complexity tradeoff space. We also provide examples in which the designed FEC schemes are described in great detail.

In Chapter 5 we develop tools to design low-complexity rate- and channel-configurable

FEC schemes. We propose a concatenated coded modulation scheme that can operate

at multiple transmission overheads (OH) and channel qualities, and with various mod-

ulation orders via an MLC scheme. In this design, the transmission rate is configurable

by signalling with various modulation formats and by the configurable inner-code rate,

and operation at various channel qualities is realized by the configurable inner-decoding

complexity. Such flexibility is mandated in a variety of applications including designing

multi-vendor interoperable modules and software defined optical networks. The obtained

configurable codes achieve up to 60% reduction in complexity compared to previously

existing designs.

Non-binary LDPC codes have also been considered for FEC in optical communication

[34] because they can outperform their binary counterparts of short to medium length.

However, conventional non-binary LDPC codes also have a higher decoding complexity. In

Chapter 6 we adapt our design tools to obtain complexity-optimized concatenated non-

binary LDPC-zipper codes. In particular, we consider dense signal constellations based

on four-dimensional (4D) lattices and show that with the obtained non-binary FEC


schemes, based on an MLC architecture, we achieve up to 1 dB gain over conventional

FEC schemes, yet with reasonable complexity.

In Chapter 7 we first derive the theoretical limit for the error-reducing inner codes designed in our schemes, and then explore the possibility of using a novel class of nonlinear codes in our schemes instead. While the obtained codes are not well-

suited for optical communication, they are nevertheless very interesting mathematically,

and may be useful in code designs for other applications.

We also cover various concepts in Chapter 2 that are not new to this work, but are

beneficial for the reader to review before we present the novelties of this work. Finally,

in Chapter 8 we provide concluding remarks and pointers to possible future research based on this work.

1.2 Assumptions and Figures of Merit

1.2.1 Assumptions

Throughout this work we design codes assuming a memoryless, additive white Gaussian

noise (AWGN) channel. Although the optical channel is in fact nonlinear, the vari-

ous signal processing units that perform filtering, chromatic-dispersion and nonlinearity

compensation, and various other tasks prior to decoding, typically leave the FEC with a

residual AWGN channel [7, 35,36].

While the actual performance of the codes we obtain might be different over an optical

channel, the ordering of their performance is very unlikely to change. In other words, if

code A outperforms code B on an AWGN channel, code A is also very likely to outperform

code B on a typical (equalized, nonlinearity-compensated) optical link. By assuming an

AWGN channel, we can see the potential in the proposed FEC schemes, which makes

it worthwhile to implement them for an optical channel. This is a common practice in

evaluating design proposals for optical communication [3, 8, 37].

Throughout this thesis, we consider signalling with a uniform distribution over the

constellation points. We assume unit energy per signal dimension at the receiver. Further,

we assume an AWGN channel with $\sigma^2$ denoting the noise variance per dimension. In this setting, the SNR, in decibels, is denoted by $E_s/N_0 \triangleq -10\log_{10}\sigma^2$.


1.2.2 Figures of Merit

Performance

A fundamental figure of merit of an FEC scheme is determined by the SNR at which it

operates, relative to a reference SNR. When the reference SNR is that of the theoretical

limit, the performance metric is the gap to the constrained Shannon limit (CSL). When

the reference SNR is that of an uncoded scheme, the performance metric is the net coding

gain (NCG). For example, consider an FEC scheme that operates at rate R (bit/symbol),

uses the $2^{2m}$-ary signal constellation $\Omega$, and requires $\mathrm{SNR}_c$ to achieve a target BER of $10^{-15}$. Now let $C_\Omega(\mathrm{SNR})$ denote the mutual information achieved by a uniform input distribution over the constellation $\Omega$ as a function of SNR. The gap to the CSL is given by

$$\text{gap to CSL (dB)} = 10 \log_{10}\left(\frac{\mathrm{SNR}_c}{C_\Omega^{-1}(R)}\right), \qquad (1.1)$$

where $C_\Omega^{-1}$ is the inverse function of $C_\Omega$. The NCG provided by this FEC scheme is given by

$$\mathrm{NCG\ (dB)} = 10 \log_{10}\left(\frac{R \cdot \mathrm{SNR}_u}{2m \cdot \mathrm{SNR}_c}\right), \qquad (1.2)$$

where $\mathrm{SNR}_u$ is the SNR required in an uncoded transmission, using $\Omega$, to achieve a BER of $10^{-15}$.
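As an illustration of how (1.1) and (1.2) are evaluated, the following short Python sketch computes the gap to the CSL and the NCG for hypothetical values of $\mathrm{SNR}_c$, $\mathrm{SNR}_u$, $R$, $m$, and $C_\Omega^{-1}(R)$; the numerical values are placeholders, not results from this thesis.

import math

db = lambda x_db: 10 ** (x_db / 10)   # dB -> linear

def gap_to_csl_db(snr_c, csl_snr):
    # Eq. (1.1): operating SNR relative to the SNR at which the
    # constellation-constrained capacity equals the FEC rate R.
    return 10 * math.log10(snr_c / csl_snr)

def ncg_db(R, m, snr_u, snr_c):
    # Eq. (1.2): net coding gain relative to uncoded transmission
    # over the same 2^(2m)-ary constellation (2m bits per symbol).
    return 10 * math.log10((R * snr_u) / (2 * m * snr_c))

# Hypothetical example: R = 5 bit/symbol over 64-QAM (m = 3), with
# SNR_c = 15.0 dB, SNR_u = 26.5 dB, and C_Omega^{-1}(R) = 14.3 dB.
print(gap_to_csl_db(db(15.0), db(14.3)))  # gap to CSL, about 0.7 dB
print(ncg_db(5, 3, db(26.5), db(15.0)))   # NCG, about 10.7 dB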

Complexity

Power consumption and heat dissipation at the decoder have become increasingly limiting

factors in FEC design for optical communication. It has been shown that with a given

code and decoding algorithm, the power consumption at the decoder chip scales roughly

linearly with the system throughput [21, 22]. Therefore, it is customary to consider the

energy consumed per decoded information bit as a measure of FEC energy efficiency.

Realistic efficiency measurements for FEC schemes, however, are very hard to formulate,

since they will be highly implementation-technology-dependent, architecture-dependent,

etc. and the only reliable way of estimating them is to design and simulate an application-

specific integrated circuit for the FEC scheme [38]. Therefore, often a measure of decoding

complexity is considered as a proxy.

In this work, in order to quantify, and eventually minimize, the FEC complexity,

we consider the decoder data-flow and count the number of messages required to pass

among the various nodes for successful decoding. As the measure of FEC complexity,

we normalize the number of messages passed in decoding by the number of decoded


information bits and denote it by η. A similar complexity measure has been used in a

number of prior works including [1, 39].
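For concreteness, a minimal sketch of how η might be estimated for an LDPC inner code under a flooding schedule is given below. It assumes, as a simplification, that each decoding iteration passes one message in each direction over every edge of the Tanner graph; the code parameters are illustrative, and the exact accounting used in later chapters may differ.

def eta_flooding(edges_per_codeword, iterations, info_bits_per_codeword):
    # Data-flow proxy: two messages per edge per iteration (VN-to-CN and
    # CN-to-VN), normalized by the number of decoded information bits.
    messages = 2 * edges_per_codeword * iterations
    return messages / info_bits_per_codeword

# Example: a (3,27)-regular LDPC code of length 30000 (rate 8/9),
# decoded with 10 flooding iterations.
n, rate, vn_degree, iters = 30000, 8 / 9, 3, 10
print(eta_flooding(vn_degree * n, iters, rate * n))   # about 67.5 messages per bit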

We note that this measure is only a proxy for the actual decoder complexity (which,

as stated above, might be measured in energy efficiency terms, or area, memory require-

ments, I/O requirements, or in combinations of these factors). While it is true that η is

only a proxy, we believe that it gives insight into code design, since the actual complexity

is very likely to scale with it. For example, the vast majority of power dissipated in a

decoder hardware is the dynamic power resulting from the signal toggling corresponding

to messages passed between the nodes. Also, in most codes and decoding algorithms, the

number of arithmetic operations required in decoding is linearly related to the number of

messages passed in decoding. Therefore, if the FEC scheme were to be implemented using

compute-in-memory technology [40] (where data can be processed in memory, reducing

the number of times it is moved), η would still be a relevant complexity measure.

Interestingly, as will be shown in Chapters 3–6, minimizing η has desirable hardware

implications such as the use of degree-one variable nodes, or even leaving some bits

uncoded (degree zero) at the inner code, the presence of which has obvious complexity

benefits by any measure.

Chapter 2

Background

2.1 Staircase Codes

A staircase code [13] is a binary code that consists of (possibly infinitely many) $m \times m$ blocks $B_0, B_1, B_2, B_3, \ldots$, as shown in Fig. 2.1. Associated with a staircase code is a binary constituent code $C(n_c, n_c - r_c)$, where $n_c$ is the code length and $r_c$ is the number of parity bits per constituent codeword. The constituent-code rate is $R_c = 1 - r_c/n_c$.

At the staircase encoder, we initially let $B_0$ be an all-zero block. For block $B_i$, where $i \geq 1$, we first fill the white part of the block (see Fig. 2.1) with information bits. We then use the constituent encoder to fill the rest of the block such that the rows of $[B_{i-1} \ B_i^T]$ are codewords in $C$. Note that we must have $n_c = 2m$. Per block, we then have $m(m - r_c)$ information bits and $m r_c$ parity bits. The rate of the unterminated staircase code is obtained as

$$R_{\mathrm{sc}} = \frac{m - r_c}{m} = 2R_c - 1.$$

In this structure, any two information bits belong in common to no more than one constituent codeword.
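To make the encoding rule above concrete, here is a minimal Python sketch of one staircase encoding step. The constituent encoder is a placeholder argument (any systematic encoder producing $r_c$ parity bits per row could be plugged in), so this illustrates the block bookkeeping only, not a particular code from this thesis.

import numpy as np

def staircase_encode_block(B_prev, info_bits, parity_fn, m, rc):
    # One encoding step: given B_{i-1} (m x m) and m x (m - rc) new
    # information bits, build B_i so that each row of [B_{i-1}  B_i^T]
    # is a codeword of the constituent code C(2m, 2m - rc).
    # parity_fn maps a length-(2m - rc) systematic row to its rc parity bits.
    Bi_T = np.zeros((m, m), dtype=np.uint8)       # build B_i^T row by row
    Bi_T[:, : m - rc] = info_bits                 # white part: information bits
    for r in range(m):
        systematic = np.concatenate([B_prev[r, :], Bi_T[r, : m - rc]])
        Bi_T[r, m - rc:] = parity_fn(systematic)  # parity part of the row
    return Bi_T.T                                 # return B_i itself

# Toy usage with a single-parity-check constituent code (rc = 1) and m = 4.
m, rc = 4, 1
spc = lambda row: np.array([row.sum() % 2], dtype=np.uint8)
B0 = np.zeros((m, m), dtype=np.uint8)
info = np.random.randint(0, 2, size=(m, m - rc)).astype(np.uint8)
B1 = staircase_encode_block(B0, info, spc, m, rc)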

Staircase codes can be decoded by iterative window decoding. The window size $W$ is defined as the number of blocks in the decoder at any given time. For $i \in \{i_2, i_3, \ldots, i_W\}$, at each iteration, we use the constituent decoder to decode the rows of $[B_{i-1} \ B_i^T]$. Decoding continues until a maximum number of iterations is reached. The staircase decoder then outputs block $B_{i_1}$, the oldest block of bits in the decoding window, and brings a newly received block into it.

The parameters we pick to construct the staircase code and its decoder determine

its threshold, psc. The threshold is the maximum cross-over probability of a binary

symmetric channel for which the decoder can achieve a target output BER, set at 10−15


Figure 2.1: Staircase code structure. Information bits fill the white part of the blocks and the parity bits fill the rest. The block B0 is initialized with all-zeros.

in optical communication. See, e.g., [1, Table I] and [41, Tables I, II] for various staircase

code constructions and their thresholds.

Staircase codes have excellent error-correcting performance and can operate within

a gap of only 0.56 dB from the binary symmetric channel Shannon limit. Their al-

gebraic syndrome-based decoding means these codes also have extremely low decoding

complexity. Hence, staircase codes are very attractive choices in FEC design in optical

communication. In fact, in the recent 400ZR standard [4] a staircase code, concatenated

with a Hamming code, is used in the FEC scheme. At the expense of higher decod-

ing complexity, the performance of staircase codes, and product codes in general, can

be improved by modifying the decoding algorithm [42] and by aiding the decoder with soft

information from the channel [11,29–31].

As shown in [43, Figure 1.1], however, the staircase decoder memory size grows ex-

plosively at higher rates. In FEC designs where concatenation with codes of very high

rate is desirable (see, e.g., Chapter 3), therefore, a staircase code may not be a practical

choice.

2.2 Zipper Codes

Zipper codes [14] are a newly proposed framework for describing spatially-coupled product-

like codes such as staircase codes and braided block codes. Similar to staircase codes, a

constituent code C(nc, nc − rc) is associated with a zipper code. The zipper code buffer

is divided into a pair of virtual and real buffers, shown in Fig. 2.2 as the left and right

halves of the codes, respectively. The virtual buffer contains copies of the bits in the real


Figure 2.2: A typical zipper-code framework (left) and its representation of a staircase code (right).

buffer, possibly in a different arrangement.

At the encoder, we first fill the virtual buffer with a permutation of previously encoded bits. We then fill the white part of the real buffer (see Fig. 2.2) with information bits. Finally, we use the constituent encoder to fill the rest of the real buffer such that the rows

of the buffer are codewords in C.
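As with staircase codes, a short sketch may help fix the encoding procedure. The Python fragment below encodes new real-buffer rows, assuming (as in the staircase-like arrangement of Fig. 2.2) that the virtual and real parts of each row have equal length $n_c/2$; the interleaver map and the constituent encoder are generic placeholders, not the specific choices made in this thesis.

import numpy as np

def zipper_encode_rows(real_rows, new_info_rows, interleave, parity_fn, nc, rc):
    # real_rows: list of previously encoded real rows, each of length nc/2.
    # interleave(r, c): index (row, col) of the previously encoded real-buffer
    #   bit copied into column c of the virtual part of new row r.
    # parity_fn: placeholder systematic encoder of C(nc, nc - rc), returning
    #   rc parity bits for a row of nc - rc systematic bits.
    half = nc // 2
    for info in new_info_rows:
        r = len(real_rows)                          # index of the row being encoded
        virtual = np.array([real_rows[i][j] for i, j in
                            (interleave(r, c) for c in range(half))], dtype=np.uint8)
        real = np.zeros(half, dtype=np.uint8)
        real[: half - rc] = info                    # information part of the real row
        real[half - rc:] = parity_fn(np.concatenate([virtual, real[: half - rc]]))
        real_rows.append(real)
    return real_rows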

Zipper codes also can be decoded by iterative window decoding. The decoding window

consists of W chunks, where a chunk is defined as a collection of µ rows. At each iteration,

we decode each chunk by using the constituent decoder to decode the rows. If we correct

(or much less often, miscorrect) any bits in the chunk, we update their copies accordingly.

Decoding continues until a maximum number of iterations is reached. The zipper decoder

then outputs the oldest chunk of bits in the decoding window, and brings a newly received

chunk of bits into it.

Similar to staircase codes, the parameters we pick to construct the zipper code and

its decoder determine its threshold, below which the decoder can achieve a target output

BER, set at 10−15 in optical communication.

Note that with zipper codes we have more flexibility in choosing the chunk size and

decoding-window size compared to staircase codes. Hence, we can keep the decoder

memory size in check when engineering practical high-rate codes. In fact, high-rate and

practical zipper codes are reported in [14, Table I] that can operate within a gap of only

0.49 dB from the binary symmetric channel Shannon limit. We believe zipper codes are

the natural improvement over staircase codes to be used in the next standards.


2.3 LDPC Codes

Invented by Gallager [44], an (N,K) LDPC code is defined as the null space of a sparse

$(N - K) \times N$ parity-check matrix $H$. Here, N and K denote the code length and code

dimension, respectively, and sparse means that the number of non-zero elements in H is

much smaller than the number of zero elements.

An LDPC code also has a Tanner graph representation [45] as shown in, e.g., Fig. 3.1.

Here, the N columns of H are represented by variable nodes (VNs) and the M = N − K rows are represented by check nodes (CNs). Where there is a non-zero element in the H

matrix, the corresponding VN and CN are connected by an edge in the Tanner graph.

The Tanner graph of an LDPC code can be described by its VN and CN degree

distributions. Let $D_v$ denote the maximum VN degree. The node-perspective variable-degree distribution is defined as $L(x) = \sum_{i=0}^{D_v} L_i x^i$, where $L_i$ is the fraction of degree-$i$ VNs. The edge-perspective degree distribution is defined as $\lambda(x) \triangleq L'(x)/L'(1) = \sum_{i=1}^{D_v} \lambda_i x^{i-1}$, where $L'(x) = \mathrm{d}L(x)/\mathrm{d}x$.

For ease of decoder implementation, it is often assumed that the CN degree distribution is concentrated on one or two consecutive degrees, $d_c$ and $d_c + 1$, with $\bar{d}_c$ denoting the average CN degree. The node-perspective check-degree distribution is defined as $R(x) = \sum_{d=d_c}^{d_c+1} R_d x^d$, where $R_d$ is the fraction of degree-$d$ CNs. The edge-perspective degree distribution is defined as $\rho(x) \triangleq R'(x)/R'(1)$, and can also be obtained as

$$\rho(x) = \frac{d_c(d_c + 1 - \bar{d}_c)}{\bar{d}_c}\, x^{d_c - 1} + \frac{\bar{d}_c - d_c(d_c + 1 - \bar{d}_c)}{\bar{d}_c}\, x^{d_c}.$$

The VN node-perspective and edge-perspective distribution parameters are related by

$$L_i = \bar{d}_c (1 - R)\, \lambda_i / i.$$

This identity can be obtained by first setting $L_i = N_i/N$, where $N_i$ is the number of degree-$i$ VNs, then by using the relation $1 - R = M/N$, where $R$ is the LDPC code rate, afterwards by using $M\bar{d}_c = E$, where $E$ is the number of edges in the Tanner graph, and finally by using $N_i = E\lambda_i/i$.
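These conversions are easy to verify numerically. The short sketch below, using an arbitrary illustrative VN degree distribution (not one of the ensembles designed in this thesis), computes the edge-perspective distribution and checks the identity $L_i = \bar{d}_c(1 - R)\lambda_i/i$.

import numpy as np

# Node-perspective VN degree fractions L_i for degrees 0..4 (illustrative only).
L = np.array([0.10, 0.05, 0.25, 0.30, 0.30])
degrees = np.arange(len(L))

avg_vn_degree = (L * degrees).sum()          # L'(1) = E/N
lam = L * degrees / avg_vn_degree            # edge-perspective lambda_i

R = 0.8                                      # assumed LDPC code rate
avg_cn_degree = avg_vn_degree / (1 - R)      # from M * dbar_c = E and M/N = 1 - R

for i in degrees[1:]:
    assert np.isclose(L[i], avg_cn_degree * (1 - R) * lam[i] / i)
print("identity L_i = dbar_c (1 - R) lambda_i / i verified")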

An LDPC code can be decoded by iterative message-passing decoding on its factor

graph. A factor graph [46] is a type of graphical model well-suited for describing codes

and iterative decoding algorithms via the sum-product (SP) algorithm. In the SP algorithm,

the messages passed on the edges of the graph are “beliefs” about the symbols associated

with the VNs. These beliefs are typically represented by extrinsic posterior probability

vectors. Hence, message-passing and decoding via SP algorithm is sometimes referred to


as belief propagation. With the SP decoding algorithm, LDPC codes can approach the

Shannon limit [47]. Other, sub-optimal, message-passing schemes also exist, including max-product [45], min-sum [48], and offset min-sum [49], which may offer practical advantages compared to SP message-passing, although typically at a performance loss.
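As an illustration of the kind of message updates involved, here is a minimal sketch of generic LLR-domain min-sum check-node and variable-node updates; it is a textbook formulation given for orientation, not the specific decoder configuration used later in this thesis.

import numpy as np

def cn_update_minsum(incoming):
    # Min-sum CN update: on each edge, the outgoing extrinsic LLR is the
    # product of the signs times the minimum magnitude over the other edges.
    llrs = np.asarray(incoming, dtype=float)
    out = np.empty_like(llrs)
    for k in range(len(llrs)):
        others = np.delete(llrs, k)
        out[k] = np.prod(np.sign(others)) * np.min(np.abs(others))
    return out

def vn_update(channel_llr, incoming):
    # VN update: each outgoing message is the channel LLR plus the sum of
    # incoming CN messages, excluding the message on the same edge.
    incoming = np.asarray(incoming, dtype=float)
    return channel_llr + incoming.sum() - incoming

# Example: a degree-3 CN and a degree-2 VN.
print(cn_update_minsum([+2.0, -0.5, +1.5]))   # [-0.5, 1.5, -0.5]
print(vn_update(0.8, [+1.0, -2.0]))           # [-1.2, 1.8]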

The excellent performance of LDPC codes, however, comes with several practical

challenges, some of which are listed below.

• In their general form, LDPC codes have a complex encoding that scales quadrati-

cally with the code length, both in terms of processing and memory requirements.

There are, however, structured LDPC codes that allow for a low-complexity en-

coding. Examples of such constructions include repeat-accumulate codes [50] and

low-density generator matrix (LDGM) codes [51].

• The exchange of soft information for every VN and CN in message-passing decoding

of LDPC codes leads to very high power consumption at the decoder. Obtaining

updates to those messages also is computationally intensive. Several ideas have been

explored to address the problem of high power consumption in LDPC decoders,

including chip-voltage reduction, switching activity reduction, use of simpler and

quantized, but sub-optimal, message-passing algorithms, and early termination [26–

28,52–55]. Also, introducing structure to LDPC codes reduce their implementation

complexities and help realize energy-efficient decoders [56,57], as will be discussed

in Sec. 2.3.2. Nevertheless, LDPC decoders typically still consume about an order

of magnitude more power than hard-decision decoders [17–20].

• Due to their random structure, LDPC codes that are to operate close to the Shan-

non limit typically suffer from an error floor [58]. Therefore, in applications such

as optical communications where a very low BER is required, a clean-up (usually

algebraic) outer code has to be concatenated with an LDPC code to mitigate the

remaining errors.

2.3.1 EXIT Functions

The standard analysis and design tool for LDPC codes is the density evolution algorithm.

Density evolution, proposed in [47], can accurately track the convergence behaviour of

very long LDPC codes under various decoding schemes. For example, density evolution

can be used to track the probability density function (pdf) of the LLR messages at each decoding iteration or

to test whether an LDPC code can operate at a given channel condition.


An extrinsic information transfer (EXIT) function analysis can also be used to track

the decoding behaviour of an LDPC code [59]. While not as accurate as density evolution,

EXIT function analysis has much lower computational complexity. It can provide a fast

and accurate enough tool for designing LDPC codes.

In [60], an accurate one-dimensional EXIT function analysis was proposed and used

for LDPC code design. The key idea is to assume a symmetric Gaussian distribution for

the messages that come from the VNs, but not from the CNs. In the log-likelihood ratio

(LLR) domain, the SP update rule requires the VNs to send the sum of the extrinsic

messages that they receive from the CNs, plus their channel message. The Gaussian

distribution of these messages then can be explained by the central limit theorem.

The EXIT function then tracks a measure of progression throughout decoding itera-

tions. The measure can be the mean of the Gaussian LLRs, the probability of error in

messages, or the average mutual information between the value of VNs and the extrin-

sic LLR messages. Each of these measures proves more accurate than others in certain

settings. The error-probability EXIT function, for example, gives the probability of error in the messages coming from the VNs after one decoding iteration, as a function of the message error probability at the current iteration.

Similarly, in [60] the authors define elementary EXIT functions that track a certain

measure of decoding progression for a VN of particular degree. Moreover, they show

that the EXIT function of the decoder can be closely approximated as a linear function

(corresponding to the VN degree distribution) of the elementary EXIT functions. As will

be shown in Chapter 3, this approximation is key to LDPC code design.

In [60, Fig. 3] the authors provide a visualization of how the EXIT function tracks the

progress of decoding in various iterations. Furthermore, in [61] the authors give a formula

that, given the EXIT function, estimates the number of decoding iterations required to

take the message BER down to a target BER. As shown in Chapter 3, this formula can

be used to design LDPC codes with minimized decoding complexity.
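To illustrate how a one-dimensional recursion of this kind tracks decoding progress and yields an estimated iteration count, the sketch below runs the analogous, exactly known, erasure-probability recursion for an LDPC ensemble on the binary erasure channel. This BEC example merely stands in for the Gaussian-approximation EXIT analysis described above; it is not the analysis used in the later chapters.

def bec_density_evolution(eps, lam, rho, target=1e-3, max_iter=200):
    # Track the edge erasure probability x_l of an LDPC ensemble on a BEC
    # with erasure rate eps, via x_{l+1} = eps * lambda(1 - rho(1 - x_l)).
    # lam, rho: edge-perspective polynomials as coefficient lists, where
    # lam[i] multiplies x^i.  Returns the number of iterations needed to
    # reach `target`, or None if the recursion does not converge in time.
    poly = lambda coeffs, x: sum(c * x ** i for i, c in enumerate(coeffs))
    x = eps
    for iteration in range(1, max_iter + 1):
        x = eps * poly(lam, 1 - poly(rho, 1 - x))
        if x <= target:
            return iteration
    return None

# (3,6)-regular ensemble: lambda(x) = x^2, rho(x) = x^5, channel eps = 0.40.
print(bec_density_evolution(0.40, lam=[0, 0, 1], rho=[0, 0, 0, 0, 0, 1]))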

2.3.2 Quasi-Cyclic LDPC Codes

In their general form, LDPC codes have random composition and imposing any struc-

ture on them negatively affects their asymptotic performance [62]. However, a random

composition is not suitable for hardware implementation. On the other hand, a code structure can be used to reduce the wiring-interconnect and routing complexity in the hardware,

thus reducing the power dissipation on the decoder chip. Structured codes also allow for

hardware reusability.


Most practical LDPC codes have a quasi-cyclic (QC) structure [63], characterized as

follows. For some positive integer q, let n = N/q and m = M/q. Also let P (s), for

$s \in \{0, 1, \ldots, q - 1\}$, denote the circular shift of a $q \times q$ identity matrix by $s$ columns to

the right. For example, P (0) is the identity matrix and

$$P(1) = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 0 \end{pmatrix}.$$

We also remark that $P(s) = (P(1))^s$, for $s \in \{1, 2, \ldots, q - 1\}$. The $M \times N$ parity-check

matrix of a QC-structured LDPC code is of the form

$$H = \begin{pmatrix} P(s_{11}) & P(s_{12}) & \cdots & P(s_{1n}) \\ P(s_{21}) & P(s_{22}) & \cdots & P(s_{2n}) \\ \vdots & \vdots & & \vdots \\ P(s_{m1}) & P(s_{m2}) & \cdots & P(s_{mn}) \end{pmatrix},$$

where for $i \in \{1, 2, \ldots, m\}$ and $j \in \{1, 2, \ldots, n\}$ we have $s_{i,j} \in \{0, 1, \ldots, q - 1, \infty\}$, and we define $P(\infty)$ as the all-zero matrix.
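A small Python sketch of this expansion is given below; the base matrix of shift exponents is an arbitrary toy example (with None standing for $\infty$), not a code designed in this thesis.

import numpy as np

def qc_parity_check(shifts, q):
    # Expand an m x n base matrix of shift exponents into the (m q) x (n q)
    # QC parity-check matrix: exponent s -> identity circularly shifted by s
    # columns to the right; None (standing for infinity) -> all-zero block.
    def block(s):
        if s is None:
            return np.zeros((q, q), dtype=np.uint8)
        return np.roll(np.eye(q, dtype=np.uint8), s, axis=1)
    return np.block([[block(s) for s in row] for row in shifts])

# Toy 2 x 4 base matrix with circulant size q = 5.
H = qc_parity_check([[0, 1, None, 3],
                     [2, None, 4, 0]], q=5)
print(H.shape)   # (10, 20)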

The QC structure has several advantages, some of which are listed below.

• The QC structure is well known to be hardware-friendly, leading to energy-efficient

implementations [64].

• With a QC structure, LDPC codes of moderate lengths can be obtained that have a

large girth. The girth of a graph is the length of its shortest cycle. A Tanner graph

that has a large girth more closely resembles a tree and therefore belief propagation

on such a graph is a better approximation for maximum a posteriori decoding.

• QC-structured LDPC codes can be encoded in linear time with shift registers [65].

• The QC structure enables a layered message-passing decoding schedule [66]. In a

layered decoding schedule, the CNs are divided into layers. In each iteration, the

decoder sequentially updates the messages corresponding to each layer, always using

the latest available extrinsic information, which in turn results in a faster decoding

convergence. As will be shown in Chapter 3, layered decoding of QC-structured

LDPC codes can reduce the decoding complexity by up to 50%.


2.4 Code Concatenation

The concept of code concatenation was introduced by Forney [67] and extensively studied

in [68]. In Forney’s scheme, an inner block code of short length is concatenated to an

outer algebraic code. The inner code is decoded using maximum-likelihood decoding

which, from the outer-code perspective, reduces the channel to a burst channel. The

outer (usually RS) decoder is then tasked with cleaning the burst errors.

Code concatenation is frequently used to improve performance of FEC schemes. In

recent OTN proposals, often an inner, iteratively decoded, SD code concatenated to

an outer algebraic code is used [4]. In such schemes, as is the case in this work, the

inner code reduces the channel seen by the outer code, possibly through an interleaver, to a binary symmetric channel. The outer decoder is then tasked with mitigating the remaining errors made by the inner code and bringing the BER below $10^{-15}$, as required by

OTNs.

The code concatenation principle has been extended to generalized concatenated

codes, first in [69]. In this generalization, the inner code is taken as a sequence of

nested codes and multiple outer codes are used to provide unequal protection over the

inner code symbols, resulting in great flexibility in code design. Reed-Muller codes, decoded as described in [70], are an example of a generalized concatenated code.

2.5 Coded Modulation

Digital modulation can be represented by a signal constellation and its labelling, i.e., a

bijective mapping of the (possibly coded) bit patterns to the constellation points. Coded

modulation, introduced by J.L. Massey [71], is the joint design of coding and modulation

in FEC schemes. Coded modulation can optimize for the performance, complexity, ro-

bustness, etc. of the FEC scheme, or for any combination of these metrics. We describe

the two common approaches to coded modulation below.

2.5.1 Multi-Level Coding

A constellation labelling produces various bit-levels, corresponding to the signal-point

address-bits. The idea of MLC is to protect (possibly groups of) bit levels by individual

codes, as shown in Fig. 2.3(a). At the receiver, multi-stage decoding is carried out, during which the decoding starts with the lowest bit level and continues to higher levels while taking into account the decisions of the previous levels.


Figure 2.3: The two common approaches to coded modulation. (a) MLC, using individual bit-level codes C1, C2, C3. (b) BICM, using a single code C.

For example, the coded modulation schemes proposed by Ungerboeck [72] and by Imai and Hirakawa [73] aim to improve performance by increasing the minimum Euclidean distance, instead of the Hamming distance, among the symbols that represent codewords.

Ungerboeck’s labelling (also known as set-partitioning or natural labelling) maximizes

the minimum intra-subset Euclidean distance when assigning address-bits. See Fig. 6.2

for an example of set-partitioning labelling. Note that the minimum squared distance

among the adjacent sub-constellation points doubles at each step. This effectively means

a 3 dB gain in the SNR of the corresponding bit-channel. Trellis coded modulation [74, 75],

for example, divides the bit-levels into two groups: least significant ones (first assigned

bits) are protected by a convolutional code and the rest (if any) remain uncoded.
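To spell out the 3 dB figure quoted above (a standard back-of-the-envelope argument, not a derivation specific to this thesis): doubling the minimum squared Euclidean distance at each partitioning step allows the noise variance to double for the same distance-to-noise ratio, i.e., an SNR advantage of

$$10 \log_{10} 2 \approx 3.01\ \mathrm{dB}$$

per partitioning step for the corresponding bit-channel.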

A generalization of trellis coded modulation is lattice coding in which algebraically-

structured constellations (usually in higher dimensions) are used for signalling [76]. More-

over, constellation labelling via bit assignment can be generalized to non-binary parti-

tioning of the constellation. An example of coded modulation using such partitioning is

the coset codes [77,78].

The MLC scheme is optimal from an information-theoretic point of view [73, 79], as

it is capable of approaching channel capacity with multi-stage decoding by appropriate

choice of the codes for the different bit levels. Although used in digital subscriber line

applications [80], MLC schemes have often been avoided in practice because of the poten-

tially high complexity induced by using separate bit-level codes and the negative impact

that multi-stage decoding has on latency and error propagation. Nevertheless, it has

been shown in [2,3,81,82] and also in Chapter 4, that with a clever choice of the bit-level

codes these issues can be largely resolved and MLC schemes can be designed that have

decoding complexity or performance advantages over BICM.


2.5.2 Bit-Interleaved Coded Modulation

The BICM scheme [83, 84] uses one channel code. The encoded bits are passed through

an interleaver and mapped to the (usually Gray-labeled) constellation. At the decoder, parallel independent decoding is used: all bits are decoded in parallel, independent of their level.

The BICM scheme relaxes the constraints between constellation size, labelling, and

the choice of code. It is known that BICM with a Gray constellation labelling can operate

within fractions of a dB from the Shannon limit [84, 85]. Because of its simplicity and

flexibility, BICM is usually considered to be a pragmatic approach to coded modulation

[86].

Another perceived advantage of the BICM scheme is that for a fixed frame length it

allows for the use of codes with longer block lengths, compared to the MLC approach,

thereby potentially unlocking higher coding gains. This advantage, however, diminishes

in applications with higher throughput as in optical communication. For example, at a

throughput of 400 Gb/s over a 16-ary constellation [4], for an additional delay of only

1 µs for the FEC, the bit-channel codes of an MLC scheme can have a block length in

the order of 10^5, which still allows for very powerful coding.

Chapter 3

Low-Complexity Concatenated LDPC-Staircase Codes

3.1 Introduction

In this chapter, we build on the work of Zhang and Kschischang [1] on designing low-

complexity concatenated FEC schemes for applications with high throughput. Their

design consists of an inner soft-decision LDGM code concatenated with an outer hard-

decision staircase code. The degree distribution of the inner LDGM code ensemble is

obtained by solving an optimization problem, minimizing the estimated data-flow of the

inner-code decoder, while searching a table of staircase codes to find the optimal inner

and outer code pair. At 20% OH, the codes proposed in [1] can achieve up to 46%

reduction in complexity, when compared with other low-complexity designs.

We adopt the concatenated FEC structure of [1], but we consider a different ensemble

of inner codes. The task of the inner code, similar to that of [1], is to reduce the BER

of the bits transferred to the outer staircase code to below its threshold, which enables

the outer code to take the BER further down, below 10^−15, as required by OTNs. We re-

design the inner code to further reduce its data-flow, thereby achieving an FEC solution

with even lower complexity than the codes reported in [1].

Throughout this chapter, we consider signalling using a Gray-labeled quadrature

phase-shift keying constellation, with unit energy per dimension. We assume a mem-

oryless AWGN channel.

This chapter includes and expands on the work in [87].

A key characteristic that emerges from the re-designed inner-code optimization is

that some inner codeword bits remain uncoded! These bits bypass the inner code, and

are protected only by the outer code. We propose a method to analyze and optimize

the inner-code ensemble, and show that the resulting codes can reduce the inner-code

data-flow by up to 71%, when compared to [1]. We show that, when the block length is

sufficiently large, codes generated according to the obtained inner-code ensembles perform

as expected, verifying the design approach.

To realize a pragmatic decoder implementation, we construct QC codes of practical

length, generated according to the obtained inner-code ensembles. We show that the

performance of randomly-generated inner codes of large block-length can be achieved

by QC codes of practical length in the order of 6000 to 15000. A QC-structured inner

code allows for decoder hardware implementations that are very energy efficient [64].

The QC structure also enables a layered message-passing decoding schedule. We show

that, compared with the flooding schedule, layered decoding of the QC-structured codes

reduces the complexity by up to 50%.

The rest of this chapter is organized as follows. In Sec. 3.2 we describe the inner-

code structure, code parameters, and complexity measure. In Sec. 3.3 we describe how

EXIT functions can be used to predict the inner-code performance, and we describe the

inner-code optimization procedure. In Sec. 3.4 we present simulation results and give

a characterization of the trade-off between the required SNR and decoding complexity

for the concatenated code designs. Designs with QC-structured codes are also discussed

in Sec. 3.4, and a comparison with existing soft-decision FEC solutions is presented. In

Sec. 3.5 we provide concluding remarks.

3.2 The Inner-Code Structure

3.2.1 Code Description

We use LDPC codes as inner codes. A significant feature of the inner-code ensemble is

that we allow for both degree-zero and degree-one variable nodes. Degree-zero variable

nodes are uncoded, and thus incur zero inner decoding complexity. Also, as will be

discussed in Sec. 3.2.3, degree-one variable nodes do not add to the data-flow throughout

the decoding procedure, thus they also incur no inner decoding complexity.

A Tanner graph for a member of the inner-code ensemble is sketched in Fig. 3.1. We

denote the inner-code rate by Rin.

Figure 3.1: Tanner graph of an LDPC inner code, consisting of some degree-zero variable nodes (uncoded components) and a coded component. The rectangle labeled by Π represents an edge permutation. The VN and CN degree distributions are to be designed.

In this work we only consider designing ensembles of systematic LDPC codes. In the encoder of a systematic code, an information set is designated for the message symbols

such that once the message symbols are realized, the codeword is uniquely obtained.

Formally, an information set of an (N,K) code is a set of K positions, the projection of

the code onto which results in a code of dimension K, i.e., the (K,K) code.

Note that the LDGM inner code of [1] is an instance of the ensemble defined above.

However, in an LDGM code, CNs are associated randomly with variable nodes, inducing

a Poisson distribution on variable-node degrees. In this work, the variable-node degree

distribution is carefully optimized to achieve small decoding complexity.

3.2.2 Ensemble Parameterization

The inner code ensemble is described by its VN and CN degree distributions. We denote

the maximum VN degree by Dv. We consider a CN degree distribution that is concen-

trated on one or two consecutive degrees, dc and dc + 1, with dc denoting the average CN

degree.

Let N denote the number of VNs in a particular Tanner graph drawn from the

ensemble, and let Ni be the number of degree-i VNs. We designate a particular subset

of the VNs to be the information set, while the remaining VNs form the parity set. We

let K denote the number of information nodes and let Ki be the number of information

nodes of degree i. The code rate therefore is Rin = K/N .

We denote the VN-perspective degree distribution by L(x) = ∑_{i=0}^{Dv} Li x^i, where Li = Ni/N is the fraction of VNs that have degree i. The portion of uncoded bits therefore is given by L0. We define the edge-perspective VN degree distribution as λ(x) ≜ L′(x)/L′(1) = ∑_{i=1}^{Dv} λi x^{i−1}, where L′(x) = dL(x)/dx. The inner-code rate is related to the edge-perspective VN degree distribution by

∑_{i=1}^{Dv} λi/i = (1 − L0) / ( dc(1 − Rin) ),    (3.1)


and for i ∈ {1, . . . , Dv}, the edge-perspective and node-perspective VN degree distribution parameters are related by

Li = dc(1 − Rin) λi / i.    (3.2)

Let Ui = Ki/N be the share of degree-i information nodes among all VNs. Since all

degree-zero VNs must be among the information nodes, we have U0 = L0. Also Ui ≤ Li

for i ∈ {1, . . . , Dv}, and ∑_{i=0}^{Dv} Ui = Rin.

Let Λ = (λ1, λ2, . . . , λDv) and let U = (U0, U1, . . . , UDv). We refer to the pair (Λ,U)

as the design parameters. The design parameters will be used in the inner-code optimiza-

tion program.

For reasons described in Sec. 3.2.3 and Sec. 3.3.1, degree-one VNs receive special

treatment in our design. We define ν to be the average number of degree-one VNs

connected to each check node. In terms of the code parameters, ν can be expressed as

ν = dcλ1. (3.3)

3.2.3 Complexity Measure

We use the complexity measure described in Sec. 1.2.2, to quantify, and eventually mini-

mize, the required data-flow at the decoder. The concatenated code decoder complexity

is defined as

η = ηin/Rsc + P,    (3.4)

where ηin is the inner code complexity score, Rsc is the outer staircase code rate, and

P is the number of post-processing operations per information bit at the outer-code

decoder. The η score is a normalized measure of the number of messages passed in

iterative decoding of the inner code. In this thesis, we have set P = 0, since the decoding

complexity, per bit, of the staircase code is typically two to three orders of magnitude

smaller than that of the inner code. This can be estimated as follows for the rate 15/16

staircase code with a (1408,1364) constituent code. Typically, each constituent code

is “visited” by the iterative decoder about four times during the decoding (where the

decoding, i.e., processing of a syndrome, is performed using a small table-lookup-based

circuit). Since each information bit is protected by two constituent codes, the average

number of bits recovered per decoding attempt is 170.5, giving a complexity of P ≈ 0.006 decoding attempts per decoded bit, which is negligible compared to the complexity incurred by the inner code, obtained next.
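(As a quick sanity check on these figures: each information bit is protected by two constituent codewords, each visited about four times, and each (1408, 1364) constituent codeword carries 1364 bits, so each decoding attempt recovers roughly 1364/(2 · 4) = 170.5 bits, i.e., P ≈ 8/1364 ≈ 0.006 attempts per bit.)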

Let E denote the number of edges in the ensemble that are not connected to a degree-

one VN. The complexity score of the inner-code, ηin, can be computed as

ηin = EI/K = (1 − Rin)(dc − ν) I / Rin,    (3.5)

where I is the maximum number of decoding iterations allowed for the inner-code decoder.

Note that, similar to [1], the complexity score in (3.5) does not account for messages of

degree-one VNs, as they remain constant throughout the decoding procedure.
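To make the bookkeeping concrete, the following minimal Python sketch (ours, for illustration; not code from this work) evaluates (3.4) and (3.5); the example values correspond to Example 1 of Sec. 3.4.2 (Rin = 8/9, dc = 24, ν = 1.18, I = 9, Rsc = 15/16).

# Minimal sketch (illustrative): complexity scores (3.4)-(3.5).

def eta_inner(r_in, d_c, nu, iters):
    """Inner-code data-flow score of eq. (3.5): messages per inner information bit."""
    return (1.0 - r_in) * (d_c - nu) * iters / r_in

def eta_concat(eta_in, r_sc, p_post=0.0):
    """Concatenated-code score of eq. (3.4); p_post = P is negligible for the outer codes used here."""
    return eta_in / r_sc + p_post

eta_in = eta_inner(r_in=8/9, d_c=24, nu=1.18, iters=9)   # ~25.67, the score of Example 1
eta = eta_concat(eta_in, r_sc=15/16)                     # ~27.4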

3.3 Complexity-optimized Design

3.3.1 EXIT chart analysis

We analyze the inner code using a version of EXIT functions [60, 61]. Under the

assumption that the all-zero codeword is transmitted, we define the error-probability

EXIT function fΛ, that takes pin, the probability of error in messages coming from the

VNs, as input, and outputs pout, the probability of error in messages coming from the

VNs, after one round of SP message-passing, i.e.,

pout = fΛ(pin). (3.6)

Using the law of total probability, we can write pout as

pout = ∑_{i=1}^{Dv} λi p_i^out,    (3.7)

where p_i^out is the probability of error in messages coming from a degree-i VN. From (3.6) and (3.7) we get

pout = fΛ(pin) = ∑_{i=1}^{Dv} λi fi,Λ(pin),    (3.8)

where the functions fi,Λ are called elementary EXIT functions. Function fi,Λ takes pin as an argument, and produces p_i^out, the probability of error in messages coming from the degree-i VNs, after one round of SP message-passing. As shown in [60], in practice the elementary EXIT charts' dependence on Λ can be neglected. Therefore, (3.8) can be written as

pout = f(pin) = ∑_{i=1}^{Dv} λi fi(pin).    (3.9)

In [60] a method is proposed that, for an LDPC code ensemble without degree-zero

and degree-one VNs, approximates the elementary EXIT charts using Monte-Carlo sim-

ulation. Assuming that the messages coming from the VNs have a symmetric Gaussian

distribution with mean m = (2 erfc^−1(pin))^2 and variance σ^2 = 2m, an empirical distribution for CN messages is generated by performing the CN computation on samples of VN messages. A degree-i VN then adds its channel message and i − 1 independent samples

of CN messages, to generate a sample of fi(pin). It is shown that the elementary EXIT

charts generated by interpolating the average of a large number of fi(pin) samples closely

replicate the actual elementary EXIT charts.

In our design, however, we must take into account the presence of degree-one VNs

in obtaining the elementary EXIT charts with the method of [60], as the messages from

such nodes significantly affect the CN operation. To this end, we generate the elementary

EXIT charts for a pre-set value of ν, the average number of degree-one VNs connected to

each CN, as defined in (3.3). In the Monte-Carlo simulation described above, we modify

the CN operation to account for the fact that each CN is connected to, on average, ν

degree-one VNs, and therefore receives only their channel observation.

In particular, given ν, let θ ∈ [0, 1) satisfy θ⌊ν⌋ + (1 − θ)⌈ν⌉ = ν. We then assume that a fraction θ of the CNs are connected to ⌊ν⌋ degree-one VNs and the remainder are connected to ⌈ν⌉ degree-one VNs. Therefore, in obtaining the samples of degree-dc CN messages, a fraction θe of the CN message computations are performed assuming ⌊ν⌋ messages from degree-one VNs, and the remainder are performed assuming ⌈ν⌉ messages from degree-one VNs, where

θe = θ (dc − ⌊ν⌋) / (dc − ν).    (3.10)

Note that the SNR, dc, and ν are the only parameters needed to compute the elemen-

tary EXIT charts. Since they do not depend on inner-code design parameters, elementary

EXIT charts can be pre-computed. Therefore, when SNR, dc, and ν are given, the prob-

lem of inner-code design reduces to the problem of appropriately shaping an EXIT chart

out of its elementary EXIT charts.
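For concreteness, the following is a minimal NumPy sketch of this Monte-Carlo procedure under the symmetric-Gaussian message assumption; it is our own illustration (the function names, the choice to parameterize by the VN-message mean, and the channel-LLR model are assumptions of ours), not the software used in this work.

import numpy as np

def elementary_exit_point(mu_vn, mu_ch, d_c, nu, Dv, n=100_000, rng=None):
    """One point of the elementary EXIT charts: returns (p_in, {i: f_i(p_in)}).
    VN-to-CN messages are modelled as N(mu_vn, 2*mu_vn); channel LLRs as N(mu_ch, 2*mu_ch)."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = int(np.floor(nu)), int(np.ceil(nu))
    theta = 1.0 if lo == hi else hi - nu                  # fraction of CNs with floor(nu) deg-1 neighbours
    theta_e = theta * (d_c - lo) / (d_c - nu)             # eq. (3.10)

    def cn_messages(count):
        """Sample CN-to-VN messages on edges towards VNs of degree >= 2 (tanh rule)."""
        out = np.empty(count)
        k1 = int(round(theta_e * count))
        for start, cnt, n_deg1 in ((0, k1, lo), (k1, count - k1, hi)):
            if cnt <= 0:
                continue
            v = rng.normal(mu_vn, np.sqrt(2 * mu_vn), (cnt, d_c - 1 - n_deg1))
            ch = rng.normal(mu_ch, np.sqrt(2 * mu_ch), (cnt, n_deg1))   # constant messages from deg-1 VNs
            t = np.prod(np.tanh(np.concatenate([v, ch], axis=1) / 2), axis=1)
            out[start:start + cnt] = 2 * np.arctanh(np.clip(t, -0.999999, 0.999999))
        return out

    p_in = float(np.mean(rng.normal(mu_vn, np.sqrt(2 * mu_vn), n) < 0))  # empirical VN-message error prob.
    f = {}
    for i in range(2, Dv + 1):                            # degree-i VN: channel LLR plus (i-1) CN messages
        c = cn_messages((i - 1) * n).reshape(n, i - 1).sum(axis=1)
        f[i] = float(np.mean(rng.normal(mu_ch, np.sqrt(2 * mu_ch), n) + c < 0))
    return p_in, f

Sweeping the VN-message mean over a grid and interpolating the resulting (pin, fi) pairs yields the elementary EXIT charts, which can then be pre-computed and stored for each (SNR, dc, ν) of interest.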

In Fig. 3.2 we plot the elementary EXIT functions used in Section 3.4.2, Example 1,

for designing a rate-8/9 code ensemble. Here we have SNR = 5.85 dB, dc = 24, and

ν = 1.18. We also plot the EXIT function of the (3, 27)-regular LDPC ensemble (ν = 0).


Figure 3.2: The elementary EXIT functions f2, . . . , f6 used for designing a rate-8/9 code ensemble in Section 3.4.2, Example 1. The EXIT function of the resulting optimized ensemble is compared with that of the (3, 27)-regular LDPC ensemble.

We can observe the effect of allowing degree-zero and degree-one VNs in the ensemble

by comparing f3 to the EXIT function of the regular ensemble: for a given code rate,

having such VNs in the ensemble allows for having CNs of lower degree which in turn

can provide the VNs with stronger and more reliable messages at large values of pin.

3.3.2 Code Optimization

Similar to [1], we view the problem of designing the concatenated FEC scheme as a multi-

objective optimization with the objectives (Es/N0, ηin). In both parameters, smaller is

better, i.e., we wish to minimize the SNR needed to achieve the target error rate and

we wish to minimize the estimated complexity needed to do so. Given a concatenated

code rate, Rcat, we characterize the trade-off between the objectives by finding their

Pareto frontier. For any SNR, we find a pair (if it exists), consisting of an outer staircase

code and an inner-code ensemble, with minimum complexity, that together, bring the

BER below 10^−15. The Pareto frontier then provides the various choices of FEC schemes

available to be used in the OTN.

Our proposed concatenated code optimization procedure is as follows. When the

concatenated FEC rate, Rcat, is specified, we loop over a table of staircase codes such as [1,

Table 1]. Recall that each staircase code specifies Rsc and psc, the rate and threshold of

the outer code, respectively. For each staircase code, we perform the inner-code ensemble

complexity optimization.


It is shown in [61] that, given the EXIT function, the number of iterations, I, required

by the inner code to take the VN message error probability from p0, the channel BER,

down to pt, a target message error probability, can be closely approximated as

I ≈ ∫_{pt}^{p0} dp / ( p log( p/f(p) ) ).    (3.11)

Based on the EXIT function analysis described in Sec. 3.3.1, for i ≥ 2, the probability

of error at a degree-i VN is equal to the probability of error at a message coming from a

degree-(i+ 1) VN, i.e., fi+1(pt). However, the probability of error at a degree-one VN is

not equal to f2(pt) because to obtain f2(pt) we add a channel message to a check message.

In computing that check message, there may be a contribution from a degree-1 VN.

Such a message significantly affects the CN message as it remains constant throughout

the decoding procedure. Therefore, we must obtain CN messages specifically targeted

at degree-one VNs and use them, along with the channel observations, to obtain the

probability of error at degree-one VNs. We denote the probability of error at a degree-

one VN by f1(pt). Note that f1(pt) can also be pre-computed and stored, given a fixed

SNR, dc, and ν.

We let Pout denote the BER on bits passed to the outer decoder. Since only the

information bits of the inner code are passed to the outer code, Pout can be obtained as

Pout = (1/Rin) ( U0 p0 + U1 f1(pt) + ∑_{i=2}^{Dv} Ui fi+1(pt) ).    (3.12)

From (3.5) and (3.11), the complexity-optimized inner-code ensemble is obtained by

searching over a discrete set of values for dc, ν, and pt, and, for each choice, solving the

following optimization problem:

minimize over (Λ, U):

ηin = [ (1 − Rin)(dc − ν) / Rin ] ∫_{pt}^{p0} dp / ( p log( p/f(p) ) ),    (3.13)

subject to

∑_{i=1}^{Dv} λi/i ≥ (1 − L0) / ( dc(1 − Rin) ),    (3.14)
∑_{i=1}^{Dv} λi = 1,   λ1 dc = ν,    (3.15)
0 ≤ λi   ∀ i ∈ {1, . . . , Dv},    (3.16)
∑_{i=0}^{Dv} Ui = Rin,   U0 = L0,    (3.17)
0 ≤ Ui ≤ Li   ∀ i ∈ {1, . . . , Dv},    (3.18)
f(p) < p   ∀ p ∈ [pt, p0],    (3.19)
Pout ≤ psc.    (3.20)

In this optimization problem formulation, constraint (3.14) ensures that the obtained

complexity-optimized code has the desired rate (see (3.1)). Constraints (3.15)–(3.16)

ensure the validity of the obtained ensemble. Constraints (3.17)–(3.18) ensure the validity

of the designated information set. Constraint (3.19) ensures that the obtained EXIT-

curve remains open throughout the decoding procedure, i.e., for all p ∈ [pt, p0]. Finally,

(3.20) ensures that the BER on bits passed to the outer code is at or slightly below its

threshold. Unsurprisingly, it turns out that the highest degree VNs are always chosen

by the optimization routine as information nodes. We call constraints (3.14)–(3.19) the

validity constraints and refer to them in the next chapters.

Note that, in terms of the optimization parameters, constraints (3.14)–(3.20) are

linear (see (3.2), (3.8), and (3.12) for how (3.18), (3.19), and (3.20) are related to the

design parameters, respectively). Also, as shown in [61], under mild conditions, I, as

approximated in (3.11), is a convex function of Λ. Therefore, given an SNR, we can

compute the elementary EXIT functions and once the values of dc, ν, and pt are picked,

the problem of designing a complexity-optimized inner code becomes convex, and can be

solved by the method described in Sec. 3.3.3.

Once the search over the library of staircase codes and the values of dc, ν, and pt

is complete, the ensemble with lowest complexity, according to (3.5), is chosen as the

inner-code ensemble. The obtained ensemble and the corresponding staircase code that

achieves the minimum overall complexity then give the optimized concatenated code.

The inner-code optimization procedure described here, in effect, synthesizes an open

EXIT function out of the elementary EXIT functions to obtain a valid ensemble that

achieves the target BER with minimum complexity. In Fig. 3.2 we plot the EXIT function

of the rate-8/9 optimized ensemble we obtain in Section 3.4.2, Example 1, and compare

it with that of the (3, 27)-regular LDPC ensemble. While the regular ensemble has a

fixed point above the target BER, the optimization procedure described here obtains a

valid ensemble with an open EXIT function.

We remark that, while not considered in this work, the inner-code optimization can

be reformulated to obtain, for a given complexity, the inner-code ensemble and the cor-

responding outer code with the maximum overall rate, thereby maximizing the system


throughput. Similarly, a Pareto frontier between the SNR and the concatenated code

rate can be established.

3.3.3 Practical Considerations

Discretization

In practice, the integral in (3.11) is estimated by a sum over a quantized version of the

[pt, p0] interval. Let Q be the number of quantization points. Define ∆ ≜ (p0 − pt)/Q and let

qi = pt + i∆,   i ∈ {0, 1, . . . , Q − 1}.

We define a discrete approximation of the integral in (3.11) as

IQ = ∑_{i=0}^{Q−1} ∆ / ( qi ln( qi/f(qi) ) ),

which we use in the objective function in (3.13), instead of the integral. The constraint f(qi) < qi, for i ∈ {0, 1, . . . , Q}, then ensures the openness of the EXIT curve throughout the decoding procedure.
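A minimal sketch (ours) of this discretized iteration estimate, assuming the synthesized EXIT function f is available as a callable:

import numpy as np

def discretized_iterations(f, p0, pt, Q=400):
    """Discrete approximation I_Q of (3.11) on a Q-point grid of [pt, p0];
    returns infinity if the EXIT curve is not open on the grid."""
    delta = (p0 - pt) / Q
    q = pt + delta * np.arange(Q)
    fq = np.array([f(p) for p in q])
    if not np.all(fq < q):
        return np.inf
    return float(np.sum(delta / (q * np.log(q / fq))))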

Similarly, the intervals [dc^min, dc^max], [0, ν^max], and [0, pt^max] are quantized with Qdc, Qν, and Qpt points when searching over the values of dc, ν, and pt, respectively, in the inner-code ensemble optimization. Here, the pair (dc^min, dc^max), together with ν^max and pt^max, determines the intervals over which we search for the values of dc, ν, and pt, respectively, when seeking the optimal inner-code ensemble. The values of Q, Qdc, Qν, and Qpt allow the designer to trade off between accuracy and computational complexity of the design process.

Optimization Algorithm

Even when dc, ν, and pt are fixed, the objective function is non-linear and is not easily

differentiable. To solve the optimization problem, we use the sequential quadratic pro-

gramming (SQP) method [88]. This method is an iterative procedure, at each iteration

of which a quadratic model of the objective function is optimized (subject to the con-

straints), and the solution is used to construct a new quadratic model of the objective

function.

An issue with using the SQP algorithm is that it needs to be initialized with a fea-

sible point. In our design procedure, we first substitute the objective function of the

optimization by a quadratic function, such as

∑_{i=1}^{Dv} λi^2 + ∑_{i=0}^{Dv} Ui^2.

A feasible set of values to initialize the design parameters is then found by solving the

optimization problem using any standard quadratic programming method.
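As a rough illustration of how this step can be carried out with off-the-shelf tools, the sketch below sets up the problem for a fixed (dc, ν, pt) using SciPy's SLSQP routine (a sequential quadratic programming method). It is our own simplified rendering (the f1 special case, the Pout constraint (3.20), and the quadratic-programming feasibility initialization are omitted or simplified), not the design software used in this work.

import numpy as np
from scipy.optimize import minimize

def design_ensemble(f_elem, p0, pt, d_c, nu, r_in, Dv, Q=400):
    """Sketch of the inner-ensemble optimization (3.13)-(3.19) for fixed (d_c, nu, pt).
    f_elem: dict {i: callable} of pre-computed elementary EXIT functions f_i, i = 1..Dv.
    Decision vector x = (lambda_1, ..., lambda_Dv, U_0, ..., U_Dv)."""
    delta = (p0 - pt) / Q
    q = pt + delta * np.arange(Q)
    F = np.array([[f_elem[i](p) for i in range(1, Dv + 1)] for p in q])   # Q x Dv matrix of f_i(q_j)
    idx = np.arange(1, Dv + 1)

    lam = lambda x: x[:Dv]
    U = lambda x: x[Dv:]
    f_of_q = lambda x: F @ lam(x)                 # f(q_j) = sum_i lambda_i f_i(q_j), eq. (3.9)

    def objective(x):                             # eq. (3.13) with the discretized integral
        fq = np.clip(f_of_q(x), 1e-12, None)
        return (1 - r_in) * (d_c - nu) / r_in * np.sum(delta / (q * np.log(q / fq)))

    L_coded = lambda x: d_c * (1 - r_in) * lam(x) / idx   # L_i = d_c (1 - R_in) lambda_i / i, eq. (3.2)
    cons = [
        {'type': 'eq',   'fun': lambda x: np.sum(lam(x)) - 1},                     # (3.15)
        {'type': 'eq',   'fun': lambda x: lam(x)[0] * d_c - nu},                   # (3.15)
        {'type': 'eq',   'fun': lambda x: np.sum(U(x)) - r_in},                    # (3.17)
        {'type': 'eq',   'fun': lambda x: U(x)[0] - (1 - np.sum(L_coded(x)))},     # U_0 = L_0; implies (3.14)
        {'type': 'ineq', 'fun': lambda x: L_coded(x) - U(x)[1:]},                  # (3.18)
        {'type': 'ineq', 'fun': lambda x: q - f_of_q(x)},                          # (3.19) on the grid
        # (3.20), P_out <= p_sc, would be added here via (3.12).
    ]
    x0 = np.full(2 * Dv + 1, 1.0 / (Dv + 1))      # crude starting point; a QP feasibility step is preferable
    return minimize(objective, x0, method='SLSQP', bounds=[(0, 1)] * (2 * Dv + 1), constraints=cons)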

Interleaving Between Inner and Outer Code

The outer staircase code threshold psc is computed assuming that the outer code sees a

binary symmetric channel, i.e., a channel with independent and identically distributed bit

errors occurring with probability psc. The inner decoder, however, produces correlated

errors. To mitigate the error correlation, we use a diagonal interleaver as in [1]. We

suppose that each staircase block is of size Φ^2, and we choose the inner code dimension K to divide Φ^2. We define the packing ratio, φ, as the number of inner codewords associated with a staircase block, i.e., φ = Φ^2/K.

Table 3.1: Quantifying Finite Interleaving Loss
Packing ratio φ:         2      4      8       ≥ 16
Performance loss (dB):   0.02   0.01   0.007   ≈ 0

Table 3.1 shows the performance loss, relative to ideal interleaving, obtained for

different packing ratios via simulation, assuming an outer staircase code of rate 15/16

with Φ = 704 and using an inner code sampled from an optimized ensemble. Here, the

loss is measured as the extra SNR needed at the receiver to achieve 10^−5 BER, relative

to the ideal interleaving threshold. The ideal interleaving threshold was estimated by

interleaving inner codewords over multiple staircase blocks. At packing ratios exceeding

8, the performance degradation becomes negligible, justifying the use of the simple binary

symmetric channel BER analysis of staircase codes. A more detailed discussion of the

interleaving between inner and outer codes is provided with an example in Sec. 4.5.3.

3.4 Results

3.4.1 Pareto Frontier

We searched staircase codes of [1, Table 1] for the optimal outer code. We refer the

reader to [41] to see how these codes are obtained. The reader should note that there

is a slight difference between two of the entries in the earlier table [41, Table 1] (which

included t = 5-error-correcting constituent codes) and the later table [1, Table 1] (which includes only results corresponding to the more practical t = 4 constituent codes).

Figure 3.3: The (Es/N0, ηin) Pareto frontiers of the inner code in the proposed design, compared with the benchmark design of [1], at 15%, 20%, and 25% OHs.

We used the following parameters in designing inner-code ensembles: Dv = 12, ν^max = 4, pt^max = p0/2, and Q = 400. The pair (dc^min, dc^max) are chosen according to the inner-code rate while ensuring a large enough interval to search for the optimal dc. We used the SP algorithm in generating the elementary EXIT charts, and 10^6 samples were produced at each pass of the Monte-Carlo simulation.

Fig. 3.3 shows the (Es/N0, ηin) Pareto frontier for the designed inner-codes, at 15%,

20%, and 25% OHs. The Pareto frontiers are also compared with those of [1]. Similar

to [1], all our concatenated code designs picked the highest-rate staircase code available,

with Rsc = 15/16 and psc = 5.02 × 10^−3. As can be seen from Fig. 3.3, the proposed

design outperforms the design in [1]. The obtained inner codes achieve the performance

of the inner codes of [1], with up to 71%, 50%, and 19% reduction in complexity, at 15%,

20%, and 25% OHs, respectively. Also, compared to [1], the designed concatenated codes

operate at up to 0.23 dB, 0.14 dB, and 0.06 dB closer to the CSL, at 15%, 20%, and 25%

OHs, respectively.

To study the performance of the designed inner codes at an overall OH of 20%,

we sampled parity-check matrices for codes of length up to 100,000 from each of the

complexity-optimized inner-code ensembles. Since the code-lengths we consider here are

very large, with high probability we obtain a full-rank sub-matrix corresponding to the

designated information set of each parity-check matrix. We simulated the transmission


of codewords over an AWGN channel. Codewords were decoded using the SP algorithm with floating-point message-passing, and the code performance was obtained by averaging the codeword BERs. Note that we only care about the BER of the information nodes of an inner codeword.

Figure 3.4: Simulated inner-code BERs on bits passed to the outer code, sampled from the complexity-optimized ensembles, for designs at 20% OH. The mid-point on each BER curve (highlighted by an 'o') is the code operational point, i.e., the SNR for which the inner code is designed to achieve Pout ≤ psc.

In Fig. 3.4, obtained BERs are plotted versus SNR. The psc line shows the outer

staircase code threshold. The mid-point SNR on each curve (highlighted by an 'o') is the

code operational point, i.e., the SNR for which the code is designed. Note that BERs

of all the sampled codes hit at, or below, the outer-code threshold, at their operational

point, verifying our design approach.

3.4.2 Two Design Examples

Here we present two interesting examples of the complexity-optimized concatenated code

designs at 20% OH. In both of these examples, the outer code picked was the Rsc = 15/16

and psc = 5.02 × 10^−3 staircase code.

Example 1 : An FEC scheme operating at 1.27 dB from the CSL. The optimization

procedure yields the following ensemble for the inner code:

L(x) = 0.1480 + 0.1309x + 0.3484x^3 + 0.3727x^4,
R(x) = x^24.


Figure 3.5: The BER on information nodes of different degrees in the ensemble of Example 1 and the BER on bits passed to the outer code, denoted by Pout. The degree distribution on the information nodes is 0.1665 + 0.0223x + 0.3919x^3 + 0.4193x^4.

This code requires a maximum of 9 iterations to bring the BER below the outer-code

threshold, which gives an inner-code complexity score of 25.67.
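(As a check against (3.5): substituting Rin = 8/9, dc = 24, ν = 1.18 (the value quoted with Fig. 3.2), and I = 9 gives ηin = (1 − 8/9)(24 − 1.18) · 9/(8/9) ≈ 25.7, consistent with the reported score.)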

In Fig. 3.5 we plot the BER on information nodes of this ensemble over the decoding

iterations. We also plot Pout, the BER on bits passed to the outer code, obtained using

(3.12). In this example the information nodes include some of the degree-one VNs as

well. As can be seen from Fig. 3.5, the BERs on VNs of higher degree decrease rapidly

with decoding iterations and therefore the BER of the uncoded bits dominates Pout at

the end. Similarly, in Fig. 3.4, for codes designed for lower SNRs and at low BERs, Pout

is dominated by the BER on the uncoded bits.

Example 2 : An FEC scheme operating at 1 dB from the CSL. The optimization

procedure yields the following ensemble for the inner code:

L(x) = 0.1480 + 0.1111x + 0.4539x^3 + 0.0911x^4 + 0.0973x^6 + 0.0985x^7,
R(x) = x^28.

This code requires a maximum of 18 iterations to bring the BER below the outer-code

threshold, which gives an inner-code complexity score of 60.24.


Figure 3.6: NCG and η comparisons of the proposed concatenated design and other soft-decision FEC schemes, at 20% OH. Decoders using a flooding (resp., layered) decoding schedule are denoted with Fl (resp., La). For the proposed codes (denoted as "prop."), the inner decoding algorithm (MS or SP) is specified. Block length 30000 is considered for the designs with QC-structured inner codes. The following abbreviations are used in describing the referenced codes. BCH: Bose-Ray-Chaudhuri-Hocquenghem, UEP: Unequal Error Protection, RS: Reed-Solomon, CC: Convolutional Code, SpC: Spatially Coupled.

3.4.3 Comparison to Other Works

To compare our work with the existing designs, in Fig. 3.6, we have plotted the NCG,

obtained from (1.2), versus complexity, at 20% OH, for our designed codes, and also

for several other existing FEC solutions. Since the referenced code designs are based on

min-sum (MS) or offset-MS decoding, we also simulated the obtained inner codes using

the offset-MS algorithm with unconditional correction [89].

Compared to code designs decoded under a flooding schedule, the obtained MS-based

codes achieve, at similar complexities, a 0.77 dB gain over the code in [90], a 0.57 dB

gain over the code in [91], and a 0.42 dB gain over the code in [92]. The designed codes

achieve the NCGs of codes in [92] and [93] with more than a 56% reduction in complexity,

and the excellent NCG of the code in [94] with 46% reduction in complexity.

Compared to code designs where the inner code is decoded under a layered schedule,

the obtained MS-based codes achieve the NCGs of codes in [95] with more than 57%

reduction in complexity, and achieve the NCGs of codes in [92] with 15% to 41% reduction

in complexity.

While some designs in [92], decoded under a layered schedule, come close to the pro-


posed MS-based codes, the proposed SP-based codes, decoded under flooding schedule,

strictly dominate the existing designs. The SP-based codes achieve the NCGs of the

existing designs with at least 62% and 24% reduction in complexity compared to code

designs decoded under a flooding schedule and layered schedule, respectively. The SP-

based codes achieve at least 0.45 dB and 0.11 dB greater NCG over the existing designs,

at nearly the same η, compared to code designs decoded under flooding schedule and

layered schedule, respectively.

The latency of the proposed concatenated code can be obtained by adding the laten-

cies of the inner and the outer codes. The latency is dominated by the staircase decoder.

For example, at 200 Gb/s, for a staircase block containing 4.65 × 10^5 information bits and a staircase decoding window size W = 6, the decoding latency of the proposed concatenated code (including the inner code) is ≈ 2.8 × 10^6 bit periods, or 14 µs, which is

an acceptable latency in many OTN applications.
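(Roughly, the decoding window spans W = 6 staircase blocks, i.e., about 6 × 4.65 × 10^5 ≈ 2.8 × 10^6 buffered bits, and at 200 Gb/s this corresponds to 2.8 × 10^6 / (200 × 10^9) ≈ 14 µs.)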

3.4.4 Quasi-Cyclic-Structured Inner Codes

The inner codes considered so far have been randomly structured and have large block

lengths. Decoder architectures for such codes are often plagued with routing and message-

permutation complexities. In order to obtain a more pragmatic implementation of the

proposed FEC scheme, we adopt a quasi-cyclic (QC) structure for the inner codes. The

QC structure is well known to be hardware-friendly and to lead to energy-efficient im-

plementations; see [64] and references therein.

To construct a QC inner code given an ensemble, we first sample a base matrix in

keeping with the ensemble. Should the sampled base matrix not have a full-rank sub-

matrix in the designated parity positions we discard it and sample another one. Once a

valid base matrix is obtained, we lift it to obtain a QC parity-check matrix of large girth

for the inner code.

We constructed girth-8 inner-codes of length 30000±1%, based on the obtained inner-

code ensembles, for the concatenated code at 20% OH. As can be seen from Fig. 3.6,

the concatenated FEC with QC-structured inner-codes performs as well as with ran-

domly structured inner-codes, with only a small loss in performance when operating at

a high NCG. Note that, however, we do not make any claim of optimality for the code

constructions with QC-structured inner-codes, as the optimization procedure used as-

sumes a random structure for the inner code. See [96] for a scaling law predicting the

finite-length performance loss of LDPC codes.

The structure of the QC codes also allows for layered decoding of the constructed inner

codes.

Figure 3.7: NCG and η comparisons of the QC constructions of the designed concatenated FEC, at 20% OH, under layered (La) and flooding (Fl) schedules.

As can be seen from Fig. 3.6, the concatenated scheme with inner-code length

30000±1%, decoded under layering schedule, performs at up to 50% lower complexity

compared to the scheme with the inner code decoded under flooding schedule. Compared

to the existing code designs decoded under a layered schedule, the designed codes, with

QC inner-codes decoded under layering schedule, achieve a similar NCG with at least

40% reduction in complexity.

While a length 30000 LDPC code can be considered practical for OTN applications

[92], we have also constructed QC-structured inner-codes of shorter lengths (6000± 3%,

10000±2%, and 15000±1%) and possibly lower girths, based on the obtained inner-code

ensembles, at 20% OH. Note that, according to (3.4) and (3.5), using a short inner code

does not change the complexity score of the overall code; however, having a short inner

code leads to a more practical implementation, as it greatly reduces wiring and routing

complexities. A comparison between the concatenated FEC schemes with inner codes of

various lengths is provided in Fig. 3.7.

As can be seen from Fig. 3.7, when shorter inner codes are used, the loss in NCG is

not significant, although the loss becomes bigger, as the NCG increases or as the inner-

code length becomes shorter. Nevertheless, schemes with inner code of length 6000±3%,

decoded under a layered schedule, operate at up to 50% less complexity, compared to

schemes with an inner code of length 30000± 1%, decoded under a flooding schedule.


Figure 3.8: The (Es/N0, ηin) Pareto frontiers of the designed concatenated LDPC-zipper FEC, at 20% OH, compared with the LDPC-staircase design and the benchmark design of [1].

3.4.5 Concatenated LDPC-Zipper Structure

As mentioned in the beginning of this section, in all code designs the optimization pro-

cedure picked the highest-rate staircase code available to us. This suggests that using

an outer staircase code with higher rate is likely to yield concatenated code designs with

even lower complexity. However, it is not trivial to design and implement staircase codes

with a very high rate, because the staircase block size becomes very large as the code

rate increases.

Here, instead of a staircase code, we consider a zipper code as the outer code in

our design. Zipper codes are a framework proposed in [14] for describing spatially-

coupled product-like codes such as staircase codes and braided block codes. In particular,

from [14, Table 1], we pick the highest-rate code, namely the rate-0.98 zipper code with threshold 1.1 × 10^−3.

Fig. 3.8 shows the (Es/N0, ηin) Pareto frontier of the inner codes designed for concate-

nation with the outer zipper code described above, and a 20% overall OH. The Pareto

frontier is also compared with that of designs with an outer staircase code and also the

benchmark Pareto frontier of [1]. The obtained inner codes achieve the performance of

the codes in [1] and codes with an outer staircase code with up to 71% and 54% reduction

in complexity, respectively. Also, at a similar complexity, the obtained codes can operate

at up to 0.41 dB and 0.32 dB closer to the CSL compared to the codes in [1] and codes

with an outer staircase code, respectively.


Figure 3.9: NCG and η comparisons of the proposed concatenated design and other soft-decision FEC schemes, at 20% OH. The concatenated design with an outer zipper code, the NCG-η Pareto frontier of which is the top left curve, outperforms other designs by a wide margin.

In Fig. 3.9, we plot the NCG-η Pareto frontier of the obtained concatenated designs

with the rate-0.98 outer zipper code and compare it to the existing designs at 20% OH.

As can be seen from Fig. 3.9, the designed codes outperform the previously existing

designs by a wide margin. In particular, the designed codes can achieve the excellent

performance of [10] with 74% reduction in complexity.

3.5 Conclusion

In this chapter we have proposed a concatenated code design that improves significantly

upon the results of [1]. The complexity-optimized error-reducing inner code, concatenated

with an outer staircase code, forms a low-complexity FEC scheme suitable for high bit-

rate optical communication. An interesting feature that emerges from the inner-code

optimization is that a fraction of symbols are better left uncoded, and only protected by

the outer code. We showed that, compared to [1], with this modified design, the inner-

code complexity can be reduced by up to 71%. We showed that the concatenated code

designs have lower complexity than, to the best of our knowledge, any other existing SD

FEC scheme.

To realize a pragmatic and energy-efficient implementation for the proposed FEC

scheme, we constructed QC inner codes, based on the obtained ensembles. We showed

that QC-structured inner codes with practical lengths can achieve the performance of


the randomly constructed inner codes. We simulated layered decoding of the QC inner

codes and showed that with layered decoding the complexity score of the FEC scheme

can be reduced by up to 50%.

Chapter 4

Low-Complexity Concatenated FEC for Higher-Order Modulation

4.1 Introduction

In this chapter, we consider the design of low-complexity FEC particularly in combination

with higher-order modulation. In designing such an FEC scheme, it is often unclear

whether it is better to design a coded modulation scheme via an MLC approach, or a

BICM approach. As described in Section 2.5, the MLC scheme, on the one hand, is

optimal from an information-theoretic point of view but has often been avoided because

of the potentially high implementation complexities, and the BICM scheme, on the other

hand, is usually considered to be a pragmatic approach to coded modulation.

Here we consider coded modulation design instances for various modulation orders

that are of practical relevance to optical communication. We design inner MLC and

BICM concatenated with an outer hard-decision code, for application in optical transport

networks and we compare them from a performance-complexity standpoint. We consider

signalling with rectangular quadrature amplitude modulation with 16, 64, and 256 points

(16-QAM, 64-QAM, and 256-QAM, respectively) and design concatenated codes of 28%

and 25% OHs. We use similar code-design approaches as in Chapter 3 and in [99], to

obtain complexity-optimized MLC schemes and complexity-optimized BICM schemes,

so that we may make—via their respective Pareto frontiers—a fair comparison between

them.

This chapter includes and expands on the work in [97] and [98].

Simulation results of practical code designs, reported in Section 4.5, show that, for

all considered modulation orders, MLC provides significant advantages relative to BICM

over the entire performance-complexity tradeoff space. For example, at 28% overall OH

the 64-QAM MLC design can operate with 60% less complexity, or provide up to 0.4 dB

coding gain, when compared with the BICM design. It also compares favorably with the

MLC scheme reported in [2]. Similar advantages are provided by MLC at 25% OH for a

range of modulation formats: the MLC design provides an NCG of up to 12.8 dB with

16-QAM (1.0 dB from the CSL), an NCG of up to 13.6 dB with 64-QAM (1.2 dB from

the CSL), and an NCG of up to 14 dB with 256-QAM (1.65 dB from the CSL), all with

reasonable decoding complexity.

The rest of this chapter is organized as follows. In Sec. 4.2 we describe the concate-

nated coded modulation structures and the setup for a fair comparison between the MLC

and BICM schemes. In Sec. 4.3 and 4.4 we describe the MLC and BICM schemes and

their inner code parameterization and design, respectively. In Sec. 4.5 we present simu-

lation results for the MLC and BICM schemes that we have designed, characterize their

trade-offs, and compare them to the existing designs. In Sec. 4.6 we provide concluding

remarks.

4.2 Concatenated Code Description

We adopt a similar concatenated FEC structure to that in Chapter 3. We consider an

inner SD LDPC code concatenated with a high-rate outer HD code. The outer code is

concatenated with the inner code through an interleaver, π (see the encoder and decoder

of Fig. 4.1 and Fig. 4.2). The purpose of the interleaver is to reduce correlation among

bit errors passed to the outer code.

The task of the inner code in the concatenated code design is to reduce the BER of

the bits transferred to the outer code to below its threshold, which enables the outer code

to take the BER further down, below 10^−15, as required by optical transport networks.

We construct concatenated codes of 28% and 25% OHs. For the 28% OH design, we

use the staircase code of rate Rout = 239/255, proposed in [13], as the outer code. This

outer code has a nominal threshold of 4.8 × 10^−3; however, for the inner code we set a lower BER target of Pout^t = 3 × 10^−3, as this provides a practical margin that will also

enable a reduced interleaver size between inner and outer codes.

For the 25% OH design, we use a zipper code as the outer code [14]. We use the

diagonal zipper code of rate Rout = 0.96, proposed in [14], as the outer code. This outer

code has a nominal threshold of 2.32 × 10^−3; however, for similar practical reasons, we set a lower BER target of Pout^t = 2 × 10^−3 for the inner code.


Figure 4.1: The encoder (a) and the decoder (b) in the MLC scheme. Here, m = log2 M denotes the number of bits per PAM symbol.

We study this concatenated FEC scheme in conjunction with modulation schemes

of various orders. In particular, we consider rectangular M^2-QAM with M ∈ {4, 8, 16}. Each of these modulation schemes can be thought of as the Cartesian product of separate

pulse amplitude modulations (PAMs) of M points, one in-phase and one in-quadrature.

It follows that the number of QAM symbols per frame is half that of the number of PAM

symbols. Throughout this chapter we let m = log2M denote the number of bits mapped

to each PAM symbol.

We aim at establishing a trade-off, in a Pareto sense, between performance and com-

plexity, for the considered coded modulation schemes. We use the performance and com-

plexity measure described in Sec. 1.2.2 in code design. As shown in Sec. 3.2.3, the overall

decoder complexity is dominated by that of the inner code and therefore is obtained as

η = ηin/Rout, where ηin is the inner-code complexity.

We have devised a coded modulation architecture to ensure a fair comparison between

the MLC and BICM schemes. Unlike most conventional MLC schemes, only the LSB

is inner-coded here; thus both schemes employ just a single binary code. Moreover, to

achieve the same latency, we choose the inner-code block length of the MLC scheme to

be shorter, by a factor of m, than that of the BICM scheme.

For the purposes of code design, we model the optical channel as an AWGN channel.

4.3 MLC Scheme

4.3.1 Coded-Modulation Description

We label the in-phase and the quadrature M-PAMs forming the M^2-QAM constellations

as follows. The least significant (right-most) bit (LSB) alternates between adjacent sym-

bols.

Figure 4.2: The encoder (a) and the decoder (b) in the BICM scheme. Here, m = log2 M denotes the number of bits per PAM symbol.

For symbols with LSB = 0 we use a binary reflected Gray code (BRGC) [100] to

label the most significant bits (MSBs). We use the same MSB labelling for symbols with

LSB = 1. For example, we use the following labelling for the in-phase 8-PAM of the 64-

QAM constellation: ΩMLC-8 = (000, 001, 010, 011, 110, 111, 100, 101). With this labelling,

we construct a channel with minimal reliability for the LSB, thereby maximizing the

reliability of the MSBs. Moreover, the BERs on the MSBs, demapped given the LSB, are most similar under this Gray labelling.
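To make the labelling concrete, the short Python sketch below (ours, for illustration) constructs ΩMLC-M from a BRGC on the MSBs; for M = 8 it reproduces the labelling given above, and brgc(3) itself is the BICM labelling ΩBICM-8 used in Sec. 4.4.1.

import math

def brgc(n):
    """Binary reflected Gray code on n bits, as a list of bit strings."""
    if n == 0:
        return ['']
    prev = brgc(n - 1)
    return ['0' + w for w in prev] + ['1' + w for w in reversed(prev)]

def omega_mlc(M):
    """MLC labelling of an M-PAM: the LSB alternates between adjacent symbols,
    and the MSBs follow a BRGC shared by the LSB = 0 and LSB = 1 symbols."""
    m = int(math.log2(M))
    msb = brgc(m - 1)
    return [msb[k // 2] + str(k % 2) for k in range(M)]

# omega_mlc(8) == ['000', '001', '010', '011', '110', '111', '100', '101']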

We consider MLC schemes in which only the LSB is encoded by the inner code, and

the MSBs are protected only by the outer code (see Fig. 4.1(a)). At the receiver, a

hard-decision on the MSBs, taking into account the hard-decision on the inner-decoder

output bits and the channel information, is passed through the de-interleaver to the outer

decoder (see Fig. 4.1(b)). We assume that the inner decoder passes only a hard-decision

on its bits to the outer decoder and the MSB demapper.

In Fig. 4.1 we also indicate the number of bits passed to (from) the constellation

(de)mapper, per PAM symbol. Note that while all inner-decoded bits are used for demap-

ping the MSBs, inner-decoded information bits are passed to the outer decoder as well.

With the chosen labelling, the LSB channel is output-symmetric (see [15, Defn. 4.8]).

It is known that for a binary-input, memoryless, and output-symmetric channel, the log-

likelihood ratio densities coming from variable nodes of an LDPC code remain symmetric

during decoding [47, Thm. 3]. Therefore, the EXIT function analysis of [60], which we

modified for code design in Chapter 3, remains valid. This symmetry also simplifies inner-

code design and simulation, as the all-zero codeword can be assumed to be transmitted

on the LSB channel without loss of generality.


4.3.2 Inner-Code Description

We adopt the design procedure of Chapter 3 to obtain complexity-optimized inner codes.

The inner code ensemble, members of which have a Tanner graph as shown in Fig. 3.1,

is described by its VN and CN degree distributions.

We consider a CN degree distribution that is concentrated on two consecutive degrees,

dc and dc + 1, with dc denoting the average CN degree. We let Rj(x) = ∑_{d=dcj}^{dcj+1} Rd,j x^d denote the node-perspective check-degree distribution.

We use the same parameters as in Sec. 3.2.2 to describe the VN degree distribution

of the systematic inner-code ensemble where we designate a particular subset of the

VNs to be the information set and the remaining VNs form the parity set. The inner-

code rate is denoted by Rin,MLC. We therefore have ∑_{i=0}^{Dv} Ui = Rin,MLC, and also Li = dc(1 − Rin,MLC) λi/i, for i ∈ {1, . . . , Dv}.

We let Λ = (λ1, λ2, . . . , λDv) and let U = (U0, U1, . . . , UDv). We refer to the pair

(Λ,U) as the design parameters. The design parameters will be used in the inner-code

optimization program.

Let E denote the number of edges in the ensemble that are not connected to a degree-

one VN. Also, let I denote the maximum number of decoding iterations allowed in the

inner decoder. The complexity score of the inner code in the MLC scheme, ηin,MLC, is

then computed as

ηin,MLC = EI / ( N(m − 1 + Rin,MLC) ),    (4.1)

and measures the number of messages that are passed at the inner decoder per informa-

tion bit transferred to the outer code. Note that in (4.1) we have accounted for the fact

that there are m − 1 uncoded bit-levels per in-phase and in-quadrature M-PAMs that

incur zero inner decoding complexity. It is easy to see that E/N = (1−Rin,MLC)(dc− ν),

where ν is the average number of degree-one VNs connected to each CN. Therefore,

ηin,MLC can be obtained as

ηin,MLC = (1 − Rin,MLC)(dc − ν) I / (m − 1 + Rin,MLC).    (4.2)

4.3.3 Ensemble Optimization

EXIT Chart Analysis

Since in our architecture the messages passed at the decoder are symmetric, the EXIT

function tracking the decoding procedure is uni-parametric. We use a similar uni-

parametric EXIT function, f(p), as in Sec. 3.3.1, with argument p, the error probability


on messages coming from the VNs, in analyzing the inner-code ensemble.

Similar to Sec. 3.3.2, given the fixed SNR, dc, and ν, we can pre-compute and store

the elementary EXIT functions fi(p), for i ∈ {2, 3, . . . , Dv}, and f1(p) and use them in

the BER analysis and the ensemble optimization. Given the elementary EXIT functions,

the number of iterations, I, required by the inner code to take the VN message error

probability from p0, the channel BER, down to pt, a target message error probability,

can be approximated as in (3.11).

BER Analysis

We let Pinfo and Pparity denote the information-set BER, and the parity-set BER, respec-

tively, after the target message error probability is achieved. In terms of the ensemble

parameters, Pinfo and Pparity can be obtained as

Pinfo = (1/Rin,MLC) ( U0 p0 + U1 f1(pt) + ∑_{i=2}^{Dv} Ui fi+1(pt) ),

Pparity = (1/(1 − Rin,MLC)) ( (L1 − U1) f1(pt) + ∑_{i=2}^{Dv} (Li − Ui) fi+1(pt) ).

As shown in Fig. 4.1(b), the demapped MSBs along with the information bits of the

inner code are passed to the outer decoder. Let Pout denote the BER on bits passed to

the outer decoder. From the law of total probability, Pout can be obtained as

Pout = [ (m − 1)/(m − 1 + Rin,MLC) ] P^MSB + [ Rin,MLC/(m − 1 + Rin,MLC) ] Pinfo,    (4.3)

where P^MSB is the average BER in demapping the MSBs. Note that a hard decision on all LSBs (both information bits and parity bits of the inner code) is used to demap the MSBs. Therefore, P^MSB can be obtained as

P^MSB = Rin,MLC P_info^MSB + (1 − Rin,MLC) P_parity^MSB,    (4.4)

where P_info^MSB and P_parity^MSB denote the average BER in demapping the MSBs using information bits and parity bits of the inner code, respectively. Now let P_c^MSB be the average BER in demapping the MSBs when the LSB decision is correct, and let P_e^MSB be the average BER in demapping the MSBs when the LSB decision is in error. The values of P_c^MSB and P_e^MSB only depend on the SNR and the constellation labelling and can be obtained empirically by Monte-Carlo simulation. We can then obtain P_info^MSB and P_parity^MSB as

P_info^MSB = (1 − Pinfo) P_c^MSB + Pinfo P_e^MSB,
P_parity^MSB = (1 − Pparity) P_c^MSB + Pparity P_e^MSB.    (4.5)

From (4.3)–(4.5), Pout can be obtained by the affine relation

Pout = aPinfo + bPparity + c, (4.6)

where

a = [ (m − 1) Rin,MLC / (m − 1 + Rin,MLC) ] ( 1/(m − 1) + P_e^MSB − P_c^MSB ),
b = [ (m − 1)(1 − Rin,MLC) / (m − 1 + Rin,MLC) ] ( P_e^MSB − P_c^MSB ),
c = [ (m − 1) / (m − 1 + Rin,MLC) ] P_c^MSB

are independent of the inner-code design parameters.
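A minimal sketch (ours) of this affine map, convenient for checking constraint (4.14) for a candidate design; the numerical values of P_c^MSB and P_e^MSB below are purely illustrative placeholders for quantities that would be estimated by Monte-Carlo demapping at the design SNR.

def pout_coefficients(m, r_in_mlc, p_msb_c, p_msb_e):
    """Coefficients (a, b, c) of eq. (4.6) for an M-PAM with m = log2(M) bits per symbol."""
    denom = m - 1 + r_in_mlc
    a = (m - 1) * r_in_mlc / denom * (1.0 / (m - 1) + p_msb_e - p_msb_c)
    b = (m - 1) * (1 - r_in_mlc) / denom * (p_msb_e - p_msb_c)
    c = (m - 1) / denom * p_msb_c
    return a, b, c

# Illustrative check of constraint (4.14): P_out = a*P_info + b*P_parity + c <= P_out^t.
a, b, c = pout_coefficients(m=3, r_in_mlc=0.5, p_msb_c=1e-4, p_msb_e=0.3)
p_out = a * 2e-3 + b * 5e-3 + c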

Optimization Routine

The complexity-optimized inner-code ensemble is obtained by searching over a discrete

set of values for dc, ν, and pt, and, for each choice, solving the following optimization

problem:

minimize over (Λ, U):

[ (1 − Rin,MLC)(dc − ν) / (m − 1 + Rin,MLC) ] ∫_{pt}^{p0} dp / ( p log( p/f(p) ) ),    (4.7)

subject to

∑_{i=1}^{Dv} λi/i ≥ (1 − L0) / ( dc(1 − Rin,MLC) ),    (4.8)
∑_{i=1}^{Dv} λi = 1,   λ1 dc = ν,    (4.9)
0 ≤ λi   ∀ i ∈ {1, . . . , Dv},    (4.10)
∑_{i=0}^{Dv} Ui = Rin,MLC,   U0 = L0,    (4.11)
0 ≤ Ui ≤ Li   ∀ i ∈ {1, . . . , Dv},    (4.12)
f(p) < p   ∀ p ∈ [pt, p0],    (4.13)
a Pinfo + b Pparity + c ≤ Pout^t.    (4.14)


In this optimization problem formulation, constraints (4.8)–(4.13), similar to constraints

(3.14)–(3.19), are the validity constraints. Constraint (4.14) then ensures that the BER

on bits passed to the staircase decoder is at or below the set target. Unsurprisingly, it

turns out that in all the MLC designs reported in this chapter, the highest degree VNs

are always chosen by the optimization routine as information nodes.

As described in Sec. 3.3.2, in terms of the optimization parameters, every constraint

in the optimization program is linear (see (4.6) for how (4.14) is related to the design

parameters). Therefore, one can solve this optimization program using the tools and

techniques described in Sec. 3.3.3.

4.4 BICM Scheme

4.4.1 Coded-Modulation Description

We label the M-PAM constellation of an M^2-QAM constellation using a BRGC on m

bits. For example, we use the following labelling for the in-phase 8-PAM of the 64-QAM

constellation: ΩBICM-8 = (000, 001, 011, 010, 110, 111, 101, 100).

Unlike the MLC case, all m bit-levels are encoded jointly by the inner code in our

BICM scheme as shown in Fig. 4.2(a). At the receiver (see Fig. 4.2(b)), we use bitwise

demapping and perform SD bit-metric decoding of the inner LDPC code on all m bit-

levels. The inner decoder output is then passed to the outer decoder, which performs

HD decoding.

A bitwise ΩBICM demapper yields m BICM bit-levels of differing reliabilities. Fol-

lowing the approach of [99], we explicitly incorporate the different bit reliabilities into

the code design and consider a multi-edge type (MET) ensemble [101] as displayed in

Fig. 4.3. The m bit-levels are mapped to the m VN types.

The inner code rate is related to that of the MLC via

m − 1 + Rin,MLC = m Rin,BICM.

For instance, for Rin,MLC = 1/2 and m = 3, the corresponding BICM rate is Rin,BICM =

5/6 for the same spectral efficiency.

Unlike that of the MLC scheme, the inner-code design procedure for the BICM

scheme, described next, does not rely upon the channel being output-symmetric. In-

deed, this approach can also be adapted to the design of MLC schemes. While we have

not pursued this direction in this chapter, we will consider a similar code design approach

in the next chapter.


Figure 4.3: Inner-code ensemble considered for the BICM scheme, with VN types VN1, VN2, . . . , VNm of average degrees dv1, dv2, . . . , dvm connected through edge permutations Π to the CNs.

4.4.2 Inner-Code Description

Consider the MET ensemble in Fig. 4.3. Unlike conventional protographs [102], where

each VN-type represents one specific VN degree, we associate with a type-j VN, where

j ∈ {1, . . . , m}, a VN-perspective degree distribution Lj(x) = ∑_{i=0}^{Dv} Li,j x^i, where Dv is the maximum VN degree. The average type-j VN degree therefore is dvj = ∑_{i=0}^{Dv} i Li,j. We define the edge-perspective degree distribution of the type-j VN as λj(x) = ∑_{i=1}^{Dv} λi,j x^{i−1}.

At any CN, let the type-$j$ degree denote the number of its edges that come from type-$j$ VNs. We consider a MET ensemble in which the type-$j$ CN degree distribution is concentrated on two consecutive degrees, $d_{c_j}$ and $d_{c_j} + 1$. We denote by $\Gamma_j(x)$ the type-$j$ CN degree distribution. The type-$j$ average CN degree, $d_{c_j}$, is then obtained as

    d_{c_j} = \frac{d_{v_j}}{m (1 - R_{\mathrm{in,BICM}})}.

Furthermore, similar to the MLC scheme, we consider an overall CN distribution that is concentrated on two consecutive degrees, $d_c$ and $d_c + 1$, with $d_c$ denoting the average CN degree. We therefore have $d_c = \sum_{j=1}^{m} d_{c_j}$.

The average number of degree-one VNs connected to each CN in the MET ensemble is

    \nu = \frac{\sum_{j=1}^{m} L_{1,j}}{m (1 - R_{\mathrm{in,BICM}})}.

Similar to (4.2), the complexity score of the inner code in the BICM scheme, $\eta_{\mathrm{in,BICM}}$, is obtained by

    \eta_{\mathrm{in,BICM}} = \frac{(1 - R_{\mathrm{in,BICM}})(d_c - \nu) I}{R_{\mathrm{in,BICM}}}.    (4.15)

By assigning a degree distribution to each of the $m$ VN-types, we obtain an ensemble in which the VN-types see different reliabilities according to their assigned bit-levels. Moreover, after decoding, VNs of different degrees attain different resulting reliabilities.


Table 4.1: An example of degree distributions of various types, for m = 3.

  j | L_j(x)                | d_{v_j} | \Gamma_j(x)             | d_{c_j}
  1 | (2/3)x + (1/3)x^2     | 4/3     | (2/9)x + (7/9)x^2       | 16/9
  2 | x^4                   | 4       | (2/3)x^5 + (1/3)x^6     | 16/3
  3 | x^5                   | 5       | (1/3)x^6 + (2/3)x^7     | 20/3

Let $N_j$ denote the number of non-zero coefficients of the type-$j$ VN degree distribution $L_j(x)$. Overall, after decoding, we then have $N = \sum_{j=1}^{m} N_j$ classes of VNs with distinct reliabilities.

4.4.3 Ensemble Sampling

Note that for the MET ensemble we require not only a concentrated type-$j$ CN degree distribution for all $j \in \{1, \dots, m\}$, but also a concentrated overall CN degree distribution. We use an algorithm we call degree partition and sort (DPS) to obtain such a CN configuration. Before we describe DPS we need the following subroutines that operate on integer matrices. Here, by “row-weight” we mean the sum of the elements in a given row of the matrix.

• Sort: The function sort$_{\le}$(P, l) operates on a matrix P of r rows and a column vector l of r corresponding integers. It returns the pair in which the rows of P have been sorted, top-down, by row-weight in ascending order, together with l permuted accordingly so that the original row–multiplicity correspondence is preserved. The function sort$_{\ge}$(P, l) is defined similarly, but sorts the rows by row-weight in descending order.

• Expand: The function expand(P, l) operates on an s × t matrix P and an s × 1 vector l of positive integers summing to $s_l$. It returns an $s_l$ × t matrix in which the k-th row of P is repeated with multiplicity l(k), where l(k) denotes the k-th element of l.

• Collapse: The function coll(P) operates on an $s_l$ × t matrix P. It returns an s × t matrix and an s × 1 multiplicity vector l of positive integers such that expanding them reproduces P, with s as small as possible.

Given the degree distributions of all types, DPS works as follows. Let $C_\Gamma$ be the smallest positive integer such that, for $j \in \{1, \dots, m\}$, $C_\Gamma \Gamma_j(x)$ is a polynomial with integer coefficients. Let $P_j$ be a column vector containing the CN degrees of type $j$, and let $l_j$ be the multiplicity vector containing the corresponding coefficients of $C_\Gamma \Gamma_j(x)$. Initially, we let $P = P_1$ and $l = l_1$. For $j = 2, \dots, m$, we then update $P$ and $l$ iteratively as

    (P, l) = \mathrm{coll}\big(\, \mathrm{expand}(\mathrm{sort}_{\ge}(P, l)) \;\big|\; \mathrm{expand}(\mathrm{sort}_{\le}(P_j, l_j)) \,\big),

where ‘|’ denotes side-by-side (column-wise) concatenation of the two matrices. Note that a $\Gamma_j(x)$ with irrational coefficients can be approximated arbitrarily closely by a polynomial with rational coefficients. In practice, therefore, DPS works for any set of degree distributions.

It is possible to show that the resulting matrix $P$ and vector $l$ then describe the CN side of the MET ensemble. The rows of $P$ represent the CNs in the ensemble and the columns represent the VN types to which they connect. Matrix $P$ will have at most $m + 1$ rows, which denote the various CN kinds in the ensemble. The $(i, j)$-th element of $P$ corresponds to the multiplicity of type-$j$ edges at the $i$-th CN kind. Vector $l$ gives the multiplicity of each CN kind in the ensemble.

As an example, we apply DPS to an ensemble for which the VN-type degree distributions are given in Table 4.1. Here, we have $C_\Gamma = 9$. We initialize the DPS algorithm with

    P = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad l = \begin{bmatrix} 2 \\ 7 \end{bmatrix},

corresponding to the first VN type. After one round of DPS, we get

    P = \begin{bmatrix} 2 & 5 \\ 2 & 6 \\ 1 & 6 \end{bmatrix}, \quad l = \begin{bmatrix} 6 \\ 1 \\ 2 \end{bmatrix},

and after the last round of DPS, we get

    P = \begin{bmatrix} 2 & 6 & 6 \\ 2 & 5 & 6 \\ 2 & 5 & 7 \\ 1 & 6 & 7 \end{bmatrix}, \quad l = \begin{bmatrix} 1 \\ 2 \\ 4 \\ 2 \end{bmatrix}.

In the resulting matrix P, 1) the elements of the j-th column, representing the type-

j CN degrees, are concentrated on integers dcj and dcj + 1, and 2) the row-weights,

representing the overall CN degrees, are concentrated on integers dc and dc + 1. These

properties hold true in general.
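A compact way to see the whole DPS procedure at once is the following Python sketch, which implements the sort/expand/collapse subroutines on lists of rows. With Python's stable sort it reproduces the worked example above; the algorithm itself does not prescribe how ties among equal-weight rows are broken, and a different tie-break would simply yield an equivalent partition with the same concentration properties.

from collections import Counter

def sort_rows(P, l, descending):
    # sort_>= / sort_<=: order rows (and their multiplicities) by row-weight.
    pairs = sorted(zip(P, l), key=lambda t: sum(t[0]), reverse=descending)
    return [list(r) for r, _ in pairs], [m for _, m in pairs]

def expand(P, l):
    # Repeat the k-th row of P with multiplicity l(k).
    return [list(r) for r, mult in zip(P, l) for _ in range(mult)]

def collapse(rows):
    # Merge identical rows into unique rows with multiplicities.
    counts = Counter(tuple(r) for r in rows)
    return [list(r) for r in counts], list(counts.values())

def dps(degrees, mults):
    # degrees[j], mults[j]: type-(j+1) CN degrees and the integer coefficients
    # of C_Gamma * Gamma_j(x) (here C_Gamma = 9 for the Table 4.1 ensemble).
    P, l = [[d] for d in degrees[0]], list(mults[0])
    for Pj, lj in zip(degrees[1:], mults[1:]):
        left = expand(*sort_rows(P, l, descending=True))
        right = expand(*sort_rows([[d] for d in Pj], list(lj), descending=False))
        P, l = collapse([a + b for a, b in zip(left, right)])   # '|' concatenation
    return P, l

# Table 4.1 example, types j = 1, 2, 3:
P, l = dps([[1, 2], [5, 6], [6, 7]], [[2, 7], [6, 3], [3, 6]])
# -> P = [[2,6,6], [2,5,6], [2,5,7], [1,6,7]], l = [1, 2, 4, 2]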


Let $P$ and $l$ be the outputs of DPS and let $s_l = \sum_{k=1}^{m+1} l(k)$. Let $n_v$ and $n_c$ be positive integers such that

1. $n_v$ is divisible by $m$,

2. for $j \in \{1, \dots, m\}$, $\frac{n_v}{m} L_j(x)$ is a polynomial with integer coefficients,

3. $n_c$ is divisible by $s_l$, and

4. $\frac{n_c}{n_v} = 1 - R_{\mathrm{in,BICM}}$.

We sample from the MET ensemble by creating a bipartite graph with $\frac{n_v}{m} L_{i,j}$ degree-$i$ VNs of type $j$, for $i \in \{1, \dots, D_v\}$ and $j \in \{1, \dots, m\}$, and $\frac{n_c}{s_l} l(k)$ CNs of the $k$-th kind, for $k \in \{1, \dots, m+1\}$. We then randomly place the edges in the graph according to the multiplicity of edge-types at each CN. We do not allow parallel edges in the graph.
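To make conditions 1–4 concrete, the sketch below computes the node counts for a graph drawn from the Table 4.1 ensemble. The rate $R_{\mathrm{in,BICM}} = 3/4$ is inferred from the table via $d_{c_j} = d_{v_j}/(m(1 - R_{\mathrm{in,BICM}}))$, and $n_v = 36$, $n_c = 9$ are the smallest admissible choices for this example (any multiples also work); these specific numbers are illustrative assumptions, not values taken from the thesis.

from fractions import Fraction as F

m, R = 3, F(3, 4)                          # Table 4.1 ensemble (implied rate 3/4)
L = [{1: F(2, 3), 2: F(1, 3)},             # L_1(x)
     {4: F(1)},                            # L_2(x)
     {5: F(1)}]                            # L_3(x)
P = [[2, 6, 6], [2, 5, 6], [2, 5, 7], [1, 6, 7]]   # DPS output for this ensemble
l = [1, 2, 4, 2]
s_l = sum(l)                               # = 9

n_v, n_c = 36, 9                           # smallest values meeting conditions 1-4
assert n_v % m == 0 and n_c % s_l == 0 and F(n_c, n_v) == 1 - R

# (n_v/m) * L_{i,j} degree-i VNs of type j; (n_c/s_l) * l(k) CNs of kind k.
vn_counts = [{i: int(F(n_v, m) * Lij) for i, Lij in Lj.items()} for Lj in L]
cn_counts = [(n_c // s_l) * lk for lk in l]

# Sanity check: VN-side and CN-side socket counts agree.
vn_edges = sum(i * c for counts in vn_counts for i, c in counts.items())
cn_edges = sum(sum(row) * c for row, c in zip(P, cn_counts))
assert vn_edges == cn_edges == 124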

Note that, similar to the MLC-scheme inner code, in which the CN degrees concentrate on two consecutive values, the BICM inner-code CN degrees also concentrate locally on two consecutive values for each VN-type. Thus, the BICM inner-code ensemble can be interpreted as the BICM counterpart of the MLC inner-code ensemble, with multiple VN-types.

4.4.4 Ensemble Optimization

EXIT Function Analysis

The iterative decoding threshold of conventional protograph ensembles can be efficiently

computed using the protograph-based EXIT analysis [103]. Here, we carefully consider

the BICM inner code MET ensemble and the irregular VN and CN degree distributions

of each type and provide an analysis for it based on EXIT functions.

Let the function $\Upsilon(\sigma)$ be defined as

    \Upsilon(\sigma) = 1 - \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(z - \sigma^2/2)^2}{2\sigma^2}} \log_2\!\left(1 + e^{-z}\right) dz.    (4.16)

Similar to [99, Sec. IV-B2], we let $\mathrm{SNR}_j$ denote the equivalent binary-input AWGN surrogate channel-SNR for the $j$-th bit channel. The corresponding channel log-likelihood ratio has a distribution with variance $\sigma_j^2 = 4\,\mathrm{SNR}_j$. The message from a type-$j$ VN in the $\ell$-th iteration, for $j \in \{1, \dots, m\}$, is

    I^{\ell}_{v_j \to c} = \sum_{i=1}^{D_v} \lambda_{i,j}\, I^{\ell}_{v_j \to c}(i),    (4.17)

where $I^{\ell}_{v_j \to c}(i)$ is given by

    I^{\ell}_{v_j \to c}(i) = \Upsilon\!\left(\sqrt{(i-1)\,\Upsilon^{-1}\!\left(I^{\ell-1}_{c \to v_j}\right)^2 + \sigma_j^2}\right).    (4.18)

Initially, we let $I^{0}_{v_j \to c} = \Upsilon(\sigma_j)$. We use the approximation of [104, Eqs. (9),(10)] in computing $\Upsilon(\sigma)$.

Note that (4.17) and (4.18) are identical to the equations for irregular LDPC code

ensembles given in [105, Chap. 3, Eq. (17)] and [105, Chap. 3, Eq. (19)], respectively.
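The recursion (4.17)–(4.18) only requires $\Upsilon(\cdot)$ and its inverse. A self-contained way to obtain them, sketched below, is to evaluate the integral in (4.16) by numerical quadrature and to invert it by bisection; in practice the closed-form approximation of [104] would be used instead for speed. The 40σ integration window and the saturation of the inverse are implementation choices of this sketch, not part of the original analysis.

import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def J(sigma):
    # Upsilon(sigma) of (4.16), evaluated by numerical quadrature.
    if sigma < 1e-6:
        return 0.0
    mean = sigma**2 / 2.0
    def integrand(z):
        gauss = np.exp(-(z - mean)**2 / (2.0 * sigma**2)) / np.sqrt(2.0 * np.pi * sigma**2)
        return gauss * np.logaddexp(0.0, -z) / np.log(2.0)   # log2(1 + e^(-z)), overflow-safe
    val, _ = quad(integrand, mean - 40.0 * sigma, mean + 40.0 * sigma, limit=200)
    return 1.0 - val

def J_inv(I):
    # Inverse of Upsilon by bisection on a bracketing interval.
    if I <= 0.0:
        return 0.0
    if I >= 1.0:
        return 60.0                      # saturate; J(60) is numerically 1
    return brentq(lambda s: J(s) - I, 1e-6, 60.0)

def vn_message(i, I_c_to_v, sigma_j):
    # Type-j, degree-i VN message of (4.18).
    return J(np.sqrt((i - 1) * J_inv(I_c_to_v)**2 + sigma_j**2))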

The message from a CN to a type-$j$ VN in the $\ell$-th iteration is

    I^{\ell}_{c \to v_j} = \sum_{i=1}^{m+1} \rho_{i,j}\, I^{\ell}_{c \to v_j}(i),    (4.19)

where $I^{\ell}_{c \to v_j}(i)$ is given by

    I^{\ell}_{c \to v_j}(i) = 1 - \Upsilon\!\left(\left( \sum_{\substack{j'=1 \\ j' \ne j}}^{m} P_{ij'}\,\Upsilon^{-1}\!\left(1 - I^{\ell}_{v_{j'} \to c}\right)^2 + (P_{ij} - 1)\,\Upsilon^{-1}\!\left(1 - I^{\ell}_{v_j \to c}\right)^2 \right)^{\frac{1}{2}}\right).    (4.20)

Here, $P_{ij}$ is the $(i,j)$-th element of the matrix $P$, and $\rho_{i,j}$ denotes the portion of type-$j$ edges connected to the $i$-th CN kind.

Note that (4.19) is identical to the equation for irregular LDPC ensemble analysis given in [105, Chap. 3, Eq. (16)], and (4.20) is identical to the equation for PEXIT analysis given in [99, Eq. (17)], [103, Sec. III.C].

BER Analysis

The a posteriori probability (APP) mutual information at a type-$j$ VN after $\ell$ decoding iterations, $I^{\mathrm{APP},\ell}_j$, can be computed as

    I^{\mathrm{APP},\ell}_j = \sum_{i=1}^{D_v} L_{i,j}\, I^{\mathrm{APP},\ell}_j(i),

where

    I^{\mathrm{APP},\ell}_j(i) = \Upsilon\!\left(\sqrt{i\,\Upsilon^{-1}\!\left(I^{\ell-1}_{c \to v_j}\right)^2 + \sigma_j^2}\right).    (4.21)

Note that the right-hand side of (4.21) is almost identical to the right-hand side of (4.18), except that we have a factor of $i$ instead of $(i-1)$, since (4.18) computes extrinsic information.

Using (4.21), the BER at degree-$i$, type-$j$ VNs after $\ell$ decoding iterations, $\varepsilon^{\ell}_{i,j}$, therefore is

    \varepsilon^{\ell}_{i,j} = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\sigma_{i,j}}{2\sqrt{2}}\right),

where $\sigma_{i,j} = \Upsilon^{-1}\!\left(I^{\mathrm{APP},\ell}_j(i)\right)$ and $\mathrm{erfc}(x)$ is the standard complementary error function. The contribution of degree-$i$, type-$j$ VNs to the overall BER therefore is $\frac{1}{m} L_{i,j}\,\varepsilon^{\ell}_{i,j}$.

Note, however, that unlike in the MLC scheme, errors on the inner parity bits have no impact on the BER of the bits passed to the outer decoder. We therefore assign the VNs of highest reliability as the information bits. In particular, we let $\varepsilon = (\varepsilon_1, \dots, \varepsilon_N)$ be a vector whose elements are the $\varepsilon^{\ell}_{i,j}$ values, sorted in ascending order, and we let $\alpha = (\alpha_1, \dots, \alpha_N)$ be their corresponding contribution factors (the $\frac{1}{m} L_{i,j}$ values). Here, $N$ denotes the number of different reliabilities obtained after decoding, as stated in Sec. 4.4.2. Let $\kappa$ be the maximum index such that $\sum_{i=1}^{\kappa} \alpha_i < R_{\mathrm{in,BICM}}$. The BER on bits passed to the outer decoder, $P_{\mathrm{out}}$, is therefore obtained as

    P_{\mathrm{out}} = \frac{1}{R_{\mathrm{in,BICM}}} \left( \sum_{i=1}^{\kappa} \alpha_i \varepsilon_i + \left( R_{\mathrm{in,BICM}} - \sum_{i=1}^{\kappa} \alpha_i \right) \varepsilon_{\kappa+1} \right).    (4.22)
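Expressed procedurally, (4.22) amounts to filling the information set with the most reliable classes first; a small sketch (with $\varepsilon$ and $\alpha$ supplied by the analysis above) is:

def ber_to_outer(eps, alpha, R_bicm):
    # eps   : per-reliability-class BERs
    # alpha : matching contribution factors (the L_{i,j}/m values)
    pairs = sorted(zip(eps, alpha))               # ascending reliability order
    total_alpha, p_out = 0.0, 0.0
    for e, a in pairs:
        if total_alpha + a < R_bicm:              # whole class assigned to information bits
            p_out += a * e
            total_alpha += a
        else:                                     # class straddling the rate boundary
            p_out += (R_bicm - total_alpha) * e
            break
    return p_out / R_bicm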

Differential Evolution

We jointly optimize the VN-type degree distributions using a differential evolution algo-

rithm [106]. In particular, we follow the differential evolution procedure of [105, Ch. 3,

Sec. 3.3] for the inner-code ensemble optimization.

Given the inner-code rate, $R_{\mathrm{in,BICM}}$, and a fixed target complexity constraint, $\eta^{t}_{\mathrm{in,BICM}}$, the differential evolution algorithm searches for a set of $m$ degree distributions, $\{L_j(x)\}_{j=1}^{m}$, that minimizes a score function defined in the next paragraph.

For a given set of VN degree distributions $\{L_j(x)\}_{j=1}^{m}$, we first obtain the maximum number of decoding iterations allowed, $I$, that satisfies the data-flow constraint $\eta_{\mathrm{in,BICM}} \le \eta^{t}_{\mathrm{in,BICM}}$ from (4.15). A non-integral number of iterations is realized by time-sharing between decoding with $\lfloor I \rfloor$ and with $\lceil I \rceil$ iterations. The score of $\{L_j(x)\}_{j=1}^{m}$ is then defined as the minimum channel SNR at which $P_{\mathrm{out}}$, as defined in (4.22), is below the target BER, $P^{t}_{\mathrm{out}}$.

The differential evolution search is performed on an $N$-dimensional vector $\mathbf{x}$ containing the stacked coefficients of $\{L_j(x)\}_{j=1}^{m}$. Let $g$ denote the number of generations the differential evolution is carried out for and let $S$ denote the population size at each generation. Also, let $\beta > 0$ be an amplification factor and let $0 \le \xi \le 1$ denote a cross-over probability. On the population of each generation, the differential evolution carries out the following three steps, for $s \in \{1, \dots, S\}$:

1. Generate a mutation $\mathbf{v}_s = \mathbf{x}_{i_1} + \beta(\mathbf{x}_{i_2} - \mathbf{x}_{i_3})$, where $i_1, i_2, i_3$ are chosen uniformly at random, without replacement, from the set $\{1, \dots, S\} \setminus \{s\}$.

2. Generate a competitor vector $\mathbf{u}_s$ whose $i$-th component, $i \in \{1, \dots, N\}$, is found as

    u_{s,i} = \begin{cases} v_{s,i} & \text{with probability } \xi, \\ x_{s,i} & \text{otherwise.} \end{cases}

3. Vector $\mathbf{u}_s$ then replaces vector $\mathbf{x}_s$ in the next generation if and only if it has a better (i.e., lower) score.

The algorithm is initialized with random vectors $\mathbf{x}_1, \dots, \mathbf{x}_S$ that satisfy the code-rate constraint. After carrying out differential evolution for $g$ generations, the algorithm outputs the vector $\mathbf{x}^*$ with the best score in the last generation. The stacked vector $\mathbf{x}^*$ then determines the optimal inner-code ensemble, from which we can sample a base matrix as described in Sec. 4.4.3.
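A minimal sketch of this loop is given below. Here score() stands in for the full EXIT/BER evaluation of this section (returning the lowest SNR at which $P_{\mathrm{out}} \le P^{t}_{\mathrm{out}}$ under the data-flow constraint), and normalize() is only a placeholder projection onto nonnegative, normalized coefficient vectors; the actual code-rate and validity constraints enforced in the design are richer.

import numpy as np

def differential_evolution(score, N, S=100, g=1000, beta=0.6, xi=0.6, seed=0):
    rng = np.random.default_rng(seed)

    def normalize(x):
        # Placeholder projection: nonnegative coefficients summing to one.
        x = np.clip(x, 0.0, None)
        return x / x.sum()

    pop = [normalize(rng.random(N)) for _ in range(S)]
    scores = [score(x) for x in pop]

    for _ in range(g):
        for s in range(S):
            i1, i2, i3 = rng.choice([i for i in range(S) if i != s], 3, replace=False)
            v = pop[i1] + beta * (pop[i2] - pop[i3])          # mutation
            mask = rng.random(N) < xi                         # cross-over
            u = normalize(np.where(mask, v, pop[s]))
            u_score = score(u)
            if u_score < scores[s]:                           # keep the better vector
                pop[s], scores[s] = u, u_score

    best = int(np.argmin(scores))
    return pop[best], scores[best]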

4.5 Results

We design concatenated coded modulation schemes at an overall OH of 28% and 25%,

for various choices of modulation orders using MLC and BICM. We characterize the

performance-complexity trade-off in MLC and BICM by obtaining the Pareto frontier

between the SNR at which the coded modulation operates, and the decoding complexity.

In particular, at any given SNR, we obtain complexity-optimized concatenated MLC and

BICM inner-code ensembles, according to measures (4.2) and (4.15), using the methods

described in Sec. 4.3 and Sec. 4.4, respectively.

We consider a QC structure for the inner codes. We sample base codes of small

length from the obtained ensembles and then lift them to obtain QC inner codes of girth

at least 8. Note that should a sampled base matrix not have a full-rank sub-matrix

in the designated parity positions we discard it and sample another base matrix. As

mentioned in Sec. 3.4.4, the QC structure is well-known to be hardware-friendly, giving

rise to energy-efficient implementations [64].


Figure 4.4: Performance-complexity comparisons of optimized codes for MLC and BICM using 64-QAM, compared with the design in [2]. The number of decoding iterations required by each designed code is indicated. At the overall 28% OH of these schemes, CSL = 15.0 dB.

As suggested in [99, Sec. IV-E], for optimization of the inner code in the BICM scheme

we chose the initial population size S = 100 and g = 1000 for the differential evolution

algorithm. We also used β = 0.6 and ξ = 0.6.

Our codes induce a uniform distribution over the transmitted QAM symbols. In all

the results presented, we use the SP algorithm for inner-decoding, with floating-point

message-passing. For each scheme we measure the gap (in dB) to the CSL and the NCG

using (1.1) and (1.2).

4.5.1 Design for 28% OH

For coded modulation designs with 28% OH, we consider a 64-QAM and we use the

staircase code of rate-239/255 as the outer code. We consider a PAM frame length of

8000 (which amounts to 4000 QAM symbols). This requires the use of a rate-(1/2),

length-8000, inner code on the LSB for the MLC scheme, and the use of a rate-(5/6),

length-24000, inner code for the BICM scheme.

As can be seen in Fig. 4.4, compared to the BICM scheme, the MLC scheme provides

a superior performance-complexity trade-off. Compared to the BICM scheme at a similar

decoding complexity, the MLC scheme provides SNR gains of up to 0.4 dB. Also, at a

similar operating point, up to 60% reduction in decoding complexity can be achieved by

the MLC scheme. We believe that this advantage is obtained because the MLC scheme, despite its shorter block length, leaves all but one bit-level uncoded, unlike the BICM decoder, which processes all bit-levels.


Figure 4.5: Simulated decoder outputs of inner codes for designs at 28% OH with 64-QAM: (a) MLC inner codes; (b) BICM inner codes. The mid-point on each BER curve (highlighted by an ‘o’) is the code operational point, i.e., the SNR for which the inner code is designed to achieve $P_{\mathrm{out}} \le P^{t}_{\mathrm{out}}$.

The proposed MLC scheme can operate within a 1.4 dB gap to the CSL, achieving an

NCG of up to 13.6 dB with a complexity score below 24. We also see that the obtained

MLC-based codes attain 0.24 dB better coding gain than the code of [2] at a similar

decoding complexity, and a 30% reduction in decoding complexity at a similar NCG.

In Fig. 4.5, we plot the average BER passed to the outer code versus SNR, for the

obtained codes in the MLC and BICM schemes. The $P^{t}_{\mathrm{out}}$ line shows the target we set

for the inner codes. The mid-point SNR on each curve (highlighted by an ‘o’) is the code

operational point, i.e., the SNR for which the inner code is designed. Note that all BERs

of the sampled codes hit very close to the target at their operational point, verifying our

design approach.


Figure 4.6: Performance-complexity comparisons of the obtained optimized codes for MLC and BICM of various orders at 25% overall OH, compared with the designs in [3]: (a) 16-QAM, CSL = 10.15 dB; (b) 64-QAM, CSL = 15.4 dB; (c) 256-QAM, CSL = 20.45 dB. The number of decoding iterations required by each designed code is indicated.


4.5.2 Design for 25% OH

For coded modulation designs at 25% OH, we consider three modulation orders, namely

16-QAM, 64-QAM and 256-QAM with PAM frame lengths of 12000, 8000, 6000 (which

amounts to 6000, 4000, and 3000 QAM symbols), respectively. These choices for the

PAM frame lengths would result in a constant bit throughput in all modulation schemes.

For the MLC scheme, we use a rate-(2/3) length-12000 inner code for 16-QAM, a

rate-(1/2) length-8000 inner code for 64-QAM, and a rate-(1/3) length-6000 inner code

for 256-QAM, on the LSB channel. For the BICM scheme, we use a rate-(5/6), length-24000 inner code for the 16-QAM, 64-QAM, and 256-QAM schemes.

In Fig. 4.6, we plot the Pareto frontier obtained by MLC and BICM scheme at various

modulation orders. Compared to the BICM scheme at a similar decoding complexity,

the MLC scheme provides SNR gains of up to 0.4 dB, 0.8 dB, and 1.2 dB for 16-QAM,

64-QAM and 256-QAM, respectively. At a similar operating point, up to 43%, 64%, and

78% reduction in decoding complexity can be achieved by the MLC scheme for 16-QAM,

64-QAM and 256-QAM, respectively. Also in that order, the MLC schemes can operate

within 1 dB, 1.2 dB, and 1.65 dB gap to the CSL, achieving NCGs of up to 12.8 dB,

13.6 dB dB, and 14 dB with complexity score of just under 40, 22, and 12.

In Fig. 4.6, we also see that the obtained MLC and BICM codes provide a supe-

rior performance-complexity trade-off compared to the MLC and BICM codes of [3],

respectively. At a similar NCG, for the MLC-based schemes of 16-QAM, 64-QAM and

256-QAM, a 23%, 54%, and 55% reduction in decoding complexity was achieved by our

codes, compared to the codes of [3], respectively. Also at a similar NCG, for the BICM-

based schemes of 16-QAM, 64-QAM and 256-QAM, our codes achieve a 73%, 80%, and

81% reduction in decoding complexity, respectively, compared to the codes of [3].

4.5.3 Design Example

Here we describe, in detail, a design example for the MLC scheme. We explain the parameters we pick for the FEC scheme and carefully design the interleaver between the inner and outer code. To validate the system operation, we then implement the encoder and decoder of both the inner and the outer code and provide BER measurements down to $10^{-7}$.

(The FEC design and simulation presented in this section is joint work with Alvin Sukmadji.)


Figure 4.7: Achievable information rate for 16- and 64-QAM modulations compared to the unconstrained Shannon capacity. The operational point of the designed concatenated code is also shown and compared to that of [4].

FEC Parameters

In Fig. 4.7 we plot the achievable information rate (AIR) of signalling with 16-QAM alongside that of 64-QAM and the Shannon capacity. We pick signalling with 16-QAM and target an overall OH of 25% (3.2 bit/symbol, a high spectral efficiency). At this OH, the loss due to signalling with 16-QAM compared to the unconstrained Shannon limit is a mere 1 dB, and the loss compared to signalling with 64-QAM is insignificant. By contrast, at the rate of 3.485 bit/symbol of the recent 400ZR implementation agreement [4], the loss due to signalling with 16-QAM compared to the Shannon limit is around 1.5 dB.

For the outer zipper code, we picked a length-3960 (Galois field extension degree 12), triple-error-correcting constituent code, which gives an outer rate of

    R_{\mathrm{out}} = 1 - \frac{3 \cdot 12}{3960/2} \approx 0.982.

We define a chunk to have 1210 constituent codes at the zipper decoder, with a total of 6 chunks (around 14.4 Mbits) per decoding window. This outer code has threshold $1.07 \times 10^{-3}$. Setting the inner-code rate to $R_{\mathrm{in,MLC}} = 17/27$ then results in 25% overall OH. We target operation at a 0.8 dB gap to the CSL. The inner-code optimization routine of Sec. 4.3.3 then gives the degree distributions

    L(x) = (10x + 8x^5 + 9x^6)/27,
    R(x) = (6x^{10} + 4x^{11})/10,

with 14 decoding iterations. See Fig. 4.7 for the operational point of the obtained code. It also turns out that the optimization routine gives the same degree distributions, but with 12 decoding iterations, for operation at a 0.85 dB gap to the CSL. We sampled a base code of length 27 from this ensemble and lifted it by a factor of $45 \times 55 = 2475$ to obtain a girth-10 code.
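As a quick, back-of-the-envelope check of the rate bookkeeping in this example (not part of the design flow itself):

from fractions import Fraction as F

m     = 2                               # bits per PAM symbol for 16-QAM
R_out = 1 - F(3 * 12, 3960 // 2)        # outer zipper-code rate, 1 - 36/1980
R_in  = F(17, 27)                       # inner LDPC rate on the LSB
R_all = R_out * (m - 1 + R_in) / m      # overall code rate per transmitted bit
OH    = 1 / R_all - 1                   # overall overhead
print(float(R_out), float(R_all), float(OH))   # ~0.982, 0.80, 0.25 (25% OH)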

Interleaver Design

Note that the inner code only guarantees that the average BER on bits passed to the outer

code is at or below a target. However, VNs of different degrees have different reliabilities

and similarly, the bits carried on the uncoded level that are demapped conditioned on

those VNs also have different reliabilities. The task of the interleaver therefore is not

only to reduce, to the extent possible, the correlation among bits at each constituent

code, but also to ensure that each constituent code observes, on average, the same BER.

To design such an interleaver, we first classify the bits passed to the outer code. Note that the base inner frame produces 44 bits: 17 information bits from the coded level and 27 bits from the uncoded level. We group their corresponding lifted bits into 44 different classes. Further, we divide the bits of each class into cards of 45 bits. Therefore, with each inner frame, a total of $(27 + 17) \cdot 2475 = 108{,}900$ bits are passed to the outer code ($17 \cdot 2475$ bits from the coded level and $27 \cdot 2475$ from the uncoded level), among which we have 44 classes of 2475 bits each. The bits of each class are then divided into 55 cards of 45 bits each, for a total of $2420 = 1210 \cdot 2$ cards per inner frame.

We have 22 inner frames per chunk. In Fig. 4.8 we depict the interleaving and placement

of the inner frames into the real buffer of the zipper decoder chunk. Here, we first stack

the cards of each inner frame vertically on top of each other in two decks (note that

we have 1210 constituent codes per chunk at the outer decoder and therefore each inner

frame will have 2 cards per constituent code). Then, we vertically shift the two decks of

each inner frame as follows: The first inner-frame decks are shifted by 0 cards (no shift),

the second inner-frame decks are shifted by 55 cards, the third inner-frame decks are

shifted by 2 · 55 cards, and so on, and finally, the 22nd inner-frame decks are shifted by

21 · 55 cards. The shifted decks then form the zipper decoder chunk. With this labelling,

we ensure that, 1) each constituent code has an equal number of bits from the 22 inner

frames, providing maximum possible mitigation of correlation among its bits, and 2) each

constituent code has an equal number of bits from the 44 classes of inner bits, providing

it with the expected BER.
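The deck-shifting rule is easy to verify mechanically. The sketch below assumes the shift is cyclic modulo the 1210 constituent codes (consistent with the wrap-around visible in Fig. 4.8) and checks that every constituent code receives exactly two cards from each of the 22 inner frames, i.e., 44 cards in total; the card-to-class assignment itself is not modelled here.

FRAMES, ROWS, SHIFT = 22, 1210, 55

# rows_per_frame[f][r]: number of cards of inner frame f landing on constituent code r.
rows_per_frame = [[0] * ROWS for _ in range(FRAMES)]
for f in range(FRAMES):
    for deck in range(2):                      # each frame contributes two decks
        for card in range(ROWS):
            r = (card + SHIFT * f) % ROWS      # cyclic shift by 55*f cards
            rows_per_frame[f][r] += 1

assert all(rows_per_frame[f][r] == 2 for f in range(FRAMES) for r in range(ROWS))
assert all(sum(rows_per_frame[f][r] for f in range(FRAMES)) == 44 for r in range(ROWS))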


Figure 4.8: The interleaving and placement of bits into the real buffer of the outer decoder per chunk, for the FEC parameters discussed in Sec. 4.5.3.


Figure 4.9: BER simulations for the designed concatenated LDPC-zipper FEC scheme, for 14 and 12 inner decoding iterations.

BER Measurements

In Fig. 4.9 we plot the BER simulation results for the obtained concatenated LDPC-zipper FEC scheme. Here we have used a diagonal interleaver between the real and virtual buffers at the outer code. We plot the results for both 14 and 12 inner decoding iterations.

For each SNR, we ran a few trials of the simulation. A trial is considered to be

complete when the decoder records a total of 50 bursts, where a burst is loosely defined

as a sequence of received erroneous chunks of length between 3 and 40, inclusive. The

tips of each error bar denote the maximum and minimum BER values we obtained in

these trials. See Tables 4.2 and 4.3 for more details on the statistics of the simulation

results.

4.6 Conclusion

In this chapter, we have compared performance-complexity tradeoffs achievable by MLC

and BICM in a concatenated coded modulation system at 28% and 25% OHs, using

various QAM modulation schemes. For both systems we have used state-of-the-art op-

timization strategies to obtain a complexity-optimized error-reducing LDPC inner code,


Table 4.2: Statistics of the simulation results shown in Fig. 4.9 (14 inner iterations).

SNR (dB)   # of trials   average   stdev   max   min

10.948 5 4.630× 10−4 2.543× 10−5 4.898× 10−4 4.304× 10−4

10.949 5 3.041× 10−4 2.988× 10−5 3.258× 10−4 2.539× 10−4

10.950 5 2.081× 10−4 4.365× 10−5 2.798× 10−4 1.708× 10−4

10.951 5 1.410× 10−4 1.574× 10−5 1.652× 10−4 1.232× 10−4

10.952 5 6.952× 10−5 7.969× 10−6 7.890× 10−5 6.262× 10−5

10.953 5 3.614× 10−5 7.306× 10−6 4.527× 10−5 2.482× 10−5

10.954 5 2.152× 10−5 5.490× 10−6 2.630× 10−5 1.483× 10−5

10.955 5 9.584× 10−6 2.096× 10−6 1.189× 10−5 6.857× 10−6

10.956 5 5.061× 10−6 2.988× 10−7 5.294× 10−6 4.567× 10−6

10.957 5 3.192× 10−6 5.699× 10−7 4.068× 10−6 2.510× 10−6

10.958 5 1.393× 10−6 3.863× 10−7 1.885× 10−6 9.007× 10−7

10.959 5 7.239× 10−7 2.923× 10−7 1.077× 10−6 4.722× 10−7

Table 4.3: Statistics of the simulation results shown in Fig. 4.9 (12 inner iterations).

SNR (dB)   # of trials   average   stdev   max   min

10.996 4 8.116× 10−4 4.010× 10−6 8.172× 10−4 8.086× 10−4

10.997 4 7.836× 10−4 5.076× 10−6 7.898× 10−4 7.774× 10−4

10.998 4 7.511× 10−4 3.834× 10−6 7.568× 10−4 7.487× 10−4

10.999 4 7.048× 10−4 1.154× 10−5 7.172× 10−4 6.915× 10−4

11.000 8 6.747× 10−4 1.234× 10−5 6.906× 10−4 6.518× 10−4

11.001 8 6.419× 10−4 1.991× 10−5 6.846× 10−4 6.243× 10−4

11.002 8 5.539× 10−4 1.875× 10−5 5.799× 10−4 5.248× 10−4

11.003 8 4.793× 10−4 2.075× 10−5 5.062× 10−4 4.464× 10−4

11.004 8 3.624× 10−4 3.549× 10−5 4.189× 10−4 3.168× 10−4

11.005 8 2.420× 10−4 2.040× 10−5 2.671× 10−4 2.133× 10−4

11.006 8 1.212× 10−4 1.043× 10−5 1.406× 10−4 1.074× 10−4

11.007 8 7.028× 10−5 1.627× 10−5 9.050× 10−5 4.617× 10−5

11.008 8 2.595× 10−5 4.584× 10−6 3.340× 10−5 2.073× 10−5

11.009 8 1.232× 10−5 2.030× 10−6 1.536× 10−5 1.068× 10−5

11.010 8 5.672× 10−6 9.819× 10−7 7.430× 10−6 4.155× 10−6

11.011 8 2.412× 10−6 3.972× 10−7 2.876× 10−6 1.691× 10−6

11.012 8 8.857× 10−7 1.396× 10−7 1.107× 10−6 7.002× 10−7

11.013 8 4.257× 10−7 8.815× 10−8 5.359× 10−7 2.991× 10−7


to concatenate with an outer hard-decision code. We characterize the trade-off between

performance and decoding complexity by a Pareto frontier. Our results show that the

MLC schemes, despite operating with a shorter block length than BICM, dominate the

BICM schemes from a performance-complexity standpoint. Our complexity-optimized

MLC and BICM schemes also provide a superior performance-complexity trade-off rel-

ative to existing proposals [2, 3], achieving net coding gains of up to 14 dB, yet with

manageable complexity.

We emphasize that our choice to inner-encode only one bit-level in the MLC scheme

is driven by complexity considerations. The fact that the remaining bit-levels can be left

uncoded is permitted because of the presence of an outer code, and then only in certain

settings. Nevertheless, for modulations with up to 256 points, we have shown that the

MLC scheme provides excellent performance-complexity tradeoffs with this architecture.

Furthermore, in Chapter 6 we obtain MLC schemes in which we protect more than one

bit-level by a complexity-optimized non-binary LDPC code, achieving even better FEC

performance.

Chapter 5

Low-Complexity Rate- and Channel-Configurable Concatenated Codes

5.1 Introduction

Conventionally, efficient and low-complexity FEC schemes have been designed for a spe-

cific system throughput and channel quality. The rapid adoption of OTNs that can

operate with various modulation formats at a variety of transmission rates and channel

qualities requires, however, that researchers rethink this convention and design config-

urable FEC schemes that can be deployed in multiple modes of operation. In this chapter,

we propose a design approach for low-complexity FEC schemes that can be configured

to operate at multiple transmission rates and channel qualities.

Code designs configurable to channel variations have been studied previously [108,

109]. In FEC design for optical communication, researchers have considered scalable

designs that trade coding gain for low-complexity operation. In the widely used FEC so-

lutions for OTNs—e.g., product-like codes, LDPC codes, and turbo codes—this trade-off

can often be realized by scaling the number of decoding iterations [110, 111]. An exper-

imental implementation of such a scalable FEC scheme with 20.5% OH was presented

in [112].

(This chapter includes and expands on the work in [107].)

Rate-adaptive FEC schemes for optical communication have also been studied previously [113, 114]. Variable-rate FEC design for optical communication has been realized by various approaches, including shortening [115–117], puncturing [118], and selective

use of code concatenation [91]. More recently, a concatenated polar-staircase code struc-

ture was proposed in [5], providing rate adaptability with near-continuous granularity by

varying the size of the polar-code frozen set. Rate-adaptive coded modulation schemes

have also been considered for the optical channel [119–121]. In combination with shaping,

rate-adaptability has been realized by an adjustable distribution matcher that performs

probabilistic constellation shaping [122, 123]. Experimental validation of rate-adaptive

FEC schemes for optical communication has been widely reported, e.g., in [123,124].

In this chapter, we propose a design approach for attaining low-complexity, multi-

rate, and channel-adaptive FEC schemes that can provide excellent coded-modulation

performance and are of practical relevance to optical communication. In the designed

concatenated FEC schemes, the transmission rate is configurable by signalling with vari-

ous modulation formats and by shortening or lengthening the inner code, and functioning

at various channel qualities is realized by scaling the inner-decoding operation. We re-

formulate the design tools reported in Chapter 4 with the aim of obtaining a configurable

FEC scheme with near-optimal decoding complexity at its various operating points.

We design a number of configurable FEC schemes flexible to operate at various trans-

mission rates and with various modulation formats compatible with the recent propos-

als [4,5]. Compared to the configurable FEC scheme of [5], the designs reported here can

provide up to 63% reduction in complexity while delivering a similar performance and

provide up to 0.6 dB coding gain when operating at a similar decoding complexity.

The FEC design approach advocated in this chapter can also be used to address

the need for multi-vendor interoperable modules in current and future standards [4].

Moreover, FEC flexibility towards modulation format, data rate, and the delivered coding

gain is a key feature that is necessary in future coherent optical networks [22, 125]. The

approach presented in this chapter can also be applied to FEC design for time-domain

hybrid modulation formats.

The rest of this chapter is organized as follows. In Sec. 5.2 we describe the concate-

nated code structure, the modulation formats we work with, and the MLC architecture

we incorporate in our design. In Sec. 5.3 we describe the inner-code, its parameterization,

and its configurable design. In Sec. 5.4 the inner-code optimization and construction are

explained. In Sec. 5.5 we present simulation results for the various FEC schemes that we

have designed using these optimization tools, characterize their trade-offs, and compare

them to the existing state of the art. In Sec. 5.6 we provide concluding remarks.


Figure 5.1: The encoder (a) and the decoder (b) in the configurable FEC scheme. Here, $m = \log_2 M$ denotes the number of bits per PAM symbol.

5.2 Concatenated Code Description

We adopt a similar concatenated FEC structure to that in Chapter 4, Sec. 4.3. We

consider an inner, SD, rate- and channel-configurable LDPC code concatenated with a

high-rate outer HD zipper code. We use the zipper codes of [14, Table 1] as outer codes.

The inner code is concatenated with the outer code through an interleaver, π (see the

encoder and decoder of Fig. 5.1).

Similar to the designs in previous chapters, the task of the inner LDPC code is to

reduce the BER of the bits transferred to the outer code to below its threshold. When

setting a target BER on bits passed to the outer code, P tout, we leave a margin (of 7

to 10%) compared to the outer-code thresholds reported in [14], to enable a reduced

interleaver size between inner and outer codes and a practical realization of our designs

(see Section 5.5).

The concatenated FEC scheme works in conjunction with QAM schemes of various

orders. For concreteness, we consider uniform rectangular $M^2$-QAM, with $M \in \{2, 4, 8\}$, in our FEC schemes. It follows that the number of QAM symbols per frame is half that

of the number of PAM symbols. Throughout this chapter we let m = log2M denote

the number of bits mapped to each PAM symbol. We note, however, that the proposed

configurable FEC scheme can be designed to incorporate essentially any modulation

format.

When a binary modulation (m = 1) is considered, a concatenated FEC scheme similar

to that of [87] is assumed. When the concatenated FEC scheme is to work with a higher-

order modulation (m > 1), we assume that a multi-level coding and multi-stage decoding

structure similar to Sec. 4.3 is deployed. In this architecture, as shown in Fig. 5.1, only


the LSB is encoded by the inner code, and the MSBs are protected only by the outer

code. A similar constellation labelling to that of Sec. 4.3.1 is considered. At the receiver,

a hard-decision on the MSBs, taking into account the hard-decision on the inner-decoder

output bits and the channel information, is passed through the de-interleaver to the outer

decoder (see Fig. 5.1(b)). We assume that the inner decoder passes only hard-decided

bits to the outer decoder and the MSB demapper.

Throughout this chapter, we consider only unshaped (i.e., uniformly-distributed) sig-

nalling schemes with square QAM constellations. Probabilistic- and geometric constella-

tion shaping can provide a power advantage over uniform signalling, and can also provide

rate-configurability by adjusting the entropy of the shaping distribution [123]. It is well

known that the optimal shaping parameters, in both the probabilistic- and geometric-

shaping variants, depend on launch power, target transmission rate, and constellation

size [126]. On the other hand, implementation of shaping schemes adds to the encod-

ing and decoding complexity. We have not attempted in this work to characterize the

tradeoffs between redundancy, reliability and complexity when shaping schemes are in-

corporated; instead we prefer to think of shaping as an independent operation aimed at

narrowing the performance gap to the unconstrained Shannon limit, with its own at-

tendant tradeoffs between performance and complexity. If desired though, as we briefly

sketch in Section 5.4.2, shaping can be incorporated by certain adjustments to our design

procedure.

For the purposes of code design, we model the channel as an additive white Gaussian

noise channel. We optimize the configurable FEC scheme to operate with minimal devi-

ation relative to its reference complexities, i.e., the complexity of the codes individually

designed for its operating points. For the complexity scores, we use the measure described

in Sec. 3.2.3, that is obtained as η = ηin/Rout, where ηin is the inner-code complexity

score, and Rout is the outer-code rate.

5.3 Inner-Code Description

We consider an FEC scheme with $J$ operating points. Throughout this chapter, we use subscript $j$ to denote parameters specific to the $j$-th operating point, with $j \in \{0, \dots, J-1\}$. The $j$-th operating point specifies the pair $(R_{\mathrm{in},j}, \mathrm{SNR}_j)$, the inner-code rate and the SNR at that operating point, respectively. Without loss of generality we assume $R_{\mathrm{in},j} \le R_{\mathrm{in},j+1}$ for $j \in \{0, \dots, J-2\}$.

Similar to the code designs in Chapters 3 and 4, we design ensembles of systematic LDPC codes in which we designate a particular subset of the VNs to be the information set, while the remaining VNs form the parity set. We let $N_j$ denote the number of VNs

in a particular Tanner graph of the j-th operating point drawn from the ensemble and

let Ni,j be the number of degree-i VNs in the graph. We divide the VNs into two

groups: information nodes and parity nodes, representing information bits and parity

bits, respectively. We let Kj denote the number of information nodes and let Ki,j be

the number of degree-i information nodes in the graph. The code rate, therefore, is

Rin,j = Kj/Nj.

We denote the VN-perspective degree distribution for the $j$-th operating point by $L_j(x) = \sum_{i=0}^{D_v} L_{i,j} x^i$, where $L_{i,j} = N_{i,j}/N_j$ is the fraction of VNs that have degree $i$, and $D_v$ denotes the maximum VN degree allowed in the ensemble. Note that we permit uncoded bits in the ensemble, i.e., we allow $L_{0,j} \ge 0$. We define the edge-perspective VN degree distribution as $\lambda_j(x) \triangleq L'_j(x)/L'_j(1) = \sum_{i=1}^{D_v} \lambda_{i,j} x^{i-1}$, where $L'_j(x) = dL_j(x)/dx$.

We consider a CN degree distribution that is concentrated on two consecutive degrees, namely, $d_{c_j}$ and $d_{c_j} + 1$, with $d_{c_j}$ denoting the average CN degree. It is easy to see that, for $i \in \{1, \dots, D_v\}$, $L_{i,j} = d_{c_j}(1 - R_{\mathrm{in},j})\lambda_{i,j}/i$. We let $R_j(x) = \sum_{d=d_{c_j}}^{d_{c_j}+1} R_{d,j} x^d$ denote the node-perspective check-degree distribution, where $R_{d,j}$ is the fraction of CNs of degree $d$, with $\rho_j(x) \triangleq R'_j(x)/R'_j(1)$ denoting the edge-perspective CN degree distribution.

Let $U_{i,j} = K_{i,j}/N_j$ be the share of degree-$i$ information nodes among all VNs. Since all degree-zero VNs must be among the information nodes, we have $U_{0,j} = L_{0,j}$. Furthermore, $U_{i,j} \le L_{i,j}$ for $i \in \{1, \dots, D_v\}$, and $\sum_{i=0}^{D_v} U_{i,j} = R_{\mathrm{in},j}$.

For $j \in \{0, \dots, J-2\}$, we design the configurable inner code such that, where $R_{\mathrm{in},j} < R_{\mathrm{in},j+1}$, the inner code associated with the $j$-th operating point is shortened from that of the $(j+1)$-th operating point. Since the number of CNs remains the same when shortening an LDPC code, we necessarily have $N_j - K_j = N_{j+1} - K_{j+1}$, and therefore $N_j(1 - R_{\mathrm{in},j}) = N_{j+1}(1 - R_{\mathrm{in},j+1})$. However, $K_{i,j} \le K_{i,j+1}$ (due to the shortening); thus, dividing $K_{i,j}$ and $K_{i,j+1}$ by $N_j(1 - R_{\mathrm{in},j})$ and $N_{j+1}(1 - R_{\mathrm{in},j+1})$, respectively, we get

    \frac{U_{i,j}}{1 - R_{\mathrm{in},j}} \le \frac{U_{i,j+1}}{1 - R_{\mathrm{in},j+1}}.

Similarly, since shortening preserves the degree-$i$ parity nodes, we must have $N_{i,j} - K_{i,j} = N_{i,j+1} - K_{i,j+1}$ and therefore

    \frac{L_{i,j} - U_{i,j}}{1 - R_{\mathrm{in},j}} = \frac{L_{i,j+1} - U_{i,j+1}}{1 - R_{\mathrm{in},j+1}}.

We allow the possibility that $R_{\mathrm{in},j} = R_{\mathrm{in},j+1}$, i.e., $L_{i,j} = L_{i,j+1}$ and $U_{i,j} = U_{i,j+1}$, meaning the code structure stays the same, but with the possibility of a different number of decoding iterations.

We let Λj = (λ1,j, λ2,j, . . . , λDv ,j) and Uj = (U0,j, U1,j, . . . , UDv ,j). Further, we let

Λ = (Λ1,Λ2, . . . ,ΛJ) and let U = (U1, U2, . . . , UJ). We refer to the pair (Λ,U) as

the design parameters. The design parameters are used in the complexity-optimization

program.

Let $E_j$ denote the number of edges in a particular inner-code Tanner graph of the $j$-th operating point that are not connected to a degree-1 VN. Also, let $I_j$ denote the maximum number of inner decoding iterations performed at the $j$-th operating point. The complexity score of the inner code at the $j$-th operating point, $\eta_{\mathrm{in},j}$, is then computed as

    \eta_{\mathrm{in},j} = \frac{E_j I_j}{N_j (m_j - 1 + R_{\mathrm{in},j})},    (5.1)

and measures the number of messages passed at the inner decoder per information bit transferred to the outer code at the $j$-th operating point. Here, $m_j - 1$ denotes the number of bits per PAM symbol (in-phase or quadrature) that bypass the inner code. Note that those bits incur zero inner decoding complexity, as accounted for in (5.1). It is easy to see that $E_j/N_j = (1 - R_{\mathrm{in},j})(d_{c_j} - \nu_j)$, where $\nu_j$ is the average number of degree-one VNs connected to each CN in the Tanner graph. Therefore, $\eta_{\mathrm{in},j}$ can be obtained as

    \eta_{\mathrm{in},j} = \frac{(1 - R_{\mathrm{in},j})(d_{c_j} - \nu_j) I_j}{m_j - 1 + R_{\mathrm{in},j}}.    (5.2)

5.4 Ensemble Optimization and Code Construction

5.4.1 Reference Complexities

The coded modulation scheme at any operating point is structured such that the inner code observes an output-symmetric channel (see [15, Def. 4.8]). Therefore, the BER and EXIT-function analyses used for code design in Sec. 4.3.3 remain applicable.

Let the uni-parametric EXIT function corresponding to the $j$-th operating point be denoted by $f_j(p)$. Similar to Sec. 3.3.1, $f_j(p)$ can be expressed in terms of elementary EXIT functions $f_{i,j}(p)$, $i \in \{1, \dots, D_v\}$, as $f_j(p) = \sum_{i=1}^{D_v} \lambda_{i,j} f_{i,j}(p)$. The function $f_{i,j}(p)$ outputs the probability of error in messages emitted from the degree-$i$ VNs after one round of message-passing. Note that the computation of $f_{i,j}(p)$ depends only on the SNR and modulation format of the $j$-th operating point. Given the $j$-th operating-point SNR, $d_{c_j}$, and $\nu_j$, we can pre-compute and store the $f_{i,j}(p)$ values and use them in the ensemble optimization.

Given the elementary EXIT functions, the number of iterations, $I_j$, required by the inner code to take the VN message error probability from $p_{0,j}$, the channel BER, down to $p_{t,j}$, a target message error probability, can be approximated as [61]

    I_j \cong \int_{p_{t,j}}^{p_{0,j}} \frac{dp}{p \log\left(\frac{p}{f_j(p)}\right)}.

Therefore, from (5.2), the inner-code complexity at the $j$-th operating point, $\eta_{\mathrm{in},j}$, can be approximated as

    \eta_{\mathrm{in},j} \cong \frac{(1 - R_{\mathrm{in},j})(d_{c_j} - \nu_j)}{m_j - 1 + R_{\mathrm{in},j}} \int_{p_{t,j}}^{p_{0,j}} \frac{dp}{p \log\left(\frac{p}{f_j(p)}\right)}.

Let $P_{\mathrm{info},j}$ and $P_{\mathrm{parity},j}$ denote the information-set BER and the parity-set BER, respectively, after the target message error probability, $p_{t,j}$, is achieved. As explained in Sec. 4.3.3, $P_{\mathrm{info},j}$ and $P_{\mathrm{parity},j}$ can be computed from $p_{t,j}$ and the elementary EXIT functions. Let $P_{\mathrm{out},j}$ denote the BER on bits passed to the outer decoder. In terms of the ensemble parameters, $P_{\mathrm{out},j}$ can be obtained as

    P_{\mathrm{out},j} = a_j P_{\mathrm{info},j} + b_j P_{\mathrm{parity},j} + c_j,    (5.3)

where $a_j$, $b_j$, and $c_j$ are independent of the inner-code design parameters and can be pre-computed as described in Sec. 4.3.3.

The complexity-optimized inner-code specifically designed for the j-th operating point

is obtained as described in Sec. 4.3.3 and by searching over a discrete set of values for

dc,j, νj, and pt,j. We refer to the minimum achievable inner-code complexity score as the

j-th reference complexity and denote it by η∗in,j.

5.4.2 Configurable Inner-Code Optimization

We aim at designing a configurable FEC scheme that maintains a close-to-optimal com-

plexity score at its operating points. While there are many possible ways to give a

precise meaning to “close-to-optimal,” in this chapter we use the relative deviation with

respect to the reference complexity as the cost associated with a configurable scheme

at an operating point. Once all the η∗in,j’s are obtained, the optimized configurable

inner-code ensemble is obtained as follows. We search over a discrete set of values for

$d_{c,0} \times d_{c,1} \times \dots \times d_{c,J-1}$, $\nu_0 \times \nu_1 \times \dots \times \nu_{J-1}$, and $p_{t,0} \times p_{t,1} \times \dots \times p_{t,J-1}$, and, for each choice, we solve the following optimization problem:

minimize over $(\Lambda, U)$:

    \gamma = \max\left( \frac{\eta_{\mathrm{in},0}}{\eta^*_{\mathrm{in},0}},\; \frac{\eta_{\mathrm{in},1}}{\eta^*_{\mathrm{in},1}},\; \dots,\; \frac{\eta_{\mathrm{in},J-1}}{\eta^*_{\mathrm{in},J-1}} \right),    (5.4)

subject to:

    \frac{U_{i,j}}{1 - R_{\mathrm{in},j}} \le \frac{U_{i,j+1}}{1 - R_{\mathrm{in},j+1}},    (5.5)
    \frac{L_{i,j} - U_{i,j}}{1 - R_{\mathrm{in},j}} = \frac{L_{i,j+1} - U_{i,j+1}}{1 - R_{\mathrm{in},j+1}},    (5.6)
    \sum_{i=1}^{D_v} \frac{\lambda_{i,j}}{i} \ge \frac{1 - L_{0,j}}{d_{c,j}(1 - R_{\mathrm{in},j})},    (5.7)
    \sum_{i=1}^{D_v} \lambda_{i,j} = 1, \qquad \lambda_{1,j}\, d_{c,j} = \nu_j,    (5.8)
    0 \le \lambda_{i,j} \quad \forall i \in \{1, \dots, D_v\},    (5.9)
    \sum_{i=0}^{D_v} U_{i,j} = R_{\mathrm{in},j}, \qquad U_{0,j} = L_{0,j},    (5.10)
    0 \le U_{i,j} \le L_{i,j} \quad \forall i \in \{1, \dots, D_v\},    (5.11)
    f_j(p) < p \quad \forall p \in [p_{t,j}, p_{0,j}],    (5.12)
    a_j P_{\mathrm{info},j} + b_j P_{\mathrm{parity},j} + c_j \le P^{t}_{\mathrm{out}},    (5.13)

where constraints (5.5)–(5.6) should hold true for all $j \in \{0, \dots, J-2\}$ and constraints (5.7)–(5.13) should hold true for all $j \in \{0, \dots, J-1\}$. We call $\gamma$ the maximum complexity deviation ratio of a given ensemble set. The objective is to find the configurable inner-code ensemble set which minimizes $\gamma$; the resulting $\gamma$ is called $\gamma^*$.

Note that constraints (5.5)–(5.6) ensure that the obtained codes have a compatible

structure. Constraints (5.7)–(5.12), similar to constraints (3.14)–(3.19), are the validity

constraints. Constraint (5.13) then ensures that the BER on bits passed to the outer

decoder is at or below the set target at each operating point.

The code optimization problem can be solved using the methods described in Sec. 4.3.3;

however, as the number of operating points we optimize for increases, it becomes infeasi-

ble to search over a sufficiently dense subset of the dc,j’s, νj’s, and pt,j discrete parameter

spaces. A practical differential-evolution-based method for obtaining the optimal config-

urable inner-code ensemble is described in the next section.

While not considered in this work, the optimization problem can be modified to incor-

porate probabilistic amplitude shaping, for m > 1, in a reverse concatenated architecture

as in [127]. A first modification would be to use a constellation labelling in which the


LSB alternates between adjacent symbols, and the MSBs, given the LSB, are Gray-

labelled while signal magnitude is indicated by the MSBs. For example, the labelling

Ω8 = (110, 011, 100, 001, 000, 101, 010, 111) could be used for the in-phase 8-PAM of a

64-QAM constellation. A distribution matcher can then perform probabilistic amplitude

shaping by adjusting the distribution of the MSBs. Of course the additional redundancy

and complexity introduced by the distribution matcher must be accounted for in the

overall OH and complexity score. Note that with this labelling, the LSB channel remains

output-symmetric and the EXIT function analysis used for code design remains valid.

5.4.3 Code Optimization Via Differential Evolution

We use a similar method as in Sec. 4.4.4 to characterize and eventually optimize the

configurable FEC scheme. Compared to the method described in Sec. 5.4, despite be-

ing less rigorous, this method obtains very similar optimized ensembles with a lower

computational complexity.

For notational simplicity, here we limit the method to codes in which $\nu_j = 1$ for $j \in \{0, \dots, J-1\}$. In fact, in most of the optimal codes we found, it turns out that $\nu_j = 1$.

Note, however, that the method presented here can easily be extended to the general

case.

EXIT Function Analysis

Let the function Υ(σ) be defined as in equation (4.16). For the j-th operating point,

we let SNRj denote the equivalent binary-input additive white Gaussian noise surrogate

channel-SNR [99, Sec. IV-B2]. The corresponding channel log-likelihood ratio has a

symmetric Gaussian distribution with variance $\sigma_j^2 = 4\,\mathrm{SNR}_j$.

As suggested in [128], in the $\ell$-th iteration, the message from a degree-$i$ VN, $I^{\ell}_{v_j \to c_j}(i)$, and the message from a degree-$d$ CN, $I^{\ell}_{c_j \to v_j}(d)$, for $i \in \{2, \dots, D_v\}$ and $d \in \{d_{c_j}, d_{c_j}+1\}$, are obtained as

    I^{\ell}_{v_j \to c_j}(i) = \Upsilon\!\left(\sqrt{(i-1)\,\Upsilon^{-1}\!\left(I^{\ell-1}_{c_j \to v_j}\right)^2 + \sigma_j^2}\right),

    I^{\ell}_{c_j \to v_j}(d) = 1 - \Upsilon\!\left(\left((d-2)\,\Upsilon^{-1}\!\left(1 - I^{\ell}_{v_j \to c_j}\right)^2 + \Upsilon^{-1}\!\left(1 - \Upsilon(\sigma_j)\right)^2\right)^{\frac{1}{2}}\right),

where

    I^{\ell}_{v_j \to c_j} = \sum_{i=2}^{D_v} \frac{\lambda_{i,j}}{1 - \lambda_{1,j}}\, I^{\ell}_{v_j \to c_j}(i),

    I^{\ell}_{c_j \to v_j} = \sum_{d=d_{c_j}}^{d_{c_j}+1} \frac{\rho_{d,j} - R_{d,j}\lambda_{1,j}}{1 - \lambda_{1,j}}\, I^{\ell}_{c_j \to v_j}(d).

Here, we have excluded the edges that connect to degree-one VNs. Initially, we let $I^{0}_{v_j \to c_j} = \Upsilon(\sigma_j)$ and $I^{0}_{c_j \to v_j} = 0$. Furthermore, we use the approximations in [104, Eqs. (9),(10)] in computing $\Upsilon(\sigma)$ and $\Upsilon^{-1}(\sigma)$.

After $\ell$ iterations, the APP mutual information at the degree-$i$ VNs, $I^{\mathrm{APP},\ell}_j(i)$, for $i \in \{2, \dots, D_v\}$, is obtained as

    I^{\mathrm{APP},\ell}_j(i) = \Upsilon\!\left(\sqrt{i\,\Upsilon^{-1}\!\left(I^{\ell}_{c_j \to v_j}\right)^2 + \sigma_j^2}\right).    (5.14)

For $i = 1$, we first obtain the message from a degree-$d$ CN to the degree-1 VNs, for $d \in \{d_{c_j}, d_{c_j}+1\}$, as

    I^{\ell,1}_{c_j \to v_j}(d) = 1 - \Upsilon\!\left(\sqrt{(d-1)\,\Upsilon^{-1}\!\left(1 - I^{\ell}_{v_j \to c_j}\right)^2}\right).

Then, the APP mutual information at the degree-1 VNs is obtained from (5.14), but with $I^{\ell}_{c_j \to v_j}$ replaced by

    \sum_{d=d_{c_j}}^{d_{c_j}+1} \rho_{d,j}\, I^{\ell,1}_{c_j \to v_j}(d).

The BER on the degree-$i$ VNs, $\varepsilon^{\ell}_{i,j}$, is then obtained as

    \varepsilon^{\ell}_{i,j} = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\sigma_{i,j}}{2\sqrt{2}}\right),

where $\sigma_{i,j} = \Upsilon^{-1}\!\left(I^{\mathrm{APP},\ell}_j(i)\right)$ is obtained using (5.14), and $\mathrm{erfc}(x)$ is the standard complementary error function. The BER on the inner-code information bits, $P_{\mathrm{info},j}$, and on the parity bits, $P_{\mathrm{parity},j}$, are therefore obtained as

    P_{\mathrm{info},j} = \sum_{i=0}^{D_v} U_{i,j}\, \varepsilon_{i,j},    (5.15)

    P_{\mathrm{parity},j} = \varepsilon_{1,j},    (5.16)

where $\varepsilon_{0,j}$ is the bit-level channel BER. Note that for $\nu_j = 1$ we must have $U_{1,j} = 0$.

Differential Evolution

We obtain the optimized configurable inner-code degree distributions following a similar

differential evolution algorithm as in [105, Ch. 3, Sec. 3.3]. Given the J operating points

and their reference inner-code complexities, i.e., the values for (Rin,j, SNRj) and η∗in,j,

the differential evolution algorithm searches for the J degree distributions that minimize

the maximum complexity deviation ratio from the reference complexities as denoted by

(5.4).

Before we describe the differential evolution operation, we define the helper function

config(C) that generates UC , a collection of valid distributions for the information bits of

a configurable code, based on its input, matrix C. Let C(:, j) denote the j-th column of

matrix C. The j-th column of UC , denoted by UC(:, j), is obtained iteratively, starting

from j = 0, as the solution to the following quadratic program:

minimize over $U_C(:,j)$:

    \lVert U_C(:,j) - C(:,j) \rVert^2,

subject to:

    \sum_{i=0}^{D_v} U_C(i,j) = R_{\mathrm{in},j},
    U_C(1,j) = 0,
    \frac{U_C(i,j)}{1 - R_{\mathrm{in},j}} \ge \frac{U_C(i,j-1)}{1 - R_{\mathrm{in},j-1}},

where we initialize $U_C(:,-1)$ to be the all-zero vector and $R_{\mathrm{in},-1} = 0$.
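In practice, config(C) can be realized by handing each column's quadratic program to an off-the-shelf convex solver. The sketch below uses cvxpy (an implementation choice of this sketch, not of the thesis) and imposes exactly the constraints listed above.

import numpy as np
import cvxpy as cp

def config(C, R_in):
    # C    : (D_v + 1) x J matrix of candidate coefficients
    # R_in : list of the J inner-code rates R_in,j (in ascending order, all < 1)
    Dv1, J = C.shape
    U = np.zeros((Dv1, J))
    prev, prev_rate = np.zeros(Dv1), 0.0            # U_C(:,-1) = 0, R_in,-1 = 0
    for j in range(J):
        u = cp.Variable(Dv1)
        cons = [cp.sum(u) == R_in[j],
                u[1] == 0,                           # no degree-1 information nodes
                u / (1 - R_in[j]) >= prev / (1 - prev_rate)]
        cp.Problem(cp.Minimize(cp.sum_squares(u - C[:, j])), cons).solve()
        U[:, j] = u.value
        prev, prev_rate = u.value, R_in[j]
    return U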

For a given collection of $J$ information-node degree distributions, $\{U_j\}_{j=0}^{J-1}$, and a deviation $\gamma$, we first obtain the number of decoding iterations allowed at each operating point, $I_j$, according to (5.2), by solving $\eta_{\mathrm{in},j} = \gamma\, \eta^*_{\mathrm{in},j}$. A non-integral number of iterations is realized by time-sharing between decoding with $\lfloor I_j \rfloor$ and with $\lceil I_j \rceil$ iterations. The score of $\{U_j\}_{j=0}^{J-1}$ is then defined as the minimum $\gamma$ at which $P_{\mathrm{out},j}$, as defined in (5.3) and calculated using (5.15)–(5.16), is below the target BER, $P^{t}_{\mathrm{out}}$, at all operating points.
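For example, the iteration budget implied by a candidate deviation $\gamma$ at one operating point follows directly from (5.2); a small sketch with hypothetical operating-point parameters is:

import math

def iteration_budget(gamma, eta_ref, R_in, dc, nu, m):
    # Iterations allowed at one operating point so that eta_in = gamma * eta_ref,
    # per (5.2); fractional budgets are met by time-sharing floor/ceil counts.
    per_iter = (1 - R_in) * (dc - nu) / (m - 1 + R_in)   # messages per info bit, per iteration
    I = gamma * eta_ref / per_iter
    lo, hi = math.floor(I), math.ceil(I)
    share_hi = I - lo                                     # fraction of frames decoded with hi iterations
    return I, (lo, hi, share_hi)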

The differential evolution search is performed on a $(D_v + 1) \times J$ matrix $A$ containing the $\{U_j\}_{j=0}^{J-1}$ column vectors. Let $g$ denote the number of generations the differential evolution is carried out for and let $S$ denote the population size at each generation. Also, let $\beta > 0$ be an amplification factor and let $0 \le \xi \le 1$ denote a cross-over probability. On the population of each generation, the differential evolution carries out the following three steps, for $s \in \{1, \dots, S\}$:

1. Generate a mutation $B_s = A_{i_1} + \beta(A_{i_2} - A_{i_3})$, where $i_1, i_2, i_3$ are chosen uniformly at random, without replacement, from the set $\{1, \dots, S\} \setminus \{s\}$.

2. Generate a competitor matrix $C_s$ whose element in row $i$ and column $j$, for $i \in \{0, \dots, D_v\}$ and $j \in \{0, \dots, J-1\}$, is found as

    C_s(i,j) = \begin{cases} B_s(i,j) & \text{with probability } \xi, \\ A_s(i,j) & \text{otherwise.} \end{cases}

We then let $U_{C_s} = \mathrm{config}(C_s)$.

3. Matrix $U_{C_s}$ then replaces matrix $A_s$ in the next generation if and only if it has a better (i.e., lower) score.

The differential evolution search is initialized with matrices $\mathrm{config}(X_1), \dots, \mathrm{config}(X_S)$, where the matrices $X_1, \dots, X_S$ are generated at random. After carrying out differential evolution for $g$ generations, the algorithm outputs the matrix with the best score in the last generation, which determines the ensembles of the optimal configurable inner code.

5.4.4 Code Construction

We consider a QC structure for the inner code. It is well known that the QC structure

enables a hardware-friendly and energy-efficient decoder implementation [64] as required

in OTNs. Note that in solving (5.4), we obtain J optimized inner-code ensembles, ordered

in ascending rates, that describe a configurable FEC scheme.

To construct the QC inner code, we first construct a base graph in keeping with the obtained ensembles. We let $N^{\mathrm{b}}_j$ and $E^{\mathrm{b}}_j$ denote the number of VNs and edges of the $j$-th base graph, respectively, and let $M^{\mathrm{b}}$ denote the number of CNs of the base graph. We start by sampling a Tanner graph with $N^{\mathrm{b}}_0$ VNs and $M^{\mathrm{b}}$ CNs, corresponding to the shortest-length (lowest-rate) ensemble. Then, when $R_{\mathrm{in},j} < R_{\mathrm{in},j+1}$, the Tanner graph corresponding to the $(j+1)$-th operating point is obtained by: (a) adding $N^{\mathrm{b}}_{j+1} - N^{\mathrm{b}}_j$ VNs to the existing graph; (b) adding $E^{\mathrm{b}}_{j+1} - E^{\mathrm{b}}_j$ edges to the new VNs in accordance with the VN degree distribution of the $(j+1)$-th inner-code ensemble; and (c) adding $E^{\mathrm{b}}_{j+1} - E^{\mathrm{b}}_j$ sockets to the existing CNs of the graph and connecting them randomly to the new edges in accordance with the CN degree distribution of the $(j+1)$-th inner-code ensemble. Should the sampled base matrix not have a full-rank sub-matrix in the designated parity positions, we discard it and start over.

Once the base graph is obtained, we lift its corresponding matrix to obtain a QC

parity-check matrix of large girth for the inner code. Note that the obtained code can be

configured to work at multiple operating points: its rate can be configured by activating or

deactivating certain parts of the graph, and its operational complexity can be configured

by varying the number of decoding iterations (see the encoder and decoder of Fig. 5.1).

5.5 Results

In this section we apply the tools described above to design configurable codes that

can operate at various operational rates and complexities and with various modulation

formats. We characterize the performance of the designed concatenated FEC schemes

by their SNR gap, in dB, to the CSL, obtained from (1.1), at each operating point. We

obtain the complexity score of the operating points according to (5.2). For the designed

configurable FEC schemes, we also report γ∗, the optimal complexity deviation ratio

from the reference complexities.

We use the rate-0.97 and the rate-0.98 zipper codes from [14, Table 1] as the outer code and we set conservative BER targets of $1.7 \times 10^{-3}$ and $1 \times 10^{-3}$ for them, respectively. We obtain the configurable inner-code ensembles using the method described in Appendix A. In keeping with the suggestion of [99, Sec. IV-E], we chose the following parameter values for the differential evolution algorithm: S = 150, g = 100, β = 0.6, and ξ = 0.6. We then sample base codes of small

length, according to the obtained ensembles, and lift them to obtain QC inner codes

of girth at least 8. In all the results presented, we assume floating-point sum-product

message passing at the inner decoder.

In Fig. 5.2 we plot the performance of the configurable codes versus the FEC rates.

Here, each mark denotes an operating point with a complexity score indicated by its

label; connected marks denote the operating points of a single configurable FEC scheme.

We have also reported γ∗ for each configurable FEC scheme. Two of these configurable

FEC schemes are described in detail in the examples below.

Example 1: an FEC scheme that uses a rate-0.97 outer zipper code and is configurable to operate with 25% OH and 64-QAM at 1.18 dB gap to the CSL, with 25% OH and 16-QAM at 1 dB gap to the CSL, and with 20% OH and 4-QAM at 0.9 dB gap to the CSL.

Figure 5.2: Designed configurable FEC schemes, denoted by the connected marks. Each mark is an operating point and its complexity score is indicated on its label.

We sampled base inner-codes of lengths 38, 57, and 142, respectively, from the

following optimized ensembles:

L_0(x) = (20x + 5x^6 + 9x^7 + 4x^8)/38,
R_0(x) = (15x^8 + 5x^9)/20,
L_1(x) = (20x + 18x^4 + x^5 + 5x^6 + 9x^7 + 4x^8)/57,
R_1(x) = (18x^12 + 2x^13)/20,
L_2(x) = (20x + 67x^3 + 21x^4 + 5x^5 + 11x^6 + 12x^7 + 6x^8)/142,
R_2(x) = (12x^27 + 8x^28)/20.

Note that the code with VN degree distribution L_0(x) is shortened from the code with VN degree distribution L_1(x), which itself is shortened from the code with VN degree distribution L_2(x). We then lifted the obtained base code by a factor of 493 to get a girth-8 QC code. At the designated operating points, the inner codes require 10, 11, and 12 iterations, respectively, to bring the BER on the bits passed to the outer code to below 1.7 × 10^−3, which gives complexity scores of 15.1, 24.4, and 51.1, respectively. This

configurable FEC scheme has γ∗ = 1.14. It is worth noting from this example that the

optimized ensembles do not always shorten the lowest-degree variable nodes, as is often

the practice followed in conventional designs.

Figure 5.3: Performance-versus-complexity comparison between the designed configurable FEC schemes and those of [5]. The FEC rate, in bits per symbol, of each operating point is indicated on its label.

Example 2: an FEC scheme with 15% OH that uses a rate-0.98 outer zipper code and 16-QAM signalling, compatible with the 400ZR implementation agreement [4]. We

obtained an FEC scheme with configurable operational complexity to operate at various

gaps to the CSL. We sampled an inner code of length 95 from the following optimized

ensemble:

L(x) = (21x + 5x^3 + 67x^4 + 2x^5)/95,
R(x) = (x^15 + 20x^16)/21.

Note that since the modulation format and the transmission rate are the same for all

operating points, the inner code structure also remains the same across the operating

points. We then lifted the obtained base code by a factor of 210 to get a girth-8 QC

code. With 8, 10, and 13 inner-decoding iterations, the FEC scheme operates at 1.1 dB,

1 dB, and 0.9 dB gap to the CSL, which gives complexity scores of 14.2, 17.7, and 23,

respectively. The inner code brings the BER on the bits passed to the outer code to

below 1 × 10^−3. This configurable FEC scheme has γ∗ = 1.05.

In Fig. 5.2 we also present a low-complexity configurable FEC scheme that uses 64-

QAM signalling and can operate with 20%, 15%, and 12.5% OH, all at 1.25 dB gap to

the CSL, with γ∗ = 1.12. Another scheme presented in Fig. 5.2 can use 64- and 16-QAM

constellations and can operate at 1 dB or 1.2 dB gap to the CSL, all with 20% OH, and

with γ∗ = 1.13.


When the concatenated FEC scheme is applied to a modulation format with m bit-

levels, m−1 levels are protected only by the outer code, thus incurring zero inner-decoding

complexity. Therefore, as reported in Fig. 5.2, FEC schemes that work with a bigger m

operate at a much lower complexity. Since at the same symbol rate the information

throughput is larger with a bigger m, keeping the complexity score, per information bit,

in check is especially beneficial in realizing very large system throughputs.

In Fig. 5.3 we compare our design approach to the most recent results in the literature,

i.e., those of [5]. Note that in this figure, for better visualization, we plot FEC complexity-

score versus performance, and indicate the FEC rates on the labels. As can be seen in

Fig. 5.3, a performance similar to that of the configurable codes of [5] can be obtained

by our designed code at 63% less complexity. At a similar complexity, our designed code

can provide up to 0.6 dB coding gain over the codes of [5].

5.6 Conclusion

In this chapter, we have proposed a design approach for configurable and complexity-optimized concatenated FEC schemes capable of operating at various transmission rates

and channel conditions. We use an inner error-reducing LDPC code and an outer zipper

code. We use a low-complexity MLC scheme in which we only inner-encode one bit-level.

We minimize the estimated inner-decoding data-flow while realizing (a) rate-adaptivity

by varying the modulation order and also by varying the inner-code rate by shortening or

lengthening it, and (b) channel-adaptivity by varying the number of decoding iterations

performed by the inner code. We design a number of configurable FEC schemes according

to the most recent industry specifications and show that the designed codes have a

superior performance-complexity trade-off relative to existing proposals.

We took the complexity of the codes specifically designed for each operating point as the reference and, in designing the configurable codes, we minimized the relative deviation from the reference complexities. Alternatively, it might benefit the system throughput and its implementation in hardware to minimize the complexity score of the scheme with the largest spectral efficiency, while also equalizing the relative complexity deviation, however

defined, at other operating points. While we acknowledge that there may be other viable

objectives and formulations to consider in the configurable code design, we leave their

investigation and their implementation implications as a topic of future study.

Chapter 6

Complexity-Optimized Non-Binary Coded Modulation for Four-Dimensional Constellations¹

6.1 Introduction

In coherent optical communication, we may use both the polarization and the quadratures

of the electromagnetic fields for data transmission. Therefore, it is sensible to consider

all four degrees of freedom for the signal constellation. In fact, for dual-polarization (DP)

optical communication with 4D modulation, an improved power efficiency was reported

in [32,129,130].

Conventionally, two independent QAM constellations are used for DP optical commu-

nication. However, a denser arrangement of points can be realized in a 4D constellation. In particular, signalling with constellations drawn from the checkerboard lattice, D4, has been shown to provide a packing gain over the conventional constellations [6, 131, 132].

Moreover, spherically-bounded D4-based constellations [133], where the constellation

comprises points within a 4D hypersphere, instead of a hypercube, readily unlock an

additional shaping gain.

Recently, a coded modulation scheme was proposed in [6, 132] that makes effective

use of the D4-based constellations, delivering a coding gain over the conventional DP

quadrature amplitude modulation (DP-QAM) schemes. There, a concatenated MLC

architecture is considered with two inner SD codes, each protecting a number of bit

¹ This chapter is a joint work with Sebastian Stern and Felix Frey. It includes and expands on the work in [134].


levels, and an outer staircase code. However, the two distinct SD codes used in the

FEC scheme of [6] (a) are generic LDPC codes that incur high decoding complexity, (b)

have to be successively decoded, incurring additional structural complexity, and (c) are not

effective in delivering the shaping gain of the spherically-bounded 4D constellation.

In this chapter, we adopt the 4D signalling constellations of [6] and propose a concate-

nated coded modulation architecture carefully designed to deliver the inherent packing

and shaping gains of the D4-based constellations. We use a set partitioning approach to

construct a few bit-levels with minimal reliabilities, thereby maximizing the reliabilities

of the other bit-levels. We then use a low-complexity MLC scheme [79] in which we

inner-encode the less reliable bit-levels and protect the more reliable ones only by the

outer code. We modify the inner-code design approach of Sec. 4.3 to obtain non-binary

SD LDPC codes with minimized decoding data-flow. Similar to Sec. 4.3, the inner code

is tasked with reducing the BER of the bits passed to the outer HD zipper code [14] to below its threshold, which enables the outer code to take the BER further down, below 10^−15, as required in

optical communication.

We target a system designed for a transmission rate of 6.97 bits/symbol, compatible with the 400ZR implementation agreement [4], and assess the proposed scheme over the AWGN channel. Compared to the conventional BICM scheme and to the scheme

of Sec. 4.3, a gain of 0.75 dB and 0.62 dB, respectively, is reported for the proposed FEC

architecture. Moreover, the proposed scheme can realize an additional shaping gain of

0.25 dB using the spherically-bounded 4D constellations.

The rest of this chapter is organized as follows. In Sec. 6.2 we describe the 4D

constellations considered in this chapter and show their capacity curves next to those of the

conventional QAM constellations. In Sec. 6.3 we describe the set-partitioning procedure

that we use in labelling the 4D constellation. In Sec. 6.4 we describe the coded modulation

structure. In Sec. 6.5 we describe the inner-code ensemble optimization procedure. In

Sec. 6.6 we present simulation results for the obtained scheme and compare them to the

existing designs. In Sec. 6.7 we provide concluding remarks.

6.2 Four-Dimensional Signal Constellations

The DP-QAM constellation induces a subset of the Lipschitz integers [131]. We denote

this constellation by L_M, where M is the constellation cardinality. As described in [6, 132], with 4D signalling, the density of the constellation points packed in a hypersphere can be doubled, compared to the conventional DP-QAM constellations, without a decrease in minimum distance. The induced algebraic structure is isomorphic to the D4 lattice

and is known as Hurwitz integers [131]. We denote the corresponding constellation by H_M.

Figure 6.1: Constellation-constrained capacities in bit/symbol versus the SNR. The inset shows the 2D projection of the corresponding signal constellations.

We remark that the D4 lattice is isomorphic to the lattice obtained by applying a single-parity-check code to the Z^4 integer lattice. In this sense, the FEC scheme proposed in this chapter can be considered as a triple concatenated code, deploying the Z^4 signal constellation with a single-parity-check code closest to the channel.
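The following sketch (Python, assuming numpy) illustrates this remark: the D4 (checkerboard) points are exactly the Z^4 points whose coordinate sum is even, i.e., a single parity check on the coordinate parities. Taking the 512 lowest-energy D4 points as a stand-in for a spherically-bounded constellation is only an illustrative assumption, not the exact Hurwitz/Welti construction of [6, 133].

```python
import itertools
import numpy as np

# D4 as a single parity check on Z^4: keep integer points with even coordinate sum.
coords = np.arange(-3, 4)                                  # small search window
z4 = np.array(list(itertools.product(coords, repeat=4)))   # Z^4 points
d4 = z4[z4.sum(axis=1) % 2 == 0]                           # single parity check

# Minimum squared Euclidean distance of D4 is 2 (vs. 1 for Z^4).
print(min(int(p @ p) for p in d4 if p.any()))              # -> 2

# Illustrative spherical cut: the 512 lowest-energy D4 points.
energies = np.einsum("ij,ij->i", d4, d4)
welti_like = d4[np.argsort(energies, kind="stable")[:512]]
print(welti_like.shape)                                    # -> (512, 4)
```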

In Fig. 6.1, we plot the constellation-constrained capacities, in bits per symbol, versus

the SNR for the 4D 512-ary Hurwitz constellation (H512) and its DP-16QAM (L256)

counterpart. The achievable packing gain for the Hurwitz constellation over the DP-

QAM is evident in this figure. Moreover, as proposed by G. Welti [133], the constellation

capacity can be further increased by selecting a subset Hurwitz integers that are within

a 4D hypersphere. The capacity of the resulting Welti constellation, denoted by W512,

is also shown in Fig. 6.1. Note that the additional achievable shaping gain of the Welti

constellation incurs no extra architectural complexity.

Also shown in Fig. 6.1 are the DP-32QAM and DP-64QAM constellation capacities.

For the target transmission rate, there is virtually no difference between the constellation-constrained limit of DP-64QAM and that of H512. The H512 constellation, however, entails a simpler system implementation since it has fewer signalling points. Similarly for the shaped constellations: while there is virtually no difference between the constellation-constrained limit of DP-32QAM and that of W512 at the target transmission

rate, signalling with W512 is preferable because W512 has smaller cardinality.
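For reference, constellation-constrained capacities such as those in Fig. 6.1 can be estimated by Monte Carlo as the mutual information of an equiprobable constellation over the AWGN channel. The sketch below assumes a noise variance of N0/2 per real dimension with Es normalized to 1, and uses a toy 4D constellation (two independent QPSKs) purely as a placeholder.

```python
import numpy as np

# Monte-Carlo estimate of the constellation-constrained capacity (bit/symbol)
# of an equiprobable real d-dimensional constellation over the AWGN channel.
def constrained_capacity(points, es_n0_db, n=50_000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(points, dtype=float)
    x = x / np.sqrt(np.mean(np.sum(x**2, axis=1)))          # Es = 1
    m, d = x.shape
    sigma2 = 0.5 * 10 ** (-es_n0_db / 10)                   # N0/2 per real dimension
    noise = rng.normal(scale=np.sqrt(sigma2), size=(n, d))
    y = x[rng.integers(m, size=n)] + noise
    dist = np.sum((y[:, None, :] - x[None, :, :])**2, axis=2)
    log_ratio = (np.sum(noise**2, axis=1)[:, None] - dist) / (2 * sigma2)
    return np.log2(m) - np.mean(np.log2(np.sum(np.exp(log_ratio), axis=1)))

# Toy 4D constellation: the Cartesian product of two QPSK constellations.
qpsk = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
toy4d = np.array([np.concatenate([a, b]) for a in qpsk for b in qpsk])
print(round(constrained_capacity(toy4d, es_n0_db=10.0), 2))  # approaches 4 at high SNR
```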

Figure 6.2: Illustration of the first (left) and second (right) partitioning steps of the D4-based constellations in one polarization.

We show a two-dimensional (2D) projection of the 4D signal constellations in the inset of Fig. 6.1 [6]. We assume the constellations are normalized to have d²_min = 1, where d_min is the minimum distance among the constellation points. In the Fig. 6.1 inset, the 2D projections of the L256, H512, and W512 constellation points are shown by blue crosses, red dots, and green circles, respectively.

6.3 Four-Dimensional Set-Partitioning

The D4 lattice can be partitioned according to the following chain [77]:

D4 → Z^4 → √2 D4 → √2 Z^4 → 2D4 → · · · .

This chain describes the lattice isomorphisms obtained by the set-partitioning of the D4

lattice. Note that the Hurwitz and Welti constellations are subsets of the D4 lattice.

Starting from a normalized D4-based constellation, we illustrate the corresponding induced partitioning chain in Fig. 6.2. The first partitioning step decomposes the constellation into two 256-ary subsets, each a subset of the Z^4 lattice. This is shown in Fig. 6.2 to the left, with triangles and squares as constellation points. Note that we still have d²_min = 1 after the first step. The second step decomposes the sub-constellations into two 128-ary subsets, each a subset of the √2 D4 lattice. This is shown in Fig. 6.2 to the right, with hollow and solid triangles as sub-constellation points. Note, however, that we get d²_min = 2 after the second step.

It is easy to see that with every other step, the squared minimum distance among the

resulting constellation points doubles.


6.4 Concatenated Non-Binary FEC Architecture

The proposed concatenated FEC architecture is shown in Fig. 6.3. Here, the outer

encoder (ENCout) bits are parallelized into two streams: one going straight to the 4D

constellation mapper M, and the other going to the inner encoder (ENCin) first.

In one channel use, the constellation mapper M takes in b_c inner-encoded bits and b_u bypass bits to generate a symbol of the 4D constellation. The inner-encoded bits are

assigned to the less reliable bit channels and the bypass bits are assigned to the more

reliable bit channels. We model the linear regime of the optical fiber by the transmission

over a DP AWGN channel.

At the receiver, soft demapping is performed on the first b_c levels, the result of which is passed to the inner decoder (DECin). An HD on the inner-decoded symbols is then used, along with an HD on the channel observations, to demap the other b_u bit levels. The inner-code information bits and the demapped bypass bits are then passed to the outer HD decoder (DECout) via a parallel-to-serial converter.

We use a non-binary LDPC code as the inner code. Theoretically, with a non-binary

MLC approach the 4D constellation capacity is achievable. This, however, typically

comes at the expense of high decoding complexity of the non-binary codes [135, 136].

In this chapter, we modify the approach in Sec. 4.3 to obtain non-binary LDPC codes,

designed to minimize the decoding data flow. Furthermore, as in Sec. 4.3, we only inner-

encode a few bit levels, hence keeping the inner-code alphabet size small. While a number

of sub-optimal decoding algorithms aimed at lowering the decoding complexities of the

non-binary LDPC codes have been proposed [136–138], in this work we only consider the

conventional message-passing algorithm at the inner LDPC decoder [135,136].

We target an FEC scheme compatible with the 400ZR implementation agreement [4].

There, a DP-16QAM is deployed with an inner rate-119/128 code, encoding all bit-

levels, concatenated to an outer rate-239/255 code, for a combined transmission rate of

6.97 bit/symbol.

We consider signalling with the H512 and W512 constellations. With these 512-ary D4-based constellations, we use a non-binary inner code of rate R_in = 2.187/4, encoding b_c = 4 bit-levels, concatenated to an outer zipper code of rate R_out = 0.97, providing the same overall target rate of 6.97 bit/symbol.

For comparison, we also consider a similar architecture to be used with the L256

constellation (DP-16QAM). Here, an inner code of rate Rin = 2.187/3 encodes bc = 3 bit

levels and the other bu = 5 bit levels are protected only by the outer code. We use the

same outer code rate with this scheme as before.
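A quick check of the rate bookkeeping for these two configurations, under the reading that the overall rate equals R_out(b_u + R_in b_c) bits per 4D symbol:

```python
# Sanity check: overall rate = R_out * (b_u + R_in * b_c) bits per 4D symbol.
def overall_rate(r_out, r_in, b_c, b_u):
    return r_out * (b_u + r_in * b_c)

print(overall_rate(0.97, 2.187 / 4, b_c=4, b_u=5))  # H512 / W512 -> about 6.97
print(overall_rate(0.97, 2.187 / 3, b_c=3, b_u=5))  # L256 (DP-16QAM) -> about 6.97
```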

Figure 6.3: The proposed concatenated FEC architecture for DP transmission over the AWGN channel. The alphabet field sizes are denoted below their corresponding stages.

As shown in Fig. 6.1, at the target rate of 6.97 bit/symbol, a packing gain of 0.74 dB and an additional shaping gain of 0.24 dB are achievable by the H512 and W512 constellations, respectively, over the L256 constellation.

6.5 Ensemble Optimization

6.5.1 Empirical Density Evolution

We use a similar parameterization for the inner code as in Sec. 3.2.2. Here, we consider

ensembles with ν = 1 that have no uncoded symbols. In such an ensemble, all parity nodes are of degree 1 and every CN is connected to exactly one degree-1 VN. The edge-

perspective VN degree distribution that includes only edges connected to the information

nodes is then obtained as

λ_info(x) = (λ(x) − λ_1)/(1 − λ_1).

Similarly, the edge-perspective CN degree distribution that includes only edges connected

to the information nodes is obtained as

ρ_info(x) = R'_info(x)/R'_info(1),

where Rinfo(x) = R(x)/x.

Instead of an EXIT function analysis, here we use an empirical density evolution

approach in analyzing the ensemble. A similar approach has been used in [139,140]. With

this approach, we aim at understanding the belief propagation decoding of the non-binary

LDPC codes under the flooding schedule. Note that the non-binary LDPC inner code operates over a Galois field of size 2^{b_c}. We use both F_{2^{b_c}} and the set {α_0, α_1, . . . , α_{2^{b_c}−1}} to denote the code alphabet.

In describing the density evolution, we represent LLR messages as 2^{b_c}-dimensional vectors. We consider the passing of messages on n edges at each iteration of the empirical density evolution. Here, n allows for a trade-off between accuracy and computational complexity of the density evolution. Note that we only consider messages to or from the information nodes. We assign a value α ∈ {α_0, α_1, . . . , α_{2^{b_c}−1}} to each edge and obtain its channel sample accordingly. The message on such an edge, L_α(m), is obtained as

L_α(m) = ( ln[P(m = α_0)/P(m = α)], ln[P(m = α_1)/P(m = α)], . . . , ln[P(m = α_{2^{b_c}−1})/P(m = α)] ).   (6.1)

We generate the n CN messages at each iteration according to ρ_info(x). A degree-d_c CN first samples d_c − 2 messages at random from those coming from the VNs. Each of those messages is then permuted by a random non-zero element of the field F_{2^{b_c}}.

This permutation imitates the function of a non-zero entry in the parity-check matrix.

See [141, Sec. 2.2] for how elements of a message are permuted by an element of the field.

We also modify the assigned value of the message according to the multiplication in the

field arithmetic. The CN also samples, and permutes by a random non-zero element of

the field, a channel observation as the message coming from its (degree-1) parity node.

The CN then transforms the sampled messages into the Fourier domain and takes the inverse Fourier transform of the component-wise product of the resulting vectors to obtain the outgoing message vector. See [141, Sec. 2.2] for message computation details. The assigned value of the outgoing message is also computed according to the summation rule in the

field arithmetic. Note that without loss of generality, we may assume that the outgoing

message is not permuted by a field element. Initially, we set the messages from CNs to

all-zero vectors, with assigned values chosen uniformly at random from the alphabet.
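The CN operation described above can be sketched as follows in the probability domain: over GF(2^m), the Fourier transform of a probability vector is the Walsh-Hadamard transform (WHT), the incoming messages are combined by a component-wise product in that domain, and the result is transformed back. The sketch below omits the field-element permutations and works directly with probability vectors (the LLR vectors of (6.1) can be converted to this form); it is an illustration, not the exact routine of [141].

```python
import numpy as np

def wht(v):
    """Fast Walsh-Hadamard transform of a length-2^m vector."""
    v = np.array(v, dtype=float)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a = v[i:i + h].copy()
            b = v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v

def cn_update(prob_msgs):
    """Combine incoming probability vectors at a check node over GF(2^m)."""
    spectrum = np.ones_like(np.asarray(prob_msgs[0], dtype=float))
    for p in prob_msgs:
        spectrum *= wht(p)                  # component-wise product in the WHT domain
    out = wht(spectrum) / len(spectrum)     # inverse WHT = WHT / 2^m
    out = np.clip(out, 0.0, None)
    return out / out.sum()                  # renormalize to a probability vector

# Example over GF(4): two incoming length-4 probability vectors.
m1 = np.array([0.7, 0.1, 0.1, 0.1])
m2 = np.array([0.4, 0.3, 0.2, 0.1])
print(cn_update([m1, m2]))
```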

We generate n VN messages at each iteration according to λ_info(x). A degree-d_v VN first picks the value assigned to the outgoing message at random and samples a channel observation accordingly. Then, it samples d_v − 1 messages at random among those CN messages with the same assigned value. It adds up the sampled messages and the channel

sample to generate the outgoing message vector.

We remark that by keeping track of the assigned values of the messages we also

prevent any numerical issues in the empirical density evolution. In particular, in (6.1) we

compute the LLR values by normalizing the probabilities by that of the assigned value.

This probability is expected to be sufficiently large so as to avert any numerical issues.

In fact, in the process of obtaining the ensembles presented in the next section, we did

not observe any numerical instability. This process is explained next.


6.5.2 BER Analysis

After the last iteration, we generate the a posteriori LLR vectors at the information nodes according to the degree distribution of the information nodes. These vectors are generated similarly to the VN messages, but here a degree-d_v VN picks d_v messages at random, among those CN messages with the same true value, to obtain its a posteriori

vector. The maximum element of the vector then denotes the VN decoded value. For the a posteriori vectors of the parity nodes, the CN messages passed to the degree-1 VNs have to be considered. To this end, a degree-d_c CN has to pick d_c − 1 messages from the information

nodes to perform the CN operation. A parity node then uses a sample of these “special”

CN messages along with a channel observation sample to obtain its a posteriori vector.

Let P_info(α_i, α_j) be the probability that an information node's true value is α_i while its decoded value is α_j, for i, j ∈ {0, 1, . . . , 2^{b_c} − 1}. Similarly, let P_parity(α_i, α_j) be the probability that a parity node's true value is α_i while its decoded value is α_j. Further, we let P^MSB_info(α_i, α_j) and P^MSB_parity(α_i, α_j) be the BER in demapping the MSBs using an inner-encoded information node and parity node, respectively, that has true value α_i and decoded value α_j. We empirically estimate the values of P_info(α_i, α_j), P_parity(α_i, α_j), P^MSB_info(α_i, α_j), and P^MSB_parity(α_i, α_j) using the true values and the maximum a posteriori decoding of the VNs as described above.

The average BER in demapping the MSBs, denoted by P_MSB, is obtained as

P_MSB = R_in Σ_{i,j} P_info(α_i, α_j) P^MSB_info(α_i, α_j) + (1 − R_in) Σ_{i,j} P_parity(α_i, α_j) P^MSB_parity(α_i, α_j).

The average BER on the inner-decoded bits that are passed to the outer code is obtained as

P_info = (1/b_c) Σ_{i,j} P_info(α_i, α_j) B(α_i, α_j),

where B(α_i, α_j) is the Hamming distance between the b_c LSBs of the labels assigned to α_i and α_j by the mapping M. Finally, P_out, the BER on the bits passed to the outer decoder, is obtained as

P_out = [b_u/(b_u + R_in b_c)] P_MSB + [R_in b_c/(b_u + R_in b_c)] P_info.   (6.2)
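The bookkeeping of (6.2) is illustrated by the sketch below, which assumes the confusion probabilities and MSB error rates have already been estimated empirically; all inputs here are random placeholders and the labelling is hypothetical.

```python
import numpy as np

# Sketch of the BER bookkeeping in (6.2).  The confusion probabilities and the
# MSB error rates are placeholders; `labels` maps each field element to its
# b_c-bit label under the mapping M (also a placeholder labelling).
def outer_input_ber(P_info, P_parity, P_msb_info, P_msb_parity,
                    labels, r_in, b_c, b_u):
    q = len(labels)
    # Hamming distances B(alpha_i, alpha_j) between the b_c-bit labels.
    B = np.array([[bin(labels[i] ^ labels[j]).count("1") for j in range(q)]
                  for i in range(q)])
    p_msb = (r_in * np.sum(P_info * P_msb_info)
             + (1 - r_in) * np.sum(P_parity * P_msb_parity))
    p_info = np.sum(P_info * B) / b_c
    return (b_u * p_msb + r_in * b_c * p_info) / (b_u + r_in * b_c)

# Toy usage with q = 2^{b_c} = 16 field elements and random placeholders.
rng = np.random.default_rng(0)
q, b_c, b_u, r_in = 16, 4, 5, 2.187 / 4
P_info = rng.random((q, q)); P_info /= P_info.sum()
P_parity = rng.random((q, q)); P_parity /= P_parity.sum()
print(outer_input_ber(P_info, P_parity, rng.random((q, q)), rng.random((q, q)),
                      labels=list(range(q)), r_in=r_in, b_c=b_c, b_u=b_u))
```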

6.5.3 Differential Evolution

We use a method based on differential evolution, similar to those in Sec. 4.4.4 and

Sec. 5.4.3, to optimize the inner-code ensemble. Here, we fix a maximum number of

iterations and aim at finding an inner-code ensemble that can operate at the lowest chan-


nel SNR. Note that we only search among ensembles with ν = 1 that have no uncoded

symbols.

Similar to Sec. 5.4.3, we define the vector U_x to be the valid information-node degree distribution obtained from the vector x. Vector U_x is obtained as the solution of the following quadratic program:

minimize over U_x:   ‖U_x − x‖²,
subject to:          Σ_{i=0}^{D_v} U_x(i) = R_in,
                     U_x(1) = 0,
                     U_x(i) ≥ 0 for all i ∈ {1, . . . , D_v}.

Given an information-node degree distribution vector U, the score of U is defined as the minimum SNR at which P_out, as defined in (6.2), is below the target BER, P^t_out.

The differential evolution search is performed on a size-D_v vector x. Let the differential-evolution parameters g, S, β, and ξ be defined as in Sec. 5.4.3. On the population of each generation, the differential evolution carries out the following three steps, for s ∈ {1, . . . , S}:

1. Generate a mutation y_s = x_{i_1} + β(x_{i_2} − x_{i_3}), where i_1, i_2, i_3 are chosen uniformly at random, without replacement, from the set {1, . . . , S} \ {s}.

2. Generate a competitor vector ŷ_s whose i-th element, for i ∈ {1, . . . , D_v}, is found as

   ŷ_s(i) = y_s(i) with probability ξ, and ŷ_s(i) = x_s(i) otherwise.

3. Vector U_{ŷ_s} then replaces x_s in the next generation if and only if it has a better

(i.e., lower) score.

The differential evolution search is initialized with vectors U_{x_1}, . . . , U_{x_S}, where the vectors x_1, . . . , x_S are generated at random. After carrying out differential evolution over g generations, the algorithm outputs the vector with the best score in the last generation, which determines the optimized inner-code ensemble.
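The projection that maps a candidate vector x to a valid degree distribution U_x can be sketched as follows. The quadratic program above is a Euclidean projection onto a scaled simplex with one coordinate pinned to zero, so the classical sort-and-threshold solution applies; treating all remaining entries as nonnegativity-constrained is a simplifying assumption of this sketch.

```python
import numpy as np

# Sketch of the projection onto {U >= 0, sum(U) = R_in, U(1) = 0} via the
# standard sort-and-threshold simplex projection.
def project_degree_distribution(x, r_in, pinned=(1,)):
    x = np.asarray(x, dtype=float).copy()
    free = np.array([i not in pinned for i in range(len(x))])
    u = np.zeros_like(x)
    v = x[free]
    s = np.sort(v)[::-1]                                   # projection of v onto
    css = np.cumsum(s)                                     # {w >= 0, sum(w) = r_in}
    rho = np.nonzero(s - (css - r_in) / np.arange(1, len(s) + 1) > 0)[0][-1]
    theta = (css[rho] - r_in) / (rho + 1)
    u[free] = np.maximum(v - theta, 0.0)
    return u

# Toy usage with D_v = 8 (indices 0..8) and R_in = 2.187/4.
rng = np.random.default_rng(0)
u = project_degree_distribution(rng.normal(size=9), r_in=2.187 / 4)
print(u.round(3), round(float(u.sum()), 3))   # non-negative, sums to ~0.547
```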

Table 6.1: Bit-level capacities of the 4D constellations at their respective operating points.

Level   Labelling           d²_min   L256    H512    W512
  8     Pseudo-Gray           4      0.999   0.998   0.998
  7                                  0.999   0.997   0.997
  6                                  0.999   0.995   0.996
  5                                  0.998   0.994   0.994
  4                                  0.998   0.992   0.992
  3     Set Partitioning      2      0.904   0.817   0.830
  2                           2      0.857   0.746   0.762
  1                           1      0.433   0.279   0.359
  0                                    −     0.369   0.258

R_in (levels 4–8 uncoded): 2.187/b_c

6.6 Results

As mentioned in Sec. 6.4, we consider a rate-0.97 zipper code as the outer code. We set

the conservative BER threshold of 1.7 × 10^−3 for the outer code to enable a practical realization of our designs. We consider a frame length of 6000 4D symbols in the various code designs described below. This ensures the same number of information bits and noise

samples per frame in the FEC schemes.

We adopt a QC structure for the inner codes. We sample a binary base matrix

according to the obtained ensembles. We then lift the base matrix to obtain a binary parity-check matrix of girth at least 8. Then, we substitute the non-zero entries with random non-zero elements of the field to obtain the parity-check matrix for the inner

code. We assume floating-point sum-product message passing. All non-binary codes

reported here are optimized for and simulated with 10 inner-decoding iterations.

We obtain the FEC schemes with complexity-optimized non-binary inner codes for the

H512, W512, and L256 constellations. The FEC architecture we use with these constella-

tions is detailed in Sec. 6.4. For these constellations, we report the corresponding bit-level

capacities at their eventual operating points in Table 6.1. We use the set-partitioning

chain described in Sec. 6.3 to label the first few bit channels. We see, for example, that

after 4 steps, the remaining bit-levels of the H512 constellation are highly reliable. We

use a pseudo-Gray labelling [142] on these remaining bit-levels. We use a similar method

in labelling the W512 and L256 constellations.

We compare our designs to the other binary concatenated FEC schemes. We obtain

an MLC-based scheme with L256 constellation, with the labelling and the optimized inner

LDPC code as described in Sec. 4.3. We also consider the BICM-based scheme with L256

constellation and generic LDPC inner code as described in [6], and the two-stage BICM-

based scheme with H512 constellation and two generic inner LDPC codes also described in [6].

Figure 6.4: The BER on bits passed to the outer code. The constellation capacities are indicated by the vertical lines. The horizontal line denotes the outer-code threshold. Here, the solid curves denote the non-binary designs and the dotted and dashed curves denote their binary counterparts: TS-BICM indicates the performance of the two-stage BICM-based scheme of [6] and 1D-MLC indicates the performance of the scheme of Sec. 4.3.

In Fig. 6.4 we plot the BER on bits passed to the outer code versus the SNR for the

obtained FEC schemes. With the L256 constellation, the designed non-binary scheme

performs as well as the binary scheme designed as described in Sec. 4.3, both having

a gain of about 0.15 dB over the BICM scheme of [6]. With the H512 constellation,

the designed non-binary scheme performs better than the two-stage BICM-based scheme of [6] and provides a 0.62 dB gain over the designed non-binary scheme for the L256 constellation. With the W512 constellation, the designed scheme achieves an additional gain of about 0.25 dB over the scheme with H512, and provides a total gain of around 1 dB over the conventional BICM scheme of [6].

6.7 Conclusion

In this chapter, we have proposed a concatenated FEC scheme, consisting of an inner

complexity-optimized non-binary LDPC code and an outer zipper code, specifically de-

signed to take advantage of the 4D signalling in optical communication. We consider

signalling with D4-based constellations, i.e., Hurwitz and Welti constellations, that can

pack a higher density of points per unit of volume of space. We consider an MLC ar-

chitecture in the FEC design in which the non-binary inner code protects only a few bit


levels while the outer code cleans up the residual errors. We obtain concatenated codes

that can deliver on the packing and shaping gains of the 4D constellation, achieving a

total gain of 1 dB compared to the conventional designs.

The decoding complexity of non-binary LDPC codes grows with their alphabet size.

While in the codes we designed we have kept the inner-code alphabet size small, the obtained non-binary codes incur a higher decoding complexity than their binary counterparts. A complete performance-complexity tradeoff assessment of using non-binary inner LDPC codes in an MLC architecture, including the possible adoption of suboptimal

decoding methods, is a work in progress.

In this chapter, we utilized the four degrees of freedom provided by modulating the

signal on both the polarization and the quadratures of the electromagnetic field. We

have shown that with the effective use of non-binary codes and a dense 4D signal constellation we can deliver a substantial gain over the conventional designs. While the D4 lattice (from which we derived the constellations) is the densest lattice in four dimensions, there are denser lattices in higher dimensions [131] that we may exploit by modulating the

signal in time as well. For example, by considering two 4D channel uses we may deploy

constellations based on the E8 lattice, the densest lattice in eight dimensions. Exploring

possible advantages of using constellations based on lattices in higher dimensions is a

topic of future work.

Chapter 7

Low-Density Nonlinear-Check Codes

7.1 Introduction

In previous chapters, one constant in our FEC designs has been the use of an error-

reducing code (ERC). An ERC is formally defined as a code that converts a word received

at the output of a noisy channel to a different word that is the output of a less noisy

channel. While an ERC does not guarantee the correction of any complete error pattern

in a received word, it provides a probabilistic guarantee of the correction of a fraction of

the bits in error contained in the received word [143, Def. 1]. The fractional reduction

of BER naturally leads to code constructions consisting of the concatenation of an ERC

with an outer clean-up code, such as those we have designed in the previous chapters.

Recently, Roozbehani and Polyanskiy have introduced low-density majority-check

(LDMC) codes [144, 145]. These codes are nonlinear sparse-graph codes that are struc-

turally similar to LDGM codes but in which a CN, instead of checking the parity of the

VNs, indicates the value attained by the majority of the VNs connected to it. Over an

erasure channel, the authors report a graceful degradation of BER on decoded informa-

tion bits for LDMC codes, as the channel quality degrades. The authors also propose the

use of some majority-check nodes in the ensemble to improve the performance of LDGM

codes.

In this chapter, we introduce low-density nonlinear-check (LDNC) codes, a class

of binary sparse-graph codes that are structurally similar to LDGM codes but with a

generalized nonlinear operation at the CNs. We consider all possible CN functions that

are 1) symmetric with respect to their input, and 2) produce an entropy-1 check bit

when the input bits to the CN function (i.e., the information bits) have entropy 1. Note that LDMC codes are members of the class of LDNC codes. We derive a universal

sum-product message passing update rule for the CN functions and obtain an efficient

algorithm to compute those messages. We study LDNC codes over the AWGN channel and show that LDNC codes can be very effective ERCs.

Figure 7.1: The block diagram of a scheme that achieves the ERC Limit.

The rest of this chapter is organized as follows. In Sec. 7.2 we derive the equivalent

of a Shannon limit for an ERC. In Sec. 7.3 we set out the nonlinear CN operations that

we consider for LDNC codes, derive a universal update rule for them, and provide a

computationally efficient method for obtaining the CN outputs. In Sec. 7.4 we provide

examples of the error-reducing capabilities of LDNC codes. In Sec. 7.5 we provide further

discussions and concluding remarks.

7.2 ERC Limit and Nonlinear Codes

The information-theoretic limit associated with ERCs has been obtained in a number

of prior works including [146–148]. Here, we derive this well-known theoretical limit in

order to motivate, through its achievability scheme, the use of LDNC codes. We consider

data transmission over an AWGN channel and derive the maximum possible rate of

transmission when we target a certain BER at the receiver.

Theorem. Consider data transmission over an AWGN channel with σ denoting the noise

standard deviation and where the signal average power is normalized to 1. Let p denote

the maximum BER that is tolerated in the reconstructed bits at the receiver. Then, the

maximum rate of transmission, R∗, is given by

R* = C(σ)/D(p),   (7.1)

where C(σ) denotes the channel capacity for error-free transmission, and D(p) denotes

the minimum rate required in compressing a binary source with a maximum expected

Hamming distortion p.

Proof.

(a) Achievability: Consider transmission of k information bits, where k is a large number,

encoded by two codes, as shown in Fig. 7.1. Let the source code be a rate-D(p) code


that achieves the rate-distortion limit in compression of a binary source with a maximum

expected Hamming distortion p. The source encoder then outputs kD(p) bits. Let the

channel code be a rate-C(σ) code that achieves the Shannon limit [149]. The channel

encoder then outputs kD(p)/C(σ) bits. At the receiver, the channel decoder produces

kD(p) error-free bits. The source decoder then produces k bits with maximum expected

Hamming distortion p. With this scheme, we have a rate-C(σ)/D(p) transmission scheme

that achieves the desired maximum distortion. Note that with the Hamming distortion measure we have D(p) = 1 − H_b(p), where H_b(p) is the binary entropy function.

(b) Converse: Consider a rate-R binary ERC that provides a BER ≤ p at its decoder

output. Let Cp be a capacity-achieving code over the binary symmetric channel with

cross-over probability p. The code Cp then has rate 1−Hb(p) = D(p). Now consider pre-

coding the source with Cp. At the receiver, the Cp decoder then recovers the source bits

error free. The concatenation of Cp and the ERC then gives a code of rate D(p)R that

provides error-free transmission. By Shannon’s theorem we must have D(p)R ≤ C(σ),

and therefore R ≤ C(σ)/D(p).

Corollary. The maximum noise standard deviation, σ, that a rate-R binary ERC can

operate in, when targeting a BER ≤ p at its decoder output, is given by

σ* = C^{−1}((1 − H_b(p))R).

Proof. From (7.1) we must have D(p)R ≤ C(σ). With a Hamming distortion metric we have D(p) = 1 − H_b(p). Since C(σ) is a monotonically decreasing function, we have σ ≤ C^{−1}((1 − H_b(p))R).

Note that while the binary ERC limit is derived for the AWGN channel, a similar

limit can be derived for any channel for which the capacity is known.
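As a numeric illustration of the corollary, the sketch below computes σ* for a given rate and target BER, taking C(σ) to be the binary-input AWGN capacity (unit signal power) estimated by Monte Carlo; this choice of channel model, and the use of scipy's root finder, are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import brentq

def hb(p):
    """Binary entropy function H_b(p)."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bi_awgn_capacity(sigma, n=200_000, seed=0):
    """Monte-Carlo estimate of the binary-input AWGN capacity (unit signal power)."""
    rng = np.random.default_rng(seed)
    y = 1.0 + sigma * rng.standard_normal(n)          # transmit +1 WLOG
    llr = 2.0 * y / sigma**2
    return 1.0 - np.mean(np.log2(1.0 + np.exp(-llr)))

def sigma_star(rate, p):
    """sigma* = C^{-1}((1 - H_b(p)) R), solved numerically."""
    target = (1.0 - hb(p)) * rate
    return brentq(lambda s: bi_awgn_capacity(s) - target, 0.05, 10.0)

# Example: a rate-7/11 ERC targeting an output BER of 2%.
print(round(sigma_star(rate=7 / 11, p=0.02), 3))
```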

The achievability scheme shown in Fig. 7.1 uses a lossy source code that operates at

the rate-distortion limit, concatenated with a channel code that operates at the Shannon

limit. It is argued in [150] that while linear channel encoders can achieve the Shannon

limit in discrete channels with additive noise, linear lossy source encoders cannot ap-

proach the rate-distortion limit. In fact, nonlinear encoders with sparse graph structures

have been proposed, e.g, in [151–154], that can approach the rate-distortion limit. Such

codes can have linear or nonlinear codebooks. Next, we study codes that are structurally

similar to LDGM codes but with generalized nonlinear encoding operation at their CNs.

Figure 7.2: Factor graph representation of an LDNC ensemble. Information- and check-node degrees are denoted by d_v and d_c, respectively. Here, a degree-d_c CN is connected to d_c information nodes. The rectangle labelled Π represents an edge permutation.

7.3 LDNC Codes

7.3.1 Code Description

An LDNC code can be represented by a factor graph consisting of VNs, CNs, and edges.

Fig. 7.2 shows the factor graph of an LDNC code, in which circles and squares represent

VNs and (possibly nonlinear) CNs, respectively. The VNs are partitioned into k infor-

mation nodes (bottom) and m check bits (top). In an LDNC Tanner graph, a CN is

always connected to a degree-1 check bit, as shown in Fig. 7.2, so the number of CNs is

equal to m. Throughout this chapter, we do not count the check bits when denoting the

CN degrees.

Given k, m, and the degree of every node, an LDNC ensemble is uniquely determined

by the edge connections between CNs and information nodes. Let d_v and d_c denote the average information-node and CN degrees, respectively. Note that in a valid ensemble we have m/k = d_v/d_c. The rate of the LDNC ensemble is given by

R = k/(k + m) = d_c/(d_c + d_v).   (7.2)

7.3.2 Encoding

Consider a CN of degree d_c. As shown in Fig. 7.3, since the check bit of the CN is of degree 1, the check bit can easily be computed by the CN function given the information bits. The check function of a degree-d_c CN can be any function with domain {0, 1}^{d_c} and codomain {0, 1}. We call a check function g suitable if it satisfies the following two

conditions:

Figure 7.3: Encoding operation at a nonlinear check node.

1. Function g should be symmetric with respect to its input, meaning

   g(x_1, x_2, . . . , x_{d_c}) = g(π(x_1, x_2, . . . , x_{d_c})),

   for any permutation π. It is easy to see that any symmetric check function factors through the input weight, meaning

   g(x_1, x_2, . . . , x_{d_c}) = g(wt(x)),

   where wt(x) denotes the Hamming weight of the vector x = (x_1, x_2, . . . , x_{d_c}).

2. The check bit has unit entropy, meaning that g(wt(x)) is equally likely to be 0 or

1, when the input bits are picked uniformly at random from {0, 1}. Note that we send

all information bits and check bits over the channel and this condition ensures that

we make the best use of the channel use when we send the check bit.

Note that, for any dc > 1, the parity-check function is a linear suitable function. When

d_c is odd, and also for some even values of d_c,¹ there are nonlinear suitable functions that

we may consider as the CN function. The problem of finding suitable check functions is

similar to that of bisecting binomial coefficients studied in [155, 156]. In [155, Table 1],

the authors list the number of possible bisections for 1 ≤ dc ≤ 51.

In Table 7.1, we list all the distinct check functions that exist for dc = 7. Note

that in a check function, by flipping all the output bits we get operationally the same

check function, as the channel is assumed to be symmetric. In Table 7.1, without loss of generality, we have assumed that g(0) = 0, i.e., the output bit for input weight 0 is always 0. Here, g_1(wt(x)) is in fact

the parity-check function, and g8(wt(x)) is the majority-check function.

¹ Some small and even d_c values for which there exist nonlinear check functions are 8, 14, 20, 24, and 26.

Table 7.1: List of all distinct check functions for d_c = 7.

wt(x)   0  1  2  3  4  5  6  7
g_1     0  1  0  1  0  1  0  1
g_2     0  0  1  0  1  0  1  1
g_3     0  1  1  0  1  0  0  1
g_4     0  0  0  1  0  1  1  1
g_5     0  1  0  0  1  1  0  1
g_6     0  0  1  1  0  0  1  1
g_7     0  1  1  1  0  0  0  1
g_8     0  0  0  0  1  1  1  1
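Suitable check functions for a given d_c can be enumerated directly: with g(0) = 0 fixed, a symmetric function of the weight is suitable exactly when the binomial coefficients of the weights mapped to 1 sum to 2^{d_c−1}. The sketch below reproduces, up to ordering, the eight functions of Table 7.1.

```python
from itertools import combinations
from math import comb

# Enumerate suitable check functions for degree dc: pick the set of weights
# mapped to 1 (g(0) = 0 fixed) such that the covered inputs are exactly half
# of {0,1}^dc, i.e. the selected binomial coefficients sum to 2^(dc-1).
def suitable_check_functions(dc):
    weights = list(range(dc + 1))
    target = 1 << (dc - 1)
    funcs = []
    for r in range(1, dc + 1):
        for ones in combinations(weights[1:], r):
            if sum(comb(dc, w) for w in ones) == target:
                funcs.append(tuple(1 if w in ones else 0 for w in weights))
    return funcs

for g in suitable_check_functions(7):
    print(g)
print(len(suitable_check_functions(7)))   # -> 8, as in Table 7.1
```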

7.3.3 Message-Passing Decoding

We use a factor graph model to describe decoding of an LDNC code. Fig. 7.4 shows a

factor graph fragment with a check node of degree d_c. In a valid configuration, the top node, denoted by y, is required to satisfy y = g(wt(x_1, . . . , x_{d_c})), where y, x_1, . . . , x_{d_c} ∈ {0, 1}.

We derive here the sum-product update rule for the message to be sent to x_{d_c}, given the messages µ_1, . . . , µ_{d_c−1} and µ_y. A message is a function from {0, 1} to ℝ, or, equivalently, a pair µ_i = (µ_i(0), µ_i(1)) of real numbers. The messages µ_i(0) and µ_i(1) are assumed to be equal to the probability of x_i being 0 and 1, respectively. Accordingly, µ_i(0) ≥ 0, µ_i(1) ≥ 0, and µ_i(0) + µ_i(1) = 1. The same is true for y and µ_y. We aim to compute the message ν_{d_c} sent from the CN to node x_{d_c}; messages to be sent to the other nodes are defined in similar fashion.

The general sum-product update rule [46] for the message sent from a factor node associated with a function f(x, x_1, . . . , x_m) to a variable node x, given received messages µ_i(x_i), i = 1, . . . , m, is proportional to Σ_{x_1,...,x_m} Π_{i=1}^{m} µ_i(x_i) f(x, x_1, . . . , x_m), meaning

that for some constant ζ,

ν(x) = ζ Σ_{x_1,...,x_m} Π_{i=1}^{m} µ_i(x_i) f(x, x_1, . . . , x_m).

Figure 7.4: A typical CN of degree d_c. Node y is set to denote the function the CN performs on the information nodes.

Associated with a CN of degree d_c + 1 is the indicator function

f(x_1, . . . , x_{d_c}, y) = I_{g(wt(x_1,...,x_{d_c})) = y} = 1 if y = g(wt(x_1, . . . , x_{d_c})), and 0 otherwise.

The message ν_{d_c} can be determined from ν_{d_c}(0) − ν_{d_c}(1). For ν_{d_c}(0) we have

ν_{d_c}(0) = ζ_{d_c} Σ_{y, x_1,...,x_{d_c−1}} f(x_1, . . . , x_{d_c−1}, 0, y) µ_y(y) Π_{j=1}^{d_c−1} µ_j(x_j)
           = ζ_{d_c} µ_y(0) Σ_{x_1,...,x_{d_c−1}} f(x_1, . . . , x_{d_c−1}, 0, 0) Π_{j=1}^{d_c−1} µ_j(x_j)
             + ζ_{d_c} µ_y(1) Σ_{x_1,...,x_{d_c−1}} f(x_1, . . . , x_{d_c−1}, 0, 1) Π_{j=1}^{d_c−1} µ_j(x_j)
           = ζ_{d_c} µ_y(0) Σ_{x_1,...,x_{d_c−1}: g(wt(x_1,...,x_{d_c−1},0))=0} Π_{j=1}^{d_c−1} µ_j(x_j)
             + ζ_{d_c} µ_y(1) Σ_{x_1,...,x_{d_c−1}: g(wt(x_1,...,x_{d_c−1},0))=1} Π_{j=1}^{d_c−1} µ_j(x_j).   (7.3)

Similarly,

ν_{d_c}(1) = ζ_{d_c} µ_y(0) Σ_{x_1,...,x_{d_c−1}: g(wt(x_1,...,x_{d_c−1},1))=0} Π_{j=1}^{d_c−1} µ_j(x_j)
             + ζ_{d_c} µ_y(1) Σ_{x_1,...,x_{d_c−1}: g(wt(x_1,...,x_{d_c−1},1))=1} Π_{j=1}^{d_c−1} µ_j(x_j).   (7.4)

Note that ζ_{d_c} can be computed from 1 = ν_{d_c}(0) + ν_{d_c}(1). Note that since the update rule is symmetric among the x_i's, the message can be derived similarly for i ∈ {1, 2, . . . , d_c − 1}.


7.3.4 Efficient Message Computation

Let p_i(x) = µ_i(0) + µ_i(1)x, for i ∈ {1, 2, . . . , d_c}, and let

q(x) = Π_{i=1}^{d_c} p_i(x).

Now let

q_i(x) = q(x)/p_i(x) = q_{i,0} + q_{i,1}x + · · · + q_{i,d_c−1}x^{d_c−1}.

Then, ν_i(0) and ν_i(1) can be written from (7.3) and (7.4) as

ν_i(0) = ζ_i µ_y(0) Σ_{j: g(j)=0} q_{i,j} + ζ_i µ_y(1) Σ_{j: g(j)=1} q_{i,j},
ν_i(1) = ζ_i µ_y(0) Σ_{j: g(j+1)=0} q_{i,j} + ζ_i µ_y(1) Σ_{j: g(j+1)=1} q_{i,j}.   (7.5)

The key takeaway here is that by obtaining the coefficients of q(x) we are able to compute

the qi(x)’s and consequently the νi(x) messages.

Figure 7.5: Binary computation tree for obtaining q(x) when d_c = 7. The messages are passed from the bottom up.

Figure 7.6: BER curves in the regular ensemble with d_v = 4 and d_c = 7 with various check functions, plotted versus the number of decoding iterations. The codes are simulated at 0.5 dB above their (error-free) constrained Shannon limit.

We obtain q(x) using a binary computation tree, as illustrated for d_c = 7 in Fig. 7.5. The algorithm runs as a message-passing algorithm and the messages are polynomials. The messages are passed upwards starting from the leaf nodes, the i-th of which sends p_i(x). Interior nodes in the tree perform the polynomial multiplication of the incoming messages, passing the result further up. Let m = ⌈log₂ d_c⌉ be the depth of the tree. At depth L in the tree, each node must multiply two polynomials of degree at most 2^{m−L−1}. Using fast Fourier transform techniques, this computation can be accomplished with O((m − L − 1)2^{m−L−1}) operations. Since there are at most 2^L nodes at depth L, the total number of operations needed at depth L is at most O(2^L (m − L − 1)2^{m−L−1}) = O((m − L − 1)2^{m−1}). Summing over all levels L (from 0 to m − 1) where computation is performed results in O(2^m m²) = O(d_c log₂² d_c) operations.

Each q_i(x), in the worst case, can be obtained from q(x) by a polynomial division with O(d_c) operations. The update messages can therefore be obtained in parallel, each with O(d_c) operations, using (7.5).
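The message computation of this section can be sketched as follows, using numpy's polynomial helpers in place of an explicit FFT-based product tree (sufficient for small d_c): form q(x), recover each q_i(x) by polynomial division, and evaluate (7.5).

```python
import numpy as np

def cn_messages(mu, mu_y, g):
    """mu: list of d_c incoming pairs (mu_i(0), mu_i(1)); mu_y: (mu_y(0), mu_y(1));
    g: length-(d_c+1) tuple with g[w] the check output for input weight w.
    Returns the outgoing messages (nu_i(0), nu_i(1))."""
    dc = len(mu)
    polys = [np.array([m0, m1], dtype=float) for (m0, m1) in mu]   # p_i(x)
    q = np.array([1.0])
    for p in polys:
        q = np.convolve(q, p)                                      # q(x)
    out = []
    for p in polys:
        qi, _ = np.polydiv(q[::-1], p[::-1])                       # q_i(x) = q(x)/p_i(x)
        qi = qi[::-1]                                              # ascending coefficient order
        nu0 = sum(mu_y[g[j]] * qi[j] for j in range(dc))           # (7.5), x_i = 0
        nu1 = sum(mu_y[g[j + 1]] * qi[j] for j in range(dc))       # (7.5), x_i = 1
        z = nu0 + nu1
        out.append((nu0 / z, nu1 / z))
    return out

# Toy usage: d_c = 7 with the majority-check function g_8 of Table 7.1.
g8 = (0, 0, 0, 0, 1, 1, 1, 1)
mu = [(0.8, 0.2)] * 7
print(cn_messages(mu, mu_y=(0.3, 0.7), g=g8)[0])
```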

7.4 Error-Reducing Performance Results

We study LDNC codes by examining the behaviour of their information BER curves. For illustration purposes, we consider regular codes, i.e., codes in which every information node has the same degree and every CN has the same degree. In particular, we consider a regular LDNC code with information degree d_v = 4 and CN degree d_c = 7. We consider signalling using the binary alphabet {+1, −1} over an AWGN channel. Note


that we may not assume transmission of an all-zero codeword.

Figure 7.7: BER curves of regular d_v = 4, d_c = 7 LDNC ensembles with three check functions, plotted over a wide range of SNRs. All decoders perform 4 decoding iterations. The error-free constrained Shannon limit is at 1.92 dB SNR.

In Fig. 7.6 we consider all suitable check functions for d_c = 7 described in Table 7.1, and plot the BER versus the number of decoding iterations. Here, we have obtained QC codes of girth 10 and length ∼100,000 and simulated the codes at a 0.5 dB gap to the (error-free) constrained Shannon limit. We have observed the following behaviours in the LDNC

ensembles:

• The nonlinear check functions yield a relatively high fixed-point. A fixed-point is an

information-bit error-rate at which the iterative decoder stops and cannot reduce

the error-rate anymore.

• The nonlinear check functions exhibit most of their gains in the first few iterations,

and the following iterations provide little improvement.

• Two of the nonlinear check functions yield the same BER curve (g_5 and g_6). This means there are further symmetries that can be factored out of the number of distinct

suitable functions available for a particular CN degree.

• There are points, especially with fewer decoding iterations, where the nonlinear

check functions perform better than the linear (parity check) functions.

To understand the performance of the LDNC codes, we picked three check functions and simulated a regular d_v = 4, d_c = 7 ensemble over a wide SNR range. We fixed the number of decoding iterations to 4 (low complexity). In Fig. 7.7 we plot the BER curves of these

LDNC ensembles versus the SNR. Note that the constrained Shannon limit is at 1.92 dB


SNR. As can be seen from Fig. 7.7, at low SNRs, the ensembles with nonlinear check functions (g_5 and g_8) perform better than the one with the parity-check function, i.e., g_1.

Another notable observation is that the ensemble with check function g5 performs better

than that with g8, the majority function proposed in [144,145], in a wide range of SNRs.

Due to the high fixed-point of ensembles with nonlinear check functions, they would

not be good candidates to consider in code design for optical communication. For other

applications, however, it is possible to optimize these codes using a differential evolution

method as described in previous chapters. We remark that in studying ensembles with

nonlinear check functions, we observe that the EXIT functions are not uni-parametric and

therefore a method based on a Monte Carlo simulation, similar to Sec. 6.5, should be

deployed to optimize these codes.

7.5 Conclusion

In this chapter, we studied the performance of LDNC codes, a class of error-reducing

codes, for the binary-input AWGN channel. We derived the rules for the sum-product

message-passing decoder of LDNC codes and obtained an efficient algorithm for message

computation. We analyzed the error-reduction behaviour of LDNC decoders for regular

LDNC codes for various nonlinear check functions. We observed that in certain regimes,

codes with nonlinear check functions can perform better than codes with parity-check

functions.

While we observed an interesting error-reduction behaviour in LDNC ensembles, it

remains to be seen how a forward error-correction scheme consisting of an LDNC inner

code and an outer clean-up code compares with existing designs. It would be of interest

to compare the performance-complexity trade-off curve of such a scheme with the schemes of Chapter 3.

Chapter 8

Conclusion and Topics of Future Research

In this work we developed tools and methods to obtain low-complexity concatenated

FEC schemes for applications with high throughput, as needed, for example, in optical

communication. We characterized and compared the performance-complexity tradeoffs

in various FEC schemes and modulation formats.

We proposed a decoder architecture consisting of an inner, error-reducing, LDPC

code, concatenated to an outer staircase or zipper code. We showed that with this

scheme, we may rely on the outer, algebraic code for the bulk of the error-correction, and

task the inner SD code only with reducing the BER to below the outer code threshold.

The outer code can then bring the BER to below 10^−15, as required in OTNs, with very

low complexity.

Accordingly, we developed methods to optimize the FEC scheme by minimizing the

estimated data-flow at the inner code, for various choices of the outer code. An interesting

feature that emerges from the inner-code optimization is that a fraction of symbols are

better left uncoded, and only protected by the outer code. We considered a QC structure

for the inner codes in our design to realize a pragmatic and energy-efficient hardware

implementation.

We extended the code design method to FEC schemes with higher order modula-

tion. We obtained complexity-optimized MLC schemes and complexity-optimized BICM

schemes and made a fair comparison between them via their respective Pareto frontiers.

We showed that by a clever design, the MLC scheme can provide significant advantages

relative to the BICM scheme over the entire performance-complexity tradeoff space.

For binary modulation, the obtained FEC schemes provided up to 71% reduction

in complexity or up to 0.4 dB gain compared to the existing designs. For higher order


modulation, via the designed MLC scheme, the obtained codes provided up to 60%

reduction in complexity or up to 0.7 dB gain compared to existing designs.

We also designed a multi-rate and channel-adaptive LDPC code architecture. We

then developed a tool to optimize a low-complexity rate- and channel-configurable FEC

scheme via an MLC approach. We reported up to 63% reduction in decoding complexity,

or up to 0.6 dB gain compared to existing flexible FEC schemes.

To achieve even further performance improvements, we adapted our tools to design

complexity-optimized non-binary LDPC codes to concatenate with outer zipper codes in

an FEC scheme via the MLC approach. We considered 4D signal constellations that are

denser than their 2D counterparts and obtained clever labellings for them. We obtained

gains of up to 1 dB over the conventional schemes.

Based on this work, there are various worthwhile topics that are left for future study.

We discuss some of these below.

• In Chapter 3 we showed that with a layered schedule in decoding the LDPC in-

ner codes, we can significantly reduce the decoding complexity. Note that those

codes were not designed to be decoded with a layered schedule. Therefore, we can-

not make any claim of optimality for those codes when decoded under a layered

schedule. We may, however, be able to obtain inner-code ensembles, designed to be

decoded with a layered schedule, using, for example, a differential evolution method

similar to that in Sec. 6.5.3.

• In the MLC structure we considered in Chapters 4–6, the demodulator demaps the

MSBs given the inner decoded LSBs. Only hard-decision feedback is provided, re-

sulting in a low-complexity implementation. The BICM schemes we considered, on

the other hand, are deprived of such feedback. If greater complexity were permis-

sible, it is known that soft-decision feedback from the decoder to the demodulator

can significantly improve the performance of a BICM scheme [157]. Code design in the presence of such feedback is a topic for future study.

• We also remark that in the MLC structure we considered in Chapters 4–6 we

have not considered the effects of a possible mismatch between the actual channel

parameters and those for which the codes are designed. It is known that MLC

schemes are generally more susceptible than BICM schemes to such mismatches.

We leave the investigation of the effect of channel-parameter mismatch on MLC and BICM code

performance as a topic for future study.

• We pointed out in Chapter 6 that the D4 lattice, from which we obtained the signal constellation, is in fact isomorphic to the lattice obtained by applying a single-parity-check code to the Z^4 integer lattice. The next logical lattice from which we can draw the signal constellation, the E8 lattice, is isomorphic to the lattice obtained by applying the length-8 extended Hamming code to the Z^8 integer lattice. Exploring possible

advantages of signal constellations obtained in this way is another topic of future

work.

• We observed that, with a clever constellation labelling, we can obtain some highly reliable bit-levels (see, e.g., Table 6.1; a bit-level reliability sketch follows this list). Such bit-levels may then bypass the inner SD code and be protected only by the outer HD code, at a much lower decoding complexity and without sacrificing too much performance. Conversely, in multi-dimensional signal constellations and with a clever labelling, we may also be able to obtain some bit-levels with very low reliability. Such bit-levels may then be left completely uncoded (frozen), again without sacrificing too much performance. Designing such signal constellations and labellings is another topic of future work.

• In Chapter 7 we considered only nonlinear check operations that generate an entropy-1 check bit to be sent over the symmetric channel (a check-bit entropy sketch follows this list). This constraint can be relaxed in order to explore a larger space of check functions and their error-reducing performance. As suggested by the external examiner, Prof. Alexandre Graell i Amat, by controlling the check-bit entropy, LDNC codes could also be considered for probabilistic shaping of the signal constellation. Investigating these ideas and other possible applications of LDNC codes is a topic of future work.
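To make the differential-evolution suggestion in the first bullet concrete, the following minimal Python sketch uses SciPy's differential_evolution to search over a variable-node degree distribution. The objective evaluate_ensemble is a hypothetical placeholder for a layered-schedule density-evolution check combined with a data-flow estimate; its internals (the degree bound, the convergence test, and the complexity proxy) are illustrative assumptions only, not the optimization tool developed in this thesis.

# Minimal differential-evolution sketch over an edge-perspective variable-node
# degree distribution lambda_2, ..., lambda_16.  The objective is a placeholder:
# a real design would substitute a layered-schedule EXIT/density-evolution test
# and the estimated decoder data-flow.
import numpy as np
from scipy.optimize import differential_evolution

MAX_VN_DEGREE = 16

def evaluate_ensemble(lam_raw):
    lam = np.abs(lam_raw) + 1e-12
    lam = lam / lam.sum()                        # normalize the distribution
    degrees = np.arange(2, MAX_VN_DEGREE + 1)
    avg_degree = 1.0 / np.sum(lam / degrees)     # average variable-node degree
    complexity_proxy = avg_degree                # stand-in for estimated data-flow
    converges = avg_degree > 3.0                 # stand-in for a convergence test
    return complexity_proxy if converges else complexity_proxy + 1e3

result = differential_evolution(evaluate_ensemble,
                                bounds=[(0.0, 1.0)] * (MAX_VN_DEGREE - 1),
                                maxiter=200, tol=1e-6, seed=1)
print("proxy complexity of best ensemble:", result.fun)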
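The hard-decision feedback used by the MLC demapper (second bullet) can be illustrated with a toy one-dimensional example. The Python sketch below computes the MSB log-likelihood ratio of a Gray-labelled 4-PAM symbol over an AWGN channel, once by marginalizing over both LSB values (no feedback, as in BICM) and once conditioned on a decoded LSB fed back as a hard decision; the labelling, noise variance, and channel output are illustrative assumptions, not the exact setup of Chapters 4–6.

# Toy MSB demapping for 4-PAM with and without hard-decision LSB feedback.
# Assumed Gray labelling (msb, lsb): 00 -> -3, 01 -> -1, 11 -> +1, 10 -> +3.
import numpy as np

POINTS = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): +1.0, (1, 0): +3.0}

def llr_msb(y, sigma2, lsb_decision=None):
    """LLR of the MSB given channel output y; condition on the fed-back LSB
    if lsb_decision is given, otherwise marginalize over both LSB values."""
    def likelihood(msb):
        lsbs = (0, 1) if lsb_decision is None else (lsb_decision,)
        return sum(np.exp(-(y - POINTS[(msb, b)]) ** 2 / (2 * sigma2)) for b in lsbs)
    return float(np.log(likelihood(0) / likelihood(1)))

y, sigma2 = 0.4, 0.5
print("no feedback     :", llr_msb(y, sigma2))
print("LSB = 1 fed back:", llr_msb(y, sigma2, lsb_decision=1))

Conditioning on the fed-back LSB restricts the demapper to two candidate constellation points, which is what keeps the MLC feedback path low-complexity.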
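The lattice statements in the bullet on the D4 and E8 lattices can be checked numerically through Construction A, in which the lattice consists of all integer vectors whose component-wise mod-2 reduction is a codeword. The Python sketch below is an illustration only: it applies the [8,4,4] extended Hamming code (generated here in its Reed–Muller RM(1,3) form) to the Z8 integer lattice and recovers minimum squared norm 4 with 240 minimal vectors, matching a scaled E8 lattice, and it recovers the 24 minimal vectors of D4 from the length-4 single-parity-check code.

# Construction A: integer vectors whose mod-2 reduction is a codeword of the
# [8,4,4] extended Hamming code form a lattice isometric to a scaled E8.
from itertools import product

# Generator matrix of RM(1,3), which is equivalent to the extended Hamming code.
G = [(1, 1, 1, 1, 1, 1, 1, 1),
     (0, 0, 0, 0, 1, 1, 1, 1),
     (0, 0, 1, 1, 0, 0, 1, 1),
     (0, 1, 0, 1, 0, 1, 0, 1)]
codewords = {tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))
             for msg in product((0, 1), repeat=4)}

# Entries in {-2, ..., 2} suffice to capture every minimal lattice vector.
norms = [sum(v * v for v in x)
         for x in product(range(-2, 3), repeat=8)
         if any(x) and tuple(v % 2 for v in x) in codewords]
print(min(norms), norms.count(min(norms)))            # 4 240  (E8 values)

# The same idea in dimension 4 with the single-parity-check code (even
# coordinate sum) gives D4; entries in {-1, 0, 1} suffice for its minimal vectors.
d4_norms = [sum(v * v for v in x)
            for x in product(range(-1, 2), repeat=4)
            if any(x) and sum(x) % 2 == 0]
print(min(d4_norms), d4_norms.count(min(d4_norms)))   # 2 24   (D4 values)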
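To illustrate how a labelling creates bit-levels of unequal reliability, the effect summarized in Table 6.1, the following Monte-Carlo Python sketch estimates per-bit-level error rates for Gray-labelled 16-QAM over an AWGN channel. The constellation, labelling, SNR, and sample size are illustrative assumptions rather than the constellations studied in Chapter 6; the two sign bit-levels (b0 and b2) come out noticeably more reliable than the two magnitude bit-levels.

# Per-bit-level hard-decision error rates of Gray-labelled 16-QAM over AWGN.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
pam = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): +1.0, (1, 0): +3.0}   # Gray 4-PAM
labels = list(product((0, 1), repeat=4))                         # (b0, b1, b2, b3)
points = np.array([[pam[(b0, b1)], pam[(b2, b3)]] for b0, b1, b2, b3 in labels])
bits = np.array(labels)

n, snr_db = 100_000, 12.0
es = np.mean(np.sum(points ** 2, axis=1))        # average symbol energy
sigma2 = es / (2 * 10 ** (snr_db / 10))          # noise variance per real dimension

tx = rng.integers(len(points), size=n)
y = points[tx] + rng.normal(scale=np.sqrt(sigma2), size=(n, 2))
det = np.argmin(((y[:, None, :] - points[None, :, :]) ** 2).sum(axis=2), axis=1)
print((bits[det] != bits[tx]).mean(axis=0))      # one error rate per bit-level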
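The entropy-1 constraint on the nonlinear checks mentioned in the last bullet can be illustrated for symmetric Boolean check functions: with d i.i.d. uniform input bits, a check that outputs 1 exactly when the input Hamming weight lies in a set W produces a uniformly distributed (entropy-1) check bit precisely when the selected binomial coefficients sum to 2^(d-1), i.e., when W bisects the binomial coefficients in the sense studied in [155], [156]. The Python sketch below is an illustration with assumed d = 5 examples, not the check functions of Chapter 7; relaxing the constraint simply means admitting sets W whose check-bit entropy falls below 1.

# Entropy of the check bit f(x) = 1 iff wt(x) in W, for d i.i.d. uniform bits.
# H equals 1 bit exactly when sum_{w in W} C(d, w) = 2^(d-1).
from math import comb, log2

def check_bit_entropy(d, W):
    p1 = sum(comb(d, w) for w in W) / 2 ** d     # P(check bit = 1)
    if p1 in (0.0, 1.0):
        return 0.0
    return -p1 * log2(p1) - (1 - p1) * log2(1 - p1)

d = 5
candidates = {
    "XOR, W = {1, 3, 5}":      {1, 3, 5},   # linear check: always entropy 1
    "majority, W = {3, 4, 5}": {3, 4, 5},   # nonlinear, balanced for odd d
    "W = {0, 3, 4}":           {0, 3, 4},   # nonlinear bisection: 1 + 10 + 5 = 16
    "threshold, W = {4, 5}":   {4, 5},      # unbalanced: entropy < 1
}
for name, W in candidates.items():
    print(f"{name:26s} H = {check_bit_entropy(d, W):.4f} bits")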

Bibliography

[1] L. M. Zhang and F. R. Kschischang, “Low-complexity soft-decision concatenated

LDGM-staircase FEC for high bit-rate fiber-optic communication,” J. Lightw.

Technol., vol. 35, no. 18, pp. 3991–3999, Sep. 2017.

[2] A. Bisplinghoff, S. Langenbach, and T. Kupfer, “Low-power, phase-slip tolerant,

multilevel coding for M-QAM,” J. Lightw. Technol., vol. 35, no. 4, pp. 1006–1014,

Feb. 2017.

[3] Y. Koganei, T. Oyama, K. Sugitani, H. Nakashima, and T. Hoshida, “Multilevel

coding with spatially coupled repeat-accumulate codes for high-order QAM optical

transmission,” J. Lightw. Technol., vol. 37, no. 2, pp. 486–492, Jan. 2019.

[4] Optical Internetworking Forum, “Implementation agreement 400ZR,” OIF-400ZR-

01.0, 2020.

[5] T. Mehmood, M. P. Yankov, S. Iqbal, and S. Forchhammer, “Flexible

multilevel coding with concatenated polar-staircase codes for M-QAM,”

IEEE Trans. Commun., 2020, early access version. [Online]. Available:

https://doi.org/10.1109/TCOMM.2020.3038185

[6] F. Frey, S. Stern, J. K. Fischer, and R. F. H. Fischer, “Two-stage coded modulation

for Hurwitz constellations in fiber-optical communications,” J. Lightw. Technol.,

vol. 38, no. 12, pp. 3135–3146, Jun. 2020.

[7] I. B. Djordjevic, “On advanced FEC and coded modulation for ultra-high-speed

optical transmission,” IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 1920–1951,

Jul. 2016.

[8] A. Alvarado, G. Liga, T. Fehenberger, and L. Schmalen, “On the design of coded

modulation for fiber optical communications,” in Proc. Signal Proc. Photonic Com-

mun. (SPPCom), New Orleans, USA, Jul. 2017, pp. SpM4F–2.


[9] P. Larsson-Edefors, C. Fougstedt, and K. Cushon, “Implementation challenges for

energy-efficient error correction in optical communication systems,” in Proc. Adv.

Photon., Zurich, Switzerland, Jul. 2018, pp. SpTh4F–2.

[10] A. Graell i Amat, G. Liva, and F. Steiner, “Coding for optical communications–

Can we approach the Shannon limit with low complexity?” in Proc. Europ. Conf.

Optic. Commun., Dublin, Ireland, Sep. 2019, pp. (Tu.1.B.5)1–4.

[11] A. Sheikh, A. G. i Amat, and A. Alvarado, “Novel high-throughput decoding

algorithms for product and staircase codes based on error-and-erasure decoding,”

Aug. 2020. [Online]. Available: http://arxiv.org/abs/2008.02181v1

[12] International Telecommunication Union, Telecommunication Standardization Sec-

tor, “Forward error correction for high bit-rate DWDM submarine systems,” ITU-T

Rec. G.975.1, Feb. 2004.

[13] B. P. Smith, A. Farhood, A. Hunt, F. R. Kschischang, and J. Lodge, “Staircase

codes: FEC for 100 Gb/s OTN,” J. Lightw. Technol., vol. 30, no. 1, pp. 110–117,

Jan. 2012.

[14] A. Y. Sukmadji, U. Martínez-Peñas, and F. R. Kschischang, “Zipper codes:

Spatially-coupled product-like codes with iterative algebraic decoding,” in Proc.

Canadian Workshop Info. Theory, Hamilton, Canada, Jun. 2019, pp. 1–6.

[15] T. J. Richardson and R. L. Urbanke, Modern Coding Theory. Cambridge, U.K.:

Cambridge U. Press, 2008.

[16] G. Tzimpragos, C. Kachris, I. Djordjevic, M. Cvijetic, D. Soudris, and I. Tomkos,

“A survey on FEC codes for 100G and beyond optical networks,” IEEE Commun.

Surveys Tuts., vol. 18, no. 1, pp. 209–221, First Quarter 2016.

[17] M. Weiner, M. Blagojevic, S. Skotnikov, A. Burg, P. Flatresse, and B. Nikolic, “A

scalable 1.5-to-6Gb/s 6.2-to-38.1mW LDPC decoder for 60 GHz wireless networks

in 28nm UTBB FDSOI,” in IEEE Int. Solid-State Circuits Conf., Feb. 2014, pp.

464–465.

[18] T.-C. Ou, Z. Zhang, and M. Papaefthymiou, “An 821MHz 7.9Gb/s

7.3pJ/b/iteration charge-recovery LDPC decoder,” in IEEE Int. Solid-State Cir-

cuits Conf., Feb. 2014, pp. 462–463.


[19] Y. Lee, H. Yoo, J. Jung, J. Jo, and I.-C. Park, “A 2.74-pJ/bit, 17.7-Gb/s itera-

tive concatenated-BCH decoder in 65-nm CMOS for NAND flash memory,” IEEE

J. Solid-State Circuits, vol. 48, no. 10, pp. 2531–2540, Oct. 2013.

[20] H. Yoo, Y. Lee, and I.-C. Park, “7.3 Gb/s universal BCH encoder and decoder for

SSD controllers,” in Proc. Asia South Pacific Design Autom. Conf., Jan. 2014, pp.

37–38.

[21] B. S. G. Pillai, B. Sedighi, K. Guan, N. P. Anthapadmanabhan, W. Shieh, K. J.

Hinton, and R. S. Tucker, “End-to-end energy modeling and analysis of long-haul

coherent transmission systems,” J. Lightw. Technol., vol. 32, no. 18, pp. 3093–3111,

Jun. 2014.

[22] D. A. Morero, M. A. Castrillon, A. Aguirre, M. R. Hueda, and O. E. Agazzi, “Design

tradeoffs and challenges in practical coherent optical transceiver implementations,”

J. Lightw. Technol., vol. 34, no. 1, pp. 121–136, Jan. 2016.

[23] 800G Pluggable Multi-source Agreement, “Enabling the next generation of cloud

& AI using 800Gb/s optical modules,” White Paper, Mar. 2020.

[24] E. Maniloff, S. Gareau, and M. Moyer, “400G and beyond: Coherent evolution

to high-capacity inter data center links,” in Proc. Optical Fiber Commun. Conf.

(OFC), San Diego, USA, Mar. 2019, p. M3H.4.

[25] G. Lechner, T. Pedersen, and G. Kramer, “Analysis and design of binary message

passing decoders,” IEEE Trans. Commun., vol. 60, no. 3, pp. 601–607, Dec. 2011.

[26] F. Angarita, J. Valls, V. Almenar, and V. Torres, “Reduced-complexity min-sum

algorithm for decoding LDPC codes with low error-floor,” IEEE Trans. Circuits

Syst. I, vol. 61, no. 7, pp. 2150–2158, Feb. 2014.

[27] K. Cushon, P. Larsson-Edefors, and P. Andrekson, “Low-power 400-Gbps soft-

decision LDPC FEC for optical transport networks,” J. Lightw. Technol., vol. 34,

no. 18, pp. 4304–4311, Aug. 2016.

[28] F. Steiner, E. B. Yacoub, B. Matuz, G. Liva, and A. G. i Amat, “One and two

bit message passing for SC-LDPC codes with higher-order modulation,” J. Lightw.

Technol., vol. 37, no. 23, pp. 5914–5925, Sep. 2019.

[29] Y. Lei, A. Alvarado, B. Chen, X. Deng, Z. Cao, J. Li, and K. Xu, “Decoding

staircase codes with marked bits,” in Proc. 10th Int. Symp. Turbo Codes Iterative

Inf. Process. (ISTC), Hong Kong, Dec. 2018, pp. 1–5.


[30] A. Sheikh, A. G. i Amat, and G. Liva, “Binary message passing decoding of

product-like codes,” IEEE Trans. Commun., vol. 67, no. 12, pp. 8167–8178, Sep.

2019.

[31] Y. Lei, B. Chen, G. Liga, X. Deng, Z. Cao, J. Li, K. Xu, and A. Alvarado,

“Improved decoding of staircase codes: The soft-aided bit-marking (SABM) al-

gorithm,” IEEE Trans. Commun., vol. 67, no. 12, pp. 8220–8232, Oct. 2019.

[32] D. S. Millar, T. Koike-Akino, S. O. Arik, K. Kojima, K. Parsons, T. Yoshida, and

T. Sugihara, “High-dimensional modulation for coherent optical communications

systems,” Opt. Express, vol. 22, no. 7, pp. 8798–8812, Apr. 2014.

[33] D. S. Millar, T. Fehenberger, T. Koike-Akino, K. Kojima, and K. Parsons, “Coded

modulation for next-generation optical communications,” in Proc. Optical Fiber

Commun. Conf. (OFC), San Diego, USA, Mar. 2018, p. Tu3C.3.

[34] I. B. Djordjevic and B. Vasic, “Nonbinary LDPC codes for optical communication

systems,” IEEE Photon. Technol. Lett., vol. 17, no. 10, pp. 2224–2226, Sep. 2005.

[35] A. Leven and L. Schmalen, “Status and recent advances on forward error correction

technologies for lightwave systems,” J. Lightw. Technol., vol. 32, no. 16, pp. 2735–

2750, 2014.

[36] R.-J. Essiambre, G. Kramer, P. J. Winzer, G. J. Foschini, and B. Goebel, “Capacity

limits of optical fiber networks,” J. Lightw. Technol., vol. 28, no. 4, pp. 662–701,

2010.

[37] Y. Cai, W. Wang, W. Qian, J. Xing, K. Tao, J. Yin, S. Zhang, M. Lei, E. Sun,

H.-C. Chien, Q. Liao, K. Yang, and H. Chen, “FPGA investigation on error-flare

performance of a concatenated staircase and Hamming FEC code for 400G inter-

data center interconnect,” J. Lightw. Technol., vol. 37, no. 1, pp. 188–195, Jan.

2019.

[38] L. Lundberg, “Power consumption and joint signal processing in fiber-optical

communication,” Ph.D. dissertation, Dept. of Microtechnology & Nanoscience,

Chalmers University of Technology, 2019.

[39] T. Koike-Akino, D. S. Millar, K. Kojima, K. Parsons, Y. Miyata, K. Sugihara, and

W. Matsumoto, “Iteration-aware LDPC code design for low-power optical commu-

nications,” J. Lightw. Technol., vol. 34, no. 2, pp. 573–581, 2015.


[40] N. Verma, H. Jia, H. Valavi, Y. Tang, M. Ozatay, L.-Y. Chen, B. Zhang, and

P. Deaville, “In-memory computing: Advances and prospects,” IEEE Solid-State

Circuits Mag., vol. 11, no. 3, pp. 43–55, Aug. 2019.

[41] L. M. Zhang and F. R. Kschischang, “Staircase codes with 6% to 33% overhead,”

J. Lightw. Technol., vol. 32, no. 10, pp. 1999–2002, May 2014.

[42] C. Hager and H. D. Pfister, “Approaching miscorrection-free performance of prod-

uct codes with anchor decoding,” IEEE Trans. Commun., vol. 66, no. 7, pp. 2797–

2808, Mar. 2018.

[43] A. Y. Sukmadji, “Zipper codes: High-rate spatially-coupled codes with algebraic

component codes,” Master’s thesis, Dept. of Electrical & Computer Engineering,

University of Toronto, 2020.

[44] R. G. Gallager, “Low-density parity-check codes,” IEEE Trans. Inf. Theory, vol. 8,

no. 1, pp. 21–28, Jan. 1962.

[45] R. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inf.

Theory, vol. 27, no. 5, pp. 533–547, Sep. 1981.

[46] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-

product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, Feb.

2001.

[47] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-

approaching irregular low-density parity-check codes,” IEEE Trans. Inf. Theory,

vol. 47, no. 2, pp. 619–637, Feb. 2001.

[48] N. Wiberg, “Codes and decoding on general graphs,” Ph.D. dissertation, Dept. of

Electrical Engineering, Linkoping University, 1996.

[49] J. Chen and M. P. Fossorier, “Density evolution for two improved BP-based decod-

ing algorithms of LDPC codes,” IEEE Commun. Lett., vol. 6, no. 5, pp. 208–210,

Aug. 2002.

[50] D. Divsalar, H. Jin, and R. J. McEliece, “Coding theorems for ‘turbo-like’ codes,”

in Proc. 36th Allerton Conf. on Commun., Control, and Comput., vol. 36, Allerton,

USA, Sep. 1998, pp. 201–210.


[51] J. Garcia-Frias and W. Zhong, “Approaching Shannon performance by iterative

decoding of linear codes with low-density generator matrix,” IEEE Commun. Lett.,

vol. 7, no. 6, pp. 266–268, Jun. 2003.

[52] A. Darabiha, A. C. Carusone, and F. R. Kschischang, “Power reduction techniques

for LDPC decoders,” IEEE J. Solid-State Circuits, vol. 43, no. 8, pp. 1835–1845,

Jul. 2008.

[53] E. Amador, R. Knopp, V. Rezard, and R. Pacalet, “Dynamic power management

on LDPC decoders,” in Proc. IEEE Comput. Society Annu. Symp. VLSI, Lixouri,

Greece, Jul. 2010, pp. 416–421.

[54] T. Mohsenin, D. N. Truong, and B. M. Baas, “A low-complexity message-passing

algorithm for reduced routing congestion in LDPC decoders,” IEEE Trans. Circuits

Syst. I, vol. 57, no. 5, pp. 1048–1061, May 2010.

[55] X. Zhang and P. H. Siegel, “Quantized iterative message passing decoders with low

error floor for LDPC codes,” IEEE Trans. Commun., vol. 62, no. 1, pp. 1–14, Dec.

2013.

[56] Z. Wang, Z. Cui, and J. Sha, “VLSI design for low-density parity-check code de-

coding,” IEEE Circuits Syst. Mag., vol. 11, no. 1, pp. 52–69, Feb. 2011.

[57] M. Milicevic, “Low-density parity-check decoder architectures for integrated cir-

cuits and quantum cryptography,” Ph.D. dissertation, Dept. of Electrical & Com-

puter Engineering, University of Toronto, 2017.

[58] T. Richardson, “Error floors of LDPC codes,” in Proc. 41st Allerton Conf. on

Commun., Control, and Comput., vol. 41, no. 3, Allerton, USA, Oct. 2003, pp.

1426–1435.

[59] S. ten Brink, “Convergence behavior of iteratively decoded parallel concatenated

codes,” IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, Oct. 2001.

[60] M. Ardakani and F. R. Kschischang, “A more accurate one-dimensional analysis

and design of irregular LDPC codes,” IEEE Trans. Commun., vol. 52, no. 12, pp.

2106–2114, Dec. 2004.

[61] B. P. Smith, M. Ardakani, W. Yu, and F. R. Kschischang, “Design of irregu-

lar LDPC codes with optimized performance-complexity tradeoff,” IEEE Trans.

Commun., vol. 58, no. 2, pp. 489–499, Feb. 2010.


[62] L. Schmalen, S. ten Brink, G. Lechner, and A. Leven, “On threshold prediction

of low-density parity-check codes with structure,” in Proc. 46th Annu. Conf. on

Inform. Sci. and Syst. (CISS), Princeton, USA, Mar. 2012, pp. 1–5.

[63] M. P. Fossorier, “Quasicyclic low-density parity-check codes from circulant per-

mutation matrices,” IEEE Trans. Inf. Theory, vol. 50, no. 8, pp. 1788–1793, Jul.

2004.

[64] M. Milicevic and P. G. Gulak, “A multi-Gb/s frame-interleaved LDPC decoder

with path-unrolled message passing in 28-nm CMOS,” IEEE Trans. Very Large

Scale Integr. (VLSI) Syst., vol. 26, no. 10, pp. 1908–1921, Jun. 2018.

[65] Z. Li, L. Chen, L. Zeng, S. Lin, and W. H. Fong, “Efficient encoding of quasi-cyclic

low-density parity-check codes,” IEEE Trans. Commun., vol. 54, no. 1, pp. 71–81,

Jan. 2006.

[66] D. E. Hocevar, “A reduced complexity decoder architecture via layered decoding of

LDPC codes,” in Proc. IEEE Workshop Signal Processing and Systems (SIPS’04),

Austin, USA, Oct. 2004, pp. 107–112.

[67] G. D. Forney, “Concatenated codes,” Ph.D. dissertation, Research Laboratory of

Electronics, Massachusetts Institute of Technology, 1965.

[68] I. Dumer, “Concatenated codes and their multilevel generalizations,” in Handbook of

Coding Theory, V. S. Pless and W. C. Huffman, Eds. Amsterdam, The Netherlands:

Elsevier Science, 1998, ch. 23, pp. 1911–1988.

[69] E. L. Blokh and V. V. Zyablov, “Coding of generalized concatenated codes,” Probl.

Pered. Inform., vol. 10, no. 3, pp. 45–50, 1974.

[70] M. Bossert, Channel coding for telecommunications. Hoboken, NJ: John Wiley &

Sons, 1999.

[71] J. L. Massey, “Coding and modulation in digital communications,” in International

Zurich Seminar on Digital Communications, Zurich, Switzerland, Mar. 1974.

[72] G. Ungerboeck, “Channel coding with multilevel/phase signals,” IEEE Trans. Inf.

Theory, vol. 28, no. 1, pp. 55–67, Jan. 1982.

[73] H. Imai and S. Hirakawa, “A new multilevel coding method using error-correcting

codes,” IEEE Trans. Inf. Theory, vol. 23, no. 3, pp. 371–377, May 1977.


[74] G. Ungerboeck, “Trellis-coded modulation with redundant signal sets Part I: In-

troduction,” IEEE Commun. Mag., vol. 25, no. 2, pp. 5–11, Feb. 1987.

[75] ——, “Trellis-coded modulation with redundant signal sets Part II: State of the

art,” IEEE Commun. Mag., vol. 25, no. 2, pp. 12–21, Feb. 1987.

[76] R. Zamir, Lattice Coding for Signals and Networks: A Structured Coding Approach

to Quantization, Modulation, and Multiuser Information Theory. Cambridge,

England: Cambridge University Press, 2014.

[77] G. D. Forney Jr, “Coset codes. I. introduction and geometrical classification,” IEEE

Trans. Inf. Theory, vol. 34, no. 5, pp. 1123–1151, Sep. 1988.

[78] ——, “Coset codes. II. binary lattices and related codes,” IEEE Trans. Inf. Theory,

vol. 34, no. 5, pp. 1152–1187, Sep. 1988.

[79] U. Wachsmann, R. F. Fischer, and J. B. Huber, “Multilevel codes: Theoretical

concepts and practical design rules,” IEEE Trans. Inf. Theory, vol. 45, no. 5, pp.

1361–1391, Jul. 1999.

[80] Asymmetric Digital Subscriber Line Transceivers 2 (ADSL2), Int. Telecommun.

Union (ITU) Std. Recommendation G.992.3, 2009.

[81] L. Beygi, E. Agrell, and M. Karlsson, “On the dimensionality of multilevel coded

modulation in the high SNR regime,” IEEE Commun. Lett., vol. 14, no. 11, pp.

1056–1058, 2010.

[82] F. Frey, S. Stern, R. Emmerich, C. Schubert, J. K. Fischer, and R. F. H. Fischer,

“Coded modulation using a 512-ary Hurwitz-integer constellation,” in Proc. 45th

Europ. Conf. Opt. Commun., Dublin, Ireland, Sep. 2019, pp. (W2D2)1–4.

[83] E. Zehavi, “8-PSK trellis codes for a Rayleigh channel,” IEEE Trans. Commun.,

vol. 40, no. 5, pp. 873–884, 1992.

[84] G. Caire, G. Taricco, and E. Biglieri, “Bit-interleaved coded modulation,” IEEE

Trans. Inf. Theory, vol. 44, no. 3, pp. 927–946, May 1998.

[85] L. Szczecinski and A. Alvarado, Bit-interleaved Coded Modulation: Fundamentals,

Analysis and Design. Hoboken, NJ: John Wiley & Sons, 2015.


[86] B. P. Smith and F. R. Kschischang, “A pragmatic coded modulation scheme for

high-spectral-efficiency fiber-optic communications,” J. Lightw. Technol., vol. 30,

no. 13, pp. 2047–2053, Jul. 2012.

[87] M. Barakatain and F. R. Kschischang, “Low-complexity concatenated LDPC-

staircase codes,” J. Lightw. Technol., vol. 36, no. 12, pp. 2443–2449, Jun. 2018,

(correction: vol. 37, no. 3, p. 1070, 2019).

[88] M. J. D. Powell, “A fast algorithm for nonlinearly constrained optimization calcu-

lations,” in Numerical Analysis. Springer, 1978, pp. 144–157.

[89] J. Zhao, F. Zarkeshvari, and A. H. Banihashemi, “On implementation of min-

sum algorithm and its modifications for decoding low-density parity-check (LDPC)

codes,” IEEE Trans. Commun., vol. 53, no. 4, pp. 549–554, Apr. 2005.

[90] K. Onohara, T. Sugihara, Y. Konishi, Y. Miyata, T. Inoue, S. Kametani, K. Sug-

ihara, K. Kubo, H. Yoshida, and T. Mizuochi, “Soft-decision-based forward error

correction for 100 Gb/s transport systems,” IEEE J. Sel. Topics Quantum Elec-

tron., vol. 16, no. 5, pp. 1258–1267, Sep. 2010.

[91] Y. Miyata, K. Kubo, K. Sugihara, T. Ichikawa, W. Matsumoto, H. Yoshida, and

T. Mizuochi, “Performance improvement of a triple-concatenated FEC by a UEP-

BCH product code for 100 Gb/s optical transport networks,” in Proc. OptoElec-

tronics Commun. Conf., Wuhan, China, May 2013, pp. (ThR2–2)1–3.

[92] D. Chang, F. Yu, Z. Xiao, N. Stojanovic, F. N. Hauske, Y. Cai, C. Xie, L. Li,

X. Xu, and Q. Xiong, “LDPC convolutional codes using layered decoding algorithm

for high speed coherent optical transmission,” in Proc. IEEE/OSA Optical Fiber

Commun. Conf., 2012, pp. (OW1H.4)1–3.

[93] D. Morero, M. Castrillon, F. Ramos, T. Goette, O. Agazzi, and M. Hueda, “Non-

concatenated FEC codes for ultra-high speed optical transport networks,” in Proc.

IEEE Global Telecommun. Conf., Dec. 2011, pp. 1–5.

[94] K. Sugihara, K. Ishii, K. Dohi, K. Kubo, T. Sugihara, and W. Matsumoto, “Scal-

able SD-FEC for efficient next-generation optical networks,” in Proc. Eur. Conf.

Exhibit. Opt. Commun., 2016, pp. 568–570.

[95] D. Chang, F. Yu, Z. Xiao, Y. Li, N. Stojanovic, C. Xie, X. Shi, X. Xu, and Q. Xiong,

“FPGA verification of a single QC-LDPC code for 100 Gb/s optical systems without


error floor down to BER of 10^-15,” in Proc. IEEE/OSA Optical Fiber Commun.

Conf., 2011, pp. (OTuN2)1–3.

[96] A. Amraoui, A. Montanari, T. Richardson, and R. Urbanke, “Finite-length scaling

for iteratively decoded LDPC ensembles,” IEEE Trans. Inf. Theory, vol. 55, no. 2,

pp. 473–498, Feb. 2009.

[97] M. Barakatain, D. Lentner, G. Bocherer, and F. R. Kschischang, “Performance-

complexity tradeoffs of concatenated FEC for 64-QAM MLC and BICM,” in Proc.

45th Europ. Conf. Optic. Commun. (ECOC), Dublin, Ireland, Sep. 2019, pp.

(Tu.1.B.4)1–4.

[98] ——, “Performance-complexity tradeoffs of concatenated FEC for higher-order

modulation,” J. Lightw. Technol., vol. 38, no. 11, pp. 2944–2953, Jun. 2020.

[99] F. Steiner, G. Bocherer, and G. Liva, “Protograph-based LDPC code design for

shaped bit-metric decoding,” J. Sel. Areas Commun., vol. 34, no. 2, pp. 397–407,

Feb. 2016.

[100] F. Gray, “Pulse code communication,” U.S. Patent 2 632 058, Mar. 17, 1953.

[101] T. Richardson and R. Urbanke, “Multi-edge type LDPC codes,” in Information,

Coding and Mathematics: Proceedings of Workshop Honoring Prof. Bob McEliece

on his 60th Birthday, M. Blaum, P. G. Farrell, and H. C. A. van Tilbog, Eds. New

York, NY: Springer, Apr. 2002, pp. 24–25.

[102] J. Thorpe, “Low-density parity-check (LDPC) codes constructed from pro-

tographs,” IPN Progr. Rep., vol. 42, no. 154, pp. 42–154, Aug. 2003.

[103] G. Liva and M. Chiani, “Protograph LDPC codes design based on EXIT analysis,”

in Proc. IEEE Global Telecommun. Conf. (GLOBECOM), Washington, DC, Nov.

2007, pp. 3250–3254.

[104] F. Brannstrom, L. K. Rasmussen, and A. J. Grant, “Convergence analysis and

optimal scheduling for multiple concatenated codes,” IEEE Trans. Inf. Theory,

vol. 51, no. 9, pp. 3354–3364, Sep. 2005.

[105] E. Paolini and M. Flanagan, “Low-density parity-check code constructions,” in

Channel Coding: Theory, Algorithms, and Applications, D. Declercq, M. Fossorier,

and E. Biglieri, Eds. Cambridge, MA: Academic Press, 2014, pp. 141–209.


[106] R. Storn and K. Price, “Differential evolution–a simple and efficient heuristic for

global optimization over continuous spaces,” J. Global Optim., vol. 11, no. 4, pp.

341–359, Dec. 1997.

[107] M. Barakatain and F. R. Kschischang, “Low-complexity rate- and channel-

configurable concatenated codes,” J. Lightw. Technol., 2020, early access version.

[Online]. Available: https://doi.org/10.1109/JLT.2020.3046473

[108] J. Shi and R. D. Wesel, “A study on universal codes with finite block lengths,”

IEEE Trans. Inf. Theory, vol. 53, no. 9, pp. 3066–3074, Sep. 2007.

[109] H. Esfahanizadeh, A. Hareedy, R. Wu, R. Galbraith, and L. Dolecek, “Spatially-

coupled codes for channels with SNR variation,” IEEE Trans. Magn., vol. 54,

no. 11, pp. 1–5, Nov. 2018.

[110] D. Chang, F. Yu, Z. Xiao, N. Stojanovic, F. N. Hauske, Y. Cai, C. Xie, L. Li, X. Xu,

and Q. Xiong, “LDPC convolutional codes using layered decoding algorithm for

high speed coherent optical transmission,” in Proc. Optical Fiber Commun. Conf.

(OFC), Los Angeles, USA, Mar. 2012, p. OW1H.4.

[111] J. D. Andersen, K. J. Larsen, C. Bering, S. Forchhammer, F. Da Ros, K. Dalgaard,

and S. Iqbal, “A configurable FPGA FEC unit for Tb/s optical communication,”

in Proc. IEEE Int. Conf. Commun. (ICC), Paris, France, May 2017, pp. 1–6.

[112] K. Ishii, K. Dohi, K. Kubo, K. Sugihara, Y. Miyata, and T. Sugihara, “A study on

power-scaling of triple-concatenated FEC for optical transport networks,” in Proc.

Europ. Conf. Optical Commun., Valencia, Spain, Sep. 2015, pp. (Tu.3.4.2)1–3.

[113] D. A. A. Mello, A. N. Barreto, T. C. de Lima, T. F. Portela, L. Beygi, and J. M.

Kahn, “Optical networking with variable-code-rate transceivers,” J. Lightw. Tech-

nol., vol. 32, no. 2, pp. 257–266, Jan. 2013.

[114] G. Bosco, “Advanced modulation techniques for flexible optical transceivers: The

rate/reach tradeoff,” J. Lightw. Technol., vol. 37, no. 1, pp. 36–49, Jan. 2018.

[115] L. Schmalen, L. M. Zhang, and U. Gebhard, “Distributed rate-adaptive staircase

codes for connectionless optical metro networks,” in Proc. Optical Fiber Commun.

Conf. (OFC), Los Angeles, USA, Mar. 2017, pp. W1J–2.

[116] T. Koike-Akino, D. S. Millar, K. Parsons, and K. Kojima, “Rate-adaptive LDPC

convolutional coding with joint layered scheduling and shortening design,” in Proc.

Optical Fiber Commun. Conf. (OFC), San Francisco, USA, Mar. 2018, p. Tu3C.1.


[117] V. Jain, C. Fougstedt, and P. Larsson-Edefors, “Variable-rate FEC decoder VLSI

architecture for 400G rate-adaptive optical communication,” in Proc. IEEE Int.

Conf. Electron., Circuits, Systems (ICECS), Genova, Italy, Nov. 2019, pp. 45–48.

[118] G.-H. Gho and J. M. Kahn, “Rate-adaptive modulation and coding for optical

fiber transmission systems,” J. Lightw. Technol., vol. 30, no. 12, pp. 1818–1828,

Jun. 2012.

[119] M. Arabaci, I. B. Djordjevic, L. Xu, and T. Wang, “Nonbinary LDPC-coded mod-

ulation for rate-adaptive optical fiber communication without bandwidth expan-

sion,” IEEE Photon. Technol. Lett., vol. 24, no. 16, pp. 1402–1404, Jun. 2012.

[120] L. Beygi, E. Agrell, J. M. Kahn, and M. Karlsson, “Rate-adaptive coded mod-

ulation for fiber-optic communications,” J. Lightw. Technol., vol. 32, no. 2, pp.

333–343, Jan. 2013.

[121] B. Chen, Y. Lei, D. Lavery, C. Okonkwo, and A. Alvarado, “Rate-adaptive coded

modulation with geometrically-shaped constellations,” in Proc. Asia Commun.

Photon. Conf. (ACP), Hangzhou, China, Oct. 2018, pp. 1–3.

[122] D. S. Millar, T. Fehenberger, T. Koike-Akino, K. Kojima, and K. Parsons, “Distri-

bution matching for high spectral efficiency optical communication with multiset

partitions,” J. Lightw. Technol., vol. 37, no. 2, pp. 517–523, Jan. 2019.

[123] F. Buchali, F. Steiner, G. Bocherer, L. Schmalen, P. Schulte, and W. Idler, “Rate

adaptation and reach increase by probabilistically shaped 64-QAM: An experimen-

tal demonstration,” J. Lightw. Technol., vol. 34, no. 7, pp. 1599–1609, Apr. 2016.

[124] D. A. Morero, M. A. Castrillon, T. A. Goette, M. S. Schnidrig, F. A. Ramos,

M. C. Asinari, D. E. Crivelli, and M. R. Hueda, “Experimental demonstration of

a variable-rate LDPC code with adaptive low-power decoding for next-generation

optical networks,” in Proc. IEEE Photon. Conf., Waikoloa, USA, Oct. 2016, pp.

307–308.

[125] A. S. Thyagaturu, A. Mercian, M. P. McGarry, M. Reisslein, and W. Kellerer, “Soft-

ware defined optical networks (SDONs): A comprehensive survey,” IEEE Commun.

Surveys Tuts., vol. 18, no. 4, pp. 2738–2786, Oct. 2016.

[126] C. Pan and F. R. Kschischang, “Probabilistic 16-QAM shaping in WDM systems,”

J. Lightw. Technol., vol. 34, no. 18, pp. 4285–4292, Sep. 2016.


[127] G. Bocherer, P. Schulte, and F. Steiner, “Probabilistic shaping and forward error

correction for fiber-optic communication systems,” J. Lightw. Technol., vol. 37,

no. 2, pp. 230–244, Jan. 2019.

[128] S. ten Brink, G. Kramer, and A. Ashikhmin, “Design of low-density parity-check

codes for modulation and detection,” IEEE Trans. Commun., vol. 52, no. 4, pp.

670–678, Apr. 2004.

[129] E. Agrell and M. Karlsson, “Power-efficient modulation formats in coherent trans-

mission systems,” J. Lightw. Technol., vol. 27, no. 22, pp. 5115–5126, Nov. 2009.

[130] M. Karlsson and E. Agrell, Multidimensional Optimized Optical Modulation For-

mats. Hoboken, USA: John Wiley & Sons, 2016, ch. 2, pp. 13–64.

[131] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups, 3rd ed.

New York, USA: Springer-Verlag, 1999.

[132] S. Stern, F. Frey, J. K. Fischer, and R. F. H. Fischer, “Two-stage dimension-wise

coded modulation for four-dimensional Hurwitz-integer constellations,” in Proc.

12th Int. ITG Conf. on Systems, Commun. and Coding (SCC), Rostock, Germany,

Feb. 2019, pp. 197–202.

[133] G. Welti and J. Lee, “Digital transmission with coherent four-dimensional mod-

ulation,” IEEE Trans. Inf. Theory, vol. 20, no. 4, pp. 497–502, Jul. 1974.

[134] S. Stern, M. Barakatain, F. Frey, J. Pfeiffer, J. K. Fischer, and R. F. Fischer,

“Coded modulation for four-dimensional signal constellations with concatenated

non-binary forward error correction,” in Proc. 46th Europ. Conf. Optic. Commun.

(ECOC), Brussels, Belgium, Dec. 2020, pp. (We1F.4)1–4.

[135] H. Song and J. R. Cruz, “Reduced-complexity decoding of Q-ary LDPC codes for

magnetic recording,” IEEE Trans. Magn., vol. 39, no. 2, pp. 1081–1087, Mar. 2003.

[136] D. Declercq and M. Fossorier, “Decoding algorithms for nonbinary LDPC codes

over GF(q),” IEEE Trans. Commun., vol. 55, no. 4, pp. 633–643, Apr. 2007.

[137] A. Voicila, D. Declercq, F. Verdier, M. Fossorier, and P. Urard, “Low-complexity

decoding for non-binary LDPC codes in high order fields,” IEEE Trans. Commun.,

vol. 58, no. 5, pp. 1365–1375, May 2010.


[138] V. B. Wijekoon, E. Viterbo, and Y. Hong, “A low complexity decoding algorithm

for NB-LDPC codes over quadratic extension fields,” in Proc. IEEE Int. Symp. Inf.

Theory (ISIT), Los Angeles, USA, Jun. 2020, p. C.4.1.

[139] M. C. Davey and D. J. MacKay, “Monte Carlo simulations of infinite low den-

sity parity check codes over GF(q),” in Int. Workshop Optim. Codes Rel. Topics,

Bulgaria, Balkans, Jun. 1998, pp. 9–15.

[140] M. Gorgoglione, V. Savin, and D. Declercq, “Optimized puncturing distributions

for irregular non-binary LDPC codes,” in Int. Symp. Inf. Theory Appl. (ISITA),

Taichung, Taiwan, Oct. 2010, pp. 400–405.

[141] M. Beermann, E. Monzo, L. Schmalen, and P. Vary, “GPU accelerated belief prop-

agation decoding of non-binary LDPC codes with parallel and sequential schedul-

ing,” J. Signal Process. Syst., vol. 78, no. 1, pp. 21–34, Jan. 2015.

[142] K. Zeger and A. Gersho, “Pseudo-Gray coding,” IEEE Trans. Commun., vol. 38,

no. 12, pp. 2147–2158, Dec. 1990.

[143] D. A. Spielman, “Linear-time encodable and decodable error-correcting codes,”

IEEE Trans. Inf. Theory, vol. 42, no. 6, pp. 1723–1731, Jun. 1996.

[144] H. Roozbehani and Y. Polyanskiy, “Triangulation codes: a family of non-linear

codes with graceful degradation,” in 2018 Conf. Inform. Sciences and Syst. (CISS),

Princeton, USA, Mar. 2018, pp. 1–6.

[145] ——, “Low density majority codes and the problem of graceful degradation,”

Nov. 2019. [Online]. Available: http://arxiv.org/abs/1911.12263v1

[146] R. G. Gallager, Information theory and reliable communication. Hoboken, NJ:

John Wiley & Sons, 1968.

[147] T. M. Cover, Elements of information theory. Hoboken, NJ: John Wiley & Sons,

1999.

[148] R. J. McEliece, The theory of information and coding. Cambridge, U.K.: Cambridge

University Press, 2002.

[149] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J.,

vol. 27, no. 3, pp. 379–423, Jul. 1948.


[150] J. L. Massey, “Joint source and channel coding,” in Commun. Syst. Random Process

Theory, J. K. Skwirzynski, Ed. Alphenaan den Rijn, The Netherlands: Sijthoff

and Noordhoff, 1978, pp. 279–293.

[151] A. Gupta and S. Verdu, “Nonlinear sparse-graph codes for lossy compression,”

IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 1961–1975, Apr. 2009.

[152] Z. Sun, M. Shao, J. Chen, K. M. Wong, and X. Wu, “Achieving the rate-distortion

bound with low-density generator matrix codes,” IEEE Trans. Commun., vol. 58,

no. 6, pp. 1643–1653, Jun. 2010.

[153] R. Venkataramanan, T. Sarkar, and S. Tatikonda, “Lossy compression via sparse

linear regression: Computationally efficient encoding and decoding,” IEEE Trans.

Inf. Theory, vol. 60, no. 6, pp. 3265–3278, Apr. 2014.

[154] V. Aref, N. Macris, and M. Vuffray, “Approaching the rate-distortion limit with

spatial coupling, belief propagation, and decimation,” IEEE Trans. Inf. Theory,

vol. 61, no. 7, pp. 3954–3979, May 2015.

[155] E. J. Ionascu, T. Martinsen, and P. Stanica, “Bisecting binomial coefficients,”

Discrete Applied Mathematics, vol. 227, pp. 70–83, Aug. 2017.

[156] E. J. Ionascu, “A variation on bisecting the binomial coefficients,” Mar. 2018.

[Online]. Available: http://arxiv.org/abs/1712.01243

[157] H. Ma, W. K. Leung, X. Yan, K. Law, and M. Fossorier, “Delayed bit in-

terleaved coded modulation,” in Proc. 9th Int. Symp. Turbo Codes Iterative Inf.

Process. (ISTC), Brest, France, Aug. 2016, pp. 86–90.