The Dissertation Committee for Abhishek Das Certifies that this is the approved
version of the following dissertation:
Efficient Error Correcting Codes for Emerging and High-Density
Memory Systems
Committee:
Nur A. Touba, Supervisor
Zhigang Pan
Jacob A. Abraham
Michael Orshansky
Mudit Bhargava
Efficient Error Correcting Codes for Emerging and High-Density
Memory Systems
by
Abhishek Das
Dissertation
Presented to the Faculty of the Graduate School of
The University of Texas at Austin
in Partial Fulfillment
of the Requirements
for the Degree of
Doctor of Philosophy
The University of Texas at Austin
December 2019
Acknowledgements
First and foremost, I would like to thank my advisor Dr. Nur Touba for his invaluable guidance and support over the course of my PhD. Our discussions of various professional and personal topics were highly stimulating and a joyful experience. Instances wherein he would easily spot loopholes in new ideas that I had missed in my excitement of having discovered something, or point out that my newly discovered idea had in fact been discovered some 50-odd years back, always bring a smile to my face. I am immensely grateful to him for instilling in me the habit of striving for higher quality and of periodically taking a step back to analyze things. I can truly say that the pace at which I moved forward in my PhD program was defined by me. His innate ability to motivate without ever applying any kind of pressure is something I will value for the rest of my life.
I would also like to take this opportunity to thank my committee members Dr.
Jacob Abraham, Dr. Michael Orshansky, Dr. David Z. Pan and Dr. Mudit Bhargava for
their valuable insights, stimulating discussions and vital suggestions without which this
dissertation wouldn’t have been complete.
Swetalina Panigrahi has been my pillar of support throughout all these years. I am extremely lucky to have her in my life. Her keen insights, not into my research, but rather into our personal life, helped me research better. Her belief in giving our best at everything we do and her relaxed attitude towards life in general have pulled me through some really difficult times.
I also take this opportunity to thank my sister Aritra Das for her life lessons. Although younger, she has an attitude towards life that I always aspire to have.
Last but not least, this dissertation would not have been complete without my parents, Bithika and Swarup Kumar Das. Their emphasis on striving for self-growth with honesty and integrity has made me what I am today. Their unwavering support, unconditional love and inherent belief in my abilities have been the greatest source of inspiration for me. I would like to thank them from the bottom of my heart for being there whenever I needed them. No amount of words can ever do justice to their immense contribution to my success.
Finally, I would like to thank the National Science Foundation for their generous
grants which made this dissertation possible.
Abstract
Efficient Error Correcting Codes for Emerging and High-Density
Memory Systems
Abhishek Das, Ph.D.
The University of Texas at Austin, 2019
Supervisor: Nur A. Touba
As memory technology scales, the demand for higher performance and reliable operation is increasing as well. Field studies show increased error rates in dynamic random-access memories (DRAMs). The higher density comes at the cost of more marginal cells and higher power consumption. Multiple bit upsets caused by high-energy radiation, affecting multiple cells at once, are now the most common source of soft errors in static random-access memories (SRAMs). Phase change memories have drawn attention as an attractive alternative to DRAMs due to their low power consumption, lower bit cost and high density, but these memories suffer from various reliability issues. The errors caused by such mechanisms can incur large overheads for conventional error correcting codes.
This research addresses the issue of memory reliability under these new constraints imposed by technology scaling. The goal of the research is to address the different error mechanisms as well as increased error rates while keeping the error correction time low so as to enable high throughput. Several schemes are proposed, such as a burst error correcting code that addresses multiple bit upsets in SRAMs with a linear increase in complexity, compared to the exponential increase of existing methods [Das 18b], as well as a double error correcting code with lower complexity and lower correction time for the increased error rates in DRAMs [Das 19].
This research also addresses limited magnitude errors in emerging multilevel cell memories, e.g. phase change memories. A scheme which extends binary Orthogonal Latin Square codes is presented [Das 17]; it utilizes a few bits from each cell to provide protection based on the error magnitude. The issue of write disturbance errors in multilevel cells is also addressed [Das 18a] using a modified Reed-Solomon code. The proposed scheme achieves a very low decoding time compared to existing methods through the use of a new construction methodology and a simplified decoding procedure. Finally, a new scheme is presented using non-binary Hamming codes which protects more memory cells for the same amount of redundancy [Das 18c] through the use of unused columns in the code space of the design.
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
1.1 Phase Change Memories
1.1.1 Write Disturbance Errors
1.1.2 Resistance Drift Errors
1.2 Spin Transfer Torque Magnetic RAM (STT-MRAM)
1.2.1 Read Disturbance Errors
1.2.2 Magnetic Field Coupling Errors
1.3 High Density Memory Systems
1.4 Error Correcting Codes
1.5 Contributions of Dissertation
Chapter 2: Low Complexity Burst Error Correcting Codes to Correct MBUs in SRAMs
2.1 Introduction
2.2 Burst Error Correcting Hamming Codes
2.2.1 Syndrome Analysis
2.2.2 Decoding Procedure
2.3 Proposed Scheme
2.3.1 Decoding Procedure
2.3.2 Area and Delay Optimization
2.4 Evaluation
2.4.1 Redundancy
2.4.2 Hardware Complexity
2.5 Conclusion
Chapter 3: Layered-ECC: A Class of Double Error Correcting Codes for High Density Memory Systems
3.1 Introduction
3.2 Related Work
3.3 Proposed Scheme
3.3.1 Low Latency Decoding
3.3.2 Low Complexity Decoding
3.4 Evaluation
3.5 Conclusion
Chapter 4: Limited Magnitude Error Correction using OLS Codes for Memories with Multilevel Cells
4.1 Introduction
4.2 Orthogonal Latin Square Codes
4.3 Proposed Scheme
4.3.1 Redundancy Optimization
4.4 Evaluation
4.5 Conclusion
Chapter 5: Systematic b-Adjacent Symbol Error Correcting Reed-Solomon Codes with Parallel Decoding
5.1 Introduction
5.2 Reed-Solomon Codes
5.3 Proposed Scheme
5.3.1 Decoding Procedure
5.4 Evaluation
5.4.1 Redundancy
5.4.2 Hardware Complexity
5.5 Conclusion
Chapter 6: Efficient Non-binary Hamming Codes for Limited Magnitude Errors in MLC PCMs
6.1 Introduction
6.2 General Hamming Codes
6.3 Proposed Scheme
6.3.1 Syndrome Analysis
6.3.2 Companion Matrix
6.3.3 Encoder
6.3.4 Decoder
6.4 Evaluation
6.5 Conclusion
Chapter 7: Summary and Future Work
7.1 Summary
7.2 Future Work
Bibliography
Vita
List of Tables
Table 2.1: Possible Errors for a Data Bit Within a 4-Bit Burst Window
Table 2.2: Comparison of redundancy, decoder latency, decoder area and decoder cell usage between burst error correcting Hamming codes and proposed codes
Table 3.1: Example of syndrome values and error candidates for different error types
Table 3.2: Comparison of proposed low latency decoder with existing schemes
Table 3.3: Comparison of proposed low complexity serial decoder with existing schemes
Table 4.1: Comparison of OSMLD and proposed codes for asymmetric magnitude-3 error
Table 4.2: Comparison of OSMLD, proposed OLS and Hybrid codes for symmetric magnitude-1 error in 3-bits/cell memory
Table 4.3: Comparison of OSMLD, proposed OLS and Hybrid codes for symmetric magnitude-3 error in 4-bits/cell memory
Table 5.1: Redundancy, Decoder Area and Latency Comparison for DAsEC codes
Table 5.2: Redundancy, Decoder Area and Latency Comparison for TAsEC codes
Table 6.1: Comparison of Encoding circuit between the different schemes
Table 6.2: Comparison of Decoding circuit between the different schemes
List of Figures
Fig. 1.1: PCM Cell Structure
Fig. 1.2: MLC PCM resistance levels
Fig. 1.3: Write Disturbance in PCM
Fig. 1.4: Resistance Drift in MLC PCM
Fig. 1.5: 1T1MTJ STT-MRAM Cell
Fig. 2.1: General form of the parity check matrix of proposed codes
Fig. 2.2: Symbols data bit di is part of for a 4-bit burst error correcting proposed code
Fig. 2.3: Comparison of decoder area for different information lengths k and different burst sizes b
Fig. 3.1: p-parallel Chien search architecture with short critical path
Fig. 3.2: Parity check matrix of proposed scheme for k=16
Fig. 3.3: Block diagram of proposed decoding logic
Fig. 3.4: Comparison of number of syndromes for different data bit sizes between proposed scheme and a DEC BCH code
Fig. 3.5: Block diagram of error pattern generation using the serial low complexity decoding procedure
Fig. 4.1: Parity check matrix and decoder logic for a SEC (8,4) OLS Code
Fig. 4.2: Bidirectional (magnitude-1 upwards and magnitude-2 downwards) error transitions on lower order 2 bits
Fig. 4.3: Example of encoder and decoder logic for a proposed scheme
Fig. 4.4: Symmetric magnitude-1 errors for d0 and d1
Fig. 5.1: Partial schematics of error pattern generator for the proposed scheme
Fig. 6.1: Parity Check Matrix of (5, 3) 4-ary Hamming code
Fig. 6.2: Classification of columns and elements for a 2-bits/cell memory
Fig. 6.3: Multiplying major columns with 7 for a 3-bits/cell memory
Fig. 6.4: All possible error patterns for a 3-bits/cell memory
Fig. 6.5: (a) All possible columns for a 3-bits/cell memory and 2 parity check symbols (b) Resultant columns after removal of major elements from e1 (c) Resultant columns after removal of major elements from e2
Fig. 6.6: Parity check matrix of a limited magnitude-1 error correcting code for a 3-bits per cell memory
Fig. 6.7: Algorithm for construction of the parity check matrix of the proposed scheme
Fig. 6.8: Comparison of #syndromes between limited magnitude error correcting Hamming codes and non-binary general Hamming codes
Fig. 6.9: Binary form of partial parity check matrix of Fig. 6.6
Fig. 6.10: Error Magnitude Computation for first 2 symbols of parity check matrix from Fig. 6.9
Chapter 1: Introduction
Conventional memory systems are based on DRAMs and SRAMs, which have served efficiently for many years. But with technology scaling leading to more stringent requirements like lower power consumption and higher capacity, DRAM and SRAM scaling has not been able to keep up. Emerging forms of memories like Phase Change Memory (PCM) and Spin Transfer Torque Magnetic RAM (STT-MRAM) have been the focus of research as possible alternatives. Properties like low power consumption, low cost per bit and further technology scalability make them a viable replacement solution. But such emerging memories suffer from various reliability issues, like write disturbance errors and resistance drift, which can degrade read reliability in PCMs. Meanwhile, the conventional memories have continued to shrink in size due to technology scaling, providing adequately high densities.
Conventional error correcting techniques could be applied to protect the newer memory alternatives, but the completely different reliability issues and error models make them unsuitable for this purpose. The high density and increased error rates in memories like SRAMs and DRAMs also render conventional codes ineffective; conventional codes can very well lead to high data redundancy and high decoding complexity for these modern memory systems. This creates a demand for newer, more efficient error correcting techniques that are better suited to addressing the newer reliability issues and fault models. Emerging memories also support multilevel operation, i.e. they can store multiple bits per cell. This renders conventional binary error correcting codes inefficient at addressing multi-bit errors; such codes would require a lot of redundancy to handle multi-bit symbol-based errors.
In the following sections, the basic operation of each type of emerging and high density memory system is described, along with the current reliability issues that plague these memory technologies.
1.1 PHASE CHANGE MEMORIES
Although phase change memories were first developed in [Ovshinsky 68], this memory technology was revived in recent years due to the fast crystallization material Ge2Sb2Te5 (GST) [Yamada 91]. This memory technology was shown to have great promise as a main memory system in terms of scalability [Raoux 08]. A more recent proposal of multilevel cell (MLC) operation, with the ability to store multiple bits per cell, thus lowering costs and providing high density [Papandreou 10], has further increased the research focus on this area.
Fig. 1.1: PCM Cell Structure
A basic PCM cell is shown in Fig. 1.1. It consists of a phase change material, a programmable region, a heater, insulators, and top and bottom electrodes. The programmable region can be programmed to be either in a crystalline state or in an amorphous state [Wong 10]. The PCM cell is reset into the amorphous state by melting the programmable region and then quenching it rapidly with a large, short-duration electrical current pulse. This creates a highly resistive amorphous region in series with the crystalline phase change material; the total resistance of the cell is thus the series combination of the crystalline part and the created amorphous region. To set the PCM cell to the crystalline state, a lower electrical pulse is applied for a long duration so that the programmable region is annealed at a temperature between the crystallization temperature and the melting temperature. A read operation is performed by measuring the resistance of the cell, passing a small electrical current through it. The current needs to be small enough so as not to disturb the contents of the cell.
PCM cells can also be configured so that the memory cell can be programmed into several intermediate resistance levels between those of the SET and RESET states. This allows the storage of multiple bits per cell and further lowers the cost per bit, since it increases capacity without changing the number of cells. But such operation requires iterative algorithm-based programming to account for process and material variations. An example of a 2-bits/cell memory's resistance distribution is shown in Fig. 1.2.
Fig. 1.2: MLC PCM resistance levels
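The iterative programming mentioned above can be sketched as a simple program-and-verify loop: apply a pulse, read back the resistance, and repeat until the cell lands in the target band. Everything below (the cell model, function names, tolerance values) is a hypothetical illustration, not an actual MLC PCM programming algorithm:

```python
# Hypothetical sketch of iterative program-and-verify for an MLC cell.
# The toy cell model and 0.5 pulse-response factor are illustrative only.

def program_and_verify(target_level, read_cell, apply_pulse,
                       tolerance=0.05, max_iterations=10):
    """Apply adjustment pulses until the cell's (normalized) resistance
    falls within the tolerance band of the target level."""
    for _ in range(max_iterations):
        error = target_level - read_cell()
        if abs(error) <= tolerance:
            return True          # verified: within the target band
        apply_pulse(error)       # nudge resistance toward the target
    return False                 # did not converge within the pulse budget

# Toy cell: each pulse moves the resistance halfway toward the target.
state = {"r": 0.0}
ok = program_and_verify(
    target_level=0.66,
    read_cell=lambda: state["r"],
    apply_pulse=lambda err: state.__setitem__("r", state["r"] + 0.5 * err),
)
```

The loop structure, not the numbers, is the point: programming precision is bought with extra read/write iterations, which is why MLC writes are slower than SLC writes.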
1.1.1 Write Disturbance Errors
Write disturbance errors are caused by the heat dissipated from a cell during a PROGRAM operation. Cells being programmed to the RESET state are more prone to causing write disturbance errors, since this involves a large electrical current pulse. If the heat dissipated from such a PROGRAM operation exceeds the crystallization temperature of the neighboring cells, it can cause partial crystallization of those cells, thus changing their stored data values. This problem worsens with technology scaling as cells get closer to each other [Jiang 14]; smaller technology nodes with super-dense memories thus suffer exacerbated write disturbance errors affecting numerous contiguous cells. An example of write disturbance is shown in Fig. 1.3. It should be noted that for write disturbance errors to occur, a considerably high write current is required, so that enough heat is dissipated to disturb neighboring cells. Thus, write disturbance errors are more probable if the write operation involves programming the cell to a more amorphous state.
Old Data: 2 3 0 1 0 3 1 0
New Data (after write): 2 0 x x x 3 1 0   (x = disturbed cells)
Fig. 1.3: Write Disturbance in PCM
1.1.2 Resistance Drift Errors
Resistance drift errors are caused by structural relaxation of the PCM cell, which causes the cell's resistance to increase over time [Li 12]. This degrades read reliability over time, since the increased resistance of the cell can lead to an erroneous readout. For MLC PCM cells this problem is exacerbated, since there are more resistance distributions than in a single level cell (SLC) storing just one bit. Moreover, as the number of bits per cell b increases, the number of resistance distributions, given by L = 2^b, increases exponentially. Thus, the read reliability degradation worsens as more bits are stored per cell. An example of resistance drift for a 2-bits per cell memory is shown in Fig. 1.4.
Fig. 1.4: Resistance Drift in MLC PCM
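As a quick illustration of the L = 2^b relation from the text, a few lines of Python show how fast the number of resistance distributions grows, and how the margin available to each level shrinks in a fixed resistance window (purely illustrative arithmetic):

```python
# L = 2**b resistance distributions must fit in the cell's resistance
# window, so each level gets roughly a 1/L fraction of the window.
def resistance_levels(bits_per_cell: int) -> int:
    return 2 ** bits_per_cell

for b in (1, 2, 3, 4):
    L = resistance_levels(b)
    print(f"{b} bits/cell: {L} levels, ~1/{L} of the window per level")
```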
1.2. SPIN TRANSFER TORQUE MAGNETIC RAM (STT-MRAM)
STT-MRAM was theoretically predicted in [Slonczewski 96]. An STT-MRAM cell consists of a transistor and a magnetic tunneling junction (MTJ) connected in series, as shown in Fig. 1.5. The MTJ consists of ferromagnetic electrodes separated by a thin barrier of tunnel oxide (e.g. MgO). Information is thus stored in an MTJ based on the relative magnetization of the free layer and the pinned layer. Also, since storage depends on the spin directions of electrons rather than on charge, these cells are relatively immune to radiation induced soft errors. The MTJ can be configured into two resistance states: a low resistance state when the free layer's magnetization is parallel to that of the pinned layer, and a high resistance state when it is anti-parallel. A spin polarized current is used to switch the magnetization of the MTJ [Diao 05]; it exerts a torque on the free layer, causing it to change its direction of magnetic moment. The read operation involves passing a small current through the cell and sensing the resistive state of the STT-MRAM cell.
Fig. 1.5: 1T1MTJ STT-MRAM Cell
1.2.1 Read Disturbance Errors
With the scaling of STT-MRAM, the write current reduces as well, but the read current does not scale well with feature size [Jiang 16]. At small technology nodes it is challenging to reduce the read current, since it becomes difficult for conventional STT-MRAM sense amplifiers to sense data correctly. Thus, at smaller nodes it is possible that some read operations actually cause the cell's data to flip, causing a read disturbance error.
1.2.2 Magnetic Field Coupling Errors
Scaling MTJs in a densely packed array causes program errors due to large stray field coupling [Chappert 07]. As MTJs scale down, the distance between the ferromagnetic layers, the free layer and the pinned layer, decreases, causing strong magnetic coupling; as the distance reduces further, the coupling gets stronger, worsening the problem. The magnetic field coupling from one MTJ affects the write and read operations of its neighboring bits [Yoon 18]. Magnetic coupling causes significant degradation to average retention times. It also causes a non-trivial degradation in critical current densities, which affects the write operation of an STT-MRAM cell.
1.3 HIGH DENSITY MEMORY SYSTEMS
Conventional memory systems like SRAM and DRAM have seen continued shrinking of their size due to technology scaling. The new technology nodes enable high density, but come at the cost of increased error rates. For high density SRAM systems, the decreased cell size and the decreased distance between cells mean that a particle strike no longer affects a single SRAM cell; its effects are now seen across multiple SRAM cells, causing a multiple bit upset (MBU) [Radaelli 05]. DRAM systems have also seen a continuous scaling trend, with the industry currently manufacturing its 10nm class of DRAMs. But field studies of DRAM systems have shown increased error rates for which traditional SEC-DED codes are no longer strong enough to provide continued error protection.
Flash based memories, e.g. NAND flash, have seen an increase in capacity as well, and as technology scales, the demand for high throughput becomes a major concern. These memories use BCH codes or their derivatives, which require a significant number of decoding cycles to check for and correct errors. This becomes a bottleneck in enabling high throughput.
1.4 ERROR CORRECTING CODES
Error correcting codes have conventionally been used to protect SRAMs and DRAMs from radiation induced soft errors. The most prevalent code is the single error correcting double error detecting (SEC-DED) Hamming code presented in [Hamming 50]. These codes can correct any single bit error in a word and detect any double error; they have very low decoding latency and an intermediate amount of data redundancy. Another prevalent code specific to SRAMs is the Orthogonal Latin Square (OLS) code, first presented in [Hsiao 70]. These codes have a considerably high amount of data redundancy, but they utilize a majority logic decoding procedure which enables very low latency decoding with very low decoding complexity. They are also modular in design and can correct more than one error. These codes are not suitable for modern memories like PCMs and STT-MRAMs because of the relatively larger number of bits that need protection. An SEC-DED code cannot guarantee protection against the newer forms of errors, which affect multiple bits. OLS codes incur a considerable data redundancy overhead to address symbol-based errors in MLC PCMs or multi-bit clustered errors in STT-MRAMs. High density SRAMs and DRAMs, due to the shrinking technology nodes, are seen to have increased error rates; the conventional error correcting codes incur high overheads and are not strong enough to protect these memories at such error rates.
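To make the single-error-correction mechanism concrete, here is a minimal sketch of syndrome decoding for a Hamming(7,4) code (SEC only; the extended overall-parity bit that provides double error detection is omitted for brevity). The function names are illustrative, not from the dissertation:

```python
# Hamming(7,4): code positions 1..7, parity bits at positions 1, 2, 4.
# The syndrome is the XOR of the indices of the set bit positions and,
# for a single-bit error, equals the error position directly.

def hamming74_encode(data):                  # data: 4 bits [d1, d2, d3, d4]
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4                        # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4                        # covers positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4                        # covers positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]      # codeword, positions 1..7

def hamming74_correct(word):
    syndrome = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= pos
    if syndrome:                             # nonzero = error position
        word[syndrome - 1] ^= 1              # flip the erroneous bit
    return [word[2], word[4], word[5], word[6]]   # recovered data bits

codeword = hamming74_encode([1, 0, 1, 1])
codeword[5] ^= 1                             # inject a single-bit error
assert hamming74_correct(codeword) == [1, 0, 1, 1]
```

The entire correction is a handful of XORs evaluated in one pass, which is why Hamming-class codes achieve the single-cycle decoding latency that main memories require, in contrast to the multi-cycle BCH decoding discussed next.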
For flash-based memories and hard disk storage, low redundancy codes like Bose-Chaudhuri-Hocquenghem (BCH) codes have conventionally been used. They offer very low redundancy but trade this off against a multi-cycle decoding procedure. They incur significant decoding logic overhead and can take a considerable number of cycles, sometimes proportional to the number of bits in the word, for the decoding procedure [Chien 64]. This calls for alternative solutions that incur less overhead in terms of the number of decoding cycles. Moreover, non-binary error correcting BCH codes are not suitable in the context of PCMs and STT-MRAMs, because these are random-access memories and require a faster and relatively simpler decoding procedure. These factors call for the design of more efficient codes that are better suited to the newer forms of memory, with their relatively different reliability issues.
1.5 CONTRIBUTIONS OF DISSERTATION
Over the course of the next few chapters, novel ideas related to different reliability concerns are presented. The primary focus of all the methods proposed in this dissertation is to design more efficient error correcting codes better suited to address the new reliability issues.
Chapter 2 addresses the issue of burst errors or MBUs in SRAM based systems. A
new burst error correcting code based on Hamming codes is proposed which allows much
better scaling of decoder complexity as the burst size is increased [Das 18b]. For larger
burst sizes, it can provide significantly smaller and faster decoders than existing methods
thus providing higher reliability at an affordable cost. Moreover, there is no significant
increase in the number of check bits in comparison with existing methods. A general
construction and decoding methodology for the new codes is proposed. Experimental
results are presented comparing the decoder complexity for the proposed codes with
conventional burst error correcting Hamming codes demonstrating the significant
improvements that can be achieved.
In chapter 3, a layered double error correcting (DEC) code is proposed with a
simple decoding procedure [Das 19]. The codes are shown to strike a good balance
between redundancy and decoder complexity. A general construction methodology is
presented along with two different decoding schemes. One is a low latency decoding
scheme that is useful for main memories which need high speed decoding for optimal
performance. This scheme is shown to achieve better redundancy compared to existing
low-latency codes as well as faster decoder latency compared to existing low-redundancy
codes. The second is a low complexity decoding scheme which is useful for flash-based
memories. This scheme is shown to have considerably less area compared to existing
schemes. Also, it is shown that the proposed serial low complexity decoding scheme can
take significantly fewer cycles to complete the whole decoding procedure; thus, enabling
better performance compared to existing serial decoding schemes.
Chapter 4 proposes a non-binary OLS code to address limited magnitude errors in
the form of resistance drifts in MLC PCMs [Das 17]. The codes presented extend binary
OLS codes, modified to correct limited magnitude errors, and are able to lower the
redundancy by considering only a few bits from each symbol to compute the parity bits.
A new decoding methodology is presented which examines the received bits per symbol
as well as the decoded bits per symbol to compute the error magnitude, which is then
added back to the received symbol. Chapter 4 also presents hybrid codes which combine
the limited magnitude OLS codes with another low redundancy error correcting code to
further reduce the number of check symbols. This lower data redundancy, however, is
achieved at the expense of some additional decoding complexity.
In chapter 5, new systematic b-adjacent codes based on Reed-Solomon codes are
proposed [Das 18a]. These codes target write disturbances in PCMs, which affect multiple
clustered cells, as well as magnetic field coupling in STT-MRAMs, which affects multiple
neighboring bits. Reed-Solomon (RS) codes offer good error protection since they correct
multi-bit symbols at a time, but beyond single symbol error correction, both the decoding
complexity and the decoding latency are very high. The codes presented in this work have
a low latency and low complexity parallel one-step decoding scheme, making them
suitable for PCMs and STT-MRAMs, which need a high-speed decoding procedure to
enable high performance. The codes presented can correct any errors within b-adjacent
symbols and are shown to have very low data redundancy and significantly better
decoding complexity and decoding latency compared to existing Reed-Solomon schemes.
Chapter 6 proposes a new systematic single error correcting (SEC) limited
magnitude error correcting non-binary Hamming code specifically to address limited
magnitude errors in multilevel cell memories storing multiple bits per cell [Das 18c]. A
general construction methodology is presented to correct errors of limited magnitude and
is compared to existing schemes addressing limited magnitude errors in phase change
memories. A syndrome analysis is done to show the reduction in total number of
syndromes for limited magnitude error models. It is shown that the proposed codes
provide better latency and complexity compared to existing limited magnitude error
correcting non-binary Hamming codes. It is also shown that the proposed codes achieve
better redundancy compared to the symbol extended version of binary Hamming codes.
Finally, Chapter 7 summarizes the contributions of this dissertation and provides
some future directions for the continuation of this research work.
Chapter 2: Low Complexity Burst Error Correcting Codes to Correct
MBUs in SRAMs
2.1 INTRODUCTION
Soft errors caused by radiation pose a significant reliability concern for SRAMs
[Baumann 05]. With technology scaling, the susceptibility of SRAMs to soft errors has
significantly increased as well [Ibe 10]. In current nanoscale technology nodes, device
geometries are small and continue to shrink. Thus, a particle strike might affect more than
one cell, causing a multiple bit upset [Radaelli 05]. The smaller the device geometries,
the greater the number of cells affected by a single particle strike. A b-bit burst error
caused by such a particle strike can flip multiple bits within the b-bit burst window.
Thus, codes aimed at correcting MBUs in SRAMs should be able to correct all possible
error combinations within a b-bit burst window, including all b bits getting flipped.
The Burton code [Burton 71] was one of the early codes that dealt with MBUs by
correcting a single phased burst error, i.e., a single symbol error. But it can only correct a
b-bit burst error if the burst falls within a single symbol, which is not always guaranteed.
The most common method to address MBUs in SRAMs has been to use a SEC-DED code
in tandem with word interleaving, such that an MBU affects only a single bit in each
word instead of multiple bits of a single word. This scheme made it possible to correct
MBUs and reduce the soft error rate (SER) using a SEC-DED error correcting
code [Baeg 09]. The degree of interleaving defines the amount of adjacent error
protection for a memory system.

(This chapter is based on the publication [Das 18b]: A. Das and N.A. Touba, "Low Complexity Burst Error
Correcting Codes to Correct MBUs in SRAMs," in Proc. of ACM Great Lakes Symposium on VLSI
(GLSVLSI), pp. 219-224, 2018. The author of this dissertation contributed to the conception of the research
problem, theoretical developments and experimental verification of the research work.)

However, such a scheme is no longer beneficial due to
constraints such as memory aspect ratio, performance and power consumption which
limits the degree of interleaving.
Error correcting codes (ECCs) with one-step majority logic decoding have been
studied and extended to address the problem of MBUs for SRAMs. [Datta 11] proposed a
general theory on adjacent error correcting orthogonal Latin square (OLS) codes for
various burst sizes. [Reviriego 13] proposed a method to correct triple adjacent errors
with the same amount of redundancy as a double error correcting OLS code. [Reviriego
12] proposed a method to correct MBUs in SRAMs using majority logic-decodable
difference set codes. [Reviriego 15] proposed a new class of SEC-DED codes with
double adjacent error correction (SEC-DED-DAEC) derived from OLS codes. Although these classes of
majority-logic decodable (MLD) codes have very low access latencies which is good for
the overall performance of the memory system, the MLD codes suffer from very high
redundancy. Other low redundancy codes have also been extensively studied for
correcting MBUs in SRAMs. [Namba 14] proposed a single-bit and double adjacent error
correcting parallel decoder for BCH codes. [Kim 07] proposed a two-dimensional coding
method to correct multiple bit errors in caches. [Argyrides 11] proposed a matrix-based
code which combined Hamming codes and Parity codes to detect and correct multiple
errors. But these codes have high area, power consumption and sometimes high decoding
latency. Recently, Hamming codes have also been studied and extended to correct MBUs
in SRAMs. These codes provide balanced trade-off between the amount of redundancy
required and the decoding complexity and decoding latency. Thus, they are an attractive
solution specific to SRAMs. [Dutta 07] first extended the traditional SEC-DED Hamming
code to correct two adjacent errors for tolerating multiple bit upsets in small memories.
[Shamshiri 10] proposed a general solution for burst (local) error correcting codes
working in conjunction with random (global) error correcting codes to correct MBUs in
SRAMs. [Neale 13] presented a code with the basic SEC-DED coverage as well as both
DAEC and scalable adjacent error detection (xAED) while reducing the
adjacent/nonadjacent double-bit error mis-correction probability. [Adalid 15] proposed
SEC-DAEC-TAEC and 3-bit burst error correction codes which reduce the total number
of ones in the parity check matrix to optimize the decoder complexity and decoder
latency. For all such Hamming extended codes, the decoding procedure still relies on a
syndrome based matching method. As the burst size increases, the number of syndromes
also increases exponentially, thus increasing the decoder complexity exponentially as
well.
In this work, a general b-bit burst error correcting code is proposed which aims to
reduce the decoder complexity for a similar amount of redundancy as general burst error
correcting Hamming codes. The proposed codes correct all possible combinations of
errors within a b-bit burst and achieve much better decoder complexity and decoder
latency for higher burst sizes. The sections of this chapter are organized as follows.
Section 2.2 briefly describes the concepts for burst error correcting Hamming codes
along with an analysis on syndromes. Section 2.3 describes the proposed codes, their
construction and their decoding procedure. Section 2.4 evaluates the proposed codes and
makes a comparison to the general burst error correcting Hamming codes. Finally,
Section 2.5 presents the conclusion of this work.
2.2 BURST ERROR CORRECTING HAMMING CODES
Extensive research has been done on burst error correcting Hamming codes as
discussed previously. The key concepts related to burst error correcting codes remain the
same. For a Hamming code to be b-bit burst error correcting, it needs to satisfy the
following conditions [Shamshiri 10]:
1. Each column of the parity check matrix is unique.
2. The bitwise XOR of any combination of columns within b-adjacent columns is unique.
Condition-1 ensures that all single errors are recognized and corrected. Condition-
2 ensures that any combination of errors within the burst size b are recognized and
corrected. The parity check matrix can be constructed in an algorithmic manner by
extending [Dutta 07] or using a Boolean SAT solver [Shamshiri 10].
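As an illustration, the two conditions can be checked mechanically. The sketch below is my own (not the algorithm of [Dutta 07] or [Shamshiri 10]): it represents each column of the parity check matrix as an integer bit mask and verifies that every error pattern confined to some b-adjacent window produces a unique, non-zero syndrome.

```python
from itertools import combinations

def burst_error_patterns(n, b):
    """Yield every non-empty set of positions confined to a b-adjacent window."""
    for start in range(n):
        width = min(b, n - start)
        for size in range(width):
            for rest in combinations(range(start + 1, start + width), size):
                yield (start,) + rest

def is_b_burst_correcting(H_cols, b):
    """H_cols: parity check matrix columns as integers (bit i = row i).
    True iff all columns are unique (condition 1) and every burst-confined
    error pattern produces a unique, non-zero syndrome (condition 2)."""
    if len(set(H_cols)) != len(H_cols):
        return False
    seen = {}
    for pattern in burst_error_patterns(len(H_cols), b):
        s = 0
        for pos in pattern:
            s ^= H_cols[pos]          # syndrome = XOR of the flipped columns
        if s == 0 or seen.get(s, pattern) != pattern:
            return False
        seen[s] = pattern
    return True
```

For example, the columns of a (7,4) Hamming code pass the check for b = 1 (single error correction) but fail for b = 2, since the XOR of two adjacent columns collides with some single-column syndrome.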
2.2.1 Syndrome Analysis
The total number of syndromes for a b-bit burst error correcting code, per window of
b-adjacent columns, is given by equation (2.1). Only syndromes affecting information
bits are considered, as there is no requirement to correct the parity check bits. Since a
b-bit burst may begin at each information bit, the total number of useful syndromes for a
b-bit burst error correcting code is given by equation (2.2), where k is the total number of
information bits.
Number of bits in error | Location of errors
------------------------|------------------------------------------------------------------
1                       | (i)
2                       | (i,i+1); (i,i+2); (i,i+3); (i-3,i); (i-2,i); (i-1,i)
3                       | (i-3,i-2,i); (i-3,i-1,i); (i-2,i-1,i); (i-2,i,i+1); (i-1,i,i+1);
                        | (i-1,i,i+2); (i,i+1,i+2); (i,i+1,i+3); (i,i+2,i+3)
4                       | (i-3,i-2,i-1,i); (i-2,i-1,i,i+1); (i-1,i,i+1,i+2); (i,i+1,i+2,i+3)

Table 2.1: Possible Errors for a Data Bit Within a 4-Bit Burst Window
    #Syndromes_Burst = C(b,1) + C(b,2) + ⋯ + C(b,b) = 2^b − 1                (2.1)

    #Syndromes = k (2^b − 1)                                                 (2.2)

Equation (2.2) shows that the total number of syndromes is a linear function of
the number of information bits, and an exponential function of the burst size b. Thus, as
the burst size increases, the number of syndromes increases exponentially, thereby
increasing the decoder hardware exponentially. For a b-bit burst error, any combination
of errors within b-adjacent bits is possible. Table 2.1 shows all the possible errors that
can occur involving data bit di within a 4-bit burst window. For each type of error within
b-adjacent bits, a unique syndrome defines the pattern and location of the error.
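The counts in equations (2.1)-(2.2) and Table 2.1 are easy to reproduce by brute-force enumeration. The short sketch below (illustrative only, not from the dissertation) does so for b = 4 and k = 16:

```python
from itertools import combinations

b, k = 4, 16

# Equation (2.1): number of error patterns within one b-bit window.
per_window = sum(1 for size in range(1, b + 1)
                 for _ in combinations(range(b), size))
assert per_window == 2 ** b - 1        # 15 patterns per window

# Equation (2.2): total useful syndromes, one window per information bit.
total = k * (2 ** b - 1)               # 240 for k = 16

# Table 2.1: patterns that involve a *fixed* bit (offset 0) in some window.
patterns = set()
for lo in range(-(b - 1), 1):          # slide the window over the fixed bit
    others = [off for off in range(lo, lo + b) if off != 0]
    for size in range(b):
        for rest in combinations(others, size):
            patterns.add(tuple(sorted((0,) + rest)))
print(len(patterns))                   # 20 patterns involve a given data bit
```

The 20 patterns correspond exactly to the rows of Table 2.1 (1 single, 6 double, 9 triple and 4 quadruple error placements involving bit i).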
2.2.2 Decoding Procedure
The decoding procedure for most methods is based on syndrome matching, i.e.,
each syndrome is mapped to the particular data bit(s) being in error. The error signal for a
given data bit is the OR of all syndromes that map to that bit; this indicates whether the
bit is in error, and if it is, the data bit is flipped. Thus, as the burst size or the block
length increases, the number of syndromes increases and so does the complexity of the
decoding circuit.
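A syndrome-matching decoder can be sketched as a table lookup. The toy example below is my own illustration, using a (7,4) Hamming code with single-bit patterns rather than a real burst code; the principle is identical:

```python
def build_syndrome_table(H_cols, patterns):
    """Map each correctable syndrome to the error vector that caused it."""
    table = {}
    for pattern in patterns:
        s, e = 0, 0
        for pos in pattern:
            s ^= H_cols[pos]
            e |= 1 << pos
        table[s] = e
    return table

def decode(word, H_cols, table):
    """word: received codeword as an integer; returns the corrected word."""
    s = 0
    for pos, col in enumerate(H_cols):
        if (word >> pos) & 1:
            s ^= col
    return word ^ table.get(s, 0)       # flip the bits the syndrome points to

# Single error correction with Hamming columns 1..7; patterns are singletons.
H = [1, 2, 3, 4, 5, 6, 7]
table = build_syndrome_table(H, [(i,) for i in range(7)])
codeword = 0b0000111                    # columns 1^2^3 = 0, a valid codeword
assert decode(codeword ^ (1 << 4), H, table) == codeword
```

For a b-bit burst code the table instead holds all k(2^b − 1) burst syndromes from equation (2.2), which is exactly the exponential growth in decoder hardware that motivates the proposed scheme.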
2.3 PROPOSED SCHEME
For a burst size of b, the key idea of the proposed codes is to partition the parity
check matrix in a manner such that the upper b rows of the parity check matrix are used to
compute the error pattern within the b-bit burst, and the lower (r-b) rows of the parity
check matrix are used to compute the location of the burst. To compute the error pattern,
the upper b-rows of the parity check matrix are organized in such a manner that for each
consecutive b-columns there is exactly one non-zero entry per row. This is easily done by
interleaving the 1s every b-columns. Next, the lower (r-b) rows of the parity check matrix
are constructed in such a manner that they satisfy the conditions for a b-bit burst error
correcting Hamming Code:
1. All columns of the parity check matrix are unique.
2. The bitwise XOR of any combination of columns within b-adjacent columns are also
unique.
Similar to burst error correcting Hamming codes, Condition-1 ensures that all
single errors are recognized and corrected. Condition-2 ensures that any combination of
errors within the burst size b are recognized and corrected. This condition is necessary
because MBUs do not necessarily flip consecutive bits, and there might be cases where the
MBU flips a few non-consecutive bits within a burst size. The general form of the parity
check matrix for a b-bit burst error correcting code has been shown in Fig. 2.1. In this
case H* refers to the lower (r-b) rows constructed as described previously.
Fig. 2.1: General form of the parity check matrix of proposed codes.
The proposed codes are also systematic by design. Thus, the encoding procedure
is the same as general Hamming codes. The parity bits are computed by XORing data-
bits based on the parity check matrix and are appended to the data-bits to form the final
codeword. Thus, no complexity is added to the encoding procedure.
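The construction just described can be sketched as a greedy search. This is my own reconstruction (the dissertation's actual algorithm may differ): each data column is a pair (upper, lower), the upper b rows are the fixed interleaved identities, and each lower part is the smallest value that keeps all burst-confined syndromes unique.

```python
from itertools import combinations

def burst_patterns(n, b):
    for start in range(n):
        width = min(b, n - start)
        for size in range(width):
            for rest in combinations(range(start + 1, start + width), size):
                yield (start,) + rest

def valid(cols, b):
    """Conditions of Sec. 2.3: unique columns, unique non-zero burst syndromes."""
    if len(set(cols)) != len(cols):
        return False
    seen = set()
    for pat in burst_patterns(len(cols), b):
        s_up = s_low = 0
        for p in pat:
            s_up ^= cols[p][0]
            s_low ^= cols[p][1]
        if (s_up, s_low) == (0, 0) or (s_up, s_low) in seen:
            return False
        seen.add((s_up, s_low))
    return True

def build_data_columns(k, b, lower_rows=8):
    cols = []
    for i in range(k):
        upper = 1 << (i % b)              # interleaved identity in upper b rows
        for lower in range(1 << lower_rows):
            if valid(cols + [(upper, lower)], b):
                cols.append((upper, lower))
                break
        else:
            raise ValueError("increase lower_rows")
    return cols

def encode(data_bits, cols):
    """Systematic encoding: check bits are XORs of data bits selected by H."""
    up = low = 0
    for bit, (u, l) in zip(data_bits, cols):
        if bit:
            up, low = up ^ u, low ^ l
    return list(data_bits), up, low       # data word plus the two parity parts
```

Re-verifying all patterns on every candidate keeps the sketch simple; a real construction would check incrementally, and the greedy choice of low-weight lower parts is what yields the sparse H-matrix discussed in Section 2.4.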
2.3.1 Decoding Procedure
The decoding procedure involves two parts: the error pattern computation and the
error location computation. The error pattern is directly computed from the upper b-rows
of the parity check matrix. Since the upper b-rows of the parity check matrix are arranged
in an interleaved fashion, any error within a b-bit burst produces a syndrome equal to that
of the error pattern. The location of the burst error is computed through the lower (r-b)
rows. If an error occurs within a b-bit burst starting from location i, the syndrome is given
by equation (2.3). Recall from Fig. 2.1 that H has the form

        | I_b  I_b  ⋯  I_b |
    H = |                  | I_r
        |        H*        |

Thus, the decoding works by considering each group of b-adjacent
columns as a single symbol and decoding on a per symbol basis. Any data bit di then is
part of b symbols of b bits each. An example is shown for a 4-bit burst error
correcting code in Fig. 2.2, where the data bit di is part of the 4-bit symbols Bi-3, Bi-2, Bi-1 and Bi. A
b-bit burst error simply means an error in one of the b-bit symbols, and thus can be
computed through equation (2.4). The error pattern of the data bit is simply the syndrome
value Sα where α is the row amongst the upper b-rows for which the corresponding
column is a 1. Thus, a data bit will not be in error only if it does not satisfy equation (2.4)
for all b-bit symbols of which it is a part.
          d_{i-3} d_{i-2} d_{i-1}  d_i  d_{i+1} d_{i+2} d_{i+3}
   ...       1       0       0      0      1       0       0      ...
             0       1       0      0      0       1       0
             0       0       1      0      0       0       1
             0       0       0      1      0       0       0
             0       0       1      1      1       0       0
             0       0       1      0      0       1       0
             1       0       1      1      0       0       1
             0       0       1      1      1       1       1
             0       1       1      0      1       1       1

   Symbols: B_{i-3} = (d_{i-3}, ..., d_i),   B_{i-2} = (d_{i-2}, ..., d_{i+1}),
            B_{i-1} = (d_{i-1}, ..., d_{i+2}), B_i = (d_i, ..., d_{i+3})

Fig. 2.2: Symbols data bit di is part of for a 4-bit burst error correcting proposed code.
    | S_1     |   | e_i                                                               |
    | S_2     |   | e_{i+1}                                                           |
    |  ⋮      |   |  ⋮                                                                |
    | S_b     | = | e_{i+b-1}                                                         |      (2.3)
    | S_{b+1} |   | h_{b+1,i} e_i ⊕ h_{b+1,i+1} e_{i+1} ⊕ ⋯ ⊕ h_{b+1,i+b-1} e_{i+b-1} |
    | S_{b+2} |   | h_{b+2,i} e_i ⊕ h_{b+2,i+1} e_{i+1} ⊕ ⋯ ⊕ h_{b+2,i+b-1} e_{i+b-1} |
    |  ⋮      |   |  ⋮                                                                |
    | S_r     |   | h_{r,i} e_i ⊕ h_{r,i+1} e_{i+1} ⊕ ⋯ ⊕ h_{r,i+b-1} e_{i+b-1}       |

    h_{b+β,i} S_1 ⊕ h_{b+β,i+1} S_2 ⊕ ⋯ ⊕ h_{b+β,i+b-1} S_b ⊕ S_{b+β} = 0,
        ∀ β ∈ {1, 2, …, r−b}                                                           (2.4)

The above method has a clear advantage over syndrome-matching based decoding
methods, specifically for larger burst sizes. This is because as the burst size increases, the
number of syndromes increases exponentially, as shown in equation (2.2). Thus, the
amount of hardware needed to decode also increases exponentially in syndrome matching
based decoding. But in the proposed decoding scheme, an increase in burst size results in
addition of another term in equation (2.4). This essentially means a linear increase in the
amount of XOR, AND and OR gates in the decoder design.
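The two-part decoding can be sketched as follows. This is an illustrative reconstruction using a small hand-built b = 2 example; for brevity, errors in the check bits are not modeled.

```python
def syndrome(error_positions, cols):
    """Syndrome produced by flipping the given data-bit positions."""
    up = low = 0
    for p in error_positions:
        up ^= cols[p][0]
        low ^= cols[p][1]
    return up, low

def locate_burst(up, low, cols, b):
    """Upper syndrome bits give the error pattern; the equation (2.4) test is
    applied at each window start to locate the burst. Returns erroneous bits."""
    if (up, low) == (0, 0):
        return set()
    n = len(cols)
    for start in range(n):
        # error bits implied by the interleaved-identity upper syndrome:
        flips = {start + j for j in range(min(b, n - start))
                 if (up >> ((start + j) % b)) & 1}
        pred_up = pred_low = 0
        for p in flips:
            pred_up ^= cols[p][0]
            pred_low ^= cols[p][1]
        if flips and (pred_up, pred_low) == (up, low):
            return flips
    return None                       # not confined to one b-bit burst

# A 2-bit burst correcting toy code for k = 4 (columns verified by hand):
cols = [(1, 1), (2, 1), (1, 2), (2, 3)]
assert locate_burst(*syndrome({1, 2}, cols), cols, 2) == {1, 2}
assert locate_burst(*syndrome({3}, cols), cols, 2) == {3}
```

Each additional window start contributes only a constant amount of XOR/compare work, which is the linear growth in decoder hardware described above.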
2.3.2 Area and Delay Optimization
In a general Hamming Code, there are two optimization criteria that can be used
on its parity check matrix to reduce the decoder complexity and decoder latency:
1. Minimize the total number of 1’s in each row of the parity check matrix.
2. Minimize the total number of 1’s in the parity check matrix.
The first criterion minimizes the decoder latency while the second criterion
reduces the decoder complexity. A similar optimization can be done for the proposed
codes as well but with a simple modification. Considering the general parity check matrix
of the proposed codes shown in Fig. 2.1, it can be seen that the upper b rows of the
matrix are already sparse and have the minimum number of 1's both on a row basis and in
total. Thus, the two optimization criteria of the Hamming code can be applied to the
lower (r-b) rows of the matrix to achieve the same optimizations. This is because,
minimizing the number of 1’s in each row simply reduces the number of bits that need to
be XORed together for the syndrome as well as for equation (2.4). Thus, the optimization
results in fewer XOR operations, thereby reducing the decoder delay. And minimizing the
total number of 1’s in the lower (r-b) rows of the parity check matrix reduces the total
number of XOR gates thus reducing the complexity. Thus, optimization techniques
proposed in [Adalid 15] are orthogonal to the proposed codes and can be used in place of
burst error correcting Hamming codes.
2.4 EVALUATION
The burst error correcting codes for different burst sizes were constructed by
extending the codes proposed in [Shamshiri 10]. The proposed codes and the general
burst error correcting Hamming codes were synthesized on Synopsys Design Compiler
using NCSU FreePDK45 45nm library for information length k = 16, 32 and 64 bits, and
burst sizes b = 3, 4, 5, 6 and 7 bits. For the general burst error correcting Hamming codes,
the codes in [Shamshiri 10] were used to construct the parity check matrix for different
burst sizes. But [Shamshiri 10] does not explicitly define a decoding procedure and simply
mentions the decoding procedure to be the same as a Hamming Code. Thus, the
syndrome matching based decoding from [Dutta 07] was extended to the appropriate
burst sizes for the constructed parity check matrix derived from [Shamshiri 10]. The
proposed codes and the general burst error correcting Hamming codes were implemented
using a dataflow model in Verilog, and errors were injected to ensure that all errors within
the burst size were corrected. Exhaustive testing was done for all different error patterns
within the burst size and for various locations of the burst errors. Table 2.2 shows the
comparison of redundancy, decoder area, number of cells in decoder and decoder latency,
between the proposed codes and the general burst error correcting Hamming codes, for
different information lengths and different burst error sizes.
2.4.1 Redundancy
Table 2.2 shows the comparison of redundancy for k = 16, 32 and 64 bits for
various burst error sizes b = 3, 4, 5, 6 and 7 bits. The redundancy column shows the
number of check bits required to correct the b-bit burst error for different information
lengths k. It can be seen that the redundancy in all the cases is either the same or very
close to the redundancy of the general burst error correcting Hamming codes. Thus, the
proposed codes add negligible amount of redundancy compared to the general burst error
correcting Hamming codes.
                Burst error correcting Hamming codes |        Proposed codes
  k   b  | r   Latency(ns)   #cells   Area(μm²)      | r   Latency(ns)   #cells   Area(μm²)
 16   3  | 8      0.89         302       807         | 8      0.79         225       892
 16   4  | 9      1            531      1327         | 9      0.93         252       977
 16   5  | 11     1.31        1015      2396         | 12     0.9          269       963
 16   6  | 13     1.33        1559      3570         | 13     1.23         339      1162
 16   7  | 15     1.5         2795      6360         | 15     1.16         276       921
 32   3  | 8      1.11         529      1480         | 9      1.08         450      1711
 32   4  | 10     1.29        1108      2759         | 10     1.15         506      2020
 32   5  | 12     1.59        2115      4983         | 12     1.29         674      2708
 32   6  | 14     1.9         4354      9915         | 14     1.4          727      2965
 32   7  | 15     2.39        7753     17482         | 16     1.27         650      2493
 64   3  | 9      1.47        1056      2967         | 10     1.18         858      3150
 64   4  | 11     1.89        2032      5171         | 11     1.22        1016      3657
 64   5  | 13     1.8         4102      9696         | 13     1.4         1270      4774
 64   6  | 14     2.51        8952     20296         | 14     1.58        1724      7022
 64   7  | 16     2.52       15958     36124         | 16     1.77        1748      7495

(k = number of data bits, b = burst error size in bits, r = number of check bits.)

Table 2.2: Comparison of redundancy, decoder latency, decoder area and decoder cell
usage between burst error correcting Hamming codes and Proposed Codes.
2.4.2 Hardware Complexity
Table 2.2 shows the comparison of decoder latency, number of cells as well as
decoder area for k = 16, 32 and 64 bits for various burst error sizes b = 3, 4, 5, 6 and 7
bits. It can be seen that the decoder area for general burst error correcting Hamming
codes increases almost exponentially with the increase in burst size. This in turn affects
the decoder latency as well since the gate depth also increases with the increase in burst
size. The decoder area for the proposed codes increases almost linearly with an increase
in burst size as shown in Figure 2.3, which plots the decoder area for both the general
burst error correcting Hamming codes and the proposed codes for different information
lengths k and different burst error sizes b. The decoder area is much less for the proposed
codes compared to the general burst error correcting Hamming codes for higher burst
sizes. This also leads to a slower increase in decoder latency compared to the general
burst error correcting Hamming codes. Thus, the decoder latency is also lower for higher
burst sizes compared to the general burst error correcting Hamming codes. It is also seen
that in the cases of k = 16 and 32, the decoder complexity and the decoder latency for
burst size b = 7 is lower compared to burst size b = 5, 6. This is because the H-matrix for
the proposed codes is constructed using a greedy algorithm, that selects the lowest
Hamming weight column that satisfies the conditions described in Sec. 2.3. This
inadvertently creates a very sparse H-matrix which reduces the decoder complexity.
Fig. 2.3: Comparison of decoder area (μm²) versus burst size b = 3-7 bits, for burst error
correcting Hamming codes and the proposed codes, for information lengths k = 16, 32 and 64.
2.5 CONCLUSION
This research work proposes a new burst error correcting code by modifying the
parity check matrix of a general burst error correcting Hamming code along with its
decoding procedure. The proposed codes achieve significant reduction in decoder area as
well as a non-negligible reduction in the decoder latency for a similar amount of
redundancy compared to a general burst error correcting Hamming Code. Also, the
proposed codes can be used in tandem with other burst error correction optimization
methods to achieve better decoder latency as well as better decoder complexity. Thus, the
proposed codes are highly suitable for correcting MBUs in SRAMs specifically for lower
technology nodes where the burst size from a single particle strike is expected to
increase.
Chapter 3: Layered-ECC: A Class of Double Error Correcting Codes
for High Density Memory Systems
3.1 INTRODUCTION
Soft errors are a major reliability concern for high density memories. Soft errors
can arise due to numerous mechanisms, e.g., particle strikes, marginal cells, etc. ECCs
can be used to tolerate such soft errors and overcome data corruption by adding parity or
check bits to each word. These bits are then evaluated after the read process to detect
and/or correct errors. DRAM scaling has been a challenge in recent years, but industry
has continued scaling DRAM to nanoscale technology nodes. Some current-generation
DRAMs are being manufactured as a 10 nm class of memories, which poses new
manufacturing challenges due to the smaller technology node. For DRAMs,
SEC-DED codes have traditionally been sufficient to protect against soft errors. But field
studies of more recent DRAM systems [Meza 15] have observed an increasing failure
rate with increasing DRAM chip density which necessitates the use of stronger ECCs.
In terms of flash memories, NAND flash has been very successful due to its
scalability and low cost per bit. Research has led to the development and manufacturing
of multilevel cell (MLC) NAND flash which stores multiple bits per cell [Lee 11]. Soft
errors in NAND flash memories are addressed using DEC Bose-Chaudhuri-
Hocquenghem (BCH) codes. These DEC BCH codes have complex decoding logic which
takes a high number of clock cycles to decode [Chien 64]. It is possible to parallelize the
decoding logic, but that incurs a significant area overhead.

(This chapter is based on the publication [Das 19]: A. Das and N. A. Touba, "Layered-ECC: A Class of
Double Error Correcting Codes for High Density Memory Systems," in Proc. of IEEE VLSI Test Symposium
(VTS), paper 7A.2, 2019. The author of this dissertation contributed to the conception of the research
problem, theoretical developments and experimental verification of the research work.)

The high complexity is mainly
due to Galois Field (GF) operations specifically for larger bits per symbol.
In this work, a layered double error correcting scheme is proposed. These codes
are constructed using two layers of parity bits. The first layer of parity bits is used to
prune down possible error locations through analysis of the computed syndrome of the
received word. The second layer of parity bits is used to compute syndrome bits that are
to be matched with the pruned down error locations and get the final bits that are in error.
The proposed schemes are shown to have a good tradeoff between data redundancy and
decoder complexity or latency. Two different decoding procedures are proposed which
either reduce the decoder complexity or the decoder latency. Thus, these codes can be
used for various classes of memories. The rest of the chapter is organized as follows.
Section 3.2 describes the existing schemes. Section 3.3 describes the proposed scheme
and the two types of decoding schemes. Section 3.4 evaluates the two decoding schemes
against the existing schemes. Section 3.5 provides a conclusion of this work.
3.2 RELATED WORK
An SEC-DED Hamming code [Hamming 50] has been traditionally used and is
still in use for protecting memories. These codes can correct a single error and rely on
syndrome matching based decoding i.e. the computed syndrome is directly matched to a
particular column of the parity check matrix and the corresponding bit is flipped. These
codes also detect all double errors but cannot correct them.
For double error correction, a binary BCH code is more prevalent. These codes
have low data redundancy but have high decoding complexity. Generally, a BCH code
has a serial decoder involving a Chien search algorithm. This algorithm iterates n times,
where n is the total number of bits in the codeword, to detect and correct a certain number
of errors. Over the years, numerous different approaches have been proposed to reduce
the time taken for decoding. Parallel Chien search has been proposed to perform p GF
multiplications in parallel [Chang 02]. This reduces the decoding time to n/p cycles
where p is the degree of parallelization. A p-parallel Chien search algorithm is shown in
Fig. 3.1. Low power architectures for a parallel Chien search has also been proposed in
[Yoo 16] using a two-step approach, which reduces power by reducing access to the
second step.
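For concreteness, a serial Chien search over GF(2^4) can be sketched as below. This is an illustrative fragment of my own, not the decoder of [Chang 02] or [Yoo 16]; a p-parallel version would evaluate p powers of α per clock instead of one.

```python
def gf_tables(poly=0b10011, m=4):
    """Exp/log tables for GF(2^m) with the given primitive polynomial."""
    exp, log = [0] * (2 ** m - 1), {}
    x = 1
    for i in range(2 ** m - 1):
        exp[i], log[x] = x, i
        x <<= 1
        if x >> m:
            x ^= poly
    return exp, log

def gf_mul(a, b, exp, log):
    if a == 0 or b == 0:
        return 0
    return exp[(log[a] + log[b]) % len(exp)]

def chien_search(lam, exp, log):
    """lam = [1, L1, ..., Lt]: error locator coefficients. Returns every i
    with Lambda(alpha^i) = 0, i.e. the candidate error locators."""
    n = len(exp)
    roots = []
    for i in range(n):                         # one iteration per field element
        acc = 0
        for j, c in enumerate(lam):            # evaluate Lambda at alpha^i
            acc ^= gf_mul(c, exp[(i * j) % n], exp, log)
        if acc == 0:
            roots.append(i)
    return roots

exp, log = gf_tables()
# Lambda(x) = (1 + a^3 x)(1 + a^7 x) = 1 + 3x + 7x^2 has roots a^8 and a^12:
assert chien_search([1, 3, 7], exp, log) == [8, 12]
```

The inner loop is one GF multiplication per coefficient per candidate, which is exactly the work a p-parallel architecture replicates p times to cut the cycle count to n/p.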
[Figure: p parallel evaluation branches, each built from a multiplexer, a register, and
constant GF multipliers (α, α², …, α^t; α^{p-1}, α^{2(p-1)}, …, α^{t(p-1)}; α^p, α^{2p}, …, α^{tp})
that evaluate Λ(α^i), …, Λ(α^{i(p-1)}), Λ(α^{ip}) from the coefficients Λ1, Λ2, …, Λt.]

Fig. 3.1: p-parallel Chien search architecture with short critical path
For DEC BCH codes, approaches which store possible double error syndromes in
a read-only memory (ROM) and evaluate the computed syndrome against the syndromes
in the ROM were proposed in [Lu 96]. [Naseer 08] proposed a direct decoding method
through syndrome matching for smaller data bit sizes. [Yoo 14] proposed a search-less
DEC BCH decoder which utilized look-up table (LUT) based computations to replace the
Chien search. But for more than a single error, all these decoding schemes either take
multiple cycles to decode using a serial decoding architecture or involve a significant
decoding latency and decoder area for parallel architectures.
Another class of codes that are suitable for random-access memories is the
majority logic decodable codes. Orthogonal Latin Square (OLS) codes are one of the best
examples for these types of codes [Hsiao 70]. These codes are modular in design and
have the basic parity check matrix structure as shown in equation (3.1). The submatrices
{M1, M2, … M2t} can be constructed from mutually orthogonal Latin squares. The basic
idea of the majority logic decoding scheme is that in the presence of t errors, each data bit
can be reconstructed from 2t independent sources excluding the data bit itself. Thus, there
are (2t + 1) independent sources for each data bit and in the presence of t errors, (t + 1) of
them will be uncorrupted. Thus, a majority vote will always be able to correct any t
errors. The disadvantage of these codes lies in the data redundancy required to construct
the 2t independent sources. But the decoding scheme itself is very simple and has very
low decoding latency. The DEC OLS codes have recently been used in SRAM based
FPGAs [Reviriego 16]. A second class of majority logic decoding using difference-set
codes was proposed in [Reviriego 12]. But for double error correction these codes only
support a single block size (data block size of 11, codeword size 21) and cannot be
extended to include different sizes. [Liu 18] recently used difference set codes to correct
data block sizes of 32 with some additional decoding complexity. But with only two data
block sizes, the application of such codes is limited.
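A minimal sketch of one-step majority-logic decoding for the single error correcting case (t = 1) is shown below. This is my own illustration, not the construction of [Hsiao 70]: the k = m·m data bits sit in an m x m grid with one parity bit per row and per column, so each data bit has 2t + 1 = 3 independent estimates (itself, its row group, its column group) and a majority vote corrects any single error.

```python
def encode(data, m):
    """k = m*m data bits in an m x m grid; one parity bit per row and column."""
    grid = [list(data[i * m:(i + 1) * m]) for i in range(m)]
    row_p = [sum(r) % 2 for r in grid]
    col_p = [sum(grid[i][j] for i in range(m)) % 2 for j in range(m)]
    return grid, row_p, col_p

def decode(grid, row_p, col_p, m):
    out = []
    for i in range(m):
        for j in range(m):
            votes = grid[i][j]                               # the bit itself
            # re-derive the bit from each parity group, excluding the bit:
            votes += (row_p[i] + sum(grid[i][x] for x in range(m) if x != j)) % 2
            votes += (col_p[j] + sum(grid[x][j] for x in range(m) if x != i)) % 2
            out.append(1 if votes >= 2 else 0)               # majority of 3
    return out

data = [1, 0, 1, 1, 1, 0, 0, 0, 1]
grid, row_p, col_p = encode(data, 3)
grid[1][2] ^= 1                       # inject a single-bit error
assert decode(grid, row_p, col_p, 3) == data
```

A DEC (t = 2) OLS code follows the same pattern with 2t = 4 parity groups per bit and a majority over 5 votes, at the cost of the extra check bits noted above.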
Multiple cell upsets (MCUs) can also occur in DRAMs wherein adjacent bits are
affected by a single particle strike. MCUs are generally addressed by using word
interleaving such that any MCU will at most cause a single error in any word.

        | M_1  |
        | M_2  |
    H = | M_3  | I_{2tm}                                                  (3.1)
        |  ⋮   |
        | M_2t |

[Das 18]
also addresses this issue by modifying Hamming codes to correct MCUs in SRAMs,
which can be extended to DRAMs as well. The focus of this work is on double random
errors only and not on MCUs.
A new class of DEC codes is proposed which is shown to have better data
redundancy at the expense of higher decoding complexity and higher decoding latency
compared to OLS codes. The proposed codes are also shown to have better decoding
latency and better decoding complexity, depending on the type of decoding logic,
compared to DEC BCH codes. But these benefits come at the cost of additional data
redundancy compared to DEC BCH codes.
3.3 PROPOSED SCHEME
The proposed scheme is made up of two layers of ECC. The first layer is based on
a single error correcting OLS code which prunes down possible error location candidates.
The second layer is constructed by adding columns to the parity check matrix such that
the following conditions are satisfied.
1. All columns added are unique and are not repeated.
2. The sum of any two columns of the complete parity check matrix should not be equal
to the sum of any other two columns of the parity check matrix.
The first condition ensures that double errors produce a syndrome which is non-
zero. The second condition ensures that the pruned candidate error locations do not produce the same syndrome, thus avoiding the chance of mis-correction.
The first layer of the parity check matrix is created using m groups of m data bits each,
where m = √k. The rest of the parity check matrix is created in an algorithmic manner
satisfying the two conditions mentioned above. An example of the parity check matrix
and the corresponding syndrome bits for k = 16 is shown in Fig. 3.2. The syndrome
computed from the parity check matrix in this case is also divided into layers. The upper
syndrome layer has 2m bits (S0 to S7 in Fig. 3.2) which help in selecting possible double
error candidates. The lower syndrome layer (S8 to S12 in Fig. 3.2) matches the possible
error candidates to the received syndrome to locate the erroneous bits.
Fig. 3.2: Parity check matrix of proposed scheme for k=16
Consider the simple example from Fig. 3.2 where bits d0 and d7 are in error. The
corresponding syndrome bits for this case are shown in equation (3.2). Considering the syndrome bits S0 through S3, S0 = 1 suggests that one of the bits amongst d0, d1, d2 and
d3 is in error. Similarly, S1 = 1 suggests that one of the bits amongst d4, d5, d6 and d7 is
also in error. Now, considering the syndrome bits S4 through S7, S4 = 1 narrows down the
possibility to either d0 or d4. Similarly, S7 = 1 narrows down the second error’s possibility
to d3 or d7. Thus, from the upper layer of syndrome bits, we narrowed the set of suspect
location pairs to {d0, d7} and {d3, d4}. We can now simply compute the XOR of both
pairs and match it against the second layer of syndromes. It can be easily verified that d0
⊕ d7 = S8:12, which means that bits d0 and d7 have flipped.
(See Fig. 3.2 for the complete parity check matrix; its rows produce the syndrome bits S0 through S12 over the columns d0–d15 and p0–p12.)

S0 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12
 1  1  0  0  1  0  0  1  1  0   0   0   1                                        (3.2)
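The narrowing step of this example can be sketched in Python. This is an illustrative fragment only; it assumes bit di belongs to row group i // m (checked by S0 through S3) and column position i % m (checked by S4 through S7), as in Fig. 3.2:

```python
# Illustrative narrowing of the upper-layer syndrome for k = 16 (m = 4),
# assuming bit d_i belongs to row group i // m and column position i % m.
m = 4
errors = {0, 7}                                    # d0 and d7 are flipped
S_rows = [int(any(e // m == g for e in errors)) for g in range(m)]
S_cols = [int(any(e % m == g for e in errors)) for g in range(m)]
rows = [g for g in range(m) if S_rows[g]]          # row groups holding an error
cols = [g for g in range(m) if S_cols[g]]          # column positions in error
# Two distinct rows and two distinct columns leave exactly two candidate
# pairs, which the lower-layer syndrome then disambiguates.
pairs = [{rows[0] * m + cols[0], rows[1] * m + cols[1]},
         {rows[0] * m + cols[1], rows[1] * m + cols[0]}]
assert pairs == [{0, 7}, {3, 4}]
```

Matching d0 ⊕ d7 against S8:12 then selects {d0, d7} as the erroneous pair.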
The encoding procedure of the proposed codes is a one-step low latency
procedure. Each parity bit can be computed in parallel by XORing all the data bits which
have corresponding 1’s in the row of the parity check matrix. The codeword can be
formed by appending the parity bits to the received data bits and the codeword can then
be stored in memory. For the decoding procedure, there are 3 major cases that need to be
considered. These include the cases of no error, a single error, and a double error in the
word. These three cases can be deciphered using a combination of the upper layer of 2m
syndrome bits as given by the below mentioned equations (3.3-3.6).
(The decoder feeds the syndrome S0:r-1 and the parity check columns C0:k-1 to a single error pattern generator and a double error pattern generator; the signals GXOR1, GXOR2, GOR1, GOR2 and Double_parity_error_b select between the no-error, single-error and double-error outputs e0:k-1.)

Fig. 3.3: Block diagram of proposed decoding logic
For single errors, since the upper 2m rows in the parity check matrix can also be
used as a single error correcting OLS code, a simple majority voting logic can be used to
correct single errors. For double errors, we propose a double error pattern generator which is triggered only for double error cases. The block diagram of the proposed decoding logic is shown in Fig. 3.3. Parity bits are not corrected since there is no specific need for them.

GOR1 = S0 | S1 | S2 | … | Sm−2 | Sm−1                                            (3.3)
GOR2 = Sm | Sm+1 | Sm+2 | … | S2m−2 | S2m−1                                      (3.4)
GXOR1 = S0 ⊕ S1 ⊕ S2 ⊕ … ⊕ Sm−2 ⊕ Sm−1                                          (3.5)
GXOR2 = Sm ⊕ Sm+1 ⊕ Sm+2 ⊕ … ⊕ S2m−2 ⊕ S2m−1                                    (3.6)

The different cases for the decoding logic are as follows:
1. GOR1 = GOR2 = 0: No error.
2. GXOR1 = GXOR2 = 1: This is only possible in case of a single error in one of the data
bits. For this case, a simple majority voting logic can be used.
3. GXOR1 = 1; GXOR2 = 0: This is possible either in case of a single parity error or a
combination of data bit error and a parity bit error. The decoding logic is the same for
either case. In this case, exactly one bit in an m-bit group (e.g., d0 to d3 is one group in Fig.
3.2) is in error and each column from the group is then matched to the lower syndrome to
get the correct error location.
4. GXOR1 = 0; GXOR2 = 1: Similar to case-3, this is also possible for either a single
parity bit error or a combination of single data bit error and a parity bit error. For this
case, a specific column number (whichever of the syndrome bits Sm through S2m-1 is 1) of
each group can create this syndrome. The column from each group is matched to the
lower layer syndrome in this case.
5. GXOR1 = 0; GXOR2 = 0: This is a case of a double error with both the errors in the
data bits. This involves 3 more cases which are distinguished as described below.
5.1 GOR1 = 1; GOR2 = 0: This indicates that two syndrome bits among S0
through Sm-1 are 1 while Sm:2m-1 = 0. Thus, m column pairs from each of the two groups
indicated by S0:m-1 are matched to the lower layer syndrome bits to get the correct pair
that is in error.
5.2 GOR1 = 0; GOR2 = 1: This indicates that two syndrome bits among Sm
through S2m-1 are 1 while S0:m-1 = 0. Thus, column pairs indicated by Sm:2m-1 from each of
the m groups are matched to the lower layer syndrome bits to get the erroneous pair.
5.3 GOR1 = 1; GOR2 = 1: This indicates errors in distinct groups. Based on the
syndrome bits S0:m-1, the groups are narrowed down. Then, based on the syndrome bits
Sm:2m-1, specific column pairs are found. Thus, there are exactly two pairs of columns that
need to be compared to the lower layer syndrome bits.
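The case analysis above can be sketched as follows (illustrative Python; the function and label names are ours):

```python
from functools import reduce
from operator import xor

# Illustrative classifier for the upper-layer syndrome bits using the
# GOR/GXOR signals of equations (3.3)-(3.6): the OR and XOR of each
# m-bit half of S0:2m-1.
def classify(S, m):
    half1, half2 = S[:m], S[m:2 * m]
    gor1, gor2 = int(any(half1)), int(any(half2))
    gxor1, gxor2 = reduce(xor, half1), reduce(xor, half2)
    if gor1 == 0 and gor2 == 0:
        return "no error"
    if gxor1 == 1 and gxor2 == 1:
        return "single data error (majority vote)"
    if gxor1 != gxor2:
        return "parity error or data + parity error"
    return "double data error (cases 5.1-5.3)"

# Upper-layer syndrome from the d0/d7 example of Fig. 3.2:
assert classify([1, 1, 0, 0, 1, 0, 0, 1], 4) == "double data error (cases 5.1-5.3)"
```

A real decoder computes these signals with one OR tree and one XOR tree per half, so the classification adds only a few gate delays.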
An example of each of the above cases for k = 16 with the parity check matrix in
Fig. 3.2, the corresponding syndrome bits S0:2m-1 and the possible errors or pairs of errors
has been shown in Table 3.1. The worst-case possibility for all the cases is comparing a
combination of m pairs of syndromes to the lower layer syndrome bits. Based on the
cases above, there are two types of decoding procedures that can be followed. The first is
a low latency decoding scheme, which enumerates all the cases above and is based on
syndrome matching for each case. The second is a low complexity decoding scheme
which instead of operating on individual columns operates on the index of columns
instead. This lowers the complexity of the decoding logic as well as the number of cycles
needed for decoding compared to a serial decoding BCH scheme.
Syndrome Bits S0:7   Possible (pairs of) error candidates (data bits only, parity bits ignored)
00000000             No error
00010001             Single data error (majority vote)
00000001             Possible single error in {d3, d7, d11, d15}
00100000             Possible single error in {d8, d9, d10, d11}
10000011             Possible single error in {d2, d3}
11000001             Possible single error in {d3, d7}
11000000             {(d0, d4), (d1, d5), (d2, d6), (d3, d7)}
00000110             {(d1, d2), (d5, d6), (d9, d10), (d13, d14)}
11000011             {(d2, d7), (d3, d6)}
Table 3.1: Example of syndrome values and error candidates for different error types
Apart from the above cases, 2 additional check bits are required to distinguish
between a single error and a double error in parity. The two check bits basically compute
the parity of each group of parities to detect errors in the parity bits. In theory, it is also
possible to extend these codes to multiple bits per symbol and construct a symbol based
layered ECC. This can be useful for emerging multilevel cell memories or even for
double byte error correction.
3.3.1 Low Latency Decoding
The low latency decoding procedure involves a syndrome matching based
decoding. The upper layer syndromes S0:2m-1 are directly enumerated and depending on
these syndrome values, a certain number of combination of pairs of syndromes are
matched to the lower layer of syndromes. The critical path in this case depends on the
selection of a particular combination of the upper layer of syndromes. The total number
of relevant syndromes in a double error correcting code [Naseer 08] is given by equation
(3.7), where n is the total number of bits in the codeword and k is the number of data bits.
Comparatively, the low latency decoding procedure has fewer syndromes to be
enumerated as shown in equation (3.8). The comparison of the total number of
syndromes for different data bit sizes has been shown in Fig. 3.4.
As the number of data bits increases, the number of syndromes grows considerably, which significantly increases the decoding complexity. But since the syndrome matching is done in parallel, the rise in decoding latency is much slower. This type of decoding method is mostly suited to random-access memories, which need low decoding latency and high throughput.
#syndromes = k · n                                                               (3.7)

#syndromes = 2m · C(m,1) + 2m · C(m,2) + 2 · C(m,2)²                             (3.8)
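The two counts can be tabulated as below. This is a hedged sketch: it assumes equation (3.7) is the product k·n and equation (3.8) is 2m·C(m,1) + 2m·C(m,2) + 2·C(m,2)², covering cases 3/4, cases 5.1/5.2 and case 5.3 respectively:

```python
from math import comb

# Hedged tabulation of the syndrome counts; the two formulas below are our
# reading of equations (3.7) and (3.8), not a verified transcription.
def syndromes_bch(k, n):
    return k * n

def syndromes_proposed(k):
    m = int(k ** 0.5)
    return 2 * m * comb(m, 1) + 2 * m * comb(m, 2) + 2 * comb(m, 2) ** 2

# k = 16 (m = 4): far fewer syndromes than a DEC BCH code with n = 26.
assert syndromes_proposed(16) == 2*4*4 + 2*4*6 + 2*36
assert syndromes_proposed(16) < syndromes_bch(16, 26)
```
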
Fig. 3.4: Comparison of number of syndromes for different data bit sizes between
proposed scheme and a DEC BCH code
3.3.2 Low Complexity Decoding
An alternative decoding procedure is the low complexity decoding procedure.
Compared to the low latency decoding, this procedure operates on the indices of data bits
instead of the columns of parity check matrix. Based on the different cases described
previously, the first index or pair of indices is computed. An addition factor is also computed based on the upper layer of syndromes. This addition factor constructs all other m possible error indices by adding to the previous index or pair of indices. For the case of a double error in separate groups, i.e., case 5.3, the 2 possible pairs of indices are
directly assigned. The m possible error indices can be computed in m clock cycles which
lowers the decoding time considerably. Once all the indices are computed, the
corresponding columns are then matched with the lower layer of syndromes to get the
final error location or pair of locations. Fig. 3.5 shows the partial block diagram of a
double error pattern generator for the low complexity serial decoding procedure.
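The serial matching loop can be sketched as follows. This is a hypothetical fragment: the addition factor, the toy column map and the function names are illustrative assumptions, not the exact hardware behavior:

```python
# Hypothetical sketch of the serial candidate loop: starting from an initial
# index pair, an assumed addition factor generates one candidate pair per
# cycle; each pair's column XOR is matched against the lower-layer syndrome.
def serial_match(initial, factor, m, column_of, lower_syndrome):
    pair = initial
    for _ in range(m):                           # at most m clock cycles
        if column_of(pair[0]) ^ column_of(pair[1]) == lower_syndrome:
            return pair                          # erroneous pair located
        pair = (pair[0] + factor[0], pair[1] + factor[1])
    return None                                  # no candidate matched

# Toy column map (hypothetical 5-bit patterns, one per data bit index):
cols = {i: (i * 37 + 11) % 32 for i in range(16)}
assert serial_match((0, 4), (1, 1), 4, cols.get, cols[2] ^ cols[6]) == (2, 6)
```

Only one pair is examined per cycle, which is what bounds the worst case at m cycles rather than the full syndrome enumeration of the parallel decoder.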
(The serial error pattern generator computes an initial index or pair of indices from S0:2m-1, accumulates the addition factor through a register and multiplexers to produce Index[i] and Index[m+i], and a getColumn block matches each candidate against the lower layer syndrome S2m:r-1 to drive the error outputs.)
Fig. 3.5: Block diagram of error pattern generation using the serial low complexity
decoding procedure
3.4 EVALUATION
The proposed scheme was implemented using Verilog for both the low latency
and low complexity decoder versions for different data bit sizes. The codes were
synthesized using the NCSU FreePDK45 45nm library and Synopsys Design Compiler.
OLS codes were also implemented and synthesized for different data bit sizes. A double
error correcting BCH code with syndrome matching based decoding [Naseer 08] was
implemented and synthesized as well. The proposed decoding schemes were compared to
the existing schemes in terms of decoder complexity, decoder latency, total dynamic
power consumption and data redundancy. Table 3.2 shows the comparison of the low
latency decoding scheme with OLS codes and the syndrome matching based DEC BCH
codes [Naseer 08]. The redundancy or number of check bits is given by r. As seen in
Table 3.2, OLS codes have the lowest decoder latency but it comes at the expense of very
high data redundancy. DEC BCH codes on the other hand have very low redundancy but
have considerably higher decoder latency instead. The proposed low latency decoding
scheme balances between the two existing schemes. The data redundancy of the proposed
scheme is much less than OLS codes. Similarly, the decoding latency and dynamic power
consumption of the proposed scheme is much lower than the BCH codes. The decoder
area of the proposed scheme is also lower than that of the existing BCH codes since it uses a smaller number of syndromes.
The scheme in [Wilkerson 10] was also implemented and synthesized to make a
comparison to the proposed low complexity serial decoder. For the DEC BCH code, the
error location computation was done via direct error location polynomial computation i.e.
the error location polynomial coefficients were computed directly. Table 3.3 shows the
comparison of the low complexity decoding scheme with the existing decoding scheme
of BCH codes in [Wilkerson 10]. The additional check-bit compared to BCH codes
comes from the computation of a parity bit. The decoding scheme is a one-step decoding
procedure for single errors and takes multiple clock cycles for double errors. As seen in
Table 3.3, the decoding cycle count, area and power consumption of the proposed decoder are significantly lower. The proposed codes incur a redundancy overhead in order to provide the benefits of fewer decoding cycles, lower power consumption and lower decoder complexity.
           OLS codes                          BCH codes [Naseer 08]                Proposed Low Latency Decoder
Data   #Check    Area      Latency   Pdyn   #Check     Area      Latency   Pdyn   #Check     Area      Latency   Pdyn
bits    bits    (μm2)       (ns)     (mW)    bits     (μm2)       (ns)     (mW)    bits     (μm2)       (ns)     (mW)
 16      16     647.63      0.50     0.53     10     3567.15      1.71     2.05     15     1576.38      0.98     0.77
 32      28    1307.94      0.63     1.19     12     9249.43      2.17     6.15     20     5117.25      1.09     1.45
 64      32    2599.92      0.78     2.98     14    27213.77      2.78    15.18     25    12205.08      1.32     2.48
128      64    5182.95      1.09     6.51     16    80312.72      3.60    23.35     34    51656.79      1.78     4.77
256      64   10280.02      1.20    15.45     18   252717.11      5.09    29.61     45   139603.61      1.60    10.39

Table 3.2: Comparison of proposed low latency decoder with existing schemes
        Serial BCH Codes [Wilkerson 10]              Proposed Low Complexity Serial Decoder
Data    #Check    Area      Latency     Pdyn       #Check    Area      Latency     Pdyn
bits     bits    (μm2)     (cycles)     (mW)        bits    (μm2)     (cycles)     (mW)
 16       11     2341.34      27        2.59         15     1236.61       4        1.19
 32       13     3800.39      45        4.50         20     2162.53       6        1.99
 64       15     6141.73      79        7.14         25     3471.88       8        2.90
128       17    10800.00     145       12.48         34     6168.48      12        4.99
256       19    18927.34     275       18.42         45     5925.38      16        4.58
512       21    36757.92     533       32.79         61    10389.83      23        7.65

Table 3.3: Comparison of proposed low complexity serial decoder with existing schemes
3.5 CONCLUSION
In this research work, a new class of layered double error correcting codes is
presented along with two different decoding procedures for the codes: a low latency
decoding scheme based on syndrome matching and a low complexity decoding scheme
based on error location index evaluation. The schemes are compared with existing
schemes and are shown to provide a good trade-off between data redundancy and
decoding latency or complexity depending on the type of decoder logic. The low latency
scheme can be used for high density random-access memories while flash memories can
benefit from the low complexity scheme. The low complexity serial decoding scheme
achieves orders of magnitude reduction in worst case number of decoding cycles and can
be helpful in enabling high throughput for flash memory-based systems. Thus, these
codes are able to provide a balanced data redundancy and decoder latency/complexity
tradeoff and can be used for high density memory systems.
Chapter 4: Limited Magnitude Error Correction using OLS Codes for
Memories with Multilevel Cells
4.1 INTRODUCTION
As discussed in Chapter 1, MLC PCM are very useful for their high density, cost-
effectiveness, non-volatility and portability. But the issue of resistance drift in MLC PCM causes a reliability concern. Resistance drift causes the resistance of a PCM cell to increase over time due to the phenomenon of structural relaxation. This degrades read reliability over time, i.e., the resistance level of a cell is mis-read as some higher resistance level than the one originally written. A
limited magnitude error simply means that the error magnitude is finite. Resistance drifts
can be thought of as a limited magnitude error model because the increase in resistance
level over time is finite and the maximum increase in resistance level is limited by time.
Limited magnitude errors have been a focus of research for many years. An
asymmetric magnitude-1 error correcting code was first proposed by using even cell
levels in [Ahlswede 02]. An error correction code for limited magnitude errors in general
was introduced in [Cassuto 10] which used a special parity mapping function and non-
binary Hamming code which causes a considerable increase in the decoding latency.
Systematic limited magnitude error correction with very low redundancy using a non-
binary Hamming code was introduced in [Klove 11]. But the codes can correct only a
single error. [Jeon 12] introduced bidirectional limited magnitude error correcting codes
using Reed-Solomon Codes. But these codes are originally intended for MLC flash
This chapter is based on the publication [Das 17]: A. Das and N. A. Touba, "Limited Magnitude Error
Correction using OLS Codes for Memories with Multilevel Cells," in Proc. of IEEE International
Conference on Computer Design (ICCD), pp. 391-394, 2017. The author of this dissertation contributed to
the conception of the research problem, theoretical developments and experimental verification of the
research work.
memories and suffer from high decoding complexity beyond a single error correction
spanning multiple clock cycles. Thus, they are not useful for MLC PCM, whose performance is sensitive to decoding latency. [Namba 15a] introduced symbol error
correcting OLS codes which extended binary OLS codes to multi-bit symbols.
This research work proposes limited magnitude error correcting OLS codes,
which significantly reduce data and hardware redundancy compared to existing schemes
while still detecting limited magnitude errors. It also reduces decoder latency compared
to the other methods mentioned previously. The proposed code ensures that t errors of
limited magnitude L can be corrected with some additional hardware on top of the
decoding logic of the general OLS codes. This work also proposes a new hybrid error
correction code which combines OLS codes with a binary low redundancy code for
symmetric and bidirectional limited magnitude errors. The rest of the work is organized
as follows. Section 4.2 reviews orthogonal Latin square codes. In Sec. 4.3, the proposed
OLS codes for limited magnitude errors are described with its encoding and decoding
procedures. Section 4.4 evaluates the error correction capabilities and hardware
complexity of the proposed code. Finally, Sec. 4.5 presents the conclusion of this work.
4.2 ORTHOGONAL LATIN SQUARE CODES
OLS codes are based on Latin squares [Hsiao 70]. A Latin square of size m is an m x m square matrix in which each row and each column is a permutation of the numbers 0 to (m-1), i.e., each number appears exactly once in each row and column. Two Latin squares are orthogonal if, when they are superimposed on each other, every ordered pair of elements in the superimposed matrix is unique. The underlying
principle of a t-error correcting OLS code is that there are (2t+1) independent sources for
re-constructing each data bit. These (2t+1) independent sources involve the data bit itself
and 2t parity check equations. These equations are orthogonal in the sense that, apart from the data bit being decoded, any other data bit participates in at most one of them. Thus, for any number of errors ≤ t, at most t sources are corrupted. The
remaining (t+1) sources remain uncorrupted from errors. A majority logic decoding
simply picks the binary value which occurs in the majority of its inputs. As a result,
the majority vote of (2t+1) independent sources with t-errors still yields the correct data-
bit. OLS codes have k = m2 data bits, where m is the size of the Orthogonal Latin Square.
The number of check bits is 2tm where t is the maximum number of errors that the code
can correct. OLS codes are modular in design, which means that to correct additional
errors, adding 2m check bits for each error is sufficient.
(Decoder logic for d0: a majority vote over d0 itself and the two parity check equations d1 ⊕ p0 and d2 ⊕ p2 yields the corrected d0.)

         d0 d1 d2 d3 p0 p1 p2 p3
         1  1  0  0  1  0  0  0
    H =  0  0  1  1  0  1  0  0
         1  0  1  0  0  0  1  0
         0  1  0  1  0  0  0  1

Fig. 4.1: Parity check matrix and decoder logic for a SEC (8,4) OLS Code
The encoding procedure involves the computation of parity bits. This is the XOR
operation of all the data bits which are 1 in the row of the parity check matrix for which
the parity bit is 1. The decoding procedure involves the majority vote between the data bit
itself and the 2t parity check equations constructed from the rows of the parity check
matrix. Thus, the decoder for data bit di will have parity check equations from each row
of the parity check matrix for which the column di is a 1. The main advantage of the OLS
codes is the simplicity of the decoder circuit which makes it very useful for memories
with random accesses. The majority logic decoding circuit has very low latency thereby
increasing decoding speed and enabling faster read operations. An example of the parity
check matrix for SEC code for k = 4 (i.e. m = 2) and its decoder logic for data bit d0 has
been shown in Fig. 4.1.
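The construction described above can be sketched in Python; the helper name is ours, but the m = 2 output reproduces the parity check matrix of Fig. 4.1 exactly:

```python
# Sketch: construct the parity check matrix of a SEC OLS code (t = 1) with
# k = m*m data bits and 2m check bits. The first m rows group bits by
# i // m, the next m rows by i % m, and an identity covers the parity bits.
def ols_sec_H(m):
    k = m * m
    data_rows = [[1 if i // m == g else 0 for i in range(k)] for g in range(m)]
    data_rows += [[1 if i % m == g else 0 for i in range(k)] for g in range(m)]
    identity = [[1 if c == r else 0 for c in range(2 * m)] for r in range(2 * m)]
    return [d + e for d, e in zip(data_rows, identity)]

# Columns d0 d1 d2 d3 p0 p1 p2 p3, matching Fig. 4.1:
assert ols_sec_H(2) == [[1, 1, 0, 0, 1, 0, 0, 0],
                        [0, 0, 1, 1, 0, 1, 0, 0],
                        [1, 0, 1, 0, 0, 0, 1, 0],
                        [0, 1, 0, 1, 0, 0, 0, 1]]
```

For t > 1 the modular property applies: each additional error to be corrected appends 2m further rows built from another pair of orthogonal Latin squares.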
4.3 PROPOSED SCHEME
Limited magnitude errors can be both asymmetric and symmetric. Symmetric
errors are errors of limited magnitude wherein the errors can occur in both directions i.e.
errors can have both positive and negative magnitude. Asymmetric limited magnitude
errors are errors limited to one direction only. A bidirectional error model also assumes
errors in both directions but of possibly different magnitudes depending on the direction. The
proposed limited magnitude OLS codes aim to deal with all 3 types of limited magnitude
errors and provide a general solution. The key idea is to use parity symbols made up of bp
bits per symbol. The number of bits per parity symbol is given by equation (4.1), where L
= (maximum magnitude of error – minimum magnitude of error).
Thus, if we want to consider asymmetric magnitude-3 errors, L = (0 – (-3)) = 3
and number of bits per parity symbol = 2. The number of bits per parity symbol bp are
sufficient to successfully encode and decode all possible transitions that can occur due to
limited magnitude errors. The encoding procedure is an extended version of the binary
OLS codes. Since we consider parity symbols of bp bits per symbol, we compute the
parity symbol by independently computing each parity bit from the corresponding lower
order bp data bits. These bp bits for each parity symbol are sufficient to detect and correct
bp = ⌈log2(L + 1)⌉                                                               (4.1)
any errors of magnitude within the range L. An example of bi-directional transitions has
been shown in Fig. 4.2.
Fig. 4.2: Bidirectional (magnitude-1 upwards and magnitude-2 downwards) error
transitions on lower order 2 bits.
Consider a 4 bits/cell PCM with 16 different resistance levels. If an asymmetric
magnitude-3 error model is considered, the number of bits per parity symbol bp = 2.
Thus, bit-0 of the data symbols are used to compute the lower order parity bit (pLSB),
while bit-1 of the data symbols are used to compute the higher order parity bit (pMSB)
using the same parity check matrix. The parity bits are then concatenated in a 4-
bits/symbol manner for storage in memory cells. This allows us to encode 2 parity
symbols of 2 bits each into a single memory cell.
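The parity symbol sizing of equation (4.1) can be sketched as follows (illustrative; the ceiling is assumed for error ranges where L + 1 is not a power of two):

```python
from math import ceil, log2

# Bits per parity symbol for a limited magnitude error range L (eq. 4.1);
# the ceiling is our assumption for non-power-of-two L + 1.
def bits_per_parity_symbol(L):
    return ceil(log2(L + 1))

assert bits_per_parity_symbol(3) == 2   # asymmetric magnitude-3 example
assert bits_per_parity_symbol(7) == 3
```
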
OLS codes use majority logic decoding to compute the correct value of each bit of the codeword. The proposed decoding procedure is an extension of the original OLS codes.
From the codeword, the lower order bp bits from each symbol are used for the majority
voting decoder circuit. The majority voting circuit recovers the correct lower order bp bits
for each data symbol. But the limited magnitude error may change the higher order bits as
well. We use the decoded lower order bp bits and the received bits to figure out the
direction as well as the magnitude of the error for each data symbol. The decoding
complexity of the proposed decoder for OLS codes increases by a few gates for the error
magnitude computation and a simple adder. As the number of errors corrected t increases,
the number of parity checks for the majority logic decoding increases to 2t parity checks.
But the logic after the majority voter remains the same independent of the number of
errors being corrected.
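The magnitude recovery step can be sketched as below. This is an illustrative fragment assuming upward-only (asymmetric) drift of magnitude at most 2^bp − 1; the function name is ours:

```python
# Illustrative magnitude recovery: the wrap-around difference between the
# received lower bp bits and the majority-voted correct lower bp bits gives
# the drift magnitude, which an adder/subtractor then removes.
def correct_drift(received, correct_low, bp):
    low = received & ((1 << bp) - 1)
    magnitude = (low - correct_low) % (1 << bp)  # drift size from lower bits
    return received - magnitude

# 4 bits/cell, L = 3: stored level 5 (0101) drifts up by 3 and is read as 8.
assert correct_drift(8, 0b01, 2) == 5
```

Only the subtraction and a few gates sit after the majority voter, which is why the extra latency over plain OLS decoding is small.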
(Encoder: bit 0 of data symbols d0 through d15 forms the LSB parity bits and bit 1 forms the MSB parity bits; the resulting pi,LSB and pi,MSB are packed two parity symbols per cell, e.g., parity symbol PS0. Decoder for d0: one majority voter over d0,0 and its parity check equations involving p0,LSB, d1,0, d2,0, d3,0, p4,LSB, d4,0, d8,0, d12,0 recovers d0,0, a second voter recovers d0,1, and combinational logic plus an adder derive and apply the error magnitude to the received d0.)

Fig. 4.3: Example of encoder and decoder logic for the proposed scheme
The parity symbol computation for the first parity symbol p0 for k = 16, t = 1
asymmetric magnitude-3 errors and 4-bits/cell memory as well as the decoder circuit for
data symbol d0 has been shown in Fig. 4.3. OLS codes by construction are such that only
a few parity bits are used in the correction circuitry of the data bits. But if a cell
containing encoded parity symbol gets affected, it can potentially cause multiple parity
symbol errors, since multiple parity symbols are stored in each cell. Thus, the only
restriction that needs to be put is that different parity symbols belonging to the same
majority voting decoder circuit should not be adjacent to each other in the same memory
cell containing encoded parity symbol. Since the parity check matrix in an OLS code has
its parity bits in a staggered manner, such restrictions are not violated that often. In cases
where this restriction is violated, reordering of parity bits can be done or dummy bits,
which do not contribute to any of the decoding logic, can be introduced in the symbols
that violate the restriction. Since only a part of the data symbol is used for parity, the
code rate also increases. The modified code rate equation for k = m2 data bits, t errors and
bp parity bits per symbol in a memory with b bits per cell is shown in equation (4.2).
4.3.1 Redundancy Optimization
For symmetric and bidirectional limited magnitude errors, the maximum
magnitude in either direction can still be decoded from the lower bu bits as shown in
equation (4.3). Lu is the maximum magnitude of error in either direction. To decode the
direction of error, the higher order bit bh in position (bu + 1) is sufficient. Thus, it is not necessary to correct the higher order bit bh; simply detecting whether it has flipped is sufficient. Hence, for the bh bit of each data symbol, an ECC with reduced
capability can be employed to indicate the direction of error. The task of finding out
whether an error has occurred or not in a position is done by the lower bu bits. The bh bit
gives an indication of the direction of error. If the lower bu bits do not indicate an error,
the bh bit is ignored. This code is called Hybrid code from here-on for easy distinction.
The different cases for reduced error correction capabilities are described below.
Case-1 SEC: Since there is only a single error that can be corrected, the parity can
simply be XOR of all bh bits of the data symbols in a SEC code. Thus, it can be known
that the higher order bit has flipped or not through a single parity bit.
Case-2 DEC or greater: For a code that corrects t errors, the bh bits can be
encoded with any code that corrects (t-1) errors and detects t errors. The (t-1) error
correction capability can indicate whether the bh bit has changed at any given position. If
there are t errors, then it is detected whether the bh bits at all the error positions have
changed. The code chosen can be one with low data redundancy (e.g., a BCH code) to reduce the overall number of parity symbols. But this lower redundancy comes at the expense of a higher decoding latency.

Rc = k / (k + 2tm · bp / b)                                                      (4.2)

bu = ⌈log2(Lu + 1)⌉                                                              (4.3)
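The code rate of equation (4.2) can be computed as follows (illustrative sketch; the helper name is ours):

```python
from math import isqrt

# Code rate per equation (4.2): k = m*m data symbols, 2tm parity symbols of
# bp bits each, packed into memory cells of b bits.
def code_rate(k, t, bp, b):
    m = isqrt(k)
    return k / (k + 2 * t * m * bp / b)

# 4 bits/cell, k = 16, t = 1, bp = 2: the 8 parity symbols fill 4 cells,
# so the rate is 16 / (16 + 4).
assert abs(code_rate(16, 1, 2, 4) - 0.8) < 1e-12
```
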
For example, consider symmetric limited magnitude-1 errors and a TLC memory
(3-bits/cell). Consider 2 data symbols d0 = 6(110) and d1 = 4(100) amongst many other
symbols. Since limited magnitude is 1, the possible errors for both the data symbols have
been shown in Fig. 4.4.
Fig. 4.4: Symmetric magnitude-1 errors for d0 and d1
The LSB (b0) of the data symbols is used for error location using OLS. The bit b1
is used for indicating direction of error. Since it is a DEC code, SEC-DED Hamming
code is used to encode bit b1 of all the data symbols. Now, if only symbol d1 changes
from 4(100) to 3(011). Then OLS codes of LSB locates the error. The SECDED code can
point out whether bit b1 of symbol d1 has changed or not. Since in this case it has, the
correction circuitry simply adds +1 to received symbol. Now consider the case, where
both data symbols d0 and d1 change to d0 = 5(101) and d1=5(101). In this case, SECDED
Hamming can indicate that bit b1 of d0 has changed while that of d1 hasn’t. Thus, +1 is
added to d0 while -1 is added to d1.
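The worked example above can be sketched in Python. This illustrative fragment assumes the OLS code on b0 has already located the erroneous symbol and that the SEC-DED code reports only whether its b1 bit flipped:

```python
# Illustrative hybrid decoding for symmetric magnitude-1 errors: of the two
# candidate originals (received - 1 and received + 1), exactly one is
# consistent with the reported b1 flip status, since values differing by 2
# always differ in bit b1.
def correct_hybrid(received, b1_flipped):
    for candidate in (received - 1, received + 1):
        flipped = ((candidate >> 1) & 1) != ((received >> 1) & 1)
        if flipped == b1_flipped:
            return candidate
    return received                              # unreachable for magnitude-1

assert correct_hybrid(3, True) == 4     # d1: 4 (100) -> 3 (011), add +1
assert correct_hybrid(5, True) == 6     # d0: 6 (110) -> 5 (101), add +1
assert correct_hybrid(5, False) == 4    # d1: 4 (100) -> 5 (101), add -1
```

All three assertions mirror the d0/d1 cases worked through in the text.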
The encoding procedure involves computing 2 sets of parity. One set is for lower
bu bits which is fed to an OLS parity generator. The other set is for the bh bit of each data
symbol which goes to the reduced capability ECC parity generator. Similarly, the
decoding procedure needs to now have two sets of decoding logic, one for OLS codes
and another for the reduced capability ECC. For both the proposed OLS codes and
Hybrid codes, the operations are binary. Thus, the decoding speed is faster compared to a
non-binary Hamming code, as used in [Cassuto 10], which uses GF(q) operations and is
dependent on the magnitude of error. [Cassuto 10] also uses a special parity mapping to
prevent multiple parity errors when one symbol is affected. OLS codes dot not have any
such need for a special mapping function. For bidirectional and symmetric errors, a two-
step decoding procedure to recognize miscorrected symbols and correct them in the
second step in [Cassuto 10]. The proposed codes work without any extra decoding step
regardless of the type of error.
4.4 EVALUATION
The conventional symbol error correcting OS-MLD code [Namba 15a], the
proposed OLS code and the redundancy optimized Hybrid code were synthesized on
Synopsys Design Compiler using the NCSU FreePDK45 library for k = 16, 64 and 256
SEC and DEC codes for different error magnitudes. Table 4.1 gives the comparison of
the data redundancy, number of cells used in the decoder logic and the decoder latency
for asymmetric magnitude-3 errors in 3-bits/cell and 4-bits/cell memory. Table 4.2 makes
a similar comparison for symmetric magnitude-1 errors in 3-bits/cell memory. Table 4.3
makes the comparison for symmetric magnitude-3 error in 4-bits/cell memory. Errors
were injected for each implementation to ensure that limited magnitude errors of all types
were corrected. Exhaustive testing was done for different error magnitudes, and for all
the tests the data symbols were successfully decoded to match the original message
symbols.
From Table 4.1, it can be seen that the proposed OLS codes have lower
redundancy as well as lower decoder area compared to standard OS-MLD codes. It is
also seen that the proposed codes have a slightly increased decoder latency. This is
because of the additional combinational logic and the adder which increases the critical
path of the decoding logic. But the number of majority voters is reduced since only a few bits from each symbol are used to construct the parity. This leads to a smaller decoder area for the proposed codes since the area of the combinational logic and adder is less than that of the additional XOR gates and majority voters in [Namba 15a]. Thus, the proposed codes
are able to achieve lower redundancy and decoder area at the trade-off of slightly
increased decoder latency.
 Bits/cell | Error     | Data    |    OSMLD [Namba 15a]        |        Proposed OLS
 in memory | Type (t)  | symbols | #check  Area (μm²)  Lat.(ns)| #check  Area (μm²)  Lat.(ns)
 ----------+-----------+---------+-----------------------------+-----------------------------
     3     | t=1 (SEC) |   16    |    8     1062.96     0.33   |    6     1111.30     0.58
           |           |   64    |   16     5193.74     0.54   |   12     5122.88     0.81
           |           |  256    |   32    22438.17     0.74   |   22    19586.24     0.89
           | t=2 (DEC) |   16    |   16     2645.44     0.44   |   12     1936.80     0.55
           |           |   64    |   32    12093.39     0.65   |   24     8941.57     0.80
           |           |  256    |   64    52015.33     0.81   |   44    38982.87     0.97
     4     | t=1 (SEC) |   16    |    8     1417.29     0.33   |    4     1224.40     0.62
           |           |   64    |   16     6924.99     0.54   |    8     5573.88     0.84
           |           |  256    |   32    22438.17     0.74   |   16    21360.19     0.95
           | t=2 (DEC) |   16    |   16     3527.26     0.44   |    8     2071.49     0.71
           |           |   64    |   32    16004.07     0.65   |   16     9265.86     0.80
           |           |  256    |   64    69345.64     0.82   |   32    40828.16     1.02

Table 4.1: Comparison of OSMLD and proposed codes for asymmetric magnitude-3 error
 Error     | Data    |   OSMLD [Namba 15a]         |      Proposed OLS           |      Hybrid Codes
 Type (t)  | symbols | #check  Area (μm²)  Lat.(ns)| #check  Area (μm²)  Lat.(ns)| #check  Area (μm²)  Lat.(ns)
 ----------+---------+-----------------------------+-----------------------------+-----------------------------
 t=1 (SEC) |   16    |    8     1062.96     0.33   |    6     1250.22     0.54   |    4      918.89     1.13
           |   64    |   16     5193.74     0.54   |   12     5595.93     0.76   |    6     3959.01     2.29
           |  256    |   32    22438.17     0.74   |   22    22748.38     0.93   |   12    15240.99     3.68
 t=2 (DEC) |   16    |   16     2645.44     0.44   |   12     2150.33     0.63   |    4     1562.30     1.02
           |   64    |   32    12093.39     0.65   |   24     9654.44     0.88   |   14     6558.47     1.88
           |  256    |   64    52015.33     0.81   |   44    41821.67     1.12   |   26    28247.17     2.07

Table 4.2: Comparison of OSMLD, proposed OLS and Hybrid codes for symmetric magnitude-1 error in 3-bits/cell memory
From Table 4.2 and Table 4.3, it is seen that the Hybrid codes have the lowest data redundancy amongst all three codes. This is because the Hybrid codes use a reduced strength ECC for the higher order bit. It is also seen that the Hybrid codes have the highest decoding latency but slightly lower decoder area. This is because for this implementation of a DEC code, a SEC-DED Hamming code was used to trade off redundancy for decoder latency. Hamming codes in general have a longer critical path but a slightly lower decoder complexity compared to OLS codes. A SEC-DED OLS code can instead be used to reduce the decoder latency at the expense of higher decoder area and redundancy. For a SEC code, a simple parity was used for the higher order bits. It can also be seen that the proposed OLS codes are better in terms of decoder area for a DEC code but not for a SEC code. This is because the combinational logic and adder logic for the proposed codes is simpler than a DEC majority voter but more complex than an additional SEC majority voter in the case of symmetric errors. The Hybrid codes achieve a reduced decoder area for the majority of the experiments at the expense of an increased decoder latency. Thus, the proposed codes and Hybrid codes have better redundancy at the cost of increased decoder latency.
 Error     | Data    |   OSMLD [Namba 15a]         |      Proposed OLS           |      Hybrid Codes
 Type (t)  | symbols | #check  Area (μm²)  Lat.(ns)| #check  Area (μm²)  Lat.(ns)| #check  Area (μm²)  Lat.(ns)
 ----------+---------+-----------------------------+-----------------------------+-----------------------------
 t=1 (SEC) |   16    |    8     1417.29     0.33   |    6     1878.14     0.63   |    5     1556.20     0.95
           |   64    |   16     6924.99     0.54   |   12     7792.26     0.88   |    9     6344.94     2.12
           |  256    |   32    22438.17     0.74   |   24    33027.46     1.05   |   17    26231.52     3.38
 t=2 (DEC) |   16    |   16     3527.26     0.44   |   12     3255.06     0.71   |   10     2683.93     1.02
           |   64    |   32    16004.07     0.65   |   24    14362.46     0.95   |   18    11725.93     1.44
           |  256    |   64    69345.64     0.82   |   48    62123.12     1.19   |   35    49404.62     1.99

Table 4.3: Comparison of OSMLD, proposed OLS and Hybrid codes for symmetric magnitude-3 error in 4-bits/cell memory
4.5 CONCLUSION
In this work, a technique to derive limited magnitude error correcting codes from OLS codes has been proposed. The proposed codes can correct any limited magnitude error, with an increase in redundancy as the magnitude of the error increases. A technique to extend the OLS codes further to lower the redundancy for symmetric and bidirectional errors was also discussed. These codes are useful for MLC PCM, which requires very low read latency. The proposed codes provide balanced tradeoffs between the amount of data redundancy, decoding complexity and decoder latency, and can be used to increase the reliability of emerging memories like MLC PCM. The low redundancy and high decoding speed with low decoding complexity also make the codes a viable choice for MLC PCM, wherein the performance is sensitive to access latencies.
Chapter 5: Systematic b-Adjacent Symbol Error Correcting Reed-
Solomon Codes with Parallel Decoding
5.1 INTRODUCTION
Section 1.1.1 presented the basics of write disturbance errors in MLC PCM. With technology node scaling, the problem of these write disturbance errors gets highly exacerbated [Jiang 14], specifically for super dense memories. The problem of magnetic field coupling in STT-MRAMs was discussed in Sec. 1.2.2. [Yoon 18] showed that for dense memory bits and lower stored energy, a magnetic field-induced coupling between adjacent bits can cause a significant change in the average retention time. As this technology matures, enabling dense multilevel cells, magnetic field-induced coupling is expected to play a key role. These errors can typically be modeled in the form of a burst error affecting neighboring cells in the memory. The key point to consider is that for these memories, the performance is sensitive to memory access latencies.
Binary burst error correcting codes with fast decoding procedures [Klockmann 17] [Datta 11] might not be suitable for such an application because of the limited range of bursts that the codes can correct. Reed-Solomon (RS) codes are highly suitable for these cases since the correction is on a symbol basis and they have very low redundancy. But beyond single error correction, Reed-Solomon codes suffer from a complex decoding procedure which results in a high decoding latency spanning multiple cycles. [Fujiwara 06] describes various methods proposed over the years to correct single byte errors with a parallel decoding methodology. But the main issue with single byte error correction is
This chapter is based on the publication [Das 18a]: A. Das and N. A. Touba, "Systematic b-adjacent
symbol error correcting Reed-Solomon codes with parallel decoding", in Proc. of IEEE VLSI Test
Symposium (VTS), paper 7A.1, 2018. The author of this dissertation contributed to the conception of the
research problem, theoretical developments and experimental verification of the research work.
that the burst errors may affect multiple bytes at a time within the burst range and are not
guaranteed to affect only a single byte. An adjacent symbol error correcting Reed-
Solomon code is also proposed in [Namba 15b]. But the codes have a high decoder
latency and decoder area since they use GF(2^m) operations, where m is the number of bits per symbol. [Reviriego 13, 15] propose methods to correct double adjacent errors and triple adjacent errors, respectively, for binary bits using OLS codes. These methods can be
extended [Namba 15a] to correct adjacent symbol errors as well. But these codes have
very high redundancy.
This research work proposes a methodology to correct b-adjacent symbol errors using Reed-Solomon codes. The codes are systematic by design and can have a maximum information symbol length k = 2^m − 1, where m is the number of bits per symbol. This contrasts with the traditional Reed-Solomon codes, which have a maximum block length n = 2^m − 1. The codes have very low redundancy and have a parallel one-step decoding procedure. The rest of the work is organized as follows. Section 5.2 reviews Reed-Solomon codes and their extensions. Section 5.3 describes the proposed b-adjacent symbol error correcting Reed-Solomon codes in detail along with their construction and decoding procedures. Section 5.4 evaluates the proposed codes and compares them with existing codes in terms of hardware complexity and redundancy. Section 5.5 presents the conclusion of this work.
5.2 REED-SOLOMON CODES
Reed-Solomon codes are a special case of non-binary BCH codes of length n = q^m − 1 over GF(q^m). The number of parity check symbols for RS codes is given by n − k = 2t, where t is the number of errors being corrected. For q = 2, the RS code comprises m-bit symbols, which are used to construct the code. An extension of the SEC RS code exists
and has the parity check matrix of the form shown in equation (5.1). It is a subclass of Hamming type codes over GF(2^m) that has 2 check symbols [Bossen 70]. The codes consist of m-bit symbols, unlike a binary Hamming code. Thus, a single error can result in a syndrome which is either equal to a column or is a multiple of a column in the parity check matrix. Any column in the parity check matrix can have (2^m − 1) multiples, i.e. all possible cases except the all-0 column. Thus, the number of possible syndromes for each single error is (2^m − 1). As a result, for the case of extended RS codes, the number of possible syndromes equals n(2^m − 1). This can result in huge complexity, since the number of syndromes increases exponentially with m. Thus, to make the decoding simpler, [Bossen 70] describes the use of a companion matrix which transforms the H-matrix over GF(2^m) to binary form. The binary form of the new parity check matrix for the primitive polynomial g(x) = x² + x + 1, k = 3 and n = 5 is shown in equation (5.2).
A companion matrix is an m × m non-singular binary matrix defined from a primitive polynomial g(x) of degree m [Fujiwara 06]. The syndromes from the parity check matrix are constructed for each row in a general fashion by XORing all the bits for which the corresponding column is a 1. Since there are only 2 rows in the parity check
matrix, the syndrome corresponding to any single symbol error will be of the form shown in equation (5.3), where i is the location of the error, e_i is the error pattern, and S1, S2 are m-bit symbols.

H = | 1  1  1   1   ⋯  1        1  0 |
    | 1  α  α²  α³  ⋯  α^(n−1)  0  1 |                (5.1)

H = | 1 0 1 0 1 0 1 0 0 0 |
    | 0 1 0 1 0 1 0 1 0 0 |
    | 1 0 0 1 1 1 0 0 1 0 |
    | 0 1 1 1 1 0 0 0 0 1 |                           (5.2)

| S1 |   | e_i     |
| S2 | = | α^i e_i |                                  (5.3)

Thus, for column i, α^i S1 ⊕ S2 = 0, while for any other column j, α^j S1 ⊕ S2 ≠ 0, ∀ j ≠ i. The decoding procedure using the companion matrix reduces the complexity of the circuit compared to the syndrome comparison decoding method of Hamming codes by
trading off decoder latency. This decoding method is only useful for single symbol error
correction. For t > 1, Reed-Solomon codes have a two-step decoding procedure which
involves two polynomials, an error locating polynomial and an error magnitude
polynomial. [Carrasco 08] describes various methods to compute both the error location
and the error magnitude for t (> 1) errors.
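The companion matrix idea above can be made concrete with a small example. The sketch below is our own illustration (not code from this dissertation): it builds the companion matrix T of g(x) = x² + x + 1, so that multiplying a GF(4) element by the primitive element α becomes a binary matrix-vector product.

```python
# Companion matrix of g(x) = x^2 + x + 1 over GF(2). A GF(4) element
# c0 + c1*x is stored as the bit vector [c0, c1]; multiplying it by the
# primitive element alpha (= x) is then a binary matrix-vector product.
T = [[0, 1],   # columns are the images of the basis {1, x}:
     [1, 1]]   # x*1 = x -> [0,1],  x*x = x^2 = x + 1 -> [1,1]

def mat_vec(M, v):
    # Matrix-vector product over GF(2) (i.e. mod 2).
    return [sum(M[i][j] * v[j] for j in range(len(v))) % 2 for i in range(len(M))]

def mat_mat(A, B):
    # Matrix-matrix product over GF(2).
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) % 2
             for j in range(len(B[0]))] for i in range(len(A))]

# alpha has multiplicative order 3 in GF(4), so T^3 is the identity.
T3 = mat_mat(mat_mat(T, T), T)
assert T3 == [[1, 0], [0, 1]]

v = [1, 1]                 # the element x + 1
print(mat_vec(T, v))       # alpha * (x + 1) = x^2 + x = 1  ->  [1, 0]
```

In this way every GF(2^m) multiplication in the parity check matrix reduces to XOR/AND logic, which is what makes the binary form of equation (5.2) decodable with simple gates.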
5.3 PROPOSED SCHEME
This section describes the proposed b-adjacent symbol error correcting Reed-Solomon codes. The key idea is to construct a new parity check matrix in such a manner that the first b rows of the parity check matrix have at most one 1 within b adjacent columns in any given row. The main purpose of such a construction is to distinguish the error magnitude for the adjacent columns in error in a single step. This type of construction enables the error pattern of any erroneous symbol to be one of the syndrome symbols. The lower (r − b) rows of the parity check matrix, where r is the total number of check symbols, are constructed by using rows from the original Reed-Solomon parity check matrix such that the following conditions are met.
1. All b-adjacent syndromes generated by XORing b-adjacent columns should be unique.
2. All syndromes for all possible combinations of columns within b-adjacent columns
should be unique.
3. If multiples of a column are used to XOR instead of the original column in the above
two cases, all syndromes thus generated should be unique.
Condition 1 ensures that no b-adjacent errors are miscorrected. Condition 2
ensures that any number of errors within the b-adjacent columns are not miscorrected.
The unique syndromes exactly identify which b-adjacent columns contain the errors.
Since Reed-Solomon codes correct non-binary m-bit symbols, each column can also have different multiples. These multiples can be identified from the upper b rows of the parity check matrix. But to avoid miscorrection for different magnitudes of errors, condition 3 needs to be satisfied. Thus, we keep adding rows to the bottom part of the parity check matrix until all the conditions above are satisfied.
The proposed codes are systematic by design and have the following parameters: maximum block length n = k + r, where the number of information symbols is k = (2^m − 1) and r is the number of check symbols needed to construct the parity check matrix. The systematic design of the proposed codes increases the speed of the encoding procedure, since it now only involves simple XOR operations on data symbols. Also, the parity check symbols sometimes need to be re-ordered or placed in such a manner that the conditions above are met without any additional rows. But this placement or re-ordering of the parity symbols in the codeword does not affect the encoding or decoding latency. An example of the parity check matrix for a double-adjacent symbol error correcting code for k = 7 using the companion matrix notation is shown in equation (5.4).
H = | 1  0   1   0   1   0   1   | 0 1 0 0 |
    | 0  1   0   1   0   1   0   | 0 0 1 0 |
    | 1  α   α²  α³  α⁴  α⁵  α⁶  | 1 0 0 0 |
    | 1  α²  α⁴  α⁶  α   α³  α⁵  | 0 0 0 1 |          (5.4)
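The uniqueness conditions above can be verified exhaustively in software for the k = 7 example of equation (5.4). The sketch below is our own illustrative check over the data-symbol columns only (the check-symbol columns and the reordered parity placement are left out for brevity, and the helper names are ours): it confirms that every error confined to two adjacent data symbols produces a syndrome identifying a unique error vector.

```python
# GF(8) arithmetic, primitive polynomial p(x) = x^3 + x + 1.
def gf8_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011          # reduce modulo x^3 + x + 1
    return r

def alpha_pow(e):
    # alpha = x (value 2) is primitive, with alpha^7 = 1.
    r = 1
    for _ in range(e % 7):
        r = gf8_mul(r, 2)
    return r

K = 7  # information symbols, as in equation (5.4)

def syndrome(err):
    # Rows of equation (5.4), data columns only: row 0 covers even
    # positions, row 1 odd positions, rows 2 and 3 use alpha^i and
    # alpha^(2i) as in a Reed-Solomon parity check matrix.
    s1 = s2 = ls1 = ls2 = 0
    for i, e in enumerate(err):
        if i % 2 == 0:
            s1 ^= e
        else:
            s2 ^= e
        ls1 ^= gf8_mul(alpha_pow(i), e)
        ls2 ^= gf8_mul(alpha_pow(2 * i), e)
    return (s1, s2, ls1, ls2)

# Conditions 1-3 for data symbols: every error confined to two adjacent
# data symbols (any magnitudes, including single errors) must yield a
# syndrome that identifies a unique error vector.
seen = {}
for i in range(K - 1):
    for e0 in range(8):
        for e1 in range(8):
            if e0 == e1 == 0:
                continue
            err = [0] * K
            err[i], err[i + 1] = e0, e1
            s = syndrome(err)
            assert seen.setdefault(s, tuple(err)) == tuple(err)
print(len(seen), "distinct syndromes, all uniquely decodable")
```

The dictionary-based assertion fails as soon as two different error vectors collide on the same syndrome, so a clean run demonstrates that conditions 1 and 2 hold for this example.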
5.3.1 Decoding Procedure
The most prevalent decoding procedure is to simply compare all the syndromes; based on the computed syndrome, both the error location and the error magnitude can be found. But this method involves comparison of a very large number of syndromes. Moreover, the number of syndromes increases linearly with the number of symbols in a codeword and exponentially with the number of bits per symbol. Thus, in order to reduce the complexity of the decoder, we propose a decoding method based on the companion matrix. This method has less complexity, but the reduction in complexity comes at the cost of decoder latency. The syndromes are computed by taking the XOR of all data bits whose corresponding column is 1 in the binary form of the parity check matrix. The syndromes themselves are made up of m-bit symbols. If b adjacent symbols are in error starting from symbol i, the syndromes are given by equation (5.5). Moreover, it can be derived that if b adjacent columns, or any number of columns within b adjacent columns, are in error, then equation (5.6) always holds for those b adjacent columns.
| S1  |   | e_i                                                          |
| ⋮   |   | ⋮                                                            |
| Sb  | = | e_(i+b−1)                                                    |
| LS1 |   | T^i e_i ⊕ T^(i+1) e_(i+1) ⊕ ⋯ ⊕ T^(i+b−1) e_(i+b−1)          |
| ⋮   |   | ⋮                                                            |
| LSx |   | T^(xi) e_i ⊕ T^(x(i+1)) e_(i+1) ⊕ ⋯ ⊕ T^(x(i+b−1)) e_(i+b−1) |      (5.5)

T^(βi) S1 ⊕ T^(β(i+1)) S2 ⊕ ⋯ ⊕ T^(β(i+b−1)) Sb ⊕ LS_β = 0                   (5.6)

The implementation of equation (5.6) is simply parallel XOR gates with the syndrome symbols as its inputs. If equation (5.6) is satisfied for all cases of β, ∀ 1 ≤ β ≤ x where x = (r − b), then b-adjacent errors have occurred starting from symbol i. The error location for each symbol is the OR of equation (5.6) for all β. A symbol d_i will have b different possibilities of adjacent columns that can be in error. The final indication of whether a symbol is in error or not is the AND of the error location signals of all b adjacent columns it
is part of. This is because if any one of the b different possibilities is in error, then it
equates to 0 indicating that the symbol is in error. A symbol is error free if and only if all
b possibilities equate to 1. The error pattern is obtained from S_α, where α is the row number in the upper b rows of the parity check matrix for which column i is a 1. The
error pattern is then XORed with the received message symbols to get the decoded
message symbols. Thus, the upper b rows of the syndromes are used to obtain the error
magnitudes or patterns for adjacent errors. The lower (r-b) rows of the syndromes are
used to obtain the error location. The decoding procedure is thus a parallel one step
procedure. A partial schematic of the error pattern generator in the decoding circuit has
been shown in Fig. 5.1.
[Figure omitted: a syndrome generator feeding stages of XOR, AND and OR gates that produce the error patterns e0, e1, ..., e6.]

Fig. 5.1: Partial schematic of the error pattern generator for the proposed scheme.
If any number of check symbols within b adjacent symbols are in error, then there is at least one case of β for which equation (5.6) is not satisfied. This simply indicates that none of the data symbols are in error. In such a case, we consider the rows of the parity check matrix which have 1's in the parity check columns. These rows become the magnitude computation rows of the parity check matrix, while all the remaining rows are used for error location. The proposed codes can provide suitable error protection against burst symbol errors with low latency decoding. Consider an m = 6 bits/symbol RS code as an example: three 2-bits/cell MLC memory cells can be concatenated to form a symbol. A double adjacent symbol error correction (DAsEC) scheme can then protect at least 4 adjacent MLC memory cells and at most 6 adjacent MLC memory cells. Similarly, a triple adjacent symbol error correction (TAsEC) scheme can protect at least 7 adjacent MLC memory cells and at most 9 adjacent MLC memory cells. In general, if a symbol is made up of c MLC cells and a b-adjacent symbol error correcting proposed Reed-Solomon code is used, then the minimum number of adjacent memory cells protected by the proposed code is Cmin = c(b−1) + 1. Similarly, the maximum number of adjacent memory cells being protected is Cmax = cb. This is the best possible scenario wherein all the cells affected in a b-symbol burst lie within b adjacent symbols.
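The Cmin and Cmax expressions above can be checked with a two-line helper (the function name is ours):

```python
# Cells protected by a b-adjacent symbol error correcting code when each
# symbol spans c MLC cells: Cmin = c(b-1) + 1, Cmax = c*b.
def cells_protected(c, b):
    return c * (b - 1) + 1, c * b

print(cells_protected(3, 2))  # DAsEC, 3 cells/symbol -> (4, 6)
print(cells_protected(3, 3))  # TAsEC, 3 cells/symbol -> (7, 9)
```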
5.4 EVALUATION
The proposed codes were synthesized on Synopsys Design Compiler using NCSU
FreePDK45 45nm library for DAsEC and TAsEC for different information symbol
lengths. Both these codes were implemented using Dataflow model in Verilog and errors
were injected to ensure that all double adjacent errors and triple adjacent errors were
corrected. Exhaustive testing was done for different magnitudes and locations of the
symbols. For triple adjacent errors, the codes were tested to ensure that all errors within
any 3 adjacent columns were also corrected.
[Reviriego 15] proposed a method to correct double adjacent errors using OLS
codes for binary bits by augmenting the parity check matrix and the decoding procedure
of general OLS codes. This method was extended to non-binary symbols using the
method proposed in [Namba 15a], wherein each bit in the symbol had its own
independent parity check matrix as well as its own independent decoder, to correct
double adjacent symbol errors. These double adjacent symbol error correcting OLS
(DAsEC OLS) and double adjacent symbol error correcting Reed-Solomon (DAsEC RS)
code were implemented using Dataflow model in Verilog and synthesized on Synopsys
Design Compiler using NCSU FreePDK45 45nm library in order to compare it to the
proposed DAsEC codes.
[Reviriego 13] proposed a method for correcting triple adjacent errors for binary
bits using OLS codes with the same number of check bits as a double error correcting
OLS code but with an augmented decoding procedure. This was also extended using the
procedure in [Namba 15a] to correct triple adjacent symbol errors. This triple adjacent
symbol error correcting OLS (TAsEC OLS) code was implemented using Dataflow
model in Verilog and also synthesized on Synopsys Design Compiler using NCSU
FreePDK45 45nm library to compare it to the proposed TAsEC codes. The evaluation of
the implemented codes and the proposed codes was done on the basis of area of the
decoder, decoder latency and redundancy (i.e. number of check symbols required for a
given information symbol length per codeword).
5.4.1 Redundancy
Table 5.1 shows the comparison of redundancy between the DAsEC OLS codes [Reviriego 15] [Namba 15a], the DAsEC RS codes in [Namba 15b] and the proposed DAsEC codes for k = 8, 16, 32 and 64, where k refers to the number of information symbols in the codeword and r refers to the number of check symbols required. It is seen that the redundancy of the proposed codes and the DAsEC RS codes is the same. It is also seen that the proposed codes have much better redundancy compared to the DAsEC OLS codes. Table 5.2 shows the comparison between the TAsEC OLS codes [Reviriego 13] [Namba 15a] and the proposed TAsEC codes for k = 8, 16, 32 and 64. As seen from the table, the proposed codes have a much better redundancy compared to the TAsEC OLS codes.
 Bits/  |  k  | DAsEC OLS [Namba 15a]       | DAsEC RS [Namba 15b]         | Proposed Codes
 Symbol |     | [Reviriego 15]              |                              |
        |     | #check  Area (μm²)  Lat.(ns)| #check  Area (μm²)   Lat.(ns)| #check  Area (μm²)  Lat.(ns)
 -------+-----+-----------------------------+------------------------------+-----------------------------
   4    |  8  |    9       659       0.27   |    4       42711       3.09  |    4      2328       0.96
   5    | 16  |   12      1634       0.31   |    4      165222       4.76  |    4      8420       1.47
   6    | 32  |   21      3847       0.42   |    4     1314387       6.61  |    4     18379       1.64
   7    | 64  |   24      9225       0.42   |    4    16563101       9.89  |    4     43541       2.38

Table 5.1: Redundancy, Decoder Area and Latency Comparison for DAsEC codes
 Bits/  |  k  | TAsEC OLS [Namba 15a]       | Proposed Codes
 Symbol |     | [Reviriego 13]              |
        |     | #check  Area (μm²)  Lat.(ns)| #check  Area (μm²)  Lat.(ns)
 -------+-----+-----------------------------+-----------------------------
   4    |  8  |    -         -        -     |    6      2943       1.09
   5    | 16  |   16      6174       0.77   |    6     12572       1.74
   6    | 32  |   26     15456       1.22   |    6     41045       2.25
   7    | 64  |   32     37679       1.64   |    6    128011       3.21

Table 5.2: Redundancy, Decoder Area and Latency Comparison for TAsEC codes
5.4.2 Hardware Complexity
Table 5.1 also shows the comparisons of area of the decoder and the decoder
latency between the DAsEC OLS codes [Reviriego 15] [Namba 15a], the DAsEC RS
codes in [Namba 15b] and the proposed DAsEC codes for k = 8, 16, 32 and 64. The operations involved in the DAsEC RS codes are over GF(2^m), where m is the number of bits per symbol, which increases the complexity of the decoder. The complex operations result in a much greater circuit depth compared to the proposed codes. Due to this higher circuit depth, these codes have high decoder latency as well. The use of a companion matrix makes all operations binary for the proposed DAsEC codes. As a result, there are no complex operations involved for the proposed codes. The decoder circuit is comprised mainly of XOR gates, with additional AND and OR gates. As a result, the decoder latency and the decoder area of the proposed codes are much less compared to the DAsEC RS codes, as seen in Table 5.1. The area and decoder latency of the DAsEC OLS codes are better than those of the proposed codes, because the circuit depth is higher for the proposed codes due to their low redundancy. The DAsEC OLS codes have lower complexity and parallel decoding logic. This reduces the decoder latency of the DAsEC OLS codes compared to the proposed codes. But this reduced decoder latency comes at the expense of very high redundancy.
Table 5.2 also shows the comparison of area of the decoder and the decoder
latency between the TAsEC OLS codes [Reviriego 13] [Namba 15a] and the proposed
TAsEC codes. Similar to the DAsEC code comparison, the proposed codes have higher
complexity due to the higher circuit depth arising from the low redundancy. The low
decoding latency of the TAsEC OLS codes is due to a low complexity parallel decoding
logic but comes at an expense of very high redundancy. The proposed codes can provide
much better redundancy for a slightly higher cost of decoder latency and decoder area.
5.5 CONCLUSION
A b-adjacent symbol error correcting Reed-Solomon code is proposed which is systematic by design and has low complexity parallel one-step decoding. The proposed codes have better decoding latency and decoder area compared to existing adjacent error correcting Reed-Solomon codes. The proposed codes are also compared to symbol error correcting OLS codes, and it is shown that the proposed codes achieve much better redundancy, but at a cost of slightly higher decoder latency and decoder area. The proposed codes thus provide a balanced tradeoff between the amount of redundancy required and the decoder complexity and latency. As a result, the proposed codes can be used to increase the reliability of the latest memory technologies like MLC PCM and STT-MRAM at lower technology nodes.
Chapter 6: Efficient Non-binary Hamming Codes for Limited
Magnitude Errors in MLC PCMs
6.1 INTRODUCTION
Chapter 1 discussed the recent focus on MLC PCMs due to their high density and lower costs [Papandreou 10]. They store multiple bits per cell by configuring the memory element so that it can exist in 2^b different states, where b is the number of bits per cell. Their ability to store multiple bits in a single cell as well as being byte-addressable makes them an attractive alternative to DRAM solutions. But, due to this MLC property, other problems like resistance drift arise [Li 12]. This read reliability degradation due to resistance drift is a major reliability concern for MLC PCMs. Apart from resistance drift issues, MLC PCMs also suffer from write disturbance issues, wherein heat dissemination from writes to a memory cell causes another nearby memory cell's state to change [Jiang 14]. This happens only for certain states of the memory cell, i.e. for certain value patterns, and thus is data dependent.
Thus, for emerging MLC PCMs, instead of bit-flips, the dominating errors are of limited magnitude by virtue of the resistance drifts. Limited magnitude errors refer to the shift in the state of the cell due to the drifting resistance or due to the thermal disturbance caused by a program operation. Thus, the dominant error patterns in these memories are not random but instead depend on the data stored in the cell, the time for which it has been stored in the cell, as well as nearby program operations. Since PCMs are byte-addressable, the error correction schemes to be used should have very low decoder
This chapter is based on the publication [Das 18c]: A. Das and N. A. Touba, "Efficient Non-Binary
Hamming Codes for Limited Magnitude Errors in MLC PCMs", in Proc. of IEEE International Symposium
on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), pp. 1-6, 2018. The author of
this dissertation contributed to the conception of the research problem, theoretical developments and
experimental verification of the research work.
latency since the performance is sensitive to access latencies of the memory. Also, since
both write disturbance and resistance drifts are considered, the error correcting code
should be able to correct errors of limited magnitude in either direction i.e. symmetric
limited magnitude errors.
Chapter 4 discusses research efforts for limited magnitude errors [Ahlswede 02,
Cassuto 10]. Systematic symmetric limited magnitude error correction with very low
redundancy using non-binary Hamming codes was introduced in [Klove 11]. But these codes used decimal arithmetic operations, which can be expensive both in terms of decoder complexity and decoder latency. In Chapter 4, we discussed a limited magnitude error correcting Orthogonal Latin Square (OLS) code, which takes advantage of the limited magnitude error model to reduce the data redundancy [Das 17]. Also, the use of OLS codes results in low decoder latencies, which is important for the performance of MLC PCMs. But such codes specific to single error correction have a very high redundancy cost, which leads to low memory utilization. It is also possible to use a Gray code mapping so that a magnitude-1 error will cause only a single bit error. This can then be corrected using a binary SEC code. But such a scheme cannot address errors of magnitude more than 1, since in that case more than one bit changes.
In this work, a limited magnitude error correcting non-binary Hamming code is proposed which can protect more information symbols for the same amount of redundancy as a general non-binary Hamming code. The proposed codes are shown to have much lower hardware complexity and latency compared to [Klove 11], as well as much better overall redundancy compared to symbol-based binary Hamming codes. The rest of the chapter is organized as follows. Section 6.2 reviews general Hamming codes. Section 6.3 describes the proposed limited magnitude error correcting Hamming code in detail along with its construction and decoding procedures. Section 6.4
compares the proposed codes with existing codes in terms of hardware complexity and
redundancy. Section 6.5 presents the conclusion of this work.
6.2 GENERAL HAMMING CODES
An (n, k) q-ary Hamming code is a linear block code and a k-dimensional linear subspace of an n-dimensional vector space over a Galois field GF(q) [Lin 04]. If q = 2, the codes are referred to as binary Hamming codes. For MLC memories, q = 2^b, where b is the number of bits stored per memory cell. For our purposes, we refer to this case as general non-binary Hamming codes from here on. The addition operation in GF(2^b) is a simple XOR operation, while the multiplication operation is combinational and depends on the primitive polynomial as well. k is the number of information symbols, while n is the total number of symbols in the codeword. The amount of redundancy r is given by the relation r = n − k. The length of the codeword is computed using equation (6.1). A parity check matrix for a (5, 3) 4-ary Hamming code is shown in Fig. 6.1.
Fig. 6.1: Parity Check Matrix of (5, 3) 4-ary Hamming code
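Equation (6.1) can be evaluated directly; for q = 4 and r = 2 it reproduces the (5, 3) code of Fig. 6.1 (the function name below is ours):

```python
# Codeword length of a q-ary Hamming code with r check symbols,
# n = (q^r - 1) / (q - 1), from equation (6.1).
def hamming_n(q, r):
    return (q**r - 1) // (q - 1)

print(hamming_n(4, 2))   # 5 -> the (5, 3) 4-ary code of Fig. 6.1
print(hamming_n(8, 2))   # 9 -> a (9, 7) 8-ary code for 3-bits/cell memory
```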
We introduce the notion of major element to simplify the explanation of how to
construct the parity check matrix. Major element is defined as the leading non-zero
element in a column of the parity check matrix. We also introduce two additional terms to
ease the explanations further. A major column can be defined as the set of columns
whose major element is 1. A minor column is defined as the set of columns whose major element is not 1.

n = (q^r − 1) / (q − 1)                               (6.1)

H = | 1 1 1 1 0 |
    | 1 2 3 0 1 |

Fig. 6.2 highlights all the major elements and labels the major columns
and minor columns amongst all possible columns for a 2-bits/cell memory with 2 parity
check symbols i.e. r = 2. The advantage of using the notion of major element is that
instead of considering individual columns and its multiples, we can instead consider
groups of columns with the same major element as a single entity. This makes it much
easier to visualize the construction process of the parity check matrix.
| 0 0 0 | 1 1 1 1 | 2 2 2 2 | 3 3 3 3 |   <- major elements (leading non-zero entries)
| 1 2 3 | 1 2 3 0 | 1 2 3 0 | 1 2 3 0 |
 parity    major     minor     minor
 check    columns   columns   columns
columns

Fig. 6.2: Classification of columns and elements for a 2-bits/cell memory
| 1 1 1 1 1 1 1 1 |   ×7    | 7 7 7 7 7 7 7 7 |
| 1 2 3 4 5 6 7 0 |   -->   | 7 5 2 1 6 4 3 0 |

Fig. 6.3: Multiplying major columns with 7 for a 3-bits/cell memory
As an example, consider the columns with major element 1 for a 3-bits per cell memory. When these columns are multiplied with 7 in GF(8), with the primitive polynomial being p(x) = x³ + x + 1, the corresponding results are as shown in Fig. 6.3. Thus, we see that all columns with major element 1, when multiplied with 7, produce all columns whose major element is 7, and no column is repeated in the multiples. As shown in Fig. 6.1, for general non-binary Hamming codes, the major element is always 1, since it is assumed that all error patterns are possible. A single error produces a syndrome which is either equal to a column or a multiple of a column in the parity check matrix. The error magnitude and the data symbol corresponding to the matched column are then added to recover the correct data symbol.
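The closure property of Fig. 6.3 is easy to verify in software. The sketch below uses our own GF(8) multiplication helper with p(x) = x³ + x + 1:

```python
# GF(8) multiplication, primitive polynomial p(x) = x^3 + x + 1.
def gf8_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011          # reduce modulo x^3 + x + 1
    return r

# Multiplying every column with major element 1 by 7 yields exactly the
# set of columns with major element 7, with no repeats (Fig. 6.3).
major_1 = [(1, j) for j in range(8)]
products = {(gf8_mul(1, 7), gf8_mul(j, 7)) for _, j in major_1}
assert products == {(7, j) for j in range(8)}
print(sorted(products))
```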
6.3 PROPOSED SCHEME
The underlying principle of the proposed scheme is that not all error patterns are possible when limited magnitude errors are considered, thus making it possible to use minor columns in the parity check matrix. Consider an MLC memory with 3 bits/cell and limited magnitude-1 errors. Since we consider symmetric limited magnitude errors, i.e. errors can occur in either direction, all possible errors and the corresponding error patterns are shown in Fig. 6.4. Error patterns are computed by taking the XOR of the original symbol and the new erroneous symbol.
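The pattern set of Fig. 6.4 can be reproduced with a few lines (a sketch of the computation just described, not code from this dissertation):

```python
# Symmetric magnitude-1 errors in a 3-bits/cell memory: shift each stored
# level by +1 or -1 (staying within 0..7) and record original XOR erroneous.
patterns = set()
for v in range(8):
    for w in (v + 1, v - 1):
        if 0 <= w <= 7:
            patterns.add(v ^ w)
print(sorted(patterns))  # [1, 3, 7]
```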
We define e as the set of possible error patterns. From Fig. 6.4 it is seen that the set of error patterns is e = {1, 3, 7}. Thus, we can insert minor columns into the parity check matrix of the Hamming codes if they satisfy the following conditions:
1. The product of a major column and the elements in e should not result in the minor
column.
2. The product of the minor column and the set of elements in e should be distinct and
should not be equal to the product of elements in e and any other column in the parity
check matrix.
Condition 1 ensures that a limited magnitude error in any column in the parity
check matrix does not produce a syndrome equal to some other column in the parity
check matrix. Condition 2 ensures that no two columns in the parity check matrix with
either same or different limited magnitude errors produce the same syndrome.
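The error pattern set used in these conditions can be enumerated directly. The short Python sketch below is an illustrative helper (not from the dissertation's implementation); it XORs each symbol with every symbol reachable within the error magnitude:

```python
def error_patterns(b, l):
    """Set of XOR error patterns for symmetric limited
    magnitude-l errors in a b-bits/cell memory."""
    top = (1 << b) - 1
    e = set()
    for s in range(top + 1):
        for d in range(1, l + 1):
            if s + d <= top:          # error in the +d direction
                e.add(s ^ (s + d))
            if s - d >= 0:            # error in the -d direction
                e.add(s ^ (s - d))
    return e

print(sorted(error_patterns(3, 1)))   # → [1, 3, 7], matching Fig. 6.4
```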
Next, we show the construction of the parity check matrix by means of an
example. Consider a 3-bits/cell memory with limited magnitude-1 errors. From Fig. 6.4,
we see that e = {1, 3, 7}. All available columns for r = 2 parity check symbols are shown
in Fig. 6.5(a), with the major elements highlighted. As
mentioned before, the product operation in this case is a GF(8) operation with the
primitive polynomial being p(x) = x3 + x + 1.
Original symbols:       0  1  2  3  4  5  6  7

+1 erroneous symbols:   1  2  3  4  5  6  7  -
+1 error patterns:      1  3  1  7  1  3  1  -

-1 erroneous symbols:   -  0  1  2  3  4  5  6
-1 error patterns:      -  1  3  1  7  1  3  1
Fig. 6.4: All possible error patterns for a 3-bits/cell memory
[Figure content: (a) lists all possible nonzero columns, grouped by major element; (b) shows the columns remaining after removing those whose major element belongs to e1; (c) shows the remaining columns with major element 4.]
Fig. 6.5: (a) All possible columns for a 3-bits/cell memory and 2 parity check symbols (b)
Resultant columns after removal of major elements from e1 (c) Resultant
columns after removal of major elements from e2
Considering the product of the elements in e and the major columns, the resulting
columns are all columns whose major element belongs to the set e1, where e1 is given by
e1 = {1, 3, 7} × 1 = {1, 3, 7}                                (6.2)
e2 = {1, 3, 7} × 2 = {2, 6, 5}                                (6.3)
e4 = {1, 3, 7} × 4 = {4, 7, 1}                                (6.4)
equation (6.2). These columns cannot be used in the parity check matrix as per
condition-1 and are removed from the list of available columns. Of the remaining
columns shown in Fig. 6.5(b), we consider the next major element (which in this case =
2). The elements in e are then multiplied with the major element 2, and the resultant
columns are all columns whose major element belongs to the set e2, where e2 is given by
equation (6.3). Since the elements in e2 are distinct and do not belong to e2, condition-2 is
satisfied as well and all columns with major element 2 can be included in the parity check
matrix. The resultant columns with major elements belonging to e2 are removed from the
list of available columns. The only columns that remain in the list of available columns
are those with major element 4, as shown in Fig. 6.5(c). If we multiply the elements in e
with major element 4, we get the columns whose major element belongs to the set e4
given by equation (6.4). Since the elements in e4 also belong to the set e1, these columns
violate condition-2; they cannot be included and are removed from the list of available
columns. The list of available columns is now empty, and the final parity check matrix is
shown in Fig. 6.6. An algorithm to construct the parity check matrix for a given number
of parity check symbols is shown in Fig. 6.7.
Fig. 6.6: Parity check matrix of a limited magnitude-1 error correcting code for a 3-bits
per cell memory
6.3.1 Syndrome Analysis
Considering an MLC memory with b bits/cell, general non-binary Hamming
codes consider all possible error patterns. It is not necessary to correct
H = [ 0 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 1 0
      2 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 ]
errors in the parity symbols, so only the syndromes of information symbols need to be
considered. Thus, the total number of possible syndromes of the parity check matrix is
given by equation (6.5).
Inputs:
    allCols = sorted set of all possible columns except the all-0 column
    e = set of error patterns
Outputs:
    HCol = columns for the parity check matrix
    syn = set of all possible valid syndromes

syn = {}; HCol = {};
totalCols = total #columns in allCols;
for (i = 0; i < totalCols; i++) {
    col = allCols[i];
    validCol = 1;
    vSyn = {};
    Error_Pattern_Loop: for each element x in e {
        mCol = x * col;                  // GF multiplication
        if (mCol is present in syn) {
            validCol = 0;
            exit Error_Pattern_Loop;
        } else
            add mCol to vSyn;
    }
    if (validCol == 1) {                 // all syndromes generated are unique
        for each syndrome s in vSyn
            add s to syn;
        add col to HCol;
    }
}

Fig. 6.7: Algorithm for construction of the parity check matrix of the proposed scheme
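The algorithm of Fig. 6.7 can be transcribed into a short runnable sketch. The Python below is illustrative (the function names and GF(8) helper are assumptions, not the dissertation's code), using p(x) = x^3 + x + 1 and e = {1, 3, 7} from the running example:

```python
def gf8_mul(a, b):
    # GF(8) multiply, reduced modulo p(x) = x^3 + x + 1
    prod = 0
    for i in range(3):
        if (b >> i) & 1:
            prod ^= a << i
    for deg in (4, 3):
        if (prod >> deg) & 1:
            prod ^= 0b1011 << (deg - 3)
    return prod

def build_h_columns(all_cols, e):
    """Greedy construction from Fig. 6.7: keep a column only if all of
    its error-pattern multiples produce previously unseen syndromes."""
    syn, h_cols = set(), []
    for col in all_cols:
        v_syn = {tuple(gf8_mul(x, c) for c in col) for x in e}
        if syn.isdisjoint(v_syn):        # all generated syndromes are unique
            syn |= v_syn
            h_cols.append(col)
    return h_cols, syn

# All 63 nonzero columns for r = 2 check symbols, in sorted order.
cols = [(a, b) for a in range(8) for b in range(8) if (a, b) != (0, 0)]
h_cols, syn = build_h_columns(cols, e={1, 3, 7})
print(len(h_cols))  # → 18
```

For r = 2 this keeps the same 18 columns as the H matrix of Fig. 6.6, although in sorted rather than systematic order.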
But for the proposed scheme with limited magnitude errors, the error model assumes that
only a subset of error patterns is possible, so the total number of syndromes of the parity
check matrix depends on the set of all possible error patterns e and is given by equation
(6.6), where |e| is the cardinality of e. Depending on the magnitudes of errors being
corrected, |e| is generally much smaller than (2^b - 1). Thus, for limited magnitude
errors, the total number of syndromes is smaller compared to general non-binary Hamming
codes. Since the decoding of Hamming codes involves a syndrome matching circuit wherein
the received syndrome is matched to either a column or a multiple of a column, the
proposed scheme for limited magnitude errors reduces the decoder complexity simply by
reducing the number of syndromes to be matched per column of the parity check matrix.
Fig. 6.8 shows a comparison of the number of syndromes between the two cases for
different magnitudes of error l and different information lengths k. LM refers to the
number of syndromes for the proposed codes correcting limited magnitude errors; NB refers
to the general non-binary Hamming codes.
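As a quick sanity check of these two counts (an illustrative calculation only), take b = 3 bits/cell, magnitude-1 errors (so |e| = 3, from Fig. 6.4) and k = 64 information symbols:

```python
b, k = 3, 64
e = {1, 3, 7}                       # error pattern set for magnitude-1 errors
syndromes_nb = k * (2 ** b - 1)     # general non-binary Hamming code, eq. (6.5)
syndromes_lm = k * len(e)           # proposed limited magnitude code, eq. (6.6)
print(syndromes_nb, syndromes_lm)   # → 448 192
```

The reduction from 448 to 192 syndromes is what shrinks the per-column syndrome matching logic in the decoder.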
Fig. 6.8: Comparison of #syndromes between limited magnitude error correcting
Hamming codes and non-binary general Hamming codes
6.3.2 Companion Matrix
Finite field, i.e., GF(2^b), operations are expensive in terms of hardware. They
increase both the encoder and decoder complexity as the number of bits per cell
b increases. [Bossen 70] introduced the use of companion matrices to keep the
#syndromes_NB = k(2^b - 1)                                    (6.5)
#syndromes_LM = k|e|                                          (6.6)
[Fig. 6.8 chart: number of syndromes versus information length k (64, 128, 256) for the series LM(q=8, l=1), NB(q=8), LM(q=16, l=1), LM(q=16, l=2) and NB(q=16).]
complexity low by replacing finite field multiplication with binary matrix multiplications
or simple XOR operations. This trades off decoder complexity for a slight increase in
decoder latency. Companion matrices are b × b non-singular matrices derived from the
root α of the primitive polynomial used. Thus, both the encoder and decoder circuit use
companion matrices to reduce the overall hardware complexity.
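As an illustrative sketch of this construction (assumed helper names; not the dissertation's hardware description), the companion matrix for p(x) = x^3 + x + 1 can be built column by column as x · x^j mod p(x); repeatedly applying it to a bit vector steps through the powers of α:

```python
def companion_matrix(b, poly):
    """b x b binary companion matrix of p(x); `poly` holds the low-order
    coefficients of p(x), e.g. 0b011 for p(x) = x^3 + x + 1."""
    cols = []
    for j in range(b):
        v = 1 << (j + 1)             # x * x^j
        if (v >> b) & 1:             # reduce using x^b = low-order part
            v = (v ^ (1 << b)) ^ poly
        cols.append(v)
    return [[(cols[j] >> i) & 1 for j in range(b)] for i in range(b)]

def matvec_gf2(m, v):
    """Multiply a binary matrix by a bit vector over GF(2) (XOR sums)."""
    return [sum(m[i][j] & v[j] for j in range(len(v))) % 2
            for i in range(len(m))]

C = companion_matrix(3, 0b011)
v = [1, 0, 0]                        # the element 1
powers = []
for _ in range(7):
    v = matvec_gf2(C, v)
    powers.append(v[0] | v[1] << 1 | v[2] << 2)
print(powers)  # → [2, 4, 3, 6, 7, 5, 1]: the nonzero powers of alpha in GF(8)
```

Because every matrix entry is a single bit, multiplication by any field element reduces to a fixed pattern of XOR gates, which is the hardware saving the scheme exploits.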
6.3.3 Encoder
The encoder circuit is similar to that of a general non-binary Hamming code with
the parity check symbols being computed from the information symbols and then
appending them to the information symbols to form the final codeword. The only
difference is that the proposed scheme uses companion matrices in place of non-binary
field elements. Thus, all finite field multiplications are replaced by XOR operations. The
binary form of the parity check matrix shown in Fig. 6.6 is shown partially in Fig. 6.9.
Thus, the proposed scheme has a systematic design which results in a low complexity
encoder.
6.3.4 Decoder
The decoder circuit is also similar to that of a general non-binary Hamming code.
The syndromes are computed based on the binary parity check matrix derived using the
companion matrices. Each syndrome symbol consists of b bits. The syndromes are
matched to both individual columns, and their multiples, of the parity check matrix. The
corresponding error magnitude is then XORed with the received symbol based on the
syndrome computation. Fig. 6.10 shows the partial decoding circuit generating the error
magnitudes for the first 2 symbols of the parity check matrix from Fig. 6.9.
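To make the matching procedure concrete, the sketch below models the decoder at the symbol level in Python (an illustrative model with assumed helper names, not the Verilog implementation; the hardware operates on the companion-matrix binary form). It uses the H matrix of Fig. 6.6 and the error pattern set e = {1, 3, 7}:

```python
def gf8_mul(a, b):
    # GF(8) multiply, reduced modulo p(x) = x^3 + x + 1
    prod = 0
    for i in range(3):
        if (b >> i) & 1:
            prod ^= a << i
    for deg in (4, 3):
        if (prod >> deg) & 1:
            prod ^= 0b1011 << (deg - 3)
    return prod

# Parity check matrix of Fig. 6.6; the last two columns are the check symbols.
H = [(0, 2), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7),
     (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7),
     (1, 0), (0, 1)]
E = (1, 3, 7)   # possible limited magnitude-1 error patterns

def encode(data):                    # 16 information symbols, values 0..7
    p1 = p2 = 0
    for col, sym in zip(H, data):
        p1 ^= gf8_mul(col[0], sym)
        p2 ^= gf8_mul(col[1], sym)
    return list(data) + [p1, p2]

def syndrome(word):
    s = [0, 0]
    for col, sym in zip(H, word):
        s[0] ^= gf8_mul(col[0], sym)
        s[1] ^= gf8_mul(col[1], sym)
    return tuple(s)

def decode(word):
    """Match the syndrome against each column times each error pattern."""
    s = syndrome(word)
    if s == (0, 0):
        return list(word)
    for j, col in enumerate(H):
        for mag in E:
            if s == (gf8_mul(col[0], mag), gf8_mul(col[1], mag)):
                fixed = list(word)
                fixed[j] ^= mag      # XOR the error magnitude back in
                return fixed
    raise ValueError("uncorrectable error")

data = list(range(8)) + list(range(8))
codeword = encode(data)
corrupted = list(codeword)
corrupted[5] ^= 3                    # inject one magnitude-1 error pattern
print(decode(corrupted) == codeword) # → True
```

Since the code construction guarantees that every (column, pattern) pair yields a distinct syndrome, the first match found is the only match.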
Fig. 6.9: Binary form of partial parity check matrix of Fig. 6.6
Fig. 6.10: Error Magnitude Computation for first 2 symbols of parity check matrix from
Fig. 6.9
6.4 EVALUATION
[Klove 11] proposed systematic, single limited magnitude error correcting codes
which used decimal arithmetic operations. These codes were implemented and simulated
for various information symbol lengths k, different bits per symbol b and different
magnitudes of errors l. All error magnitudes considered here are symmetric. A symbol-
based binary (SyB) Hamming code was also implemented similar to the idea proposed in
[Namba 15a]. In this case, each bit of the symbol had its own independent parity check
matrix. For the decoding procedure, the syndromes were also a concatenated version of
the syndromes of each individual bit in a symbol. Then for each bit in the symbol, the
decoding was done separately and similar to a binary Hamming code’s syndrome-based
matching. Thus, each bit position in the syndrome symbol was matched to a particular
column and the bit position of the corresponding information symbol was flipped.
The proposed codes as well as the above-mentioned codes used for comparison
were implemented using the Dataflow model in Verilog. Exhaustive functional testing
was done for different magnitudes of error. The codes were then synthesized in Synopsys
Design Compiler using NCSU FreePDK45 45nm library. Evaluation was done on the
basis of hardware complexity like area and number of cells, as well as on latency for both
the encoding circuit and the decoding circuit. Additionally, redundancy was also used as
a measure of evaluation between the schemes. In the comparison, b refers to the number
of bits stored per memory cell, l refers to the maximum magnitude of error, k refers to the
number of message symbols in the codeword and r refers to the number of check symbols
in the codeword. Latency is measured in nanoseconds (ns) while area is measured in
square micron (μm2) for the designs. Also, for the different magnitudes of error,
symmetric limited magnitude error model is assumed i.e. errors of maximum magnitude l
can occur in either direction.
Table 6.1 shows the comparison of the encoder latency, encoder area and number
of check symbols in the encoder. In general, the proposed scheme achieves much better
encoder latency as well as encoder area compared to the codes in [Klove 11]. Also, the
encoder area is very similar for the proposed scheme and the SyB Hamming codes. The
encoder latency is the least for SyB Hamming codes since they have more independent
parallel operations per bit in the symbol, which leads to smaller circuit depth compared
to the proposed scheme.
                Arithmetic Codes [Klove 11]   SyB Hamming [Namba 15a]    Proposed Scheme
 b  l    k     #check  Latency   Area        #check  Latency   Area     #check  Latency   Area
               symbols   (ns)    (μm2)       symbols   (ns)    (μm2)    symbols   (ns)    (μm2)
 3  1   64        3      2.45     4474          7      0.72     1664       3      1.21     1642
        128       3      3.023    9021          8      1.09     3399       3      1.32     2958
        256       4      3.79    19908          9      1.65     6884       4      1.94     6308
 4  1   64        2      2.66     5952          7      0.74     2223       3      1.16     2086
        128       3      3.11    13031          8      1.00     4527       3      1.71     4425
        256       3      3.94    30627          9      1.68     9153       3      2.19     9130
 4  2   64        3      2.12     8249          7      0.74     2223       3      1.41     2330
        128       3      2.69    18431          8      1.00     4527       3      1.98     4700
        256       4      3.44    36853          9      1.68     9153       3      2.14     9381

Table 6.1: Comparison of Encoding circuit between the different schemes
                Arithmetic Codes [Klove 11]   SyB Hamming [Namba 15a]    Proposed Scheme
 b  l    k     #check  Latency   Area        #check  Latency   Area     #check  Latency   Area
               symbols   (ns)    (μm2)       symbols   (ns)    (μm2)    symbols   (ns)    (μm2)
 3  1   64        3      2.82     8210          7      1.04     3686       3      1.66     3690
        128       3      3.51    16270          8      1.46     7104       3      2.06     7030
        256       4      4.54    35324          9      2.03    14077       4      2.88    13922
 4  1   64        2      3.21    10549          7      1.06     4898       3      1.85     4794
        128       3      3.96    22028          8      1.40     9435       3      2.51     9649
        256       3      4.73    48428          9      1.95    18607       3      3.60    19057
 4  2   64        3      3.61    14209          7      1.06     4898       3      2.52     6412
        128       3      4.77    29995          8      1.40     9435       3      3.32    12728
        256       4      5.08    60129          9      1.95    18607       3      3.13    24435

Table 6.2: Comparison of Decoding circuit between the different schemes
Table 6.2 shows the comparison of redundancy, i.e., the number of check symbols,
decoder latency and decoder area in the decoder design. The proposed scheme achieves
better decoder latency and decoder area for similar redundancy as the codes in [Klove
11]. Also, compared to the SyB Hamming codes, the proposed codes achieve much better
redundancy and a slight increase in decoder latency for similar decoder area. Thus, the
proposed codes achieve a balanced trade-off between decoder complexity and
redundancy compared to the other two existing schemes.
6.5 CONCLUSION
A new efficient non-binary Hamming code is proposed for limited magnitude
errors with the parity check matrix dependent on the possible error patterns. The
proposed codes can correct a single limited magnitude error in a word. The proposed
codes are able to achieve better decoder latency and hardware complexity compared to
arithmetic codes correcting limited magnitude errors for the same amount of redundancy.
The proposed codes also achieve better redundancy compared to binary Hamming codes
when extended to correct symbols. Thus, these codes are useful for protecting newer
emerging non-volatile main memories like MLC PCMs which suffer from read reliability
degradation due to resistance drifts.
Chapter 7: Summary and Future Work
7.1 SUMMARY
In this dissertation, various methods to detect and correct online errors in
emerging memories and high-density memory systems were devised and explored. As
technology scales further, conventional memories like SRAM and DRAM become denser
which creates new reliability challenges and higher soft error rates. Flash-based memories
sacrifice throughput in order to support stronger ECC schemes. Also, as
conventional memories fall behind in terms of power consumption and performance,
emerging memory technologies are being researched to provide an alternative solution.
But these newer memories have different reliability issues and different error models for
soft errors. Thus, the conventional error correcting codes are not efficient enough to
provide the required error correction capability without compromising on performance or
hardware overhead. The goal of this research is to develop efficient codes suitable for
these new reliability challenges and error mechanisms whilst reducing hardware overhead
and with minimizing impact to the performance of the memory.
In chapter 2, a new code to correct multiple bit upsets in SRAMs is proposed.
This new scheme leads to only a linear increase in decoder complexity as the number of
adjacent bits being upset increases. This provides a huge benefit against existing schemes
wherein the decoder complexity rises exponentially with the number of adjacent bits in
error. It is also shown that this benefit in decoder complexity comes at a slight or almost
no additional overhead in terms of redundancy. Chapter 3 addresses the issue of higher
error rates in both DRAMs and flash-based memories while providing adequate
performance as well. A scheme to correct double errors is proposed while maintaining
low decoder latency in order to enable high throughput. Two new decoding schemes are
proposed: first, a low latency decoding scheme for DRAMs and second, a low
complexity serial decoding scheme for flash-based memories. It is shown that the low
latency decoding scheme is able to reduce both the decoder latency and decoder
complexity at the expense of additional check-bits. It is also shown that the low
complexity serial decoding scheme reduces the number of decoding cycles by orders of
magnitude compared to existing schemes, thus enabling high performance. This scheme
also reduces the decoder complexity, but these benefits are achieved at the expense of
higher redundancy.
In chapter 4, a modified OLS code was proposed to address resistance drifts in
MLC PCMs. It was shown that these codes provide a better code rate and lesser decoder
hardware overhead compared to existing solutions. This benefit came with a slight trade-
off in decoder latency in order to achieve a better code-rate. Chapter 5 proposes a new b-
adjacent error correcting code based on Reed-Solomon codes to mitigate burst errors in
super dense memories. These codes are highly useful for mitigating write disturbance
errors in super dense MLC PCM and magnetic field coupling errors in dense STT-
MRAMs. It was shown that the proposed codes provided a balanced trade-off between
the code-rate, decoder hardware overhead as well as decoder latency compared to
existing schemes. The one-step decoding procedure enables high speed decoding for
these memories thus enabling better performance as well.
Finally, chapter 6 proposes a new Hamming code-based scheme for limited
magnitude errors. Limited magnitude errors are bounded by design. Thus, the code-space
of non-binary Hamming codes is exploited to protect a greater number of data symbols
for the same amount of redundancy. It is shown that the proposed scheme achieves a
good balanced trade-off between both decoding and encoding area, latency as well as
redundancy compared to existing schemes. This makes it suitable for emerging memory
designs suffering from limited magnitude errors.
7.2 FUTURE WORK
As memories grow more complex and new memory technologies are researched
and explored, the reliability challenges and error mechanisms specifically with respect to
soft errors change with the technology as well. This creates huge opportunities to develop
error correcting codes which are specific and highly efficient based on the design and
memory technology. For the work in chapter 2, one possible extension would be to
enable double error detection along with adjacent error correction. This would enable the
memory to generate either an interrupt or invalidate a memory line when random double
error occurs, thus preventing erroneous data from being used. The scheme in chapter 3
along with the serial decoding scheme can also possibly be extended to correct triple
errors and beyond such that they are able to enable high throughput for various error
prone applications thus increasing their overall reliability.
As the technology with respect to emerging memories and their various
applications mature, it is also possible to optimize the codes in chapters 4, 5 and 6 further
depending on the reliability challenges faced. New technologies like resistive RAMs
(RRAM) are researched for their use in neural networks. The RRAMs are used to store
the weights and also perform matrix multiplication in a single step in the analog domain.
Similar to PCM, RRAMs also exhibit limited magnitude error behavior. A possible
field of exploration would be to optimize the codes presented in this work for limited
magnitude errors specifically in neural networks wherein RRAMs are not used like a
typical memory and online errors would affect the result of matrix multiplication.
Bibliography
[Adalid 15] L. S. Adalid, P. Reviriego, P. Gil, S. Pontarelli and J. A. Maestro, “MCU
Tolerance in SRAMs Through Low-Redundancy Triple Adjacent Error
Correction,” in IEEE Transactions on Very Large-Scale Integration (VLSI)
Systems, vol. 23, no. 10, pp. 2332-2336, Oct. 2015.
[Ahlswede 02] R. Ahlswede, H. Aydinian, and L. Khachatrian, “Unidirectional error
control codes and related combinatorial problems,” in Proc. of the Eighth
International Workshop on Algebraic and Combinatorial Coding Theory (ACCT-
8), pp.6-9, 2002.
[Argyrides 11] C. Argyrides, D. Pradhan and T. Kocak, “Matrix codes for reliable and
cost-efficient memory chips,” in IEEE Transactions on Very Large-Scale
Integration (VLSI) Systems, vol. 19, no. 3, pp. 420-428, Mar. 2011.
[Baeg 09] S. Baeg, S. Wen and R. Wong, “SRAM Interleaving Distance Selection with a
Soft Error Failure Model,” in IEEE Transactions on Nuclear Science, vol. 56, no.
4, pp. 2111-2118, Aug. 2009.
[Baumann 05] R. Baumann, “Soft errors in advanced computer systems,” in IEEE Design
& Test of Computers, vol. 22, no. 3, pp. 258-266, May-Jun. 2005.
[Bossen 70] D. C. Bossen, “b-Adjacent Error Correction,” in IBM Journal of Research
and Development, vol. 14, no. 4, pp. 402-408, Jul. 1970.
[Burton 71] H. O. Burton, “Some asymptotically optimal burst-correction codes and their
relation to single-error-correcting Reed-Solomon codes,” in IEEE Transactions
on Information Theory, vol. 17, no. 1, pp. 92–95, Jan. 1971.
[Carrasco 08] R. A. Carrasco and M. Johnston, Non-binary Error Control Coding for
Wireless Communication and Data Storage. Chichester, West Sussex, UK: Wiley,
2008.
[Cassuto 10] Y. Cassuto, M. Schwartz, V. Bohossian, and J. Bruck, “Codes for
Asymmetric Limited-Magnitude Errors with Application to Multilevel Flash
Memories”, in IEEE Transactions on Information Theory, vol. 56, no. 4, pp.
1582-1595, Apr. 2010.
[Chang 02] H. C. Chang, C. C. Lin and C. Y. Lee, "A low power Reed-Solomon decoder
for STM-16 optical communications", in Proc. of IEEE Asia-Pacific Conference
on ASIC, pp. 351-354, 2002.
[Chappert 07] C. Chappert, A. Fert, and F. N. Van Dau, “The emergence of spin
electronics in data storage,” in Nature Materials, vol. 6, no. 11, pp. 813–823,
Nov. 2007.
[Chien 64] R. Chien, “Cyclic decoding procedures for Bose-Chaudhuri-Hocquenghem
codes,” in IEEE Transactions on Information Theory, vol. 10, no. 4, pp. 357-363,
Oct. 1964.
[Das 17] A. Das and N. A. Touba, "Limited Magnitude Error Correction using OLS
Codes for Memories with Multilevel Cells," in Proc. of IEEE International
Conference on Computer Design (ICCD), pp. 391-394, 2017.
[Das 18a] A. Das and N. A. Touba, "Systematic b-adjacent symbol error correcting Reed-
Solomon codes with parallel decoding", in Proc. of IEEE VLSI Test Symposium
(VTS), paper 7A.1, 2018.
[Das 18b] A. Das and N.A. Touba, "Low Complexity Burst Error Correcting Codes to
Correct MBUs in SRAMs", in Proc. of ACM Great Lakes Symposium on VLSI
(GLSVLSI), pp. 219-224, 2018.
[Das 18c] A. Das and N. A. Touba, "Efficient Non-Binary Hamming Codes for Limited
Magnitude Errors in MLC PCMs", in Proc. of IEEE International Symposium on
Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), pp. 1-6,
2018.
[Das 19] A. Das and N. A. Touba, "Layered-ECC: A Class of Double Error Correcting
Codes for High Density Memory Systems" in Proc. of IEEE VLSI Test
Symposium (VTS), paper 7A.2, 2019.
[Datta 11] R. Datta and N.A. Touba, “Generating Burst-Error Correcting Codes from
Orthogonal Latin Square Codes - A Graph Theoretic Approach,” in Proc. of IEEE
International Symposium on Defect and Fault Tolerance in VLSI and
Nanotechnology Systems (DFT), pp. 367-373, 2011.
[Diao 05] Z. Diao, D. Apalkov, M. Pakala, Y. F. Ding, A. Panchula, and Y. M. Huai,
“Spin transfer switching and spin polarization in magnetic tunnel junctions with
MgO and AlOx barriers,” in Applied Physics Letters, vol. 87, no. 23, pp. 1-3, Dec.
2005.
[Dutta 07] A. Dutta and N.A. Touba “Multiple Bit Upset Tolerant Memory Using a
Selective Cycle Avoidance Based SEC-DED-DAEC Code,” in Proc. of IEEE
VLSI Test Symposium (VTS), pp. 349-354, 2007.
[Fujiwara 06] E. Fujiwara, Code Design for Dependable Systems: Theory and Practical
Applications. Hoboken, NJ, USA: Wiley-Interscience, 2006.
[Hamming 50] R. W. Hamming, “Error detecting and error correcting codes,” in Bell
System Technical Journal, vol. 29, no. 2, pp. 147-160, Apr. 1950.
[Hsiao 70] M. Y. Hsiao, D. C. Bossen, and R. T. Chien, ‘‘Orthogonal Latin Square
codes,’’ in IBM Journal of Research and Development, vol. 14, no. 4, pp. 390–
394, Jul. 1970.
[Ibe 10] E. Ibe, H. Taniguchi, Y. Yahagi, K. Shimbo and T. Toba, “Impact of scaling on
neutron-induced soft error in SRAMs from a 250 nm to a 22 nm design rule,” in
IEEE Transactions on Electron Devices, vol. 57, no. 7, pp. 1527-1538, Jul. 2010.
[Jeon 12] M. Jeon, and J. Lee, “On Codes Correcting Bidirectional Limited-Magnitude
Errors for Flash Memories,” in Proc. of International Symposium on Information
Theory and its Applications, pp. 96-100, 2012.
[Jiang 14] L. Jiang, Y. Zhang and J. Yang, “Mitigating Write Disturbance in Super-Dense
Phase Change Memories,” in Proc. of Annual IEEE/IFIP International
Conference on Dependable Systems and Networks (DSN), pp. 216-227, 2014.
[Jiang 16] L. Jiang, W. Wen, D. Wang and L. Duan, “Improving read performance of
STT-MRAM based main memories through Smash Read and Flexible Read,” in
Proc. of Asia and South Pacific Design Automation Conference (ASP-DAC), pp.
31-36, 2016.
[Kim 07] J. Kim, N. Hardavellas, K. Mai, B. Falsafi and J. Hoe, “Multi-bit error tolerant
caches using two-dimensional error coding,” in Annual IEEE/ACM International
Symposium on Microarchitecture (MICRO), pp. 197–209, 2007.
[Klockmann 17] A. Klockmann, G. Georgakos and M. Goessel, “A new 3-bit burst-error
correcting code,” in Proc. of IEEE International Symposium on On-Line Testing
and Robust System Design (IOLTS), pp. 3-5, 2017.
[Klove 11] T. Klove, B. Bose, and N. Elarief, "Systematic, single limited magnitude error
correcting codes for flash memories," in IEEE Transactions on Information
Theory, vol. 57, no. 7, pp.4477-4487, Jul. 2011.
[Lee 11] K. Lee and S. Choi, “A Highly Manufacturable Integration Technology of 20nm
Generation 64Gb Multi-Level NAND Flash Memory,” in Proc. of IEEE
Symposium on VLSI Technology, pp. 70-71, 2011.
[Li 12] J. Li, B. Luan and C. Lam, "Resistance drift in phase change memory," in Proc.
of IEEE International Reliability Physics Symposium (IRPS), Paper 6C.1, 2012.
[Lin 04] S. Lin and D. J. Costello, Error Control Coding. Upper Saddle River, NJ, USA:
Pearson Education, 2004.
[Liu 18] S. Liu, J. Li, P. Reviriego, M. Ottavi and L. Xiao, “A Double Error Correction
Code for 32-Bit Data Words With Efficient Decoding,” in IEEE Transactions on
Device and Materials Reliability, vol. 18, no. 1, pp. 125-127, Mar. 2018.
[Lu 96] E. H. Lu and T. Chang, “New decoder for double-error-correcting binary BCH
codes,” in IEE Proceedings - Communications, vol. 143, no. 3, pp. 129 – 132,
Jun. 1996.
[Meza 15] J. Meza, Q. Wu, S. Kumar and O. Mutlu, “Revisiting Memory Errors in
Large-Scale Production Data Centers: Analysis and Modeling of New Trends
from the Field,” in Proc. of IEEE/IFIP International Conference on Dependable
Systems and Networks (DSN), pp. 415-426, 2015.
[Namba 14] K. Namba, S. Pontarelli, M. Ottavi and F. Lombardi, “A Single-Bit and
Double-Adjacent Error Correcting Parallel Decoder for Multiple-Bit Error
Correcting BCH Codes,” in IEEE Transactions on Device and Materials
Reliability, vol. 14, no. 2, pp. 664-671, Jun. 2014.
[Namba 15a] K. Namba, and F. Lombardi, “Non-Binary Orthogonal Latin Square Codes
for a Multilevel Phase Charge Memory (PCM),” in IEEE Transactions on
Computers, vol. 64, no. 7, pp. 2092-2097, Jul. 2015.
[Namba 15b] K. Namba and F. Lombardi, “A Single and Adjacent Symbol Error-
Correcting Parallel Decoder for Reed-Solomon Codes,” in IEEE Transactions on
Device and Materials Reliability, vol. 15, no. 1, pp. 75-81, Mar. 2015.
[Naseer 08] R. Naseer and J. Draper, “Parallel double error correcting code design to
mitigate multi-bit upsets in SRAMs,” in Proc. of European Solid-State Circuits
Conference (ESSCIRC), pp. 222-225, 2008.
[Neale 13] A. Neale and M. Sachdev, “A new SEC-DED error correction code subclass
for adjacent MBU tolerance in embedded memory,” in IEEE Transactions on
Device and Materials Reliability, vol. 13, no. 1, pp. 223–230, Mar. 2013.
[Ovshinsky 68] S. R. Ovshinsky, ‘‘Reversible Electrical Switching Phenomena in
Disordered Structures,’’ in Physics Review Letters, vol. 21, no. 20, pp. 1450–
1455, Nov. 1968.
[Papandreou 10] N. Papandreou, A. Pantazi, A. Sebastian, M. Breitwisch, C. Lam, H.
Pozidis and E. Eleftheriou, “Multilevel Phase-Change Memory,” in Proc. of IEEE
International Conference on Electronics, Circuits and Systems (ICECS), pp.
1017-1020, 2010.
[Radaelli 05] D. Radaelli, H. Puchner, S. Wong and S. Daniel, “Investigation of multi-bit
upsets in a 150 nm technology SRAM device,” in IEEE Transactions on Nuclear
Science, vol. 52, no. 6, pp. 2433-2437, Dec. 2005.
[Raoux 08] S. Raoux, G. W. Burr, M. J. Breitwisch, C. T. Rettner, Y.-C. Chen, R. M.
Shelby, M. Salinga, D. Krebs, S. H. Chen, H. L. Lung and C. H. Lam, “Phase-
change random access memory: A scalable Technology,” in IBM Journal of
Research and Development, vol. 52, no. 4/5, pp. 465-479, Sep. 2008.
[Reviriego 12] P. Reviriego, M. Flanagan, S.-F. Liu and J. Maestro, “Multiple cell upset
correction in memories using difference set codes,” in IEEE Transactions on
Circuits and Systems-I: Regular Papers, vol. 59, no. 11, pp. 2592–2599, Nov.
2012.
[Reviriego 13] P. Reviriego, S. Liu, J.A. Maestro, S. Lee, N.A. Touba and R. Datta,
“Implementing Triple Adjacent Error Correction in Double Error Correction
Orthogonal Latin Square Codes,” in Proc. of IEEE International Symposium on
Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), pp. 167-
171, 2013.
[Reviriego 15] P. Reviriego, S. Pontarelli, A. Evans and J.A. Maestro, “A Class of SEC-
DED-DAEC Codes Derived from Orthogonal Latin Square Codes,” in IEEE
Transactions on very Large Scale Integration (VLSI) Systems, vol. 23, no. 5, pp.
968-972, May 2015.
[Reviriego 16] M. Demirci, P. Reviriego and J. A. Maestro, “Implementing Double Error
Correction Orthogonal Latin Squares Codes in SRAM-based FPGAs,” in
Microelectronics Reliability, vol. 56, pp. 221-227, Jan. 2016.
[Shamshiri 10] S. Shamshiri and K. T. Cheng, “Error-locality-aware linear coding to
correct multi-bit upsets in SRAMs,” in Proc. of IEEE International Test
Conference (ITC), Paper 7.1, 2010.
[Slonczewski 96] J. C. Slonczewski, “Current-driven excitation of magnetic multilayers,”
in Journal of Magnetism and Magnetic Materials, vol. 159, no. 1–2, pp. L1-L7,
Jun. 1996.
[Wilkerson 10] C. Wilkerson, A. R. Alameldeen, Z. Chishti, W. Wu, D. Somasekhar, and
S. Lu, “Reducing cache power with low-cost, multi-bit error-correcting codes,” in
Proc. of ACM annual international symposium on Computer architecture (ISCA),
pp. 83–93, 2010.
[Wong 10] H. S. P. Wong, S. Raoux, S. Kim, J. Liang, J. P. Reifenberg, B. Rajendran, M.
Asheghi, and K. E. Goodson, “Phase Change Memory,” in Proceedings of the
IEEE, vol. 98, no. 12, Dec. 2010.
[Yamada 91] N. Yamada, E. Ohno, K. Nishiuchi, and N. Akahira, ‘‘Rapid-Phase
Transitions of GeTe-Sb2Te3 Pseudobinary Amorphous Thin Films for an Optical
Disk Memory,’’ in Journal of Applied Physics, vol. 69, no. 5, pp. 2849–2856,
Apr. 1991.
[Yoo 14] I. Yoo and I. C. Park, “A search-less DEC BCH decoder for low-complexity
fault-tolerant systems,” in Proc. of IEEE Workshop on Signal Processing Systems,
pp. 1-6, 2014.
[Yoo 16] H. Yoo, Y. Lee and I. C. Park, “Low-Power Parallel Chien Search Architecture
Using a Two-Step Approach,” in IEEE Transactions on Circuits and Systems II:
Express Briefs, vol. 63, no. 3, pp. 269-273, Mar. 2016.
[Yoon 18] I. Yoon and A. Raychowdhury, “Modeling and Analysis of Magnetic Field
Induced Coupling on Embedded STT-MRAM Arrays,” in IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 2, pp.
337-349, Feb. 2018.
Vita
Abhishek Das was raised in the small town of Rourkela, India. He received his
Bachelor of Technology degree in Electronics and Communications Engineering from
National Institute of Technology Rourkela in 2012. After graduating, he worked in
Centre for Development of Telematics as a Research Engineer for 2 years. He received a
Master’s degree in Electrical Engineering from the University of Texas at Austin in 2016.
He joined the PhD program at the University of Texas at Austin in 2016 and has been
working in the Computer Aided Testing (CAT) Lab under the supervision of Prof. Nur
Touba. He is currently pursuing his PhD degree. His current research interests include
fault tolerant computing specific to MLC non-volatile memories, VLSI testing and
exploring fault tolerant techniques for memory security.
Permanent address (or email): [email protected]
This dissertation was typed by Abhishek Das.