
Copyright

by

Abhishek Das

2019

The Dissertation Committee for Abhishek Das certifies that this is the approved

version of the following dissertation:

Efficient Error Correcting Codes for Emerging and High-Density

Memory Systems

Committee:

Nur A. Touba, Supervisor

Zhigang Pan

Jacob A. Abraham

Michael Orshansky

Mudit Bhargava

Efficient Error Correcting Codes for Emerging and High-Density

Memory Systems

by

Abhishek Das

Dissertation

Presented to the Faculty of the Graduate School of

The University of Texas at Austin

in Partial Fulfillment

of the Requirements

for the Degree of

Doctor of Philosophy

The University of Texas at Austin

December 2019

Dedicated to my family


Acknowledgements

First and foremost, I would like to thank my advisor Dr. Nur Touba for his

invaluable guidance and support over the course of my PhD. Our discussions of various

professional and personal topics were highly stimulating and a joyful experience.

Instances wherein he would easily spot loopholes in new ideas that I had missed in my excitement of having discovered something, or point out that my newly discovered idea had in fact been discovered some fifty-odd years back, always bring a smile to my face. I am immensely grateful to him for instilling in me the qualities of striving for higher quality and of periodically taking a step back to analyze things. I can truly say that

the pace at which I moved forward in my PhD program was defined by me. His innate

ability to motivate without ever putting any kind of pressure at all is something I will

value for the rest of my life.

I would also like to take this opportunity to thank my committee members Dr.

Jacob Abraham, Dr. Michael Orshansky, Dr. David Z. Pan and Dr. Mudit Bhargava for

their valuable insights, stimulating discussions and vital suggestions without which this

dissertation wouldn’t have been complete.

Swetalina Panigrahi has been my pillar of support throughout all these years. I am

extremely lucky to have her in my life. Her keen insights, not into my research, but rather into our personal life, helped me research better. Her belief in giving our best at everything we do and her relaxed attitude towards life in general have pulled me through some

really difficult times.

I also take this opportunity to thank my sister Aritra Das for her life lessons.

Although younger, her attitude towards life is something I always aspire to have.


Last but not least, this dissertation would not have been complete without my parents, Bithika and Swarup Kumar Das. Their emphasis on striving towards self-growth with honesty and integrity has made me what I am today. Their unwavering support, unconditional love and inherent belief in my abilities have been the greatest source of

inspiration for me. I would like to thank them from the bottom of my heart for being there

whenever I needed them. No amount of words can ever justify their immense

contribution towards my success.

Finally, I would like to thank the National Science Foundation for their generous

grants which made this dissertation possible.


Abstract

Efficient Error Correcting Codes for Emerging and High-Density

Memory Systems

Abhishek Das, Ph.D.

The University of Texas at Austin, 2019

Supervisor: Nur A. Touba

As memory technology scales, the demand for higher performance and reliable

operation is increasing as well. Field studies show increased error rates in dynamic random-access memories (DRAMs). The high density comes at the cost of more marginal cells and higher power consumption. Multiple bit upsets caused by high-energy radiation, affecting multiple cells at once, are now the most common source of soft errors in static random-access memories (SRAMs). Phase change memories have been in focus as an attractive alternative to

DRAMs due to their low power consumption, lower bit cost and high density. But these

memories suffer from various reliability issues. The errors caused by such mechanisms

can cause large overheads for conventional error correcting codes.

This research addresses the issue of memory reliability under these new

constraints due to technology scaling. The goal of the research is to address the different

error mechanisms as well as increased error rates while keeping the error correction time

low so as to enable high throughput. Various schemes have been proposed such as

addressing multiple bit upsets in SRAMs through a burst error correcting code which has

a linear increase in complexity as compared to exponential increase for existing methods


[Das 18b], as well as a double error correcting code with lower complexity and lower

correction time for the increased error rates in DRAMs [Das 19].

This research also addresses limited magnitude errors in emerging multilevel cell

memories, e.g., phase change memories. A scheme which extends binary Orthogonal Latin Square codes is presented [Das 17], which utilizes a few bits from each cell to

provide protection based on the error magnitude. The issue of write disturbance error in

multilevel cells is also addressed [Das 18a] using a modified Reed-Solomon code. The

proposed scheme achieves a very low decoding time compared to existing methods

through the use of a new construction methodology and a simplified decoding procedure.

A new scheme is presented using non-binary Hamming codes which protect more

memory cells for the same amount of redundancy [Das 18c] through the use of unused

columns in the code space of the design.


Table of Contents

List of Tables

List of Figures

Chapter 1: Introduction
  1.1 Phase Change Memories
    1.1.1 Write Disturbance Errors
    1.1.2 Resistance Drift Errors
  1.2 Spin Transfer Torque Magnetic RAM (STT-MRAM)
    1.2.1 Read Disturbance Errors
    1.2.2 Magnetic Field Coupling Errors
  1.3 High Density Memory Systems
  1.4 Error Correcting Codes
  1.5 Contributions of Dissertation

Chapter 2: Low Complexity Burst Error Correcting Codes to Correct MBUs in SRAMs
  2.1 Introduction
  2.2 Burst Error Correcting Hamming Codes
    2.2.1 Syndrome Analysis
    2.2.2 Decoding Procedure
  2.3 Proposed Scheme
    2.3.1 Decoding Procedure
    2.3.2 Area and Delay Optimization
  2.4 Evaluation
    2.4.1 Redundancy
    2.4.2 Hardware Complexity
  2.5 Conclusion

Chapter 3: Layered-ECC: A Class of Double Error Correcting Codes for High Density Memory Systems
  3.1 Introduction
  3.2 Related Work
  3.3 Proposed Scheme
    3.3.1 Low Latency Decoding
    3.3.2 Low Complexity Decoding
  3.4 Evaluation
  3.5 Conclusion

Chapter 4: Limited Magnitude Error Correction using OLS Codes for Memories with Multilevel Cells
  4.1 Introduction
  4.2 Orthogonal Latin Square Codes
  4.3 Proposed Scheme
    4.3.1 Redundancy Optimization
  4.4 Evaluation
  4.5 Conclusion

Chapter 5: Systematic b-Adjacent Symbol Error Correcting Reed-Solomon Codes with Parallel Decoding
  5.1 Introduction
  5.2 Reed-Solomon Codes
  5.3 Proposed Scheme
    5.3.1 Decoding Procedure
  5.4 Evaluation
    5.4.1 Redundancy
    5.4.2 Hardware Complexity
  5.5 Conclusion

Chapter 6: Efficient Non-binary Hamming Codes for Limited Magnitude Errors in MLC PCMs
  6.1 Introduction
  6.2 General Hamming Codes
  6.3 Proposed Scheme
    6.3.1 Syndrome Analysis
    6.3.2 Companion Matrix
    6.3.3 Encoder
    6.3.4 Decoder
  6.4 Evaluation
  6.5 Conclusion

Chapter 7: Summary and Future Work
  7.1 Summary
  7.2 Future Work

Bibliography

Vita


List of Tables

Table 2.1: Possible Errors for a Data Bit Within a 4-Bit Burst Window

Table 2.2: Comparison of redundancy, decoder latency, decoder area and decoder cell usage between burst error correcting Hamming codes and proposed codes

Table 3.1: Example of syndrome values and error candidates for different error types

Table 3.2: Comparison of proposed low latency decoder with existing schemes

Table 3.3: Comparison of proposed low complexity serial decoder with existing schemes

Table 4.1: Comparison of OSMLD and proposed codes for asymmetric magnitude-3 error

Table 4.2: Comparison of OSMLD, proposed OLS and Hybrid codes for symmetric magnitude-1 error in 3-bits/cell memory

Table 4.3: Comparison of OSMLD, proposed OLS and Hybrid codes for symmetric magnitude-3 error in 4-bits/cell memory

Table 5.1: Redundancy, Decoder Area and Latency Comparison for DAsEC codes

Table 5.2: Redundancy, Decoder Area and Latency Comparison for TAsEC codes

Table 6.1: Comparison of Encoding circuit between the different schemes

Table 6.2: Comparison of Decoding circuit between the different schemes


List of Figures

Fig. 1.1: PCM Cell Structure

Fig. 1.2: MLC PCM resistance levels

Fig. 1.3: Write Disturbance in PCM

Fig. 1.4: Resistance Drift in MLC PCM

Fig. 1.5: 1T1MTJ STT-MRAM Cell

Fig. 2.1: General form of the parity check matrix of proposed codes

Fig. 2.2: Symbols data bit di is part of for a 4-bit burst error correcting proposed code

Fig. 2.3: Comparison of decoder area for different information lengths k and different burst sizes b

Fig. 3.1: p-parallel Chien search architecture with short critical path

Fig. 3.2: Parity check matrix of proposed scheme for k=16

Fig. 3.3: Block diagram of proposed decoding logic

Fig. 3.4: Comparison of number of syndromes for different data bit sizes between proposed scheme and a DEC BCH code

Fig. 3.5: Block diagram of error pattern generation using the serial low complexity decoding procedure

Fig. 4.1: Parity check matrix and decoder logic for a SEC (8,4) OLS Code

Fig. 4.2: Bidirectional (magnitude-1 upwards and magnitude-2 downwards) error transitions on lower order 2 bits

Fig. 4.3: Example of encoder and decoder logic for a proposed scheme

Fig. 4.4: Symmetric magnitude-1 errors for d0 and d1

Fig. 5.1: Partial schematics of error pattern generator for the proposed scheme

Fig. 6.1: Parity Check Matrix of (5, 3) 4-ary Hamming code

Fig. 6.2: Classification of columns and elements for a 2-bits/cell memory

Fig. 6.3: Multiplying major columns with 7 for a 3-bits/cell memory

Fig. 6.4: All possible error patterns for a 3-bits/cell memory

Fig. 6.5: (a) All possible columns for a 3-bits/cell memory and 2 parity check symbols (b) Resultant columns after removal of major elements from e1 (c) Resultant columns after removal of major elements from e2

Fig. 6.6: Parity check matrix of a limited magnitude-1 error correcting code for a 3-bits per cell memory

Fig. 6.7: Algorithm for construction of the parity check matrix of the proposed scheme

Fig. 6.8: Comparison of #syndromes between limited magnitude error correcting Hamming codes and non-binary general Hamming codes

Fig. 6.9: Binary form of partial parity check matrix of Fig. 6.6

Fig. 6.10: Error Magnitude Computation for first 2 symbols of parity check matrix from Fig. 6.9


Chapter 1: Introduction

Conventional memory systems are based on DRAMs and SRAMs which have

been quite efficient for many years now. But with technology scaling leading to more

stringent requirements like lower power consumption and higher capacity, DRAM and

SRAM scaling has not been able to keep up. Emerging forms of memories like Phase

Change Memory (PCM) and Spin Transfer Torque Magnetic RAM (STT-MRAM) have

been the focus of research as possible alternatives. Properties like low power

consumption, low cost per bit and further technology scaling make them a viable

replacement solution. But such emerging memories suffer from various reliability issues

like write disturbance errors and resistance drifts which can cause read reliability

degradation in PCMs. Meanwhile, the conventional memories have continued to shrink in

size due to technology scaling thus providing adequately high densities.

Conventional error correcting techniques could be applied to these newer memory alternatives. But the completely different reliability issues and error models make them unsuitable for this purpose. The high density and increased error rates in memories like

SRAMs and DRAMs also render conventional codes ineffective. Conventional codes can

very well lead to high data redundancy and high decoding complexity for these modern

memory systems. This creates a demand for newer, more efficient error correcting techniques which are better suited to addressing the newer reliability issues and fault models. Emerging memories also support multilevel operation, i.e., they can store multiple bits per cell. This renders conventional binary error correcting codes inefficient at addressing multi-bit errors. Such codes would require a lot of redundancy to handle multi-

bit symbol-based errors.


In the following sections, the basic operation of each type of emerging and high-density memory system is described along with the current reliability issues that plague these

memory technologies.

1.1 PHASE CHANGE MEMORIES

Although phase change memories were first developed in [Ovshinsky 68], this

memory technology was revived in recent years due to a fast crystallization material

Ge2Sb2Te5 (GST) [Yamada 91]. This memory technology was shown to have great

promise as a main memory system in terms of scalability [Raoux 08]. A more recent

proposal of multilevel cell (MLC) operation with the ability to store multiple bits per cell

thus lowering costs and providing high density [Papandreou 10] has increased the

research focus on this area further.

[Figure omitted: cross-section of a PCM cell showing the top electrode, phase change material, programmable region, heater, insulator and bottom electrode.]

Fig. 1.1: PCM Cell Structure

A basic PCM cell has been shown in Fig. 1.1. It consists of a phase change

material, a programmable region, a heater, insulators, a top and a bottom electrode. The

programmable region can be programmed to either be in a crystalline state or in an

amorphous state [Wong 10]. The PCM cell is reset into an amorphous state by melting

the programmable region and then quenching it rapidly through a short duration large

electrical current. This creates a highly resistive amorphous material. This amorphous

region is in series with the crystalline phase change material. The total resistance of the


cell thus is the series combination of the crystalline part and the created amorphous

region. In order to set the PCM cell to crystalline state, a lower electrical pulse is applied

for a long duration so that the programmable region is annealed at a temperature between

the crystallization temperature and the melting temperature. A read operation is

performed through measuring the resistance of the cell by passing a small electrical

current through the cell. The electrical current needs to be small enough so as not to

disturb the contents of the cell.

PCM cells can also be configured so that the memory cell can be programmed

into several intermediate resistance levels between that of SET and RESET state. This

allows for the storage of multiple bits per cell. This leads to further lower cost per bit

since it basically enables increasing the capacity without changing the number of cells.

But such an operation requires an iterative algorithm-based programming so as to account

for process and material variations. An example for a 2-bits/cell memory’s resistance

distribution is shown in Fig. 1.2.

[Figure omitted: resistance distributions (% of cells vs. resistance) for the four 2-bits/cell states 00, 01, 10 and 11.]

Fig. 1.2: MLC PCM resistance levels

1.1.1 Write Disturbance Errors

Write disturbance errors are caused by the heat dissipated from a cell during a

PROGRAM operation. Cells that are being programmed to RESET state are more prone

to cause write disturbance errors since it involves a large electrical current pulse. If the


heat dissipated from such a PROGRAM operation is more than the crystallization

temperature of the neighboring cells, it can potentially cause partial crystallization of

such cells thus changing their stored data values. This problem worsens with technology

scaling as cells get closer to each other [Jiang 14]. Smaller technology nodes with

super dense memories thus have exacerbated write disturbance errors affecting numerous

contiguous cells. An example of write disturbance has been shown in Fig. 1.3. It should

be noted that for write disturbance errors to occur, a considerably high write current is required, since enough heat must be dissipated to cause a disturbance in

neighboring cells. Thus, write disturbance errors are more probable if the write operation

involves programming the cell to a more amorphous state.

[Figure omitted: a write changes the old data 2 3 0 1 0 3 1 0 to the new data 2 0 x x x 3 1 0, where x marks the cells disturbed by the write.]

Fig. 1.3: Write Disturbance in PCM

1.1.2 Resistance Drift Errors

Resistance drift errors are caused due to structural relaxation of the PCM cell

which causes the cell’s resistance to increase over time [Li 12]. This causes a read

reliability degradation over time since the increased resistance of the cell can lead to an

erroneous readout of the cell. For MLC PCM cells, this problem is exacerbated since

there are more resistance distributions compared to a single level cell (SLC) storing just

one bit. Also, as the number of bits per cell b increases, the number of resistance

distributions, given by $L = 2^b$, increases exponentially. Thus, the read reliability


degradation worsens as more bits are stored per cell. An example of resistance drift for a

2-bits per cell memory is shown in Fig. 1.4.

[Figure omitted: drifted resistance distributions (% of cells vs. resistance) for the states 00, 01, 10 and 11.]

Fig. 1.4: Resistance Drift in MLC PCM
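To make the level-count arithmetic concrete, the minimal Python sketch below quantizes a measured cell resistance into one of the $L = 2^b$ levels; the threshold values and drift factor are illustrative assumptions, not numbers from this dissertation.

```python
# Illustrative model of an MLC readout: a cell storing b bits must be resolved
# into one of L = 2^b resistance levels, so upward resistance drift can push a
# reading across a level boundary and corrupt the readout. The thresholds
# below are hypothetical values chosen only for demonstration.

def read_level(resistance, thresholds):
    """Return the stored level: the count of thresholds at or below the reading."""
    return sum(resistance >= t for t in thresholds)

bits_per_cell = 2
num_levels = 2 ** bits_per_cell            # L = 2^b resistance levels
thresholds = [10e3, 100e3, 1e6]            # L - 1 level boundaries (ohms)

r_written = 50e3                           # cell programmed into level 1 ("01")
print(read_level(r_written, thresholds))   # -> 1

r_drifted = r_written * 10                 # structural relaxation raises resistance
print(read_level(r_drifted, thresholds))   # -> 2: a drift-induced read error
```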

1.2. SPIN TRANSFER TORQUE MAGNETIC RAM (STT-MRAM)

STT-MRAM was theoretically predicted in [Slonczewski 96]. An STT-MRAM

cell consists of a transistor and a magnetic tunneling junction connected in series as

shown in Fig. 1.5. The MTJ consists of ferromagnetic electrodes separated by a thin

barrier of tunnel oxide (e.g. MgO). Any information is thus stored in an MTJ based on

the relative magnetization of the free layer and the pinned layer. Also, since storage is

dependent on the spin directions of electrons as opposed to charge, they are relatively

immune to radiation induced soft errors. The MTJ can be configured into two resistance

states: a low resistance state when the free layer’s magnetization is parallel to that of the

pinned layer and a high resistance state when the free layer’s magnetization is anti-

parallel to that of the pinned layer. A spin polarized current is used to switch the

magnetization of the MTJ [Diao 05]. The spin polarized current exerts a torque on the

free layer, causing it to change the direction of its magnetic moment. The read operation involves passing a small current through the cell and sensing the resistive state of the

STT-MRAM cell.


[Figure omitted: 1T1MTJ cell schematic, with an MTJ (free layer, tunnel oxide, pinned layer) in series with an access transistor connected to the word line (WL), source line (SL) and bit line (BL).]

Fig. 1.5: 1T1MTJ STT-MRAM Cell

1.2.1 Read Disturbance Errors

With the scaling of STT-MRAM, the write current reduces as well. But the read

current does not scale well with feature size [Jiang 16]. For small technology nodes, it is

challenging to reduce read current since it becomes difficult for conventional STT-

MRAM sense amplifiers to sense data correctly. Thus, for lower technology reads it is

possible that some read operations might actually cause the cell’s data to flip, thus

causing a read disturbance error.

1.2.2 Magnetic Field Coupling Errors

Scaling MTJ in a densely packed array causes program errors due to large stray

field coupling [Chappert 07]. As MTJs scale down, the distance of ferromagnets, free

layer and the pinned layer decrease. This causes strong magnetic coupling. As the

distance reduces further, the magnetic coupling gets stronger thus worsening the problem.

This magnetic field coupling from one MTJ basically affects the write and read

operations of its neighboring bits [Yoon 18]. Magnetic coupling causes significant

degradation to average retention times. A non-trivial degradation to critical current

densities is also caused which affects the write operation of a STT-MRAM cell.


1.3 HIGH DENSITY MEMORY SYSTEMS

Conventional memory systems like SRAM and DRAM have seen continued

shrinking of their size due to technology scaling. The new technology nodes enable high

density, but also come at a cost of increased error rate. For high density SRAM systems,

the decreased cell size and the decreased distance between cells means that a particle

strike no longer affects a single SRAM cell. The effects of a particle strike are now seen

across multiple SRAM cells causing a multiple bit upset (MBU) [Radaelli 05]. DRAM

systems have also seen a continuous scaling trend with current industry standards

manufacturing their 10nm class of DRAMs. But field studies of DRAM systems have

shown increased error rates wherein the traditional SEC-DED codes are no longer strong

enough to provide continued error protection.

Flash based memories, e.g. NAND flash, have seen an increase in capacity as

well. And as technology nodes scale, the demand for high throughput becomes a major concern. This is because these memories use BCH or BCH-derivative codes which

use a significant number of decoding cycles to check and correct for errors. This becomes

a bottleneck in enabling high throughput.

1.4 ERROR CORRECTING CODES

Error correcting codes have been conventionally used to protect SRAMs and

DRAMs from radiation induced soft errors. The most prevalent code is the single error

correcting double error detecting (SEC-DED) Hamming code presented in [Hamming

50]. These codes can correct any single bit error in a word and detect any double errors.

These codes have very low decoding latency and have an intermediate amount of data

redundancy. Another prevalent code specific to SRAMs is the Orthogonal Latin Square

(OLS) codes first presented in [Hsiao 70]. These codes have a considerably high amount of data redundancy. But they utilize a majority logic decoding procedure which enables very low latency decoding with very low decoding complexity. Also, they are modular in design and can correct more than one error. These codes are not suitable for modern

memories like PCMs and STT-MRAMs because of the relatively larger number of bits

that need protection. An SEC-DED code cannot guarantee protection against the newer

forms of errors which affect multiple bits. OLS codes incur a considerable data

redundancy overhead to address symbol-based errors in MLC PCMs or multi-bit clustered

errors in STT-MRAMs. For high density SRAMs and DRAMs, due to the shrinking

technology nodes, these memories are seen to have an increased error rate. The

conventional error correcting codes incur high overheads and are not strong enough to

protect the memories from the high error rates.
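To illustrate the kind of syndrome-based decoding these SEC codes rely on, here is a minimal Python sketch for the textbook (7,4) Hamming code; the particular H matrix is a standard illustrative choice, not a construction from this dissertation.

```python
# Minimal (7,4) SEC Hamming decode: the syndrome of a single-bit error equals
# the corresponding column of H, so correction is a column match plus a flip.
# Codeword layout: [d0, d1, d2, d3, p0, p1, p2] with H = [A | I3].

H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def syndrome(word):
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)

def correct(word):
    s = syndrome(word)
    if s == (0, 0, 0):
        return word                      # no error detected
    for i in range(7):                   # match the syndrome against H's columns
        if s == tuple(row[i] for row in H):
            fixed = word[:]
            fixed[i] ^= 1                # flip the single erroneous bit
            return fixed
    raise ValueError("uncorrectable error pattern")

data = [1, 0, 1, 1]
parity = [sum(H[r][i] * data[i] for i in range(4)) % 2 for r in range(3)]
code = data + parity                     # systematic encoding
bad = code[:]; bad[2] ^= 1               # inject a single-bit error
assert correct(bad) == code
```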

For flash-based memories and hard disk storage, conventionally low redundancy

codes like Bose-Chaudhuri-Hocquenghem (BCH) codes have been used. They offer very

low redundancy but at the cost of a multi-cycle decoding procedure. They incur significant decoding logic overhead and can take a considerable number of cycles, sometimes proportional to the number of bits in the word, to complete decoding [Chien 64]. This calls for alternative solutions which incur less overhead in terms of the number of decoding cycles. Moreover, non-binary error correcting BCH codes are not suitable in the context of PCMs and STT-MRAMs. This is because these memories are random-access memories and require a faster and relatively simpler decoding procedure. These factors call for the design of more efficient codes which are better suited to the newer forms of memory with their relatively different

reliability issues.


1.5 CONTRIBUTIONS OF DISSERTATION

Over the course of the next few chapters, novel ideas related to different

reliability concerns have been presented. The primary focus of all the methods proposed

in this dissertation is to design more efficient error correcting codes better suited to

address the new reliability issues.

Chapter 2 addresses the issue of burst errors or MBUs in SRAM based systems. A

new burst error correcting code based on Hamming codes is proposed which allows much

better scaling of decoder complexity as the burst size is increased [Das 18b]. For larger

burst sizes, it can provide significantly smaller and faster decoders than existing methods

thus providing higher reliability at an affordable cost. Moreover, there is no significant

increase in the number of check bits in comparison with existing methods. A general

construction and decoding methodology for the new codes is proposed. Experimental

results are presented comparing the decoder complexity for the proposed codes with

conventional burst error correcting Hamming codes demonstrating the significant

improvements that can be achieved.

In chapter 3, a layered double error correcting (DEC) code is proposed with a

simple decoding procedure [Das 19]. The codes are shown to strike a good balance

between redundancy and decoder complexity. A general construction methodology is

presented along with two different decoding schemes. One is a low latency decoding

scheme that is useful for main memories which need high speed decoding for optimal

performance. This scheme is shown to achieve better redundancy compared to existing

low-latency codes as well as faster decoder latency compared to existing low-redundancy

codes. The second is a low complexity decoding scheme which is useful for flash-based

memories. This scheme is shown to have considerably less area compared to existing

schemes. Also, it is shown that the proposed serial low complexity decoding scheme can


take significantly fewer cycles to complete the whole decoding procedure; thus, enabling

better performance compared to existing serial decoding schemes.

Chapter 4 proposes a non-binary OLS code to address limited magnitude errors in

the form of resistance drifts in MLC PCMs [Das 17]. The codes presented extend the binary OLS codes and are modified to correct limited magnitude errors. The codes presented

are able to lower the redundancy by considering only a few bits from each symbol to

compute the parity bits. A new decoding methodology is presented which looks at the

received bits per symbol as well as the decoded bits per symbol to compute the error magnitude, which is then added back to the received symbol. Chapter 4 also presents hybrid codes which combine the limited magnitude OLS codes with another low redundancy error correcting code to further reduce the number of check symbols. But

this low data redundancy is achieved at the expense of some additional decoding

complexity.

In chapter 5, new systematic b-adjacent codes based on Reed-Solomon codes are proposed [Das 18a]. These codes are targeted towards write disturbances in PCM, which affect multiple clustered cells, and also find usage against magnetic field coupling in STT-MRAMs, which affects multiple neighboring bits. Reed Solomon (RS) codes offer good

error protection since they can correct multi-bit symbols at a time. But beyond single

symbol error correction, the decoding complexity as well as the decoding latency is very

high. The codes presented in this work also have a low latency and low complexity

parallel one step decoding scheme. This makes them suitable for PCM and STT-MRAM

since they need a high-speed decoding procedure to enable high performance. The codes

presented can correct any errors within b-adjacent symbols. The codes presented are

shown to have very low data redundancy and significantly better decoding complexity

and decoding latency compared to existing Reed-Solomon schemes.


Chapter 6 proposes a new systematic single error correcting (SEC) limited

magnitude error correcting non-binary Hamming code specifically to address limited

magnitude errors in multilevel cell memories storing multiple bits per cell [Das 18c]. A

general construction methodology is presented to correct errors of limited magnitude and

is compared to existing schemes addressing limited magnitude errors in phase change

memories. A syndrome analysis is done to show the reduction in total number of

syndromes for limited magnitude error models. It is shown that the proposed codes

provide better latency and complexity compared to existing limited magnitude error

correcting non-binary Hamming codes. It is also shown that the proposed codes achieve

better redundancy compared to the symbol extended version of binary Hamming codes.

Finally, Chapter 7 summarizes the contributions of this dissertation and provides

some future directions for the continuation of this research work.


Chapter 2: Low Complexity Burst Error Correcting Codes to Correct

MBUs in SRAMs

2.1 INTRODUCTION

Soft errors caused by radiation pose a significant reliability concern for SRAMs

[Baumann 05]. With technology scaling, the susceptibility of SRAMs to soft errors has

significantly increased as well [Ibe 10]. In current nanoscale technology nodes, device

geometries are small, and with technology scaling, devices are getting smaller. Thus, a

particle strike might affect more than one cell causing a multiple bit upset [Radaelli 05].

The smaller the device geometries, the more cells are affected by a single particle strike. A b-bit burst error caused by such a particle strike can cause multiple bits

to be flipped within the b-bit burst window. Thus, codes aimed to correct MBUs in

SRAMs should be able to correct all possible error combinations within a b-bit burst

window including all b-bits getting flipped.

Burton Code [Burton 71] was one of the early codes that dealt with MBUs by

correcting a single phased burst error or a single symbol error. But such codes can only correct a b-bit burst error if it falls within a single symbol, which is not always guaranteed. The

most common method to address MBUs in SRAMs was to use a SEC-DED code in

tandem with word-interleaving in such a way that each MBU would affect only a single

bit for each word instead of multiple bits of a single word. This scheme made it possible

to correct MBUs and reduce the soft error rate (SER) using a SEC-DED error correcting

code [Baeg 09]. The degree of interleaving defined the amount of adjacent error

This chapter is based on the publication [Das 18b]: A. Das and N.A. Touba, "Low Complexity Burst Error

Correcting Codes to Correct MBUs in SRAMs", in Proc. of ACM Great Lakes Symposium on VLSI

(GLSVLSI), pp. 219-224, 2018. The author of this dissertation contributed to the conception of the research

problem, theoretical developments and experimental verification of the research work.


protection for a memory system. However, such a scheme is no longer beneficial due to

constraints such as memory aspect ratio, performance and power consumption which

limit the degree of interleaving.

Error correcting codes (ECCs) with one-step majority logic decoding have been

studied and extended to address the problem of MBUs for SRAMs. [Datta 11] proposed a

general theory on adjacent error correcting orthogonal Latin square (OLS) codes for

various burst sizes. [Reviriego 13] proposed a method to correct triple adjacent errors

with the same amount of redundancy as a double error correcting OLS code. [Reviriego

12] proposed a method to correct MBUs in SRAMs using majority logic-decodable

difference set codes. [Reviriego 15] proposed a new class of SEC-DED codes with double adjacent error correction (SEC-DED-DAEC) derived from OLS codes. Although these classes of

majority-logic decodable (MLD) codes have very low access latencies which is good for

the overall performance of the memory system, the MLD codes suffer from very high

redundancy. Other low redundancy codes have also been extensively studied for

correcting MBUs in SRAMs. [Namba 14] proposed a single-bit and double adjacent error

correcting parallel decoder for BCH codes. [Kim 07] proposed a two-dimensional coding

method to correct multiple bit errors in caches. [Argyrides 11] proposed a matrix-based

code which combined Hamming codes and Parity codes to detect and correct multiple

errors. But these codes have high area, power consumption and sometimes high decoding

latency. Recently, Hamming codes have also been studied and extended to correct MBUs

in SRAMs. These codes provide a balanced trade-off between the amount of redundancy

required and the decoding complexity and decoding latency. Thus, they are an attractive

solution specific to SRAMs. [Dutta 07] first extended the traditional SEC-DED Hamming

code to correct two adjacent errors for tolerating multiple bit upsets in small memories.

[Shamshiri 10] proposed a general solution for burst (local) error correcting codes


working in conjunction with random (global) error correcting codes to correct MBUs in

SRAMs. [Neale 13] presented a code with the basic SEC-DED coverage as well as both

DAEC and scalable adjacent error detection (xAED) while reducing the

adjacent/nonadjacent double-bit error mis-correction probability. [Adalid 15] proposed SEC-DAEC-TAEC and 3-bit burst error correction codes which reduce the total number

of ones in the parity check matrix to optimize the decoder complexity and decoder

latency. For all such Hamming extended codes, the decoding procedure still relies on a

syndrome based matching method. As the burst size increases, the number of syndromes

also increases exponentially, thus increasing the decoder complexity exponentially as

well.

In this work, a general b-bit burst error correcting code is proposed which aims to

reduce the decoder complexity for a similar amount of redundancy as general burst error

correcting Hamming codes. The proposed codes correct all possible combinations of

errors within a b-bit burst and achieve much better decoder complexity and decoder

latency for higher burst sizes. The sections of this chapter are organized as follows.

Section 2.2 briefly describes the concepts for burst error correcting Hamming codes

along with an analysis of syndromes. Section 2.3 describes the proposed codes, their construction and their decoding procedures. Section 2.4 evaluates the proposed codes and makes a comparison to the general burst error correcting Hamming codes. Finally, Section

2.5 presents the conclusion of this work.

2.2 BURST ERROR CORRECTING HAMMING CODES

Extensive research has been done on burst error correcting Hamming codes as

discussed previously. The key concepts related to burst error correcting codes remain the

same. For a Hamming code to be b-bit burst error correcting, it needs to satisfy the

following conditions [Shamshiri 10]:

1. Each column of the parity check matrix is unique.

2. The bitwise XOR of any combination of columns within b-adjacent columns is unique.

Condition-1 ensures that all single errors are recognized and corrected. Condition-

2 ensures that any combination of errors within the burst size b are recognized and

corrected. The parity check matrix can be constructed in an algorithmic manner by

extending [Dutta 07] or using a Boolean SAT solver [Shamshiri 10].
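As a sketch of what any such construction must guarantee, the Python check below (an illustration, not the SAT-based procedure of [Shamshiri 10]) verifies both conditions for a candidate parity check matrix given as integer-packed columns.

```python
# Verifies the two burst-error-correcting conditions: the XOR of every
# nonempty subset of columns lying inside some b-adjacent window (single
# columns included) must be nonzero and must identify a unique error pattern.

from itertools import combinations

def is_b_burst_correcting(cols, b):
    """cols: columns of H packed as integers; True if conditions 1 and 2 hold."""
    seen = {}
    for start in range(len(cols) - b + 1):
        for m in range(1, b + 1):
            for offs in combinations(range(b), m):
                pattern = frozenset(start + o for o in offs)
                x = 0
                for o in offs:
                    x ^= cols[start + o]
                if x == 0:
                    return False         # an error would alias "no error"
                if seen.setdefault(x, pattern) != pattern:
                    return False         # two error patterns share a syndrome
    return True
```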

2.2.1 Syndrome Analysis

The total number of syndromes for a b-bit burst error correcting code, for each group of b-adjacent columns, is given by equation (2.1). Only syndromes affecting information

bits are considered as there is no necessary requirement to correct the parity check bits.

Since for each information bit a b-bit burst might exist, the total number of useful

syndromes for a b-bit burst error correcting code is given by equation (2.2), where k is

the total number of information bits.

$$\#\,\text{Syndromes}_{\text{Burst}} = \binom{b}{1} + \binom{b}{2} + \cdots + \binom{b}{b} = 2^b - 1 \qquad (2.1)$$

$$\#\,\text{Syndromes} = k\,(2^b - 1) \qquad (2.2)$$

Number of bits in error | Location of errors
1 | i
2 | (i,i+1); (i,i+2); (i,i+3); (i-3,i); (i-2,i); (i-1,i)
3 | (i-3,i-2,i); (i-3,i-1,i); (i-2,i-1,i); (i-2,i,i+1); ...
4 | (i-3,i-2,i-1,i); (i-2,i-1,i,i+1); (i-1,i,i+1,i+2); (i,i+1,i+2,i+3)

Table 2.1: Possible Errors for a Data Bit Within a 4-Bit Burst Window

Equation (2.2) shows that the total number of syndromes is a linear function of the number of information bits, and an exponential function of the burst size b. Thus, as the burst size increases, the number of syndromes increases exponentially, thereby

increasing the decoder hardware exponentially. For a b-bit burst error, any combination

of errors within b-adjacent bits is possible. Table 2.1 shows all the possible errors that

can occur involving data bit di within a 4-bit burst window. For each type of error within

b-adjacent bits, a unique syndrome defines the pattern and location of the error.
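A quick Python check of equations (2.1) and (2.2), enumerating every nonzero error pattern inside a b-bit window, is shown below (illustrative code, not from the dissertation).

```python
# Equation (2.1): one b-bit window admits C(b,1) + C(b,2) + ... + C(b,b)
# = 2^b - 1 nonzero error patterns; equation (2.2) then counts roughly one
# window per information bit, giving k * (2^b - 1) useful syndromes.

from itertools import combinations

def burst_patterns(b):
    return [set(c) for m in range(1, b + 1)
            for c in combinations(range(b), m)]

k = 64
for b in range(2, 8):
    n = len(burst_patterns(b))
    assert n == 2 ** b - 1               # equation (2.1)
    print(f"b={b}: {n} patterns per window, k*(2^b - 1) = {k * n} syndromes")
```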

2.2.2 Decoding Procedure

The decoding procedure for most methods is based on syndrome matching i.e.

each syndrome is mapped to particular data bit(s) being in error. The error indicator for each data bit is then the OR of all syndrome matches that involve that bit. This indicates whether a bit is in error or not, and if it is in

error, the data bit is flipped. Thus, as the burst size or the block length increases, the

number of syndromes increases and so does the complexity of the decoding circuit.
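A behavioral Python model of this syndrome-matching decoding is sketched below; the lookup table stands in for the parallel comparators and per-bit OR gates of a hardware decoder, and H is assumed to satisfy the burst-correcting conditions above.

```python
# Software model of syndrome-matching decoding. The table holds one entry per
# correctable burst pattern, so it grows as roughly k * (2^b - 1) entries --
# the exponential blow-up in b described in the text.

from itertools import combinations

def build_syndrome_table(H, b, k):
    """Map each correctable burst's syndrome -> the set of bits to flip."""
    r, n = len(H), len(H[0])
    table = {}
    for start in range(n - b + 1):
        for m in range(1, b + 1):
            for offs in combinations(range(b), m):
                bits = [start + o for o in offs]
                if not any(i < k for i in bits):
                    continue             # pure check-bit errors need no fixing
                s = tuple(sum(H[row][i] for i in bits) % 2 for row in range(r))
                table[s] = set(bits)     # unique by the code's construction
    return table

def decode(word, H, table):
    s = tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)
    for i in table.get(s, ()):           # flip every bit the match implicates
        word[i] ^= 1
    return word
```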

2.3 PROPOSED SCHEME

For a burst size of b, the key idea of the proposed codes is to partition the parity

check matrix in a manner such that the upper b-rows of the parity check matrix is used to

compute the error pattern within the b-bit burst, and the lower (r-b) rows of the parity

check matrix are used to compute the location of the burst. To compute the error pattern,

the upper b-rows of the parity check matrix are organized in such a manner that for each

consecutive b-columns there is exactly one non-zero entry per row. This is easily done by

interleaving the 1s every b-columns. Next, the lower (r-b) rows of the parity check matrix

are constructed in such a manner that they satisfy the conditions for a b-bit burst error

correcting Hamming Code:

1. All columns of the parity check matrix are unique.


2. The bitwise XOR of any combination of columns within b-adjacent columns are also

unique.

Similar to burst error correcting Hamming codes, Condition-1 ensures that all

single errors are recognized and corrected. Condition-2 ensures that any combination of

errors within the burst size b are recognized and corrected. This condition is necessary

because MBUs do not necessarily flip consecutive bits, and there might be cases were the

MBU flips a few non-consecutive bits within a burst size. The general form of the parity

check matrix for a b-bit burst error correcting code has been shown in Fig. 2.1. In this

case H* refers to the lower (r-b) rows constructed as described previously.

$$H = \left[\begin{array}{c|c} \begin{matrix} I_b \;\; I_b \;\; \cdots \;\; I_b \\ H^{*} \end{matrix} & I_r \end{array}\right]$$

Fig. 2.1: General form of the parity check matrix of proposed codes.

The proposed codes are also systematic by design. Thus, the encoding procedure

is the same as general Hamming codes. The parity bits are computed by XORing data-

bits based on the parity check matrix and are appended to the data-bits to form the final

codeword. Thus, no complexity is added to the encoding procedure.
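A Python prototype of this construction is sketched below; the greedy lowest-weight-first column search mirrors the strategy mentioned in Section 2.4, but the exact search order and data representation are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of the proposed H construction: the upper b rows of each data column
# form an interleaved identity (column i has its single upper 1 in row i mod b),
# while the lower r-b rows (H*) are grown greedily, lowest Hamming weight
# first, keeping every within-window burst syndrome nonzero and unique.
# Columns are packed as r-bit integers; the parity columns (I_r) are implicit.

from itertools import combinations

def build_H(k, b, r):
    cols, syndromes = [], set()
    cands = sorted(range(2 ** (r - b)), key=lambda v: bin(v).count("1"))
    for i in range(k):
        upper = 1 << ((r - b) + (i % b))    # interleaved 1 in the upper rows
        prev = cols[-(b - 1):] if b > 1 else []
        for low in cands:
            col, new, ok = upper | low, set(), True
            for m in range(b):              # new burst patterns ending at column i
                for sub in combinations(prev, m):
                    x = col
                    for c in sub:
                        x ^= c
                    if x == 0 or x in syndromes or x in new:
                        ok = False
                        break
                    new.add(x)
                if not ok:
                    break
            if ok:
                cols.append(col)
                syndromes |= new
                break
        else:
            raise RuntimeError("increase r: no valid column for these k, b")
    return cols

# Generous r for the sketch; Table 2.2 reports that r = 9 suffices for k = 16, b = 4.
print([format(c, "012b") for c in build_H(k=16, b=4, r=12)])
```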

2.3.1 Decoding Procedure

The decoding procedure involves two parts: the error pattern computation and the

error location computation. The error pattern is directly computed from the upper b-rows

of the parity check matrix. Since the upper b-rows of the parity check matrix are arranged

in an interleaved fashion, any error within a b-bit burst produces a syndrome equal to that

of the error pattern. The location of the burst error is computed through the lower (r-b)

rows. If an error occurs within a b-bit burst starting from location i, the syndrome is given by equation (2.3). Thus, the decoding works by considering each group of b-adjacent

columns as a single symbol and decoding on a per symbol basis. Any data bit di then is

part of b symbols of b bits each. An example is shown for a 4-bit burst error correcting code in Fig. 2.2, where the data bit di is part of the 4-bit symbols Bi-3, Bi-2, Bi-1 and Bi. A

b-bit burst error simply means an error in one of the b-bit symbols, and thus can be

computed through equation (2.4). The error pattern of the data bit is simply the syndrome

value Sα where α is the row amongst the upper b-rows for which the corresponding

column is a 1. Thus, a data bit will not be in error only if it does not satisfy equation (2.4)

for all b-bit symbols of which it is a part.

di-3 di-2 di-1  di  di+1 di+2 di+3
 1    0    0    0    1    0    0
 0    1    0    0    0    1    0
 0    0    1    0    0    0    1
 0    0    0    1    0    0    0
 0    0    1    1    1    0    0
 0    0    1    0    0    1    0
 1    0    1    1    0    0    1
 0    0    1    1    1    1    1
 0    1    1    0    1    1    1

(The overlapping 4-bit symbols containing di are Bi-3 = {di-3, ..., di}, Bi-2 = {di-2, ..., di+1}, Bi-1 = {di-1, ..., di+2} and Bi = {di, ..., di+3}.)

Fig. 2.2: Symbols data bit di is part of for a 4-bit burst error correcting proposed code.

$$\begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_b \\ S_{b+1} \\ S_{b+2} \\ \vdots \\ S_r \end{bmatrix} = \begin{bmatrix} e_i \\ e_{i+1} \\ \vdots \\ e_{i+b-1} \\ h_{b+1,i}\,e_i \oplus h_{b+1,i+1}\,e_{i+1} \oplus \cdots \oplus h_{b+1,i+b-1}\,e_{i+b-1} \\ h_{b+2,i}\,e_i \oplus h_{b+2,i+1}\,e_{i+1} \oplus \cdots \oplus h_{b+2,i+b-1}\,e_{i+b-1} \\ \vdots \\ h_{r,i}\,e_i \oplus h_{r,i+1}\,e_{i+1} \oplus \cdots \oplus h_{r,i+b-1}\,e_{i+b-1} \end{bmatrix} \qquad (2.3)$$

$$h_{b+\beta,i}\,S_1 \oplus h_{b+\beta,i+1}\,S_2 \oplus \cdots \oplus h_{b+\beta,i+b-1}\,S_b \oplus S_{b+\beta} = 0 \quad \forall\, \beta \in \{1, 2, \ldots, r-b\} \qquad (2.4)$$

The above method has a clear advantage over syndrome matching based decoding methods, specifically for larger burst sizes. This is because as the burst size increases, the

number of syndromes increases exponentially, as shown in equation (2.2). Thus, the

amount of hardware needed to decode also increases exponentially in syndrome matching

based decoding. But in the proposed decoding scheme, an increase in burst size results in

addition of another term in equation (2.4). This essentially means a linear increase in the

amount of XOR, AND and OR gates in the decoder design.
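A behavioral Python model of this two-part decoding, an illustration of equations (2.3) and (2.4) rather than the authors' Verilog, is sketched below; it represents columns as bit lists and, for simplicity, searches only windows of data bits.

```python
# Behavioral model of the proposed decoder. H's columns are length-r bit
# lists: rows 0..b-1 hold the interleaved identity over the data columns,
# rows b..r-1 hold H*, and the last r columns are the identity for the
# check bits (systematic form).

def syndrome(word, H_cols, r):
    s = [0] * r
    for bit, col in zip(word, H_cols):
        if bit:
            s = [x ^ y for x, y in zip(s, col)]
    return s

def decode_burst(received, H_cols, b, r):
    """Correct any error pattern confined to one b-bit window of data bits."""
    k = len(H_cols) - r
    s = syndrome(received, H_cols, r)
    if not any(s):
        return received
    word = received[:]
    for i in range(k - b + 1):             # candidate burst start positions
        # Upper (interleaved) rows give the window-aligned error pattern e:
        e = [s[(i + t) % b] for t in range(b)]
        # Equation (2.4): lower-row syndromes must be consistent with e at i.
        if all(
            sum(H_cols[i + t][b + beta] * e[t] for t in range(b)) % 2
            == s[b + beta]
            for beta in range(r - b)
        ):
            for t in range(b):
                word[i + t] ^= e[t]        # apply the correction
            return word
    return word                            # e.g., burst confined to check bits
```

Feeding decode_burst with columns from a construction like the build_H sketch above (unpacked into bit lists and extended with the identity columns for the check bits), and injecting every error pattern in every window, mirrors the exhaustive testing described in Section 2.4.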

2.3.2 Area and Delay Optimization

In a general Hamming Code, there are two optimization criteria that can be used

on its parity check matrix to reduce the decoder complexity and decoder latency:

1. Minimize the total number of 1’s in each row of the parity check matrix.

2. Minimize the total number of 1’s in the parity check matrix.

The first criterion minimizes the decoder latency while the second criterion

reduces the decoder complexity. A similar optimization can be done for the proposed

codes as well but with a simple modification. Considering the general parity check matrix

of the proposed codes shown in Fig. 2.1, it can be seen that the upper b-rows of the matrix are already sparse and have the minimum number of 1's both on a row basis and in

total. Thus, the two optimization criteria of the Hamming code can be applied to the

lower (r-b) rows of the matrix to achieve the same optimizations. This is because,

minimizing the number of 1’s in each row simply reduces the number of bits that need to

be XORed together for the syndrome as well as for equation (2.4). Thus, the optimization

results in fewer XOR operations, thereby reducing the decoder delay. And minimizing the

total number of 1’s in the lower (r-b) rows of the parity check matrix reduces the total

number of XOR gates thus reducing the complexity. Thus, optimization techniques

proposed in [Adalid 15] are orthogonal to the proposed codes and can be used in place of

burst error correcting Hamming codes.
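Both criteria are easy to evaluate mechanically; a small illustrative Python helper that scores the lower rows of a candidate matrix might look like this.

```python
# Scores the lower (r - b) rows of a candidate parity check matrix against the
# two criteria: the worst-case row weight tracks decoder latency (depth of the
# widest XOR tree), and the total weight tracks decoder complexity (XOR count).

def hstar_cost(h_star_rows):
    row_weights = [sum(row) for row in h_star_rows]
    return max(row_weights), sum(row_weights)

# Between otherwise equivalent candidates, prefer the smaller cost pair:
cand_a = [[1, 0, 1, 1], [0, 1, 1, 0]]
cand_b = [[1, 1, 1, 1], [0, 0, 1, 0]]
print(hstar_cost(cand_a), hstar_cost(cand_b))   # (3, 5) vs (4, 5)
```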


2.4 EVALUATION

The burst error correcting codes for different burst sizes were constructed by

extending the codes proposed in [Shamshiri 10]. The proposed codes and the general

burst error correcting Hamming codes were synthesized on Synopsys Design Compiler

using NCSU FreePDK45 45nm library for information length k = 16, 32 and 64 bits, and

burst sizes b = 3, 4, 5, 6 and 7 bits. For the general burst error correcting Hamming codes,

the codes in [Shamshiri 10] were used to construct the parity check matrix for different

burst sizes. But [Shamshiri 10] doesn’t explicitly define a decoding procedure and simply

mentions the decoding procedure to be the same as a Hamming Code. Thus, the

syndrome matching based decoding from [Dutta 07] was extended to the appropriate

burst sizes for the constructed parity check matrix derived from [Shamshiri 10]. The

proposed codes and the general burst error correcting Hamming codes were implemented

using a dataflow model in Verilog, and errors were injected to ensure that all errors within the burst size were corrected. Exhaustive testing was done for all different error patterns

within the burst size and for various locations of the burst errors. Table 2.2 shows the

comparison of redundancy, decoder area, number of cells in decoder and decoder latency,

between the proposed codes and the general burst error correcting Hamming codes, for

different information lengths and different burst error sizes.

2.4.1 Redundancy

Table 2.2 shows the comparison of redundancy for k = 16, 32 and 64 bits for

various burst error sizes b = 3, 4, 5, 6 and 7 bits. The redundancy column shows the

number of check bits required to correct the b-bit burst error for different information

lengths k. It can be seen that the redundancy in all the cases is either the same or very

close to the redundancy of the general burst error correcting Hamming codes. Thus, the


proposed codes add a negligible amount of redundancy compared to the general burst error

correcting Hamming codes.

 k   b  |  Burst error correcting Hamming codes    |            Proposed codes
        |   r   Latency (ns)  #cells   Area (μm²)  |   r   Latency (ns)  #cells   Area (μm²)
 16  3  |   8      0.89         302       807      |   8      0.79         225       892
 16  4  |   9      1            531      1327      |   9      0.93         252       977
 16  5  |  11      1.31        1015      2396      |  12      0.9          269       963
 16  6  |  13      1.33        1559      3570      |  13      1.23         339      1162
 16  7  |  15      1.5         2795      6360      |  15      1.16         276       921
 32  3  |   8      1.11         529      1480      |   9      1.08         450      1711
 32  4  |  10      1.29        1108      2759      |  10      1.15         506      2020
 32  5  |  12      1.59        2115      4983      |  12      1.29         674      2708
 32  6  |  14      1.9         4354      9915      |  14      1.4          727      2965
 32  7  |  15      2.39        7753     17482      |  16      1.27         650      2493
 64  3  |   9      1.47        1056      2967      |  10      1.18         858      3150
 64  4  |  11      1.89        2032      5171      |  11      1.22        1016      3657
 64  5  |  13      1.8         4102      9696      |  13      1.4         1270      4774
 64  6  |  14      2.51        8952     20296      |  14      1.58        1724      7022
 64  7  |  16      2.52       15958     36124      |  16      1.77        1748      7495

Table 2.2: Comparison of redundancy, decoder latency, decoder area and decoder cell usage between burst error correcting Hamming codes and proposed codes.

2.4.2 Hardware Complexity

Table 2.2 shows the comparison of decoder latency, number of cells as well as

decoder area for k = 16, 32 and 64 bits for various burst error sizes b = 3, 4, 5, 6 and 7

bits. It can be seen that the decoder area for general burst error correcting Hamming

codes increases almost exponentially with the increase in burst size. This in turn affects

the decoder latency as well since the gate depth also increases with the increase in burst

size. The decoder area for the proposed codes increases almost linearly with an increase

in burst size as shown in Fig. 2.3, which plots the decoder area for both the general

burst error correcting Hamming codes and the proposed codes for different information


lengths k and different burst error sizes b. The decoder area is much less for the proposed

codes compared to the general burst error correcting Hamming codes for higher burst

sizes. This also leads to a slower increase in decoder latency compared to the general

burst error correcting Hamming codes. Thus, the decoder latency is also lower for higher

burst sizes compared to the general burst error correcting Hamming codes. It is also seen

that in the cases of k = 16 and 32, the decoder complexity and the decoder latency for

burst size b = 7 are lower than for burst sizes b = 5 and 6. This is because the H-matrix for the proposed codes is constructed using a greedy algorithm that selects the lowest

Hamming weight column that satisfies the conditions described in Sec. 2.3. This

inadvertently creates a very sparse H-matrix which reduces the decoder complexity.

Fig. 2.3: Comparison of decoder area for different information lengths k and different

burst sizes b.

[Chart omitted: decoder area (μm²) versus burst size (3 to 7 bits), one series each for the burst error correcting Hamming codes and the proposed codes at k = 16, 32 and 64.]


2.5 CONCLUSION

This research work proposes a new burst error correcting code by modifying the

parity check matrix of a general burst error correcting Hamming code along with its

decoding procedure. The proposed codes achieve significant reduction in decoder area as

well as a non-negligible reduction in the decoder latency for a similar amount of

redundancy compared to a general burst error correcting Hamming Code. Also, the

proposed codes can be used in tandem with other burst error correction optimization

methods to achieve better decoder latency as well as better decoder complexity. Thus, the

proposed codes are highly suitable for correcting MBUs in SRAMs specifically for lower

technology nodes where the burst size from a single particle strike is expected to

increase.


Chapter 3: Layered-ECC: A Class of Double Error Correcting Codes
for High Density Memory Systems

(This chapter is based on the publication [Das 19]: A. Das and N. A. Touba, "Layered-ECC: A Class of Double Error Correcting Codes for High Density Memory Systems," in Proc. of IEEE VLSI Test Symposium (VTS), paper 7A.2, 2019. The author of this dissertation contributed to the conception of the research problem, theoretical developments and experimental verification of the research work.)

3.1 INTRODUCTION

Soft errors are a major reliability concern for high density memories. Soft errors

can arise due to numerous mechanisms, e.g., particle strikes, marginal cells, etc. ECCs

can be used to tolerate such soft errors and overcome data corruption by adding parity or

check bits to each word. These bits are then evaluated after the read process to detect

and/or correct errors. DRAM scaling has been a challenge in recent years, but industry has continued to scale DRAM to nanoscale technology nodes. Some current-generation DRAMs are manufactured as 10 nm-class memories, which poses new manufacturing challenges at the smaller technology node. For DRAMs,

SEC-DED codes have traditionally been sufficient to protect against soft errors. But field

studies of more recent DRAM systems [Meza 15] have observed an increasing failure

rate with increasing DRAM chip density which necessitates the use of stronger ECCs.

Among flash memories, NAND flash has been very successful due to its scalability and low cost per bit. Research has led to the development and manufacturing of multilevel cell (MLC) NAND flash, which stores multiple bits per cell [Lee 11]. Soft errors in NAND flash memories are addressed using DEC Bose-Chaudhuri-Hocquenghem (BCH) codes. These DEC BCH codes have complex decoding logic which takes a large number of clock cycles to decode [Chien 64]. It is possible to parallelize the


decoding logic, but that incurs a significant area overhead. The high complexity is mainly

due to Galois Field (GF) operations, especially for larger numbers of bits per symbol.

In this work, a layered double error correcting scheme is proposed. These codes

are constructed using two layers of parity bits. The first layer of parity bits is used to

prune down possible error locations through analysis of the computed syndrome of the

received word. The second layer of parity bits is used to compute syndrome bits that are

to be matched with the pruned down error locations and get the final bits that are in error.

The proposed schemes are shown to have a good tradeoff between data redundancy and

decoder complexity or latency. Two different decoding procedures are proposed which

either reduce the decoder complexity or the decoder latency. Thus, these codes can be

used for various classes of memories. The rest of the chapter is organized as follows.

Section 3.2 describes the existing schemes. Section 3.3 describes the proposed scheme

and the two types of decoding schemes. Section 3.4 evaluates the two decoding schemes

against the existing schemes. Section 3.5 provides a conclusion of this work.

3.2 RELATED WORK

An SEC-DED Hamming code [Hamming 50] has been traditionally used and is

still in use for protecting memories. These codes can correct a single error and rely on

syndrome matching based decoding, i.e., the computed syndrome is directly matched to a

particular column of the parity check matrix and the corresponding bit is flipped. These

codes also detect all double errors but cannot correct them.

For double error correction, a binary BCH code is more prevalent. These codes

have low data redundancy but have high decoding complexity. Generally, a BCH code

has a serial decoder involving a Chien search algorithm. This algorithm iterates n times

where n is the total number of bits in the codeword to detect and correct a certain number


of errors. Over the years, numerous different approaches have been proposed to reduce

the time taken for decoding. Parallel Chien search has been proposed to perform p GF

multiplications in parallel [Chang 02]. This reduces the decoding time to n/p cycles

where p is the degree of parallelization. A p-parallel Chien search algorithm is shown in

Fig. 3.1. Low power architectures for parallel Chien search have also been proposed in [Yoo 16] using a two-step approach, which reduces power by reducing accesses to the second step.

Fig. 3.1: p-parallel Chien search architecture with short critical path
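As an illustration of what the Chien search computes, the following Python sketch evaluates an error locator polynomial Λ at successive powers of α over GF(2^4), p positions per simulated cycle. It is a software analogue under assumed parameters (primitive polynomial x^4 + x + 1), not the hardware architecture of Fig. 3.1.

    PRIM, M = 0b10011, 4   # assumed primitive polynomial x^4 + x + 1 for GF(2^4)

    def gf_mul(a, b):
        """Carry-less multiplication modulo the primitive polynomial."""
        r = 0
        while b:
            if b & 1:
                r ^= a
            b >>= 1
            a <<= 1
            if a & (1 << M):
                a ^= PRIM
        return r

    def chien_search(lam, n, p):
        """lam = [1, L1, ..., Lt]; return every i in [0, n) with
        Lambda(alpha^i) = 0. Each outer iteration mimics one clock cycle
        evaluating p positions, so the search takes ceil(n/p) cycles."""
        roots = []
        for base in range(0, n, p):                  # one simulated cycle
            for i in range(base, min(base + p, n)):  # parallel in hardware
                ai = 1
                for _ in range(i):                   # ai = alpha^i, alpha = 0b0010
                    ai = gf_mul(ai, 0b0010)
                acc, xj = 0, 1
                for c in lam:                        # acc = XOR of c_j * (alpha^i)^j
                    acc ^= gf_mul(c, xj)
                    xj = gf_mul(xj, ai)
                if acc == 0:
                    roots.append(i)
        return roots

For example, chien_search([1, 6], 15, 4) returns [10], since α^10 is the inverse of the element 6 in this field.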

For DEC BCH codes, approaches which store possible double error syndromes in

a read-only memory (ROM) and evaluate the computed syndrome against the syndromes

in the ROM were proposed in [Lu 96]. [Naseer 08] proposed a direct decoding method

through syndrome matching for smaller data bit sizes. [Yoo 14] proposed a search-less

DEC BCH decoder which utilized look-up table (LUT) based computations to replace the

Chien search. But for more than a single error, all these decoding schemes either take

multiple cycles to decode using a serial decoding architecture or involve a significant

decoding latency and decoder area for parallel architectures.


Another class of codes that are suitable for random-access memories is the

majority logic decodable codes. Orthogonal Latin Square (OLS) codes are one of the best

examples for these types of codes [Hsiao 70]. These codes are modular in design and

have the basic parity check matrix structure shown in equation (3.1):

H = \left[\, \begin{matrix} M_1 \\ M_2 \\ \vdots \\ M_{2t} \end{matrix} \,\middle|\, I_{2tm} \right]        (3.1)

The submatrices {M1, M2, ..., M2t} can be constructed from mutually orthogonal Latin squares. The basic

idea of the majority logic decoding scheme is that in the presence of t errors, each data bit

can be reconstructed from 2t independent sources excluding the data bit itself. Thus, there

are (2t + 1) independent sources for each data bit and in the presence of t errors, (t + 1) of

them will be uncorrupted. Thus, a majority vote will always be able to correct any t

errors. The disadvantage of these codes lies in the data redundancy required to construct

the 2t independent sources. But the decoding scheme itself is very simple and has very

low decoding latency. The DEC OLS codes have recently been used in SRAM based

FPGAs [Reviriego 16]. A second class of majority logic decoding using difference-set

codes was proposed in [Reviriego 12]. But for double error correction these codes only

support a single block size (data block size of 11, codeword size 21) and cannot be

extended to include different sizes. [Liu 18] recently used difference set codes to correct

data block sizes of 32 with some additional decoding complexity. But with only two data

block sizes, the application of such codes is limited.

Multiple cell upsets (MCUs) can also occur in DRAMs wherein adjacent bits are

affected by a single particle strike. MCUs are generally addressed by using word

interleaving such that any MCU will at most cause a single error in any word. [Das 18]


also addresses this issue by modifying Hamming codes to correct MCUs in SRAMs,

which can be extended to DRAMs as well. The focus of this work is on double random

errors only and not on MCUs.

A new class of DEC codes is proposed which is shown to have better data

redundancy at the expense of higher decoding complexity and higher decoding latency

compared to OLS codes. The proposed codes are also shown to have better decoding

latency and better decoding complexity, depending on the type of decoding logic,

compared to DEC BCH codes. But these benefits come at the cost of additional data

redundancy compared to DEC BCH codes.

3.3 PROPOSED SCHEME

The proposed scheme is made up of two layers of ECC. The first layer is based on

a single error correcting OLS code which prunes down possible error location candidates.

The second layer is constructed by adding columns to the parity check matrix such that

the below mentioned conditions are satisfied.

1. All columns added are unique and are not repeated.

2. The sum of any two columns of the complete parity check matrix should not be equal

to the sum of any other two columns of the parity check matrix.

The first condition ensures that double errors produce a syndrome which is non-

zero. The second condition basically ensures that the pruned list of possible error

locations does not produce the same syndrome, thus avoiding the chance of mis-correction.

The first layer of the parity check matrix is created using m groups of m data bits each,

where m = √k. The rest of the parity check matrix is created in an algorithmic manner

satisfying the two conditions mentioned above. An example of the parity check matrix

and the corresponding syndrome bits for k = 16 is shown in Fig. 3.2. The syndrome


computed from the parity check matrix in this case is also divided into layers. The upper

syndrome layer has 2m bits (S0 to S7 in Fig. 3.2) which help in selecting possible double

error candidates. The lower syndrome layer (S8 to S12 in Fig. 3.2) matches the possible

error candidates to the received syndrome to locate the erroneous bits.

Fig. 3.2: Parity check matrix of proposed scheme for k=16

Consider the simple example from Fig. 3.2 where bits d0 and d7 are in error. The

corresponding syndrome bits for this case is shown in equation (3.2). Considering the

syndrome bits S0 through S3, S0 = 1 suggests that one of the bits amongst d0, d1, d2 and

d3 is in error. Similarly, S1 = 1 suggests that one of the bits amongst d4, d5, d6 and d7 is

also in error. Now, considering the syndrome bits S4 through S7, S4 = 1 narrows down the

possibility to either d0 or d4. Similarly, S7 = 1 narrows down the second error’s possibility

to d3 or d7. Thus, from the upper layer of syndrome bits, we narrowed the set of suspect

location pairs to {d0, d7} and {d3, d4}. We can now simply compute the XOR of both

pairs and match it against the second layer of syndromes. It can be easily verified that d0

⊕ d7 = S8:12, which means that bits d0 and d7 have flipped.

[The parity check matrix of Fig. 3.2 has one row per syndrome bit S0-S12, spanning data bits d0-d15 and parity bits p0-p12; it is not reproduced here.] For the example above with bits d0 and d7 in error, the computed syndrome is

(S_0, S_1, \ldots, S_{12}) = (1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1)        (3.2)
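The decoding steps of this example can be mirrored in software. The sketch below is illustrative Python only: H is any parity check matrix given as a list of rows, and the candidate pairs are assumed to come from the upper-layer pruning described above.

    from functools import reduce

    def syndrome(H, word):
        """Syndrome over GF(2): one bit per row of the parity check matrix H."""
        return [reduce(lambda a, b: a ^ b, (h & w for h, w in zip(row, word)), 0)
                for row in H]

    def column(H, j):
        return [row[j] for row in H]

    def match_candidate_pair(H, syn, candidates):
        """Return the suspect pair (i, j) whose columns XOR to the syndrome."""
        for i, j in candidates:
            if all(a ^ b == s for a, b, s in zip(column(H, i), column(H, j), syn)):
                return (i, j)
        return None

For the worked example, the pruned candidate list would be [(0, 7), (3, 4)], and only the pair (0, 7) matches the lower-layer syndrome.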

The encoding procedure of the proposed codes is a one-step low latency

procedure. Each parity bit can be computed in parallel by XORing all the data bits which

have corresponding 1’s in the row of the parity check matrix. The codeword can be

formed by appending the parity bits to the received data bits and the codeword can then

be stored in memory. For the decoding procedure, there are 3 major cases that need to be

considered. These include the cases of no error, a single error, and a double error in the

word. These three cases can be deciphered using a combination of the upper layer of 2m

syndrome bits, as given by equations (3.3)-(3.6):

G_{OR1} = S_0 \lor S_1 \lor \cdots \lor S_{m-1}        (3.3)

G_{OR2} = S_m \lor S_{m+1} \lor \cdots \lor S_{2m-1}        (3.4)

G_{XOR1} = S_0 \oplus S_1 \oplus \cdots \oplus S_{m-1}        (3.5)

G_{XOR2} = S_m \oplus S_{m+1} \oplus \cdots \oplus S_{2m-1}        (3.6)

Fig. 3.3: Block diagram of proposed decoding logic. (The syndrome S0:r-1 drives a single error pattern generator and a double error pattern generator, gated by GOR1, GOR2, GXOR1 and GXOR2, to produce the error vector e0:k-1 from the columns C0:k-1.)

For single errors, since the upper 2m rows in the parity check matrix can also be

used as a single error correcting OLS code, a simple majority voting logic can be used to

correct single errors. For double errors, we propose a double error pattern generator

which gets triggered for double error cases. The block diagram of the proposed decoding


logic is shown in Fig. 3.3. Parity bits are not corrected since there is no specific need for

them. The different cases for the decoding logic are as follows (a short sketch of this classification in code follows the list):

1. GOR1 = GOR2 = 0: No error.

2. GXOR1 = GXOR2 = 1: This is only possible in case of a single error in one of the data

bits. For this case, a simple majority voting logic can be used.

3. GXOR1 = 1; GXOR2 = 0: This is possible either in case of a single parity error or a

combination of data bit error and a parity bit error. The decoding logic is the same for

either case. In this case, exactly one bit in an m-bit group (e.g., d0 to d3 is one group in Fig.

3.2) is in error and each column from the group is then matched to the lower syndrome to

get the correct error location.

4. GXOR1 = 0; GXOR2 = 1: Similar to case-3, this is also possible for either a single

parity bit error or a combination of single data bit error and a parity bit error. For this

case, a specific column number (whichever of the syndrome bits Sm through S2m-1 is 1) of

each group can create this syndrome. The column from each group is matched to the

lower layer syndrome in this case.

5. GXOR1 = 0; GXOR2 = 0: This is a case of a double error with both the errors in the

data bits. This involves 3 more cases which are distinguished as described below.

5.1 GOR1 = 1; GOR2 = 0: This indicates that two syndrome bits among S0

through Sm-1 are 1 while Sm:2m-1 = 0. Thus, m column pairs from each of the two groups

indicated by S0:m-1 are matched to the lower layer syndrome bits to get the correct pair

that is in error.

5.2 GOR1 = 0; GOR2 = 1: This indicates that two syndrome bits among Sm

through S2m-1 are 1 while S0:m-1 = 0. Thus, column pairs indicated by Sm:2m-1 from each of

the m groups are matched to the lower layer syndrome bits to get the erroneous pair.

5.3 GOR1 = 1; GOR2 = 1: This indicates errors in distinct groups. Based on the

syndrome bits S0:m-1, the groups are narrowed down. Then, based on the syndrome bits

Sm:2m-1, specific column pairs are found. Thus, there are exactly two pairs of columns that

need to be compared to the lower layer syndrome bits.
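As referenced above, here is a minimal Python sketch of the case classification driven by the upper-layer syndrome bits. It reproduces only the branching of cases 1-5; the per-case column matching is left out.

    from functools import reduce

    def classify(S, m):
        """S: the 2m upper-layer syndrome bits S0..S(2m-1)."""
        g_or1, g_or2 = any(S[:m]), any(S[m:2 * m])
        g_xor1 = reduce(lambda a, b: a ^ b, S[:m])
        g_xor2 = reduce(lambda a, b: a ^ b, S[m:2 * m])
        if not g_or1 and not g_or2:
            return "case 1: no error"
        if g_xor1 and g_xor2:
            return "case 2: single data bit error (correct by majority vote)"
        if g_xor1 != g_xor2:
            return "case 3/4: parity bit error, or data + parity error"
        return "case 5: double data error (resolve via GOR1/GOR2 sub-cases)"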

An example of each of the above cases for k = 16 with the parity check matrix in

Fig. 3.2, the corresponding syndrome bits S0:2m-1 and the possible errors or pairs of errors

has been shown in Table 3.1. The worst-case possibility for all the cases is comparing a

combination of m pairs of syndromes to the lower layer syndrome bits. Based on the

cases above, there are two types of decoding procedures that can be followed. The first is

a low latency decoding scheme, which enumerates all the cases above and is based on

syndrome matching for each case. The second is a low complexity decoding scheme

which operates on the indices of columns rather than on the columns themselves. This lowers the complexity of the decoding logic as well as the number of cycles

needed for decoding compared to a serial decoding BCH scheme.

Syndrome bits S0:7   Possible (pairs of) error candidates (data bits only, parity bits ignored)
00000000             No error
00010001             Single data error (majority vote)
00000001             Possible single error in {d3, d7, d11, d15}
00100000             Possible single error in {d8, d9, d10, d11}
10000011             Possible single error in {d2, d3}
11000001             Possible single error in {d3, d7}
11000000             {(d0, d4), (d1, d5), (d2, d6), (d3, d7)}
00000110             {(d1, d2), (d5, d6), (d9, d10), (d13, d14)}
11000011             {(d2, d7), (d3, d6)}

Table 3.1: Example of syndrome values and error candidates for different error types

Apart from the above cases, 2 additional check bits are required to distinguish

between a single error and a double error in parity. The two check bits basically compute

the parity of each group of parities to detect errors in the parity bits. In theory, it is also


possible to extend these codes to multiple bits per symbol and construct a symbol based

layered ECC. This can be useful for emerging multilevel cell memories or even for

double byte error correction.

3.3.1 Low Latency Decoding

The low latency decoding procedure involves a syndrome matching based

decoding. The upper layer syndromes S0:2m-1 are directly enumerated and depending on

these syndrome values, a certain number of combination of pairs of syndromes are

matched to the lower layer of syndromes. The critical path in this case depends on the

selection of a particular combination of the upper layer of syndromes. The total number

of relevant syndromes in a double error correcting code [Naseer 08] is given by equation

(3.7), where n is the total number of bits in the codeword and k is the number of data bits.

Comparatively, the low latency decoding procedure has fewer syndromes to be

enumerated as shown in equation (3.8). The comparison of the total number of

syndromes for different data bit sizes has been shown in Fig. 3.4.

As the number of data bits increases, the number of syndromes goes up

considerably. This causes a considerable increase in the decoding complexity. But, since

the syndrome matching is done in parallel, the rise in decoding latency is much slower.

This type of decoding method is mostly suitable for random-access memories, which need

low decoding latency and high throughput.

\#syndromes = kn        (3.7)

\#syndromes = 2m\binom{m}{1} + 2m\binom{m}{2} + 2\binom{m}{2}^{2}        (3.8)

Fig. 3.4: Comparison of number of syndromes for different data bit sizes between

proposed scheme and a DEC BCH code

3.3.2 Low Complexity Decoding

An alternative decoding procedure is the low complexity decoding procedure.

Compared to the low latency decoding, this procedure operates on the indices of data bits

instead of the columns of parity check matrix. Based on the different cases described

previously, the first index or pair of indices is computed. Also, based on the upper layer

of syndromes an addition factor is computed. This addition factor basically constructs all

other m possible error indices by adding to the previous index or pair of indices. For the

case of a double error in separate groups i.e. case-5.3, the 2 possible pairs of indices are

directly assigned. The m possible error indices can be computed in m clock cycles which

lowers the decoding time considerably. Once all the indices are computed, the

corresponding columns are then matched with the lower layer of syndromes to get the

final error location or pair of locations. Fig. 3.5 shows the partial block diagram of a

double error pattern generator for the low complexity serial decoding procedure.

Fig. 3.5: Block diagram of error pattern generation using the serial low complexity
decoding procedure
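The serial procedure can be paraphrased in software roughly as follows. This is a loose sketch only: initial_index(), addition_factor() and the column/match helpers are hypothetical stand-ins for the logic of Fig. 3.5, since the exact index arithmetic depends on the H-matrix construction.

    def serial_decode(S_upper, S_lower, m, initial_index, addition_factor,
                      column, matches):
        """Generate the m candidate error indices one per cycle and match
        each candidate's column(s) against the lower-layer syndrome."""
        idx = initial_index(S_upper)       # first candidate index (or pair)
        step = addition_factor(S_upper)    # offset between successive candidates
        for _ in range(m):                 # one candidate per clock cycle
            if matches(column(idx), S_lower):
                return idx                 # located error index (or pair)
            idx = idx + step
        return None                        # no data-bit match: parity-bit error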

3.4 EVALUATION

The proposed scheme was implemented using Verilog for both the low latency

and low complexity decoder versions for different data bit sizes. The codes were

synthesized using the NCSU FreePDK45 45nm library and Synopsys Design Compiler.

OLS codes were also implemented and synthesized for different data bit sizes. A double

error correcting BCH code with syndrome matching based decoding [Naseer 08] was

implemented and synthesized as well. The proposed decoding schemes were compared to

the existing schemes in terms of decoder complexity, decoder latency, total dynamic

power consumption and data redundancy. Table 3.2 shows the comparison of the low

latency decoding scheme with OLS codes and the syndrome matching based DEC BCH

codes [Naseer 08]. The redundancy or number of check bits is given by r. As seen in

Table 3.2, OLS codes have the lowest decoder latency but it comes at the expense of very

high data redundancy. DEC BCH codes on the other hand have very low redundancy but

have considerably higher decoder latency instead. The proposed low latency decoding

scheme balances between the two existing schemes. The data redundancy of the proposed

scheme is much less than OLS codes. Similarly, the decoding latency and dynamic power

consumption of the proposed scheme is much lower than the BCH codes. The decoder


area of the proposed scheme is also smaller than that of the existing BCH codes, since it uses a smaller number of syndromes.

The scheme in [Wilkerson 10] was also implemented and synthesized to make a

comparison to the proposed low complexity serial decoder. For the DEC BCH code, the

error location computation was done via direct error location polynomial computation i.e.

the error location polynomial coefficients were computed directly. Table 3.3 shows the

comparison of the low complexity decoding scheme with the existing decoding scheme

of BCH codes in [Wilkerson 10]. The additional check-bit compared to BCH codes

comes from the computation of a parity bit. The decoding scheme is a one-step decoding

procedure for single errors and takes multiple clock cycles for double errors. As seen in

Table 3.3, the number of cycles taken to decode, the area and the power consumption for the proposed decoder are significantly lower. The proposed codes incur a redundancy overhead in order to provide the benefits of fewer decoding cycles, lower power consumption and lower decoder complexity.

Data    OLS codes                            BCH codes [Naseer 08]                 Proposed Low Latency Decoder
bits    #Check  Area      Latency  Pdyn      #Check  Area       Latency  Pdyn      #Check  Area       Latency  Pdyn
        bits    (μm2)     (ns)     (mW)      bits    (μm2)      (ns)     (mW)      bits    (μm2)      (ns)     (mW)
16      16      647.63    0.50     0.53      10      3567.15    1.71     2.05      15      1576.38    0.98     0.77
32      28      1307.94   0.63     1.19      12      9249.43    2.17     6.15      20      5117.25    1.09     1.45
64      32      2599.92   0.78     2.98      14      27213.77   2.78     15.18     25      12205.08   1.32     2.48
128     64      5182.95   1.09     6.51      16      80312.72   3.60     23.35     34      51656.79   1.78     4.77
256     64      10280.02  1.20     15.45     18      252717.11  5.09     29.61     45      139603.61  1.60     10.39

Table 3.2: Comparison of proposed low latency decoder with existing schemes

Data    Serial BCH Codes [Wilkerson 10]           Proposed Low Complexity Serial Decoder
bits    #Check  Area      Latency   Pdyn          #Check  Area      Latency   Pdyn
        bits    (μm2)     (cycles)  (mW)          bits    (μm2)     (cycles)  (mW)
16      11      2341.34   27        2.59          15      1236.61   4         1.19
32      13      3800.39   45        4.50          20      2162.53   6         1.99
64      15      6141.73   79        7.14          25      3471.88   8         2.90
128     17      10800.00  145       12.48         34      6168.48   12        4.99
256     19      18927.34  275       18.42         45      5925.38   16        4.58
512     21      36757.92  533       32.79         61      10389.83  23        7.65

Table 3.3: Comparison of proposed low complexity serial decoder with existing schemes

3.5 CONCLUSION

In this research work, a new class of layered double error correcting codes is

presented along with two different decoding procedures for the codes: a low latency

decoding scheme based on syndrome matching and a low complexity decoding scheme

based on error location index evaluation. The schemes are compared with existing

schemes and are shown to provide a good trade-off between data redundancy and

decoding latency or complexity depending on the type of decoder logic. The low latency

scheme can be used for high density random-access memories while flash memories can

benefit from the low complexity scheme. The low complexity serial decoding scheme

achieves orders of magnitude reduction in worst case number of decoding cycles and can

be helpful in enabling high throughput for flash memory-based systems. Thus, these

codes are able to provide a balanced data redundancy and decoder latency/complexity

tradeoff and can be used for high density memory systems.


Chapter 4: Limited Magnitude Error Correction using OLS Codes for
Memories with Multilevel Cells

(This chapter is based on the publication [Das 17]: A. Das and N. A. Touba, "Limited Magnitude Error Correction using OLS Codes for Memories with Multilevel Cells," in Proc. of IEEE International Conference on Computer Design (ICCD), pp. 391-394, 2017. The author of this dissertation contributed to the conception of the research problem, theoretical developments and experimental verification of the research work.)

4.1 INTRODUCTION

As discussed in Chapter 1, MLC PCM are very useful for their high density, cost-

effectiveness, non-volatility and portability. But the issue of resistance drifts in MLC

PCM causes a reliability concern. Resistance drift causes the resistance of a PCM cell to increase over time due to the phenomenon of structural relaxation. This degrades read reliability over time, i.e., the resistance level of a cell is mis-read as some higher resistance level than the one originally written. A

limited magnitude error simply means that the error magnitude is finite. Resistance drifts

can be thought of as a limited magnitude error model because the increase in resistance

level over time is finite and the maximum increase in resistance level is limited by time.

Limited magnitude errors have been a focus of research through many years. An

asymmetric magnitude-1 error correcting code was first proposed by using even cell

levels in [Ahlswede 02]. An error correction code for limited magnitude errors in general

was introduced in [Cassuto 10] which used a special parity mapping function and non-

binary Hamming code which causes a considerable increase in the decoding latency.

Systematic limited magnitude error correction with very low redundancy using a non-

binary Hamming code was introduced in [Klove 11]. But the codes can correct only a

single error. [Jeon 12] introduced bidirectional limited magnitude error correcting codes

using Reed-Solomon Codes. But these codes are originally intended for MLC flash


memories and suffer from high decoding complexity beyond a single error correction

spanning multiple clock cycles. Thus, this is not useful for MLC PCM since their

performance is sensitive to decoding latencies. [Namba 15a] introduced symbol error

correcting OLS codes which extended binary OLS codes to multi-bit symbols.

This research work proposes limited magnitude error correcting OLS codes,

which significantly reduce data and hardware redundancy compared to existing schemes

while still detecting limited magnitude errors. It also reduces decoder latency compared

to the other methods mentioned previously. The proposed code ensures that t errors of

limited magnitude L can be corrected with some additional hardware on top of the

decoding logic of the general OLS codes. This work also proposes a new hybrid error

correction code which combines OLS codes with a binary low redundancy code for

symmetric and bidirectional limited magnitude errors. The rest of the work is organized

as follows. Section 4.2 reviews orthogonal Latin square codes. In Sec. 4.3, the proposed

OLS codes for limited magnitude errors are described with its encoding and decoding

procedures. Section 4.4 evaluates the error correction capabilities and hardware

complexity of the proposed code. Finally, Sec. 4.5 presents the conclusion of this work.

4.2 ORTHOGONAL LATIN SQUARE CODES

OLS codes are based on Latin squares [Hsiao 70]. A Latin square of size m is an m x m square matrix in which each row and each column is a permutation of the numbers 0 to (m-1), so each number appears exactly once in every row and column. Two Latin squares are orthogonal if, when superimposed on each other, every ordered pair of elements in the superimposed matrix is unique. The underlying

principle of a t-error correcting OLS code is that there are (2t+1) independent sources for

re-constructing each data bit. These (2t+1) independent sources involve the data bit itself


and 2t parity check equations. The different data bits participating in the parity check

equations are unique in the sense that any data bit occurs in at most one of the parity check equations. Thus, for any number of errors ≤ t, at most t sources are corrupted. The

remaining (t+1) sources remain uncorrupted from errors. A majority logic decoding

simply picks the binary value which occurs in maximum number of its inputs. As a result,

the majority vote of (2t+1) independent sources with t-errors still yields the correct data-

bit. OLS codes have k = m2 data bits, where m is the size of the Orthogonal Latin Square.

The number of check bits is 2tm where t is the maximum number of errors that the code

can correct. OLS codes are modular in design, which means that to correct additional

errors, adding 2m check bits for each error is sufficient.

H =
        d0 d1 d2 d3 p0 p1 p2 p3
      [ 1  1  0  0  1  0  0  0 ]
      [ 0  0  1  1  0  1  0  0 ]
      [ 1  0  1  0  0  0  1  0 ]
      [ 0  1  0  1  0  0  0  1 ]

(Decoder for d0: a majority voter over d0, d1 ⊕ p0 and d2 ⊕ p2 outputs the corrected d0.)

Fig. 4.1: Parity check matrix and decoder logic for a SEC (8,4) OLS Code

The encoding procedure involves the computation of parity bits: each parity bit is the XOR of all the data bits that have a 1 in the row of the parity check matrix in which that parity bit is 1. The decoding procedure involves the majority vote between the data bit

itself and the 2t parity check equations constructed from the rows of the parity check

matrix. Thus, the decoder for data bit di will have parity check equations from each row

of the parity check matrix for which the column di is a 1. The main advantage of the OLS


codes is the simplicity of the decoder circuit which makes it very useful for memories

with random accesses. The majority logic decoding circuit has very low latency thereby

increasing decoding speed and enabling faster read operations. An example of the parity

check matrix for a SEC code with k = 4 (i.e. m = 2) and its decoder logic for data bit d0 has

been shown in Fig. 4.1.
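As a concrete illustration, a software model of the Fig. 4.1 decoder for all four data bits can be written in a few lines of Python (an illustrative sketch; the hardware is purely combinational logic):

    # Parity check matrix of the SEC (8,4) OLS code in Fig. 4.1.
    # Codeword layout: [d0, d1, d2, d3, p0, p1, p2, p3]
    H = [
        [1, 1, 0, 0, 1, 0, 0, 0],
        [0, 0, 1, 1, 0, 1, 0, 0],
        [1, 0, 1, 0, 0, 0, 1, 0],
        [0, 1, 0, 1, 0, 0, 0, 1],
    ]

    def majority_decode(c):
        """Majority-vote each data bit from itself plus its parity checks."""
        corrected = []
        for i in range(4):                        # data bits d0..d3
            votes = [c[i]]
            for row in (r for r in H if r[i]):    # rows where column di is 1
                # the check equation for di: XOR of all other bits in the row
                votes.append(sum(b for j, b in enumerate(c) if row[j] and j != i) % 2)
            corrected.append(1 if sum(votes) > len(votes) // 2 else 0)
        return corrected

For example, flipping d0 in the codeword for all-zero data still decodes to [0, 0, 0, 0], since d0's two check equations outvote the corrupted bit.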

4.3 PROPOSED SCHEME

Limited magnitude errors can be both asymmetric and symmetric. Symmetric

errors are errors of limited magnitude wherein the errors can occur in both directions i.e.

errors can have both positive and negative magnitude. Asymmetric limited magnitude

errors are errors limited to one direction only. A bidirectional error model also assumes

errors in both directions, but with different maximum magnitudes in each direction. The

proposed limited magnitude OLS codes aim to deal with all 3 types of limited magnitude

errors and provide a general solution. The key idea is to use parity symbols made up of bp

bits per symbol. The number of bits per parity symbol is given by equation (4.1), where L = (maximum magnitude of error - minimum magnitude of error):

b_p = \lceil \log_2(L + 1) \rceil        (4.1)

Thus, if we want to consider asymmetric magnitude-3 errors, L = (0 – (-3)) = 3

and number of bits per parity symbol = 2. The number of bits per parity symbol bp are

sufficient to successfully encode and decode all possible transitions that can occur due to

limited magnitude errors. The encoding procedure is an extended version of the binary

OLS codes. Since we consider parity symbols of bp bits per symbol, we compute the

parity symbol by independently computing each parity bit from the corresponding lower

order bp data bits. These bp bits for each parity symbol are sufficient to detect and correct


any errors of magnitude within the range L. An example of bi-directional transitions has

been shown in Fig. 4.2.

Fig. 4.2: Bidirectional (magnitude-1 upwards and magnitude-2 downwards) error

transitions on lower order 2 bits.

Consider a 4 bits/cell PCM with 16 different resistance levels. If an asymmetric

magnitude-3 error model is considered, the number of bits per parity symbol bp = 2.

Thus, bit-0 of the data symbols are used to compute the lower order parity bit (pLSB),

while bit-1 of the data symbols are used to compute the higher order parity bit (pMSB)

using the same parity check matrix. The parity bits are then concatenated in a 4-

bits/symbol manner for storage in memory cells. This allows us to encode 2 parity

symbols of 2 bits each into a single memory cell.

OLS codes use majority logic decoding to compute the correct value of each bit in the codeword. The proposed decoding procedure is an extension of the original OLS codes.

From the codeword, the lower order bp bits from each symbol are used for the majority

voting decoder circuit. The majority voting circuit recovers the correct lower order bp bits

for each data symbol. But the limited magnitude error may change the higher order bits as

well. We use the decoded lower order bp bits and the received bits to figure out the

direction as well as the magnitude of the error for each data symbol. The decoding

complexity of the proposed decoder for OLS codes increases by a few gates for the error

magnitude computation and a simple adder. As the number of errors corrected t increases,

the number of parity checks for the majority logic decoding increases to 2t parity checks.


But the logic after the majority voter remains the same independent of the number of

errors being corrected.
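For the asymmetric case, the logic after the majority voters can be sketched as follows. This is an illustrative Python model under an assumed sign convention (drift increases the read level), with b_p the number of lower-order bits protected by the OLS layer:

    def correct_drift(received, corrected_low, bp):
        """Recover a symbol hit by an upward drift of magnitude <= L < 2**bp.
        received:      the symbol value read from the cell
        corrected_low: its lower bp bits, already fixed by majority voting"""
        mask = (1 << bp) - 1
        # drift magnitude, recovered modulo 2**bp from the lower-bit mismatch
        delta = ((received & mask) - corrected_low) & mask
        return received - delta   # undo the drift, fixing higher-order bits too

For bp = 2 and a magnitude-3 error turning a stored 5 into an 8, received = 8 and corrected_low = 1 (the lower bits of 5), so delta = (0 - 1) & 3 = 3 and the symbol is restored to 5.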

(Fig. 4.3, not reproduced: bit 0 of data symbols d0-d3 feeds the pLSB parity bits and bit 1 feeds the pMSB parity bits of parity symbol PS0; on decode, two majority voters recover d0,0 and d0,1, and combinational logic plus an adder produce the corrected d0.)

Fig. 4.3: Example of encoder and decoder logic for a proposed scheme

The parity symbol computation for the first parity symbol p0 for k = 16, t = 1

asymmetric magnitude-3 errors and 4-bits/cell memory as well as the decoder circuit for

data symbol d0 has been shown in Fig. 4.3. OLS codes by construction are such that only

a few parity bits are used in the correction circuitry of the data bits. But if a cell

containing encoded parity symbol gets affected, it can potentially cause multiple parity

symbol errors, since multiple parity symbols are stored in each cell. Thus, the only

restriction that needs to be imposed is that different parity symbols belonging to the same

majority voting decoder circuit should not be adjacent to each other in the same memory

cell containing encoded parity symbol. Since the parity check matrix in an OLS code has

its parity bits in a staggered manner, such restrictions are not violated that often. In cases

where this restriction is violated, reordering of parity bits can be done or dummy bits,

which do not contribute to any of the decoding logic, can be introduced in the symbols

that violate the restriction. Since only a part of the data symbol is used for parity, the

code rate also increases. The modified code rate equation for k = m^2 data bits, t errors and

bp parity bits per symbol in a memory with b bits per cell is shown in equation (4.2):

R_c = \frac{k}{k + 2tm \cdot b_p / b}        (4.2)

4.3.1 Redundancy Optimization

For symmetric and bidirectional limited magnitude errors, the maximum magnitude in either direction can still be decoded from the lower b_u bits, as given by equation (4.3), where L_u is the maximum magnitude of error in either direction:

b_u = \lceil \log_2(L_u + 1) \rceil        (4.3)

To decode the

direction of error, the higher order bit bh in position (bu +1) is sufficient. Thus, it is not

necessary to correct the higher order bit bh; simply detecting whether it has flipped is sufficient. Thus, for the bh bit of each data symbol, an ECC with reduced

capability can be employed to indicate the direction of error. The task of finding out

whether an error has occurred or not in a position is done by the lower bu bits. The bh bit

gives an indication of the direction of error. If the lower bu bits do not indicate an error,

the bh bit is ignored. This code is called Hybrid code from here-on for easy distinction.

The different cases for reduced error correction capabilities are described below.

Case-1 SEC: Since there is only a single error that can be corrected, the parity can

simply be XOR of all bh bits of the data symbols in a SEC code. Thus, it can be known

that the higher order bit has flipped or not through a single parity bit.

Case-2 DEC or greater: For a code that corrects t errors, the bh bits can be

encoded with any code that corrects (t-1) errors and detects t errors. The (t-1) error

correction capability can indicate whether the bh bit has changed at any given position. If

there are t errors, then it is detected whether the bh bits at all the error positions have

changed. The code chosen can be one with low data redundancy (e.g. BCH code) to


reduce the overall number of parity symbols. But this lower redundancy comes at the

expense of a higher decoding latency.

For example, consider symmetric limited magnitude-1 errors and a TLC memory

(3-bits/cell). Consider 2 data symbols d0 = 6(110) and d1 = 4(100) amongst many other

symbols. Since limited magnitude is 1, the possible errors for both the data symbols have

been shown in Fig. 4.4.

Fig. 4.4: Symmetric magnitude-1 errors for d0 and d1

The LSB (b0) of the data symbols is used for error location using OLS. The bit b1

is used for indicating direction of error. Since it is a DEC code, SEC-DED Hamming

code is used to encode bit b1 of all the data symbols. Now, suppose only symbol d1 changes from 4 (100) to 3 (011). The OLS code over the LSBs locates the error, and the SEC-DED code can point out whether bit b1 of symbol d1 has changed or not. Since in this case it has, the correction circuitry simply adds +1 to the received symbol. Now consider the case where both data symbols change, to d0 = 5 (101) and d1 = 5 (101). In this case, the SEC-DED Hamming code indicates that bit b1 of d0 has changed while that of d1 has not. Thus, +1 is added to d0 while -1 is added to d1.
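The disambiguation used in this example can be captured in a short Python sketch for the symmetric magnitude-1, 3-bits/cell case (illustrative only; corrected_b0 comes from the OLS majority vote, and b1_flipped from the SEC-DED code over the b1 bits):

    def hybrid_correct_mag1(received, corrected_b0, b1_flipped):
        """Return the original symbol under a symmetric magnitude-1 error model."""
        if (received & 1) == corrected_b0 and not b1_flipped:
            return received                          # no error on this symbol
        for orig in (received - 1, received + 1):    # the two magnitude-1 candidates
            lsb_ok = (orig & 1) == corrected_b0
            flip_ok = (((orig >> 1) ^ (received >> 1)) & 1) == int(b1_flipped)
            if lsb_ok and flip_ok:
                return orig
        return received                              # unreachable for valid inputs

Running the example above, hybrid_correct_mag1(3, 0, True) returns 4 and hybrid_correct_mag1(5, 0, False) returns 4, matching the +1/-1 corrections described in the text.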

The encoding procedure involves computing 2 sets of parity. One set is for lower

bu bits which is fed to an OLS parity generator. The other set is for the bh bit of each data

symbol which goes to the reduced capability ECC parity generator. Similarly, the

decoding procedure needs to now have two sets of decoding logic, one for OLS codes


and another for the reduced capability ECC. For both the proposed OLS codes and

Hybrid codes, the operations are binary. Thus, the decoding speed is faster compared to a

non-binary Hamming code, as used in [Cassuto 10], which uses GF(q) operations and is

dependent on the magnitude of error. [Cassuto 10] also uses a special parity mapping to

prevent multiple parity errors when one symbol is affected. OLS codes do not have any such need for a special mapping function. For bidirectional and symmetric errors, [Cassuto 10] requires a two-step decoding procedure that recognizes miscorrected symbols and corrects them in the second step. The proposed codes work without any extra decoding step

regardless of the type of error.

4.4 EVALUATION

The conventional symbol error correcting OS-MLD code [Namba 15a], the

proposed OLS code and the redundancy optimized Hybrid code were synthesized on

Synopsys Design Compiler using the NCSU FreePDK45 library for k = 16, 64 and 256

SEC and DEC codes for different error magnitudes. Table 4.1 gives the comparison of

the data redundancy, number of cells used in the decoder logic and the decoder latency

for asymmetric magnitude-3 errors in 3-bits/cell and 4-bits/cell memory. Table 4.2 makes

a similar comparison for symmetric magnitude-1 errors in 3-bits/cell memory. Table 4.3

makes the comparison for symmetric magnitude-3 error in 4-bits/cell memory. Errors

were injected for each implementation to ensure that limited magnitude errors of all types

were corrected. Exhaustive testing was done for different error magnitudes, and for all

the tests the data symbols were successfully decoded to match the original message

symbols.

From Table 4.1, it can be seen that the proposed OLS codes have lower

redundancy as well as lower decoder area compared to standard OS-MLD codes. It is


also seen that the proposed codes have a slightly increased decoder latency. This is

because of the additional combinational logic and the adder which increases the critical

path of the decoding logic. But the number of majority voters is reduced since a few bits

from each symbol are used to construct the parity. This leads to a smaller decoder area for

the proposed codes since the area of the combinational logic and adder is less than that of

the additional XOR gates and majority voter in [Namba 15a]. Thus, the proposed codes

are able to achieve lower redundancy and decoder area at the trade-off of slightly

increased decoder latency.

Bits/cell  Error      Data     OSMLD [Namba 15a]               Proposed OLS
in memory  Type (t)   symbols  #check    Area       Latency    #check    Area       Latency
                               symbols   (μm2)      (ns)       symbols   (μm2)      (ns)
3          t=1 (SEC)  16       8         1062.96    0.33       6         1111.30    0.58
                      64       16        5193.74    0.54       12        5122.88    0.81
                      256      32        22438.17   0.74       22        19586.24   0.89
           t=2 (DEC)  16       16        2645.44    0.44       12        1936.80    0.55
                      64       32        12093.39   0.65       24        8941.57    0.80
                      256      64        52015.33   0.81       44        38982.87   0.97
4          t=1 (SEC)  16       8         1417.29    0.33       4         1224.40    0.62
                      64       16        6924.99    0.54       8         5573.88    0.84
                      256      32        22438.17   0.74       16        21360.19   0.95
           t=2 (DEC)  16       16        3527.26    0.44       8         2071.49    0.71
                      64       32        16004.07   0.65       16        9265.86    0.80
                      256      64        69345.64   0.82       32        40828.16   1.02

Table 4.1: Comparison of OSMLD and proposed codes for asymmetric magnitude-3 error

Error      Data     OSMLD [Namba 15a]            Proposed OLS                 Hybrid Codes
Type (t)   Symbols  #check   Area      Latency   #check   Area      Latency   #check   Area      Latency
                    symbols  (μm2)     (ns)      symbols  (μm2)     (ns)      symbols  (μm2)     (ns)
t=1 (SEC)  16       8        1062.96   0.33      6        1250.22   0.54      4        918.89    1.13
           64       16       5193.74   0.54      12       5595.93   0.76      6        3959.01   2.29
           256      32       22438.17  0.74      22       22748.38  0.93      12       15240.99  3.68
t=2 (DEC)  16       16       2645.44   0.44      12       2150.33   0.63      4        1562.30   1.02
           64       32       12093.39  0.65      24       9654.44   0.88      14       6558.47   1.88
           256      64       52015.33  0.81      44       41821.67  1.12      26       28247.17  2.07

Table 4.2: Comparison of OSMLD, proposed OLS and Hybrid codes for symmetric
magnitude-1 error in 3-bits/cell memory

From Table 4.2 and Table 4.3, it is seen that the Hybrid codes have the lowest

data redundancy amongst all three codes. This is because the Hybrid codes use a reduced


strength ECC for the higher order bit. It is seen that the Hybrid codes also have the

highest decoding latency but slightly lower decoder area. This is because for this

implementation of a DEC code, SEC-DED Hamming code was used to trade-off

redundancy for decoder latency. Hamming codes in general have a higher critical path

but a slightly lower decoder complexity compared to OLS codes. A SEC-DED OLS code

can instead be used to reduce the decoder latency at the expense of higher decoder area

and redundancy. For a SEC code, a simple parity was used for the higher order bits. It can

also be seen that the proposed OLS codes are better in terms of decoder area for a DEC

code and not for a SEC code. This is because the combinational logic and adder logic for

the proposed codes is simpler than a DEC majority voter but more complex than an additional SEC majority voter in the case of symmetric errors. The Hybrid codes are able to achieve a reduced decoder area for the majority of the experiments at the expense of an increased

decoder latency. Thus, the proposed codes and Hybrid codes have a better redundancy for

a trade-off of increased decoder latency.

Error      Data     OSMLD [Namba 15a]            Proposed OLS                 Hybrid Codes
Type (t)   Symbols  #check   Area      Latency   #check   Area      Latency   #check   Area      Latency
                    symbols  (μm2)     (ns)      symbols  (μm2)     (ns)      symbols  (μm2)     (ns)
t=1 (SEC)  16       8        1417.29   0.33      6        1878.14   0.63      5        1556.20   0.95
           64       16       6924.99   0.54      12       7792.26   0.88      9        6344.94   2.12
           256      32       22438.17  0.74      24       33027.46  1.05      17       26231.52  3.38
t=2 (DEC)  16       16       3527.26   0.44      12       3255.06   0.71      10       2683.93   1.02
           64       32       16004.07  0.65      24       14362.46  0.95      18       11725.93  1.44
           256      64       69345.64  0.82      48       62123.12  1.19      35       49404.62  1.99

Table 4.3: Comparison of OSMLD, proposed OLS and Hybrid codes for symmetric
magnitude-3 error in 4-bits/cell memory

4.5 CONCLUSION

In this work, a technique to derive limited magnitude error correction codes from

OLS codes has been proposed. The proposed codes can correct any limited magnitude


error with an increase in redundancy as the magnitude of error increases. Also, a

technique to extend the OLS codes further to lower the redundancy for symmetric and

bidirectional errors was also discussed. These codes are useful for MLC PCM, which

have very low read latency. The proposed codes provide balanced tradeoffs between the

amount of data redundancy, decoding complexity and decoder latency, and can be used to

increase the reliability of emerging memories like MLC PCM. Also, the low redundancy,

high decoding speed and low decoding complexity of the codes make them a viable

choice for MLC PCM wherein the performance is sensitive to access latencies.


Chapter 5: Systematic b-Adjacent Symbol Error Correcting Reed-
Solomon Codes with Parallel Decoding

(This chapter is based on the publication [Das 18a]: A. Das and N. A. Touba, "Systematic b-adjacent symbol error correcting Reed-Solomon codes with parallel decoding," in Proc. of IEEE VLSI Test Symposium (VTS), paper 7A.1, 2018. The author of this dissertation contributed to the conception of the research problem, theoretical developments and experimental verification of the research work.)

5.1 INTRODUCTION

Section 1.1.1 presented the basics of write disturbance errors in MLC PCM. With

technology node scaling, the problem of these write disturbance errors gets highly

exacerbated [Jiang 14] specifically for super dense memories. The problem of magnetic

field coupling in STT-MRAMs as discussed in Sec. 1.2.2. [Yoon 18] showed that for

dense memory bits and lower stored energy, a magnetic field- induced coupling between

adjacent bits can cause significant change in the average retention time. As this

technology matures enabling dense multilevel cells, magnetic field-induced coupling is

expected to play a key role. These errors can typically be modeled is the form of a burst

error affecting neighboring cells in the memory. The key point to consider is that for

these memories, the performance is sensitive to memory access latencies.

Binary burst error correcting codes with fast decoding procedures [Klockmann

17] [Datta 11] might not be suitable for such an application because of the limited range of

bursts that the codes can correct. Reed-Solomon (RS) codes are highly suitable for these

cases since the correction is on a symbol basis and they have very low redundancy. But

beyond single error correction, Reed-Solomon codes suffer from complex decoding

procedure which results in high decoding latency spanning multiple cycles. [Fujiwara 06]

describes various methods proposed over the years to correct single byte errors with a

parallel decoding methodology. But the main issue with single byte error correction is


that the burst errors may affect multiple bytes at a time within the burst range and are not

guaranteed to affect only a single byte. An adjacent symbol error correcting Reed-

Solomon code is also proposed in [Namba 15b]. But the codes have a high decoder

latency and decoder area since they use GF(2^m) operations, where m is the number of bits

per symbol. [Reviriego 13, 15] propose a method to correct double adjacent errors and

triple adjacent errors respectively for binary bits using OLS codes. These methods can be

extended [Namba 15a] to correct adjacent symbol errors as well. But these codes have

very high redundancy.

This research work proposes a methodology to correct b-adjacent symbol errors

using Reed-Solomon codes. The codes are systematic by design and can have a

maximum information symbol length k = 2^m - 1, where m is the number of bits per symbol. This contrasts with the traditional Reed-Solomon codes, which have a maximum block length n = 2^m - 1. The codes have very low redundancy and have a parallel one-step

decoding procedure. The rest of the work is organized as follows. Section 5.2 reviews the

Reed-Solomon codes and its extensions. Section 5.3 describes the proposed b-adjacent

symbol error correcting Reed-Solomon codes in detail along with its construction and

decoding procedures. Section 5.4 evaluates the proposed codes and compares it with

existing codes in terms of hardware complexity and redundancy. Section 5.5 presents the

conclusion of this work.

5.2 REED-SOLOMON CODES

Reed-Solomon codes are a special case of non-binary BCH codes of length n = q^m - 1 over GF(q^m). The number of parity check symbols for RS codes is given by n - k = 2t, where t is the number of errors being corrected. For q = 2, the RS code comprises m-bit symbols, which are used to construct the code. An extension of the SEC RS code exists


and has the parity check matrix of the form shown in equation (5.1):

H = \begin{bmatrix} 1 & 1 & 1 & 1 & \cdots & 1 & 1 & 0 \\ 1 & \alpha & \alpha^2 & \alpha^3 & \cdots & \alpha^{n-1} & 0 & 1 \end{bmatrix}        (5.1)

It is a subclass of

Hamming type codes over GF(2m) that has 2 check symbols [Bossen 70]. The codes

consist of m-bit symbols unlike a binary Hamming code. Thus, a single error can result in

a syndrome which is either equal to a column or is a multiple of a column in the parity

check matrix. Any column in the parity check matrix can have (2^m - 1) multiples, i.e., all possible cases except the all-0 column. Thus, the number of possible syndromes for each single error is (2^m - 1). As a result, for the case of extended RS codes, the number of possible syndromes equals n(2^m - 1). This can result in huge complexity, since the number

of syndromes increases exponentially with m. Thus, to make the decoding simpler,

[Bossen 70] describes the use of a companion matrix which transforms the H-matrix over GF(2^m) to binary form. The binary form of the new parity check matrix for the primitive polynomial g(x) = x^2 + x + 1, k = 3 and n = 5 is shown in equation (5.2):

H = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}        (5.2)

A companion matrix is an m x m non-singular binary matrix defined from a

primitive polynomial g(x) of degree m [Fujiwara 06]. The syndromes from the parity

check matrix are constructed for each row in a general fashion by XORing all the bits for

which the corresponding column is a 1. Since there are only 2 rows in the parity check matrix, the syndrome corresponding to any single symbol error will be of the form shown in equation (5.3), where i is the location of the error, e_i is the error pattern, and S_1, S_2 are m-bit symbols:

\begin{bmatrix} S_1 \\ S_2 \end{bmatrix} = \begin{bmatrix} e_i \\ \alpha^i e_i \end{bmatrix}        (5.3)

Thus, for column i, \alpha^i S_1 \oplus S_2 = 0, while for all other columns, \alpha^j S_1 \oplus S_2 \neq 0, \forall j \neq i. The decoding procedure using the companion matrix reduces the complexity of the

circuit compared to the syndrome comparison decoding method of Hamming codes by

trading off decoder latency. This decoding method is only useful for single symbol error

correction. For t > 1, Reed-Solomon codes have a two-step decoding procedure which

involves two polynomials, an error locating polynomial and an error magnitude

polynomial. [Carrasco 08] describes various methods to compute both the error location

and the error magnitude for t (> 1) errors.

5.3 PROPOSED SCHEME

This section describes the proposed b-adjacent symbol error correcting Reed-

Solomon Codes. The key idea is to construct a new parity check matrix in such a manner

that the first b-rows of the parity check matrix have at-most one 1 within b-adjacent

columns in any given row. The main purpose of such a construction is to distinguish the

error magnitude for the adjacent columns in error in a single step. This type of

construction enables the error pattern of any erroneous symbol to be one of the syndrome

symbols. The lower (r-b) rows of the parity check matrix, where r is the total number of

check symbols, is constructed by using rows from the original Reed-Solomon parity

check matrix such that the following conditions are met.

1. All b-adjacent syndromes generated by XORing b-adjacent columns should be unique.

2. All syndromes for all possible combinations of columns within b-adjacent columns

should be unique.

3. If multiples of a column are used to XOR instead of the original column in the above

two cases, all syndromes thus generated should be unique.


Condition 1 ensures that no b-adjacent errors are miscorrected. Condition 2

ensures that any number of errors within the b-adjacent columns are not miscorrected.

The unique syndromes exactly identify which b-adjacent columns contain the errors.

Since Reed-Solomon codes correct non-binary m-bit symbols, each column can also have

different multiples. These multiples can be identified from the upper b-rows of the parity

check matrix. But to avoid mis-correction for different magnitudes of errors, condition 3

needs to be satisfied. Thus, we keep on adding rows to the bottom part of the parity check

matrix until all the conditions above are satisfied.
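As an illustration of condition 1 (the other conditions extend the same uniqueness test to column subsets and to column multiples), the following Python sketch checks whether all b-adjacent column XORs of a candidate binary parity check matrix are unique; it is a construction-time aid, not part of the decoder:

    from functools import reduce

    def b_adjacent_syndromes_unique(H, b):
        """H: list of rows of a binary parity check matrix; True iff every
        XOR of b adjacent columns yields a distinct syndrome."""
        n = len(H[0])
        cols = [tuple(row[j] for row in H) for j in range(n)]
        seen = set()
        for i in range(n - b + 1):
            s = reduce(lambda a, c: tuple(x ^ y for x, y in zip(a, c)),
                       cols[i:i + b])
            if s in seen:
                return False   # two different b-adjacent bursts would collide
            seen.add(s)
        return True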

The proposed codes are systematic by design and have the following parameters:

maximum block length n = k + r, where the number of information symbols is k = (2^m - 1)

and r is the number of check symbols needed to construct the parity check matrix. The

systematic design of the proposed codes increases the speed of the encoding procedure,

since now it only involves simple XOR operation of data symbols. Also, the parity check

symbols sometimes need to be re-ordered or placed in such a manner that the conditions

above are met without any additional rows. But this placement or re-ordering of the

parity symbols in the codeword does not affect the encoding or decoding latency. An

example of the parity check matrix for a double-adjacent symbol error correcting code for

k = 7 using the companion matrix notation has been shown in equation (5.4).

H = \left[ \begin{array}{ccccccc|cccc} 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\ 1 & \alpha & \alpha^2 & \alpha^3 & \alpha^4 & \alpha^5 & \alpha^6 & 1 & 0 & 0 & 0 \\ 1 & \alpha^2 & \alpha^4 & \alpha^6 & \alpha & \alpha^3 & \alpha^5 & 0 & 0 & 0 & 1 \end{array} \right]        (5.4)

5.3.1 Decoding Procedure

The most prevalent decoding procedure is to simply compare all the syndromes

and based on the computed syndrome, both the error location and magnitude can be

found. But this method involves comparison of a very large number of syndromes. Also,

the number of syndromes increases linearly with the number of symbols in a codeword,

and exponentially with the number of bits per symbol. Thus, in order to reduce the

complexity of the decoder we propose a decoding method based on the companion

matrix. This method has less complexity, but the reduction in complexity comes at the

cost of decoder latency. The syndromes are computed by taking the XOR of all data bits

whose corresponding column is 1 in the binary form of the parity check matrix. The

syndromes themselves are made up of m-bit symbols. If b-adjacent symbols are in error

starting from symbol i, the syndromes then are given by equation (5.5). Moreover, it can

be derived that if b-adjacent columns or any number of columns within b-adjacent

columns are in error, then equation (5.6) is always true for the b-adjacent columns:

\begin{bmatrix} S_1 \\ \vdots \\ S_b \\ LS_1 \\ \vdots \\ LS_x \end{bmatrix} = \begin{bmatrix} e_i \\ \vdots \\ e_{i+b-1} \\ T^{i} e_i \oplus T^{i+1} e_{i+1} \oplus \cdots \oplus T^{i+b-1} e_{i+b-1} \\ \vdots \\ T^{xi} e_i \oplus T^{x(i+1)} e_{i+1} \oplus \cdots \oplus T^{x(i+b-1)} e_{i+b-1} \end{bmatrix}        (5.5)

T^{\beta i} S_1 \oplus T^{\beta(i+1)} S_2 \oplus \cdots \oplus T^{\beta(i+b-1)} S_b \oplus LS_\beta = 0        (5.6)

The implementation of equation (5.6) is simply parallel XOR gates with the

syndrome symbols as its inputs. If equation (5.6) is satisfied for all β with 1 ≤ β ≤ x,

and x = (r-b), then b-adjacent errors have occurred from symbol i. The error location for

each symbol is the OR of equation (5.6) for all β. A symbol di will have b different

possibilities of adjacent columns that can be in error. The final indication of whether a


symbol is in error or not is the AND of error location signals of all b adjacent columns it

is part of. This is because if any one of the b different possibilities is in error, then it

equates to 0 indicating that the symbol is in error. A symbol is error free if and only if all

b possibilities equate to 1. The error pattern is obtained from Sα where α is the row

number in the upper b-rows of the parity check matrix for which column i is a 1. The

error pattern is then XORed with the received message symbols to get the decoded

message symbols. Thus, the upper b rows of the syndromes are used to obtain the error

magnitudes or patterns for adjacent errors. The lower (r-b) rows of the syndromes are

used to obtain the error location. The decoding procedure is thus a parallel one step

procedure. A partial schematic of the error pattern generator in the decoding circuit has

been shown in Fig. 5.1.

[Figure: a syndrome generator block producing S1–S12, feeding a network of XOR and AND gates that generates the error patterns for symbols e0 through e6.]

Fig. 5.1: Partial schematics of error pattern generator for the proposed scheme.
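To make the check concrete, the following is a minimal software sketch of the equation (5.6) test for one candidate starting symbol i. It is an illustration rather than the synthesized hardware: the companion matrix power T^k is realized as multiplication by α^k in GF(2^m), and the primitive polynomial p(x) = x^6 + x + 1 for m = 6 is an assumed choice, not one fixed by this chapter.

    # Software sketch of the eq. (5.6) check (illustrative, not the RTL).
    # T^k * S is realized as multiplication by alpha^k in GF(2^m); the
    # primitive polynomial p(x) = x^6 + x + 1 for m = 6 is an assumption.
    M = 6
    POLY = 0b1000011  # x^6 + x + 1

    def gf_mul(a, b):
        """Carry-less multiplication modulo POLY over GF(2^M)."""
        r = 0
        for _ in range(M):
            if b & 1:
                r ^= a
            b >>= 1
            a <<= 1
            if a & (1 << M):
                a ^= POLY
        return r

    def alpha_pow(k):
        """alpha^k, with alpha encoded as the integer 2."""
        r = 1
        for _ in range(k):
            r = gf_mul(r, 2)
        return r

    def adjacent_error_at(i, S, LS, b):
        """True iff eq. (5.6) holds for all beta, 1 <= beta <= x = len(LS),
        i.e. the syndromes are consistent with a b-adjacent error at symbol i."""
        for beta in range(1, len(LS) + 1):
            acc = LS[beta - 1]
            for j in range(b):
                acc ^= gf_mul(alpha_pow(beta * (i + j)), S[j])  # T^(beta(i+j)) S_(j+1)
            if acc != 0:
                return False
        return True

In hardware, each term in the inner loop is a fixed wiring of XOR gates, so all β checks for all candidate positions i evaluate in parallel, which is what makes the one-step decoding possible.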

If any number of check symbols within b adjacent symbols are in error, then there is at least one case of β for which equation (5.6) is not satisfied. This simply indicates that none of the data symbols are in error. In such a case, we consider the rows of the parity check matrix which have 1's in the parity check columns. These rows become the magnitude computation rows of the parity check matrix, while all the remaining rows are used for error location. The proposed codes can provide suitable error protection against burst symbol errors with low latency decoding. Consider an RS code with m = 6 bits/symbol as an example: three 2-bits/cell MLC memory cells can be concatenated to form a symbol. A double adjacent symbol error correction (DAsEC) scheme can then protect at least 4 adjacent MLC memory cells and at most 6 adjacent MLC memory cells. Similarly, a triple adjacent symbol error correction (TAsEC) scheme can protect at least 7 adjacent MLC memory cells and at most 9 adjacent MLC memory cells. In general, if a symbol is made up of c MLC cells and a b-adjacent symbol error correcting proposed Reed-Solomon code is used, then the minimum number of adjacent memory cells protected by the proposed code is C_min = c(b − 1) + 1. Similarly, the maximum number of adjacent memory cells protected is C_max = cb; this best possible scenario occurs when all the cells affected by a b-symbol burst lie within b adjacent symbols.

5.4 EVALUATION

The proposed DAsEC and TAsEC codes were synthesized with Synopsys Design Compiler using the NCSU FreePDK45 45nm library for different information symbol lengths. Both codes were implemented using the dataflow model in Verilog, and errors were injected to ensure that all double adjacent errors and triple adjacent errors were corrected. Exhaustive testing was done for different magnitudes and locations of the symbol errors. For triple adjacent errors, the codes were also tested to ensure that all errors within any 3 adjacent columns were corrected.

[Reviriego 15] proposed a method to correct double adjacent errors using OLS codes for binary bits by augmenting the parity check matrix and the decoding procedure of general OLS codes. This method was extended to non-binary symbols using the method proposed in [Namba 15a], wherein each bit in the symbol has its own independent parity check matrix as well as its own independent decoder, to correct double adjacent symbol errors. These double adjacent symbol error correcting OLS (DAsEC OLS) and double adjacent symbol error correcting Reed-Solomon (DAsEC RS) codes were implemented using the dataflow model in Verilog and synthesized with Synopsys Design Compiler using the NCSU FreePDK45 45nm library in order to compare them to the proposed DAsEC codes.

[Reviriego 13] proposed a method for correcting triple adjacent errors for binary bits using OLS codes with the same number of check bits as a double error correcting OLS code but with an augmented decoding procedure. This was also extended using the procedure in [Namba 15a] to correct triple adjacent symbol errors. This triple adjacent symbol error correcting OLS (TAsEC OLS) code was implemented using the dataflow model in Verilog and also synthesized with Synopsys Design Compiler using the NCSU FreePDK45 45nm library to compare it to the proposed TAsEC codes. The evaluation of the implemented codes and the proposed codes was done on the basis of decoder area, decoder latency and redundancy (i.e. the number of check symbols required for a given information symbol length per codeword).


5.4.1 Redundancy

Table 5.1 shows the comparison of redundancy between the DAsEC OLS codes [Reviriego 15] [Namba 15a], the DAsEC RS codes in [Namba 15b] and the proposed DAsEC codes for k = 8, 16, 32 and 64, where k refers to the number of information symbols in the codeword and r refers to the number of check symbols required. It is seen that the redundancy of the proposed codes and the DAsEC RS codes is the same. It is also seen that the proposed codes have much better redundancy than the DAsEC OLS codes. Table 5.2 shows the comparison between the TAsEC OLS codes [Reviriego 13] [Namba 15a] and the proposed TAsEC codes for k = 8, 16, 32 and 64. As seen from the table, the proposed codes have much better redundancy than the TAsEC OLS codes.

                    DAsEC OLS Codes                    DAsEC RS Codes
                    [Namba 15a] [Reviriego 15]         [Namba 15b]                       Proposed Codes
Bits/     k     #check   Area    Latency     #check   Area       Latency     #check   Area    Latency
Symbol          symbols  (μm2)   (ns)        symbols  (μm2)      (ns)        symbols  (μm2)   (ns)
4         8     9        659     0.27        4        42711      3.09        4        2328    0.96
5         16    12       1634    0.31        4        165222     4.76        4        8420    1.47
6         32    21       3847    0.42        4        1314387    6.61        4        18379   1.64
7         64    24       9225    0.42        4        16563101   9.89        4        43541   2.38

Table 5.1: Redundancy, Decoder Area and Latency Comparison for DAsEC codes

                    TAsEC OLS Codes
                    [Namba 15a] [Reviriego 13]         Proposed Codes
Bits/     k     #check   Area    Latency     #check   Area     Latency
Symbol          symbols  (μm2)   (ns)        symbols  (μm2)    (ns)
4         8     -        -       -           6        2943     1.09
5         16    16       6174    0.77        6        12572    1.74
6         32    26       15456   1.22        6        41045    2.25
7         64    32       37679   1.64        6        128011   3.21

Table 5.2: Redundancy, Decoder Area and Latency Comparison for TAsEC codes

5.4.2 Hardware Complexity

Table 5.1 also shows the comparison of decoder area and decoder latency between the DAsEC OLS codes [Reviriego 15] [Namba 15a], the DAsEC RS codes in [Namba 15b] and the proposed DAsEC codes for k = 8, 16, 32 and 64. The operations involved in the DAsEC RS codes are over GF(2^m), where m is the number of bits per symbol, which increases the complexity of the decoder. The complex operations result in a much greater circuit depth compared to the proposed codes, and due to this higher circuit depth these codes have high decoder latency as well. The use of a companion matrix makes all operations binary for the proposed DAsEC codes. As a result, there are no complex operations involved for the proposed codes; the decoder circuit is composed mainly of XOR gates, with additional AND and OR gates. Consequently, the decoder latency and decoder area of the proposed codes are much lower than those of the DAsEC RS codes, as seen in Table 5.1. The area and decoder latency of the DAsEC OLS codes are better than those of the proposed codes, because the circuit depth is higher for the proposed codes due to their low redundancy. The DAsEC OLS codes have lower complexity and parallel decoding logic, which reduces their decoder latency compared to the proposed codes. But this reduced decoder latency comes at the expense of very high redundancy.

Table 5.2 also shows the comparison of decoder area and decoder latency between the TAsEC OLS codes [Reviriego 13] [Namba 15a] and the proposed TAsEC codes. Similar to the DAsEC code comparison, the proposed codes have higher complexity due to the higher circuit depth arising from their low redundancy. The low decoding latency of the TAsEC OLS codes is due to low complexity parallel decoding logic, but comes at the expense of very high redundancy. The proposed codes provide much better redundancy for a slightly higher cost in decoder latency and decoder area.


5.5 CONCLUSION

A b-adjacent symbol error correcting Reed-Solomon code is proposed which is systematic by design and has low complexity parallel one-step decoding. The proposed codes have better decoding latency and decoder area compared to existing adjacent error correcting Reed-Solomon codes. The proposed codes are also compared to symbol error correcting OLS codes, and it is shown that the proposed codes achieve much better redundancy, but at a cost of slightly higher decoder latency and decoder area. The proposed codes thus provide a balanced tradeoff between the amount of redundancy required and the decoder complexity and latency. As a result, the proposed codes can be used to increase the reliability of the latest memory technologies like MLC PCM and STT-MRAM at lower technology nodes.


Chapter 6: Efficient Non-binary Hamming Codes for Limited

Magnitude Errors in MLC PCMs

6.1 INTRODUCTION

Chapter 1 discusses the recent focus on MLC PCMs due to their high density and lower cost [Papandreou 10]. They store multiple bits per cell by configuring the memory element so that it can exist in 2^b different states, where b is the number of bits per cell. Their ability to store multiple bits in a single cell, as well as being byte-addressable, makes them an attractive alternative to DRAM solutions. But due to this MLC property, other problems like resistance drift arise [Li 12]. The read reliability degradation due to resistance drift is a major reliability concern for MLC PCMs. Apart from resistance drift issues, MLC PCMs also suffer from write disturbance issues, wherein heat dissipation from writes to a memory cell causes a nearby memory cell's state to change [Jiang 14]. This happens only for certain states of the memory cell, i.e. for certain value patterns, and thus is data dependent.

Thus, for emerging MLC PCMs, instead of bit-flips the dominating errors are of limited magnitude, by virtue of the resistance drifts. Limited magnitude errors refer to a shift in the state of the cell due to the drifting resistance or due to the thermal disturbance caused by a program operation. Thus, the dominant error patterns in these memories are not random but instead depend on the data stored in the cell, the time for which it has been stored in the cell, as well as nearby program operations. Since PCMs are byte-addressable, the error correction schemes to be used should have very low decoder latency, as performance is sensitive to the access latencies of the memory. Also, since both write disturbance and resistance drifts are considered, the error correcting code should be able to correct errors of limited magnitude in either direction, i.e. symmetric limited magnitude errors.

This chapter is based on the publication [Das 18c]: A. Das and N. A. Touba, "Efficient Non-Binary Hamming Codes for Limited Magnitude Errors in MLC PCMs", in Proc. of IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), pp. 1-6, 2018. The author of this dissertation contributed to the conception of the research problem, theoretical developments and experimental verification of the research work.

Chapter 4 discusses research efforts for limited magnitude errors [Ahlswede 02, Cassuto 10]. Systematic symmetric limited magnitude error correction with very low redundancy using non-binary Hamming codes was introduced in [Klove 11]. But these codes used decimal arithmetic operations, which can be expensive both in terms of decoder complexity and decoder latency. In Chapter 4, we discussed limited magnitude error correcting Orthogonal Latin Square (OLS) codes, which take advantage of the limited magnitude error model to reduce the data redundancy [Das 17]. The use of OLS codes also results in low decoder latencies, which is important for the performance of MLC PCMs. However, such codes have a very high redundancy cost when restricted to single error correction, which leads to low memory utilization. It is also possible to use a Gray code mapping so that a magnitude-1 error causes only a single bit error, which can then be corrected using a binary SEC code, as sketched below. But such a scheme cannot address errors of magnitude more than 1, since in that case more than one bit changes.
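As a quick illustration of why the Gray code route covers only magnitude-1 errors, consider the following sketch (my own example, not part of the proposed scheme or of [Klove 11]):

    # Gray-coding cell states so that a magnitude-1 shift flips exactly one bit.
    def gray(n):
        return n ^ (n >> 1)

    # For a 3-bits/cell memory, adjacent states differ in exactly one bit:
    for s in range(7):
        assert bin(gray(s) ^ gray(s + 1)).count("1") == 1

    # But a magnitude-2 shift can flip more than one bit, which a binary
    # SEC code cannot correct:
    print(bin(gray(1) ^ gray(3)))  # '0b11': two bits differ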

In this work, a limited magnitude error correcting non-binary Hamming code is proposed which can protect more information symbols for the same amount of redundancy as a general non-binary Hamming code. The proposed codes are shown to have much lower hardware complexity and latency compared to [Klove 11], as well as much better overall redundancy compared to symbol-based binary Hamming codes. The rest of the chapter is organized as follows. Section 6.2 reviews the general

Hamming codes. Section 6.3 describes the proposed limited magnitude error correcting

Hamming code in detail along with its construction and decoding procedures. Section 6.4


compares the proposed codes with existing codes in terms of hardware complexity and

redundancy. Section 6.5 presents the conclusion of this work.

6.2 GENERAL HAMMING CODES

An (n, k) q-ary Hamming code is a linear block code and a k-dimensional linear subspace of an n-dimensional vector space over the Galois field GF(q) [Lin 04]. If q = 2, the codes are referred to as binary Hamming codes. For MLC memories, q = 2^b, where b is the number of bits stored per memory cell; we refer to this case as general non-binary Hamming codes from here on. Addition in GF(2^b) is a simple XOR operation, while multiplication is combinational and depends on the primitive polynomial as well. k is the number of information symbols while n is the total number of symbols in the codeword. The amount of redundancy r is given by the relation r = n − k. The length of the codeword is computed using equation (6.1):

$$n = \frac{q^r - 1}{q - 1} \quad (6.1)$$

A parity check matrix for a (5, 3) 4-ary Hamming code is shown in Fig. 6.1.

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 & 0 \\ 1 & 2 & 3 & 0 & 1 \end{bmatrix}$$

Fig. 6.1: Parity check matrix of a (5, 3) 4-ary Hamming code

We introduce the notion of a major element to simplify the explanation of how to construct the parity check matrix. The major element is defined as the leading non-zero element in a column of the parity check matrix. We also introduce two additional terms to ease the explanations further. Major columns are defined as the set of columns whose major element is 1. Minor columns are defined as the set of columns whose major


element is not 1. Fig. 6.2 highlights all the major elements and labels the major columns and minor columns amongst all possible columns for a 2-bits/cell memory with 2 parity check symbols, i.e. r = 2. The advantage of using the notion of major element is that instead of considering individual columns and their multiples, we can consider groups of columns with the same major element as a single entity. This makes it much easier to visualize the construction process of the parity check matrix.

[Figure: the 15 non-zero columns for r = 2 over GF(4), with the major elements highlighted and the parity check columns, major columns and minor columns labeled.]

Fig. 6.2: Classification of columns and elements for a 2-bits/cell memory

$$\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 & 6 & 7 & 0 \end{bmatrix}
\xrightarrow{\times 7}
\begin{bmatrix} 7 & 7 & 7 & 7 & 7 & 7 & 7 & 7 \\ 7 & 5 & 2 & 1 & 6 & 4 & 3 & 0 \end{bmatrix}$$

Fig. 6.3: Multiplying major columns with 7 for a 3-bits/cell memory

As an example, consider the columns with major element 1 for a 3-bits per cell memory. When the columns are multiplied by 7 in GF(8) with the primitive polynomial p(x) = x^3 + x + 1, the corresponding results are shown in Fig. 6.3. Thus, we see that all columns with major element 1, when multiplied by 7, produce all columns whose major element is 7, and no column is repeated in the multiples. As shown in Fig. 6.1, for general non-binary Hamming codes the major element is always 1, since it is assumed that all error patterns are possible. A single error produces a syndrome which is equal to either a column or a multiple of a column in the parity check matrix. The error magnitude and the data symbol corresponding to the matched column are then added to recover the correct data symbol.
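The multiplications behind Fig. 6.3 can be verified in a few lines; gf8_mul below is my own helper for GF(8) multiplication with p(x) = x^3 + x + 1, not code from the dissertation:

    # GF(8) multiplication with primitive polynomial p(x) = x^3 + x + 1.
    def gf8_mul(a, b):
        r = 0
        for _ in range(3):
            if b & 1:
                r ^= a
            b >>= 1
            a <<= 1
            if a & 0b1000:   # reduce modulo p(x) when the degree reaches 3
                a ^= 0b1011  # 0b1011 encodes x^3 + x + 1
        return r

    # Multiply the major columns (1, x) of Fig. 6.3 by 7:
    cols = [(1, x) for x in [1, 2, 3, 4, 5, 6, 7, 0]]
    print([(gf8_mul(7, t), gf8_mul(7, b)) for (t, b) in cols])
    # -> [(7, 7), (7, 5), (7, 2), (7, 1), (7, 6), (7, 4), (7, 3), (7, 0)]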


6.3 PROPOSED SCHEME

The underlying principle of the proposed scheme is that not all error patterns are possible when limited magnitude errors are considered, thus making it possible to use minor columns in the parity check matrix. Consider an MLC memory with 3 bits/cell and limited magnitude-1 errors. Since we consider symmetric limited magnitude errors, i.e. errors can occur in either direction, all possible errors and the corresponding error patterns are shown in Fig. 6.4. Error patterns are computed by taking the XOR of the original symbol and the new erroneous symbol.

We define e as the set of possible error patterns. From Fig. 6.4 it is seen that the set of error patterns is e = {1, 3, 7}. Thus, we can insert a minor column in the parity check matrix of the Hamming code if it satisfies the following conditions:

1. The product of a major column and the elements in e should not result in the minor column.

2. The products of the minor column and the set of elements in e should be distinct and should not be equal to the product of elements in e and any other column in the parity check matrix.

Condition 1 ensures that a limited magnitude error in any column in the parity

check matrix does not produce a syndrome equal to some other column in the parity

check matrix. Condition 2 ensures that no two columns in the parity check matrix with

either same or different limited magnitude errors produce the same syndrome.

Next, we show the construction of the parity check matrix by means of an example. Consider a 3-bits/cell memory with limited magnitude-1 errors. From Fig. 6.4, we see that e = {1, 3, 7}. The full set of available columns for r = 2 parity check symbols is shown in Fig. 6.5(a), with the major elements highlighted. As mentioned before, the product operation in this case is a GF(8) operation with the primitive polynomial p(x) = x^3 + x + 1.

Original symbols:        0  1  2  3  4  5  6  7
+1 erroneous symbols:    1  2  3  4  5  6  7  -
+1 error patterns:       1  3  1  7  1  3  1  -
-1 erroneous symbols:    -  0  1  2  3  4  5  6
-1 error patterns:       -  1  3  1  7  1  3  1

Fig. 6.4: All possible error patterns for a 3-bits/cell memory

[Figure: enumeration of the candidate columns remaining at each step of the construction.]

Fig. 6.5: (a) All possible columns for a 3-bits/cell memory and 2 parity check symbols (b) Resultant columns after removal of major elements from e1 (c) Resultant columns after removal of major elements from e2

Considering the product of the elements in e and the major columns, the resulting columns are all columns whose major element belongs to the set e1, where e1 is given by equation (6.2):

$$e_1 = \{1, 3, 7\} \times 1 = \{1, 3, 7\} \quad (6.2)$$

$$e_2 = \{1, 3, 7\} \times 2 = \{2, 6, 5\} \quad (6.3)$$

$$e_4 = \{1, 3, 7\} \times 4 = \{4, 7, 1\} \quad (6.4)$$

These columns cannot be used in the parity check matrix as per

condition 1 and are removed from the list of available columns. Of the remaining columns, shown in Fig. 6.5(b), we consider the next major element (which in this case is 2). The elements in e are multiplied with the major element 2, and the resultant columns are all columns whose major element belongs to the set e2, given by equation (6.3). Since the elements in e2 are distinct and do not belong to e1, condition 2 is satisfied as well, and all columns with major element 2 can be included in the parity check matrix. The resultant columns with major elements belonging to e2 are removed from the list of available columns. The only columns that remain in the list of available columns are the columns with major element 4, as shown in Fig. 6.5(c). If we multiply the elements in e with the major element 4, we get the columns whose major element belongs to the set e4, given by equation (6.4). Since the elements in e4 also belong to the set e1, these columns violate condition 2; they cannot be included and are removed from the list of available columns. The list of available columns is now empty, and the final parity check matrix is shown in Fig. 6.6. An algorithm to construct the parity check matrix for a given number of parity check symbols is shown in Fig. 6.7.

$$H = \begin{bmatrix}
0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 1 & 0 \\
2 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 0 & 1
\end{bmatrix}$$

Fig. 6.6: Parity check matrix of a limited magnitude-1 error correcting code for a 3-bits per cell memory

6.3.1 Syndrome Analysis

Considering an MLC memory with b bits/cell, general non-binary Hamming codes consider all possible error patterns. It is not a necessary requirement to correct

errors in the parity symbols, so only the syndromes of the information symbols can be considered. Thus, the total number of possible syndromes of the parity check matrix is given by equation (6.5):

$$\#\text{syndromes}_{NB} = k(2^b - 1) \quad (6.5)$$

Inputs:
    allCols = sorted set of all possible columns except the all-0 column
    e = set of error patterns
Outputs:
    HCol = columns for the parity check matrix
    syn = set of all possible valid syndromes

syn = {}; HCol = {};
totalCols = total #columns in allCols;
for (i = 0; i < totalCols; i++) {
    col = allCols[i]; validCol = 1; vSyn = {};
    Error_Pattern_Loop: for each element x in e {
        mCol = x * col;                 // GF multiplication
        if (mCol is present in syn) {
            validCol = 0; exit Error_Pattern_Loop;
        } else
            add mCol to vSyn;
    }
    if (validCol == 1) {                // all syndromes generated are unique
        for each syndrome s in vSyn
            add s to syn;
        add col to HCol;
    }
}

Fig. 6.7: Algorithm for construction of the parity check matrix of the proposed scheme
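A direct Python rendering of the Fig. 6.7 algorithm is sketched below; the variable names follow the figure, and the GF multiply is assumed to be supplied externally (e.g. the gf8_mul helper shown earlier):

    # Python rendering of the Fig. 6.7 construction algorithm.
    def build_parity_check(all_cols, e, gf_mul):
        syn = set()    # set of all syndromes claimed so far
        h_cols = []    # accepted parity check matrix columns (HCol)
        for col in all_cols:
            v_syn = set()
            valid = True
            for x in e:
                m_col = tuple(gf_mul(x, c) for c in col)  # GF multiplication
                if m_col in syn:      # syndrome collision: reject this column
                    valid = False
                    break
                v_syn.add(m_col)
            if valid:                 # all syndromes generated are unique
                syn |= v_syn
                h_cols.append(col)
        return h_cols, syn

For the running example (3 bits/cell, r = 2, e = {1, 3, 7}), feeding in all non-zero 2-tuples over GF(8) in sorted order accepts 18 columns, which matches the parity check matrix of Fig. 6.6 up to column ordering.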

But for the proposed scheme with limited magnitude errors, since the error model itself assumes that not all error patterns are possible, the total number of syndromes of the parity check matrix depends on the set of all possible error patterns e and is given by equation (6.6), where |e| is the cardinality of e:

$$\#\text{syndromes}_{LM} = k \cdot |e| \quad (6.6)$$

Depending on the magnitudes of errors being corrected, |e| is generally much smaller than (2^b − 1). Thus, for cases of limited magnitude errors, the total number of syndromes is smaller than for general non-binary Hamming codes. Since the decoding of Hamming codes involves a syndrome matching circuit, wherein the received syndrome is matched to either a column or a multiple of a column, the proposed scheme for limited magnitude errors reduces the decoder complexity simply by reducing the number of syndromes to be matched per column of the parity check matrix. Fig. 6.8 shows a comparison of the number of syndromes between the two cases for different magnitudes of error l and different information lengths k. LM refers to the number of syndromes for the proposed codes correcting limited magnitude errors; NB refers to the general non-binary Hamming codes.

[Figure: grouped bar chart of #syndromes versus information length k = 64, 128, 256 for LM(q=8, l=1), NB(q=8), LM(q=16, l=1), LM(q=16, l=2) and NB(q=16).]

Fig. 6.8: Comparison of #syndromes between limited magnitude error correcting Hamming codes and general non-binary Hamming codes

6.3.2 Companion Matrix

Finite field, i.e. GF(2^b), operations are expensive in terms of hardware. They can increase both the encoder and decoder complexity as the number of bits per cell b increases. [Bossen 70] introduced the use of companion matrices to keep the


complexity low by replacing finite field multiplications with binary matrix multiplications, i.e. simple XOR operations. This trades off decoder complexity for a slight increase in decoder latency. Companion matrices are b × b non-singular matrices derived from the root α of the primitive polynomial used. Thus, both the encoder and the decoder circuit use companion matrices to reduce the overall hardware complexity.
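For concreteness, the companion matrix for GF(8) with p(x) = x^3 + x + 1 can be written down and exercised in a few lines (a sketch with my own helper names; in hardware the matrix is simply fixed XOR wiring):

    # Companion matrix T for GF(8) with p(x) = x^3 + x + 1, acting on the
    # bit-vector [b0, b1, b2] of coefficients of 1, alpha, alpha^2.
    T = [[0, 0, 1],   # c0 = b2
         [1, 0, 1],   # c1 = b0 XOR b2
         [0, 1, 0]]   # c2 = b1

    def mat_vec_gf2(M, v):
        """Binary matrix-vector product over GF(2) (AND then XOR)."""
        return [sum(M[i][j] & v[j] for j in range(len(v))) % 2 for i in range(len(M))]

    def to_bits(x):
        return [(x >> i) & 1 for i in range(3)]

    def from_bits(bits):
        return sum(bit << i for i, bit in enumerate(bits))

    # Multiplying by alpha is one binary matrix-vector product, e.g.
    # alpha^2 * alpha = alpha^3 = alpha + 1 = 3:
    print(from_bits(mat_vec_gf2(T, to_bits(4))))  # 4 encodes alpha^2; prints 3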

6.3.3 Encoder

The encoder circuit is similar to that of a general non-binary Hamming code: the parity check symbols are computed from the information symbols and then appended to the information symbols to form the final codeword. The only difference is that the proposed scheme uses companion matrices instead of non-binary alphabets. Thus, all finite field multiplications are replaced by XOR operations. The binary form of the parity check matrix of Fig. 6.6 is shown partially in Fig. 6.9. Thus, the proposed scheme has a systematic design which results in a low complexity encoder.

6.3.4 Decoder

The decoder circuit is also similar to that of a general non-binary Hamming code. The syndromes are computed based on the binary parity check matrix derived using the companion matrices; each syndrome symbol consists of b bits. The syndromes are matched to both the individual columns of the parity check matrix and their multiples. The corresponding error magnitude is then XORed with the received symbol based on the syndrome computation. Fig. 6.10 shows the partial decoding circuit generating the error magnitudes for the first 2 symbols of the parity check matrix from Fig. 6.9.


Fig. 6.9: Binary form of partial parity check matrix of Fig. 6.6

Fig. 6.10: Error Magnitude Computation for first 2 symbols of parity check matrix from

Fig. 6.9
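Putting Sections 6.3.3 and 6.3.4 together, the following symbol-level sketch illustrates encoding and single limited magnitude error decoding with the Fig. 6.6 parity check matrix. It is an illustration under my own naming, operating on GF(8) symbols directly rather than on the binary companion-matrix form used by the actual circuits:

    # Symbol-level sketch of the proposed encoder/decoder for the Fig. 6.6
    # code (the real circuits use the binary companion-matrix form).
    def gf8_mul(a, b):  # GF(8) multiply, p(x) = x^3 + x + 1
        r = 0
        for _ in range(3):
            if b & 1:
                r ^= a
            b >>= 1
            a <<= 1
            if a & 0b1000:
                a ^= 0b1011
        return r

    # Information columns of Fig. 6.6; the parity columns are (1,0) and (0,1).
    H_INFO = [(0, 2)] + [(1, x) for x in range(1, 8)] + [(2, x) for x in range(8)]
    E = (1, 3, 7)  # magnitude-1 error patterns, from Fig. 6.4

    def encode(data):  # 16 information symbols -> 18-symbol codeword
        p1 = p2 = 0
        for (h1, h2), d in zip(H_INFO, data):
            p1 ^= gf8_mul(h1, d)  # parity columns (1,0)/(0,1) make each
            p2 ^= gf8_mul(h2, d)  # parity a plain XOR sum (systematic)
        return data + [p1, p2]

    def decode(word):
        data, (p1, p2) = list(word[:16]), word[16:]
        s1, s2 = p1, p2  # syndrome = H * received word
        for (h1, h2), d in zip(H_INFO, data):
            s1 ^= gf8_mul(h1, d)
            s2 ^= gf8_mul(h2, d)
        if (s1, s2) != (0, 0):  # match against (column * error pattern)
            for j, (h1, h2) in enumerate(H_INFO):
                for mag in E:
                    if (gf8_mul(h1, mag), gf8_mul(h2, mag)) == (s1, s2):
                        data[j] ^= mag  # apply the matched error magnitude
                        return data
        return data

    msg = [5, 0, 7, 1, 2, 3, 4, 6, 0, 1, 2, 3, 4, 5, 6, 7]
    cw = encode(msg)
    cw[4] ^= 3  # a magnitude-1 drift on symbol 4 (state 2 -> 1)
    assert decode(cw) == msg

By construction of the parity check matrix, each (column, error pattern) pair yields a unique syndrome, so the two nested loops in decode find at most one match.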

6.4 EVALUATION

[Klove 11] proposed systematic, single limited magnitude error correcting codes

which used decimal arithmetic operations. These codes were implemented and simulated

for various information symbol lengths k, different bits per symbol b and different

magnitudes of errors l. All error magnitudes considered here are symmetric. A symbol-

based binary (SyB) Hamming code was also implemented similar to the idea proposed in

[Namba 15a]. In this case, each bit of the symbol had its own independent parity check

matrix. For the decoding procedure, the syndromes were also a concatenated version of


the syndromes of each individual bit in a symbol. Then, for each bit in the symbol, the decoding was done separately, similar to a binary Hamming code's syndrome-based

matching. Thus, each bit position in the syndrome symbol was matched to a particular

column and the bit position of the corresponding information symbol was flipped.

The proposed codes as well as the above-mentioned codes used for comparison

were implemented using the Dataflow model in Verilog. Exhaustive functional testing

was done for different magnitudes of error. The codes were then synthesized in Synopsys

Design Compiler using NCSU FreePDK45 45nm library. Evaluation was done on the

basis of hardware complexity like area and number of cells, as well as on latency for both

the encoding circuit and the decoding circuit. Additionally, redundancy was also used as

a measure of evaluation between the schemes. In the comparison, b refers to the number

of bits stored per memory cell, l refers to the maximum magnitude of error, k refers to the

number of message symbols in the codeword and r refers to the number of check symbols

in the codeword. Latency is reported in nanoseconds (ns) while area is measured in square microns (μm2) for the designs. Also, for the different magnitudes of error, a symmetric limited magnitude error model is assumed, i.e. errors of maximum magnitude l can occur in either direction.

Table 6.1 shows the comparison of the encoder latency, encoder area and number of check symbols for the encoder. In general, the proposed scheme achieves much better encoder latency as well as encoder area compared to the codes in [Klove 11]. Also, the encoder area is very similar for the proposed scheme and the SyB Hamming codes. The encoder latency is the least for the SyB Hamming codes, since they have more independent parallel operations per bit in the symbol, which leads to a smaller circuit depth compared to the proposed scheme.


           Arithmetic Codes [Klove 11]    SyB Hamming [Namba 15a]       Proposed Scheme
b  l  k    #check   Latency  Area         #check   Latency  Area        #check   Latency  Area
           symbols  (ns)     (μm2)        symbols  (ns)     (μm2)       symbols  (ns)     (μm2)
3  1  64   3        2.45     4474         7        0.72     1664        3        1.21     1642
      128  3        3.023    9021         8        1.09     3399        3        1.32     2958
      256  4        3.79     19908        9        1.65     6884        4        1.94     6308
4  1  64   2        2.66     5952         7        0.74     2223        3        1.16     2086
      128  3        3.11     13031        8        1.00     4527        3        1.71     4425
      256  3        3.94     30627        9        1.68     9153        3        2.19     9130
4  2  64   3        2.12     8249         7        0.74     2223        3        1.41     2330
      128  3        2.69     18431        8        1.00     4527        3        1.98     4700
      256  4        3.44     36853        9        1.68     9153        3        2.14     9381

Table 6.1: Comparison of Encoding circuit between the different schemes

           Arithmetic Codes [Klove 11]    SyB Hamming [Namba 15a]       Proposed Scheme
b  l  k    #check   Latency  Area         #check   Latency  Area        #check   Latency  Area
           symbols  (ns)     (μm2)        symbols  (ns)     (μm2)       symbols  (ns)     (μm2)
3  1  64   3        2.82     8210         7        1.04     3686        3        1.66     3690
      128  3        3.51     16270        8        1.46     7104        3        2.06     7030
      256  4        4.54     35324        9        2.03     14077       4        2.88     13922
4  1  64   2        3.21     10549        7        1.06     4898        3        1.85     4794
      128  3        3.96     22028        8        1.4      9435        3        2.51     9649
      256  3        4.73     48428        9        1.95     18607       3        3.6      19057
4  2  64   3        3.61     14209        7        1.06     4898        3        2.52     6412
      128  3        4.77     29995        8        1.4      9435        3        3.32     12728
      256  4        5.08     60129        9        1.95     18607       3        3.13     24435

Table 6.2: Comparison of Decoding circuit between the different schemes

Table 6.2 shows the comparison of redundancy, i.e. the number of check symbols, decoder latency and decoder area for the decoder design. The proposed scheme achieves better decoder latency and decoder area for similar redundancy compared to the codes in [Klove 11]. Also, compared to the SyB Hamming codes, the proposed codes achieve much better redundancy with only a slight increase in decoder latency for similar decoder area. Thus, the proposed codes achieve a balanced trade-off between decoder complexity and redundancy compared to the other two existing schemes.


6.5 CONCLUSION

A new efficient non-binary Hamming code is proposed for limited magnitude errors, with the parity check matrix dependent on the possible error patterns. The proposed codes can correct a single limited magnitude error in a word. The proposed codes achieve better decoder latency and hardware complexity compared to arithmetic codes correcting limited magnitude errors, for the same amount of redundancy. The proposed codes also achieve better redundancy compared to binary Hamming codes extended to correct symbols. Thus, these codes are useful for protecting newer emerging non-volatile main memories like MLC PCMs, which suffer from read reliability degradation due to resistance drift.


Chapter 7: Summary and Future Work

7.1 SUMMARY

In this dissertation, various methods to detect and correct online errors in emerging memories and high-density memory systems were devised and explored. As technology scales further, conventional memories like SRAM and DRAM become denser, which creates new reliability challenges and higher soft error rates. Flash-based memories lag behind in terms of throughput because they must support stronger ECC schemes. Also, as conventional memories fall behind in terms of power consumption and performance, emerging memory technologies are being researched to provide an alternative solution. But these newer memories have different reliability issues and different error models for soft errors. Thus, conventional error correcting codes are not efficient enough to provide the required error correction capability without compromising on performance or hardware overhead. The goal of this research is to develop efficient codes suitable for these new reliability challenges and error mechanisms while reducing hardware overhead and minimizing the impact on the performance of the memory.

In chapter 2, a new code to correct multiple bit upsets in SRAMs is proposed. This new scheme leads to only a linear increase in decoder complexity as the number of adjacent bits being upset increases. This provides a huge benefit over existing schemes, wherein the decoder complexity rises exponentially with the number of adjacent bits in error. It is also shown that this benefit in decoder complexity comes with little or no additional overhead in terms of redundancy. Chapter 3 addresses the issue of higher error rates in both DRAMs and flash-based memories while providing adequate performance as well. A scheme to correct double errors is proposed while maintaining low decoder latency in order to enable high throughput. Two new decoding schemes are


proposed: first, a low latency decoding scheme for DRAMs and second, a low

complexity serial decoding scheme for flash-based memories. It is shown that the low

latency decoding scheme is able to reduce both the decoder latency and decoder

complexity at the expense of additional check-bits. It is also shown that the low

complexity serial decoding scheme reduces the number of decoding cycles by orders of

magnitude compared to existing schemes, thus enabling high performance. This scheme

also reduces the decoder complexity, but these benefits are achieved at the expense of

higher redundancy.

In chapter 4, a modified OLS code was proposed to address resistance drifts in MLC PCMs. It was shown that these codes provide a better code rate and lower decoder hardware overhead compared to existing solutions. This benefit came with a slight trade-off in decoder latency in order to achieve the better code rate. Chapter 5 proposes a new b-adjacent error correcting code based on Reed-Solomon codes to mitigate burst errors in super-dense memories. These codes are highly useful for mitigating write disturbance errors in super-dense MLC PCM and magnetic field coupling errors in dense STT-MRAMs. It was shown that the proposed codes provide a balanced trade-off between the code rate, decoder hardware overhead and decoder latency compared to existing schemes. The one-step decoding procedure enables high speed decoding for these memories, thus enabling better performance as well.

Finally, chapter 6 proposes a new Hamming code-based scheme for limited magnitude errors. Limited magnitude errors are bounded by design; thus, the code space of non-binary Hamming codes is exploited to protect a greater number of data symbols for the same amount of redundancy. It is shown that the proposed scheme achieves a good balanced trade-off between decoding and encoding area, latency and redundancy compared to existing schemes. This makes it suitable for emerging memory designs suffering from limited magnitude errors.

7.2 FUTURE WORK

As memories grow more complex and new memory technologies are researched and explored, the reliability challenges and error mechanisms, specifically with respect to soft errors, change with the technology as well. This creates huge opportunities to develop error correcting codes which are specific and highly efficient based on the design and memory technology. For the work in chapter 2, one possible extension would be to enable double error detection along with adjacent error correction. This would enable the memory to generate an interrupt or invalidate a memory line when a random double error occurs, thus preventing erroneous data from being used. The scheme in chapter 3, along with the serial decoding scheme, could also be extended to correct triple errors and beyond, so that it can enable high throughput for various error prone applications, thus increasing their overall reliability.

As the technology with respect to emerging memories and their various applications matures, it is also possible to optimize the codes in chapters 4, 5 and 6 further depending on the reliability challenges faced. New technologies like resistive RAMs (RRAMs) are being researched for their use in neural networks. The RRAMs are used to store the weights and also perform matrix multiplication in a single step in the analog domain. Similar to PCMs, RRAMs also exhibit limited magnitude error behavior. A possible field of exploration would be to optimize the codes presented in this work for limited magnitude errors specifically in neural networks, wherein RRAMs are not used like a typical memory and online errors would affect the result of matrix multiplication.


Bibliography

[Adalid 15] L. S. Adalid, P. Reviriego, P. Gil, S. Pontarelli and J. A. Maestro, “MCU

Tolerance in SRAMs Through Low-Redundancy Triple Adjacent Error

Correction,” in IEEE Transactions on Very Large-Scale Integration (VLSI)

Systems, vol. 23, no. 10, pp. 2332-2336, Oct. 2015.

[Ahlswede 02] R. Ahlswede, H. Aydinian, and L. Khachatrian, “Unidirectional error

control codes and related combinatorial problems,” in Proc. of the Eighth

International Workshop on Algebraic and Combinatorial Coding Theory (ACCT-

8), pp.6-9, 2002.

[Argyrides 11] C. Argyrides, D. Pradhan and T. Kocak, “Matrix codes for reliable and cost-efficient memory chips,” in IEEE Transactions on Very Large-Scale Integration (VLSI) Systems, vol. 19, no. 3, pp. 420-428, Mar. 2011.

[Baeg 09] S. Baeg, S. Wen and R. Wong, “SRAM Interleaving Distance Selection with a

Soft Error Failure Model,” in IEEE Transactions on Nuclear Science, vol. 56, no.

4, pp. 2111-2118, Aug. 2009.

[Baumann 05] R. Baumann, “Soft errors in advanced computer systems,” in IEEE Design

& Test of Computers, vol. 22, no. 3, pp. 258-266, May-Jun. 2005.

[Bossen 70] D. C. Bossen, “b-Adjacent Error Correction,” in IBM Journal of Research

and Development, vol. 14, no. 4, pp. 402-408, Jul. 1970.

[Burton 71] H. O. Burton, “Some asymptotically optimal burst-correction codes and their

relation to single-error-correcting Reed-Solomon codes,” in IEEE Transactions

on Information Theory, vol. 17, no. 1, pp. 92–95, Jan. 1971.

[Carrasco 08] R. A. Carrasco and M. Johnston, Non-binary Error Control Coding for

Wireless Communication and Data Storage. Chichester, West Sussex, UK: Wiley,

2008.

[Cassuto 10] Y. Cassuto, M. Schwartz, V. Bohossian, and J. Bruck, “Codes for

Asymmetric Limited-Magnitude Errors with Application to Multilevel Flash

Memories”, in IEEE Transactions on Information Theory, vol. 56, no. 4, pp.

1582-1595, Apr. 2010.

[Chang 02] H. C. Chang, C. C. Lin and C. Y. Lee, "A low power Reed-Solomon decoder

for STM-16 optical communications", in Proc. of IEEE Asia-Pacific Conference

on ASIC, pp. 351-354, 2002.

[Chappert 07] C. Chappert, A. Fert, and F. N. Van Dau, “The emergence of spin

electronics in data storage,” in Nature Materials, vol. 6, no. 11, pp. 813–823,

Nov. 2007.

[Chien 64] R. Chien, “Cyclic decoding procedures for Bose-Chaudhuri-Hocquenghem

codes,” in IEEE Transactions on Information Theory, vol. 10, no. 4, pp. 357-363,

Oct. 1964.


[Das 17] A. Das and N. A. Touba, "Limited Magnitude Error Correction using OLS

Codes for Memories with Multilevel Cells," in Proc. of IEEE International

Conference on Computer Design (ICCD), pp. 391-394, 2017.

[Das 18a] A. Das and N. A. Touba, "Systematic b-adjacent symbol error correcting Reed-

Solomon codes with parallel decoding", in Proc. of IEEE VLSI Test Symposium

(VTS), paper 7A.1, 2018.

[Das 18b] A. Das and N.A. Touba, "Low Complexity Burst Error Correcting Codes to

Correct MBUs in SRAMs", in Proc. of ACM Great Lakes Symposium on VLSI

(GLSVLSI), pp. 219-224, 2018.

[Das 18c] A. Das and N. A. Touba, "Efficient Non-Binary Hamming Codes for Limited

Magnitude Errors in MLC PCMs", in Proc. of IEEE International Symposium on

Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), pp. 1-6,

2018.

[Das 19] A. Das and N. A. Touba, "Layered-ECC: A Class of Double Error Correcting

Codes for High Density Memory Systems" in Proc. of IEEE VLSI Test

Symposium (VTS), paper 7A.2, 2019.

[Datta 11] R. Datta and N.A. Touba, “Generating Burst-Error Correcting Codes from

Orthogonal Latin Square Codes - A Graph Theoretic Approach,” in Proc. of IEEE

International Symposium on Defect and Fault Tolerance in VLSI and

Nanotechnology Systems (DFT), pp. 367-373, 2011.

[Diao 05] Z. Diao, D. Apalkov, M. Pakala, Y. F. Ding, A. Panchula, and Y. M. Huai,

“Spin transfer switching and spin polarization in magnetic tunnel junctions with

MgO and AlOx barriers,” in Applied Physics Letters, vol. 87, no. 23, pp. 1-3, Dec.

2005.

[Dutta 07] A. Dutta and N.A. Touba “Multiple Bit Upset Tolerant Memory Using a

Selective Cycle Avoidance Based SEC-DED-DAEC Code,” in Proc. of IEEE

VLSI Test Symposium (VTS), pp. 349-354, 2007.

[Fujiwara 06] E. Fujiwara, Code Design for Dependable Systems: Theory and Practical

Applications. Hoboken, NJ, USA: Wiley-Interscience, 2006.

[Hamming 50] R. W. Hamming, “Error detecting and error correcting codes,” in Bell System Technical Journal, vol. 29, no. 2, pp. 147-160, Apr. 1950.

[Hsiao 70] M. Y. Hsiao, D. C. Bossen, and R. T. Chien, ‘‘Orthogonal Latin Square

codes,’’ in IBM Journal of Research and Development, vol. 14, no. 4, pp. 390–

394, Jul. 1970.

[Ibe 10] E. Ibe, H. Taniguchi, Y. Yahagi, K. Shimbo and T. Toba, “Impact of scaling on neutron-induced soft error in SRAMs from a 250 nm to a 22 nm design rule,” in IEEE Transactions on Electron Devices, vol. 57, no. 7, pp. 1527-1538, Jul. 2010.


[Jeon 12] M. Jeon, and J. Lee, “On Codes Correcting Bidirectional Limited-Magnitude

Errors for Flash Memories,” in Proc. of International Symposium on Information

Theory and its Applications, pp. 96-100, 2012.

[Jiang 14] L. Jiang, Y. Zhang and J. Yang, “Mitigating Write Disturbance in Super-Dense

Phase Change Memories,” in Proc. of Annual IEEE/IFIP International

Conference on Dependable Systems and Networks (DSN), pp. 216-227, 2014.

[Jiang 16] L. Jiang, W. Wen, D. Wang and L. Duan, “Improving read performance of

STT-MRAM based main memories through Smash Read and Flexible Read,” in

Proc. of Asia and South Pacific Design Automation Conference (ASP-DAC), pp.

31-36, 2016.

[Kim 07] J. Kim, N. Hardavellas, K. Mai, B. Falsafi and J. Hoe, “Multi-bit error tolerant

caches using two-dimensional error coding,” in Annual IEEE/ACM International

Symposium on Microarchitecture (MICRO), pp. 197–209, 2007.

[Klockmann 17] A. Klockmann, G. Georgakos and M. Goessel, “A new 3-bit burst-error

correcting code,” in Proc. of IEEE International Symposium on On-Line Testing

and Robust System Design (IOLTS), pp. 3-5, 2017.

[Klove 11] T. Klove, B. Bose, and N. Elarief, "Systematic, single limited magnitude error

correcting codes for flash memories," in IEEE Transactions on Information

Theory, vol. 57, no. 7, pp.4477-4487, Jul. 2011.

[Lee 11] K. Lee and S. Choi, “A Highly Manufacturable Integration Technology of 20nm

Generation 64Gb Multi-Level NAND Flash Memory,” in Proc. of IEEE

Symposium on VLSI Technology, pp. 70-71, 2011.

[Li 12] J. Li, B. Luan and C. Lam, "Resistance drift in phase change memory," in Proc.

of IEEE International Reliability Physics Symposium (IRPS), Paper 6C.1, 2012.

[Lin 04] S. Lin and D. J. Costello, Error Control Coding. Upper Saddle River, NJ, USA:

Pearson Education, 2004.

[Liu 18] S. Liu, J. Li, P. Reviriego, M. Ottavi and L. Xiao, “A Double Error Correction Code for 32-Bit Data Words With Efficient Decoding,” in IEEE Transactions on Device and Materials Reliability, vol. 18, no. 1, pp. 125-127, Mar. 2018.

[Lu 96] E. H. Lu and T. Chang, “New decoder for double-error-correcting binary BCH

codes,” in IEE Proceedings - Communications, vol. 143, no. 3, pp. 129 – 132,

Jun. 1996.

[Meza 15] J. Meza, Q. Wu, S. Kumar and O. Mutlu, “Revisiting Memory Errors in

Large-Scale Production Data Centers: Analysis and Modeling of New Trends

from the Field,” in Proc. of IEEE/IFIP International Conference on Dependable

Systems and Networks (DSN), pp. 415-426, 2015.

[Namba 14] K. Namba, S. Pontarelli, M. Ottavi and F. Lombardi, “A Single-Bit and

Double-Adjacent Error Correcting Parallel Decoder for Multiple-Bit Error


Correcting BCH Codes,” in IEEE Transactions on Device and Materials

Reliability, vol. 14, no. 2, pp. 664-671, Jun. 2014.

[Namba 15a] K. Namba, and F. Lombardi, “Non-Binary Orthogonal Latin Square Codes

for a Multilevel Phase Charge Memory (PCM),” in IEEE Transactions on

Computers, vol. 64, no. 7, pp. 2092-2097, Jul. 2015.

[Namba 15b] K. Namba and F. Lombardi, “A Single and Adjacent Symbol Error-

Correcting Parallel Decoder for Reed-Solomon Codes,” in IEEE Transactions on

Device and Materials Reliability, vol. 15, no. 1, pp. 75-81, Mar. 2015.

[Naseer 08] R. Naseer and J. Draper, “Parallel double error correcting code design to

mitigate multi-bit upsets in SRAMs,” in Proc. of European Solid-State Circuits

Conference (ESSCIRC), pp. 222-225, 2008.

[Neale 13] A. Neale and M. Sachdev, “A new SEC-DED error correction code subclass

for adjacent MBU tolerance in embedded memory,” in IEEE Transactions on

Device and Materials Reliability, vol. 13, no. 1, pp. 223–230, Mar. 2013.

[Ovshinsky 68] S. R. Ovshinsky, “Reversible Electrical Switching Phenomena in Disordered Structures,” in Physical Review Letters, vol. 21, no. 20, pp. 1450-1455, Nov. 1968.

[Papandreou 10] N. Papandreou, A. Pantazi, A. Sebastian, M. Breitwisch, C. Lam, H.

Pozidis and E. Eleftheriou, “Multilevel Phase-Change Memory,” in Proc. of IEEE

International Conference on Electronics, Circuits and Systems (ICECS), pp.

1017-1020, 2010.

[Radaelli 05] D. Radaelli, H. Puchner, S. Wong and S. Daniel, “Investigation of multi-bit

upsets in a 150 nm technology SRAM device,” in IEEE Transactions on Nuclear

Science, vol. 52, no. 6, pp. 2433-2437, Dec. 2005.

[Raoux 08] S. Raoux, G. W. Burr, M. J. Breitwisch, C. T. Rettner, Y.-C. Chen, R. M.

Shelby, M. Salinga, D. Krebs, S. H. Chen, H. L. Lung and C. H. Lam, “Phase-

change random access memory: A scalable Technology,” in IBM Journal of

Research and Development, vol. 52, no. 4/5, pp. 465-479, Sep. 2008.

[Reviriego 12] P. Reviriego, M. Flanagan, S.-F. Liu and J. Maestro, “Multiple cell upset

correction in memories using difference set codes,” in IEEE Transactions on

Circuits and Systems-I: Regular Papers, vol. 59, no. 11, pp. 2592–2599, Nov.

2012.

[Reviriego 13] P. Reviriego, S. Liu, J.A. Maestro, S. Lee, N.A. Touba and R. Datta,

“Implementing Triple Adjacent Error Correction in Double Error Correction

Orthogonal Latin Square Codes,” in Proc. of IEEE International Symposium on

Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), pp. 167-

171, 2013.


[Reviriego 15] P. Reviriego, S. Pontarelli, A. Evans and J.A. Maestro, “A Class of SEC-

DED-DAEC Codes Derived from Orthogonal Latin Square Codes,” in IEEE

Transactions on very Large Scale Integration (VLSI) Systems, vol. 23, no. 5, pp.

968-972, May 2015.

[Reviriego 16] M. Demirci, P. Reviriego and J. A. Maestro, “Implementing Double Error

Correction Orthogonal Latin Squares Codes in SRAM-based FPGAs,” in

Microelectronics Reliability, vol. 56, pp. 221-227, Jan. 2016.

[Shamshiri 10] S. Shamshiri and K. T. Cheng, “Error-locality-aware linear coding to

correct multi-bit upsets in SRAMs,” in Proc. of IEEE International Test

Conference (ITC), Paper 7.1, 2010.

[Slonczewski 96] J. C. Slonczewski, “Current-driven excitation of magnetic multilayers,”

in Journal of Magnetism and Magnetic Materials, vol. 159, no. 1–2, pp. L1-L7,

Jun. 1996.

[Wilkerson 10] C. Wilkerson, A. R. Alameldeen, Z. Chishti, W. Wu, D. Somasekhar, and

S. Lu, “Reducing cache power with low-cost, multi-bit error-correcting codes,” in

Proc. of ACM annual international symposium on Computer architecture (ISCA),

pp. 83–93, 2010.

[Wong 10] H. S. P. Wong, S. Raoux, S. Kim, J. Liang, J. P. Reifenberg, B. Rajendran, M. Asheghi, and K. E. Goodson, “Phase Change Memory,” in Proceedings of the IEEE, vol. 98, no. 12, Dec. 2010.

[Yamada 91] N. Yamada, E. Ohno, K. Nishiuchi, and N. Akahira, “Rapid-Phase Transitions of GeTe-Sb2Te3 Pseudobinary Amorphous Thin Films for an Optical Disk Memory,” in Journal of Applied Physics, vol. 69, no. 5, pp. 2849-2856, Apr. 1991.

[Yoo 14] I. Yoo and I. C. Park, “A search-less DEC BCH decoder for low-complexity

fault-tolerant systems,” in Proc. of IEEE Workshop on Signal Processing Systems,

pp. 1-6, 2014.

[Yoo 16] H. Yoo, Y. Lee and I. C. Park, “Low-Power Parallel Chien Search Architecture

Using a Two-Step Approach,” in IEEE Transactions on Circuits and Systems II:

Express Briefs, vol. 63, no. 3, pp. 269-273, Mar. 2016.

[Yoon 18] I. Yoon and A. Raychowdhury, “Modeling and Analysis of Magnetic Field

Induced Coupling on Embedded STT-MRAM Arrays,” in IEEE Transactions on

Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 2, pp.

337-349, Feb. 2018.


Vita

Abhishek Das was raised in the small town of Rourkela, India. He received his

Bachelor of Technology degree in Electronics and Communications Engineering from

National Institute of Technology Rourkela in 2012. After graduating, he worked at the Centre for Development of Telematics as a Research Engineer for 2 years. He received a

Master’s degree in Electrical Engineering from the University of Texas at Austin in 2016.

He joined the PhD program at the University of Texas at Austin in 2016 and has been

working in the Computer Aided Testing (CAT) Lab under the supervision of Prof. Nur

Touba. He is currently pursuing his PhD degree. His current research interests include

fault tolerant computing specific to MLC non-volatile memories, VLSI testing and

exploring fault tolerant techniques for memory security.

Permanent address (or email): [email protected]

This dissertation was typed by Abhishek Das.