Real-time lossless data compression techniques for long-pulse operation

Fusion Engineering and Design 82 (2007) 1301–1307
Available online at www.sciencedirect.com

J. Vega a,∗, M. Ruiz b, E. Sánchez a, A. Pereira a, A. Portas a, E. Barrera b

a Asociación EURATOM/CIEMAT para Fusión, Avda. Complutense 22, 28040 Madrid, Spain
b Dpto. de Sistemas Electrónicos y de Control, UPM, Campus Sur. Ctra., Valencia km 7, 28031 Madrid, Spain

Received 31 July 2006; received in revised form 12 June 2007; accepted 15 June 2007
Available online 3 August 2007

∗ Corresponding author. Tel.: +34 91 346 64 74; fax: +34 91 346 6124. E-mail address: [email protected] (J. Vega).
0920-3796/$ – see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.fusengdes.2007.06.014

Abstract

Data logging and data distribution will be two main tasks connected with data handling in ITER. Data logging refers to the recovery and ultimate storage of all data, independent of the data source. Data distribution is related, on the one hand, to the on-line data broadcasting for immediate data availability and, on the other hand, to the off-line data access. Due to the large data volume expected, data compression is a useful candidate to prevent the waste of resources in communication and storage systems. On-line data distribution in a long-pulse environment requires the use of a deterministic approach to be able to ensure a proper response time for data availability. However, an essential feature for all the above purposes is to apply lossless compression techniques. This article reviews different lossless data compression techniques based on delta compression. In addition, the concept of cyclic delta transformation is introduced. Furthermore, comparative results concerning compression rates on different databases (TJ-II and JET) and computation times for compression/decompression are shown. Finally, the validity and implementation of these techniques for long-pulse operation and real-time requirements is also discussed.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Data compression; Lossless techniques; Delta compression; Data acquisition; Nuclear fusion; ITER

1. Introduction

Compression techniques should be used to reduce storage space and to diminish network resources in data transmissions. The former, in addition to the obvious disk saving, helps in speeding up administrative tasks such as backups or data replication. The latter can be essential for both inter-process communications and data access.

Inter-process communications refers to the data exchange between applications (typically via network) to work in a collaborative manner. An example of




this could be real-time visualization applications that simultaneously display a high number of waveforms under long-pulse conditions.

Data access is focused on making any kind of archived data available to an application. It implies the data transfer from a data server to a client computer. Taking into account that the transaction size can contain millions of bytes, transmitting the data in a compacted form will enable both a quicker data transfer and the use of less network bandwidth.

In general, there are two kinds of compression methods: lossless techniques and lossy methods. The classification depends on whether the decompression process recovers exactly the initial information. Lossy methods produce some distortion in relation to the original data. For instance, distortions are usually introduced in the compression of images and audio/video streams. This procedure is valid because human perception does not detect small quality reductions in images, video or audio. Nevertheless, data related to operation and diagnostics are a special case. The compression/decompression process must not alter the Fourier components of the waveforms at all.

Real-time data compression in ITER is a subject of special interest for two reasons. On the one hand, as an international project, ITER will make all data available to its partners in the remote experiment sites with a minimum delay. This implies a real-time transmission of data. On the other hand, the long-pulse character of ITER compels a change in the typical behaviours of present fusion devices. For example, nowadays, data archiving is performed after finishing the discharge. However, the long ITER records prevent storing complete waveforms following the shot. Instead, real-time data storage will have to take place.

This article reviews several delta compression techniques. Also, the concept of cyclic delta transformation is introduced. Results on delta and cyclic delta compression methods on present fusion databases are given. Finally, real-time cyclic delta compression is discussed.

2. Foundations on data compression

Given a discrete set of values $S = \{s_1, s_2, \ldots, s_n\}$ (source alphabet), a new set $X = \{x_1, x_2, \ldots, x_n\}$ (code alphabet) can be created by associating each sequence of symbols of S with another sequence of symbols of X. X represents a set of symbols that makes it possible to encode the symbols of S using fewer bits than are used in S.

Signal digitization can be seen as a source of N possible symbols $\{a_1, a_2, \ldots, a_N\}$ in its alphabet (ADC codes), with a probability $p_j$ of producing the symbol $a_j$. At each clock instant one symbol is generated. This symbol is independent of the symbols produced at other clock instants.

Information theory says that the information content of a source can be expressed by a single number called the entropy of the source. This can be defined by

$$H(p) = -\sum_{j=1}^{N} p_j \log_2 p_j$$

According to information theory, no data compaction code for such a source can, on average, use less than H(p) bits per source symbol. Therefore, the entropy of a source provides a standard by which to judge the quality of a data compaction code.

Delta compression is a simple, computationally inexpensive and powerful compaction technique. After the analog to digital conversion of signals, the source alphabet is made up of the digital codes ($S = \{a_1, a_2, \ldots, a_N\}$) and it is characterized by a source entropy $H_S(p)$. After digitization, a delta transformation is carried out. This transformation computes the differences between the digital codes of adjacent signal samples, thereby generating a new source alphabet $\Delta = \{\delta_1, \delta_2, \ldots, \delta_{N-1}\}$, $\delta_i = a_{i+1} - a_i$, with a different entropy $H_\Delta(p)$.

Typically, most of the delta values are numbers with a small absolute value that can be encoded with few bits. This fact is the key point in the delta transformation. The transformation creates a reorganization of the signal information whose consequence is to decrease its entropy, i.e. $H_\Delta(p) \le H_S(p)$. The real compression takes place when the delta values are encoded.
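The delta transformation itself can be sketched in a few lines of C. The function name `delta_transform` and the use of `int` for ADC codes are assumptions made for this illustration.

```c
#include <stddef.h>

/* Simple delta transformation: delta[i] = code[i + 1] - code[i].
 * Writes n - 1 deltas for n samples and returns how many were written. */
size_t delta_transform(const int *code, size_t n, int *delta)
{
    if (n < 2)
        return 0;
    for (size_t i = 0; i + 1 < n; i++)
        delta[i] = code[i + 1] - code[i];
    return n - 1;
}
```

For the codes 100, 101, 103, 103, 102, 98 the deltas are 1, 2, 0, -1, -4. The first sample has to be kept alongside the deltas so that the waveform can be rebuilt exactly.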

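Delta compression belongs to the family of compaction techniques known as delayed methods, whereby an optimum bit allocation is found after examining the data. To do this, the delta compression can use either a fixed-size or a variable-length bit allocation which is applied to the computed deltas. Bit allocation with variable length ensures that the deltas having the greatest occurrence are connected to the shortest codes.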


3. Delta compression in fusion databases

Fusion devices generate very large databases with a million or more signals and tens or hundreds of thousands of samples per waveform. Some fusion databases compress all their data, whereas others only compact data from some diagnostics. Examples of the first group are the databases of the TJ-II stellarator [1], LHD stellarator [2], JET tokamak [3] and the ones based on the MDSplus software package [4]. Belonging to the second group is the ASDEX Upgrade tokamak [5], where data compression is restricted to specific diagnostics such as Doppler reflectometers and spectrography cameras.

Delta compression is the most typical compaction technique in fusion databases. However, there is not a single method. There are several approaches that differ in the way that the deltas are encoded. The next paragraphs review three different lossless delta techniques that are applied to the databases of TJ-II, JET and MDSplus, respectively.

3.1. TJ-II compression technique

This technique [6], although it is based on delta compression, does not require an examination of the data as in delayed methods, thereby allowing its use under real-time requirements. Delta distributions are assumed to follow a universal probabilistic model. Delta values are compacted according to general encoding forms that satisfy a prefix code property (no code word is identical to the start segment of a longer code word) and which are defined prior to data capture. Software libraries were written in ANSI C, thus ensuring portability and easy maintenance. Computing times to compress/decompress waveforms are 90 and 100 ms/Mbyte, respectively. Both measurements were performed on an Intel Pentium IV computer (3 GHz, 1 Gbyte RAM) running Suse Linux 9.1. All TJ-II waveforms are compacted by means of this technique [7].

This data compaction is also used for data transmission between applications, not only within the TJ-II local area network [8] but also between remote applications in the framework of the TJ-II remote participation system [9].
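The prefix code property mentioned above can be checked mechanically. The sketch below tests a candidate code alphabet written as strings of '0' and '1'. The function name `is_prefix_free` and the string representation are assumptions for illustration only. It is not the TJ-II encoding, whose code words are defined in [6].

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when no code word is equal to, or the start segment of,
 * another code word. Code words are strings of '0' and '1'. */
bool is_prefix_free(const char *const *words, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t len_i = strlen(words[i]);
        for (size_t j = 0; j < n; j++) {
            if (i == j)
                continue;
            /* words[i] matches the first len_i characters of words[j] */
            if (strncmp(words[i], words[j], len_i) == 0)
                return false;
        }
    }
    return true;
}
```

The alphabet {"0", "10", "110", "111"} passes the check, whereas {"0", "01", "11"} fails because "0" is the start segment of "01". A decoder for a prefix-free alphabet can recognise the end of each code word without any separator, which is what allows deltas to be decoded as they arrive.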


3.2. JET compression technique

This technique, in its present form, is only valid for digital samples with a maximum length of 16 bits. It is also based on delta compression and it can be used in real-time. The differences between data samples of a fixed bit length are encoded using variable bit length samples. The compressed data are divided into packets which have a constant bit length, but this may be different from the bit length of adjacent packets or of the original data samples. The packet structure is determined by evaluating a simple cost function when processing each difference. The software library was implemented in C.

3.3. MDSplus compression technique

In this technique [10], before the compression is done, the whole data stream is analyzed to determine the optimum field width for the delta values to provide the highest degree of compression. Therefore, it cannot be applied to real-time needs. The delta values are contained in smaller bit fields. When a change in value occurs which is larger than the largest that can be represented in the bit field, a special "marker" delta is stored which indicates that the next value is a full data sample. The smaller the field size, the better the compression, except that if it is too small it causes more full samples to be stored, reducing the efficiency of the compression. Compression functions are available under MDSplus and C codes.
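The trade-off between field width and the number of full samples can be made concrete with a small cost model. The sketch below is only an illustration under stated assumptions. The layout (the most negative field value reserved as the marker, one marker field plus one full sample per escaped delta) and the names `field_cost_bits` and `best_field_width` are hypothetical and do not reproduce the actual MDSplus format.

```c
#include <stddef.h>
#include <stdlib.h>

/* Total bits needed to store n deltas in w-bit fields (2 <= w <= 31).
 * Assumed layout (hypothetical): the most negative w-bit value is the
 * marker, so a field holds deltas in [-(2^(w-1) - 1), 2^(w-1) - 1];
 * any other delta costs one marker field plus sample_bits bits. */
unsigned long field_cost_bits(const int *delta, size_t n, int w, int sample_bits)
{
    long limit = (1L << (w - 1)) - 1;
    unsigned long bits = 0;
    for (size_t i = 0; i < n; i++) {
        bits += (unsigned long)w;
        if (labs((long)delta[i]) > limit)
            bits += (unsigned long)sample_bits;
    }
    return bits;
}

/* Delayed method: scan every candidate width and keep the cheapest one. */
int best_field_width(const int *delta, size_t n, int sample_bits)
{
    int best_w = 2;
    unsigned long best = field_cost_bits(delta, n, 2, sample_bits);
    for (int w = 3; w <= sample_bits; w++) {
        unsigned long c = field_cost_bits(delta, n, w, sample_bits);
        if (c < best) {
            best = c;
            best_w = w;
        }
    }
    return best_w;
}
```

For the deltas 1, -1, 0, 2, -3, 1, 0, 100 with 12-bit samples, a 2-bit field costs 52 bits because three deltas escape, a 3-bit field costs 36 bits with a single escape, and an 8-bit field with no escapes costs 64 bits. Under this model the search therefore settles on 3 bits. Because the whole delta stream has to be scanned before the width is known, a search of this kind cannot run on a live stream.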

4. A new concept of delta transformation

Data digitization with a resolution of B bits creates a source alphabet with $2^B$ possible symbols $\{0, 1, \ldots, 2^B - 1\}$. After the delta transformation, the maximum delta is $\delta_{max} = 2^B - 1$. Equivalently, the minimum delta is $\delta_{min} = -\delta_{max}$. Therefore, the number of possible delta symbols $\{-\delta_{max}, -\delta_{max} + 1, \ldots, -1, 0, 1, \ldots, \delta_{max} - 1, \delta_{max}\}$ is $2 \times 2^B - 1$. This means that the encoding method has to distinguish almost twice as many symbols as there are digital codes, as a consequence of the positive and negative signs of the deltas.

Encoding forms that use prefix codes may require a large number of bits to encode big deltas. Therefore, a reduction of the number of delta symbols would help in diminishing the number of bits needed. To reduce the number of delta symbols, the concept of cyclic delta transformation is introduced. The set of digital codes to sample a signal is made up of $2^B$ codes $(0, 1, \ldots, 2^B - 1)$. Let us imagine a cyclic arrangement of the codes in such a way that they are indefinitely repeated:

$$\ldots, 0, 1, \ldots, 2^B - 1, 0, 1, \ldots, 2^B - 1, 0, \ldots$$

This ordering allows the computation of the delta value between adjacent signal samples as a signed number whose absolute value is the minimum of the distances between digital codes computed upward and downward. Fig. 1a shows six samples and their deltas obtained with a three bit resolution. Fig. 1b shows the cyclic delta transformation corresponding to the same samples.

Fig. 1. (a) Delta values with simple delta and (b) cyclic delta transformations. Solid arrows represent cyclic deltas.

Hence, this transformation implies that the minimum delta is $\delta_{min} = -2^{B-1}$ and the maximum delta is $\delta_{max} = 2^{B-1} - 1$. As a result, the number of possible delta symbols $\{\delta_{min}, \delta_{min} + 1, \ldots, -1, 0, 1, \ldots, \delta_{max} - 1, \delta_{max}\}$ is now $2^B$, i.e. equal to the number of digital codes.

The meaning of this transformation can be understood with Fig. 2. Fig. 2a shows an analog signal that is digitized with an ADC having 12 bits resolution and a 10 V peak-to-peak amplitude. The delta distribution appears in Fig. 2b. Note the presence of big deltas as a consequence of the transitions between the high and low levels. The cyclic delta distribution is shown in Fig. 2c. Positive delta values have been mirrored around delta = 2047 and negative delta values have been mirrored around delta = −2048. This way, big deltas are transformed into small deltas, which are precisely the more frequent elements and therefore need few bits to be encoded.

Fig. 2. (a) Signal digitized, (b) delta distribution and (c) cyclic delta distribution.

By defining the compression factor as

$$f = \left(1 - \frac{\text{compressed storage}}{\text{uncompressed storage}}\right) \times 100$$

the compression factor with the delta transformation and the TJ-II compaction technique for the signal shown in Fig. 2a is −3.93%. The negative sign signifies that the compressed size is greater than the storage needed for the digital samples. However, by applying the cyclic delta transformation, the compaction factor is 22.46%.
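To make the sign convention of the compression factor explicit, here is a minimal C sketch of the definition. The function name `compression_factor` and the choice of bytes as the storage unit are assumptions of this example.

```c
/* Compression factor in percent: f = (1 - compressed / uncompressed) * 100.
 * A negative result means the "compressed" form is larger than the input. */
double compression_factor(double compressed_bytes, double uncompressed_bytes)
{
    return (1.0 - compressed_bytes / uncompressed_bytes) * 100.0;
}
```

Read this way, a factor of -3.93% corresponds to compressed data occupying 1.0393 times the original storage, and a factor of 22.46% to 0.7754 times the original storage. A compressed size of twice the original gives -100%.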

The cyclic delta can be computed very efficiently. It only requires one additional statement after the delta calculation:

    delta = code(i + 1) - code(i)

    cyclicDelta = delta - int(delta/nc2) * nc

where int() means truncation, nc represents the number of codes in the analog to digital conversion, and nc2 is nc/2.
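The two statements above translate almost directly into C, where integer division already truncates toward zero. The function name `cyclic_delta` is an assumption of this sketch. Evaluating the formula by hand for delta = -nc/2 gives +nc/2, one step outside the range from -nc/2 to nc/2 - 1. The sketch folds that single case back to -nc/2. That extra check is an addition of this example and is not taken from the original libraries.

```c
/* Cyclic delta between two ADC codes, for nc = 2^B codes.
 * Both codes are expected to lie in [0, nc - 1]. The result lies in
 * [-nc/2, nc/2 - 1] and is congruent to (next_code - prev_code) mod nc. */
int cyclic_delta(int prev_code, int next_code, int nc)
{
    int nc2 = nc / 2;
    int delta = next_code - prev_code;
    int cyc = delta - (delta / nc2) * nc;   /* C integer division truncates */
    if (cyc == nc2)                         /* only happens for delta == -nc2 */
        cyc -= nc;
    return cyc;
}
```

With a three bit resolution (nc = 8), the step from code 0 to code 7 gives a cyclic delta of -1 instead of 7, and the step from 7 to 0 gives 1 instead of -7.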

Also, the decompression process can be computed very efficiently. Sample recovery from the delta transformation is carried out by means of

    sample(i) = sample(i - 1) + delta

With cyclic delta, the statement in C language is

    s[i] = ((s[i] = s[i - 1] + cyclicDelta) >= 0) ? s[i] % nc : s[i] + nc;

where s is the array of samples and nc represents the number of codes in the analog to digital conversion.
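The recovery statement can be wrapped in a loop to rebuild a whole block of samples. The sketch below uses a temporary variable in place of the nested assignment, which should be equivalent. The function name `cyclic_delta_decode` and the convention that `s[0]` already holds the first sample are assumptions of this example.

```c
#include <stddef.h>

/* Rebuilds samples from cyclic deltas. s[0] must already hold the first
 * sample; cyc[i - 1] is the cyclic delta between s[i - 1] and s[i].
 * nc is the number of ADC codes (2^B). */
void cyclic_delta_decode(int *s, size_t n, const int *cyc, int nc)
{
    for (size_t i = 1; i < n; i++) {
        int v = s[i - 1] + cyc[i - 1];
        s[i] = (v >= 0) ? v % nc : v + nc;   /* wrap back into [0, nc - 1] */
    }
}
```

For nc = 8, a first sample of 1 and the cyclic deltas 2, -4, 1, -1, 3, the loop recovers the codes 1, 3, 7, 0, 7, 2. This is the sequence whose simple deltas are 2, 4, -7, 7, -5.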

To finish, the advantages of the cyclic delta over the delta transformation should be remarked:

• The number of cyclic delta symbols is equal to the number of ADC codes, instead of being almost twice that number.
• |cyclicDelta| ≤ |delta|. Big deltas are encoded with few bits.

5. Results on delta compression in fusion databases

This section shows average results on TJ-II and JET databases. The number of test waveforms was 2,913,438 and 62,567, respectively. Table 1 sums up the compression factors computed with the TJ-II, JET and MDSplus techniques.

Table 1
Average compression factors with different compaction techniques on TJ-II and JET databases

                                   TJ-II     JET
TJ-II techniques (cyclic delta)    69.87%    49.59%
JET technique (delta)              65.36%    56.37%
MDSplus technique (delta)          71.92%    No data

The MDSplus technique is a pure delayed method. It needs all samples before compression and therefore it can compute in advance the storage needed for delta compression. If a negative factor is obtained, data are not compressed. This implies that negative compression factors never appear. However, the TJ-II and JET techniques do not analyze the samples in advance and negative compressions may occur.

The most detailed studies have been performed with the TJ-II database and some additional results should be pointed out. Firstly, negative compression occurred in 1.14% of the TJ-II waveforms. Secondly, the worst case of negative compression implied the use of twice the storage space of the digital codes. Thirdly, there is not a single compression method that provides the best results in all cases. In fact, by choosing the best method for each waveform, the average compression factor for the TJ-II database would have been 72.72%. In the case of JET, the average factor would be 60.80%.

6. Real-time application of cyclic delta compression

Cyclic delta compression methods achieve high compaction rates with negligible computational effort. The cyclic delta values can be encoded and decoded in real-time. The code alphabet should be established beforehand. Several code alphabets can be used and they have to be known by the data receivers.

Real-time compression/decompression can be used for real-time inter-process communications, real-time data distribution and real-time data storage. Inter-process communication is related to network applications. In this case, data exchange should not be carried out delta by delta. Instead, compressed data should be divided into packets of deltas and then transmitted for a proper use of network resources.

Data distribution refers to the on-line delivery of experimental data during a shot. Typically, signal digitization could be carried out by means of standard data acquisition cards. These standard cards may incorporate neither enough memory for the whole discharge under long-pulse conditions nor real-time processing capabilities. In general, samples should be transferred into computer random access memory (RAM) as a previous step for any kind of processing. Again, the transfer is not performed sample by sample but in blocks.

Data storage is a particular case of data distribution.

Therefore, inter-process communication and data distribution need the management of data packets for an efficient use of computational resources. The packets can be compressed in real-time by using real-time cyclic delta techniques. The determinism in data capture and distribution is defined by the sum of three different times. Firstly, the time devoted to digitizing a packet, sending it to RAM and performing the cyclic delta compression. Secondly, the time spent in transmitting the compacted data. Finally, the time needed to decompress the information in order to be able to use the data. To speak about real-time, each specific application must take into account these factors with their associated constraints: measurement systems, data networks and processing computers.

Some tests were performed to measure the processing time for simultaneous data taking and compression during a continuous data acquisition. Several block sizes were tested. After acquiring the number of samples defined by the block size, the samples are transferred to RAM and then they are compressed with cyclic delta and one of the TJ-II encoding forms. Simultaneously, data digitization continues. To this end, a PXI data acquisition system was used. The controller was a Pentium III computer (750 MHz), running Windows 2000. Data acquisition was programmed with LabView 7.0 and NIDAQ V6.5 drivers. The acquisition card was a PXI 6070-E from National Instruments, which provides a maximum sampling rate of 1.25 Msamples/s. The input is a sinusoidal signal and processing times were determined with the profiling tool provided by LabView. Each data acquisition is performed during 4 min.

Table 2
Block size (×1024 samples), data acquisition plus compression time (ms), theoretic maximum sampling rate (Msamples/s) and measured sampling rate (Msamples/s)

Block size    1       2       4       8       16
Time          0.9     1.4     2       3.7     7.3
f_max         1.13    1.46    2.04    2.21    2.24
f_measured    1       1.25    1.25    1.25    1.25

Table 2 summarizes the results. Data acquisition plus compression time determines the maximum sampling rate for each block size. This takes into account that the compression of one block must have finished when the next block is ready for compaction. Because the maximum sampling rate of the acquisition card is 1.25 Msamples/s, only one problem appeared, with block sizes of 1k. They could not be digitized at 1.25 Ms/s and, in addition, the real sampling rate did not achieve the theoretic maximum value (1.13 Ms/s). This can be interpreted as a consequence of the small block size in comparison with the data transfer size of the NIDAQ driver for the E-6070 card. It is clear from the table that the software driver is optimized for block sizes greater than or equal to 4k. Therefore, data transfer must be carried out according to the optimized software driver.

It should be noted that the LabView profiler tool is a very intrusive method to measure times, so the real processing times are certainly smaller.
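The f_max row of Table 2 appears to follow from dividing the block size by the acquisition plus compression time. The C sketch below reproduces that arithmetic. The function name `max_sampling_rate_msps` is an assumption, and the reading of the table is inferred from the tabulated numbers rather than stated as a formula in the text.

```c
/* Theoretic maximum sampling rate, in Msamples/s, for a block of
 * block_k * 1024 samples whose acquisition plus compression takes
 * time_ms milliseconds: one block must be done before the next is ready. */
double max_sampling_rate_msps(int block_k, double time_ms)
{
    double samples = (double)block_k * 1024.0;
    return samples / (time_ms * 1000.0);   /* samples per microsecond */
}
```

For a 4k block processed in 2 ms this gives 4096 / 2000 = 2.048 Msamples/s, in line with the 2.04 entry of the table. For a 1k block processed in 0.9 ms it gives about 1.14 Msamples/s, which is below the 1.25 Msamples/s limit of the card.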

7. Conclusions

Data compression in fusion databases has been shown to be a very useful way to save storage. In particular, delta compression techniques are very fast and robust methods. The cyclic delta compression allows a higher compression factor than the standard delta compaction and it also maintains the computational efficiency. Cyclic delta compaction can be used under real-time requirements and long-pulse operation (ITER conditions) with standard data acquisition systems (even with not very fast processors and time-sharing operating systems) and 4th generation language environments.

Acknowledgements

The authors wish to thank K. Behler, A. Capel, J. Farthing, T. Fredian, J. Lister, H. Nakanishi, H. Okumura and W. Suttrop for their help and valuable comments. This work was partially funded by the Spanish Ministry of Education and Science under the Project No. ENE2004-07335.

References

[1] http://www-fusion.ciemat.es/New fusion/en/TJII/.
[2] http://www.lhd.nifs.ac.jp/en/.
[3] http://www.jet.efda.org/.
[4] http://www.mdsplus.org/.
[5] http://www.ipp.mpg.de/eng/for/projekte/asdex/.
[6] J. Vega, C. Cremy, E. Sánchez, A. Portas, S. Dormido, Encoding technique for a high data compaction in data bases of fusion devices, Rev. Sci. Instrum. 67 (12) (1996) 4154–4160.
[7] J. Vega, C. Cremy, E. Sánchez, A. Portas, J.A. Fabregas, R. Herrera, Data management in the TJ-II multilayer database, Fusion Eng. Des. 48 (2000) 69–75.
[8] E. Sánchez, A. Portas, A. Pereira, J. Vega, Applying a message oriented middleware architecture to the TJ-II remote participation system, Fusion Eng. Des. 81 (2006) 2063–2067.
[9] J. Vega, E. Sánchez, A. Portas, A. Pereira, A. Mollinedo, J.A. Munoz, et al., Overview of the TJ-II remote participation system, Fusion Eng. Des. 81 (2006) 2045–2050.
[10] T.W. Fredian, J.A. Stillerman, MDS/MIT high-speed data-acquisition and analysis software system, Rev. Sci. Instrum. 57 (8) (1986) 1907–1909.