Implementation of a Server Architecture for Secure Reconfiguration of Embedded Systems

ARPN JOURNAL OF SYSTEMS AND SOFTWARE VOL. 1 NO. 9 DECEMBER 2011


Yannick Verbelen∗, An Braeken∗, Serge Kubera∗, Abdellah Touhafi∗, Jo Vliegen† and Nele Mentens†

∗Erasmushogeschool Brussel, Brussels, Belgium
Email: {yannick.verbelen, an.braeken, serge.kubera, abdellah.touhafi}@ehb.be

†Katholieke Hogeschool Limburg, Hasselt, Belgium
Email: {jo.vliegen, nele.mentens}@khlim.be

Abstract—Field reconfigurable logic is increasingly integrated in both industrial and consumer applications. A need for secure reconfiguration techniques on these devices arises because live firmware updates are essential to guarantee the continuity of the application's performance. Ideally, a wide variety of reconfigurable devices in a range of applications should be configurable with suitable firmware from a central location, since outdated or wrong configuration data could cause irreversible damage to the device. At the same time, eavesdropping must be made infeasibly difficult to protect the intellectual property of the application provider.

This work proposes a software architecture for a server platform allowing secure bidirectional communication over TCP/IP with reconfigurable logic in the field. Moreover, a performance comparison between C# and Java is discussed for the different cryptographic algorithms applied in the application.

Index Terms—Server Architecture, Embedded System, FPGA, CRU


1 INTRODUCTION

The increased presence of reconfigurable logic devices such as Complex Programmable Logic Devices (CPLDs) and Field Programmable Gate Arrays (FPGAs) in secure applications creates the need for a mechanism to securely reconfigure these devices with a revised bit stream. In the project STRES (Secure Techniques for Remote reconfiguration of Embedded Systems), a complete solution is developed for secure remote reconfiguration of an FPGA-based embedded system by means of a central reconfiguration unit (CRU). This solution consists of three parts, as can be identified in Figure 1. The first part is the underlying communication protocol that ensures mutual authentication of client and server as well as data integrity and confidentiality. The second part is the software implementation of the CRU. The last part is a synthesizable VHDL core that can be integrated into any existing application's VHDL design. This core is developed with a focus on compactness and ease of integration. The latter property in particular implies that less attention must be given to reconfiguration during the design of the application, since this capability can be added to the application's design at release time.

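[Figure 1: block diagram. Server (CRU): User Application and the STRES core, which comprises SecurityAndSafety, Database and Postoffice. Client (FPGA): User Application and STRES Core. The two sides communicate through the STRES handshake protocol.]

Fig. 1. Structural model of FPGA and CRU in the STRES project.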

Since the VHDL code is fundamentally hardware independent (provided that enough reconfigurable space is available in the device) [3], only one hardware feature is required on the client side: a communication port to the CRU. Although technically any interface connectable to the reconfigurable device can be used, the wide availability of the Internet inspired the limitation of the STRES core communication capabilities to TCP/IP only. This eliminates the need to implement rarely used interfaces, thus allowing for a smaller overall design and potentially a cheaper end product, since logically smaller (and cheaper) reconfigurable devices can be used. The downside of this approach is the necessity to implement the complex TCP/IP communication stack; this issue falls outside the scope of this paper. The reader is referred to [2] for more details concerning the hardware implementation on the embedded system side.

Vol. 1 No. 9 December 2011, ISSN 2222-9833, ARPN Journal of Systems and Software
©2009-2011 AJSS Journal. All Rights Reserved. http://www.scientific-journals.org

On the server side (CRU), a software application is required to manage bit streams for different reconfigurable platforms, to keep track of the different clients in the field, and to organize communication with clients to transfer bit streams when necessary. By convention, in the STRES layout, clients always initiate bit stream update sequences, because regular firewall setups in secure environments normally allow outgoing connections (from client to CRU) but block incoming connections before they reach the secure network level where STRES enabled secure applications are typically operated. Essentially, the CRU listens for incoming connections, and the reconfiguration procedure is initiated as soon as an incoming client connection is received.

The paper is organized as follows: Section 2 explains the STRES protocol between server and client, while Section 3 discusses the server architecture. Section 4 provides implementation details, for which Section 5 presents a proof of concept. Finally, Section 6 concludes the paper.

2 STRES PROTOCOL

The communication between client and CRU as designed for STRES is based on a well established cryptographic protocol, enabling mutual authentication of client and server, and ensuring confidentiality and integrity of the transferred data. First the cryptographic protocol is introduced, followed by a discussion of various practical implementation aspects primarily aimed at improving the communication security.

2.1 Cryptographic Protocol

In order to increase the credibility of the system, a standardized cryptographic protocol was chosen for exchanging data. The Station-to-Station (STS) protocol, based on the classic Diffie-Hellman protocol, fulfills all the necessary security requirements. This protocol establishes a shared secret session key K. After the key agreement, K can be used for further communication based on symmetric key encryption.

The STS protocol applied in STRES is based on elliptic curve operations. Elliptic curve cryptography (ECC) relies on the ease of computing a point multiplication and the infeasibility of recovering the scalar multiplier from the original and product points [4], which creates a one-way function. Although the existence of one-way functions is still an open question and as of 2011 no mathematical proof of security has been published for elliptic curve cryptography, it is regarded as safe for the protection of information with a minimum key length of 384 bits [5]. For STRES, this bit length is extended to 512 bits [2]. Note that the choice for ECC was mainly driven by the fact that a compact implementation on the hardware side was needed. ECC requires smaller key sizes, less storage, less power, less memory, and often less bandwidth than other public key systems for an equivalent amount of security. These properties make ECC well suited for application in embedded systems.

[Figure 2: message sequence between the CRU (key pair (a, A), generator P) and the client (key pair (b, B), generator P). After the client reconnects to port y, the CRU chooses $k_1$ and computes $Q_1 = k_1 P$, while the client chooses $k_2$ and computes $Q_2 = k_2 P$ and $K = k_2 Q_1$; the CRU computes $K = k_1 Q_2$. Signed and encrypted messages are then exchanged for authentication, which completes the key exchange.]

Fig. 2. STRES cryptographic protocol for authentication and key exchange.

Next, the protocol is explained in more detail, with reference to Figure 2 for a schematic overview.

Both the CRU and the client's embedded system start with a generator $P$ and a key pair consisting of a private and a public key: $(a, A)$ for the CRU and $(b, B)$ for the client. When the client initiates the key agreement protocol, the server reacts by randomly choosing a number $k_1$ and executing the elliptic curve point multiplication $Q_1 = k_1 P$. The resulting point $Q_1$ has two coordinates $(Q_{1x}, Q_{1y})$, each 256 bits long. Likewise, the client chooses a random number $k_2$ and multiplies it with the same generator, resulting in $Q_2 = k_2 P$. Using $k_2$, the client can calculate $k_2 Q_1$, which equals the value $k_1 Q_2$ that will later be calculated by the CRU, since both are equal to $k_1 k_2 P$. This value, $K$, is the result of the STS protocol and can now be used to derive a session specific symmetric key. $K$ is a point on the elliptic curve with a coordinate pair $(K_x, K_y)$, each coordinate having a length of 256 bits. Therefore, it must first be reduced by means of a hash function before it can be used as symmetric key. The hash function halves the key length from 512 to 256 bits. The resulting string is then split into two parts. The 128 most significant bits, $K'$, are used as symmetric key for encryption. The 128 least significant bits are used as secret key for the message authentication code (MAC) to ensure the integrity of the message.
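The reduction of the shared point $K$ to the two 128-bit keys can be sketched as follows. This is a minimal Python sketch rather than the STRES implementation: the function name `derive_session_keys`, the big-endian encoding and the concatenation order $K_x \| K_y$ are assumptions, and SHA-256 is assumed as the hash function because it is the hash algorithm selected for the protocol.

```python
import hashlib


def derive_session_keys(kx: int, ky: int):
    """Hash the 512-bit point K = (Kx, Ky) down to 256 bits and split it in two.

    Assumptions (not taken from the paper): coordinates are encoded big-endian
    as 32 bytes each and concatenated as Kx || Ky before hashing.
    """
    point_bytes = kx.to_bytes(32, "big") + ky.to_bytes(32, "big")  # 512 bits
    digest = hashlib.sha256(point_bytes).digest()                  # 256 bits
    enc_key = digest[:16]   # 128 most significant bits: K' for encryption
    mac_key = digest[16:]   # 128 least significant bits: key for the MAC
    return enc_key, mac_key
```

Both sides would run the same derivation on their own copy of $K$, so no key material beyond $Q_1$ and $Q_2$ has to cross the network.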

However, to establish a legitimate connection from the CRU's point of view, $Q_2$ is combined with $Q_1$ and signed using the client's private key $b$. Signature generation is denoted by ς in Figure 2, while signature verification is symbolized by Σ. This signature is then encrypted (ε) with the symmetric key $K'$ along with the client's public key $B$ and the unique FPGA ID. In the next step, $K$ (and thus $K'$) is calculated on the CRU side and used to decrypt the message sent by the client. The signature can then be verified using $B$, and the same process repeats when the CRU echoes an encrypted signature of $Q_1$ and $Q_2$ back to the client along with its public key $A$. In the last step, the client decrypts and verifies this signature, after which both parties have agreed upon a session key for symmetric key cryptography and message authentication and have verified each other's identity.

The Advanced Encryption Standard (AES) was chosen as symmetric key algorithm because it is the standard encryption algorithm offering 128-bit security [6]. As hash algorithm, the Secure Hash Algorithm (SHA)-256 function was chosen, which is one of the few unbroken and well established hash functions with 128-bit security. Since most elliptic curve operations are readily available from the key agreement scheme, ECDSA was chosen as signature algorithm [7].

2.2 Practical security

The communication link itself is theoretically secure due to the difficulty of inverting the elliptic curve trapdoor operations. However, this concept is only effective on the condition that clients are able to access the CRU. If a client is unable to retrieve an updated bit stream from the CRU to reconfigure itself with, an attacker could be given the time to complete a malicious operation because the faulty bit stream cannot be patched. This implies that measures must be taken to prevent the CRU from going offline, getting overloaded or becoming inaccessible to the client by any other means [9]. The most significant threat not covered by the cryptographic protocol is a Denial of Service (DoS) attack, which attempts to consume all available server or client resources, rendering them unavailable to legitimate users. No proven countermeasure exists other than the installation of backup servers, high capacity links on ISP level, etc. Fortunately, the STRES mutual authentication system, in combination with the socket interface provided by the TCP/IP communication protocol, can help make DoS attacks harder. Note that other protocols specifically designed to offer increased resistance to DoS attacks can be found in the literature [10]. However, we have chosen to use a well established security protocol.

In order to make DoS attacks even more difficult, the communication flow was implemented using a simple port forwarding mechanism. In a classic server setup, any client might be allowed to connect to any of the $2^{16}$ logical TCP ports of the system. As long as neither side actively terminates the connection, it stays open and thus keeps the assigned server port in use. In the hypothetical case that $2^{16}$ malicious clients connect to that server and keep the connections open indefinitely, any incoming connection request from a genuine client will be refused, resulting in a DoS scenario. To prevent this critical security hazard, the STRES communication protocol allows clients to connect only to a single fixed server port, which can be chosen at random in the range $2^{10}$ to $2^{16} - 1$. This not only eliminates the risk of all available server ports being taken hostage almost uncontrollably, but is also a more economical setup from a resource point of view, since listening on every port would require $2^{16}$ active sockets, which is a waste of resources, especially for a client population size $\ll 2^{16}$.

Since probing the server for its active communication port would result in poor performance, the chosen port should preferably be coded into every client; the exposure of this information does not reduce security. Depending on the connection speed, a single client could occupy this port for longer than the time-out of other incoming requests. Rather than extending the time-out configuration on TCP/IP level, the communication protocol attempts to solve this problem structurally by forwarding incoming clients to other ports as soon as possible. When a client connection is received on the fixed port x, the CRU reacts by opening a socket on a randomly chosen free port and returning this port number y to the client. Next, the CRU terminates the connection on port x. To ensure maximum availability of the CRU, a new socket is opened on port x, which starts listening for other incoming connections, and the client is given the opportunity to connect to the issued port y. In this setup, only $n + 1 < 2^{16}$ sockets are active simultaneously at any time, with n being the number of connected (or connecting) clients. After reconnecting to port y, the cryptographic protocol is initiated.

Despite the more efficient resource management of the communication protocol thanks to the port forwarding mechanism, it is still possible to perform an effective DoS attack on this configuration. An attacker just needs to connect to port y, follow the protocol steps until authentication and then stall its execution. To decrease the effectiveness of this approach, a watchdog timer is implemented to limit the time allowed for successful authentication, with a countdown value that is a function of the network response time (ping).
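The port forwarding step and the watchdog can be illustrated with plain TCP sockets. The following Python sketch is only an approximation under stated assumptions: all function names are hypothetical, the port number y is sent as two big-endian bytes, the watchdog is a fixed socket time-out instead of a value derived from the ping time, and the listening socket on port x is kept open rather than reopened.

```python
import socket
import threading

WATCHDOG_SECONDS = 2.0  # assumed fixed value; the paper derives it from the ping time


def open_listener(port=0):
    """Listen on localhost; port 0 asks the OS for any free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))
    sock.listen()
    return sock


def serve_on_port_y(listener_y, received):
    """Serve one client on its private port y, bounded by the watchdog."""
    listener_y.settimeout(WATCHDOG_SECONDS)
    try:
        conn, _ = listener_y.accept()
        with conn:
            conn.settimeout(WATCHDOG_SECONDS)
            data = b""
            while True:                      # the handshake would run here
                chunk = conn.recv(64)
                if not chunk:
                    break
                data += chunk
            received.append(data)
    except socket.timeout:
        received.append(None)                # stalled client: release the port
    finally:
        listener_y.close()


def forward_one_client(listener_x, received):
    """Accept on fixed port x, announce a fresh port y, then drop the x connection."""
    conn, _ = listener_x.accept()
    listener_y = open_listener()             # listening before y is announced
    port_y = listener_y.getsockname()[1]
    worker = threading.Thread(target=serve_on_port_y, args=(listener_y, received))
    worker.start()
    with conn:
        conn.sendall(port_y.to_bytes(2, "big"))
    worker.join()                            # joined only so the demo ends cleanly


def demo():
    """Run one forwarding round trip on localhost and return what port y received."""
    received = []
    listener_x = open_listener()
    port_x = listener_x.getsockname()[1]
    server = threading.Thread(target=forward_one_client, args=(listener_x, received))
    server.start()
    with socket.create_connection(("127.0.0.1", port_x)) as first:
        raw = b""
        while len(raw) < 2:
            raw += first.recv(2 - len(raw))
        port_y = int.from_bytes(raw, "big")
    with socket.create_connection(("127.0.0.1", port_y)) as second:
        second.sendall(b"hello")
    server.join()
    listener_x.close()
    return received
```

Opening the socket on port y before announcing the number mirrors the ordering described above: the client can never reach a port that is not yet listening.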

3 SERVER ARCHITECTURE

3.1 Namespace overview

To avoid excessive entanglement of the application with the STRES CRU core itself, the top level is split into two categories of namespaces. The application namespace contains all code unique to a single specific application. The STRES core namespaces contain routines for cryptography, database management, and server operations. They are called the SecurityAndSafety, Database, and Postoffice namespaces respectively (see Fig. 1). This organization removes the need to edit the STRES core code for individual applications, thus directly avoiding the potential introduction of security leaks. Secondly, a transparent implementation of the STRES core also allows for quicker and cheaper integration of the STRES system into existing or new applications.

3.2 Cryptographic Libraries

A second measure to increase the reliability of the STRES core is to reuse as much existing code as possible, in the form of cryptographic libraries with proven integrity. Virtually all currently available platforms implement basic cryptographic functionality to some degree. However, for the implementation of client-server applications, C# and Java stand out as particularly strong in this field. C# is backed by the .NET Framework, an extensive library containing a multitude of cryptographic algorithms. Java supports JCE (Java Cryptography Extension) and JCA (Java Cryptography Architecture) as a source of basic cryptographic functionality [14]. Unfortunately, neither is sufficient to implement all the cryptographic functionality defined in the STRES protocol, hence an additional library is required.

Fortunately, the Bouncy Castle cryptographic library, which implements all non-licensed cryptographic algorithms known to date (excluding the IDEA algorithm), is available for both the C# and Java platforms [13]. Moreover, it is a complete open source reference library. An important difference between the two platforms is that Bouncy Castle for Java behaves as a black box in code, while for C# the source code is much easier to modify due to its better availability. It must also be noted that a larger online knowledge base is available for the use of the Bouncy Castle library with Java than for C#. This situation can mainly be explained by the fact that a relatively higher fraction of applications implement connections between two Java platforms, rather than between a Java platform and low level hardware as is the case for the STRES layout.

After careful comparison of both platforms against a list of all features required for the implementation of the STRES CRU, a few flaws in Java were found that turned out to be prohibitive for using Java for the development of the STRES CRU core. The most important problem is the lack of unsigned integers in Java [15], which makes TCP/IP communication with clients significantly more complex. Note that this problem can be solved by means of the JBits library. However, since the development of JBits is discontinued and recent high end FPGAs are unsupported (support only extends up to Xilinx Virtex 2), no further attempts were made to integrate JBits in the FPGA communication code.

Consequently, it was decided to implement the CRU core, the database routines and the front end in C#. To still allow a comparison with respect to speed and code complexity, multiple CRU classes were implemented in both languages, and in each of these cases C# outperformed Java. Section 5 provides details with respect to this comparison.

4 IMPLEMENTATION

As previously mentioned, the STRES core is an independent unit separated from any application code, acting like a black box and providing a secure shell around the different building blocks composing it. It consists of a namespace bundling all cryptography and safety routines, called the SecurityAndSafety namespace, another one managing the database connections, called the Database namespace, and a third one responsible for network communications, called the Postoffice namespace. Since many implementation aspects of the CRU are rather trivial for experienced software architects, only the most important implementation aspects are discussed.

4.1 SecurityAndSafety namespace

The SecurityAndSafety namespace has a double function. On the one hand, it groups together the classes implementing the STRES specific cryptographic and security code as well as their auxiliary routines. On the other hand, it provides an entry point for externally called cryptographic routines from the Bouncy Castle library. Although these external cryptographic calls imply a security threat if the library is altered, this threat is considered negligible since full access to the server file system would be required. Moreover, it is possible to fully integrate the Bouncy Castle routines necessary for STRES in the STRES core, since the complete C# source of Bouncy Castle is publicly available [13]. The Safety part of the namespace's name reflects the presence of less critical routines such as CRC-checking of bit files and verification of bit file platform identifiers to prevent the transmission of incompatible bit files to clients.
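The two safety checks mentioned above can be sketched in a few lines of Python. The function name `may_transmit`, the use of CRC-32 and the representation of platform identifiers as plain strings are assumptions made for illustration; the paper does not specify the CRC variant or the identifier format.

```python
import zlib


def may_transmit(bitfile: bytes, expected_crc: int,
                 bitfile_platform: str, client_platform: str) -> bool:
    """Allow transmission only for an intact bit file built for the client's platform.

    Assumptions: CRC-32 (via zlib) as the checksum and string platform identifiers.
    """
    intact = zlib.crc32(bitfile) == expected_crc
    compatible = bitfile_platform == client_platform
    return intact and compatible
```

A CRC only guards against accidental corruption; protection against deliberate modification is left to the cryptographic protocol.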

4.1.1 Protocol implementation

A first important class in the SecurityAndSafety namespace is the STRESProtocol class, which implements the IHandshakeProtocol interface specifying the sequence of steps required to authenticate a client and agree on a session key. In other words, STRESProtocol implements the cryptographic protocol described in Section 2.1. An implementation of the IHandshakeProtocol interface can be seen as a state machine calling cryptographic operations in a protocol-specific order. Requests are relayed to the Cryptographer class, which acts as a proxy for the four main algorithms used by the cryptographic protocol: Diffie-Hellman Elliptic Curve key agreement (DHEC), Elliptic Curve DSA (ECDSA), AES-128 and SHA-256. Security settings for encryption are part of the Cryptographer configuration stack and are passed down to the underlying algorithm implementations. The native availability of a SHA implementation in the .NET Framework, for instance, allows a selection between the SHA digest algorithm in .NET's Security namespace and the digest algorithm of Bouncy Castle. Furthermore, both Bouncy Castle and C# provide a (pseudo)random number generator specifically developed for cryptography with a period of $2^{19937} - 1$ (a Mersenne prime), considered sufficiently secure for application in the STRES cryptographic protocol. Its output can subsequently be used in the elliptic curve point multiplication, the previously discussed one-way operation on which the security of DHEC is based.

Entry points for both the .NET and Bouncy Castle implementations have been made accessible from the Cryptographer and are selectable where possible. They are primarily executed in parallel as a means of fast verification of the end results: if both outputs match for the same input, the result is accepted; otherwise it is rejected and logged. This parallel processing and comparison makes tampering with cryptographic routines significantly more challenging for attackers. The Cryptographer handles this parallelization transparently and only returns a value to the calling routine, in most cases the protocol state machine, when both values match.
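The accept-only-on-match behaviour can be sketched as below. This is a hedged Python illustration, not the Cryptographer class: `cross_checked`, `ResultMismatch` and the two digest functions are hypothetical names, and both stand-in digests are backed by the same `hashlib` module, whereas the scheme described above relies on two independent libraries.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class ResultMismatch(Exception):
    """Raised when two implementations disagree on the same input."""


def cross_checked(implementations, data):
    """Run every implementation on `data` in parallel; return a result only if all agree."""
    with ThreadPoolExecutor(max_workers=len(implementations)) as pool:
        results = list(pool.map(lambda impl: impl(data), implementations))
    if any(result != results[0] for result in results[1:]):
        raise ResultMismatch("implementations disagree; result rejected")
    return results[0]


def digest_primary(data):
    # Stand-in for the first library's SHA-256 implementation.
    return hashlib.sha256(data).digest()


def digest_secondary(data):
    # Stand-in for the second library; a real setup would use independent code.
    hasher = hashlib.new("sha256")
    hasher.update(data)
    return hasher.digest()
```

A caller would invoke `cross_checked([digest_primary, digest_secondary], message)` and treat a `ResultMismatch` as an event to log.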

Since every cryptographic protocol implements the IHandshakeProtocol interface and almost any cryptographic algorithm is directly available in the .NET Framework, Bouncy Castle or both, new protocols can easily be constructed. Actions as simple as rearranging the steps of existing inherited protocols can slow down attackers, while the CRU's cryptographic strength can continuously be updated throughout the lifetime of connected products by inserting the newest cryptographic algorithms into the handshake mechanism.

4.1.2 Key and data handling

Data blocks and keys are 128 and 256 bits in length respectively, while the largest unsigned integer in C# only has a length of 64 bits. Fortunately, the BigInteger structure extends the 64-bit integer base type and allows for the creation of arbitrarily sized integers. The BigInteger structure provides basic arithmetic operations on the integer value as a whole, while also allowing bit level operations such as XOR of two BigIntegers of equal length, bit inversions, etc. Both the .NET Framework and the Bouncy Castle library provide an implementation of the BigInteger structure with comparable functionality. While the value of a BigInteger can be set from a multitude of base types, including a string with known radix (allowing hexadecimal user input, for example), a byte vector is preferable in the STRES application context since bytes are directly ready for transmission to the client. Likewise, BigIntegers representing keys can be constructed directly from incoming byte data. BigIntegers play a double role in the STRES core: they represent AES data blocks and hold elliptic curve point coordinates.
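The conversion between byte vectors and large integers can be illustrated with Python's built-in arbitrary precision integers, which play the role of BigInteger here. The helper names and the big-endian byte order are assumptions made for this sketch.

```python
def bytes_to_int(data: bytes) -> int:
    """Interpret a received byte vector as one large unsigned integer (big-endian)."""
    return int.from_bytes(data, "big")


def int_to_bytes(value: int, length: int) -> bytes:
    """Serialize a large integer to a fixed-length byte vector for transmission."""
    return value.to_bytes(length, "big")


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two blocks of equal length, e.g. two 128-bit AES data blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return int_to_bytes(bytes_to_int(a) ^ bytes_to_int(b), len(a))
```

Keeping a fixed length on serialization matters because leading zero bytes would otherwise be lost when a value is converted back to bytes.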

4.1.3 Elliptic Curve implementation

As with many trapdoor functions in key agreement schemes, elliptic curve operations require a set of parameters for initialization [5]. For elliptic curve cryptography this includes, but is not limited to, the definition of the curve itself, which takes the form of the equation $y^2 = x^3 + ax + b$, parameterized by the variables $a$ and $b$. Secondly, a field prime is needed, and care must be taken when choosing it since the security of later elliptic curve operations will mainly depend on this parameter. Hence a large number of primes have been constructed historically, many of which are described in NIST publications [8] and recommended for use with elliptic curve cryptography in X9.62 and FIPS PUB 186-2. The curve selected for the STRES cryptographic protocol is P-256 with prime $p = 2^{256} - 2^{224} + 2^{192} + 2^{96} - 1$.

Fortunately, Bouncy Castle features a list of NIST recommended curves in the NistNamedCurves class, providing the GetByName function to retrieve all necessary parameters of P-256 as well as many others, thus eliminating the need to hard code the prime and the a and b parameters directly in the Cryptographer class.

The computation of the elliptic curve point multiplications is implemented using the shift-and-add algorithm. To multiply a point P by a scalar k, the algorithm loops through the bits of k, starting from the least significant bit, and adds the current value of the point to a zero-initialized accumulator when the given bit of k is 1. After each cycle, regardless of the adding step, the point is doubled, which is the equivalent of shifting an integer to the left by one bit. The point and the accumulator are both elliptic curve points, thus requiring more advanced arithmetic operations than those used for regular integers. The Bouncy Castle library provides the ECPoint class for this purpose, whose instances represent points on elliptic curves. ECPoint implements an Add function, which can be used to add another ECPoint to a given ECPoint object, and a Twice function for doubling the value of a point using the predefined constant points ECPoint.Zero, ECPoint.One and ECPoint.Two. For the exact implementation of the Add and Twice functions, the reader is directed to the published Bouncy Castle source.
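The shift-and-add loop can be illustrated on a deliberately tiny curve. The Python sketch below uses the textbook toy curve $y^2 = x^3 + 2x + 2$ over the field of 17 elements with generator (5, 1), which is an assumption chosen for readability and offers no security; STRES uses P-256 and the Bouncy Castle ECPoint class instead. The functions `add` and `multiply` are hypothetical stand-ins for Add, Twice and the multiplication routine.

```python
# Toy parameters (assumption, for illustration only): y^2 = x^3 + 2x + 2 over GF(17).
FIELD_P, CURVE_A, CURVE_B = 17, 2, 2
GENERATOR = (5, 1)
INFINITY = None  # the zero point (neutral element)


def add(p, q):
    """Add two curve points; doubling is the special case p == q."""
    if p is INFINITY:
        return q
    if q is INFINITY:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % FIELD_P == 0:
        return INFINITY                       # p + (-p) = zero point
    if p == q:
        slope = (3 * x1 * x1 + CURVE_A) * pow(2 * y1, -1, FIELD_P)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % FIELD_P, -1, FIELD_P)
    slope %= FIELD_P
    x3 = (slope * slope - x1 - x2) % FIELD_P
    y3 = (slope * (x1 - x3) - y1) % FIELD_P
    return (x3, y3)


def multiply(k, point):
    """Shift-and-add: scan the bits of k from the least significant bit upwards."""
    accumulator = INFINITY
    addend = point
    while k:
        if k & 1:
            accumulator = add(accumulator, addend)  # add when the bit is 1
        addend = add(addend, addend)                # double in every cycle
        k >>= 1
    return accumulator
```

With this helper, `multiply(3, multiply(7, GENERATOR))` and `multiply(7, multiply(3, GENERATOR))` give the same point, which is the property the key agreement relies on. The modular inverse via `pow(x, -1, m)` requires Python 3.8 or later.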

While it is possible to retrieve the x and y coordinates from ECPoint objects, which avoids redundant serialization of the objects for transmission, the inverse operation is less straightforward. For example, the ECPoint class does not have a public constructor taking x and y coordinates as arguments. The most important reason for this complication is that an ECPoint object capable of performing the arithmetic operations described above cannot be constructed solely from the x and y coordinates: information about the curve on which the point is situated is also required. As a logical consequence, arbitrary points built from coordinates are created by a point generator integrated in the ECCurve class, which holds the prime and both the a and b parameters. Its function CreatePoint is essential for converting point coordinates received from the client ($Q_2$) to an ECPoint object which can be used for calculations.
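Why a point cannot be rebuilt from coordinates alone can be shown with a small sketch in which the curve object creates the point. The `Curve` class and its `create_point` method below are hypothetical Python stand-ins, not the Bouncy Castle API, and the explicit on-curve check is an assumption added for illustration; the toy parameters are again far too small for real use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Curve:
    """Curve y^2 = x^3 + a*x + b over the prime field GF(p)."""
    p: int
    a: int
    b: int

    def contains(self, x: int, y: int) -> bool:
        return (y * y - (x ** 3 + self.a * x + self.b)) % self.p == 0

    def create_point(self, x: int, y: int):
        """Build a point from received coordinates, rejecting values off the curve."""
        in_range = 0 <= x < self.p and 0 <= y < self.p
        if not in_range or not self.contains(x, y):
            raise ValueError("coordinates do not describe a point on this curve")
        return (x, y)


TOY_CURVE = Curve(p=17, a=2, b=2)  # assumed toy parameters, not P-256
```

Rejecting coordinates that do not satisfy the curve equation is a common precaution for values received over the network.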

4.2 Postoffice namespace

Since any server application must be capable of handling several connections simultaneously, none of which may stall the server, a multithreaded code structure is necessary. Therefore, in the STRES Postoffice namespace, the ClientLink thread is responsible for managing the bidirectional connection with a single client FPGA. At any given time, as many ClientLink threads are running as there are FPGAs connected to the server. Every active ClientLink thread consumes a single logical port on the host, which is used both for transmitting data (more specifically bit streams) to the client FPGA and for receiving commands from it. Routines in a ClientLink instance are encapsulated within the running thread and are only accessible in the context of the specific associated client FPGA. Depending on the state of the ClientLink as defined in the LinkState enumeration, additional non-shared code can be retrieved as object instances. For example, the handshake routine is not needed before a connection with the client is established and the Authorizing state is reached in the state diagram. The isolation of state specific routines contributes to saving server resources, as obsolete object instances can be disposed of without severing the ClientLink itself (e.g. when the state diagram reaches the state Authenticated, the handshake routine is no longer needed and can be safely removed from the working set).

[Figure 3: diagram showing the elements LinkMaster, PortMonitor, ClientLink, Handshake, Transmission, Watchdog and Events.]

Fig. 3. Software layout of the Postoffice namespace.

For a factory pattern to operate efficiently, a supervising monitor is mandatory. The LinkMaster class serves this purpose: it continuously listens on a fixed port for incoming connections and negotiates a port number to which the client can be delegated. When an incoming connection is received, the primary task of the LinkMaster is to spawn a new ClientLink instance by calling its constructor, in accordance with the factory pattern. In the ClientLink constructor, a port number is requested from the system's network stack and returned to the LinkMaster using a PortMonitor object. This mutex locks the port exclusively for the associated ClientLink object, and as a dependent resource the LinkMaster is stalled until a free port number can be obtained. The LinkMaster can then retrieve the reserved port number from the PortMonitor and launch the ClientLink thread before sending the port number to the client FPGA. Although it creates a small overhead, this setup ensures that the server will be listening on the reserved port before the FPGA can attempt to establish a connection there, thus preventing a possible deadlock situation.

Immediately upon passing the port mutex for a specific port number to its corresponding ClientLink thread, the LinkMaster starts an independent watchdog timer to guard against irregular ClientLink behavior. This timer must be actively reset by the ClientLink by progressing through its state diagram; otherwise a time-out is generated, causing the LinkMaster to terminate the ClientLink thread. The most critical goal of the watchdog timer is to reduce the effectiveness of DoS attacks by releasing resources on the server side that are illegitimately occupied. Without the watchdog timer, a rogue client could for example continuously establish connections, thus reserving port numbers and spawning ClientLink threads, but stall them by not initiating the STRES authentication procedure. With the watchdog in place, however, the timer expires at LinkMaster level regardless of whether the ClientLink is stalled, and triggers the necessary events to terminate the thread and recover its resources.

Since communication between ClientLink and LinkMaster happens asynchronously, multiple events are more efficient than passing a set of conditions through a single notification event. The events exposed by the LinkMaster are ConnectionAccepted, ConnectionTerminated and ConnectionAuthenticated. The interval after which a ConnectionTerminated event is triggered, unless a ConnectionAccepted or ConnectionAuthenticated event is fired by the ClientLink thread first, can be configured as a server setting and is vital for server performance optimization. The time-out period will be proportional to the available server resources, but will also be a function of the expected network delays between client and server. Figure 3 displays the layout of the Postoffice namespace, as explained above.
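A resettable watchdog of the kind described above can be sketched with a timer thread. The `Watchdog` class below is a hypothetical Python illustration, not the STRES code: the callback passed as `on_expire` stands in for raising the ConnectionTerminated event, and `reset` stands in for the ClientLink progressing through its state diagram.

```python
import threading


class Watchdog:
    """Call `on_expire` unless `reset` or `stop` is invoked within `timeout` seconds."""

    def __init__(self, timeout, on_expire):
        self._timeout = timeout
        self._on_expire = on_expire
        self._timer = None
        self._lock = threading.Lock()

    def reset(self):
        """(Re)start the countdown; called on every state transition."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout, self._on_expire)
            self._timer.daemon = True
            self._timer.start()

    start = reset  # starting is the same as the first reset

    def stop(self):
        """Cancel the countdown, e.g. once the connection ends normally."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

Because the timer runs in its own thread, it expires even when the supervised thread is blocked, which matches the requirement that the watchdog works regardless of whether the ClientLink is stalled.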

4.3 Database namespace

As a data driven application, the CRU needs a repository for both descriptive information about the client (such as its location, IP address, platform, current bit file version, etc.) and the bit files themselves. Although a bit file is also a property of the entity Client from the point of view of relational database design, it was chosen to exclude the bit files from management by the Database Management System (DBMS) (PostgreSQL was chosen for STRES). Because a bit file contains all the data needed to systematically reconfigure the client FPGA regardless of its occupancy ratio (i.e. how much logic is actually used by the design), its size typically exceeds that of the other client properties combined by a factor of $10^4$ to $10^6$. Saving the bit files directly in the file system instead requires no overhead, since this information already has the nature of a file. This allows a reduction of the size of the database by at least a factor of $10^4$. Since bit file synthesizers natively deposit their output in the host file system, it also avoids the superfluous step of inserting or updating bit files in the database.

The primary key uniquely identifying the clients in the database is the FPGA ID. This ID number is retrieved in the mutual authentication handshake, during the initial connection phase of the client. It is then used by the CRU to retrieve all of the client's properties as well as the absolute file path to the most recent bit file for that particular client. To save time, the bit file is loaded from disk as soon as the FPGA ID is known to the CRU. It is unloaded by the LinkMaster when the ClientLink requests resource recovery by firing the ConnectionTerminated event.
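The split between client metadata in the database and bit files on disk can be sketched as follows. This Python example uses an in-memory SQLite database purely as a stand-in for PostgreSQL; the table layout, column names, function names and sample values are all assumptions made for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path

SCHEMA = """
CREATE TABLE client (
    fpga_id      TEXT PRIMARY KEY,  -- unique FPGA ID obtained in the handshake
    platform     TEXT NOT NULL,
    ip_address   TEXT,
    bitfile_path TEXT NOT NULL      -- absolute path; the bit file itself stays on disk
)
"""


def open_registry(path=":memory:"):
    """Open the client registry (SQLite here, PostgreSQL in STRES)."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def register_client(db, fpga_id, platform, ip_address, bitfile_path):
    db.execute("INSERT INTO client VALUES (?, ?, ?, ?)",
               (fpga_id, platform, ip_address, str(bitfile_path)))
    db.commit()


def load_bitfile(db, fpga_id):
    """Look up the bit file path by FPGA ID and read the file from the file system."""
    row = db.execute("SELECT bitfile_path FROM client WHERE fpga_id = ?",
                     (fpga_id,)).fetchone()
    if row is None:
        raise KeyError(f"unknown FPGA ID: {fpga_id}")
    return Path(row[0]).read_bytes()


def demo():
    """Register one hypothetical client and load its bit file."""
    with tempfile.TemporaryDirectory() as folder:
        bitfile = Path(folder) / "design_v2.bit"
        bitfile.write_bytes(b"\x00\x09\x0f\xf0")  # placeholder content
        db = open_registry()
        register_client(db, "FPGA-0001", "platform-a", "192.0.2.10", bitfile.resolve())
        data = load_bitfile(db, "FPGA-0001")
        db.close()
        return data
```

Only a short path string is stored per client, so the database stays small regardless of the size of the bit files.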

5 PROOF OF CONCEPT

To verify the operation of the STRES reconfiguration system, a test case requiring most of the STRES core elements was chosen, sized to the quantity of reconfigurable logic available on an average high-end FPGA system of 2010 (Xilinx Virtex 5, Spartan 6 or equivalent FPGA classes). The selected user application includes intensive image processing on the client side (which calls for dedicated FPGA hardware to accelerate this operation) and secure transmission of the data extracted from the images to the CRU.

The application was demonstrated on January 27, 2011 as part of the concluding lectures of the STRES project. The reconfiguration capabilities of the STRES system were proven by the live, real-time reconfiguration of the image processing unit, adding background subtraction functionality to the design. All tests were performed on a Xilinx SP605 FPGA test platform. Comparison of the independent computation results of the STRES core and the Magma computer algebra system [16][17] on the same input data reveals a perfect match for all cryptographic levels. Magma was used extensively to verify the correct operation of the Cryptographer class in the early stages of STRES development.

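[Figure 4: plot of f(n) against execution time t (ms) for ECDH key agreement and ECDSA signature (generation + check), each shown for C# and Java; values marked on the time axis: 223, 279, 295 and 326 ms.]

Fig. 4. Comparison of elliptic curve benchmarking in C# and Java on Intel P7450 2.13 GHz CPU.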

In order to give an idea of the efficiency and complexity of the code in both C# and Java, two important metrics are compared between the two languages: speed and lines of code (LOC).

5.1 Speed

The speed is computed over 250 independent runs of several cryptographic algorithms that were implemented in both Java and C#.

A first observation is the difference in execution speed between C# and Java. On average, C# turned out to be 25 % faster for the ECDH key agreement benchmarks and around 20 % faster for ECDSA signature generation and verification. Interestingly, a wide range of measurements can be observed when running the Java benchmarks, even on identical systems (see Fig. 4). The C# results are tightly concentrated around a central value, whereas the Java results are spread out. This phenomenon exists for both elliptic curve operations, and currently no explanation can be given for it other than the inaccurate timing functionality of the Java platform for small time spans. It is essential to note here that both Java and C# compile into managed code, and hence cannot execute real-time operations. The speed advantage of C# seems to persist when benchmarking on other hardware, though again wide variations are possible (Figure 4 shows benchmarking results on an Intel P7450 @ 2.13 GHz). From a comparison of our measurements with speed benchmark results in the literature, it can be concluded that the speed difference between Java and C# in this particular setup can be attributed entirely to the additional delay that Java incurs by creating key and parameter objects from data bytes, a step that is not needed in C#.
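The Java side of such an ECDH measurement can be sketched as follows, including the step of rebuilding a key object from received bytes to which the text attributes the extra delay. This is a hedged illustration using only the standard JCA classes: the class `EcdhBenchmark`, the curve `secp256r1` and the use of `System.nanoTime` are assumptions, not the benchmark code used for Fig. 4.

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.spec.ECGenParameterSpec;
import java.security.spec.X509EncodedKeySpec;
import javax.crypto.KeyAgreement;

final class EcdhBenchmark {

    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
        generator.initialize(new ECGenParameterSpec("secp256r1")); // assumed curve
        return generator.generateKeyPair();
    }

    /** Rebuilds the peer's key object from its encoded bytes, then derives the shared secret. */
    static byte[] agree(KeyPair own, byte[] peerPublicEncoded) throws GeneralSecurityException {
        PublicKey peer = KeyFactory.getInstance("EC")
                .generatePublic(new X509EncodedKeySpec(peerPublicEncoded));
        KeyAgreement agreement = KeyAgreement.getInstance("ECDH");
        agreement.init(own.getPrivate());
        agreement.doPhase(peer, true);
        return agreement.generateSecret();
    }

    /** Returns {mean, standard deviation} of the key agreement time in milliseconds. */
    static double[] time(int runs) throws GeneralSecurityException {
        KeyPair own = newKeyPair();
        byte[] peerPublic = newKeyPair().getPublic().getEncoded();
        double sum = 0;
        double sumOfSquares = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            agree(own, peerPublic);
            double millis = (System.nanoTime() - start) / 1e6;
            sum += millis;
            sumOfSquares += millis * millis;
        }
        double mean = sum / runs;
        double variance = Math.max(0, sumOfSquares / runs - mean * mean);
        return new double[] { mean, Math.sqrt(variance) };
    }
}
```

Calling `EcdhBenchmark.time(250)` would mirror the number of runs mentioned in the text; reporting the standard deviation next to the mean makes the spread discussed above visible.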

Secondly, both C# and Java encrypt and decrypt data blocks faster than they compute elliptic curve operations: 0.6 ms on average for C# and 0.9 ms on average for Java, a 50 % performance difference. However, values around 0.7 ms have also been observed for Java in this test, which supports the assumption that the real performance difference is rather negligible and that the wide variations are indeed caused by inaccurate time measurements. All data was measured when running a sequential encryption and decryption pass on a data chunk of 1024 bits, equal to the chunk size used to transfer bit files in STRES (10,000 passes were measured at once), using cipher-block chaining (CBC) mode with IV = 0. It is possible that testing larger data chunks would uncover completely different trends, but these tests were not performed since they do not contribute to the performance analysis of the STRES cryptographic framework. The AES symmetric key algorithm was implemented in both C# and Java, with and without the help of Bouncy Castle. Minor speed differences were observed, but these were found to be statistically insignificant and will not be discussed further. It is assumed that Bouncy Castle, JCE/JCA and the .NET Framework all use a very similar implementation of a highly optimized AES algorithm because of its wide usage. It is however important to mention that algorithm access in Java, e.g. the application of Policy Files, does cause a small delay on the first request. Since this delay is spread over 250 sequential runs, it passes unnoticed in the benchmarking tests.
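One encryption and decryption pass as described above can be sketched with the standard JCA classes. The class `AesChunkPass` is a hypothetical name, and the 128-bit key length is an assumption since the text does not state it here; the 1024-bit chunk, CBC mode and all-zero IV follow the text.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class AesChunkPass {
    static final int CHUNK_BYTES = 128; // 1024 bits, the chunk size named in the text

    private static byte[] run(int mode, byte[] key, byte[] input) throws GeneralSecurityException {
        // 128 bytes is a multiple of the 16-byte AES block, so no padding is needed.
        Cipher cipher = Cipher.getInstance("AES/CBC/NoPadding");
        // All-zero IV as in the benchmark setup; a fixed IV is unsuitable for real traffic.
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(new byte[16]));
        return cipher.doFinal(input);
    }

    static byte[] encrypt(byte[] key, byte[] chunk) throws GeneralSecurityException {
        return run(Cipher.ENCRYPT_MODE, key, chunk);
    }

    static byte[] decrypt(byte[] key, byte[] chunk) throws GeneralSecurityException {
        return run(Cipher.DECRYPT_MODE, key, chunk);
    }

    /** Mean duration in milliseconds of one encrypt plus decrypt pass, timed over all passes at once. */
    static double meanPassMillis(byte[] key, byte[] chunk, int passes) throws GeneralSecurityException {
        long start = System.nanoTime();
        for (int i = 0; i < passes; i++) {
            decrypt(key, encrypt(key, chunk));
        }
        return (System.nanoTime() - start) / 1e6 / passes;
    }
}
```

Timing the whole batch and dividing by the number of passes, as the text does with 10,000 passes, reduces the influence of timer resolution on very short operations.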

5.2 LOC

Finally, the complexity of the code can be measured using the LOC parameter, although this is obviously highly dependent on the coding style of the software developer. However, since the Java and C# code were both developed by the same author in this case, there is a reasonable foundation for comparison. A function for AES encryption in Java, for example, requires around 50 LOC using JCA/JCE, while the same can be done in C# using the .NET Framework in less than 10 LOC. Most other routines are much harder to compare due to Java's black box nature when it comes to cryptography, as well as the addition of code in C# to import and export keys as byte vectors (the latter being very hard to implement in Java). Generally, it can be concluded that a routine with similar functionality will require more lines of code in Java than in C# because of initialization (the GetInstance call to retrieve an algorithm from a security provider, for example) and other overhead such as the registration of security providers and the policy files themselves.
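The initialization step mentioned above can be illustrated with a small JCA routine. This sketch uses SHA-256 through `MessageDigest.getInstance`; the class `DigestLookup` and its method names are hypothetical and it is not taken from the STRES code base.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestLookup {

    /** Retrieves SHA-256 from the installed security providers and hashes the text. */
    static String sha256Hex(String text) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** Name of the provider that getInstance selected for the given digest algorithm. */
    static String providerOf(String algorithm) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance(algorithm).getProvider().getName();
    }
}
```

Even here the algorithm lookup, the checked exception and the manual hex encoding account for most of the lines, which is the kind of overhead the comparison refers to.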

6 CONCLUSION

A server architecture implemented in C# was demonstrated as a functioning solution against interception of reconfiguration bit streams for embedded systems. Elliptic curve key agreement and DSA schemes, together with AES and SHA-256 cryptographic routines from the Bouncy Castle library as well as the native .NET Framework Security namespace, have been used successfully to implement a secure connection between server and FPGA client for exchanging both data and reconfiguration bit streams.

ACKNOWLEDGMENT

The STRES project has been supported financially by the IWT - Flemish Agency for Innovation by Science and Technology under Tetra.

REFERENCES

[1] Braeken, A., Kubera, S., Trouillez, F., Touhafi, A., Mentens, N., Vliegen, J., Secure FPGA Technologies and Techniques, Proceedings of Field Programmable Logic and Applications, 2009, eds. M. Danek, pp. 560-563, 2009.

[2] Vliegen, J., Mentens, N., Genoe, J., Braeken, A., Kubera, S., Touhafi, A., Verbauwhede, I., A compact FPGA-based architecture for elliptic curve cryptography over prime fields, 21st IEEE International Conference on Application-specific Systems Architectures and Processors, pp. 313-316, 2010.

[3] BLACK, David, An Application of VHDL-Based Hardware/Software Codesign, TRW Space and Electronics Group, 1996.

[4] SILVERMAN, J. H., The Arithmetic of Elliptic Curves, Springer Verlag, Berlin-Heidelberg-New York, 1986.

[5] Fact Sheet NSA Suite B Cryptography, National Security Agency, http://www.nsa.gov/ia/programs/suiteb_cryptography/index.shtml, retrieved March 31, 2011.

[6] RIJMEN, Vincent, Practical-Titled Attack on AES-128 Using Chosen-Text Relations, 2010.

[7] BROWN, Daniel R. L., The Exact Security of ECDSA, Advances in Elliptic Curve Cryptography, 2000.

[8] Recommended Elliptic Curves for Federal Government Use, July 1999, http://csrc.nist.gov/groups/ST/toolkit/documents/dss/NISTReCur.pdf, retrieved April 6, 2011.

[9] SCHUBA, C., Analysis of a Denial of Service Attack on TCP, Proceedings of the 1997 IEEE Symposium on Security and Privacy.




[10] HIROSE, S., Enhancing the Resistance of a Provably Secure Key Agreement Protocol to a Denial-of-Service Attack, Lecture Notes in Computer Science 1726, Springer-Verlag, Berlin, pp. 169-182, November 1999.

[11] DYKES, S. G., An Empirical Evaluation of Client-side Server Selection Algorithms, 19th Annual Joint Conference of the IEEE Computer and Communications Societies, Texas University, San Antonio, TX, pp. 1361-1370 vol. 3, 2000.

[12] VALICEK, Michal, Software Implementation of Advanced Server Watchdog, IIT.SRC 2010, Faculty of Informatics and Information Technologies, pp. 1-8, April 21, 2010.

[13] Bouncy Castle Official Portal, http://www.bouncycastle.org/, retrieved April 5, 2011.

[14] KUMAR, P., J2EE security for servlets, EJBs and web services, Prentice Hall PTR, May 2004.

[15] Venners, Bill, James Gosling on Java, May 2001, June 2001, http://www.artima.com/intv/gosling3P.html, retrieved April 5, 2011.

[16] Magma official website, http://magma.maths.usyd.edu.au/magma/, retrieved May 19, 2011.

[17] BOSMA, W., CANNON, J., Discovering Mathematics with Magma, Algorithms and Computations in Mathematics, Springer, Vol. 19, 2006.
