ARENBERG DOCTORAL SCHOOL
Faculty of Engineering Science

Model Building for Electronic Security

Gabriel Hospodar

Dissertation presented in partial fulfillment of the requirements for the

degree of Doctor in Engineering

July 2013

Model Building for Electronic Security

Gabriel HOSPODAR

Supervisory Committee:
Prof. Dr. Adhemar Bultheel, chair
Prof. Dr. ir. Joos Vandewalle, supervisor
Prof. Dr. ir. Ingrid Verbauwhede, supervisor
Prof. Dr. ir. Johan Suykens
Prof. Dr. ir. Patrick Wambacq
Prof. Dr. José Gabriel R. C. Gomes

(Univ. Federal do Rio de Janeiro, Brazil)
Prof. Dr. ir. François-Xavier Standaert

(Univ. catholique de Louvain, Belgium)

Dissertation presented in partial fulfillment of the requirements for the degree of Doctor in Engineering

July 2013

© KU Leuven – Faculty of Engineering Science
Kasteelpark Arenberg 10 box 2446, B-3001 Heverlee (Belgium)

Alle rechten voorbehouden. Niets uit deze uitgave mag worden vermenigvuldigd en/of openbaar gemaakt worden door middel van druk, fotocopie, microfilm, elektronisch of op welke andere wijze ook zonder voorafgaande schriftelijke toestemming van de uitgever.

All rights reserved. No part of the publication may be reproduced in any formby print, photoprint, microfilm or any other means without written permissionfrom the publisher.

D/2013/7515/77
ISBN 978-94-6018-691-2

Preface

This PhD dissertation is about security for the electronic world in which we are immersed. In my opinion, security goes beyond the technological terrain. I am a believer that interpersonal and procedural domains play a more important role than anything else, especially for security. After conducting cutting-edge research in electronic security for about four years, I basically came to the ultimate – and perhaps obvious – conclusion that an attacker or a defender relying on a highly concentrated blend of motivation and perseverance can eventually get anywhere. But this is not all. I want you to enjoy this dissertation.

Overall, a PhD requires maturity, long-term commitment, and vision. You need to be resilient, have high doses of stamina, and take ownership. Most importantly, passion can always smooth the often blurry pathway, in addition to making it rewarding and eventually meaningful in some sense. Pursuing or quitting a PhD is not a matter of right-or-wrong or better-or-worse. Trust your intuition. Lead the way yourself and have the courage to do what has to be done.

I would like to humbly express great thanks to my supervisors and to all colleagues who supported my doctoral studies at COSIC, KU Leuven, in any way. I am very grateful for the multitude of learning experiences through which this university and Belgium have contributed to my professional and personal growth. Just as importantly, the successful achievement of my PhD degree would certainly not have been possible without an appropriate technical and interpersonal foundation and support. Therefore, a big thanks goes also to the Universidade Federal do Rio de Janeiro (COPPE/UFRJ), Brazil, and to my beloved wife, family and friends from all corners of the world, respectively. Muito obrigado!

Gabriel Mayrink da Rocha Hospodar
Leuven, Belgium, June 2013


Abstract

People and their inherent need for security.

Security and its enabling capabilities.
New technologies and their latest breaches.

This dissertation takes you on a fascinating tour of modern security. It takes off from a psychological view on defence mechanisms and lands in modern paradigms of the Internet of Things and their impact on omnipresent embedded electronic devices. Before entering technical terrain, we guide you along an interesting pathway transiting across millennial protection strategies, contemporary daily interactions highly dependent on security, high-level security domains complementing the technological one, and even an audacious attempt to sell a fresh business interpretation of security. We demystify the basics of techniques that essentially disguise and hide information, i.e. cryptography and steganography respectively, before quickening the pace towards embedded electronic security. Information protection techniques must be securely implemented in order to guarantee that no confidential information within the implementation can be astutely exploited. The technical contribution of this dissertation is threefold and broad in taste. We demonstrate how machine learning techniques can be used to deploy a more generic side-channel power analysis with the same effectiveness as template attacks. Moreover, we research into the innovative concept of physically unclonable functions (PUFs) – an analogue of biometrics for electronics – using physically realised constructions. A generic framework to assess the security of PUFs that may undergo modelling attacks is proposed and applied. We finally deem Arbiter and AES S-box-based Glitch PUFs weak for applications like authentication and key generation. The last ingredient in this dissertation is a pinch of digital image steganography. The increase in the amount of data shared online creates channels for hiding any information within innocent-looking media. We suggest a simple algorithm to embed information within digital images without visually distorting them or compromising their statistics considerably. We also implement machine learning-based steganalysts. We conclude the dissertation by highlighting the most powerful tool available, whether for good or for evil: motivation and perseverance.


Samenvatting

Mensen en hun inherente behoefte aan veiligheid.

Veiligheid en wat ermee mogelijk gemaakt kan worden.
Nieuwe technologieën en de recentste aanvallen erop.

Dit proefschrift neemt je mee op een fascinerende reis in de moderne beveiliging. We stijgen op vanuit een psychologische visie op afweermechanismen en landen in moderne paradigma’s van het Internet der Dingen en hun impact op de alomtegenwoordige ingebedde elektronische apparaten. Vooraleer we in technische bodems terechtkomen, leiden wij je via een interessante weg doorheen de beschermingstrategieën van het nieuwe millennium, moderne dagelijkse interacties die sterk afhangen van beveiliging, beveiligingsdomeinen op hoog niveau die het technologische aanvullen, en zelfs een gedurfde poging om een nieuwe zakelijke interpretatie van veiligheid te verkopen. We ontsluieren de basis van technieken die in essentie informatie vermommen en verbergen, zijnde respectievelijk cryptografie en steganografie, vooraleer we het tempo verhogen richting ingebedde elektronische beveiliging. Informatiebeschermingstechnieken moeten veilig geïmplementeerd worden om te garanderen dat er geen vertrouwelijke informatie uit de implementatie op gewiekste wijze misbruikt kan worden. De technische bijdrage van dit proefschrift is drieledig en heeft een brede waaier van smaken. We zien hoe technieken uit machinaal leren gebruikt kunnen worden om een meer generische nevenkanaalaanval in te zetten met dezelfde efficiëntie als sjabloonaanvallen. Voorts hebben we ook onderzoek gedaan naar het innovatieve concept van fysiek onkloonbare functies (PUF’s) – een soort biometrie voor elektronica – met fysiek gerealiseerde constructies. Een generiek raamwerk om de veiligheid van PUF’s die modelleringsaanvallen ondergaan te beoordelen wordt voorgesteld en toegepast. We achten uiteindelijk dat Arbiter en AES S-box-gebaseerde Glitch PUF’s zwak zijn voor toepassingen zoals authenticatie en het genereren van sleutels. Het laatste ingrediënt in dit proefschrift is een snuifje steganografie van digitale beelden. De toename van de hoeveelheid data die online gedeeld wordt, creëert kanalen om eender welke informatie te verbergen binnen onschuldig uitziende media. Wij stellen een eenvoudig algoritme voor om informatie binnen digitale afbeeldingen in te bakken zonder deze visueel te vervormen, noch hun statistieken aanzienlijk te wijzigen. Wij implementeren ook steganalyse gebaseerd op machinaal leren. We sluiten het proefschrift af met aandacht voor het meest krachtige hulpmiddel dat er bestaat, zowel ten goede als ten kwade: inzet en doorzettingsvermogen.

Contents

Abstract v

Samenvatting vii

Contents ix

List of Figures xv

List of Tables xvii

I Electronic Security 1

1 Introduction 3

1.1 Security Demand . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2 Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . 11

2 Information Security 15

2.1 Steganography . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.2 Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.3 Cryptanalysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3 Embedded Security 23


3.1 Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.2 Physical Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.3 Secure Circuitry . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.4 Side-Channel Analysis: Power Consumption . . . . . . . . . . . 28

3.4.1 Non-Profiled versus Profiled Power Analysis . . . . . . . 30

3.5 Physically Unclonable Functions (PUFs) . . . . . . . . . . . . . 32

3.5.1 Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.5.2 Constructions . . . . . . . . . . . . . . . . . . . . . . . . 35

3.5.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . 38

3.6 Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4 Contributions 41

4.1 PUF Modelling Consequences . . . . . . . . . . . . . . . . . . . 42

4.1.1 Machine Learning PUF Modelling . . . . . . . . . . . . 42

4.1.2 Modelling Impact on PUF Applications . . . . . . . . . 44

4.2 Performance and Security of Glitch PUFs . . . . . . . . . . . . 48

4.3 Machine Learning in Power Analysis . . . . . . . . . . . . . . . . 51

4.4 Statistical Digital Image Steganography . . . . . . . . . . . . . 53

4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

5 Conclusion 57

II Publication-based Chapters 59

Machine Learning Attacks on 65nm Arbiter PUFs: Accurate Modeling poses strict Bounds on Usability 61

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62


2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

2.1 Arbiter PUFs . . . . . . . . . . . . . . . . . . . . . . . . 64

2.2 Machine Learning (ML) . . . . . . . . . . . . . . . . . . 65

3 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . 67

3.1 PUF Implementation and Experiment Setup . . . . . . 67

3.2 Modeling Attacks and Results . . . . . . . . . . . . . . . 67

4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

4.1 Implications on Challenge-Response Authentication (CRA) 70

4.2 Implications on Secure Key Generation (SKG) . . . . . 73

5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

Performance and Security Evaluation of AES S-Box-based Glitch PUFs on FPGAs 77

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

2 Glitch PUF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

2.1 Original GPUF Proposal by Suzuki et al. [124] [115] . . . 81

2.2 Our Response Generation Method . . . . . . . . . . . . . 81

3 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . 83

3.1 Experimental Environment . . . . . . . . . . . . . . . . 83

3.2 Performance at the Standard Voltage of 1.20V . . . . . 85

3.3 Performance at Non-Standard Voltages (1.14V and 1.26V) 86

4 Security Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 90

5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

Machine Learning in Side-Channel Analysis: a First Study 97

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

2 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . 100


2.1 Pearson Correlation Coefficient Approach . . . . . . 101

2.2 SOST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

2.3 PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

3 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

3.1 Practicalities . . . . . . . . . . . . . . . . . . . . . . . . 105

4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

4.1 Experimental Settings . . . . . . . . . . . . . . . . . . . 106

4.2 Influence of the LS-SVM Parameters . . . . . . . . . . . 107

4.3 Varying the Number of Traces and Components . . . . 115

4.4 Outliers Removal, SOST and PCA . . . . . . . . . . . . 116

5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

6 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

Algorithms for Digital Image Steganography via Statistical Restoration 121

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

2 Approach of Sarkar and Manjunath (S & M) . . . . . . . . . . 124

3 Our Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 126

3.1 Proposal A . . . . . . . . . . . . . . . . . . . . . . . . . 126

3.2 Proposal B . . . . . . . . . . . . . . . . . . . . . . . . . 128

3.3 Proposal C . . . . . . . . . . . . . . . . . . . . . . . . . 128

4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

4.1 Average Performances using the Optimal Hiding Fraction 130

4.2 Steganographic Capacity . . . . . . . . . . . . . . . . . 130

5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

Bibliography 133


Curriculum Vitae 145

List of Publications 147

List of Figures

II Publication-based Chapters

Machine Learning Attacks on 65nm Arbiter PUFs: Accurate Modeling poses strict Bounds on Usability

1 Box plots of obtained SR(qtrain) of our ML attacks. . . . . . . . 68

2 PUF-based challenge-response authentication (CRA). . . . . . . 71

3 Upper bounds on the secrecy capacity of Arbiter PUF responses. 74

Performance and Security Evaluation of AES S-Box-based Glitch PUFs on FPGAs

1 Glitch PUF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

2 Number of glitches with respect to Cp and Cc. . . . . . . . . . 82

3 Experimental evaluation system. . . . . . . . . . . . . . . . . . 84

4 Circuit design on FPGAs. . . . . . . . . . . . . . . . . . . . . . 85

5 Reliability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

6 Uniqueness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

7 Bit-aliasing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

8 Response error rates against various voltages. . . . . . . . . . . 87


9 Reliability and Uniqueness vs. S-Box output bits (left histograms: with masking, right: w/o masking). . . . . . 89

10 Reliability and Uniqueness vs. HD(Cp, Cc). . . . . . . . . . . . 89

11 Number and ratio of stable responses in three cases. . . . . . . . 91

12 Number of glitches (S-Box[6], Chip i=1). . . . . . . . . . . . . . 92

13 An AES S-Box implementation using composite field. . . . . . 92

Machine Learning in Side-Channel Analysis: a First Study

1 Threshold approach: decision boundaries of the LS-SVM classifiers. (continued) . . . . . . 109

1 Threshold approach: decision boundaries of the LS-SVM classifiers. . . . . . 110

2 Intercalated approach: decision boundaries of the LS-SVM classifiers. (continued) . . . . . . 111

2 Intercalated approach: decision boundaries of the LS-SVM classifiers. . . . . . 112

3 Bit(4) approach: decision boundaries of the LS-SVM classifiers. (continued) . . . . . . 113

3 Bit(4) approach: decision boundaries of the LS-SVM classifiers. . . . . . 114

List of Tables

I Electronic Security

1.1 Security cost with regard to security and usability levels. . . . . 9

II Publication-based Chapters

Performance and Security Evaluation of AES S-Box-based Glitch PUFs on FPGAs

1 The 16 patterns of the input x generating almost no glitches in y[6]. . . . . . 94

Machine Learning in Side-Channel Analysis: a First Study

1 Success rates (SR) for the bit(4) approach. . . . . . . . . . . . . 116

Algorithms for Digital Image Steganography via Statistical Restoration

1 Average results over a database containing 1,200 images. . . . . 130

2 Average results computed over the test set using the neural network specialized on the hiding process from [111]. . . . . . 131


3 Average results computed over the test set using the neural network specialized on the hiding process from Proposal A (or C). . . . . . 131

Part I

Electronic Security


Chapter 1

Introduction

“As it is written: There is no one righteous, not even one;”

— Paul, Romans 3:10

. . . at the end of the day, it is all about people.

Security1, first and foremost, is meaningless if it is not for people and their matters. The vast majority of technical texts, especially on computer and embedded electronic security, neglects this. And this is a big deal. Typically, most works hastily connect their theme with either popular or promising techniques or technological applications. We believe that such a motivational approach to security feels lethargic and has been worn out.

We take one step back, thus not rushing to assess technical security material right away. Behind any application or system, whether interfacing with humans or other systems, there are people – e.g. stakeholders, executives, managers, developers, users, third parties affected along the way, and so on. All their various concerns and expectations come bundled with them.

It is important to first comprehend what security fundamentally means to people. To get started, we take a shallow dive into the workings of the human mind.

“Only amateurs attack machines; professionals target people.”2

1 Security: “the state of being free from danger or threat” [Oxford dictionary].
2 Schneier in [8].


1.1 Security Demand

Defence mechanisms seek to ensure security. The pioneers in the study of psychological defence mechanisms were Anna Freud [41] and her father Sigmund Freud in the 1930s. In summary, defence mechanisms are a natural reaction of the ego to anxiety in order to protect the individual. In other words, defence mechanisms are inherent to everyone, generally speaking, to fight inevitable feelings like fear and concern about known threats or the uncertain. Examples of common threats include those that may harm our psychological or physical health, restrict our freedom, invade our territory, take our belongings away, and so on. Security is innately demanded by people.

Although there is no complete consensus on the list of existing psychological defence mechanisms, a widely accepted four-level categorisation was proposed in 1977 by Vaillant [129]. The levels, followed by examples of defence mechanisms, are:

1. Pathological defences: delusion, denial, distortion, . . .

2. Immature defences: fantasy, idealisation, projection/paranoia, . . .

3. Neurotic defences: hypochondriasis, intellectualisation, isolation, rationalisation/excusing, repression, undoing, withdrawing, . . .

4. Mature defences: altruism, anticipation, humour, identification, thought suppression, . . .

After a self-assessment one may identify with some of these mechanisms regardless of the peculiar label of each level. The same holds for security experts or anyone else. Consciously or not, psychological defence mechanisms are practically translated into people’s daily lives in order to manage all kinds of risks – or the relative perception thereof. For example, one can naively ignore risks by suppressing thoughts on the ease of someone guessing his or her ingenuous birthdate-based password. Another popular way of dealing with risks concerns anticipating potentially upcoming threats, such as setting up a firewall and antivirus to defend against malware that may target your machine.

The cross-fertilisation between security and other fields – such as Psychology for defence mechanisms and Behavioural Economics for risk analysis – has great potential to be very fruitful. Some researchers have already realised this and currently investigate how to contribute interdisciplinary insights to security. Ross Anderson’s website [4] points to plenty of content on the economics and psychology of information security.


Security is Needed Everywhere

Businesses, institutions and countries are formed by (intricately dis)organised groups of people. Just as with people, these entities must interact with others for various reasons. Different kinds of things, such as messages, confidential information, packages and so forth, need to be securely stored and shared. Mutual trust between parties is oftentimes not present, besides being difficult to establish in many cases. Distrust typically triggers anxiety, whether individual or collective, which in consequence prompts defence mechanisms.

Common security concerns regard confidentiality, integrity and authentication of information and parties. When using online banking, you want to be 100% sure that these security objectives are met. At the same time, availability is just as important insofar as security must not be an obstacle, which would be very impractical. Security mechanisms come into play as a solution for these concerns, thus offering a (relative) sense of protection and safety. Although security does not necessarily create a legitimate bond of trust, it is the only available powerful means to protect or prevent interactions between untrusted parties, or even trusted parties in an untrusted environment. Simply put, there are three straightforward, primitive ways of securing anything:

1. Controlling the access to the stuff;

2. Giving the stuff a different appearance, or disguising the stuff;

3. Putting the stuff out of sight, or hiding the stuff.

There is not much to innovate here. This list of strategies is not fundamentally mutually exclusive, but attempts to be at least collectively exhaustive. For example, some people care too much about food. One can put padlocks on the fridge to prevent access to the food by others. In case the fridge is public, with no possibility of access control, one can disguise the food into something unrecognisable, and hopefully unattractive, before storing it in the fridge. Alternatively, one can just hide the original food without messing with it by packing it and subsequently soaking it inside an innocent bowl of soup.

Messing with edible material may represent a great offence and already yield conflicts. However, many other things are very sensitive as well. Common worries may involve well-being, reputation, money, power, and so on. In our information- and knowledge-based economy era, particularly, digital information and knowledge-generating technologies are gold. As will become clearer later, disguising and hiding information go by the names cryptography and steganography, respectively.


Security is an Enabler

The progress of technology comes in handy for society. Demand for modern digital solutions has been continuously growing. Driving forces include serving society, pushing the knowledge and technology frontiers, increasing business profitability, supplying warfare demands, and so on. Information processing can be reliably automated using digital solutions. Digital processing is faster and can handle large amounts of data more accurately than humans.

The abundant emergence of new embedded devices because of e.g. the Internet of Things, in which everything senses and communicates, brings new challenges. Two of the biggest ones are the large increase in the amount of new data generated – big data3, not addressed here – and the need for novel embedded security solutions that are efficient.

The trend will continue as more and more products and services are offered electronically. Many interactions that previously had to happen in person or mechanically can now comfortably happen electronically. Security, which is often invisible, is a crucial practical enabler for that. For instance, it is possible in a secure way to digitally:

• Communicate between people and devices;

• Control devices;

• Ensure confidentiality and integrity of information and parties;

• Sign legally binding documents;

• Identify and track virtually anything, from pets to people and objects;

• Ensure anonymity through communication protocols;

• Conduct financial transactions;

• Vote in the national election in some countries;

• Etc.

Important applications would, to some extent, not be trustworthy without security embedded in digital applications and electronic devices. Security can be generically depicted as an enabler for the sake of viability. Security has the mission to guarantee a successful outcome.

3 Big data is basically a collection of any large data sets. Processing becomes more complex.


Security has Different Domains

As with any kind of innovation, novelties from the digital age are often accompanied by unforeseen security issues. As the saying goes, “Security is only as strong as its weakest link”. Weaknesses do not necessarily come from technologies, as one may initially think. Security strategies should be addressed in different domains: people → process → technology:

• People form the most complex dimension of security, partly because they cannot be fully controlled, as opposed to processes and technologies – at least in theory. It is strongly recommended that people get educated on digital security. Even so, people have their own ways and behave differently and irrationally security-wise. How many of you disable the security suite, thus exposing your personal computer and private data, when security prevents you from doing something online or renders your machine slow? Have you ever used a public, probably insecure Internet connection because you really had to check your messages and could not wait? People may be even more careless and irrational with other people’s business, hence putting security at great risk. Frequently, security is conveniently ignored.

• Process is related to standard operating procedures. It somehow dictates how people and technology should interrelate. It concerns business processes, for instance. At the governmental level, it involves policies and procedures linked to legislation on digital matters. In 2013, some countries already count on appropriate specific legislation and directives on digital/electronic matters. On the other hand, many others are far behind and lack legal enforcement on digital and high-tech abuses in general. The reasons for the delay in keeping up with technological trends may include a combination of several factors: lack of specialised technical staff, slowness in the legislature, conflict of interest between political parties, corrupt politicians taking advantage of the situation, etc. Organisations with ulterior motives may benefit from outdated legislation, lack of law enforcement, or absence of best practices in security. For example, hacking machines may be offshored to no-man’s-lands.

• Technology, left for last on purpose, is the first thing that comes to mind when talking about security. However, technology alone does not represent a comprehensive security solution. State-of-the-art technologies are useless if not implemented and used carefully. We will particularly expand on this domain later in this dissertation.


Money Matters

Businesses want to be profitable4, i.e. increase revenues and lower costs. Costs form the only component of profit that can be fully controlled in a business. From a purely rational perspective, any cost can be cut. Revenues, as opposed to costs, typically also depend on external, uncontrollable factors involving market-related issues affecting pricing and sales volume. These are basically the reasons why executives worry so much about lowering their costs. The truth is that security is often seen as an operating expense. To make matters worse, industries other than the security industry may have difficulty accounting for revenues coming directly from security, if any. It may be a very difficult task to justify or convince businesses to invest in digital security. This can be even more critical for small- and medium-sized companies holding breakthrough innovations but affording little capital investment for security.

As stressed earlier, it is practically impossible to get rid of security concerns. One can either accept, attempt to mitigate, transfer or avoid risks. Yet, not all threats can ever be identified. Digital security should contribute in practice to fostering productivity. A fashionable catchphrase says that “security is a business enabler”. Unlike IT, which is clearly a business enabler for most industries in our time, there are controversies concerning the enabling capabilities of security for businesses. On the one hand, digital security can be seen as indispensable when considering customers, products/services, the company itself, and competition. As a simple example:

• Customers are becoming increasingly more digitally educated. They require their personal information to be securely processed and stored. They want products and services that allow them to preserve their privacy.

• Products and services must offer the desired level of security demanded by customers to survive competition and substitution forces. In addition, in many cases there are regulations specifying the required level of digital security for an application or system.

• Companies must care about their organisational digital health. They must protect both the storage and sharing of confidential business information (e.g. in the cloud), prevent intruders from accessing their systems, protect for and against all kinds of devices that connect to the enterprise network, and transparently assure customers and stakeholders of compliance with modern security best practices.

• Digital security can protect technologies and information from being copied by competitors, especially in the case of a differentiated product or service, in addition to intellectual property measures. Differentiation may also allow for increasing the price of a product, and therefore revenues, without a remarkable drop in sales volume. Profits are assured if security development costs do not offset revenues.

4 Profit = Revenues - Costs = (Price × Sales Volume) - Costs

Security has great potential to directly add value, or pave the way for creating value, in four key areas that are relevant for a business. Good security solutions should be transparent, seamlessly integrated and smartly adaptable to processes. Security must not be an inhibitor and eventually jeopardise any revenue streams. On the other hand, one may say, security can be seen as dispensable in the sense that it is not an essential building block that is strictly needed to make things work. In theory, businesses may still run while being lackadaisical about digital security. Therefore, one may logically say that security is not a business enabler. Theoretically, security is unnecessary in an entirely protected, fair and honest industry environment. In “real life” such an environment does not exist.

Business is not made of theories, but of profit-oriented practices. In summary, we are more in favour of than against the “security as a business enabler” motto. However, making an analogy with the human biological system, we would rather picture security as a health and well-being enabler for businesses. If either of them is missing, then we promptly realise that something might be wrong. Otherwise, everything keeps running transparently in the right way.

Security Costs in a Nutshell

Cost-wise, security is practically traded off against usability with respect to aspects such as reliability, speed, transparency, user-friendliness, and so on. The triplet (security level, usability level, security costs) roughly behaves as in Table 1.1.

Table 1.1: Security cost with regard to security and usability levels.

Security   Usability   Security Costs
Low        Low         Low
Low        High        Low
High       Low         ?
High       High        High

If security does not represent a major concern anyway, then the security component of the overall solution cost is low regardless of the usability level. Otherwise, if security is deemed important, then the security cost strongly depends on the usability level. For cases in which the usability level is not that relevant, the cost of providing high security is open. Depending on the application, it is possible to make something highly secure, e.g. by fiercely constraining its usability. For cases in which the required levels of security and usability are high, the technicalities are usually challenging. Such solutions require more effort, time, and resources to be accomplished. The price is therefore higher. High security costs do not necessarily imply a high security level though. Take e.g. the security expenses of some of the world powers. In any case, security should aim at providing a kind of health and well-being assurance.


1.2 Dissertation Outline

Contributions We started by motivating security from a psychological viewpoint. We then moved towards modern security applications affecting people’s lives either directly or indirectly. In the remainder of this dissertation we investigate issues in electronic security. Cryptographic primitives are known for securing information and providing authentication schemes for entities and data. Besides the information protection technique itself, a secure implementation thereof is mandatory for a practical security assurance. The technical contribution of this dissertation is threefold and broad in spectrum. We research into physically unclonable functions (PUFs) [98]. They may represent a breakthrough in embedded security. PUFs can dismiss the need for costly circuitry to store confidential information based on complex, sometimes obscure, protection mechanisms. We take a closer look particularly into physical implementations of the Arbiter and Glitch PUFs. In addition, we propose a generic methodology to assess the security of PUF applications such as authentication protocols and cryptographic key generation and storage. Shifting gears to a second topic, we motivate the use of machine learning techniques [91], i.e. systems that learn from data, to explore information leaking via the power consumption of cryptographic implementations. This is an example of side-channel analysis [59]. Third, we briefly discuss a covert technique called steganography to hide information within innocent-looking media. We propose an algorithm to embed secret information within digital images without visually distorting them or compromising their statistics considerably. We also implement machine learning-based steganalysts to identify the presence of covered information within images. This latter work is an extension of [111].

Structure This book is composed of two parts. Part I discusses electronic security. This part targets not only the technical audience, but the general public as well. It touches on conceptual, theoretical and practical aspects of information and embedded electronic security. A curious lay reader is recommended to read Part I sequentially. The text gradually becomes more technical. Part II is purely technical. It presents our specific technical contributions in different areas of electronic security through reproductions of our publications. Chapters from Part II should be read freely according to the reader’s curiosity and interests in any of the covered topics. A global bibliography for Parts I and II appears at the end of the dissertation. The detailed structure of this dissertation is summarised in the following.


Part I: Electronic Security

• Chapter 1: Introduction thoroughly motivates security. We start our reasoning based on people’s inherent need for security. We expand towards modern demands for digital security. We motivate security from a simplified economic perspective as well.

• Chapter 2: Information Security provides background on information security techniques based on data hiding and disguising – steganography and cryptography, respectively.

• Chapter 3: Embedded Security discusses security aspects of physical implementations of cryptographic techniques for practical applications. We provide an introduction to physical attacks and focus on side-channel (power) analysis (SCA), or the analysis of indirect power consumption leakages from embedded devices containing (confidential) information. Afterwards, we discuss an alternative technology for secure implementations called physically unclonable5 functions (PUFs). PUFs intrinsically generate and store confidential information in a reliable and secure fashion; and they can also be used for authentication schemes.

• Chapter 4: Contributions highlights our contributions to embedded electronic security and further motivates the topics addressed in this dissertation. We discuss in four individual sections: i) machine learning PUF modelling consequences on two PUF applications: challenge-response entity authentication and secure cryptographic key generation; ii) performance and security assessment of Glitch PUFs; iii) the application of machine learning for power analysis of cryptographic implementations; iv) digital image steganography via statistical restoration. Technical details on each contribution are left for Part II.

• Chapter 5: Conclusion concludes the work, touching on directions for future work on electronic security as well.

5 “(Un)clonable”: that can(not) be cloned.


Part II: Publication-based Chapters

• Gabriel Hospodar, Roel Maes, and Ingrid Verbauwhede. Machine Learning Attacks on 65nm Arbiter PUFs: Accurate Modeling poses strict Bounds on Usability. In 4th IEEE International Workshop on Information Forensics and Security (WIFS 2012), IEEE, Pages 37-42, 2012.

• Dai Yamamoto, Gabriel Hospodar, Roel Maes, and Ingrid Verbauwhede. Performance and Security Evaluation of AES S-Box-based Glitch PUFs on FPGAs. In International Conference on Security, Privacy and Applied Cryptography Engineering (SPACE 2012), Lecture Notes in Computer Science, Springer-Verlag, Pages 45-62, 2012.

• Gabriel Hospodar, Benedikt Gierlichs, Elke De Mulder, Ingrid Verbauwhede, and Joos Vandewalle. Machine Learning in Side-Channel Analysis: a First Study. In Journal of Cryptographic Engineering (JCEN), Springer-Verlag, Volume 1, Issue 4, Pages 293-302, 2011.

• Gabriel Hospodar, Ingrid Verbauwhede, and José Gabriel R. C. Gomes. Algorithms for Digital Image Steganography via Statistical Restoration. In 2nd Joint WIC/IEEE Symposium on Information Theory and Signal Processing in the Benelux (WIC 2012), Pages 5-12, 2012.

Chapter 2

Information Security

“There are no dull subjects. There are only dull writers.”

— H. L. Mencken

This chapter presents a short overview of information security. ICT (Information and Communication Technology) integrates electronic computing devices (hardware and software of computers, embedded devices, smartphones, . . .) and telecommunication (Internet, wireless digital cellular networks, . . .). Information security techniques are mandatory to ensure the proper protection of data that could otherwise be used for stealing banking credentials and money, spying on government agencies and business technologies, profiling people’s behaviour for privacy-intrusive marketing purposes, impersonation, etc. The most common information security objectives include data confidentiality (protection of data from DVDs, Pay TV, hard disks, . . .), entity authentication (secure access to online banking, . . .) and data integrity and authentication (assurance of correctness of online banking data, . . .). These objectives are typically accomplished using cryptography, which offers several options of data protection algorithms and secure communication protocols. In essence, an encryption primitive is concerned with the absolute incomprehensibility of a message to third parties by using a series of mathematical transformations. Customarily overlooked is the fact that cryptography by itself raises enormous suspicion about what is encoded behind ciphertexts, which do not have a natural message-like appearance. An alternative to cryptography for securing information is steganography, which is almost always left out of information security overviews. Steganography, unlike cryptography, astutely hides information within innocent media without even raising awareness of any information storage or communication occurrence.


Hence, steganography represents a potential tool for smuggling messages with threatening motives, such as attack plots, for example. The following two sections present a brief introduction to steganography and cryptography.

2.1 Steganography

The sharing of digital information over the Internet and portable data storage devices has grown enormously in recent years. Although the Internet currently reaches only about one third of the world population, the World Wide Web already hosts more than ten billion websites [2]. Security-wise, innocent-looking media such as text, image, audio and video files represent potential channels to hide and communicate confidential information.

Steganography represents a powerful means to secretly store and communicate confidential information. Although the word steganography sounds fancy and raises thoughts about an elaborate technical idea, its core concept is quite simple. Steganography concerns hiding any information within any innocuous medium. This is one of the most intuitive, and certainly primitive, ways of protecting information. In 1499, Trithemius named such an information concealment technique steganography in his three-volume book entitled Steganographia [126]. This book appeared to be about magic and presented communication techniques even relying on carrier spirits. Etymologically, steganos and graph respectively mean covered and writing in Greek.

Steganography can be used in limitless ways and examples are countless. To give an idea of how generic steganography can be, two chronologically sampled examples include: tattooing a message on a slave’s scalp that would later be covered by hair, in ancient Greece; and using invisible ink and photographically produced microdots, respectively by the French and Germans in World War II. More fascinating historical examples can be found in [15].

The problem in steganography can be pictured as follows [116]. Inge and Vince want to communicate confidentialities with each other. However, they know that Sooj has always been suspicious about them and that he very probably monitors their communication passively. In order to succeed in secretly communicating without even raising any suspicion from Sooj, Inge and Vince hide their information within an innocent medium (cover/carrier) in the view of Sooj, before any sharing takes place. The product of steganography is called a stego work. The steganalyst, Sooj, should not be able to identify the presence of the concealed message in the (stego) work shared between Inge and Vince by any means (statistically, visually, . . .). It is assumed that both Inge and Vince possess a public embedder/reader function and a private key – if in accordance with the Kerckhoffs’ desiderata1 [66] – allowing them to hide and extract information from stego works, thus eventually succeeding in fooling Sooj.
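To make the embedder/reader idea concrete, the following Python sketch is a deliberately naive least-significant-bit (LSB) scheme for a grayscale image held in a NumPy array. It is our own toy illustration, not the algorithm proposed later in this dissertation; the function names and the fixed-stride “key” are hypothetical.

    import numpy as np

    def lsb_embed(cover, payload, stride=3):
        """Hide `payload` in the least significant bit of every `stride`-th pixel.
        The stride plays the role of a (very weak) key shared by embedder and reader."""
        bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
        stego = cover.flatten().copy()
        positions = np.arange(0, stride * bits.size, stride)
        if positions[-1] >= stego.size:
            raise ValueError("cover image too small for this payload")
        stego[positions] = (stego[positions] & 0xFE) | bits   # overwrite only the LSB
        return stego.reshape(cover.shape)

    def lsb_extract(stego, nbytes, stride=3):
        bits = stego.flatten()[: stride * 8 * nbytes : stride] & 1
        return np.packbits(bits).tobytes()

    cover = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in cover image
    stego = lsb_embed(cover, b"hello world")
    assert lsb_extract(stego, nbytes=11) == b"hello world"

Even such a small change measurably disturbs the first-order statistics of the cover image; this is precisely the kind of trace a steganalyst like Sooj would look for and what the statistical-restoration approach discussed in Section 4.4 tries to repair.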

The power of steganography relies on its sneaky, invisible nature. Steganography is concerned solely with the imperceptibility of a message embedded in an unsuspected carrier, but not with the robustness of the embedded message with respect to manipulation of the carrier. Confidentiality and deniability can be achieved through proper steganography. The former holds because the hidden information is kept secret from all who are not supposed to see it – they are even unaware of the existence of the message. The latter is also due to the secrecy nature of steganography, which allows one to deny that a hidden message exists.

Steganography is often neglected by the scientific community. Arguments range from Shannon’s aphorism that “concealment systems are primarily a psychological problem” from his seminal work on information-theoretical aspects of secrecy systems [114] to the slyness of steganography working against itself. Not to mention excuses relating to the difficulty of dedicating oneself to something that by principle cannot be seen or easily shown to be widely used. Steganography may also be depicted as security through obscurity, therefore violating Kerckhoffs’ desiderata. For the sake of security in society, it is very important to thoroughly investigate how modern digital steganographic techniques work. Steganalysis is the counterpart of steganography, aiming at detecting concealed information within innocent carriers. The huge importance of steganalysis lies in protecting against criminals hiding and sharing secret attack plans and illegal content such as child pornography through stealth techniques like steganography. Interesting works on steganography relying on information theory include [15, 16, 29, 35], among many others.

Last, it is worth mentioning that in spite of similarities between steganography and watermarking, they are different. Watermarking is a pattern-embedding technique often used to prove ownership or authenticity, or to provide integrity assurance of media, objects, and so forth. Watermarking is not necessarily concerned with the imperceptibility of a message embedded in some carrier, but typically with the robustness of the embedded message with respect to manipulation of the carrier. Two types of watermark are robust and fragile. The former should resist accidental and intended manipulations of the media. The latter should be destroyed after any manipulation, hence being comparable to stego messages in this sense. Watermarking is not addressed in this dissertation.

In Section 4.4 of our Contributions chapter we further motivate steganography and highlight our contribution especially to digital image steganography [60].

1 A security system should remain secure even if everything about the system is of public knowledge, with the exception of a privately held piece of information – the key.


We pay special attention to the statistics restoration process of stego images. This work is fully reproduced in Part II of this dissertation.

2.2 Cryptography

Cryptography aims to protect information. Like steganography, the basic idea is simple, although technicalities can quickly become complex as well. Instead of hiding plain information behind innocuous covers like steganography, cryptography disguises information using enciphering methods for the purpose of secure storage or transmission through admittedly unsafe channels. What follows is a very simple example distinguishing steganography from cryptography (a small code sketch follows the two examples):

• “Hi, Emily. Let’s love others when others resent life’s disappointments.” is a steganographic way to hide the message “hello world” within an innocent-looking sentence. The message is revealed by connecting a particularly positioned letter from each word in the sentence. This example is a simple wordplay technique, in which the letter position to be looked at in each word is the key (first letter here, aka acrostic);

• “ifmmp xpsme” is a cryptographic way of disguising, or enciphering, the message “hello world” by simply substituting each letter with a shifted one in the alphabet. This is an example of General Julius Caesar’s cipher, used for military purposes in Rome around 50 BC, in which the letter shift parameter is both the encryption and decryption key (one shift here).
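As a purely illustrative Python sketch of the two toy techniques above (the helper names are ours, not standard terminology):

    def acrostic_extract(sentence, position=0):
        # Steganography: read the letter at `position` of every word; the position is the key.
        words = sentence.lower().replace(",", "").replace(".", "").split()
        return "".join(word[position] for word in words)

    def caesar_encrypt(message, shift=1):
        # Cryptography: disguise the message by shifting each letter in the alphabet.
        return "".join(
            chr((ord(c) - ord("a") + shift) % 26 + ord("a")) if c.isalpha() else c
            for c in message.lower()
        )

    print(acrostic_extract("Hi, Emily. Let's love others when others resent life's disappointments."))
    # -> 'helloworld'
    print(caesar_encrypt("hello world", shift=1))
    # -> 'ifmmp xpsme'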

The term cryptography, in which crypto stands for hidden or secret in Greek, was introduced in English by Bishop John Wilkins in the seventeenth century. Although there is evidence of cryptography usage by the Egyptians around 4000 years ago, the terms cryptography and steganography were not technically defined ab initio. A comprehensive historical overview of cryptography can be found in [63].

Cryptography, or more specifically an encryption primitive, is not concerned with the imperceptibility of a message, but with the total unintelligibility of the message to a third party. An authentication primitive would be concerned with proving the truth of an entity or piece of data. The roots of cryptographic primitives are planted in mathematics. Thereby, cryptography provides means to meet all information security objectives (data confidentiality, entity authentication, data integrity and authentication, data availability, . . .). In accordance with Kerckhoffs’ desiderata, or Shannon’s maxim that “the enemy knows the system”, a consequent cryptographic standard design principle is that the security of publicly known (and scrutinised) cryptographic primitives should be reduced to the secrecy of one private parameter, which is called the key. Key-based security relies on the fact that the key itself should be practically infeasible for an adversary to forge or guess. Security reductions typically relate to assumptions about an adversary’s power (e.g. passive vs. active, available computing and time resources, . . .). The security of cryptography is reduced to either the mathematical difficulty of a problem, heuristic-based approaches such as empirical resistance to the strongest attacks to date, or information-theoretical arguments about the lack of information available to break the system. A thorough overview of contemporary cryptography can be found in [89]. Regarding the forms of usage of cryptographic keys, cryptographic primitives can be split into three groups:

1. “Unkeyed” primitives: These are generally hash functions used for data integrity and digital signature schemes. Cryptographic hash functions “injectively” map messages of different lengths to unintelligible, fixed-length outputs. They do not rely on a specific key parameter. These primitives are suitable for applications that can be directly computed by anyone. With regard to security properties, it should be very difficult to invert the cryptographic primitive output (ciphertext) back to the primitive input (message to be protected, or plaintext). In addition, there must not exist different inputs colliding to the same output. SHA-32 [9] is one of the hash functions currently in the spotlight (see the short sketch after this list).

2. Symmetric-key primitives: They can provide data integrity and authentication, in addition to entity authentication. Their security strongly depends on one private key kept secret by authorised parties. An adversary in possession of pairs of plaintexts and ciphertexts should not be able to perform any decryption without knowing the key. A message can be encrypted either digit by digit, using a stream cipher (e.g. RC4), or block by block of bits, using a block cipher (e.g. AES [37] – NIST standard since 2001; and PRESENT [23] – ISO3 lightweight standard since 2012).

3. Public-key primitives: They regard a particularly elegant cryptographic scheme using a pair of keys: a public and a private one. It is also called asymmetric-key cryptography. Instead of using one single key for encrypting and decrypting messages, requiring the sharing of such a private key among trusted parties before an encrypted communication takes place, public-key primitives allow one to encrypt a message to any party using the party’s public key. Only the intended party can decrypt the message using its own private key. Conversely, one party may use its private key for digitally signing messages, which can be verified by others through the public key of the signatory. Public-key primitives (e.g. the popular RSA, based on the hardness of factoring large integers) serve confidentiality purposes, entity authentication, and so on.

2 The American National Institute of Standards and Technology (NIST) launched the worldwide public SHA-3 competition between 2007 and 2012 to develop a new cryptographic hash algorithm due to advances in the cryptanalysis of existing ones. Keccak [21] won the competition.

3 ISO stands for International Organization for Standardization.
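To illustrate the first (“unkeyed”) group in the list above, the snippet below uses SHA-3 from Python’s standard hashlib module (available in Python 3.6 and later). It shows the fixed-length digest and how a one-character change in the input produces a completely unrelated output:

    import hashlib

    m1 = b"hello world"
    m2 = b"hello world!"                       # one character appended

    d1 = hashlib.sha3_256(m1).hexdigest()
    d2 = hashlib.sha3_256(m2).hexdigest()

    print(len(d1) * 4, "bit digest")           # 256 bits, regardless of the message length
    print(d1)
    print(d2)                                  # bears no visible relation to d1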

2.3 Cryptanalysis

Modern cryptography owes much to Shannon (cf. Shannon’s confusion and diffusion properties for secure ciphers [114]). The most popular cryptographic architecture relies on the concept of gradually building up security by iterating a number of cryptographically weak but easy-to-implement mathematical transformations. The transformations typically include substitutions and permutations, as in Shannon’s product cipher [114]. Such an approach facilitates the implementation and a careful investigation of the properties of each stage of the cryptographic algorithm. Moreover, the number of stages of the cryptographic algorithm can be increased in order to raise the security level of the cipher for a given application. Typically, parts of the cryptographic key are injected little by little into the rounds of the cryptographic algorithm, so that the intermediate data becomes increasingly scrambled as the algorithm progresses. Alternatively, cryptographic primitives can also be constructed using a smaller number of cryptographically strong and complex transformations (e.g. combining operations from different groups from number theory, as in the IDEA block cipher). Anyhow, cryptographic primitives are preferably implemented modularly as a sequence of non-complex operations for practical reasons.
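As a minimal sketch of such an iterated design (our own toy construction, not a real cipher), the following Python function applies a 4-bit substitution, a bit permutation and round-key mixing to a 16-bit block over several rounds:

    SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]                 # a 4-bit S-box (illustrative values)
    PERM = [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]   # a 16-bit wire permutation

    def toy_spn_encrypt(block, round_keys):
        """Encrypt a 16-bit block with a toy substitution-permutation network."""
        state = block
        for rk in round_keys:
            state ^= rk                                              # key mixing
            nibbles = [(state >> (4 * i)) & 0xF for i in range(4)]
            state = sum(SBOX[n] << (4 * i) for i, n in enumerate(nibbles))   # substitution
            bits = [(state >> i) & 1 for i in range(16)]
            state = sum(bit << PERM[i] for i, bit in enumerate(bits))        # permutation
        return state

    print(hex(toy_spn_encrypt(0x1234, round_keys=[0x0F0F, 0x3C3C, 0xA5A5])))

Each round on its own is cryptographically weak, but iterating substitution (confusion) and permutation (diffusion) with fresh round-key material progressively scrambles the intermediate state, as described above.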

The security assessment of cryptographic techniques is called cryptanalysis. It intends to either partially or totally nullify the security of cryptographic primitives. Cryptanalysis’ main goal boils down to decrypting an encrypted message intended for another recipient without necessarily knowing the encryption/decryption key beforehand. Cryptanalysis and cryptography cross-fertilise each other in a “cat-and-mouse” game. Advancements in cryptanalysis push cryptography to increase security levels by countering new attacks, while advancements in cryptography continuously challenge cryptanalysts to break codes. The most trivial attack strategy for a cryptanalyst is the brute-force approach. This means that all key combinations, or plaintext combinations for “unkeyed” primitives, are tested in order to accomplish a successful decryption. For that reason, the key length in bits should be set large enough to render such an attack vector practically infeasible. Any successful cryptanalytic attack with a lower complexity than the brute-force attack is referred to as a code breaker. However, attacks only represent a realistic threat when the cryptographic key search space is reduced to a practically exploitable size. Generic scenarios for cryptanalysis include all practical combinations of chosen/known inputs and outputs.
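To make the brute-force idea concrete, the toy Caesar cipher from Section 2.2 has a key space of only 26 shifts, so exhaustive search is immediate; the hedged Python sketch below simply tries every key and checks for recognisable plaintext:

    def caesar_decrypt(ciphertext, shift):
        return "".join(
            chr((ord(c) - ord("a") - shift) % 26 + ord("a")) if c.isalpha() else c
            for c in ciphertext
        )

    ciphertext = "ifmmp xpsme"
    for key in range(26):                        # the entire key space: 26 candidates
        candidate = caesar_decrypt(ciphertext, key)
        if "hello" in candidate:                 # a crude plaintext-recognition test
            print("key =", key, "->", candidate)    # key = 1 -> hello world

A modern 128-bit key, by contrast, yields 2^128 candidates, which is exactly why setting the key length large enough renders brute force practically infeasible.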

Cryptanalysis will continue to be very important. However, it is limited. For cryptanalysis, the implementation of the cryptographic primitive is not relevant. The cryptographic algorithm is depicted as a sealed box (aka black box) that can be analysed only in terms of inputs, outputs and an abstraction of the (de/en)ciphering function. There is more to it than meets the eye. Next in this dissertation we discuss security aspects and issues arising when information security techniques like cryptography are implemented in embedded electronic devices.

Chapter 3

Embedded Security

“The only rule is that there are no rules.”

— The Exception Paradox

3.1 Context

Lately, more and more objects are becoming embedded with computing devices connected to sensors and actuators, in addition to being capable of communicating. This evolution brings out a paradigm that is referred to as the Internet of Things. Its name says it all: the sky is the limit for applications, which may range from ID cards to pacemakers and automotive electronic devices. Cryptography has become increasingly ubiquitous in the same proportion as more embedded electronic devices are created. Security naturally remains mandatory for most applications. As it turns out, cryptographic primitives must be implemented either in software or in hardware to be of practical use. Regardless of how mathematically flawless a cryptographic primitive is, its physical implementation must be carefully designed in order to eliminate any chance of an accidental information leak. Implementations must not be the weakest link in the security solution. Any internal information leak may be exploited by a clever adversary.

The boom in modern applied cryptography started in the twentieth century. For example, during the first half of that century the Germans used variants of their electro-mechanical Enigma machine to implement a cryptographic cipher protecting military and government information. With the evolution of computers and mainframes still in that century, banks, data centres and other businesses started using electronic machines for digitally processing money and other sensitive data. These machines were placed in safe locations for physical protection against scrutiny, in addition to using cryptography for data protection. From the latter part of the last century onwards the paradigm has changed: cryptographic computation has increasingly shifted to small embedded devices. In this new, evolving scenario, two important factors come into play affecting the implementation of security measures for embedded devices: devices typically have more limited resources, and devices are in the possession of potential adversaries. First, a constrained computing power, memory and energy budget is intended to serve a multitude of competing system requirements. Every wasted resource in an implementation, however small, undesirably raises mass-production costs and, where pertinent, may reduce the usability of the system. Efficiency is a must, especially for security schemes and their software or hardware implementations. This is particularly true when security is deemed an overhead because it is not the main feature of the embedded device. Second, embedded devices are exposed to anyone (smart cards, smart phones, smart meters, portable players, . . .), in addition to being frequently interconnected. Cryptography should now foresee a legitimate user who may also represent a threat to the security of the embedded device and its inner information.

3.2 Physical Attacks

Physical access to embedded devices brings forth another category of attacks that cannot be countered purely by cryptography or other mathematical means. Physical attacks target the implementation of cryptographic primitives and are therefore said to break implementations, rather than breaking ciphers or codes as in cryptanalysis. For all practical purposes, however, physical attacks are extremely powerful, as they offer alternative means for an adversary to recover cryptographic keys and thence decode confidential information with much less effort than brute-force attacks. The security analysis context that takes the cryptographic implementation into account is not characterised by a sealed (black) cryptographic box as in cryptanalysis. The attacker will breach the weakest entry point in the system, analogously to a burglar stealing things from the inside of a car after breaking its window instead of fighting against the secure lock. Oftentimes the implementation of cryptographic primitives is not given the proper attention and investment due to other conflicting system requirements or “ignorance by design”. Beyond plaintexts and ciphertexts, additional sources of information (internal processing values, physically leaked properties, erroneous results, . . .) extracted from the implementation can now be exploited. The embedded security analysis model is now depicted as an information-leaking and penetrable box (a grey box).

Formal academic research on embedded security has gained momentum over the past few years, although embedded security analysis had certainly been acknowledged by hackers, security agencies and the military long before. For example, it was only in 1999 that the first edition of the prestigious Workshop on Cryptographic Hardware and Embedded Systems (CHES) took place. Today there are dozens of solid, high-level conferences on embedded security, and a few thousand papers are published in this area each year (according to Google Scholar [1]). One of the principal goals of work on embedded security analysis is to reveal secret data. There are essentially two categories of physical attacks: passive and active.

• In passive attacks the adversary observes the operation of the device without acting to change its normal operation. The adversary observes, for example, implementation-dependent side information spontaneously leaked during the operation of the device. A specialised burglar who carefully listens to the clicks emanating from the internals of the wheel lock of a money vault may obtain valuable acoustic side-channel information that helps reveal the secret. For cryptographic embedded devices, side channels other than sound typically bear information on the internals of the implementation. The power consumption [68], processing time [67] and electromagnetic radiation [44, 103], among other properties of a running embedded device, correlate with the processed data and the operations performed by the device. Passive physical attacks relying on side information originating from implementation properties are called side-channel analysis. Power analysis is explored later in this dissertation, as well as our contribution to the field in [59].

• In active attacks, on the other hand, the adversary observes the operation of the device while acting to change its normal operation. The adversary actively interferes with the operation of the embedded device’s circuitry in order to analyse faulty behaviours that may leak information about secret data. In connection with the previous example, the burglar would now possibly tamper with or tweak the vault’s lock with a pin or paper clip and examine its internal behaviour. One of the most studied active physical attacks is fault analysis. In embedded devices, a fault can be injected by changing the supply voltage, tampering with the clock signal, altering the operating temperature, modifying the circuit execution using some kind of radiation such as laser beams, skipping instructions, manipulating intermediate data values, and so on. Fault analysis is not addressed in this dissertation; comprehensive studies on fault analysis can be found e.g. in [19, 51, 64].

Passive and active physical attacks can be further categorised as non-invasive, semi-invasive or invasive, depending on how physically intrusive the attack is with regard to the cryptographic core. On the one hand, the more invasive an attack is, the more costly it becomes in terms of required technical resources, time and therefore money; on the other hand, the higher its expected success rate typically becomes. Timing attacks are non-invasive in essence. They were the subject of the first academic publication on physical attacks, dated 1996. This timing attack targeted a physical implementation of the RSA algorithm, exploiting the time delay between the input of data into the cryptographic device and the observation of its output. Inattentive implementations of public-key algorithms can leak information about the private key through the execution time of the algorithm. Like timing attacks, electromagnetic attacks do not in theory require tampering with the cryptographic embedded device either. However, practice has shown that a semi-invasive approach can enhance an electromagnetic attack. Removing any shielding caused e.g. by the chip packaging, hence allowing the electromagnetic probe to get nearer to the circuit, leads to cleaner electromagnetic measurements. De Mulder carefully studied electromagnetic-based techniques for side-channel analysis and countermeasures in her 2010 PhD dissertation [93]. Last, as an example of a devastating invasive attack, the first fault attack registered in the scientific literature was published in 1997 [24], breaching an RSA implementation based on the Chinese remainder theorem for faster computation. This fault attack consisted in revealing one of the prime numbers constituting the private key (crucial for finding the full key) by inserting a single fault into either of the two branches of the RSA computation.

3.3 Secure Circuitry

The emergence of realistic physical attacks implies the need for secure implementations of cryptographic primitives. Ideally, the secure circuitry ought to provide countermeasures against all existing and unknown attacks. This must be done without creating openings for new attacks. Unfortunately, implementing these two ideal requirements to fully protect a system is unattainable from a technical perspective. Ensuring resistance against a large variety of existing physical attacks already raises design costs. In practice, countermeasures are implemented with two interconnected goals. First, a security-usability-cost tradeoff should be set. Second, the security solution should be such that the cost of a successful attack for an adversary considerably outweighs its rewards in case of a favourable outcome. These two goals frequently conflict, and their order of priority matters. If the goals are prioritised in the order described above, there is a chance that the agreed security-usability-cost tradeoff will not make the attack expensive enough for the adversary. Conversely, if the priorities are switched, the costs can become too high on the side of the developer.

There are basically two strategies for countermeasures against implementation attacks. One strategy concerns nipping the problem in the bud, i.e. impeding the escape of internal information from the implementation or preventing fault injection. In this case the grey box should be darkened as much as possible in order to thwart physical implementation vulnerabilities. Nonetheless, the box will never be completely black, as there is no way to entirely defend against unknown attacks. Fully accomplishing this strategy can be infeasible, for instance in the case of timing and power analysis for devices relying on an external power supply: all embedded devices consume energy and require time to operate. Electromagnetic emanations, on the other hand, can be shielded to avoid the leakage of side information to the exterior. Alternatively, the other strategy for countermeasures concerns accepting and remedying the problem, i.e. ensuring that the information extracted from the grey-box model will be useless to an adversary. This strategy can be implemented, for instance, by balancing the implementation of a cryptographic primitive, performing dummy operations intended to confuse the adversary, masking intermediate values with secret random values, etc. Research in the direction of provable security for embedded devices, considering side-channel information in a formal model, has been addressed by leakage-resilient cryptography [39].
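
As an illustration of the second strategy, the sketch below shows first-order Boolean masking of a table look-up: the sensitive value is split into a masked value and a random mask, and only the shares are ever processed. The S-box here is a placeholder identity table and the interface is hypothetical; real masked implementations are considerably more involved.

import secrets

SBOX = list(range(256))   # placeholder table; a real cipher would use its own S-box

def masked_sbox_lookup(x_masked, m_in, m_out):
    """Return S(x) ^ m_out given x ^ m_in, without ever handling the unmasked x."""
    masked_table = [SBOX[i ^ m_in] ^ m_out for i in range(256)]
    return masked_table[x_masked]

# The device only processes the two shares (x ^ m_in) and m_in, whose
# individual power footprints are ideally independent of the sensitive x.
x = 0xA7                                   # key-dependent sensitive intermediate value
m_in, m_out = secrets.randbelow(256), secrets.randbelow(256)

y_masked = masked_sbox_lookup(x ^ m_in, m_in, m_out)
assert y_masked ^ m_out == SBOX[x]         # unmasking recovers S(x)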

In lieu of providing secure implementations by mending and patching current technologies, a more liberal approach concerns exploring innovative technologies such as physically unclonable functions (PUFs) [98]. PUFs are a maturing technology that enables the implementation of information security objectives, e.g. through entity authentication schemes, and physical security objectives, e.g. by securely generating and storing cryptographic keys. PUFs may therefore render unnecessary current architectures using random number generators and secure memories, required respectively for generating and storing cryptographic keys in a physically secure implementation. We discuss PUFs in Section 3.5 of this dissertation. Before either “going liberal” or conceptualising countermeasures for customary embedded security, however, it is very important to understand well the properties and characteristics of bare cryptographic implementations without countermeasures. This allows for assessing worst-case scenarios, i.e. upper bounds on the power of an attack against embedded devices. Researchers have to wear the mask of adversaries and attack embedded devices in order to find security breaches before attackers with bad intentions do. In the following section (3.4) we therefore discuss side-channel power analysis.


3.4 Side-Channel Analysis: Power Consumption

Side-channel analysis is a powerful threat to embedded security. One way or another, physical implementations give away possibly exploitable information about their internally processed data. This section aims at providing an abridged overview of side-channel power analysis. Gierlichs’ PhD dissertation (2011) [49] provides one of the most elucidating yet relatively concise introductions to side-channel analysis with a focus on power analysis. Another comprehensive reference to the world of power analysis and countermeasures is the textbook by Mangard et al. [86], which also points to other good references.

The decision on which physical attack to mount in order to crack a physical implementation of some cryptographic primitive may depend on four factors related to the attack cost: access to the embedded device; availability of exploitation tools; access to the information leakage; and leakage exploitability. Overall, in contrast to fault analysis, the undetectability of side-channel analysis due to its passivity, in addition to the relative ease of data acquisition due to the spontaneous nature of side-channel leakage, makes side-channel analysis more attractive.

Within side-channel analysis, the investigation of the instantaneous power consumption of a targeted embedded device offers an attractive, cost-effective compromise among the alternative side channels. The power consumption measured in the specific time frame in which the cryptographic computations occur automatically encompasses all timing information on the execution of the cryptographic algorithm. While timing analysis alone provides aggregated information about the cryptographic implementation, power traces offer more discernible time-sampled information. Therefore, power analysis enables the inspection of the inner workings of the embedded device at specific points in time. Furthermore, unlike electromagnetic analysis, power analysis does not require unpacking the chip to bypass any shielding caused by the chip’s material. Power traces, i.e. signals containing instantaneous power consumption information, can be conveniently acquired by tapping the embedded device’s (external) power supply. Power analysis can be performed with ordinary equipment present in a basic electronics lab (oscilloscope, probes, data analysis software, . . .), most of the time not requiring any fancy apparatus. Concerning data quality, accurate measurements with a high signal-to-noise ratio can considerably contribute to the success of an attack, requiring fewer power traces. Physical access to a large variety of embedded devices is perfectly conceivable nowadays as the number of commercially available small computing devices rises. Moreover, remote side-channel attacks are even possible, depending on the type of information leakage to be observed (e.g. remote timing attacks [28]). Last, side-channel analysis can also be used to reverse engineer a cryptographic implementation [31, 32].


Side channels originating from embedded devices typically depend on the circuit technology and architecture. The praised power efficiency of the dominant CMOS (Complementary Metal Oxide Semiconductor) technology for integrated circuits is, conversely, a side-channel villain. The foundation of CMOS technology relies on the use of complementary and symmetrical pairs of transistors guaranteeing a low static power consumption. The benefits of CMOS technology include saving energy and generating little heat, which is especially interesting for battery-powered devices. However, state switching in CMOS circuits yields spikes in the power consumption. The more switching activity there is in the circuit, the greater the dynamic power consumption. The instantaneous power consumption of a CMOS-based embedded device therefore reveals how much work is performed internally. It will be explained later that the power consumption of a CMOS device is correlated with, and caused by, the (confidential) data being processed internally. Circuits implementing cryptographic algorithms generally follow their original design architectures. For example, many cryptographic algorithms recursively iterate the same combination of mathematical transformations many times. The unintelligibility of the interleaved chunks of plaintext and key data is thus increased gradually. Side-channel analysis exploits implementation architectures. Also known as the divide et impera principle, a common strategy of side-channel attacks is to split a secure, difficult-to-break cryptographic problem into smaller, weaker and manageable portions exposing cryptographic intermediate variables. The attacker therefore aims at revealing one part of the key (sub-key) at a time, instead of fighting a much more difficult get-the-key-at-once challenge. The attacker can either reconstruct the full cryptographic key by combining all recovered sub-keys, or can remarkably reduce the cryptographic key search space for a subsequent brute-force attack.

Venturing to cover all possible physical attack scenarios risks being far from complete, as an adversary’s creativity is limitless and unpredictable. Without much loss of generality, we assume that the adversary ends up correctly hypothesising the cryptographic algorithm and the physical implementation architecture operating within a targeted embedded device. Clues about these can be obtained e.g. from algorithm- or implementation-specific patterns occurring in power traces. Either way, the adversary can always reformulate the working hypothesis. The adversary can typically afford much more time for analysis than the embedded security designer.

In a generic side-channel leakage model, there is an implementation of a cryptographic algorithm that outputs a ciphertext after processing both the plaintext and the key input to the grey box. One of the main hypotheses of side-channel power analysis is that the instantaneous power consumption of the implementation depends on an intermediate variable that is somehow sensitive to the plaintext, the cryptographic key, and perhaps other constants. Typically, an adversary is better off attacking the implementation gradually. The adversary starts off by focusing on one of the ends of the implementation, i.e. examining what happens right after the plaintext input or right before the ciphertext output. For commonplace recursive cryptographic algorithms, this means paying special attention to the computation of either the first or the last round. The main reason is that the adversary may then have the advantage of already knowing one of the arguments (plaintext or ciphertext) of the sensitive intermediate variable leaking secret-bearing information via the power consumption.

3.4.1 Non-Profiled versus Profiled Power Analysis

A taxonomy of power analysis sorts attacks according to how the set of measured power traces is used. Non-profiled and profiled attacks can be related to the unsupervised and supervised machine learning [91] paradigms, respectively. Non-profiled attacks work in one step, using all acquired power traces, which are typically unlabelled, to recover (sub-)keys from implementations. These attacks are interesting e.g. when the adversary has limited access to the targeted embedded device and can obtain a small or moderate number of power consumption measurements only once. Profiled attacks work in two steps instead. In the profiling (training) phase, sets of measurements labelled with different values of a sensitive variable are profiled correspondingly. In the attacking (classification) phase, the previously built profiles are used to classify yet unseen power traces extracted either from the same embedded device or from a duplicate. Profiled attacks suit cases in which the adversary has to perform the analysis with few power traces after having had free access to the targeted device, or a clone thereof, for profiling. Moreover, in case a masking scheme secures the implementation, the extra entropy provided by the countermeasure can be reduced by profiled attacks [12, 96]. In both the non-profiled and the profiled attack strands the adversary aims at breaking the cryptographic implementation using as few power traces as possible, in order to reduce attack costs and demonstrate the power of the attack (or the weakness of the implementation). Accurate profiling represents the strongest form of attack in the sense that the source of side-channel leakage is precisely modelled. Therefore, given enough statistically relevant data, the results of profiled analysis represent an upper bound on the performance of non-profiled analysis.

Non-Profiled Attacks

The most elegant and annihilating non-profiled attacks are simple power analysis (SPA) and differential power analysis (DPA) [7]. SPA is more of a subtle, even artistic, attack. It is based on the visual identification of key-related patterns over time that are hypothetically present in one or a few power traces. In order to succeed, SPA usually requires a perceptive and trained adversary who knows some of the implementation details. The more dynamic the implementation is with regard to key bits, the more chances an adversary has to identify spikes in the power traces caused by the processing of particular (small blocks of) key bits. As the adversary acquires two or more power traces, e.g. for the same pair of plaintext and key input to the implementation, uninteresting noise- and operation-related spikes can be gradually filtered out through a challenging visual exercise. Thus the adversary can focus more sharply on the relevant power consumption spikes particularly related to the cryptographic key. Examples of the application of SPA against the AES block cipher are given in [69]. As more power traces become available, more elaborate attacks such as DPA can be mounted. DPA is less artistic, or heuristic, than SPA. DPA has a more solid technical framework while remaining simple and very powerful. DPA further explores the previous line of thought of exposing power trace features that depend solely on the processed input data (chunks of plaintext and key). The underlying idea of the attack is a clever brute-force approach. One statistically compares estimations of the power consumption – for all possible sub-keys input to a specific part of the cryptographic implementation handling a sensitive variable – to the actual power traces. A statistical similarity metric, like the difference of averaged signals [30], correlation [27], or mutual information [50], reveals the best power consumption estimate, which is in turn attributed to the correct sub-key hypothesis. As an illustration, the power consumed during the manipulation of a key-sensitive variable might be approximated by the number of bit flips³ in the transition between its current and previous state. Although rough, this model is often sufficiently adequate for modelling the power consumption of technologies like CMOS, which are known to leak information during state changes. In addition, a key-related sensitive variable could be the output of the substitution box (S-box) that is typically present in the rounds of cryptographic algorithms processing chunks of intermediate ciphertext and key bits. An S-box nonlinearly intertwines all bits of the sub-key and plaintext input to it. Because all S-box output bits depend on an intricate combination of all input bits, S-boxes contribute to a more robust distinction between correct and wrong sub-key hypotheses.

³ Cf. the Hamming weight (sum of the bits in a vector) and Hamming distance (number of differing bits between two vectors) models [90].
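
As a concrete illustration of this brute-force-with-statistics idea, the sketch below runs a correlation-based DPA on simulated power traces: a Hamming-weight prediction of a hypothetical S-box output is correlated against every time sample for all 256 sub-key guesses. The S-box, the leaking sample and the noise level are invented for the example; with real measurements only the trace-acquisition part changes.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8-bit S-box (any fixed public permutation works for the illustration).
SBOX = rng.permutation(256)

def hamming_weight(v):
    """Number of set bits; a rough but common CMOS leakage model."""
    return bin(int(v)).count("1")

# --- Simulated measurement campaign (stand-in for real power traces) --------
true_subkey = 0x3C
n_traces, n_samples, leak_sample = 2000, 50, 23
plaintexts = rng.integers(0, 256, n_traces)
traces = rng.normal(0, 1, (n_traces, n_samples))           # measurement noise
leakage = np.array([hamming_weight(SBOX[p ^ true_subkey]) for p in plaintexts])
traces[:, leak_sample] += leakage                           # data-dependent power

# --- Correlation-based DPA: test every sub-key hypothesis -------------------
scores = np.zeros(256)
for k in range(256):
    model = np.array([hamming_weight(SBOX[p ^ k]) for p in plaintexts])
    # Pearson correlation between the hypothesis and every time sample;
    # the correct sub-key produces the highest peak at the leaking sample.
    corr = [abs(np.corrcoef(model, traces[:, t])[0, 1]) for t in range(n_samples)]
    scores[k] = max(corr)

print("recovered sub-key:", hex(int(np.argmax(scores))))    # should recover 0x3c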


Profiled Attacks

Profiled attacks have the greatest potential to model side-channel leakage in an information-theoretic sense [30]. The best power consumption profile for each key dependence that can be built in a statistical context is a faithful approximation of the probability mass function of the leakage. The price to pay in exchange is that accurate power consumption profiles require a representative amount of leakage measurements. Template Attacks [30] afford this price and even embrace it. Once the likelihood probabilities, or leakage templates, for each key dependence are estimated, the attacking phase is much less computationally expensive: the posterior probability of each key dependence given a (few) targeted power trace(s) is obtained from Bayes’ theorem, and a maximum likelihood approach finally reveals the most probable key (dependence). In more detail, Template Attacks model the noise within power traces assuming a multivariate Gaussian distribution. Step by step, on the order of thousands of power traces have to be obtained for each key dependence. Next, the average of the power traces corresponding to each key dependence is computed. Only particular points of interest of the power traces are selected for further analysis. This idea relates to feature selection in machine learning, discarding irrelevant data and favouring complexity reduction. The choice of the most key-dependent points in time can be made by observing the largest differences between the averaged power traces of the different key dependences. A more elaborate feature selection approach, still based on the difference of signals, also motivates the use of the second-order moment in a second-order expression [48]. The next step in the attack concerns calculating the covariance matrix of the noise vector for each key dependence. Each Gaussian template is eventually characterised by the corresponding average and covariance matrix.
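
A numpy-only sketch of these two phases on synthetic leakage is given below: one Gaussian template (mean vector and covariance matrix over a few points of interest) is estimated per class during profiling, and a fresh trace is classified by maximum likelihood. The class structure and noise here are simulated stand-ins, not measurements from [59] or [30].

import numpy as np

rng = np.random.default_rng(1)
n_classes, n_profiling, n_poi = 9, 500, 5    # e.g. Hamming-weight classes, points of interest

# --- Profiling phase: estimate one multivariate Gaussian template per class -
class_means = rng.normal(0, 2, (n_classes, n_poi))          # stand-in for real leakage
profiling = {c: class_means[c] + rng.normal(0, 1, (n_profiling, n_poi))
             for c in range(n_classes)}
templates = {c: (traces.mean(axis=0), np.cov(traces, rowvar=False))
             for c, traces in profiling.items()}

def log_likelihood(trace, mean, cov):
    """Log of the multivariate Gaussian density used to score a template."""
    diff = trace - mean
    sign, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff @ np.linalg.solve(cov, diff)
                   + logdet + len(trace) * np.log(2 * np.pi))

# --- Attacking phase: classify a fresh trace by maximum likelihood ----------
true_class = 6
attack_trace = class_means[true_class] + rng.normal(0, 1, n_poi)
scores = {c: log_likelihood(attack_trace, m, cov) for c, (m, cov) in templates.items()}
print("most likely class:", max(scores, key=scores.get))    # should be 6 with high probability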

Template Attacks are very effective provided that the Gaussian leakage hypothesis holds and a sufficient amount of data is somehow collected [48]. In [59], we compare a supervised machine learning approach – without any prior distribution assumption – to Template Attacks. We investigate three different classification cases using power traces from an unprotected software implementation of the AES algorithm. This interdisciplinary work is briefly discussed in Section 4.3 of our Contributions Chapter and fully reproduced in Part II of this doctoral dissertation.

3.5 Physically Unclonable Functions (PUFs)

Because of the existence of physical attacks such as side-channel and fault analysis, the implementation of information security techniques plays a very important role in an overall security solution. As we move towards the bottom of the information storage and processing chain, digital information abstractions fade away and information is eventually represented in a physical way. Cryptographic embedded devices typically store keys in protected silicon memories. At this low level, cryptographic techniques do not apply. Moreover, security often relies on obscurity. The silicon memory should be hidden in a complex chip layout and possibly shielded by extra metal layers. However, chip manufacturing is often outsourced and manufacturers cannot always be fully trusted. The integrated circuit risks being reproduced or revealed without permission, or even being stolen. This is a great concern especially for highly critical embedded devices for the military industry and intelligence agencies, for example.

3.5.1 Concept

Physically Unclonable Functions, or PUFs, introduced by Pappu in 2001 [98]⁴, offer a higher level of security against physical attacks. PUFs implemented in silicon have been proposed as a physically more secure alternative to storing secrets in a digital memory. Instead of working with binary values stored in a silicon memory, PUFs use (unique) random nanoscale structures, which occur naturally in silicon devices, in order to store secrets. An interesting aspect is that the PUF randomness, whether intrinsic or not, can neither be fully controlled in the manufacturing process nor physically cloned by a third party. Following Kerckhoffs’ desiderata, a PUF design and its implementation details may be publicly accessible to an adversary, who however will not be able to learn the complex instance-specific random features unavoidably created during the manufacturing process. Additionally, PUFs have the potential to be more efficient, since costly physical protection measures can be avoided. The entropy extracted from the randomness present in PUFs may be deployed to directly generate and inherently store cryptographic key bits. The randomness present within PUFs is expected to remain steady. PUF responses have to be reproducible to be of practical use. An extraction algorithm is responsible for guaranteeing that the measurements of a physical component are correctly and reliably translated into PUF responses. The combination of the reproducibility and uniqueness properties of PUFs enables them to work as identifiers too. PUFs may thus be applied in secure identification for IDs, passports, driving licences, medical devices, military equipment, and so on.

⁴ The concept of PUFs was first named physical one-way functions [98].

The most comprehensive academic work on PUFs to date is Maes’ PhD dissertation (2012), entitled “Physically Unclonable Functions: Constructions, Properties and Applications” [80]. In that work, a PUF is defined as “an expression of an inherent and unclonable instance-specific feature of a physical object”. To accompany that, the Wikipedia [11] definition⁵ offers a perhaps more informally tangible characterisation of the PUF concept: “a PUF is a function that is embodied in a physical structure and is easy to evaluate but hard to predict. Further, an individual PUF device must be easy to make but practically impossible to duplicate, even given the exact manufacturing process that produced it.” PUFs can be somewhat compared to human biometric features: no two human beings have exactly the same fingerprint, iris, DNA and so forth, even though artificial forgery may always be attempted. In the context of a system, a PUF can be interpreted as a probabilistic function that takes a challenge as input and outputs a corresponding response. The challenge is typically multi-bit, whilst the response is typically one bit in size. The probabilistic behaviour of the PUF output is due to its physical nature and its sensitivity to environmental conditions such as temperature and other effects coming from power supply variations, etc. PUFs are distinguished from other constructions and secure primitives, such as true random number generators and cryptographic primitives, by virtue of their properties. Returning to Maes [80], one of the main contributions of his work is a thorough analysis of six PUF constructions leading to a conclusion on the two PUF-defining properties, which were found to be identifiability and physical unclonability⁶. Neither true random number generator nor cryptographic hash function outputs are identifiable, whereas neither keyed nor unkeyed cryptographic primitives are physically unclonable. Identifiability and physical unclonability necessarily imply that the PUF is unique, reproducible, evaluable and constructible. Other desirable but not rigorously required PUF properties concern strict unpredictability, mathematical and physical unclonability, “one-wayness”, and tamper evidence. Behind all this, two important concepts forming the foundation for describing PUF properties are the PUF response intra- and inter-distance. The intra-distance is limited to one PUF instance: it describes the distance between two responses when one PUF is excited twice with the same challenge. Ideally, a PUF should always output the same response given a particular challenge, i.e. the intra-distance should ideally be minimal (zero). The inter-distance, on the other hand, considers the distance between the responses of two different PUFs of the same type when input with the same challenge. Any two PUF instances should be as different as possible, i.e. the inter-distance should ideally be maximal (50%).

⁵ As of March 22, 2013.
⁶ “(Un)clonability”: the property of being (un)clonable.
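
The two distances are usually reported as fractional Hamming distances over response vectors, as in the following small sketch with made-up response bits.

import numpy as np

def fractional_hamming_distance(a, b):
    """Fraction of response bits in which two equally long bit vectors differ."""
    a, b = np.asarray(a), np.asarray(b)
    return np.mean(a != b)

# Hypothetical measured responses to the same list of challenges:
puf1_run1 = np.array([0, 1, 1, 0, 1, 0, 0, 1])
puf1_run2 = np.array([0, 1, 1, 0, 0, 0, 0, 1])   # same PUF, re-evaluated (noise)
puf2_run1 = np.array([1, 1, 0, 0, 1, 1, 0, 0])   # a second PUF instance

intra = fractional_hamming_distance(puf1_run1, puf1_run2)  # ideally close to 0
inter = fractional_hamming_distance(puf1_run1, puf2_run1)  # ideally close to 0.5
print(f"intra-distance: {intra:.3f}, inter-distance: {inter:.3f}")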


3.5.2 Constructions

There are over a dozen types of PUFs, and the number is growing. In practice, not all proposed PUFs meet the PUF-defining or additional desirable properties to a major extent – with the exception of the optical PUF [98, 99]. In the latter, a laser beam is shot into a semi-transparent material embedded with random speckles. This optical token is the source of randomness of the PUF. The challenge of the optical PUF is the laser orientation with regard to the optical token. The response is obtained by processing the scattering pattern resulting from the laser beam passing through the semi-transparent token.

PUF constructions are typically distinguished by their electronic nature (non-electronic; electronic; silicon), their randomness source (intrinsic; extrinsic), or the difficulty of modelling them by mathematical/software means from a subset of challenge-response pairs (weak; strong) [52]. A PUF is considered strong if an adversary with mathematical modelling capabilities is not able to predict the response to a yet unseen challenge after having had full access to the PUF for a sufficient time. The exemplary optical PUF is a non-electronic, extrinsic, and strong construction. However, the PUFs of more practical use for embedded security have shown to be the silicon ones. Existing silicon PUFs are not remarkably strong, but still have potential for embedded security. Silicon PUFs naturally possess intrinsic randomness generated in the manufacturing process. They are typically classified into memory-based and delay-based silicon PUFs, in addition to a third category of mixed-signal silicon PUFs. Memory-based silicon PUFs (e.g. [52, 56, 70, 71, 81, 119, 120]) rely on characteristics of memory cells, such as the device mismatch in bistable memory elements. Delay-based silicon PUFs (e.g. [74, 121, 124]) rely on delays occurring in digital circuits, such as signal race conditions. Mixed-signal silicon PUFs (e.g. [43, 77, 102, 108, 109]) extract the PUF behaviour from analog electronic signals that are later digitised.

PUFs have been subjected to validation attacks aiming at creating mathematical or software clones in order to challenge their security [110]. If the PUF behaviour is learned by an adversary, then the assumed reduction of PUF security to its internally stored physical secret no longer holds. In this case the security of the PUF-based system is nullified, as the adversary can act like the legitimate PUF with high probability. The most traditional scenario for modelling PUFs is equivalent to supervised machine learning. The goal is to generalise the PUF’s internal behaviour by analysing a limited number of challenge-response pairs. First, an adversary acquires a subset of challenge-response pairs from the PUF under attack. In a modelling/cloning/training phase, the adversary builds a model with the goal of mimicking the underlying PUF response generation mechanism. In an attacking/challenging/testing phase, the adversary is provided with random challenges not used in the previous step in order to predict PUF responses. The best response prediction rate, or success rate, that an adversary can achieve equals the PUF robustness, which is given by the ratio of error-free response reconstructions. This number is lower than 100% in practice. On the other hand, an adversary should be capable of making response predictions with a higher accuracy than the PUF response bias towards either zero or one. Because it is cheaper and less time-consuming, challenge-response pairs collected from software simulations of PUFs, instead of hardware implementations, are often used for the security assessment of PUF constructions (e.g. in [110]). Conclusions from such experiments are relevant to some extent, but lack evidence from a physical implementation, which is particularly important for PUFs. We evaluate the security of two physically implemented delay-based silicon PUFs, named the Arbiter PUF and the Glitch PUF, proposed respectively in [75] and [124]. These works are motivated in our Contributions Chapter and fully reproduced in Part II of this dissertation. The Arbiter and Glitch PUF constructions are summarised below.

Arbiter PUF

The following description of Arbiter PUFs is extracted from our publication [61]. Arbiter PUFs [75] are a type of silicon PUF whose behaviour is caused by the intrinsic manufacturing variability of the production process of integrated circuits. They are constructed as a concatenation of stages, with each stage passing two inputs to two outputs, either straight or crossed depending on a challenge bit. The propagation of two signals through an ℓ-stage Arbiter PUF is determined by an ℓ-bit challenge vector. By careful design, the nominal delays of both paths are made identical. However, the effective delays of both paths in a particular implementation are never exactly deterministic, but are subject to random delay mismatch caused by the integrated circuit manufacturing variability. As a consequence, one of the two paths will propagate a signal slightly faster or slower than the other, depending on the considered physical implementation and on the applied challenge vector. An arbiter sitting at the end of both paths determines on which of the outputs a rising edge, applied simultaneously to both inputs, arrives first. The arbiter outputs a one-bit response accordingly. An Arbiter PUF implementation generates one-bit responses from ℓ-bit challenges, and is hence able to produce up to 2^ℓ different CRPs. Lee et al. [75] immediately realised that these 2^ℓ different response bits of an Arbiter PUF are not independent, but can be modelled by an additive linear delay model with a limited number of unknown parameters. An adversary can attempt to estimate these parameters for a particular Arbiter PUF from a set of q_train known CRPs, e.g. using machine learning techniques. Once an accurate model of the PUF is built, the remaining 2^ℓ − q_train response bits are not random anymore, but can be predicted by the adversary. In parallel to modelling attacks on Arbiter PUFs, countermeasures were introduced aimed at preventing modelling. Their basic idea is to disturb the linearity of the delay model by adding non-linear elements to the response generation, such as feed-forward challenge bits [75] and exclusive-or (XOR) combinations of responses [85, 121]. Nonetheless, it was demonstrated based on simulations of PUFs [110] that advanced machine learning techniques are still able to generate good models after training with a larger number of CRPs. Besides the simple Arbiter PUF as described above, we also consider in [61] the k-XOR Arbiter PUF [85, 110, 121], consisting of k equally challenged simple Arbiter PUFs, with the response bit of the k-XOR Arbiter PUF being the XOR of the k separate arbiter outputs. It should be noted that the final response created by the combination of intermediary responses from different PUF instances becomes noisier because of the composition of the independent noise sources.
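
A hedged sketch of this additive delay model and of the corresponding modelling attack is shown below, using simulated stage delays rather than the 65nm silicon data studied in [61]: challenges are mapped to the usual parity feature vector and a linear classifier (here scikit-learn's logistic regression, as one possible choice) is trained on known CRPs.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n_stages, n_train, n_test = 64, 5000, 2000

def parity_features(challenges):
    """Map l-bit challenges to the (l+1)-dim feature vector of the additive
    delay model: phi_i = prod_{j>=i} (1 - 2*c_j), with a constant last entry."""
    signs = 1 - 2 * challenges                           # 0/1 bits -> +1/-1
    phi = np.cumprod(signs[:, ::-1], axis=1)[:, ::-1]
    return np.hstack([phi, np.ones((challenges.shape[0], 1))])

# Simulated Arbiter PUF: random stage delay differences stand in for the
# manufacturing variability that an adversary cannot observe directly.
delays = rng.normal(0.0, 1.0, n_stages + 1)
challenges = rng.integers(0, 2, (n_train + n_test, n_stages))
X = parity_features(challenges)
responses = (X @ delays > 0).astype(int)                 # noiseless response bits

# Modelling attack: learn the linear model from q_train known CRPs and
# predict the responses to challenges never seen during training.
model = LogisticRegression(max_iter=1000).fit(X[:n_train], responses[:n_train])
accuracy = model.score(X[n_train:], responses[n_train:])
print(f"prediction accuracy on unseen challenges: {accuracy:.3f}")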

Glitch PUF

The following description of Glitch PUFs is adapted from our publication [131]. Different Glitch PUFs have been proposed to date. In 2008, Crouch et al. [36, 100] first proposed the concept of extracting a unique digital identification using glitches obtained from a 32-bit combinational multiplier. In 2010, Anderson [14] proposed a glitch-based PUF design specifically targeted at FPGAs (Field-Programmable Gate Arrays). This Glitch PUF generates a one-bit response based on the delay differences between two multiplexer chains. Then, a new glitch-based PUF using one AES S-Box as a glitch generator was proposed in 2010 [124], and improved in 2012 [115] by Suzuki et al. The latter Glitch PUF proposal suggests good performance and security features – such as resistance against machine learning attacks – and practical advantages, as it can be implemented on ASIC (Application-Specific Integrated Circuit) and FPGA platforms, as claimed by the authors. The Glitch PUF [115] uses one eight-bit AES S-Box based on a composite Galois field as a glitch generator. The challenge input to the Glitch PUF has eleven bits and is composed of two parts. The first part of the challenge is composed of the eight bits input from the data registers to the AES S-Box. Each of the eight output bits of the S-Box generates a different number of glitches due to the complicated non-linearity of the AES S-Box implementation. The second part of the challenge is composed of three bits that select one out of the eight AES S-Box output bits. A toggle flip-flop eventually outputs the Glitch PUF response by evaluating the parity of the number of glitches that appear in the selected AES S-Box output bit. A masking scheme is used to select stable challenges that output the same responses under normal operating conditions (room temperature and standard supply voltage) most of the time.

3.5.3 Applications

PUFs are used for challenge-response authentication and secure key generation. As in [99], the basic idea of challenge-response entity authentication is to have a verifier probe an authenticating PUF-based device with challenges for which the correct responses are known beforehand. The device is deemed authentic if it correctly recreates a number of responses. The identifying feature of PUFs comes as a result of their uniqueness and reproducibility properties. The inherent identity of PUFs represents an advantage in the sense that no separate identity generation process is needed, as would usually be the case. However, a PUF should be used with care, because its responses have a stochastic component that makes them not perfectly reconstructible. Because of this, the verifier makes allowance for a number of response bit errors from the authenticating party. The error threshold judging a device’s authenticity depends on the specific application. For example, it may be more relevant to guarantee the authentication of a legitimate but highly unstable PUF-based device than to avoid mistakenly considering fake devices as genuine because they happen to output correct responses. The performance of binary decision classifiers with regard to the discrimination threshold can be assessed by receiver operating characteristic (ROC) curves [125]. A generally accepted security-usability trade-off for optimising the discrimination threshold is the point at which a device’s false acceptance rate and false rejection rate intersect. This intersection point is the so-called equal error rate. One way to ensure a practically acceptable security level, i.e. to lower the equal error rate as much as possible, is to increase the number of messages exchanged between the verifier and the device in the authentication protocol.
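
The sketch below illustrates this trade-off numerically: genuine and impostor sessions are simulated from hypothetical intra- and inter-distance error rates, FAR and FRR are computed for every error threshold, and the threshold where they (approximately) meet gives the equal error rate. The error rates and session counts are invented for the example.

import numpy as np

def far_frr(genuine_errors, impostor_errors, threshold):
    """FRR: genuine device exceeds the allowed bit errors.
       FAR: impostor stays within the allowed bit errors."""
    frr = np.mean(genuine_errors > threshold)
    far = np.mean(impostor_errors <= threshold)
    return far, frr

rng = np.random.default_rng(7)
n_bits = 128                                            # responses per session
genuine_errors = rng.binomial(n_bits, 0.05, 10000)      # intra-distance ~ 5%
impostor_errors = rng.binomial(n_bits, 0.50, 10000)     # inter-distance ~ 50%

rates = [far_frr(genuine_errors, impostor_errors, t) for t in range(n_bits + 1)]
eer_t = min(range(n_bits + 1), key=lambda t: abs(rates[t][0] - rates[t][1]))
far, frr = rates[eer_t]
print(f"threshold of {eer_t} bit errors: FAR={far:.4f}, FRR={frr:.4f} (approx. equal error rate)")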

A second important application of PUFs concerns the generation and storage of cryptographic key bits. The creation of cryptographic keys requires a source of randomness such that key bits cannot be predicted by an adversary (cf. attacks on distinct public RSA keys sharing the same prime factor [76]). PUFs are capable of providing fresh keys because of their source of true randomness, in addition to natively storing the equivalent of the random key bit generator. However, schemes for generating keys from PUFs have to deal with their noisy and relatively unreliable nature, i.e. their faulty behaviour as a mathematical function. Furthermore, the entropy contained in a PUF may not be enough for generating truly random keys. Therefore, processing techniques should compress as much of the available entropy as possible into a key vector. Such requirements are addressed by fuzzy extractors [38]. More information on secure key generation from PUFs can be found in [25, 52, 78, 82, 127, 132].
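
A minimal code-offset sketch of the idea is shown below, using a simple repetition code for error correction and a hash as a stand-in for the final key derivation; real fuzzy extractors [38] use stronger codes and add explicit privacy amplification. All parameters here are illustrative.

import hashlib
import secrets
import numpy as np

REP = 7   # repetition factor; real designs use stronger error-correcting codes

def generate(response_bits):
    """Enrolment: derive a key and public helper data from a PUF response."""
    r = np.asarray(response_bits, dtype=np.uint8)
    key_bits = np.array([secrets.randbelow(2) for _ in range(len(r) // REP)], dtype=np.uint8)
    codeword = np.repeat(key_bits, REP)             # encode with the repetition code
    helper = codeword ^ r[: len(codeword)]          # code-offset helper data (public)
    return hashlib.sha256(key_bits.tobytes()).digest(), helper

def reproduce(noisy_response_bits, helper):
    """Reconstruction: recover the same key from a noisy re-reading of the PUF."""
    r = np.asarray(noisy_response_bits, dtype=np.uint8)[: len(helper)]
    noisy_codeword = helper ^ r
    decoded = (noisy_codeword.reshape(-1, REP).sum(axis=1) > REP // 2).astype(np.uint8)
    return hashlib.sha256(decoded.tobytes()).digest()

rng = np.random.default_rng(3)
response = rng.integers(0, 2, REP * 16, dtype=np.uint8)
noisy = response.copy()
noisy[::20] ^= 1            # a few response bit errors, at most one per code block

key, helper = generate(response)
assert reproduce(noisy, helper) == key      # same key despite the noisy re-reading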

3.6 Modelling

Modelling attacks on PUFs affect PUF-based security applications directly. Off-the-shelf machine learning techniques, such as Artificial Neural Networks [53] and Support Vector Machines [34], combined with typical (un)supervised learning strategies, can be deployed for PUF modelling without requiring any specific adjustments. Information on machine learning techniques can be found in many good textbooks, such as [91] and certainly others. In our publication [61], reproduced in Part II, we provide a summary of machine learning; we refer the interested reader to page 65 of this dissertation. In addition, in that same work [61], we provide a generic methodology to assess the impact of modelling attacks on challenge-response entity authentication and secure key generation. Section 4.1 further motivates PUF modelling and its consequences for common PUF attack scenarios.

3.7 Conclusion

New security issues arise when information security techniques are implemented in hardware. In this chapter we discussed physical attacks targeting embedded electronic devices. We focused on power analysis and presented similarities to machine learning approaches. In addition, we discussed PUFs, without aiming to be complete, but rather at highlighting the impact of modelling attacks on PUF-based security applications.

Chapter 4

Contributions

“Simplicity is the ultimate sophistication.”

— Leonardo Da Vinci

In the two previous chapters, we connected the various topics involved in this doctoral dissertation and introduced each of them. This chapter highlights our corresponding research contributions. They are briefly described in four independent sections. We emphasise that the contributions are not necessarily presented in chronological order. Full technical details are provided only in Part II (Publication-based Chapters), together with a brief note on our specific contribution to each topic, given the collaborative nature of each work.

The sections in this chapter are ordered as follows:

• Section 4.1: PUF Modelling Consequences

• Section 4.2: Performance and Security of Glitch PUFs

• Section 4.3: Machine Learning in Power Analysis

• Section 4.4: Statistical Digital Image Steganography



4.1 PUF Modelling Consequences

Research on PUFs has been lacking security and performance analyses of physically realised constructions. Although results from software simulations of PUFs are much cheaper and still relevant, the insights offered by otherwise more expensive physical realisations cannot be obtained from simulations. In addition, standardised assessment methodologies for PUFs are needed, both in terms of new proposals and of consensus within the academic community. In [61] (reproduced in Part II), we work towards addressing these issues.

First, we empirically verify the vulnerability of simple and 2-XOR 64-stage Arbiter PUFs, implemented in a modern 65nm CMOS technology, to machine learning-based modelling. The majority of related previous work on PUF modelling has dealt only with mathematical or circuit simulations of PUFs [97, 110], instead of using data from a silicon implementation. Just as importantly, none of them has assessed the effect of modelling on PUF applications. Second, we propose a generic methodology for assessing the impact of modelling on the two most popular PUF applications. For challenge-response entity authentication, we propose a way of assessing the security vs. usability tradeoff, which directly relates to security costs. For secure cryptographic key generation, we calculate upper bounds on the number of secure key bits extractable from PUF responses. In both cases we consider an adversary equipped with a relatively accurate machine learning-based model of the PUFs in question.

4.1.1 Machine Learning PUF Modelling

Before setting up any attack scenario, we consider an adversary willing to clone a PUF for some reason. One of the adversary’s strategies is to learn the underlying PUF behaviour that generates a PUF response given a particular challenge. A suitable approach to this problem is to use a supervised learning algorithm aiming to generalise the PUF behaviour by analysing a limited number of challenge-response pairs extracted from the targeted PUF. Following this line of reasoning, we consider an adversary acquainted with two popular machine learning techniques: Artificial Neural Networks [53] and Support Vector Machines [34]. These techniques have shown themselves to be flexible enough to learn a large variety of linear and non-linear models without necessarily depending upon prior assumptions that are often restrictive and/or require additional specific parameters.

In our experiments, we model simple and 2-XOR 64-stage Arbiter PUFs. The challenge-response pairs come from a 65nm CMOS implementation¹ of the simple 64-stage Arbiter PUF. The responses of the 2-XOR Arbiter PUFs are generated offline by XORing the responses of two different simple Arbiter PUFs input with the same challenge.

¹ We have a pool of 192 ICs (integrated circuits) containing the 8 PUF types presented in Section 3.5. Each IC implements 256 simple 64-stage Arbiter PUFs [10].

Our working hypothesis is that the larger the number of challenge-response pairs an adversary possesses, the more accurate the machine learning-based PUF model becomes. We train ANNs and SVMs with different numbers of challenge-response pairs. Afterwards, we check the accuracy of our PUF models in recreating responses extracted from the same PUF when excited with challenges not used in the model building phase. The parameters of our ANN- and SVM-based models are heuristically tuned using a simple grid search approach: a number of predefined candidate values for the parameters were tested, and the parameters leading to the highest accuracy in a validation phase were selected. This avoids introducing unnecessary complexity and is reasonable from an adversary’s perspective.
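
The snippet below illustrates the kind of simple grid search described, here with scikit-learn's GridSearchCV over a hypothetical SVM parameter grid and random stand-in data; the actual grids, features and validation setup used in [61] are not reproduced here.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# X: feature vectors derived from challenges, y: observed response bits
# (random stand-ins here; in the attack they come from collected CRPs).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 65))
y = rng.integers(0, 2, 1000)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.001]}  # candidate values
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print("selected parameters:", search.best_params_)
print("validation accuracy:", search.best_score_)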

The results of our experiments confirm our hypothesis for both simple and 2-XOR 64-stage Arbiter PUFs:

• For simple Arbiter PUFs, we observe that the accuracy of our models improves as more challenge-response pairs are fed into the model building phase. The accuracy of our ANN-based models reaches the response prediction upper bound, given by the Arbiter PUF robustness of our silicon implementation. Interestingly, as the number of challenge-response pairs used for model building decreases, our SVM-based models outperform our ANN-based models. We use a maximum of 5000 out of the 2^64 available Arbiter PUF challenge-response pairs for building the models. Our results show that a modern silicon Arbiter PUF implementation is totally vulnerable to modelling. We thereby corroborate results from previous work on Arbiter PUF modelling [110], based on simulated data, about the susceptibility of this PUF type to modelling.

• For 2-XOR Arbiter PUFs, we also observe that the accuracy of our models improves as more challenge-response pairs are provided in the model building phase. Our ANN-based models outperform our SVM-based models, but neither reaches the maximum given by the 2-XOR Arbiter PUF robustness. The reason is the relatively small amount of data used in our 2-XOR Arbiter PUF modelling experiments, which leads to underfitting of our models. Nevertheless, our models’ response predictions are considerably more accurate than random guessing, i.e. our average correct response prediction rate is remarkably greater than the PUF bias. In the 2-XOR Arbiter PUF case, modelling is more difficult due to the hard-to-invert nature of the XOR operation used in this PUF type. A larger number of challenge-response pairs is required for accurately modelling this PUF type.

We do not claim that either of the machine learning techniques discussed is better than the other in any sense in the context of PUF modelling. Our experimental setup and results do not allow us to make such a claim, mostly because the techniques are not optimally tuned in any strict sense. Nevertheless, the so-called No Free Lunch theorem [130], which should be used with care so that it does not sound like a poorly reasoned justification, tells us that there is no such thing as a universally best machine learning technique. However, we do show that a plain use of off-the-shelf machine learning techniques succeeds in modelling real simple and 2-XOR Arbiter PUFs. This is already critical for the security of these types of PUFs. With accurate modelling, the number of unpredictable (2-XOR) Arbiter PUF responses is reduced and the number of useful challenge-response pairs for a given PUF application becomes limited.

4.1.2 Modelling Impact on PUF Applications

If an adversary has a model of a PUF used in a critical part of some secure application, then the security of that application is directly compromised. In this case, by having access to the challenges input to the PUF, the adversary can recreate the corresponding responses, thus behaving as the legitimate PUF. The extent to which an adversary’s PUF model represents a practical threat to a secure application basically depends on two factors. One factor is the accuracy of the model in recreating ideally unclonable PUF responses. The other factor is the number of challenge-response pairs that is relevant for the proper functioning of the application.

Here, our working hypothesis is that the impact of PUF modelling on challenge-response entity authentication [98] and secure cryptographic key generation [87, 127] can be assessed in a methodological and quantified way. We perform experiments that provide security bounds for these two applications, considering an adversary equipped with the machine learning-based models of simple and 2-XOR 64-stage Arbiter PUFs built in the previous subsection.

The motivation for PUF-based challenge-response entity authentication schemes is the “unclonability” property of PUFs. The main idea is that a verifier challenges the untrusted, authenticating PUF-based entity and verifies the returned responses. A PUF-based entity is authenticated if at least a minimum number of its recreated responses match those stored earlier in a verifier’s database containing authentic PUF responses. The verifier should, however, tolerate a few response errors from the authenticating PUF-based entity, because PUF responses are not perfectly reconstructible. Therefore, the number of recreated response errors admitted by the verifier represents a tradeoff between security and usability in the challenge-response entity authentication scheme. If the verifier allows a small (resp. large) number of response bit errors, then there is a high chance that authentic (resp. non-authentic) PUF-based entities that happen to generate a large (resp. small) number of bit errors will be mistakenly rejected (resp. accepted). The analysis is equivalent to the receiver operating characteristic (ROC) of a binary classifier with a varying discrimination threshold.

An adversary can e.g. eavesdrop on a PUF-based challenge-response authentication protocol and thus obtain a number of challenge-response pairs to build his model. We assess how the accuracy of the adversary’s model relates to the security-usability tradeoff of a challenge-response entity authentication scheme by analysing authentication rejection and acceptance rates. Furthermore, we calculate lower bounds on the number of times the verifier should challenge the PUF-based device in order to ensure practically acceptable levels of security.
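
As a hedged numerical illustration of such an analysis (with invented figures, not those of [61]), the probability of being accepted can be written as a binomial tail in the per-response success probability, both for a genuine device and for an adversary relying on a model of a given accuracy:

from math import comb

def acceptance_probability(p_correct, n_responses, max_bit_errors):
    """Probability that a device (or model) answering each challenge correctly
    with probability p_correct is accepted when at most max_bit_errors errors
    are tolerated out of n_responses requested responses."""
    return sum(comb(n_responses, e) * (1 - p_correct) ** e * p_correct ** (n_responses - e)
               for e in range(max_bit_errors + 1))

# Hypothetical numbers: a genuine PUF with 95% reproducible responses versus
# an adversary whose model predicts responses with 75% accuracy.
n, t = 64, 8
print("genuine device accepted :", round(acceptance_probability(0.95, n, t), 4))
print("model-equipped adversary:", round(acceptance_probability(0.75, n, t), 4))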

Another important application suitable for PUFs relies on their implicit and static randomness to securely generate and store cryptographic keys. Fuzzy extractors [127] generate secure cryptographic keys from PUFs. They increase the reliability of the typically noisy PUF responses and compress their entropy into a fixed number of response bits. These particular response bits compose the secure cryptographic key. The length of the cryptographic key depends on the security level required by the application. The theoretical maximum number of secure key bits that can be extracted from a fuzzy secret – a vector containing different PUF responses from the same device – is given by the secrecy capacity [87]. The secrecy capacity, in turn, is defined as the mutual information between two realisations of the same fuzzy secret. These realisations differ with high probability due to the noisy character of PUF responses. Considering an adversary² having a model of the PUF used for generating secure key bits, we propose a methodology to estimate upper bounds on the secrecy capacity of a PUF in the light of information theory. The secrecy capacity is upper bounded in terms of the accuracy of the adversary’s model and the PUF response error rate.

² In practice, the attack scenario for secure key generation is a brute-force attack. Even if the adversary cannot access responses, he or she may have access to some challenges. The adversary guesses the responses of a limited number of obtained challenges – hopefully containing most of the entropy – and builds a nearly fully accurate model allowing him or her to accurately predict the remaining response bits composing the cryptographic key.

The results of our experiments indicate that our hypothesis is correct:


• For challenge-response entity authentication, we use an analysis based on the authentication acceptance/rejection rate with regard to the number of bit errors accepted by the verifier. We assess by how much an adversary in possession of a PUF model, created with a limited number of possibly eavesdropped challenge-response pairs, increases his or her chances of authenticating on behalf of an authentic PUF-based device. We show that 64-stage Arbiter PUFs are not suitable for challenge-response authentication. Regardless of the number of responses a verifier requests from a PUF-based entity in order to verify its authenticity, a man-in-the-middle eavesdropper is in theory able to collect enough challenge-response pairs to build an accurate PUF model even before the authentication protocol is over. For 2-XOR 64-stage Arbiter PUFs, the conclusion is less pessimistic, though. In order to succeed in impersonating an authentic 2-XOR 64-stage Arbiter PUF-based device, the adversary needs to eavesdrop on more than one protocol run between the PUF-based device and the verifier. Nevertheless, the number of possible secure authentications for 2-XOR 64-stage Arbiter PUFs is severely limited.

• For secure cryptographic key generation, we derive upper bounds on the secrecy capacity of our simple and 2-XOR Arbiter PUF implementations relative to an adversary’s modelling power. The underlying hypothesis for considering PUF modelling attacks is that responses generated by the same PUF are not independent. The probability of observing a particular response given other responses from the same PUF can be approximated by the accuracy of a good PUF model. The entropy of this approximated probability is at least as large as the entropy of the real probability, as there is more uncertainty in the former. This analysis allows us to upper bound the secrecy capacity’s mutual information after expanding its marginal and conditional entropy components. Therefore, our methodology to calculate the upper bound of a PUF’s secrecy capacity relies on the accuracy of the PUF model. The more accurate the PUF model is, the tighter our bounds become. It is important to note that no algorithms are known to reach the secrecy capacity’s theoretical maximum. For example, for 64-stage Arbiter PUFs, we show that the secrecy capacity is limited to at most 600 secure bits out of a pool containing 5000 PUF response bits. For 2-XOR 64-stage Arbiter PUFs, the secrecy capacity cannot be larger than twice the secrecy capacity of the simple 64-stage Arbiter PUF: the entropy of the output of a deterministic function, such as the XOR, cannot be larger than the combined entropy of its inputs. Last, in order to verify the convergence of the secrecy capacity’s upper bound, we evaluate what we call the incremental secrecy capacity. The incremental secrecy capacity expresses by how much the secrecy capacity increases when considering one additional response bit in the pool of response bits.


Our proposed methodology for assessing the implications of modelling attacks on PUF-based security applications is generic for any PUF type suffering from modelling and for any PUF modelling approach. We demonstrated its application using our machine learning-based modelling results on simple and 2-XOR 64-stage Arbiter PUFs implemented in 65nm CMOS. To the best of our knowledge, our proposed methodology to assess the effect of modelling on the security and usability of PUF applications helps to fill an existing gap in the research field of PUFs [80].


4.2 Performance and Security of Glitch PUFs

In [131] (reproduced in Part II), we carry out an independent performance and security analysis of the Glitch PUFs proposed by Suzuki et al. [124]. Performing an unbiased third-party evaluation of new PUF proposals is very important to verify designers’ results and claims, as well as to contribute new insights.

Glitch PUFs [14,36,100,115,124] were introduced with the goal of countering the relative ease of modelling inherent in some PUF types, such as the Arbiter PUF, while remaining practical to implement. The response generation mechanism of Glitch PUFs relies on the complex non-linear behaviour of electronic glitches appearing in some digital circuits. Suzuki et al. [124] propose the use of a composite field-based AES S-Box circuit as the glitch generator. A one-bit response is then generated according to the parity of the number of glitches present in one of the eight AES S-Box output signals. Suzuki et al. [124] consider Glitch PUF challenges to be composed of eight bits, which are input to the AES S-Box, in addition to three bits selecting one AES S-Box output bit as the PUF response.
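To make the response derivation concrete, the toy sketch below shows only the final step described above: given per-output glitch counts (which in reality are measured on chip, not computed in software), the one-bit response is the parity of the count on the S-Box output line selected by three challenge bits. The function name and the example counts are hypothetical.

```python
# Toy sketch of Glitch PUF response derivation (not a glitch simulator).
# Assumption: glitch counts per AES S-Box output line are measured in hardware;
# here they are simply passed in as a list of eight integers.

def glitch_puf_response(challenge_11bit: int, glitch_counts) -> int:
    """Derive a 1-bit Glitch PUF response.

    challenge_11bit: 8 bits drive the S-Box input, 3 bits select one of the
                     8 S-Box output signals (as described by Suzuki et al.).
    glitch_counts:   hypothetical per-output glitch counts observed on chip.
    """
    assert len(glitch_counts) == 8
    select = challenge_11bit & 0b111               # 3 selection bits
    # sbox_input = (challenge_11bit >> 3) & 0xFF   # 8 bits fed to the S-Box
    return glitch_counts[select] & 1               # response = parity of glitch count

# Example: the selected output toggled 5 times -> response bit 1.
print(glitch_puf_response(0b10110101_010, [2, 4, 5, 1, 0, 3, 7, 2]))
```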

We implement composite field AES S-Box-based Glitch PUFs in 20 FPGAs (henceforth interchangeably referred to as Glitch PUFs) and conduct performance and security assessments. Our main contribution is threefold. First, we point out that the number of challenge-response pairs evaluated in [124] is underestimated. This means that the performance and security results in [124] are incomplete. Glitches serviceable for PUFs are essentially generated when the challenge bits input to the AES S-Box implementation change from one value to another. We observe that the Glitch PUF response is equally affected by both the current and the previously applied challenge. For that reason, both the current and the previously applied challenge should in practice be regarded as part of the Glitch PUF challenge. Suzuki et al. [124] do not make this consideration. Though it is not clear in [124], it seems that they always reset the registers storing the PUF challenges to zero before inputting the new challenge. Therefore, Suzuki et al. [124] end up using a very limited subset of all possible Glitch PUF challenge-response pairs in their experiments.

Second, we reveal critical performance issues of Glitch PUFs when using all challenge-response pairs according to the aforementioned criterion. Our experiments expose stability- and randomness-related issues not reported in [124]. Such issues make us question the very PUF nature of Glitch PUFs, because they relate to their identifiability – one of the two defining PUF properties, together with physical unclonability, as concluded in [80].

Furthermore, PUF responses should be reproducible [80]. It is required that a PUF consistently outputs the same response given the same challenge. According to Section 3.5.1, the PUF intra-distance should be as close to zero as possible. We observe that in practice this does not hold for Glitch PUFs to a reasonable extent. Glitch PUF responses are not robust, especially when operating conditions such as temperature or supply voltage vary. As a consequence, Glitch PUFs are inappropriate for applications in which the supply voltage is unstable or in which, for example, the ambient temperature may push the PUF away from its nominal operating temperature. An adversary could exploit such undesirable Glitch PUF characteristics to gain an advantage over a Glitch PUF-based system.

Another PUF requirement is that the responses should be unique for each PUF [80]. It is expected that different PUFs of the same type produce different responses, especially when excited with the same challenges. According to Section 3.5.1, the PUF inter-distance should be as close to 50% as possible. We observe that Glitch PUFs present low uniqueness in practice. Our experiments show that different Glitch PUF instances generate the same response for a given challenge with a relatively high probability. Hence, differentiating a great number of Glitch PUFs is practically infeasible, thus limiting their large-scale application. An adversary who is able to model one Glitch PUF instance is therefore capable of modelling other Glitch PUF instances to some extent. We observe that the Glitch PUF reliability and uniqueness depend on the Hamming distance between the current and the previously applied challenge. In order to address the performance issues mentioned above, one has to carefully select a specific set of challenge-response pairs meeting the requirements of a given application. This is feasible from a technical standpoint, but the design costs would increase.
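A minimal sketch of such a selection step is given below: only challenge transitions whose Hamming distance falls within a chosen band are retained. The band [4, 6] and the helper names are purely illustrative assumptions, not values recommended in [131].

```python
# Minimal sketch: keep only challenge transitions whose Hamming distance lies in a
# chosen band. The band [4, 6] is purely illustrative.

def hamming(a: int, b: int) -> int:
    """Hamming distance between two equally sized challenge words."""
    return bin(a ^ b).count("1")

def select_transitions(challenges, low=4, high=6):
    """Yield (previous, current) challenge pairs with low <= HD <= high."""
    for prev, curr in zip(challenges, challenges[1:]):
        if low <= hamming(prev, curr) <= high:
            yield prev, curr

pairs = list(select_transitions([0x00, 0x3C, 0xFF, 0xF0, 0x0F]))
print(pairs)   # transitions with 4-6 flipped bits are kept, the 8-bit flip is dropped
```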

In our third contribution, we assess the predictability of Glitch PUF responses. Ideally, all responses should be equally unpredictable and independent. We show that Glitch PUFs, on the contrary, have a set of responses that are more easily predictable than the others. We perform an analysis of the number of glitches occurring in each AES S-Box output with regard to the current and previously applied challenge. This analysis reveals clear patterns relating some PUF responses to particular challenges. The composite field-based AES S-Box implementation is to blame: the transition between the previous and current challenge sometimes does not change internal variables of the implementation that are essential for generating glitches. A machine learning-armed adversary could, for example, incorporate this information into the PUF model to improve its Glitch PUF modelling accuracy³. As a prevention measure, Glitch PUF responses known to be more easily predictable should be discarded before deploying Glitch PUFs in a security application.

³We briefly experimented with off-the-shelf machine learning techniques for Glitch PUF modelling, but did not achieve any improvement over randomly guessing responses.

Our performance and security findings on Glitch PUFs as proposed by Suzuki et al. [124] lead us to conclude that the AES S-Box implementation using a composite field should not be used as a glitch generator for Glitch PUFs. There are two main reasons for this. First, such Glitch PUFs present serious stability and randomness issues that disqualify them from being a PUF. Second, the composite field-based AES S-Box implementation permits an adversary to inversely map responses to challenges. Furthermore, in [131] we argue that alternative AES S-Box implementations are not suitable for generating glitches for Glitch PUFs either. We finally extend our conclusion and strongly suggest that AES S-Boxes should not be used as glitch generators for Glitch PUFs.


4.3 Machine Learning in Power Analysis

The connection between machine learning and cryptography is not new. Kearns, in his PhD thesis [65] published in 1989, investigated how to reduce problems from cryptography to machine learning problems. In addition, in 1991 Rivest [106] published a very interesting article on the synergies between machine learning and cryptography, even referring to them as “sister fields”. Machine learning techniques have already been applied to side-channel analysis as well. For example, by using acoustic information acquired through a microphone, researchers have succeeded in revealing words printed by dot-matrix printers [18] and typed on computer keyboards [17, 133]. Embedded devices, in turn, require security measures – and consequently cryptographic primitives – to be physically implemented. As discussed in Section 3, electronic implementations of cryptographic primitives end up creating breaches that can be exploited using attacks like side-channel analysis.

To the best of our knowledge, our work in [58], concurrently with Lerman et al. [73], is the first to merge machine learning and power analysis. Both papers were independently presented at the relatively new but well-regarded peer-reviewed workshop dedicated to side-channel analysis, the International Workshop on Constructive Side-Channel Analysis and Secure Design (COSADE), in 2011. Six out of the 22 accepted submissions were subsequently selected for publication in the Journal of Cryptographic Engineering (JCEN). The JCEN version of our work [59] is presented in Part II of this dissertation.

The power consumption of a device is known to carry information about the internals of the device, such as processed data and performed operations. There are basically two general attack scenarios for side-channel analysis: profiled and non-profiled attacks. We focus on profiled power consumption attacks aiming at recovering (pieces of information about) the cryptographic key that is processed in a cryptographic embedded device. In the profiling or training phase, the adversary has access either to the target device or to a copy thereof, so that the device can be scrutinised. The adversary collects a number of power traces that are labelled according to the adjustments made to either input or internal parameters of the device in a controlled experiment. Afterwards, the adversary creates profiles for the power traces that share the same label. In the attacking/testing phase, the adversary relies on the previously created profiles (templates) to analyse yet unseen power traces and ideally reveal the targeted cryptographic key.
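As an illustration of this profile-and-match flow, the sketch below builds a simplified Gaussian template (mean and per-sample noise variance) for each label and classifies unseen traces by maximum likelihood. It uses synthetic traces with a single leaking sample; real Template Attacks typically use multivariate Gaussians over selected points of interest, so this is only a minimal stand-in.

```python
# Sketch of a profiled power-analysis attack with simplified Gaussian templates.
# Assumption: traces are labelled by an intermediate value (e.g. one S-Box output bit).
import numpy as np

def build_templates(traces, labels):
    """Per-label mean and per-sample noise variance -> simple (diagonal) templates."""
    templates = {}
    for lab in np.unique(labels):
        cls = traces[labels == lab]
        templates[lab] = (cls.mean(axis=0), cls.var(axis=0) + 1e-12)
    return templates

def classify(trace, templates):
    """Return the label whose Gaussian template gives the highest log-likelihood."""
    def loglik(mean, var):
        return -0.5 * np.sum((trace - mean) ** 2 / var + np.log(var))
    return max(templates, key=lambda lab: loglik(*templates[lab]))

# Synthetic demo: two classes with slightly different mean power at one sample.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 2000)
traces = rng.normal(0, 1, (2000, 50))
traces[:, 20] += 0.5 * labels                      # tiny data-dependent leakage
tmpl = build_templates(traces[:1500], labels[:1500])
acc = np.mean([classify(t, tmpl) == l for t, l in zip(traces[1500:], labels[1500:])])
print(f"test accuracy ~ {acc:.2f}")
```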

In [48], the claimed superiority of Template Attacks [30] in terms of key-prediction accuracy in the context of profiled power analysis attacks is put to the test against stochastic methods [113]. It was concluded that stochastic methods are more efficient in the sense that they are more accurate than Template Attacks when the amount of training data is small. On the other hand, Template Attacks squeeze more information about the processed cryptographic key out of the power traces, thus confirming their eminence in an information-theoretic sense, provided that the size of the training data set is adequate. The reason behind this is the accurate estimation of the statistics (probability mass functions) of the noise in the power traces. We therefore benchmark our machine learning approach to power analysis against Template Attacks.

We used a relatively new machine learning technique in this first study on the application of machine learning techniques to power analysis: Least Squares Support Vector Machines (LS-SVM) [123]. LS-SVMs, as introduced by Suykens et al. [123], are a reformulation of traditional SVMs [34]. The former is trained by solving a linear system, instead of the quadratic programming problem involved in the latter. In addition to simplifying the functioning of standard SVMs, LS-SVMs have performed very well on a large variety of classification tasks [47] in comparison to well-established learning techniques such as linear discriminant analysis [40], logistic regression [88], etc.

As it turns out, most machine learning techniques require tuning at some point in order to yield good results. Using power traces measured from an unprotected software implementation of a core part of the AES algorithm, we show the impact of the LS-SVM tuning parameters on the recovery performance of key-related pieces of information intricately hidden in the power traces. In order to build solid knowledge on the application of machine learning techniques to power analysis, we initially consider simplified but important one-bit prediction scenarios. Our attacks distinguish between two different classes related to the output of one AES S-Box implemented in software. This approach yields insights that pave the way towards recovering the full AES key.
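The following self-contained sketch shows the LS-SVM classifier in its standard dual form (a linear system rather than a quadratic program) on a toy two-class problem standing in for the two S-Box-related classes. The regularisation parameter gamma and the RBF bandwidth sigma2 are illustrative values only, not the tuned parameters reported in [59].

```python
# Minimal LS-SVM binary classifier (Suykens et al. formulation), solved as a linear
# system instead of a QP. gamma and sigma2 are illustrative, untuned values.
import numpy as np

def rbf(X, Z, sigma2):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma2))

def lssvm_train(X, y, gamma=1.0, sigma2=1.0):
    """y in {-1,+1}. Returns (alpha, b) from the LS-SVM dual linear system."""
    n = len(y)
    Omega = np.outer(y, y) * rbf(X, X, sigma2)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                  # alpha, bias

def lssvm_predict(X, Xtrain, ytrain, alpha, b, sigma2=1.0):
    return np.sign(rbf(X, Xtrain, sigma2) @ (alpha * ytrain) + b)

# Toy use: separate two Gaussian blobs (stand-ins for the two S-Box output classes).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.r_[-np.ones(50), np.ones(50)]
alpha, b = lssvm_train(X, y, gamma=10.0, sigma2=0.5)
print("train accuracy:", (lssvm_predict(X, X, y, alpha, b, 0.5) == y).mean())
```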

In summary, we find that our LS-SVM approach to power analysis is as effective as Template Attacks with respect to classification accuracy in the proposed attack scenarios, as detailed in [59]. References to our work can already be found in the literature [20,54,55,72,101,134]. Motivated by the results of our work, Zohner et al. [134] use SVMs to perform side-channel analysis on four out of the five finalists of the SHA-3 cryptographic hash algorithm competition [9].


4.4 Statistical Digital Image Steganography

In a digital context, steganography deals with hiding, and identifying the presence of, hidden information within innocuous media. In this section, we first emphasise the importance of the oft-forgotten field of steganography. In our opinion, steganography deserves much more attention, especially from the academic community. We subsequently summarise our contribution to digital image steganography through statistical restoration. Before moving ahead, however, it is worth recapping that steganography clearly differs from watermarking and cryptography:

“Steganography is concerned solely with the imperceptibility of a message embedded in an unsuspected carrier, but not with the robustness of the embedded message with respect to manipulation of the carrier.”

“Watermarking is not necessarily concerned with the imperceptibility of a message embedded in some carrier, but typically with the robustness of the embedded message with respect to manipulation of the carrier.”

“Cryptography, or more specifically an encryption primitive, is not concerned with the imperceptibility of a message, but with the total unintelligibility of the message to a third party. An authentication primitive would be concerned about proving the truth of an entity or piece of data.”

Shannon himself considered concealment systems such as steganography to be “primarily a psychological problem” [114]. He did not investigate steganography (nor psychology, to the best of our knowledge). When contrasting steganography with both watermarking and cryptography, research on the latter group seems to be much more alluring to academia and industry. The reason is straightforward and needs no references. Watermarking and cryptography, unlike steganography, can be directly deployed in tangible commercial applications involving, for example, digital rights management and electronic banking transactions – to mention only two strongly funding-appealing applications. Furthermore, the categorisation of steganography as security through obscurity, thereby opposing the commonplace transparent security principle of Kerckhoffs, does not render steganography any less worthy of attention. Some security experts discard steganography as a threat because of its level of complexity in comparison to alternative stealth techniques (such as plain data smuggling), in addition to it being conspicuous. Although steganography should be deployed cleverly, it does not have to be technically complex. The bottom line, nonetheless, is that whilst watermarking does not fundamentally fulfil the unique steganographic purpose of imperceptibility, cryptography by itself raises enormous suspicion through its incomprehensible, contrived ciphertexts.

Steganography is far from being a security-related paranoia, as one might imagine. Independently of steganography’s popularity or the public level of concern, the classified intelligence community has fortunately been constantly on guard and well aware of sneaky steganography. German investigators found in 2011 several dozen documents containing very serious security-threatening information hidden in a video in the possession of a suspect, as reported by the newspaper Die Zeit [94] about one year later. Rumours that steganography was used by those involved in the 9/11 attack in the U.S. to establish hidden communication also circulate on the Internet. Understandably, intelligence agencies cannot disclose much information about their investigations to the public. It is therefore difficult to prove with data from real cases how relevant steganography actually is for society. Purdue University has conducted research showing that criminals make use of steganography in the field [3]. We believe that research on steganography is of utmost importance. Steganography undoubtedly represents a potential weapon for creative criminals to hide and disseminate information concerning, for example, attack plots and child pornography within natural digital images via web pages or social network websites.

In [60] (reproduced in Part II), we empirically investigate steganography for natural digital images. Playing the role of one who hides information, we propose ways of hiding the largest amount of information within images without severely distorting them statistically and visually. Reversing roles, we implement steganalysts based on Artificial Neural Networks attempting to detect the presence of hidden messages in our images. Our main motivation is the approach of steganographic capacity estimation that Sakar and Manjunath propose in [111]. Our objective is to secretly hide a larger number of information bits. As in [111], we embed information within the Discrete Cosine Transform (DCT) coefficients of image blocks. The overall information embedding process is composed of two steps: the actual information embedding, and the statistical compensation of the DCT coefficients altered by the previous processing. A subset of DCT coefficients is reserved for statistical compensation and is therefore unavailable for information embedding.

Our strategy to embed more information within natural digital images than [111] consists of automatically performing part of the statistical compensation already in the information embedding phase. Our approach therefore reduces the number of DCT coefficients that would otherwise be required exclusively for statistical compensation. As a result, additional DCT coefficients are made available for hiding extra bits of information. We adapt both the information embedding and the statistical compensation methods used in [111]. The merger of one of our information embedding approaches with the statistical compensation used in [111] allows for imperceptibly hiding 8.3% more bits of information within natural digital images than in [111], under the analysis of our ANN-based steganalysts. Our disguised images present a relatively low average peak signal-to-noise ratio of 37 dB, meaning that a person with a trained eye can still capture nuances of the hidden message within the images. Due to the first-order simplicity of the information embedding and statistical compensation approaches used in [60], our compensated stego images are not expected to pass any advanced higher-order statistical steganalysis test. However, if no proper attention is given to steganography, then even the simplest steganographic approaches may represent a powerful tool to hide and share any kind of information.
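The embed-then-compensate idea can be illustrated with the highly simplified sketch below: a few mid-frequency DCT coefficients of an 8x8 block carry message bits via a crude parity-quantisation rule, and a reserved set of coefficients is then shifted so that a first-order statistic (here simply the mean of the coefficient band) is restored. The coefficient indices, step size and mean-matching rule are hypothetical; the actual schemes in [60] and [111] restore coefficient statistics far more carefully.

```python
# Highly simplified sketch of embed-then-compensate in the DCT domain of an 8x8 block.
# Only the mean of a coefficient band is restored; the schemes in [60]/[111] restore
# coefficient statistics (histograms), not just the mean.
import numpy as np
from scipy.fftpack import dct, idct

def dct2(b):  return dct(dct(b, norm='ortho', axis=0), norm='ortho', axis=1)
def idct2(b): return idct(idct(b, norm='ortho', axis=0), norm='ortho', axis=1)

def embed_block(block, bits, embed_idx, comp_idx, step=2.0):
    """Embed bits as the parity of quantised DCT coefficients, then shift the reserved
    compensation coefficients so that the band mean is unchanged."""
    C = dct2(block.astype(float))
    flat = C.reshape(-1)
    band = list(embed_idx) + list(comp_idx)
    mean_before = flat[band].mean()
    for i, bit in zip(embed_idx, bits):            # odd/even quantisation embedding
        q = int(np.round(flat[i] / step))
        if q % 2 != bit:
            q += 1
        flat[i] = q * step
    mean_after = flat[band].mean()
    flat[list(comp_idx)] += (mean_before - mean_after) * len(band) / len(comp_idx)
    return idct2(flat.reshape(8, 8))

block = np.arange(64).reshape(8, 8) * 3.0          # stand-in for one image block
stego = embed_block(block, bits=[1, 0, 1], embed_idx=[9, 10, 17], comp_idx=[18, 25, 26])
print(np.round(stego - block, 2))                  # small, mean-compensated change
```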

4.5 Conclusion

Machine learning – or, more generally, model building – techniques combined with technical security expertise represent a powerful tool for analysis in different electronic security areas. In this chapter we summarised our contributions on: 1) PUF Modelling Consequences; 2) Performance and Security of Glitch PUFs; 3) Machine Learning in Power Analysis; 4) Statistical Digital Image Steganography. All of them are clearly connected to machine learning and security analysis. The publications corresponding to each contribution are presented next in Part II of this dissertation.

Chapter 5

Conclusion

“Eu só quero é ser feliz, andar tranquilamente na favela onde eu nasci, é. E poder me orgulhar, e ter a consciência que o pobre tem seu lugar.”

(“I just want to be happy, to walk in peace in the favela where I was born, and to be able to take pride, and to know that the poor have their place.”)

— Cidinho e Doca, Rap da Felicidade

We defend against ourselves. Security is a human need. People have been creating all kinds of protection mechanisms since the beginning of human history. This is no different today, and it will not be different in the future. Besides securing our families, our own lives, and our belongings, another major concern is securing information. In a modern context, embedded electronic devices with sensing and communication capabilities have become increasingly omnipresent. They create and process more and more information. Security remains in demand. Devastating physical attacks such as side-channel analysis imply that cryptographic techniques must be securely implemented to actually protect information. Beyond existing technologies, there is always room for innovation. Physically Unclonable Functions (PUFs) convert the undesirable variations that are bound to occur in the manufacturing process of integrated circuits into a useful security feature. Last but not least, special attention should be given to the fact that the growth in the amount of digital data generated represents a potential pathway for illegally hiding dubious information via steganography.

In this dissertation we have addressed several aspects of modern security. We built a top-down line of reasoning starting from Freud and arriving at cutting-edge paradigms of the Internet of Things. After thoroughly motivating our need for security, we went through the enabling capabilities of security, touching upon its people, process and technology domains. We dared to frame security as a health and wellbeing enabler for businesses. Even after shifting gears to more technical domains, our dissertation remained interdisciplinary. We covered topics ranging from information security techniques to security aspects of physical implementations.

Starting with the youngest, PUFs form a relatively new research field. Research on the topic has grown considerably in the last few years. In this dissertation we have proposed a generic methodology for assessing the impact of modelling attacks on PUF applications (authentication and key generation). By feeding our framework with data obtained from experiments with physically implemented PUF instances, we have concluded that the simple Arbiter PUF is not secure in the presence of a modelling-capable adversary, supporting previous works using simulated challenge-response pairs. Although generating responses by “XORing” a few Arbiter PUFs considerably complicates modelling, the pertinence of 2-XOR Arbiter PUFs for security applications has also been shown to be limited. Popular machine learning techniques suffice to model Arbiter PUFs. We have also exposed serious security and performance issues of Glitch PUFs that use an AES S-Box implementation as a glitch generator. Glitch PUFs may be better off using a more unpredictable source of glitches, ideally with a less complex circuit. PUF proposals have to continue to be independently analysed in order to confront security claims and provide alternative insights. Our framework for the analysis of the consequences of PUF modelling on security applications has the potential to reveal interesting aspects of other PUF constructions as well. Research on PUFs still has much to offer and needs to mature in certain aspects, such as reaching a consensus on defining properties. In this dissertation, we also concluded that machine learning techniques can represent a strong and generic alternative to Template Attacks for power analysis of traditional CMOS circuits implementing cryptographic ciphers. Side-channels will leak information on the internals of an implementation. Further research on side-channel analysis may range from improving leakage models to introducing smart countermeasures. Last, we have proposed a steganographic approach to hide information within natural digital images. Our method embeds bits of information while restoring the original statistics of the discrete cosine transform coefficients of the image. Most importantly, research on steganography should keep developing towards detecting criminal messages concealed within innocent-looking media.

Security is not an easy game, not even for game theorists. Oftentimes one cannot prove that something is secure beyond the most powerful existing attack. On the other hand, resistance against the strongest attack does not logically imply supreme security. Other times, provable security is found to be flawed. There is no generic solution for security. The most powerful tool available, whether for good or evil, is what we call a why & will combo, i.e. motivation and perseverance.

Part II

Publication-based Chapters


Publication-based Chapter

Machine Learning Attacks on 65nm Arbiter PUFs: Accurate Modeling poses strict Bounds on Usability

Publication Data

Gabriel Hospodar, Roel Maes, and Ingrid Verbauwhede. Machine Learning Attacks on 65nm Arbiter PUFs: Accurate Modeling poses strict Bounds on Usability. In 4th IEEE International Workshop on Information Forensics and Security (WIFS 2012), IEEE, pages 37-42, 2012.

Contribution

• Shared authorship.

• Section 4 (Discussion) by Roel Maes.


Machine Learning Attacks on 65nm Arbiter PUFs: Accurate Modeling poses strict Bounds on Usability

Gabriel Hospodar, Roel Maes, and Ingrid Verbauwhede

ESAT/SCD-COSIC and IBBT (iMinds), KU Leuven, [email protected]

Abstract. Arbiter Physically Unclonable Functions (PUFs) have been proposed as efficient hardware security primitives for generating device-unique authentication responses and cryptographic keys. However, the assumed possibility of modeling their underlying challenge-response behavior causes uncertainty about their actual applicability. In this work, we apply well-known machine learning techniques on challenge-response pairs (CRPs) from 64-stage Arbiter PUFs realized in 65nm CMOS, in order to evaluate the effectiveness of such modeling attacks on a modern silicon implementation. We show that a 90%-accurate model can be built from a training set of merely 500 CRPs, and that 5000 CRPs are sufficient to perfectly model the PUFs. To study the implications of these attacks, there is need for a new methodology to assess the security of PUFs suffering from modeling. We propose such a methodology and apply it to our machine learning results, yielding strict bounds on the usability of Arbiter PUFs. We conclude that plain 64-stage Arbiter PUFs are not secure for challenge-response authentication, and the number of extractable secret key bits is limited to at most 600.

Keywords: Physically Unclonable Function (PUF), Challenge-response authentication, Secure key generation, Modeling, Machine Learning, Security bounds.

1 Introduction

Implementations of classical cryptographic primitives rely heavily on the ability to securely store secret information. Regardless of higher level abstractions and protection mechanisms, the lowest representation of a secret is always of a physical nature. In modern digital implementations, this typically comes down to a binary vector stored in a silicon memory, which hence requires physical security measures. This turns out to be a non-trivial requirement since standard cryptographic techniques cannot be used any longer, and security is often reverted back to obscurity, e.g. by hiding secret data in a complex chip layout or beneath dense metal layers to prevent visual scrutiny. Silicon Physically Unclonable Functions [46], or PUFs, have been proposed as a physically more secure alternative to storing secrets in a digital memory. Instead of binary values, they use random nanoscale structures which occur naturally in silicon devices in order to store secrets. This offers a higher level of security against physical attacks, and additionally has the potential to be more efficient since costly physical protection measures can be avoided.

A structure is said to show PUF-behavior if it efficiently generates a response when challenged, with the challenge-response behavior depending in an unpredictable and unique way on the challenge and on the considered physical instantiation. A number of different integrated circuit (IC) designs exhibiting PUF-behavior have been proposed. We refer to [83] for an extensive overview. A particularly interesting construction is the so-called Arbiter PUF, which is able to generate an exponential number of response bits based on the random outcome of a race condition on a silicon chip. Lee et al. already showed for their original Arbiter PUF implementation in 0.18µm CMOS that it was vulnerable to modeling attacks [75], which severely reduces the number of unpredictable response bits. In later work, more elaborate modeling attacks on Arbiter PUFs, including anti-modeling countermeasures, were proposed [97, 110]. However, these results only use data obtained from mathematical or circuit simulations, not from actual implementations, and the achieved modeling results should be considered in this perspective. Moreover, neither Lee et al. nor later works on Arbiter PUF modeling formally specify what the implication of their modeling attacks is on the actual security of using Arbiter PUFs.

The main goal and contribution of this work is twofold:

i) The susceptibility to modeling for simple and 2-XOR Arbiter PUF implementations in 65nm CMOS is evaluated by assessing the model building performance of two different machine learning techniques: Artificial Neural Networks and Support Vector Machines. This evaluation shows how the results from Lee et al. scale for an independent implementation in a modern silicon technology, and moreover provides a physical verification for the simulation-based modeling results.

ii) The usability of a PUF in the presence of modeling attacks is evaluated for challenge-response authentication and for secret key generation. This results in practical security bounds for these applications. The proposed methodology is generic, but the presented bounds are specifically for our modeling results on the Arbiter PUF implementation.

This paper is organized as follows. Section 2 presents background information on Arbiter PUFs and machine learning. Section 3 details the PUF implementation and the modeling experiments and results. A discussion on the implications of PUF modeling for security applications is put forward in Section 4. Finally, Section 5 concludes the work.

2 Background

2.1 Arbiter PUFs

Arbiter PUFs [75] are a type of silicon PUFs for which the PUF-behavior is caused by the intrinsic manufacturing variability of the production process of ICs. They are constructed as a concatenation of stages, with each stage passing two inputs to two outputs, either straight or crossed depending on a challenge bit. The propagation of two signals through an ℓ-stage Arbiter PUF is determined by an ℓ-bit challenge vector. By careful design, the nominal delays of both paths are made identical. However, the effective delays of both paths in a particular implementation are never exactly deterministic, but are subject to random delay mismatch caused by the IC manufacturing variability. As a consequence, one of both paths will propagate a signal slightly faster or slower than the other, depending on the considered physical implementation, and depending on the applied challenge vector. An arbiter sitting at the end of both paths determines on which of the outputs a rising edge, applied simultaneously to both inputs, arrives first. The arbiter outputs a one-bit response accordingly.

An Arbiter PUF implementation generates 1-bit responses from ℓ-bit challenges, and is hence able to produce up to 2^ℓ different CRPs. Lee et al. [75] immediately realised that these 2^ℓ different response bits of an Arbiter PUF are not independent, but can be modeled by an additive linear delay model with a limited number of unknown parameters. An adversary can attempt to estimate these parameters for a particular Arbiter PUF from a set of qtrain known CRPs, e.g. using machine learning techniques. Once an accurate model of the PUF is built, the remaining 2^ℓ − qtrain response bits are not random anymore but can be predicted by the adversary. Lee et al. successfully constructed a model of their 0.18µm Arbiter PUF implementation which achieved prediction success rates up to 97%. More efficient and more accurate model building attacks on Arbiter PUFs, based on different modeling techniques, were subsequently proposed in [97, 110]. However, besides the initial work from Lee et al., all later model building attempts use responses generated by simulated Arbiter PUFs, instead of measurements from physical implementations. Although these are valuable contributions for introducing new PUF modeling techniques and hypothesizing powerful attacks, their numerical outcomes were (to the best of our knowledge) never verified on a modern physical implementation. One of the main goals of this work is to provide this physical verification.

Parallel to modeling attacks on Arbiter PUFs, countermeasures were introduced aimed at preventing modeling. Their basic idea is to disturb the linearity of the delay model by adding non-linear elements to the response generation, such as feed-forward challenge bits [75] and exclusive-or (XOR) combinations of responses [85, 121]. Nonetheless, it was demonstrated based on simulations of PUFs [110] that advanced machine learning techniques are still able to generate good models after training with a larger number of CRPs. Besides the simple Arbiter PUF as described above, we will consider the k-XOR Arbiter PUF [85,110,121] consisting of k equally challenged simple Arbiter PUFs, with the response bit of the k-XOR Arbiter PUF being the XOR of the k separate arbiter outputs.
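For readers unfamiliar with the additive linear delay model referred to above, the following sketch simulates an Arbiter PUF instance by drawing one delay-difference weight per stage (plus a bias) and thresholding a weighted sum of challenge-dependent parity features; a k-XOR response is simply the XOR of several such instances. This is a pure simulation under standard modelling assumptions, not a model of the 65nm chips measured in this paper.

```python
# Simulated 64-stage Arbiter PUF via the standard additive linear delay model,
# plus a k-XOR combination of several instances. The delay parameters of the
# real 65nm chips are of course unknown; everything here is synthetic.
import numpy as np

class SimArbiterPUF:
    def __init__(self, n_stages=64, rng=None):
        rng = rng or np.random.default_rng()
        # One weight per stage plus a bias term models the accumulated delay difference.
        self.w = rng.normal(0, 1, n_stages + 1)

    @staticmethod
    def features(challenge):
        """Map challenge bits c_i in {0,1} to the parity features of the additive
        delay model: phi_i = prod_{j>=i} (1 - 2*c_j), plus a constant 1."""
        c = 1 - 2 * np.asarray(challenge)          # {0,1} -> {+1,-1}
        phi = np.cumprod(c[::-1])[::-1]
        return np.concatenate([phi, [1.0]])

    def response(self, challenge):
        return int(self.features(challenge) @ self.w > 0)

def xor_puf_response(pufs, challenge):
    """Response of a k-XOR Arbiter PUF built from k simulated instances."""
    return int(np.bitwise_xor.reduce([p.response(challenge) for p in pufs]))

rng = np.random.default_rng(42)
puf = SimArbiterPUF(rng=rng)
pufs = [SimArbiterPUF(rng=rng) for _ in range(2)]   # 2-XOR Arbiter PUF
c = rng.integers(0, 2, 64)
print(puf.response(c), xor_puf_response(pufs, c))
```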

2.2 Machine Learning (ML)

Machine learning (ML) [91] is concerned with computer algorithms that automatically learn a complex behavior from a limited number of observations, by trying to generalize the underlying interactions from these examples. Since the apparently complex challenge-response behavior of a PUF is the result of an underlying physical system with a limited number of unknowns, appropriate ML techniques could be able to learn this behavior from a relatively small training set of qtrain known CRPs and use it to make accurate predictions of unknown responses. In this work, the ML techniques of Artificial Neural Networks (ANN) and Support Vector Machines (SVM) are tested for their ability to model physically realized Arbiter PUFs and k-XOR Arbiter PUFs. An advantage of ANN and SVM is their flexibility to learn any model, as opposed to techniques assuming a prior model with unknown parameters, which are more restrictive. In this work, the used ML techniques were heuristically tuned to give good results without introducing unnecessary complexity.

Artificial Neural Networks (ANN)

ANNs are adaptive systems formed by interconnected computing nodes called neurons, which are typically structured in feedforward layers. A strong motivation to use ANN is given by the Universal Approximation Theorem [57], which in short states that a two-layer feedforward ANN containing a finite number of hidden neurons can approximate any function with arbitrary precision. However, the theorem does not hint at how to tune a network to efficiently reach such an approximation. The simplest form of an ANN consists of a single layer of neurons and is called the single layer perceptron (SLP) [107]. In each neuron, all input vector values are weighted, summed, biased and applied to an activation function to generate an output. In SLP training, a neuron’s weights and bias are updated according to a linear feedback function of the prediction error on a known training set. The training process stops when the prediction error reaches a predefined value or a predetermined number of iterations is completed. SLPs are only capable of solving linearly separable classification problems. Multilayer ANNs are required for nonlinear problems. In this work, the multilayer ANNs are trained by the resilient backpropagation (RProp) [105] training algorithm because it offers fast convergence. RProp is an improved version of the SLP training algorithm based on the gradient descent method. The important tuning parameters that should be set to create accurate ANN models are: the number of layers and neurons in each layer, the activation function of each neuron and the training algorithm.

Support Vector Machines (SVM)

SVM is a ML technique able to learn a binary classification pattern from a set of training examples. In the learning phase, known training examples are mapped into a higher dimensional space to relax the classification task. The learning algorithm tries to find a good separating hyperplane allowing to linearly solve classification problems that are not linearly separable in the original input space. The separating hyperplane should have the largest possible distance between input vectors belonging to different classes, and the inputs with minimal distance to the separating hyperplane are called support vectors. The separating hyperplane is constructed with the help of two parallel supporting hyperplanes through the corresponding support vectors. The distance between the supporting hyperplanes is called the margin. The basic idea of building a good SVM is to maximize the margin while minimizing the classification error. These conflicting goals are traded off by a regularization parameter γ. Trained SVMs rely heavily on the self inner product of the mapping function, called the kernel, evaluated respectively on the support vectors and on the challenge to be classified. Three commonly used kernels $K(\cdot,\cdot)$ are: i) Linear: $K(w,z) = z^T w$ (no mapping – solves only linearly separable problems); ii) Radial Basis Function (RBF): $K(w,z) = \exp\left(-\|w-z\|^2 / (2\sigma^2)\right)$; iii) Multilayer Perceptron (MLP): $K(w,z) = \tanh(\kappa_1 z^T w + \kappa_2)$. The important tuning parameters for a good SVM classifier are: γ, and σ² (RBF) or (κ₁, κ₂) (MLP).


3 Experiments and Results

3.1 PUF Implementation and Experiment Setup

The studied Arbiter PUFs were implemented in silicon using TSMC’s 65nm CMOS process. To minimize systematic bias, the layout of the delay line stages and of the arbiter element was fully custom designed. A test board for the ICs provides a convenient digital interface to the PUFs allowing efficient collection of CRPs using a standard PC. A total of 192 ICs were produced, each one implementing 256 simple 64-stage Arbiter PUFs as described in Section 2.1. At nominal operating conditions, the measured response bits have a robustness of 97% (ratio of error-free response reconstructions) and, despite the design effort, still exhibit a 60% bias towards zero. The uniqueness, measured as the rate of differing bits generated by the same PUF and challenge but on different ICs, is 48%. The responses of 2-XOR Arbiter PUFs were obtained by XORing the outputs of pairs of simple Arbiter PUFs from the same IC using the same challenges. The resulting 2-XOR Arbiter PUF responses have a robustness around 94%, a bias towards zero around 55% and a uniqueness close to 50%.

For the machine learning attacks 20 CRP data sets were used, obtained from four different Arbiter PUFs on five distinct ICs, with each data set containing 10,000 randomly selected challenges and 10 independent measurements of every response bit. A model’s performance is assessed by its success rate after training with a set of qtrain CRPs. This success rate SR(qtrain) is defined as the ratio of correct response predictions for CRPs from an independent test set. In a realistic attack scenario, qtrain would be the number of CRPs an adversary needs to obtain, e.g. by eavesdropping on a protocol, in order to build a model of the used PUF. To give an idea of the complexity: all models were trained in less than one minute on a standard machine (dual core @ 3 GHz, 4 GB of RAM).

3.2 Modeling Attacks and Results

The performance of ML attacks on the simple Arbiter PUF was evaluated for training set sizes ranging from qtrain = 25 up to 5,000. In order to obtain a meaningful estimate of the results of an attacker’s model, for each value of qtrain ten independent experiments are performed using different randomly selected training sets of size qtrain from each of the 20 data sets. Motivated by the fact that the adversary has the freedom to scrutinize the training set, and as we verified that the modeling performance depends strongly on the used training CRPs, 100 different models were created for each experiment. Each model was respectively built and validated on subsets formed by random splits of a training set containing 70% and 30% of the training CRPs. The model with the best validation results was selected to evaluate the success rate using a test set of 5,000 previously unseen CRPs.

[Figure 1: Box plots of obtained SR(qtrain) of our ML attacks. (a) Simple Arbiter PUF (qtrain = 25 to 5,000). (b) 2-XOR Arbiter PUF (qtrain = 2,000 to 9,000). Axes: SR(qtrain) [%] versus qtrain; curves: ANN, SVM, PUF Robustness, PUF Bias.]

In the Arbiter PUF additive delay model, e.g. as detailed in [85, 110], the response is shown to be linearly dependent on the cumulative XORs of the challenge bits, rather than on the challenge bits directly. Performing this nonlinear operation prior to training the ML algorithms substantially improves their performance: the ANN models use fewer neurons and the SVM models rely on fewer support vectors. Consequently, as the models get simpler, fewer training CRPs are required. The results for the ANN and SVM modeling attacks on the simple Arbiter PUF are shown in Figure 1a. The used ANNs consist of a single-neuron SLP using a threshold comparator as the activation function. The SVM models were based on linear kernels with γ = 0.1. The graph shows the box plots of SR(qtrain) for both techniques over all performed experiments on all 20 data sets. Also shown are the Arbiter PUF’s robustness and bias, which indicate practical upper and lower bounds, respectively, for the achievable success rates.
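As a self-contained illustration of this preprocessing, the sketch below simulates a 64-stage Arbiter PUF with hidden delay-model weights, maps challenges to the cumulative-XOR (parity) features, and fits a plain least-squares linear model for a few training-set sizes. The least-squares fit merely stands in for the ANN/SVM models used in the paper, and the printed success rates come from a noise-free simulation, so they will differ from the silicon results in Figure 1a.

```python
# Self-contained sketch: model a simulated 64-stage Arbiter PUF with a linear fit on
# the cumulative-XOR (parity) features, for a few training-set sizes.
import numpy as np

rng = np.random.default_rng(7)
n = 64
w_true = rng.normal(0, 1, n + 1)                     # hidden delay-model weights

def phi(C):                                          # parity feature map
    s = 1 - 2 * C                                    # {0,1} -> {+1,-1}
    f = np.cumprod(s[:, ::-1], axis=1)[:, ::-1]
    return np.hstack([f, np.ones((len(C), 1))])

def responses(C):
    return (phi(C) @ w_true > 0).astype(int)

C_test = rng.integers(0, 2, (5000, n))
y_test = responses(C_test)
for q in (50, 500, 5000):
    C_tr = rng.integers(0, 2, (q, n))
    y_tr = responses(C_tr)
    # Least-squares fit on +/-1 targets stands in for the ANN/SVM of the paper.
    w_hat, *_ = np.linalg.lstsq(phi(C_tr), 2 * y_tr - 1, rcond=None)
    sr = np.mean((phi(C_test) @ w_hat > 0).astype(int) == y_test)
    print(f"q_train={q:5d}  success rate ~ {sr:.2%}")
```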

On average, SVM yields more accurate Arbiter PUF models than ANN for qtrain ≤ 500, but ANN outperforms SVM for larger training sets. SVM achieves SR(50) ≈ 70% and for qtrain = 500, both SVM and ANN are able to predict responses with an accuracy close to 90%. For qtrain ≥ 5,000, ANN is able to perfectly model an Arbiter PUF by achieving success rates arbitrarily close to the PUF’s robustness. The decreasing height of the box plots indicates that the estimation of SR(qtrain) gets more accurate as qtrain increases.

Similarly, the modeling performance of ANN and SVM on 2-XOR Arbiter PUFs is evaluated. As their behavior is more complex, more training CRPs are required for effective modeling and we use training set sizes ranging from qtrain = 2,000 up to 9,000 CRPs, and 1,000 CRPs for the test set. Ten models are created for each experiment. The used ANNs consist of two layers with respectively four and one neurons, and respectively using hyperbolic tangent and linear activation functions. The SVM models were based on RBF kernels with (γ = 10, σ² = 3.16) for experiments with qtrain ≤ 6,000 and on MLP kernels with (γ = 2.7, κ₁ = 0.015, κ₂ = −1.2) for qtrain > 6,000.

Figure 1b shows the box plots of the obtained SR(qtrain) for all the experiments for the ANN and SVM models. This shows that SVM performs better than ANN when qtrain ≤ 3,000, but ANN outperforms SVM for qtrain > 3,000. ANN achieves SR(9,000) ≈ 87% and even up to 90% in certain experiments. Although this is still below the upper bound given by the PUF robustness, the steadily rising trend of both graphs suggests that ANN and SVM possibly reach near-perfect models respectively for qtrain ≈ 12,000 and qtrain ≈ 14,000, if a plateau does not occur before these values. The large spread of the observed SR(qtrain) values, as shown by the long box plot whiskers, indicates that in certain cases modeling is considerably harder than in the average experiment. This observation suggests that ad hoc adjustments of the ML tuning parameters can significantly improve the results. For k-XOR Arbiter PUFs with k > 2, the considered ML techniques perform considerably worse. ANN achieves SR(9,000) ≈ 75% for a 3-XOR Arbiter PUF. We emphasize that the modeling results can be optimized if qtrain increases and if more specific models are created, e.g. by fine tuning the parameters for each attack or using other ML techniques.

4 Discussion

In this section, the implications of model building attacks on the security of PUF-based applications are discussed. The provided results are specifically based on our modeling results of the Arbiter PUFs, but the introduced methodology can be applied to any type of PUF which suffers from model building.

4.1 Implications on Challenge-Response Authentication (CRA)

We first consider the implications of modeling attacks on a PUF-based CRA scheme [99]. More efficient and/or practical variants of this scheme have been introduced, but the core idea remains the same: during an enrollment phase, CRPs are collected from every device and stored in a verifier’s database; and in the verification phase, a device authenticates itself by proving that it can recreate (almost) the same PUF responses stored by the verifier. The assumed unclonability of PUFs ensures that only enrolled devices can be authenticated.

A PUF response is not perfectly reconstructible and a verifier needs to take this into account by allowing a number of errors when matching the regenerated with the stored response bits. This is often done by forgiving bit errors, or alternatively by applying some form of error correction on the responses, up to a certain error threshold t. If t is set too low, authentic PUFs that happen to have too many bit errors will be rejected; this is called false rejection. Setting t too high will cause non-authentic PUFs to be accepted when their responses happen to be too close to that of an authentic PUF, which is called false acceptance. The rates of false rejections (FRR) and false acceptances (FAR) cannot be optimized simultaneously, and setting t is a careful trade-off between security and reliability requirements. In this sense, a good performance indicator of a PUF-based CRA scheme is the point where FAR and FRR are equal; the corresponding failure rate is called the equal error rate (EER). In Figure 2a, the FAR and FRR of 64 response bits obtained from our Arbiter PUF implementations are plotted as a function of t. To determine the EER, both plots are extrapolated with a fitted binomial cumulative distribution to find their intersection at (t_EER = 11, EER = 5.0 · 10⁻⁷).

[Figure 2: PUF-based challenge-response authentication (CRA). (a) FRR, FAR and AAR graphs for a simple Arbiter PUF with N = 64: rate (log scale) versus error threshold t [bits]; curves: observed and extrapolated FRR, observed and extrapolated FAR, obtained AAR for qtrain = 100 and extrapolated AAR; the EER and AEER operating points are marked. (b) Lower bounds on N [bits] versus qtrain to reach AEER ≤ 10⁻⁶ and ≤ 10⁻⁹ for the simple and 2-XOR Arbiter PUFs.]


In a modeling attack on a PUF-based CRA scheme, an adversary tries to fool the verifier into believing he possesses an enrolled PUF, while in reality he only has a (possibly accurate) model thereof. He might have trained his model using eavesdropped CRPs from previous successful runs of the CRA protocol. We define the adversary acceptance rate (AAR) as the probability that an adversary, without direct access to an enrolled PUF, achieves a successful authentication. FAR is a lower bound for AAR, since an adversary can always try to authenticate with an unenrolled PUF. However, in general AAR > FAR, especially if the adversary possesses an accurate model of an enrolled PUF. Figure 2a also shows the AAR for a CRA scheme using N = 64 simple Arbiter PUF response bits, considering our modeling results as described in Section 3.2. For the adversary’s PUF model, we considered the ML technique with the best median success rate after being trained with qtrain = 100 random CRPs. It is clear from this graph that these modeling attacks severely reduce the security of the CRA scheme. If the verifier keeps allowing up to t_EER = 11 bit errors then the probability of a successful attack becomes as high as AAR(t = 11) = 19.2%! Aware of the existence of these attacks, it is wiser to select the point where FRR = AAR as a performance indicator for the CRA scheme. We will call this point the attack equal error rate (AEER) and for the considered example it lies at (t_AEER = 6, AEER = 5.3 · 10⁻³). We note that the actual value of AEER strongly depends on the considered adversary. We have evaluated AEER using the results of our ML attacks as reported in Section 3.2, but AEER will increase when better modeling attacks are found.

The design parameter of a CRA scheme directly affecting the value of AEER is the number N of used response bits per authentication, with AEER decreasing as N increases. In the discussed example, AEER = 5.3 · 10⁻³ for N = 64. However, a reasonable security-reliability trade-off in practice requires that AEER ≤ 10⁻⁶ down to ≤ 10⁻⁹. To obtain these bounds in the considered example, N needs to be increased to 214 or 371 respectively. When qtrain increases, the adversary’s model becomes more accurate and even more response bits are required to obtain practical security levels. Figure 2b shows the evolution of the lower bounds on the required number of response bits to achieve AEER ≤ 10⁻⁶ and ≤ 10⁻⁹ respectively for increasing qtrain. A particularly pessimistic conclusion from this plot is the observation that N > qtrain for all considered training set sizes. This implies that a simple Arbiter PUF can be authenticated at most once with a CRA scheme, since an adversary learns more than enough response bits from eavesdropping on one protocol run to build an accurate model which can impersonate the PUF during subsequent authentications. An adaptive adversary, capable of building and evaluating a model during a run of the CRA scheme, might even be able to accurately impersonate an Arbiter PUF during its very first authentication attempt.
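The FRR/FAR/AAR analysis above can be reproduced in outline with simple binomial error models, as sketched below: each rate is a binomial tail probability in the error threshold t, the AEER is read off where FRR and AAR cross, and N is swept to find the smallest value meeting a target AEER. The three per-bit error probabilities are illustrative stand-ins (not the measured chip statistics or the exact model accuracies of Section 3.2), so the printed numbers only mimic the shape of Figure 2.

```python
# Sketch of the CRA analysis: FRR, FAR and AAR from binomial error models, the
# (A)EER operating point, and the smallest N reaching a target AEER. The per-bit
# error probabilities below are illustrative stand-ins.
import numpy as np
from scipy.stats import binom

p_intra = 0.03      # authentic PUF: per-bit reconstruction error (1 - robustness)
p_inter = 0.48      # unrelated PUF: per-bit mismatch probability
p_model = 0.10      # adversary's model: per-bit prediction error (1 - SR)

def rates(N):
    t = np.arange(N + 1)
    FRR = binom.sf(t, N, p_intra)          # authentic device rejected (> t errors)
    FAR = binom.cdf(t, N, p_inter)         # unrelated device accepted (<= t errors)
    AAR = binom.cdf(t, N, p_model)         # model-equipped adversary accepted
    return t, FRR, FAR, AAR

def aeer(N):
    t, FRR, _, AAR = rates(N)
    i = np.argmin(np.abs(FRR - AAR))       # threshold where FRR ~ AAR
    return t[i], max(FRR[i], AAR[i])

for N in (64, 128, 256, 512):
    t_star, e = aeer(N)
    print(f"N={N:4d}  t_AEER={t_star:3d}  AEER~{e:.1e}")

N_min = next(N for N in range(64, 4096) if aeer(N)[1] <= 1e-6)
print("smallest N with AEER <= 1e-6:", N_min)
```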


Figure 2b also shows the same N vs. qtrain analysis for our ML results on the 2-XOR Arbiter PUF. The conclusion is not as pessimistic as for the simple Arbiter PUF since N < qtrain for all training set sizes, though not by a large factor, indicating that the number of possible secure authentications is also strictly limited. Moreover, the plots in Figure 2b are lower bounds on N based on our ML attack results, and any improvement upon our attacks will further increase them.

4.2 Implications on Secure Key Generation (SKG)

In this second analysis we investigate the effects of modeling on the usability of an Arbiter PUF in a Secure Key Generation (SKG) algorithm. We refer to [127] for an extensive background on how PUF responses can be considered as fuzzy secrets from which secure keys can be extracted. In line with [127], the implications of modeling attacks on PUF-based SKG are discussed from an information-theoretical viewpoint.

In the following, we denote a vector of $N$ response bits as a random variable $X^N = (X_1, X_2, \ldots, X_N)$ with $X_i$ a single response bit, and a subvector consisting of the first $j$ bits as $X^{(j)} \equiv (X_1, \ldots, X_j)$. By $p_i$ we mean the conditional probability of $X_i$ after observing $x^{(i-1)}$: $p_i \equiv \Pr(X_i = 1 \mid x^{(i-1)})$. The operators $H(\cdot)$ and $I(\cdot\,;\cdot)$ respectively stand for entropy and mutual information, and the binary entropy function is defined as $h(p) \equiv -p \log_2 p - (1-p) \log_2(1-p)$.

The premise of considering PUF modeling attacks is the assumption that different response bits generated by the same PUF are not independent. The real conditional probability $p_i$ cannot be learned, but it can be approximated as $\tilde{p}_i \equiv \Pr(X_i = 1 \mid \tilde{x}_i)$, with $\tilde{x}_i$ the response bit predicted by a modeling attack trained on $x^{(i-1)}$. The unknown value of $p_i$ is bounded as $h(p_i) \le h(\tilde{p}_i)$. Moreover, $\tilde{p}_i$ will be equal to $SR(i-1)$ or $1 - SR(i-1)$ depending on $\tilde{x}_i$, and in any case $h(\tilde{p}_i) = h(SR(i-1))$. In the following, we use as values for $SR(q)$ the linear interpolation of the median success rates from Section 3.2 of the best ML technique given $q$.

In earlier work on SKG, the secrecy capacity $S(X)$ of a fuzzy secret $X$ is defined as the theoretical maximum number of secure key bits that can be extracted from $X$ [87], and it is shown that $S(X) = I(X; X')$ with $X$ and $X'$ two noisy realisations of the same fuzzy secret. We calculate this mutual information bound of $X^N$ as $I(X^N; X'^N) = H(X^N) - H(X^N \mid X'^N)$ and consider both terms separately. We expand $H(X^N)$ as $\sum_{i=1}^{N} H(X_i \mid X^{(i-1)})$, and

$$H(X_i \mid X^{(i-1)}) \equiv \sum_{x^{(i-1)}} \Pr(x^{(i-1)}) \cdot H(X_i \mid x^{(i-1)}) = \sum_{x^{(i-1)}} \Pr(x^{(i-1)}) \cdot h(p_i) \le \sum_{x^{(i-1)}} \Pr(x^{(i-1)}) \cdot h(SR(i-1)) = h(SR(i-1)).$$

From which it follows that:


[Figure 3: Upper bounds on the secrecy capacity of Arbiter PUF responses. Top: incremental secrecy capacity ΔS(X^N) [bit] versus N; bottom: S(X^N) [bit] versus N; curves for the simple and 2-XOR Arbiter PUFs, for N up to 5000.]

To evaluate H(X^N | X'^N), we assume a simple but realistic noise model for the PUF response with bit errors occurring i.i.d. over the different response bits with probability p_e ≡ Pr(X_i ≠ X'_i). This is equivalent to a transmission over a binary symmetric channel, in which case H(X^N | X'^N) = N · h(p_e).

Substituting both results in the secrecy capacity bound leads to

S(X^N) \le \sum_{i=1}^{N} h(SR(i-1)) - N \cdot h(p_e).

An upper bound for the secrecy capacity can thus be calculated using the success rates SR(i − 1) of a PUF response model and the bit error probability p_e estimated from the PUF's statistics. Using our empirical results from Section 3, we calculate this bound for increasing N. The result is shown in Figure 3. Also shown is an upper bound on the incremental secrecy capacity ΔS(X^N), which indicates a bound on how much S(X^N) increases by considering a single additional response bit. It is clear that ΔS(X^N) decreases steadily as N grows and approaches 0 for N ≥ 5000. The S(X^N) upper bound of our simple Arbiter PUF implementation reaches 600 bits for N = 5000 and will not increase substantially for larger N.
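To make this bound concrete, the following is a minimal Python sketch that evaluates S(X^N) ≤ Σ h(SR(i−1)) − N·h(p_e) and the per-bit increment. The interpolation points in sr_q and the value of p_e are hypothetical placeholders, not the measured median success rates or bit error probability from Section 3.

```python
import numpy as np

def binary_entropy(p):
    """h(p) = -p*log2(p) - (1-p)*log2(1-p), with h(0) = h(1) = 0."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# Hypothetical (q, median success rate) pairs standing in for the ML results;
# SR(q) is obtained by linear interpolation, as in the text.
sr_q = np.array([[0, 0.5], [1000, 0.90], [5000, 0.99]])
def SR(q):
    return np.interp(q, sr_q[:, 0], sr_q[:, 1])

p_e = 0.03          # assumed bit error probability of the PUF response
N_max = 5000

i = np.arange(1, N_max + 1)
delta_S = binary_entropy(SR(i - 1)) - binary_entropy(p_e)  # increment of the bound per bit
S_upper = np.cumsum(delta_S)                               # upper bound on S(X^N)

print("Upper bound on S(X^N) for N = %d: %.1f bit" % (N_max, S_upper[-1]))
```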

We also performed the S(X^N) vs. N analysis for the 2-XOR Arbiter PUF results, taking into account that, from an information-theoretical viewpoint, ΔS(X^N) of a 2-XOR Arbiter PUF response bit can never be larger than 2 × ΔS(X^N) of a simple Arbiter PUF as calculated earlier. Moreover, ΔS(X^N) of a single PUF response bit can never be larger than 1. The results are also shown in Figure 3.


Again, the SKG results shown in Figure 3 express rather loose upper bounds on the number of secure key bits which can be generated in practice. First of all, S(X^N) expresses a theoretical maximum, but no efficient algorithms are known to reach this maximum. Secondly, we did not calculate S(X^N) exactly but only an upper bound thereof. Finally, any improvement upon our ML attacks will further decrease these upper bounds.

5 Conclusion

We have demonstrated the susceptibility of an actual 65nm CMOS Arbiter PUF implementation to modeling attacks based on machine learning. In summary, even after training a model with merely a couple of dozen CRPs, it can predict responses from simple Arbiter PUFs with a success rate significantly better than random guessing. After 1,000 training CRPs the prediction accuracy is already above 90%, and after 5,000 training CRPs the prediction is perfect up to the robustness of the PUF. For the 2-XOR Arbiter PUF, a prediction accuracy close to 90% is achieved after training with 9,000 CRPs.

Additionally, we have proposed a methodology for assessing the implications of modeling attacks on PUF-based security applications and applied it to our modeling results on Arbiter PUFs. We conclude that simple Arbiter PUFs cannot be securely used for PUF-based challenge-response authentication and that the applicability of 2-XOR Arbiter PUFs is also limited. For PUF-based secure key generation, we find that the number of information-theoretically secure key bits which a simple Arbiter PUF can generate is at most about 600 and for 2-XOR Arbiter PUFs at most twice that amount. Moreover, the secure key material contributed by each additional CRP decreases rapidly and approaches zero after about 5,000 CRPs.

We stress that these numerical limitations on the usability of Arbiter PUFs, both for authentication and for key generation, are merely bounds. These bounds will become tighter when better modeling attacks are found, or when the PUF's robustness decreases, e.g. as a consequence of varying temperature and supply voltage, which we did not consider here. Future work based on our proposed methodology will reveal tighter bounds on the actual applicability of Arbiter PUFs.


Acknowledgment

This work was supported by the Research Council K.U.Leuven: GOA TENSE (GOA/11/007), by the IAP Programme P6/26 BCRYPT of the Belgian State (Belgian Science Policy), by the European Commission through the ICT programme under contract ICT-2007-216676 ECRYPT II, by the Flemish Government through FWO G.0550.12N and the Hercules Foundation AKUL/11/19, by the European Commission through the ICT programme under contracts FP7-ICT-2011-284833 PUFFIN and HINT, and FP7-ICT-2007-238811 UNIQUE. Roel Maes is funded by IWT-Flanders grant no. 71369.

References

Please refer to the Bibliography at the end of the dissertation.

Publication-based Chapter

Performance and Security Evaluation of AES S-Box-based Glitch PUFs on FPGAs

Publication Data

Dai Yamamoto, Gabriel Hospodar, Roel Maes, and Ingrid Verbauwhede.Performance and Security Evaluation of AES S-Box-based Glitch PUFson FPGAs, In International Conference on Security, Privacy and AppliedCryptography Engineering (SPACE 2012), Lecture Notes in Computer Science,Springer-Verlag, Pages 45-62, 2012.

Contribution

• Data analysis and insights together with Dai Yamamoto.

• Contributed to all parts of the text.

• Implementations made by Dai Yamamoto.


Performance and Security Evaluation of AES S-Box-based Glitch PUFs on FPGAs

Dai Yamamoto1,2, Gabriel Hospodar1, Roel Maes1, and Ingrid Verbauwhede1

1 KU Leuven, ESAT/SCD-COSIC and IBBT (iMinds), Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
[email protected]
2 FUJITSU LABORATORIES LTD., 4-1-1, Kamikodanaka, Nakahara-ku, Kawasaki, 211-8588, Japan
[email protected]

Abstract. Physical(ly) Unclonable Functions (PUFs) are expected to represent a solution for secure ID generation, authentication, and other important security applications. Researchers have developed several kinds of PUFs and self-evaluated them to demonstrate their advantages. However, both performance and security aspects of some proposals have not been thoroughly and independently evaluated. Third-party evaluation is important to discuss whether a proposal performs according to what the developers claim, regardless of any accidental bias. In this paper, we focus on Glitch PUFs (GPUFs) that use an AES S-Box implementation as a glitch generator, as proposed by Suzuki et al. [124]. They claim that this GPUF is one of the most practically feasible and secure delay-based PUFs. However, it has not been evaluated by other researchers yet. We evaluate GPUFs implemented on FPGAs and present three novel results. First, we clarify that the total number of challenge-response pairs of GPUFs is 2¹⁹, instead of 2¹¹. Second, we show that a GPUF implementation has low robustness against voltage variation. Third, we point out that the GPUF has "weak" challenges leading to responses that can be more easily predictable than others by an adversary. Our results indicate that GPUFs that use the AES S-Box as the glitch generator present almost no PUF-behavior as both reliability and uniqueness are relatively low. In conclusion, our case study on FPGAs suggests that GPUFs should not use the AES S-Box as a glitch generator due to performance and security reasons.

Keywords: Glitch PUF, FPGA, Security, Performance, Key Generation, Authentication.


1 Introduction

Secure identification/authentication technology using integrated circuits (ICs) is very important for a secure information infrastructure. One is often concerned with finding solutions for anti-counterfeiting devices on medical supplies, prepaid cards and public ID cards such as passports and driver's licenses. The IC card is a well-known solution for this kind of application. Counterfeiting is prevented by storing a secret key on the IC card and using a secure cryptographic protocol to make the key invisible to the outside. In theory, however, the possibility of counterfeiting still remains if the IC design is revealed and reproduced. Recently, interest has been focused on Physical(ly) Unclonable Functions (PUFs) as a solution to the aforementioned issue [98]. In a PUF realized in an IC (silicon PUF), the output value (response) to the input value (challenge) is unique for each individual IC. This uniqueness is provided by random process variations that occur in the manufacturing process of each IC [46] [45]. It is expected that PUFs will represent a breakthrough in technology for anti-counterfeiting devices through their use for ID generation, key generation and authentication protocols, making cloning impossible even when the design is revealed.

Silicon PUFs are basically classified into two categories [79]. One uses the characteristics of memory cells, such as SRAM PUFs [52] [56], Butterfly PUFs [71], Flip-flop PUFs [81], Mecca PUFs [70] and Latch PUFs [119] [120]. The other uses the characteristics of delay variations, such as Ring Oscillator PUFs [121], Arbiter PUFs [74] and Glitch PUFs (GPUFs) [124]. This paper focuses on the latter. Ring Oscillator PUFs derive entropy from the difference in oscillator frequencies. Arbiter PUFs have an arbiter circuit that generates a response determined by the difference in the signal delay between two paths set by a challenge. However, a machine learning attack can predict responses of Arbiter PUFs by using a number of challenge-response pairs (CRPs), as it has been shown that the relationship between challenges and responses is linear [110]. The GPUF [124] was proposed to solve this problem of ease of prediction. A glitch is a pulse of short duration which may occur before the signal settles to a value. The GPUF generates a one-bit response by using the parity of the number of glitches obtained from an 8-bit AES S-Box implementation used as a glitch generator. Part of the challenge corresponds to the 8-bit input to the S-Box. Since the response to challenges behaves like a non-linear function, the developers claim that machine learning attacks are prevented.

Although PUF developers evaluate their proposals themselves, some of them may either accidentally overstate good results or omit undesirable ones. Hence it is quite important not only to propose and evaluate new PUFs, but also to get the proposals evaluated and analyzed by third-party researchers.


Our Contributions

In this paper, we evaluate both performance and security aspects of the GPUF developed by Suzuki et al. [124] (i.e. the "developers") implemented on FPGAs. The reason why we focus on this PUF is that it is one of the most feasible and secure delay-based PUFs because of its resistance against machine learning attacks. However, it has not been evaluated by other researchers yet. Our main contribution consists of three parts. First, we propose a general method to generate responses because the original paper is somewhat obscure about it. To the best of our knowledge, the developers used only 2⁸ challenges as input to the 8-bit AES S-Box glitch generator. Hence they relied on a total of 256 × 8 = 2,048 responses, since the AES S-Box has 8 1-bit outputs. We point out that glitches normally appear when an 8-bit input value of the S-Box is transitioned from one value to another. The glitches thus depend on the input values both before and after the transition. Consequently, a GPUF based on an 8-bit AES S-Box has 256 × 256 × 8 = 2¹⁹ CRPs. This means that the performance results presented by the developers are insufficient, as they evaluated only a subset of all CRPs. Second, we evaluate the performance of GPUFs using all CRPs. We clarify that both reliability and uniqueness strongly depend on the Hamming distance between the AES S-Box input values before and after the transition. Therefore, GPUF designers have to carefully select the set of CRPs meeting their security requirements, which increases design costs. Additionally, if the supply voltage is changed within the rated voltage range of the FPGAs (1.14V–1.26V), GPUFs present low reliability – meaning that the intra-chip variation is greater than 30%. This value exceeds the error correction range when using a Fuzzy Extractor with a reasonable size of redundant data. This indicates that GPUFs present almost no PUF-behavior. Third, we analyze the security of GPUFs. If the AES S-Box input value after the transition is chosen to be one out of 16 specific values, then the number of glitches is almost zero regardless of the input value before the transition. AES S-Box-based GPUFs have "weak" challenges (like a weak key for a block cipher) leading to responses that are more easily predictable than others by an attacker, which could compromise the whole security of a GPUF-based system.

Organization of the Paper

The rest of the paper is organized as follows. Section 2 gives an outline of the original GPUF proposed by the developers, and our proposed method to generate responses using all CRPs. Section 3 evaluates the performance of the GPUF implemented on an FPGA platform. We evaluate both reliability and uniqueness at various voltages. Section 4 evaluates the security of the GPUF,


and discusses weak challenges that should not be used. Finally, in Section 5 we summarize our work and comment on future directions.

2 Glitch PUF

2.1 Original GPUF Proposal by Suzuki et al. [124] [115]

Different GPUFs have been proposed until now. In 2008, Crouch et al. [36] [100] first proposed the concept of extracting a unique digital identification using glitches obtained from a 32-bit combinational multiplier. In 2010, Anderson [14] proposed a glitch-based PUF design specifically targeted at FPGAs. This GPUF generates a one-bit response based on the delay differences between two multiplexer (MUX) chains. Then, a new glitch-based PUF using one AES S-Box as a glitch generator was proposed in 2010 [124], and improved in 2012 [115] by Suzuki et al. In this paper, we focus only on the third GPUF proposal (and refer to it simply as the GPUF) because of its good performance, good security features – such as resistance against machine learning attacks – and practical advantages, as it can be implemented on ASIC and FPGA platforms, as claimed by the authors. Figure 1 presents this GPUF. It uses one 8-bit AES S-Box based on a composite Galois field as a glitch generator. The challenge input to the GPUF has 11 bits and is composed of two parts. The first part of the challenge contains 8 bits inputted from the data registers to the AES S-Box. Each of the 8 output bits of the S-Box generates a different number of glitches due to the complicated non-linearity of the AES S-Box implementation. The second part of the challenge contains 3 bits to select one out of the 8 AES S-Box output bits. A toggle flip-flop (TFF) eventually outputs the GPUF response by evaluating the parity of the number of glitches that appear on the selected AES S-Box output bit. To the best of our knowledge, the developers have evaluated 2¹¹ CRPs. The masking scheme is used to select stable challenges that output the same responses at the normal operating condition (room temperature and standard supply voltage) most of the time. For each challenge, the developers evaluated its response 10 times. A challenge was considered stable if all 10 responses were equal. According to their strict methodology, challenges yielding at least one different response were discarded.

2.2 Our Response Generation Method

Figure 1: Glitch PUF.

In this paper, glitches appear right after the first 8-bit part of the challenge is transitioned from one value (the previous 8-bit challenge Cp) to another (the current 8-bit challenge Cc). Figure 2 depicts a conceptual explanation of two cases. For example, for the same value of Cc (e.g. 31), the number of glitches is respectively 5 or 2 for Cp equal to 246 or 97. In fact, the number of glitches strongly depends on both Cp and Cc according to our experiments (details in Section 3). Therefore, we claim that the first part of the GPUF challenge has not 8, but 16 bits (8 bits from Cp and 8 bits from Cc). The combination of all values of Cp and Cc leads to 256 × 256 = 65,536 CRPs per S-Box output bit. However, if Cp and Cc are equal, then no glitch occurs since there is no bit transition, making the responses always equal to zero. Thus, the valid number of CRPs is reduced to 256 × 255 = 65,280. As the second part of the challenge has 3 bits, the AES S-Box-based GPUF has in fact a total of 65,280 × 2³ = 522,240 CRPs.
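A minimal Python sketch of this response-generation view follows: a challenge is a triple (Cp, Cc, bit-select), the response is the parity of the glitch count on the selected S-Box output bit, and the valid challenge space has 256 × 255 × 8 = 522,240 elements. The function count_glitches is a hypothetical stand-in for the physical glitch generator and is not part of the original design.

```python
from itertools import product

def gpuf_response(c_prev, c_curr, sbox_bit, count_glitches):
    """GPUF response bit: parity of the number of glitches observed on the
    selected S-Box output bit when the input register switches c_prev -> c_curr.
    `count_glitches` is a hypothetical stand-in for the physical measurement."""
    return count_glitches(c_prev, c_curr, sbox_bit) & 1

# Valid challenge space: (Cp, Cc, bit-select) with Cp != Cc and 8 output bits.
n_valid = sum(1 for cp, cc, bit in product(range(256), range(256), range(8)) if cp != cc)
print(n_valid)   # 256 * 255 * 8 = 522240
```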

Figure 2: Number of glitches with respect to Cp and Cc.


3 Performance Evaluation

3.1 Experimental Environment

Figure 3 shows our experimental evaluation system, which uses a Spartan-3E starter kit board [5] with a Xilinx Spartan-3E FPGA (XC3S500E-4FG320C) and a custom-made expansion board with a Xilinx Spartan-6 FPGA (XC6SLX16-2CSG324C). The developers implemented both the peripheral circuits, such as the block RAM and the RS232C module, and the GPUF circuit on the same FPGA chip. In contrast, we implement the peripheral circuits separately on a Spartan-3E (SP3E) FPGA, and the GPUF circuit on a Spartan-6 (SP6) FPGA. Such a configuration enables us to change only the core voltage of the SP6 FPGA chip. The voltage change does not impact the peripheral circuits and does not cause data garbling, which enhances the confidence in our experimental results. An SP6 FPGA chip is put on a socket of the expansion board, being therefore easily replaceable by another chip. A programmable ROM (PROM) is implemented on the expansion board, allowing us to download our circuit design onto the PROM through a JTAG port. The core voltage of an SP6 chip can be adjusted in steps of 0.01V using a stabilized power supply. The two boards are connected with user I/O interfaces through a connector. The clock signal is provided from the SP3E to the SP6 through an SMA cable and port in order to prevent signal degradation. A micro SD adapter and card are also connected to the SP3E board to store the responses from the GPUF. We evaluate 20 GPUFs implemented on 20 SP6 FPGA chips.

Figure 3: Experimental evaluation system.

Figure 4 shows the details of our circuit designs realized on the SP3E and SP6 FPGA chips. The AES S-Box implementation based on composite Galois field techniques was obtained from the RTL code from [95]. A 50-MHz clock signal generated by an on-board oscillator is applied to a Digital Clock Manager (DCM) primitive, yielding a 2.5-MHz clock signal that is applied to the GPUF. The data acquisition process is as follows. When the RS232C module on the SP3E chip receives a start command from a user PC, the module sends a start signal to the CTRL module. The CTRL module initializes the values of Cp and Cc to zero and stores them into two registers dedicated to Cp (P1–P8) and Cc (C1–C8) on the SP6, respectively. After that, the registers storing the inputs to the S-Box (R1–R8) are transitioned from Cp to Cc in one cycle. We evaluate not the parity but the actual number of glitches output from the glitch generator; this does not influence the GPUF performance. The number of glitches is stored into eight 8-bit counters with TFFs (T1,1–T8,8). Then, the total of 64 bits coming from the eight 8-bit counters is sent to a block RAM on the SP3E bit-sequentially. The values of the block RAM are sent to an SD write module and written into a micro SD card. This process is repeated with the same Cp and Cc 100 times, as in [124] [115]. Then both Cp and Cc are incremented by 1 from 0 to 255 and the process is repeated 100 times analogously. Note that the responses are meaningful only when Cp is not equal to Cc, as mentioned in Section 2.2.

In Section 3.2, we evaluate the following performance-related figures of merit [84] of GPUFs operating at 1.20V: reliability, uniqueness, uniformity and bit-aliasing. We choose 1.20V as the standard voltage because the rated voltage range of the SP6 FPGA (XC6SLX16-2CSG324C) is 1.20 ± 0.06V (1.14V–1.26V). At the standard voltage of 1.20V, our GPUF implementations present performance results in accordance with the developers' ones. Later, in Section 3.3, we evaluate our GPUF implementations operating at the limits of the FPGA rated voltage range, 1.14V and 1.26V.


Figure 4: Circuit design on FPGAs.

3.2 Performance at the Standard Voltage of 1.20V

The reliability and uniqueness results of our GPUF implementations are shown in Figs. 5 and 6, respectively. In order to evaluate the reliability, 101 responses are generated per SP6 FPGA chip (see Appendix). One response is used as the reference, and the remaining are used for analysis. The response space size is 65,280 × 8 bits. Figure 5 shows a histogram of normalized Hamming distances between the reference response and each repeated one (i.e. 100 × 20 (chips) = 2,000 elements). The average error rate when masking is on is approximately 1.38% with a standard deviation (S.D.) of 0.11%, which is much less than the 15% assumed in [78] for stable responses based on a Fuzzy Extractor with a reasonable size of redundant data. Hence our result shows that the GPUF yields highly reliable responses, in accordance with the developers' results. Next, in order to evaluate the uniqueness, a total of 20 responses using all 20 FPGAs (one response per FPGA) is generated. Figure 6 shows a histogram of normalized Hamming distances between every combination of two responses, i.e. C(20, 2) = 190 combinations. This evaluation is a general way of showing the extent to which the responses of the chips are different. The difference in the responses of two arbitrary PUFs is approximately 39.8% with a S.D. of 1.1% when masking is on. The GPUF thus yields responses with a lower level of uniqueness than the ideal difference of 50%. This result also corresponds to the developers' one.

Figure 5: Reliability (histogram of normalized Hamming distances between the reference response and the repeated responses, with and without masking).

Figure 6: Uniqueness (histogram of normalized Hamming distances between responses of different chips, with and without masking).

Next, we evaluate both the uniformity and the bit-aliasing of GPUFs – a contribution that has not been addressed by the developers in [124] [115]. The uniformity evaluates how uniform the proportion of '0's and '1's is in the response bits of a PUF. For our GPUF implementations, the average uniformity is approximately 50.6% and 50.7% when masking is off and on, respectively. Since the ideal uniformity is 50%, our GPUFs satisfy the requirement for uniformity. The bit-aliasing evaluates how different the proportion of '0's and '1's is in the 20 response bits extracted respectively from the 20 PUFs given the same challenge. The ideal bit-aliasing is also 50% with a S.D. of 0%. Figures 7 (I) and (II) show histograms of the proportion of '1's when masking is off and on, respectively. The bit-aliasing S.D. is approximately 4.7 percentage points larger when masking is used than when it is not used. This is because the masking scheme discards the responses whose proportion of '1's is around 50%; hence Figure 7 (II) lacks the peak of the normal distribution. It turns out that there are many responses fixed to 0 or 1 in the GPUF implementations on the 20 chips. This means that GPUFs have many useless CRPs due to the predictability of the responses. Hence GPUF designers should not use all CRPs, for security reasons. This result is implied by the low uniqueness of GPUFs as shown in Figure 6. The fact that the S.D. becomes larger when masking is used is related to the lower uniqueness and entropy of the responses, as previously mentioned by the developers.
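As an illustration of how these four figures of merit can be computed from raw data, here is a minimal Python sketch, assuming the measurements are arranged as a 0/1 array responses[chip][sample][bit] with 20 chips and 101 samples per chip; the array and the reduced response length are hypothetical placeholders for the data collected in Section 3.1 (where the real length is 65,280 × 8 bits).

```python
import numpy as np
from itertools import combinations

# responses[chip, sample, bit] in {0, 1}: hypothetical placeholder data.
n_chips, n_samples, n_bits = 20, 101, 1024   # n_bits stands in for 65,280 * 8
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(n_chips, n_samples, n_bits), dtype=np.uint8)

ref = responses[:, 0, :]          # sample 0 is the reference response of each chip

# Reliability: normalized HD between the reference and each repeated response.
reliability_hd = np.array([[np.mean(ref[i] != responses[i, t])
                            for t in range(1, n_samples)] for i in range(n_chips)])

# Uniqueness: normalized HD between the reference responses of every chip pair.
uniqueness_hd = np.array([np.mean(ref[i] != ref[j])
                          for i, j in combinations(range(n_chips), 2)])  # C(20,2) = 190

# Uniformity: fraction of '1's in each chip's reference response.
uniformity = ref.mean(axis=1)

# Bit-aliasing: fraction of '1's across the 20 chips for each challenge.
bit_aliasing = ref.mean(axis=0)

print(reliability_hd.mean(), uniqueness_hd.mean(),
      uniformity.mean(), bit_aliasing.std())
```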

3.3 Performance at Non-Standard Voltages (1.14V and 1.26V)

Figure 7: Bit-aliasing (histograms of the proportion of '1's). (I) Without masking: mean = 50.6%, S.D. = 19.0%. (II) With masking: mean = 50.8%, S.D. = 23.7%.

In this section, we evaluate the robustness of the GPUF against voltage variation, i.e. the reliability of GPUFs when their supply voltage is changed to 1.14V and 1.26V. Figure 8 (I) shows the response error rates (see Appendix) of our GPUF implementations in comparison to the developers' ones. At 1.14V, our response error rate is approximately 35% when masking is on, differing from the developers' results (≈10%) [115]. A possible reason for the difference could be our expansion board. However, the proper operation of the expansion board was verified by implementing Latch PUFs on the SP6 FPGAs and confirming that their response error rates are less than 15% even when the supply voltage changes. Consequently, according to our evaluation, the robustness of GPUFs against voltage variation is much lower than reported by the developers. This is partly because they evaluated only 256 × 8 CRPs, while we consider all 256 × 255 × 8 CRPs. In fact, if we choose only the 256 × 8 CRPs satisfying the following two conditions – the Hamming distance between Cp and Cc being equal to 1 (HD(Cp, Cc) = 1), and the differing bit position being the least significant bit – then the robustness against voltage variation becomes remarkably better than the developers' results, as shown in Figure 8 (II). In the following, we discuss the relationship of both reliability and uniqueness to the CRPs. The CRPs are divided into 8 groups based either on each value of HD(Cp, Cc) (excluding HD(Cp, Cc) = 0, as no glitches occur) or on which S-Box bit is used to generate a response.
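A minimal Python sketch of the CRP selection rule mentioned above (HD(Cp, Cc) = 1 with the differing bit in the least significant position) is shown below; it simply enumerates the challenge pairs that survive the filter and confirms that 256 × 8 CRPs remain.

```python
# Select (Cp, Cc) pairs that differ in exactly one bit, namely the LSB.
selected_pairs = [(cp, cp ^ 0x01) for cp in range(256)]   # HD(Cp, Cc) = 1, LSB differs

# Each surviving pair still offers 8 selectable S-Box output bits.
selected_crps = [(cp, cc, bit) for cp, cc in selected_pairs for bit in range(8)]
print(len(selected_crps))   # 256 * 8 = 2048
```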

Figure 8: Response error rates [%] against various supply voltages (1.14V, 1.20V, 1.26V). (I) Our results vs. the developers' results, with and without masking. (II) Our results for HD(Cp, Cc) = 1, with and without masking.


Figures 9 (I)-(III) and (IV) show the response error rates (reliability) and the uniqueness of CRPs extracted from each S-Box bit (S-Box[0]–S-Box[7]), respectively. The reliability is evaluated at three voltages (1.14V, 1.20V and 1.26V), while the uniqueness is evaluated only at the standard voltage of 1.20V. The results when masking is on and off are shown in the left and right histograms, respectively. At 1.14V and 1.26V, the reliability ranges from 30 to 40% depending on the S-Box bit, even when masking is on. At 1.20V, the uniqueness ranges from 35 to 45%, also depending on the S-Box bit, when masking is on. The reliability and uniqueness distributions are thus close to each other, possibly overlapping. Therefore, our GPUF implementations show almost no PUF behavior, as an authentication protocol free of errors cannot be implemented. As both reliability and uniqueness of GPUFs strongly depend on the S-Box bits used to generate responses, if GPUFs are used for key generation, suitable challenges should be carefully chosen based on security requirements.

Figures 10 (I) and (II) show the reliability and uniqueness of CRPs with respect to HD(Cp, Cc) = 1, . . . , 8, respectively. Due to space constraints, we show: the average of the 8 results (from S-Box[0] to S-Box[7]), the results of S-Box[2] (lowest reliability), and the results of S-Box[7] (highest reliability), as shown in Figure 9 (I). The smaller HD(Cp, Cc) is, the higher the reliability and the lower the uniqueness. This is because if HD(Cp, Cc) is small, the number of changed bits at the S-Box input is also small. As a result, the transition from Cp to Cc has little influence on the generation of glitches. As GPUFs perform differently with regard to HD(Cp, Cc), the need for a designer to select appropriate CRPs meeting a system's requirements leads to an additional increase in the design cost. The reliability at 1.20V can be dramatically enhanced by using the masking scheme proposed by the developers. However, the reliability cannot be enhanced effectively at 1.14V and 1.26V using the masking scheme. Consequently, there is no correlation between the CRPs that are unstable at 1.20V and those that are unstable at 1.14V or 1.26V. GPUF designers should thus remove, i.e. mask, CRPs that are unstable not only at 1.20V but also at 1.14V and 1.26V. However, this is neither realistic nor practical. Such a solution not only increases the manufacturing costs, but also reduces the number of CRPs, which causes a loss of information entropy in the responses.

Finally, we evaluate the side effects of using the masking scheme: how many responses are unstable and therefore discarded. The three types of bar graphs in Figure 11 show the number of stable responses in three cases: without masking (all responses), with masking at 1.20V (stable responses at 1.20V) and with masking at three voltages (stable responses at 1.14V, 1.20V and 1.26V). In fact, the third case is not realistic and practical, since the masking process would have to be applied at all voltages. We, however, show this case to evaluate the actual number of valid and stable CRPs in the GPUF.


Figure 9: Reliability and uniqueness vs. S-Box output bits (left histograms: with masking; right: without masking). (I) Reliability at 1.14V, (II) reliability at 1.20V, (III) reliability at 1.26V, and (IV) uniqueness at 1.20V; each panel shows histograms of normalized Hamming distance for S-Box[0] to S-Box[7].

Figure 10: Reliability and uniqueness vs. HD(Cp, Cc). (I) Reliability (response error rate [%] at 1.14V, 1.20V and 1.26V, with and without masking). (II) Uniqueness (normalized Hamming distance [%] at 1.20V, with and without masking). Results are shown for HD(Cp, Cc) = 1, . . . , 8, for the average over S-Box[0]–S-Box[7], for S-Box[2] and for S-Box[7].


The line graphs in Figure 11 show the ratio of stable responses in each group of HD(Cp, Cc). Due to space constraints, we show once more the average over S-Box[0]–S-Box[7] as well as the results for S-Box[2] and S-Box[7], over the 20 SP6 FPGAs. The larger HD(Cp, Cc) is, the lower the ratio of stable responses (i.e. the larger the number of discarded responses). That is why the larger HD(Cp, Cc) is, the higher the response error rate, as shown in Figure 10 (I). Also, there are large gaps between the two lines in Figs. 11 (I)-(III). This means that the stable responses at 1.20V are not always stable at 1.14V and 1.26V. Hence the response error rate is high and the voltage robustness of the GPUF is quite low, as shown in Figure 10 (I). Comparing Figure 11 (II) and Figure 11 (III), the lower the reliability, the larger the number of discarded responses. Out of a total of 65,280 × 8 responses, the ratios of stable responses at 1.20V and at the three voltages are 61.7% and 30.1%, respectively. Consequently, GPUFs have in fact a number of useless CRPs that should be removed by the masking scheme. This masking reduces the total number of CRPs, or the total pattern of keys generated by multiple GPUFs. The low total number of CRPs or keys might make it easier for an attacker to succeed with a modeling attack. In conclusion, our GPUFs implemented on FPGAs have a low robustness against voltage variation according to our evaluation results. In addition, both reliability and uniqueness strongly depend on the selected CRPs.

4 Security Analysis

In this section, we evaluate the security of AES S-Box-based GPUFs. Concretely, we clarify that the GPUF has "weak" challenges that are associated with more easily predictable responses. Figure 12 depicts the number of glitches generated from S-Box[6] on a single specific chip (i = 1). This figure represents a 256 × 256 matrix, where the horizontal axis represents Cp and the vertical axis represents Cc. Each element is colored from black to gray according to the number of glitches. For example, there are few glitches (≈ 0–1) when we choose a challenge corresponding to a black element. The response is unstable when we choose a challenge corresponding to a white element. Note that each element represents not the parity but the number of glitches. Naturally, a black diagonal line can be observed in this figure because no glitch occurs when Cp and Cc are equal. Note that there are also a few black "horizontal lines", marked by arrows (A1–A8). All 20 chips present the same pattern of lines. This means that some values of Cc lead to a small number of glitches independently of Cp. Hence, if such values of Cc are used as challenges to the GPUF, adversaries will have the advantage of knowing that the number of glitches is small, which may help them succeed more easily with an attack aiming at predicting GPUF responses.
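A minimal Python sketch of how such weak values of Cc could be spotted from measurement data: given a hypothetical 256 × 256 glitch-count matrix (rows indexed by Cc, columns by Cp, as in Figure 12), it flags the Cc values whose glitch counts stay near zero for every Cp. The matrix and the threshold are placeholders, not measured values.

```python
import numpy as np

def find_weak_cc(glitch_counts, max_glitches=1):
    """glitch_counts[cc, cp] = number of glitches for the transition Cp -> Cc.
    Returns the Cc values whose glitch count is at most `max_glitches`
    for every Cp != Cc (the black horizontal lines of Figure 12)."""
    weak = []
    for cc in range(256):
        counts = np.delete(glitch_counts[cc], cc)   # ignore the Cp == Cc diagonal
        if counts.max() <= max_glitches:
            weak.append(cc)
    return weak

# Hypothetical placeholder matrix; in practice it comes from the FPGA measurements.
glitch_counts = np.random.default_rng(0).integers(0, 10, size=(256, 256))
print(find_weak_cc(glitch_counts))
```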


Figure 11: Number and ratio of stable responses in three cases: without masking, with masking at 1.20V, and with masking at three voltages. (I) Average from S-Box[0] to S-Box[7], (II) S-Box[2] (lowest reliability), (III) S-Box[7] (highest reliability). Bar graphs give the number of responses (×10³) and line graphs the ratio of stable responses, per HD(Cp, Cc) group.

The following discusses the reason why such non-secure challenges exist, using Figure 13. An AES S-Box implementation using a composite field consists of three sub-parts: an isomorphism δ, a Galois Field (GF) inverter, and a combination module of the inverse isomorphism δ⁻¹ and the affine transformation. Let the 8-bit variables x and y be the input and output of the AES S-Box, respectively. Also, let the 8-bit variables a and b be the outputs of the isomorphism δ and the GF inverter, respectively. Our goal is to find special values of x for which (almost) no glitches appear on the 6-th output bit of the S-Box (y[6]). In step 1 from Figure 13, according to the properties of the combination module, the output value y[6] satisfies:

y[6] = b̃[4] ⊕ b[5] ⊕ b[6] ⊕ b[7].

Hence y[6] depends on the upper 4 bits of b.


Figure 12: Number of glitches (S-Box[6], Chip i=1).

Figure 13: An AES S-Box implementation using composite field.

Next, in step 2, we focus on the GF inverter. The value b[7:4], which represents the four most significant bits of b, satisfies:

b[7] = tn[0] ⊕ tn[1] ⊕ tn[3] ⊕ tn[4],
b[6] = tn[0] ⊕ tn[2] ⊕ tn[3] ⊕ tn[5],
b[5] = tn[0] ⊕ tn[1] ⊕ tn[7] ⊕ tn[8],
b[4] = tn[0] ⊕ tn[2] ⊕ tn[6] ⊕ tn[7].


Here, the 9-bit variable tn is an internal variable in the GF inverter. The variable tn satisfies:

tn[8] = (v[3]) & (a[7]),
tn[7] = (v[2] ⊕ v[3]) & (a[6] ⊕ a[7]),
tn[6] = (v[2]) & (a[6]),
tn[5] = (v[1] ⊕ v[3]) & (a[5] ⊕ a[7]),
tn[4] = (v[0] ⊕ v[1] ⊕ v[2] ⊕ v[3]) & (a[4] ⊕ a[5] ⊕ a[6] ⊕ a[7]),
tn[3] = (v[0] ⊕ v[2]) & (a[4] ⊕ a[6]),
tn[2] = (v[1]) & (a[5]),
tn[1] = (v[0] ⊕ v[1]) & (a[4] ⊕ a[5]),
tn[0] = (v[0]) & (a[4]).

The 4-bit variable v is an internal variable in the GF inverter. Let us focus on the four bits a[4], a[5], a[6] and a[7] appearing on the right-hand side of the above equations for tn. If these four bits are all zero, then tn also becomes zero. So the glitches caused by the variable v do not propagate to tn, b[7:4] and y[6]. Consequently, if the most significant 4 bits of a are zero, then no glitch is expected to appear on y[6].

In step 3, our goal is to find special values of x which make a[7:4] equal to zero. The variable a[7:4] satisfies:

a[7] = x[5] ⊕ x[7],
a[6] = x[1] ⊕ x[2] ⊕ x[3] ⊕ x[4] ⊕ x[6] ⊕ x[7],
a[5] = x[2] ⊕ x[3] ⊕ x[5] ⊕ x[7],
a[4] = x[1] ⊕ x[2] ⊕ x[3] ⊕ x[5] ⊕ x[7].


Hence the following must hold:

x[1] = 0,
(x[5], x[7]) = (0, 0) or (1, 1),
(x[2], x[3]) = (0, 0) or (1, 1),
(x[4], x[6]) = (0, 0) or (1, 1) if x[7] = 0, and (0, 1) or (1, 0) if x[7] = 1.

Finally, we obtain the 16 patterns of the input x that are expected to generate almost no glitches on the S-Box bit y[6], as shown in Table 1. They correspond to the 16 specific values of Cc marked by the eight arrows in Figure 12. There are actually 16 black horizontal lines in Figure 12, but only eight lines corresponding to the eight arrows can be visually observed, because the 16 specific values consist of eight pairs of consecutive numbers. In our GPUF implementations, the number of glitches for challenges whose Cc and Cp form one of the 16 × 255 patterns is zero or one, which is smaller than for other challenges. However, the corresponding GPUF responses still include zeros and ones in almost equal proportion. This means that these 16 patterns of Cc are secure as long as the parity of the number of glitches is used as the response. GPUFs implemented on other kinds of FPGAs or on ASICs, however, might generate no glitch at all for these values of Cc. Hence we suggest that such values of Cc should not be used.

Table 1: The 16 patterns of the input x generating almost no glitches in y[6].

(x[5], x[7])  (x[2], x[3])  (x[4], x[6])  x[1]  x (bin.)†  x (dec.)  Arrows (Fig. 12)
(0, 0)        (0, 0)        (0, 0)        0     0000000*   0, 1      A1
(0, 0)        (0, 0)        (1, 1)        0     0101000*   80, 81    A3
(0, 0)        (1, 1)        (0, 0)        0     0000110*   12, 13    A2
(0, 0)        (1, 1)        (1, 1)        0     0101110*   92, 93    A4
(1, 1)        (0, 0)        (0, 1)        0     1110000*   224, 225  A7
(1, 1)        (0, 0)        (1, 0)        0     1011000*   176, 177  A5
(1, 1)        (1, 1)        (0, 1)        0     1110110*   236, 237  A8
(1, 1)        (1, 1)        (1, 0)        0     1011110*   188, 189  A6

†Asterisks mean '0' or '1'.
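These 16 values can be cross-checked directly from the XOR equations for a[7:4] given in step 3; the following Python sketch enumerates all 256 inputs and prints those for which a[4] = a[5] = a[6] = a[7] = 0, reproducing the x (dec.) column of Table 1.

```python
def bit(x, i):
    return (x >> i) & 1

def a_upper_is_zero(x):
    """Evaluate a[7:4] from the XOR equations of step 3 and test for all-zero."""
    a7 = bit(x, 5) ^ bit(x, 7)
    a6 = bit(x, 1) ^ bit(x, 2) ^ bit(x, 3) ^ bit(x, 4) ^ bit(x, 6) ^ bit(x, 7)
    a5 = bit(x, 2) ^ bit(x, 3) ^ bit(x, 5) ^ bit(x, 7)
    a4 = bit(x, 1) ^ bit(x, 2) ^ bit(x, 3) ^ bit(x, 5) ^ bit(x, 7)
    return a7 == a6 == a5 == a4 == 0

weak_inputs = [x for x in range(256) if a_upper_is_zero(x)]
print(weak_inputs)
# [0, 1, 12, 13, 80, 81, 92, 93, 176, 177, 188, 189, 224, 225, 236, 237]
```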


5 Conclusion

This paper experimentally analyzed GPUFs using a composite field-based AES S-Box implementation as a glitch generator on FPGAs. First, we clarified that the number of glitches depends on both the previous and the current states of the registers dedicated to storing the challenge bits that are input to the AES S-Box. As a consequence, GPUFs have a total of 2¹⁹ CRPs, which is much more than the 2¹¹ CRPs evaluated by the GPUF developers [124] [115]. According to our experiments with 20 FPGAs, GPUFs using all 2¹⁹ CRPs showed a low robustness against voltage variation. Within the rated voltage range of the FPGAs (1.14V–1.26V), response error rates approached 35%. This exceeds the error correction range of a Fuzzy Extractor with a reasonable size of redundant data. Our results also indicated that GPUFs present almost no PUF-behavior, as both reliability and uniqueness are relatively low. Finally, we found that our GPUF implementations have 16 × 255 weak challenges leading to almost no glitches regardless of the previous challenge bits stored in the registers. In conclusion, the AES S-Box implementation using a composite field may not represent the best option for generating glitches for the GPUF due to issues with robustness against voltage variation and easily predictable responses.

To the best of our knowledge, other well-known AES S-Box implementations, such as sum of products (SOP), product of sums (POS), table lookup (TBL), positive polarity Reed-Muller (PPRM) [112] and 3-stage PPRM [92], are not suitable for GPUFs either. Although SOP, POS and TBL implementations are able to generate glitches, they have a larger area than a composite field-based implementation. Hence these designs are not suitable for PUFs on IC cards with limited resources. PPRM and 3-stage PPRM are designed to reduce power consumption by preventing the generation of glitches, so they are obviously not suitable for GPUFs. Thus we suggest that the AES S-Box should not be used as a glitch generator for GPUFs on FPGAs.

An ASIC implementation of the AES S-Box would probably not behave like the FPGA implementations. The performance, such as reliability and uniqueness, might improve if GPUFs are implemented on ASICs. Future work should include a performance and security evaluation of GPUFs on ASICs.

Acknowledgment

This work was supported in part by the Research Council KU Leuven: GOA TENSE (GOA/11/007), by the IAP Programme P6/26 BCRYPT of the Belgian State (Belgian Science Policy), by the European Commission through the ICT programme under contract ICT-2007-216676 ECRYPT II, by the Flemish Government through FWO G.0550.12N and the Hercules Foundation AKUL/11/19, and by the European Commission through the ICT programme under contract FP7-ICT-2011-284833 PUFFIN.

References

Please refer to the Bibliography at the end of the dissertation.

Appendix

In order to evaluate the reliability, we extract an n-bit (n = 65,280 × 8) reference response R_i from the i-th FPGA chip (1 ≤ i ≤ w, w = 20 in this work) at the normal operating condition (room temperature and standard supply voltage of 1.20V). The same n-bit response is extracted at a different operating condition (different temperature and/or supply voltage) with a value R'_i. Then, m samples (m = 100 in this work, as in [124] [115]) of R'_i are collected. Here, R'_{i,t} is the t-th (1 ≤ t ≤ m) sample of R'_i. For the chip i and the sample t, each data element of the reliability histogram is calculated as follows:

HD_{i,t} = \frac{1}{n} \sum_{C_p=0}^{255} \sum_{C_c=0}^{255} HD\{R_i(C_p, C_c),\, R'_{i,t}(C_p, C_c)\}.

Note that we exclude the responses where Cp equals Cc, because no glitch occurs. The reliability histograms shown in Figure 5 and Figure 9 (I) include 2,000 data elements, resulting from all combinations of i and t. The response error rate shown in Figure 8 and Figure 10 (I) is calculated as follows:

ErrorRate = \frac{1}{w \cdot m} \sum_{i=1}^{w} \sum_{t=1}^{m} HD_{i,t}.
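A minimal Python sketch of these two appendix formulas, assuming a per-chip array of responses where index 0 is the reference taken at the normal condition and indices 1..m are the samples taken at the evaluated condition; the data array and the reduced response length are hypothetical placeholders.

```python
import numpy as np

w, m = 20, 100                      # number of chips and samples per chip
n_placeholder = 1024                # stands in for n = 65,280 * 8 response bits

# responses[i, t, :]: t = 0 is the reference R_i (normal condition),
# t = 1..m are the samples R'_{i,t} taken at the evaluated condition.
# Hypothetical placeholder data; in practice it is read back from the micro SD card.
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(w, m + 1, n_placeholder), dtype=np.uint8)

# HD_{i,t} = (1/n) * sum over (Cp, Cc) of HD{R_i, R'_{i,t}}
hd = np.array([[np.mean(responses[i, 0] != responses[i, t])
                for t in range(1, m + 1)] for i in range(w)])

error_rate = hd.mean()              # ErrorRate = (1/(w*m)) * sum_i sum_t HD_{i,t}
print(error_rate)
```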

Publication-based Chapter

Machine Learning in Side-Channel Analysis: a First Study

Publication Data

Gabriel Hospodar, Benedikt Gierlichs, Elke De Mulder, Ingrid Verbauwhede, and Joos Vandewalle. Machine Learning in Side-Channel Analysis: a First Study. In Journal of Cryptographic Engineering (JCEN), Springer-Verlag, Volume 1, Issue 4, Pages 293-302, 2011.

Contribution

• Main author.


Machine Learning in Side-Channel Analysis: a First Study

Gabriel Hospodar, Benedikt Gierlichs, Elke De Mulder, Ingrid Verbauwhede, and Joos Vandewalle

Katholieke Universiteit Leuven, ESAT-SCD-COSIC & IBBT (iMinds), Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium

Tel.: +32 16 32 10 [email protected]

Abstract. Electronic devices may undergo attacks going beyond traditional cryptanalysis. Side-channel analysis is an alternative attack that exploits information leaking from physical implementations of e.g. cryptographic devices in order to discover cryptographic keys or other secrets. This work comprehensively investigates the application of a machine learning technique in side-channel analysis. The considered technique is a powerful kernel-based learning algorithm: the Least Squares Support Vector Machine (LS-SVM). The chosen side-channel is the power consumption and the target is a software implementation of the Advanced Encryption Standard. In this study, the LS-SVM technique is compared to Template Attacks. The results show that the choice of parameters of the machine learning technique strongly impacts the performance of the classification. In contrast, the number of power traces and time instants does not influence the results in the same proportion. This effect can be attributed to the usage of data sets with straightforward Hamming weight leakages in this first study.

Keywords: Power analysis, Side-channel analysis, Cryptography, Support vector machines, Machine learning.

1 Introduction

Security in electronic devices must not only rely on cryptographic algorithms proven to be mathematically secure. An attacker may use information leaked from side-channels [68] resulting from physical implementations. This is called side-channel analysis (SCA) and it may endanger the overall security of a system.


The ever increasing demand for security in a number of applications, including the internet of things, financial transactions, electronic communications and data storage, should drive designers to strongly consider the possibility of physical attacks in addition to attacks on cryptographic algorithms, as most of these applications require the use of embedded devices.

In cryptography, SCA usually aims at revealing cryptographic keys. The underlying hypothesis for SCA assumes that physical observables carry information about the internal state of a chip implementing some cryptographic algorithm. In this context, useful key-related information can often be obtained from side-channels such as processing time [67], power consumption [68] and electromagnetic emanation [44,103].

In this work, we focus on power analysis. Given a set of power traces measured from a chip implementing a cryptographic algorithm, the ultimate goal is to tell which cryptographic key has been processed internally. This problem is frequently addressed using a divide-and-conquer approach, which breaks down a problem into practically tractable sub-problems. For example, parts of the cryptographic key, also known as subkeys, may be attacked one at a time. Each attack may be formulated as a classification problem intending to discover which subkey is linked to a given power trace.

Machine learning [91] is often used to solve classification and regression problems. Machine learning concerns computer algorithms that can automatically learn from experience. As a number of side-channel analysis related problems can be formulated as classification problems, machine learning represents a potentially useful tool for side-channel analysis.

Machine learning techniques have already been applied in cryptanalysis [106]. Furthermore, Backes et al. [18] used machine learning techniques for acoustic side-channel attacks on printers. To the best of our knowledge, there has not been further thorough investigation into the application of machine learning techniques in side-channel analysis.

This contribution lies in the context of profiled attacks, such as Template Attacks (TAs) [30]. These attacks comprise two phases: profiling and classification. In machine learning terms, these phases are respectively known as training and testing. In TAs, multivariate Gaussian templates of the noise within power traces are generated either for all possible subkeys or for the results of some particular function involving them. Subsequently, power traces are classified under a maximum likelihood approach. The TA is regarded as the strongest form of side-channel attack possible in an information theoretic sense [30]. Although a TA can be seen as a machine learning technique, we refer to TAs and machine learning techniques separately in this work.


In our work, the machine learning technique used is the Least Squares Support Vector Machine (LS-SVM) [123], which is a kernel-based learning algorithm. LS-SVM classifiers have achieved considerable results in comparison to other learning techniques, including standard SVM classifiers, for classification on 20 public domain benchmark data sets [47].

We trained LS-SVM classifiers by supervised learning. In the training phase, a number of power traces were provided to the classifier in order to teach it the most important features of the data set. In the testing phase, one unseen power trace was presented to the classifier at a time.

To get insight into the behavior of all the LS-SVM parameters, we focused on binary classification problems. Artificial data were created based on power traces extracted from a software implementation of the Advanced Encryption Standard (AES) without countermeasures. We considered one single S-Box lookup. In this first study, there was no actual key recovery yet.

Three experiments were conducted. We first investigated the influence of changing the parameters of the LS-SVM classifiers in order to learn their effects. Analyses varying both the number of power traces and the number of their components (time instants, possibly preprocessed) were also performed. In addition, we examined the impact of three feature selection techniques (the Pearson correlation coefficient approach for component selection [104], the sum of squared pairwise t-differences (SOST) of the average signals [48] and principal component analysis (PCA) [62]) and one preprocessing technique (outlier removal).

Our approaches were compared to TAs in terms of effectiveness in order to provide a known reference. TAs strongly rely on parametric estimations of Gaussian distributions. Machine learning techniques are able to bypass restrictive assumptions about probability density distributions.

This paper is organized as follows. Section 2 describes the feature selection techniques used in this work. In Section 3, our selected machine learning technique is explained. The results of the experiments are presented in Section 4. The preprocessing technique is also explained in this section. Section 5 presents the conclusions. Section 6 is dedicated to future work.

2 Feature Selection

Generally, machine learning approaches make use of a preliminary step before tackling the classification problem to be solved. This step is called feature selection. It filters out and/or preprocesses the components of a given data set in order to extract the intrinsic parts of the data containing the most relevant pieces of information under some criteria. The two main advantages of using the feature selection step are: 1) it allows for the reduction of the computational burden of classifiers with respect to processing and memory; and 2) it avoids confusing the classifier or teaching it the wrong features of the data. High-dimensional data can prevent classifiers from working in practice.

Power traces have thousands or millions of samples in time, which are here regarded as components. In this work, each component is represented by t. Many of them do not carry relevant information related to the targeted subkeys; they rather represent noise and ideally should not be presented to the classifier.

The following sections present: the Pearson correlation coefficient approach for component selection (mostly used in this work); the sum of squared pairwise t-differences (SOST) of the average signals; and the principal component analysis (PCA).

2.1 Pearson Correlation Coefficient Approach

A straightforward way to select the N most relevant components of the power traces consists in finding the N points which have the largest correlations with respect to a function of the targeted subkey, as shown by Rechberger et al. [104].

The Pearson correlation coefficient is given by

\rho(t) = \frac{\mathrm{cov}(x(t), y)}{\sqrt{\mathrm{var}(x(t)) \cdot \mathrm{var}(y)}}, \qquad -1 \le \rho(t) \le 1,

where cov(·,·) represents the covariance, var(·) represents the variance, x(t) is the vector with the component t of all power traces in the training set, and y is a vector containing the target values.
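A minimal Python sketch of this component selection, assuming traces holds one power trace per row and y holds the target values (e.g. Hamming weights of a predicted intermediate); the N components with the largest absolute correlation are kept. Using the absolute value is an assumption on our side, since strongly negative correlations are equally informative.

```python
import numpy as np

def pearson_select(traces, y, N):
    """Return the indices of the N trace components with the largest |rho(t)|."""
    x_centered = traces - traces.mean(axis=0)
    y_centered = y - y.mean()
    cov = x_centered.T @ y_centered / (len(y) - 1)
    rho = cov / (traces.std(axis=0, ddof=1) * y.std(ddof=1))
    return np.argsort(np.abs(rho))[::-1][:N]

# Hypothetical toy data: 500 traces of 1000 time samples each.
rng = np.random.default_rng(0)
traces = rng.normal(size=(500, 1000))
y = rng.integers(0, 9, size=500).astype(float)   # e.g. Hamming weights 0..8
print(pearson_select(traces, y, N=20))
```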

2.2 SOST

Chari et al. [30] originally proposed the following method to choose the most relevant components of the power traces, as part of TAs. Let M_i be the statistical average of the power traces associated to the subkey i, i = 1, . . . , S. The N most relevant components would be those at which large differences show up after computing the sum of pairwise differences between the average signals M_i.


Gierlichs et al. [48] showed that this feature selection method was not optimal. Better results were achieved using the Sum Of Squared pairwise T-differences (SOST) of the average signals. SOST is based on the T-test – a statistical tool to distinguish signals. The T-test is expressed by the ratio

T = \frac{M_i(t) - M_j(t)}{\sqrt{\frac{\sigma_i^2(t)}{n_i} + \frac{\sigma_j^2(t)}{n_j}}}. \qquad (1)

The denominator of the formula weights the difference between M_i(t) and M_j(t) according to the variabilities σ²_i(t) and σ²_j(t), in relation to the numbers of signals n_i and n_j associated to the sets i and j. Equation (1) can also be seen as an analogy to the signal-to-noise ratio, in which the difference between the averages is the signal and the denominator is a measure of dispersion, interpreted as noise. This noise may make the distinction between the distributions with averages M_i(t) and M_j(t) hard, but it should vanish if n_i and n_j are large.

The most relevant components will be those at which the sum of squared pairwise T-differences,

SOST(t) = \sum_{j>i=0}^{S} \left( \frac{M_i(t) - M_j(t)}{\sqrt{\frac{\sigma_i^2(t)}{n_i} + \frac{\sigma_j^2(t)}{n_j}}} \right)^2,

presents the N highest peaks.
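A minimal Python sketch of SOST-based selection, assuming the traces are grouped by class (e.g. by a subkey-dependent Hamming weight); the labels, trace dimensions and N are hypothetical placeholders.

```python
import numpy as np

def sost_select(traces, labels, N):
    """Return the indices of the N components with the highest SOST(t)."""
    classes = np.unique(labels)
    means = np.array([traces[labels == c].mean(axis=0) for c in classes])
    var_over_n = np.array([traces[labels == c].var(axis=0, ddof=1) / np.sum(labels == c)
                           for c in classes])
    sost = np.zeros(traces.shape[1])
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            t_stat = (means[i] - means[j]) / np.sqrt(var_over_n[i] + var_over_n[j])
            sost += t_stat ** 2
    return np.argsort(sost)[::-1][:N]

rng = np.random.default_rng(0)
traces = rng.normal(size=(500, 1000))       # hypothetical toy traces
labels = rng.integers(0, 9, size=500)       # e.g. Hamming weight classes
print(sost_select(traces, labels, N=20))
```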

2.3 PCA

Principal Component Analysis (PCA) [62] is a well-known orthogonal, non-parametric transformation that provides a way to efficiently select relevant information from data sets. The core idea is to compute a new basis that better expresses the data set, revealing its intrinsic structure.

By assuming linearity, the set of plausible new bases is significantly reduced and the problem turns into finding an appropriate change of basis. PCA also assumes that mean and variance are sufficient statistics.


Presuming that the directions with the largest variances contain the most relevant information, PCA sorts the transformed, most important components of a vector with regard to their variances.

In addition to maximizing the signal, measured by the interclass variance, PCA also intends to minimize the redundancy within the components of the data set. This can be achieved by setting all off-diagonal terms of the covariance matrix of the transformed data to zero, making the components uncorrelated with each other.

After estimating the covariance matrix C of the original data set x ∈ R^M, where M is the number of components of x, the N < M eigenvectors related to the largest eigenvalues λ_t from the eigenvector decomposition C u_t = λ_t u_t should be selected. The transformed, lower dimensional variable is given by z_t = u_t^T (x_t − μ_t), t = 1, . . . , N, where μ_t is the mean of the t-th component of x. The error on the new data set resulting from the dimensionality reduction is determined by \sum_{t=N+1}^{M} λ_t. For the transformed variable, t does not have a time connotation anymore.
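The following Python sketch illustrates this PCA-based dimensionality reduction on power traces: estimate the covariance matrix, keep the N eigenvectors with the largest eigenvalues, and project the mean-centered traces. The toy data and N are placeholders, not the settings used in the experiments.

```python
import numpy as np

def pca_reduce(traces, N):
    """Project traces onto the N principal directions with the largest eigenvalues."""
    mu = traces.mean(axis=0)
    centered = traces - mu
    C = np.cov(centered, rowvar=False)            # M x M covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    U = eigvecs[:, order[:N]]                     # M x N reduced basis
    residual_error = eigvals[order[N:]].sum()     # sum of the discarded eigenvalues
    return centered @ U, residual_error

rng = np.random.default_rng(0)
traces = rng.normal(size=(500, 200))              # hypothetical toy traces
z, err = pca_reduce(traces, N=10)
print(z.shape, err)
```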

3 Classification

The classification technique used in this work is the Least Squares Support Vector Machine (LS-SVM) [122]. The LS-SVM tackles linear systems rather than solving convex optimization problems, typically quadratic programs, as in standard Support Vector Machines (SVMs) [34]. This is done by both introducing a least squares loss function and working with equality constraints, instead of the inequality constraints intrinsic to SVM formulations. One advantage of this reformulation is a reduction in complexity.

(LS-)SVM classifiers are originally formulated to perform binary classification. In the training phase, the (LS-)SVM classifier constructs a hyperplane in a high dimensional space aiming to separate the data according to the different classes. This data separation should occur in such a way that the hyperplane has the largest distance to the nearest training data points of any class. These particular training data points define the so-called margin.

Let $D_n = \{(x_k, y_k) : x_k \in \mathbb{R}^N, y_k \in \{-1, +1\};\ k = 1, \ldots, n\}$ be a training set, where x_k and y_k are respectively the k-th input (power trace after feature selection) and output (subkey-related) patterns.

The classifier in the primal weight space takes the form


$$y = \mathrm{sign}[w^T \varphi(x) + b],$$

where sign(x) = −1 if x < 0, else sign(x) = 1, and φ(x) : R^N → R^{N_f} maps the N-dimensional input space into a higher, possibly infinite, N_f-dimensional space. Both the weights w ∈ R^{N_f} and the bias b ∈ R are parameters of the classifier. These parameters can be found by solving the following optimization problem, having a quadratic cost function and equality constraints:

$$\min_{w,b,e}\ \mathcal{J}(w, e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{k=1}^{n} e_k^2 \quad \text{s.t.}\quad y_k[w^T \varphi(x_k) + b] = 1 - e_k, \quad k = 1, \ldots, n, \qquad (2)$$

which is a modification of the basic SVM formulation. In Equation (2), e = [e_1, . . . , e_n]^T is a vector of error variables, tolerating misclassification, and γ is the regularization parameter, determining the trade-off between margin size maximization and training error minimization.

After constructing the Lagrangian,

$$\mathcal{L}(w, b, e; \alpha) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{k=1}^{n} e_k^2 - \sum_{k=1}^{n} \alpha_k \{ y_k [w^T \varphi(x_k) + b] - 1 + e_k \},$$

and taking the conditions for optimality, by setting

$$\frac{\partial \mathcal{L}}{\partial w} = 0, \quad \frac{\partial \mathcal{L}}{\partial b} = 0, \quad \frac{\partial \mathcal{L}}{\partial e_k} = 0, \quad \frac{\partial \mathcal{L}}{\partial \alpha_k} = 0, \quad k = 1, \ldots, n,$$

the classifier formulated in the dual space is given by


$$y(x) = \mathrm{sign}\!\left( \sum_{k=1}^{n} \alpha_k y_k K(x, x_k) + b \right),$$

where K(x, x_k) = φ(x)^T φ(x_k) is a positive definite kernel and α_k ∈ R are the Lagrange multipliers, or support values. Both α_k and b are solutions of the following linear system

$$\begin{pmatrix} 0 & y^T \\ y & \Omega + \frac{1}{\gamma} I_n \end{pmatrix} \begin{pmatrix} b \\ \alpha \end{pmatrix} = \begin{pmatrix} 0 \\ 1_n \end{pmatrix},$$

with 1_n = (1, . . . , 1)^T and Ω_{kl} = y_k y_l φ(x_k)^T φ(x_l). The solution is unique when the matrix corresponding to the linear system has full rank.

According to Mercer's theorem [13], a positive definite K guarantees the existence of the feature map φ, which is often not explicitly known.

As explained in [123], from ∂L/∂e_k = 0 we have α_k = γ e_k, meaning that the support values are proportional to the errors corresponding to the training data points. As α_k ≠ 0, k = 1, . . . , n, every data point is a support vector, implying a lack of sparseness. High α_k values suggest high contributions of training data points to the decision boundary created by the classifier to distinguish the different classes.
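As an illustration, the dual problem above reduces to solving one linear system. The sketch below assumes NumPy arrays and a `kernel(A, B)` function returning the Gram matrix between the rows of A and B; the names are illustrative and this is not the LS-SVMlab interface used later in this work.

```python
import numpy as np

def lssvm_train(X, y, kernel, gamma):
    # Build and solve the dual system [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1].
    n = len(y)
    Omega = np.outer(y, y) * kernel(X, X)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                      # bias b and support values alpha

def lssvm_predict(Xtest, X, y, alpha, b, kernel):
    # y(x) = sign( sum_k alpha_k y_k K(x, x_k) + b )
    return np.sign(kernel(Xtest, X) @ (alpha * y) + b)
```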

3.1 Practicalities

Roughly, when working with (LS-)SVMs one usually chooses the kernel K between the linear kernel,

$$K(x, x_k) = x_k^T x,$$

and the radial basis function (RBF) kernel,

$$K(x, x_k) = \exp\!\left\{ -\frac{\|x - x_k\|_2^2}{\sigma^2} \right\},$$


where ‖·‖₂ is the L2-norm and σ² ∈ R₊ is a parameter to be chosen. Other kernel options, such as the polynomial kernel and the multilayer perceptron (MLP) kernel, are beyond the scope of this work.
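Both kernels amount to a few lines of NumPy. The sketch below is illustrative only and follows the σ² parametrization of the RBF expression given above.

```python
import numpy as np

def linear_kernel(A, B):
    # K(x, x_k) = x_k^T x for all pairs of rows of A and B.
    return A @ B.T

def rbf_kernel(A, B, sigma2):
    # Squared Euclidean distances between every row of A and every row of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma2)
```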

Kernel-based models usually depend on parameters controlling both their accuracy and complexity. When using linear kernels, the only parameter to be tuned is γ. If γ → 0, the solution favors margin maximization, putting less emphasis on minimizing the misclassification error. RBF kernels require tuning an additional parameter σ², which is directly related to the shape of the decision boundary.

Considering RBF kernels, both parameters γ and σ² should be optimized in order for the classifier to maximize the success rates when analyzing unknown power traces from the testing data set. A combination of cross-validation and a grid search algorithm is recommended in the literature [123] for parameter tuning. Cross-validation helps prevent overfitting, but it may be computationally costly.
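A minimal sketch of such a tuning loop is given below; it reuses the lssvm_train/lssvm_predict and rbf_kernel sketches above, and the fold construction and grids are illustrative assumptions rather than the exact procedure of [123] or of the LS-SVMlab toolbox.

```python
import numpy as np

def cv_grid_search(X, y, gammas, sigma2s, n_folds=5, seed=0):
    # k-fold cross-validated success rate for every (gamma, sigma2) pair on the grid.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    best, best_rate = None, -1.0
    for gamma in gammas:
        for sigma2 in sigma2s:
            kernel = lambda A, B: rbf_kernel(A, B, sigma2)
            rates = []
            for k in range(n_folds):
                test_idx = folds[k]
                train_idx = np.concatenate([folds[i] for i in range(n_folds) if i != k])
                b, alpha = lssvm_train(X[train_idx], y[train_idx], kernel, gamma)
                pred = lssvm_predict(X[test_idx], X[train_idx], y[train_idx], alpha, b, kernel)
                rates.append((pred == y[test_idx]).mean())
            if np.mean(rates) > best_rate:
                best, best_rate = (gamma, sigma2), float(np.mean(rates))
    return best, best_rate
```

The grid is deliberately exhaustive; for the small grids used in this chapter (three values per parameter) the cost stays modest, but the cross-validation dominates the running time for larger training sets.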

Many classification problems have more than two classes. As (LS-)SVMs are designed to perform binary classification, a typical approach to cope with a multi-class problem is to split it up into binary classification problems using some coding technique. For example, a multi-class problem comprising p classes may be broken down into ⌈log₂ p⌉ binary classification tasks, as sketched below. Subsequently, the results of the binary classifiers should be combined in order to reconstruct a valid result for the original multi-class problem. In this work we only deal with binary classification problems.
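A minimal sketch of such a coding step is shown below, assuming integer class labels 0, ..., p−1; the decoding/combination of the binary outputs is omitted, and the bit-wise coding is only one possible choice.

```python
import numpy as np

def binary_coding(labels, p):
    # Encode p-class labels into ceil(log2 p) columns of {-1, +1} targets,
    # one column per binary classifier (here simply the bits of the class index).
    n_bits = int(np.ceil(np.log2(p)))
    bits = (labels[:, None] >> np.arange(n_bits)) & 1
    return np.where(bits == 1, 1, -1)
```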

4 Results

This section investigates the potential of LS-SVMs to work as robust power trace analyzers. We chose to start with a thorough investigation of the behavior of the machine learning technique. To this end, several tests were performed using both linear and RBF kernels. We remain one step away from the actual recovery of cryptographic subkeys: in this work, our attacks distinguish between 2 different classes related to the output of one AES S-Box.

4.1 Experimental Settings

Preliminary tests were performed on real measurements from an implementation consisting of the subpart of the AES algorithm composed of the XOR between an 8-bit subkey and the input word, followed by the application of one S-Box.


The LS-SVM supervised learning classifiers have been implemented using LS-SVMlab 1.7 [26].

Our data set contains 5 000 power traces with 2 000 components each. In this work, attacks distinguish only between 2 different classes, in order to help us initially build a solid understanding of the techniques involved. These 2 different classes were chosen in 3 ways.

The first two approaches considered the relationship between the power traces and the internal state of the cryptographic algorithm to be represented by the Hamming weight model [90]. The threshold approach divided the data set into two classes depending on the Hamming weights of the outputs of the S-Box (less than or greater than 4). The intercalated approach divided the data set depending on whether the Hamming weights were even or odd. The third approach, named bit(4), focused on the 4-th least significant bit of the output of the S-Box, since it was the bit leaking the most information. Both the intercalated and bit(4) approaches were created so that their classes would not be as trivially separable as those from the threshold approach.
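The three groupings can be derived directly from the S-Box outputs, as in the illustrative sketch below; the treatment of Hamming weight exactly 4 in the threshold approach and the indexing of the 4-th least significant bit are assumptions of the sketch.

```python
import numpy as np

def hamming_weight(v):
    return bin(int(v)).count("1")

def class_labels(sbox_outputs):
    # sbox_outputs: integer array of 8-bit S-Box output values.
    hw = np.array([hamming_weight(v) for v in sbox_outputs])
    threshold = np.where(hw > 4, 1, -1)                      # HW greater than 4 vs. the rest (assumption)
    intercalated = np.where(hw % 2 == 1, 1, -1)              # odd vs. even Hamming weight
    bit4 = np.where(((sbox_outputs >> 3) & 1) == 1, 1, -1)   # 4-th least significant bit (assumption)
    return threshold, intercalated, bit4
```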

In the training phase, a number of inputs from the training set were provided to the classifier in order to teach it the most important features of the data set. In the testing phase, one input of the test set was presented to the classifier at a time. Success rates were calculated as percentages of correct classifications among the power traces from the test sets.

Section 4.2 investigates the impact that varying the parameters γ and σ² has on the success rates. In Section 4.3, we analyze the influence of varying both the number of traces and the number of components on the classification. In these two sections, the most relevant components of the power traces were selected by the Pearson correlation coefficient approach. Lastly, Section 4.4 examines whether removing outliers or using either the SOST or PCA feature selection techniques, instead of the Pearson correlation coefficient approach, would increase the success rates. All these sections provide a brief comparison of our results to TAs in terms of effectiveness.

4.2 Influence of the LS-SVM Parameters

In this part, the training and test sets comprised respectively 3 000 and 2 000 power traces. At first, only 2 components, out of 2 000, were selected by picking those having the largest Pearson correlation coefficients and not belonging to the same clock cycle; the selection is sketched below. Working with 2-dimensional inputs allowed visualization of the decision boundaries created by the classifiers with respect to the components.
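A minimal sketch of the correlation-based ranking is given below; it simply ranks components by the magnitude of their Pearson correlation with the binary labels and leaves the clock-cycle constraint to the caller. Names and the numerical guard are illustrative assumptions.

```python
import numpy as np

def pearson_rank(traces, labels):
    # |Pearson correlation| between every trace component and the labels, highest first.
    Xc = traces - traces.mean(axis=0)
    yc = labels - labels.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()) + 1e-12)
    return np.argsort(np.abs(corr))[::-1]
```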


The parameters γ and σ², in case of using the RBF kernel, assumed the following values: 0.1, 1 and 10. After analyzing all their combinations, we verified that the success rates for the threshold approach were as high as 99.3%, regardless of the type of kernel used. This is because the classes assigned by the threshold approach are easily distinguishable.

The intercalated approach led to results sensitive to both the type of kernel and the tuning parameters. When using RBF kernels, the success rates depended strongly on σ²: success rates as high as 99.0%, 94.0% and 82.7% were achieved for σ² = 0.1, 1 and 10, respectively. Although γ did not influence the results as much as σ², high values of γ slightly increased them. Linear kernels yielded success rates around 49.9%, performing no better than random guessing. The reason is that a linear function cannot separate nonlinear, intercalated data.

The bit(4) approach achieved success rates of 74.0%, regardless of both the kernel type and the values of the tuning parameters. These results, despite being worse than those from both the threshold and intercalated approaches, are still clearly better than random, considering that only one out of the 8 bits from the output of the S-Box was taken into account.

Figs. 1-3 show how the classification in the threshold, intercalated and bit(4) approaches, respectively, occurs in relation to the tuning parameters. The horizontal and vertical axes are respectively the two considered power trace components. Figs. 1-3 (a) present the two classes for each approach: the square-shaped points belong to one class, whereas the circle-shaped points belong to the other class. Figs. 1-3 (b) concern the decision boundaries (dark lines separating the space into two regions, light and dark colored) of the linear kernel-based classifiers for γ = 1. Varying γ for the linear kernel does not influence the decision boundary considerably. Figs. 1-3 (c)-(f) concern the decision boundaries of the RBF kernel-based classifiers for the combinations of γ ∈ {0.1, 10} and σ² ∈ {0.1, 10}.

We verified that underfitting arises for low values of γ, whilst overfitting occurs for high values of γ. This conclusion is supported by Equation (2). When using the RBF kernel, low values of σ² make the decision boundary fit the data, whilst high values of σ² spread the decision boundaries.

Particularly, the orientations of the decision boundaries shown in Figure 2 (d) for γ = 0.1 and σ² = 10 seem counter-intuitive. The reason behind this is twofold: 1) the low value of γ favored underfitting; and 2) the relatively high value of σ² favored the spread of the decision boundary in the wrong direction, which had been poorly chosen due to underfitting. However, as γ increases, the direction of the decision boundaries tends to fit the orientation of the data more suitably, as shown in Figure 2 (f) in comparison to Figure 2 (d).

[Figure 1: Threshold approach: decision boundaries of the LS-SVM classifiers. Panels: (a) the two classes, square- and circle-shaped; (b) linear kernel, γ = 1; (c) RBF kernel, γ = 0.1, σ² = 0.1; (d) RBF kernel, γ = 0.1, σ² = 10; (e) RBF kernel, γ = 10, σ² = 0.1; (f) RBF kernel, γ = 10, σ² = 10. Axes: Component 1 (horizontal) and Component 2 (vertical).]

[Figure 2: Intercalated approach: decision boundaries of the LS-SVM classifiers. Same panel layout and axes as Figure 1.]

[Figure 3: Bit(4) approach: decision boundaries of the LS-SVM classifiers. Same panel layout and axes as Figure 1.]

TAs using 2 templates led to success rates of 99.6%, 50.3% and 73.7% for the threshold, intercalated and bit(4) approaches, respectively. Except for the intercalated approach, the results were similar to those obtained with LS-SVMs using RBF kernels. Results on the intercalated approach were poor, as with LS-SVMs using linear kernels. The Gaussian-based templates did not fit the distributions of the intercalated data well, because the intercalated data was actually drawn from a Gaussian mixture.

4.3 Varying the Number of Traces and Components

Since the classes defined by the threshold approach are trivially separable, this section deals only with the intercalated and bit(4) approaches. The first part of the experiments consisted in varying the number of power traces within the training and test sets. Out of 5 000, 4 000, 3 000, 2 000, 1 000 and 500 randomly chosen power traces, 70% were assigned to the training set and the remainder to the test set. The parameters γ and σ² were kept constant. For the intercalated approach: γ = 10 and σ² = 1. For the bit(4) approach: γ = 1 and σ² = 0.1.

Interestingly enough, the results did not vary noticeably when contrasted with those from Section 4.2. However, in the bit(4) case, the success rates dropped approximately 10% when using the minimum number of power traces. For these first experiments with binary classification problems, a relatively small number of power traces seemed to be enough, as long as the relevant components of the power traces had been selected. TAs behaved likewise.

The second part of the experiments in this section attempted to raise the success rates (SR) in the bit(4) approach by increasing the number of components. Table 1 shows the results for γ = 1 and σ² = 0.1 using respectively 3 500 and 1 500 power traces for the training and test sets.

When using the RBF kernel, the success rates dropped as more components were included. This means that the added components did not bring additional valuable information to the classifier; instead, they represented noise, making the classification task even harder. A search for more suitable parameters might help improve these results. In both the linear kernel and TA cases, the success rates did not vary as a function of the number of components considered.


Table 1: Success rates (SR) for the bit(4) approach.

Method            Number of components   SR (%)
LS-SVM: RBF       3                      74.7
LS-SVM: RBF       4                      67.1
LS-SVM: RBF       6                      52.7
LS-SVM: Linear    3                      75.5
LS-SVM: Linear    4                      75.5
LS-SVM: Linear    6                      75.1
TA                3                      73.0
TA                4                      75.8
TA                6                      75.0

4.4 Outlier Removal, SOST and PCA

In this part, we examined whether removing outliers or using either the SOST or PCA feature selection techniques, instead of the Pearson correlation coefficient approach, would increase the success rates.

Traces were marked as outliers if the value of one of their selected components, after subtracting the mean of that component, was larger in magnitude than 2.7 times the related standard deviation. The threshold 2.7 was chosen ad hoc; it assured 99.3% data coverage. Still focusing on the bit(4) approach, this preprocessing technique did not improve the previous results.
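The filter amounts to the minimal sketch below, applied to the matrix of selected components; names are illustrative.

```python
import numpy as np

def remove_outliers(selected_components, k=2.7):
    # Keep a trace only if all of its selected components lie within k standard
    # deviations of the respective component means.
    mu = selected_components.mean(axis=0)
    sd = selected_components.std(axis=0)
    keep = (np.abs(selected_components - mu) <= k * sd).all(axis=1)
    return selected_components[keep], keep
```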

Concerning SOST, even though this feature selection technique did not choose exactly the same components as those from the previous section, the success rates did not change in comparison with the prior results.

The scenario did not change remarkably when using PCA either. By keeping the features representing more than 98.0% of the variance of all the features, the results remained similar to the ones from Section 4.2. When retaining respectively 98.0%, 99.0% and 99.5% of the variance, the numbers of selected features were 4, 5 and 7, respectively.

5 Conclusions

Side-channel analysis is a powerful and feasible way of attacking secure systems. In this work, we started a comprehensive study on the application of


machine learning in side-channel analysis. One of the main motivations to use machine learning techniques is their outstanding results in different domains.

We focused on power analysis. Power traces often leak meaningful amounts of information about the processed cryptographic key. Power traces extracted from a software implementation of the AES without countermeasures were used for the initial analysis. To help us better understand the behavior of the machine learning technique, we created three simple arrangements of class distributions: threshold, intercalated and bit(4). All of them contained two different classes.

In this work, we chose the LS-SVM as our classification technique. We performed three sets of experiments. Firstly, experiments examining the influence of the parameters of the machine learning technique on the accuracy of the classifiers were performed on data sets with obvious Hamming weight leakage. Secondly, we investigated the consequences of varying both the number of traces and the number of their components. Thirdly, we ran tests considering feature selection (the Pearson correlation coefficient approach, SOST and PCA) and preprocessing (outlier removal) techniques. The results were compared to the relatively simple, strong and well-known template attacks.

The success rates obtained in the threshold approach were high regardless of the type of kernel used, mainly because the data set was easily separable. Results for the intercalated approach showed that the RBF kernel is more suitable for nonlinear problems. Since the choice of the LS-SVM parameters directly affected the results, applying an automatic parameter tuning technique should be considered in future work.

The influence of varying the number of power traces for training could only be noticed when using as few as 500 power traces, which yielded lower success rates. When increasing the number of components of the inputs of the classifiers, the results got worse due to the lack of valuable information within the additional components. Likewise, the application of a preprocessing technique did not contribute. All the feature selection techniques inspected in this work performed similarly.

TAs performed similarly to LS-SVM classifiers using linear kernels. This similarity was clarified in the analysis of the intercalated approach, which generated comparably poor results for both LS-SVM classifiers with linear kernels and TAs. LS-SVM classifiers with RBF kernels were able to produce reasonable results in the intercalated approach, outperforming TAs in this case.


6 Future Works

LS-SVM It would be interesting to look further into the functioning of kernel-based learning algorithms. Other, possibly tailored, kernels may improve the results. As a short reminder, the kernel is responsible for mapping the data into a feature space in order to facilitate the distinction between different classes.

Efficiency comparison Future works should include an assessment of the efficiency of machine learning approaches in comparison to TAs. We observed that LS-SVMs are significantly heavier than template attacks, especially in the training phase. This observation is likely to hold for other elaborate machine learning techniques.

Other cryptographic algorithms and implementations Any implementation of any cryptographic algorithm may be attacked. The more attacks are performed, the more can be learned about using machine learning techniques in side-channel analysis. Here the attacker is free to attack either software or hardware implementations of cryptographic algorithms such as the AES, 3DES, etc.

Attacks on protected implementations A more realistic attack scenario should consider attacks on protected implementations. Some designers make use of countermeasures such as masking [33], for instance. Also, noisier data should be considered.

Other machine learning techniques One is basically free to use any off-the-shelf machine learning technique or even some other approach. Examples of other machine learning techniques include artificial neural networks [22, 53]. Machine learning techniques that learn in an unsupervised way (i.e. without making use of any labeled data, e.g. clustering) are able to perform non-profiled attacks, which were out of the scope of this work.

Acknowledgment

This work was supported in part by the European Commission's ECRYPT II NoE (ICT-2007-216676), by the Belgian State's IAP program P6/26 BCRYPT, by the K.U. Leuven-BOF (OT/06/40) and by the Research Council K.U. Leuven: GOA TENSE (GOA/11/007). Benedikt Gierlichs is a Postdoctoral Fellow of the Fund for Scientific Research - Flanders (FWO).

References

Please refer to the Bibliography at the end of the dissertation.

Publication-based Chapter

Algorithms for Digital Image Steganography via Statistical Restoration

Publication Data

Gabriel Hospodar, Ingrid Verbauwhede, and José Gabriel R. C. Gomes. Algorithms for Digital Image Steganography via Statistical Restoration. In 2nd Joint WIC/IEEE Symposium on Information Theory and Signal Processing in the Benelux (WIC 2012), pages 5-12, 2012.

Contribution

• Main author.

Note

• The technical part of this work has been performed at the Universidade Federal do Rio de Janeiro (COPPE/UFRJ), Brazil. An extension in terms of documentation has been done in collaboration with the KU Leuven.

• Notation and some mathematical derivations and results borrowed from [111] for ease of comparison and due to space constraints.


Algorithms for Digital Image Steganography via Statistical Restoration

Gabriel Hospodar¹,², Ingrid Verbauwhede¹, and José Gabriel R. C. Gomes²

¹ ESAT/SCD-COSIC and IBBT (iMinds), Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, bus 2446, 3001 Heverlee, Belgium
[email protected]
² Programa de Engenharia Elétrica, COPPE, Univ. Federal do Rio de Janeiro, Centro de Tecnologia, Ilha do Fundão, Rio de Janeiro, RJ 21941-972, Brazil
[email protected]

Abstract. Steganography is concerned with hiding information without raising any suspicion about the existence of such information. Applications of steganography typically involve security. We consider that information is embedded into Discrete Cosine Transform (DCT) coefficients of 8×8-pixel blocks in a natural digital image. The apparent absence of the hidden information is guaranteed by a compensation method that is applied after the hiding process. The compensation method aims at restoring the original statistics, e.g. the probability mass function, of the DCT coefficients from the cover image, as Sarkar and Manjunath have done in [111]. In this work we propose three alternative steganographic approaches for the hiding and compensation processes based on [111]. Our embedding processes automatically perform part of the statistical restoration before the compensation process. We also propose an intuitive histogram-based compensation method. Its operation is similar to filling the bins of a histogram with a liquid, as if this liquid corresponds to probability flowing from the bins with excess of probability to the bins with deficit of probability. Classifiers based on artificial neural networks are trained to distinguish original (cover) and information-bearing (stego) images. The results show that we imperceptibly hide 8.3% more information than [111] by combining one of our hiding processes with the histogram-based compensation method used in [111]. The peak signal-to-noise ratio between the compensated stego and cover images is close to 37 dB.

Keywords: Steganography, Information hiding, Image processing, DCT, Artificial neural networks.


1 Introduction

Steganography aims at hiding information in an original (cover) data in such a way that a third party is unable to detect the presence of such information by analyzing the information-bearing (stego) data. Unlike watermarking, steganography does not intend to prevent an adversary from removing or modifying the hidden message that is embedded into the stego data. Steganography is particularly interesting for applications in which encryption may not be used to protect the communication of confidential information.

The largest amount of information that can be embedded into a cover data without producing either statistical or visual distortion to some extent is called the steganographic capacity [42]. The objective of this work is to hide the maximum amount of information within natural digital images in order to compute the steganographic capacity. One of the motivations for this work concerns the approach presented by Sarkar and Manjunath [111]. Likewise, we hide information within the discrete cosine transform (DCT) coefficients of digital images. Part of these coefficients are used to actually hide the message, whilst another set of coefficients is used to statistically restore [117,118], or compensate, the stego image. The considered statistic is the probability mass function (PMF) of the DCT coefficients. Though important for achieving a higher concealment level, higher-order statistics are out of the scope of this work. However, similar analyses and considerations presented here may be applied to higher-order statistics.

In [111], information is hidden using an even/odd hiding method in the DCT domain. Subsequently, a histogram-based compensation process [117,118,128] is applied. This work proposes three steganographic approaches based on [111] for digital images with a view towards hiding even more information. Our embedding processes are enhanced by automatically performing part of the statistical restoration before the compensation process. Furthermore, the operation of our histogram-based compensation method is similar to filling the bins of a histogram with a liquid, as if this liquid corresponds to probability flowing from the bins with excess of probability to the bins with deficit of probability. The resulting stego images are analyzed by two steganalysts implemented using artificial neural networks [53] aiming at identifying the presence of any hidden content. In case a steganalyst knows the underlying stochastic process that generates natural cover images, a hidden message can be detected by simply analyzing the statistics of the suspicious stego image. However, it is impossible in practice to characterize natural-image stochastic processes. The steganalyst should then consider simplified statistics. Such simplification unintentionally creates breaches allowing for imperceptible information hiding.


The paper is organized as follows. Section 2 summarizes the approach proposed by Sarkar & Manjunath in [111]. Section 3 describes our proposals. Section 4 presents the experiments and results. Section 5 concludes the paper.

2 Approach of Sarkar and Manjunath (S & M)

In the context of this work, the goal of the steganographer is to hide the maximum amount of information in such a way that the PMF of the stego DCT coefficients can be matched to the cover statistics. As in [111], let X be the union of two disjoint sets H and C composed of cover DCT coefficients respectively available for data hiding and statistical compensation. The ratio between the number of elements in H and X is called the hiding fraction λ = |H|/|X|. The cover set X is formed by calculating the 8×8 DCT for all blocks of the cover image and subsequently dividing each of these blocks point-wise by a quantization matrix subject to a quality factor. Then a frequency band is chosen and the selected coefficients are rounded off, thus generating quantized DCT coefficients. After the hiding and compensation processes are performed, X, H and C become Y, Ĥ and Ĉ, respectively. In order to conceal the maximum amount of information with the assurance that the PMF of Y can match the PMF of X, an optimal hiding fraction λ_opt that simultaneously maximizes |H| and allows for statistical compensation should be found.

Even/Odd Hiding Method The even/odd hiding method converts an element from H to the nearest even or odd integer depending on the bit of information to be hidden. If the parities of an element of H and of the bit to be embedded in it are different, the element of H has 1 either added to or subtracted from it with a 50% chance. Let X(i) (resp. Ĥ(i)) be the subset of X (resp. Ĥ) such that all its elements belong to X (resp. H) and are equal to i. The number of elements in X(i) (resp. Ĥ(i)) is represented by B_X(i) (resp. B_Ĥ(i)). It is considered that λ is the common hiding fraction for all possible i.

Assuming that the message to be hidden is large enough and that it has approximately the same number of zeros and ones affecting the elements in H(i), there is a 50% probability that each element of H(i) will be changed. If the element is changed, it can be mapped either to the next or to the previous integer with the same probability. It is also known that only a fraction λ of elements from each subset X(i) will be used for hiding, whereas the remaining fraction (1 − λ) of elements will be reserved for the compensation process. Therefore, λ/2 of the elements of X(i) will remain unchanged in Ĥ(i) and equal fractions of λ/4 of the elements of X(i) will be transferred to Ĥ(i − 1) and Ĥ(i + 1). Based on this analysis [111], the number of elements in Ĥ(i) is given by:

$$B_{\hat{H}}(i) \approx \frac{\lambda B_X(i)}{2} + \frac{\lambda B_X(i-1)}{4} + \frac{\lambda B_X(i+1)}{4}.$$
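For illustration, a minimal sketch of the even/odd embedding step is given below, assuming `coeffs` is an integer array of quantized DCT coefficients selected for hiding and `bits` holds the message bits; names are illustrative.

```python
import numpy as np

def even_odd_embed(coeffs, bits, rng=None):
    # Force the parity of each host coefficient to match the bit it carries;
    # when a change is needed, add or subtract 1 with a 50% chance each.
    rng = rng if rng is not None else np.random.default_rng()
    stego = coeffs.copy()
    for k, bit in enumerate(bits):
        if stego[k] % 2 != bit:
            stego[k] += rng.choice([-1, 1])
    return stego
```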

As shown in [111], the optimal hiding fraction is given by

$$\lambda_{\mathrm{opt}} = \underset{\lambda = |H|/|X|}{\operatorname{argmax}} \left\{ |H| = |\hat{H}| : B_X(i) - B_{\hat{H}}(i) \geq 0,\ \forall i \right\}.$$

As λ increases, the distance between the two PMFs associated with B_H and B_Ĥ increases and fewer elements are made available for the compensation.

High frequency elements are not frequent when the DCT is applied over 8×8 blocks of a natural image. The distribution of the quantized DCT coefficients of a natural image presents high values in the region close to the zero frequency. However, these values rapidly decrease the more the frequencies differ from zero. High magnitude elements are rare and difficult to compensate, and are therefore unsuitable for information hiding. The elements considered in practice for hiding and compensation, i.e. to compute the histogram of the quantized DCT coefficients, should lie within a predefined bandwidth whose absolute values are less than a threshold T. Hence a higher probability of compensation for all used coefficients is ensured. Given T, there are (2T + 1) bins in the interval [−T, T]. The hiding process occurs in all bins, except for the extreme bins. It may be difficult to perfectly compensate the extreme bins because they have only a one-sided neighborhood providing resources for the compensation.

In order to find an effective hiding fraction λ* that is suitable for all possible values of i, the minimum value of

$$\lambda_i = \frac{B_X(i)}{\frac{B_X(i-1)}{4} + \frac{B_X(i)}{2} + \frac{B_X(i+1)}{4}}, \quad i = -T, \ldots, T,$$

that is greater than zero should be used. The condition λ_i > 0 ensures that the hiding fraction will not be reduced to zero in case there are empty bins. Empty bins can cause a difference between the PMFs of the quantized DCT coefficients before and after the data hiding process. Such a PMF mismatch is not statistically significant, hence not contributing effectively to a detection attempt of the hidden information.

Let P_X be the PMF of X. The fraction of elements available for the hiding and compensation processes considering a threshold T is $G(T) = \sum_{-T < i < T} P_X(i)$. The maximum fraction of elements that can be used to hide a message as a function of a certain threshold, while still allowing for the statistical compensation, is called the embedding rate R(T) = λ*(T)·G(T). If T increases, G(T) increases while λ*(T) decreases, since the probability of finding a smaller λ_i in the interval [−T, T] rises. The optimal threshold T_opt that leads to the highest achievable embedding rate R_opt for the even/odd hiding method is found by searching for the value of T that maximizes R(T).

Compensation Method The compensation method aims at restoring the statistics of the stego image with respect to the cover image. The usage of DCT coefficients for statistical compensation purposes leads to a concealment cost: the more coefficients are used for compensation, the fewer coefficients are available for information hiding. In [111,118], the authors use a minimum mean squared error (MMSE)-based compensation method [128] to change the distribution of C in order to approximate the PMF of the stego coefficients to the PMF of the cover coefficients.

3 Our Approaches

3.1 Proposal A

Hiding Method The hiding process can be implemented in such a way that it automatically helps the compensation process. This hiding method aims at improving the standard even/odd hiding method. The even/odd hiding method from [111] randomly decides whether 1 is added to or subtracted from a coefficient, should the coefficient need to be changed. This random decision does not necessarily help the compensation process. When the modification of the parity of a host coefficient is required, there are cases in which it is more interesting to specifically use either the addition or the subtraction operation. A random choice of the operation to be performed is therefore suboptimal.

We propose using a history of modifications for each value in [−T_opt, T_opt] of the DCT coefficients available for hiding. The history of modifications helps the hiding process decide whether 1 should be added to or subtracted from a DCT coefficient, if it has to be changed. This procedure intends to automatically compensate the coefficients during the data hiding process. The history of modifications can be represented as a vector of length 2T_opt + 1. The first position of the vector represents the history of modifications of the DCT coefficients equal to −T_opt. The second position of this vector concerns the DCT coefficients equal to −T_opt + 1 and so on. The (2T_opt + 1)-th position of the vector concerns the DCT coefficients equal to T_opt. Initially, the positions of the modification history vector are set to the initial quantities of each coefficient available for hiding. If a coefficient with value a is changed to b, the hiding process will subtract 1 from the position of the history of modifications related to the coefficients with value a, because the deficit of elements with value a will increase. Simultaneously, the hiding process will add 1 to the position of the history of modifications related to the coefficients with value b, because there will be a new coefficient with value b. In order to help the compensation process, the hiding process shall change a to a − 1 if the deficit of elements with value a − 1 is smaller than the deficit of elements with value a + 1. If the values in the modification history vector related to the coefficients a − 1 and a + 1 are equal, the coefficient a is randomly changed either to a − 1 or to a + 1. The history of modifications becomes more meaningful as more bits from the hidden message are inserted into the DCT coefficients.

Compensation Method We propose a histogram compensation method that we claim is more intuitive than the one proposed in [128]. Its operation is similar to filling the bins of a histogram with a liquid, as if this liquid corresponds to probability flowing from the bins with excess of probability to the bins with deficit of probability. Given the target histogram B_Ĉ and the input histogram B_C, we should drain the bins of the input histogram until all its bins become equal to the bins of the target histogram. This is achieved by mapping the original data from the subset C, which generates the input histogram B_C, into a new data set Ĉ. The bins of B_C and B_Ĉ are analyzed in ascending order. The algorithm starts by analyzing the leftmost bins of both histograms. For instance, if the leftmost bin of the input histogram is shorter than the leftmost bin of the target histogram (B_C(−T_opt) < B_Ĉ(−T_opt)), then the leftmost bin of the input histogram needs to be increased. This is achieved by moving the necessary number of elements immediately greater than the bin under analysis to the bin presenting a deficit. In other words, we search for coefficients equal to −T_opt + 1 in the subset C in order to map them to −T_opt, implying B_C(−T_opt) = B_Ĉ(−T_opt). If there are not enough coefficients equal to −T_opt + 1 allowing for the compensation of the bin at −T_opt, then we search for coefficients equal to −T_opt + 2 in C in order to map them to −T_opt. This procedure continues until B_C(−T_opt) = B_Ĉ(−T_opt). Similarly, if B_C(i) > B_Ĉ(i), then elements in C equal to i are mapped to i + 1, so that we have B_C(i) = B_Ĉ(i). In summary, this process compensates the input histogram B_C bin-wise with respect to the target histogram B_Ĉ.
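A minimal sketch of this bin-filling procedure is given below. It assumes `C` is the list of compensation coefficients, `target_counts[idx]` the desired bin height for the value −t_opt + idx, and it leaves the rightmost bin untouched; all of these are assumptions of the sketch rather than details of the original method.

```python
import numpy as np

def histogram_compensate(C, target_counts, t_opt):
    # Process bins from -t_opt upwards: pull elements from the nearest bins to the
    # right to fill deficits, push surplus elements one bin to the right.
    comp = list(C)
    values = list(range(-t_opt, t_opt + 1))
    for idx, v in enumerate(values[:-1]):
        current = sum(1 for x in comp if x == v)
        deficit = target_counts[idx] - current
        if deficit > 0:
            for w in values[idx + 1:]:                  # nearest right neighbours first
                for k, x in enumerate(comp):
                    if deficit == 0:
                        break
                    if x == w:
                        comp[k] = v
                        deficit -= 1
                if deficit == 0:
                    break
        elif deficit < 0:
            surplus = -deficit
            for k, x in enumerate(comp):
                if surplus == 0:
                    break
                if x == v:
                    comp[k] = v + 1                     # excess "flows" into the next bin
                    surplus -= 1
    return np.array(comp)
```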

3.2 Proposal B

Proposal B presents a new hiding method, but keeps using the compensation method from Proposal A because of its simplicity and good results.

Hiding Method This hiding method also embeds the hidden data in the LSBs of the coefficients from the subset H. We divide the histogram B_H into two histograms. The first one contains the bins ranging from −T_opt to −1. The second one contains the bins ranging from 1 to T_opt. The hiding process is performed separately with regard to each new histogram. The hiding process over the coefficients equal to 0 is performed by the even/odd method. The main idea of this proposal consists in starting the hiding process with the least frequent coefficient of each new histogram. For instance, the least frequent coefficients in the first new histogram are those with value −T_opt. If these coefficients have to be changed, they can only be changed to −T_opt + 1. Subsequently, the hiding process should write over the coefficients from the subset H equal to −T_opt + 1, because they are the second least frequent coefficients of the first new histogram. The coefficients from H equal to −T_opt + 1 that need to be modified should first be mapped to −T_opt, in order to compensate for the deficit of coefficients equal to −T_opt caused by the previous iteration. The coefficients −T_opt + 1 that are not mapped to −T_opt should be strictly mapped to −T_opt + 2. This generates an imbalance at the bin −T_opt + 2 of the first new histogram, which should later be compensated by the hiding process over the coefficients equal to −T_opt + 3. The procedure continues successively and analogously for the second new histogram.

3.3 Proposal C

As will be seen in Section 4, the results from Proposal A were better than those from Proposal B. This motivated the creation of Proposal C, which combines the hiding method from Proposal A with the compensation method from [111].


4 Results

We aim at verifying which of the steganographic approaches is capable of hiding the largest amount of information within digital images while preserving their statistics and visual quality to some extent. Steganalysts based on supervised artificial neural networks¹ trained with the error backpropagation algorithm [53] were implemented to classify images as cover or stego. The steganographic capacity of an image was assessed by incrementally embedding bits of information within the image, increasing the optimal hiding fraction by 0.02 until the steganalysts could eventually classify the image as stego. Our database contained 1,200 TIFF [6] images with 256 gray levels and dimensions of 256×256, 512×512, and 1024×1024 pixels. Two neural networks were trained with DCT coefficient histograms from 400 randomly chosen, non-compensated stego images correspondingly generated by the hiding processes of the S & M approach and Proposal A, in addition to data from 400 cover images, using a hiding fraction of λ_opt. The 301-dimensional histograms were preprocessed using Principal Component Analysis (PCA) [62], yielding 3-dimensional vectors, yet keeping 99% of the information of the original data. The test sets were composed of 400 stego-compensated images not used in the training phase. The embedding parameters used in all tests were the same as in [111]. The JPEG compression quality factor was 75, yielding the quantization matrix used to normalize the DCT coefficients of the 8×8 blocks of the images. Both the hiding and compensation processes were performed in the frequency band comprising the first 19 AC coefficients following a zigzag scan of the DCT coefficients over each 8×8 block of each image. We assume that the largest value of T_opt is 30 during the data hiding process. This intermediary value ensures that the hiding process does not write over high-valued DCT coefficients, which is not desirable because these coefficients are rare and therefore more difficult to compensate. Further, to make the hidden data extraction process feasible, both the encoder and decoder should agree on a secret key. This key is the seed of a pseudo-random number generator known to both parties. The pseudo-random sequence tells the decoder which DCT coefficients contain the hidden message. Moreover, the first 8 bits of the hidden message are reserved to carry information about the threshold T_opt. The decoder needs to know whether a DCT coefficient carries a piece of information, which happens when its absolute value is less than or equal to T_opt. Another 20 bits are reserved to carry information about the length of the hidden message.

¹ Sarkar and Manjunath [111] used support vector machines instead.


4.1 Average Performances using the Optimal Hiding Fraction

First we compared the results of our implementations of the S & M approach and Proposals A, B and C to the results in [111], using an optimal hiding fraction λ_opt calculated as shown in Section 2. In this preliminary assessment we do not yet try to embed as much information as possible within the images. Table 1 presents the average values of the following parameters and figures of merit with respect to all 1,200 images from the database: optimal threshold (T_opt); optimal hiding fraction (λ_opt); maximum hidden data embedding rate (R_opt); PSNR in dB between non-compensated stego images and cover images (PSNR_H); standard deviation σ_PSNR_H; PSNR in dB between stego-compensated images and cover images (PSNR_HC); standard deviation σ_PSNR_HC; and the number of bits of information hidden per pixel (bits/pixel). The column S & M provides the results from our implementation of the method in [111]. The next column (S & M [111]) reproduces the results from [111], which did not include PSNR values. These two columns indicate that our implementation of [111] is correct. The small differences between the results from the second (S & M) and third (S & M [111]) columns are related to the fact that we used a different image database than [111]. The same values found for T_opt, λ_opt, R_opt and bits/pixel for all approaches, except for S & M [111], are due to the use of the same hiding fraction λ_opt.

Table 1: Average results over a database containing 1,200 images.

Parameter    S & M    S & M [111]   Prop. A   Prop. B   Prop. C
T_opt        28       27            28        28        28
λ_opt        0.4642   0.4834        0.4642    0.4642    0.4642
R_opt        0.472    0.502         0.472     0.472     0.472
PSNR_H       42.2     -             42.3      42.3      42.3
σ_PSNR_H     1.6      -             1.5       1.6       1.5
PSNR_HC      37.2     -             40.4      39.9      38.2
σ_PSNR_HC    1.0      -             1.1       1.1       0.7
bits/pixel   0.136    0.141         0.136     0.136     0.136

4.2 Steganographic Capacity

Tables 2 and 3 present the average results concerning the steganographic capacity of 400 natural digital images from the test set with respect to the neural networks specialized on the hiding processes from S & M and Proposal A (or C), respectively. Both the maximum hiding fraction λ_max and the maximum number of bits per pixel that could be imperceptibly hidden within the images were found experimentally.

The results from Tables 2 and 3 suggest that Proposal C is the best steganographic approach, as it is capable of hiding comparatively the largest amount of information, as shown by "bits/pixel", while preserving acceptable visual quality, as shown by PSNR_H and PSNR_HC. Counter-intuitively, the stego-compensated images generated by Proposal C hid more information from the steganalyst specialized in its own hiding process than from the steganalyst specialized in the S & M approach. The reason behind this is related to the training process of the neural networks: the training set generated by Proposal C seemed to lead to a harder training in terms of the recognition of particular features related to the hiding method of Proposal C itself.

Table 2: Average results computed over the test set using the neural network specialized on the hiding process from [111].

Parameter      S & M    Proposal A   Proposal B   Proposal C
λ_max          0.5477   0.5429       0.5251       0.5759
R_opt          0.557    0.552        0.534        0.586
PSNR_H         41.4     41.6         41.7         41.4
σ_PSNR (H)     1.2      1.3          1.3          1.1
PSNR_HC        36.8     37.2         37.2         37.9
σ_PSNR (H-C)   1.0      1.4          1.4          0.7
bits/pixel     0.161    0.159        0.154        0.169

Table 3: Average results computed over the test set using the neural network specialized on the hiding process from Proposal A (or C).

Parameter      S & M    Proposal A   Proposal B   Proposal C
λ_max          0.5716   0.5578       0.5430       0.6181
R_opt          0.582    0.568        0.553        0.629
PSNR_H         41.3     41.5         41.6         41.1
σ_PSNR (H)     1.2      1.4          1.4          1.2
PSNR_HC        36.8     37.0         36.9         37.9
σ_PSNR (H-C)   1.0      1.2          1.3          0.8
bits/pixel     0.168    0.164        0.160        0.182


5 Conclusion

We proposed three steganographic approaches based on [111] aiming at pushing forward the steganographic capacity of natural digital images. All approaches consisted of a hiding process, to embed the hidden message, and a statistical compensation process, to restore the degradation in the statistics and visual quality caused by the hiding process. We implemented two steganalysts based on artificial neural networks. The combination of one of our compensation-aiding hiding processes with the compensation process used by Sarkar and Manjunath in [111] led to an 8.3% increase in the amount of information that can be statistically hidden within digital images without producing severe visual distortion. Future works may investigate higher-order statistics, the application of more elaborate steganalysts and a comparison of our proposals to modern steganographic methods.

Acknowledgment

This work was supported by the CNPq (Brazilian Research Support Agency). In addition, this work was supported in part by the Research Council K.U.Leuven: GOA TENSE (GOA/11/007), by the IAP Programme P6/26 BCRYPT of the Belgian State (Belgian Science Policy) and by the European Commission through the ICT programme under contract ICT-2007-216676 ECRYPT II.

References

Please refer to the Bibliography at the end of the dissertation.

Bibliography

[1] Google scholar. http://scholar.google.com. Accessed: 2013-Mar-25.

[2] Internet world stats. http://www.internetworldstats.com/stats.htm. Accessed: 2013-Apr-01.

[3] Purdue University, cyber forensics. http://www.cyberforensics.purdue.edu/default.aspx. Accessed: 2013-Mar-01.

[4] Ross Anderson. Website on economics and psychology of information security. http://www.cl.cam.ac.uk/~rja14/#Econ. Accessed: 2013-Mar-15.

[5] Spartan-3E starter kit board. http://www.xilinx.com/products/devkits/HW-SPAR3E-SK-US-G.htm.

[6] TIFF revision 6.0. http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf, 1992.

[7] Introduction to differential power analysis and related attacks. http://www.cryptography.com/dpa/technical, 1998.

[8] Semantic attacks: The third wave of network attacks. http://www.schneier.com/crypto-gram-0010.html#1, 2000.

[9] National Institute of Standards and Technology. Announcing request for candidate algorithm nominations for a new cryptographic hash algorithm (SHA-3) family. http://csrc.nist.gov/groups/ST/hash/documents/FR_Notice_Nov07.pdf, 2007.

[10] UNIQUE project. Foundations for forgery-resistant security hardware. EU framework programme 7. https://www.unique-project.eu, 2009–2012.

[11] Wikipedia. Article on physically unclonable functions. http://en.wikipedia.org/wiki/Physical_unclonable_function, 2013.



[12] Dakshi Agrawal, Josyula R. Rao, Pankaj Rohatgi, and Kai Schramm. Templates as master keys. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, volume 3659 of Lecture Notes in Computer Science, pages 15–29. Springer-Verlag, 2005.

[13] M. A. Aizerman, E. A. Braverman, and L. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. In Automation and Remote Control, number 25, pages 821–837, 1964.

[14] Jason H. Anderson. A PUF design for secure FPGA-based embedded systems. In Asia and South Pacific Design Automation Conference – ASPDAC, pages 1–6. IEEE Press, 2010.

[15] Ross Anderson and Fabien Petitcolas. On the limits of steganography. IEEE Journal of Selected Areas in Communications, 16:474–481, 1998.

[16] Ross J. Anderson. Stretching the limits of steganography. In International Workshop on Information Hiding, Lecture Notes in Computer Science, pages 39–48. Springer-Verlag, 1996.

[17] Dmitri Asonov and Rakesh Agrawal. Keyboard acoustic emanations. In IEEE Symposium on Security and Privacy, pages 3–11, 2004.

[18] Michael Backes, Markus Dürmuth, Sebastian Gerling, Manfred Pinkal, and Caroline Sporleder. Acoustic side-channel attacks on printers. In USENIX Security Symposium, pages 307–322, 2010.

[19] Hagai Bar-El, Hamid Choukri, David Naccache, Michael Tunstall, and Claire Whelan. The sorcerer's apprentice guide to fault attacks. Proceedings of the IEEE, 94:370–382, 2006.

[20] Timo Bartkewitz and Kerstin Lemke-Rust. Efficient template attacks based on probabilistic multi-class support vector machines. In Smart Card Research and Advanced Application Conference – CARDIS, pages 263–276, 2012.

[21] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche. The Keccak SHA-3 submission. Submission to NIST (Round 3), 2011.

[22] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, USA, 1995.

[23] Andrey Bogdanov, Lars R. Knudsen, Gregor Leander, Christof Paar, Axel Poschmann, Matthew J. B. Robshaw, Yannick Seurin, and C. Vikkelsoe. PRESENT: An ultra-lightweight block cipher. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, volume 4727 of Lecture Notes in Computer Science, pages 450–466, 2007.

[24] Dan Boneh, Richard A. DeMillo, and Richard J. Lipton. On the importance of checking cryptographic protocols for faults. In International Conference on Theory and Application of Cryptographic Techniques – EUROCRYPT, pages 37–51, 1997.

[25] Christoph Bösch, Jorge Guajardo, Ahmad-Reza Sadeghi, Jamshid Shokrollahi, and Pim Tuyls. Efficient helper data key extractor on FPGAs. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, volume 5154 of Lecture Notes in Computer Science. Springer-Verlag, 2008.

[26] K. De Brabanter, P. Karsmakers, F. Ojeda, C. Alzate, J. De Brabanter, K. Pelckmans, B. De Moor, J. Vandewalle, and J.A.K. Suykens. LS-SVMlab toolbox user's guide version 1.7. http://www.esat.kuleuven.be/sista/lssvmlab/, 2010.

[27] Eric Brier, Christophe Clavier, and Francis Olivier. Correlation power analysis with a leakage model. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, volume 3156 of Lecture Notes in Computer Science, pages 16–29, 2004.

[28] David Brumley and Dan Boneh. Remote timing attacks are practical. In Proceedings of the 12th USENIX Security Symposium, pages 1–14, 2003.

[29] Christian Cachin. An information-theoretic model for steganography. In Workshop on Information Hiding, volume 1525 of Lecture Notes in Computer Science, pages 306–318. Springer-Verlag, 1998.

[30] S. Chari, J. R. Rao, and P. Rohatgi. Template attacks. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, volume 2523 of Lecture Notes in Computer Science, pages 172–186. Springer-Verlag, 2002.

[31] Christophe Clavier. Side channel analysis for reverse engineering (SCARE) - an improved attack against a secret A3/A8 GSM algorithm. IACR Cryptology ePrint Archive, 2004:49, 2004.

[32] Christophe Clavier. An improved SCARE cryptanalysis against a secret A3/A8 GSM algorithm. In Information Systems Security – ICISS, volume 4812 of Lecture Notes in Computer Science, pages 143–155, 2007.

[33] Jean-Sébastien Coron and Louis Goubin. On Boolean and arithmetic masking against differential power analysis. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, volume 1965 of Lecture Notes in Computer Science, pages 231–237. Springer-Verlag, 2000.


[34] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.

[35] Ingemar J. Cox, Ton Kalker, Georg Pakura, and Mathias Scheel. Information transmission and steganography. In International Workshop on Digital Watermarking – IWDW, Lecture Notes in Computer Science, pages 15–29, 2005.

[36] James W. Crouch, Hiren J. Patel, Yong C. Kim, and Robert W. Bennington. Creating unique identifiers on field programmable gate arrays using natural processing variations. In International Conference on Field Programmable Logic and Applications – FPL, pages 579–582, 2008.

[37] Joan Daemen and Vincent Rijmen. The Design of Rijndael. Springer-Verlag, 2002.

[38] Yevgeniy Dodis, Rafail Ostrovsky, Leonid Reyzin, and Adam Smith. Fuzzy extractors: How to generate strong keys from biometrics and other noisy data. SIAM Journal on Computing, 38:97–139, 2008.

[39] Stefan Dziembowski and Krzysztof Pietrzak. Leakage-resilient cryptography. In Symposium on Foundations of Computer Science – FOCS, pages 293–302. IEEE, 2008.

[40] R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(7):179–188, 1936.

[41] A. Freud. The Ego and the Mechanisms of Defense. Hogarth Press and Institute of Psycho-Analysis, 1937.

[42] Jessica J. Fridrich, Miroslav Goljan, Dorin Hogea, and David Soukal. Quantitative steganalysis of digital images: estimating the secret message length. Journal on Multimedia Systems, 9(3):288–302, 2003.

[43] H. Fujiwara, M. Yabuuchi, H. Nakano, H. Kawai, K. Nii, and K. Arimoto. A chip-ID generating circuit for dependable LSI using random address errors on embedded SRAM and on-chip memory BIST. In Proceedings of the IEEE VLSI Circuits Symposium, pages 76–77, 2011.

[44] Karine Gandolfi, Christophe Mourtel, and Francis Olivier. Electromagnetic analysis: Concrete results. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, volume 2162 of Lecture Notes in Computer Science, pages 251–261. Springer-Verlag, 2001.


[45] B. Gassend, D. Clarke, D. Lim, M. van Dijk, and S.Devadas.Identification and authentication of integrated circuits. In Concurrencyand Computation: Practice and Experiences, pages 1077–1098, 2004.

[46] Blaise Gassend, Dwaine Clarke, Marten van Dijk, and Srinivas Devadas.Silicon physical random functions. In ACM Conference on Computer andCommunication Security – CCS, pages 148–160, 2002.

[47] T. Van Gestel, J.A.K. Suykens, B. Baesens, S. Viaene, J. Vanthienen,G. Dedene, B. De Moor, and J. Vandewalle. Benchmarking least squaressupport vector machine classifiers. Machine Learning, 54:5–32, 2004.

[48] B. Gierlichs, K. Lemke-Rust, and C. Paar. Templates vs. stochasticmethods. In Worshop on Cryptographic Hardware and Embedded Systems– CHES, volume 4249 of Lecture Notes in Computer Science, pages 15–29.Springer-Verlag, 2006.

[49] Benedikt Gierlichs. Statistical and Information-Theoretic Methods forPower Analysis on Embedded Cryptography. PhD thesis, KU Leuven,2011.

[50] Benedikt Gierlichs, Lejla Batina, Pim Tuyls, and Bart Preneel. MutualInformation Analysis - A Generic Side-Channel Distinguisher. In ElisabethOswald and Pankaj Rohatgi, editors, Worshop on Cryptographic Hardwareand Embedded Systems – CHES, volume 5154 of Lecture Notes in ComputerScience, pages 426–442. Springer-Verlag, 2008.

[51] Christophe Giraud and Hugues Thiebeauld. A survey on fault attacks. In Smart Card Research and Advanced Application Conference – CARDIS, pages 159–176, 2004.

[52] Jorge Guajardo, Sandeep S. Kumar, Geert-Jan Schrijen, and Pim Tuyls. FPGA intrinsic PUFs and their use for IP protection. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, Lecture Notes in Computer Science, pages 63–80. Springer-Verlag, 2007.

[53] Simon Haykin. Neural Networks: A Comprehensive Foundation. Macmillan College Publishing Company: Englewood Cliffs, 1998.

[54] Annelie Heuser, Michael Kasper, Werner Schindler, and Marc Stöttinger. A new difference method for side-channel analysis with high-dimensional leakage models. In CT-RSA, pages 365–382, 2012.

[55] Annelie Heuser and Michael Zohner. Intelligent machine homicide: Breaking cryptographic devices using support vector machines. In Workshop on Constructive Side-Channel Analysis and Secure Design (COSADE), pages 249–264, 2012.

[56] D. E. Holcomb, W. P. Burleson, and K. Fu. Initial SRAM state as a fingerprint and source of true random numbers for RFID tags. In Proceedings of the Conference on RFID Security, 2007.

[57] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991.

[58] G. Hospodar, E. De Mulder, B. Gierlichs, J. Vandewalle, and I. Verbauwhede. Least squares support vector machines for side-channel analysis. Workshop on Constructive Side-Channel Analysis and Secure Design (COSADE), 2011.

[59] G. Hospodar, E. De Mulder, B. Gierlichs, J. Vandewalle, and I. Verbauwhede. Machine learning in side-channel analysis: A first study. Journal of Cryptographic Engineering, 1(4):293–302, 2011.

[60] G. Hospodar, I. Verbauwhede, and J. Gomes. Algorithms for digital image steganography via statistical restoration. 2nd Joint WIC/IEEE Symposium on Information Theory and Signal Processing in the Benelux (WIC), pages 5–12, 2012.

[61] Gabriel Hospodar, Roel Maes, and Ingrid Verbauwhede. Machine learning attacks on 65nm Arbiter PUFs: Accurate modeling poses strict bounds on usability. 4th IEEE International Workshop on Information Forensics and Security (WIFS 2012), pages 37–42, 2012.

[62] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, 1986.

[63] D. Kahn. The Codebreakers. Macmillan, 1979.

[64] Dusko Karaklajic. Securing Cryptographic Hardware against Fault Attacks. PhD thesis, KU Leuven, 2012.

[65] M. J. Kearns. The computational complexity of machine learning. PhD thesis, Harvard University, 1989.

[66] A. Kerckhoffs. La cryptographie militaire. Journal des Sciences Militaires, IX, pages 161–191, 1883.

[67] P. C. Kocher. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In Crypto 96 – Advances in Cryptology, volume 1109 of Lecture Notes in Computer Science, pages 104–113. Springer-Verlag, 1996.

[68] P. C. Kocher, J. Jaffe, and B. Jun. Differential power analysis. In Crypto 99 – Advances in Cryptology, volume 1666 of Lecture Notes in Computer Science, pages 388–397. Springer-Verlag, 1999.

[69] François Koeune and Jean-Jacques Quisquater. A timing attack against Rijndael. Technical report, UCL Crypto Group, 1999.

[70] Aswin Raghav Krishna, Seetharam Narasimhan, Xinmu Wang, and Swarup Bhunia. MECCA: A robust low-overhead PUF using embedded memory array. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, Lecture Notes in Computer Science, pages 407–420, 2011.

[71] S. Kumar, J. Guajardo, R. Maes, G. J. Schrijen, and P. Tuyls. The Butterfly PUF: Protecting IP on every FPGA. In IEEE International Symposium on Hardware-Oriented Security and Trust – HOST, pages 67–70, 2008.

[72] L. Lerman, S. Fernandes, C. Meuter, G. Bontempi, and O. Markowitch. Semi-supervised template attack. Workshop on Constructive Side-Channel Analysis and Secure Design (COSADE), 2013.

[73] L. Lerman, G. Bontempi, and O. Markowitch. Side channel attack: an approach based on machine learning. Workshop on Constructive Side-Channel Analysis and Secure Design (COSADE), 2011.

[74] Jae W. Lee, D. Lim, B. Gassend, G. E. Suh, M. Van Dijk, and S. Devadas. A technique to build a secret key in integrated circuits with identification and authentication applications. In Proceedings of the IEEE VLSI Circuits Symposium, pages 176–179, 2004.

[75] Jae W. Lee, Daihyun Lim, Blaise Gassend, G. Edward Suh, Marten van Dijk, and Srinivas Devadas. A technique to build a secret key in integrated circuits for identification and authentication applications. In Proceedings of the IEEE VLSI Circuits Symposium, pages 176–179, 2004.

[76] Arjen K. Lenstra, James P. Hughes, Maxime Augier, Joppe W. Bos, Thorsten Kleinjung, and Christophe Wachter. Ron was wrong, Whit is right. IACR Cryptology ePrint Archive, 2012:64, 2012.

[77] Keith Lofstrom, W. Robert Daasch, and Donald Taylor. IC identification circuit using device mismatch. In IEEE International Solid-State Circuits Conference – ISSCC, pages 372–373, 2000.

[78] R. Maes, P. Tuyls, and I. Verbauwhede. Low-overhead implementation of a soft decision helper data algorithm for SRAM PUFs. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, Lecture Notes in Computer Science, pages 332–347, 2009.

[79] R. Maes and I. Verbauwhede. Physically unclonable functions: A study on the state of the art and future research directions. In Towards Hardware-Intrinsic Security: Foundation and Practice, Information Security and Cryptography, pages 3–37. Springer-Verlag, 2010.

[80] Roel Maes. Physically Unclonable Functions: Constructions, Properties and Applications. PhD thesis, KU Leuven, 2012.

[81] Roel Maes, Pim Tuyls, and Ingrid Verbauwhede. Intrinsic PUFs from flip-flops on reconfigurable devices. In 3rd Benelux Workshop on Information and System Security (WISSec), page 17, 2008.

[82] Roel Maes, Anthony Van Herrewege, and Ingrid Verbauwhede. PUFKY: a fully functional PUF-based cryptographic key generator. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, Lecture Notes in Computer Science, pages 302–319. Springer-Verlag, 2012.

[83] Roel Maes and Ingrid Verbauwhede. Physically Unclonable Functions: a Study on the State of the Art and Future Research Directions. In David Naccache and Ahmad-Reza Sadeghi, editors, Towards Hardware-Intrinsic Security, pages 3–37. Springer, 2010.

[84] Abhranil Maiti, Vikash Gunreddy, and Patrick Schaumont. A systematic method to evaluate and compare the performance of physical unclonable functions. Cryptology ePrint Archive, Report 2011/657, 2011.

[85] Mehrdad Majzoobi, Farinaz Koushanfar, and Miodrag Potkonjak. Lightweight secure PUFs. In International Conference on Computer-Aided Design – ICCAD, pages 670–673. IEEE, 2008.

[86] Stefan Mangard, Elisabeth Oswald, and Thomas Popp. Power Analysis Attacks: Revealing the Secrets of Smart Cards. Springer-Verlag, 2007.

[87] U.M. Maurer. Secret key agreement by public discussion from common information. IEEE Transactions on Information Theory, 39(3):733–742, 1993.

[88] S.W. Menard. Applied logistic regression analysis. Quantitative Applications in the Social Sciences. Sage Publications, 1995.

[89] Alfred J. Menezes, Scott A. Vanstone, and Paul C. Van Oorschot. Handbook of Applied Cryptography. CRC Press, Inc., 1st edition, 1996.

[90] Thomas S. Messerges, Ezzat A. Dabbish, and Robert H. Sloan. Examining smart-card security under the threat of power analysis attacks. IEEE Transactions on Computers, 51:541–552, 2002.

[91] T. M. Mitchell. Machine Learning. McGraw-Hill, 1997.

[92] Sumio Morioka and Akashi Satoh. An optimized S-box circuit architecture for low power AES design. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, Lecture Notes in Computer Science, pages 172–186, 2002.

[93] Elke De Mulder. Electromagnetic Techniques and Probes for Side-Channel Analysis on Cryptographic Devices. PhD thesis, KU Leuven, 2010.

[94] Yassin Musharbash. Die Zeit: In ihren eigenen Worten. http://www.zeit.de/2012/12/Al-Kaida-Deutschland/seite-1, 2012. Accessed: 2013-Feb-10.

[95] AIST (Agency of Industrial Science and Technology) and Tohoku University, Japan. HDL code used for an AES S-Box implementation. http://www.aoki.ecei.tohoku.ac.jp/crypto/items/AES_Comp.v.

[96] Elisabeth Oswald and Stefan Mangard. Template attacks on masking – resistance is futile. In CT-RSA, pages 243–256, 2007.

[97] E. Ozturk, G. Hammouri, and B. Sunar. Physical unclonable function with tristate buffers. pages 3194–3197, 2008.

[98] Ravikanth S. Pappu. Physical one-way functions. PhD thesis, MIT, 2001.

[99] Ravikanth S. Pappu, Ben Recht, Jason Taylor, and Neil Gershenfeld. Physical one-way functions. Science, 297:2026–2030, 2002.

[100] Hiren J. Patel, James W. Crouch, Yong C. Kim, and Tony C. Kim. Creating a unique digital fingerprint using existing combinational logic. In International Symposium on Circuits and Systems – ISCAS, pages 2693–2696. IEEE, 2009.

[101] Ling Pei, Jingbin Liu, Robert Guinness, Yuwei Chen, Heidi Kuusniemi, and Ruizhi Chen. Using LS-SVM based motion recognition for smartphone indoor wireless positioning. Sensors, 12(5):6155–6175, 2012.

[102] D. Puntin, S. Stanzione, and G. Iannaccone. CMOS unclonable system for secure authentication based on device variability. In European Solid-State Circuits Conference – ESSCIRC, pages 130–133, 2008.

[103] J.-J. Quisquater and D. Samyde. Electromagnetic analysis (EMA): Measures and counter-measures for smart cards. In Smart Card Programming and Security, Lecture Notes in Computer Science, pages 200–210, 2001.

[104] C. Rechberger and E. Oswald. Practical template attacks. In International Workshop on Information Security Applications – WISA, volume 3325, pages 440–456. Springer-Verlag, 2004.

[105] M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: The RProp algorithm. In IEEE International Conference on Neural Networks, pages 586–591, 1993.

[106] Ronald L. Rivest. Cryptography and machine learning. In International Conference on the Theory and Applications of Cryptology – ASIACRYPT, volume 739 of Lecture Notes in Computer Science, pages 427–439. Springer-Verlag, 1991.

[107] F. Rosenblatt. The perceptron – a perceiving and recognizing automaton. Cornell Aeronautical Laboratory Report, 1957.

[108] U. Rührmair, C. Jaeger, M. Bator, M. Stutzmann, P. Lugli, and G. Csaba. Applications of high-capacity crossbar memories in cryptography. IEEE Transactions on Nanotechnology, 10(3):489–498, 2011.

[109] Ulrich Rührmair, Christian Jaeger, Christian Hilgers, Michael Algasinger, György Csaba, and Martin Stutzmann. Security applications of diodes with unique current-voltage characteristics. In International Conference on Financial Cryptography and Data Security, pages 328–335. Springer-Verlag, 2010.

[110] Ulrich Rührmair, Frank Sehnke, Jan Sölter, Gideon Dror, Srinivas Devadas, and Jürgen Schmidhuber. Modeling attacks on physical unclonable functions. In ACM Conference on Computer and Communications Security – CCS, pages 237–249, 2010.

[111] A. Sarkar and B. S. Manjunath. Estimating steganographic capacity for odd-even based embedding and its use in individual compensation. In IEEE International Conference on Image Processing (ICIP), 2007.

[112] T. Sasao. AND-EXOR expressions and their optimization. In Logic Synthesis and Optimization, Kluwer Academic Publishers, pages 287–312, 1993.

[113] Werner Schindler, Kerstin Lemke, and Christof Paar. A stochastic model for differential side channel cryptanalysis. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, Lecture Notes in Computer Science, pages 30–46. Springer-Verlag, 2005.

[114] Claude E. Shannon. Communication theory of secrecy systems. Bell System Technical Journal, 28(4):656–715, 1949.

[115] Koichi Shimizu, Daisuke Suzuki, and Tomomi Kasuya. Glitch PUF: Extracting information from usually unwanted glitches. IEICE Transactions, 95-A(1):223–233, 2012.

[116] Gustavus J. Simmons. The prisoners’ problem and the subliminal channel. In Advances in Cryptology: Proceedings of CRYPTO ’83, pages 51–67. Plenum, 1983.

[117] K. Solanki, K. Sullivan, U. Madhow, B. S. Manjunath, and S. Chandrasekaran. Provably secure steganography: Achieving zero K-L divergence using statistical restoration. In IEEE International Conference on Image Processing (ICIP), 2006.

[118] Kaushal Solanki, Kenneth Sullivan, Upamanyu Madhow, B. S. Manjunath, and Shivkumar Chandrasekaran. Statistical restoration for robust and secure steganography. In IEEE International Conference on Image Processing (ICIP), volume 2, pages 1118–1121, 2005.

[119] Y. Su, J. Holleman, and B. Otis. A 1.6pJ/bit 96% stable chip-ID generating circuit using process variations. In IEEE International Solid-State Circuits Conference – ISSCC, pages 406–611, 2007.

[120] Ying Su, J. Holleman, and B.P. Otis. A digital 1.6 pJ/bit chip identification circuit using process variations. IEEE Journal of Solid-State Circuits, 43(1):69–77, 2008.

[121] G. Edward Suh and Srinivas Devadas. Physical unclonable functions for device authentication and secret key generation. In Design Automation Conference – DAC, pages 9–14. IEEE, 2007.

[122] J. A. K. Suykens and J. Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9(3):293–300, 1999.

[123] J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle. Least Squares Support Vector Machines. World Scientific, 2002.

[124] Daisuke Suzuki and Koichi Shimizu. The Glitch PUF: a new delay-PUF architecture exploiting glitch shapes. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, volume 6225 of Lecture Notes in Computer Science, pages 366–382. Springer-Verlag, 2010.

[125] J.A. Swets. Signal detection theory and ROC analysis in psychology and diagnostics: collected papers. Scientific Psychology Series. Lawrence Erlbaum Associates, 1996.

[126] J. Trithemius. Steganographia. 1621.

[127] P. Tuyls, B. Škorić, and T.A.M. Kevenaar. Security with Noisy Data: On Private Biometrics, Secure Key Storage and Anti-counterfeiting. Springer-Verlag, 2007.

[128] R. Tzschoppe, R. Bauml, and J.J. Eggers. Histogram modifications with minimum MSE distortion. Technical Report, Telecommunication Laboratory, University of Erlangen-Nuremberg, 2001.

[129] G.E. Vaillant. Adaptation to Life. Harvard University Press, 1977.

[130] David Wolpert and William G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.

[131] Dai Yamamoto, Gabriel Hospodar, Roel Maes, and Ingrid Verbauwhede. Performance and security evaluation of AES S-box-based glitch PUFs on FPGAs. International Conference on Security, Privacy and Applied Cryptography Engineering (SPACE 2012), pages 45–62, 2012.

[132] Meng-Day Mandel Yu, David M’Raihi, Richard Sowell, and Srinivas Devadas. Lightweight and secure PUF key storage using limits of machine learning. In Workshop on Cryptographic Hardware and Embedded Systems – CHES, Lecture Notes in Computer Science, pages 358–373. Springer-Verlag, 2011.

[133] Li Zhuang, Feng Zhou, and J. D. Tygar. Keyboard acoustic emanations revisited. In ACM Conference on Computer and Communications Security – CCS, pages 373–382, 2005.

[134] Michael Zohner, Michael Kasper, Marc Stöttinger, and Sorin A. Huss. Side channel analysis of the SHA-3 finalists. In Design, Automation and Test in Europe Conference – DATE, pages 1012–1017, 2012.

Curriculum Vitae

Gabriel Mayrink da Rocha Hospodar
Rio de Janeiro, RJ, Brazil, 1985

. . .

PhD in Electrical Engineering, 2013
COSIC, KU Leuven, Belgium
Awarded a competitive research grant from COSIC, KU Leuven
Researched several aspects of electronic security

MSc in Electrical Engineering, 2009
COPPE/UFRJ – Federal University of Rio de Janeiro, Brazil
Ranked first in the 2008 admission process for the Electrical Engineering Program
Graduated early (in 1 of the 2 years) with grade A (on an A-D scale)
Awarded a full scholarship from CNPq (Brazilian R&D Agency)

BSc Cum Laude in Electrical Engineering, 2007
UFRJ – Federal University of Rio de Janeiro, Brazil
Elected class leader and speaker at the Engineering graduation ceremony
Graduated early (in 4.5 of the 5 years) with honors (83% grade average)

Updated information and contact details can be found online.

List of Publications

International, Peer-Reviewed Journals

1. Gabriel Hospodar, Benedikt Gierlichs, Elke De Mulder, Ingrid Verbauwhede, and Joos Vandewalle. Machine Learning in Side-Channel Analysis: a First Study. In Journal of Cryptographic Engineering (JCEN), Springer-Verlag, Volume 1, Issue 4, Pages 293-302, 2011.

International, Peer-Reviewed Conferences

1. Gabriel Hospodar, Roel Maes, and Ingrid Verbauwhede. Machine Learning Attacks on 65nm Arbiter PUFs: Accurate Modeling poses strict Bounds on Usability. In 4th IEEE International Workshop on Information Forensics and Security (WIFS 2012), IEEE, Pages 37-42, 2012.

2. Dai Yamamoto, Gabriel Hospodar, Roel Maes, and Ingrid Verbauwhede. Performance and Security Evaluation of AES S-Box-based Glitch PUFs on FPGAs. In International Conference on Security, Privacy and Applied Cryptography Engineering (SPACE 2012), Lecture Notes in Computer Science, Springer-Verlag, Pages 45-62, 2012.

3. Gabriel Hospodar, Ingrid Verbauwhede, and José Gabriel R. C. Gomes. Algorithms for Digital Image Steganography via Statistical Restoration. In 2nd Joint WIC/IEEE Symposium on Information Theory and Signal Processing in the Benelux (WIC 2012), Pages 5-12, 2012.

FACULTY OF ENGINEERING SCIENCE
DEPARTMENT OF ELECTRICAL ENGINEERING (ESAT)

COMPUTER SECURITY AND INDUSTRIAL CRYPTOGRAPHY (COSIC)
Kasteelpark Arenberg 10 box 2446

B-3001 Heverlee, Belgium
[email protected]

www.esat.kuleuven.be/scd