New Methods for Symmetric Cryptography - Lirias - KU Leuven

ARENBERG DOCTORAL SCHOOL Faculty of Engineering Science New Methods for Symmetric Cryptography Chaoyun Li Dissertation presented in partial fulfillment of the requirements for the degree of Doctor of Engineering Science (PhD): Electrical Engineering February 2020 Supervisor: Prof. dr. ir. Bart Preneel



Examination committee:
Prof. dr. ir. Hugo Hens, chair
Prof. dr. ir. Bart Preneel, supervisor
Prof. dr. ir. Vincent Rijmen
Prof. dr. Bruno Crispo
Prof. dr. ir. Frank Piessens
Prof. dr. Willi Meier (University of Applied Sciences and Arts Northwestern Switzerland)
Prof. dr. Gregor Leander (Ruhr-University Bochum)


© 2020 KU Leuven – Faculty of Engineering Science
Uitgegeven in eigen beheer, Chaoyun Li, Kasteelpark Arenberg 10, bus 2452, B-3001 Leuven (Belgium)

Alle rechten voorbehouden. Niets uit deze uitgave mag worden vermenigvuldigd en/of openbaar gemaakt worden door middel van druk, fotokopie, microfilm, elektronisch of op welke andere wijze ook zonder voorafgaande schriftelijke toestemming van de uitgever.

All rights reserved. No part of the publication may be reproduced in any form by print, photoprint, microfilm, electronic or any other means without written permission from the publisher.

Preface

Doing a PhD in cryptography is an unforgettable experience. Often I find myself to be the most ignorant person in the room. This offers me great opportunities to learn the vast field bit by bit. This thesis is a written report of some results I have obtained during this adventure. The thesis would not have been possible without the help of many people, some of whom I would like to thank below.

First and foremost, I want to thank my supervisor Prof. Bart Preneel for providing me the opportunity to do research at COSIC and allowing me to freely explore my ideas. Thank you for supporting me, guiding me and making COSIC a wonderful research group.

I am grateful to the jury members, Prof. Vincent Rijmen, Prof. Bruno Crispo, Prof. Gregor Leander and Prof. Frank Piessens, for reviewing the thesis and providing valuable feedback, which has improved this thesis. My thanks go to Prof. Willi Meier for your close collaboration and for many inspiring discussions. I would also like to thank Prof. Hugo Hens for chairing this jury.

I would like to thank my Master thesis supervisor Prof. Xiangyong Zeng for teaching me the real work ethic and encouraging me to do a PhD abroad. I am deeply grateful for your support and assistance over the years.

This thesis would not have been possible without the many incredible co-authors, and I am thankful to all of you. I want to express my gratitude to Prof. Tor Helleseth for believing in me and helping me through the years. Thank you for inviting me to visit the beautiful Bergen for one month. A special thanks goes to Chunlei. I really appreciate your useful advice and guidance. Thank you for taking care of everything during my visit to Bergen.

Thanks to Sébastien, Marie and the rest of the people in the Applied Cryptography Group at Orange Labs who ensured a fruitful and pleasant experience during my internship in Caen.

I would like to thank all my former and current colleagues for making my stay at COSIC so enjoyable. A special thanks goes to Qingju for your help through the early days in Leuven and for close collaborations during my PhD. A big thanks goes to Angshuman, Kent and Rafael, for your friendship and for many delightful conversations. Thanks to Tim and Charlotte for translating the abstract into Dutch. Thanks to Ren, Bohan, Yunwen, Wei, Hua, Wenying, Rongmao, Junwei, Aysajan and Jun for your friendship and hospitality. Thanks to my nice office mates. I would also like to thank Tomer, Yu Long, Atul, Elena, Begül, Dušan, Danilo, Simon, Daniele and Sujoy for many inspiring and interesting discussions.

I would especially like to thank our secretary Péla, European projects coordinator Saartje and accountants Elsy and Wim. Without your support, it would be impossible to get anything done at all.

Thanks to my friends Zhiyao, Jianmin, Tannie, Raf, Satabdi and Zhe. Your warmth and care during these years made me feel at home.

This thesis would not have been possible without financial support. I want to thank the European Union for funding this PhD through its Horizon 2020 research and innovation programme Marie Skłodowska Curie ITN ECRYPT-NET (Project Reference 643161).

I would like to thank my parents and my brother for your unconditional love and support through all these years. I am especially grateful to my wife Hongli, for your endless love and patience. Thank you for making me a better person.

Chaoyun Li 李超云
Heverlee, January 2020

Abstract

Symmetric cryptography plays a key role in providing security and privacy for digital systems. This thesis covers some aspects of the design, analysis and applications of symmetric cryptographic primitives.

Despite the worldwide adoption of cryptographic standards, the rise of the Internet of Things creates a need for new cryptographic primitives tailored for resource-constrained environments. We focus on the design and implementation of lightweight linear layers for symmetric ciphers. We propose new constructions of lightweight MDS (Maximum Distance Separable) and near-MDS matrices. Further, we develop new tools for optimizing the implementation of linear layers with respect to both circuit area and depth.

We present new cryptanalytic methods for two classes of emerging primitives: lightweight authenticated encryption schemes and ciphers dedicated to advanced protocols such as Multi-Party Computation and Fully Homomorphic Encryption. For the sake of implementation efficiency, many of the target ciphers utilize building blocks of low algebraic degree. To leverage this structural property, we enhance cube attacks with new degree evaluation and term enumeration techniques and revisit linear cryptanalysis by providing novel correlation computation methods. Moreover, automated tools are developed to search for distinguishers in the attacks. We also improve interpolation attacks in terms of memory complexity. The proposed methods enable us to present the current best attacks on some ciphers, including the first attack on full versions of Morus.

Encryption algorithms have proven to be crucial in addressing the security and privacy issues of data transformation on the Internet. However, they also prevent the work of intrusion detection based on deep packet inspection. We employ symmetric cryptographic primitives to design a privacy-friendly and market-compliant intrusion detection system over encrypted traffic. Our experiments show that our protocol is approaching feasibility in real-world applications.

Beknopte Samenvatting

Symmetrische cryptografie speelt een sleutelrol bij de veiligheid en privacy van digitale systemen. Deze thesis behandelt enkele aspecten van het ontwerp, de analyse en de toepassing van symmetrische-sleutel primitieven.

Ondanks het wereldwijd gebruik van cryptografische standaarden, zorgt de opkomst van het Internet der Dingen voor een nood aan nieuwe cryptografische primitieven die geschikt zijn voor situaties met beperkte middelen. Wij richten ons voornamelijk op het ontwerp en de implementatie van lichtgewicht lineaire lagen voor symmetrische cijfers. We stellen nieuwe constructies van MDS (maximum distance separable) en bijna-MDS matrices voor. Verder ontwikkelen we nieuwe tools om de implementatie van lineaire lagen te optimaliseren naar zowel oppervlakte als -diepte.

We stellen nieuwe cryptanalytische methoden voor voor twee klassen van opkomende primitieven: lichtgewicht geauthenticeerde encryptieschema’s en domeinspecifieke cijfers voor geavanceerde protocollen zoals Multi-Party Computation en Volledig Homomorfe Versleuteling. Om een efficiënte implementatie te bekomen, maken veel van deze cijfers gebruik van bouwblokken van lage algebraïsche graad. Om gebruik te maken van deze structurele eigenschap, verbeteren we de cube aanval met nieuwe graadevaluatie en termenumeratie technieken en herbekijken we lineaire cryptanalyse door nieuwe technieken voor het berekenen van correlaties voor te stellen. Bovendien ontwikkelen we geautomatiseerde tools om distinguishers te zoeken bij deze aanvallen. We verbeteren ook het geheugengebruik van interpolatieaanvallen. De voorgestelde methodes leiden tot de beste aanvallen van het moment op sommige cijfers, inclusief de eerste aanval op de volledige versie van Morus.

Versleutelingsalgoritmes zijn cruciaal gebleken om de veiligheids- en privacyproblematiek van gegevensverwerking op het Internet te garanderen. Ze voorkomen echter ook het werk van intrusiedetectiesystemen gebaseerd op deep packet inspection. Wij gebruiken symmetrische cryptografische primitieven om een privacy-vriendelijk en marktconform intrusiedetectiesysteem te ontwikkelen voor versleutelde datatrafiek. Onze experimenten tonen aan dat ons protocol in de buurt komt van de noden van praktische toepassingen.

List of Abbreviations

AE Authenticated Encryption
AES Advanced Encryption Standard
AI Artificial Intelligence
ANF Algebraic Normal Form
CAESAR Competition for Authenticated Encryption: Security, Applicability and Robustness
CBC Cipher Block Chaining
CCA Chosen-Ciphertext Attack
COA Ciphertext-Only Attack
CP Constraint Programming
CPA Chosen-Plaintext Attack
DES Data Encryption Standard
DPI Deep Packet Inspection
ECB Electronic Code Book
FHE Fully Homomorphic Encryption
FOAM Figure Of Adversarial Merit
FSM Finite State Machine
FSR Feedback Shift Register
GCM Galois Counter Mode
HTTPS Hypertext Transfer Protocol Secure
IDS Intrusion Detection System
IoT Internet of Things
IV Initial Value
KPA Known-Plaintext Attack
LFSR Linear Feedback Shift Register
MAC Message Authentication Code
MC Multiplicative Complexity
MILP Mixed Integer Linear Programming
ML Machine Learning
MPC Multi-Party Computation
nAEAD Nonce-based Authenticated Encryption with Associated Data
NFSR Nonlinear Feedback Shift Register
OFB Output Feedback
PRF Pseudorandom Function
PRP Pseudorandom Permutation
RFID Radio-Frequency Identification
SAT Boolean Satisfiability
SLP Shortest Linear Program
SNARK Succinct Non-interactive Arguments of Knowledge
SSL Secure Sockets Layer
TLS Transport Layer Security
XL Extended Linearization
ZK Zero-Knowledge proof

List of Symbols

$+$ Addition over any finite field or integer addition
$\mathbb{F}_2$ Finite field with two elements
$\mathbb{F}_2^n$ The $n$-dimensional vector space over $\mathbb{F}_2$
$\mathbb{F}_q$ Finite field with $q$ elements for a prime power $q$
$M_\ell(R)$ The set of $\ell \times \ell$ matrices with entries from a ring $R$
$\oplus$ Bitwise eXclusive OR (XOR)
$\mathrm{wt}(v)$ Hamming weight of binary vector $v$
$\mathrm{wt}_b(v)$ Bundle weight of binary vector $v$

Contents

Abstract
Beknopte Samenvatting
List of Abbreviations
List of Symbols
Contents
List of Figures
List of Tables

I Introduction

1 Introduction
1.1 Motivation
1.2 Cryptography
1.2.1 Challenges and Approaches
1.3 Outline

2 Preliminaries
2.1 Mathematical Background
2.1.1 Building Blocks
2.1.2 Optimizing Implementation of Linear Layers
2.2 Block Ciphers
2.2.1 Basic Constructions
2.2.2 Modes of Operations
2.3 Stream Ciphers
2.3.1 FSR-based Constructions
2.3.2 Table-based Constructions
2.3.3 Block Ciphers and Permutations in Stream Cipher Modes
2.4 Authenticated Encryption
2.5 Intrusion Detection System over Encrypted Traffic
2.5.1 Architecture
2.5.2 Threats and Goals

3 Security Analysis
3.1 Basic Concepts of Security Analysis
3.1.1 Approaches for Security Analysis
3.1.2 Attack Goals and Models
3.2 Provable Security
3.2.1 Proof by Reduction
3.2.2 Security Games
3.2.3 Cryptographic Primitives
3.3 Differential and Linear Cryptanalysis
3.3.1 Differential Cryptanalysis
3.3.2 Linear Cryptanalysis
3.4 Algebraic Cryptanalysis
3.4.1 Algebraic Attacks
3.4.2 Higher Order Differential Attacks
3.4.3 Interpolation Attacks
3.4.4 Cube Attacks
3.5 Tools

4 Contributions
4.1 Design of Lightweight Linear Layers
4.1.1 Near-MDS Matrices
4.1.2 MDS Matrices from Lightweight Circuits
4.2 Cryptanalysis of Ciphers with Low Algebraic Degree
4.2.1 Division Property based Cube Attacks
4.2.2 Correlation of Quadratic Boolean Functions with Application to Morus
4.2.3 Improved Interpolation Attacks
4.3 Intrusion Detection over Encrypted Traffic

5 Conclusions and Open Problems
5.1 Conclusions
5.2 Future Research

Bibliography

II Designs

6 Design of Lightweight Linear Diffusion Layers from Near-MDS Matrices
7 Constructing Low-latency Involutory MDS Matrices with Lightweight Circuits

III Cryptanalysis

8 Improved Division Property Based Cube Attacks Exploiting Algebraic Properties of Superpoly
9 Correlation of Quadratic Boolean Functions: Cryptanalysis of All Versions of Full MORUS
10 Improved Interpolation Attacks on Cryptographic Primitives of Low Algebraic Degree

IV Protocol

11 Towards Truly Practical Intrusion Detection System over Encrypted Traffic

Curriculum Vitae
List of Publications

List of Figures

1.1 Some building blocks and symmetric primitives in TLS 1.3
2.1 An n-stage FSR
2.2 One round of an SPN cipher
2.3 One round of a Feistel network
2.4 A stream cipher
2.5 OFB mode
2.6 Counter mode
2.7 Sponge in stream cipher mode
2.8 IDS over encrypted traffic

List of Tables

2.1 Lookup table of the PRINTcipher S-box
2.2 Table of CAESAR competition finalists

Part I

Introduction


Chapter 1

Introduction

Thus we may have knowledge of the past but cannot control it; we may control the future but have no knowledge of it.

- Claude Shannon

1.1 Motivation

Cryptography was born along with writing. The main purpose of classical cryptography is to ensure secure communication in the presence of adversaries. Prior to the early 20th century, the major consumers of cryptography were military organizations and governments. For a very long time, cryptography was a secret art and the security of a cryptosystem was based on obscurity. The first rigorous treatment of the theory of cryptography was published in 1949 by Claude Shannon from the viewpoint of information theory [129].

Our modern society has been largely shaped by the digital evolution which began in the late 1950s. As a side effect, cryptography has become universal and is used more often by non-military organizations and individuals. Every time we make a mobile phone call, log in to a system or database by typing a password, or make an online payment, we are using cryptography. Generally speaking, modern cryptography is the study of mathematical techniques for securing information transmission, storage and processing, and distributed computations against adversarial attacks.


Research on cryptography is essentially driven by practice. Emerging technologies always bring new security and privacy concerns. Recent examples include the development of the Internet of Things (IoT), cloud computing, big data and data analysis based on Artificial Intelligence (AI) and Machine Learning (ML). The massive deployment of these technologies has changed all aspects of our society and particularly our daily lives. However, they have also opened a Pandora's box of security and privacy issues such as mass surveillance and data breaches.

Transport Layer Security

The need for secure private communications has become increasingly urgent in the age of e-commerce and mass surveillance. Transport Layer Security (TLS) plays a central role in communication security. In a nutshell, TLS is a cryptographic protocol that allows two parties to communicate over an encrypted and authenticated channel. TLS is one of the most commonly used secure-channel establishment protocols today. For instance, TLS has been employed in Hypertext Transfer Protocol Secure (HTTPS) for protecting web browsing against multiple types of network adversaries [123, 122]. However, recent years have witnessed various attacks on TLS and its predecessor Secure Sockets Layer (SSL) [130] due to weaknesses in the cryptographic design and implementation of symmetric ciphers in the protocol [147, 58, 8, 7, 146, 23].

The attacks on TLS show that the fundamental task of building a secure channel is still a big challenge. From the perspective of cipher designers, an important lesson learned is that it is necessary to attain both confidentiality and integrity simultaneously, i.e., to use Authenticated Encryption (AE) algorithms. Indeed, the cryptographic community has put immense effort into investigating and standardizing AE algorithms. For instance, the CAESAR competition [1] has substantially enriched our knowledge in this direction.

Internet of Things Security

We live in a world in which everything is getting smarter. From smartphones to smart cities, our lives have been changed by the extension of Internet connectivity into physical devices and everyday objects. One of the driving forces is the IoT, or more generally ubiquitous computing. As the IoT grows, so do the potential vulnerabilities and risks. Recent research shows that compromising a single smart light bulb may cause the shutdown of an entire power grid [126]. The huge attack surface of the IoT makes it urgent to address the potential security issues.

A main characteristic of IoT devices is that they are small and cheap. In other words, these devices lack resources for heavy computational lifting or storing large amounts of data. For many resource-constrained IoT devices such as Radio-Frequency IDentification (RFID) tags, it is not feasible to employ the standard but heavy ciphers such as AES [47]. How to design and implement cryptographic primitives suitable for the IoT is a great challenge in cryptography.

Computation on Encrypted Data

Privacy violation caused by government activities has been a long-standing and frequently discussed issue. More seriously, new privacy concerns arise from novel information technologies with purely economic motivations. Nowadays, personally identifiable information is collected, analyzed and exploited by both intelligence agencies and giant multinational IT companies. For instance, cloud, or outsourced, computing has become a popular service for both corporations and individuals. Leakage of sensitive data is an obvious privacy concern in cloud computing. Unfortunately, large-scale data breaches happen routinely. An incomplete list of data breaches can be found in [158]. The last decade has seen the rise of AI, ML, and big data, which makes the situation even worse.

To address the privacy issues of emerging technologies, cryptographic techniques such as encryption can play a vital role. However, protecting privacy with classical encryption algorithms usually destroys the usability of the data and disables the functionality of desired services. This leads to research on computation on encrypted data. Advanced cryptographic protocols such as Multi-Party Computation (MPC), Fully Homomorphic Encryption (FHE), Zero-Knowledge proofs (ZK) and searchable encryption have been proposed to provide privacy-preserving services and applications. However, the huge overhead of the underlying primitives makes most of the solutions impractical. A key problem in this direction is to design efficient primitives and protocols allowing practical implementation of specific functionality on encrypted data.

From the techno-optimist perspective, new technologies can improve the lives of people. However, the adoption of new technologies without the protection of security and privacy can be catastrophic. To secure new technologies and protect our privacy, cryptography is still one of the most important tools available. Therefore, cryptography will continue to be the workhorse of a secure and privacy-friendly future.

1.2 Cryptography

The scope of cryptography is in a continuous process of expansion and modification. Some central goals of cryptography are listed as follows.

Confidentiality. The information should be kept secret from all but those who are authorized to see it.

Message Authentication. The message must be protected against any malicious manipulation and the receiving party can verify the source of the message.

Entity Authentication. One party is assured of the identity of a second party involved in a protocol, and that the second party has actually participated.

Non-repudiation. An entity is prevented from denying previous commitments or actions.

To achieve the above goals, high-level cryptographic protocols are designed and implemented between different parties. The protocols have cryptographic primitives as basic components. Further, the cryptographic primitives are often constructed from certain well-studied mathematical building blocks. For instance, as shown in Fig. 1.1, the TLS 1.3 protocol contains AES as a basic primitive. AES is constructed from mathematical building blocks such as S-boxes and MDS matrices.

Modern cryptography can be split into two branches: symmetric cryptography and asymmetric (or public key) cryptography. In symmetric cryptography, the sender and receiver share a secret key, while in public key cryptography, each party has a key pair consisting of a private and a public key. While public key cryptographic primitives are mainly built upon mathematical structures from number theory, symmetric cryptography mostly depends on discrete structures such as Boolean functions.

This dissertation focuses on symmetric cryptography. Symmetric cryptographic primitives, including block ciphers, stream ciphers, message authentication codes and hash functions, play major roles in securing communication and computation. We deal with several aspects of design, analysis and applications of symmetric cryptographic primitives.

In symmetric cryptography, the security of some basic primitives cannot be proved since we cannot reduce the security to computationally hard problems. However, for protocols building upon symmetric ciphers, the security can be evaluated under the assumption that the underlying primitives are (semantically) secure. Hence, it is desirable to have a security proof for protocols while the security of the primitives is measured by the resistance against known cryptanalytic methods.

[Figure 1.1: Some building blocks and symmetric primitives in TLS 1.3. Building blocks: S-boxes, MDS matrices, Boolean functions, ... Primitives: AES-GCM, SHA256-HMAC, ChaCha20, ... Protocol: TLS 1.3.]

1.2.1 Challenges and Approaches

Now we briefly introduce the challenges and approaches of this dissertation. For a more detailed discussion of our contribution, please refer to Chapter 4.

Lightweight linear layers. The rise of the IoT creates a need for new symmetric primitives tailored for meeting extremely constrained requirements on performance and efficiency. We focus on the design and implementation of new lightweight building blocks for symmetric ciphers. Particularly, we propose new designs of lightweight diffusion matrices for block ciphers and hash functions. Further, we also develop new tools for optimizing the implementation of linear layers with respect to both circuit area and depth.

New methods for symmetric cryptanalysis. The security analysis of the numerous existing and new symmetric ciphers is a major challenge. This dissertation mainly focuses on two classes of emerging primitives: lightweight AE algorithms and ciphers dedicated to protocols such as FHE, MPC and ZK. In order to achieve efficiency, many of the target ciphers exhibit the property that the building blocks have low multiplicative complexity. To leverage this structural property, we propose new cryptanalytic methods which are powerful for analyzing ciphers of low algebraic degree. Some automated cryptanalytic tools are also developed.

Intrusion detection over encrypted traffic. The deployment of the HTTPS protocol prevents the work of fundamental security applications, e.g., intrusion detection based on Deep Packet Inspection (DPI): after encryption, malicious traffic is indistinguishable from valid traffic. In the Dell security 2016 report [49], the number of users affected by under-the-radar attacks is estimated at 900 million. In order to achieve intrusion detection over encrypted data, solutions based on MPC and public key searchable encryption have been presented. However, neither of them is practical for real-world traffic. We employ symmetric cryptographic primitives to design an efficient and market-compliant intrusion detection system over encrypted traffic.

1.3 Outline

This thesis has four parts. Part I introduces the necessary background to understand the main contributions. Chapter 2 introduces the mathematical background and concepts used in the sequel. Chapter 3 covers the basics of reduction proofs in protocol design and common cryptanalytic methods in symmetric cryptography. Chapter 4 summarizes the approaches and main results. Chapter 5 gives an overview of the contributions of this thesis and points out some open problems for future research.

The rest of the thesis is devoted to the technical details of our contributions in the form of peer-reviewed papers and unpublished manuscripts. Part II covers the construction and implementation of lightweight diffusion matrices. Chapter 6 focuses on the construction of near-MDS matrices while Chapter 7 covers the globally optimized MDS matrices. Next, Part III presents new cryptanalytic methods and their applications. Specifically, Chapter 8 shows the improved division property based cube attacks. Chapter 9 investigates the problem of computing the correlation of quadratic Boolean functions and its application to Morus. Chapter 10 describes the improved interpolation attack. Finally, Chapter 11 in Part IV presents a new IDS over encrypted traffic.

Chapter 2

Preliminaries

Omnibus ex nihilo ducendis sufficit unum.

- Gottfried Wilhelm Leibniz

This chapter presents an introduction to the main topics of the dissertation. Specifically, we show the definitions and main constructions of symmetric cryptographic primitives including block ciphers, stream ciphers and authenticated encryption schemes. Moreover, we also introduce the architecture and design goals of intrusion detection systems over encrypted data.

2.1 Mathematical Background

This section covers the definitions and basic results on building blocks in the design of symmetric primitives. Moreover, we dig deeper into the optimization of the implementation of linear layers.

2.1.1 Building Blocks

We introduce some basic building blocks in the design of symmetric primitives.

Boolean functions

Let $\mathbb{F}_2$ be the finite field with two elements, and $\mathbb{F}_2^n$ the $n$-dimensional vector space over $\mathbb{F}_2$, where $n$ is a positive integer. A Boolean function of $n$ variables is a function from $\mathbb{F}_2^n$ to $\mathbb{F}_2$. A Boolean function can be defined by its truth table, which gives the images of all elements in $\mathbb{F}_2^n$. In cryptography and coding theory, it is more convenient to represent a Boolean function with a multivariate polynomial.

Definition 1. (The ANF of a Boolean function [38]) Let $f : \mathbb{F}_2^n \to \mathbb{F}_2$ be a Boolean function, where $n$ is a positive integer. Then, there exists a unique multivariate polynomial in $\mathbb{F}_2[x_0, \dots, x_{n-1}]/(x_0^2 \oplus x_0, \dots, x_{n-1}^2 \oplus x_{n-1})$ such that

$$f(x) = \bigoplus_{u \in \mathbb{F}_2^n} a_u x^u, \tag{2.1}$$

where $x = (x_0, \dots, x_{n-1})$, $u = (u_0, \dots, u_{n-1})$, $a_u \in \mathbb{F}_2$, and $x^u = \prod_{i=0}^{n-1} x_i^{u_i}$. This multivariate polynomial is called the algebraic normal form (ANF) of $f$.

To construct the ANF from a given truth table of $f$, or to generate the truth table from its ANF, one can use the Möbius transform. Specifically, the coefficients of the ANF and the truth table satisfy the following equations:

$$a_u = \bigoplus_{x \preceq u} f(x), \quad \text{and} \quad f(x) = \bigoplus_{u \preceq x} a_u, \tag{2.2}$$

where $x \preceq y$ if and only if $x_i \le y_i$ for all $0 \le i \le n-1$. The transformation between the ANF and truth table can be implemented using an efficient divide-and-conquer butterfly algorithm [38].
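The butterfly mentioned above can be sketched in a few lines of Python. This is a minimal illustration rather than the algorithm as given in [38]: the function name `mobius_transform` and the indexing convention (entry `x` of the list corresponds to the vector whose bit `i` is $x_i$) are assumptions made here for the example.

```python
def mobius_transform(values):
    """Butterfly Moebius transform over F_2.

    Maps a truth table to its ANF coefficients (and back, since the
    transform is an involution). Index convention (an assumption of this
    sketch): entry x corresponds to the vector whose bit i is x_i.
    """
    out = list(values)
    size = len(out)
    if size == 0 or size & (size - 1):
        raise ValueError("length must be a power of two")
    step = 1
    while step < size:
        # Process one variable per pass: XOR the half where the bit is 0
        # into the half where the bit is 1.
        for start in range(0, size, 2 * step):
            for j in range(start, start + step):
                out[j + step] ^= out[j]
        step *= 2
    return out


# f(x0, x1) = x0 XOR x0*x1, truth table indexed by x0 + 2*x1
truth_table = [0, 1, 0, 0]
anf = mobius_transform(truth_table)   # coefficients of 1, x0, x1, x0*x1
```

Applying the function twice returns the original list, which mirrors the symmetry of the two equations in (2.2). The cost is $n 2^{n-1}$ XOR operations for an $n$-variable function.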

The Hamming weight $\mathrm{wt}(x)$ of a binary vector $x \in \mathbb{F}_2^n$ is the number of its nonzero coordinates. The algebraic degree of the Boolean function $f$ is defined as

$$\deg(f) = \max_{u \in \mathbb{F}_2^n \,:\, a_u \neq 0} \mathrm{wt}(u), \tag{2.3}$$

where $f$ is given by Eq. (2.1).

An $n$-variable Boolean function $f$ is called an affine function if $\deg(f) \le 1$, i.e., there exist $c, a_0, \dots, a_{n-1} \in \mathbb{F}_2$ such that

$$f(x_0, \dots, x_{n-1}) = c \oplus \bigoplus_{i=0}^{n-1} a_i x_i,$$

where at least one $a_i$ is nonzero. Particularly, if $c = 0$, then $f$ is called a linear function. A function $f$ is called nonlinear if $\deg(f) > 1$. From the viewpoint of algebraic degree, the simplest nonlinear Boolean functions are the quadratic Boolean functions, which have degree exactly two.
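As a small illustration of Eq. (2.3), the degree can be read off a list of ANF coefficients by taking the largest Hamming weight among the indices with a nonzero coefficient. The sketch below assumes the coefficients are stored in a list indexed by the integer whose bit $i$ is $u_i$; the function name and the value returned for the zero function are conventions chosen here, not taken from the text.

```python
def algebraic_degree(anf):
    """Algebraic degree of a Boolean function given its ANF coefficients.

    anf[u] is the coefficient a_u, where bit i of the index u is u_i
    (an indexing assumption of this sketch). Returns -1 for the zero
    function, which is a convention chosen here.
    """
    weights = [bin(u).count("1") for u, coeff in enumerate(anf) if coeff]
    return max(weights, default=-1)


# Coefficients of 1, x0, x1, x0*x1
quadratic = [0, 1, 0, 1]   # x0 XOR x0*x1, degree 2
affine = [1, 1, 1, 0]      # 1 XOR x0 XOR x1, degree 1
linear = [0, 1, 1, 0]      # x0 XOR x1, degree 1 with constant term 0
degrees = [algebraic_degree(f) for f in (quadratic, affine, linear)]
```

Under the definitions above, `affine` and `linear` both have degree one and differ only in the constant term `anf[0]`, while `quadratic` is nonlinear.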

An important notion in linear cryptanalysis is the correlation of Boolean functions.

Definition 2. The correlation of an $n$-variable Boolean function $f$ is

$$\mathrm{cor}(f) = \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} (-1)^{f(x)}.$$

The correlation between two Boolean functions $f$ and $g$ can be defined by the correlation of the function $f \oplus g$.
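For small $n$, Definition 2 can be evaluated directly from a truth table. The following sketch does exactly that by exhaustive summation; the helper names are hypothetical, and exact fractions are used only to avoid floating-point rounding. It is meant as a check on the definition, not as one of the correlation computation methods developed later in the thesis.

```python
from fractions import Fraction


def correlation(truth_table):
    """cor(f) = 2^{-n} * sum over x of (-1)^{f(x)}, from the full truth table."""
    total = sum(1 if bit == 0 else -1 for bit in truth_table)
    return Fraction(total, len(truth_table))


def correlation_between(tt_f, tt_g):
    """Correlation between f and g, defined as the correlation of f XOR g."""
    return correlation([a ^ b for a, b in zip(tt_f, tt_g)])


# Truth tables indexed by x0 + 2*x1
product = [0, 0, 0, 1]    # f = x0*x1
first_var = [0, 1, 0, 1]  # g = x0
cor_f = correlation(product)                     # 1/2
cor_g = correlation(first_var)                   # 0, g is balanced
cor_fg = correlation_between(product, first_var)  # 1/2
```

A balanced function has correlation zero, and a function always has correlation one with itself, since $f \oplus f$ is the zero function.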

S-boxes

A substitution-box (S-box) $S : \mathbb{F}_2^n \to \mathbb{F}_2^m$ is a vectorial Boolean function with $n$ input variables and $m$ output bits. That is, there are $m$ $n$-variable Boolean functions $f_0, f_1, \ldots, f_{m-1}$ such that $S$ can be represented by

$$S(x) = (f_0(x), f_1(x), \ldots, f_{m-1}(x)),$$

where each function $f_i$ is called a coordinate function of $S$ [39]. An important class of S-boxes consists of the permutations on $\mathbb{F}_2^n$, also known as $n$-bit S-boxes.

In specifications of symmetric ciphers, the S-box is usually given by a lookup table. For instance, the lookup table of the 3-bit PRINTcipher S-box is shown in Table 2.1. It can be written as $S(a, b, c) = (a \oplus bc,\ a \oplus b \oplus ac,\ a \oplus b \oplus c \oplus ab)$.

Table 2.1: Lookup table of the PRINTcipher S-box

| x    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|------|---|---|---|---|---|---|---|---|
| S(x) | 0 | 1 | 3 | 6 | 7 | 4 | 5 | 2 |
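The agreement between the lookup table and the coordinate functions can be checked mechanically. The sketch below assumes that $a$ is the most significant input bit and that the first coordinate is the most significant output bit; under that ordering the formula reproduces Table 2.1. The name `sbox_from_anf` is illustrative.

```python
LOOKUP = [0, 1, 3, 6, 7, 4, 5, 2]  # Table 2.1


def sbox_from_anf(x):
    """Evaluate S(a, b, c) from the three coordinate functions.

    Assumed bit order: a is the most significant input bit, and the first
    coordinate is the most significant output bit.
    """
    a, b, c = (x >> 2) & 1, (x >> 1) & 1, x & 1
    y0 = a ^ (b & c)
    y1 = a ^ b ^ (a & c)
    y2 = a ^ b ^ c ^ (a & b)
    return (y0 << 2) | (y1 << 1) | y2


matches = all(sbox_from_anf(x) == LOOKUP[x] for x in range(8))
is_permutation = sorted(LOOKUP) == list(range(8))
```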

S-boxes are widely adopted to provide nonlinearity in the design of symmetric key primitives. Usually, S-boxes are the only nonlinear components in a design, e.g., AES. Therefore, the security of the cipher largely depends on the properties of the S-boxes. Research on S-boxes has been an active subfield of symmetric cryptography, covering design criteria, mathematical analysis, implementation and reverse-engineering. Recent surveys on S-box research can be found in [119, 13].

Feedback shift registers

Let $f$ be an $n$-variable Boolean function. For an initial state $(a_0, \ldots, a_{n-1}) \in \mathbb{F}_2^n$, an $n$-stage feedback shift register (FSR) with feedback function $f$ generates a binary output sequence $a = (a_0, a_1, a_2, \ldots)$ satisfying

$$a_{i+n} = f(a_i, a_{i+1}, \ldots, a_{i+n-1}) \quad \text{for } i = 0, 1, 2, \ldots.$$

Figure 2.1: An $n$-stage FSR (stages $n-1, n-2, \ldots, 0$ feeding the feedback function $f(x_0, x_1, \ldots, x_{n-1})$)

An n-stage FSR is illustrated in Fig. 2.1.

The FSR is called a linear feedback shift register (LFSR) if $f$ is a linear function; otherwise it is called a nonlinear feedback shift register (NFSR). LFSRs have been studied for over 50 years and there is a characterization of those feedback functions that yield maximum-period output sequences [128, 74]. Moreover, the statistical properties of LFSR sequences have been investigated by Golomb [74]. In contrast, not much is known about the periods and randomness properties of NFSR sequences.
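The recurrence $a_{i+n} = f(a_i, \ldots, a_{i+n-1})$ is easy to simulate. The sketch below uses a 4-stage LFSR with feedback $a_{i+4} = a_i \oplus a_{i+1}$, chosen here as an example because it attains the maximum period $2^4 - 1 = 15$; the function names are assumptions of the illustration.

```python
def fsr_sequence(feedback, initial_state, length):
    """Return the first `length` output bits a_0, a_1, ... of an n-stage FSR."""
    seq = list(initial_state)
    n = len(initial_state)
    while len(seq) < length:
        i = len(seq) - n
        seq.append(feedback(seq[i:i + n]) & 1)
    return seq[:length]


def period(feedback, initial_state, bound=1 << 16):
    """Least p >= 1 such that the state after p steps equals the initial state."""
    start = tuple(initial_state)
    state = start
    for p in range(1, bound + 1):
        state = state[1:] + (feedback(list(state)) & 1,)
        if state == start:
            return p
    return None  # no return to the initial state within `bound` steps


def lfsr_feedback(state):
    # Linear feedback a_{i+4} = a_i XOR a_{i+1}.
    return state[0] ^ state[1]


sequence = fsr_sequence(lfsr_feedback, [1, 0, 0, 0], 15)
lfsr_period = period(lfsr_feedback, [1, 0, 0, 0])
```

Passing a nonlinear `feedback` function to the same helpers simulates an NFSR, for which the period generally has to be found by such a search.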

FSRs are popular building blocks for stream ciphers. They can be implemented efficiently in both hardware and software. LFSRs can also be defined over any finite field $\mathbb{F}_q$ [102]. More details on FSR-based stream ciphers can be found in Sect. 2.3.1.

Diffusion matrices

Linear layers are widely used to realize the diffusion layers of block ciphers based on substitution-permutation networks (see Sect. 2.2). This dissertation mainly focuses on linear layers that operate on vectors in $(\mathbb{F}_2^n)^\ell$, i.e., vectors of length $\ell$ over $\mathbb{F}_2^n$. For any linear mapping $\lambda$ over $(\mathbb{F}_2^n)^\ell$, there exists an $n\ell \times n\ell$ binary matrix $M$ such that $\lambda(v) = M \cdot v$, where $v \in (\mathbb{F}_2^n)^\ell$ is regarded as a vector in $\mathbb{F}_2^{n\ell}$. Hereafter, we represent a linear layer by a diffusion matrix.

Now we introduce some notation for matrices. Let $R$ be an arbitrary ring, and let $\mathcal{M}_\ell(R)$ be the set of all $\ell \times \ell$ matrices whose entries are drawn from $R$. Then $\mathcal{M}_\ell(\mathcal{M}_n(\mathbb{F}_2))$ is the set of all $\ell \times \ell$ matrices whose elements are taken from the set $\mathcal{M}_n(\mathbb{F}_2)$. Every matrix $M \in \mathcal{M}_\ell(\mathcal{M}_n(\mathbb{F}_2))$ can be represented as an $n\ell \times n\ell$ binary matrix, which we call the binary representation of $M$.

Given a vector $v \in (\mathbb{F}_2^n)^\ell$, its bundle weight $\mathrm{wt}_b(v)$ is equal to the number of nonzero $n$-bit words of $v$. The branch numbers of a diffusion matrix can be defined in terms of the bundle weight of vectors.

Definition 3. ([45, 47]) Let $M \in \mathcal{M}_\ell(\mathcal{M}_n(\mathbb{F}_2))$. Then the differential branch number of the matrix $M$ over $\mathbb{F}_2^n$ is defined as

$$B_d(M) = \min_{v \neq 0} \{ \mathrm{wt}_b(v) + \mathrm{wt}_b(Mv) \},$$

and the linear branch number of $M$ over $\mathbb{F}_2^n$ is defined as

$$B_l(M) = \min_{v \neq 0} \{ \mathrm{wt}_b(v) + \mathrm{wt}_b(M^T v) \},$$

where $M^T$ is the transpose of $M$.
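For small parameters, both branch numbers can be computed by exhaustive search over the binary representation. The sketch below is only practical for tiny $n\ell$; the example matrix, with blocks $[[I, I], [I, A]]$ and $A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$ for $n = \ell = 2$, is a hypothetical choice made for illustration.

```python
from itertools import product


def bundle_weight(bits, n):
    """Number of nonzero n-bit words in a bit vector."""
    return sum(any(bits[i:i + n]) for i in range(0, len(bits), n))


def mat_vec(matrix, bits):
    """Matrix-vector product over F_2."""
    return [sum(m & b for m, b in zip(row, bits)) & 1 for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def branch_number(matrix, n):
    """min over nonzero v of wt_b(v) + wt_b(M v), by exhaustive search."""
    best = None
    for bits in product((0, 1), repeat=len(matrix)):
        if any(bits):
            total = bundle_weight(bits, n) + bundle_weight(mat_vec(matrix, bits), n)
            best = total if best is None else min(best, total)
    return best


# Binary representation of [[I, I], [I, A]] with A = [[0, 1], [1, 1]].
M = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
]
differential = branch_number(M, 2)
linear = branch_number(transpose(M), 2)
```

The search visits all $2^{n\ell} - 1$ nonzero vectors, so it illustrates the definition rather than a method for realistic matrix sizes.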

The branch numbers are important in the context of linear and differential cryptanalysis (see Sect. 3.3). In a nutshell, the differential (resp. linear) branch number corresponds to the minimum number of active S-boxes in two consecutive rounds of a substitution-permutation network cipher for differential (resp. linear) cryptanalysis. From the viewpoint of a designer, it is desirable to maximize both the differential and linear branch numbers of the diffusion matrix. Two classes of matrices are therefore of special interest. The first class of matrices attains both optimal differential and linear branch numbers.

Definition 4. ([47]) Let $M \in \mathcal{M}_\ell(\mathcal{M}_n(\mathbb{F}_2))$. Then $M$ is called an MDS (Maximum Distance Separable) matrix if $B_d(M) = B_l(M) = \ell + 1$.

It is well known that $B_d(M) = \ell + 1$ if and only if $B_l(M) = \ell + 1$ for any $M \in \mathcal{M}_\ell(\mathcal{M}_n(\mathbb{F}_2))$ [47]. The second class, that of suboptimal diffusion matrices, is defined as follows.

Definition 5. Let $M \in \mathcal{M}_\ell(\mathcal{M}_n(\mathbb{F}_2))$. Then $M$ is called a near-MDS matrix if $B_d(M) = B_l(M) = \ell$.

MDS and near-MDS matrices exist if and only if the corresponding MDS codes [107] and near-MDS codes [56] exist, respectively. However, in cryptography, we often rely on the matrix characterizations of both MDS and near-MDS matrices. We now recall some of these characterizations.

Lemma 1 ([27, 99, 149]). Let $M \in \mathcal{M}_\ell(\mathcal{M}_n(\mathbb{F}_2))$. We consider the set $S$ of submatrices of $M$ of the form

$$M(I, J) = (M_{ij})_{i \in I, j \in J},$$

where $I, J$ are two subsets of $\{0, 1, \ldots, \ell - 1\}$ and $M_{ij} \in \mathcal{M}_n(\mathbb{F}_2)$ is the $(i, j)$-entry of $M$.

(i) $M$ is an MDS matrix if and only if all square submatrices $M(I, J)$ in $S$ with $|I| = |J| = t$ are invertible for $1 \le t \le \ell$.

(ii) Assume that $M$ is not MDS. Then $M$ is near-MDS if and only if for any $1 \le t \le \ell - 1$ and any submatrix $M(I, J)$ in $S$ with either $|I| = t, |J| = t + 1$ or $|I| = t + 1, |J| = t$, there exists at least one invertible submatrix $M(I', J')$ in $S$ satisfying $|I'| = |J'| = t$, $I' \subseteq I$ and $J' \subseteq J$.
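Part (i) of Lemma 1 yields a direct, if exponential, MDS test: enumerate all square block submatrices and check invertibility over $\mathbb{F}_2$. The sketch below works on the binary representation; the helper names and the two $4 \times 4$ example matrices ($n = \ell = 2$) are assumptions of the illustration.

```python
from itertools import combinations


def is_invertible_gf2(rows):
    """Gaussian elimination over F_2 on a square 0/1 matrix."""
    size = len(rows)
    packed = [sum(bit << k for k, bit in enumerate(row)) for row in rows]
    for col in range(size):
        pivot = next((r for r in range(col, size) if (packed[r] >> col) & 1), None)
        if pivot is None:
            return False
        packed[col], packed[pivot] = packed[pivot], packed[col]
        for r in range(size):
            if r != col and (packed[r] >> col) & 1:
                packed[r] ^= packed[col]
    return True


def block_submatrix(matrix, block_rows, block_cols, n):
    """Binary representation of M(I, J) for n x n blocks."""
    return [[matrix[i * n + r][j * n + c] for j in block_cols for c in range(n)]
            for i in block_rows for r in range(n)]


def is_mds(matrix, n, ell):
    """Lemma 1 (i): every square block submatrix must be invertible."""
    for t in range(1, ell + 1):
        for I in combinations(range(ell), t):
            for J in combinations(range(ell), t):
                if not is_invertible_gf2(block_submatrix(matrix, I, J, n)):
                    return False
    return True


# [[I, I], [I, A]] with A = [[0, 1], [1, 1]], and the singular [[I, I], [I, I]].
M_GOOD = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 1]]
M_BAD = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
```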

2.1.2 Optimizing Implementation of Linear Layers

We present methods for minimizing the hardware cost of a linear operation in terms of circuit area and the circuit depth of the critical path in an implementation. To obtain optimized implementations, several criteria and the shortest linear program problem are recalled.

Notice that any diffusion matrix $M \in \mathcal{M}_n(\mathbb{F}_2)$ can be implemented with a finite number of XOR gates. The primary goal is to find the implementation with the minimum number of XOR gates. As we will see, this problem is computationally hard: only metrics that give upper bounds are available. Several metrics used in this dissertation are listed in the following.

Direct XOR count. Given a matrix $M \in \mathcal{M}_n(\mathbb{F}_2)$, the direct XOR count $\mathrm{DXC}(M)$ of $M$ is the number of 1's in the matrix $M$ minus $n$. This corresponds to a naive implementation of $M$, where each row of $M$ is implemented element by element. When it comes to multiplication by a finite field element $\alpha$, we often consider the direct XOR count of the companion matrix of the element [87, 134].
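As a small worked instance, the direct XOR count is a one-line computation. The example matrix below is the companion matrix of $x^4 + x + 1$ acting on coefficient vectors $(c_0, c_1, c_2, c_3)$; this polynomial and the name `direct_xor_count` are chosen for illustration.

```python
def direct_xor_count(matrix):
    """Number of ones in a square binary matrix minus its dimension n."""
    return sum(map(sum, matrix)) - len(matrix)


# Multiplication by alpha in F_2[x]/(x^4 + x + 1):
# (c0, c1, c2, c3) -> (c3, c0 + c3, c1, c2).
COMPANION = [
    [0, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]
cost = direct_xor_count(COMPANION)
```

Only the second output bit needs an XOR; the other three rows are plain wires.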

Sequential XOR count. To improve on the direct XOR count, a refined metric named sequential XOR count was proposed in [86]. Informally, the sequential XOR count of an invertible matrix is the minimal number of bitwise XORs over all implementations given as a sequential program limited to in-place operations without extra registers. It turns out that the new metric can be substantially lower than the direct XOR count despite the restriction to in-place operations. Indeed, the sequential XOR count is used to obtain optimized implementations of multiplication by a finite field element [14]. However, it is infeasible to determine the sequential XOR count of binary matrices of large dimension, e.g., 32 [86].

From local optimization to global optimization

It is readily seen that the direct XOR count only gives a trivial upper bound on the minimal cost of implementing a given linear layer, while the sequential XOR count can only be used to obtain locally optimized solutions. In contrast, the lightest implementations of given diffusion matrices (e.g., AES MixColumns) are often obtained by a global optimization approach, i.e., by looking at the diffusion matrix as a whole rather than element by element. The global optimization methods rely on ad hoc tools [92, 60], since the core task is to solve instances of the shortest linear program problem.

The Shortest Linear Program problem. Let $M$ be an $m \times n$ matrix of constants over $\mathbb{F}_2$ and let $x$ be a vector of $n$ variables over $\mathbb{F}_2$. The Shortest Linear Program (SLP) problem over $\mathbb{F}_2$ is to find the program with the smallest number of lines that computes $M \cdot x$, where every program line is of a certain form.

Let $V$ be a set of variables over $\mathbb{F}_2$ that initially contains the input variables $y_0, \ldots, y_{n-1}$, where $y_i = x_i$. Let $y_i, y_j \in V$. Then every program line is of the form $y' \leftarrow y_i \oplus y_j$. After executing this program line, the new variable $y'$ is added to the set, i.e., $V \leftarrow V \cup \{y'\}$. The new variable $y'$ can therefore be used in the next program line. The program is said to compute $M \cdot x$ if there exist $y_{i_j} \in V$, $0 \le j \le m - 1$, such that $M \cdot x = (y_{i_0}, y_{i_1}, \ldots, y_{i_{m-1}})$.
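The program model above can be made concrete with a tiny interpreter that checks whether a given straight-line program computes $M \cdot x$. The encoding of a program line as an index pair `(i, j)` and the $3 \times 3$ example matrix are assumptions of this sketch; it verifies a program, it does not search for the shortest one.

```python
from itertools import product


def run_program(program, inputs):
    """Execute lines (i, j), each meaning y_new = y_i XOR y_j."""
    values = list(inputs)
    for i, j in program:
        values.append(values[i] ^ values[j])
    return values


def computes(program, outputs, matrix):
    """Check on every input that the selected variables equal M * x over F_2."""
    n = len(matrix[0])
    for x in product((0, 1), repeat=n):
        values = run_program(program, x)
        expected = [sum(m & b for m, b in zip(row, x)) & 1 for row in matrix]
        if [values[k] for k in outputs] != expected:
            return False
    return True


M = [
    [1, 1, 0],   # x0 + x1
    [1, 1, 1],   # x0 + x1 + x2
    [0, 1, 1],   # x1 + x2
]
# y3 = y0 ^ y1, y4 = y3 ^ y2, y5 = y1 ^ y2: three lines, whereas the
# row-by-row implementation needs (2 + 3 + 2) - 3 = 4 XORs.
PROGRAM = [(0, 1), (3, 2), (1, 2)]
OUTPUTS = [3, 4, 5]
```

Reusing the intermediate value $y_3 = x_0 \oplus x_1$ is what saves one XOR relative to the direct XOR count.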

Global optimization. Recall that we aim to obtain the implementation with the minimum number of XOR gates; this is equivalent to solving an instance of the SLP problem over $\mathbb{F}_2$, which turns out to be NP-hard [32, 33]. Given a matrix $M \in \mathcal{M}_{n\ell}(\mathbb{F}_2)$, we can attack the SLP problem corresponding to $M$ with state-of-the-art tools based on SLP heuristics [33]; the resulting XOR count is denoted by $\mathrm{SLP}(M)$. Optimized implementations of MixColumns in AES and of the linear transformation in certain implementations of SubBytes have already been investigated in [32, 33].

Note that $\mathrm{SLP}(M)$ is so far the most accurate estimate that is practical for a given $32 \times 32$ binary matrix $M$. We refer the reader to [60] for a comparison of the different metrics and a discussion of their limitations.

Circuit depth. Besides the circuit area (measured by the number of XOR gates required for an implementation), another important metric of an implementation is the latency, which imposes a constraint on the clock frequency at which the circuit can operate. The latency of an implementation can be characterized by its circuit depth.

Definition 6. The critical path of an implementation of a linear layer is defined as the path between an input and an output involving the maximum number of XOR gates, and the depth of the implementation is the number of XOR gates involved in the critical path.
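For an implementation given as a straight-line XOR program, the depth in the sense of Definition 6 can be computed in one pass. The sketch assumes the same hypothetical `(i, j)` line encoding as an index pair into the list of variables, with inputs first.

```python
def xor_depth(program, num_inputs, outputs):
    """Depth of a straight-line XOR program: the longest chain of XOR gates
    from any input to any selected output."""
    depth = [0] * num_inputs            # inputs sit at depth 0
    for i, j in program:
        depth.append(1 + max(depth[i], depth[j]))
    return max(depth[k] for k in outputs)


# Two programs for x0 + x1 + x2 + x3, both using three XOR gates.
chain = [(0, 1), (4, 2), (5, 3)]        # ((x0 + x1) + x2) + x3
tree = [(0, 1), (2, 3), (4, 5)]         # (x0 + x1) + (x2 + x3)
chain_depth = xor_depth(chain, 4, [6])
tree_depth = xor_depth(tree, 4, [6])
```

Both programs have the same area, but the balanced tree has the shorter critical path, which is why area and depth are tracked as separate metrics.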

2.2 Block Ciphers

Block ciphers are basic primitives in cryptography from which many other systems are built. They can be used as building blocks in the design of message authentication codes (MACs), hash functions, stream ciphers and authenticated encryption schemes. The theory of block ciphers has grown quickly in the last decades. This section briefly introduces some basic concepts and constructions of modern block ciphers.

A block cipher is a pair $E = (\mathrm{Enc}, \mathrm{Dec})$ of functions $\mathrm{Enc}, \mathrm{Dec} : \mathcal{K} \times \mathcal{X} \to \mathcal{X}$ with key space $\mathcal{K}$ and the set $\mathcal{X}$ as both message space and ciphertext space. For every fixed key $K \in \mathcal{K}$, the map $\mathrm{Enc}(K, \cdot) : \mathcal{X} \to \mathcal{X}$ is a keyed permutation, which is denoted by $\mathrm{Enc}_K(\cdot)$. The decryption function $\mathrm{Dec}(K, \cdot)$, denoted by $\mathrm{Dec}_K$, satisfies $\mathrm{Dec}_K(\mathrm{Enc}_K(M)) = M$ for any $K \in \mathcal{K}$ and $M \in \mathcal{X}$.

The ideal behavior of a block cipher is captured by the notion of a pseudorandom permutation (PRP, see Sect. 3.2.3). That is, an adversary not knowing $K$ should not be able to distinguish the input/output behavior of $\mathrm{Enc}_K(\cdot)$ from a permutation that was chosen uniformly at random from the set of all $|\mathcal{X}|!$ permutations on $\mathcal{X}$. In other words, block ciphers are designed to be efficient realizations of PRPs.

Confusion and diffusion, introduced by Shannon [129], are two widely used fundamental principles in the design of symmetric key primitives. These properties were proposed to preclude statistical attacks and other methods of cryptanalysis. Most modern block ciphers and hash functions have well-designed confusion and diffusion layers.

Virtually all block ciphers used in practice follow the same basic framework, called the iterated cipher paradigm. An iterated cipher is constructed as iterated mappings based on round functions. Modern block ciphers tend to iterate simple round functions sufficiently many times to achieve security. A key scheduling algorithm expands the master key $K$ into round keys, which are added in each round. If the iteration can be written as a sequence of unkeyed rounds and bitwise additions of the round keys, the cipher is called a key-alternating cipher [47].

Figure 2.2: One round of an SPN cipher (round input $x_i$, round key $k_i$, S-box layer $S$, permutation layer $P$, round output $x_{i+1}$)

2.2.1 Basic Constructions

SPN. Substitution-permutation networks (SPNs), proposed by Shannon [129], have been popular in the design of block ciphers and hash functions. An SPN is a special form of the iterative application of nonlinear and linear components. The best understood structure of an SPN round function consists of a key addition and then a substitution layer (S-box layer) followed by a linear permutation layer (P-layer). The S-box layer is usually implemented as a parallel application of (not necessarily identical) S-boxes of small size. The permutation layer applies a linear diffusion matrix to mix the results of the S-box layer efficiently.

The output of each round is fed as input to the next round. After the last round there is a final key addition step, and the result is the output of the cipher. Fig. 2.2 shows a high-level structure of one round of an SPN cipher.

The most prominent SPN block cipher is the Advanced Encryption Standard (AES) [47], which is a Federal Information Processing Standard for the US.

Feistel Networks. Feistel networks, invented by Feistel [67] at IBM, offer another approach for constructing block ciphers. An advantage of Feistel networks over SPNs is that the underlying functions need not be invertible. A Feistel network thus gives a way to construct an invertible function from non-invertible components.

A Feistel network operates as an iterated cipher. In each round, the input is divided into two branches, left and right. In the $i$-th round, the right branch $x^R_{i-1}$ and the round key $k_i$ are input to a keyed function $F$. The result is combined with the left branch $x^L_{i-1}$ to obtain the right branch $x^R_i$ of the output. The left branch $x^L_i$ of the output is a mere copy of $x^R_{i-1}$. This process is illustrated in Fig. 2.3. Notice that the swap operation is not applied in the last round. Round functions are typically constructed from components like S-boxes and linear permutations.

Figure 2.3: One round of a Feistel network
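The round structure described above can be sketched with a toy network on 16-bit branches. The round function, round keys and block size below are hypothetical and have no cryptographic merit; the point is only that decryption works although `round_function` is not invertible, and that it reuses the same network with the round keys reversed.

```python
MASK = 0xFFFF  # 16-bit branches, so a 32-bit toy block


def round_function(key, value):
    """Hypothetical non-invertible F: squaring modulo 2^16 loses information."""
    return (value * value + key) & MASK


def feistel(left, right, round_keys):
    """Apply the rounds in order; the swap is skipped in the last round."""
    for key in round_keys[:-1]:
        left, right = right, left ^ round_function(key, right)
    left ^= round_function(round_keys[-1], right)
    return left, right


def encrypt(block, round_keys):
    return feistel(block[0], block[1], round_keys)


def decrypt(block, round_keys):
    # Same network, round keys in reverse order.
    return feistel(block[0], block[1], round_keys[::-1])


KEYS = [0x1234, 0xBEEF, 0x0F0F, 0xC0DE]   # illustrative round keys
plaintext = (0x0123, 0x4567)
ciphertext = encrypt(plaintext, KEYS)
recovered = decrypt(ciphertext, KEYS)
```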

The most notable block cipher based on a Feistel network is the Data Encryption Standard (DES) [117], which was developed in the 1970s and adopted in 1977 as a Federal Information Processing Standard for the US. DES is vulnerable to brute-force attacks due to its small key size and was withdrawn in 2005.

2.2.2 Modes of Operations

Block ciphers operate on messages of fixed length, e.g., 128-bit strings. In practice, it is more likely that we need to deal with long and variable-length messages. To this end, we introduce the modes of operation for block ciphers. In a mode of operation, the message must be split into blocks that fit the block length of the cipher. Then the block cipher is used to transform plaintext blocks into ciphertext blocks and vice versa. We assume that the length of the message is an integer multiple of the block length. If this is not the case, the last block can be padded by appending bits so that it has the required length.

A naive approach is to apply the block cipher to all the message blocks independently. Then the resulting ciphertext can be decrypted by applying the inverse of the block cipher to all the ciphertext blocks independently. This is called the Electronic Code Book (ECB) mode.

The ECB mode is highly insecure due to the fact that if the message has two blocks with the same value, so will the ciphertext. For this reason, the ECB mode should never be used. The Cipher Block Chaining (CBC) mode has been proposed to realize probabilistic encryption. In this mode, a random initialization vector is introduced and the ciphertext blocks are generated by applying the block cipher to the XOR of the current message block and the previous ciphertext block. Decryption of CBC is done by applying the inverse block cipher followed by an XOR with the previous ciphertext block.
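The difference between the two modes shows up already with a toy cipher. In the sketch below the "block cipher" is a hypothetical key-seeded permutation of 8-bit blocks built with Python's `random` module (not a secure primitive), and the initialization vector is fixed only to keep the example reproducible.

```python
import random


def make_toy_cipher(key):
    """Hypothetical 8-bit block cipher: a key-seeded permutation of 0..255."""
    table = list(range(256))
    random.Random(key).shuffle(table)
    inverse = [0] * 256
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse


def ecb_encrypt(enc, blocks):
    return [enc[m] for m in blocks]


def cbc_encrypt(enc, iv, blocks):
    out, prev = [], iv
    for m in blocks:
        prev = enc[m ^ prev]        # encrypt message block XOR previous ciphertext
        out.append(prev)
    return out


def cbc_decrypt(dec, iv, blocks):
    out, prev = [], iv
    for c in blocks:
        out.append(dec[c] ^ prev)   # inverse cipher, then XOR previous ciphertext
        prev = c
    return out


ENC, DEC = make_toy_cipher(key=2020)
message = [7, 7, 7, 42]             # three identical blocks
ecb = ecb_encrypt(ENC, message)
cbc = cbc_encrypt(ENC, 0x5A, message)
```

In ECB the three equal message blocks produce three equal ciphertext blocks, whereas in CBC each ciphertext block also depends on everything encrypted before it.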

Two other commonly used modes are Output Feedback (OFB) mode and Counter mode. In these two modes, the block cipher is used as a synchronous keystream generator. Hence, more details will be given in Sect. 2.3.

2.3 Stream Ciphers

Modern stream ciphers enjoy efficient implementations and, more importantly, high throughput and extremely high speed compared with block ciphers. This performance makes dedicated stream ciphers the favourite encryption algorithms in our communication infrastructures, such as the 2G and 3G mobile communications and TLS. This section surveys the theory of stream cipher design.

The Vernam cipher is defined on the binary field $\mathbb{F}_2$. A bit string $m_0 m_1 \cdots m_{\ell-1}$ is operated on by a binary key string $k_0 k_1 \cdots k_{\ell-1}$ of the same length to produce a ciphertext string $c_0 c_1 \cdots c_{\ell-1}$, where

$$c_i = m_i \oplus k_i, \quad 0 \le i \le \ell - 1.$$

If the key string is randomly chosen and never used again, the Vernam cipher is called a one-time pad [112]. The one-time pad is unconditionally secure but is impractical for most applications due to its large key length [129].

A stream cipher simulates the one-time pad but has a relatively short key. Stream ciphers can be divided into synchronous and self-synchronous stream ciphers. Only synchronous stream ciphers are considered in this dissertation. A (synchronous) stream cipher with key length $\kappa$ and initial value (IV) of length $n$ consists of an internal state $S$ of $s$ bits, a state initialization function $G : \mathbb{F}_2^\kappa \times \mathbb{F}_2^n \to \mathbb{F}_2^s$, a state update function $F : \mathbb{F}_2^\kappa \times \mathbb{F}_2^s \to \mathbb{F}_2^s$, and an output function $H : \mathbb{F}_2^\kappa \times \mathbb{F}_2^s \to \mathbb{F}_2^b$. For an input $(K, IV)$, assume that the initialization function generates the initial state $S_0$, i.e., $S_0 = G(K, IV)$. Then, for $t \ge 0$, we have

$$z_t = H(K, S_t), \quad S_{t+1} = F(K, S_t),$$

where $z_t$ is output as the keystream bit/word at clock $t$. Finally, the encryption of the message $m_t$ is given by $c_t = m_t \oplus z_t$, which resembles the one-time pad.

Figure 2.4: A stream cipher

Notice that the decryption can be done by first generating the same keystream with the shared secret key and IV and then computing $m_t = c_t \oplus z_t$. A stream cipher is illustrated in Fig. 2.4.

In the remainder of this section, we introduce some common constructions of stream ciphers.

2.3.1 FSR-based Constructions

FSRs are widely exploited in the design of stream ciphers. We briefly introduce two classes of constructions based on LFSRs and NFSRs, respectively.

LFSR-based constructions. Since the 1960s, LFSRs have been popular building blocks for stream cipher design due to their efficiency in both hardware and software. However, they cannot be directly used in the design as the resulting system would be purely linear. To address the problem, many approaches have been proposed to introduce nonlinearity into LFSR-based stream ciphers. There are mainly three types of LFSR-based stream ciphers [112]. The filter generator uses a long LFSR and employs a filtering function to generate keystream bits, while the combination generator exploits several short LFSRs and uses a combining function to generate keystream bits. The filtering or combining function is designed to hide the linear weakness of the LFSR(s), and its design is related to the study of properties of Boolean functions such as resilience, nonlinearity and algebraic immunity [38]. The third type controls the LFSRs by irregular clocking. A notable clock-controlled stream cipher is A5/1 [34], used in GSM networks.

Since the simple filter and combination generators are vulnerable to fast correlation attacks [111] and algebraic attacks [43], the stateless filtering/combining functions have been replaced by Finite State Machines (FSMs). Notable examples include ZUC [62] and SNOW 3G [61] in the 3GPP standard.

NFSR-based constructions. Despite the lack of theoretical foundation, NFSRs have been major components in the design of new stream ciphers, especially hardware-oriented ciphers. Two representative designs in this category are the eSTREAM portfolio ciphers Trivium [48] and the Grain family [81]. Trivium has a state consisting of 288 registers. At each round, the content of three registers is updated by applying three functions to the previous state bits, where each of the functions can be represented by three XORs and one AND operation. The other state bits are updated by simply shifting the whole state one step. Then one keystream bit is produced by XORing six state bits. Remarkably, Trivium uses extremely sparse update functions, which can be implemented with only 3 AND gates and 11 XOR gates. Grain, in contrast, cascades an LFSR with an NFSR of equal length. At each round the feedback bit of the NFSR is XORed with the output of the LFSR. The keystream bit is obtained by applying a sparse nonlinear Boolean function to the whole state. Both Trivium and Grain have very efficient implementations in both hardware and software. Moreover, their structures also allow trade-offs between throughput and area.
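The Trivium round described above can be sketched as a single clocking function. The tap positions below are recalled from the published Trivium specification [48] and should be checked against it before any use; key and IV loading and the warm-up clocks are deliberately omitted, and the list-based state layout (`state[i - 1]` holds $s_i$) is an assumption of this illustration.

```python
def trivium_clock(state):
    """One clock of the Trivium state update on a list of 288 bits.

    state[i - 1] holds s_i. Returns (keystream_bit, next_state).
    Initialization (key/IV loading and warm-up) is not covered here.
    """
    def s(i):
        return state[i - 1]

    t1 = s(66) ^ s(93)
    t2 = s(162) ^ s(177)
    t3 = s(243) ^ s(288)
    z = t1 ^ t2 ^ t3                       # XOR of six state bits
    t1 ^= (s(91) & s(92)) ^ s(171)         # one AND per update function
    t2 ^= (s(175) & s(176)) ^ s(264)
    t3 ^= (s(286) & s(287)) ^ s(69)
    next_state = (
        [t3] + state[0:92]                 # first register:  s_1 .. s_93
        + [t1] + state[93:176]             # second register: s_94 .. s_177
        + [t2] + state[177:287]            # third register:  s_178 .. s_288
    )
    return z, next_state


demo_state = [0] * 288
demo_state[65] = 1                         # only s_66 is set
z_bit, demo_next = trivium_clock(demo_state)
```

Counting operators in the function gives 3 ANDs and 11 XORs per clock, in line with the gate counts quoted in the text.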

2.3.2 Table-based Constructions

RC4 [124] is the first publicly known lookup-table-based stream cipher. The lookup table operations are used in the state update and keystream generation. Due to its remarkable simplicity and speed in software, RC4 has been widely used in cryptographic protocols such as SSL/TLS and WEP. However, the last two decades have seen the discovery of various vulnerabilities in the design. For instance, a weakness of the key scheduling algorithm has been identified and leads to key recovery attacks [69].
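To make the role of the lookup table concrete, the following sketch implements RC4 as it is commonly described in the public literature: a 256-entry table is mixed with the key, and each keystream byte is read from the table after a swap. It is included for illustration only, given the vulnerabilities mentioned above; the function names are assumptions.

```python
def rc4_keystream(key, length):
    """Generate `length` keystream bytes from a byte-string key."""
    # Key scheduling: initialise the 256-entry table and mix in the key.
    table = list(range(256))
    j = 0
    for i in range(256):
        j = (j + table[i] + key[i % len(key)]) % 256
        table[i], table[j] = table[j], table[i]
    # Keystream generation: each step swaps two entries and reads a third.
    out = bytearray()
    i = j = 0
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + table[i]) % 256
        table[i], table[j] = table[j], table[i]
        out.append(table[(table[i] + table[j]) % 256])
    return bytes(out)


def rc4_crypt(key, data):
    """Encrypt or decrypt by XORing the data with the keystream."""
    return bytes(d ^ z for d, z in zip(data, rc4_keystream(key, len(data))))


ciphertext = rc4_crypt(b"Key", b"Plaintext")
```

Since encryption is an XOR with the keystream, calling `rc4_crypt` again with the same key recovers the message.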

There is not much theory on the design or cryptanalysis of table-based stream ciphers. Recent designs include the software-oriented eSTREAM portfolio cipher HC-128 [160] and its extension HC-256 [159].

Figure 2.5: OFB mode

Figure 2.6: Counter mode

2.3.3 Block Ciphers and Permutations in Stream Cipher Modes

The maturity of the theory of block ciphers inspires people to design stream ciphers from block ciphers. Also, the development of cryptographic permutations stimulates new stream cipher designs.

Block ciphers in OFB and counter mode. As illustrated in Figs. 2.5 and 2.6, block ciphers can be used as keystream generators in OFB and counter modes. In OFB mode, the keystream generator is an FSM in which the state has the block length of the cipher and the state updating function consists of encryption with the block cipher for some secret value of the key. In Counter mode, the keystream is the result of applying ECB encryption to an incrementing counter.

Permutations in stream cipher mode. The last decade has witnessed the rise of permutation-based cryptography, i.e., making use of a fixed-length cryptographic permutation instead of a block cipher for hashing, MAC and (plain or authenticated) encryption. The most successful example is the SHA-3 competition winner Keccak [22], which instantiates the sponge construction introduced in [21].

Figure 2.7: Sponge in stream cipher mode (initialization with pad($K \| IV$) on a state of $r + c$ bits, followed by keystream generation of $z_0, z_1, z_2, \ldots$ through iterations of the permutation $f$)

To construct stream ciphers from permutations, the first approach is to insert a cryptographic permutation in counter mode, which closely resembles the counter mode of block ciphers. Notable designs in this category are the Salsa20 and ChaCha families of stream ciphers [18, 17]. The other approach is to use the sponge construction in stream encryption mode, which is shown in Fig. 2.7.

2.4 Authenticated Encryption

Authenticated encryption (AE) is a symmetric cryptographic primitive that simultaneously ensures confidentiality and integrity of messages between the sender and the receiver. Most existing AE schemes also allow authenticating a public string, the associated data, along with the message and are called schemes for AE with associated data (AEAD) [125]. This section focuses on nonce-based Authenticated Encryption with Associated Data (nAEAD).

Let $\mathcal{K}, \mathcal{N}, \mathcal{D}, \mathcal{M}, \mathcal{C}$ and $\mathcal{T}$ be non-empty sets. Assume that an nAEAD scheme $E = (\mathrm{Enc}, \mathrm{Dec})$ is defined over $(\mathcal{K}, \mathcal{N}, \mathcal{D}, \mathcal{M}, \mathcal{C}, \mathcal{T})$, where Enc and Dec are deterministic algorithms. The encryption algorithm Enc takes as input a key $K \in \mathcal{K}$, a nonce $N \in \mathcal{N}$, associated data $D \in \mathcal{D}$ and a message $M \in \mathcal{M}$, and outputs a ciphertext-tag pair $(C, T) \in \mathcal{C} \times \mathcal{T}$. Then the decryption algorithm Dec outputs either the message $M$ that corresponds to $C$, or the bot symbol $\perp$ if the tag is invalid.

A first approach to constructing AE is to combine an encryption algorithm with a MAC [15]. There are many different compositions depending on the order of the encryption and authentication operations. Of the many compositions, the Encrypt-then-MAC (EtM) paradigm, i.e., applying the MAC to the ciphertext to obtain the tag, is preferred when both security and performance are taken into consideration. A security analysis due to Bellare and Namprempre [15] demonstrates that this method of AE is secure provided that the two component schemes are secure.
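The EtM composition and the nAEAD interface (Enc returning a ciphertext-tag pair, Dec returning the message or rejecting) can be sketched with the Python standard library. Everything specific here is an assumption of the illustration: the keystream is a stand-in built from SHA-256 in a counter-like fashion, the MAC is HMAC-SHA-256, the two keys are independent, and `None` plays the role of $\perp$. It is not a vetted scheme and not a construction taken from [15].

```python
import hashlib
import hmac


def _keystream(key, nonce, length):
    """Stand-in keystream: SHA-256 over key, nonce and a block counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def _tag(mac_key, nonce, associated_data, ciphertext):
    # Length prefixes keep (N, D, C) unambiguous inside the MAC input.
    parts = (nonce, associated_data, ciphertext)
    data = b"".join(len(p).to_bytes(8, "big") + p for p in parts)
    return hmac.new(mac_key, data, hashlib.sha256).digest()


def enc(enc_key, mac_key, nonce, associated_data, message):
    stream = _keystream(enc_key, nonce, len(message))
    ciphertext = bytes(m ^ z for m, z in zip(message, stream))
    return ciphertext, _tag(mac_key, nonce, associated_data, ciphertext)


def dec(enc_key, mac_key, nonce, associated_data, ciphertext, tag):
    expected = _tag(mac_key, nonce, associated_data, ciphertext)
    if not hmac.compare_digest(expected, tag):
        return None                      # None stands for the rejection symbol
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ z for c, z in zip(ciphertext, stream))


K_ENC, K_MAC = b"\x01" * 32, b"\x02" * 32   # illustrative independent keys
NONCE, HEADER, MESSAGE = b"nonce-0001", b"header", b"attack at dawn"
C, T = enc(K_ENC, K_MAC, NONCE, HEADER, MESSAGE)
```

The tag is verified before any decryption takes place, so a modified ciphertext or modified associated data is rejected without releasing plaintext.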

Galois Counter Mode (GCM) [109] follows the EtM paradigm: it combines a block cipher (e.g., AES) in counter mode for message encryption with a Wegman-Carter MAC [155] for authentication. However, GCM uses a single key to derive keys for encryption and authentication, while the generic EtM construction requires independently chosen keys for encryption and authentication. GCM has been standardized by NIST and IETF for use in protocols such as IPsec, TLS 1.2 and its successor TLS 1.3.

In 2013, the Competition for Authenticated Encryption: Security, Applicability and Robustness (CAESAR) was announced [1]. The CAESAR competition aimed to address the tension between security and the performance requirements of various applications. In particular, one of the main goals was to improve on the applicability and robustness of GCM and CCM [156]. In the call for submissions, the purpose was set to “identify a portfolio of authenticated ciphers that offer advantages over AES-GCM and are suitable for widespread adoption” [19]. Hence, AES-GCM is used as a reference that ought to be surpassed by the CAESAR candidates. Out of 57 submissions to the first round of CAESAR, only seven were chosen as finalists, as shown in Table 2.2.

Table 2.2: Table of CAESAR competition finalists

| Candidate | Type                   | Primitive | Use case † | Ref.  |
|-----------|------------------------|-----------|------------|-------|
| Ascon     | Sponge                 | SPN       | 1          | [55]  |
| Acorn     | Stream cipher          | FSR       | 1          | [161] |
| AEGIS-128 | Block cipher mode      | AES       | 2          | [163] |
| OCB       | Block cipher mode      | AES       | 2          | [93]  |
| Deoxys-II | Tweakable block cipher | AES       | 3          | [85]  |
| COLM      | Block cipher mode      | AES       | 3          | [9]   |
| Morus     | Stream cipher          | LRX       | - ‡        | [162] |

† Use cases: (1) Lightweight applications (resource constrained environments); (2) High-performance applications; (3) Defense in depth.

‡ Removed from the final portfolio.

Since August 2018, NIST has been running a process to evaluate and standardize lightweight cryptographic algorithms that are suitable for use in constrained environments where the performance of current NIST cryptographic standards is not acceptable. The submissions should propose an algorithm, or a collection of algorithms, that implements the AEAD functionality. The first-round submissions are online; more details can be found in [118].

2.5 Intrusion Detection System over Encrypted Traffic

This section presents some background on cryptographic protocols enabling intrusion detection over encrypted traffic. We give a brief description of the architecture of the system we work on. Moreover, we describe the security and privacy threats and goals.

2.5.1 Architecture

We work on an Intrusion Detection System (IDS) based on Deep Packet Inspection (DPI), which extracts the content of network packets and compares it against a set of detection signatures. We consider the following actors in the system (as in [36] and [131]):

• the Security Editor, denoted SE, who generates and maintains a list of rules that contain malware signatures;

• the Service Provider, denoted SP, who searches for intrusions in the traffic, using the rules provided by the SE;

• a sender S and a receiver R, who send and receive messages over the Internet, respectively.

The SE role is performed by security companies such as McAfee and Symantec. The detection rules are the main assets of the SE and hence cannot be sent to the SP in cleartext. The SP, i.e., the middlebox in [131], provides both physical and cloud-based services such as proxies.

We briefly sketch the main procedures of an IDS over encrypted traffic in the following.

• Setup generates the keys of the different actors.

• RuleGen, on input the secret key of SE and a set $\mathcal{M}$ of rules, outputs a set $B$ of blinded rules that are then sent to SP.

• Send takes as input the secret key of S, potentially the public key of the receiver, and the traffic $T$. It outputs the encrypted traffic $E$, which is then sent to SP.

• Detect, on input the secret key of SP, the encrypted traffic $E$ and the set $B$ of blinded rules from SE, searches for matches between the encrypted traffic and the blinded rules. If a match is found, then the traffic is marked as malicious and SP outputs an error message $\perp$. Otherwise, SP forwards to R the encrypted traffic $E$ and auxiliary information aux.

• Receive takes as input the receiver's secret key, the encrypted traffic $E$ and optionally some additional auxiliary information aux coming from SP. It outputs the plain traffic $T$, or an error message $\perp$.

Figure 2.8: IDS over encrypted traffic

The actors and main steps are depicted in Fig. 2.8.

2.5.2 Threats and Goals

We now consider the threats and security goals for the target system.

The SP is assumed to be honest-but-curious. That is, it operates the DPI honestly but attempts to extract information about either the passing encrypted traffic or the SE's rules. This setting, in which the SP is an attacker, differs from the “man-in-the-middle” approach in most deployments today [131].

We also use the honest-but-curious paradigm for the SE, as all the rules are considered to be authentic malicious patterns. But similarly to the SP, the SE may try to acquire information about the content of the traffic through direct eavesdropping over the network [36].

We do not, however, consider the case where the SP and the SE collude, since in this case they can mount a dictionary attack. Specifically, the SE can encrypt any guess of the traffic with RuleGen. Then the detection functionality allows the SP to check if there is a match between the encrypted guess and the encrypted traffic. Therefore, the colluding SP and SE can recover the cleartext of the encrypted traffic.

We do not consider a coalition between a sender and a receiver since, as in unencrypted traffic, they can easily agree on any shared secret key and any encryption algorithm to add a layer of encryption so that detection becomes infeasible.

Now we can set three main goals of the system:

Correctness. Any malicious traffic must be detected by the SP. That is, the IDS over encrypted traffic should be as good as that over plain traffic.

Traffic indistinguishability. It is not feasible for unauthorized actors to learn any information about the traffic, other than whether it is malicious or safe.

Rule indistinguishability. It is not feasible for anyone except SE to learn any information about the rules, other than the matches between the rules and malicious traffic.

Chapter 3

Security Analysis

Cryptographers seldom sleep well. Their careers are frequently based on very precise complexity-theoretic assumptions, which could be shattered the next morning.

-Joe Kilian

This chapter covers basics on reduction proofs in protocol design and common cryptanalytic methods in symmetric cryptography. We first summarize some basic concepts of security analysis. Prior to introducing specific cryptanalytic methods, we sketch the main idea of security proofs by reduction. Moreover, we introduce some common cryptographic primitives by giving formal definitions. Then we show the main statistical and algebraic cryptanalysis techniques against symmetric encryption algorithms and present some recent progress in automated cryptanalytic tools.

3.1 Basic Concepts of Security Analysis

This section introduces some basic concepts of security analysis of cryptographic primitives and protocols. We then classify the attacks in terms of the goals and power of the adversaries.


3.1.1 Approaches for Security Analysis

We present three approaches for security evaluation: the information theoretic, the complexity theoretic and the practical approaches. These approaches differ in the assumptions about the power of an opponent and in the notion of security.

From the viewpoint of the designer, the most desirable measure is unconditional security. This measure follows the information theoretic approach, which was developed in the seminal work of Shannon [129]. It can offer security even if the opponent has unlimited computational power. Notably, there exist schemes that are secure in this model, such as the one-time pad in Sect. 2.3. However, Shannon [129] showed that for any unconditionally secure encryption scheme the key should be at least as long as the plaintext, which may be prohibitively large in practice. To obtain more practical notions, a common strategy is to restrict the computational resources of the adversary. A cryptosystem is said to be computationally secure if the computing power required to defeat it (using the best attack known) exceeds a given margin. The following two approaches represent two sides of this paradigm.

The complexity theoretic approach reduces the security of a cryptographic primitive or protocol to that of other well-known difficult problems or to that of other primitives. This approach starts from an abstract model of computation and assumes that the opponent has limited computing power within this model. Cryptosystems of this type are sometimes said to be provably secure, but it should be pointed out that this approach only provides a reduction proof of security relative to other problems, not an absolute security proof.

The complexity theoretic approach enables us to design high-level primitives and protocols in a scientific fashion. However, so far it has not yielded efficient constructions of atomic primitives, such as one-way functions and pseudorandom functions, which cannot be reduced to other primitives. In fact, modern cryptography still relies heavily on the practical approach. This approach tries to produce concrete and practical solutions for the atomic primitives. The security margin of the proposed primitives is based on the best known attack on the algorithm. As a result of the practice over the past decades, several cryptanalytic principles have emerged. For new designs, it has become a basic requirement to provide evidence of security against attacks based on these principles. It seems likely that for the coming years cryptographers will still need to rely on the security of some concrete primitives such as AES and SHA-3.

A shortcoming of the practical approach is that we can say nothing about unknown attacks. Nowadays, it is common to publish a new design of a cryptographic algorithm together with the designers’ security analysis at a conference, in a journal or just online. The community then conducts an open and independent evaluation. This turns out to be the best way to guarantee that the algorithm is as strong as claimed.

This dissertation mainly focuses on the practical approach and also exploits the complexity theoretic approach in the design of protocols. It is worth noting that we exclude quantum and implementation attacks, i.e., we only consider the classical computing model and assume only black-box access to the cryptographic primitives and protocols.

3.1.2 Attack Goals and Models

The general assumption is that a cryptosystem should be secure even if everything about the system, except the secret key, is public. This is usually referred to as Kerckhoffs’ principle.

Shortcut attacks. From the perspective of practical security, an attack is a technique that achieves a certain goal with a feasible amount of computing resources. However, such a notion would heavily depend on available technology. The following notion does not depend on technology and has become standard in the cryptanalytic community: a shortcut attack is a technique that achieves a certain goal in a certain scenario with fewer resources than the best generic technique would require to achieve the same goal in the same scenario. A generic technique is a method that does not exploit any internal properties of the target algorithm. The computational complexities of shortcut attacks are measured in terms of the computing resources needed. The data complexity is determined by the number of plaintext/ciphertext pairs required in the attack. The time and memory complexities depend on the number of operations performed and the cost of memory elements and memory accesses in the attack. The full cost of attacks is discussed in [157].

Moreover, the word “attack” is also used when considering reduced-round variants of target algorithms. Whether an attack is relevant or not heavily depends on how much the target primitive is reduced, on the attack complexity, and on the attack scenario which is required.

Attack goals. From the viewpoint of the attacker, the most demanding task is a key recovery attack, which aims to recover the secret key or information equivalent to it. This will allow the attacker to decrypt any specific target ciphertext under the same key. A generic key recovery attack is the brute-force attack. If we are given one plaintext/ciphertext pair or some keystream bits, then we can iterate through all the possible keys to see which key was used during encryption. For a cipher with an $n$-bit key, in the worst case, we will find the correct key after trying all $2^n$ different keys.
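The brute-force search described above can be sketched in a few lines. The cipher `toy_encrypt` below is a hypothetical 16-bit construction invented only for this illustration (it is not secure and is not taken from this thesis); the key size is kept tiny so that all $2^{16}$ keys can be enumerated quickly.

```python
def toy_encrypt(key: int, block: int) -> int:
    """Hypothetical 16-bit toy cipher (illustration only, NOT secure)."""
    x = block & 0xFFFF
    for r in range(4):
        x = (x + key + r) & 0xFFFF              # add key and round counter
        x = ((x << 3) | (x >> 13)) & 0xFFFF     # rotate left by 3 bits
        x ^= key                                # XOR with the key
    return x


def brute_force(pairs, key_bits=16):
    """Return every key consistent with all known plaintext/ciphertext pairs."""
    return [k for k in range(1 << key_bits)
            if all(toy_encrypt(k, p) == c for p, c in pairs)]


secret = 0xBEEF
pairs = [(p, toy_encrypt(secret, p)) for p in (0x0000, 0x1234)]
candidates = brute_force(pairs)   # the secret key is among the survivors
```

Using more than one known pair helps to filter out wrong keys that happen to match a single pair by chance.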

A weaker attack is the message recovery attack, in which an adversary is given the ciphertext of a random message, and the goal is to recover the message or partial information on the message from the ciphertext.

In a distinguishing attack, the adversary will not retrieve the secret key or recover messages. Instead, the goal is to distinguish the cipher from a random function by querying the encryption algorithm. Any key recovery attack is also a distinguishing attack. A distinguishing attack is obviously much less powerful than a key recovery attack. As a result, a distinguishing attack can have much smaller complexities than a key recovery attack. Nevertheless, distinguishing attacks can sometimes be converted into message recovery or even key recovery attacks.

To assess the security of a cryptographic algorithm, we should consider different attack models as well. These models specify the power of the adversary. We now present several well-known attack models. We can, of course, consider any combination of them.

Ciphertext-only attack (COA). The adversary is completely passive and has access to ciphertexts and possibly to some statistical information on the plaintext.

Known-plaintext attack (KPA). The adversary learns some plaintext/ciphertext pairs encrypted under the same key.

Chosen-plaintext attack (CPA). The adversary can play with the encryption device and send appropriately chosen plaintexts, and get the corresponding ciphertexts in return.

Chosen-ciphertext attack (CCA). The adversary can play with the decryption device, and thus decrypt any chosen ciphertext.

Chosen-IV attacks. If the cryptosystem takes an IV as input, then the adversary has control of the plaintext as well as the IV. That is, the adversary can play with the encryption device and send appropriately chosen IVs and plaintexts, and get the corresponding ciphertexts in return.

It is worth pointing out that we assume that the different plaintext/ciphertext pairs are obtained under the same key. In other words, we consider the single-key setting.


Other attack scenarios have been investigated as well. As an example, in a related-key attack it is also assumed that the adversary knows that two different keys are related to each other in some known way, e.g., some bits are the same. The related-key setting is a real concern for cryptographic hash functions based on block ciphers.

3.2 Provable Security

This section focuses on the security analysis of protocols under the assumption that good cryptographic primitives exist. The protocols are built upon computationally secure cryptographic primitives. Our task is to prove that the protocols inherit the strength of the primitives. This approach follows the provable security paradigm [73].

3.2.1 Proof by Reduction

The provable security paradigm is sketched as follows. Firstly, one takes some security goal, such as achieving confidentiality via symmetric encryption. The next step is to make a formal adversarial model and define what it means for an encryption scheme to be secure. The definition explains exactly in which cases the adversary succeeds. With a definition in hand, a particular protocol can be put forward based on some basic primitives. Then one can analyze the protocol from the viewpoint of meeting the definition. The plan is to prove security via a reduction. A reduction shows that the only way to defeat the protocol is to break the underlying cryptographic primitive. Thus we will also need formal definitions of the security of cryptographic primitives.

A reduction proof indicates that if the primitive does the job it is supposed to do, so does the protocol based on the primitive. In other words, if we assume that the underlying primitives are secure under a definition, then we immediately know that the protocol is secure as well for a specific attack model and for a specific attack goal. In this framework, it is no longer necessary to analyze the protocol: if you were to find a weakness in it, you would have discovered one in the underlying primitive. Hence, it suffices to analyze the primitive.

3.2.2 Security Games

The security of a cryptographic scheme can be defined as an attack game played between an adversary and a challenger. To be specific, an adversary $A$ is a randomized and stateful algorithm with access to an oracle $O$. The oracle usually represents the cryptographic algorithm under analysis.

Typically, the security is defined in terms of some particular event $S$. In event-based games, the security is tied to certain events defined with respect to the transcript generated from the oracle interaction. In this case, the adversarial success probability, or the adversary’s advantage, is measured as the probability that the event occurs.

The indistinguishability games are of special interest. In such a game, the challenger chooses one of two algorithms $f_0$ and $f_1$ at random. The adversary is given access to an oracle which is the algorithm chosen by the challenger. The adversary needs to determine which of the two algorithms it is interacting with. The indistinguishability advantage of adversary $A$ in distinguishing algorithm $f_0$ from $f_1$ is

$$\mathrm{Adv}^{f_0,f_1}_{A} = \left| \Pr\left[A^{f_0} = 1\right] - \Pr\left[A^{f_1} = 1\right] \right| ,$$

where the equation $A^{O} = 1$ denotes the event that $A$ outputs 1 when interacting with oracle $O$. The probabilities are defined over the probability spaces of $A$ and $O$.
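As a rough illustration of how such an advantage can be estimated empirically, the following sketch plays the game many times for two toy oracles. Everything here is an assumption made for the example: $f_0$ is a lazily sampled random function on 8-bit values whose output always has its least significant bit cleared, $f_1$ is an unrestricted random function, and the adversary outputs 1 when all of its four answers have a zero least significant bit. The exact advantage of this adversary is $1 - 2^{-4}$.

```python
import random


def make_random_function(rng, n_bits=8):
    """Lazily sampled random function with n_bits of output."""
    table = {}

    def f(x):
        if x not in table:
            table[x] = rng.getrandbits(n_bits)
        return table[x]

    return f


def make_biased_function(rng, n_bits=8):
    """Random function whose least significant output bit is always 0."""
    g = make_random_function(rng, n_bits)
    return lambda x: g(x) & ~1


def adversary(oracle, queries=4):
    """Output 1 if every queried output has least significant bit 0."""
    return int(all((oracle(x) & 1) == 0 for x in range(queries)))


def estimate_advantage(trials=2000, seed=1):
    rng = random.Random(seed)
    pr_f0 = sum(adversary(make_biased_function(rng)) for _ in range(trials)) / trials
    pr_f1 = sum(adversary(make_random_function(rng)) for _ in range(trials)) / trials
    return abs(pr_f0 - pr_f1)


advantage = estimate_advantage()   # close to 1 - 2**-4
```

A fresh oracle is sampled in every trial, which mirrors the fact that the probabilities are taken over the randomness of both the adversary and the oracle.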

3.2.3 Cryptographic Primitives

Now we give the definitions of some cryptographic primitives which are useful in reduction proofs. We write $s \xleftarrow{\$} S$ to denote that $s$ is chosen uniformly at random from the set $S$. Let $X$ and $Y$ be two sets. Let $\mathrm{Func}(X,Y)$ and $\mathrm{Perm}(X)$ denote the set of functions from $X$ to $Y$ and the set of permutations on $X$, respectively.

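Pseudorandom functions. Let $s$, $\ell$ and $n$ be positive integers. A keyed function $F : \mathbb{F}_2^s \times \mathbb{F}_2^\ell \to \mathbb{F}_2^n$ is a pseudorandom function (PRF) if a computationally bounded adversary against $F$ is not able to distinguish the function $F(K,\cdot)$ with $K \xleftarrow{\$} \mathbb{F}_2^s$ from a function chosen uniformly at random from $\mathrm{Func}(\mathbb{F}_2^\ell, \mathbb{F}_2^n)$.

The PRF-advantage of an adversary $A$ against $F$ is

$$\mathrm{Adv}^{\mathrm{prf}}_{F}(A) = \left| \Pr\left[A^{F(K,\cdot)} = 1\right] - \Pr\left[A^{f} = 1\right] \right| ,$$

where $K \xleftarrow{\$} \mathbb{F}_2^s$ and $f \xleftarrow{\$} \mathrm{Func}(\mathbb{F}_2^\ell, \mathbb{F}_2^n)$. In the first experiment, $A$ is given access to an oracle which, on input a message $M$, outputs $F(K,M)$; in the second experiment, the oracle outputs $f(M)$.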


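Pseudorandom permutations. A function $\pi : \mathbb{F}_2^s \times \mathbb{F}_2^\ell \to \mathbb{F}_2^\ell$ is a pseudorandom permutation (PRP) if a computationally bounded adversary against $\pi$ is not able to distinguish the permutation $\pi(K,\cdot)$ with $K \xleftarrow{\$} \mathbb{F}_2^s$ from a permutation chosen uniformly at random from $\mathrm{Perm}(\mathbb{F}_2^\ell)$.

Assume that $K \xleftarrow{\$} \mathbb{F}_2^s$ and $f_\ell \xleftarrow{\$} \mathrm{Perm}(\mathbb{F}_2^\ell)$. The PRP-advantage of an adversary $A$ against $\pi$ is

$$\mathrm{Adv}^{\mathrm{prp}}_{\pi}(A) = \left| \Pr\left[A^{\pi(K,\cdot)} = 1\right] - \Pr\left[A^{f_\ell} = 1\right] \right| ,$$

where, in the first experiment, $A$ is given access to an oracle which, on input a message $M$, outputs $\pi(K,M)$; in the second experiment, the oracle outputs $f_\ell(M)$.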

The above definition only gives the adversaries access to the “forward” oracle. In contrast, the following stronger definition gives adversaries access to the inverse as well. With the above notation, the strong PRP-advantage of an adversary $A$ against $\pi$ is

$$\mathrm{Adv}^{\mathrm{sprp}}_{\pi}(A) = \left| \Pr\left[A^{\pi(K,\cdot),\,\pi^{-1}(K,\cdot)} = 1\right] - \Pr\left[A^{f_\ell,\,f_\ell^{-1}} = 1\right] \right| ,$$

where, in the first experiment, $A$ is given access to two oracles $\pi(K,\cdot)$ and $\pi^{-1}(K,\cdot)$. The first one, on input a message $M$, outputs $\pi(K,M)$ and the second one, on input an $\ell$-bit string $C$, outputs $\pi^{-1}(K,C)$. In the second experiment, the two oracles are $f_\ell$ and $f_\ell^{-1}$.

Cryptographic hash function. A cryptographic hash function $H : \mathbb{F}_2^* \to \mathbb{F}_2^\ell$ is an algorithm that maps arbitrarily long bit strings to digests of fixed length $\ell$. Informally speaking, the hash function $H$ is cryptographically secure if it satisfies the following properties:

Collision resistance. It is hard to find two different messages $x, x'$ such that $H(x) = H(x')$.

Preimage resistance. Given an image $y \in \mathbb{F}_2^\ell$, it is hard to find a message $x$ such that $H(x) = y$.

Second preimage resistance. Given a message $x$, it is hard to find a different message $x'$ such that $H(x) = H(x')$.

Formal definitions of the above properties can be found in [113].
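To make the collision resistance requirement concrete, the sketch below runs the generic birthday search against a deliberately weakened hash: SHA-256 truncated to 24 bits. The truncation length and the helper names `truncated_hash` and `find_collision` are choices made for this example only. For an $\ell$-bit digest, a collision is expected after roughly $2^{\ell/2}$ evaluations, which is why short digests cannot be collision resistant.

```python
import hashlib
from itertools import count


def truncated_hash(message: bytes, n_bits: int = 24) -> int:
    """Top n_bits of SHA-256, used here as a deliberately short digest."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - n_bits)


def find_collision(n_bits: int = 24):
    """Birthday search: hash distinct messages until two digests coincide."""
    seen = {}
    for i in count():
        m = i.to_bytes(8, "big")
        h = truncated_hash(m, n_bits)
        if h in seen:
            return seen[h], m, i + 1     # two messages and the number of hashes
        seen[h] = m


m1, m2, evaluations = find_collision(24)
```

The search stores every digest it has seen, so memory grows with the number of evaluations; memoryless variants exist but are outside the scope of this sketch.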


3.3 Differential and Linear Cryptanalysis

This section gives an overview of the key ideas of differential cryptanalysis and linear cryptanalysis. Resistance against these two attacks has been one of the most important criteria in the design of symmetric key primitives.

3.3.1 Differential Cryptanalysis

Differential cryptanalysis was introduced by Biham and Shamir, who first applied it to analyze DES and FEAL in the late 1980s [25]. Differential cryptanalysis is a chosen-plaintext attack. The main idea of this technique is to consider specific input differences which lead to specific output differences with high probability.

Given $\Delta x \in \mathbb{F}_2^n$ and $\Delta y \in \mathbb{F}_2^m$, the pair $(\Delta x, \Delta y)$ is called a differential. For a function $F$ from $\mathbb{F}_2^n$ to $\mathbb{F}_2^m$, the probability of the differential $(\Delta x, \Delta y)$ over $F$ is defined by

$$\mathrm{DP}(\Delta x, \Delta y) = \Pr_{x}\left[F(x \oplus \Delta x) \oplus F(x) = \Delta y\right] ,$$

where $x$ is uniformly chosen from $\mathbb{F}_2^n$. We define the expected differential probability (EDP) of a differential $(\Delta x, \Delta y)$ over a keyed function as the average of the differential probability $\mathrm{DP}(\Delta x, \Delta y)$ over all keys. For any block cipher $E$, one can construct differential distinguishers of $E$ by identifying differentials with high EDP. A differential distinguisher can be used to mount key recovery attacks.
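For small functions such as S-boxes, the differential probability can be computed exhaustively. The sketch below does this for a 4-bit S-box and also builds the full table of difference counts (often called the difference distribution table). The lookup table is the S-box of the PRESENT block cipher, picked only as a convenient example; the function names are illustrative.

```python
# 4-bit S-box used as an example (the PRESENT S-box).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def differential_probability(sbox, dx, dy):
    """DP(dx, dy): fraction of inputs x with S(x ^ dx) ^ S(x) == dy."""
    hits = sum(1 for x in range(len(sbox)) if (sbox[x ^ dx] ^ sbox[x]) == dy)
    return hits / len(sbox)


def difference_distribution_table(sbox):
    """table[dx][dy] = number of inputs x with S(x ^ dx) ^ S(x) == dy."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for dx in range(size):
        for x in range(size):
            table[dx][sbox[x ^ dx] ^ sbox[x]] += 1
    return table


ddt = difference_distribution_table(SBOX)
# Highest probability over all differentials with a nonzero input difference.
best = max(ddt[dx][dy] for dx in range(1, 16) for dy in range(16)) / 16
```

Searching such a table for high-probability entries is the usual starting point when building one-round differentials.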

Now we briefly present a differential attack on an $n$-bit block cipher. We first construct an $r$-round differential distinguisher. A common strategy is to search for an $r$-round differential characteristic with high probability. We first identify one-round differentials $(\Delta x_i, \Delta x_{i+1})$ with high probability for the $i$-th round function for $0 \le i \le r-1$. Then we can create a high-probability $r$-round differential characteristic $(\Delta x_0, \Delta x_1, \cdots, \Delta x_r)$ by concatenating the $r$ one-round differentials. Assume that the probabilities of the differentials are independent for different rounds. Then the $r$-round differential characteristic $(\Delta x_0, \Delta x_1, \cdots, \Delta x_r)$ occurs with probability $p = \prod_{i=0}^{r-1} p_i$, where $p_i$ indicates the probability of the differential for round $i$.

The $r$-round differential characteristic enables us to attack the block cipher with $(r+1)$ rounds. Assume that the attacker has access to the ciphertexts of sufficiently many plaintext pairs with difference $\Delta x_0$. For each plaintext pair, by exhaustively guessing the last round key, the attacker can decrypt the ciphertext pair one round back to get the states after $r$ rounds, and check if their difference is $\Delta x_r$. Consequently, with high probability, the right key is the one that yields the most matches. To ensure a high success probability of key recovery, a good rule of thumb for the number of chosen plaintexts is $c/p$, where $c$ is a small constant.
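The arithmetic behind the characteristic probability and the $c/p$ rule of thumb is simple enough to check directly. In the sketch below, the three one-round probabilities and the constant $c = 4$ are made-up values for illustration and do not refer to any real cipher.

```python
from math import log2, prod


def characteristic_probability(round_probs):
    """Probability of a characteristic, assuming independent rounds."""
    return prod(round_probs)


def chosen_plaintexts_needed(p, c=4):
    """Rule-of-thumb data complexity c / p for a differential attack."""
    return c / p


round_probs = [2**-2, 2**-2, 2**-3]      # hypothetical one-round probabilities
p = characteristic_probability(round_probs)
data = chosen_plaintexts_needed(p, c=4)
print(f"p = 2^{log2(p):.0f}, data = 2^{log2(data):.0f} chosen plaintexts")
```

Since the probabilities multiply, each additional round with probability $2^{-k}$ raises the data requirement by a factor of $2^{k}$.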

After the publication of the original differential attack, many improvements and variants have been proposed. For instance, a differential with probability zero can also be used as a distinguisher. This is leveraged in the impossible differential attack [90, 24]. Other variants of differential attacks include the higher order differential attack [94, 89] (see Sect. 3.4.2), the truncated differential attack [89] and the boomerang attack [150].

3.3.2 Linear Cryptanalysis

Linear cryptanalysis was developed by Matsui in 1993 [108]. Linear cryptanalysis attempts to take advantage of linear relationships between the input, output and key bits that hold with higher bias than would be expected for a random function. A noticeable advantage of linear cryptanalysis over differential cryptanalysis is that it only requires known plaintexts rather than chosen plaintexts. Thus, linear cryptanalysis is more practical than differential cryptanalysis in terms of attack scenario.

The crucial step of linear cryptanalysis is to examine highly biased linear expressions involving plaintext bits, ciphertext bits, and key bits. Here the linearity refers to the bit-wise XOR operation. Then a statistical model is used for constructing distinguishers or mounting key recovery attacks. Let $x \cdot y$ denote the inner product of two vectors $x, y \in \mathbb{F}_2^n$, i.e., $x \cdot y = \bigoplus_{i=0}^{n-1} x_i y_i$. Let $P$, $C$ and $K$ be the plaintext, ciphertext and key, respectively. A linear approximation of a cryptographic algorithm is a linear expression of the form

$$\Gamma_P \cdot P \oplus \Gamma_C \cdot C = \Gamma_K \cdot K , \tag{3.1}$$

where the binary vectors $\Gamma_P$, $\Gamma_C$, $\Gamma_K$ represent the masks of the plaintext, ciphertext and key, respectively.

To measure the effectiveness of a linear approximation, we introduce the concepts of bias and the related correlation. The bias $\varepsilon$ of the linear approximation in Eq. (3.1) is defined as

$$\varepsilon = \Pr\left[\Gamma_P \cdot P \oplus \Gamma_C \cdot C = \Gamma_K \cdot K\right] - \frac{1}{2} ,$$

where the plaintext $P$ is chosen uniformly at random from the message space and $C$ is the corresponding ciphertext. The correlation of the linear approximation is defined to be $2\varepsilon$. Both concepts are frequently used in the literature.
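For an unkeyed component such as an S-box, the bias and correlation of an approximation can be computed by exhaustive counting. The sketch below treats the S-box input as the "plaintext" and its output as the "ciphertext", with the key term taken to be 0 (an assumption of this example, since no key is involved). The lookup table is the PRESENT S-box, used only as an example, and the function names are illustrative.

```python
# 4-bit S-box used as an example (the PRESENT S-box).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def dot(mask: int, value: int) -> int:
    """Inner product over F_2 of two bit vectors packed into integers."""
    return bin(mask & value).count("1") & 1


def bias(sbox, in_mask, out_mask):
    """Pr[in_mask . x == out_mask . S(x)] - 1/2 over uniform x."""
    agree = sum(1 for x in range(len(sbox))
                if dot(in_mask, x) == dot(out_mask, sbox[x]))
    return agree / len(sbox) - 0.5


def correlation(sbox, in_mask, out_mask):
    """Correlation is twice the bias."""
    return 2 * bias(sbox, in_mask, out_mask)


# Largest absolute bias over all approximations with a nonzero output mask.
best_bias = max(abs(bias(SBOX, a, b)) for a in range(16) for b in range(1, 16))
```

Because the S-box is a permutation, every approximation with a zero input mask and a nonzero output mask has bias exactly 0.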


By checking the value of $\Gamma_P \cdot P \oplus \Gamma_C \cdot C$ in Eq. (3.1) for a large number of plaintext/ciphertext pairs, the value $\Gamma_K \cdot K$ can be guessed by taking the value that occurs most often. This gives a single bit of information about the key $K$. It has been shown in [108] that the success probability is high if the number of plaintext/ciphertext pairs is larger than $1/\varepsilon^2$.

Many variants of linear cryptanalysis have been proposed in the last two decades, including multiple linear cryptanalysis [26] and zero-correlation linear cryptanalysis [28]. Combining linear with differential cryptanalysis gives differential-linear cryptanalysis [95].

Correlation of linear trails

Similar to differential cryptanalysis, we construct the linear approximation of a cipher by combining those of the round functions. A series of masks $(\Gamma_i, \Gamma_{i+1}, \Gamma'_i)$ is called a linear trail (or linear characteristic), where $\Gamma_i$, $\Gamma_{i+1}$ and $\Gamma'_i$ are the input mask, output mask and key mask of round $i$, respectively. Then the correlation of a linear trail can be computed from that of the linear approximation of each round. In general, we define the correlation of a random binary variable $X$ by $\mathrm{cor}(X) = 2\Pr[X = 0] - 1$. Then we have the following lemma.

Lemma 2. (Piling-up lemma [108]) For $n$ independent random binary variables $X_1, X_2, \cdots, X_n$, let $X = X_1 \oplus X_2 \oplus \cdots \oplus X_n$. Then we have $\mathrm{cor}(X) = \prod_{i=1}^{n} \mathrm{cor}(X_i)$.

The Piling-up lemma enables us to compute the correlation (or equivalently, bias) of a linear trail by chaining linear approximations, provided that the linear approximations are independent.
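The lemma can be checked numerically on a small example. The sketch below computes the correlation of an XOR of independent bits exactly, by enumerating all outcomes, and compares it with the product of the individual correlations. The three probabilities are arbitrary values chosen for the illustration.

```python
from itertools import product


def cor_of_xor(p_zero):
    """Exact cor(X1 ^ ... ^ Xn) for independent bits with Pr[Xi = 0] = p_zero[i]."""
    pr_xor_zero = 0.0
    for bits in product((0, 1), repeat=len(p_zero)):
        pr = 1.0
        for b, p0 in zip(bits, p_zero):
            pr *= p0 if b == 0 else 1.0 - p0
        if sum(bits) % 2 == 0:          # the XOR of the bits is 0
            pr_xor_zero += pr
    return 2.0 * pr_xor_zero - 1.0


def piling_up(p_zero):
    """Product of the individual correlations 2 * Pr[Xi = 0] - 1."""
    result = 1.0
    for p0 in p_zero:
        result *= 2.0 * p0 - 1.0
    return result


probs = [0.75, 0.625, 0.5625]           # correlations 1/2, 1/4, 1/8
lhs, rhs = cor_of_xor(probs), piling_up(probs)
```

Both sides evaluate to $1/64$ here, showing how quickly the correlation of a trail shrinks as approximations are chained.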

3.4 Algebraic Cryptanalysis

This section is devoted to algebraic cryptanalytic methods. We first recall the techniques of algebraic attacks. Then the higher order differential attack and the related degree evaluation methods are introduced. Next, we briefly summarize the original and optimized interpolation attacks. Finally, we describe the essential idea of cube attacks and survey some recent improvements.


3.4.1 Algebraic Attacks

The main idea of algebraic attacks is to deduce the secret key by solving nonlinear equations involving plaintext, ciphertext and key bits. Note that the equations are deterministic, rather than probabilistic as in linear attacks. This idea dates back to the seminal work of Shannon [129]. Due to the hardness of solving general nonlinear equations, most ciphers are naturally immune to trivial algebraic attacks. To date, the most successful application of algebraic attacks is the attack on LFSR-based stream ciphers.

A typical algebraic attack has two phases: i) first we derive sufficiently many low-degree or structured multivariate nonlinear equations, ii) then we recover key bits by solving the equations. Notice that the first step only needs to be done once for a cipher. The most frequently exploited equation-solving methods include linearization, Gröbner bases and the guess-and-determine strategy. In practice, automated tools including SAT solvers can also be used if the number of equations is not too large (see also Sect. 3.5). In the sequel, we briefly introduce the common methods of solving equations.

Linearization. The main idea is to solve the resulting system of multivariate nonlinear equations by substituting the nonlinear terms with new variables. Specifically, each nonlinear monomial is replaced by a new variable. Then the resulting new system is purely linear and can be solved using Gaussian elimination. However, it is difficult to estimate the number of linearly independent equations of the linearized system. The linearization technique has some extensions such as the eXtended Linearization (XL) method [42]. A notable application of the linearization and XL methods is the algebraic attack on LFSR-based stream ciphers [43].
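A minimal sketch of linearization on a toy system is given below. It assumes randomly generated quadratic equations over $\mathbb{F}_2$ in three secret bits, which is a stand-in for the equations an attacker would derive from a real cipher. Each monomial of degree at most 2 becomes one new variable, and the linearized system is solved by Gauss-Jordan elimination. All names are illustrative, and the solver assumes a consistent system.

```python
import random
from itertools import combinations


def monomials(n_vars, max_degree=2):
    """Non-constant monomials of degree <= max_degree, lowest degree first."""
    return [m for d in range(1, max_degree + 1)
            for m in combinations(range(n_vars), d)]


def evaluate(monomial, x):
    """Value of the monomial (a product of bits of x)."""
    return int(all(x[i] for i in monomial))


def solve_gf2(rows, n_cols):
    """Gauss-Jordan elimination over F_2 for a consistent system.
    Bits 0..n_cols-1 of each row are coefficients; bit n_cols is the right-hand side.
    Returns the unique solution, or None if the rank is below n_cols."""
    rows = list(rows)
    for col in range(n_cols):
        pivot = next((i for i in range(col, len(rows)) if (rows[i] >> col) & 1), None)
        if pivot is None:
            return None                  # too few independent equations
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for i in range(len(rows)):
            if i != col and (rows[i] >> col) & 1:
                rows[i] ^= rows[col]
    return [(rows[i] >> n_cols) & 1 for i in range(n_cols)]


def linearization_attack(n_vars=3, n_eqs=12, seed=0):
    rng = random.Random(seed)
    secret = [rng.randint(0, 1) for _ in range(n_vars)]
    mons = monomials(n_vars)             # one new linear variable per monomial
    rows = []
    for _ in range(n_eqs):
        coeffs = [rng.randint(0, 1) for _ in mons]
        rhs = 0
        for c, m in zip(coeffs, mons):
            rhs ^= c & evaluate(m, secret)
        rows.append(sum(c << j for j, c in enumerate(coeffs)) | (rhs << len(mons)))
    solution = solve_gf2(rows, len(mons))
    # The first n_vars unknowns are the degree-1 monomials, i.e. the secret bits.
    return secret, None if solution is None else solution[:n_vars]


results = [linearization_attack(seed=s) for s in range(5)]
```

With 3 variables there are 6 monomials, so at least 6 linearly independent equations are required; the sketch generates 12 and reports `None` when the rank is still too low, which reflects the difficulty of predicting the rank mentioned above.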

Gröbner basis computation. Another class of general algorithms for solving systems of multivariate algebraic equations is based on Gröbner bases. Although the theory of Gröbner bases dates back to the 1960s, only the last two decades have seen its application in cryptography [64]. The original algorithm for the computation of a Gröbner basis was proposed by Buchberger in [35]. More efficient variants are Faugère’s F4 [65] and F5 [66] algorithms. The downside is that the computational cost of this approach is difficult to evaluate and strongly depends on the structure of the equation system. For more discussion of the application of Gröbner basis computation in algebraic cryptanalysis, see [2].

Guess-and-Determine. This approach leverages the structural properties of a cipher. In a guess-and-determine attack, by analyzing the obtained equations, some key bits are guessed. Next, some unknown key bits can be determined by substituting the guessed bits in the equations and then directly solving the reduced equations. In a successful attack, the determination step usually has a complexity significantly less than exhaustive search. Finally, the right keys are recovered by filtering wrong candidates. Guess-and-determine attacks are efficient against some word-oriented stream ciphers [79, 80, 68]. Recently, a preliminary version of the stream cipher FLIP [110] has been broken by a guess-and-determine attack [59].

3.4.2 Higher Order Differential Attacks

First we introduce the first and higher order derivatives of Boolean functions.

Definition 7. Let $F(x)$ be a function from $\mathbb{F}_2^n$ into $\mathbb{F}_2^m$. For any $v \in \mathbb{F}_2^n$, the derivative of $F$ with respect to $v$ is the function $D_v F(x) = F(x \oplus v) \oplus F(x)$. For any $k$-dimensional subspace $V$ of $\mathbb{F}_2^n$ and for any basis $v_0, \cdots, v_{k-1}$ of $V$, the $k$-th order derivative of $F$ with respect to $V$ is the function defined by

$$D_V F(x) = D_{v_0} D_{v_1} \cdots D_{v_{k-1}} F(x) = \bigoplus_{u \in V} F(x \oplus u), \quad \forall x \in \mathbb{F}_2^n .$$

It turns out that the degree of any first-order derivative of a function is strictly less than the degree of the function [94]. This implies that for every subspace $V$ of dimension equal to or larger than $\deg(F)$,

$$D_V F(x) = \begin{cases} \text{constant}, & \text{if } \dim(V) = \deg(F); \\ 0, & \text{if } \dim(V) > \deg(F), \end{cases} \tag{3.2}$$

holds for all $x \in \mathbb{F}_2^n$. The above property is called the higher order differential property, which is essential in all variants of higher order differential attacks. In particular, the higher order differential property can be employed to construct a zero-sum distinguisher [94, 89]. Informally speaking, a zero-sum distinguisher for a function is any method to find a set of values summing to zero such that their respective images also sum to zero [12]. In some cases, zero-sum distinguishers can be exploited in key recovery attacks.
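The higher order differential property can be verified by brute force on a small function. In the sketch below, `f` is a hypothetical Boolean function of algebraic degree 3 on 5 bits, chosen only for the example. The derivative with respect to a subspace is computed directly as the XOR of `f` over the corresponding coset, and the basis vectors are assumed to be linearly independent.

```python
from itertools import product


def f(x: int) -> int:
    """Hypothetical Boolean function of algebraic degree 3 on 5 input bits."""
    x0, x1, x2, x3, x4 = ((x >> i) & 1 for i in range(5))
    return (x0 & x1 & x2) ^ (x1 & x3) ^ x4


def span(basis):
    """All XOR combinations of the (linearly independent) basis vectors."""
    elements = []
    for coeffs in product((0, 1), repeat=len(basis)):
        u = 0
        for c, v in zip(coeffs, basis):
            if c:
                u ^= v
        elements.append(u)
    return elements


def derivative(func, basis, x):
    """D_V func(x): XOR of func(x ^ u) over all u in V = span(basis)."""
    acc = 0
    for u in span(basis):
        acc ^= func(x ^ u)
    return acc


# dim(V) = deg(f) = 3: the derivative is the same constant for every x.
values_dim3 = {derivative(f, [0b00001, 0b00010, 0b00100], x) for x in range(32)}
# dim(V) = 4 > deg(f): the derivative vanishes for every x.
values_dim4 = {derivative(f, [0b00001, 0b00010, 0b00100, 0b01000], x) for x in range(32)}
```

Evaluating the derivative costs $2^{\dim(V)}$ calls to the function, which is why the dimension, and hence the degree, determines the data complexity of such distinguishers.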

It is readily seen that the central problem in a higher order differential attack is to evaluate the algebraic degree of a cryptosystem. Now we recall some basic results and algorithms for degree evaluation.

Generic degree bounds. Assume that a block cipher has a round function of degree $d$. Then a trivial upper bound on the degree of the output after $r$ rounds is $d^r$. It seems unlikely that one could lower the bound without considering more structural properties. Remarkably, if the block cipher has an SPN structure with a nonlinear layer composed of parallel applications of a number of balanced S-boxes, then a generic tighter bound is given by Boura et al. [31] and further improved in [30].

Degree estimation algorithms. Due to the lack of good generic degree bounds, in practice a more precise degree bound is often obtained by dedicated algorithms for given primitives. The division property [142] has been shown to be a powerful and generic tool for estimating upper bounds on the degree of various symmetric key primitives [143]. Indeed, it has been applied in the analysis of block ciphers [141, 142], cryptographic permutations [20] and stream ciphers [144]. More applications of the division property will be provided in Sect. 3.4.4. Another ad hoc technique, named numeric mapping and proposed by Liu [103], is dedicated to NFSR-based ciphers and in particular Trivium-like ciphers. How to generalize the method to other primitives is an open problem.

3.4.3 Interpolation Attacks

The interpolation attack was introduced by Jakobsen and Knudsen [83, 84]. Assume that a symmetric key primitive has a fixed unknown key $K$, a plaintext/ciphertext pair $(x, y)$, and an intermediate bit $z$. The main idea is to consider the polynomials $y = F(K,x)$ representing the encryption process, $z = F_1(K,x)$ from the encryption direction and/or $z = F_2(K,y)$ from the decryption side. With sufficiently many known or chosen plaintext/ciphertext pairs, one can reconstruct the unknown coefficients of the target polynomials by polynomial interpolation or by solving linear equations. Then various strategies can be applied to recover the secret key $K$. We recall several techniques in interpolation attacks on a block cipher $E$ as follows.

In the original interpolation attack, the key bits are regarded as unknown constants. Consider the univariate polynomial $z = F(K,x)$, where $z$ is a state bit before the last round of the block cipher $E$. One can first guess the last round key. Then, by decrypting one round back, one can recover $F(K,x)$ by polynomial interpolation. Next, with additional plaintext/ciphertext pairs, one checks if the polynomial $F$ is correct. If this is the case, then the key guess is right with high probability. Otherwise, the key guess is wrong and is filtered out. Finally, the last round key can be recovered.

After the introduction of the original interpolation attack, many variants have been proposed to improve the complexity of key recovery. The main observation is that the coefficients of $F$, $F_1$ and $F_2$ are key-dependent, hence they can be used to recover some key bits. In [137], for a special class of block ciphers, the coefficient of a specific term in the polynomial $F$ is determined, which leads to a key recovery attack with low memory complexity. In another variant, by analyzing the algebraic properties of the coefficients in $F_2$, one determines some coefficients which can be written as low-degree polynomials in some key bits. By employing higher order differential properties, one can obtain linear equations in the coefficients. This allows the attacker to recover the coefficients by solving linear equations. Hence, one obtains equations in the key bits by substituting the recovered coefficients. Thus, some key bits can be deduced from the low-degree equations. A notable application of the latter variant is the optimized interpolation attack on the block cipher LowMC [50].

3.4.4 Cube Attacks

The cube attack, proposed by Dinur and Shamir [52] in 2009, is one of the general cryptanalytic techniques for analyzing symmetric key primitives. A similar technique has been independently proposed by Vielhaber in [148]. After its proposal, the cube attack has been successfully applied to various primitives, including stream ciphers [11, 53, 70], hash functions [51, 82, 100], and authenticated encryption schemes [101, 57].

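For a cipher with $n$ secret variables $\mathbf{x} = (x_1, x_2, \cdots, x_n)$ and $m$ public variables $\mathbf{v} = (v_1, v_2, \cdots, v_m)$, we can regard the ANF of the output bits as a polynomial of $\mathbf{x}$ and $\mathbf{v}$, denoted as $f(\mathbf{x},\mathbf{v})$. For a randomly chosen set $I = \{i_1, i_2, \ldots, i_{|I|}\} \subset \{1, \ldots, m\}$, $f(\mathbf{x},\mathbf{v})$ can be written as

$$f(\mathbf{x},\mathbf{v}) = t_I \cdot p(\mathbf{x},\mathbf{v}) \oplus q(\mathbf{x},\mathbf{v}) , \tag{3.3}$$

where $t_I = v_{i_1} \cdots v_{i_{|I|}}$, $p(\mathbf{x},\mathbf{v})$ only contains the variables $v_s$ ($s \notin I$) and the secret variables $\mathbf{x}$, and each term of $q(\mathbf{x},\mathbf{v})$ misses at least one variable in $t_I$. The polynomial $p(\mathbf{x},\mathbf{v})$ is called the superpoly of $t_I$.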

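Now we can define an affine subspace of $\mathbb{F}_2^m$, called a cube and denoted as $C_I(\mathbf{V})$, consisting of $2^{|I|}$ binary vectors as follows:

$$C_I(\mathbf{V}) := \left\{ \mathbf{v} \in \mathbb{F}_2^m \mid \mathbf{v}[i] \in \mathbb{F}_2,\ i \in I;\ \mathbf{v}[s] = \mathbf{V}[s],\ s \notin I \right\} , \tag{3.4}$$

where $\mathbf{V}$ represents a specific assignment to the non-cube public variables. It has been proved by Dinur and Shamir [52] that the value of the superpoly $p$ corresponding to the key $\mathbf{x}$ and the non-cube IV assignment $\mathbf{V}$ can be computed by summing over the cube $C_I(\mathbf{V})$ as follows:

$$p(\mathbf{x},\mathbf{V}) = \bigoplus_{\mathbf{v} \in C_I(\mathbf{V})} f(\mathbf{x},\mathbf{v}) . \tag{3.5}$$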


We use $C_I$ to denote the cube corresponding to an arbitrary setting of $\mathbf{V}$ in Eq. (3.4).
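The identity in Eq. (3.5) can be checked exhaustively on a toy polynomial. In the sketch below, `f` is a hypothetical output bit with three secret and three public variables, constructed so that the superpoly of $t_I = v_0 v_1$ is known by design to be $x_0 \oplus x_1 \oplus v_2$. Indices start at 0 in the code, unlike the 1-based indices used in the text.

```python
from itertools import product


def f(x, v):
    """Hypothetical output bit: t_I * p + q with t_I = v0*v1 and p = x0 + x1 + v2."""
    return ((v[0] & v[1] & (x[0] ^ x[1] ^ v[2]))   # t_I * superpoly
            ^ (v[0] & x[2])                        # terms of q: each misses v0 or v1
            ^ (v[1] & v[2] & x[0])
            ^ (x[1] & x[2]))


def superpoly(x, v):
    """Superpoly of t_I = v0*v1 in f, known here by construction."""
    return x[0] ^ x[1] ^ v[2]


def cube_sum(x, cube_indices, fixed_v):
    """XOR of f over the cube: cube variables take all values, the others stay fixed."""
    total = 0
    for bits in product((0, 1), repeat=len(cube_indices)):
        v = list(fixed_v)
        for i, b in zip(cube_indices, bits):
            v[i] = b
        total ^= f(x, v)
    return total


# Compare the cube sum with the superpoly for every key and every value of v2.
agrees = all(cube_sum(x, [0, 1], [0, 0, v2]) == superpoly(x, [0, 0, v2])
             for x in product((0, 1), repeat=3) for v2 in (0, 1))
```

Every term of $q$ is counted an even number of times in the sum and cancels, which is exactly why only the superpoly survives.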

The essential idea of the cube attack is that the superpoly $p(\mathbf{x},\mathbf{v})$ is potentially much simpler than $f(\mathbf{x},\mathbf{v})$, hence it can be used in cryptanalysis when it has particular non-random properties such as linearity, balancedness and neutrality [11]. A classical cube attack has three phases as follows.

1. Offline Phase: Superpoly Recovery. For a given cube $C_I(\mathbf{V})$, one can obtain the values of the superpoly $p(\mathbf{x},\mathbf{V})$ by Eq. (3.5). Then one checks if the superpoly is linear by a probabilistic test. If the superpoly is linear, then it is reconstructed and stored.

2. Online Phase: Partial Key Recovery. Query the encryption oracle and get the summation of the $2^{|I|}$ output bits. We denote the summation by $a \in \mathbb{F}_2$ and we know that $p(\mathbf{x},\mathbf{V}) = a$ according to Eq. (3.5). After collecting sufficiently many equations, some key bits can be recovered by solving the system of linear equations.

3. Brute-Force Search. Guess the remaining key variables to recover the entire key.
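The offline and online phases above can be sketched on a toy example. The polynomial `f` is hypothetical, and the probabilistic linearity test used here checks the relation $p(a \oplus b) = p(a) \oplus p(b) \oplus p(0)$ on random inputs, which is one common way to implement such a test and is an assumption of this sketch. In the offline phase the attacker chooses the key, so the superpoly can be evaluated on any key through cube sums; in the online phase only the cube sum under the unknown key is available.

```python
import random
from itertools import product


def f(x, v):
    """Hypothetical output bit; x holds the secret bits, v the public bits."""
    return ((v[0] & v[1] & (x[0] ^ x[1] ^ v[2])) ^ (v[0] & x[2])
            ^ (v[1] & v[2] & x[0]) ^ (x[1] & x[2]))


def cube_sum(x, cube, fixed_v):
    """XOR of f over all assignments of the cube variables."""
    total = 0
    for bits in product((0, 1), repeat=len(cube)):
        v = list(fixed_v)
        for i, b in zip(cube, bits):
            v[i] = b
        total ^= f(x, v)
    return total


def is_probably_linear(p, n, trials=20, seed=0):
    """Randomized test of p(a ^ b) == p(a) ^ p(b) ^ p(0)."""
    rng = random.Random(seed)
    zero = p([0] * n)
    for _ in range(trials):
        a = [rng.randint(0, 1) for _ in range(n)]
        b = [rng.randint(0, 1) for _ in range(n)]
        ab = [s ^ t for s, t in zip(a, b)]
        if p(ab) != (p(a) ^ p(b) ^ zero):
            return False
    return True


def reconstruct_linear(p, n):
    """Constant term p(0) and coefficient p(e_j) ^ p(0) of each key bit x_j."""
    const = p([0] * n)
    coeffs = [p([int(i == j) for i in range(n)]) ^ const for j in range(n)]
    return const, coeffs


CUBE, V = [0, 1], [0, 0, 0]

# Offline phase: evaluate the superpoly on keys of the attacker's choice.
offline_p = lambda x: cube_sum(x, CUBE, V)
linear = is_probably_linear(offline_p, 3)
const, coeffs = reconstruct_linear(offline_p, 3)   # superpoly is x0 + x1

# Online phase: one cube sum under the unknown key yields a linear equation.
secret = [1, 0, 1]
a = cube_sum(secret, CUBE, V)                      # here: x0 + x1 = a
```

Each usable cube contributes one linear equation in the key bits; the remaining key bits are then found by exhaustive search, as in step 3.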

The original cube attack has several shortcomings:

• Structural properties are not leveraged since the cipher is regarded as a black box.

• Only linear or quadratic [116, 70] superpolies can be exploited.

• The probabilistic test needs to evaluate $p(x, V)$, which has complexity exponential in the dimension of the cube. This limits the choices of cubes one can test.

To sum up, due to the above limitations of the original attack strategy, the evaluation of resistance against cube attacks is quite incomplete.

Division property based cube attacks

The division property was first proposed by Todo [142] in 2015. This technique is a generalization of the integral property that can also exploit the algebraic degree at the same time. The division property has been a powerful tool for evaluating the algebraic degree of symmetric-key primitives.


It turns out that more algebraic properties of the target primitive can be extracted by the division property. Todo et al. adapt Xiang et al.’s method [164] by introducing key variables into the MILP (see Sect. 3.5) model [144, 145]. With this technique, a set of key indices $J \subset \{1, \ldots, n\}$ is deduced for the cube $C_I$ such that $p(x, v)$ only depends on the key variables $x_j$ with $j \in J$. Then one can recover one bit of information on the key bits by executing two phases. In the offline phase, a proper assignment $V$ to the non-cube public variables is determined so that $p(x, V)$ is non-constant. Also, the entire truth table of $p(x, V)$ is constructed through cube summations. In the online phase, the exact value of $p(x, V)$ is obtained through a cube summation. Then the candidate values of the $x_j$ ($j \in J$) can be identified by checking the precomputed truth table. In this way, a proportion of wrong keys is filtered out. Finally, the remaining key bits can be obtained by brute-force search.
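The two phases can be illustrated on a toy example. The sketch below assumes that the index set $J$ is already known (in the attack it comes from the MILP model, which is not reproduced here); the function `f`, the secret key, and all helper names are hypothetical.

```python
from itertools import product

def f(x, v):
    """Toy output bit (hypothetical): 4 key bits x, 3 public bits v."""
    x0, x1, x2, x3 = x
    v0, v1, v2 = v
    return (v0 & v1 & (x0 ^ (x0 & x2) ^ v2)) ^ (v1 & x1) ^ (v0 & v2 & x3) ^ x2

def cube_sum(func, cube_idx, V):
    """XOR func(v) over the cube C_I(V)."""
    total = 0
    for bits in product((0, 1), repeat=len(cube_idx)):
        v = list(V)
        for i, b in zip(cube_idx, bits):
            v[i] = b
        total ^= func(tuple(v))
    return total

N_KEY = 4
I = (0, 1)      # cube indices (0-based)
V = (0, 0, 0)   # assignment to the non-cube public bits
J = (0, 2)      # key bits involved in the superpoly (assumed known)

# Offline phase: truth table of p(x, V) over the key bits in J.
# Key bits outside J do not affect the superpoly, so they are set to 0.
table = {}
for xj in product((0, 1), repeat=len(J)):
    x = [0] * N_KEY
    for j, b in zip(J, xj):
        x[j] = b
    table[xj] = cube_sum(lambda v, x=tuple(x): f(x, v), I, V)

# Online phase: one cube summation through the encryption oracle.
SECRET_KEY = (1, 0, 0, 1)  # hypothetical, unknown to the attacker
a = cube_sum(lambda v: f(SECRET_KEY, v), I, V)

# Keep only the assignments to x_J that are consistent with the observed value.
candidates = [xj for xj, value in table.items() if value == a]
print(table, a, candidates)
```

Here the superpoly is $x_0 \oplus x_0 x_2$, so the observation $a = 1$ leaves a single candidate for $(x_0, x_2)$, while $a = 0$ would only discard one of the four.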

Cube attacks with bit conditions

The dynamic cube attack was first introduced by Dinur and Shamir in the analysis of Grain-128 [52]. The idea is to decompose a polynomial into the form $f = p_1 p_2 \oplus p_3$, where the degree of $p_3$ is lower than that of $f$, and $p_1$ contains a linear public term called a dynamic variable. A dynamic variable is assigned a function that depends on some key bits and cube variables, and can then be used to nullify $p_1$. Thus, $f$ is simplified to the low-degree polynomial $p_3$. In the key recovery phase, one first guesses some key bits to compute the value of a dynamic variable. Then the right key guess is supposed to be uniquely identified using cube testers dedicated to particular non-randomness in cube summations.

Inspired by the dynamic cube attack, the conditional cube attack was proposed by Huang et al. [82] to analyze the keyed Keccak sponge function. They select some conditional cube variables such that the diffusion of these cube variables can be controlled in the first few rounds by imposing certain bit conditions. In this way, they can reduce the algebraic degree of the output polynomial, and hence lower the required cube dimension to sum over. Moreover, some key variables in the bit conditions can be used in the key recovery phase. Indeed, similar to dynamic cube attacks, one can recover some key bits by observing the cube summations. It is worth noting that these techniques are similar to the message modification technique [153, 154] and conditional differential cryptanalysis [88], which use bit conditions to control difference propagation.


3.5 Tools

We present the basic idea and recent trends on three tools for automated cryptanalysis: Mixed Integer Linear Programming, Constraint Programming, and SAT.

Mixed Integer Linear Programming. A Mixed Integer Linear Programming (MILP) problem is an optimization or feasibility problem with linear constraints in which some or all of the variables are restricted to integers. A MILP model consists of variables, constraints, and an objective function. An example is given as follows.

Example 1. Consider the following instance of an MILP.

Maximize x + y + 2z
Subject to x + 2y + 3z ≤ 4
           x + y ≥ 1
           x, y, z ∈ {0, 1}

The optimal objective value of the above model is 3, attained at (x, y, z) = (1, 0, 1).
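Since Example 1 has only three binary variables, its optimum can be confirmed by exhaustive enumeration. The sketch below is a plain brute-force check and not how a MILP solver such as Gurobi works internally; it only serves to verify the stated solution.

```python
from itertools import product

best_value = None
best_points = []
for x, y, z in product((0, 1), repeat=3):
    # Constraints of Example 1.
    if x + 2 * y + 3 * z <= 4 and x + y >= 1:
        value = x + y + 2 * z  # objective function
        if best_value is None or value > best_value:
            best_value, best_points = value, [(x, y, z)]
        elif value == best_value:
            best_points.append((x, y, z))

print(best_value, best_points)
```

The enumeration visits all $2^3$ assignments; for models with hundreds of variables, as they arise in cryptanalysis, this is exactly what a dedicated solver avoids.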

MILP models can be solved by solvers like Gurobi [76]. If there is no feasible solution at all, the solver simply reports that the model is infeasible. If no objective function is assigned, the MILP solver only evaluates the feasibility of the model. Over the last years, MILP models have been widely used in symmetric cryptanalysis. To be specific, they have been adopted in various cryptanalytic methods such as differential [115, 140, 139], linear [139], impossible differential [44, 127], zero-correlation linear [44], and integral cryptanalysis [164].

Constraint programming. A Constraint Programming (CP) problem states variables and their relations as constraints. The last years have seen applications of CP solvers in symmetric cryptanalysis [72, 138, 132]. The CP approach enjoys certain advantages over MILP. For instance, the modeling process of CP is more straightforward than that of the MILP-based method. In the MILP method, one needs to encode the bit patterns as a set of linear inequalities, while in the CP approach, one can directly input bit patterns into the CP solver. Moreover, CP solvers such as Choco [120] can easily handle 8-bit S-boxes, which are not feasible for state-of-the-art MILP solvers.

SAT. The Boolean SATisfiability (SAT) problem is the first known NP-complete problem, shown by Cook in 1971 [40]. An example of a Boolean formula in conjunctive normal form is

w ∧ (¬x) ∧ (y ∨ z) .

This formula is satisfiable since the assignment (w, x, y, z) = (True, False, True, True) evaluates the entire formula to True. A SAT solver is an algorithm that decides whether a given Boolean formula has a satisfying assignment. Finding a satisfying assignment is believed to be infeasible in general, but many SAT instances can be solved efficiently in practice. For example, CryptoMiniSat [135] is a SAT solver dedicated to problems emerging from cryptography.
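For a formula this small, satisfiability can be decided by enumerating all assignments. The sketch below does exactly that; it is an exhaustive check for illustration and does not reflect the search techniques used by real SAT solvers.

```python
from itertools import product

def formula(w, x, y, z):
    """The CNF formula w AND (NOT x) AND (y OR z)."""
    return w and (not x) and (y or z)

# Collect every assignment (w, x, y, z) that satisfies the formula.
models = [a for a in product((False, True), repeat=4) if formula(*a)]
print(len(models), models)
```

The assignment given above is one of three satisfying assignments: $w$ must be True and $x$ False, while $(y, z)$ can be any pair other than (False, False).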

SAT solvers are natural tools for solving nonlinear multivariate Boolean equations in algebraic attacks. Indeed, they have been successfully applied in the analysis of the block cipher KeeLoq, which is used to secure car keys [41]. Further applications of SAT solvers in cryptanalysis include SAT-based differential, linear and rotational cryptanalysis [91, 114, 106].

Chapter 4

Contributions

If I have seen further it is by standing on the shoulders of giants.
- Isaac Newton

This chapter summarizes our main results and some related works. We first show new constructions of lightweight linear layers. Then we propose new cryptanalysis methods which are powerful for analyzing ciphers of low algebraic degree. Finally, we present a novel IDS that works over encrypted traffic.

4.1 Design of Lightweight Linear Layers

The need for lightweight cryptography has resulted in a great interest in lightweight diffusion matrices. Many constructions of lightweight MDS and involutory MDS matrices have been proposed. It is common to generate efficient MDS matrices from matrices with specific structures. For instance, AES MixColumns exploits a circulant matrix. Involutory MDS matrices are useful in SPN structures since they can be implemented with the same circuit in encryption and decryption. Many lightweight involutory MDS matrices have been constructed from Hadamard, Cauchy, and Vandermonde matrices [134, 104].

To further reduce the hardware cost, Guo et al. propose the recursive (or serial) MDS matrices, which have been exploited in the lightweight block cipher LED [78] and the lightweight hash function PHOTON [77]. The main idea is to represent an MDS matrix as a power of a very sparse matrix. In this way, the MDS matrix can be implemented by iterating the sparse matrix. Compared with direct construction of MDS matrices, recursive MDS matrices have a significantly lower hardware cost at the price of additional clock cycles.
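The recursive construction can be illustrated with a short computation. The sketch below assumes, from our recollection of the LED specification and not from the text above, the companion matrix with last row (4, 1, 2, 2) over $\mathbb{F}_{2^4}$ defined by $x^4 + x + 1$; it raises this sparse matrix to the fourth power and checks the MDS property by testing that every square submatrix is nonsingular.

```python
from itertools import combinations

def gf16_mul(a, b):
    """Multiply in GF(2^4) with reduction polynomial x^4 + x + 1 (0x13)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r

def mat_mul(A, B):
    """Matrix product over GF(2^4); addition is XOR."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] ^= gf16_mul(A[i][k], B[k][j])
    return C

def det(M):
    """Determinant by Laplace expansion; signs vanish in characteristic 2."""
    if len(M) == 1:
        return M[0][0]
    d = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        d ^= gf16_mul(M[0][j], det(minor))
    return d

def is_mds(M):
    """A square matrix is MDS iff all of its square submatrices are nonsingular."""
    n = len(M)
    for k in range(1, n + 1):
        for rows in combinations(range(n), k):
            for cols in combinations(range(n), k):
                if det([[M[r][c] for c in cols] for r in rows]) == 0:
                    return False
    return True

# Sparse companion matrix (assumed to be the one used in LED).
A = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [4, 1, 2, 2]]

M = A
for _ in range(3):
    M = mat_mul(M, A)  # M = A^4 after the loop

print(M, is_mds(A), is_mds(M))
```

Applying `A` once only needs the multiplications in its last row, so a serial implementation iterates this cheap step four times instead of implementing the dense matrix `M` directly.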

4.1.1 Near-MDS Matrices

Near-MDS matrices have sub-optimal branch numbers. While they require less area than MDS matrices, they do not need additional clock cycles compared with recursive MDS matrices. Indeed, some near-MDS matrices outperform MDS or recursive MDS matrices in terms of the FOAM (Figure Of Adversarial Merit) framework proposed by Khoo et al. [87].
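To make the notion of a sub-optimal branch number concrete, the following sketch computes the branch number of a small circulant matrix by exhaustive search. It is an illustration under stated assumptions: it works over $\mathbb{F}_2$ only (one bit per word), uses the well-known 4 × 4 matrix circ(0, 1, 1, 1) rather than any matrix proposed in this work, and takes the branch number of $M$ to be the minimum of $\mathrm{wt}(x) + \mathrm{wt}(Mx)$ over nonzero $x$. An $n \times n$ MDS matrix reaches $n + 1$, while a near-MDS matrix reaches $n$.

```python
from itertools import product

def circulant(first_row):
    """Build the circulant matrix whose i-th row is first_row rotated right by i."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def apply(M, x):
    """Matrix-vector product over F_2."""
    return tuple(sum(m & xi for m, xi in zip(row, x)) % 2 for row in M)

def branch_number(M):
    """Minimum of wt(x) + wt(Mx) over all nonzero binary inputs x."""
    n = len(M)
    return min(
        sum(x) + sum(apply(M, x))
        for x in product((0, 1), repeat=n)
        if any(x)
    )

M = circulant((0, 1, 1, 1))
identity = circulant((1, 0, 0, 0))
print(branch_number(M), branch_number(identity))
```

The matrix is symmetric, so the same value is obtained for its transpose, which covers the linear branch number as well.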

We present new designs of lightweight near-MDS matrices in [97]. We first construct generic $n \times n$ near-MDS circulant matrices for $5 \le n \le 9$. Then we examine the implementation cost of instantiations of the generic near-MDS matrices. For $n = 7, 8$, it turns out that some proposed near-MDS circulant matrices of order $n$ have the lowest sequential XOR count among all near-MDS matrices of the same order. Further, for $n = 5, 6$, we present (non-circulant) near-MDS matrices of order $n$ having the lowest sequential XOR count as well. The proposed matrices, together with previous constructions of order less than five, lead to solutions of $n \times n$ near-MDS matrices with the lowest sequential XOR count over finite fields $\mathbb{F}_{2^m}$ for $2 \le n \le 8$ and $4 \le m \le 2048$. Moreover, we present some involutory near-MDS matrices of order 8 constructed from Hadamard matrices. Further, the security of the proposed linear layers is studied by calculating lower bounds on the number of active S-boxes. It is shown that our linear layers, when combined with a well-chosen nonlinear layer, can provide sufficient security against differential and linear cryptanalysis.

4.1.2 MDS Matrices from Lightweight Circuits

We propose new constructions of globally optimized lightweight involutory MDS matrices in [98]. In particular, we identify some involutory 32 × 32 MDS matrices that can be realized with only 78 XOR gates and depth 4, whereas the previously known lightest involutory MDS matrices cost 84 XOR gates with the same depth. Notably, the involutory MDS matrix we find is much smaller than the AES MixColumns operation, which requires 97 XOR gates with depth 8 when implemented as a block of combinational logic that can be computed in one clock cycle. However, with respect to latency, the AES MixColumns operation is superior to our 78-XOR involutory matrices, since the AES MixColumns can be implemented with depth 3 by using more XOR gates. We also show that the depth of a 32 × 32 MDS matrix with branch number 5 (e.g., the AES MixColumns operation) is at least 3.

We enhance Boyar’s SLP-heuristic algorithm [33] with circuit depth awareness, such that the depth of its output circuit can be bounded. With this tool, we can estimate the minimum achievable depth of a circuit implementing the summation of a set of signals with given depths, which is of independent interest. We apply the new SLP heuristic to a large set of lightweight involutory MDS matrices. Consequently, we identify a depth 3 involutory MDS matrix whose implementation costs 88 XOR gates, which is superior to the AES MixColumns matrix with respect to both area and latency, and enjoys the extra involution property.

4.2 Cryptanalysis of Ciphers with Low Algebraic Degree

Symmetric-key ciphers with low-degree round functions have become increasingly popular. It turns out that they are favourable in many use cases with specific requirements such as lightweight implementation, efficient masking countermeasures and low multiplicative complexity. We briefly recall some designs with low algebraic degree.

Efficient implementations. One of the goals of lightweight cryptography is to minimize the logic area needed to implement the primitives. As we have seen in Sect. 4.1, for linear components this leads to the study of linear operations with the lowest XOR count. For nonlinear components such as S-boxes, we need to consider both the number of XOR and AND gates required in hardware implementations [33, 136]. From the viewpoint of the designer, reducing the cost of AND gates in an implementation requires round functions with low algebraic degree. A remarkable example is the eSTREAM portfolio cipher Trivium [48], which has only three AND gates in the round function to optimize the hardware implementation. The CAESAR finalist cipher Morus [162] uses bitwise AND to achieve efficient implementation in both software and hardware.

The other advantage of low-degree primitives is a reduction of the implementation cost of side-channel attack countermeasures. For instance, the block cipher Noekeon [46], the SHA-3 winner Keccak [22] and the CAESAR portfolio cipher Ascon [55] aim to reduce the cost of masking countermeasures by using S-boxes constructed with a small number of AND and XOR operations.


FHE/MPC/ZK-friendly designs. Many new symmetric primitives have been proposed in the context of FHE, MPC and ZK [6, 75, 37, 4]. In these protocols, the adoption of dedicated symmetric-key primitives turns out to be vital to improve the efficiency. The main design goal of these primitives is to minimize the Multiplicative Complexity (MC), i.e., minimize the number of multiplications in a circuit and/or minimize the multiplicative depth of the circuit. However, traditionally designed block ciphers, stream ciphers and hash functions often fail to meet these requirements.

The block cipher LowMC [6] is one of the earliest designs dedicated to FHE and MPC. The stream ciphers Kreyvium [37] and FLIP [110] have been designed with a focus on minimizing the ANDdepth of the circuit. Indeed, they aim to provide practical solutions for efficient homomorphic-ciphertext compression [37, 110]. The stream cipher RASTA [54] has been proposed to achieve both minimum ANDdepth and minimum ANDs per encrypted bit. While the above ciphers operate on the binary field, the block cipher MiMC, proposed by Albrecht et al. in 2016 [4, 5], aims to minimize multiplications in the larger fields $\mathbb{F}_{2^n}$ and $\mathbb{F}_p$. Indeed, MiMC outperforms both AES and LowMC in many applications such as MPC [75] and Succinct Non-interactive Arguments of Knowledge (SNARKs) [16]. A variant of MiMC, GMiMC [3], has been constructed by inserting the original design into generalized Feistel structures.

4.2.1 Division Property based Cube Attacks

We improve the classical cube attacks by exploiting the algebraic properties of the superpoly, which include the (non-)constantness, low degree and sparse monomial distribution properties. Inspired by the work of Todo et al. in [144], we formulate all these properties in one framework by developing more accurate MILP models. We introduce the flag technique to model the constant values of the IV more precisely. Based on the new model, several degree evaluation and term enumeration methods are proposed to efficiently extract algebraic properties of the superpolies. Specifically, a degree evaluation algorithm is presented to obtain upper bounds on the degree of the superpoly. With the knowledge of the algebraic degree, the superpoly can be recovered without constructing its whole truth table. We also provide several term enumeration algorithms for finding the monomials of the superpoly, so that the complexity of superpoly recovery can be further reduced. As a result, we are able to attack more rounds and employ even larger cubes compared to previous works.

By applying the improved cube attacks, we can lower the complexities of the previous best cube attacks in [105, 144, 145]. In particular, we can further provide key recovery results on 839 out of 1152 rounds of Trivium, 891 out of 1152 rounds of Kreyvium, 184 out of 256 rounds of Grain-128a, and 750 out of 1792 rounds of Acorn [151]. To the best of our knowledge, our result on Kreyvium is the current best key recovery attack on the initialization of this cipher.

In some follow-up works, our attacks on several ciphers have been improved by refining our techniques. Ye and Tian propose new superpoly recovery algorithms and revisit the key recovery phase of our attacks on Trivium [167]. With a new method for cube searching, Yang et al. improve the key recovery attacks on Acorn [165]. By exploiting the division property using three subsets, Wang et al. propose key recovery attacks on Trivium up to 841 rounds [152].

4.2.2 Correlation of Quadratic Boolean Functions with Application to Morus

The precise estimation of correlations of linear approximations is a central issue in linear cryptanalysis. We tackle this problem for symmetric primitives with low-degree round functions [133]. To this end, we first investigate the problem of computing the correlation of quadratic Boolean functions. By transforming a quadratic Boolean function into its disjoint quadratic form, we propose a polynomial-time algorithm that can determine the correlation of an arbitrary quadratic Boolean function, while in previous work (e.g., [10]), such correlations are computed with exhaustive or ad hoc approaches, which intrinsically limit their effectiveness.
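The benefit of a disjoint quadratic form can be seen on small examples. The sketch below computes correlations by exhaustive enumeration, which is the costly baseline and not the polynomial-time algorithm of [133]; the two sample functions are hypothetical. It illustrates the fact that an XOR of $k$ products on pairwise disjoint variables has correlation $2^{-k}$, so once a function is in disjoint form its correlation can be read off without enumeration.

```python
from fractions import Fraction
from itertools import product

def correlation(f, n):
    """Correlation of f with zero: (#{f = 0} - #{f = 1}) / 2^n."""
    total = sum(1 - 2 * f(x) for x in product((0, 1), repeat=n))
    return Fraction(total, 2 ** n)

def disjoint(x):
    """x0*x1 + x2*x3: two products on disjoint variables."""
    return (x[0] & x[1]) ^ (x[2] & x[3])

def overlapping(x):
    """x0*x1 + x1*x2 = x1*(x0 + x2): a single product after a linear change of variables."""
    return (x[0] & x[1]) ^ (x[1] & x[2])

print(correlation(disjoint, 4), correlation(overlapping, 3))
```

The second function has two quadratic terms but only one product in its disjoint form, which is why its correlation is $2^{-1}$ rather than $2^{-2}$.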

The new correlation computation technique permits us to develop a model for finding linear trails of Morus-like keystream generators. The model is generic and can be applied to many other schemes, which is of independent interest. We also introduce the MILP-based approach for searching the linear approximations, which is typically used in the analysis of block ciphers.

As an application, we search for more complex rotational invariant linear trails of Morus, and then compute their correlations. This allows us to identify trails for all versions of Morus, which leads to a significant improvement over the previous attack on Morus-1280-256 presented by Ashur et al. [10]. Both time and data complexities are reduced from $2^{152}$ to $2^{76}$. Moreover, these trails result in the first attacks on full Morus-640 and Morus-1280 with 128-bit key. The attacks have been verified on a reduced version of Morus. The complexities of the full versions are approaching the boundary of practical attacks. Our attacks shake the security confidence in Morus, which was a finalist of the CAESAR competition but failed to enter the final portfolio.


4.2.3 Improved Interpolation Attacks

We present novel attacks against primitives with low algebraic degree [96]. The first new attack is based on an observation from Sun et al. [137]. It introduces novel interpolation attacks with constant memory complexity: some key-dependent terms of the interpolated polynomial are determined directly, without constructing the complete polynomial. Then we propose an algorithm with constant memory for recovering the coefficient of the specific term, which results in an efficient key recovery attack. The second new attack exploits a simple cyclic key schedule. For the specific key schedule, we present generic attacks with either constant memory or constant data complexity.

As an illustration, we apply the new attacks to the block cipher MiMC. Specifically, we can break 38-round MiMC-129/129 with time complexity $2^{65.5}$, data complexity $2^{60.2}$ and negligible memory. Our results refute the claim of the authors, who consider attacks with less than $2^{64}$ bytes memory and conclude [5, p. 17], “38 rounds are sufficient to protect MiMC-129/129 against the interpolation, the GCD and the other attacks. Time-memory trade-offs might well be possible, and we leave this as a topic for future research.” Our attack simply reduces memory while keeping the time complexity at the same value, hence we show that there is no trade-off!

For a two-key version of MiMC-n/n, the best attack described by the designers has complexity $O(3^{3r})$. The designers further claimed that the bound can be improved by a Meet-In-The-Middle (MITM) attack [5, p. 18], but they offer no details. By applying our generic attack to the concrete design, the complexity can be reduced to $O(r 3^{r})$ if $r \le \lceil n / \log_2(3) \rceil - 1$ and $O(r 3^{2r-1})$ if $r \ge \lceil n / \log_2(3) \rceil$. Our reduced bound is the first tighter bound based on specific attacks.

Our analysis of MiMC is the first third-party cryptanalysis of MiMC. However, our results do not affect the security claims of the full-round MiMC.

After the publication of our work, Bonnetain [29] presented attacks on MiMC-2n/n and GMiMC by leveraging the weak key schedules. Moreover, the attacks are independent of the round functions and the number of rounds.

4.3 Intrusion Detection over Encrypted Traffic

Privacy and security have been central issues in the age of big data and AI/ML. To address the issues, encryption has been widely adopted to protect data. Nevertheless, the protection of data should not come at the detriment of other security aspects. In the context of network traffic, TLS and HTTPS have been employed to secure Internet traffic. However, an IDS based on DPI becomes totally blind when the traffic is encrypted, which again leaves clients vulnerable to known threats and attacks. To alleviate the consequences of such attacks, Security Editors (SEs) propose to use proxies to establish a secure connection with the web server on behalf of the client. The proxy decrypts and then inspects the whole traffic. However, this approach raises problems of security and privacy.

To reconcile security and privacy, it is desirable to design a new IDS over encrypted traffic. Blindbox and BlindIDS are two recent proposals that enable us to perform DPI over encrypted traffic. The first solution, Blindbox, proposed by Sherry et al. [131], relies on MPC tools including oblivious transfer [121, 63] and the garbled circuits of Yao [166]. However, even if Blindbox is quite efficient in detecting anomalous encrypted traffic, it necessitates a very high setup time for clients and servers and does not protect the know-how of SEs. Aiming to improve Blindbox, Canard et al. [36] present BlindIDS, based on decryptable searchable encryption [71]. BlindIDS does protect SEs’ market and does not introduce any latency during setup time. However, the use of public key cryptography, and especially pairings, comes with an increasing decryption overhead on the client side. Thus, BlindIDS is still not adequate for real-world use.

We propose a new solution for IDS over encrypted traffic with efficient setup, low memory consumption, rules’ confidentiality against proxies and efficiency of the whole protocol among a sender, a receiver and a proxy. Our system is based on symmetric cryptographic primitives. A preliminary implementation shows that our protocol outperforms previous works: it encrypts a packet of 1500 bytes in about 6 µs and tests such a packet against 3000 rules in less than 2 µs. This implies that our protocol is approaching feasibility in real-world applications.

Chapter 5

Conclusions and Open Problems

Wir müssen wissen. Wir werden wissen.
- David Hilbert

This chapter overviews the main results of the dissertation and presents some future research directions.

5.1 Conclusions

Near-MDS matrices are good alternatives for lightweight applications. Near-MDS matrices can potentially provide better trade-offs between security and efficiency compared to MDS matrices. However, concrete constructions of efficient near-MDS matrices have been missing. We fill this gap by proposing locally optimized near-MDS matrices which are much lighter than MDS matrices of the same size. Moreover, it is shown that our constructions together with a well-chosen nonlinear layer can provide sufficient security against differential and linear cryptanalysis.

New MDS matrices outperform AES MixColumns in both area and depth. The construction of globally optimized MDS matrices has been an open problem in lightweight cryptography. Recent years have seen progress in light of SLP-based heuristic tools. We enhance Boyar’s SLP heuristic algorithm with circuit depth awareness, such that the depth of its output circuit can be bounded. The new SLP heuristic enables us to identify an MDS matrix which is superior to the AES MixColumns matrix with respect to both circuit area and latency, while it enjoys the extra involution property.

Enhanced cube attacks improve cryptanalytic results. The division property based cube attacks address the cube size limitation in traditional cube attacks. This state-of-the-art technique is improved by our new MILP models. With the new model, we provide novel degree evaluation and term enumeration methods to efficiently extract algebraic properties of the superpolies. As a result, we are able to provide a more accurate and complete evaluation of the resistance against cube attacks. Specifically, we can improve the key recovery attacks on stream ciphers including the round-reduced versions of Trivium, Kreyvium, Grain-128a, and Acorn. As of the time of writing (November 2019), our analysis of Kreyvium is the best key recovery attack.

Linear cryptanalysis benefits from new correlation computation methods. We revisit linear cryptanalysis by proposing an efficient algorithm for computing the correlation of an arbitrary quadratic Boolean function. Combining the proposed algorithm with the MILP approach, we present a method of searching for linear approximations of Morus-like keystream generators. This leads to the first attacks on full Morus-640 and Morus-1280 with 128-bit key. The attacks have been verified on a reduced version of Morus. Our attacks weaken the security confidence in Morus, which was a finalist of the CAESAR competition but failed to enter the final portfolio.

Interpolation attacks are improved, but not yet enough. We present novel interpolation attacks against primitives with low degree. Our work shows that the memory complexity of classical interpolation attacks can be reduced substantially. To illustrate our techniques, we show new attacks on the block cipher MiMC. Consequently, we can break a round-reduced version of MiMC with low memory complexity and give new security bounds for the larger key versions. Our analysis of MiMC is the first third-party cryptanalysis of MiMC. However, our results do not affect the security claims of the full-round MiMC.

Intrusion detection over encrypted traffic is practical. IDSs over encrypted traffic have been proposed based on MPC and public key cryptography. However, none of the existing schemes is practical due to the prohibitive overhead. We show new solutions based on symmetric cryptographic primitives. This enables us to achieve efficient setup, low memory consumption, rules’ confidentiality against proxies and efficiency at the client side. Our preliminary implementation outperforms related works in all aspects, showing that our work is a significant step towards a real-life deployment of privacy-preserving IDS.

5.2 Future Research

Constructing new globally optimized diffusion matrices. Recent research has shown that locally optimized diffusion matrices can be significantly improved by global optimization solutions. In this context, globally optimized MDS matrices have been intensively studied. However, it remains open to prove the optimality of the proposed constructions or present new matrices with provable global optimality. Another natural extension is to generate globally optimized non-MDS matrices such as near-MDS matrices.

Improvements and applications of SLP heuristics. SLP heuristics are important tools for the optimization of implementations of linear operations. While most existing work focuses on circuit area, our improved heuristic deals with both circuit area and depth. It is an open problem to further enhance the heuristics or combine them with other tools such as SAT solvers to tackle other circuit minimization problems [136].

More precise techniques for degree evaluation and term detection. Algebraic cryptanalysis relies on techniques for detecting algebraic properties. The division property has been shown to be a powerful generic method for degree evaluation and term detection. It is interesting to enhance the current model to improve existing cryptanalytic results. However, it remains unclear how to improve the division property in general.

Computing correlation of Boolean functions of higher degrees. One of the open problems in linear cryptanalysis is to precisely estimate the correlations of linear approximations. It would be interesting to apply our method to other symmetric primitives. Further, an important open problem is to extend our work on quadratic Boolean functions to functions of higher degrees.

Cryptanalysis of Low MC ciphers. The security analysis of novel low MC primitives is generally difficult due to the very large number of rounds. Algebraic cryptanalysis appears to be promising since most low MC primitives have round functions with simple algebraic structures. Optimized interpolation attacks have been successfully used to analyze the full LowMC v1 [50], but the full version of MiMC and the new versions of LowMC remain open for cryptanalysis. More generally, it would be interesting to develop new tools of security analysis to guide the design of new low MC ciphers.

Enhancing functionality of IDS over encrypted traffic. It is interesting to improve the existing IDS over encrypted traffic with more functionalities such as the support of regular expressions.

Symmetric primitives in computation over encrypted data. Symmetric primitives have been shown to be vital in improving the efficiency of many protocols enabling computation over encrypted data. It is interesting to find new use cases to stimulate the research of both privacy-preserving techniques and symmetric cryptography.

Bibliography

[1] CAESAR: Competition for Authenticated Encryption: Security,Applicability, and Robustness. https://competitions.cr.yp.to/caesar.html, 2014.

[2] Albrecht, M. R. Algorithmic algebraic techniques and their applicationto block cipher cryptanalysis. PhD thesis, Royal Holloway, University ofLondon, Egham, UK, 2010.

[3] Albrecht, M. R., Grassi, L., Perrin, L., Ramacher, S.,Rechberger, C., Rotaru, D., Roy, A., and Schofnegger, M.Feistel structures for MPC, and more. Cryptology ePrint Archive, Report2019/397, 2019. https://eprint.iacr.org/2019/397.

[4] Albrecht, M. R., Grassi, L., Rechberger, C., Roy, A., andTiessen, T. MiMC: Efficient encryption and cryptographic hashingwith minimal multiplicative complexity. In Advances in Cryptology –ASIACRYPT 2016, Part I (Dec. 2016), J. H. Cheon and T. Takagi, Eds.,vol. 10031 of Lecture Notes in Computer Science, Springer, Heidelberg,pp. 191–219.

[5] Albrecht, M. R., Grassi, L., Rechberger, C., Roy, A., andTiessen, T. MiMC: Efficient encryption and cryptographic hashing withminimal multiplicative complexity. Cryptology ePrint Archive, Report2019/492, 2019. https://eprint.iacr.org/2019/492.

[6] Albrecht, M. R., Rechberger, C., Schneider, T., Tiessen, T.,and Zohner, M. Ciphers for MPC and FHE. In Advances in Cryptology –EUROCRYPT 2015, Part I (Apr. 2015), E. Oswald and M. Fischlin, Eds.,vol. 9056 of Lecture Notes in Computer Science, Springer, Heidelberg,pp. 430–454.

[7] AlFardan, N. J., Bernstein, D. J., Paterson, K. G., Poettering,B., and Schuldt, J. C. N. On the security of RC4 in TLS. In USENIX

59

60 BIBLIOGRAPHY

Security 2013: 22nd USENIX Security Symposium (Aug. 2013), S. T.King, Ed., USENIX Association, pp. 305–320.

[8] AlFardan, N. J., and Paterson, K. G. Lucky thirteen: Breaking theTLS and DTLS record protocols. In 2013 IEEE Symposium on Securityand Privacy (May 2013), IEEE Computer Society Press, pp. 526–540.

[9] Andreeva, E., Bogdanov, A., Datta, N., Luykx, A., Mennink, B.,Nandi, M., Tischhauser, E., and Yasuda, K. COLM v1. Submissionto Round 3 of the CAESAR competition, 2016. https://competitions.cr.yp.to/round3/colmv1.pdf.

[10] Ashur, T., Eichlseder, M., Lauridsen, M. M., Leurent, G.,Minaud, B., Rotella, Y., Sasaki, Y., and Viguier, B. Cryptanalysisof MORUS. In Advances in Cryptology – ASIACRYPT 2018, Part II(Dec. 2018), T. Peyrin and S. Galbraith, Eds., vol. 11273 of Lecture Notesin Computer Science, Springer, Heidelberg, pp. 35–64.

[11] Aumasson, J.-P., Dinur, I., Meier, W., and Shamir, A. Cubetesters and key recovery attacks on reduced-round MD6 and Trivium. InFast Software Encryption – FSE 2009 (Feb. 2009), O. Dunkelman, Ed.,vol. 5665 of Lecture Notes in Computer Science, Springer, Heidelberg,pp. 1–22.

[12] Aumasson, J.-P., and Meier, W. Zero-sum distinguishers for reduced Keccak-f and for the core functions of Luffa and Hamsi. Rump session of Cryptographic Hardware and Embedded Systems – CHES 2009, 2009.

[13] Bao, Z., Guo, J., Ling, S., and Sasaki, Y. PEIGEN - a platform for evaluation, implementation, and generation of S-boxes. IACR Trans. Symmetric Cryptol. 2019, 1 (2019), 330–394.

[14] Beierle, C., Kranz, T., and Leander, G. Lightweight multiplication in GF(2^n) with applications to MDS matrices. In Advances in Cryptology – CRYPTO 2016, Part I (Aug. 2016), M. Robshaw and J. Katz, Eds., vol. 9814 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 625–653.

[15] Bellare, M., and Namprempre, C. Authenticated encryption: Relations among notions and analysis of the generic composition paradigm. In Advances in Cryptology – ASIACRYPT 2000 (Dec. 2000), T. Okamoto, Ed., vol. 1976 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 531–545.

[16] Ben-Sasson, E., Chiesa, A., Genkin, D., Tromer, E., and Virza, M. SNARKs for C: verifying program executions succinctly and in zero knowledge. In Advances in Cryptology – CRYPTO 2013, Part II (Aug. 2013), R. Canetti and J. A. Garay, Eds., vol. 8043 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 90–108.

[17] Bernstein, D. J. ChaCha, a variant of Salsa20. http://cr.yp.to/chacha/chacha-20080128.pdf, 2008.

[18] Bernstein, D. J. The Salsa20 Family of Stream Ciphers. In New Stream Cipher Designs - The eSTREAM Finalists. 2008, pp. 84–97.

[19] Bernstein, D. J. Cryptographic competitions: CAESAR call for submissions. https://competitions.cr.yp.to/caesar-call.html, 2014.

[20] Bernstein, D. J., Kölbl, S., Lucks, S., Massolino, P. M. C., Mendel, F., Nawaz, K., Schneider, T., Schwabe, P., Standaert, F., Todo, Y., and Viguier, B. Gimli: A cross-platform permutation. In Cryptographic Hardware and Embedded Systems – CHES 2017 (Sept. 2017), W. Fischer and N. Homma, Eds., vol. 10529 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 299–320.

[21] Bertoni, G., Daemen, J., Peeters, M., and Van Assche, G. Sponge functions. In ECRYPT hash workshop (2007).

[22] Bertoni, G., Daemen, J., Peeters, M., and Van Assche, G. The Keccak Reference Version 3.0. https://keccak.team/files/Keccak-reference-3.0.pdf, 2011.

[23] Bhargavan, K., and Leurent, G. On the practical (in-)security of 64-bit block ciphers: Collision attacks on HTTP over TLS and OpenVPN. In ACM CCS 2016: 23rd Conference on Computer and Communications Security (Oct. 2016), E. R. Weippl, S. Katzenbeisser, C. Kruegel, A. C. Myers, and S. Halevi, Eds., ACM Press, pp. 456–467.

[24] Biham, E., Biryukov, A., and Shamir, A. Cryptanalysis of Skipjack reduced to 31 rounds using impossible differentials. In Advances in Cryptology – EUROCRYPT’99 (May 1999), J. Stern, Ed., vol. 1592 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 12–23.

[25] Biham, E., and Shamir, A. Differential cryptanalysis of DES-like cryptosystems. In Advances in Cryptology – CRYPTO’90 (Aug. 1991), A. J. Menezes and S. A. Vanstone, Eds., vol. 537 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 2–21.

[26] Biryukov, A., De Cannière, C., and Quisquater, M. On multiple linear approximations. In Advances in Cryptology – CRYPTO 2004 (Aug. 2004), M. Franklin, Ed., vol. 3152 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 1–22.

[27] Blaum, M., and Roth, R. M. On lowest density MDS codes. IEEE Trans. Information Theory 45, 1 (1999), 46–59.

[28] Bogdanov, A., and Rijmen, V. Linear hulls with correlation zero and linear cryptanalysis of block ciphers. Des. Codes Cryptography 70, 3 (2014), 369–383.

[29] Bonnetain, X. Collisions on Feistel-MiMC and univariate GMiMC. Cryptology ePrint Archive, Report 2019/951, 2019. https://eprint.iacr.org/2019/951.

[30] Boura, C., and Canteaut, A. On the influence of the algebraic degree of F^{-1} on the algebraic degree of G∘F. IEEE Trans. Information Theory 59, 1 (2013), 691–702.

[31] Boura, C., Canteaut, A., and De Cannière, C. Higher-order differential properties of Keccak and Luffa. In Fast Software Encryption – FSE 2011 (Feb. 2011), A. Joux, Ed., vol. 6733 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 252–269.

[32] Boyar, J., Matthews, P., and Peralta, R. On the shortest linear straight-line program for computing linear forms. In Mathematical Foundations of Computer Science - MFCS 2008 (2008), E. Ochmanski and J. Tyszkiewicz, Eds., vol. 5162 of Lecture Notes in Computer Science, Springer, pp. 168–179.

[33] Boyar, J., Matthews, P., and Peralta, R. Logic minimization techniques with applications to cryptology. J. Cryptology 26, 2 (2013), 280–312.

[34] Briceno, M., Goldberg, I., and Wagner, D. A pedagogical implementation of the GSM A5/1 and A5/2 “voice privacy” encryption algorithms. http://cryptome.org/gsm-a512.htm, 1999.

[35] Buchberger, B. Ein Algorithmus zum Auffinden der Basiselemente des Restklassenringes nach einem nulldimensionalen Polynomideal. PhD thesis, Universität Innsbruck, Austria, 1965.

[36] Canard, S., Diop, A., Kheir, N., Paindavoine, M., and Sabt, M. BlindIDS: Market-compliant and privacy-friendly intrusion detection system over encrypted traffic. In ASIACCS 17: 12th ACM Symposium on Information, Computer and Communications Security (Apr. 2017), R. Karri, O. Sinanoglu, A.-R. Sadeghi, and X. Yi, Eds., ACM Press, pp. 561–574.

[37] Canteaut, A., Carpov, S., Fontaine, C., Lepoint, T., Naya-Plasencia, M., Paillier, P., and Sirdey, R. Stream ciphers: A practical solution for efficient homomorphic-ciphertext compression. In Fast Software Encryption - 23rd International Conference, FSE 2016, Bochum, Germany, March 20-23, 2016, Revised Selected Papers (2016), pp. 313–333.

[38] Carlet, C. Boolean functions for cryptography and error correcting codes. In Boolean Models and Methods in Mathematics, Computer Science, and Engineering, Y. Crama and P. L. Hammer, Eds. Cambridge University Press, 2010, pp. 257–397.

[39] Carlet, C. Vectorial Boolean functions for cryptography. In Boolean Models and Methods in Mathematics, Computer Science, and Engineering, Y. Crama and P. L. Hammer, Eds. Cambridge University Press, 2010, pp. 398–469.

[40] Cook, S. A. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing (New York, NY, USA, 1971), STOC ’71, ACM, pp. 151–158.

[41] Courtois, N., Bard, G. V., and Wagner, D. A. Algebraic and slide attacks on KeeLoq. In Fast Software Encryption – FSE 2008 (Feb. 2008), K. Nyberg, Ed., vol. 5086 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 97–115.

[42] Courtois, N., Klimov, A., Patarin, J., and Shamir, A. Efficient algorithms for solving overdefined systems of multivariate polynomial equations. In Advances in Cryptology – EUROCRYPT 2000 (May 2000), B. Preneel, Ed., vol. 1807 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 392–407.

[43] Courtois, N., and Meier, W. Algebraic attacks on stream ciphers with linear feedback. In Advances in Cryptology – EUROCRYPT 2003 (May 2003), E. Biham, Ed., vol. 2656 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 345–359.

[44] Cui, T., Jia, K., Fu, K., Chen, S., and Wang, M. New automatic search tool for impossible differentials and zero-correlation linear approximations. Cryptology ePrint Archive, Report 2016/689, 2016. https://eprint.iacr.org/2016/689.

[45] Daemen, J. Cipher and Hash Function Design. Strategies based on linear and differential cryptanalysis. PhD thesis, Katholieke Universiteit Leuven, 1995.

[46] Daemen, J., Peeters, M., Van Assche, G., and Rijmen, V. Nessie proposal: NOEKEON. Submission to NESSIE competition, 2000. http://gro.noekeon.org/Noekeon-spec.pdf.

[47] Daemen, J., and Rijmen, V. The Design of Rijndael: AES - The Advanced Encryption Standard. Information Security and Cryptography. Springer, 2002.

[48] De Cannière, C., and Preneel, B. Trivium. In New Stream Cipher Designs - The eSTREAM Finalists. 2008, pp. 244–266.

[49] Dell Security. Annual threat report. http://www.netthreat.co.uk/assets/assets/dell-security-annual-threat-report-2016-white-paper-197571.pdf, 2016.

[50] Dinur, I., Liu, Y., Meier, W., and Wang, Q. Optimized interpolation attacks on LowMC. In Advances in Cryptology – ASIACRYPT 2015, Part II (Nov. / Dec. 2015), T. Iwata and J. H. Cheon, Eds., vol. 9453 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 535–560.

[51] Dinur, I., Morawiecki, P., Pieprzyk, J., Srebrny, M., and Straus, M. Cube attacks and cube-attack-like cryptanalysis on the round-reduced Keccak sponge function. In Advances in Cryptology – EUROCRYPT 2015, Part I (Apr. 2015), E. Oswald and M. Fischlin, Eds., vol. 9056 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 733–761.

[52] Dinur, I., and Shamir, A. Cube attacks on tweakable black box polynomials. In Advances in Cryptology – EUROCRYPT 2009 (Apr. 2009), A. Joux, Ed., vol. 5479 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 278–299.

[53] Dinur, I., and Shamir, A. Breaking Grain-128 with dynamic cube attacks. In Fast Software Encryption – FSE 2011 (Feb. 2011), A. Joux, Ed., vol. 6733 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 167–187.

[54] Dobraunig, C., Eichlseder, M., Grassi, L., Lallemand, V., Leander, G., List, E., Mendel, F., and Rechberger, C. Rasta: A cipher with low ANDdepth and few ANDs per bit. In Advances in Cryptology – CRYPTO 2018, Part I (Aug. 2018), H. Shacham and A. Boldyreva, Eds., vol. 10991 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 662–692.

[55] Dobraunig, C., Eichlseder, M., Mendel, F., and Schläffer, M. Ascon v1.2. Submission to Round 3 of the CAESAR competition, 2016. https://competitions.cr.yp.to/round3/asconv12.pdf.

[56] Dodunekov, S. M., and Landgev, I. On near-MDS codes. J. Geom. 54, 1 (1995), 30–43.

[57] Dong, X., Li, Z., Wang, X., and Qin, L. Cube-like attack on round-reduced initialization of Ketje Sr. IACR Trans. Symmetric Cryptol. 2017, 1 (2017), 259–280.

[58] Duong, T., and Rizzo, J. Here come the ⊕ ninjas. Ekoparty, 2011.

[59] Duval, S., Lallemand, V., and Rotella, Y. Cryptanalysis of the FLIP family of stream ciphers. In Advances in Cryptology – CRYPTO 2016, Part I (Aug. 2016), M. Robshaw and J. Katz, Eds., vol. 9814 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 457–475.

[60] Duval, S., and Leurent, G. MDS matrices with lightweight circuits. IACR Trans. Symmetric Cryptol. 2018, 2 (2018), 48–78.

[61] ETSI/SAGE Specification. Specification of the 3GPP confidentiality and integrity algorithms UEA2 & UIA2. Document 2: SNOW 3G Specification Version 1.1, 2006.

[62] ETSI/SAGE Specification. Specification of the 3GPP Confidentiality and Integrity Algorithms 128-EEA3 & 128-EIA3. Document 2: ZUC Specification Version 1.6, 2011.

[63] Even, S., Goldreich, O., and Lempel, A. A randomized protocol for signing contracts. Commun. ACM 28, 6 (1985), 637–647.

[64] Faugère, J., and Joux, A. Algebraic cryptanalysis of hidden field equation (HFE) cryptosystems using Gröbner bases. In Advances in Cryptology – CRYPTO 2003 (Aug. 2003), D. Boneh, Ed., vol. 2729 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 44–60.

[65] Faugère, J.-C. A new efficient algorithm for computing Gröbner bases (F4). Journal of Pure and Applied Algebra 139, 1-3 (1999), 61–88.

[66] Faugère, J.-C. A new efficient algorithm for computing Gröbner bases without reduction to zero (F5). In Proceedings of the 2002 International Symposium on Symbolic and Algebraic Computation (New York, NY, USA, 2002), ISSAC ’02, ACM, pp. 75–83.

[67] Feistel, H. Cryptography and Computer Privacy. Scientific American 228, 5 (1973), 15–23.

[68] Feng, X., Liu, J., Zhou, Z., Wu, C., and Feng, D. A byte-based guess and determine attack on SOSEMANUK. In Advances in Cryptology – ASIACRYPT 2010 (Dec. 2010), M. Abe, Ed., vol. 6477 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 146–157.

[69] Fluhrer, S. R., Mantin, I., and Shamir, A. Weaknesses in the key scheduling algorithm of RC4. In SAC 2001: 8th Annual International Workshop on Selected Areas in Cryptography (Aug. 2001), S. Vaudenay and A. M. Youssef, Eds., vol. 2259 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 1–24.

[70] Fouque, P., and Vannet, T. Improving key recovery to 784 and 799 rounds of Trivium using optimized cube attacks. In Fast Software Encryption – FSE 2013 (Mar. 2014), S. Moriai, Ed., vol. 8424 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 502–517.

[71] Fuhr, T., and Paillier, P. Decryptable searchable encryption. In ProvSec 2007: 1st International Conference on Provable Security (Nov. 2007), W. Susilo, J. K. Liu, and Y. Mu, Eds., vol. 4784 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 228–236.

[72] Gerault, D., Minier, M., and Solnon, C. Constraint programming models for chosen key differential cryptanalysis. In Principles and Practice of Constraint Programming - CP 2016 (2016), M. Rueher, Ed., vol. 9892 of Lecture Notes in Computer Science, Springer, pp. 584–601.

[73] Goldwasser, S., and Micali, S. Probabilistic encryption. Journal of Computer and System Sciences 28, 2 (1984), 270–299.

[74] Golomb, S. W. Shift Register Sequences. Holden-Day Series in Information Systems. San Francisco: Holden-Day, 1967.

[75] Grassi, L., Rechberger, C., Rotaru, D., Scholl, P., and Smart, N. P. MPC-friendly symmetric key primitives. In ACM CCS 2016: 23rd Conference on Computer and Communications Security (Oct. 2016), E. R. Weippl, S. Katzenbeisser, C. Kruegel, A. C. Myers, and S. Halevi, Eds., ACM Press, pp. 430–443.

[76] Gu, Z., Rothberg, E., and Bixby, R. Gurobi optimizer. http://www.gurobi.com/.

[77] Guo, J., Peyrin, T., and Poschmann, A. The PHOTON family of lightweight hash functions. In Advances in Cryptology – CRYPTO 2011 (Aug. 2011), P. Rogaway, Ed., vol. 6841 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 222–239.

[78] Guo, J., Peyrin, T., Poschmann, A., and Robshaw, M. J. B. The LED block cipher. In Cryptographic Hardware and Embedded Systems – CHES 2011 (Sept. / Oct. 2011), B. Preneel and T. Takagi, Eds., vol. 6917 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 326–341.

[79] Hawkes, P., and Rose, G. G. Exploiting multiples of the connection polynomial in word-oriented stream ciphers. In Advances in Cryptology – ASIACRYPT 2000 (Dec. 2000), T. Okamoto, Ed., vol. 1976 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 303–316.

[80] Hawkes, P., and Rose, G. G. Guess-and-determine attacks on SNOW. In SAC 2002: 9th Annual International Workshop on Selected Areas in Cryptography (Aug. 2003), K. Nyberg and H. M. Heys, Eds., vol. 2595 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 37–46.

[81] Hell, M., Johansson, T., Maximov, A., and Meier, W. The Grain family of stream ciphers. In New Stream Cipher Designs - The eSTREAM Finalists. 2008, pp. 179–190.

[82] Huang, S., Wang, X., Xu, G., Wang, M., and Zhao, J. Conditional cube attack on reduced-round Keccak sponge function. In Advances in Cryptology – EUROCRYPT 2017, Part II (Apr. / May 2017), J. Coron and J. B. Nielsen, Eds., vol. 10211 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 259–288.

[83] Jakobsen, T., and Knudsen, L. R. The interpolation attack on block ciphers. In Fast Software Encryption – FSE’97 (Jan. 1997), E. Biham, Ed., vol. 1267 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 28–40.

[84] Jakobsen, T., and Knudsen, L. R. Attacks on block ciphers of low algebraic degree. J. Cryptology 14, 3 (2001), 197–210.

[85] Jean, J., Nikolic, I., Peyrin, T., and Seurin, Y. Deoxys v1.41. Submission to Round 3 of the CAESAR competition, 2016. https://competitions.cr.yp.to/round3/deoxysv141.pdf.

[86] Jean, J., Peyrin, T., Sim, S. M., and Tourteaux, J. Optimizing implementations of lightweight building blocks. IACR Trans. Symmetric Cryptol. 2017, 4 (2017), 130–168.

[87] Khoo, K., Peyrin, T., Poschmann, A. Y., and Yap, H. FOAM: searching for hardware-optimal SPN structures and components with a fair comparison. In Cryptographic Hardware and Embedded Systems – CHES 2014 (Sept. 2014), L. Batina and M. Robshaw, Eds., vol. 8731 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 433–450.

[88] Knellwolf, S., Meier, W., and Naya-Plasencia, M. Conditional differential cryptanalysis of NLFSR-based cryptosystems. In Advances in Cryptology – ASIACRYPT 2010 (Dec. 2010), M. Abe, Ed., vol. 6477 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 130–145.

[89] Knudsen, L. R. Truncated and higher order differentials. In Fast Software Encryption – FSE’94 (Dec. 1995), B. Preneel, Ed., vol. 1008 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 196–211.

[90] Knudsen, L. R. DEAL - A 128-bit Block Cipher. Tech. rep., Department of Informatics, University of Bergen, Norway, 1998.

[91] Kölbl, S., Leander, G., and Tiessen, T. Observations on the SIMON block cipher family. In Advances in Cryptology – CRYPTO 2015, Part I (Aug. 2015), R. Gennaro and M. J. B. Robshaw, Eds., vol. 9215 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 161–185.

[92] Kranz, T., Leander, G., Stoffelen, K., and Wiemer, F. Shorter linear straight-line programs for MDS matrices. IACR Trans. Symmetric Cryptol. 2017, 4 (2017), 188–211.

[93] Krovetz, T., and Rogaway, P. OCB v1.1. Submission to Round 3 of the CAESAR competition, 2016. https://competitions.cr.yp.to/round3/ocbv11.pdf.

[94] Lai, X. Higher order derivatives and differential cryptanalysis. In Communications and Cryptography (1994), vol. 276 of The Springer International Series in Engineering and Computer Science, pp. 227–233.

[95] Langford, S. K., and Hellman, M. E. Differential-linear cryptanalysis. In Advances in Cryptology – CRYPTO’94 (Aug. 1994), Y. Desmedt, Ed., vol. 839 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 17–25.

[96] Li, C., and Preneel, B. Improved interpolation attacks on cryptographic primitives of low algebraic degree. In SAC 2019: 26th Annual International Workshop on Selected Areas in Cryptography (Aug. 2019), K. G. Paterson and D. Stebila, Eds., vol. 11959 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 171–193.

[97] Li, C., and Wang, Q. Design of lightweight linear diffusion layers from near-MDS matrices. IACR Trans. Symmetric Cryptol. 2017, 1 (2017), 129–155.

[98] Li, S., Sun, S., Li, C., Wei, Z., and Hu, L. Constructing low-latency involutory MDS matrices with lightweight circuits. IACR Trans. Symmetric Cryptol. 2019, 1 (2019), 84–117.

[99] Li, Y., and Wang, M. On the construction of lightweight circulant involutory MDS matrices. In Fast Software Encryption – FSE 2016 (Mar. 2016), T. Peyrin, Ed., vol. 9783 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 121–139.

[100] Li, Z., Bi, W., Dong, X., and Wang, X. Improved conditional cube attacks on Keccak keyed modes with MILP method. In Advances in Cryptology – ASIACRYPT 2017, Part I (2017), T. Takagi and T. Peyrin, Eds., vol. 10624 of Lecture Notes in Computer Science, Springer, pp. 99–127.

[101] Li, Z., Dong, X., and Wang, X. Conditional cube attack on round-reduced ASCON. IACR Trans. Symmetric Cryptol. 2017, 1 (2017), 175–202.

[102] Lidl, R., and Niederreiter, H. Finite Fields, vol. 20. Cambridge University Press, 1997.

[103] Liu, M. Degree evaluation of NFSR-based cryptosystems. In Advances in Cryptology – CRYPTO 2017, Part III (Aug. 2017), J. Katz and H. Shacham, Eds., vol. 10403 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 227–249.

[104] Liu, M., and Sim, S. M. Lightweight MDS generalized circulant matrices. In Fast Software Encryption – FSE 2016 (Mar. 2016), T. Peyrin, Ed., vol. 9783 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 101–120.

[105] Liu, M., Yang, J., Wang, W., and Lin, D. Correlation cube attacks: From weak-key distinguisher to key recovery. In Advances in Cryptology – EUROCRYPT 2018, Part II (Apr. / May 2018), J. B. Nielsen and V. Rijmen, Eds., vol. 10821 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 715–744.

[106] Liu, Y. Techniques for Block Cipher Cryptanalysis. PhD thesis, Katholieke Universiteit Leuven, Belgium, 2018.

[107] MacWilliams, F., and Sloane, N. The Theory of Error-Correcting Codes. North-Holland, 1977.

[108] Matsui, M. Linear cryptanalysis method for DES cipher. In Advances in Cryptology – EUROCRYPT’93 (May 1994), T. Helleseth, Ed., vol. 765 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 386–397.

[109] McGrew, D. A., and Viega, J. The security and performance of the Galois/Counter Mode (GCM) of operation. In Progress in Cryptology - INDOCRYPT 2004: 5th International Conference in Cryptology in India (Dec. 2004), A. Canteaut and K. Viswanathan, Eds., vol. 3348 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 343–355.

[110] Méaux, P., Journault, A., Standaert, F., and Carlet, C. Towards stream ciphers for efficient FHE with low-noise ciphertexts. In Advances in Cryptology – EUROCRYPT 2016, Part I (May 2016), M. Fischlin and J.-S. Coron, Eds., vol. 9665 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 311–343.

[111] Meier, W., and Staffelbach, O. Fast correlation attacks on certain stream ciphers. J. Cryptology 1, 3 (1989), 159–176.

[112] Menezes, A. J., Van Oorschot, P. C., and Vanstone, S. A. Handbook of Applied Cryptography. CRC Press, 1996.

[113] Mennink, B. Provable Security of Cryptographic Hash Functions. PhD thesis, Katholieke Universiteit Leuven, Belgium, 2013.

[114] Mouha, N. Automated Techniques for Hash Function and Block Cipher Cryptanalysis. PhD thesis, Katholieke Universiteit Leuven, Belgium, 2012.

[115] Mouha, N., Wang, Q., Gu, D., and Preneel, B. Differential and linear cryptanalysis using mixed-integer linear programming. In Information Security and Cryptology - Inscrypt 2011 (2011), C. Wu, M. Yung, and D. Lin, Eds., vol. 7537 of Lecture Notes in Computer Science, Springer, pp. 57–76.

[116] Mroczkowski, P., and Szmidt, J. The cube attack on stream cipher Trivium and quadraticity tests. Fundam. Inform. 114, 3-4 (2012), 309–318.

[117] National Institute of Standards and Technology. Data encryption standard (DES). FIPS Publication 46-3, October 1999.

[118] NIST Lightweight Cryptography Standardization. Lightweight cryptography: Project overview. https://csrc.nist.gov/Projects/Lightweight-Cryptography, 2019.

[119] Perrin, L. Cryptanalysis, Reverse-Engineering and Design of Symmetric Cryptographic Algorithms. PhD thesis, University of Luxembourg, 2017.

[120] Prud’homme, C., Fages, J.-G., and Lorca, X. Choco Documentation. TASC - LS2N CNRS UMR 6241, COSLING S.A.S., 2017.

[121] Rabin, M. O. How to exchange secrets by oblivious transfer. Technical Memo TR-81 (1981).

[122] Rescorla, E. The Transport Layer Security (TLS) Protocol Version 1.3. RFC 8446, Aug. 2018.

[123] Rescorla, E., and Dierks, T. The Transport Layer Security (TLS) Protocol Version 1.2. RFC 5246, Aug. 2008.

[124] Rivest, R. L. The RC4 encryption algorithm, 1992.

[125] Rogaway, P. Authenticated-encryption with associated-data. In ACM CCS 2002: 9th Conference on Computer and Communications Security (Nov. 2002), V. Atluri, Ed., ACM Press, pp. 98–107.

[126] Ronen, E., Shamir, A., Weingarten, A., and O’Flynn, C. IoT goes nuclear: Creating a ZigBee chain reaction. In 2017 IEEE Symposium on Security and Privacy (May 2017), IEEE Computer Society Press, pp. 195–212.

[127] Sasaki, Y., and Todo, Y. New impossible differential search tool from design and cryptanalysis aspects - revealing structural properties of several ciphers. In Advances in Cryptology – EUROCRYPT 2017, Part III (Apr. / May 2017), J. Coron and J. B. Nielsen, Eds., vol. 10212 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 185–215.

[128] Selmer, E. Linear Recurrence Relations over Finite Fields. Department of Mathematics, University of Bergen, 1966.

[129] Shannon, C. E. Communication theory of secrecy systems. Bell System Technical Journal 28, 4 (1949), 656–715.

[130] Sheffer, Y., Holz, R., and Saint-Andre, P. Summarizing Known Attacks on Transport Layer Security (TLS) and Datagram TLS (DTLS). RFC 7457, Feb. 2015.

[131] Sherry, J., Lan, C., Popa, R. A., and Ratnasamy, S. Blindbox: Deep packet inspection over encrypted traffic. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, SIGCOMM 2015 (2015), pp. 213–226.

[132] Shi, D., Sun, S., Derbez, P., Todo, Y., Sun, B., and Hu, L. Programming the Demirci-Selçuk meet-in-the-middle attack with constraints. In Advances in Cryptology – ASIACRYPT 2018, Part II (Dec. 2018), T. Peyrin and S. Galbraith, Eds., vol. 11273 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 3–34.

[133] Shi, D., Sun, S., Sasaki, Y., Li, C., and Hu, L. Correlation of quadratic boolean functions: Cryptanalysis of all versions of full MORUS. In Advances in Cryptology – CRYPTO 2019, Part II (Aug. 2019), A. Boldyreva and D. Micciancio, Eds., vol. 11693 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 180–209.

[134] Sim, S. M., Khoo, K., Oggier, F. E., and Peyrin, T. Lightweight MDS involution matrices. In Fast Software Encryption – FSE 2015 (Mar. 2015), G. Leander, Ed., vol. 9054 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 471–493.

[135] Soos, M., Nohl, K., and Castelluccia, C. Extending SAT solvers to cryptographic problems. In Theory and Applications of Satisfiability Testing - SAT 2009 (2009), O. Kullmann, Ed., vol. 5584 of Lecture Notes in Computer Science, Springer, pp. 244–257.

[136] Stoffelen, K. Optimizing S-box implementations for several criteria using SAT solvers. In Fast Software Encryption - 23rd International Conference, FSE 2016, Bochum, Germany, March 20-23, 2016, Revised Selected Papers (Mar. 2016), T. Peyrin, Ed., vol. 9783 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 140–160.

[137] Sun, B., Qu, L., and Li, C. New cryptanalysis of block ciphers with low algebraic degree. In Fast Software Encryption – FSE 2009 (Feb. 2009), O. Dunkelman, Ed., vol. 5665 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 180–192.

[138] Sun, S., Gerault, D., Lafourcade, P., Yang, Q., Todo, Y., Qiao, K., and Hu, L. Analysis of AES, SKINNY, and others with constraint programming. IACR Trans. Symmetric Cryptol. 2017, 1 (2017), 281–306.

[139] Sun, S., Hu, L., Wang, M., Wang, P., Qiao, K., Ma, X., Shi, D., Song, L., and Fu, K. Towards finding the best characteristics of some bit-oriented block ciphers and automatic enumeration of (related-key) differential and linear characteristics with predefined properties. Cryptology ePrint Archive, Report 2014/747, 2014. https://eprint.iacr.org/2014/747.

[140] Sun, S., Hu, L., Wang, P., Qiao, K., Ma, X., and Song, L. Automatic security evaluation and (related-key) differential characteristic search: Application to SIMON, PRESENT, LBlock, DES(L) and other bit-oriented block ciphers. In Advances in Cryptology – ASIACRYPT 2014, Part I (Dec. 2014), P. Sarkar and T. Iwata, Eds., vol. 8873 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 158–178.

[141] Todo, Y. Integral cryptanalysis on full MISTY1. In Advances in Cryptology – CRYPTO 2015, Part I (Aug. 2015), R. Gennaro and M. J. B. Robshaw, Eds., vol. 9215 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 413–432.

[142] Todo, Y. Structural evaluation by generalized integral property. In Advances in Cryptology – EUROCRYPT 2015, Part I (Apr. 2015), E. Oswald and M. Fischlin, Eds., vol. 9056 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 287–314.

[143] Todo, Y. Division property: Efficient method to estimate upper bound of algebraic degree. In Paradigms in Cryptology - Mycrypt 2016 (2016), R. C. Phan and M. Yung, Eds., vol. 10311 of Lecture Notes in Computer Science, Springer, pp. 553–571.

[144] Todo, Y., Isobe, T., Hao, Y., and Meier, W. Cube attacks on non-blackbox polynomials based on division property. In Advances in Cryptology – CRYPTO 2017, Part III (Aug. 2017), J. Katz and H. Shacham, Eds., vol. 10403 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 250–279.

[145] Todo, Y., Isobe, T., Hao, Y., and Meier, W. Cube attacks on non-blackbox polynomials based on division property. IEEE Trans. Computers 67, 12 (2018), 1720–1736.

[146] Vanhoef, M., and Piessens, F. All your biases belong to us: Breaking RC4 in WPA-TKIP and TLS. In USENIX Security 2015: 24th USENIX Security Symposium (Aug. 2015), J. Jung and T. Holz, Eds., USENIX Association, pp. 97–112.

[147] Vaudenay, S. Security flaws induced by CBC padding - applications to SSL, IPSEC, WTLS ... In Advances in Cryptology – EUROCRYPT 2002 (Apr. / May 2002), L. R. Knudsen, Ed., vol. 2332 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 534–546.

[148] Vielhaber, M. Breaking ONE.FIVIUM by AIDA an algebraic IV differential attack. IACR Cryptology ePrint Archive 2007 (2007), 413.

[149] Viswanath, G., and Rajan, B. S. A matrix characterization of near-MDS codes. Ars Comb. 79 (2006), 289–294.

[150] Wagner, D. A. The boomerang attack. In Fast Software Encryption – FSE’99 (Mar. 1999), L. R. Knudsen, Ed., vol. 1636 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 156–170.

[151] Wang, Q., Hao, Y., Todo, Y., Li, C., Isobe, T., and Meier, W. Improved division property based cube attacks exploiting algebraic properties of superpoly. In Advances in Cryptology – CRYPTO 2018, Part I (Aug. 2018), H. Shacham and A. Boldyreva, Eds., vol. 10991 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 275–305.

[152] Wang, S., Hu, B., Guan, J., Zhang, K., and Shi, T. MILP-aided method of searching division property using three subsets and applications. In Advances in Cryptology – ASIACRYPT 2019, Part III (Dec. 2019), S. D. Galbraith and S. Moriai, Eds., vol. 11923 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 398–427.

[153] Wang, X., and Yu, H. How to break MD5 and other hash functions. In Advances in Cryptology – EUROCRYPT 2005 (May 2005), R. Cramer, Ed., vol. 3494 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 19–35.

[154] Wang, X., Yu, H., and Yin, Y. L. Efficient collision search attacks on SHA-0. In Advances in Cryptology – CRYPTO 2005 (Aug. 2005), V. Shoup, Ed., vol. 3621 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 1–16.

[155] Wegman, M. N., and Carter, L. New hash functions and their use in authentication and set equality. J. Comput. Syst. Sci. 22, 3 (1981), 265–279.

[156] Whiting, D., Housley, R., and Ferguson, N. Counter with CBC-MAC (CCM). RFC 3610, Sept. 2003.

[157] Wiener, M. J. The full cost of cryptanalytic attacks. J. Cryptology 17 (2004), 105–124.

[158] Wikipedia contributors. List of data breaches – Wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=List_of_data_breaches&oldid=903986256, 2019.

[159] Wu, H. A new stream cipher HC-256. In Fast Software Encryption – FSE 2004 (Feb. 2004), B. K. Roy and W. Meier, Eds., vol. 3017 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 226–244.

[160] Wu, H. The stream cipher HC-128. In New Stream Cipher Designs - The eSTREAM Finalists. 2008, pp. 39–47.

[161] Wu, H. Acorn v3. Submission to Round 3 of the CAESAR competition, 2016. https://competitions.cr.yp.to/round3/acornv3.pdf.

[162] Wu, H., and Huang, T. The authenticated cipher MORUS (v2). Submission to Round 3 of the CAESAR competition, 2016. https://competitions.cr.yp.to/round3/morusv2.pdf.

[163] Wu, H., and Preneel, B. AEGIS: A fast authenticated encryption algorithm. In SAC 2013: 20th Annual International Workshop on Selected Areas in Cryptography (Aug. 2014), T. Lange, K. Lauter, and P. Lisonek, Eds., vol. 8282 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 185–201.

[164] Xiang, Z., Zhang, W., Bao, Z., and Lin, D. Applying MILP method to searching integral distinguishers based on division property for 6 lightweight block ciphers. In Advances in Cryptology – ASIACRYPT 2016, Part I (Dec. 2016), J. H. Cheon and T. Takagi, Eds., vol. 10031 of Lecture Notes in Computer Science, Springer, Heidelberg, pp. 648–678.

[165] Yang, J., Liu, M., and Lin, D. Cube cryptanalysis of round-reducedACORN. In ISC 2019: 22nd International Conference on InformationSecurity (Sept. 2019), Z. Lin, C. Papamanthou, and M. Polychronakis,Eds., vol. 11723 of Lecture Notes in Computer Science, Springer,Heidelberg, pp. 44–64.

[166] Yao, A. C. How to generate and exchange secrets (extended abstract).In 27th Annual Symposium on Foundations of Computer Science (Oct.1986), IEEE Computer Society Press, pp. 162–167.

[167] Ye, C.-D., and Tian, T. Revisit division property based cube attacks:Key-recovery or distinguishing attacks? IACR Transactions on SymmetricCryptology 2019, 3 (2019), 81–102.

Part II

Designs


Chapter 6

Design of Lightweight Linear Diffusion Layers from Near-MDS Matrices

Publication data

Chaoyun Li, Qingju Wang: Design of Lightweight Linear Diffusion Layers from Near-MDS Matrices. IACR Transactions on Symmetric Cryptology 2017(1): 129-155, 2017

Contributions

Principal author. The security analysis is due to Qingju Wang.


Design of Lightweight Linear Diffusion Layers from Near-MDS Matrices

Chaoyun Li¹ and Qingju Wang¹,²,³,*

¹ imec-COSIC, Dept. Electrical Engineering (ESAT), KU Leuven, Leuven, [email protected]

² DTU Compute, Technical University of Denmark, Lyngby, Denmark

³ Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, [email protected]

Abstract. Near-MDS matrices provide better trade-offs between security and efficiency compared to constructions based on MDS matrices, which are favored for hardware-oriented designs. We present new designs of lightweight linear diffusion layers by constructing lightweight near-MDS matrices. Firstly, generic n × n near-MDS circulant matrices are found for 5 ≤ n ≤ 9. Secondly, the implementation cost of instantiations of the generic near-MDS matrices is examined. Surprisingly, for n = 7, 8, it turns out that some proposed near-MDS circulant matrices of order n have the lowest XOR count among all near-MDS matrices of the same order. Further, for n = 5, 6, we present near-MDS matrices of order n having the lowest XOR count as well. The proposed matrices, together with previous constructions of order less than five, lead to solutions of n × n near-MDS matrices with the lowest XOR count over finite fields $\mathbb{F}_{2^m}$ for 2 ≤ n ≤ 8 and 4 ≤ m ≤ 2048. Moreover, we present some involutory near-MDS matrices of order 8 constructed from Hadamard matrices. Lastly, the security of the proposed linear layers is studied by calculating lower bounds on the number of active S-boxes. It is shown that our linear layers with a well-chosen nonlinear layer can provide sufficient security against differential and linear cryptanalysis.

Keywords: lightweight cryptography, diffusion layer, near-MDS matrix, branch number

1 Introduction

Symmetric-key cryptographic primitives, including block ciphers, stream ciphers and hash functions, form the backbone of secure communication in modern society. Confusion and diffusion, introduced by Shannon [Sha49], are twin fundamental principles widely used in the design of symmetric-key primitives. Most modern block ciphers and hash functions have well-designed confusion and diffusion layers. Among many design methods, substitution-permutation networks (SPN) have been popular in the design of block ciphers and hash functions. The best understood structure of an SPN round function consists of a brick layer of local nonlinear permutations (usually S-boxes) followed by a multiplication with a diffusion matrix over a finite field (linear diffusion). Diffusion layers play a crucial role in providing resistance against the two most powerful statistical attacks: differential cryptanalysis [BS91] and linear cryptanalysis [Mat94].

* Corresponding authors.

In 1994, Vaudenay [Vau95] proposed multipermutations as perfect diffusion layers. It is worth noting that the linear multipermutations are exactly Maximum Distance Separable (MDS) matrices, which are defined from MDS codes [MS77]. AES [DR02], the most prominent example of an SPN, uses an MDS matrix in the MixColumns operation together with the ShiftRows operation to achieve diffusion. In the context of the wide-trail strategy, the branch number of a linear diffusion layer is defined to bound the probabilities of the best differential and linear trails. Furthermore, linear diffusion layers based on MDS matrices have been shown to provide optimal diffusion properties in the wide-trail strategy for any AES-like cipher [DR02].

The development of ubiquitous computing such as the Internet of Things (IoT) brings new security requirements to the fore. This leads to the research area of lightweight cryptography. Recently, the study of lightweight diffusion matrices has been the focus of attention. Many constructions of lightweight MDS and involutory MDS matrices have been proposed [BKL16, CJK15, GR15, JV04, KPPY14, LW16, LS16, SDMO12, SS16, SKOP15]. Note that any element of an MDS matrix over a finite field must be nonzero. Thus MDS matrices are very dense and hence costly in hardware implementation. To further reduce the hardware cost, Guo et al. [GPP11, GPPR11] proposed a novel design approach of recursive (or serial) MDS matrices, which have a substantially lower hardware area at the cost of additional clock cycles [KPPY14, SKOP15]. Notable examples include the block cipher LED [GPPR11], which has low area in hardware, and the hardware-oriented lightweight hash function PHOTON [GPP11], which has been standardized in ISO/IEC 29192-5:2016.

However, MDS and recursive MDS matrices might not offer an optimal trade-off between security and efficiency. Near-MDS matrices have sub-optimal branch numbers, while they require less area than MDS matrices and do not need additional clock cycles. Indeed, some diffusion layers constructed from near-MDS matrices outperform those based on MDS or recursive MDS matrices in terms of the FOAM framework proposed by Khoo et al. [KPPY14]. Recently, near-MDS matrices have been adopted in some lightweight block ciphers, including PRINCE [BCG+12], FIDES [BBK+13], PRIDE [ADK+14], Midori [BBI+15] and MANTIS [BJK+16]. On the one hand, low-power, low-energy or low-latency lightweight symmetric-key primitives are becoming increasingly important, and near-MDS matrices are widely used in the design of dedicated lightweight block ciphers. On the other hand, there is insufficient research on the construction and security properties of near-MDS matrices. These observations motivate us to present novel results on linear diffusion layers constructed from near-MDS matrices.


Related work. First, we briefly introduce some previous work on the construction of lightweight (involutory) MDS matrices. Recursive MDS matrices are an important class of lightweight MDS matrices. The main idea of recursive MDS matrices is to represent an MDS matrix as a power of a very sparse matrix such as a companion matrix. In this way, the MDS matrix can be implemented by iterating the sparse matrix many times. Following this idea, much work has been done to further reduce the hardware cost and improve the performance of recursive MDS matrices [SDMS12, WWW13, AF13, Ber13, AF15]. Another common way is to generate efficient MDS matrices from some special types of matrices. Circulant matrices and their variants such as left-circulant matrices are popular candidates for lightweight MDS matrices [JV04, GR15, LS16, LW16, BKL16]. Involutory MDS matrices are useful in SPN structures as the same circuit can be used for both encryption and decryption. Many lightweight involutory MDS matrices have been constructed from Hadamard [SKOP15, LW16], Cauchy [YMT97, CJK15], Vandermonde [SDMO12] and left-circulant matrices [LS16]. More detailed surveys on the construction of lightweight (involutory) MDS matrices are given in [LS16, BKL16].

Next, we recall some pioneering work on near-MDS matrices, which are the main focus of this paper. In 2008, Choy and Khoo [CK08] defined almost-MDS matrices as diffusion matrices attaining a suboptimal differential branch number. In [KPPY14] Khoo et al. adopted the term almost-MDS matrices to denote diffusion matrices with both suboptimal differential and linear branch numbers. Notice that the almost-MDS matrices in [CK08] match the almost MDS codes introduced in [dB96], whereas the almost-MDS matrices in [KPPY14] correspond to near-MDS codes proposed in [DL95]. To link the matrices and the corresponding linear codes, the matrices yielding both suboptimal differential and linear branch numbers are called near-MDS matrices in this paper (more details in Sect. 2.1). Indeed, the theoretical results on near-MDS codes [DL95] form the basis of the study of near-MDS matrices. In particular, the characterization of near-MDS matrices shown in [VR06] will play an important role in the present paper.

Almost-MDS {0, 1}-matrices of order less than or equal to four have been proposed in [CK08]. Note that these almost-MDS matrices in [CK08] are symmetric and hence near-MDS matrices. Thus near-MDS {0, 1}-matrices of order less than or equal to four are obtained. Further, it is shown that {0, 1}-matrices of order larger than four cannot be almost MDS and hence cannot be near-MDS [CK08]. In practice, the {0, 1}-matrix of order four has been widely employed by many block ciphers including PRINCE [BCG+12], FIDES [BBK+13], PRIDE [ADK+14], Midori [BBI+15] and MANTIS [BJK+16]. In [KPPY14], some instances of near-MDS matrices of order 4 and 8 over $\mathbb{F}_{2^4}$ and $\mathbb{F}_{2^8}$ are presented.

Our contributions. The main purpose of this paper is to construct lightweight near-MDS matrices. Recall that near-MDS matrices of order less than five have been investigated in [CK08], hence we will focus on the matrices of order larger than four. For 5 ≤ n ≤ 9, we present generic constructions of near-MDS matrices of order n over $\mathbb{F}_{2^m}$, where m is a positive integer. Our work gives an answer to an open problem proposed by Daemen and Rijmen [DR09] and Dodunekov [Dod09] in 2008. To this end, we first propose an algorithm for checking the near-MDS property of a matrix and generating the near-MDS conditions for a near-MDS matrix. Circulant matrices are introduced to reduce the search space. By computer search, we obtain some generic n × n near-MDS circulant matrices with optimal parameters for 5 ≤ n ≤ 9.

To illustrate the efficiency of our generic constructions, some instantiations of the generic near-MDS matrices over $\mathbb{F}_{2^4}$ and $\mathbb{F}_{2^8}$ and their XOR counts are provided. A comparison shows that the XOR count of the near-MDS matrices proposed in this paper can be around 65% of the XOR count of the best known lightweight MDS matrices constructed in [BKL16, LS16]. Based on experimental results, for n ≥ 8, we also show some nonexistence results on near-MDS matrices with a small number of distinct entries. This demonstrates that generic near-MDS matrices of order larger than eight have more complicated forms.

We further investigate the total XOR count of the constructed near-MDS matrices in comparison with all other near-MDS matrices over finite fields. First, the exact values of the maximum occurrences of the entries 0 and 1 are presented. Based on these results, for n = 7, 8, it turns out that some instantiations of generic near-MDS circulant matrices of order n proposed in Sect. 3 have the lowest XOR count among all near-MDS matrices of the same order. Similarly, for n = 5, 6, we present some near-MDS matrices of order n having the lowest XOR count. Note that most previous diffusion matrices are optimal among some subclasses rather than the whole space of the matrices with prescribed diffusion properties. It is worth noticing that the near-MDS matrices in Sect. 4 are globally optimal solutions, that is, they have the lowest XOR count among all near-MDS matrices of the same order. Indeed, for 2 ≤ n ≤ 4, it is readily seen that the near-MDS matrices in [CK08] are globally optimal solutions since they are composed of 0 and 1 with maximum occurrences. Thus, for 2 ≤ n ≤ 8, the globally optimal solutions for n × n near-MDS matrices are obtained over finite fields $\mathbb{F}_{2^m}$ for 4 ≤ m ≤ 2048.

We present some results on involutory near-MDS matrices in Sect. 5. First, involutory near-MDS matrices of order 2, 3, 4 are summarized. Then for n > 4 we show that there is no circulant involutory near-MDS matrix over finite fields. Next, the Hadamard matrices over finite fields are introduced and their properties are presented. This leads to our constructions of involutory near-MDS matrices of order 8 from Hadamard matrices.

To exploit near-MDS matrices in the design of linear layers, it is important to investigate the security properties of near-MDS matrices. Following the common strategy, we provide lower bounds on the number of differential and linear active S-boxes for SPN structures using near-MDS matrices and ShiftRows as the diffusion layer. Our results indicate that the linear layers based on near-MDS matrices can provide sufficient security against differential and linear cryptanalysis.

The remainder of this paper is organized as follows. Section 2 introduces some basic concepts and results on near-MDS matrices. In Sect. 3, some generic near-MDS matrices and their instantiations are presented. In Sect. 4, we give some near-MDS matrices with the lowest XOR count among all near-MDS matrices of the same order. Results on involutory near-MDS matrices are given in Sect. 5. A primary security analysis of near-MDS based linear layers is provided in Sect. 6. The final section concludes the paper.

2 Preliminaries

This section presents some background and results on linear diffusion layers, based on which we introduce the definition of near-MDS matrices. We also recall the notion of XOR count to measure the lightweight property of a diffusion matrix.

2.1 Linear diffusion layers

Most block ciphers and hash functions based on substitution-permutation network (SPN) structures have two layers in each round: the S-box layer and the linear diffusion layer. The S-box layer is usually composed of several (not necessarily identical) S-boxes, while the linear diffusion layer is implemented by using matrices over finite fields.

Let $\mathbb{F}_2 = \{0, 1\}$. We denote by $\mathbb{F}_{2^m}$ a finite extension of $\mathbb{F}_2$ and by $\mathbb{F}_{2^m}^n$ the n-dimensional vector space over $\mathbb{F}_{2^m}$, where m and n are positive integers. Indeed, for any linear mapping λ over $\mathbb{F}_{2^m}^n$, there exists a matrix M such that λ(v) = M · v. Hereafter, we represent a linear diffusion layer by a diffusion matrix.

Given a vector $v = (v_0, v_1, \cdots, v_{n-1})^T \in \mathbb{F}_{2^m}^n$, its bundle weight $\mathrm{wt}_b(v)$ is equal to the number of non-zero components of v. The branch numbers of a diffusion matrix can be defined in terms of the bundle weight of vectors.

Definition 1. ([Dae95, DR02]) Let M be an n × n matrix over $\mathbb{F}_{2^m}$. Then the differential branch number of M is defined as

$$B_d(M) = \min_{v \neq 0} \{ \mathrm{wt}_b(v) + \mathrm{wt}_b(Mv) \},$$

and the linear branch number of M is defined as

$$B_l(M) = \min_{v \neq 0} \{ \mathrm{wt}_b(v) + \mathrm{wt}_b(M^T v) \}.$$

Recall that the branch number can be characterized by the minimum distance of linear codes.

Lemma 1. ([DR02]) Let M be an n × n matrix over $\mathbb{F}_{2^m}$. Suppose that C is a [2n, n]-linear code over $\mathbb{F}_{2^m}$ with generator matrix $(I_n \mid M^T)$, where $I_n$ is the identity matrix of order n. Then the differential branch number of M equals the minimum distance of C, i.e., $B_d(M) = d(C)$. Moreover, we have $B_l(M) = d(C^\perp)$, where $C^\perp$ is the dual code of C.


Let C be an [n, k]-linear code. We call C a maximum distance separable (MDS) code if the Singleton bound is attained, i.e., d(C) = n − k + 1 [MS77]. An n × n matrix M is called an MDS matrix if the linear code $C_M$ with generator matrix $(I_n \mid M^T)$ is MDS. An MDS matrix M attains the upper bounds of the branch numbers simultaneously, i.e., $B_d(M) = B_l(M) = d(C_M) = n + 1$ [DR02].

In this paper, we focus on the diffusion matrices which attain the largest branch numbers among non-MDS matrices. Now the definition of a near-MDS matrix can naturally be given in terms of branch numbers.

Definition 2. An n × n matrix M is called a near-MDS matrix if $B_d(M) = B_l(M) = n$.
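Definitions 1 and 2 can be checked by brute force for very small parameters. The following is a minimal sketch, not part of the original paper: the function names are illustrative, the matrix is restricted to {0, 1} entries so that multiplication by M only needs XOR, and field elements of $\mathbb{F}_{2^m}$ are assumed to be encoded as m-bit integers. It evaluates both branch numbers of the 4 × 4 matrix circ(0, 1, 1, 1) over $\mathbb{F}_{2^2}$.

```python
from itertools import product

def bundle_weight(v):
    """Number of non-zero components of the vector v."""
    return sum(1 for c in v if c != 0)

def apply_binary_matrix(M, v):
    """Compute M*v for a matrix M with entries 0 or 1.

    Components of v are field elements encoded as integers, so field
    addition is XOR and no field multiplication is needed.
    """
    out = []
    for row in M:
        acc = 0
        for m_ij, v_j in zip(row, v):
            if m_ij:
                acc ^= v_j
        out.append(acc)
    return out

def branch_number(M, m_bits):
    """min over v != 0 of wt_b(v) + wt_b(M v), by exhaustive search."""
    n = len(M)
    best = None
    for v in product(range(1 << m_bits), repeat=n):
        if any(v):
            w = bundle_weight(v) + bundle_weight(apply_binary_matrix(M, v))
            best = w if best is None else min(best, w)
    return best

def transpose(M):
    return [list(col) for col in zip(*M)]

# circ(0, 1, 1, 1), written out row by row
M4 = [[0, 1, 1, 1],
      [1, 0, 1, 1],
      [1, 1, 0, 1],
      [1, 1, 1, 0]]

b_d = branch_number(M4, 2)             # differential branch number
b_l = branch_number(transpose(M4), 2)  # linear branch number
print(b_d, b_l)
```

Both values equal n = 4, so this matrix satisfies Definition 2. The exhaustive loop visits $2^{mn}$ vectors and is only meant for toy sizes.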

In [DL95], an [n, k] near-MDS code C is defined by the conditions d(C) = n − k and $d(C^\perp) = k$. Then by Lemma 1, for an n × n matrix M with $B_d(M) = B_l(M) = n$, the matrix $(I_n \mid M^T)$ is exactly a generator matrix of a [2n, n, n] near-MDS code. This leads to the following characterization of a near-MDS matrix.

Lemma 2. ([VR06]) Let M be a non-MDS matrix of order n, where n is a positive integer with n ≥ 2. Then M is near-MDS if and only if for any 1 ≤ g ≤ n − 1 each g × (g + 1) and (g + 1) × g submatrix of M has at least one g × g non-singular submatrix.

We conclude this section with a useful property of branch numbers which will be used in the sequel.

Lemma 3. ([LS16]) For any permutation matrices $P_1$ and $P_2$, the two matrices M and $P_1 M P_2$ have the same differential and linear branch numbers.

2.2 XOR count

The hardware implementation efficiency of operations is typically measured by the area required. Note that the diffusion matrix can be implemented only with XOR gates, and this leads to the following definition.

Definition 3. ([KPPY14, SKOP15]) The XOR count of an element $\alpha \in \mathbb{F}_{2^m}$ is the number of XOR operations required to implement the multiplication of α with an arbitrary element $\beta \in \mathbb{F}_{2^m}$.

Given a basis of $\mathbb{F}_{2^m}$, the multiplication by α can be represented by multiplication with a binary matrix A of order m. An obvious upper bound of the XOR count of α is the number of ones in A minus m, and this bound is defined as the exact XOR count in [KPPY14, SKOP15]. It turns out that this bound can be improved in some cases [JPS15, BKL16]. Now we recall the definition of XOR count in terms of the matrices.
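The upper bound just described (number of ones in A minus m) is easy to evaluate. The sketch below is an illustration under stated assumptions, not the authors' tooling: it fixes the polynomial basis 1, x, ..., $x^{m-1}$, encodes field elements and the modulus as integer bitmasks, and uses hypothetical helper names. Because the bound depends on the basis and can be improved [JPS15, BKL16], its output need not coincide with the optimized XOR counts quoted later from [BKL16].

```python
def gf_mul(a, b, modulus, m):
    """Multiply a and b in F_2[x]/(modulus); modulus includes its x^m term."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:      # reduce as soon as the degree reaches m
            a ^= modulus
    return r

def mult_matrix_columns(alpha, modulus, m):
    """Columns of the binary matrix of 'multiply by alpha' in the polynomial basis."""
    return [gf_mul(alpha, 1 << j, modulus, m) for j in range(m)]

def xor_count_upper_bound(alpha, modulus, m):
    """Number of ones in the multiplication matrix minus m."""
    ones = sum(bin(col).count("1") for col in mult_matrix_columns(alpha, modulus, m))
    return ones - m

X = 0b10  # the element alpha = x (a root of the modulus)

bound_f16 = xor_count_upper_bound(X, 0b10011, 4)       # modulus x^4 + x + 1
bound_f256 = xor_count_upper_bound(X, 0b100011011, 8)  # modulus x^8 + x^4 + x^3 + x + 1
bound_one = xor_count_upper_bound(1, 0b10011, 4)       # alpha = 1
print(bound_f16, bound_f256, bound_one)
```

For the trinomial $x^4 + x + 1$ the bound is 1, and for α = 1 it is 0, matching Lemma 4(i) below. For the degree-8 pentanomial the polynomial-basis bound is 3, which illustrates why a different basis or a refined counting is needed to reach smaller values.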

Definition 4. ([BKL16]) An invertible matrix A has XOR count t, denoted by $\mathrm{wt}_\oplus(A) = t$, if t is the minimal number such that A can be written as

$$A = P \prod_{k=1}^{t} (I + E_{i_k, j_k}),$$

where $i_k \neq j_k$ for all k, P is a permutation matrix and $E_{i_k, j_k}$ is a matrix over $\mathbb{F}_2$ with all entries zero except the $(i_k, j_k)$-entry.

Following Definition 4, Beierle et al. [BKL16] further consider the problem of optimizing the XOR count of a given element in finite fields. For $\alpha \in \mathbb{F}_{2^m}$, m(x) is the minimal polynomial of α if m(α) = 0 and α is not a root of any nonzero polynomial in $\mathbb{F}_2[x]$ of lower degree. With the above definitions, some results are shown below.

Lemma 4. ([BKL16]) Let $\alpha \in \mathbb{F}_{2^m}^*$. Then we have:

(i) $\mathrm{wt}_\oplus(\alpha) = 0$ if and only if α = 1, while $\mathrm{wt}_\oplus(\alpha) = 1$ if and only if the minimal polynomial m(x) of α is a trinomial of degree m;

(ii) $\mathrm{wt}_\oplus(\alpha) = \mathrm{wt}_\oplus(\alpha^{-1})$;

(iii) $\mathrm{wt}_\oplus(\alpha^{\pm k}) \le k \cdot \mathrm{wt}_\oplus(\alpha)$ for k ≥ 1.

By Lemma 4(i), if there is no irreducible trinomial of degree m, then $\mathrm{wt}_\oplus(\alpha) \ge 2$ for any $\alpha \in \mathbb{F}_{2^m}^* \setminus \{1\}$. Indeed, in these cases, there exist some $\beta \in \mathbb{F}_{2^m}^*$ and a basis B of $\mathbb{F}_{2^m}$ such that $\mathrm{wt}_\oplus(\beta) = 2$ for all m ≤ 2048 [BKL16]. Notice that there does not exist an irreducible trinomial of degree m if 8 | m [Swa62]. Hence $\mathrm{wt}_\oplus(\alpha) = 2$ is optimal in $\mathbb{F}_{2^8}$, which will be used in the sequel.

3 Lightweight near-MDS circulant matrices

This section presents constructions of near-MDS circulant matrices over $\mathbb{F}_{2^m}$, where m is a positive integer. We propose an algorithm for checking the near-MDS property of a matrix and generating the near-MDS conditions for a near-MDS matrix. Circulant matrices are introduced to reduce the search space. By computer search, we obtain some generic near-MDS circulant matrices with optimal parameters. We also show some nonexistence results on near-MDS matrices with a small number of distinct entries. Finally, some instantiations of the generic near-MDS matrices and their XOR counts are provided.

3.1 Approach to constructing generic near-MDS matrices

This section presents our main approach to constructing near-MDS matrices over $\mathbb{F}_{2^m}$, where m is a positive integer. An algorithm is proposed to check the near-MDS property of a candidate matrix and generate the near-MDS conditions if the matrix is near-MDS.

Main approach. To construct generic near-MDS matrices, the entries of the candidate matrices are supposed to be in the quotient field of $\mathbb{F}_2[x]$. Specifically, we suppose that the matrix contains 0 and nonzero entries in $\langle x \rangle$, where $\langle x \rangle = \{x^k \mid k \in \mathbb{Z}\}$. Based on Lemma 2, we propose an algorithm for checking the near-MDS property and generating the near-MDS conditions via polynomials in $\mathbb{F}_2[x]$. Then one can substitute the indeterminate x with any $\alpha \in \mathbb{F}_{2^m}$ satisfying all the conditions for the matrices to be near-MDS. Consequently, lightweight near-MDS matrices can be obtained by choosing the element α as light as possible.


Checking the near-MDS property. By Lemma 2, a non-MDS matrix M is near-MDS if and only if for any 1 ≤ g ≤ n − 1 each g × (g + 1) and (g + 1) × g submatrix of M has at least one g × g non-singular submatrix. Then, to check the near-MDS property, one needs to compute the determinants of all the g × g submatrices of a given g × (g + 1) or (g + 1) × g submatrix of M. For a matrix composed of entries in $\{0\} \cup \langle x \rangle$, it is readily seen that the determinant of any submatrix is a quotient of two polynomials in $\mathbb{F}_2[x]$. Furthermore, the determinant of any submatrix is nonzero if and only if the numerator of the determinant is nonzero. Hence, for simplicity, we will consider the numerator of the determinant rather than the determinant itself.

After obtaining the numerators of the determinants of all g × g submatrices, it suffices to further check if there is at least one nonzero numerator. We introduce a result to simplify the process. Denote by gcd(f(x), g(x)) the greatest common divisor of two polynomials f(x), g(x) in $\mathbb{F}_2[x]$. By convention, let gcd(f(x), 0) = f(x). For n > 1, we denote

$$\gcd(f_1(x), f_2(x), \cdots, f_n(x)) = \gcd(f_1(x), \gcd(f_2(x), \cdots, \gcd(f_{n-1}(x), f_n(x)) \cdots )).$$

Then the following lemma holds.

Lemma 5. Let $f_1(x), f_2(x), \cdots, f_k(x)$ be k polynomials in $\mathbb{F}_2[x]$, where k is a positive integer. Then there exists at least one nonzero $f_i(x)$ if and only if $\gcd(f_1(x), f_2(x), \cdots, f_k(x)) \neq 0$.

Proof. It is equivalent to prove that $f_1(x) = f_2(x) = \cdots = f_k(x) = 0$ if and only if $\gcd(f_1(x), f_2(x), \cdots, f_k(x)) = 0$, which is trivial.

For any given g × (g + 1) or (g + 1) × g submatrix of M, Lemma 5 implies that it suffices to check the greatest common divisor of the numerators of the determinants of all g × g submatrices. If the greatest common divisor is nonzero, one can decompose it into irreducible factors and collect the factors in a condition set S. Otherwise, the matrix is not near-MDS. The procedure is described in Algorithm 1.

Suppose that for the matrix M the condition set S is output by Algorithm 1. By substituting x with $\alpha \in \mathbb{F}_{2^m}$, the concrete matrix M(α) is near-MDS if and only if α is not a root of any polynomial in the set S.

It should be pointed out that the main approach has been adopted in constructing recursive MDS matrices [WWW13, SDMS12, AF13]. In these works, the authors investigate matrices with entries of the form $\sum_i a_i L^i$, and deduce the MDS conditions by polynomials in L, where L is a sparse nonsingular m × m binary matrix and $a_i \in \mathbb{F}_2$. Recently, this method was exploited by Beierle et al. in [BKL16] to generate generic lightweight MDS matrices with entries in finite fields. These previous works motivate our approach to producing generic near-MDS matrices over finite fields. In this section, by Lemmas 2 and 5, Algorithm 1 is proposed to check the near-MDS property of a matrix and generate the near-MDS conditions for a near-MDS matrix.


Algorithm 1 Check near-MDS property and generate near-MDS conditions

Input: an n × n matrix M with entries in {0} ∪ ⟨x⟩
Output: a condition set S if M is near-MDS and ⊥ otherwise

S ← ∅
for g ∈ [1, n − 1] do                                      ▷ n is the order of M
    for all g × (g + 1) and (g + 1) × g submatrices A of M do
        f(x) ← 0, T ← ∅
        for all g × g submatrices B of A do                ▷ there are (g + 1) submatrices
            Compute the numerator p(x) of the fraction det(B)
            f(x) ← gcd(f(x), p(x))                         ▷ compute the greatest common divisor
        if f(x) = 0 then
            return ⊥                                       ▷ M is not near-MDS
        else
            Compute the set T of the irreducible factors of f(x)
            S ← S ∪ T
return S
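Algorithm 1 can be prototyped in a few dozen lines. The following is a minimal sketch under stated assumptions rather than the authors' implementation: a zero entry is encoded as None and the entry $x^k$ as the integer k, polynomials in $\mathbb{F}_2[x]$ are integer bitmasks (bit i is the coefficient of $x^i$), determinants are expanded over all permutations (adequate only for small n), and factoring uses plain trial division. All function names are illustrative.

```python
from itertools import combinations, permutations

def circ(first_row):
    """Circulant matrix with M[i][j] = a[(j - i) mod n]."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def det_numerator(B):
    """Numerator of det(B) as a bitmask; entries are None (zero) or k (x^k)."""
    exps = set()                          # exponents with coefficient 1
    for perm in permutations(range(len(B))):
        terms = [B[i][perm[i]] for i in range(len(B))]
        if None not in terms:
            exps ^= {sum(terms)}          # signs vanish in characteristic 2
    if not exps:
        return 0
    shift = max(0, -min(exps))            # clear negative exponents only
    return sum(1 << (e + shift) for e in exps)

def poly_divmod(a, b):
    q = 0
    while a.bit_length() >= b.bit_length():
        s = a.bit_length() - b.bit_length()
        q ^= 1 << s
        a ^= b << s
    return q, a

def poly_gcd(a, b):
    while b:
        a, b = b, poly_divmod(a, b)[1]
    return a                              # gcd(f, 0) = f by convention

def irreducible_factors(f):
    """Set of irreducible factors of f, found by trial division."""
    out, d = set(), 2
    while f.bit_length() > 1:
        if 2 * (d.bit_length() - 1) > f.bit_length() - 1:
            out.add(f)                    # no divisor of degree <= deg(f)/2 left
            break
        q, r = poly_divmod(f, d)
        if r == 0:
            out.add(d)
            f = q
        else:
            d += 1
    return out

def near_mds_conditions(M):
    """Condition set S of Algorithm 1, or None if M is not near-MDS."""
    n, S = len(M), set()
    for g in range(1, n):
        for big in combinations(range(n), g + 1):
            for small in combinations(range(n), g):
                for wide in (True, False):    # g x (g+1), then (g+1) x g
                    f = 0
                    for kept in combinations(big, g):
                        rows, cols = (small, kept) if wide else (kept, small)
                        B = [[M[i][j] for j in cols] for i in rows]
                        f = poly_gcd(f, det_numerator(B))
                    if f == 0:
                        return None
                    S |= irreducible_factors(f)
    return S

S5 = near_mds_conditions(circ([None, 1, 0, 0, 0]))       # circ(0, x, 1, 1, 1)
S4 = near_mds_conditions(circ([None, 0, 0, 0]))          # circ(0, 1, 1, 1)
bad5 = near_mds_conditions(circ([None, 0, 0, 0, 0]))     # circ(0, 1, 1, 1, 1)
print(S5, S4, bad5)
```

In this encoding the polynomials x and x + 1 are the integers 0b10 and 0b11. Like Algorithm 1, the sketch presupposes that the input is a non-MDS matrix (Lemma 2), which holds whenever the matrix contains a zero entry.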

3.2 Generic near-MDS circulant matrices

We describe the strategy for searching near-MDS circulant matrices. Some generic near-MDS circulant matrices with optimal parameters are shown. We also present some nonexistence results on near-MDS matrices with a small number of distinct entries.

Circulant matrices. Circulant matrices are widely adopted in the design of diffusion matrices. Following this approach, we will focus on circulant matrices in this section. First, the definition of a circulant matrix is recalled.

Definition 5. An n × n matrix M is circulant if its rows are generated by successive cyclic shifts of its first row. That is, there exist n elements $a_0, a_1, \cdots, a_{n-1}$ such that the (i, j)-entry of M can be represented by $M[i, j] = a_{(j-i) \bmod n}$. We denote the matrix M by $\mathrm{circ}(a_0, a_1, \cdots, a_{n-1})$.
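As a quick illustration of Definition 5, the index rule $M[i, j] = a_{(j-i) \bmod n}$ translates directly into code. This is only a sketch: the helper name circ mirrors the notation of the text, and the entries are kept as symbolic strings purely for display.

```python
def circ(first_row):
    """Build the circulant matrix with M[i][j] = a[(j - i) mod n]."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

M = circ(["0", "x", "1", "1", "1"])
for row in M:
    print(" ".join(row))
```

Each row is the previous row cyclically shifted one position to the right, so the whole matrix is determined by its first row.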

The fact that each row of a circulant matrix is a cyclic shift of the first row enables one to reuse the multiplication circuit to save implementation cost [KPPY14, LS16, SKOP15]. To construct the lightest diffusion matrices, it is natural to consider circulant matrices with the lightest field elements. It seems that the {0, 1}-matrices are the best candidates. Choy and Khoo [CK08] proved the following results on {0, 1}-matrices over finite fields.

Lemma 6. ([CK08]) Let n be a positive integer. For n = 2, 3, 4, the n × n circulant matrices

$$\mathrm{circ}(0, 1, \cdots, 1) = \begin{pmatrix} 0 & 1 & \cdots & 1 \\ 1 & 0 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 0 \end{pmatrix}$$

are near-MDS matrices over any finite field. For n > 4, no {0, 1}-matrix of order n can be near-MDS.

It is readily seen that the matrices proposed in Lemma 6 are optimal near-MDS matrices in terms of XOR count. The 4 × 4 matrix circ(0, 1, 1, 1) has been adopted in several lightweight block ciphers such as PRINCE [BCG+12], FIDES [BBK+13], PRIDE [ADK+14], Midori [BBI+15] and MANTIS [BJK+16].

For n ≥ 5, the nonexistence of near-MDS {0, 1}-matrices leads to the study of circulant matrices with three or more distinct elements, such as matrices with entries in the set {0, 1, γ}, where $\gamma \in \mathbb{F}_{2^m} \setminus \{0, 1\}$. We use the criteria for efficient diffusion matrices proposed by Junod and Vaudenay [JV04] to maximize the occurrences of 0 and 1. Taking g = 1 in Lemma 2 shows that there is at most one 0 in each row and each column of M. Hence, the following result can be obtained.

Proposition 1. Suppose that n ≥ 5 and $\mathrm{circ}(a_0, a_1, \cdots, a_{n-1})$ is near-MDS. Let $N_\delta$ be the number of occurrences of δ in the multiset $\{a_0, a_1, \cdots, a_{n-1}\}$. Then we have

$$N_0 \le 1.$$

Moreover, if $N_0 = 1$, then $N_1 \le n - 2$.

Search strategy. Our main idea is to reduce the search space and explore the most efficient matrices first. By Proposition 1, we always assume that $N_0 = 1$. Lemma 3 indicates that the branch numbers of a circulant matrix are preserved by rotation of the first row. Hence, we will focus on circulant matrices of the form

$$\mathrm{circ}(0, a_1, a_2, \cdots, a_{n-1}), \qquad (1)$$

where $a_i \in \langle x \rangle$ for 1 ≤ i ≤ n − 1 and $\langle x \rangle = \{x^k \mid k \in \mathbb{Z}\}$. To explore the most efficient matrices first, we restrict the nonzero entries $a_i$ of the circulant matrices to elements in the set $\{1, x, x^{-1}\} \subseteq \langle x \rangle$.

We first exhaustively search the matrices with maximal $N_1$. Note that the XOR counts of x and $x^{-1}$ are identical, hence they will be treated equally. More specifically, $N_1$ is initialized as n − 2 and $N_x + N_{x^{-1}}$ as 1. For any candidate matrix, we apply Algorithm 1 to check the near-MDS property and generate the condition set if it is near-MDS. If no near-MDS matrix is found, then $N_x + N_{x^{-1}}$ is increased by one and $N_1$ decreased by one. This exhaustive search process continues until all near-MDS matrices with optimal parameters are found.

Experimental results. We present some experimental results on near-MDS matrices of order n for 5 ≤ n ≤ 9. In Table 1, the parameters $N_0, N_1, N_x, N_{x^{-1}}$ with maximum $N_1$ and minimum $N_x + N_{x^{-1}}$ are listed. We also show some good matrices and the corresponding near-MDS conditions in Table 2. A concrete near-MDS matrix is obtained by substituting x with $\alpha \in \mathbb{F}_{2^m}$ such that α is not a root of any polynomial listed in the Conditions column of Table 2. Moreover, the determinants of the constructed matrices are given. For the complete list, see Table 9 in Appendix B.


Table 1. Optimal parameters with maximum $N_1$ and minimum $N_x + N_{x^{-1}}$

n    N_0   N_1   N_x   N_{x^{-1}}   Number of matrices
5    1     3     1     -            4
6    1     3     2     -            5
6    1     3     1     1            18
7    1     4     1     1            12
8    1     4     1     2            8
8    1     4     2     1            8
9    1     2     2     4            6
9    1     2     4     2            6

For n = 7, we also find near-MDS matrices consisting of three distinct elements {0, 1, x} with $N_1 = 3$ and $N_x = 3$. For instance, the matrix circ(0, x, x, 1, x, 1, 1) is near-MDS under the following conditions:

$$x, \quad x + 1, \quad x^2 + x + 1, \quad x^3 + x + 1, \quad x^3 + x^2 + 1.$$

However, for 8 ≤ n ≤ 10, experimental results show that the three elements {0, 1, x} do not suffice to construct near-MDS circulant matrices of order n. Indeed, this fact holds for all n ≥ 8 and we summarize it in the following theorem, the proof of which is given in Appendix A.

Theorem 1. For any n ≥ 8, there is no near-MDS circulant matrix with three distinct entries {0, 1, x}, where $x \in \mathbb{F}_{2^m} \setminus \{0, 1\}$.

It is worth noting that Theorem 1 partially generalizes the results of Lemma 6. Further, we pose the following conjecture based on experimental results.

Conjecture 1. For any n ≥ 10, there is no near-MDS circulant matrix with four distinct entries $\{0, 1, x, x^{-1}\}$, where $x \in \mathbb{F}_{2^m} \setminus \{0, 1\}$.

3.3 Instantiations of generic near-MDS matrices

Considering the cryptographic applications of diffusion matrices, we focus on $\mathbb{F}_{2^m}$ with m = 4 and 8. It is readily seen that the discussion in this section also applies to any other m. For 5 ≤ n ≤ 8, we list in Table 3 some n × n near-MDS matrices over $\mathbb{F}_{2^4}$ and $\mathbb{F}_{2^8}$ from the generic matrices proposed in Table 2. The minimal polynomial of the nonzero elements of each matrix and the XOR count of the first row are also presented.

Table 2. Examples of generic near-MDS circulant matrices of order 5 ≤ n ≤ 9

n = 5
    Coefficients of the first row: $(0, x, 1, 1, 1)$
    Conditions to be near-MDS: $x$, $x + 1$, $x^2 + x + 1$
    Determinant: $x^5 + x^3 + x + 1$

n = 6
    Coefficients of the first row: $(0, x, 1, 1, 1, x)$
    Conditions to be near-MDS: $x$, $x + 1$, $x^2 + x + 1$
    Determinant: $x^4$

n = 7
    Coefficients of the first row: $(0, x, 1, x^{-1}, 1, 1, 1)$
    Conditions to be near-MDS: $x$, $x + 1$, $x^2 + x + 1$, $x^3 + x + 1$, $x^3 + x^2 + 1$, $x^4 + x^3 + x^2 + x + 1$
    Determinant: $x^7 + x^5 + x^{-1} + x^{-3} + x^{-5} + x^{-7}$

n = 8
    Coefficients of the first row: $(0, x, 1, x, x^{-1}, 1, 1, 1)$
    Conditions to be near-MDS: $x$, $x + 1$, $x^2 + x + 1$, $x^3 + x + 1$, $x^3 + x^2 + 1$, $x^4 + x^3 + x^2 + x + 1$, $x^5 + x^4 + x^3 + x^2 + 1$
    Determinant: $x^{-8}$

n = 9
    Coefficients of the first row: $(0, x, x^{-1}, x, x, x^{-1}, 1, 1, x)$
    Conditions to be near-MDS: $x$, $x + 1$, $x^2 + x + 1$, $x^3 + x + 1$, $x^3 + x^2 + 1$, $x^4 + x + 1$, $x^4 + x^3 + 1$, $x^4 + x^3 + x^2 + x + 1$, $x^5 + x^2 + 1$, $x^5 + x^3 + 1$, $x^5 + x^3 + x^2 + x + 1$, $x^5 + x^4 + x^2 + x + 1$, $x^5 + x^4 + x^3 + x + 1$, $x^5 + x^4 + x^3 + x^2 + 1$, $x^6 + x^5 + x^4 + x^2 + 1$, $x^7 + x^4 + x^3 + x^2 + 1$, $x^7 + x^6 + x^4 + x + 1$, $x^{12} + x^{11} + x^{10} + x^9 + x^8 + x^7 + x^6 + x^2 + 1$
    Determinant: $0$

We first explain how to choose the minimal polynomial of α in the case n = 5. According to Table 2, the matrix circ(0, α, 1, 1, 1) is near-MDS if and only if α is not a root of any of the following polynomials:

$$x, \quad x + 1, \quad x^2 + x + 1.$$

This implies that the minimal polynomial of α can be any irreducible polynomial except for the above three polynomials. For m = 4, one can take the minimal polynomial as $x^4 + x + 1$ or $x^4 + x^3 + 1$ since $\mathrm{wt}_\oplus(\alpha)$ is minimal, i.e., $\mathrm{wt}_\oplus(\alpha) = 1$ in these cases (see Table 3 in [BKL16]). Further, to make the matrix nonsingular, $x^4 + x + 1$ is selected as the minimal polynomial of α since

$$\det(\mathrm{circ}(0, \alpha, 1, 1, 1)) = \alpha^5 + \alpha^3 + \alpha + 1 = (\alpha + 1)(\alpha^4 + \alpha^3 + 1),$$

which vanishes when the minimal polynomial of α is $x^4 + x^3 + 1$.

For m = 8, as shown in Sect. 2.2, $\mathrm{wt}_\oplus(\alpha) = 2$ is the best possible. One of the polynomials attaining the minimal XOR count is $x^8 + x^4 + x^3 + x + 1$ (see Table 7 in [BKL16]). In this way, two efficient near-MDS matrices over $\mathbb{F}_{2^4}$ and $\mathbb{F}_{2^8}$ are constructed respectively from one generic near-MDS matrix in Table 2. The other matrices in Table 3 are generated in the same manner.

To compute the XOR count of a circulant matrix, it is convenient to consider only the XOR count of the first row [KPPY14, LS16, BKL16]. For an $n \times n$ circulant matrix $A$ over $\mathbb{F}_{2^m}$, the XOR count of the first row is

\[ (c_0 + c_1 + \cdots + c_{n-1}) + (z - 1)m, \]

where $c_i$ is the XOR count of the $i$-th entry in the row and $z$ is the number of nonzero elements in the row. For instance, over $\mathbb{F}_{2^4}$ the XOR count of the first row of the matrix $\mathrm{circ}(0, \alpha, 1, 1, 1)$ is $1 + 3 \times 4 = 13$ since $\mathrm{wt}_\oplus(\alpha) = 1$ and $\mathrm{wt}_\oplus(0) = \mathrm{wt}_\oplus(1) = 0$.
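The first-row formula above can be sketched in a few lines of Python. The function name and the convention of marking a zero entry with `None` are assumptions made for this illustration only; the per-entry XOR counts are supplied by the caller and are not computed here.

```python
def first_row_xor_count(row_weights, m):
    """XOR count of one row over F_{2^m}.

    row_weights[i] is None for a zero entry (hypothetical convention),
    otherwise the XOR count wt(c_i) of the i-th entry.
    """
    nonzero = [w for w in row_weights if w is not None]
    z = len(nonzero)                      # number of nonzero entries
    return sum(nonzero) + (z - 1) * m     # (c_0 + ... + c_{n-1}) + (z - 1) m


# circ(0, alpha, 1, 1, 1) over F_{2^4} with wt(alpha) = 1
print(first_row_xor_count([None, 1, 0, 0, 0], 4))
```

With the weights quoted in the text, this reproduces the value 13 for $n = 5$ over $\mathbb{F}_{2^4}$.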

Table 3. Near-MDS circulant matrices of order 5 ≤ n ≤ 8 over the finite fields F_{2^4} and F_{2^8}

Finite field   n   First row                        Minimal polynomial of α     XOR count
F_{2^4}        5   (0, α, 1, 1, 1)                  x^4 + x + 1                 1 + 3 × 4 = 13
F_{2^4}        6   (0, α, 1, 1, 1, α)               x^4 + x + 1                 2 + 4 × 4 = 18
F_{2^4}        7   (0, α, 1, α^{-1}, 1, 1, 1)       x^4 + x + 1                 2 + 5 × 4 = 22
F_{2^4}        8   (0, α, 1, α, α^{-1}, 1, 1, 1)    x^4 + x + 1                 3 + 6 × 4 = 27
F_{2^8}        5   (0, α, 1, 1, 1)                  x^8 + x^4 + x^3 + x + 1     2 + 3 × 8 = 26
F_{2^8}        6   (0, α, 1, 1, 1, α)               x^8 + x^4 + x^3 + x + 1     4 + 4 × 8 = 36
F_{2^8}        7   (0, α, 1, α^{-1}, 1, 1, 1)       x^8 + x^4 + x^3 + x + 1     4 + 5 × 8 = 44
F_{2^8}        8   (0, α, 1, α, α^{-1}, 1, 1, 1)    x^8 + x^4 + x^3 + x + 1     6 + 6 × 8 = 54

Table 4 compares the efficiency of near-MDS matrices proposed in this paper with the best known lightweight MDS matrices constructed in [BKL16, LS16]. It is readily seen that the XOR counts of near-MDS matrices can be around 65% of the XOR counts of MDS ones of the same order. However, since the near-MDS matrices have slower diffusion than the MDS ones, a fair comparison should be carried out within a framework combining the security properties and implementation cost. A notable attempt in this direction is the new comparison metric figure of adversarial merit (FOAM) proposed by Khoo et al. in [KPPY14].
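The "around 65%" figure can be rechecked from the entries of Table 4. The snippet below is only a small sketch; the dictionary layout and the names `TABLE_4` and `xor_count_ratios` are assumptions for illustration, while the numbers are copied from the table.

```python
# (near-MDS XOR count, MDS XOR count) per (field, n), copied from Table 4
TABLE_4 = {
    ("F_2^4", 5): (13, 20),
    ("F_2^4", 6): (18, 32),
    ("F_2^8", 5): (26, 40),
    ("F_2^8", 6): (36, 54),
    ("F_2^8", 7): (44, 64),
    ("F_2^8", 8): (54, 82),
}


def xor_count_ratios(table):
    """Ratio near-MDS / MDS for every row of the table."""
    return {key: near / mds for key, (near, mds) in table.items()}


for key, ratio in xor_count_ratios(TABLE_4).items():
    print(key, f"{100 * ratio:.1f}%")
```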

Table 4. Comparison of XOR counts of near-MDS circulant matrices and known (non-involutory) MDS matrices of order 5 ≤ n ≤ 8 over F_{2^4} and F_{2^8}

     F_{2^4}                    F_{2^8}
n    Near-MDS     MDS           Near-MDS     MDS
     (Table 3)    [LS16]        (Table 3)    [BKL16]
5    13           20            26           40
6    18           32            36           54
7    -            -             44           64
8    -            -             54           82

4 Near-MDS matrices with the lowest XOR count

It is not easy to define an optimal near-MDS matrix in terms of implementation cost, since the cost of the matrix largely depends on the implementation technologies. Among many criteria for efficient diffusion matrices, the XOR count of the matrix is a major feature in various implementation methods [KPPY14, LS16, BKL16]. In this section, we concentrate on the total XOR count of near-MDS matrices over finite fields. First, the exact values of the maximum occurrences of the special entries 0 and 1 are provided. Based on these results, one can prove that for n = 7, 8 the instantiations of generic near-MDS matrices in Sect. 3 have the lowest XOR count among all near-MDS matrices of the same order. Moreover, for n = 5, 6, we also present some near-MDS matrices of order n having the lowest XOR count.

Let $M$ be a near-MDS matrix of order $n$ over $\mathbb{F}_{2^m}$. Denote by $v_0(M)$ the number of entries in $M$ equal to 0 and by $v_1(M)$ the number of entries in $M$ equal to 1. Let $v_0^n$ be the maximum value of $v_0(M)$ over all $M$. Similarly, let $v_1^n$ denote the maximum value of $v_1(M)$ over all $M$. Then, Proposition 1 and the results in Table 1 together show the following result on $v_0^n$ for $5 \le n \le 8$.

Lemma 7. Let $M$ be a near-MDS matrix of order $n$. Then we have $v_0^n = n$ for $5 \le n \le 8$.

Now we can present the upper bounds on $v_1^n$.

Proposition 2. Let $M$ be a near-MDS matrix of order $n$. For $5 \le n \le 8$, the upper bounds of $v_1^n$ are shown in Table 5.

Table 5. The upper bounds of $v_1^n$ for a near-MDS matrix M of order n with 5 ≤ n ≤ 8

n                        5    6    7    8
Upper bound of v_1^n     16   21   28   32

Proof. We only prove the case $n = 8$ since the other cases can be proved in the same manner. Suppose that $M$ is a near-MDS matrix of order 8. Let $k$ be the maximum occurrence of the entry 1 in a row of $M$. To derive the upper bound, we discuss five cases in terms of the value of $k$.

Case 1. Suppose that $k = 8$ and the first row has eight ones. We claim that there are at most two ones in any other row of $M$. Otherwise, assume that there are at least three ones in some row $i$ with $i \neq 1$. For instance, $M$ can be written as

\[
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
* & 1 & * & * & 1 & 1 & * & * \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots
\end{pmatrix}.
\]

It is easy to verify that $M$ has a submatrix

\[
\begin{pmatrix}
1 & 1 & 1 \\
1 & 1 & 1
\end{pmatrix}.
\]

Then by Lemma 2, $M$ is not near-MDS, which is a contradiction. Thus, the claim is correct and we have $v_1(M) \le k + 2(n-1) = 22$.

Case 2. Assume that $k = 7$ and the first row contains seven ones. It follows from Lemma 3 that the near-MDS property is preserved under the permutation of columns of $M$. Without loss of generality, we can always assume that the first row is $(*, 1, 1, 1, 1, 1, 1, 1)$. Then we can claim that there are at most three ones in any other row of $M$. Otherwise, assume that there are at least four ones in some row $i$ with $i \neq 1$. This implies that there are at least three ones among the last seven entries in row $i$, as shown below

\[
\begin{pmatrix}
* & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
* & * & 1 & * & 1 & 1 & 1 & * \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots
\end{pmatrix}.
\]

Similarly, we can derive a contradiction, which indicates the claim is correct. Hence, we have $v_1(M) \le k + 3(n-1) = 28$.

Case 3. Let $k = 6$ and the first row contains six ones. Without loss of generality, we assume that the first row is $(*, *, 1, 1, 1, 1, 1, 1)$. The remaining seven rows are partitioned into two blocks $A$ and $B$, as shown below

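\[
\begin{pmatrix}
\begin{matrix} * & * \end{matrix} & \begin{matrix} 1 & 1 & 1 & 1 & 1 & 1 \end{matrix} \\
A & B
\end{pmatrix}.
\]

Similar to Case 1, there are at most two ones in each row of $B$. Note that the $7 \times 2$ block $A$ cannot contain a submatrix of the form

\[
\begin{pmatrix}
1 & 1 \\
1 & 1 \\
1 & 1
\end{pmatrix}.
\]

This implies that the block $A$ contains at most nine ones. Thus, we have $v_1(M) \le k + 2(n-1) + 9 = 29$.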


Case 4. Let $k = 5$ and the first row contains five ones. Without loss of generality, we assume that the first row is $(*, *, *, 1, 1, 1, 1, 1)$. The remaining rows are partitioned into two blocks, as shown below

\[
\begin{pmatrix}
\begin{matrix} * & * & * \end{matrix} & \begin{matrix} 1 & 1 & 1 & 1 & 1 \end{matrix} \\
C & D
\end{pmatrix}.
\]

Similar to Case 1, there are at most two ones in each row of $D$. Since $M$ is near-MDS, the $7 \times 3$ block $C$ cannot contain a submatrix having one of the two forms:

\[
\begin{pmatrix}
1 & 1 & 1 \\
1 & 1 & 1
\end{pmatrix},
\qquad
\begin{pmatrix}
1 & 1 \\
1 & 1 \\
1 & 1
\end{pmatrix}.
\]

By considering the maximum occurrence of 1 in a row of $C$, one can deduce that $C$ contains at most 13 ones. Then we have $v_1(M) \le k + 2(n-1) + 13 = 32$ for $k = 5$.

Case 5. For $k \le 4$, we have $v_1(M) \le kn \le 32$.

By combining the above five cases, we have $v_1(M) \le 32$ for any $M$. This gives $v_1^8 \le 32$.
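The case analysis repeatedly rules out $2 \times 3$ and $3 \times 2$ all-ones submatrices. The following sketch shows one way to test a pattern of entries for such a block. The function name `has_all_ones_block` and the use of `None` for entries other than 1 are assumptions of this illustration, not notation from the paper.

```python
from itertools import combinations


def has_all_ones_block(pattern, rows, cols):
    """True if `pattern` contains a rows x cols submatrix whose entries all equal 1.

    `pattern` is a list of rows; any entry different from 1 (0, None, a
    symbol for alpha, ...) is treated as "not a one".
    """
    ones = [frozenset(j for j, v in enumerate(row) if v == 1) for row in pattern]
    for chosen in combinations(ones, rows):
        common = frozenset.intersection(*chosen)   # columns that are 1 in every chosen row
        if len(common) >= cols:
            return True
    return False


# Case 1 situation: first row all ones, another row with three ones.
case1 = [[1] * 8, [None, 1, None, None, 1, 1, None, None]] + [[None] * 8 for _ in range(6)]
print(has_all_ones_block(case1, 2, 3))
```

A pattern that passes both `has_all_ones_block(p, 2, 3)` and `has_all_ones_block(p, 3, 2)` is excluded by the argument above; the converse does not hold, since the absence of such blocks alone does not establish the near-MDS property.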

The explicit near-MDS matrices shown in Table 1 give lower bounds for $v_1^n$. These together with Proposition 2 yield the following result.

Corollary 1. We have $v_1^7 = 28$ and $v_1^8 = 32$.

By Corollary 1, one can show that the instantiations of the generic near-MDS matrices proposed in Sect. 3.2 are optimal in terms of XOR counts.

Theorem 2. Let $\mathcal{C}$ be the set of generic near-MDS matrices proposed in Sect. 3.2. For $n = 7, 8$ and $m \ge 4$, if $\alpha$ is a lightest element in $\mathbb{F}_{2^m}$ and $\alpha$ satisfies the near-MDS conditions, then the respective instantiations of the matrices in $\mathcal{C}$ have the lowest XOR count among all near-MDS matrices of the same order.

Proof. Let $M$ be a near-MDS matrix of order $n$. The XOR count of the matrix $M$ can be written as

\[ (n(n-1) - v_0(M))\,m + \sum_{\beta \neq 0,1} \mathrm{wt}_\oplus(\beta), \]

where the sum is over all entries of $M$ not equal to 0 or 1. It follows that

\[ (n(n-1) - v_0(M))\,m + \sum_{\beta \neq 0,1} \mathrm{wt}_\oplus(\beta) \;\ge\; (n(n-1) - v_0^n)\,m + (n^2 - v_0^n - v_1^n) \min_{\gamma \neq 0,1} \mathrm{wt}_\oplus(\gamma). \]

The lower bound is attained if and only if $M$ satisfies the following conditions:

1. both $v_0^n$ and $v_1^n$ are attained;
2. any entry of $M$ not equal to 0 or 1 has the lowest XOR count, i.e., $\min_{\gamma \neq 0,1} \mathrm{wt}_\oplus(\gamma)$.


For $n = 7, 8$, it is easy to see that each instantiation of a matrix in $\mathcal{C}$ of order $n$ satisfies the first condition. If $\alpha$ is a lightest element in $\mathbb{F}_{2^m}$, then so is $\alpha^{-1}$ by Lemma 4(ii). Note that $\alpha$ and $\alpha^{-1}$ are the only entries of $M$ not equal to 0 or 1. This leads to the second condition. Hence, the respective instantiations of the matrices in $\mathcal{C}$ achieve the lower bound of XOR count and have the lowest XOR count among all near-MDS matrices of the same order. Thus, the theorem is proved.
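As a numerical cross-check of the lower bound used in this proof, the sketch below evaluates it for $n = 7, 8$ and compares it with $n$ times the first-row XOR counts of Table 3 (all rows of a circulant matrix cost the same). The function name and argument order are hypothetical; the values of $v_0^n$, $v_1^n$ and the minimal XOR counts (1 for $m = 4$, 2 for $m = 8$) are taken from the text.

```python
def xor_count_lower_bound(n, m, v0_max, v1_max, min_weight):
    """(n(n-1) - v0^n) m + (n^2 - v0^n - v1^n) * min wt, as in the proof of Theorem 2."""
    return (n * (n - 1) - v0_max) * m + (n * n - v0_max - v1_max) * min_weight


# (n, m, v0^n, v1^n, min wt, first-row XOR count from Table 3)
cases = [
    (7, 4, 7, 28, 1, 22),
    (8, 4, 8, 32, 1, 27),
    (7, 8, 7, 28, 2, 44),
    (8, 8, 8, 32, 2, 54),
]
for n, m, v0, v1, w, first_row in cases:
    bound = xor_count_lower_bound(n, m, v0, v1, w)
    print(n, m, bound, n * first_row)
```

In each of the four cases the two printed totals coincide, which is consistent with the statement of Theorem 2.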

By Theorem 2, for near-MDS matrices of order 7 and 8, the minimum XOR count can be achieved by circulant matrices. As circulant matrices have many desirable properties such as efficient serial implementations [LS16], it is interesting to study the existence of near-MDS circulant matrices attaining minimum XOR count for other orders. However, as shown by subsequent results, circulant matrices of order 5 and 6 cannot achieve minimum XOR count. For $n = 5$ and 6, we give further results on $n \times n$ near-MDS matrices with maximum occurrences of entries 0 and 1. Some experimental results are presented.

To determine $v_1^5$ and $v_1^6$, by Proposition 2, it suffices to study the existence of a near-MDS matrix achieving the upper bound in Table 5. We performed an exhaustive search for near-MDS matrices of order 5 and 6 satisfying the following conditions:

– entries from $\{0, 1, x\}$
– $v_1(M)$ taken as the upper bound in Table 5
– $v_0(M) = n$ and the main diagonal consists of zeros

Experimental results give an affirmative answer to the existence of generic near-MDS matrices of order 5 and 6 for which $v_0(M) = n$ and $v_1(M)$ attains the upper bound in Table 5. For instance, the following $5 \times 5$ matrix

\[
\begin{pmatrix}
0 & \alpha & 1 & 1 & 1 \\
1 & 0 & \alpha & 1 & 1 \\
1 & 1 & 0 & \alpha & 1 \\
\alpha & 1 & 1 & 0 & 1 \\
1 & 1 & 1 & 1 & 0
\end{pmatrix}
\tag{2}
\]

is near-MDS for any $\alpha \neq 0, 1$ while the $6 \times 6$ matrix

\[
\begin{pmatrix}
0 & \alpha & \alpha & 1 & 1 & 1 \\
1 & 0 & 1 & \alpha & 1 & 1 \\
1 & 1 & 0 & 1 & \alpha & 1 \\
1 & 1 & \alpha & 0 & 1 & \alpha \\
1 & \alpha & 1 & 1 & 0 & \alpha \\
\alpha & 1 & 1 & 1 & 1 & 0
\end{pmatrix}
\tag{3}
\]

is near-MDS for any $\alpha \neq 0, 1$ with $\alpha^2 + \alpha + 1 \neq 0$. This implies $v_1^5 = 16$ and $v_1^6 = 21$. Since $v_1(M)$ must be a multiple of the order of $M$ if $M$ is circulant, circulant matrices of order 5 and 6 cannot achieve $v_1^5$ and $v_1^6$ respectively. Hence, they cannot attain minimum XOR count. The following corollary summarizes the above results.


Corollary 2. We have $v_1^5 = 16$ and $v_1^6 = 21$. Moreover, circulant matrices of order 5 and 6 cannot attain minimum XOR count.
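The counting part of this argument is easy to replay. The sketch below only counts entries of matrices (2) and (3) and checks the divisibility observation for circulant matrices; it does not verify the near-MDS property itself. The symbol `"a"` standing for the generic element α and the helper name `count_entry` are assumptions of the illustration.

```python
A = "a"  # placeholder for the generic element alpha

MATRIX_2 = [
    [0, A, 1, 1, 1],
    [1, 0, A, 1, 1],
    [1, 1, 0, A, 1],
    [A, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
]

MATRIX_3 = [
    [0, A, A, 1, 1, 1],
    [1, 0, 1, A, 1, 1],
    [1, 1, 0, 1, A, 1],
    [1, 1, A, 0, 1, A],
    [1, A, 1, 1, 0, A],
    [A, 1, 1, 1, 1, 0],
]


def count_entry(matrix, value):
    """Number of entries of `matrix` equal to `value`."""
    return sum(row.count(value) for row in matrix)


for name, mat in (("(2)", MATRIX_2), ("(3)", MATRIX_3)):
    n = len(mat)
    v0, v1 = count_entry(mat, 0), count_entry(mat, 1)
    # In a circulant matrix every row holds the same entries, so v1 is a multiple of n.
    print(name, "v0 =", v0, "v1 =", v1, "v1 divisible by n:", v1 % n == 0)
```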

Consequently, one can show the following result similarly to Theorem 2.

Theorem 3. For $n = 5, 6$ and $m \ge 3$, if $\alpha$ is a lightest element in $\mathbb{F}_{2^m}$ and $\alpha$ satisfies the near-MDS conditions, then the generic near-MDS matrices of order $n$ given by (2) and (3) have instantiations with the lowest XOR count among all near-MDS matrices of the same order over $\mathbb{F}_{2^m}$.

Lightest elements in $\mathbb{F}_{2^m}$. With the aid of Theorems 2 and 3, the problem of constructing near-MDS matrices with lowest XOR count over $\mathbb{F}_{2^m}$ can be reduced to choosing $\alpha$ as the lightest element in $\mathbb{F}_{2^m}$. Recall that the only restriction on $\alpha$ is that it cannot be a root of any polynomial in the corresponding condition set. Now we give a preliminary analysis of the existence of a lightest $\alpha$ satisfying the near-MDS conditions.

Suppose that $m \ge 4$. If there exists an irreducible trinomial of degree $m$, then by Lemma 4 (i) the lightest $\alpha$ is obtained by taking its minimal polynomial as the irreducible trinomial. For example, there is an irreducible trinomial of degree $m$ for every $m \le 7$. However, there exists no irreducible trinomial of degree $m$ for certain $m$, such as $m = 8$. In this case, we recall the following fact (more details can be found in Sect. 3.2 of [BKL16]).

Fact 1 ([BKL16]) For all $m \le 2048$ for which no irreducible trinomial of degree $m$ exists, there is $\gamma \in \mathbb{F}_{2^m}$ having the optimal XOR count, i.e., $\mathrm{wt}_\oplus(\gamma) = 2$. Moreover, the minimal polynomial of $\gamma$ is an irreducible pentanomial of degree $m$.
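For small degrees, the statements about irreducible trinomials can be checked by brute force. The following is a minimal sketch that represents polynomials over $\mathbb{F}_2$ as integer bit masks (bit $i$ is the coefficient of $x^i$) and tests irreducibility by trial division; the function names are hypothetical and the method is only practical for small $m$.

```python
def poly_mod(a, b):
    """Remainder of a divided by b, polynomials over F_2 encoded as bit masks."""
    deg_b = b.bit_length()
    while a.bit_length() >= deg_b:
        a ^= b << (a.bit_length() - deg_b)
    return a


def is_irreducible(f):
    """Trial division by every polynomial of degree 1 .. deg(f) // 2."""
    deg = f.bit_length() - 1
    if deg < 1:
        return False
    for d in range(2, 1 << (deg // 2 + 1)):
        if poly_mod(f, d) == 0:
            return False
    return True


def irreducible_trinomials(m):
    """All irreducible trinomials x^m + x^k + 1 with 0 < k < m."""
    candidates = ((1 << m) | (1 << k) | 1 for k in range(1, m))
    return [f for f in candidates if is_irreducible(f)]


for m in range(2, 9):
    print(m, [bin(f) for f in irreducible_trinomials(m)])
```

For $m = 8$ the list is empty, while the pentanomial $x^8 + x^4 + x^3 + x + 1$ used in Table 3 passes `is_irreducible(0b100011011)`.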

It is readily seen that the lightest element $\gamma$ in Fact 1 satisfies the near-MDS conditions over $\mathbb{F}_{2^m}$ for $8 \le m \le 2048$. For $4 \le m \le 7$, one can verify that there exists an $\alpha$ such that $\mathrm{wt}_\oplus(\alpha) = 1$ and $\alpha$ satisfies the near-MDS conditions. This leads to the following result.

Corollary 3. Let $m$ be a positive integer with $4 \le m \le 2048$. For $n = 5, 6$, the generic near-MDS matrices of order $n$ given by (2) and (3) have instantiations with lowest XOR count over $\mathbb{F}_{2^m}$. For $n = 7, 8$, the matrices of order $n$ in Table 2 have instantiations with lowest XOR count over $\mathbb{F}_{2^m}$.

Notice that the condition set only excludes a small number of elements in a large field $\mathbb{F}_{2^m}$ when $m > 2048$. So it seems that the lightest $\alpha$ satisfying the near-MDS conditions exists for all $m > 2048$. Further study of the existence of $\alpha$ for $m > 2048$ is left as an open problem.

Discussions. A long-standing problem in the study of lightweight diffusion matrices over finite fields is to find the global optimal solutions, i.e., matrices of a given order with prescribed branch numbers and lowest XOR count. Very recently, Sarkar and Syed in [SS16] propose $4 \times 4$ MDS matrices with lowest XOR count over $\mathbb{F}_{2^4}$ and $\mathbb{F}_{2^8}$. However, the construction of global optimal solutions for MDS matrices with other parameters remains an open problem [JV04, BKL16]. Junod and Vaudenay in [JV04] present some exact values of the maximum occurrences of 1 in an MDS matrix. However, maximum occurrence of 1 does not directly ensure the lowest XOR count property. Recently, Beierle et al. in [BKL16] characterize elements in finite fields with lowest XOR count. Some very efficient MDS matrices are proposed based on these optimal elements. However, it is still unknown whether these local optimal solutions can lead to global optimal solutions for MDS matrices.

This section shows that local optimal solutions can lead to global optimal solutions for near-MDS matrices. By Theorems 2 and 3, the near-MDS matrices with lowest XOR count are constructed with the lightest elements in the finite field. Moreover, the explicit near-MDS matrices can be generated systematically from generic matrices rather than by an ad hoc method for a specific finite field. This also enables one to find global optimal solutions for near-MDS matrices over a large number of fields.

It is worth noticing that for $2 \le n \le 4$, the near-MDS matrices given in Lemma 6 are global optimal solutions over any finite field since they are composed of 0 and 1 and attain $v_0^n$ and $v_1^n$ simultaneously. This together with Corollary 3 shows that for $2 \le n \le 8$ the $n \times n$ near-MDS matrices with lowest XOR count are obtained over $\mathbb{F}_{2^m}$ with $4 \le m \le 2048$.

5 Involutory near-MDS matrices

This section presents some results on involutory near-MDS matrices. First, we summarize involutory near-MDS matrices of order 2, 3 and 4. Then for $n > 4$ we give a nonexistence result for circulant involutory near-MDS matrices. Hence, the Hadamard matrices over finite fields are introduced and their properties are provided. This allows us to find involutory near-MDS Hadamard matrices of order 8.

Cases n = 2, 3 and 4. It is easy to verify that the near-MDS matrices of order 2 and 4 given in Lemma 6 are involutory. However, for $n = 3$ the matrix $\mathrm{circ}(0, 1, 1)$ is not involutory. Furthermore, direct computation shows that a near-MDS $\{0, 1\}$-matrix of order 3 cannot be involutory.
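The involution claims for the small circulant matrices can be replayed over $\mathbb{F}_2$ with a few lines of Python. This is a sketch under the assumption that row $i$ of $\mathrm{circ}(c_0, \ldots, c_{n-1})$ is the first row cyclically shifted right by $i$ positions; the helper names are made up for the example.

```python
def circ(first_row):
    """Circulant matrix whose (i, j) entry is first_row[(j - i) mod n]."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def mat_mul_f2(a, b):
    """Matrix product over F_2 (entries are 0 or 1)."""
    n = len(a)
    return [[sum(a[i][k] & b[k][j] for k in range(n)) % 2 for j in range(n)]
            for i in range(n)]


def is_involutory_f2(mat):
    n = len(mat)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    return mat_mul_f2(mat, mat) == identity


for row in ([0, 1], [0, 1, 1], [0, 1, 1, 1]):
    print(row, is_involutory_f2(circ(row)))
```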

To construct generic involutory near-MDS matrices of order 3, we performed an exhaustive search for matrices with elements in the set $\{0, 1, x, x^{-1}, x^2, 1 + x\}$. Our experimental results show that there are no $3 \times 3$ generic involutory near-MDS matrices composed of elements of $\{0, 1, x, x^{-1}, x^2\}$. However, there are 12 generic involutory near-MDS matrices with entries in $\{0, 1, x, 1 + x\}$. For instance, the following matrix

\[
\begin{pmatrix}
0 & 1 & 1 \\
\alpha & 1 + \alpha & \alpha \\
1 + \alpha & 1 + \alpha & \alpha
\end{pmatrix}
\]

is involutory near-MDS for any $\alpha \neq 0, 1$. Consequently, for $n = 2, 3$ and 4, we list some lightweight involutory near-MDS matrices of order $n$ in Table 6.
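The involution property of this $3 \times 3$ matrix can be checked numerically in a concrete field. The sketch below works in $\mathbb{F}_{2^4}$ with modulus $x^4 + x + 1$, which is an assumption chosen for illustration (any field of characteristic 2 would do); it checks $M^2 = I$ only and does not test the near-MDS property.

```python
def gf_mul(a, b, modulus=0b10011, m=4):
    """Product in F_{2^m} = F_2[x]/(modulus); elements are bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:            # degree reached m: reduce modulo the field polynomial
            a ^= modulus
    return result


def mat_mul(a, b):
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc ^= gf_mul(a[i][k], b[k][j])   # addition in F_{2^m} is XOR
            out[i][j] = acc
    return out


def example_matrix(alpha):
    beta = alpha ^ 1          # 1 + alpha
    return [[0, 1, 1], [alpha, beta, alpha], [beta, beta, alpha]]


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(all(mat_mul(example_matrix(a), example_matrix(a)) == IDENTITY for a in range(2, 16)))
```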

When $n > 4$, we have the following result analogous to the fact that there is no circulant involutory MDS matrix over finite fields [GR15].


Table 6. Involutory near-MDS matrices of order less than or equal to 4

n   Entries set                   Example                                      #Matrices   References
2   {0, 1}                        circ(0, 1)                                   4           [CK08]
3   {0, 1, α, 1 + α}, α ≠ 0, 1    rows (0, 1, 1), (α, 1 + α, α),               12          This section
                                  (1 + α, 1 + α, α)
4   {0, 1}                        circ(0, 1, 1, 1)                             10          [CK08, BKL16]

Proposition 3. For $n > 4$, no $n \times n$ circulant involutory matrix over $\mathbb{F}_{2^m}$ can be near-MDS.

Proof. Consider the case $n = 2k$, where $k > 2$. Let $M = \mathrm{circ}(a_0, a_1, \cdots, a_{2k-1})$ be a $(2k) \times (2k)$ circulant involutory matrix over $\mathbb{F}_{2^m}$. Then $M^2 = I_n$. Direct computation shows that

\[ M^2 = \mathrm{circ}(a_0^2 + a_k^2,\; 0,\; a_1^2 + a_{k+1}^2,\; 0,\; \cdots,\; a_{k-1}^2 + a_{2k-1}^2,\; 0). \]

This implies that $a_0 + a_k = 1$ and $a_i = a_{k+i}$ for $1 \le i \le k - 1$. Thus, the sum of the 0-th and $k$-th columns of $M$ is $(1, 0, \cdots, 0, 1, 0, \cdots, 0)^T$. That is, $B_d(M) \le 4 < n$. Therefore, $M$ is not near-MDS.

For $n = 2k + 1$ we have

\[ M^2 = \mathrm{circ}(a_0^2,\; a_{k+1}^2,\; a_1^2,\; a_{k+2}^2,\; \cdots,\; a_{2k}^2,\; a_k^2). \]

It follows that $a_0 = 1$ and $a_i = 0$ for $1 \le i \le 2k$. Thus, $M$ is not near-MDS.

Proposition 3 inspires us to study matrices other than circulant matrices for $n > 4$.

Hadamard matrices. The definition of a Hadamard matrix is recalled below.

Definition 6. Let $n$ be a power of 2. An $n \times n$ matrix $H$ is Hadamard if there exist $n$ elements $h_0, h_1, \cdots, h_{n-1}$ such that the $(i, j)$-entry of $H$ can be represented by $H[i, j] = h_{i \oplus j}$. We denote the matrix $H$ by $\mathrm{had}(h_0, h_1, \cdots, h_{n-1})$.

Analogously to circulant matrices, each row of a Hadamard matrix is a permutation of the first row. This allows one to implement the matrix efficiently [KPPY14, LS16, SKOP15]. The other desirable property is that it is easy to construct involutory matrices from Hadamard matrices.

Lemma 8. ([SKOP15]) An $n \times n$ Hadamard matrix $H = \mathrm{had}(h_0, h_1, \cdots, h_{n-1})$ is involutory, i.e., $H^2 = I_n$, if and only if $h_0 + h_1 + \cdots + h_{n-1} = 1$.

By Lemma 2, a non-MDS matrix $M$ is near-MDS if and only if for any $1 \le g \le n - 1$ each $g \times (g + 1)$ and $(g + 1) \times g$ submatrix of $M$ has at least one $g \times g$ non-singular submatrix. Note that a Hadamard matrix is symmetric, i.e., $H = H^T$. This implies that there is a one-to-one correspondence between the $g \times (g + 1)$ submatrices of a Hadamard matrix $H$ and the $(g + 1) \times g$ submatrices of $H$. Hence we have the following corollary.

Corollary 4. Let $H$ be a non-MDS Hadamard matrix of order $n$, where $n$ is a positive integer with $n \ge 2$. Then $H$ is near-MDS if and only if for any $1 \le g \le n - 1$ each $g \times (g + 1)$ (or $(g + 1) \times g$) submatrix of $H$ has at least one $g \times g$ non-singular submatrix.

Corollary 4 halves the number of operations in checking the near-MDS property for Hadamard matrices. Consequently, Hadamard matrices are good candidates for constructing involutory near-MDS matrices.

Involutory near-MDS Hadamard matrices of order 8. Since we focus on the case $n > 4$ and the order of a Hadamard matrix is a power of 2, it is natural to consider the case $n = 8$. Similar to the search strategy in Sect. 3.1, we limit the entries of the Hadamard matrix to elements in the set $\{0, 1, x, x^{-1}, x^2\}$. For $\mathrm{had}(h_0, h_1, \cdots, h_{n-1})$, let $N_\delta$ be the number of times $\delta$ occurs in the multiset $\{h_0, h_1, \cdots, h_{n-1}\}$. Lemma 8 implies that

\[ N_1 + N_x x + N_{x^{-1}} x^{-1} + N_{x^2} x^2 = 1 \pmod{2}. \tag{4} \]

We also have a trivial counting formula

\[ N_0 + N_1 + N_x + N_{x^{-1}} + N_{x^2} = 8. \tag{5} \]

We perform an exhaustive search for Hadamard matrices with parameters satisfying Eqs. (4) and (5). Experimental results show that any Hadamard matrix with four or fewer distinct entries from $\{0, 1, x, x^{-1}, x^2\}$ cannot be an involutory near-MDS matrix. However, there are 2688 involutory near-MDS matrices from Hadamard matrices with five distinct entries $0, 1, x, x^{-1}, x^2$. Moreover, each of the matrices satisfies $N_0 = N_1 = 1$ and $N_x = N_{x^{-1}} = N_{x^2} = 2$.

To further analyze the properties of the involutory near-MDS matrices, the equivalence classes of Hadamard matrices are recalled from [SKOP15]. Let $H = \mathrm{had}(h_0, h_1, \cdots, h_{n-1})$ and denote $H^\sigma = \mathrm{had}(h_{\sigma(0)}, h_{\sigma(1)}, \cdots, h_{\sigma(n-1)})$, where $\sigma$ is an index permutation. Two Hadamard matrices $H$ and $H^\sigma$ are equivalent if $\sigma(i) = i \oplus \alpha$ for some $\alpha \in \{0, 1, \cdots, n - 1\}$ or $\sigma$ is a linear permutation with respect to the XOR operation, i.e., $\sigma(i \oplus j) = \sigma(i) \oplus \sigma(j)$. This equivalence relation divides the set of Hadamard matrices into equivalence classes.

By Lemma 3 and Theorem 3 in [SKOP15], the following lemma holds.

Lemma 9. ([SKOP15]) Let $s$ be a positive integer. Given the index set $\{0, 1, \cdots, 2^s - 1\}$, there are exactly $2^s \prod_{i=0}^{s-1} (2^s - 2^i)$ distinct index permutations generated by composition of linear permutations with respect to the XOR operation and permutations having the form $\sigma(i) = i \oplus \alpha$.

By Lemma 9 there are exactly $2^3 \prod_{i=0}^{2} (2^3 - 2^i) = 1344$ permutations for $n = 8$. We compute the equivalence classes of the 2688 involutory near-MDS Hadamard matrices with five distinct entries $0, 1, x, x^{-1}, x^2$. The experimental results are summarized in the following fact.


Fact 2 The 2688 involutory near-MDS Hadamard matrices with five distinct entries $0, 1, x, x^{-1}, x^2$ can be classified into two different equivalence classes. Two representatives of the equivalence classes are

\[ \mathrm{had}(0, x^2, x^{-1}, x^2, x^{-1}, x, x, 1) \quad \text{and} \quad \mathrm{had}(0, x^2, x^{-1}, x^{-1}, x^2, x, x, 1). \]

Moreover, each equivalence class is exactly the set of matrices obtained by applying the 1344 permutations to the corresponding representative.
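The count 1344 from Lemma 9 can be reproduced by direct enumeration for small $s$. The sketch below evaluates the product formula and, independently, enumerates every index permutation of the form $\sigma(i) = L(i) \oplus c$ with $L$ an invertible XOR-linear map; the function names are hypothetical.

```python
from itertools import product
from math import prod


def lemma9_count(s):
    """2^s * prod_{i=0}^{s-1} (2^s - 2^i)."""
    return 2 ** s * prod(2 ** s - 2 ** i for i in range(s))


def affine_xor_permutations(s):
    """All permutations i -> L(i) XOR c of {0, ..., 2^s - 1}, L linear w.r.t. XOR."""
    n = 1 << s
    perms = set()
    for images in product(range(1, n), repeat=s):   # images of the basis 1, 2, 4, ...
        linear = []
        for i in range(n):
            value = 0
            for bit in range(s):
                if (i >> bit) & 1:
                    value ^= images[bit]
            linear.append(value)
        if len(set(linear)) != n:                   # L is not a bijection
            continue
        for c in range(n):
            perms.add(tuple(v ^ c for v in linear))
    return perms


print(lemma9_count(3), len(affine_xor_permutations(3)))
```

Both numbers agree for $s = 3$, and $2 \times 1344 = 2688$ matches the two classes of Fact 2.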

The representatives of the equivalence classes and the corresponding condition sets are listed in Table 7.

Table 7. Involutory near-MDS Hadamard matrices of order 8

Representative: had(0, x^2, x^{-1}, x^2, x^{-1}, x, x, 1)
  Conditions to be near-MDS: x, x + 1, x^2 + x + 1, x^3 + x + 1, x^3 + x^2 + 1, x^4 + x + 1, x^5 + x^4 + x^2 + x + 1
  Size of equivalence class: 1344

Representative: had(0, x^2, x^{-1}, x^{-1}, x^2, x, x, 1)
  Conditions to be near-MDS: x, x + 1, x^2 + x + 1, x^3 + x + 1, x^3 + x^2 + 1, x^4 + x + 1, x^4 + x^3 + x^2 + x + 1, x^5 + x^3 + 1
  Size of equivalence class: 1344

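Example 1. By taking the minimal polynomial of $\alpha$ as $x^4 + x^3 + 1$ and $x^8 + x^4 + x^3 + x + 1$, the matrix $\mathrm{had}(1, \alpha, \alpha, \alpha^2, 0, \alpha^2, \alpha^{-1}, \alpha^{-1})$ is involutory near-MDS over $\mathbb{F}_{2^4}$ and $\mathbb{F}_{2^8}$, respectively. The XOR count is 32 in $\mathbb{F}_{2^4}$ and 64 in $\mathbb{F}_{2^8}$.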
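The involution property in Example 1 can be confirmed numerically for the $\mathbb{F}_{2^4}$ case. The sketch below builds the Hadamard matrix from Definition 6 and squares it in $\mathbb{F}_2[x]/(x^4 + x^3 + 1)$; the helper names and the brute-force inverse are assumptions of the illustration, and the near-MDS property and the XOR count are not checked here.

```python
MODULUS = 0b11001   # x^4 + x^3 + 1
M_BITS = 4


def gf_mul(a, b):
    """Product in F_{2^4} = F_2[x]/(x^4 + x^3 + 1); elements are bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M_BITS:
            a ^= MODULUS
    return result


def gf_inv(a):
    """Inverse of a nonzero element by exhaustive search (fine for 16 elements)."""
    for b in range(1, 1 << M_BITS):
        if gf_mul(a, b) == 1:
            return b
    raise ZeroDivisionError("0 has no inverse")


def had(h):
    """Hadamard matrix with H[i][j] = h[i XOR j]."""
    n = len(h)
    return [[h[i ^ j] for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                out[i][j] ^= gf_mul(a[i][k], b[k][j])
    return out


ALPHA = 0b0010
A2, AINV = gf_mul(ALPHA, ALPHA), gf_inv(ALPHA)
FIRST_ROW = [1, ALPHA, ALPHA, A2, 0, A2, AINV, AINV]
H = had(FIRST_ROW)
IDENTITY = [[int(i == j) for j in range(8)] for i in range(8)]
print(mat_mul(H, H) == IDENTITY)
```

The first row sums (XORs) to 1, so the result agrees with Lemma 8.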

6 Security analysis

This section provides a preliminary analysis of the security properties of near-MDS matrices. It is well known that resistance against linear and differential cryptanalysis is a standard design criterion for new designs. For the AES [Dae95, DR02], provable security against linear and differential cryptanalysis follows from the wide trail design strategy. We apply a similar proof strategy: after proving a lower bound on the number of active S-boxes for both differential and linear cryptanalysis, we use the maximum differential/linear probability of the S-boxes to derive an upper bound for the probability of the best characteristic. As is commonly done, the probability of the differential/linear hull is estimated by the probability of the best characteristic. Therefore the main task is to calculate the minimum number of active S-boxes.

In this section we consider S-boxes with optimal cryptographic properties. We define a linear layer by combining our near-MDS matrices of order $n$ with the ShiftRows operation of AES, i.e., the word in row $i$ and column $j$ ($0 \le i, j \le n - 1$) cyclically moves to position $(j - i) \bmod n$.
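The ShiftRows rule just stated can be written out as follows. This is a sketch under the assumption that "position" means the column index within the same row $i$, as in AES; the function name `shift_rows` and the list-of-rows state layout are illustrative choices.

```python
def shift_rows(state):
    """Move the word at (row i, column j) to column (j - i) mod n of row i."""
    n = len(state)
    out = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[i][(j - i) % n] = state[i][j]
    return out


# Label every word by its original (row, column) position.
state = [[(i, j) for j in range(4)] for i in range(4)]
for row in shift_rows(state):
    print(row)
```

Row 0 is unchanged and row $i$ is rotated left by $i$ positions.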

By applying the technique based on Mixed-Integer Linear Programming (MILP) [MWGP11], we obtain lower bounds on the number of differential and linear active S-boxes for SPN structures. Our results confirm the two-round and four-round propagation theorems for SPN structures in [DR02, KPPY14]. Moreover, we determine the minimum number of active S-boxes for up to 16 rounds. The results are shown in Table 8. As described above, given a well-chosen S-box with known maximum differential/linear probability, one can immediately compute the upper bounds for any differential/linear characteristics. These bounds indicate that our linear layers can provide sufficient security against differential/linear cryptanalysis.
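The step from active S-box counts to characteristic bounds is a single multiplication in the exponent. The sketch below embeds the counts of Table 8 and returns the base-2 logarithm of the bound. The S-box probabilities used in the example calls ($2^{-2}$ for a 4-bit S-box and $2^{-6}$ for an 8-bit S-box) are assumed values for illustration and are not parameters fixed by this paper.

```python
# Minimum number of active S-boxes for rounds 1..16, copied from Table 8.
TABLE_8 = {
    4: [1, 4, 7, 16, 17, 20, 23, 32, 33, 36, 39, 48, 49, 52, 55, 64],
    5: [1, 5, 9, 25, 26, 30, 34, 50, 51, 55, 59, 75, 76, 80, 84, 100],
    6: [1, 6, 11, 36, 37, 42, 47, 72, 73, 78, 83, 108, 109, 114, 119, 144],
    7: [1, 7, 13, 49, 50, 56, 62, 98, 99, 105, 111, 147, 148, 154, 160, 196],
    8: [1, 8, 15, 64, 65, 72, 79, 128, 129, 136, 143, 192, 193, 200, 207, 256],
}


def log2_characteristic_bound(n, rounds, log2_sbox_prob):
    """log2 of (max S-box probability) ** (minimum number of active S-boxes)."""
    active = TABLE_8[n][rounds - 1]
    return active * log2_sbox_prob


# Assumed example S-box probabilities: 2^-2 and 2^-6.
print(log2_characteristic_bound(4, 4, -2))
print(log2_characteristic_bound(8, 4, -6))
```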

We note that the lower bounds also allow one to evaluate the efficiency of matrices. For example, by specifying the nonlinear layers (e.g. S-boxes) and hardware architectures, one can compute the FOAM values of the primitives based on near-MDS matrices. Hence, our work will be useful for future design of lightweight ciphers based on near-MDS matrices.

Table 8. The minimum number of active S-boxes for SPN structures with ShiftRows and near-MDS matrices of order n

     # Rounds
n    1   2   3   4   5   6   7   8    9    10   11   12   13   14   15   16
4    1   4   7   16  17  20  23  32   33   36   39   48   49   52   55   64
5    1   5   9   25  26  30  34  50   51   55   59   75   76   80   84   100
6    1   6   11  36  37  42  47  72   73   78   83   108  109  114  119  144
7    1   7   13  49  50  56  62  98   99   105  111  147  148  154  160  196
8    1   8   15  64  65  72  79  128  129  136  143  192  193  200  207  256

Banik et al. in [BBI+15] show that the ShiftRows operation from AES is not always the best choice when near-MDS matrices are chosen in the MixColumns. It is an open problem to investigate how to design an efficient shuffle/permutation to speed up the diffusion with a near-MDS matrix.

7 Conclusion

This paper presents new designs of lightweight linear diffusion layers from lightweight near-MDS matrices. For $5 \le n \le 9$, some generic $n \times n$ near-MDS circulant matrices are found. The implementation cost of instantiations of the generic near-MDS matrices is also considered. This allows us to propose some near-MDS matrices of order $n$ having the lowest XOR count among all near-MDS matrices of the same order, where $5 \le n \le 8$. Further, we provide some results on involutory near-MDS matrices of small orders and propose involutory near-MDS Hadamard matrices of order 8. Finally, we give a preliminary analysis of the security of the proposed linear layers.

Acknowledgements. The authors would like to thank Bart Preneel and the anonymous reviewers of FSE for their comments and suggestions. This work was supported in part by the Research Council KU Leuven: C16/15/058. In addition, this work was partially supported by the Research Council KU Leuven, OT/13/071, and by the Flemish Government through FWO projects and by European Union's Horizon 2020 research and innovation programme under grant agreement No H2020-MSCA-ITN-2014-643161 ECRYPT-NET, and the National Natural Science Foundation of China (No. 61472250, No. 61672347) and Major State Basic Research Development Program (973 Plan, No. 2013CB338004).

References

[ADK+14] Martin R. Albrecht, Benedikt Driessen, Elif Bilge Kavun, Gregor Leander, Christof Paar, and Tolga Yalcin. Block ciphers - focus on the linear layer (feat. PRIDE). In Juan A. Garay and Rosario Gennaro, editors, CRYPTO 2014, Part I, volume 8616 of LNCS, pages 57–76. Springer, Heidelberg, August 2014.

[AF13] Daniel Augot and Matthieu Finiasz. Exhaustive search for small dimension recursive MDS diffusion layers for block ciphers and hash functions. In Proceedings of the 2013 IEEE International Symposium on Information Theory, Istanbul, Turkey, July 7-12, 2013, pages 1551–1555, 2013.

[AF15] Daniel Augot and Matthieu Finiasz. Direct construction of recursive MDS diffusion layers using shortened BCH codes. In Carlos Cid and Christian Rechberger, editors, FSE 2014, volume 8540 of LNCS, pages 3–17. Springer, Heidelberg, March 2015.

[BBI+15] Subhadeep Banik, Andrey Bogdanov, Takanori Isobe, Kyoji Shibutani, Harunaga Hiwatari, Toru Akishita, and Francesco Regazzoni. Midori: A block cipher for low energy. In Tetsu Iwata and Jung Hee Cheon, editors, ASIACRYPT 2015, Part II, volume 9453 of LNCS, pages 411–436. Springer, Heidelberg, November / December 2015.

[BBK+13] Begul Bilgin, Andrey Bogdanov, Miroslav Knezevic, Florian Mendel, and Qingju Wang. Fides: Lightweight authenticated cipher with side-channel resistance for constrained hardware. In Guido Bertoni and Jean-Sebastien Coron, editors, CHES 2013, volume 8086 of LNCS, pages 142–158. Springer, Heidelberg, August 2013.

[BCG+12] Julia Borghoff, Anne Canteaut, Tim Guneysu, Elif Bilge Kavun, Miroslav Knezevic, Lars R. Knudsen, Gregor Leander, Ventzislav Nikov, Christof Paar, Christian Rechberger, Peter Rombouts, Søren S. Thomsen, and Tolga Yalcin. PRINCE - A low-latency block cipher for pervasive computing applications - extended abstract. In Xiaoyun Wang and Kazue Sako, editors, ASIACRYPT 2012, volume 7658 of LNCS, pages 208–225. Springer, Heidelberg, December 2012.

[Ber13] Thierry P. Berger. Construction of recursive MDS diffusion layers from Gabidulin codes. In Goutam Paul and Serge Vaudenay, editors, INDOCRYPT 2013, volume 8250 of LNCS, pages 274–285. Springer, Heidelberg, December 2013.

[BJK+16] Christof Beierle, Jeremy Jean, Stefan Kolbl, Gregor Leander, Amir Moradi, Thomas Peyrin, Yu Sasaki, Pascal Sasdrich, and Siang Meng Sim. The SKINNY family of block ciphers and its low-latency variant MANTIS. In Matthew Robshaw and Jonathan Katz, editors, CRYPTO 2016, Part II, volume 9815 of LNCS, pages 123–153. Springer, Heidelberg, August 2016.

[BKL16] Christof Beierle, Thorsten Kranz, and Gregor Leander. Lightweight multiplication in GF(2^n) with applications to MDS matrices. In Matthew Robshaw and Jonathan Katz, editors, CRYPTO 2016, Part I, volume 9814 of LNCS, pages 625–653. Springer, Heidelberg, August 2016.

[BS91] Eli Biham and Adi Shamir. Differential cryptanalysis of DES-like cryptosystems. In Alfred J. Menezes and Scott A. Vanstone, editors, CRYPTO'90, volume 537 of LNCS, pages 2–21. Springer, Heidelberg, August 1991.

[CJK15] Ting Cui, Chenhui Jin, and Zhiyin Kong. On compact Cauchy matrices for substitution-permutation networks. IEEE Trans. Computers, 64(7):2098–2102, 2015.

[CK08] Jiali Choy and Khoongming Khoo. New applications of differential bounds of the SDS structure. In Tzong-Chen Wu, Chin-Laung Lei, Vincent Rijmen, and Der-Tsai Lee, editors, ISC 2008, volume 5222 of LNCS, pages 367–384. Springer, Heidelberg, September 2008.

[CLRS09] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms (3rd ed.). MIT Press, 2009.

[Dae95] Joan Daemen. Cipher and Hash Function Design. Strategies based on linear and differential cryptanalysis. PhD thesis, Katholieke Universiteit Leuven, 1995.

[dB96] Mario A. de Boer. Almost MDS codes. Des. Codes Cryptography, 9(2):143–155, 1996.

[DL95] Stefan M. Dodunekov and Ivan Landgev. On near-MDS codes. J. Geom., 54(1):30–43, 1995.

[Dod09] Stefan M. Dodunekov. Applications of near MDS codes in cryptography. In Enhancing Cryptographic Primitives with Techniques from Error Correcting Codes, pages 81–86. 2009.

[DR02] Joan Daemen and Vincent Rijmen. The Design of Rijndael: AES - The Advanced Encryption Standard. Information Security and Cryptography. Springer, 2002.

[DR09] Joan Daemen and Vincent Rijmen. Codes and provable security of ciphers - extended abstract. In Enhancing Cryptographic Primitives with Techniques from Error Correcting Codes, pages 69–80. 2009.

[GPP11] Jian Guo, Thomas Peyrin, and Axel Poschmann. The PHOTON family of lightweight hash functions. In Phillip Rogaway, editor, CRYPTO 2011, volume 6841 of LNCS, pages 222–239. Springer, Heidelberg, August 2011.

102 DESIGN OF LIGHTWEIGHT LINEAR DIFFUSION LAYERS FROM NEAR-MDS MATRICES

[GPPR11] Jian Guo, Thomas Peyrin, Axel Poschmann, and Matthew J. B. Robshaw. The LED block cipher. In Bart Preneel and Tsuyoshi Takagi, editors, CHES 2011, volume 6917 of LNCS, pages 326–341. Springer, Heidelberg, September/October 2011.

[GR15] Kishan Chand Gupta and Indranil Ghosh Ray. Cryptographically significant MDS matrices based on circulant and circulant-like matrices for lightweight applications. Cryptography and Communications, 7(2):257–287, 2015.

[JPS15] Jeremy Jean, Thomas Peyrin, and Siang Meng Sim. Minimal implementations of linear and non-linear lightweight building blocks, 2015. Preprint.

[JV04] Pascal Junod and Serge Vaudenay. Perfect diffusion primitives for block ciphers. In Helena Handschuh and Anwar Hasan, editors, SAC 2004, volume 3357 of LNCS, pages 84–99. Springer, Heidelberg, August 2004.

[KPPY14] Khoongming Khoo, Thomas Peyrin, Axel York Poschmann, and Huihui Yap. FOAM: Searching for hardware-optimal SPN structures and components with a fair comparison. In Lejla Batina and Matthew Robshaw, editors, CHES 2014, volume 8731 of LNCS, pages 433–450. Springer, Heidelberg, September 2014.

[LS16] Meicheng Liu and Siang Meng Sim. Lightweight MDS generalized circulant matrices. In Thomas Peyrin, editor, FSE 2016, volume 9783 of LNCS, pages 101–120. Springer, Heidelberg, March 2016.

[LW16] Yongqiang Li and Mingsheng Wang. On the construction of lightweight circulant involutory MDS matrices. In Thomas Peyrin, editor, FSE 2016, volume 9783 of LNCS, pages 121–139. Springer, Heidelberg, March 2016.

[Mat94] Mitsuru Matsui. Linear cryptanalysis method for DES cipher. In Tor Helleseth, editor, EUROCRYPT'93, volume 765 of LNCS, pages 386–397. Springer, Heidelberg, May 1994.

[MS77] F.J. MacWilliams and N.J.A. Sloane. The Theory of Error-Correcting Codes. North-Holland, 1977.

[MWGP11] Nicky Mouha, Qingju Wang, Dawu Gu, and Bart Preneel. Differential and linear cryptanalysis using mixed-integer linear programming. In Information Security and Cryptology - 7th International Conference, Inscrypt 2011, Beijing, China, November 30 - December 3, 2011. Revised Selected Papers, pages 57–76, 2011.

[SDMO12] Mahdi Sajadieh, Mohammad Dakhilalian, Hamid Mala, and Behnaz Omoomi. On construction of involutory MDS matrices from Vandermonde matrices in GF(2^q). Des. Codes Cryptography, 64(3):287–308, 2012.

[SDMS12] Mahdi Sajadieh, Mohammad Dakhilalian, Hamid Mala, and Pouyan Sepehrdad. Recursive diffusion layers for block ciphers and hash functions. In Anne Canteaut, editor, FSE 2012, volume 7549 of LNCS, pages 385–401. Springer, Heidelberg, March 2012.

[Sha49] Claude E. Shannon. Communication theory of secrecy systems. Bell System Technical Journal, 28(4):656–715, 1949.

[SKOP15] Siang Meng Sim, Khoongming Khoo, Frederique E. Oggier, and Thomas Peyrin. Lightweight MDS involution matrices. In Gregor Leander, editor, FSE 2015, volume 9054 of LNCS, pages 471–493. Springer, Heidelberg, March 2015.

[SS16] Sumanta Sarkar and Habeeb Syed. Lightweight diffusion layer: Importance of Toeplitz matrices. Cryptology ePrint Archive, Report 2016/835, 2016. http://eprint.iacr.org/2016/835.


[Swa62] Richard G. Swan. Factorization of polynomials over finite fields. Pacific J. of Math., 12(3):1099–1106, 1962.

[Vau95] Serge Vaudenay. On the need for multipermutations: Cryptanalysis of MD4 and SAFER. In Bart Preneel, editor, FSE'94, volume 1008 of LNCS, pages 286–297. Springer, Heidelberg, December 1995.

[VR06] G. Viswanath and B. Sundar Rajan. A matrix characterization of near-MDS codes. Ars Comb., 79:289–294, 2006.

[WWW13] Shengbao Wu, Mingsheng Wang, and Wenling Wu. Recursive diffusion layers for (lightweight) block ciphers and hash functions. In Lars R. Knudsen and Huapeng Wu, editors, SAC 2012, volume 7707 of LNCS, pages 355–371. Springer, Heidelberg, August 2013.

[YMT97] A.M. Youssef, S. Mister, and S.E. Tavares. On the design of linear transformations for substitution permutation encryption networks. In Carlisle Adams and Mike Just, editors, SAC 1997, LNCS, pages 40–48. Springer, Heidelberg, August 1997.

A Proof of Theorem 1

Before presenting the proof we introduce some definitions of strings from [CLRS09]. A string over a finite set $S$ is a sequence of elements of $S$. In the proof we focus on strings over the set $\{1, x\}$. A substring $s'$ of a string $s$ is an ordered sequence of consecutive elements of $s$. Define a run of a string to be a maximal substring of consecutive identical elements [MS77]. We call a string and a run of length $k$ a $k$-string and a $k$-run, respectively. For instance, the 8-string $11xxx1xx$ has four runs: $11$, $xxx$, $1$, $xx$.

Proof. Lemma 6 implies that there is no near-MDS circulant matrix with only the two entries $\{0, 1\}$ or $\{0, x\}$. So we now assume that $N_1 N_x > 0$. Note that $N_0$ can be 0 or 1 by Proposition 1. We first consider the case $N_0 = 1$. As stated in Sect. 3.2, one only needs to consider the circulant matrices of the form $\mathrm{circ}(0, a_1, a_2, \ldots, a_{n-1})$.

To prove the result, it suffices to consider the strings of length three, i.e., $a_i a_j a_k$ with $1 \le i, j, k \le n-1$. A matched pair of strings $a_{i_1} a_{i_2} a_{i_3}$ and $a_{j_1} a_{j_2} a_{j_3}$ satisfies the following two conditions:

1. there exists an integer $k$ such that $j_l - i_l \equiv k \pmod{n}$ and $k \not\equiv 0 \pmod{n}$ for $l = 1, 2, 3$;

2. $a_{i_1} a_{i_2} a_{i_3} = a_{j_1} a_{j_2} a_{j_3}$ as strings over $\{1, x\}$.

Indeed, the existence of a matched pair yields that the matrix $\mathrm{circ}(0, a_1, a_2, \ldots, a_{n-1})$ has the $2 \times 3$ submatrix

\[
\begin{pmatrix} a_{i_1} & a_{i_2} & a_{i_3} \\ a_{j_1} & a_{j_2} & a_{j_3} \end{pmatrix}
=
\begin{pmatrix} a_{i_1} & a_{i_2} & a_{i_3} \\ a_{i_1} & a_{i_2} & a_{i_3} \end{pmatrix}
\]

with three singular $2 \times 2$ submatrices. Then, by taking $g = 2$ in Lemma 2, we conclude that the matrix $\mathrm{circ}(0, a_1, a_2, \ldots, a_{n-1})$ is not near-MDS. Hence, to prove the theorem, we aim to find the matched pairs of the $(n-1)$-string $s = a_1 a_2 \cdots a_{n-1}$. One can divide the proof into four different cases in terms of the length of the longest runs (LLR) of $s$, denoted by $\mathrm{LLR}(s)$.

Case 1. $\mathrm{LLR}(s) \ge 4$. In this case, there exists some $i$ such that $a_i a_{i+1} a_{i+2} a_{i+3} = aaaa$ for $a \in \{1, x\}$. Hence, $a_i a_{i+1} a_{i+2} = a_{i+1} a_{i+2} a_{i+3} = aaa$, as required.

Case 2. $\mathrm{LLR}(s) = 3$. Suppose that $a_i a_{i+1} a_{i+2} = aaa$. If there is another run with length greater than or equal to two, i.e., $a_j a_{j+1} = bb$, then we have $a_i a_{i+1} a_j = a_{i+1} a_{i+2} a_{j+1} = aab$, as desired. Otherwise, all remaining runs are 1-runs. According to the position of the 3-run $aaa$ in $s$, $s$ contains at least one of the following five substrings:

$ababaaa$, $babaaab$, $abaaaba$, $baaabab$, $aaababa$.

Direct verification shows that there is at least one matched pair.

Case 3. $\mathrm{LLR}(s) = 2$. The proof of this case can be split into four subcases.

Subcase 3.1. There are at least three distinct 2-runs. Suppose that $a_i a_{i+1}$, $a_j a_{j+1}$ and $a_k a_{k+1}$ are 2-runs. Then $a_i = a_{i+1}$, $a_j = a_{j+1}$, and $a_k = a_{k+1}$. This leads to a matched pair $a_i a_j a_k$ and $a_{i+1} a_{j+1} a_{k+1}$.

Subcase 3.2. There are exactly two 2-runs $aa$ and $bb$ with $a \ne b$. Note that all remaining runs are of length 1. It is readily seen that there can be 0, 2 or at least 4 elements between $aa$ and $bb$ in $s$.

First, suppose that there are at least four elements between $aa$ and $bb$. This yields the substrings $ababa$ or $babab$ of $s$. Thus, a matched pair occurs.

Secondly, if there are exactly two elements between $aa$ and $bb$, i.e., $s$ has the substring $aababb$ (resp. $bbabaa$), then $s$ contains the substrings $baababb$ or $aababba$ (resp. $abbabaa$ or $bbabaab$). In these cases, it is easy to find a matched pair. For instance, let $a_i a_{i+1} a_{i+2} a_{i+3} a_{i+4} a_{i+5} a_{i+6} = baababb$; then we have $a_i a_{i+1} a_{i+3} = a_{i+3} a_{i+4} a_{i+6} = bab$.

Finally, we suppose that $bbaa$ or $aabb$ is a substring of $s$. It suffices to consider $aabb$. In terms of the position of $aabb$ in $s$, $s$ contains at least one of the following four substrings:

$aabbaba$, $babaabb$, $baabbab$, $abaabba$.

For the latter two substrings, it is easy to find a matched pair, while a verification of the first two substrings can be done by considering $n = 8$ and $n \ge 9$. We omit the details here.

Subcase 3.3. There are exactly two $aa$ runs. The relative position of the two $aa$ runs in $s$ implies that $s$ contains at least one of the following four substrings:

$aabaab$, $baabaa$, $aababaa$, $babab$.

It is obvious that there is at least one matched pair.

Subcase 3.4. There is exactly one 2-run $aa$. Concerning the position of $aa$ in $s$, it follows that $s$ contains at least one of the following five substrings:

$aababab$, $baababa$, $babaaba$, $bababaa$, $babab$.

Direct verification shows that there is at least one matched pair. The four subcases together yield Case 3.


Case 4. $\mathrm{LLR}(s) = 1$. It follows that one can always find the substring $x1x1x$. Thus, in this case, at least one matched pair exists.

Note that $\mathrm{LLR}(s) \ge 1$. Then the above four cases combine to give the result when $N_1 N_x > 0$ and $N_0 = 1$. The case that $N_0 = 0$, $N_1 N_x > 0$ can be proved in the same manner. Therefore, the theorem is proved.
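The matched-pair conditions used throughout the proof can be checked mechanically for a concrete string. The following is a minimal brute-force sketch, not part of the original proof: the function names are hypothetical, the string `s` encodes $a_1 \cdots a_{n-1}$ with the characters `'1'` and `'x'`, and the three indices are assumed to be distinct because they correspond to distinct columns of the circulant matrix. Index 0 (mod $n$) is excluded since it holds the entry 0.

```python
from itertools import combinations

def is_matched_pair(s, idx, k):
    """Check both matched-pair conditions for the index triple idx shifted by k.

    s encodes a_1 ... a_{n-1}; position 0 (mod n) is the zero entry of
    circ(0, a_1, ..., a_{n-1}) and must not be hit by the shifted indices.
    """
    n = len(s) + 1
    if k % n == 0:
        return False                      # condition 1 requires k != 0 (mod n)
    shifted = [(i + k) % n for i in idx]
    if 0 in shifted:
        return False                      # shifted index would hit the 0 entry
    # condition 2: the two length-3 strings coincide
    return all(s[i - 1] == s[j - 1] for i, j in zip(idx, shifted))

def find_matched_pair(s):
    """Return one matched pair of index triples, or None if none exists."""
    n = len(s) + 1
    for idx in combinations(range(1, n), 3):
        for k in range(1, n):
            if is_matched_pair(s, idx, k):
                return idx, tuple((i + k) % n for i in idx)
    return None

# Example from Subcase 3.2 with a = '1', b = 'x': s = baababb, indices (i, i+1, i+3)
print(is_matched_pair("x11x1xx", (1, 2, 4), 3))
# For n = 5 the string x111 has no matched pair
print(find_matched_pair("x111"))
```

Such a search only reproduces the combinatorial part of the argument; the conclusion that a matched pair rules out the near-MDS property still rests on Lemma 2.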

B Tables

In Table 9, we provide a list of generic near-MDS circulant matrices of order $5 \le n \le 9$ over the finite field $\mathbb{F}_{2^m}$. Based on a generic matrix, one can obtain a concrete near-MDS matrix by substituting $x$ with $\alpha \in \mathbb{F}_{2^m}$ such that $\alpha$ is not a root of any polynomial in the corresponding condition set, which is given in Table 10. The determinants of the generic matrices are given as well.

Note that $A$ and $A^T$ have exactly the same properties, including the near-MDS property, determinant and XOR count. So we only list the matrix $A$ when $A \ne A^T$. For instance, both $\mathrm{circ}(0, x, 1, 1, 1)$ and $\mathrm{circ}(0, 1, 1, 1, x)$ are near-MDS under certain conditions, but we only present the former one in Table 9 since $\mathrm{circ}(0, x, 1, 1, 1) = \mathrm{circ}(0, 1, 1, 1, x)^T$.
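The transposition relation and the first determinant of Table 9 can be reproduced with a few lines of code. The sketch below is illustrative only: polynomials over $\mathbb{F}_2$ are encoded as Python integers (bit $i$ is the coefficient of $x^i$), the helper names are hypothetical, and the determinant is computed by a plain Leibniz expansion, which is only practical for small $n$. The row convention `first_row[(c - r) % n]` is an assumption about how circ is laid out; the determinant does not depend on it.

```python
from itertools import permutations

def circ(first_row):
    """Circulant matrix whose row r is the first row cyclically shifted right by r."""
    n = len(first_row)
    return [[first_row[(c - r) % n] for c in range(n)] for r in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def clmul(a, b):
    """Carry-less product of two polynomials over F_2 encoded as integers."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        a <<= 1
        b >>= 1
    return res

def det_f2x(m):
    """Determinant over F_2[x] by Leibniz expansion (signs vanish in characteristic 2)."""
    total = 0
    for perm in permutations(range(len(m))):
        term = 1
        for r, c in enumerate(perm):
            term = clmul(term, m[r][c])
            if term == 0:
                break
        total ^= term
    return total

X = 0b10  # the indeterminate x
A = circ([0, X, 1, 1, 1])
B = circ([0, 1, 1, 1, X])

print(transpose(A) == B)   # transposition relation stated in the text
print(bin(det_f2x(A)))     # bits 5, 3, 1, 0 set: x^5 + x^3 + x + 1
```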


Table 9. List of generic near-MDS circulant matrices of order 5 ≤ n ≤ 9

n Coefficients of the first rowCondition sets(cf.Table 10)

Determinants

5(0, x, 1, 1, 1)(0, 1, x, 1, 1)

S0 x5 + x3 + x+ 1

6(0, x, 1, 1, 1, x)(0, x, 1, x, 1, 1)(0, 1, x, x, 1, 1)

S0

S0

S2

x4

x4 + x2 + 1x4 + x2 + 1

(0, x−1, x, 1, 1, 1)(0, x, x−1, 1, 1, 1)(0, x−1, 1, x, 1, 1)(0, x, 1, x−1, 1, 1)(0, x−1, 1, 1, x, 1)(0, x, 1, 1, x−1, 1)(0, x−1, 1, 1, 1, x)(0, 1, x−1, x, 1, 1)(0, 1, x, x−1, 1, 1)

S1

S2

S1

S1

S4

S5

S0

S3

S3

x6 + x4 + 1 + x−4 + x−6

x6 + x4 + 1 + x−4 + x−6

x6 + 1 + x−2 + x−4 + x−6

x6 + x4 + x2 + 1 + x−6

x6 + x2 + 1 + x−2 + x−6

x6 + x2 + 1 + x−2 + x−6

x6 + x4 + 1 + x−4 + x−6

x6 + 1 + x−2 + x−4 + x−6

x6 + x4 + x2 + 1 + x−6

7

(0, x−1, 1, x, 1, 1, 1)(0, x, 1, x−1, 1, 1, 1)(0, x−1, 1, 1, 1, x, 1)(0, x, 1, 1, 1, x−1, 1)(0, 1, x−1, x, 1, 1, 1)(0, 1, x, x−1, 1, 1, 1)

S6

x7 + x5 + x3 + x+ x−5 + x−7

x7 + x5 + x−1 + x−3 + x−5 + x−7

x7 + x5 + x−1 + x−3 + x−5 + x−7

x7 + x5 + x3 + x+ x−5 + x−7

x7 + x5 + x−1 + x−3 + x−5 + x−7

x7 + x5 + x3 + x+ x−5 + x−7

8

(0, x, x−1, 1, x, 1, 1, 1)(0, x, 1, x, x−1, 1, 1, 1)(0, x, 1, 1, x−1, 1, 1, x)(0, 1, x−1, 1, x, x, 1, 1)(0, 1, 1, x, x−1, x, 1, 1)(0, x−1, x, 1, x−1, 1, 1, 1)(0, x−1, 1, x−1, x, 1, 1, 1)(0, x−1, 1, 1, x, 1, 1, x−1)(0, 1, x, 1, x−1, x−1, 1, 1)(0, 1, 1, x−1, x, x−1, 1, 1)

S7

S8

S8

S7

S8

S7

S8

S8

S7

S8

x−8

x−8

x−8

x−8

x−8

x8

x8

x8

x8

x8

9

(0, x, x−1, x, x, x−1, 1, 1, x)(0, x, x, x, 1, x−1, 1, x, x−1)(0, x−1, x, 1, x, x, x, x−1, 1)(0, x−1, x, x−1, x−1, x, 1, 1, x−1)(0, x−1, x−1, x−1, 1, x, 1, x−1, x)(0, x, x−1, 1, x−1, x−1, x−1, x, 1)

S9

S9

S9

S10

S10

S10

0


Table 10. Condition sets in Table 9

| Set | Conditions |
|---|---|
| $S_0$ | $x$, $x+1$, $x^2+x+1$ |
| $S_1$ | $x$, $x+1$, $x^2+x+1$, $x^3+x+1$ |
| $S_2$ | $x$, $x+1$, $x^2+x+1$, $x^3+x^2+1$ |
| $S_3$ | $x$, $x+1$, $x^2+x+1$, $x^3+x+1$, $x^3+x^2+1$ |
| $S_4$ | $x$, $x+1$, $x^2+x+1$, $x^3+x+1$, $x^4+x^3+1$ |
| $S_5$ | $x$, $x+1$, $x^2+x+1$, $x^3+x^2+1$, $x^4+x+1$ |
| $S_6$ | $x$, $x+1$, $x^2+x+1$, $x^3+x+1$, $x^3+x^2+1$, $x^4+x^3+x^2+x+1$ |
| $S_7$ | $x$, $x+1$, $x^2+x+1$, $x^3+x+1$, $x^3+x^2+1$, $x^4+x+1$, $x^4+x^3+1$, $x^4+x^3+x^2+x+1$ |
| $S_8$ | $x$, $x+1$, $x^2+x+1$, $x^3+x+1$, $x^3+x^2+1$, $x^4+x^3+x^2+x+1$, $x^5+x^4+x^3+x^2+1$ |
| $S_9$ | $x$, $x+1$, $x^2+x+1$, $x^3+x+1$, $x^3+x^2+1$, $x^4+x+1$, $x^4+x^3+1$, $x^4+x^3+x^2+x+1$, $x^5+x^2+1$, $x^5+x^3+1$, $x^5+x^3+x^2+x+1$, $x^5+x^4+x^2+x+1$, $x^5+x^4+x^3+x+1$, $x^5+x^4+x^3+x^2+1$, $x^6+x^5+x^4+x^2+1$, $x^7+x^4+x^3+x^2+1$, $x^7+x^6+x^4+x+1$, $x^{12}+x^{11}+x^{10}+x^9+x^8+x^7+x^6+x^2+1$ |
| $S_{10}$ | $x$, $x+1$, $x^2+x+1$, $x^3+x+1$, $x^3+x^2+1$, $x^4+x+1$, $x^4+x^3+1$, $x^4+x^3+x^2+x+1$, $x^5+x^2+1$, $x^5+x^3+1$, $x^5+x^3+x^2+x+1$, $x^5+x^4+x^2+x+1$, $x^5+x^4+x^3+x+1$, $x^5+x^4+x^3+x^2+1$, $x^6+x^4+x^2+x+1$, $x^7+x^5+x^4+x^3+1$, $x^7+x^6+x^3+x+1$, $x^{12}+x^{10}+x^6+x^5+x^4+x^3+x^2+x+1$ |


Chapter 7

Constructing Low-latency Involutory MDS Matrices with Lightweight Circuits

Publication data

Shun Li, Siwei Sun, Chaoyun Li, Zihao Wei and Lei Hu: Constructing Low-latency Involutory MDS Matrices with Lightweight Circuits. IACR Transactions on Symmetric Cryptology 2019(1): 84-117, 2019.

Contributions

Contributing author


Constructing Low-latency Involutory MDS Matrices with Lightweight Circuits

Shun Li^{1,2,4}, Siwei Sun^{1,2,4}, Chaoyun Li^{3}, Zihao Wei^{1,2,4}, and Lei Hu^{1,2,4}

1 State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences

2 Data Assurance and Communication Security Research Center, Chinese Academy of Sciences, Beijing 100093, China

3 imec-COSIC, Dept. Electrical Engineering (ESAT), KU Leuven, Leuven 3001, Belgium

4 School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China

Abstract. MDS matrices are important building blocks providing diffusion functionality for the design of many symmetric-key primitives. In recent years, continuous efforts have been made on the construction of MDS matrices with small area footprints in the context of lightweight cryptography. Just recently, Duval and Leurent (ToSC 2018/FSE 2019) reported some 32 × 32 binary MDS matrices with branch number 5, which can be implemented with only 67 XOR gates, whereas the previously known lightest ones of the same size cost 72 XOR gates.

In this article, we focus on the construction of lightweight involutory MDS matrices, which are even more desirable than ordinary MDS matrices, since the same circuit can be reused when the inverse is required. In particular, we identify some involutory MDS matrices which can be realized with only 78 XOR gates with depth 4, whereas the previously known lightest involutory MDS matrices cost 84 XOR gates with the same depth. Notably, the involutory MDS matrix we find is much smaller than the AES MixColumns operation, which requires 97 XOR gates with depth 8 when implemented as a block of combinatorial logic that can be computed in one clock cycle. However, with respect to latency, the AES MixColumns operation is superior to our 78-XOR involutory matrices, since the AES MixColumns can be implemented with depth 3 by using more XOR gates.

We prove that the depth of a 32 × 32 MDS matrix with branch number 5 (e.g., the AES MixColumns operation) is at least 3. Then, we enhance Boyar's SLP-heuristic algorithm with circuit depth awareness, such that the depth of its output circuit is limited. Along the way, we give a formula for computing the minimum achievable depth of a circuit implementing the summation of a set of signals with given depths, which is of independent interest. We apply the new SLP heuristic to a large set of lightweight involutory MDS matrices, and we identify a depth 3 involutory MDS matrix whose implementation costs 88 XOR gates, which is superior to the AES MixColumns operation with respect to both lightweightness and latency, and enjoys the extra involution property.


Keywords: Lightweight cryptography · MDS matrix · Involutory matrix · Low latency

1 Introduction

The development of pervasive computing and the demand for low-cost security have stimulated intensive research on the design of lightweight symmetric-key cryptographic algorithms. This often boils down to the search for lightweight yet cryptographically strong diffusion and confusion components.

In practice, the diffusion components are typically realized with linear operations, whose functionality, loosely speaking, is to spread the internal dependencies as much as possible. The so-called Maximum Distance Separable (MDS) matrices are probably the most preferable diffusion building blocks. When using MDS matrices as the diffusion layers in iterative block ciphers, it is possible to achieve a desired number of differentially or linearly active non-linear elements with a relatively small number of rounds, thereby leading to low-latency designs. Moreover, designs with MDS matrices typically enjoy simple and clear security proofs, as in the case of AES [DR02]. Actually, it is exactly the elegant security proof offered by AES that initiated the wide application of MDS matrices in the design of symmetric-key primitives.

However, it is not an easy task to find lightweight MDS matrices, and it may be too costly to use an MDS matrix in a design targeting resource-constrained devices. In such situations, the designers compromise by employing almost MDS matrices [BBI+15, Ava17], or linear operations that can be realized with several bitwise XORs [BJK+16], or even bit-level permutations which can be implemented with a proper wiring [BKL+07]. Such a design strategy more often than not leads to a significant increase in the number of rounds, and complicates the security proof considerably. Therefore, it is an important endeavor to construct lightweight MDS matrices. In particular, lightweight involutory MDS matrices would be preferable, since the same circuit can be reused when the inverse is required. Actually, the idea of reusing involutory components in both encryption and decryption has already been applied in some designs [BR00, SPR+04, BCG+12].

1.1 Related work

If the chip area is the sole consideration, one promising approach proposed by Guo, Peyrin, and Poschmann to reduce the implementation footprint is to find a lightweight matrix $A$ such that $A^k$ is MDS [GPP11, GPPR11]. The implementation of $A^k$ can be obtained by recursively "executing" the implementation of $A$ $k$ times. Then no matter how complex $A^k$ is, the cost is determined by $A$ completely. However, this approach comes at the expense of an increased number of clock cycles, which is not desirable in low-latency applications. Therefore, in this work, we focus on the lightweight constructions, where the full MDS matrix is implemented as a block of combinatorial logic circuit such that it can be computed in one clock cycle. We refer the reader to [GPP11, TTKS18, AF14, Ber13, GPV17, WWW12, CLM16] for more information on the recursive constructions.

The initial attempts to find lightweight MDS matrices where the full matrix is implemented mainly focus on the selection of matrix entries enjoying low hardware footprints [SKOP15, BKL16, LS16, LW16, LW17, SS16a, SS16b, SS17, JPST17, ZWS18, GLWL16]. This line of work is a great step forward in our ability to construct lightweight MDS matrices and can be categorized as local optimizations. In particular, with the knowledge of which kinds of entries are better, one can construct MDS matrices from some special classes of matrices, such as circulant, Hadamard, or Toeplitz matrices [SKOP15, LS16, SS16b]. Some of these constructions lead to involutory MDS matrices. In particular, Sim et al. observed that involutory MDS matrices can be implemented with almost the same cost as non-involutory ones under some specific metric, the latter being usually non-lightweight when the inverse matrix is required [SKOP15]. Note that here the entries of a matrix are not restricted to finite field elements, and can be general linear transformations. Actually, the idea of using general linear transformations led to notable improvements at the time [BKL16, LW16].

So far, we have a fairly deep understanding of the problem with respect to local optimizations. Hence recent work tends to deal with the problem at a more essential level, viewing it as the well-known Shortest Linear straight-line Problem (SLP) and optimizing globally. Indeed, this approach results in more accurate estimations of the cost of hardware implementations. In [KLSW17], Kranz et al. show that the AES MixColumns matrix can be implemented with only 97 $\mathbb{F}_2 \times \mathbb{F}_2 \to \mathbb{F}_2$ XOR gates with Boyar's tool [BMP13] based on an SLP heuristic, while the previous best implementation costs 103 XOR gates [JPST17]. Just recently in ToSC 2018/FSE 2019, Duval and Leurent reported some 32 × 32 binary MDS matrices which can be implemented with only 67 XOR gates by searching through a set of circuits ordered by hardware cost and optimizing globally [DL18], whereas the previously known lightest ones of the same size cost 72 XOR gates [KLSW17].

1.2 Our Contribution

First, we slightly generalize the structure of the involutory MDS matrix $M_{KLSW}$ (which costs 84 XOR gates) proposed by Kranz, Leander, Stoffelen, and Wiemer [KLSW17], and try to construct an involutory MDS matrix $G$ of the generalized form with fewer 1's than $M_{KLSW}$ in its binary form based on some educated guesses. After applying the SLP heuristic [BMP13] to $G$, it turns out that $G$ can be implemented with only 80 XOR gates.

Then we further generalize the structure of $G$ to a family of 4 × 4 matrices whose entries are powers of a given 8 × 8 binary matrix $A$. We show that every involutory matrix in this family can be completely determined by 6 parameters taking integer values. We search through a restricted range of matrices generated by these 6 parameters, and identify some involutory MDS matrices which can be implemented with only 78 XOR gates, while the previous best result requires 84 XOR gates.

Finally, we prove that the depth of a 32 × 32 MDS matrix with branch number 5 (e.g., the AES MixColumns operation) is at least 3. Then we augment Boyar's SLP-heuristic algorithm [BMP13] with circuit depth awareness to limit the depths of its output circuits. Along the way, we give a formula for computing the minimum achievable depth of a circuit implementing the summation of a set of signals with given depths, which is of independent interest. By applying this tool, we search through a large set of lightweight involutory MDS matrices and identify one which can be implemented with 88 XOR gates, whose circuit depth reaches the lower bound 3. A summary of the optimal matrices we find is given in Table 1. We also synthesize the matrices from Table 1 with three different technology libraries (NanGate 45 nm, SMIC 65 nm and TSMC 28 nm). In all cases, our matrices exhibit a lower area footprint. Taking the 97-XOR AES MDS matrix for example, it takes 154.811996 µm² when synthesized with the NanGate 45 nm technology (194 GE), while our 88-XOR matrix takes 140.447996 µm² (176 GE). Hence, our 88-XOR matrix enjoys three advantages over the AES MDS matrix: it is involutory; its depth is 3 (the depth of the 97-XOR AES MDS matrix is 8); and its area footprint is lower. Moreover, we make all of our code and results (matrices in binary representations with their actual implementations) publicly available at https://github.com/siweisun/involutory_mds.

Table 1: A summary of the results. All matrices shown in the table are 32 × 32 binary matrices, and $M_k(R)$ is the set of all $k \times k$ matrices whose entries are drawn from $R$. The SLP column is obtained by applying Boyar's SLP heuristic [BMP13], and SLP* means that the result is obtained by applying a modified version of Boyar's SLP heuristic with circuit depth awareness presented in Sect. 6.

| Matrix | MDS | Involutory | SLP | Depth | Source |
|---|---|---|---|---|---|
| $M_{AES} \in M_4(\mathbb{F}_{2^8})$ | ✓ | ✗ | 97 | 8 | [KLSW17] |
| $M_{AES} \in M_4(\mathbb{F}_{2^8})$ | ✓ | ✗ | 105 (SLP*) | 3 | Sect. 6 |
| $M_{KLSW} \in M_4(M_2(\mathbb{F}_{2^4}))$ | ✓ | ✓ | 84 | 4 | [KLSW17] |
| $G \in M_4(M_8(\mathbb{F}_2))$ | ✓ | ✓ | 80 | 4 | Sect. 4 |
| $H \in M_4(M_8(\mathbb{F}_2))$ | ✓ | ✓ | 78 | 4 | Sect. 5 |
| $Q \in M_4(M_8(\mathbb{F}_2))$ | ✓ | ✓ | 88 (SLP*) | 3 | Sect. 6 |

1.3 Organization

In Sect. 2, we give some preliminaries on finite fields and MDS matrices. The metrics used in this work for measuring the circuit cost are given in Sect. 3. In Sect. 4 we show how to construct a lighter involutory matrix by generalizing a previously known involutory MDS matrix. In Sect. 5, we consider further generalizations and search through a large set of matrices to find lighter involutory MDS matrices. In Sect. 6, we prove a theorem on the lower bound of the circuit depth of a 32 × 32 MDS matrix with branch number 5, and enhance Boyar's SLP-heuristic algorithm to find lightweight involutory MDS matrices whose depths reach the lower bound. Section 7 concludes the paper.

2 Preliminaries

Let $R$ be an arbitrary ring, and $M_k(R)$ be the set of all $k \times k$ matrices whose entries are drawn from $R$. Therefore, $M_k(\mathbb{F}_{2^n})$ denotes the set of all $k \times k$ matrices over the finite field of $2^n$ elements, and $M_k(GL(n, \mathbb{F}_2))$ is the set of all $k \times k$ matrices whose elements are taken from the general linear group $GL(n, \mathbb{F}_2)$ formed by all invertible $n \times n$ matrices over $\mathbb{F}_2$. Every matrix $A$ in $M_k(\mathbb{F}_{2^n})$ or $M_k(GL(n, \mathbb{F}_2))$ can be represented as an $nk \times nk$ binary matrix, which we call the binary representation of $A$. We use $I_n$ and $O_n$ to denote the $n \times n$ identity matrix and zero matrix over $\mathbb{F}_2$ respectively. We will omit the subscript $n$ whenever it is obvious from the context.

Given a vector $x$ in $\mathbb{F}_2^{nk}$, we denote by $\omega_n(x)$ the number of non-zero $n$-bit chunks in $x$. When $n = 1$, we simply write $\omega_1(x)$ as $\omega(x)$, which is the well-known Hamming weight of $x$. The branch number $B_n(A)$ of $A \in M_{nk}(\mathbb{F}_2)$ is defined as

\[
B_n(A) = \min_{x \in \mathbb{F}_2^{nk} \setminus \{0\}} \{ \omega_n(x) + \omega_n(Ax) \}.
\]

Definition 1. An invertible $nk \times nk$ binary matrix $A$ is MDS over $k$ $n$-bit words if and only if $B_n(A) = k + 1$. Furthermore, if an MDS matrix $A$ satisfies $A = A^{-1}$, then we call it an involutory MDS matrix.

Definition 2 (Characteristic polynomial [Wan03]). The characteristic polynomial $f$ of a binary matrix $A \in M_m(\mathbb{F}_2)$ is defined as $f(x) = |xI + A| \in \mathbb{F}_2[x]$.

Lemma 1 ([DF04]). If $f$ is the characteristic polynomial of $A \in M_m(\mathbb{F}_2)$, then $f(A) = 0$.

Definition 3 ([Con14]). Let $A \in M_m(\mathbb{F}_2)$. A nonzero polynomial $f \in \mathbb{F}_2[x]$ is the minimal polynomial of $A$ if and only if $f(A) = 0$, and for any nonzero $g \in \mathbb{F}_2[x]$ such that $g(A) = 0$, $\deg(f) \le \deg(g)$.

Note that a minimal polynomial of $A \in M_m(\mathbb{F}_2)$ can be reducible.

Definition 4 ([Wan03]). Let $f = x^m + a_{m-1} x^{m-1} + \cdots + a_1 x + a_0 \in \mathbb{F}_2[x]$. The companion matrix of $f$ is defined as the $m \times m$ matrix

\[
\begin{pmatrix}
0 &   &        &   & a_0 \\
1 & 0 &        &   & a_1 \\
  & 1 & \ddots &   & \vdots \\
  &   & \ddots & 0 & a_{m-2} \\
  &   &        & 1 & a_{m-1}
\end{pmatrix}.
\]

It is trivial to verify that the characteristic polynomial of $f$'s companion matrix is $f$.
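As an illustration of Definition 4 and Lemma 1, the following sketch builds a companion matrix over $\mathbb{F}_2$ and evaluates $f$ at it. The helper names (`companion`, `poly_at`) are hypothetical; matrices are plain lists of 0/1 rows, and the polynomial is passed as its low-order coefficients $[a_0, \ldots, a_{m-1}]$ with the leading coefficient 1 implied.

```python
def companion(coeffs):
    """Companion matrix of f = x^m + a_{m-1} x^{m-1} + ... + a_0, coeffs = [a_0, ..., a_{m-1}]."""
    m = len(coeffs)
    M = [[0] * m for _ in range(m)]
    for r in range(1, m):
        M[r][r - 1] = 1            # ones on the sub-diagonal
    for r in range(m):
        M[r][m - 1] = coeffs[r]    # last column holds a_0, ..., a_{m-1}
    return M

def matmul(P, Q):
    n = len(P)
    return [[sum(P[r][t] & Q[t][c] for t in range(n)) % 2 for c in range(n)]
            for r in range(n)]

def matadd(P, Q):
    return [[a ^ b for a, b in zip(p, q)] for p, q in zip(P, Q)]

def poly_at(coeffs, M):
    """Evaluate f(M) over F_2 by Horner's rule, starting from the leading coefficient 1."""
    n = len(M)
    ident = [[int(r == c) for c in range(n)] for r in range(n)]
    acc = ident
    for a in reversed(coeffs):
        acc = matmul(acc, M)
        if a:
            acc = matadd(acc, ident)
    return acc

# x^4 + x + 1 and x^8 + x^2 + 1, both used later in the paper
C = companion([1, 1, 0, 0])
A = companion([1, 0, 1, 0, 0, 0, 0, 0])
print(C)
print(poly_at([1, 1, 0, 0], C))   # zero matrix, as Lemma 1 predicts
```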

Lemma 2 ([BR99, LW16]). Let $L$ be a matrix in $M_k(M_n(\mathbb{F}_2))$. Then $L$ is an MDS matrix (with branch number $k + 1$) if and only if all square sub-matrices $G \in M_t(M_n(\mathbb{F}_2))$ of $L$ are of full rank for $1 \le t \le k$.

Lemma 2 is employed in this paper to check the MDS property of our candidate lightweight matrices.
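A minimal sketch of how Lemma 2 can be turned into a test is given below, together with a brute-force branch number computation for cross-checking on tiny instances. This is not the authors' code: the function names are hypothetical, the toy example uses $k = 2$ words of $n = 2$ bits, and the exhaustive branch number search is only feasible for very small matrices.

```python
from itertools import combinations, product

def rank_f2(rows):
    """Rank over F_2 of a matrix given as a list of integer bitmask rows."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot                       # lowest set bit of the pivot
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def block_rows(M, n, row_blocks, col_blocks):
    """Bitmask rows of the block sub-matrix selected by the given block indices."""
    out = []
    for rb in row_blocks:
        for r in range(rb * n, (rb + 1) * n):
            mask = 0
            for cb in col_blocks:
                for c in range(cb * n, (cb + 1) * n):
                    mask = (mask << 1) | M[r][c]
            out.append(mask)
    return out

def is_mds(M, n, k):
    """Lemma 2: every t x t block sub-matrix must have full rank n*t."""
    for t in range(1, k + 1):
        for rbs in combinations(range(k), t):
            for cbs in combinations(range(k), t):
                if rank_f2(block_rows(M, n, rbs, cbs)) != n * t:
                    return False
    return True

def word_weight(v, n):
    return sum(any(v[i:i + n]) for i in range(0, len(v), n))

def branch_number(M, n):
    """Exhaustive evaluation of B_n(M); exponential in the matrix size."""
    size = len(M)
    best = None
    for x in product((0, 1), repeat=size):
        if any(x):
            y = [sum(M[r][c] & x[c] for c in range(size)) % 2 for r in range(size)]
            w = word_weight(x, n) + word_weight(y, n)
            best = w if best is None else min(best, w)
    return best

# [[I, I], [I, C]] with C the companion matrix of x^2 + x + 1, versus [[I, I], [I, I]]
M_good = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 1]]
M_bad = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
print(is_mds(M_good, 2, 2), branch_number(M_good, 2))
print(is_mds(M_bad, 2, 2), branch_number(M_bad, 2))
```

For the 32 × 32 matrices of this paper only the rank-based test is practical, since the exhaustive search ranges over $2^{32} - 1$ inputs.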

3 Metrics

We estimate the hardware cost of a linear operation as the number of $\mathbb{F}_2 \times \mathbb{F}_2 \to \mathbb{F}_2$ XOR gates required in its implementation, where the implementation can be described as a sequence of XOR and assignment operations $x_i \leftarrow x_{a_i} \oplus x_{b_i}$ with $a_i, b_i < i$. However, for a given linear operation, it is NP-hard to obtain the minimum number of XOR gates required [BMP08, BMP13], and only metrics determining upper bounds are available. The metrics used in this paper are listed below.

Direct XOR Count. Given a matrix $A \in M_{nk}(\mathbb{F}_2)$, the Direct XOR Count $\mathrm{DXC}(A)$ of $A$ is $\omega(A) - nk$, that is, the number of 1s in the matrix $A$ minus $nk$. This corresponds to a naive implementation of $A$, where each row of $A$ is implemented as is. $\mathrm{DXC}(A)$ is essentially the same as the Hamming weight $\omega(A)$ of $A$ up to a constant shift.

Global Optimization. Given a matrix $A \in M_{nk}(\mathbb{F}_2)$, we can obtain an estimation of its hardware cost by finding a good linear straight-line program corresponding to $A$ with state-of-the-art automatic tools based on a certain SLP heuristic [BMP13], and this metric is denoted as $\mathrm{SLP}(A)$. Note that this is so far the most accurate estimation that is practical for 32 × 32 binary matrices.
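The difference between the two metrics can be seen on a toy example. The sketch below is an assumption-laden illustration and not Boyar's heuristic: the 3 × 3 matrix and the hand-written straight-line program are made up for this purpose, and `run_slp` merely checks that a given program computes a given matrix by tracking each signal as a bitmask of the inputs it depends on.

```python
def dxc(M):
    """Direct XOR Count: number of ones minus the number of rows."""
    return sum(map(sum, M)) - len(M)

def run_slp(num_inputs, program, outputs):
    """Evaluate a linear straight-line program symbolically.

    program is a list of pairs (a, b); the i-th pair defines a new signal
    x_{num_inputs + i} = x_a XOR x_b, where a and b refer to earlier signals.
    Returns the binary matrix computed at the listed output signals.
    """
    signals = [1 << i for i in range(num_inputs)]
    for a, b in program:
        assert a < len(signals) and b < len(signals)
        signals.append(signals[a] ^ signals[b])
    return [[(signals[o] >> c) & 1 for c in range(num_inputs)] for o in outputs]

# Hypothetical toy matrix: y0 = x0+x1, y1 = x0+x1+x2, y2 = x1+x2
M = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
program = [(0, 1),   # x3 = x0 + x1        -> y0
           (3, 2),   # x4 = x3 + x2        -> y1, reuses x3
           (1, 2)]   # x5 = x1 + x2        -> y2
outputs = [3, 4, 5]

print(dxc(M), len(program))                  # naive cost versus SLP cost
print(run_slp(3, program, outputs) == M)
```

Sharing the intermediate signal `x3` is what lets the program use fewer XORs than the row-by-row implementation counted by the Direct XOR Count.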

In this work, the hardware cost is eventually estimated with Global Optimization. However, before applying the Global Optimization, we first try to construct lighter involutory MDS matrices with a fairly low Direct XOR Count (i.e., matrices with low Hamming weights). Finally, we would like to mention that there are other metrics (such as the Sequential XOR Count [JPST17]) in the literature, and we refer the reader to [DL18] for a clear discussion of the comparisons and limitations of different metrics.

Besides the circuit area (measured by the number of XOR gates required for an implementation), another important metric of an implementation is the latency, which imposes a constraint on the clock frequency at which the circuit can operate. The latency of an implementation can be characterized by its depth.

Definition 5. Let $M$ be an $m \times m$ binary matrix. Then the function $f_M : x \in \mathbb{F}_2^m \mapsto Mx \in \mathbb{F}_2^m$ can be implemented with a finite number of XOR gates. The critical path of such an implementation is defined as the path between an input and an output involving the maximum number of XOR gates, and the depth of the implementation is the number of XOR gates involved in the critical path.
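The depth of Definition 5 is easy to compute for a circuit given as a straight-line program: inputs have depth 0, and each XOR gate has depth one more than its deeper operand. The sketch below uses the same hypothetical program encoding as a list of operand index pairs; the two example programs are made up and both sum four inputs with three XOR gates.

```python
def slp_depths(num_inputs, program):
    """Depth of every signal; signal num_inputs + i is defined by program[i] = (a, b)."""
    depth = [0] * num_inputs
    for a, b in program:
        depth.append(max(depth[a], depth[b]) + 1)
    return depth

# Two ways of computing x0 + x1 + x2 + x3 with three XOR gates
chain = [(0, 1), (4, 2), (5, 3)]   # ((x0 + x1) + x2) + x3
tree = [(0, 1), (2, 3), (4, 5)]    # (x0 + x1) + (x2 + x3)

print(slp_depths(4, chain)[-1])    # depth of the chained sum
print(slp_depths(4, tree)[-1])     # depth of the balanced sum
```

The two programs have the same area cost but different critical paths, which is the trade-off the depth-aware heuristic of Sect. 6 has to manage.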


4 Our Constructions

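By applying the subfield construction [BNN+10, KPPY14] to the involutory MDS matrix

\[
\begin{pmatrix}
I_4 & C & C^2 & I_4 \\
C & I_4 & I_4 & C^2 \\
C^3 & C & I_4 & C \\
C & C^3 & C & I_4
\end{pmatrix}
\quad \text{with} \quad
C = \begin{pmatrix}
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0
\end{pmatrix}
\]

proposed by Sarkar et al. [SS16b], Kranz et al. obtain the most lightweight involutory MDS matrix known so far in $M_4(M_2(\mathbb{F}_{2^4}))$, whose binary representation is

\[
M_{KLSW} =
\begin{pmatrix}
I_4 & 0 & C & 0 & C^2 & 0 & I_4 & 0 \\
0 & I_4 & 0 & C & 0 & C^2 & 0 & I_4 \\
C & 0 & I_4 & 0 & I_4 & 0 & C^2 & 0 \\
0 & C & 0 & I_4 & 0 & I_4 & 0 & C^2 \\
C^3 & 0 & C & 0 & I_4 & 0 & C & 0 \\
0 & C^3 & 0 & C & 0 & I_4 & 0 & C \\
C & 0 & C^3 & 0 & C & 0 & I_4 & 0 \\
0 & C & 0 & C^3 & 0 & C & 0 & I_4
\end{pmatrix}.
\]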

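The involutory MDS matrix $M_{KLSW}$ can be regarded as a matrix in $M_4(GL(8, \mathbb{F}_2))$ of the following form

\[
\begin{pmatrix}
I_8 & A & A^2 & I_8 \\
A & I_8 & I_8 & A^2 \\
A^3 & A & I_8 & A \\
A & A^3 & A & I_8
\end{pmatrix}. \tag{1}
\]

Then we can generalize (1) and try to find lightweight involutory MDS matrices of the following form

\[
G =
\begin{pmatrix}
I_8 & A^l & A^i & I_8 \\
A^l & I_8 & I_8 & A^i \\
A^j & A^k & I_8 & A^l \\
A^k & A^j & A^l & I_8
\end{pmatrix}.
\]

Observation 1. The matrix $G \in M_4(GL(8, \mathbb{F}_2))$ is involutory if and only if $G^2 = I$, which implies $A^{2l} + A^{i+j} + A^k = O_8$ and $A^{i+k} + A^j = O_8$.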

According to Observation 1, to make $G$ involutory, we have $A^{i+k} + A^j = O_8$ and thus

\[
G =
\begin{pmatrix}
I_8 & A^l & A^i & I_8 \\
A^l & I_8 & I_8 & A^i \\
A^j & A^k & I_8 & A^l \\
A^k & A^j & A^l & I_8
\end{pmatrix}
=
\begin{pmatrix}
I_8 & A^l & A^i & I_8 \\
A^l & I_8 & I_8 & A^i \\
A^{i+k} & A^k & I_8 & A^l \\
A^k & A^{i+k} & A^l & I_8
\end{pmatrix}.
\]

First, our goal is to find an involutory matrix $G$ such that $\mathrm{DXC}(G)$ is small. Since $\mathrm{DXC}(G) = \omega(G) - 32 = 4\omega(A^l) + 2\omega(A^i) + 2\omega(A^k) + 2\omega(A^{i+k}) + 48 - 32$ and heuristically $\omega(A^t)$ increases along with $|t|$ when $A$ is very sparse, we prefer instantiations of $i$, $l$, $j$ and $k$ such that $|i|$, $|l|$, $|j|$ and $|k|$ (the exponents of $A$ appearing in $G$) are small.

According to [BKL16] (see Table 7 of [BKL16]), $\mathrm{DXC}(A) \ge 2$ if the characteristic polynomial of $A$ is an irreducible polynomial of degree 8. Therefore, we only consider $A$ whose characteristic polynomial is reducible. We find that if we choose

\[
A =
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix} \tag{2}
\]

to be the companion matrix of $x^8 + x^2 + 1$, whose characteristic polynomial is $(x^4 + x + 1)^2 = x^8 + x^2 + 1$, then $\mathrm{DXC}(A^{-4}) = 6$, $\mathrm{DXC}(A^{-3}) = 4$, $\mathrm{DXC}(A^{-2}) = 2$, $\mathrm{DXC}(A^{-1}) = 1$, $\mathrm{DXC}(A^0) = 0$, $\mathrm{DXC}(A) = 1$, $\mathrm{DXC}(A^2) = 2$, $\mathrm{DXC}(A^3) = 3$, $\mathrm{DXC}(A^4) = 4$, and $A^8 + A^2 + I = 0$ according to Lemma 1.
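The Direct XOR Counts quoted above can be recomputed directly from the matrix in Equation (2). The following sketch is an independent check with hypothetical helper names; it obtains $A^{-1}$ as $A^7 + A$, which follows from $A^8 + A^2 + I = 0$, rather than by Gaussian elimination.

```python
def companion(coeffs):
    """Companion matrix of x^m + a_{m-1} x^{m-1} + ... + a_0, coeffs = [a_0, ..., a_{m-1}]."""
    m = len(coeffs)
    M = [[0] * m for _ in range(m)]
    for r in range(1, m):
        M[r][r - 1] = 1
    for r in range(m):
        M[r][m - 1] = coeffs[r]
    return M

def matmul(P, Q):
    n = len(P)
    return [[sum(P[r][t] & Q[t][c] for t in range(n)) % 2 for c in range(n)]
            for r in range(n)]

def matadd(P, Q):
    return [[a ^ b for a, b in zip(p, q)] for p, q in zip(P, Q)]

def matpow(P, e):
    """P^e for a non-negative integer e by repeated multiplication."""
    R = [[int(r == c) for c in range(len(P))] for r in range(len(P))]
    for _ in range(e):
        R = matmul(R, P)
    return R

def dxc(M):
    return sum(map(sum, M)) - len(M)

A = companion([1, 0, 1, 0, 0, 0, 0, 0])   # x^8 + x^2 + 1, Equation (2)
A_inv = matadd(matpow(A, 7), A)            # A (A^7 + A) = A^8 + A^2 = I

dxc_table = {t: dxc(matpow(A, t) if t >= 0 else matpow(A_inv, -t))
             for t in range(-4, 5)}
print(dxc_table)
```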

It is easy to verify that the minimal polynomial of $A$ is also $x^8 + x^2 + 1$ according to Definition 3. Hence $A^8 + A^2 + I = 0$ and thus $A^{8+d} + A^{2+d} + A^d = 0$ for any integer $d$. Therefore, solving the equation over two sets $\{A^{8+d}, A^{2+d}, A^d\} = \{A^{2l}, A^{2i+k}, A^k\}$, where $A^{2i+k} = A^{i+j}$ according to Observation 1, gives the solutions of $l$, $i$, and $k$ such that $A^{2l} + A^{i+j} + A^k = O_8$. We can enumerate all solutions and pick one which minimizes $4|l| + 2|i| + 2|k| + 2|i + j|$. One such possible solution$^5$ is

\[
d = -4, \quad l = 2, \quad k = -2, \quad i = -1,
\]

which transforms $G$ into

\[
G =
\begin{pmatrix}
I_8 & A^2 & A^{-1} & I_8 \\
A^2 & I_8 & I_8 & A^{-1} \\
A^{-3} & A^{-2} & I_8 & A^2 \\
A^{-2} & A^{-3} & A^2 & I_8
\end{pmatrix}.
\]
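The chosen parameters can be sanity-checked numerically. The sketch below, which is not taken from the authors' repository, builds the 32 × 32 binary representation of $G$ and verifies that $G^2 = I$. It relies on the assumption that, for the companion matrix $A$ of $f = x^8 + x^2 + 1$, column $c$ of $A^e$ holds the coefficients of $x^{e+c} \bmod f$; all function names are hypothetical.

```python
F = 0b100000101  # f = x^8 + x^2 + 1 as a bitmask

def x_power(e):
    """x^e mod f as a bitmask; negative e divides by x (possible since f(0) = 1)."""
    p = 1
    for _ in range(abs(e)):
        if e > 0:
            p <<= 1
            if p >> 8:
                p ^= F
        else:
            if p & 1:
                p ^= F
            p >>= 1
    return p

def a_power(e):
    """8 x 8 binary matrix of A^e: column c holds the coefficients of x^(e+c) mod f."""
    cols = [x_power(e + c) for c in range(8)]
    return [[(cols[c] >> r) & 1 for c in range(8)] for r in range(8)]

def build_block_matrix(exponents):
    k = len(exponents)
    M = [[0] * (8 * k) for _ in range(8 * k)]
    for br in range(k):
        for bc in range(k):
            block = a_power(exponents[br][bc])
            for r in range(8):
                for c in range(8):
                    M[8 * br + r][8 * bc + c] = block[r][c]
    return M

def matmul(P, Q):
    n = len(P)
    return [[sum(P[r][t] & Q[t][c] for t in range(n)) % 2 for c in range(n)]
            for r in range(n)]

d, l, k, i = -4, 2, -2, -1
j = i + k                                   # forced by A^(i+k) + A^j = O_8
E = [[0, l, i, 0],
     [l, 0, 0, i],
     [j, k, 0, l],
     [k, j, l, 0]]
G = build_block_matrix(E)
identity32 = [[int(r == c) for c in range(32)] for r in range(32)]

print(sorted([8 + d, 2 + d, d]) == sorted([2 * l, 2 * i + k, k]))
print(matmul(G, G) == identity32)
print(sum(map(sum, G)) - 32)                # DXC(G)
```

This only confirms the involution property and the Direct XOR Count; the MDS property still has to be checked separately, for instance via Lemma 2.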

By applying Boyar's SLP-heuristic algorithm, we obtain an implementation of $G$ with only 80 XOR gates, which breaks the record of 84 XOR gates [KLSW17], and the actual implementation can be found in Table 2.

5 There are other possible solutions. However, we do not discuss them since all of them will be covered in subsequent sections.


Table 2: An implementation of $G$ with 80 XOR gates and depth 4, where $(x_0, \cdots, x_{31})$ are input signals, $(y_0, \cdots, y_{31})$ are output signals, and the $t_i$'s are intermediate signals.

| No. | Operation | Depth | No. | Operation | Depth | No. | Operation | Depth |
|---|---|---|---|---|---|---|---|---|
| 1 | t1 = x0 + x9 | 1 | 28 | t28 = x31 + t16 | 2 | 55 | t55 = x4 + t38 | 3 |
| 2 | t2 = x1 + x8 | 1 | 29 | t29 = x7 + t28 [y7] | 3 | 56 | t56 = t40 + t55 [y4] | 4 |
| 3 | t3 = x2 + t1 | 2 | 30 | t30 = x7 + x19 | 1 | 57 | t57 = x5 + x29 | 1 |
| 4 | t4 = x10 + t2 | 2 | 31 | t31 = x7 + x26 | 1 | 58 | t58 = t6 + t57 [y5] | 2 |
| 5 | t5 = x3 + x30 | 1 | 32 | t32 = x8 + t30 | 2 | 59 | t59 = x9 + t34 | 3 |
| 6 | t6 = x11 + x22 | 1 | 33 | t33 = x29 + t32 [y29] | 3 | 60 | t60 = t36 + t59 [y9] | 4 |
| 7 | t7 = x0 + x27 | 1 | 34 | t34 = x14 + t31 | 2 | 61 | t61 = x10 + t7 | 2 |
| 8 | t8 = x6 + x18 | 1 | 35 | t35 = x20 + t34 [y20] | 3 | 62 | t62 = t8 + t61 [y10] | 3 |
| 9 | t9 = x15 + t7 | 2 | 36 | t36 = x24 + t22 | 2 | 63 | t63 = x11 + t32 | 3 |
| 10 | t10 = x21 + t9 [y21] | 3 | 37 | t37 = x0 + t36 [y0] | 3 | 64 | t64 = t38 + t63 [y11] | 4 |
| 11 | t11 = x20 + t1 | 2 | 38 | t38 = x28 + t2 | 2 | 65 | t65 = x12 + t11 | 3 |
| 12 | t12 = x30 + t11 [y30] | 3 | 39 | t39 = x22 + t38 [y22] | 3 | 66 | t66 = t13 + t65 [y12] | 4 |
| 13 | t13 = x29 + t3 | 3 | 40 | t40 = x21 + t4 | 3 | 67 | t67 = x13 + x21 | 1 |
| 14 | t14 = x23 + t13 [y23] | 4 | 41 | t41 = x31 + t40 [y31] | 4 | 68 | t68 = t5 + t67 [y13] | 2 |
| 15 | t15 = x4 + x22 | 1 | 42 | t42 = x12 + x23 | 1 | 69 | t69 = x17 + t17 | 3 |
| 16 | t16 = x13 + x16 | 1 | 43 | t43 = x24 + t21 | 2 | 70 | t70 = t19 + t69 [y17] | 4 |
| 17 | t17 = x31 + t15 | 2 | 44 | t44 = x15 + t43 [y15] | 3 | 71 | t71 = x18 + t43 | 3 |
| 18 | t18 = x14 + t17 [y14] | 3 | 45 | t45 = x30 + t42 | 2 | 72 | t72 = t45 + t71 [y18] | 4 |
| 19 | t19 = t3 + t6 | 3 | 46 | t46 = x6 + t45 [y6] | 3 | 73 | t73 = x19 + t26 | 3 |
| 20 | t20 = x24 + t19 [y24] | 4 | 47 | t47 = t4 + t5 | 3 | 74 | t74 = t28 + t73 [y19] | 4 |
| 21 | t21 = x5 + x23 | 1 | 48 | t48 = x16 + t47 [y16] | 4 | 75 | t75 = x25 + t45 | 3 |
| 22 | t22 = x14 + x17 | 1 | 49 | t49 = x1 + t24 | 3 | 76 | t76 = t47 + t75 [y25] | 4 |
| 23 | t23 = x6 + x25 | 1 | 50 | t50 = t26 + t49 [y1] | 4 | 77 | t77 = x26 + t15 | 2 |
| 24 | t24 = x15 + t8 | 2 | 51 | t51 = x2 + t32 | 3 | 78 | t78 = t16 + t77 [y26] | 3 |
| 25 | t25 = x28 + t24 [y28] | 3 | 52 | t52 = t34 + t51 [y2] | 4 | 79 | t79 = x27 + t21 | 2 |
| 26 | t26 = x16 + t23 | 2 | 53 | t53 = x3 + t9 | 3 | 80 | t80 = t22 + t79 [y27] | 3 |
| 27 | t27 = x8 + t26 [y8] | 3 | 54 | t54 = t11 + t53 [y3] | 4 | | | |

5 More Generalizations

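The above result motivates us to consider a more generalized form:

\[
M = \begin{pmatrix}
A^{\varepsilon_{11}} & A^{\varepsilon_{12}} & A^{\varepsilon_{13}} & A^{\varepsilon_{14}} \\
A^{\varepsilon_{21}} & A^{\varepsilon_{22}} & A^{\varepsilon_{23}} & A^{\varepsilon_{24}} \\
A^{\varepsilon_{31}} & A^{\varepsilon_{32}} & A^{\varepsilon_{33}} & A^{\varepsilon_{34}} \\
A^{\varepsilon_{41}} & A^{\varepsilon_{42}} & A^{\varepsilon_{43}} & A^{\varepsilon_{44}}
\end{pmatrix}
= \begin{pmatrix}
I & A^{\varepsilon_{12}} & A^{\varepsilon_{13}} & A^{\varepsilon_{14}} \\
A^{\varepsilon_{21}} & I & A^{\varepsilon_{23}} & A^{\varepsilon_{24}} \\
A^{\varepsilon_{31}} & A^{\varepsilon_{32}} & I & A^{\varepsilon_{34}} \\
A^{\varepsilon_{41}} & A^{\varepsilon_{42}} & A^{\varepsilon_{43}} & I
\end{pmatrix},
\]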

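where $\varepsilon_{11} = \varepsilon_{22} = \cdots = \varepsilon_{44} = 0$, $A \in GL(8, \mathbb{F}_2)$ is the companion matrix of $x^8 + x^2 + 1$ shown in Equation (2), and $\varepsilon_{ij}$ are integers for $1 \le i, j \le 4$. Without loss of generality, let

\[
A^{\varepsilon_{42}} = A^{r + \varepsilon_{13}}, \qquad
A^{\varepsilon_{43}} = A^{s + \varepsilon_{12}}, \qquad
A^{\varepsilon_{24}} = A^{t + \varepsilon_{13}}.
\]

Since $M$ is involutory and thus $M^2 = I$, we can deduce that

\[
M = \begin{pmatrix}
I & A^{\varepsilon_{12}} & A^{\varepsilon_{13}} & A^{\varepsilon_{14}} \\
A^{\varepsilon_{12}+s+t} & I & A^{\varepsilon_{14}+s} & A^{\varepsilon_{13}+t} \\
A^{\varepsilon_{13}+r+t} & A^{\varepsilon_{14}+r} & I & A^{\varepsilon_{12}+t} \\
A^{\varepsilon_{14}+r+s} & A^{\varepsilon_{13}+r} & A^{\varepsilon_{12}+s} & I
\end{pmatrix} \tag{3}
\]

and

\[
\left(I,\ A^{\varepsilon_{12}},\ A^{\varepsilon_{13}},\ A^{\varepsilon_{14}}\right)
\begin{pmatrix}
A^{\varepsilon_{11}} \\
A^{\varepsilon_{12}+s+t} \\
A^{\varepsilon_{13}+r+t} \\
A^{\varepsilon_{14}+r+s}
\end{pmatrix} = I,
\]

which implies

\[
A^{2\varepsilon_{12}-r} + A^{2\varepsilon_{13}-s} + A^{2\varepsilon_{14}-t} = 0. \tag{4}
\]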
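For readers who want the intermediate step, the passage from the row-times-column identity to Equation (4) can be written out as follows. This is a worked restatement that uses only Equation (3), $\varepsilon_{11} = 0$, and the facts that powers of $A$ commute and that $A$ is invertible; the final line checks the parameter $(0, 0, 4, 0, 2, 2)$ of the matrix $H$ reported later in this section.

```latex
\begin{aligned}
I + A^{2\varepsilon_{12}+s+t} + A^{2\varepsilon_{13}+r+t} + A^{2\varepsilon_{14}+r+s} &= I \\
\Longrightarrow\quad A^{r+s+t}\left(A^{2\varepsilon_{12}-r} + A^{2\varepsilon_{13}-s} + A^{2\varepsilon_{14}-t}\right) &= 0 \\
\Longrightarrow\quad A^{2\varepsilon_{12}-r} + A^{2\varepsilon_{13}-s} + A^{2\varepsilon_{14}-t} &= 0, \\[4pt]
(0,0,4,0,2,2):\quad A^{0} + A^{-2} + A^{6} = A^{-2}\left(A^{2} + I + A^{8}\right) &= 0 .
\end{aligned}
```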

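According to Equation (3), the matrix $M$ can be completely determined by the parameters $\varepsilon_{12}$, $\varepsilon_{13}$, $\varepsilon_{14}$, $r$, $s$ and $t$. Therefore, we inspect all $(\varepsilon_{12}, \varepsilon_{13}, \varepsilon_{14}, r, s, t) \in \mathbb{Z}^6$ satisfying the following conditions⁶

\[
\begin{cases}
-8 \le \varepsilon_{1j} \le 8 \ \text{ for } 1 \le j \le 4 \\
0 \le r \le s \le t \le 8 \\
A^{2\varepsilon_{12}-r} + A^{2\varepsilon_{13}-s} + A^{2\varepsilon_{14}-t} = 0
\end{cases} \tag{5}
\]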

Finally, we identify 5550 involutory MDS matrices whose Hamming weights are within the range from 148 to 172. We apply Boyar’s SLP-heuristic algorithm to all these matrices to obtain their lightweight implementations, and the results are summarized in Table 3.

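The above approach produces many equivalent matrices. For instance, let

\[
M = \begin{pmatrix}
I & A^{\varepsilon_{12}} & A^{\varepsilon_{13}} & A^{\varepsilon_{14}} \\
A^{\varepsilon_{12}+s+t} & I & A^{\varepsilon_{14}+s} & A^{\varepsilon_{13}+t} \\
A^{\varepsilon_{13}+r+t} & A^{\varepsilon_{14}+r} & I & A^{\varepsilon_{12}+t} \\
A^{\varepsilon_{14}+r+s} & A^{\varepsilon_{13}+r} & A^{\varepsilon_{12}+s} & I
\end{pmatrix},
\]

which is parameterized by $(\varepsilon_{12}, \varepsilon_{13}, \varepsilon_{14}, r, s, t)$. If we exchange the second row and third row, and then exchange the second and third column, we obtain

\[
M' = \begin{pmatrix}
I & 0 & 0 & 0 \\
0 & 0 & I & 0 \\
0 & I & 0 & 0 \\
0 & 0 & 0 & I
\end{pmatrix}^{T}
M
\begin{pmatrix}
I & 0 & 0 & 0 \\
0 & 0 & I & 0 \\
0 & I & 0 & 0 \\
0 & 0 & 0 & I
\end{pmatrix}
= \begin{pmatrix}
I & A^{\varepsilon_{13}} & A^{\varepsilon_{12}} & A^{\varepsilon_{14}} \\
A^{\varepsilon_{13}+r+t} & I & A^{\varepsilon_{14}+r} & A^{\varepsilon_{12}+t} \\
A^{\varepsilon_{12}+s+t} & A^{\varepsilon_{14}+s} & I & A^{\varepsilon_{13}+t} \\
A^{\varepsilon_{14}+r+s} & A^{\varepsilon_{12}+s} & A^{\varepsilon_{13}+r} & I
\end{pmatrix},
\]

corresponding to the parameter $(\varepsilon_{13}, \varepsilon_{12}, \varepsilon_{14}, s, r, t)$. Obviously, $M'$ is an involutory MDS matrix if and only if $M$ is involutory and MDS. In addition, from any implementation of $M$, we can derive an implementation of $M'$ with the same circuit size and depth. Hence, the parameters $(\varepsilon_{12}, \varepsilon_{13}, \varepsilon_{14}, r, s, t)$ and $(\varepsilon_{13}, \varepsilon_{12}, \varepsilon_{14}, s, r, t)$ are equivalent. We list all equivalent parameters in Table 4.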
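The bookkeeping behind such equivalences can be sketched on the exponents alone. The snippet below is an illustration under the assumption that the matrix of Equation (3) is represented by its 4 × 4 table of exponents; the function names `exponent_matrix`, `permute` and `parameters` are hypothetical and not taken from the paper. It reads the new parameter off a simultaneously row- and column-permuted matrix.

```python
def exponent_matrix(e12, e13, e14, r, s, t):
    """Exponents of A in the matrix of Equation (3); 0 stands for the block I."""
    return [
        [0, e12, e13, e14],
        [e12 + s + t, 0, e14 + s, e13 + t],
        [e13 + r + t, e14 + r, 0, e12 + t],
        [e14 + r + s, e13 + r, e12 + s, 0],
    ]

def permute(E, order):
    """Apply the same reordering to block rows and block columns.
    order[i] is the 0-based index of the original row/column moved to position i."""
    return [[E[order[i]][order[j]] for j in range(4)] for i in range(4)]

def parameters(E):
    """Recover (e12, e13, e14, r, s, t) using the conventions
    e42 = r + e13, e43 = s + e12, e24 = t + e13."""
    e12, e13, e14 = E[0][1], E[0][2], E[0][3]
    return (e12, e13, e14, E[3][1] - e13, E[3][2] - e12, E[1][3] - e13)

E = exponent_matrix(1, 2, 3, 4, 5, 6)      # arbitrary integers for illustration
swapped = permute(E, [0, 2, 1, 3])          # exchange positions 2 and 3
print(parameters(swapped))
```

Working on exponents is enough here because a block permutation only moves the blocks around; it never multiplies them.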

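Table 3: A summary of the result. The first row means that we identify a set of 18 matrices whose Hamming weight and DXC are 148 and 116 respectively. The maximal and minimal XOR gate counts of these matrices after applying Boyar’s SLP heuristic are both 80, and the minimum circuit depth is 4.

| ω(A) | #Matrices | DXC(A) | min SLP(A) | max SLP(A) | min depth(A) |
|---|---|---|---|---|---|
| 148 | 18 | 116 | 80 | 80 | 4 |
| 149 | 48 | 117 | 80 | 80 | 4 |
| 150 | 72 | 118 | 80 | 83 | 4 |
| 151 | 48 | 119 | 83 | 84 | 4 |
| 152 | 60 | 120 | 83 | 87 | 4 |
| 153 | 72 | 121 | 80 | 84 | 4 |
| 154 | 84 | 122 | 80 | 86 | 4 |
| 155 | 24 | 123 | 86 | 87 | 5 |
| 156 | 72 | 124 | 86 | 87 | 4 |
| 157 | 96 | 125 | 82 | 84 | 5 |
| 158 | 156 | 126 | 80 | 90 | 4 |
| 159 | 0 | – | – | – | – |
| 160 | 210 | 128 | 78 | 90 | 4 |
| 161 | 144 | 129 | 79 | 84 | 4 |
| 162 | 204 | 130 | 79 | 89 | 4 |
| 163 | 192 | 131 | 79 | 91 | 5 |
| 164 | 300 | 132 | 78 | 93 | 4 |
| 165 | 312 | 133 | 79 | 88 | 5 |
| 166 | 324 | 134 | 80 | 93 | 4 |
| 167 | 336 | 135 | 80 | 94 | 5 |
| 168 | 600 | 136 | 78 | 99 | 4 |
| 169 | 384 | 137 | 79 | 97 | 4 |
| 170 | 504 | 138 | 80 | 98 | 4 |
| 171 | 528 | 139 | 81 | 99 | 4 |
| 172 | 762 | 140 | 79 | 102 | 4 |

⁶ These conditions can be relaxed to find potentially better matrices.

Every entry in the rightmost column of Table 4 is the cycle notation of a permutation $\pi$ over $\{1, 2, 3, 4\}$. The parameter in the same row is obtained by permuting the columns and rows of

\[
M = \begin{pmatrix}
I & A^{\varepsilon_{12}} & A^{\varepsilon_{13}} & A^{\varepsilon_{14}} \\
A^{\varepsilon_{12}+s+t} & I & A^{\varepsilon_{14}+s} & A^{\varepsilon_{13}+t} \\
A^{\varepsilon_{13}+r+t} & A^{\varepsilon_{14}+r} & I & A^{\varepsilon_{12}+t} \\
A^{\varepsilon_{14}+r+s} & A^{\varepsilon_{13}+r} & A^{\varepsilon_{12}+s} & I
\end{pmatrix}
\]

according to $\pi$. Taking the 4th row for example, we have $\pi = (2, 4, 3)$, and the transformation is performed as follows:

\[
\begin{pmatrix}
I_8 & 0 & 0 & 0 \\
0 & 0 & 0 & I_8 \\
0 & I_8 & 0 & 0 \\
0 & 0 & I_8 & 0
\end{pmatrix}^{T}
\begin{pmatrix}
I & A^{\varepsilon_{12}} & A^{\varepsilon_{13}} & A^{\varepsilon_{14}} \\
A^{\varepsilon_{12}+s+t} & I & A^{\varepsilon_{14}+s} & A^{\varepsilon_{13}+t} \\
A^{\varepsilon_{13}+r+t} & A^{\varepsilon_{14}+r} & I & A^{\varepsilon_{12}+t} \\
A^{\varepsilon_{14}+r+s} & A^{\varepsilon_{13}+r} & A^{\varepsilon_{12}+s} & I
\end{pmatrix}
\begin{pmatrix}
I_8 & 0 & 0 & 0 \\
0 & 0 & 0 & I_8 \\
0 & I_8 & 0 & 0 \\
0 & 0 & I_8 & 0
\end{pmatrix}
= \begin{pmatrix}
I & A^{\varepsilon_{13}} & A^{\varepsilon_{14}} & A^{\varepsilon_{12}} \\
A^{\varepsilon_{13}+r+t} & I & A^{\varepsilon_{12}+t} & A^{\varepsilon_{14}+r} \\
A^{\varepsilon_{14}+r+s} & A^{\varepsilon_{12}+s} & I & A^{\varepsilon_{13}+r} \\
A^{\varepsilon_{12}+s+t} & A^{\varepsilon_{14}+s} & A^{\varepsilon_{13}+t} & I
\end{pmatrix},
\]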

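from which we can see that $(\varepsilon_{13}, \varepsilon_{14}, \varepsilon_{12}, s, t, r)$ and $(\varepsilon_{12}, \varepsilon_{13}, \varepsilon_{14}, r, s, t)$ are equivalent. However, such equivalences are not visible to Boyar’s tool [BMP13] due to its heuristic nature, where the orders of the rows and columns do matter. That is, Boyar’s tool may output circuits with different sizes and depths for two equivalent matrices. Therefore, in our experiment, we still need to search through all matrices we generated, and pick the ones with better implementations.

Table 4: A list of equivalent parameters, where the Transformation column corresponds to the column and row permutations explained in the text.

| No. | Parameter | Transformation |
|---|---|---|
| 1 | $(\varepsilon_{12}, \varepsilon_{13}, \varepsilon_{14}, r, s, t)$ | – |
| 2 | $(\varepsilon_{12}, \varepsilon_{14}, \varepsilon_{13}, r, t, s)$ | (3,4) |
| 3 | $(\varepsilon_{13}, \varepsilon_{12}, \varepsilon_{14}, s, r, t)$ | (2,3) |
| 4 | $(\varepsilon_{13}, \varepsilon_{14}, \varepsilon_{12}, s, t, r)$ | (2,4,3) |
| 5 | $(\varepsilon_{14}, \varepsilon_{12}, \varepsilon_{13}, t, r, s)$ | (2,3,4) |
| 6 | $(\varepsilon_{14}, \varepsilon_{13}, \varepsilon_{12}, t, s, r)$ | (2,4) |
| 7 | $(\varepsilon_{12}+s+t, \varepsilon_{13}+t, \varepsilon_{14}+s, r, -s, -t)$ | (1,2)(3,4) |
| 8 | $(\varepsilon_{12}+s+t, \varepsilon_{14}+s, \varepsilon_{13}+t, r, -t, -s)$ | (1,2) |
| 9 | $(\varepsilon_{13}+t, \varepsilon_{12}+s+t, \varepsilon_{14}+s, -s, r, -t)$ | (1,3,4,2) |
| 10 | $(\varepsilon_{13}+t, \varepsilon_{14}+s, \varepsilon_{12}+s+t, -s, -t, r)$ | (1,4,2) |
| 11 | $(\varepsilon_{14}+s, \varepsilon_{12}+s+t, \varepsilon_{13}+t, -t, r, -s)$ | (1,3,2) |
| 12 | $(\varepsilon_{14}+s, \varepsilon_{13}+t, \varepsilon_{12}+s+t, -t, -s, r)$ | (1,4,3,2) |
| 13 | $(\varepsilon_{12}+t, \varepsilon_{13}+r+t, \varepsilon_{14}+r, -r, s, -t)$ | (1,3)(2,4) |
| 14 | $(\varepsilon_{12}+t, \varepsilon_{14}+r, \varepsilon_{13}+r+t, -r, -t, s)$ | (1,4,2,3) |
| 15 | $(\varepsilon_{13}+r+t, \varepsilon_{12}+t, \varepsilon_{14}+r, s, -r, -t)$ | (1,2,4,3) |
| 16 | $(\varepsilon_{13}+r+t, \varepsilon_{14}+r, \varepsilon_{12}+t, s, -t, -r)$ | (1,2,3) |
| 17 | $(\varepsilon_{14}+r, \varepsilon_{12}+t, \varepsilon_{13}+r+t, -t, -r, s)$ | (1,4,3) |
| 18 | $(\varepsilon_{14}+r, \varepsilon_{13}+r+t, \varepsilon_{12}+t, -t, s, -r)$ | (1,3) |
| 19 | $(\varepsilon_{12}+s, \varepsilon_{13}+r, \varepsilon_{14}+r+s, -r, -s, t)$ | (1,4)(2,3) |
| 20 | $(\varepsilon_{12}+s, \varepsilon_{14}+r+s, \varepsilon_{13}+r, -r, t, -s)$ | (1,3,2,4) |
| 21 | $(\varepsilon_{13}+r, \varepsilon_{12}+s, \varepsilon_{14}+r+s, -s, -r, t)$ | (1,4) |
| 22 | $(\varepsilon_{13}+r, \varepsilon_{14}+r+s, \varepsilon_{12}+s, -s, t, -r)$ | (1,3,4) |
| 23 | $(\varepsilon_{14}+r+s, \varepsilon_{12}+s, \varepsilon_{13}+r, t, -r, -s)$ | (1,2,4) |
| 24 | $(\varepsilon_{14}+r+s, \varepsilon_{13}+r, \varepsilon_{12}+s, t, -s, -r)$ | (1,2,3,4) |

One of the optimal matrices we find is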

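\[
H = \begin{pmatrix}
I_8 & I_8 & I_8 & A^{4} \\
A^{4} & I_8 & A^{6} & A^{2} \\
A^{2} & A^{4} & I_8 & A^{2} \\
A^{6} & I_8 & A^{2} & I_8
\end{pmatrix}
\]

corresponding to the parameter $(0, 0, 4, 0, 2, 2)$, where $A$ is the companion matrix of $x^8 + x^2 + 1$ shown in Equation (2). The actual implementation of $H$ is given in Table 5.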
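As a sanity check, the involution property of $H$ can be verified numerically. The sketch below assumes the bitmask representation of 8 × 8 matrices over $\mathbb{F}_2$ (one integer per row) and obtains $A^{-1}$ as $A^7 + A$, which follows from $A^8 + A^2 + I = 0$; the helper names (`build_blocks`, `is_involutory`, and so on) are illustrative and are not the authors' code. It only checks $M^2 = I$; it does not test the MDS property.

```python
def identity():
    return [1 << i for i in range(8)]

def mat_mul(X, Y):
    """8x8 product over GF(2); row i is an int whose bit j is entry (i, j)."""
    out = []
    for row in X:
        acc, j = 0, 0
        while row:
            if row & 1:
                acc ^= Y[j]
            row >>= 1
            j += 1
        out.append(acc)
    return out

def mat_add(X, Y):
    return [a ^ b for a, b in zip(X, Y)]

def mat_pow(X, e):
    result = identity()
    for _ in range(e):
        result = mat_mul(result, X)
    return result

# Companion matrix of x^8 + x^2 + 1 from Equation (2).
A = [1 << 7, 1 << 0, (1 << 1) | (1 << 7), 1 << 2, 1 << 3, 1 << 4, 1 << 5, 1 << 6]
# A^8 + A^2 = I implies A * (A^7 + A) = I.
A_INV = mat_add(mat_pow(A, 7), A)

def power(e):
    """A**e for any integer e."""
    return mat_pow(A, e) if e >= 0 else mat_pow(A_INV, -e)

def build_blocks(e12, e13, e14, r, s, t):
    """4x4 array of 8x8 blocks following Equation (3)."""
    exps = [
        [0, e12, e13, e14],
        [e12 + s + t, 0, e14 + s, e13 + t],
        [e13 + r + t, e14 + r, 0, e12 + t],
        [e14 + r + s, e13 + r, e12 + s, 0],
    ]
    return [[power(e) for e in row] for row in exps]

def is_involutory(blocks):
    """Check that the 32x32 block matrix squares to the identity."""
    zero = [0] * 8
    for i in range(4):
        for j in range(4):
            acc = zero
            for k in range(4):
                acc = mat_add(acc, mat_mul(blocks[i][k], blocks[k][j]))
            if acc != (identity() if i == j else zero):
                return False
    return True

H = build_blocks(0, 0, 4, 0, 2, 2)
print("H is involutory:", is_involutory(H))
```

The same function can be applied to any parameter tuple satisfying Equation (4), for example the tuple $(0, -2, -2, 2, 4, 6)$ used for the depth-3 matrix in Section 6.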

Table 5: An implementation of H, corresponding to parameter (0, 0, 4, 0, 2, 2), with 78 XOR gates and depth 4, where (x0, ..., x31) are input signals, (y0, ..., y31) are output signals, and ti’s are intermediate signals.

| No. | Operation | Depth | No. | Operation | Depth | No. | Operation | Depth |
|---|---|---|---|---|---|---|---|---|
| 1 | t1 = x6 + x12 | 1 | 27 | t27 = t1 + t14 | 2 | 53 | t53 = t2 + t40 | 2 |
| 2 | t2 = x7 + x13 | 1 | 28 | t28 = t12 + t27 [y12] | 3 | 54 | t54 = t38 + t53 [y13] | 3 |
| 3 | t3 = x18 + x30 | 1 | 29 | t29 = t3 + t26 | 3 | 55 | t55 = t4 + t52 | 3 |
| 4 | t4 = x19 + x31 | 1 | 30 | t30 = t7 + t29 [y10] | 4 | 56 | t56 = t8 + t55 [y11] | 4 |
| 5 | t5 = x2 + x22 | 1 | 31 | t31 = t11 + t27 | 3 | 57 | t57 = t37 + t53 | 3 |
| 6 | t6 = x3 + x23 | 1 | 32 | t32 = t29 + t31 [y18] | 4 | 58 | t58 = t55 + t57 [y19] | 4 |
| 7 | t7 = x4 + x10 | 1 | 33 | t33 = t18 + t31 [y30] | 4 | 59 | t59 = t44 + t57 [y31] | 4 |
| 8 | t8 = x5 + x11 | 1 | 34 | t34 = t18 + t20 | 3 | 60 | t60 = t44 + t46 | 3 |
| 9 | t9 = x16 + x28 | 1 | 35 | t35 = t29 + t34 [y4] | 4 | 61 | t61 = t55 + t60 [y5] | 4 |
| 10 | t10 = x17 + x29 | 1 | 36 | t36 = x28 + t34 [y28] | 4 | 62 | t62 = x29 + t60 [y29] | 4 |
| 11 | t11 = x6 + x14 | 1 | 37 | t37 = x7 + x15 | 1 | 63 | t63 = x0 + x8 | 1 |
| 12 | t12 = x22 + x26 | 1 | 38 | t38 = x23 + x27 | 1 | 64 | t64 = t9 + t63 [y0] | 2 |
| 13 | t13 = t11 + t12 [y6] | 2 | 39 | t39 = t37 + t38 [y7] | 2 | 65 | t65 = x1 + x9 | 1 |
| 14 | t14 = x0 + x20 | 1 | 40 | t40 = x1 + x21 | 1 | 66 | t66 = t10 + t65 [y1] | 2 |
| 15 | t15 = x8 + t5 | 2 | 41 | t41 = x9 + t6 | 2 | 67 | t67 = x14 + t5 | 2 |
| 16 | t16 = x24 + t15 [y24] | 3 | 42 | t42 = x25 + t41 [y25] | 3 | 68 | t68 = t9 + t67 [y14] | 3 |
| 17 | t17 = x6 + x20 | 1 | 43 | t43 = x7 + x21 | 1 | 69 | t69 = x15 + t6 | 2 |
| 18 | t18 = x30 + t1 | 2 | 44 | t44 = x31 + t2 | 2 | 70 | t70 = t10 + t69 [y15] | 3 |
| 19 | t19 = x16 + t18 [y16] | 3 | 45 | t45 = x17 + t44 [y17] | 3 | 71 | t71 = t9 + t12 | 2 |
| 20 | t20 = x4 + t3 | 2 | 46 | t46 = x5 + t4 | 2 | 72 | t72 = t24 + t71 [y26] | 4 |
| 21 | t21 = x8 + t20 [y8] | 3 | 47 | t47 = x9 + t46 [y9] | 3 | 73 | t73 = t10 + t38 | 2 |
| 22 | t22 = x28 + t7 | 2 | 48 | t48 = x29 + t8 | 2 | 74 | t74 = t50 + t73 [y27] | 4 |
| 23 | t23 = x22 + t22 [y22] | 3 | 49 | t49 = x23 + t48 [y23] | 3 | 75 | t75 = t13 + t15 | 3 |
| 24 | t24 = x2 + t22 | 3 | 50 | t50 = x3 + t48 | 3 | 76 | t76 = t17 + t75 [y20] | 4 |
| 25 | t25 = t20 + t24 [y2] | 4 | 51 | t51 = t46 + t50 [y3] | 4 | 77 | t77 = t39 + t41 | 3 |
| 26 | t26 = x24 + t17 | 2 | 52 | t52 = x25 + t43 | 2 | 78 | t78 = t43 + t77 [y21] | 4 |

6 Searching for Low-latency Involutory MDS Matrices

In the previous section, we identify an involutory MDS matrix which can be implemented with 78 XOR gates and whose circuit depth is 4. Although this matrix is good with respect to lightweightness, we find that it is inferior to the AES MixColumns operation in terms of latency. The lightest implementation (97 XOR gates) of the AES MixColumns operation is of depth 8, and if we increase the number of XOR gates, the AES MixColumns can be implemented with depth 3. In the following, we show that depth 3 is optimal.

Theorem 1. The circuit depth of an MDS matrix $A \in M_4(GL(8, \mathbb{F}_2))$ with branch number 5 is at least 3.

Proof. Suppose, for contradiction, that

\[
A = \begin{pmatrix}
A_{1,1} & A_{1,2} & A_{1,3} & A_{1,4} \\
A_{2,1} & A_{2,2} & A_{2,3} & A_{2,4} \\
A_{3,1} & A_{3,2} & A_{3,3} & A_{3,4} \\
A_{4,1} & A_{4,2} & A_{4,3} & A_{4,4}
\end{pmatrix}
\quad \text{with } A_{i,j} \in GL(8, \mathbb{F}_2) \tag{6}
\]

is an MDS matrix with branch number 5 whose circuit depth is 2. A circuit of depth 2 built from two-input XOR gates can add at most $2^2 = 4$ input bits, which implies that each of the 4 × 8 = 32 rows of $A$ contains at most four 1’s. Then the Hamming weight of each row of the 8 × 8 submatrix $A_{i,j}$ is 1. Otherwise, there is one row of some submatrix $A_{i,j}$ whose Hamming weight is 0, which contradicts our assumption that $A$ is MDS (see Lemma 2). Moreover, each column of $A_{i,j}$ contains only one 1. Otherwise we can identify two linearly dependent rows, which is a contradiction to the MDS property. Therefore, $A_{i,j}$ is a permutation matrix. Now let us consider the submatrix

\[
A' = \begin{pmatrix}
A_{1,1} & A_{1,2} \\
A_{2,1} & A_{2,2}
\end{pmatrix}.
\]

The Hamming weights of each row and each column of $A'$ are 2. Thus, the sum of the 2 × 8 = 16 rows of $A'$ is a zero vector, meaning that $A'$ is not invertible. This is a contradiction to the MDS property of $A$.
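The last step of the argument, that a 2 × 2 block matrix made of permutation matrices is always singular over $\mathbb{F}_2$, is easy to observe experimentally. The following sketch samples random 8 × 8 permutation blocks and computes the rank of the resulting 16 × 16 matrix; the helper names and the fixed seed are assumptions made for this illustration only.

```python
import random

def gf2_rank(rows, width):
    """Rank over GF(2) of a matrix given as int bitmasks (Gauss-Jordan elimination)."""
    rows = list(rows)
    rank = 0
    for bit in range(width):
        pivot = next((i for i in range(rank, len(rows)) if (rows[i] >> bit) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> bit) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank

def random_permutation_block(rng):
    """8x8 permutation matrix: row i has a single 1 in a distinct column."""
    cols = list(range(8))
    rng.shuffle(cols)
    return [1 << c for c in cols]

def stacked_2x2(rng):
    """16x16 matrix [[P11, P12], [P21, P22]] with four permutation blocks."""
    p11, p12, p21, p22 = (random_permutation_block(rng) for _ in range(4))
    top = [p11[i] | (p12[i] << 8) for i in range(8)]
    bottom = [p21[i] | (p22[i] << 8) for i in range(8)]
    return top + bottom

rng = random.Random(2020)
ranks = [gf2_rank(stacked_2x2(rng), 16) for _ in range(1000)]
print("largest rank observed:", max(ranks))
```

Every column of such a matrix has exactly two 1’s, so the 16 rows always sum to zero and the rank can never reach 16.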


Therefore, our goal is to find lightweight involutory matrices whose circuit depth is 3. Hopefully, we can identify one that is lighter than the MixColumns operation of AES, which does not enjoy the involutory property. For a given 32 × 32 matrix, Boyar’s SLP-heuristic algorithm [BMP13] is virtually the best tool available for finding its lightweight implementation. However, Boyar’s algorithm aims at minimizing the number of XOR gates of an implementation regardless of its circuit depth, so it is not directly applicable in our scenario.

Given a set of input signals and a set of linear predicates represented as a binary matrix, Boyar’s algorithm repeatedly picks two signals according to some rules, adds them together as a new signal, and puts this new signal into the signal set. Intuitively, after each iteration the signal set becomes “closer” to the set of linear predicates according to a notion of distance. The algorithm stops executing if and only if the distance becomes 0, that is, the set of signals computes the set of linear predicates.

In the following, we enhance Boyar’s algorithm with circuit depth awareness. Basically, we modify Boyar’s algorithm by only picking signals which are not going to exceed a specified depth bound, and defining a new notion of distance which takes the circuit depth into account. The details are presented in Algorithm 1, where the subroutine Pick() picks two elements from the current signal set $S$ such that, when the exclusive-or of these two elements is put into the signal set $S$, the sum of the values in the new distance vector $\Delta$ is minimized among all possible choices of the two elements; ties are resolved by maximizing the Euclidean norm of $\Delta$. This strategy is exactly the same as Boyar’s method [BMP13], except that the distances in $\Delta$ are computed according to our new definition presented in the following.

Let $S$ be a sequence of signals. For any linear predicate $f$, we define $\delta_H(S, f)$ as the minimum number of additions (XOR gates) required to implement $f$ with input signals from $S$, such that the depth of the implementation is not greater than $H$. We call $\delta_H(S, f)$ the $H$-distance from $S$ to $f$. Note that our notion of distance is different from Boyar’s in that if $\delta_H(S, f) = k$, we not only require that $f$ can be obtained by $k$ additions, but also that there exists an implementation with $k$ additions within depth $H$. If $f$ cannot be implemented within depth $H$, we have $\delta_H(S, f) = \infty$. In what follows, we use $\delta(S, f)$ to denote the distance defined in Boyar’s work [BMP13], where the circuit depth is not considered.

Example 1. Let $S = [x_1, x_2, x_3, x_4, x_5]$, and $f = x_2 + x_3 + x_4 + x_5$. Then $\delta(S, f) = \delta_2(S, f) = 3$, and $f$ can be implemented as $x_6 = x_2 + x_3$, $x_7 = x_4 + x_5$, and $x_8 = x_6 + x_7$, where $x_8$ computes $f$, whose depth is 2.

Example 2. Let $S = [x_1, x_2, x_3, x_4, x_5, x_6 = x_2 + x_4, x_7 = x_3 + x_6]$ (note that the depths of $x_6$ and $x_7$ are 1 and 2 respectively), and $f = x_2 + x_3 + x_4 + x_5$. Then $\delta(S, f) = 1$, and $f$ can be implemented as $x_5 + x_7$, whose depth is 3, while $\delta_2(S, f) = 2$, and $f$ can be implemented within depth 2 as $x_8 = x_3 + x_5$, $x_9 = x_6 + x_8$, where $x_9$ computes $f$.

Example 3. Let $S = [x_1, x_2, x_3, x_4, x_5]$, and $f = x_1 + x_2 + x_3 + x_4 + x_5$. Then it is easy to check that $\delta(S, f) = 4$, and $\delta_2(S, f) = \infty$.


Algorithm 1: SLP heuristic with bounded circuit depth

Input: An $m \times n$ binary matrix $M$ representing $m$ linear predicates in $n$ variables, i.e., $(y_1, \cdots, y_m) = M(x_1, \cdots, x_n)^T$, and a positive integer $H$

Output: $S = [x_1, x_2, \cdots, x_n, x_{n+1}, x_{n+2}, \cdots, x_{n+l}]$ such that $d(x_j) \le H$ for all $j$, and for any $y_k$ with $1 \le k \le m$, $y_k$ can be computed by one element in $S$, where $x_{n+j} = x_a + x_b$, $x_a, x_b \in \{x_1, \cdots, x_{n+j-1}\}$ for $j \ge 1$.

    /* Initialization */
    S ← [x_1, x_2, ..., x_n]                    /* The input signals */
    D ← [0, 0, ..., 0]                          /* D[i] keeps track of the circuit depth of S[i] */
    ∆ ← [δ_H(S, y_1), ..., δ_H(S, y_m)]         /* The distances */
    if ∆[i] = ∞ for some i then
        return Infeasible                       /* M cannot be implemented within the depth bound H */
    end
    j ← n
    while ∆ ≠ 0 do
        j ← j + 1
        if ∃ (x'_a, x'_b) ∈ S such that y_t = x'_a + x'_b for some t ∈ {1, ..., m} then
            (x_a, x_b) ← (x'_a, x'_b)
        else
            (x_a, x_b) ← Pick(S, D, H)
        end
        x_j ← x_a + x_b
        S ← S ∪ [x_j]
        depth(x_j) ← max(D[a], D[b]) + 1        /* Compute the depth of x_j */
        D ← D ∪ [depth(x_j)]
        ∆ ← [δ_H(S, y_1), ..., δ_H(S, y_m)]     /* Update the distances */
    end
    return S

In Algorithm 1, we need a method to compute the minimal circuit depth of $v_1 + \cdots + v_k$, where the depths of the $v_i$’s are known. Note that there are many different ways of implementing $v_1 + \cdots + v_k$ which lead to different circuit depths, as illustrated in Fig. 1. To deal with this, we prove the following theorem.

Theorem 2. Let $v_1, v_2, \cdots, v_n$ be a set of signals with $\mathrm{depth}(v_i) = d_i$. Then the depth of any circuit implementing $z = v_1 + \cdots + v_n$ is at least $\lceil \log_2 \sum_{i=1}^{n} 2^{d_i} \rceil$. Moreover, there is always a circuit implementing $z$ with depth $\lceil \log_2 \sum_{i=1}^{n} 2^{d_i} \rceil$, i.e., the lower bound is always achievable.

Proof. We prove the lower bound by induction on the number of terms in the summation. For $n = 1$ and $n = 2$, Theorem 2 holds obviously. Assuming that it holds for all summations with $k < n$ terms, we show in the following that it also holds for $n$ terms.


[Figure: two circuit diagrams over the inputs x1, ..., x9 computing v1 + v2 + v3. (a) Implementation I (depth 4). (b) Implementation II (depth 5).]

Fig. 1: Two implementations of the same summation $v_1 + v_2 + v_3$ with different circuit depths, where the depths of $v_1$, $v_2$ and $v_3$ are 2, 0, and 3 respectively.

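Without loss of generality, any implementation of $z = v_1 + \cdots + v_n$ is of the form $z = z_a + z_b$, where $z_a = v_{i_1} + \cdots + v_{i_q}$, $z_b = v_{j_1} + \cdots + v_{j_{n-q}}$, and

\[
\{v_{i_1}, \cdots, v_{i_q}\} \cup \{v_{j_1}, \cdots, v_{j_{n-q}}\} = \{v_1, v_2, \cdots, v_n\}.
\]

Then $\mathrm{depth}(z) = \max\{\mathrm{depth}(z_a), \mathrm{depth}(z_b)\} + 1$. According to the induction hypothesis, we have

\[
\mathrm{depth}(z_a) \ge \Big\lceil \log_2 \sum_{t=1}^{q} 2^{d_{i_t}} \Big\rceil, \qquad
\mathrm{depth}(z_b) \ge \Big\lceil \log_2 \sum_{t=1}^{n-q} 2^{d_{j_t}} \Big\rceil.
\]

Therefore, we can obtain that

\[
\begin{aligned}
\mathrm{depth}(z)
&\ge \max\Big\{ \Big\lceil \log_2 \sum_{t=1}^{q} 2^{d_{i_t}} \Big\rceil,\ \Big\lceil \log_2 \sum_{t=1}^{n-q} 2^{d_{j_t}} \Big\rceil \Big\} + 1 \\
&\ge \max\Big\{ 1 + \Big\lceil \log_2 \sum_{t=1}^{q} 2^{d_{i_t}} \Big\rceil,\ 1 + \Big\lceil \log_2 \sum_{t=1}^{n-q} 2^{d_{j_t}} \Big\rceil \Big\} \\
&\ge \max\Big\{ \Big\lceil \log_2 2\sum_{t=1}^{q} 2^{d_{i_t}} \Big\rceil,\ \Big\lceil \log_2 2\sum_{t=1}^{n-q} 2^{d_{j_t}} \Big\rceil \Big\}
\ge \Big\lceil \log_2 \sum_{i=1}^{n} 2^{d_i} \Big\rceil.
\end{aligned}
\]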

Next, we show that the lower bound is achievable. First, we sort the set $\{v_1, \cdots, v_n\}$ of signals with non-decreasing depths. Then, we remove the leftmost two signals with the same depth, and insert the signal of their sum into the depth-ordered list. Without loss of generality, we assume that $v_1, \cdots, v_n$ is already in order, and $\mathrm{depth}(v_1) = \mathrm{depth}(v_2)$. After we update the set according to the above rule, we have a new set of signals $\{v_1 + v_2, v_3, \cdots, v_n\}$. Note that this operation preserves the sum $\sum_x 2^{\mathrm{depth}(x)}$, that is

\[
2^{\mathrm{depth}(v_1)} + \cdots + 2^{\mathrm{depth}(v_n)} = 2^{\mathrm{depth}(v_1 + v_2)} + 2^{\mathrm{depth}(v_3)} + \cdots + 2^{\mathrm{depth}(v_n)}.
\]

We repeat the above operations until we obtain a set of signals $\{z_1, \cdots, z_m\}$ with $\mathrm{depth}(z_i) = q_i$ such that $q_1 < q_2 < \cdots < q_m$. Now, we are ready to give the implementation achieving the lower bound. First, if $m > 1$, we add $z_1$ and $z_2$ and obtain $z_{m+1} = z_1 + z_2$ with $\mathrm{depth}(z_{m+1}) = q_2 + 1$. Then we add $z_{m+1}$ and $z_3$ and obtain $z_{m+2}$ with $\mathrm{depth}(z_{m+2}) = q_3 + 1$, and so on. Finally, we add $z_{2m-2}$ and $z_m$ and obtain $z$, which implements $v_1 + \cdots + v_n$ with $\mathrm{depth}(z) = q_m + 1$. Since $2^{q_m + 1} > 2^{q_1} + \cdots + 2^{q_m} = 2^{\mathrm{depth}(v_1)} + \cdots + 2^{\mathrm{depth}(v_n)} > 2^{q_m}$, we can derive that $q_m + 1 = \lceil \log_2 \sum_{i=1}^{n} 2^{d_i} \rceil$.

If $m = 1$, $\mathrm{depth}(z) = q_1$, and $2^{\mathrm{depth}(v_1)} + \cdots + 2^{\mathrm{depth}(v_n)}$ is exactly a power of 2. In this case, we have $q_1 = \log_2 \sum_{i=1}^{n} 2^{d_i}$.
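Both halves of Theorem 2 translate directly into a few lines of code. The sketch below is an illustration, not the authors' implementation: `min_depth_formula` evaluates $\lceil \log_2 \sum_i 2^{d_i} \rceil$ with integer arithmetic, `min_depth_constructive` follows the merge-then-chain procedure from the proof, and `agree_on_small_inputs` is a hypothetical helper that compares the two on all short depth lists.

```python
from itertools import product

def min_depth_formula(depths):
    """ceil(log2(sum of 2**d)) computed without floating point."""
    total = sum(1 << d for d in depths)
    return (total - 1).bit_length()

def min_depth_constructive(depths):
    """Depth reached by the procedure in the proof of Theorem 2."""
    ds = sorted(depths)
    merged = True
    while merged:
        merged = False
        for i in range(len(ds) - 1):
            if ds[i] == ds[i + 1]:
                # Replace the leftmost equal-depth pair by their sum.
                ds = sorted(ds[:i] + ds[i + 2:] + [ds[i] + 1])
                merged = True
                break
    # All depths are now distinct: chain from the shallowest signal upwards.
    acc = ds[0]
    for d in ds[1:]:
        acc = max(acc, d) + 1
    return acc

def agree_on_small_inputs(max_depth=3, max_terms=5):
    """Compare both functions on every depth list with few terms."""
    return all(
        min_depth_formula(ds) == min_depth_constructive(ds)
        for n in range(1, max_terms + 1)
        for ds in product(range(max_depth + 1), repeat=n)
    )

fig1_depths = [2, 0, 3]   # depths of v1, v2, v3 in Fig. 1
print(min_depth_formula(fig1_depths), min_depth_constructive(fig1_depths))
```

For the depths 2, 0 and 3 of Fig. 1, both functions return 4, the depth of Implementation I. Five depth-0 inputs give 3, which is why $\delta_2(S, f) = \infty$ in Example 3.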

In our algorithm, initially $S$ is the sequence of all input signals. We maintain a list $\Delta$ to track the $H$-distances of the output signals from $S$. At the same time, we keep a list $D$ such that $D[i]$ is the circuit depth of $S[i]$. At each iteration, we pick two different elements from $S$ with Pick($S$, $D$, $H$). Basically, we create a new element for $S$ whose circuit depth is not greater than $H$ by adding the two elements returned by Pick(), chosen to minimize the sum of the new $H$-distances, where ties are resolved by maximizing the Euclidean norm of the new $\Delta$. This strategy is the same as Boyar’s SLP heuristic, and we refer the reader to [BMP13] for more information. Our algorithm is best illustrated by running through a toy example.

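Example 4. Let the set of input signals be $\{x_1, x_2, x_3, x_4, x_5\}$, and

\[
\begin{cases}
y_1 = x_1 + x_2 + x_3 \\
y_2 = x_2 + x_4 + x_5 \\
y_3 = x_1 + x_3 + x_4 + x_5 \\
y_4 = x_2 + x_3 + x_4 \\
y_5 = x_1 + x_2 + x_4 \\
y_6 = x_2 + x_3 + x_4 + x_5
\end{cases}
\]

which can be represented as

\[
\begin{pmatrix}
1 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 \\
1 & 0 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 1 & 1
\end{pmatrix} \tag{7}
\]

We execute Algorithm 1 with $H = 2$.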

Step 0. S0 = [x1, x2, x3, x4, x5], D0 = [0, 0, 0, 0, 0], and ∆0 = [2, 2, 3, 2, 2, 3].

Step 1. S1 = S0 ∪ [x6 = x2 + x4] = [x1, x2, x3, x4, x5, x6 = x2 + x4], D1 =[0, 0, 0, 0, 0, 1], and ∆1 = [2, 1, 3, 1, 1, 2].

Step 2. S2 = S1 ∪ [x7 = x5 +x6] = [x1, x2, x3, x4, x5, x6 = x2 +x4, x7 = x5 +x6],D2 = [0, 0, 0, 0, 0, 1, 2], and ∆2 = [2, 0, 3, 1, 1, 2], where x7 computes x2 +x5 +x4.


Step 3. S3 = S2 ∪ [x8 = x3 + x6] = [x1, x2, x3, x4, x5, x6 = x2 + x4, x7 = x5 +x6, x8 = x3 + x6], D3 = [0, 0, 0, 0, 0, 1, 2, 2], and ∆3 = [2, 0, 3, 0, 1, 2], where x8computes x2 + x3 + x4.

Step 4. S4 = S3 ∪ [x9 = x1 + x6] = [x1, x2, x3, x4, x5, x6 = x2 + x4, x7 =x5 + x6, x8 = x3 + x6, x9 = x1 + x6], D4 = [0, 0, 0, 0, 0, 1, 2, 2, 2], and ∆4 =[2, 0, 3, 0, 0, 2], where x9 computes x1 + x2 + x4.

Step 5. S5 = S4 ∪ [x10 = x1 + x3] = [x1, x2, x3, x4, x5, x6 = x2 + x4, x7 =x5 +x6, x8 = x3 +x6, x9 = x1 +x6, x10 = x1 +x3], D5 = [0, 0, 0, 0, 0, 1, 2, 2, 2, 1],and ∆5 = [1, 0, 2, 0, 0, 2].

Step 6. S6 = S5 ∪ [x11 = x2 + x10] = [x1, x2, x3, x4, x5, x6 = x2 + x4, x7 =x5 + x6, x8 = x3 + x6, x9 = x1 + x6, x10 = x1 + x3, x11 = x2 + x10], D6 =[0, 0, 0, 0, 0, 1, 2, 2, 2, 1, 2], and ∆6 = [0, 0, 2, 0, 0, 2], where x11 computes x1 +x2 + x3.

Step 7. S7 = S6 ∪ [x12 = x3 + x5] = [x1, x2, x3, x4, x5, x6 = x2 + x4, x7 =x5 +x6, x8 = x3 +x6, x9 = x1 +x6, x10 = x1 +x3, x11 = x2 +x10, x12 = x3 +x5],D7 = [0, 0, 0, 0, 0, 1, 2, 2, 2, 1, 2, 1], and ∆7 = [0, 0, 2, 0, 0, 1].

Step 8. S8 = S7 ∪ [x13 = x6 + x12] = [x1, x2, x3, x4, x5, x6 = x2 + x4, x7 = x5 +x6, x8 = x3 +x6, x9 = x1 +x6, x10 = x1 +x3, x11 = x2 +x10, x12 = x3 +x5, x13 =x6+x12], D8 = [0, 0, 0, 0, 0, 1, 2, 2, 2, 1, 2, 1, 2], and ∆8 = [0, 0, 2, 0, 0, 0], where x13computes x2 + x3 + x4 + x5.

Step 9. S9 = S8 ∪ [x14 = x1 + x4] = [x1, x2, x3, x4, x5, x6 = x2 + x4, x7 =x5 + x6, x8 = x3 + x6, x9 = x1 + x6, x10 = x1 + x3, x11 = x2 + x10, x12 =x3 + x5, x13 = x6 + x12, x14 = x1 + x4], D9 = [0, 0, 0, 0, 0, 1, 2, 2, 2, 1, 2, 1, 2, 1],and ∆9 = [0, 0, 1, 0, 0, 0].

Step 10. S10 = S9∪ [x15 = x12 +x14] = [x1, x2, x3, x4, x5, x6 = x2 +x4, x7 = x5 +x6, x8 = x3 +x6, x9 = x1 +x6, x10 = x1 +x3, x11 = x2 +x10, x12 = x3 +x5, x13 =x6+x12, x14 = x1+x4, x15 = x12+x14], D10 = [0, 0, 0, 0, 0, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2],and ∆10 = [0, 0, 0, 0, 0, 0], where x15 computes x1 + x3 + x4 + x5.
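The outcome of Example 4 can be replayed independently of the heuristic itself. The following sketch encodes each signal as a bitmask over $x_1, \ldots, x_5$, applies the ten gates chosen in Steps 1 to 10, and records which signal computes each output; the dictionary names (`gates`, `targets`, `computed_by`) are assumptions made for this illustration.

```python
# Bit (i - 1) of a mask is set when x_i appears in the XOR sum.
signals = {i: 1 << (i - 1) for i in range(1, 6)}
depths = {i: 0 for i in range(1, 6)}

# Gates added in Steps 1-10 of Example 4: new index -> (operand, operand).
gates = {
    6: (2, 4), 7: (5, 6), 8: (3, 6), 9: (1, 6), 10: (1, 3),
    11: (2, 10), 12: (3, 5), 13: (6, 12), 14: (1, 4), 15: (12, 14),
}

for idx in sorted(gates):
    a, b = gates[idx]
    signals[idx] = signals[a] ^ signals[b]
    depths[idx] = max(depths[a], depths[b]) + 1

def mask(*indices):
    """Bitmask of the linear predicate x_i1 + x_i2 + ..."""
    m = 0
    for i in indices:
        m |= 1 << (i - 1)
    return m

# Rows of the matrix in Equation (7).
targets = {
    "y1": mask(1, 2, 3), "y2": mask(2, 4, 5), "y3": mask(1, 3, 4, 5),
    "y4": mask(2, 3, 4), "y5": mask(1, 2, 4), "y6": mask(2, 3, 4, 5),
}

computed_by = {
    name: [i for i, value in signals.items() if value == m]
    for name, m in targets.items()
}
print(computed_by)
print("gates:", len(gates), "max depth:", max(depths.values()))
```

The depth list produced this way should coincide with $D_{10}$ from Step 10, which gives a quick consistency check on the trace.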

We apply this algorithm to all matrices we generated in Sect. 5, and the lightest one achieving the lower bound of the circuit depth (i.e., 3) we find is $Q$,

\[
Q = \begin{pmatrix}
I_8 & I_8 & A^{-2} & A^{-2} \\
A^{10} & I_8 & A^{2} & A^{4} \\
A^{6} & I_8 & I_8 & A^{6} \\
A^{4} & I_8 & A^{4} & I_8
\end{pmatrix}
\]

corresponding to the parameter $(0, -2, -2, 2, 4, 6)$, where $A$ is the companion matrix of $x^8 + x^2 + 1$ shown in Equation (2). The actual implementation of $Q$ is given in Table 6.

Table 6: An implementation of Q, corresponding to parameter (0, −2, −2, 2, 4, 6), with 88 XOR gates and depth 3, where (x0, ..., x31) are input signals, (y0, ..., y31) are output signals, and ti’s are intermediate signals.

| No. | Operation | Depth | No. | Operation | Depth | No. | Operation | Depth |
|---|---|---|---|---|---|---|---|---|
| 1 | t1 = x4 + x20 | 1 | 31 | t31 = x5 + x23 | 1 | 61 | t61 = x14 + x26 | 1 |
| 2 | t2 = x5 + x21 | 1 | 32 | t32 = t14 + t31 [y5] | 2 | 62 | t62 = t25 + t61 [y14] | 3 |
| 3 | t3 = x6 + x22 | 1 | 33 | t33 = x6 + x16 | 1 | 63 | t63 = x14 + x30 | 1 |
| 4 | t4 = x7 + x23 | 1 | 34 | t34 = t19 + t33 [y6] | 2 | 64 | t64 = t21 + t63 [y30] | 2 |
| 5 | t5 = x2 + x26 | 1 | 35 | t35 = x22 + x30 | 1 | 65 | t65 = x15 + x27 | 1 |
| 6 | t6 = x3 + x27 | 1 | 36 | t36 = t9 + t35 | 2 | 66 | t66 = t27 + t65 [y15] | 3 |
| 7 | t7 = x4 + x28 | 1 | 37 | t37 = t8 + t36 [y10] | 3 | 67 | t67 = x15 + x31 | 1 |
| 8 | t8 = x10 + t7 | 2 | 38 | t38 = t34 + t36 [y22] | 3 | 68 | t68 = t23 + t67 [y31] | 2 |
| 9 | t9 = x0 + x16 | 1 | 39 | t39 = x7 + x17 | 1 | 69 | t69 = x18 + t5 | 2 |
| 10 | t10 = x5 + x29 | 1 | 40 | t40 = t20 + t39 [y7] | 2 | 70 | t70 = t8 + t69 [y18] | 3 |
| 11 | t11 = x11 + t10 | 2 | 41 | t41 = x23 + x31 | 1 | 71 | t71 = x19 + t6 | 2 |
| 12 | t12 = x1 + x17 | 1 | 42 | t42 = t12 + t41 | 2 | 72 | t72 = t11 + t71 [y19] | 3 |
| 13 | t13 = x12 + x30 | 1 | 43 | t43 = t11 + t42 [y11] | 3 | 73 | t73 = x20 + t7 | 2 |
| 14 | t14 = x13 + x31 | 1 | 44 | t44 = t40 + t42 [y23] | 3 | 74 | t74 = t22 + t73 [y20] | 3 |
| 15 | t15 = x8 + x24 | 1 | 45 | t45 = x8 + x16 | 1 | 75 | t75 = x21 + t10 | 2 |
| 16 | t16 = t1 + t15 [y24] | 2 | 46 | t46 = t5 + t45 [y16] | 2 | 76 | t76 = t24 + t75 [y21] | 3 |
| 17 | t17 = x9 + x25 | 1 | 47 | t47 = x0 + x24 | 1 | 77 | t77 = x6 + x28 | 1 |
| 18 | t18 = t2 + t17 [y25] | 2 | 48 | t48 = t21 + t47 | 2 | 78 | t78 = t13 + t77 | 2 |
| 19 | t19 = x14 + x24 | 1 | 49 | t49 = t46 + t48 [y0] | 3 | 79 | t79 = t36 + t78 [y28] | 3 |
| 20 | t20 = x15 + x25 | 1 | 50 | t50 = t22 + t48 [y12] | 3 | 80 | t80 = x7 + x29 | 1 |
| 21 | t21 = x2 + x18 | 1 | 51 | t51 = x8 + t3 | 2 | 81 | t81 = t14 + t80 | 2 |
| 22 | t22 = x6 + t13 | 2 | 52 | t52 = t7 + t51 [y8] | 3 | 82 | t82 = t42 + t81 [y29] | 3 |
| 23 | t23 = x3 + x19 | 1 | 53 | t53 = x9 + x17 | 1 | 83 | t83 = x10 + x26 | 1 |
| 24 | t24 = x7 + t14 | 2 | 54 | t54 = t6 + t53 [y17] | 2 | 84 | t84 = t1 + t3 | 2 |
| 25 | t25 = x2 + t1 | 2 | 55 | t55 = x1 + x25 | 1 | 85 | t85 = t83 + t84 [y26] | 3 |
| 26 | t26 = t8 + t25 [y2] | 3 | 56 | t56 = t23 + t55 | 2 | 86 | t86 = x11 + x27 | 1 |
| 27 | t27 = x3 + t2 | 2 | 57 | t57 = t54 + t56 [y1] | 3 | 87 | t87 = t2 + t4 | 2 |
| 28 | t28 = t11 + t27 [y3] | 3 | 58 | t58 = t24 + t56 [y13] | 3 | 88 | t88 = t86 + t87 [y27] | 3 |
| 29 | t29 = x4 + x22 | 1 | 59 | t59 = x9 + t4 | 2 | | | |
| 30 | t30 = t13 + t29 [y4] | 2 | 60 | t60 = t10 + t59 [y9] | 3 | | | |

Remark. In Sects. 4-6, we only show the best matrices we find. We present a summary of all other results we obtained in Supplementary materials A and B, where we only show the parameter resulting in the better circuit when equivalences are encountered. Moreover, the raw data and source code are also submitted as supplementary material along with the paper.

7 Conclusion

In this work, we find the lightest 32 × 32 involutory MDS matrices with branch number 5 known so far by searching through a large set of matrices whose entries are the powers of the companion matrix of $x^8 + x^2 + 1$. Moreover, we enhance Boyar’s SLP heuristic with circuit depth awareness, which enables us to identify the lightest 32 × 32 involutory MDS matrix known so far whose circuit depth is 3, achieving the provable lower bound for a 32 × 32 MDS matrix. Along the way, we present a formula, which is of independent interest, for computing the minimum achievable depth of a circuit implementing the summation of a set of signals with given depths. The results of this work can potentially be applied in the design of lightweight and low-latency symmetric-key primitives.

Acknowledgment. The authors thank the anonymous reviewers for many helpful comments. The work is supported by the National Key R&D Program of China (Grant No. 2018YFB0804402), the Chinese Major Program of National Cryptography Development Foundation (Grant No. MMJJ20180102), the National Natural Science Foundation of China (61732021, 61802400, 61772519, 61802399), and the Youth Innovation Promotion Association of Chinese Academy of Sciences. Chaoyun Li is supported by the Research Council KU Leuven: C16/15/058, OT/13/071, and by European Union’s Horizon 2020 research and innovation programme under grant agreement No. H2020-MSCA-ITN-2014-643161 ECRYPT-NET.

References

AF14. Daniel Augot and Matthieu Finiasz. Direct construction of recursive MDS diffusion layers using shortened BCH codes. In Fast Software Encryption - 21st International Workshop, FSE 2014, London, UK, March 3-5, 2014. Revised Selected Papers, pages 3–17, 2014.

Ava17. Roberto Avanzi. The QARMA block cipher family. Almost MDS matrices over rings with zero divisors, nearly symmetric Even-Mansour constructions with non-involutory central rounds, and search heuristics for low-latency S-Boxes. IACR Trans. Symmetric Cryptol., 2017(1):4–44, 2017.

BBI+15. Subhadeep Banik, Andrey Bogdanov, Takanori Isobe, Kyoji Shibutani, Harunaga Hiwatari, Toru Akishita, and Francesco Regazzoni. Midori: A block cipher for low energy. In Advances in Cryptology - ASIACRYPT 2015 - 21st International Conference on the Theory and Application of Cryptology and Information Security, Auckland, New Zealand, November 29 - December 3, 2015, Proceedings, Part II, pages 411–436, 2015.

BCG+12. Julia Borghoff, Anne Canteaut, Tim Guneysu, Elif Bilge Kavun, Miroslav Knezevic, Lars R. Knudsen, Gregor Leander, Ventzislav Nikov, Christof Paar, Christian Rechberger, Peter Rombouts, Søren S. Thomsen, and Tolga Yalcin. PRINCE - A low-latency block cipher for pervasive computing applications - extended abstract. In Advances in Cryptology - ASIACRYPT 2012 - 18th International Conference on the Theory and Application of Cryptology and Information Security, Beijing, China, December 2-6, 2012. Proceedings, pages 208–225, 2012.

Ber13. Thierry P. Berger. Construction of recursive MDS diffusion layers from Gabidulin codes. In Progress in Cryptology - INDOCRYPT 2013 - 14th International Conference on Cryptology in India, Mumbai, India, December 7-10, 2013. Proceedings, pages 274–285, 2013.

BJK+16. Christof Beierle, Jeremy Jean, Stefan Kolbl, Gregor Leander, Amir Moradi, Thomas Peyrin, Yu Sasaki, Pascal Sasdrich, and Siang Meng Sim. The SKINNY family of block ciphers and its low-latency variant MANTIS. In Advances in Cryptology - CRYPTO 2016 - 36th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2016, Proceedings, Part II, pages 123–153, 2016.

BKL+07. Andrey Bogdanov, Lars R. Knudsen, Gregor Leander, Christof Paar, Axel Poschmann, Matthew J. B. Robshaw, Yannick Seurin, and C. Vikkelsoe. PRESENT: an ultra-lightweight block cipher. In Cryptographic Hardware and Embedded Systems - CHES 2007, 9th International Workshop, Vienna, Austria, September 10-13, 2007, Proceedings, pages 450–466, 2007.

BKL16. Christof Beierle, Thorsten Kranz, and Gregor Leander. Lightweight multiplication in GF(2^n) with applications to MDS matrices. In Advances in Cryptology - CRYPTO 2016 - 36th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2016, Proceedings, Part I, pages 625–653, 2016.


BMP08. Joan Boyar, Philip Matthews, and Rene Peralta. On the shortest linear straight-line program for computing linear forms. In Mathematical Foundations of Computer Science 2008, 33rd International Symposium, MFCS 2008, Torun, Poland, August 25-29, 2008, Proceedings, pages 168–179, 2008.

BMP13. Joan Boyar, Philip Matthews, and Rene Peralta. Logic minimization techniques with applications to cryptology. J. Cryptology, 26(2):280–312, 2013.

BNN+10. Paulo S. L. M. Barreto, Ventzislav Nikov, Svetla Nikova, Vincent Rijmen, and Elmar Tischhauser. Whirlwind: a new cryptographic hash function. Des. Codes Cryptography, 56(2-3):141–162, 2010.

BR99. Mario Blaum and Ron M. Roth. On lowest density MDS codes. IEEE Trans. Information Theory, 45(1):46–59, 1999.

BR00. Paulo Sergio L.M. Barreto and Vincent Rijmen. The Anubis block cipher, 2000. Submission to the NESSIE project.

CLM16. Victor Cauchois, Pierre Loidreau, and Nabil Merkiche. Direct construction of quasi-involutory recursive-like MDS matrices from 2-cyclic codes. IACR Trans. Symmetric Cryptol., 2016(2):80–98, 2016.

Con14. Keith Conrad. The minimal polynomial and some applications. http://www.math.uconn.edu/~kconrad/blurbs/linmultialg/minpolyandappns.pdf, 2014.

DF04. David S. Dummit and Richard M. Foote. Abstract algebra, volume 3. Wiley Hoboken, 2004.

DL18. Sebastien Duval and Gaetan Leurent. MDS matrices with lightweight circuits. IACR Trans. Symmetric Cryptol., 2018(2):48–78, 2018.

DR02. Joan Daemen and Vincent Rijmen. The Design of Rijndael: AES - The Advanced Encryption Standard. Information Security and Cryptography. Springer, 2002.

GLWL16. Zhiyuan Guo, Renzhang Liu, Wenling Wu, and Dongdai Lin. Direct construction of lightweight rotational-XOR MDS diffusion layers. IACR Cryptology ePrint Archive, 2016:1036, 2016.

GPP11. Jian Guo, Thomas Peyrin, and Axel Poschmann. The PHOTON family of lightweight hash functions. In Advances in Cryptology - CRYPTO 2011 - 31st Annual Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2011. Proceedings, pages 222–239, 2011.

GPPR11. Jian Guo, Thomas Peyrin, Axel Poschmann, and Matthew J. B. Robshaw. The LED block cipher. In Cryptographic Hardware and Embedded Systems - CHES 2011 - 13th International Workshop, Nara, Japan, September 28 - October 1, 2011. Proceedings, pages 326–341, 2011.

GPV17. Kishan Chand Gupta, Sumit Kumar Pandey, and Ayineedi Venkateswarlu. Towards a general construction of recursive MDS diffusion layers. Des. Codes Cryptography, 82(1-2):179–195, 2017.

JPST17. Jeremy Jean, Thomas Peyrin, Siang Meng Sim, and Jade Tourteaux. Optimizing implementations of lightweight building blocks. IACR Trans. Symmetric Cryptol., 2017(4):130–168, 2017.

KLSW17. Thorsten Kranz, Gregor Leander, Ko Stoffelen, and Friedrich Wiemer. Shorter linear straight-line programs for MDS matrices. IACR Trans. Symmetric Cryptol., 2017(4):188–211, 2017.

KPPY14. Khoongming Khoo, Thomas Peyrin, Axel York Poschmann, and Huihui Yap. FOAM: searching for hardware-optimal SPN structures and components with a fair comparison. In Cryptographic Hardware and Embedded Systems - CHES 2014 - 16th International Workshop, Busan, South Korea, September 23-26, 2014. Proceedings, pages 433–450, 2014.


LS16. Meicheng Liu and Siang Meng Sim. Lightweight MDS generalized circulant matrices. In Fast Software Encryption - 23rd International Conference, FSE 2016, Bochum, Germany, March 20-23, 2016, Revised Selected Papers, pages 101–120, 2016.

LW16. Yongqiang Li and Mingsheng Wang. On the construction of lightweight circulant involutory MDS matrices. In Fast Software Encryption - 23rd International Conference, FSE 2016, Bochum, Germany, March 20-23, 2016, Revised Selected Papers, pages 121–139, 2016.

LW17. Chaoyun Li and Qingju Wang. Design of lightweight linear diffusion layers from near-MDS matrices. IACR Trans. Symmetric Cryptol., 2017(1):129–155, 2017.

SKOP15. Siang Meng Sim, Khoongming Khoo, Frederique E. Oggier, and Thomas Peyrin. Lightweight MDS involution matrices. In Fast Software Encryption - 22nd International Workshop, FSE 2015, Istanbul, Turkey, March 8-11, 2015, Revised Selected Papers, pages 471–493, 2015.

SPR+04. Francois-Xavier Standaert, Gilles Piret, Gael Rouvroy, Jean-Jacques Quisquater, and Jean-Didier Legat. ICEBERG: An involutional cipher efficient for block encryption in reconfigurable hardware. In Fast Software Encryption, 11th International Workshop, FSE 2004, Delhi, India, February 5-7, 2004, Revised Papers, pages 279–299, 2004.

SS16a. Sumanta Sarkar and Siang Meng Sim. A deeper understanding of the XOR count distribution in the context of lightweight cryptography. In Progress in Cryptology - AFRICACRYPT 2016 - 8th International Conference on Cryptology in Africa, Fes, Morocco, April 13-15, 2016, Proceedings, pages 167–182, 2016.

SS16b. Sumanta Sarkar and Habeeb Syed. Lightweight diffusion layer: Importance of toeplitz matrices. IACR Trans. Symmetric Cryptol., 2016(1):95–113, 2016.

SS17. Sumanta Sarkar and Habeeb Syed. Analysis of toeplitz MDS matrices. In Information Security and Privacy - 22nd Australasian Conference, ACISP 2017, Auckland, New Zealand, July 3-5, 2017, Proceedings, Part II, pages 3–18, 2017.

TTKS18. Dylan Toh, Jacob Teo, Khoongming Khoo, and Siang Meng Sim. Lightweight MDS serial-type matrices with minimal fixed XOR count. In Progress in Cryptology - AFRICACRYPT 2018 - 10th International Conference on Cryptology in Africa, Marrakesh, Morocco, May 7-9, 2018, Proceedings, pages 51–71, 2018.

Wan03. Zhexian Wan. Lectures on finite fields and Galois rings. World Scientific Publishing Company, 2003.

WWW12. Shengbao Wu, Mingsheng Wang, and Wenling Wu. Recursive diffusion layers for (lightweight) block ciphers and hash functions. In Selected Areas in Cryptography, 19th International Conference, SAC 2012, Windsor, ON, Canada, August 15-16, 2012, Revised Selected Papers, pages 355–371, 2012.

ZWS18. Lijing Zhou, Licheng Wang, and Yiru Sun. On efficient constructions of lightweight MDS matrices. IACR Trans. Symmetric Cryptol., 2018(1):180–200, 2018.


A A List of Involutory MDS Matrices

ω(A) = 148, DXC(A) = 116

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−2,−1, 2, 0, 0, 0) 80 4

2 (−2, 1,−2, 0, 0, 2) 80 4

ω(A) = 149, DXC(A) = 117

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−3,−2, 1, 1, 1, 1) 80 4

2 (−1, 0, 3,−1,−1,−1) 80 4

ω(A) = 150, DXC(A) = 118

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−3,−2, 2, 0, 0, 2) 80 4

2 (−3, 1,−1, 0, 0, 2) 80 4

3 (−4,−2, 1, 0, 2, 2) 80 4

4 (0,−3,−2, 0, 2, 2) 83 4


ω(A) = 151, DXC(A) = 119

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−4, 0,−2, 1, 1, 3) 83 4

2 (0, 4, 0,−1,−1,−3) 83 5

ω(A) = 152, DXC(A) = 120

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−4, 0,−1, 0, 0, 4) 86 5

2 (−3, 0,−3, 1, 1, 3) 83 4

3 (1, 4,−1,−1,−1,−3) 83 4

ω(A) = 153, DXC(A) = 121

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−3,−3, 1, 0, 2, 2) 80 5

2 (−4,−3, 0, 2, 2, 2) 83 4

3 (0, 1, 4,−2,−2,−2) 83 5

ω(A) = 154, DXC(A) = 122

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−3, 0,−2, 0, 0, 4) 86 4

2 (−1,−4,−2, 0, 2, 4) 86 4

3 (−4,−3, 1, 1, 1, 3) 80 5

4 (0, 1, 3,−1,−1,−3) 80 5

ω(A) = 155, DXC(A) = 123

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−5, 0,−2, 0, 2, 4) 86 5

ω(A) = 156, DXC(A) = 124

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−4, 0,−3, 0, 2, 4) 86 5

2 (−1,−4,−3, 1, 3, 3) 86 4

3 (5, 0, 1,−1,−3,−3) 86 4

ω(A) = 157, DXC(A) = 125

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−5,−3, 0, 1, 3, 3) 83 5

2 (1, 1, 4,−1,−3,−3) 82 5

3 (−4,−4, 0, 1, 3, 3) 83 5

4 (2, 0, 4,−1,−3,−3) 83 5


ω(A) = 158, DXC(A) = 126

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−4,−3, 2, 0, 0, 4) 80 5

2 (−4,−4, 1, 0, 2, 4) 80 5

3 (−1,−3,−3, 0, 2, 4) 86 4

4 (−5,−1,−2, 1, 1, 5) 89 5

5 (1, 5, 0,−1,−1,−5) 89 5

6 (−4,−1,−4, 2, 2, 4) 86 4

7 (2, 5, 0,−2,−2,−4) 85 6

ω(A) = 160, DXC(A) = 128

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (1, 2, 5, 0, 0, 0) 82 5

2 (0, 1, 5, 0, 0, 2) 80 5

3 (0, 4, 2, 0, 0, 2) 80 5

4 (1, 4, 1, 0, 0, 2) 78 4

5 (0, 0, 4, 0, 2, 2) 78 4

6 (0, 1, 4, 1, 1, 1) 79 5

7 (2, 3, 6,−1,−1,−1) 79 5

8 (−4,−1,−3, 1, 1, 5) 89 5

9 (2, 5,−1,−1,−1,−5) 89 4

10 (−5,−1,−3, 2, 2, 4) 86 5

11 (1, 5, 1,−2,−2,−4) 85 5

ω(A) = 161, DXC(A) = 129

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (3, 0, 1, 0, 2, 2) 80 5

2 (−5,−3, 1, 0, 2, 4) 80 6

3 (0, 3, 0, 1, 1, 3) 79 7

4 (4, 7, 2,−1,−1,−3) 79 4

5 (−5,−4, 0, 2, 2, 4) 83 5

6 (1, 2, 4,−2,−2,−4) 82 5

ω(A) = 162, DXC(A) = 130

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (0, 3, 1, 0, 0, 4) 80 7

2 (−1, 1, 4, 0, 2, 2) 80 5

3 (2, 0, 0, 0, 2, 4) 80 4

4 (−5,−4, 0, 0, 4, 4) 85 6

5 (−2,−4,−3, 0, 4, 4) 88 4

6 (−1, 0, 4, 1, 1, 3) 79 5

7 (3, 4, 6,−1,−1,−3) 79 5

8 (−5,−4,−1, 3, 3, 3) 86 5

9 (1, 2, 5,−3,−3,−3) 85 5


ω(A) = 163, DXC(A) = 131

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−1, 3, 1, 1, 1, 3) 81 5

2 (3, 7, 3,−1,−1,−3) 81 5

3 (−5,−1,−4, 1, 3, 5) 88 6

4 (3, 5, 0,−1,−3,−5) 88 6

5 (−2,−5,−3, 1, 3, 5) 88 5

6 (6, 1, 1,−1,−3,−5) 88 5

7 (−1, 0, 3, 2, 2, 2) 79 5

8 (3, 4, 7,−2,−2,−2) 80 5

ω(A) = 164, DXC(A) = 132

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−1, 0, 5, 0, 0, 4) 81 7

2 (−5,−1,−1, 0, 0, 6) 92 4

3 (−4,−1,−2, 0, 0, 6) 92 4

4 (−2, 0, 4, 0, 2, 4) 80 5

5 (−1, 3, 0, 0, 2, 4) 78 5

6 (−6,−1,−2, 0, 2, 6) 91 5

7 (−2,−5,−2, 0, 2, 6) 91 4

8 (−6,−3, 0, 0, 4, 4) 84 6

9 (2,−1, 0, 1, 3, 3) 80 7

10 (8, 3, 4,−1,−3,−3) 81 5

11 (−6,−1,−3, 1, 3, 5) 88 6

12 (2, 5, 1,−1,−3,−5) 88 5

13 (−2,−4,−4, 1, 3, 5) 88 4

14 (6, 2, 0,−1,−3,−5) 87 6

ω(A) = 165, DXC(A) = 133

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (1,−1, 0, 0, 4, 4) 81 5

2 (−5,−4, 1, 1, 1, 5) 80 6

3 (1, 2, 3,−1,−1,−5) 80 6

4 (−1, 2, 0, 1, 1, 5) 79 7

5 (5, 8, 2,−1,−1,−5) 79 5

6 (−2, 0, 3, 1, 3, 3) 80 5

7 (4, 4, 7,−1,−3,−3) 81 5

8 (−1,−1, 3, 1, 3, 3) 79 5

9 (5, 3, 7,−1,−3,−3) 81 6

10 (1,−1,−1, 1, 3, 5) 79 7

11 (9, 5, 3,−1,−3,−5) 82 6

12 (−2,−5,−4, 2, 4, 4) 88 5

13 (6, 1, 2,−2,−4,−4) 87 6


ω(A) = 166, DXC(A) = 134

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−1, 3, 2, 0, 0, 4) 82 7

2 (−2, 3, 1, 0, 2, 4) 81 5

3 (−1,−1, 4, 0, 2, 4) 81 5

4 (2,−1, 1, 0, 2, 4) 81 5

5 (−2,−4,−3, 0, 2, 6) 91 5

6 (1,−1, 0, 0, 2, 6) 83 5

7 (−5,−5, 0, 1, 3, 5) 83 6

8 (3, 1, 4,−1,−3,−5) 82 6

9 (−2, 2, 0, 2, 2, 4) 80 5

10 (4, 8, 4,−2,−2,−4) 82 5

11 (−1, 2,−1, 2, 2, 4) 80 5

12 (5, 8, 3,−2,−2,−4) 82 7

13 (−6,−4,−1, 2, 4, 4) 85 6

14 (2, 2, 5,−2,−4,−4) 84 5

ω(A) = 167, DXC(A) = 135

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (5,−2, 0, 0, 2, 2) 90 5

2 (−5,−1,−3, 0, 2, 6) 91 5

3 (1,−2, 0, 1, 3, 5) 82 5

4 (9, 4, 4,−1,−3,−5) 82 5

5 (−2,−1, 3, 2, 2, 4) 80 5

6 (4, 5, 7,−2,−2,−4) 83 6

7 (−6,−2,−3, 2, 2, 6) 91 6

8 (2, 6, 1,−2,−2,−6) 90 5

9 (−5,−2,−4, 2, 2, 6) 91 5

10 (3, 6, 0,−2,−2,−6) 89 6

11 (−5,−5,−1, 2, 4, 4) 86 6

12 (3, 1, 5,−2,−4,−4) 86 7

13 (1,−2,−1, 2, 4, 4) 80 5

14 (9, 4, 5,−2,−4,−4) 83 6


ω(A) = 168, DXC(A) = 136

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−3,−1, 5, 0, 0, 0) 97 5

2 (−1, 1, 7, 0, 0, 0) 91 5

3 (−4,−2, 5, 0, 0, 2) 98 6

4 (−4, 4,−1, 0, 0, 2) 98 5

5 (−2, 0, 7, 0, 0, 2) 90 5

6 (−2, 4,−3, 0, 0, 2) 94 4

7 (−2, 6, 1, 0, 0, 2) 92 6

8 (0, 6,−1, 0, 0, 2) 90 4

9 (−1, 2, 1, 0, 0, 6) 79 5

10 (−2, 2, 0, 0, 2, 6) 78 5

11 (0,−2, 0, 0, 4, 6) 82 5

12 (0,−1,−1, 0, 4, 6) 81 6

13 (−4,−2, 4, 1, 1, 1) 96 5

14 (−2, 0, 6,−1,−1,−1) 97 5

15 (−2, 0, 6, 1, 1, 1) 90 5

16 (0, 2, 8,−1,−1,−1) 90 6

17 (−2,−1, 4, 1, 1, 5) 82 5

18 (4, 5, 6,−1,−1,−5) 81 5

19 (−6,−4, 0, 1, 3, 5) 82 6

20 (2, 2, 4,−1,−3,−5) 82 6

21 (0,−2,−1, 1, 5, 5) 81 5

22 (10, 4, 5,−1,−5,−5) 83 6

23 (−2, 1,−1, 2, 2, 6) 79 5

24 (6, 9, 3,−2,−2,−6) 81 6

25 (0,−2,−2, 2, 4, 6) 78 4

26 (10, 6, 4,−2,−4,−6) 84 6

27 (−2,−1, 2, 3, 3, 3) 80 5

28 (4, 5, 8,−3,−3,−3) 83 6

29 (−5,−2,−5, 3, 3, 5) 88 6

30 (3, 6, 1,−3,−3,−5) 88 6


ω(A) = 169, DXC(A) = 137

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (3,−4,−2, 0, 2, 2) 96 6

2 (−3, 0, 3, 0, 4, 4) 83 7

3 (−1, 5,−2, 1, 1, 3) 90 4

4 (3, 9, 0,−1,−1,−3) 92 5

5 (−2, 2, 1, 1, 1, 5) 83 7

6 (4, 8, 3,−1,−1,−5) 83 7

7 (−2, 1, 0, 1, 1, 7) 83 5

8 (6, 9, 2,−1,−1,−7) 82 5

9 (−2, 2,−1, 1, 3, 5) 79 5

10 (6, 8, 3,−1,−3,−5) 82 6

11 (0,−2,−1, 1, 3, 7) 82 5

12 (10, 6, 3,−1,−3,−7) 83 6

13 (−6,−2,−4, 3, 3, 5) 87 6

14 (2, 6, 2,−3,−3,−5) 87 5

15 (−2, 1,−2, 3, 3, 5) 79 5

16 (6, 9, 4,−3,−3,−5) 84 6

ω(A) = 170, DXC(A) = 138

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−1, 5,−1, 0, 0, 4) 92 5

2 (−2, 1, 1, 0, 0, 8) 86 5

3 (−5,−2, 4, 0, 2, 2) 96 5

4 (−1,−2, 6, 0, 2, 2) 92 7

5 (−2, 5,−2, 0, 2, 4) 93 5

6 (4,−1,−2, 0, 2, 4) 93 5

7 (1,−2, 1, 0, 2, 6) 83 7

8 (0,−2, 0, 0, 2, 8) 86 5

9 (−2,−1, 3, 0, 4, 4) 82 6

10 (−6,−1,−4, 0, 4, 6) 90 6

11 (−3, 5, 0, 1, 1, 3) 93 6

12 (1, 9, 2,−1,−1,−3) 92 6

13 (−6,−2,−2, 1, 1, 7) 94 4

14 (2, 6, 0,−1,−1,−7) 93 4

15 (−3,−1, 3, 1, 3, 5) 81 5

16 (5, 5, 7,−1,−3,−5) 81 5

17 (−3, 2, 0, 1, 3, 5) 83 6

18 (5, 8, 4,−1,−3,−5) 83 6

19 (−2,−2, 2, 2, 4, 4) 80 5

20 (6, 4, 8,−2,−4,−4) 84 6

21 (−3, 1,−1, 3, 3, 5) 82 5

22 (5, 9, 5,−3,−3,−5) 82 5


ω(A) = 171, DXC(A) = 139

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (−3,−4, 4, 0, 2, 2) 96 7

2 (−3, 0, 6, 0, 2, 2) 91 6

3 (−5, 3,−2, 1, 1, 3) 98 6

4 (−1, 7, 0,−1,−1,−3) 98 7

5 (−2, 4,−2, 1, 1, 5) 92 5

6 (4, 10, 0,−1,−1,−5) 92 6

7 (−5,−2,−3, 1, 1, 7) 93 5

8 (3, 6,−1,−1,−1,−7) 93 5

9 (4,−3,−1, 1, 3, 3) 92 5

10 (10, 1, 3,−1,−3,−3) 94 5

11 (−2,−2, 3, 1, 3, 5) 82 5

12 (6, 4, 7,−1,−3,−5) 84 6

13 (−3, 1, 0, 2, 2, 6) 84 6

14 (5, 9, 4,−2,−2,−6) 81 5

15 (−3,−1, 2, 2, 4, 4) 81 5

16 (5, 5, 8,−2,−4,−4) 82 5

17 (0,−3,−1, 2, 4, 6) 83 5

18 (10, 5, 5,−2,−4,−6) 82 5

19 (−6,−5,−1, 3, 3, 5) 85 7

20 (2, 3, 5,−3,−3,−5) 84 6

21 (0,−3,−2, 3, 5, 5) 81 6

22 (10, 5, 6,−3,−5,−5) 83 6


ω(A) = 172, DXC(A) = 140

No. (ε12, ε13, ε14, r, s, t) SLP(A) depth(A)

1 (2, 3, 6, 0, 0, 0) 84 5

2 (1, 2, 6, 0, 0, 2) 84 5

3 (−3, 5, 1, 0, 0, 4) 96 8

4 (0, 1, 6, 0, 0, 4) 87 5

5 (−5,−4, 2, 0, 0, 6) 80 6

6 (−2,−1, 5, 0, 0, 6) 83 6

7 (−2, 2, 2, 0, 0, 6) 84 7

8 (−2, 4,−1, 0, 0, 6) 94 5

9 (−4, 5, 0, 0, 2, 4) 94 5

10 (4,−3, 0, 0, 2, 4) 96 6

11 (−6,−4, 1, 0, 2, 6) 82 6

12 (−2,−2, 4, 0, 2, 6) 82 5

13 (3,−2,−2, 0, 2, 6) 95 5

14 (−3, 1, 0, 0, 2, 8) 79 5

15 (2, 0, 1, 0, 4, 4) 86 4

16 (−7,−1,−3, 0, 4, 6) 89 7

17 (−3,−4,−4, 0, 4, 6) 90 5

18 (1, 0, 0, 0, 4, 6) 88 5

19 (−3, 3,−4, 1, 1, 3) 94 5

20 (1, 7,−2,−1,−1,−3) 94 6

21 (−7,−2,−3, 1, 3, 7) 94 6

22 (3, 6, 1,−1,−3,−7) 92 6

23 (0,−3, 0, 1, 3, 7) 85 6

24 (10, 5, 4,−1,−3,−7) 81 5

25 (−7,−4,−1, 1, 5, 5) 86 7

26 (3, 2, 5,−1,−5,−5) 86 7

27 (−6,−5,−1, 1, 5, 5) 87 8

28 (4, 1, 5,−1,−5,−5) 86 6

29 (−3,−1, 5, 2, 2, 2) 92 6

30 (1, 3, 9,−2,−2,−2) 93 5

31 (−7,−2,−4, 2, 4, 6) 90 6

32 (3, 6, 2,−2,−4,−6) 89 6

33 (−6,−5,−2, 4, 4, 4) 87 7

34 (2, 3, 6,−4,−4,−4) 87 6

35 (−3,−2, 1, 4, 4, 4) 82 5

36 (5, 6, 9,−4,−4,−4) 83 6


B A List of Involutory MDS Matrices with Depth 3

ω(A) = 148, DXC(A) = 116, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−2,−1, 2, 0, 0, 0) 90

2 (−2, 1,−2, 0, 0, 2) 90

ω(A) = 149, DXC(A) = 117, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−3,−2, 1, 1, 1, 1) 90

2 (−1, 0, 3,−1,−1,−1) 90

ω(A) = 150, DXC(A) = 118, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−3,−2, 2, 0, 0, 2) 91

2 (−3, 1,−1, 0, 0, 2) 90

3 (−4,−2, 1, 0, 2, 2) 90

4 (0,−3,−2, 0, 2, 2) 93


ω(A) = 151, DXC(A) = 119, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−4, 0,−2, 1, 1, 3) 94

2 (0, 4, 0,−1,−1,−3) 94

ω(A) = 152, DXC(A) = 120, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−4, 0,−1, 0, 0, 4) 96

2 (−3, 0,−3, 1, 1, 3) 93

3 (1, 4,−1,−1,−1,−3) 94

ω(A) = 153, DXC(A) = 121, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−3,−3, 1, 0, 2, 2) 93

2 (−4,−3, 0, 2, 2, 2) 94

3 (0, 1, 4,−2,−2,−2) 95

ω(A) = 154, DXC(A) = 122, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−3, 0,−2, 0, 0, 4) 95

2 (−1,−4,−2, 0, 2, 4) 95

3 (−4,−3, 1, 1, 1, 3) 94

4 (0, 1, 3,−1,−1,−3) 93

ω(A) = 155, DXC(A) = 123, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−5, 0,−2, 0, 2, 4) 96

ω(A) = 156, DXC(A) = 124, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−4, 0,−3, 0, 2, 4) 97

3 (5, 0, 1,−1,−3,−3) 96

ω(A) = 157, DXC(A) = 125, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

2 (1, 1, 4,−1,−3,−3) 95

3 (−4,−4, 0, 1, 3, 3) 96

4 (2, 0, 4,−1,−3,−3) 97

ω(A) = 158, DXC(A) = 126, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−4,−3, 2, 0, 0, 4) 97

2 (−4,−4, 1, 0, 2, 4) 96

5 (1, 5, 0,−1,−1,−5) 97

7 (2, 5, 0,−2,−2,−4) 97


ω(A) = 160, DXC(A) = 128, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (1, 2, 5, 0, 0, 0) 94

2 (0, 1, 5, 0, 0, 2) 93

3 (0, 4, 2, 0, 0, 2) 94

4 (1, 4, 1, 0, 0, 2) 93

5 (0, 0, 4, 0, 2, 2) 92

6 (0, 1, 4, 1, 1, 1) 93

7 (2, 3, 6,−1,−1,−1) 93

9 (2, 5,−1,−1,−1,−5) 98

11 (1, 5, 1,−2,−2,−4) 97

ω(A) = 161, DXC(A) = 129, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (3, 0, 1, 0, 2, 2) 93

3 (0, 3, 0, 1, 1, 3) 92

4 (4, 7, 2,−1,−1,−3) 92

6 (1, 2, 4,−2,−2,−4) 98

ω(A) = 162, DXC(A) = 130, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (0, 3, 1, 0, 0, 4) 92

2 (−1, 1, 4, 0, 2, 2) 93

3 (2, 0, 0, 0, 2, 4) 92

6 (−1, 0, 4, 1, 1, 3) 92

7 (3, 4, 6,−1,−1,−3) 93

9 (1, 2, 5,−3,−3,−3) 98

ω(A) = 163, DXC(A) = 131, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−1, 3, 1, 1, 1, 3) 94

2 (3, 7, 3,−1,−1,−3) 93

6 (6, 1, 1,−1,−3,−5) 96

7 (−1, 0, 3, 2, 2, 2) 94

8 (3, 4, 7,−2,−2,−2) 94

ω(A) = 164, DXC(A) = 132, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−1, 0, 5, 0, 0, 4) 93

3 (−4,−1,−2, 0, 0, 6) 99

4 (−2, 0, 4, 0, 2, 4) 92

5 (−1, 3, 0, 0, 2, 4) 92

9 (2,−1, 0, 1, 3, 3) 93

10 (8, 3, 4,−1,−3,−3) 92

12 (2, 5, 1,−1,−3,−5) 100


ω(A) = 165, DXC(A) = 133, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (1,−1, 0, 0, 4, 4) 93

4 (−1, 2, 0, 1, 1, 5) 92

5 (5, 8, 2,−1,−1,−5) 92

6 (−2, 0, 3, 1, 3, 3) 94

7 (4, 4, 7,−1,−3,−3) 92

8 (−1,−1, 3, 1, 3, 3) 90

9 (5, 3, 7,−1,−3,−3) 93

10 (1,−1,−1, 1, 3, 5) 90

11 (9, 5, 3,−1,−3,−5) 95

13 (6, 1, 2,−2,−4,−4) 97

ω(A) = 166, DXC(A) = 134, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−1, 3, 2, 0, 0, 4) 95

2 (−2, 3, 1, 0, 2, 4) 93

3 (−1,−1, 4, 0, 2, 4) 93

4 (2,−1, 1, 0, 2, 4) 94

6 (1,−1, 0, 0, 2, 6) 93

9 (−2, 2, 0, 2, 2, 4) 94

10 (4, 8, 4,−2,−2,−4) 92

11 (−1, 2,−1, 2, 2, 4) 90

12 (5, 8, 3,−2,−2,−4) 93

14 (2, 2, 5,−2,−4,−4) 100

ω(A) = 167, DXC(A) = 135, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (5,−2, 0, 0, 2, 2) 97

3 (1,−2, 0, 1, 3, 5) 94

4 (9, 4, 4,−1,−3,−5) 93

5 (−2,−1, 3, 2, 2, 4) 90

6 (4, 5, 7,−2,−2,−4) 93

8 (2, 6, 1,−2,−2,−6) 100

10 (3, 6, 0,−2,−2,−6) 98

13 (1,−2,−1, 2, 4, 4) 91

14 (9, 4, 5,−2,−4,−4) 95


ω(A) = 168, DXC(A) = 136, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−3,−1, 5, 0, 0, 0) 99

2 (−1, 1, 7, 0, 0, 0) 98

3 (−4,−2, 5, 0, 0, 2) 101

4 (−4, 4,−1, 0, 0, 2) 102

5 (−2, 0, 7, 0, 0, 2) 96

6 (−2, 4,−3, 0, 0, 2) 100

7 (−2, 6, 1, 0, 0, 2) 97

8 (0, 6,−1, 0, 0, 2) 96

9 (−1, 2, 1, 0, 0, 6) 93

10 (−2, 2, 0, 0, 2, 6) 92

11 (0,−2, 0, 0, 4, 6) 94

12 (0,−1,−1, 0, 4, 6) 93

13 (−4,−2, 4, 1, 1, 1) 101

14 (−2, 0, 6,−1,−1,−1) 98

15 (−2, 0, 6, 1, 1, 1) 96

16 (0, 2, 8,−1,−1,−1) 98

17 (−2,−1, 4, 1, 1, 5) 93

18 (4, 5, 6,−1,−1,−5) 93

21 (0,−2,−1, 1, 5, 5) 91

22 (10, 4, 5,−1,−5,−5) 96

23 (−2, 1,−1, 2, 2, 6) 90

24 (6, 9, 3,−2,−2,−6) 95

25 (0,−2,−2, 2, 4, 6) 88

26 (10, 6, 4,−2,−4,−6) 98

27 (−2,−1, 2, 3, 3, 3) 91

28 (4, 5, 8,−3,−3,−3) 93

30 (3, 6, 1,−3,−3,−5) 99

ω(A) = 169, DXC(A) = 137, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (3,−4,−2, 0, 2, 2) 101

2 (−3, 0, 3, 0, 4, 4) 92

3 (−1, 5,−2, 1, 1, 3) 98

4 (3, 9, 0,−1,−1,−3) 97

5 (−2, 2, 1, 1, 1, 5) 95

6 (4, 8, 3,−1,−1,−5) 94

7 (−2, 1, 0, 1, 1, 7) 93

8 (6, 9, 2,−1,−1,−7) 93

9 (−2, 2,−1, 1, 3, 5) 90

10 (6, 8, 3,−1,−3,−5) 93

11 (0,−2,−1, 1, 3, 7) 91

12 (10, 6, 3,−1,−3,−7) 96

14 (2, 6, 2,−3,−3,−5) 100

15 (−2, 1,−2, 3, 3, 5) 88

16 (6, 9, 4,−3,−3,−5) 96


ω(A) = 170, DXC(A) = 138, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−1, 5,−1, 0, 0, 4) 97

2 (−2, 1, 1, 0, 0, 8) 94

3 (−5,−2, 4, 0, 2, 2) 102

4 (−1,−2, 6, 0, 2, 2) 97

5 (−2, 5,−2, 0, 2, 4) 97

6 (4,−1,−2, 0, 2, 4) 98

7 (1,−2, 1, 0, 2, 6) 95

8 (0,−2, 0, 0, 2, 8) 94

9 (−2,−1, 3, 0, 4, 4) 91

11 (−3, 5, 0, 1, 1, 3) 98

12 (1, 9, 2,−1,−1,−3) 99

15 (−3,−1, 3, 1, 3, 5) 94

16 (5, 5, 7,−1,−3,−5) 92

17 (−3, 2, 0, 1, 3, 5) 93

18 (5, 8, 4,−1,−3,−5) 92

19 (−2,−2, 2, 2, 4, 4) 88

20 (6, 4, 8,−2,−4,−4) 94

21 (−3, 1,−1, 3, 3, 5) 92

22 (5, 9, 5,−3,−3,−5) 93

ω(A) = 171, DXC(A) = 139, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (−3,−4, 4, 0, 2, 2) 102

2 (−3, 0, 6, 0, 2, 2) 98

3 (−5, 3,−2, 1, 1, 3) 105

4 (−1, 7, 0,−1,−1,−3) 106

5 (−2, 4,−2, 1, 1, 5) 97

6 (4, 10, 0,−1,−1,−5) 99

9 (4,−3,−1, 1, 3, 3) 97

10 (10, 1, 3,−1,−3,−3) 99

11 (−2,−2, 3, 1, 3, 5) 90

12 (6, 4, 7,−1,−3,−5) 93

13 (−3, 1, 0, 2, 2, 6) 94

14 (5, 9, 4,−2,−2,−6) 96

15 (−3,−1, 2, 2, 4, 4) 96

16 (5, 5, 8,−2,−4,−4) 92

17 (0,−3,−1, 2, 4, 6) 92

18 (10, 5, 5,−2,−4,−6) 94

21 (0,−3,−2, 3, 5, 5) 89

22 (10, 5, 6,−3,−5,−5) 96


ω(A) = 172, DXC(A) = 140, depth(A) = 3

No. (ε12, ε13, ε14, r, s, t) SLP∗(A)

1 (2, 3, 6, 0, 0, 0) 98

2 (1, 2, 6, 0, 0, 2) 100

3 (−3, 5, 1, 0, 0, 4) 104

4 (0, 1, 6, 0, 0, 4) 98

6 (−2,−1, 5, 0, 0, 6) 94

7 (−2, 2, 2, 0, 0, 6) 96

8 (−2, 4,−1, 0, 0, 6) 99

9 (−4, 5, 0, 0, 2, 4) 103

10 (4,−3, 0, 0, 2, 4) 102

12 (−2,−2, 4, 0, 2, 6) 92

13 (3,−2,−2, 0, 2, 6) 101

14 (−3, 1, 0, 0, 2, 8) 92

15 (2, 0, 1, 0, 4, 4) 102

18 (1, 0, 0, 0, 4, 6) 103

19 (−3, 3,−4, 1, 1, 3) 103

20 (1, 7,−2,−1,−1,−3) 103

22 (3, 6, 1,−1,−3,−7) 103

23 (0,−3, 0, 1, 3, 7) 93

24 (10, 5, 4,−1,−3,−7) 96

26 (3, 2, 5,−1,−5,−5) 101

29 (−3,−1, 5, 2, 2, 2) 97

30 (1, 3, 9,−2,−2,−2) 100

32 (3, 6, 2,−2,−4,−6) 103

34 (2, 3, 6,−4,−4,−4) 102

35 (−3,−2, 1, 4, 4, 4) 89

36 (5, 6, 9,−4,−4,−4) 94


Part III

Cryptanalysis


Chapter 8

Improved Division Property Based Cube Attacks Exploiting Algebraic Properties of Superpoly

Publication data

Qingju Wang, Yonglin Hao, Yosuke Todo, Chaoyun Li, Takanori Isobe and Willi Meier: Improved Division Property Based Cube Attacks Exploiting Algebraic Properties of Superpoly. Advances in Cryptology - CRYPTO 2018: 275-305, 2018

Contributions

Principal author


Improved Division Property Based Cube Attacks Exploiting Algebraic Properties of Superpoly

Qingju Wang¹,²,³, Yonglin Hao⁴*, Yosuke Todo⁵*, Chaoyun Li⁶*, Takanori Isobe⁷, and Willi Meier⁸

¹ Shanghai Jiao Tong University, China
² Technical University of Denmark
³ SnT, University of Luxembourg
⁴ State Key Laboratory of Cryptology, Beijing, China
⁵ NTT Secure Platform Laboratories, Japan
⁶ imec-COSIC, Dept. Electrical Engineering (ESAT), KU Leuven, Belgium
⁷ University of Hyogo, Japan
⁸ FHNW

Abstract. The cube attack is an important technique for the cryptanalysis of symmetric key primitives, especially for stream ciphers. Aiming at recovering some secret key bits, the adversary reconstructs a superpoly with the secret key bits involved, by summing over a set of the plaintexts/IV which is called a cube. The traditional cube attack only exploits linear/quadratic superpolies. Moreover, for a long time after its proposal, the size of the cubes has been largely confined to an experimental range, e.g., typically 40. These limits were first overcome by the division property based cube attacks proposed by Todo et al. at CRYPTO 2017. Based on MILP modelled division property, for a cube (index set) I, they identify the small (index) subset J of the secret key bits involved in the resultant superpoly. During the precomputation phase, which dominates the complexity of the cube attacks, $2^{|I|+|J|}$ encryptions are required to recover the superpoly. Therefore, their attacks are only available when the restriction $|I| + |J| < n$ is met.

In this paper, we introduce several techniques to improve the division property based cube attacks by exploiting various algebraic properties of the superpoly.

1. We propose the "flag" technique to enhance the preciseness of MILP models so that the proper non-cube IV assignments can be identified to obtain a non-constant superpoly.

2. A degree evaluation algorithm is presented to upper bound the degree of the superpoly. With the knowledge of its degree, the superpoly can be recovered without constructing its whole truth table. This enables us to explore larger cubes I even if $|I| + |J| \ge n$.

3. We provide a term enumeration algorithm for finding the monomials of the superpoly, so that the complexity of many attacks can be further reduced.

* Corresponding authors.

150 IMPROVED DIVISION PROPERTY BASED CUBE ATTACKS

As an illustration, we apply our techniques to attack the initialization of several ciphers. To be specific, our key recovery attacks reach 839-round Trivium, 891-round Kreyvium, 184-round Grain-128a and 750-round Acorn, respectively.

Keywords: Cube attack, Division Property, MILP, Trivium, Kreyvium, Grain-128a, Acorn, Clique

1 Introduction

The cube attack, proposed by Dinur and Shamir [1] in 2009, is one of the general cryptanalytic techniques for analyzing symmetric-key cryptosystems. Since its proposal, the cube attack has been successfully applied to various ciphers, including stream ciphers [2,3,4,5,6], hash functions [7,8,9], and authenticated encryption schemes [10,11]. For a cipher with n secret variables $x = (x_1, x_2, \ldots, x_n)$ and m public variables $v = (v_1, v_2, \ldots, v_m)$, we can regard the algebraic normal form (ANF) of the output bits as a polynomial in x and v, denoted as $f(x, v)$. For a randomly chosen set $I = \{i_1, i_2, \ldots, i_{|I|}\} \subset \{1, \ldots, m\}$, $f(x, v)$ can be represented uniquely as

$$f(x, v) = t_I \cdot p(x, v) + q(x, v),$$

where $t_I = v_{i_1} \cdots v_{i_{|I|}}$, $p(x, v)$ only relates to the $v_s$'s ($s \notin I$) and the secret key bits x, and $q(x, v)$ misses at least one variable in $t_I$. The polynomial $p(x, v)$ is called the superpoly of the cube. When the $v_s$'s ($s \notin I$) and x are assigned statically, the value of $p(x, v)$ can be computed by summing the output bit $f(x, v)$ over a structure called a cube, denoted as $C_I$, consisting of $2^{|I|}$ different v vectors with $v_i$, $i \in I$ being active (traversing all 0-1 combinations) and the non-cube variables $v_s$, $s \notin I$ being static constants. Traditional cube attacks are mainly concerned with linear or quadratic superpolies. By collecting linear or quadratic equations from the superpoly, the attacker can recover information on some secret key bits during the online phase. Aiming to mount distinguishing attacks by property testing, cube testers are obtained by evaluating superpolies of carefully selected cubes. In [2], probabilistic tests are applied to detect some algebraic properties such as constantness, low degree and sparse monomial distribution. Moreover, cube attacks and cube testers are acquired experimentally by summing over randomly chosen cubes, so the sizes of the cubes are largely confined. Breakthroughs have been made by Todo et al. in [12], where they introduce the bit-based division property, a tool for conducting integral attacks¹, to the realm of cube attacks. With the help of the mixed integer linear programming (MILP) aided division property, they can identify the variables excluded from the superpoly and explore cubes of larger size, e.g., 72 for 832-round Trivium. This enables them to improve the traditional cube attack.
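As a minimal sketch of the cube summation described above, the following Python snippet uses a small toy polynomial over GF(2) with three key bits and three public bits. The polynomial and the names `toy_f` and `cube_sum` are hypothetical, chosen only for illustration and not taken from any cipher. Indices are 0-based here. For the cube I = {0, 1}, the polynomial has the form t_I · p + q with p = x0 + x1·v2, and XOR-summing the output over the four cube points is expected to yield exactly p.

```python
from itertools import product

def toy_f(x, v):
    """Hypothetical toy output bit over GF(2) (not from any real cipher).

    f = v0*v1*(x0 + x1*v2) + v0*x2 + v1*v2 + x0*x1
    For the cube I = {0, 1}: t_I = v0*v1, superpoly p = x0 + x1*v2,
    and every term of q misses v0 or v1.
    """
    return ((v[0] & v[1] & (x[0] ^ (x[1] & v[2])))
            ^ (v[0] & x[2]) ^ (v[1] & v[2]) ^ (x[0] & x[1]))

def cube_sum(f, x, cube, static_v):
    """XOR f(x, v) over all 2^|I| assignments of the cube variables.

    Non-cube positions of v keep the constants given in static_v.
    """
    acc = 0
    for bits in product((0, 1), repeat=len(cube)):
        v = list(static_v)
        for idx, b in zip(cube, bits):
            v[idx] = b
        acc ^= f(x, v)
    return acc

if __name__ == "__main__":
    cube = [0, 1]
    for x in product((0, 1), repeat=3):
        for v2 in (0, 1):
            summed = cube_sum(toy_f, x, cube, [0, 0, v2])
            superpoly = x[0] ^ (x[1] & v2)
            print(x, v2, summed, superpoly)
```

Every term of q vanishes in the sum because it is counted an even number of times, which is why only the superpoly survives.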

Division property, as a generalization of the integral property, was first proposed at EUROCRYPT 2015 [13]. With division property, the propagation of the integral characteristics can be deduced in a more accurate manner, and one prominent application is the first theoretic key recovery attack on full MISTY1 [14].

¹ Integral attacks also require traversing some active plaintext bits and checking whether the summation of the corresponding ciphertext bits has the zero-sum property, which is equivalent to checking whether the superpoly satisfies p(x,v) ≡ 0.

The original division property can only be applied to word-oriented primitives. At FSE 2016, the bit-based division property [15] was proposed to investigate integral characteristics for bit-based block ciphers. With the help of division property, the propagation of the integral characteristics can be represented by operations on a set of 0-1 vectors identifying the bit positions with the zero-sum property. Therefore, for the first time, integral characteristics for the bit-based block ciphers Simon32 and Simeck32 have been proved. However, the sizes of the 0-1 vector sets are exponential in the block size of the ciphers. Therefore, as has been pointed out by the authors themselves, the deduction of bit-based division property under their framework requires high memory for block ciphers with larger block sizes, which largely limits its applications. This problem was solved by Xiang et al. [16] at ASIACRYPT 2016 by utilizing the MILP model. The operations on 0-1 vector sets are transformed into imposing division property values (0 or 1) on MILP variables, and the corresponding integral characteristics are acquired by solving the models with MILP solvers like Gurobi [17]. With this method, they are able to give integral characteristics for block ciphers with block sizes much larger than 32 bits. Xiang et al.'s method has now been applied to many other ciphers for improved integral attacks [18,19,20,21].

In [12], Todo et al. adapt Xiang et al.'s method by taking key bits into the MILP model. With this technique, a set of key indices $J = \{j_1, j_2, \ldots, j_{|J|}\} \subset \{1, \ldots, n\}$ is deduced for the cube $C_I$ such that $p(x, v)$ can only be related to the key bits $x_j$ ($j \in J$). With the knowledge of I and J, Todo et al. can recover 1 bit of secret-key-related information by executing two phases. In the offline phase, a proper assignment to the non-cube IVs, denoted by $IV \in \mathbb{F}_2^m$, is determined ensuring that $p(x, IV)$ is non-constant. Also in this phase, the whole truth table of $p(x, IV)$ is constructed through cube summations. In the online phase, the exact value of $p(x, IV)$ is acquired through a cube summation and the candidate values of the $x_j$'s ($j \in J$) are identified by checking the precomputed truth table. A proportion of wrong keys are filtered as long as $p(x, IV)$ is non-constant.
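The two phases can be sketched on a toy example. The Python code below is only an illustration under stated assumptions: the polynomial `toy_f`, the helper names, and the 0-based indices are hypothetical and do not come from [12]. For the cube I = {0, 1} the toy superpoly is p = v2·(x0 + x1), so J = {0, 1}; the IV with v2 = 1 is proper, while v2 = 0 gives p ≡ 0 and filters nothing.

```python
from itertools import product

def toy_f(x, v):
    """Hypothetical toy output bit: v0*v1*v2*(x0 + x1) + v0*x2 + v1*v2 + x0*x1."""
    return ((v[0] & v[1] & v[2] & (x[0] ^ x[1]))
            ^ (v[0] & x[2]) ^ (v[1] & v[2]) ^ (x[0] & x[1]))

def cube_sum(g, cube, iv):
    """XOR g(v) over all assignments of the cube bits; other bits come from iv."""
    acc = 0
    for bits in product((0, 1), repeat=len(cube)):
        v = list(iv)
        for idx, b in zip(cube, bits):
            v[idx] = b
        acc ^= g(v)
    return acc

def offline_truth_table(f, cube, key_idx, n, iv):
    """Offline phase: 2^|J| cube summations, i.e. 2^(|I|+|J|) evaluations of f."""
    table = {}
    for kj in product((0, 1), repeat=len(key_idx)):
        x = [0] * n  # key bits outside J do not influence the superpoly
        for j, b in zip(key_idx, kj):
            x[j] = b
        table[kj] = cube_sum(lambda v: f(x, v), cube, iv)
    return table

def is_proper_iv(table):
    """An IV is proper if the superpoly truth table is non-constant."""
    return len(set(table.values())) > 1

def online_filter(oracle, cube, iv, table):
    """Online phase: one cube summation, then keep the consistent key guesses."""
    observed = cube_sum(oracle, cube, iv)
    return [kj for kj, val in table.items() if val == observed]

if __name__ == "__main__":
    secret = [1, 0, 1]                      # unknown to the attacker
    oracle = lambda v: toy_f(secret, v)     # encryption oracle under the secret key
    table = offline_truth_table(toy_f, [0, 1], [0, 1], 3, [0, 0, 1])
    print(is_proper_iv(table), online_filter(oracle, [0, 1], [0, 0, 1], table))
```

In this toy setting the online phase keeps two of the four guesses for (x0, x1), which corresponds to recovering one bit of key information.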

Due to division property and the power of MILP solvers, cubes of larger dimension can now be used for key recoveries. By using a 72-dimensional cube, Todo et al. propose a theoretic cube attack on 832-round Trivium. They also largely improve the previous best attacks on other primitives, namely Acorn, Grain-128a and Kreyvium [12,22]. Only recently has the result on Trivium been improved by Liu et al. [6], reaching 835 rounds with a new method called the correlation cube attack. The correlation cube attack is based on the numeric mapping technique, which first appeared in [23] and was originally used for constructing zero-sum distinguishers.

1.1 Motivations.

Due to [12,22], the power of cube attacks has been enhanced significantly. However, there are still unresolved problems, which we describe explicitly below.


Finding proper IV's may require multiple trials. As mentioned above, the superpoly can filter wrong keys only if a proper IV assignment $IV \in \mathbb{F}_2^m$ in the constant part of IVs is found such that the corresponding superpoly $p(x, IV)$ is non-constant. The MILP model in [12,22] only proves the existence of the proper IV's, but finding them may not be easy. According to practical experiments, there are quite a few IV's making $p(x, IV) \equiv 0$. Therefore, $t \ge 1$ different IV's might be tried in the precomputation phase before finding a proper one. Since each IV requires constructing a truth table with complexity $2^{|I|+|J|}$, the overall complexity of the offline phase can be $t \times 2^{|I|+|J|}$. When large cubes are used (|I| is big) or many key bits are involved (|J| is large), such a complexity risks exceeding the brute-force bound $2^n$. Therefore, two assumptions are made to validate their cube attacks as follows.

Assumption 1 (Strong Assumption) For a cube $C_I$, there are many values in the constant part of IV whose corresponding superpoly is balanced.

Assumption 2 (Weak Assumption) For a cube $C_I$, there are many values in the constant part of IV whose corresponding superpoly is not a constant function.

These assumptions are proposed to guarantee the validity of the attacks as long as $|I| + |J| < n$, but the rationality of such assumptions is hard to prove, especially when $|I| + |J|$ is so close to n in many cases. The best solution is to evaluate different IVs in the MILP model so that the proper IV of the constant part of IVs and the set J are determined simultaneously before implementing the attack.

Restriction of |I| + |J| < n. The superpoly recovery has always dominated the complexity of the cube attack. In [12] in particular, the attacker knows nothing beyond which secret key bits are involved in the superpoly, and therefore has to first construct the whole truth table for the superpoly in the offline phase. In general, the truth-table construction requires repeating the cube summation $2^{|J|}$ times, which makes the complexity of the offline phase about $2^{|I|+|J|}$. Such an attack is only meaningful if $|I| + |J| < n$, where n is the number of secret variables. The restriction $|I| + |J| < n$ prevents the adversary from exploiting cubes of larger dimension or attacking more rounds (where |J| may expand). This restriction can be removed if we can avoid the truth table construction in the offline phase.
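The feasibility condition discussed above can be checked with a few lines of arithmetic. The sketch below works with base-2 logarithms of the costs; the function names are hypothetical. In the usage example, the cube dimension 72 is the Trivium figure quoted earlier and n = 80 is the Trivium key size, whereas |J| = 5 and the number of IV trials are purely illustrative assumptions.

```python
import math

def offline_log2_cost(cube_dim, num_key_bits, iv_trials=1):
    """log2 of t * 2^(|I|+|J|): cost of building truth tables for t candidate IVs."""
    return cube_dim + num_key_bits + math.log2(iv_trials)

def beats_brute_force(cube_dim, num_key_bits, n, iv_trials=1):
    """True if the offline phase stays strictly below the 2^n brute-force bound."""
    return offline_log2_cost(cube_dim, num_key_bits, iv_trials) < n

if __name__ == "__main__":
    # |I| = 72 and n = 80; |J| = 5 and the trial counts are illustrative assumptions.
    print(offline_log2_cost(72, 5))                   # one IV trial
    print(beats_brute_force(72, 5, 80))               # single trial
    print(beats_brute_force(72, 5, 80, iv_trials=8))  # eight trials
```

Under these assumed numbers a single trial stays below the bound, while eight trials already reach it, which illustrates why finding a proper IV without repeated truth-table constructions matters.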

1.2 Our Contributions.

This paper improves the existing cube attacks by exploiting the algebraic properties of the superpoly, which include (non-)constantness, low degree and sparse monomial distribution. Inspired by the division-property-based cube attack of Todo et al. [12], we formulate all these properties in one framework by developing more precise MILP models, and thereby reduce the complexity of superpoly recovery.

This also enables us to attack more rounds, or to employ even larger cubes. Similar to [12], our methods regard the cryptosystem as a non-blackbox polynomial and can be used to evaluate cubes of large dimension compared with the traditional cube attack and cube tester. In the following, our contributions are summarized in five aspects.

Flag technique for finding proper IV assignments. The previous MILP model in [12] does not take into account the effect of the constant 0/1 bits in the constant part of the IVs. In that model, the active bits are initialized with division property value 1 and all non-active bits with division property value 0. The non-active bits include the constant part of the IVs, together with some secret key bits and state bits that are statically assigned to 0/1 according to the specification of the cipher. It was noticed in [22] that constant 0 bits can affect the propagation of the division property. However, constant 1 bits deserve attention as well, since constant 0 bits can be generated in the updating functions by XORing an even number of constant 1's. Therefore, we propose a formal technique, which we refer to as the "flag" technique, in which constant 0 bits, constant 1 bits and non-constant MILP variables are each treated properly. With this technique, we are able to find proper assignments to the constant IVs ($IV$) that make the corresponding superpoly ($p(\mathbf{x}, IV)$) non-constant, so proper IVs can now be found with the MILP model rather than by time-consuming trial and summation in the offline phase as in [12,22]. According to our experiments, the flag technique identifies proper non-cube IV assignments with 100% accuracy in most cases. Note that the flag technique partially supports the validity of the two assumptions, since we are able to find proper IVs in all our attacks.

Degree evaluation for going beyond the $|I| + |J| < n$ restriction. To avoid constructing the whole truth table using cube summations, we introduce a new technique that upper-bounds the algebraic degree, denoted by $d$, of the superpoly using the MILP-aided bit-based division property. With the knowledge of its degree $d$ (and the key indices $J$), the superpoly can be represented by its $\binom{|J|}{\le d}$ coefficients rather than the whole truth table, where

$\binom{|J|}{\le d}$ is defined as

$$\binom{|J|}{\le d} := \sum_{i=0}^{d} \binom{|J|}{i}. \tag{1}$$
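The saving promised by Eq. (1) can be checked numerically. The following is a minimal sketch, not part of the original attack code: the helper names `binom_le` and `offline_log2_cost` are hypothetical, and the parameters (a 61-dimensional cube, 23 involved key bits, degree bound 9) are chosen purely for illustration.

```python
from math import comb, log2

def binom_le(j_size: int, d: int) -> int:
    """Number of monomials of degree at most d in j_size variables, Eq. (1)."""
    return sum(comb(j_size, i) for i in range(min(d, j_size) + 1))

def offline_log2_cost(cube_dim: int, j_size: int, d: int) -> float:
    """log2 of 2^|I| * binom(|J|, <= d): offline cost when a degree bound d is known."""
    return cube_dim + log2(binom_le(j_size, d))

# Illustrative parameters only; d = 9 is an assumption made for this sketch.
cube_dim, j_size, d = 61, 23, 9
truth_table_cost = cube_dim + j_size          # log2 of 2^(|I|+|J|)
bounded_cost = offline_log2_cost(cube_dim, j_size, d)
print(f"truth table: 2^{truth_table_cost}, degree-bounded: 2^{bounded_cost:.2f}")
```

When $d = |J|$ the helper returns $2^{|J|}$, so the two costs coincide, as stated in the text.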

When $d = |J|$, the complexity of our new method equals that of [12]. For $d < |J|$, we know that the coefficients of the monomials of degree higher than $d$ are constantly 0, so the complexity of superpoly recovery is reduced from $2^{|I|+|J|}$ to $2^{|I|} \times \binom{|J|}{\le d}$. In fact, for some lightweight ciphers, the algebraic degrees of the round functions are quite low. Therefore, the degree $d$ is often much smaller than the number of involved key bits $|J|$, especially when high-dimensional cubes are used. Since $d \ll |J|$ for all previous attacks, we can improve the complexities of previous results and use larger cubes to attack more rounds even if $|I| + |J| \ge n$.

Precise term enumeration for further lowering complexities. Since the superpolies are generated through iterations, the number of higher-degree monomials in the superpoly is usually much smaller than the number of low-degree ones. For example, when the degree of the superpoly is $d < |J|$, the number of degree-$d$ monomials is usually much smaller than the upper bound $\binom{|J|}{d}$. We propose an MILP modeling technique for enumerating all degree-$t$ ($t = 1, \ldots, d$) monomials that may appear in the superpoly, so that the complexities of several attacks are further reduced.

Relaxed term enumeration. For some primitives (such as 750-round Acorn), our MILP model can only enumerate the degree-$d$ monomials, since the lower-degree monomials are too numerous to be enumerated exhaustively. Alternatively, for $t = 1, \ldots, d-1$, we can find a set of key indices $JR_t \subseteq J$ s.t. all degree-$t$ monomials in the superpoly are composed of $x_j$, $j \in JR_t$. As long as $|JR_t| < |J|$ for some $t = 1, \ldots, d-1$, we can still reduce the complexity of superpoly recovery.

Combining the flag technique and the degree evaluation above, we are able to lower the complexities of the previous best cube attacks in [6,12,22]. In particular, we can further provide key recovery results on 839-round Trivium², 891-round Kreyvium, 184-round Grain-128a, and 750-round Acorn. Furthermore, the precise and relaxed term enumeration techniques allow us to lower the complexities for 833-round Trivium, 849-round Kreyvium, 184-round Grain-128a and 750-round Acorn. Our concrete results are summarized in Table 1.³ In [26], Todo et al. revisit the fast correlation attack and analyze the key-stream generator (rather than the initialization) of the Grain family (Grain-128a, Grain-128, and Grain-v1). As a result, the key-stream generators of the Grain family are insecure. In other words, the internal state after initialization can be recovered more efficiently than by exhaustive search, and the secret key is then recovered from the internal state because the initialization is a public permutation. To the best of our knowledge, all our results on Kreyvium, Grain-128a, and Acorn are the current best key recovery attacks on the initialization of the targeted ciphers. However, none of our results seems to threaten the security of the ciphers.

Clique view of the superpoly recovery. In order to lower the complexity of the superpoly recovery, the term enumeration technique has to execute many MILP instances, which is difficult for some applications. We represent the resulting superpoly as a graph, so that we can use the clique concept from graph theory to upper-bound the complexity of the superpoly recovery phase without relying on the MILP solver as heavily as the term enumeration technique does.

Organization. Sect. 2 provides the background on cube attacks, the division property, MILP modeling, etc. Sect. 3 introduces our flag technique for identifying proper assignments to non-cube IVs. Sect. 4 details the degree evaluation technique, which upper-bounds the algebraic degree of the superpoly. Combining the flag technique and degree evaluation, we give improved key recovery cube attacks on 4 targeted ciphers in Sect. 5. The precise and relaxed term enumeration techniques as well as their applications are given in Sect. 6. We revisit the term enumeration technique from the clique viewpoint in Sect. 7. Finally, we conclude in Sect. 8.

² While this paper was under submission, Fu et al. released a paper on ePrint [24] and claimed that 855 rounds of the Trivium initialization can be attacked.

³ Because of the page limitation, we put part of the detailed applications to Kreyvium, Grain-128a and Acorn in the full version [25].


Table 1. Summary of our cube attack results

| Applications | #Full Rounds | #Rounds | Cube size | $\lvert J \rvert$ | Complexity | Reference |
|---|---|---|---|---|---|---|
| Trivium | 1152 | 799 | 32 † | - | practical | [4] |
| | | 832 | 72 | 5 | $2^{77}$ | [12,22] |
| | | 833 | 73 | 7 | $2^{76.91}$ | Sect. 6.1 |
| | | 835 | 37/36 ∗ | - | $2^{75}$ | [6] |
| | | 836 | 78 | 1 | $2^{79}$ | Sect. 5.1 |
| | | 839 | 78 | 1 | $2^{79}$ | Sect. 5.1 |
| Kreyvium | 1152 | 849 | 61 | 23 | $2^{84}$ | [22] |
| | | 849 | 61 | 23 | $2^{81.7}$ | Full version [25] |
| | | 849 | 61 | 23 | $2^{73.41}$ | Sect. 6.2 |
| | | 872 | 85 | 39 | $2^{124}$ | [22] |
| | | 872 | 85 | 39 | $2^{94.61}$ | Full version [25] |
| | | 891 | 113 | 20 | $2^{120.73}$ | Full version [25] |
| Grain-128a | 256 | 177 | 33 | - | practical | [27] |
| | | 182 | 88 | 18 | $2^{106}$ | [12,22] |
| | | 182 | 88 | 14 | $2^{102}$ | Full version [25] |
| | | 183 | 92 | 16 | $2^{108}$ | [12,22] |
| | | 183 | 92 | 16 | $2^{108} - 2^{96.08}$ | Full version [25] |
| | | 184 | 95 | 21 | $2^{109.61}$ | Sect. 6.3 |
| Acorn | 1792 | 503 | 5 ‡ | - | practical ‡ | [5] |
| | | 704 | 64 | 58 | $2^{122}$ | [12,22] |
| | | 704 | 64 | 63 | $2^{77.88}$ | Sect. 6.4 |
| | | 750 | 101 | 81 | $2^{125.71}$ | Full version [25] |
| | | 750 | 101 | 81 | $2^{120.92}$ | Sect. 6.4 |

† 18 cubes of sizes from 32 to 37 are used; the most efficient cube for recovering one bit of the secret key is shown.

∗ 28 cubes of sizes 36 and 37 are used, following the correlation cube attack scenario. It requires an additional $2^{51}$ complexity for preprocessing.

‡ The attack against 477 rounds is the one mainly described as the practical attack in [5]. However, when the goal is superpoly recovery and the recovery of one bit of the secret key, 503 rounds are attacked.

2 Preliminaries

2.1 Mixed Integer Linear Programming

MILP is an optimization or feasibility program whose variables are restricted to integers. An MILP model $\mathcal{M}$ consists of variables $\mathcal{M}.var$, constraints $\mathcal{M}.con$, and an objective function $\mathcal{M}.obj$. MILP models can be solved by solvers like Gurobi [17]. If there is no feasible solution at all, the solver simply returns infeasible. If no objective function is assigned, the MILP solver only evaluates the feasibility of the model. The application of MILP models to cryptanalysis dates back to 2011 [28], and MILP has since been widely used for searching characteristics for various methods, such as differential [29,30], linear [30], impossible differential [31,32], zero-correlation linear [31], and integral characteristics with the division property [16]. We will detail the MILP model of [16] later in this section.

2.2 Cube Attack

Consider a stream cipher with $n$ secret key bits $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $m$ public initialization vector (IV) bits $\mathbf{v} = (v_1, v_2, \ldots, v_m)$. Then, the first output keystream bit can be regarded as a polynomial in $\mathbf{x}$ and $\mathbf{v}$, referred to as $f(\mathbf{x}, \mathbf{v})$. For a set of indices $I = \{i_1, i_2, \ldots, i_{|I|}\} \subset \{1, 2, \ldots, m\}$, referred to as the cube indices, let $t_I$ denote the monomial $t_I = v_{i_1} \cdots v_{i_{|I|}}$. The algebraic normal form (ANF) of $f(\mathbf{x}, \mathbf{v})$ can be uniquely decomposed as

$$f(\mathbf{x}, \mathbf{v}) = t_I \cdot p(\mathbf{x}, \mathbf{v}) + q(\mathbf{x}, \mathbf{v}),$$

where each monomial of $q(\mathbf{x}, \mathbf{v})$ misses at least one variable from $\{v_{i_1}, v_{i_2}, \ldots, v_{i_{|I|}}\}$. Furthermore, $p(\mathbf{x}, \mathbf{v})$, referred to as the superpoly in [1], is independent of $\{v_{i_1}, v_{i_2}, \ldots, v_{i_{|I|}}\}$. The value of $p(\mathbf{x}, \mathbf{v})$ can only be affected by the secret key bits $\mathbf{x}$ and the assignment to the non-cube IV bits $v_s$ ($s \notin I$). For a secret key $\mathbf{x}$ and an assignment to the non-cube IVs $IV \in \mathbb{F}_2^m$, we can define a structure called a cube, denoted by $C_I(IV)$, consisting of $2^{|I|}$ 0-1 vectors as follows:

$$C_I(IV) := \{\mathbf{v} \in \mathbb{F}_2^m : v[i] = 0/1,\ i \in I \ \wedge\ v[s] = IV[s],\ s \notin I\}. \tag{2}$$

It has been proved by Dinur and Shamir [1] that the value of the superpoly $p$ corresponding to the key $\mathbf{x}$ and the non-cube IV assignment $IV$ can be computed by summing over the cube $C_I(IV)$ as follows:

$$p(\mathbf{x}, IV) = \bigoplus_{\mathbf{v} \in C_I(IV)} f(\mathbf{x}, \mathbf{v}). \tag{3}$$

In the remainder of this paper, we refer to the value of the superpoly corresponding to the assignment $IV$ in Eq. (3) as $p_{IV}(\mathbf{x})$ for short. We use $C_I$ to denote the cube corresponding to an arbitrary IV setting in Eq. (2). Since $C_I$ is defined according to $I$, we may also refer to $I$ as the "cube" without causing ambiguity. The size of $I$, denoted by $|I|$, is also referred to as the dimension of the cube.

Note: since the superpoly $p$ is independent of the cube IVs $v_i$, $i \in I$, the value of $IV[i]$, $i \in I$, cannot affect the result of the summation in Eq. (3) at all. Therefore, in Sect. 5, the $IV[i]$'s ($i \in I$) are simply assigned random 0-1 values.
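The cube summation of Eq. (3) can be illustrated on a toy polynomial. The sketch below is an assumption-laden example rather than a real cipher: the function `f`, its three key bits and three IV bits, and the 0-based indexing are all invented for illustration. By construction, the superpoly of $t_I = v_0 v_1$ is $x_0 \oplus x_1 x_2 \oplus v_2 x_1$.

```python
from itertools import product

def f(x, v):
    """Hypothetical toy output bit with 3 key bits x and 3 IV bits v (0-based)."""
    return ((v[0] & v[1] & (x[0] ^ (x[1] & x[2]) ^ (v[2] & x[1])))
            ^ (v[0] & x[2]) ^ (v[1] & v[2]) ^ x[0])

def cube_sum(x, cube, iv):
    """XOR f over the cube C_I(IV): cube bits take all values, the others follow iv."""
    acc = 0
    for bits in product((0, 1), repeat=len(cube)):
        v = list(iv)
        for idx, b in zip(cube, bits):
            v[idx] = b
        acc ^= f(x, v)
    return acc

def superpoly(x, iv):
    """Superpoly of t_I = v0*v1, read off from the definition of f above."""
    return x[0] ^ (x[1] & x[2]) ^ (iv[2] & x[1])

cube = (0, 1)
matches = all(cube_sum(x, cube, iv) == superpoly(x, iv)
              for x in product((0, 1), repeat=3)
              for iv in product((0, 1), repeat=3))
print("cube sum equals superpoly for every key and IV:", matches)
```

Note that the entries of `iv` at the cube positions are overwritten inside `cube_sum`, which mirrors the remark that $IV[i]$, $i \in I$, has no influence on the sum.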

2.3 Bit-Based Division Property and its MILP Representation

In 2015, the division property, a generalization of the integral property, was proposed in [13], and it has led to better integral characteristics for word-oriented cryptographic primitives. Later, the bit-based division property was introduced in [15] so that the propagation of integral characteristics can be described in a more precise manner. The definition of the bit-based division property is as follows:


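Definition 1 ((Bit-Based) Division Property). Let $\mathbb{X}$ be a multiset whose elements take a value of $\mathbb{F}_2^n$. Let $\mathbb{K}$ be a set whose elements take an $n$-dimensional bit vector. When the multiset $\mathbb{X}$ has the division property $\mathcal{D}^{1^n}_{\mathbb{K}}$, it fulfils the following conditions:

$$\bigoplus_{\mathbf{x} \in \mathbb{X}} \mathbf{x}^{\mathbf{u}} = \begin{cases} \text{unknown} & \text{if there exists } \mathbf{k} \in \mathbb{K} \text{ s.t. } \mathbf{u} \succeq \mathbf{k}, \\ 0 & \text{otherwise,} \end{cases}$$

where $\mathbf{u} \succeq \mathbf{k}$ if $u_i \ge k_i$ for all $i$, and $\mathbf{x}^{\mathbf{u}} = \prod_{i=1}^{n} x_i^{u_i}$.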

When the basic bitwise operations COPY, XOR and AND are applied to the elements in $\mathbb{X}$, the division property must be transformed following the corresponding propagation rules (copy, xor, and) proved in [13,15]. Since round functions of cryptographic primitives are combinations of bitwise operations, we only need to determine the division property of the chosen plaintexts, denoted by $\mathcal{D}^{1^n}_{\mathbb{K}_0}$. Then, after $r$-round encryption, the division property of the output ciphertexts, denoted by $\mathcal{D}^{1^n}_{\mathbb{K}_r}$, can be deduced according to the round function and the propagation rules. More specifically, when the plaintext bits at index positions $I = \{i_1, i_2, \ldots, i_{|I|}\} \subset \{1, 2, \ldots, n\}$ are active (the active bits traverse all $2^{|I|}$ possible combinations while the other bits are assigned static 0/1 values), the division property of such chosen plaintexts is $\mathcal{D}^{1^n}_{\mathbf{k}}$, where $k_i = 1$ if $i \in I$ and $k_i = 0$ otherwise. Then, the propagation of the division property from $\mathcal{D}^{1^n}_{\mathbf{k}}$ is evaluated as

$$\{\mathbf{k}\} := \mathbb{K}_0 \to \mathbb{K}_1 \to \mathbb{K}_2 \to \cdots \to \mathbb{K}_r,$$

where $\mathcal{D}_{\mathbb{K}_i}$ is the division property after $i$-round propagation. If the division property $\mathbb{K}_r$ does not contain the unit vector $\mathbf{e}_i$, whose only nonzero element is the $i$th one, the $i$th bit of the $r$-round ciphertexts is balanced.

However, as the number of rounds $r$ grows, the size of $\mathbb{K}_r$ expands exponentially towards $O(2^n)$, requiring huge memory resources. So the bit-based division property has only been applied to block ciphers with tiny block sizes, such as Simon32 and Simeck32 [15]. This memory problem has been solved by Xiang et al. using the MILP modeling method.

Propagation of Division Property with MILP. At ASIACRYPT 2016, Xiang et al. introduced the new concept of a division trail, defined as follows:

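Definition 2 (Division Trail [16]). Let us consider the propagation of the division property $\{\mathbf{k}\} \overset{\text{def}}{=} \mathbb{K}_0 \to \mathbb{K}_1 \to \mathbb{K}_2 \to \cdots \to \mathbb{K}_r$. Moreover, for any vector $\mathbf{k}^*_{i+1} \in \mathbb{K}_{i+1}$, there must exist a vector $\mathbf{k}^*_i \in \mathbb{K}_i$ such that $\mathbf{k}^*_i$ can propagate to $\mathbf{k}^*_{i+1}$ by the propagation rule of the division property. Furthermore, for $(\mathbf{k}_0, \mathbf{k}_1, \ldots, \mathbf{k}_r) \in (\mathbb{K}_0 \times \mathbb{K}_1 \times \cdots \times \mathbb{K}_r)$, if $\mathbf{k}_i$ can propagate to $\mathbf{k}_{i+1}$ for all $i \in \{0, 1, \ldots, r-1\}$, we call $(\mathbf{k}_0 \to \mathbf{k}_1 \to \cdots \to \mathbf{k}_r)$ an $r$-round division trail.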

Let $E_k$ be the target $r$-round iterated cipher. Then, if there is a division trail $\mathbf{k}_0 \xrightarrow{E_k} \mathbf{k}_r = \mathbf{e}_j$ ($j = 1, \ldots, n$), the summation of the $j$th bit of the ciphertexts is unknown; otherwise, if there is no division trail s.t. $\mathbf{k}_0 \xrightarrow{E_k} \mathbf{k}_r = \mathbf{e}_j$, we know that the $j$th bit of the ciphertext is balanced (the summation of the $j$th bit is constant 0). Therefore, we have to evaluate all possible division trails to verify whether each bit of the ciphertexts is balanced or not. Xiang et al. proved that the three basic propagation rules of the division property (copy, xor, and) can be translated into variables and constraints of an MILP model. With this method, all possible division trails can be covered by an MILP model $\mathcal{M}$, and the division property of particular output bits can be acquired by analyzing the solutions of $\mathcal{M}$. After Xiang et al.'s work, some simplifications have been made to the MILP descriptions of the three rules in [18,12]. We present the current simplest MILP-based descriptions of copy, xor, and and as follows:

Proposition 1 (MILP Model for COPY [18]). Let $a \xrightarrow{\text{COPY}} (b_1, b_2, \ldots, b_m)$ be a division trail of COPY. The following inequalities are sufficient to describe the propagation of the division property for copy.

$$\begin{cases} \mathcal{M}.var \leftarrow a, b_1, b_2, \ldots, b_m \text{ as binary.} \\ \mathcal{M}.con \leftarrow a = b_1 + b_2 + \cdots + b_m \end{cases}$$

Proposition 2 (MILP Model for XOR [18]). Let $(a_1, a_2, \ldots, a_m) \xrightarrow{\text{XOR}} b$ be a division trail of XOR. The following inequalities are sufficient to describe the propagation of the division property for xor.

$$\begin{cases} \mathcal{M}.var \leftarrow a_1, a_2, \ldots, a_m, b \text{ as binary.} \\ \mathcal{M}.con \leftarrow a_1 + a_2 + \cdots + a_m = b \end{cases}$$

Proposition 3 (MILP Model for AND [12]). Let $(a_1, a_2, \ldots, a_m) \xrightarrow{\text{AND}} b$ be a division trail of AND. The following inequalities are sufficient to describe the propagation of the division property for and.

$$\begin{cases} \mathcal{M}.var \leftarrow a_1, a_2, \ldots, a_m, b \text{ as binary.} \\ \mathcal{M}.con \leftarrow b \ge a_i \text{ for all } i \in \{1, 2, \ldots, m\} \end{cases}$$

Note: Proposition 3 includes redundant propagations of the division property,but they do not affect preciseness of the obtained characteristics [12].
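To make the three constraint systems concrete, the following sketch enumerates by brute force every binary assignment that satisfies the constraints of Propositions 1 to 3 for $m = 2$. No MILP solver is involved; the function names are hypothetical and exist only for this illustration.

```python
from itertools import product

def copy_trails(m):
    """All (a, (b_1..b_m)) with a = b_1 + ... + b_m over binary variables."""
    return [(a, b) for a in (0, 1) for b in product((0, 1), repeat=m)
            if a == sum(b)]

def xor_trails(m):
    """All ((a_1..a_m), b) with a_1 + ... + a_m = b over binary variables."""
    return [(a, b) for a in product((0, 1), repeat=m) for b in (0, 1)
            if sum(a) == b]

def and_trails(m):
    """All ((a_1..a_m), b) with b >= a_i for every i."""
    return [(a, b) for a in product((0, 1), repeat=m) for b in (0, 1)
            if all(b >= ai for ai in a)]

for name, trails in (("COPY", copy_trails(2)),
                     ("XOR", xor_trails(2)),
                     ("AND", and_trails(2))):
    print(name, trails)
```

In this enumeration the AND model admits the trail $(0, 0) \to 1$, which is one of the redundant propagations mentioned in the note above, while the XOR model excludes the input $(1, 1)$ because the sum would exceed 1.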

2.4 The Bit-Based Division Property for Cube Attack

When the number of initialization rounds is not large enough for thorough diffusion, the superpoly $p(\mathbf{x}, \mathbf{v})$ defined in Eq. (2) corresponding to some high-dimensional cube $I$ may not be related to all key bits $x_1, \ldots, x_n$. Instead, there is a set of key indices $J \subseteq \{1, \ldots, n\}$ s.t. for arbitrary $\mathbf{v} \in \mathbb{F}_2^m$, $p(\mathbf{x}, \mathbf{v})$ can only be related to the $x_j$'s ($j \in J$). At CRYPTO 2017, Todo et al. proposed a method for determining such a set $J$ using the bit-based division property [12]. They further showed that, with the knowledge of such $J$, cube attacks can be launched to recover some information related to the secret key bits. More specifically, they proved the following Lemma 1 and Proposition 4.


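Lemma 1. Let $f(\mathbf{x})$ be a polynomial from $\mathbb{F}_2^n$ to $\mathbb{F}_2$ and $a^f_{\mathbf{u}} \in \mathbb{F}_2$ ($\mathbf{u} \in \mathbb{F}_2^n$) be the ANF coefficients of $f(\mathbf{x})$. Let $\mathbf{k}$ be an $n$-dimensional bit vector. Assuming there is no division trail such that $\mathbf{k} \xrightarrow{f} 1$, then $a^f_{\mathbf{u}}$ is always 0 for $\mathbf{u} \succeq \mathbf{k}$.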

Proposition 4. Let $f(\mathbf{x}, \mathbf{v})$ be a polynomial, where $\mathbf{x}$ and $\mathbf{v}$ denote the secret and public variables, respectively. For a set of indices $I = \{i_1, i_2, \ldots, i_{|I|}\} \subset \{1, 2, \ldots, m\}$, let $C_I$ be a set of $2^{|I|}$ values where the variables in $\{v_{i_1}, v_{i_2}, \ldots, v_{i_{|I|}}\}$ take all possible combinations of values. Let $\mathbf{k}_I$ be an $m$-dimensional bit vector such that $\mathbf{v}^{\mathbf{k}_I} = t_I = v_{i_1} v_{i_2} \cdots v_{i_{|I|}}$, i.e. $k_i = 1$ if $i \in I$ and $k_i = 0$ otherwise. Assuming there is no division trail such that $(\mathbf{e}_\lambda, \mathbf{k}_I) \xrightarrow{f} 1$, $x_\lambda$ is not involved in the superpoly of the cube $C_I$.

When $f$ represents the first output bit after the initialization iterations, we can identify $J$ by checking whether there is a division trail $(\mathbf{e}_\lambda, \mathbf{k}_I) \xrightarrow{f} 1$ for $\lambda = 1, \ldots, n$ using the MILP modeling method introduced in Sect. 2.3. If the division trail $(\mathbf{e}_\lambda, \mathbf{k}_I) \xrightarrow{f} 1$ exists, we have $\lambda \in J$; otherwise, $\lambda \notin J$.

When $J$ is determined, we know that for some assignment to the non-cube IVs, $IV \in \mathbb{F}_2^m$, the corresponding superpoly $p_{IV}(\mathbf{x})$ is not constant 0, and it is a polynomial in $x_j$, $j \in J$. With the knowledge of $J$, we recover the superpoly $p_{IV}(\mathbf{x})$ offline by constructing its truth table using the cube summations defined in Eq. (3). As long as $p_{IV}(\mathbf{x})$ is not constant, we can go to the online phase, where we sum over the cube $C_I(IV)$ to get the exact value of $p_{IV}(\mathbf{x})$ and refer to the precomputed truth table to identify the candidate assignments of $x_j$, $j \in J$. We summarize the whole process as follows:

1. Offline Phase: Superpoly Recovery. Randomly pick an $IV \in \mathbb{F}_2^m$ and prepare the cube $C_I(IV)$ defined in Eq. (2). For $\mathbf{x} \in \mathbb{F}_2^n$ whose $x_j$, $j \in J$, traverse all $2^{|J|}$ 0-1 combinations, we compute and store the value of the superpoly $p_{IV}(\mathbf{x})$ as in Eq. (3). The $2^{|J|}$ values compose the truth table of $p_{IV}(\mathbf{x})$, and the ANF of the superpoly is determined accordingly. If $p_{IV}(\mathbf{x})$ is constant, we pick another $IV$ and repeat the steps above until we find an appropriate one s.t. $p_{IV}(\mathbf{x})$ is not constant.

2. Online Phase: Partial Key Recovery. Query the encryption oracle with the cube $C_I(IV)$ and get the summation of the $2^{|I|}$ output bits. We denote the summation by $\lambda \in \mathbb{F}_2$, and we know that $p_{IV}(\mathbf{x}) = \lambda$ according to Eq. (3). So we look up the truth table of the superpoly and keep only the assignments of $x_j$, $j \in J$, s.t. $p_{IV}(\mathbf{x}) = \lambda$.

3. Brute-Force Search. Guess the remaining secret variables to recover the entire secret key.
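The offline and online phases above can be sketched on a toy example. Everything in this block is hypothetical: the 4-bit key, the 3-bit IV, the output function `f`, and the choice $J = \{0, 1, 2\}$ (0-based) are invented so that the superpoly of the cube $\{v_0, v_1\}$ is $x_0 \oplus x_1 x_2$.

```python
from itertools import product

N_KEY = 4          # toy key length (assumption for this sketch)
CUBE = (0, 1)      # cube indices I inside a 3-bit IV
J = (0, 1, 2)      # key indices involved in the superpoly

def f(x, v):
    """Hypothetical toy output bit; x has 4 key bits, v has 3 IV bits."""
    return (v[0] & v[1] & (x[0] ^ (x[1] & x[2]))) ^ (v[2] & x[3]) ^ x[1]

def cube_sum(oracle, iv):
    """XOR the oracle output over the cube C_I(IV)."""
    acc = 0
    for bits in product((0, 1), repeat=len(CUBE)):
        v = list(iv)
        for idx, b in zip(CUBE, bits):
            v[idx] = b
        acc ^= oracle(v)
    return acc

def offline(iv):
    """Truth table of p_IV over the key bits in J (other key bits set to 0)."""
    table = {}
    for xj in product((0, 1), repeat=len(J)):
        x = [0] * N_KEY
        for idx, b in zip(J, xj):
            x[idx] = b
        table[xj] = cube_sum(lambda v: f(x, v), iv)
    return table

def online(table, oracle, iv):
    """Keep the assignments of x_j, j in J, that match the observed cube sum."""
    lam = cube_sum(oracle, iv)
    return [xj for xj, val in table.items() if val == lam]

secret = (1, 0, 1, 1)
iv = (0, 0, 0)
table = offline(iv)                                   # 2^(|I|+|J|) = 32 evaluations
candidates = online(table, lambda v: f(secret, v), iv)
print(len(candidates), "of", len(table), "assignments survive")
```

Because the toy superpoly is balanced, half of the $2^{|J|}$ assignments are filtered out, which corresponds to one bit of key information; the remaining candidates would then go to the brute-force step.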

Phase 1 dominates the time complexity, since it takes $2^{|I|+|J|}$ encryptions to construct the truth table of size $2^{|J|}$. It is also possible that $p_{IV}(\mathbf{x})$ is constant, so we may have to try several different IVs to find the one we need. The attack is meaningful only when (1) $|I| + |J| < n$ and (2) appropriate IVs are easy to find. The former requires the adversary to use "good" cubes $I$ with small $J$, while the latter is exactly the reason why Assumptions 1 and 2 are proposed [12,22].


3 Modeling the Constant Bits to Improve the Preciseness of the MILP Model

In the initial state of stream ciphers, there are secret key bits, public modifiableIV bits and constant 0/1 bits. In the previous MILP model, the initial bit-baseddivision properties of the cube IVs are set to 1, while those of the non-cube IVs,constant state bits or even secret key bits are all set to 0.

Obviously, when constant 0 bits are involved in multiplication operations, the output is always constant 0. But, as pointed out in [22], this phenomenon cannot be reflected in the previous MILP modeling method. In the previous MILP model, the widely used COPY+AND operation

$$\text{COPY+AND}: (s_1, s_2) \to (s_1, s_2, s_1 \wedge s_2) \tag{4}$$

can result in division trails $(x_1, x_2) \xrightarrow{\text{COPY+AND}} (y_1, y_2, a)$ as follows:

$$(1, 0) \xrightarrow{\text{COPY+AND}} (0, 0, 1),$$

$$(0, 1) \xrightarrow{\text{COPY+AND}} (0, 0, 1).$$

Assuming that either $s_1$ or $s_2$ in Eq. (4) is a constant 0 bit, $(s_1 \wedge s_2)$ is always 0. In this case, the division property of $(s_1 \wedge s_2)$ must be 0, which is overlooked by the previous MILP model. To prohibit the propagations above, an additional constraint $\mathcal{M}.con \leftarrow a = 0$ should be added when either $s_1$ or $s_2$ is constant 0.

In [22], the authors only consider the constant 0 bits. They assumed that the model is precise enough once all the state bits initialized to constant 0 are handled. But in fact, although constant 1 bits do not affect the division property propagation, we should still track them, because constant 0 bits might be generated when an even number of constant 1 bits are XORed during the updating process. This is shown in Example 2 for Kreyvium in App. A [25].

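Therefore, we give every MILP variable $v \in \mathcal{M}.var$ an additional flag $v.F \in \{1_c, 0_c, \delta\}$, where $1_c$ means that the bit is constant 1, $0_c$ means that it is constant 0, and $\delta$ means that it is variable. Clearly, when $v.F = 0_c/1_c$, there is always a constraint $v = 0 \in \mathcal{M}.con$. We define the $=$, $\oplus$ and $\times$ operations for the elements of the set $\{1_c, 0_c, \delta\}$. The $=$ operation tests whether two elements are equal (naturally $1_c = 1_c$, $0_c = 0_c$ and $\delta = \delta$). The $\oplus$ operation follows the rules:

$$\begin{cases} 1_c \oplus 1_c = 0_c \\ 0_c \oplus x = x \oplus 0_c = x \\ \delta \oplus x = x \oplus \delta = \delta \end{cases} \quad \text{for arbitrary } x \in \{1_c, 0_c, \delta\}. \tag{5}$$

The $\times$ operation follows the rules:

$$\begin{cases} 1_c \times x = x \times 1_c = x \\ 0_c \times x = x \times 0_c = 0_c \\ \delta \times \delta = \delta \end{cases} \quad \text{for arbitrary } x \in \{1_c, 0_c, \delta\}. \tag{6}$$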
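The flag arithmetic of Eqs. (5) and (6) is small enough to be written out directly. The sketch below is one possible encoding, with the string constants and function names chosen here for illustration only; where two rules overlap (for example $0_c \oplus \delta$), they agree, so the order of the checks does not matter.

```python
from functools import reduce

ONE, ZERO, VAR = "1c", "0c", "delta"   # the flag values 1_c, 0_c and delta

def flag_xor(a, b):
    """Flag of a XOR b, following Eq. (5)."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    if a == ONE and b == ONE:
        return ZERO
    return VAR            # at least one operand is delta

def flag_and(a, b):
    """Flag of a AND b, following Eq. (6)."""
    if a == ZERO or b == ZERO:
        return ZERO
    if a == ONE:
        return b
    if b == ONE:
        return a
    return VAR            # delta x delta

# XOR of an even number of constant 1 bits yields a constant 0 bit.
print(reduce(flag_xor, [ONE, ONE, ZERO]))
# A product containing a constant 0 bit is constant 0.
print(reduce(flag_and, [VAR, ONE, ZERO]))
```

The first printed case is exactly the situation the text warns about: a constant 0 that only appears after two constant 1 bits have been XORed.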


Therefore, in the remainder of this paper, the MILP models for COPY, XOR and AND also have to consider the effects of flags, so the previous copy, xor, and and rules are extended with flag assignments. We denote the modified versions by copyf, xorf, and andf and define them in Propositions 5, 6 and 7 as follows.

Proposition 5 (MILP Model for COPY with Flag). Let $a \xrightarrow{\text{COPY}} (b_1, b_2, \ldots, b_m)$ be a division trail of COPY. The following inequalities are sufficient to describe the propagation of the division property for copyf.

$$\begin{cases} \mathcal{M}.var \leftarrow a, b_1, b_2, \ldots, b_m \text{ as binary.} \\ \mathcal{M}.con \leftarrow a = b_1 + b_2 + \cdots + b_m \\ a.F = b_1.F = \ldots = b_m.F \end{cases}$$

We denote this process as $(\mathcal{M}, b_1, \ldots, b_m) \leftarrow \text{copyf}(\mathcal{M}, a, m)$.

Algorithm 1 Evaluate secret variables by MILP with Flags

    procedure attackFramework(Cube indices I, specific assignment to non-cube IVs IV or IV = NULL)
        Declare an empty MILP model M
        Declare x as n MILP variables of M corresponding to secret variables
        Declare v as m MILP variables of M corresponding to public variables
        M.con ← v_i = 1 and assign v_i.F = δ for all i ∈ I
        M.con ← v_i = 0 for all i ∈ ({1, 2, ..., m} − I)
        M.con ← Σ_{i=1}^{n} x_i = 1 and assign x_i.F = δ for all i ∈ {1, ..., n}
        if IV = NULL then
            v_i.F = δ for all i ∈ ({1, 2, ..., m} − I)
        else
            Assign the flags of v_i, i ∈ ({1, 2, ..., m} − I), as:
                v_i.F = 1_c if IV[i] = 1
                v_i.F = 0_c if IV[i] = 0
        end if
        Update M according to round functions and output functions
        do
            solve MILP model M
            if M is feasible then
                pick index j ∈ {1, 2, ..., n} s.t. x_j = 1
                J = J ∪ {j}
                M.con ← x_j = 0
            end if
        while M is feasible
        return J
    end procedure


Proposition 6 (MILP Model for XOR with Flag). Let $(a_1, a_2, \ldots, a_m) \xrightarrow{\text{XOR}} b$ be a division trail of XOR. The following inequalities are sufficient to describe the propagation of the division property for xorf.

$$\begin{cases} \mathcal{M}.var \leftarrow a_1, a_2, \ldots, a_m, b \text{ as binary.} \\ \mathcal{M}.con \leftarrow a_1 + a_2 + \cdots + a_m = b \\ b.F = a_1.F \oplus a_2.F \oplus \cdots \oplus a_m.F \end{cases}$$

We denote this process as $(\mathcal{M}, b) \leftarrow \text{xorf}(\mathcal{M}, a_1, \ldots, a_m)$.

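Proposition 7 (MILP Model for AND with Flag). Let $(a_1, a_2, \ldots, a_m) \xrightarrow{\text{AND}} b$ be a division trail of AND. The following inequalities are sufficient to describe the propagation of the division property for andf.

$$\begin{cases} \mathcal{M}.var \leftarrow a_1, a_2, \ldots, a_m, b \text{ as binary.} \\ \mathcal{M}.con \leftarrow b \ge a_i \text{ for all } i \in \{1, 2, \ldots, m\} \\ b.F = a_1.F \times a_2.F \times \cdots \times a_m.F \\ \mathcal{M}.con \leftarrow b = 0 \text{ if } b.F = 0_c \end{cases}$$

We denote this process as $(\mathcal{M}, b) \leftarrow \text{andf}(\mathcal{M}, a_1, \ldots, a_m)$.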

With these modifications, we are able to improve the preciseness of the MILP model. The improved attack framework is given in Algorithm 1. It enables us to identify the involved key bits when the non-cube IVs are set to specific constant 0/1 values, by imposing the corresponding flags on the non-cube MILP binary variables. With this method, we can determine an $IV \in \mathbb{F}_2^m$ s.t. the corresponding superpoly $p_{IV}(\mathbf{x}) \ne 0$.

4 Upper Bounding the Degree of the Superpoly

For an $IV \in \mathbb{F}_2^m$ s.t. $p_{IV}(\mathbf{x}) \ne 0$, the ANF of $p_{IV}(\mathbf{x})$ can be represented as

$$p_{IV}(\mathbf{x}) = \sum_{\mathbf{u} \in \mathbb{F}_2^n} a_{\mathbf{u}} \mathbf{x}^{\mathbf{u}}, \tag{7}$$

where $a_{\mathbf{u}}$ is determined by the values of the non-cube IVs. If the degree of the superpoly is upper bounded by $d$, then $a_{\mathbf{u}} = 0$ for all $\mathbf{u}$ with Hamming weight $hw(\mathbf{u}) > d$. In this case, we no longer have to build the whole truth table to recover the superpoly. Instead, we only need to determine the coefficients $a_{\mathbf{u}}$ for $hw(\mathbf{u}) \le d$. Therefore, we select $\sum_{i=0}^{d} \binom{|J|}{i}$ different $\mathbf{x}$'s and construct a linear system with $\sum_{i=0}^{d} \binom{|J|}{i}$ variables; the coefficients, and hence the whole ANF of $p_{IV}(\mathbf{x})$, can be recovered by solving this linear system. So the complexity of Phase 1 can be reduced from $2^{|I|+|J|}$ to $2^{|I|} \times \sum_{i=0}^{d} \binom{|J|}{i}$. For simplicity of notation, we denote the summation $\sum_{i=0}^{d} \binom{|J|}{i}$ by $\binom{|J|}{\le d}$ in the remainder of this paper. With the knowledge of the involved key indices $J = \{j_1, j_2, \ldots, j_{|J|}\}$ and the degree of the superpoly $d = \deg p_{IV}(\mathbf{x})$, the attack procedure can be adapted as follows:


1. Offline Phase: Superpoly Recovery. For all $\binom{|J|}{\le d}$ $\mathbf{x}$'s satisfying $hw(\mathbf{x}) \le d$ and $\bigoplus_{j \in J} \mathbf{e}_j \succeq \mathbf{x}$, compute the values of the superpoly $p_{IV}(\mathbf{x})$ by summing over the cube $C_I(IV)$ as in Eq. (3), and generate a linear system in the $\binom{|J|}{\le d}$ coefficients $a_{\mathbf{u}}$ ($hw(\mathbf{u}) \le d$). Solve the linear system, determine the coefficients $a_{\mathbf{u}}$ of the $\binom{|J|}{\le d}$ terms and store them in a lookup table $T$. The ANF of $p_{IV}(\mathbf{x})$ can be determined with the lookup table.

2. Online Phase: Partial Key Recovery. Query the encryption oracle and sum over the cube $C_I(IV)$ as in Eq. (3) to acquire the exact value of $p_{IV}(\mathbf{x})$. For each of the $2^{|J|}$ possible values of $\{x_{j_1}, \ldots, x_{j_{|J|}}\}$, compute the value of the superpoly as in Eq. (7) (the coefficients $a_{\mathbf{u}}$ are acquired by looking up the precomputed table $T$) and identify the correct key candidates.

3. Brute-Force Search Phase. The attacker guesses the remaining secret variables to recover the entire secret key.
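The offline step can be sketched as follows, under the assumption that the superpoly is available as a callable `p` (in the real attack each call would be one cube summation) and that its degree is at most `d`. The linear system is solved here by the observation that it is triangular: $a_{\mathbf{u}}$ equals the XOR of $p(\mathbf{x})$ over all $\mathbf{x} \preceq \mathbf{u}$. The function names and the toy superpoly are hypothetical.

```python
from itertools import combinations, product

def low_weight_supports(n_vars, d):
    """All index sets of size <= d, i.e. the supports of x with hw(x) <= d."""
    for w in range(d + 1):
        yield from combinations(range(n_vars), w)

def recover_anf(p, n_vars, d):
    """Recover the coefficients a_u (hw(u) <= d) of p, assuming deg(p) <= d."""
    values = {}
    for supp in low_weight_supports(n_vars, d):
        x = [1 if i in supp else 0 for i in range(n_vars)]
        values[supp] = p(x)               # one cube summation in the real attack
    coeffs = {}
    for supp in values:
        acc = 0
        for w in range(len(supp) + 1):
            for sub in combinations(supp, w):
                acc ^= values[sub]        # XOR over all points below supp
        coeffs[supp] = acc
    return coeffs

def eval_anf(coeffs, x):
    """Evaluate the recovered ANF at x, as done in the online phase."""
    acc = 0
    for supp, a in coeffs.items():
        if a and all(x[i] for i in supp):
            acc ^= 1
    return acc

def p(x):
    """Hypothetical superpoly of degree 2 in 5 involved key bits."""
    return x[0] ^ (x[1] & x[3]) ^ (x[2] & x[4]) ^ 1

table = recover_anf(p, 5, 2)              # 16 evaluations instead of 2^5 = 32
agrees = all(eval_anf(table, x) == p(x) for x in product((0, 1), repeat=5))
print(len(table), "coefficients recovered; ANF agrees on all inputs:", agrees)
```

If the true degree exceeded `d`, the recovered table would silently be wrong, which is why the degree bound from the MILP model matters.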

The complexity of Phase 1 becomes $2^{|I|} \times \binom{|J|}{\le d}$. Phase 2 now requires $2^{|I|}$ encryptions and $2^{|J|} \times \binom{|J|}{\le d}$ table lookups, so its complexity can be regarded as $2^{|I|} + 2^{|J|} \times \binom{|J|}{\le d}$. The complexity of Phase 3 remains $2^{n-1}$. Therefore, the number of encryptions that a feasible attack requires is

$$\max\left\{ 2^{|I|} \times \binom{|J|}{\le d},\ 2^{|I|} + 2^{|J|} \times \binom{|J|}{\le d} \right\} < 2^n. \tag{8}$$

The previous limitation of $|I| + |J| < n$ is removed.

The knowledge of the algebraic degree of superpolies can greatly improve the efficiency of the cube attack. Therefore, we show how to estimate the algebraic degree of superpolies using the division property. Before introducing the method, we generalize Proposition 4 as follows.

Proposition 8. Let $f(\mathbf{x}, \mathbf{v})$ be a polynomial, where $\mathbf{x}$ and $\mathbf{v}$ denote the secret and public variables, respectively. For a set of indices $I = \{i_1, i_2, \ldots, i_{|I|}\} \subset \{1, 2, \ldots, m\}$, let $C_I$ be a set of $2^{|I|}$ values where the variables in $\{v_{i_1}, v_{i_2}, \ldots, v_{i_{|I|}}\}$ take all possible combinations of values. Let $\mathbf{k}_I$ be an $m$-dimensional bit vector such that $\mathbf{v}^{\mathbf{k}_I} = t_I = v_{i_1} v_{i_2} \cdots v_{i_{|I|}}$. Let $\mathbf{k}_\Lambda$ be an $n$-dimensional bit vector. Assuming there is no division trail such that $(\mathbf{k}_\Lambda \| \mathbf{k}_I) \xrightarrow{f} 1$, the monomial $\mathbf{x}^{\mathbf{k}_\Lambda}$ is not involved in the superpoly of the cube $C_I$.

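Proof. The ANF of $f(\mathbf{x}, \mathbf{v})$ is represented as follows:

$$f(\mathbf{x}, \mathbf{v}) = \bigoplus_{\mathbf{u} \in \mathbb{F}_2^{n+m}} a^f_{\mathbf{u}} \cdot (\mathbf{x} \| \mathbf{v})^{\mathbf{u}},$$

where $a^f_{\mathbf{u}} \in \mathbb{F}_2$ denotes the ANF coefficients. The polynomial $f(\mathbf{x}, \mathbf{v})$ is decomposed into

$$\begin{aligned} f(\mathbf{x}, \mathbf{v}) &= \bigoplus_{\mathbf{u} \in \mathbb{F}_2^{n+m} \mid \mathbf{u} \succeq (\mathbf{0} \| \mathbf{k}_I)} a^f_{\mathbf{u}} \cdot (\mathbf{x} \| \mathbf{v})^{\mathbf{u}} \oplus \bigoplus_{\mathbf{u} \in \mathbb{F}_2^{n+m} \mid \mathbf{u} \not\succeq (\mathbf{0} \| \mathbf{k}_I)} a^f_{\mathbf{u}} \cdot (\mathbf{x} \| \mathbf{v})^{\mathbf{u}} \\ &= t_I \cdot \bigoplus_{\mathbf{u} \in \mathbb{F}_2^{n+m} \mid \mathbf{u} \succeq (\mathbf{0} \| \mathbf{k}_I)} a^f_{\mathbf{u}} \cdot (\mathbf{x} \| \mathbf{v})^{\mathbf{u} \oplus (\mathbf{0} \| \mathbf{k}_I)} \oplus \bigoplus_{\mathbf{u} \in \mathbb{F}_2^{n+m} \mid \mathbf{u} \not\succeq (\mathbf{0} \| \mathbf{k}_I)} a^f_{\mathbf{u}} \cdot (\mathbf{x} \| \mathbf{v})^{\mathbf{u}} \\ &= t_I \cdot p(\mathbf{x}, \mathbf{v}) \oplus q(\mathbf{x}, \mathbf{v}). \end{aligned}$$

Therefore, the superpoly $p(\mathbf{x}, \mathbf{v})$ is represented as

$$p(\mathbf{x}, \mathbf{v}) = \bigoplus_{\mathbf{u} \in \mathbb{F}_2^{n+m} \mid \mathbf{u} \succeq (\mathbf{0} \| \mathbf{k}_I)} a^f_{\mathbf{u}} \cdot (\mathbf{x} \| \mathbf{v})^{\mathbf{u} \oplus (\mathbf{0} \| \mathbf{k}_I)}.$$

Since there is no division trail $(\mathbf{k}_\Lambda \| \mathbf{k}_I) \xrightarrow{f} 1$, $a^f_{\mathbf{u}} = 0$ for $\mathbf{u} \succeq (\mathbf{k}_\Lambda \| \mathbf{k}_I)$ because of Lemma 1. Therefore,

$$p(\mathbf{x}, \mathbf{v}) = \bigoplus_{\mathbf{u} \in \mathbb{F}_2^{n+m} \mid \mathbf{u} \succeq (\mathbf{0} \| \mathbf{k}_I),\ \mathbf{u}^{(\mathbf{k}_\Lambda \| \mathbf{0})} = 0} a^f_{\mathbf{u}} \cdot (\mathbf{x} \| \mathbf{v})^{\mathbf{u} \oplus (\mathbf{0} \| \mathbf{k}_I)}.$$

This superpoly is independent of the monomial $\mathbf{x}^{\mathbf{k}_\Lambda}$ since $\mathbf{u}^{(\mathbf{k}_\Lambda \| \mathbf{0})}$ is always 0. □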

Algorithm 2 Evaluate upper bound of algebraic degree on the superpoly

    procedure DegEval(Cube indices I, specific assignment to non-cube IVs IV or IV = NULL)
        Declare an empty MILP model M
        Declare x as n MILP variables of M corresponding to secret variables
        Declare v as m MILP variables of M corresponding to public variables
        M.con ← v_i = 1 and assign the flags v_i.F = δ for all i ∈ I
        M.con ← v_i = 0 for i ∈ ({1, ..., m} − I)
        if IV = NULL then
            Assign the flags v_i.F = δ for i ∈ ({1, ..., m} − I)
        else
            Assign the flags of v_i, i ∈ ({1, 2, ..., m} − I), as:
                v_i.F = 1_c if IV[i] = 1
                v_i.F = 0_c if IV[i] = 0
        end if
        Set the objective function M.obj ← Σ_{i=1}^{n} x_i
        Update M according to round functions and output functions
        Solve MILP model M
        return the solution of M
    end procedure


According to Proposition 8, the existence of the division trail $(\mathbf{k}_\Lambda \| \mathbf{k}_I) \xrightarrow{f} 1$ is a necessary condition for the monomial $\mathbf{x}^{\mathbf{k}_\Lambda}$ to appear in the superpoly of the cube $C_I$.

If there is a $d \ge 0$ s.t. for all $\mathbf{k}_\Lambda$ of Hamming weight $hw(\mathbf{k}_\Lambda) > d$ no division trail $(\mathbf{k}_\Lambda \| \mathbf{k}_I) \xrightarrow{f} 1$ exists, then we know that the algebraic degree of the superpoly is bounded by $d$. Using MILP, this $d$ can be naturally modeled as the maximum of the objective function $\sum_{j=1}^{n} x_j$. With the MILP model $\mathcal{M}$ and the cube indices $I$, we can bound the degree of the superpoly using Algorithm 2. As in Algorithm 1, we can also consider the degree of the superpoly for a specific assignment to the non-cube IVs, so we also add the input $IV$, which can either be a specific assignment or NULL, referring to an arbitrary assignment. The solution $\mathcal{M}.obj = d$ is the upper bound on the algebraic degree of the superpoly. Furthermore, corresponding to $\mathcal{M}.obj = d$ and according to the definition of $\mathcal{M}.obj$, there should also be a set of indices $\{l_1, \ldots, l_d\}$ s.t. the variables representing the initially declared $\mathbf{x}$ (representing the division property of the key bits) satisfy the constraints $x_{l_1} = \ldots = x_{l_d} = 1$. We can also enumerate all degree-$t$ ($1 \le t \le d$) monomials involved in the superpoly using a similar technique, which we will detail in Sect. 6.

5 Applications of Flag Technique and Degree Evaluation

We apply our method to 4 NLFSR-based ciphers, namely Trivium, Kreyvium, Grain-128a and Acorn. Among them, Trivium, Grain-128a and Acorn are also targets of [12]. Using our new techniques, we can both lower the complexities of previous attacks and give new cubes that reach more rounds. We give details of the application to Trivium in this section, and the applications to Kreyvium, Grain-128a and Acorn in our full version [25].

5.1 Specification of Trivium

Trivium is an NLFSR-based stream cipher whose internal state is a 288-bit vector $(s_1, s_2, \ldots, s_{288})$. Fig. 1 shows the state update function of Trivium. The 80-bit key is loaded into the first register, and the 80-bit IV is loaded into the second register. The other state bits are set to 0 except the last three bits of the third register. Namely, the initial state bits are represented as

$(s_1, s_2, \ldots, s_{93}) = (K_1, K_2, \ldots, K_{80}, 0, \ldots, 0)$,

$(s_{94}, s_{95}, \ldots, s_{177}) = (IV_1, IV_2, \ldots, IV_{80}, 0, \ldots, 0)$,

$(s_{178}, s_{179}, \ldots, s_{288}) = (0, 0, \ldots, 0, 1, 1, 1)$.

The pseudo code of the update function is given as follows.

t1 ← s66 ⊕ s93
t2 ← s162 ⊕ s177
t3 ← s243 ⊕ s288
z ← t1 ⊕ t2 ⊕ t3
t1 ← t1 ⊕ s91 · s92 ⊕ s171
t2 ← t2 ⊕ s175 · s176 ⊕ s264
t3 ← t3 ⊕ s286 · s287 ⊕ s69
(s1, s2, . . . , s93) ← (t3, s1, . . . , s92)
(s94, s95, . . . , s177) ← (t1, s94, . . . , s176)
(s178, s179, . . . , s288) ← (t2, s178, . . . , s287)

Fig. 1. Structure of Trivium

Here z denotes the 1-bit key stream. First, in the key initialization, the state is updated 4 × 288 = 1152 times without producing an output. After the key initialization, each state update produces one key stream bit.
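The specification above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names, the `init_rounds` parameter (useful for round-reduced variants) and the convention that `key_bits[0]` is K_1 and `iv_bits[0]` is IV_1 are assumptions made here, and the sketch has not been checked against official test vectors.

```python
def trivium_init(key_bits, iv_bits):
    """Build the initial state; s[0] is unused so that s[i] matches s_i."""
    assert len(key_bits) == 80 and len(iv_bits) == 80
    s = [0] * 289
    s[1:81] = key_bits               # s_1..s_80 = K_1..K_80, s_81..s_93 = 0
    s[94:174] = iv_bits              # s_94..s_173 = IV_1..IV_80, s_174..s_177 = 0
    s[286] = s[287] = s[288] = 1     # last three bits of the third register
    return s


def trivium_update(s):
    """One state update; returns (new_state, z) following the pseudo code."""
    t1 = s[66] ^ s[93]
    t2 = s[162] ^ s[177]
    t3 = s[243] ^ s[288]
    z = t1 ^ t2 ^ t3
    t1 ^= (s[91] & s[92]) ^ s[171]
    t2 ^= (s[175] & s[176]) ^ s[264]
    t3 ^= (s[286] & s[287]) ^ s[69]
    # Shift each register by one position and feed in t3, t1, t2.
    new = [0, t3] + s[1:93] + [t1] + s[94:177] + [t2] + s[178:288]
    return new, z


def trivium_keystream(key_bits, iv_bits, n_bits, init_rounds=1152):
    """Discard init_rounds outputs, then return n_bits key stream bits."""
    s = trivium_init(key_bits, iv_bits)
    for _ in range(init_rounds):
        s, _z = trivium_update(s)
    out = []
    for _ in range(n_bits):
        s, z = trivium_update(s)
        out.append(z)
    return out
```

Calling `trivium_keystream(key, iv, 1, init_rounds=R)` gives the first output bit of the variant whose initialization is reduced to R rounds, which is the setting analysed in this section.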

5.2 MILP Model of Trivium

The only non-linear component of Trivium is a degree-2 core function, denoted $f_{core}$, that takes as input a 288-bit state s and 5 indices $i_1, \ldots, i_5$, and outputs a new 288-bit state $s' \leftarrow f_{core}(s, i_1, \ldots, i_5)$ where

$$s'_i = \begin{cases} s_{i_1} s_{i_2} + s_{i_3} + s_{i_4} + s_{i_5}, & i = i_5 \\ s_i, & \text{otherwise} \end{cases} \qquad (9)$$

The division property propagation for the core function can be represented as Algorithm 3. The input of Algorithm 3 consists of M as the current MILP model, a vector of 288 binary variables x describing the current division property of the 288-bit NFSR state, and 5 indices $i_1, i_2, i_3, i_4, i_5$ corresponding to the input bits. Algorithm 3 then outputs the updated model M and a 288-entry vector y describing the division property after $f_{core}$.

With the definition of Core, the MILP model of R-round Trivium can be described as Algorithm 4. This algorithm is a subroutine of Algorithm 1 for generating the MILP model M, and the model M can evaluate all division trails for Trivium whose initialization rounds are reduced to R. Note that constraints on the input division property are imposed by Algorithm 1.

Algorithm 3 MILP model of division property for the core function (Eq. (9))
1: procedure Core(M, x, $i_1, i_2, i_3, i_4, i_5$)
2:   (M, $y_{i_1}$, $z_1$) ← copyf(M, $x_{i_1}$)
3:   (M, $y_{i_2}$, $z_2$) ← copyf(M, $x_{i_2}$)
4:   (M, $y_{i_3}$, $z_3$) ← copyf(M, $x_{i_3}$)
5:   (M, $y_{i_4}$, $z_4$) ← copyf(M, $x_{i_4}$)
6:   (M, a) ← andf(M, $z_1$, $z_2$)
7:   (M, $y_{i_5}$) ← xorf(M, a, $z_3$, $z_4$, $x_{i_5}$)
8:   for all i ∈ {1, 2, . . . , 288} w/o $i_1, i_2, i_3, i_4, i_5$ do
9:     $y_i = x_i$
10:   end for
11:   return (M, y)
12: end procedure

5.3 Experimental Verification

Identical to [12], we use the cube I = {1, 11, 21, 31, 41, 51, 61, 71} to verify our attack and implementation. The experimental verification includes the degree evaluation using Algorithm 2 and the identification of involved key bits using Algorithm 1 with IV = NULL or specific non-cube IV settings.

Example 1 (Verification of Our Attack against 591-round Trivium). With IV = NULL using Algorithm 1, we are able to identify J = {23, 24, 25, 66, 67}. We know that with some assignment to the non-cube IV bits, the superpoly can be a polynomial of the secret key bits $x_{23}, x_{24}, x_{25}, x_{66}, x_{67}$. These are the same as in [12]. Then, we set IV to random values, acquire the degree through Algorithm 2, and verify the correctness of the degree by practically recovering the corresponding superpoly.

– When we set IV = 0xcc2e487b, 0x78f99a93, 0xbeae and run Algorithm 2, we get the degree 3. The practically recovered superpoly is also of degree 3:

$p_v(x) = x_{66}x_{23}x_{24} + x_{66}x_{25} + x_{66}x_{67} + x_{66}$,

which is in accordance with the deduction by Algorithm 2 through the MILP model.

– When we set IV = 0x61fbe5da, 0x19f5972c, 0x65c1, the degree evaluation of Algorithm 2 is 2. The practically recovered superpoly is also of degree 2:

$p_v(x) = x_{23}x_{24} + x_{25} + x_{67} + 1$.

– When we set IV = 0x5b942db1, 0x83ce1016, 0x6ce, the degree is 0 and the recovered superpoly is also the constant 0.
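The practical recovery of a superpoly used in this verification can be illustrated on a toy function. The sketch below is not the 591-round Trivium experiment: the Boolean function `f`, its three key bits and three IV bits, and all helper names are hypothetical and chosen only so that the superpoly can be checked by hand. It sums `f` over the cube, builds the truth table of the superpoly over all keys, and converts it to ANF with the Moebius transform.

```python
from itertools import product


def f(x, v):
    """Toy function (hypothetical): v0*v1*(v2*x0*x1 + x2) + v0*x1 + x0 over GF(2)."""
    return (v[0] & v[1] & ((v[2] & x[0] & x[1]) ^ x[2])) ^ (v[0] & x[1]) ^ x[0]


def cube_sum(func, x, cube, iv_const):
    """XOR func over all assignments to the cube IV bits; other IV bits stay fixed."""
    acc = 0
    for bits in product((0, 1), repeat=len(cube)):
        v = list(iv_const)
        for idx, b in zip(cube, bits):
            v[idx] = b
        acc ^= func(x, v)
    return acc


def superpoly_anf(func, n_key, cube, iv_const):
    """Return the superpoly as a set of monomials (tuples of key indices)."""
    size = 1 << n_key
    anf = [cube_sum(func, [(a >> i) & 1 for i in range(n_key)], cube, iv_const)
           for a in range(size)]
    # In-place Moebius transform: truth table -> ANF coefficients.
    for i in range(n_key):
        for a in range(size):
            if (a >> i) & 1:
                anf[a] ^= anf[a ^ (1 << i)]
    return {tuple(i for i in range(n_key) if (a >> i) & 1)
            for a in range(size) if anf[a]}


def degree(monomials):
    """Algebraic degree of a polynomial given as a set of monomials."""
    return max((len(m) for m in monomials), default=0)
```

With the cube {v0, v1}, the superpoly of this toy function is v2·x0·x1 + x2, so `superpoly_anf(f, 3, [0, 1], [0, 0, 1])` has degree 2 while `superpoly_anf(f, 3, [0, 1], [0, 0, 0])` has degree 1. This mirrors, on a small scale, how different non-cube IV assignments lead to superpolys of different degrees in Example 1.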


Algorithm 4 MILP model of division property for Trivium
1: procedure TriviumEval(round R)
2:   Prepare empty MILP model M
3:   M.var ← $v_i$ for i ∈ {1, 2, . . . , 80}. ▷ Declare public modifiable IVs
4:   M.var ← $x_i$ for i ∈ {1, 2, . . . , 80}. ▷ Declare secret keys
5:   M.var ← $s^0_i$ for i ∈ {1, 2, . . . , 288}
6:   $s^0_i = x_i$, $s^0_{i+93} = v_i$ for i = 1, . . . , 80.
7:   M.con ← $s^0_i = 0$ for i = 81, . . . , 93, 174, . . . , 288.
8:   $s^0_i.F = 0_c$ for i = 81, . . . , 285 and $s^0_j.F = 1_c$ for j = 286, 287, 288. ▷ Assign the flags for constant state bits
9:   for r = 1 to R do
10:     (M, x) = Core(M, $s^{r-1}$, 66, 171, 91, 92, 93)
11:     (M, y) = Core(M, x, 162, 264, 175, 176, 177)
12:     (M, z) = Core(M, y, 243, 69, 286, 287, 288)
13:     $s^r$ = z ≫ 1
14:   end for
15:   for all i ∈ {1, 2, . . . , 288} w/o 66, 93, 162, 177, 243, 288 do
16:     M.con ← $s^R_i = 0$
17:   end for
18:   M.con ← ($s^R_{66} + s^R_{93} + s^R_{162} + s^R_{177} + s^R_{243} + s^R_{288}$) = 1
19:   return M
20: end procedure

On the accuracy of MILP model with flag technique. As a comparison, we use the cube above and conduct practical experiments on different numbers of rounds, namely 576, 577, 587, 590 and 591 (selected from Table 2 of [22]). We try 10000 randomly chosen IVs. For each of them, we use the MILP method to evaluate the degree d and compare it with the practically recovered ANF of the superpoly $p_{IV}(x)$. For 576, 577, 587 and 590 rounds, the accuracy is 100%. In fact, such 100% accuracy is observed for most of our applied ciphers, as shown in [25]. For 591 rounds, the accuracies are distributed as follows:

1. When the MILP model gives degree evaluation d = 0, the accuracy is 100%, i.e. the superpoly is always the constant 0.

2. When the MILP model gives degree evaluation d = 3, the superpoly is a 3-degree polynomial with accuracy 49%. For the rest, the superpoly is the constant 0.

3. When the MILP model gives degree evaluation d = 2, the superpoly is a 2-degree polynomial with accuracy 43%. For the rest, the superpoly is the constant 0.

The error ratios can easily be understood. For example, in some cases one key bit may be multiplied by the constant 1 in one step ($x_i \cdot 1$) and be canceled by XORing with itself in the next round, which results in a newly generated constant 0 bit ($(x_i \cdot 1) \oplus x_i = 0$). However, by the flag technique, this newly generated bit has flag value $\delta = (\delta \times 1_c) + \delta$. In our attacks, the size of cubes tends to be large, which means most of the IV bits become active, and the above situation of $(x_i \cdot 1) \oplus x_i = 0$ becomes $(x_i \cdot v_j) \oplus x_i$. Therefore, when larger cubes are used, fewer constant 0/1 flags are employed, and the MILP models become closer to those of IV = NULL. It is thus to be expected that the accuracy of the flag technique increases when larger cubes are used. To verify this statement, we construct a 10-dimensional cube I = {5, 13, 18, 22, 30, 57, 60, 65, 72, 79} for 591-round Trivium. When IV = NULL, we acquire the same upper bound on the degree, d = 3. Then, we tried thousands of random IVs and obtained an overall accuracy of 80.9%. From the above, we conclude that the flag technique has high precision and can improve the efficiency of the division property based cube attacks.

5.4 Theoretical Results

The best result in [12] reaches 832-round Trivium with cube dimension |I| = 72, and the superpoly involves |J| = 5 key bits. The complexity is $2^{77}$ in [12]. Using Algorithm 2, we further find that the degree of such a superpoly is 3. So the complexity of superpoly recovery is $2^{72} \times \binom{5}{\le 3} = 2^{76.7}$ and the complexity of recovering the partial key is $2^{72} + 2^{3} \times \binom{5}{3}$. Therefore, according to Eq. (8), the complexity of this attack is $2^{76.7}$.

We further construct a 77-dimensional cube, $I = \{1, \ldots, 80\} \setminus \{5, 51, 65\}$. Its superpoly after 835 rounds of initialization only involves 1 key bit, J = {57}. So the complexity of the attack is $2^{78}$. Since there are only 3 non-cube IVs, we let IV be all $2^3$ possible non-cube IV assignments and run Algorithm 1. We find that $x_{57}$ is involved in all of the $2^3$ superpolys. So the attack is available for any of the $2^3$ non-cube IV assignments. This can also be regarded as support for the rationality of Assumption 1.

According to previous results, Trivium has many cubes whose superpolys only contain 1 key bit. These cubes are of great value for our key recovery attacks. Firstly, the truth table of such a superpoly is balanced, and the Partial Key Recovery phase can definitely recover 1 bit of secret information. Secondly, the Superpoly Recovery phase only requires $2^{|I|+1}$ encryptions and the online Partial Key Recovery only requires $2^{|I|}$ encryptions. Such an attack can be meaningful as long as |I| + 1 < 80, so we can try cubes having dimension as large as 78. Therefore, we investigate 78-dimensional cubes and find that the best cube attack on Trivium reaches 839 rounds. By running Algorithm 1 with $2^2 = 4$ different assignments to the non-cube IVs, we know that the key bit $x_{61}$ is involved in the superpoly for IV = 0x0, 0x4000, 0x0 or IV = 0x0, 0x4002, 0x0. In other words, the 47-th IV bit must be assigned to constant 1. A summary of our new results on Trivium is given in Table 2.

6 Lower Complexity with Term Enumeration

In this section, we show how to further lower the complexity of recovering thesuperpoly (Phase 1) in Sect. 4.

Table 2. Summary of theoretical cube attacks on Trivium. The time complexity in this table shows the time complexity of Superpoly Recovery (Phase 1) and Partial Key Recovery (Phase 2).

#Rounds   |I|    Degree   Involved keys J                      Time complexity
832       72†    3        34, 58, 59, 60, 61 (|J| = 5)         2^76.7
833       73‡    3        49, 58, 60, 74, 75, 76 (|J| = 7)     2^79
833       74∗    1        60 (|J| = 1)                         2^75
835       77⋆    1        57 (|J| = 1)                         2^78
836       78     1        57 (|J| = 1)                         2^79
839       78•    1        61 (|J| = 1)                         2^79

†: I = 1, 2, ..., 65, 67, 69, ..., 79
‡: I = 1, 2, ..., 67, 69, 71, ..., 79
∗: I = 1, 2, ..., 69, 71, 73, ..., 79
⋆: I = 1, 2, 3, 4, 6, 7, ..., 50, 52, 53, ..., 64, 66, 67, ..., 80
836 rounds, |I| = 78: I = 1, ..., 11, 13, ..., 42, 44, ..., 80
•: I = 1, ..., 33, 35, ..., 46, 48, ..., 80 and IV[47] = 1

With cube indices I, key bits J and degree d, the complexity of the current superpoly recovery is $2^{|I|} \times \binom{|J|}{\le d}$, where $\binom{|J|}{\le d}$ corresponds to all 0-, 1-, ..., d-degree monomials. When $d \le |J|/2$ (which is true in most of our applications), we always have $\binom{|J|}{0} \le \ldots \le \binom{|J|}{d}$. But in practice, high-degree terms are generated in later iterations, and the high-degree monomials should be fewer than their low-degree counterparts. Therefore, of all $\binom{|J|}{i}$ monomials, only very few may appear in the superpoly. Similar to Algorithm 1, which determines all key bits that appear in the superpoly, we propose Algorithm 5, which enumerates all t-degree monomials that may appear in the superpoly. When we use t = 1, we get $J_1 = J$, the same output as Algorithm 1, containing all involved keys. If we use t = 2, 3, ..., d, we get $J_2, \ldots, J_d$, which contain all possible monomials of degrees 2, 3, ..., d. Therefore, we only need to determine $1 + |J_1| + |J_2| + \ldots + |J_d|$ coefficients in order to recover the superpoly, and $|J_t| \le \binom{|J|}{t}$ for t = 1, ..., d. With the knowledge of $J_t$, t = 1, ..., d, the complexity of Superpoly Recovery (Phase 1) becomes

$$2^{|I|} \times \Big(1 + \sum_{t=1}^{d} |J_t|\Big) \le 2^{|I|} \times \binom{|J|}{\le d}. \qquad (10)$$

The size of the lookup table is also reduced to $(1 + \sum_{t=1}^{d} |J_t|)$. So the complexity of the attack is now

$$\max\Big\{ 2^{|I|} \times \Big(1 + \sum_{t=1}^{d} |J_t|\Big),\; 2^{|I|} + 2^{|J|} \times \Big(1 + \sum_{t=1}^{d} |J_t|\Big) \Big\}. \qquad (11)$$


Furthermore, since high-degree monomials are harder to generate through iterations than low-degree ones, we can often find $|J_i| < \binom{|J|}{i}$ when i approaches d. So the complexity of superpoly recovery is reduced.

Note: the $J_t$'s (t = 1, ..., d) can be generated by TermEnum of Algorithm 5 and they satisfy the following Property 1. This property is equivalent to the "Embed Property" given in [19].

Property 1. For t = 2, ..., d, if there is $T = (i_1, i_2, \ldots, i_t) \in J_t$ and $T' = (i_{s_1}, \ldots, i_{s_l})$ (l < t) is a subsequence of T ($1 \le s_1 < \ldots < s_l \le t$), then we always have $T' \in J_l$.

Before proving Property 1, we first prove the following Lemma 2.

Lemma 2. If $k \succeq k'$ and there is a division trail $k \xrightarrow{f} l$, then there is also a division trail $k' \xrightarrow{f} l'$ s.t. $l \succeq l'$.

Proof. Since f is a combination of COPY, AND and XOR operations, and the proofs for each of them are similar, we only give a proof of the case where f is COPY. Let $f : (*, \ldots, *, x) \xrightarrow{COPY} (*, \ldots, *, x, x)$.

First assume the input division property is $k = (k_1, 0)$. Since $k \succeq k'$, there must be $k' = (k'_1, 0)$ and $k_1 \succeq k'_1$. We have $l = k$, $l' = k'$, thus the property holds.

When the input division property is $k = (k_1, 1)$, we know that the output division property can be $l \in \{(k_1, 0, 1), (k_1, 1, 0)\}$. Since $k \succeq k'$, we know $k' = (k'_1, 1)$ or $k' = (k'_1, 0)$, and $k_1 \succeq k'_1$. When $k' = (k'_1, 0)$, then $l' = k' = (k'_1, 0)$ and the relation holds. When $k' = (k'_1, 1)$, we know $l' \in \{(k'_1, 0, 1), (k'_1, 1, 0)\}$ and the relation still holds. □
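The statement of Lemma 2 can also be checked exhaustively for the three basic operations. The sketch below assumes the commonly used bit-based division property propagation rules for COPY, AND and XOR (the definitions of copyf, andf and xorf lie outside this section, so these rules and all function names are assumptions), and it treats each operation in isolation rather than embedded in a longer vector.

```python
from itertools import product


def copy_trails(k):
    """COPY x -> (x, x): input weight 1 goes to exactly one of the two copies."""
    return {(0, 0)} if k == (0,) else {(0, 1), (1, 0)}


def and_trails(k):
    """AND (a1, a2) -> b: output is 0 only if both inputs are 0."""
    return {(0,)} if k == (0, 0) else {(1,)}


def xor_trails(k):
    """XOR (a1, a2) -> b: b = a1 + a2, and (1, 1) has no valid trail."""
    return set() if k == (1, 1) else {(k[0] + k[1],)}


def geq(a, b):
    """Componentwise a >= b, i.e. the relation written as a succeq b."""
    return all(x >= y for x, y in zip(a, b))


def lemma2_holds(trails, n_in):
    """For all k >= k' and every trail k -> l, look for a trail k' -> l' with l >= l'."""
    vectors = list(product((0, 1), repeat=n_in))
    for k, k2 in product(vectors, repeat=2):
        if not geq(k, k2):
            continue
        for l in trails(k):
            if not any(geq(l, l2) for l2 in trails(k2)):
                return False
    return True
```

Under these assumed rules, `lemma2_holds(copy_trails, 1)`, `lemma2_holds(and_trails, 2)` and `lemma2_holds(xor_trails, 2)` all evaluate to `True`. For XOR the input (1, 1) has no trail, so the condition is vacuous there.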

Now we are ready to prove Property 1.

Proof. Let $k, k' \in F_2^n$ satisfy $k_i = 1$ for $i \in T$ and $k_i = 0$ otherwise; $k'_i = 1$ for $i \in T'$ and $k'_i = 0$ otherwise. Since $T \in J_t$, we know that there is a division trail $(k, k_I) \xrightarrow{R\text{-Rounds}} (0, 1)$. Since $k \succeq k'$, we have $(k, k_I) \succeq (k', k_I)$ and, according to Lemma 2, there is a division trail $(k', k_I) \xrightarrow{R\text{-Rounds}} (0^{m+n}, s)$ where $(0^{m+n}, 1) \succeq (0^{m+n}, s)$. Since the Hamming weight of $(k', k_I)$ is larger than 0 and there is no combination of COPY, AND and XOR that turns a non-zero division property into an all-zero division property, we have s = 1 and there exists a division trail $(k', k_I) \xrightarrow{R\text{-Rounds}} (0, 1)$. □

Property 1 reveals a limitation of Algorithm 5. Assume the superpoly is

$p_v(x_1, x_2, x_3, x_4) = x_1x_2x_3 + x_1x_4$.

We can acquire $J_3 = \{(1, 2, 3)\}$ by running TermEnum of Algorithm 5. But if we run TermEnum with t = 2, we will not acquire just $J_2 = \{(1, 4)\}$ but $J_2 = \{(1, 4), (1, 2), (1, 3), (2, 3)\}$, because $(1, 2, 3) \in J_3$ and (1, 2), (1, 3), (2, 3) are its subsequences. Although there are still redundant terms, the reduction from $\binom{|J|}{d}$ to $|J_d|$ is usually large enough to improve the existing cube attack results.
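The effect of Property 1 on this example can be reproduced with a few lines of Python. The sketch does not run any MILP model: it only computes, for a known list of superpoly terms, the degree-t index tuples that Property 1 forces into J_t. The function name `embed_closure` is hypothetical, and the actual output of TermEnum may contain further tuples beyond this closure.

```python
from itertools import combinations


def embed_closure(terms, t):
    """All length-t subsequences of the given monomials (sorted index tuples)."""
    closure = set()
    for term in terms:
        if len(term) >= t:
            closure.update(combinations(sorted(term), t))
    return closure


# Superpoly x1*x2*x3 + x1*x4 from the example above.
superpoly_terms = [(1, 2, 3), (1, 4)]
j3 = embed_closure(superpoly_terms, 3)
j2 = embed_closure(superpoly_terms, 2)
j1 = embed_closure(superpoly_terms, 1)
```

Here `j3` contains only (1, 2, 3), while `j2` contains (1, 4) together with the three redundant pairs (1, 2), (1, 3) and (2, 3), matching the sets given in the text.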


Algorithm 5 Enumerate all the terms of degree t

1: procedure TermEnum(Cube indices I, specific assignment to non-cube IVs IV or IV = NULL, targeted degree t)
2:   Declare an empty MILP model M and an empty set $J_t = \phi \subseteq \{1, \ldots, n\}^t$
3:   Declare x as n MILP variables of M corresponding to secret variables.
4:   Declare v as m MILP variables of M corresponding to public variables.
5:   M.con ← $v_i = 1$ and assign $v_i.F = \delta$ for all i ∈ I
6:   M.con ← $v_i = 0$ for all i ∈ ({1, 2, . . . , n} − I)
7:   M.con ← $\sum_{i=1}^{n} x_i = t$ and assign $x_i.F = \delta$ for all i ∈ {1, . . . , n}
8:   if IV = NULL then
9:     $v_i.F = \delta$ for all i ∈ ({1, 2, . . . , n} − I)
10:   else
11:     Assign the flags of $v_i$, i ∈ ({1, 2, . . . , n} − I) as: $v_i.F = 1_c$ if IV[i] = 1, and $v_i.F = 0_c$ if IV[i] = 0
12:   end if
13:   Update M according to round functions and output functions
14:   do
15:     solve MILP model M
16:     if M is feasible then
17:       pick index sequence $(j_1, \ldots, j_t) \subseteq \{1, \ldots, n\}^t$ s.t. $x_{j_1} = \ldots = x_{j_t} = 1$
18:       $J_t = J_t \cup \{(j_1, \ldots, j_t)\}$
19:       M.con ← $\sum_{i=1}^{t} x_{j_i} \le t - 1$
20:     end if
21:   while M is feasible
22:   return $J_t$
23: end procedure

1: procedure RTermEnum(Cube indices I, specific assignment to non-cube IVs IV or IV = NULL, targeted degree t)
2:   Declare an empty MILP model M and an empty set $JR_t = \phi \subseteq \{1, \ldots, n\}$
3:   Declare x as n MILP variables of M corresponding to secret variables.
4:   Declare v as m MILP variables of M corresponding to public variables.
5:   M.con ← $v_i = 1$ and assign $v_i.F = \delta$ for all i ∈ I
6:   M.con ← $v_i = 0$ for all i ∈ ({1, 2, . . . , n} − I)
7:   M.con ← $\sum_{i=1}^{n} x_i \ge t$ and assign $x_i.F = \delta$ for all i ∈ {1, . . . , n}
8:   if IV = NULL then
9:     $v_i.F = \delta$ for all i ∈ ({1, 2, . . . , n} − I)
10:   else
11:     Assign the flags of $v_i$, i ∈ ({1, 2, . . . , n} − I) as: $v_i.F = 1_c$ if IV[i] = 1, and $v_i.F = 0_c$ if IV[i] = 0
12:   end if
13:   Update M according to round functions and output functions
14:   do
15:     solve MILP model M
16:     if M is feasible then
17:       pick index set $\{j_1, \ldots, j_{t'}\} \subseteq \{1, \ldots, n\}$ s.t. $t' \ge t$ and $x_{j_1} = \ldots = x_{j_{t'}} = 1$
18:       $JR_t = JR_t \cup \{j_1, \ldots, j_{t'}\}$
19:       M.con ← $\sum_{i \notin JR_t} x_i \ge 1$
20:     end if
21:   while M is feasible
22:   return $JR_t$
23: end procedure


Applying this term enumeration technique, we are able to lower the complexities of many existing attacks, namely those on 832- and 833-round Trivium, 849-round Kreyvium, 184-round Grain-128a and 704-round Acorn. The attack on 750-round Acorn can also be improved using a relaxed version of TermEnum, which is presented as RTermEnum in Algorithm 5. RTermEnum is obtained from TermEnum by replacing the statements in lines 2, 7 and 17 to 19, and we give details later in Sect. 6.4.

6.1 Application to Trivium

As can be seen in Table 2, the attack on 832-round Trivium has $|J| = |J_1| = 5$ and degree d = 3, so we have $\binom{5}{\le 3} = 26$ using the previous technique. But by running Algorithm 5, we find that $|J_2| = 5$ and $|J_3| = 1$, so we have $1 + \sum_{t=1}^{3} |J_t| = 12 < \binom{5}{\le 3} = 26$. Therefore, the complexity is reduced from $2^{76.7}$ to $2^{75.58}$. A similar technique can also be applied to the 73-dimensional cube of Table 2. Details are shown in Table 3.

Table 3. Results of Trivium with Precise Term Enumeration

#Rounds   |I|   |J1|   |J2|   |J3|   |J4|   |J5|   |Jt|, t ≥ 6   1 + Σ_{t=1}^{d} |Jt|   Previous   Improved
832       72    5      5      1      0      0      0             12 ≈ 2^3.58            2^76.7     2^75.58
833       73    7      6      1      0      0      0             15 ≈ 2^3.91            2^79       2^76.91
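The figures in Table 3 follow from simple arithmetic, which the short Python sketch below reproduces. It only evaluates the Superpoly Recovery term, i.e. the first operand of the maximum in Eq. (11), which is the dominating one for these two cubes. The helper names are chosen here for illustration.

```python
from math import comb, log2


def binom_le(n, d):
    """Number of monomials of degree at most d in n variables."""
    return sum(comb(n, i) for i in range(d + 1))


def log2_degree_bound(cube_dim, n_keys, d):
    """log2 of 2^|I| * binom(|J|, <= d), the bound using only the degree."""
    return cube_dim + log2(binom_le(n_keys, d))


def log2_term_enum(cube_dim, j_sizes):
    """log2 of 2^|I| * (1 + sum |J_t|), the bound after term enumeration."""
    return cube_dim + log2(1 + sum(j_sizes))


# 832-round Trivium: |I| = 72, |J| = 5, d = 3, |J1|, |J2|, |J3| = 5, 5, 1
prev_832 = log2_degree_bound(72, 5, 3)
new_832 = log2_term_enum(72, [5, 5, 1])
# 833-round Trivium: |I| = 73, |J| = 7, d = 3, |J1|, |J2|, |J3| = 7, 6, 1
prev_833 = log2_degree_bound(73, 7, 3)
new_833 = log2_term_enum(73, [7, 6, 1])
```

Rounded to two decimals, these evaluate to 76.70 and 75.58 for 832 rounds and to 79.00 and 76.91 for 833 rounds, in agreement with the "Previous" and "Improved" columns.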

6.2 Applications to Kreyvium

We revisit the 61-dimensional cube first given in [23] and transformed to a key recovery attack on 849-round Kreyvium in [22]. The degree of the superpoly is 9, so the complexity is given as $2^{81.7}$ in App. A of [25]. Since $J = J_1$ is of size 23, we enumerate all the terms of degrees 2 to 9 and acquire the sets $J_2, \ldots, J_9$, which gives $1 + \sum_{t=1}^{d} |J_t| = 5452 \approx 2^{12.41}$. So the complexity is now lowered to $2^{73.41}$. The details are listed in Table 4.

Table 4. Results of Kreyvium with Precise Term Enumeration

#Rounds   |I|   |J1|   |J2|   |J3|   |J4|   |J5|   |J6|   |J7|   |J8|   |J9|   1 + Σ_{t=1}^{d} |Jt|   Previous   Improved
849       61    23     158    555    1162   1518   1235   618    156    26     5452 ≈ 2^12.41         2^81.7     2^73.41

6.3 Applications to Grain-128a

For the attack on 184-round Grain-128a, the superpoly has degree d = 14, the number of involved key bits is $|J| = |J_1| = 21$, and we are able to enumerate all terms of degrees 1 to 14, as listed in Table 5.


Table 5. Results of Grain-128a with Term Enumeration

#Rounds   |I|   |J1|   |Ji| (2 ≤ i ≤ 14)                                                          1 + Σ_{t=1}^{d} |Jt|   Previous   Improved
184       95    21     157, 651, 1765, 3394, 4838, 5231, 4326, 2627, 1288, 442, 104, 15, 1        2^14.61                2^115.95   2^109.61

6.4 Applications to Acorn

For the attack on 704-round Acorn, with cube dimension 64, the number of involved key bits in the superpoly is 72, and the degree is 7. We enumerate all the terms of degrees 2 to 7 as in Table 6 and thereby improve the complexity of our cube attack in the previous section.

Table 6. Results of Acorn with Precise Term Enumeration

#Rounds   |I|   |J1|   |J2|   |J3|   |J4|   |J5|   |J6|   |J7|   1 + Σ_{t=1}^{d} |Jt|   Previous   Improved
704       64    72     1598   4911   5755   2556   179    3      2^13.88                2^93.23    2^77.88

Relaxed Algorithm 5. For the attack on 750-round Acorn (the superpoly is of degree d = 5), TermEnum of Algorithm 5 can only be carried out for the 5-degree terms, giving $|J_5| = 46$. For t = 2, 3, 4, the sizes of $J_t$ are too large to be enumerated. We settle for the index set $JR_t$ containing the key indices that compose all the t-degree terms. For example, when $J_3 = \{(1, 2, 3), (1, 2, 4)\}$, we have $JR_3 = \{1, 2, 3, 4\}$. The relationship between $J_t$ and $JR_t$ is $|J_t| \le \binom{|JR_t|}{t}$ and $J_1 = JR_1$. The search space for $J_t$ in Algorithm 5 is $\binom{|J_1|}{t}$ while that of the relaxed algorithm is only $\binom{|JR_t|}{t}$. So it is much easier to enumerate $JR_t$, and the complexity can still be improved (in comparison with Eq. (8)) as long as $|JR_t| < |J_1|$. The complexity of this relaxed version can be written as

$$\max\Big\{ 2^{|I|} \times \Big(1 + \sum_{t=1}^{d-1} \binom{|JR_t|}{t} + |J_d|\Big),\; 2^{|I|} + 2^{|J|} \times \Big(1 + \sum_{t=1}^{d-1} \binom{|JR_t|}{t} + |J_d|\Big) \Big\} \qquad (12)$$

For 750-round Acorn, we enumerate $J_5$ and $JR_1, \ldots, JR_4$, whose sizes are listed in Table 7. The improved complexity, according to Eq. (12), is $2^{120.92}$, lower than the original $2^{125.71}$ given in App. A of [25].

Table 7. Results of Acorn with Relaxed Term Enumeration

#Rounds   |I|   |JR1|   |JR2|   |JR3|   |JR4|   |J5|   1 + Σ_{t=1}^{d-1} binom(|JRt|, t) + |Jd|   Previous   Improved
750       101   81      81      77      70      46     2^19.92                                    2^125.71   2^120.92
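The entry $2^{19.92}$ in Table 7 can be recomputed from the listed set sizes. The sketch below evaluates the number of candidate coefficients used in Eq. (12) and the resulting Superpoly Recovery term; the function name is chosen here for illustration.

```python
from math import comb, log2


def relaxed_count(jr_sizes, jd_size):
    """1 + sum_{t=1}^{d-1} binom(|JR_t|, t) + |J_d|, with jr_sizes = [|JR_1|, ..., |JR_{d-1}|]."""
    return 1 + sum(comb(size, t) for t, size in enumerate(jr_sizes, start=1)) + jd_size


# 750-round Acorn: |I| = 101, |JR_1..4| = 81, 81, 77, 70 and |J_5| = 46
count = relaxed_count([81, 81, 77, 70], 46)
log2_count = log2(count)
log2_complexity = 101 + log2_count
```

This gives `count = 993413`, i.e. about $2^{19.92}$ candidate coefficients, and a Superpoly Recovery term of about $2^{120.92}$, matching the table.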


7 A Clique View of the Superpoly Recovery

The precise and relaxed term enumeration techniques introduced in Sect. 6 have to execute many MILP instances, which is difficult for some applications. In this section, we represent the resultant superpoly as a graph, called the superpoly graph, so that we can use the clique concept from graph theory to upper bound the complexity of the superpoly recovery phase in our attacks, without relying on the MILP solver as heavily as the term enumeration techniques do.

Definition 3 (Clique [33]). In a graph G = (V, E), where V is the set of vertices and E is the set of edges, a subset C ⊆ V s.t. each pair of vertices in C is connected by an edge is called a clique.

An i-clique is defined as a clique consisting of i vertices, and i is called the clique number. A 1-clique is a vertex, a 2-clique is just an edge, and a 3-clique is called a triangle.

Given a cube $C_I$, by running Algorithm 5 for degree i, we determine $J_i$, which is the set of all the degree-i terms that might appear in the superpoly p(x, v) (see Sect. 6). Then we represent p(x, v) as a graph $G = (J_1, J_2)$, where the vertices in $J_1$ correspond to the involved secret key bits in p(x, v), and the edges between pairs of vertices correspond to the quadratic terms involved in p(x, v). We call the graph $G = (J_1, J_2)$ the superpoly graph of the cube $C_I$. The set of i-cliques in the superpoly graph is denoted as $K_i$. Note that there is a natural one-to-one correspondence between the sets $J_i$ and $K_i$ for i = 1, 2.

It follows from the definition of a clique that any i-clique in $K_i$ (i ≥ 2) represents a monomial of degree i all of whose divisors of degree 2 belong to $J_2$. On the other hand, due to the "embed" Property 1 in Sect. 6, all quadratic divisors of any monomial in $J_i$ must be in $J_2$. Then any monomial in $J_i$ can be represented by an i-clique in $K_i$. Hence for all i ≥ 2, $J_i$ corresponds to a subset of $K_i$. Denote the number of i-cliques as $|K_i|$; then $|J_i| \le |K_i|$. Clearly, $|K_i| \le \binom{|J|}{i}$ for all 1 ≤ i ≤ d.

Now we show a simple algorithm for constructing $K_i$ from $J_1$ and $J_2$ for i ≥ 3. For instance, when constructing $K_3$, we take the union of all possible combinations of three elements from $J_2$, and only keep the elements of degree 3. Similarly, we construct $K_i$ for 3 < i ≤ d, where d is the degree of the superpoly. Therefore, all the i-cliques (3 ≤ i ≤ d) are found by the simple algorithm, i.e. the number of i-cliques $|K_i|$ in $G(J_1, J_2)$ is determined. We can therefore upper bound the complexity of the offline phase as

$$2^{|I|} \times \Big(1 + \sum_{i=1}^{d} |K_i|\Big). \qquad (13)$$
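The construction of the sets $K_i$ can be sketched as follows. This is a brute-force illustration that applies the clique definition directly to every i-subset of vertices, rather than the union-based procedure described above, so it is only practical for small graphs. The function names and the example sets are hypothetical.

```python
from itertools import combinations


def cliques(j1, j2, i):
    """All i-cliques of the superpoly graph G = (j1, j2), as sorted vertex tuples."""
    edges = {frozenset(e) for e in j2}
    if i == 1:
        return {(v,) for v in j1}
    return {c for c in combinations(sorted(j1), i)
            if all(frozenset(pair) in edges for pair in combinations(c, 2))}


def clique_count_bound(j1, j2, d):
    """1 + sum_{i=1}^{d} |K_i|, the factor multiplying 2^|I| in Eq. (13)."""
    return 1 + sum(len(cliques(j1, j2, i)) for i in range(1, d + 1))


# Example: J1 and J2 of the superpoly x1*x2*x3 + x1*x4 from Sect. 6.
j1 = {1, 2, 3, 4}
j2 = {(1, 2), (1, 3), (2, 3), (1, 4)}
k3 = cliques(j1, j2, 3)
bound = clique_count_bound(j1, j2, 3)
```

For this example the only triangle is (1, 2, 3), so `k3` has a single element and `bound` equals 1 + 4 + 4 + 1 = 10, compared with $\binom{4}{\le 3} = 15$ from the degree bound alone.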

Note that we have $|J_i| \le |K_i| \le \binom{|J_1|}{i}$. This indicates that the upper bound on the superpoly recovery given by clique theory in Eq. (13) is better than the one provided by our degree evaluation in Eq. (8), while it is weaker than the one given by our term enumeration techniques in Eq. (10). However, it is unclear whether there exists a specific relation between $|K_i|$ and $\binom{|JR_i|}{i}$ in the relaxed term enumeration technique.


Advantage over the term enumeration techniques. In Sect. 6, when calculating $J_i$ (i ≥ 3) by Algorithm 5, we set the target degree to i and solve the newly generated MILP model to obtain $J_i$, regardless of the knowledge of $J_{i-1}$ we already hold. Moreover, in some cases the MILP solver may take a long time before providing the desired $J_i$. By using clique theory, however, we first acquire $J_1$ and $J_2$, which are needed for the term enumeration method as well. According to the "embed" property, we then make full use of the knowledge of $J_1$ and $J_2$ to construct $K_i$ for i ≥ 3 by an algorithm that only performs simple set operations (such as unions of elements or removal of repeated elements). So hardly any cost is required to find all the $K_i$ (3 ≤ i ≤ d) we want. This significantly reduces the computation cost, since solving MILP models is usually very time-consuming.

8 Conclusion

Algebraic properties of the resultant superpoly of the cube attacks were further studied. We developed a division property based framework of cube attacks enhanced by the flag technique for identifying proper non-cube IV assignments. The relevance of our framework is three-fold. For the first time, it can identify proper non-cube IV assignments of a cube leading to a non-constant superpoly, rather than randomizing trails & summations in the offline phase. Moreover, our model derives an upper bound on the superpoly degree, which can break the |I| + |J| < n barrier and enables us to explore even larger cubes or mount attacks on more rounds. Furthermore, our accurate term enumeration techniques further reduce the complexities of the superpoly recovery, which brought us the current best key recovery attacks on ciphers, namely Trivium, Kreyvium, Grain-128a and Acorn.

Besides, when term enumeration cannot be carried out, we represent the resultant superpoly as a graph. By constructing all the cliques of our superpoly graph, an upper bound on the complexity of the superpoly recovery can be obtained.

Acknowledgements. We would like to thank Christian Rechberger, Elmar Tischhauser, Lorenzo Grassi and Liang Zhong for their fruitful discussions, and the anonymous reviewers for their valuable comments. This work is supported by University of Luxembourg project - FDISC, National Key Research and Development Program of China (Grant No. 2018YFA0306404), National Natural Science Foundation of China (No. 61472250, No. 61672347), Program of Shanghai Academic/Technology Research Leader (No. 16XD1401300), the Research Council KU Leuven: C16/15/058, OT/13/071, the Flemish Government through FWO projects and by European Union's Horizon 2020 research and innovation programme under grant agreement No H2020-MSCA-ITN-2014-643161 ECRYPT-NET.


References

1. Dinur, I., Shamir, A.: Cube attacks on tweakable black box polynomials. In Joux, A., ed.: EUROCRYPT 2009. Volume 5479 of LNCS., Springer (2009) 278–299

2. Aumasson, J., Dinur, I., Meier, W., Shamir, A.: Cube testers and key recovery attacks on reduced-round MD6 and Trivium. In Dunkelman, O., ed.: FSE 2009. Volume 5665 of LNCS., Springer (2009) 1–22

3. Dinur, I., Shamir, A.: Breaking Grain-128 with dynamic cube attacks. In Joux, A., ed.: FSE 2011. Volume 6733 of LNCS., Springer (2011) 167–187

4. Fouque, P., Vannet, T.: Improving key recovery to 784 and 799 rounds of Trivium using optimized cube attacks. In Moriai, S., ed.: FSE 2013. Volume 8424 of LNCS., Springer (2013) 502–517

5. Salam, M.I., Bartlett, H., Dawson, E., Pieprzyk, J., Simpson, L., Wong, K.K.: Investigating cube attacks on the authenticated encryption stream cipher ACORN. In Batten, L., Li, G., eds.: ATIS 2016. Volume 651 of CCIS., Springer (2016) 15–26

6. Liu, M., Yang, J., Wang, W., Lin, D.: Correlation Cube Attacks: From Weak-Key Distinguisher to Key Recovery. In Nielsen, J.B., Rijmen, V., eds.: EUROCRYPT 2018 Part II. Volume 10821 of LNCS., Springer (2018) 715–744

7. Dinur, I., Morawiecki, P., Pieprzyk, J., Srebrny, M., Straus, M.: Cube attacks and cube-attack-like cryptanalysis on the round-reduced Keccak sponge function. In Oswald, E., Fischlin, M., eds.: EUROCRYPT 2015 Part I. Volume 9056 of LNCS., Springer (2015) 733–761

8. Huang, S., Wang, X., Xu, G., Wang, M., Zhao, J.: Conditional Cube Attack on Reduced-Round Keccak Sponge Function. In Coron, J., Nielsen, J.B., eds.: EUROCRYPT 2017 Part II. Volume 10211 of LNCS., Springer (2017) 259–288

9. Li, Z., Bi, W., Dong, X., Wang, X.: Improved conditional cube attacks on Keccak keyed modes with MILP method. In Takagi, T., Peyrin, T., eds.: ASIACRYPT 2017 Part I. Volume 10624 of LNCS., Springer (2017) 99–127

10. Li, Z., Dong, X., Wang, X.: Conditional cube attack on round-reduced ASCON. IACR Trans. Symmetric Cryptol. 2017(1) (2017) 175–202

11. Dong, X., Li, Z., Wang, X., Qin, L.: Cube-like attack on round-reduced initialization of Ketje Sr. IACR Trans. Symmetric Cryptol. 2017(1) (2017) 259–280

12. Todo, Y., Isobe, T., Hao, Y., Meier, W.: Cube attacks on non-blackbox polynomials based on division property. In Katz, J., Shacham, H., eds.: CRYPTO 2017 Part III. Volume 10403 of LNCS., Springer (2017) 250–279

13. Todo, Y.: Structural evaluation by generalized integral property. In Oswald, E., Fischlin, M., eds.: EUROCRYPT 2015 Part I. Volume 9056 of LNCS., Springer (2015) 287–314

14. Todo, Y.: Integral cryptanalysis on full MISTY1. In Gennaro, R., Robshaw, M., eds.: CRYPTO 2015 Part I. Volume 9215 of LNCS., Springer (2015) 413–432

15. Todo, Y., Morii, M.: Bit-based division property and application to SIMON family. In Peyrin, T., ed.: FSE 2016. Volume 9783 of LNCS., Springer (2016) 357–377

16. Xiang, Z., Zhang, W., Bao, Z., Lin, D.: Applying MILP method to searching integral distinguishers based on division property for 6 lightweight block ciphers. In Cheon, J.H., Takagi, T., eds.: ASIACRYPT 2016 Part I. Volume 10031 of LNCS., Springer (2016) 648–678

17. Gu, Z., Rothberg, E., Bixby, R.: Gurobi optimizer. http://www.gurobi.com/

18. Sun, L., Wang, W., Wang, M.: MILP-Aided Bit-Based Division Property for Primitives with Non-Bit-Permutation Linear Layers. Cryptology ePrint Archive, Report 2016/811 (2016) https://eprint.iacr.org/2016/811.

19. Sun, L., Wang, W., Wang, M.: Automatic search of bit-based division property for ARX ciphers and word-based division property. In Takagi, T., Peyrin, T., eds.: ASIACRYPT 2017 Part I. Volume 10624 of LNCS., Springer (2017) 128–157

20. Funabiki, Y., Todo, Y., Isobe, T., Morii, M.: Improved integral attack on HIGHT. In Pieprzyk, J., Suriadi, S., eds.: ACISP 2017 Part I. Volume 10342 of LNCS., Springer (2017) 363–383

21. Wang, Q., Grassi, L., Rechberger, C.: Zero-sum partitions of PHOTON permutations. In Smart, N., ed.: CT-RSA 2018. Volume 10808 of LNCS., Springer (2018)

22. Todo, Y., Isobe, T., Hao, Y., Meier, W.: Cube attacks on non-blackbox polynomials based on division property (full version). Cryptology ePrint Archive, Report 2017/306 (2017) https://eprint.iacr.org/2017/306.

23. Liu, M.: Degree evaluation of NFSR-based cryptosystems. In Katz, J., Shacham, H., eds.: CRYPTO 2017 Part III. Volume 10403 of LNCS., Springer (2017) 227–249

24. Fu, X., Wang, X., Dong, X., Meier, W.: A key-recovery attack on 855-round Trivium. Cryptology ePrint Archive, Report 2018/198 (2018) https://eprint.iacr.org/2018/198.

25. Wang, Q., Hao, Y., Todo, Y., Li, C., Isobe, T., Meier, W.: Improved division property based cube attacks exploiting algebraic properties of superpoly (full version). Cryptology ePrint Archive, Report 2017/1063 (2017) https://eprint.iacr.org/2017/1063.

26. Todo, Y., Isobe, T., Meier, W., Aoki, K., Zhang, B.: Fast correlation attack revisited–cryptanalysis on full Grain-128a, Grain-128, and Grain-v1. CRYPTO 2018 (2018) (accepted).

27. Lehmann, M., Meier, W.: Conditional differential cryptanalysis of Grain-128a. In Pieprzyk, J., Sadeghi, A., Manulis, M., eds.: CANS 2012. Volume 7712 of LNCS., Springer (2012) 1–11

28. Mouha, N., Wang, Q., Gu, D., Preneel, B.: Differential and linear cryptanalysis using mixed-integer linear programming. In Wu, C., Yung, M., Lin, D., eds.: Inscrypt 2011. Volume 7537 of LNCS., Springer (2011) 57–76

29. Sun, S., Hu, L., Wang, P., Qiao, K., Ma, X., Song, L.: Automatic security evaluation and (related-key) differential characteristic search: Application to SIMON, PRESENT, LBlock, DES(L) and other bit-oriented block ciphers. In Sarkar, P., Iwata, T., eds.: ASIACRYPT 2014 Part I. Volume 8873 of LNCS., Springer (2014) 158–178

30. Sun, S., Hu, L., Wang, M., Wang, P., Qiao, K., Ma, X., Shi, D., Song, L., Fu, K.: Towards finding the best characteristics of some bit-oriented block ciphers and automatic enumeration of (related-key) differential and linear characteristics with predefined properties. Cryptology ePrint Archive, Report 2014/747 (2014) https://eprint.iacr.org/2014/747.

31. Cui, T., Jia, K., Fu, K., Chen, S., Wang, M.: New automatic search tool for impossible differentials and zero-correlation linear approximations. Cryptology ePrint Archive, Report 2016/689 (2016) https://eprint.iacr.org/2016/689.

32. Sasaki, Y., Todo, Y.: New impossible differential search tool from design and cryptanalysis aspects - revealing structural properties of several ciphers. In Coron, J., Nielsen, J.B., eds.: EUROCRYPT 2017 Part III. Volume 10212 of LNCS., Springer (2017) 185–215

33. Bondy, J.A., Murty, U.S.R.: Graph theory with applications. Volume 290. Macmillan London (1976)


Chapter 9

Correlation of Quadratic Boolean Functions: Cryptanalysis of All Versions of Full MORUS

Publication data

Danping Shi, Siwei Sun, Yu Sasaki, Chaoyun Li and Lei Hu: Correlation of Quadratic Boolean Functions: Cryptanalysis of All Versions of Full MORUS. Advances in Cryptology-CRYPTO 2019 (II): 180-209, 2019

Contributions

Major contributor of Section 3


Correlation of Quadratic Boolean Functions: Cryptanalysis of All Versions of Full MORUS

Danping Shi^{1,2}, Siwei Sun^{1,2,3,*}, Yu Sasaki^{4}, Chaoyun Li^{5}, and Lei Hu^{1,2,3}

1 State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, China

2 Data Assurance and Communication Security Research Center, Chinese Academy of Sciences, China

3 School of Cyber Security, University of Chinese Academy of Sciences, China
shidanping, sunsiwei, [email protected]

4 NTT Secure Platform Laboratories, Japan
[email protected]

5 imec-COSIC, Dept. Electrical Engineering (ESAT), KU Leuven, Belgium
[email protected]

Abstract. We show that the correlation of any quadratic Boolean function can be read out from its so-called disjoint quadratic form. We further propose a polynomial-time algorithm that can transform an arbitrary quadratic Boolean function into its disjoint quadratic form. With this algorithm, the exact correlation of quadratic Boolean functions can be computed efficiently.

We apply this method to analyze the linear trails of MORUS (one of the seven finalists of the CAESAR competition), which are found with the help of a generic model for linear trails of MORUS-like key-stream generators. In our model, any tool for finding linear trails of block ciphers can be used to search for trails of MORUS-like key-stream generators. As a result, a set of trails with correlation $2^{-38}$ is identified for all versions of full MORUS, while the correlations of the previously published best trails for MORUS-640 and MORUS-1280 are $2^{-73}$ and $2^{-76}$ respectively (ASIACRYPT 2018). This significantly improves the complexity of the attack on MORUS-1280-256 from $2^{152}$ to $2^{76}$. These new trails also lead to the first distinguishing and message-recovery attacks on MORUS-640-128 and MORUS-1280-128 with surprisingly low complexities around $2^{76}$. Moreover, we observe that the condition for exploiting these trails in an attack can be more relaxed than previously thought, which shows that the new trails are superior to previously published ones in terms of both correlation and the number of ciphertext blocks involved.

Keywords: Quadratic Boolean function · Disjoint quadratic form · Correlation attack · CAESAR competition · MORUS · MILP

1 Introduction

The notion of authenticated encryption (AE), which provides both confidentiality and authenticity, was first introduced by Bellare and Namprempre around 2000 [4,5]. It was further developed into the notion of authenticated encryption with associated data (AEAD) [29,30,31] to capture the settings of real-world communication networks, where the authenticity of some public information (e.g., a packet header) must be ensured. Informally, an AEAD is a secret-key scheme involving an encryption algorithm and a decryption algorithm. Its encryption algorithm receives a plaintext or message $M$, associated data $A$, and a secret key $K$, and produces a ciphertext $C$ and a tag $T$. The authenticity of the message and associated data can be checked against the tag $T$. We refer the reader to [29] for a more rigorous treatment of the definition of AEAD.

* The corresponding author

The CAESAR competition (the Competition for Authenticated Encryption: Security, Applicability, and Robustness) was announced at the Early Symmetric-key Crypto workshop 2013 [14] and also on-line at [7]. After several years of intensive analysis and comparison of the 57 submissions, the finalists were announced at FSE 2018. In this work, our target is one of the seven finalists, MORUS [39], which provides three main variants: MORUS-640 with a 128-bit key, and MORUS-1280 with either a 128-bit or a 256-bit key.

Related Work. Apart from the analysis provided by the designers, MORUS has received extensive third-party cryptanalysis. These analyses include differential cryptanalysis [26,33,13], linear cryptanalysis [21], SAT-based cryptanalysis [12], cube cryptanalysis [32,21], state-recovery [19,38] and key-recovery attacks [13], as well as attacks in the nonce-reuse setting [26]. However, these attacks either target round-reduced versions of MORUS or are launched in the nonce-reuse setting, which contradicts the nonce-respecting assumption made by the designers. Therefore, none of these analyses violates the security claims of MORUS.

A major breakthrough in the cryptanalysis of MORUS was made at ASIACRYPT 2018 [2]. In that work, based on rotational-invariant linear approximations, Ashur et al. transferred linear approximations for a state-reduced version of MORUS (named MiniMORUS) to linear approximations for MORUS. Linear approximations in the ciphertext bits with correlation $2^{-73}$ and $2^{-76}$ were identified for MORUS-640 and MORUS-1280 respectively. The approximation of MORUS-1280 leads to distinguishing attacks and message-recovery attacks on the full MORUS-1280 with a 256-bit key. Since it requires about $2^{2 \times 76} = 2^{152}$ encryptions to exploit the correlation, MORUS-1280 with a 128-bit key remains immune to these attacks. Similarly, exploiting the correlation of MORUS-640 requires about $2^{146}$ encryptions, which means MORUS-640-128 is also immune to these attacks.
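The data estimates quoted in this chapter all follow the same rule of thumb: exploiting a linear approximation with correlation $c$ takes on the order of $c^{-2}$ samples. The restatement below ignores constant factors and only reuses figures that appear in the text:

```latex
N \approx c^{-2}:\qquad
c = 2^{-76} \;\Rightarrow\; N \approx 2^{152},\qquad
c = 2^{-73} \;\Rightarrow\; N \approx 2^{146},\qquad
c = 2^{-38} \;\Rightarrow\; N \approx 2^{76}.
```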

Our Contribution. In this work, we investigate the problem of computing the correlation of quadratic Boolean functions. By transforming a quadratic Boolean function into its so-called disjoint quadratic form, we propose, to the best of our knowledge, the first polynomial-time algorithm that can determine the correlation of an arbitrary quadratic Boolean function, while in previous work (e.g., [2]) such correlations are computed with exhaustive or quite ad-hoc approaches, which intrinsically limit their effectiveness.

Equipped with this new weapon, we set out to search for more complex rotational-invariant linear trails of MORUS, and then compute their correlations with the new method. To this end, we set up a model for finding linear trails of MORUS-like key-stream generators, such that most existing search tools can be applied. The model we propose is generic and can be applied to many other schemes, which is of independent interest. Eventually, using an MILP-based approach, we identify trails of all versions of MORUS which lead to a significant improvement over the previous attack on MORUS-1280-256 presented by Ashur et al. [2]. In particular, the complexity is reduced from $2^{152}$ to $2^{76}$. Moreover, these trails result in the first attacks on full MORUS-640 and MORUS-1280 with a 128-bit key. A summary of the results is given in Table 1, from which we can see that the attack is not marginal and the complexities are approaching the boundary of practical attacks. We verify the attacks on a reduced version of MORUS. Also, following Ashur et al.'s approach [2], we verify all trail fragments for all versions of full MORUS.

Along the way, we make an interesting observation that the condition imposed on Ashur et al.'s attack can be relaxed. Specifically, the attacks only require that enough plaintexts with a common prefix of a certain size are encrypted, rather than that the same plaintext is encrypted enough times as stated in [2]. This observation motivates us to find trails involving a smaller number of ciphertext blocks, since the common-prefix assumption does occur in some practical protocols.

At this point, we would like to mention that even after Ashur et al.'s work [2], many researchers were not sure whether MORUS would stay in the competition, given the high complexities of the attacks and the status of MORUS-640-128 and MORUS-1280-128. However, we think that the new attacks breaking all versions of full MORUS with complexity around $2^{76}$ severely shake the confidence in the security of MORUS and deserve more attention. Finally, our technique is purely linear, and most of the attacks presented in our paper are known-plaintext attacks, where we do not rely on any property of the output of the initialization process except its randomness. Hence, it is interesting to see how to improve our analysis by applying the differential-linear framework [20,3].

The exact linear trails we used can be found in an extended version of the paper at https://eprint.iacr.org/2019/172, and the source code is available at https://github.com/siweisun/attack_morus.

Organization. In Sect. 2, we give a brief visualized description of the authenticated encryption scheme MORUS. Then in Sect. 3, we show how to compute the correlation of a quadratic Boolean function by transforming it into the so-called disjoint quadratic form. A generic model for finding linear trails of MORUS-like key-stream generators is constructed in Sect. 4, which is employed in Sect. 5 to search for linear trails of MORUS with high absolute correlations, leading to attacks on all versions of full MORUS. Section 6 discusses the condition of the attacks presented in the previous section and clarifies why trails involving a smaller number of ciphertext blocks are preferred. We propose some open problems and conclude in Sect. 7.

Table 1: Summary of the results, where the span indicates the number of ciphertext blocks involved in the linear approximations. Each row lists one ciphertext-block mask of a trail; the other columns are filled in on the first row of each trail.

| Target | Linear masks of the ciphertext blocks | Span | Absolute correlation | Data | Time | Source |
|---|---|---|---|---|---|---|
| MiniMORUS-640 | 08000000 | 5 | $2^{-16}$ | $2^{32}$ | $2^{32}$ | [2] |
| | 04000105 | | | | | |
| | 8800a002 | | | | | |
| | 00105040 | | | | | |
| | 00080000 | | | | | |
| MiniMORUS-640 | 10000000 | 4 | $2^{-8}$ | $2^{16}$ | $2^{16}$ | Sect. 5 |
| | 08000202 | | | | | |
| | 00004103 | | | | | |
| | 00002000 | | | | | |
| MiniMORUS-1280 | 0008000000000000 | 5 | $2^{-16}$ | $2^{32}$ | $2^{32}$ | [2] |
| | 0080000202000001 | | | | | |
| | 0008406020000090 | | | | | |
| | 0004040000100800 | | | | | |
| | 0000000001000000 | | | | | |
| MiniMORUS-1280 | 0000000000010000 | 4 | $2^{-8}$ | $2^{16}$ | $2^{16}$ | Sect. 5 |
| | 4000000020100000 | | | | | |
| | 0000000220000804 | | | | | |
| | 0000000000008000 | | | | | |
| MORUS-640-128 | 10000000100000001000000010000000 | 4 | $2^{-38}$ | $2^{76}$ | $2^{76}$ | Sect. 5 |
| | 08000202080002020800020208000202 | | | | | |
| | 00004103000041030000410300004103 | | | | | |
| | 00002000000020000000200000002000 | | | | | |
| MORUS-1280-128 | 0000000000010000000000000001000000000000000100000000000000010000 | 4 | $2^{-38}$ | $2^{76}$ | $2^{76}$ | Sect. 5 |
| | 4000000020100000400000002010000040000000201000004000000020100000 | | | | | |
| | 0000000220000804000000022000080400000002200008040000000220000804 | | | | | |
| | 0000000000008000000000000000800000000000000080000000000000008000 | | | | | |
| MORUS-1280-256 | 0008000000000000000800000000000000080000000000000008000000000000 | 5 | $2^{-76}$ | $2^{152}$ | $2^{152}$ | [2] |
| | 0080000202000001008000020200000100800002020000010080000202000001 | | | | | |
| | 0008406020000090000840602000009000084060200000900008406020000090 | | | | | |
| | 0004040000100800000404000010080000040400001008000004040000100800 | | | | | |
| | 0000000001000000000000000100000000000000010000000000000001000000 | | | | | |
| MORUS-1280-256 | 0000000000010000000000000001000000000000000100000000000000010000 | 4 | $2^{-38}$ | $2^{76}$ | $2^{76}$ | Sect. 5 |
| | 4000000020100000400000002010000040000000201000004000000020100000 | | | | | |
| | 0000000220000804000000022000080400000002200008040000000220000804 | | | | | |
| | 0000000000008000000000000000800000000000000080000000000000008000 | | | | | |

2 Specification of MORUS and MiniMORUS

We give a brief description of MORUS and MiniMORUS, which largely follows the notation used by Ashur et al. [2] to facilitate cross-checking.

2.1 MORUS

MORUS is a family of AEAD schemes [39] whose interfaces are shown in Fig. 1. The encryption algorithm of MORUS operates on a $5q$-bit state composed of five $q$-bit registers ($q \in \{128, 256\}$), and each register is divided into four $q/4$-bit words as shown in Fig. 2, where we use $S_{i,j}$ to denote the $j$th bit of the $i$th register $S_i$ of the $5q$-bit state $S$. The three recommended parameter sets of MORUS are listed in Table 2. Note that when the exact key size is not important, we use MORUS-640 and MORUS-1280 to denote the versions with 640-bit state and 1280-bit state, respectively.

[Figure: an AEAD encryption box with inputs Key, Nonce, Message, and Associated Data, and outputs Ciphertext and Authentication Tag]

Fig. 1: The high-level structure of the encryption algorithm of an AEAD scheme

[Figure: the five registers $S_0, \dots, S_4$, each drawn from bit $S_{i,q-1}$ down to bit $S_{i,0}$]

Fig. 2: A view of the MORUS internal state

During the encryption process of MORUS, a function

$$\mathrm{StateUpdate} : \mathbb{F}_2^{5q} \times \mathbb{F}_2^{q} \to \mathbb{F}_2^{5q}$$

is repeatedly executed on the internal state. Each call to the StateUpdate function is called a step. We denote the state at the very beginning of the encryption process by $S^{-16} = S^{-16}_0 \,\|\, S^{-16}_1 \,\|\, S^{-16}_2 \,\|\, S^{-16}_3 \,\|\, S^{-16}_4$. After a series of steps, a sequence of states is produced:

$$S^{-16} \xrightarrow{\mathrm{StateUpdate}} S^{-15} \xrightarrow{\mathrm{StateUpdate}} \cdots \xrightarrow{\mathrm{StateUpdate}} S^{0} \xrightarrow{\mathrm{StateUpdate}} \cdots$$


Table 2: The three variants of MORUS, where the sizes are measured in bits

| Name | State size ($5q$) | Register size ($q$) | Word size ($q/4$) | Key size | Tag size |
|---|---|---|---|---|---|
| MORUS-640-128 | 640 | 128 | 32 | 128 | 128 |
| MORUS-1280-128 | 1280 | 256 | 64 | 128 | 128 |
| MORUS-1280-256 | 1280 | 256 | 64 | 256 | 128 |

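Therefore, we can use the notation $S^t = S^t_0 \,\|\, S^t_1 \,\|\, S^t_2 \,\|\, S^t_3 \,\|\, S^t_4$ to reference the state at step $t$. The details of the StateUpdate function are given by the following equations:

$$
\begin{aligned}
S^{t+1}_0 &\leftarrow (S^t_0 \oplus (S^t_1 \cdot S^t_2) \oplus S^t_3) \lll_w b_0, & S^t_3 &\leftarrow S^t_3 \lll b'_0,\\
S^{t+1}_1 &\leftarrow (S^t_1 \oplus (S^t_2 \cdot S^t_3) \oplus S^t_4 \oplus m_i) \lll_w b_1, & S^t_4 &\leftarrow S^t_4 \lll b'_1,\\
S^{t+1}_2 &\leftarrow (S^t_2 \oplus (S^t_3 \cdot S^t_4) \oplus S^t_0 \oplus m_i) \lll_w b_2, & S^t_0 &\leftarrow S^t_0 \lll b'_2,\\
S^{t+1}_3 &\leftarrow (S^t_3 \oplus (S^t_4 \cdot S^t_0) \oplus S^t_1 \oplus m_i) \lll_w b_3, & S^t_1 &\leftarrow S^t_1 \lll b'_3,\\
S^{t+1}_4 &\leftarrow (S^t_4 \oplus (S^t_0 \cdot S^t_1) \oplus S^t_2 \oplus m_i) \lll_w b_4, & S^t_2 &\leftarrow S^t_2 \lll b'_4,
\end{aligned}
$$

where $\lll_w b_i$ means rotation inside every $w$-bit ($w = q/4$) word of the register to the left by $b_i$ bits, and $\lll$ is the ordinary left bitwise rotation operation. The concrete values for the rotation offsets are listed in Table 3, and we refer the reader to Fig. 3 for a visualization of the StateUpdate function.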

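[Figure: the five rounds of StateUpdate acting on the registers $S_0, \dots, S_4$; round $i$ applies the word-wise rotation $\lll_w b_i$ and the register rotation $\lll b'_i$, and the message block $M$ enters four of the five rounds]

Fig. 3: The StateUpdate function of MORUS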

The encryption algorithm of MORUS can be divided into four phases. A visualized description of the encryption algorithm of MORUS without the finalization phase can be found in Fig. 4.


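Table 3: Rotation constants $b_i$ for $\lll_w$ and $b'_i$ for $\lll$ in round $i$ of StateUpdate

| Cipher | $b_0$ | $b_1$ | $b_2$ | $b_3$ | $b_4$ | $b'_0$ | $b'_1$ | $b'_2$ | $b'_3$ | $b'_4$ |
|---|---|---|---|---|---|---|---|---|---|---|
| MORUS-640-128 | 5 | 31 | 7 | 22 | 13 | 32 | 64 | 96 | 64 | 32 |
| MORUS-1280-128 | 13 | 46 | 38 | 7 | 4 | 64 | 128 | 192 | 128 | 64 |
| MORUS-1280-256 | 13 | 46 | 38 | 7 | 4 | 64 | 128 | 192 | 128 | 64 |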
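To make the StateUpdate equations and the constants of Table 3 concrete, here is a minimal Python sketch for MORUS-640. It is an illustration, not the reference implementation: registers are modelled as Python integers, word $k$ is assumed to occupy bits $32k$ to $32k+31$, and the five rounds are assumed to run in order with each round seeing the registers already modified by earlier rounds. The names `rotl`, `rotl_words`, and `state_update` are hypothetical.

```python
Q_BITS, W_BITS = 128, 32                 # register size q and word size w = q/4
B = (5, 31, 7, 22, 13)                   # b_0..b_4 for MORUS-640-128 (Table 3)
B_PRIME = (32, 64, 96, 64, 32)           # b'_0..b'_4 for MORUS-640-128 (Table 3)


def rotl(x, r, n):
    """Rotate the n-bit integer x to the left by r bits."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def rotl_words(x, r):
    """Rotate every w-bit word of a q-bit register to the left by r bits."""
    out = 0
    for k in range(Q_BITS // W_BITS):
        word = (x >> (k * W_BITS)) & ((1 << W_BITS) - 1)
        out |= rotl(word, r, W_BITS) << (k * W_BITS)
    return out


def state_update(state, m):
    """One step: five rounds applied in order (assumed sequential, in place)."""
    s = list(state)
    for i in range(5):
        j1, j2, j3 = (i + 1) % 5, (i + 2) % 5, (i + 3) % 5
        t = s[i] ^ (s[j1] & s[j2]) ^ s[j3]
        if i > 0:                        # the message enters rounds 1..4 only
            t ^= m
        s[i] = rotl_words(t, B[i])
        s[j3] = rotl(s[j3], B_PRIME[i], Q_BITS)
    return tuple(s)
```

Under these assumptions, a state and message whose four words are identical in every register are mapped to a state with the same property, which is the rotational invariance that MiniMORUS exploits.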

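[Figure: the initialization phase (the state $S^{-16}$ is loaded from Key, Nonce, and the constants $c_0$, $c_1$, updated by $f$ with zero input, and followed by a key addition to give $S^0$), the associated data processing phase ($f$ applied with $A_0, \dots, A_{u-1}$ to give $S^u$), and the encryption phase ($g$ and $f$ applied with $M_0, M_1, M_2, M_3, \dots$ to produce $C_0, C_1, C_2, C_3, \dots$), where $f(S^t, V^t) = \mathrm{StateUpdate}(S^t, V^t)$ and $g(S^t) = S^t_0 \oplus (S^t_1 \lll b'_2) \oplus (S^t_2 \wedge S^t_3)$]

Fig. 4: The encryption algorithm of MORUS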

Initialization. The initialization of every MORUS instance starts by loading the key and nonce material into the state to produce the starting state $S^{-16}$. The state is then updated by calling StateUpdate 16 times, and finally the key is exclusive-ored into the state to produce the resulting state $S^0$. Let $c_0$ and $c_1$ be two 128-bit constants, and let $N_{128}$, $K_{128}$, and $K_{256}$ denote the 128-bit nonce, 128-bit key, and 256-bit key, respectively. The details of the initialization processes for the different versions of MORUS are given in the following.

MORUS-640-128: $S^{-16} = N_{128} \,\|\, K_{128} \,\|\, 1^{128} \,\|\, c_0 \,\|\, c_1$. Then for $t = -16, -15, \dots, -1$, $S^{t+1} = \mathrm{StateUpdate}(S^t, 0^{128})$. Finally, we set $S^0 \leftarrow S^0_0 \,\|\, (S^0_1 \oplus K_{128}) \,\|\, S^0_2 \,\|\, S^0_3 \,\|\, S^0_4$.

MORUS-1280-128: $S^{-16} = (N_{128} \,\|\, 0^{128}) \,\|\, (K_{128} \,\|\, K_{128}) \,\|\, 1^{256} \,\|\, 0^{256} \,\|\, (c_0 \,\|\, c_1)$. Then for $t = -16, -15, \dots, -1$, $S^{t+1} = \mathrm{StateUpdate}(S^t, 0^{256})$. Finally, we set $S^0 \leftarrow S^0_0 \,\|\, (S^0_1 \oplus (K_{128} \,\|\, K_{128})) \,\|\, S^0_2 \,\|\, S^0_3 \,\|\, S^0_4$.

MORUS-1280-256: $S^{-16} = (N_{128} \,\|\, 0^{128}) \,\|\, K_{256} \,\|\, 1^{256} \,\|\, 0^{256} \,\|\, (c_0 \,\|\, c_1)$. Then for $t = -16, -15, \dots, -1$, $S^{t+1} = \mathrm{StateUpdate}(S^t, 0^{256})$. Finally, we set $S^0 \leftarrow S^0_0 \,\|\, (S^0_1 \oplus K_{256}) \,\|\, S^0_2 \,\|\, S^0_3 \,\|\, S^0_4$.

Associated Data Processing. If there is no associated data, this process is omitted. Otherwise, the associated data is padded with zeros when necessary to form a multiple of the $q$-bit (register size) block. Then the state is updated with the associated data $A$ as $S^{t+1} = \mathrm{StateUpdate}(S^t, A^t)$ for $t = 0, \dots, u-1$, where $u = \lceil |A|/q \rceil$ is the number of $q$-bit blocks of the (padded) associated data $A$.

Encryption. The plaintext is processed in $q$-bit blocks to update the state and generate the ciphertext blocks at the same time. As in associated data processing, the plaintext is padded with zeros if the last block is fractional. For $t = 0, \dots, v-1$, the following is performed:

$$
\begin{aligned}
C^t &= M^t \oplus S^{u+t}_0 \oplus (S^{u+t}_1 \lll b'_2) \oplus (S^{u+t}_2 \wedge S^{u+t}_3),\\
S^{u+t+1} &= \mathrm{StateUpdate}(S^{u+t}, M^t),
\end{aligned}
$$

where $v = \lceil |M|/q \rceil$ is the number of $q$-bit blocks of the padded plaintext.
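The zero padding and the block counts $u = \lceil |A|/q \rceil$ and $v = \lceil |M|/q \rceil$ can be illustrated with a short Python sketch. It assumes byte-aligned inputs and a straightforward byte-to-block mapping, neither of which is specified in the text; the function name `split_blocks` is hypothetical.

```python
def split_blocks(data: bytes, q: int = 128):
    """Zero-pad data to a multiple of q bits and split it into q-bit blocks."""
    block_bytes = q // 8
    count = -(-len(data) // block_bytes)        # ceil(len(data) / block_bytes)
    padded = data.ljust(count * block_bytes, b"\x00")
    return [padded[i * block_bytes:(i + 1) * block_bytes]
            for i in range(count)]
```

For a 17-byte message and $q = 128$, this yields $v = 2$ blocks, the second of which contains one message byte followed by 15 zero bytes.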

Finalization. The authentication tag $T$ is generated in the finalization phase by calling StateUpdate ten more times. Since our attacks do not depend on how the tag is generated, we omit the details.

2.2 MiniMORUS and Rotational Invariance

MiniMORUS, proposed by Ashur et al. [2], is a family of helper constructions derived from MORUS. For every MORUS instance with a $5q$-bit state, there is a MiniMORUS instance with a $5 \cdot (q/4)$-bit state. To be more specific, each register in MiniMORUS contains a single word of $w = q/4$ bits. Therefore, the word-oriented rotations in the StateUpdate function of MORUS are removed in MiniMORUS, and the rotations within words ($\lll_w b_i$) are equivalent to ordinary bit-wise rotations ($\lll b_i$) in MiniMORUS. We refer the reader to Fig. 5 and Fig. 3 for a comparison.

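[Figure: the five rounds of the MiniMORUS StateUpdate acting on the registers $S_0, \dots, S_4$ with the rotations $\lll b_0, \dots, \lll b_4$, the message block $M$ entering four of the five rounds, and the ciphertext output $C$]

Fig. 5: The StateUpdate function of MiniMORUS.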


MiniMORUS can be regarded as a reduced version of MORUS, so it is easier to search for linear trails of MiniMORUS. When a linear trail of MiniMORUS is identified, we can consider the trail for MORUS where the bits involved in every $q/4$-bit register of MiniMORUS are copied into all four $q/4$-bit words of the corresponding register of MORUS. To put it simply, we only consider trails of MORUS involving the same bits within each word of one register. Patterns of this kind are invariant under word-wise rotations. Therefore, the trails for MiniMORUS can be regarded as truncated representations of the trails for MORUS with rotational-invariant patterns. We refer the reader to [2] for more details.
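The copying of a MiniMORUS mask into all four words of a MORUS register can be sketched in a few lines of Python. The hexadecimal string encoding follows the masks listed in Table 1; the helper names `expand_mask` and `is_rotation_invariant` are hypothetical and only serve as an illustration.

```python
def rotl(x, r, n):
    """Rotate the n-bit integer x to the left by r bits."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def expand_mask(mini_mask_hex, words=4):
    """Copy a MiniMORUS word mask into every word of a MORUS register."""
    return mini_mask_hex * words


def is_rotation_invariant(mask_hex, word_bits):
    """Check that the mask is unchanged by every word-wise rotation."""
    n = 4 * len(mask_hex)                       # register size in bits
    mask = int(mask_hex, 16)
    return all(rotl(mask, k * word_bits, n) == mask
               for k in range(n // word_bits))
```

For instance, `expand_mask("08000202")` reproduces the second MORUS-640-128 mask of Table 1, and the result passes `is_rotation_invariant(..., 32)`.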

3 Correlation of Quadratic Boolean Functions

In this section, we give a brief introduction to the necessary background on Boolean functions, prove that the correlation of a quadratic Boolean function can be read out from its disjoint quadratic form, and show how to convert an arbitrary quadratic Boolean function into its so-called disjoint quadratic form with a polynomial-time algorithm.

Let $f : \mathbb{F}_2^n \to \mathbb{F}_2$ be a Boolean function with algebraic normal form (ANF)

$$f(x) = \sum_{u \in \mathbb{F}_2^n} a_u x^u,$$

where $x = (x_1, \dots, x_n)$, $u = (u_1, \dots, u_n)$, $a_u \in \mathbb{F}_2$, and $x^u = \prod_{i=1}^{n} x_i^{u_i}$. The degree of the Boolean function $f$ is defined as

$$\deg(f) = \max_{u \in \mathbb{F}_2^n :\, a_u \neq 0} \mathrm{wt}(u),$$

where $\mathrm{wt}(u)$ is the Hamming weight of $u$.

Definition 1 (Correlation). The correlation of an $n$-variable Boolean function $f$ is $\mathrm{cor}(f) = \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} (-1)^{f(x)}$, and the weight of the correlation is defined as $-\log_2 |\mathrm{cor}(f)|$.
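Definition 1 can be evaluated directly for small $n$. The Python sketch below is an illustration only: the encoding of the ANF as a list of index tuples and the name `correlation` are assumptions, not part of the paper. Its cost grows as $2^n$, which is exactly what the rest of this section avoids for quadratic functions.

```python
from itertools import product


def correlation(anf, n):
    """Brute-force cor(f) for f given as a list of monomials.

    Each monomial is a tuple of 1-based variable indices; the empty
    tuple () stands for the constant term 1.
    """
    total = 0
    for x in product((0, 1), repeat=n):
        value = 0
        for mono in anf:
            value ^= all(x[i - 1] for i in mono)   # monomial evaluates to 0/1
        total += (-1) ** value
    return total / 2 ** n
```

For example, `correlation([(1, 2), (3, 4)], 4)` evaluates $x_1x_2 + x_3x_4$ over all 16 inputs.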

In the following, we use $\mathrm{Var}(f)$ to denote the set of variables involved in the Boolean function $f$. For example, if $h = x_1x_2 + x_1x_3 + 1$ and $g = x_2x_3x_4 + x_3x_4$, then $\mathrm{Var}(h) = \{x_1, x_2, x_3\}$ and $\mathrm{Var}(g) = \{x_2, x_3, x_4\}$. Note that the variables are treated as symbolic objects. A variable $x_i$ is degenerate if it does not appear in the ANF of $f$, i.e., $x_i \notin \mathrm{Var}(f)$. For example, if $f(x_1, x_2, x_3, x_4, x_5) = x_1 + x_2x_3 + x_4$, then $x_5$ is degenerate.

Lemma 1. Let $g(x_1, \dots, x_n) = \sum_{t=1}^{k} f_t$ be a Boolean function such that the $k$ sets $\mathrm{Var}(f_t)$ for $1 \le t \le k$ are mutually disjoint. Then $\mathrm{cor}(g) = \prod_{t=1}^{k} \mathrm{cor}(f_t)$.


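Proof. Let $f_t$ be a Boolean function with $n_t$ variables for $1 \le t \le k$, write $\mathbf{x}_t$ for the vector of these $n_t$ variables, and let $m = n - n_1 - \cdots - n_k$. According to Definition 1, we have

$$
\begin{aligned}
\mathrm{cor}(g) &= \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} (-1)^{g(x)} = \frac{\sum_{x \in \mathbb{F}_2^n} (-1)^{(f_1 + f_2 + \cdots + f_k)(x)}}{2^n}\\
&= \frac{\sum_{\mathbf{x}_1 \in \mathbb{F}_2^{n_1}} (-1)^{f_1(\mathbf{x}_1)}}{2^{n_1}} \cdots \frac{\sum_{\mathbf{x}_k \in \mathbb{F}_2^{n_k}} (-1)^{f_k(\mathbf{x}_k)}}{2^{n_k}} \cdot \frac{\sum_{x \in \mathbb{F}_2^{m}} (-1)^{0}}{2^{m}}\\
&= \prod_{t=1}^{k} \mathrm{cor}(f_t),
\end{aligned}
$$

as desired.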

Example 1. $\mathrm{cor}(x_1x_2 + x_3x_4) = \mathrm{cor}(x_1x_2) \cdot \mathrm{cor}(x_3x_4) = 2^{-2}$.

Corollary 1. Let $f(x_1, \dots, x_n)$ be a Boolean function, and $f = g + x_j$ such that $x_j \notin \mathrm{Var}(g)$ is a separated linear term. Then $\mathrm{cor}(f) = 0$.

Example 2. $\mathrm{cor}(x_1x_2 + x_2x_3x_4 + x_3x_5 + x_6) = \mathrm{cor}(x_1x_2 + x_2x_3x_4 + x_3x_5) \cdot \mathrm{cor}(x_6) = \mathrm{cor}(x_1x_2 + x_2x_3x_4 + x_3x_5) \cdot 0 = 0$.

Lemma 2. Let $f(x, y) = xy + ax + by$ be a Boolean function, where $a, b \in \mathbb{F}_2$ are constants. Then $\mathrm{cor}(f) = (-1)^{ab} \cdot 2^{-1}$.

Proof. The claim follows by checking all four choices of $(a, b)$ against Definition 1.
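The exhaustive check can also be condensed into a single identity. The derivation below is an alternative sketch, not the authors' argument: completing the product shows that $f$ is a translate of $xy$ up to the constant $ab$, and translating the inputs does not change the sum in Definition 1.

```latex
xy + ax + by = (x + b)(y + a) + ab
\quad\Longrightarrow\quad
\mathrm{cor}(f)
= \frac{1}{4} \sum_{x, y \in \mathbb{F}_2} (-1)^{(x+b)(y+a) + ab}
= (-1)^{ab} \cdot \frac{1}{4} \sum_{x', y' \in \mathbb{F}_2} (-1)^{x'y'}
= (-1)^{ab} \cdot 2^{-1}.
```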

Definition 2. Two Boolean functions $f(x)$ and $g(x)$ are called cogredient if there exists an invertible matrix $M$ such that $g(x) = f(xM)$.

Lemma 3. Let $f(x)$ and $g(x)$ be two Boolean functions cogredient to each other. Then $\mathrm{cor}(f) = \mathrm{cor}(g)$.

Proof. Since $f(x)$ and $g(x)$ are cogredient to each other, $g(x) = f(xM)$ for some invertible matrix $M$. The result follows from the following equation:

$$
\mathrm{cor}(g) = \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} (-1)^{g(x)} = \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} (-1)^{f(xM)} = \frac{1}{2^n} \sum_{xM^{-1} \in \mathbb{F}_2^n} (-1)^{f(x)} = \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} (-1)^{f(x)}.
$$

Lemma 3 implies that the correlation of a Boolean function is invariant under an invertible linear transformation of the input variables. Also, it is sufficient to consider functions with constant term 0, since $\mathrm{cor}(f) = -\mathrm{cor}(f + 1)$ for any $f$.


Definition 3 (Quadratic form). A Boolean function $f$ is quadratic if $\deg(f) = 2$. A quadratic Boolean function is called a quadratic form if its constant term is 0. Hence, a quadratic form can be written as

$$f(x_1, \dots, x_n) = \sum_{1 \le i \le j \le n} a_{i,j} x_i x_j = Q_f(x_1, \dots, x_n) + L_f(x_1, \dots, x_n),$$

where $a_{i,j} \in \mathbb{F}_2$, $Q_f$ contains all quadratic terms of $f$, and $L_f$ consists of all linear terms of $f$.

Let $f(x_1, \dots, x_n)$ be a quadratic Boolean function. For $i \in \{1, \dots, n\}$, we use $\sigma(f, x_i)$ to denote the number of terms of $Q_f$ involving the variable $x_i$.

Definition 4 (Disjoint quadratic form). Let $f(x_1, \dots, x_n)$ be a quadratic form. A term $x_ix_j$ of $f$ is a separated quadratic term if $\sigma(f, x_i) = \sigma(f, x_j) = 1$. In particular, $f$ is disjoint if all its quadratic terms are separated quadratic terms.

Example 3. The two functions $x_1x_2 + x_3x_4$ and $x_1x_3 + x_2x_4 + x_2 + x_5$ are both disjoint quadratic forms, while $x_1x_2 + x_2x_3$ is not a disjoint quadratic form.

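Lemma 4. Let $f = x_{i_1}x_{i_2} + \cdots + x_{i_{2k-1}}x_{i_{2k}} + x_{j_1} + \cdots + x_{j_s}$ be a disjoint quadratic form. Then

$$
\mathrm{cor}(f) =
\begin{cases}
(-1)^{\sum_{t=1}^{k} \mathrm{Coef}(x_{i_{2t-1}})\,\mathrm{Coef}(x_{i_{2t}})} \cdot 2^{-k} & \text{if } \{j_1, \dots, j_s\} \subseteq \{i_1, \dots, i_{2k}\},\\
0 & \text{if } \{j_1, \dots, j_s\} \nsubseteq \{i_1, \dots, i_{2k}\},
\end{cases}
$$

where $\mathrm{Coef}(x^u)$ denotes the coefficient of the monomial $x^u$ in the ANF of $f$.

Proof. It follows from Lemma 1, Corollary 1, and Lemma 2.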
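Lemma 4 translates almost literally into code. In the following Python sketch, the representation of a disjoint quadratic form as a list of index pairs plus a set of linear indices, and the name `cor_disjoint`, are illustrative assumptions.

```python
def cor_disjoint(pairs, linear):
    """Correlation of a disjoint quadratic form via Lemma 4.

    pairs  : list of (i, j) index pairs, one per separated term x_i x_j
    linear : set of indices k for which x_k appears as a linear term
    """
    paired = {i for pair in pairs for i in pair}
    if not set(linear) <= paired:
        return 0.0                 # a separated linear term forces cor = 0
    sign = 0
    for i, j in pairs:
        # Coef(x_i) * Coef(x_j) is 1 exactly when both linear terms occur.
        sign ^= (i in linear) and (j in linear)
    return (-1) ** sign * 2.0 ** -len(pairs)
```

For the second function of Example 3, `cor_disjoint([(1, 3), (2, 4)], {2, 5})` returns 0 because $x_5$ is a separated linear term.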

With Lemma 4, it is easy to obtain the correlation of a disjoint quadratic form. In the remainder of this section, we present an efficient algorithm for converting any given quadratic form to a cogredient disjoint quadratic form. Hence, we can efficiently compute the correlation of any given quadratic form. Before diving into the details of the algorithm, we first introduce some useful notation and the subroutines employed in Algorithm 1.

Subroutine 1 (PickIndex). Given a quadratic Boolean function $f(x)$ with $x = (x_1, \dots, x_n)$, $\mathrm{PickIndex}(f)$ returns the index $t$ of $x_t$, where $t$ is the smallest integer $t \in \{1, \dots, n\}$ such that $\sigma(f, x_t) \ge \sigma(f, x_{t'})$ for all $t' \in \{1, \dots, n\}$.

Example 4. Let $n = 3$ and $f(x) = x_1x_2 + x_2x_3 + x_3$. Then $\mathrm{PickIndex}(f) = 2$.

Subroutine 2 (Substitute). Given a Boolean function $f(x) = f(x_1, \dots, x_n)$ and an $n \times n$ invertible matrix $M$, $\mathrm{Substitute}(f, M)$ returns the Boolean function $f(xM)$.

Example 5. Let $f = x_1x_2 + x_2x_3 + x_3$ and

$$M = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}.$$

Then $\mathrm{Substitute}(f, M)$ gives $f(xM) = (x_1 + x_2)(x_2 + x_3) + (x_2 + x_3)x_3 + x_3 = x_1x_2 + x_1x_3 + x_2$.
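A direct way to realise Substitute on an ANF is to replace each variable by the corresponding linear form and expand over $\mathbb{F}_2$. The Python sketch below is one possible implementation under assumed conventions (monomials as tuples of 1-based indices, $M$ as a list of rows, $x$ treated as a row vector); it is not taken from the authors' code.

```python
def substitute(anf, M):
    """Return the ANF of f(xM) over F_2 as a set of sorted index tuples.

    anf : iterable of monomials, each a tuple of 1-based indices
    M   : n x n 0/1 matrix as a list of rows; since x is a row vector,
          the j-th input becomes sum_i x_i * M[i][j]
    """
    n = len(M)
    result = set()
    for mono in anf:
        # Expand the product of linear forms one factor at a time.
        partial = {frozenset()}
        for j in mono:
            column = [i + 1 for i in range(n) if M[i][j - 1]]
            expanded = set()
            for term in partial:
                for i in column:
                    expanded ^= {term | {i}}    # x_i * x_i = x_i, XOR = mod 2
            partial = expanded
        result ^= partial
    return {tuple(sorted(term)) for term in result}
```

Applied to Example 5, `substitute([(1, 2), (2, 3), (3,)], [[1, 0, 0], [1, 1, 0], [0, 1, 1]])` yields the monomials of $x_1x_2 + x_1x_3 + x_2$.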


In Algorithm 1, for a given Boolean function $f(x_1, \dots, x_n)$, we repeatedly use a substitution of variables of the form

$$
\begin{cases}
x_u \leftarrow x_{t_1} + x_{t_2} + \cdots + x_{t_m},\\
x_j \leftarrow x_j, \quad \forall j \in \{1, \dots, n\} \setminus \{u\},
\end{cases}
$$

where $m \ge 2$, $u \in \{t_1, \dots, t_m\}$, and $t_1 < t_2 < \cdots < t_m$. This substitution can be reformulated in matrix form as $x \leftarrow x I_{u \leftarrow t_1, \dots, t_m}$, where $I_{u \leftarrow t_1, \dots, t_m}$ is obtained from the $n \times n$ identity matrix $I$ by substituting the $u$-th column with a column vector whose $t_j$-th entry is 1 for $1 \le j \le m$ and whose other entries are 0. Note that we always have $I_{u \leftarrow t_1, \dots, t_m} = I^{-1}_{u \leftarrow t_1, \dots, t_m}$.
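The matrix $I_{u \leftarrow t_1, \dots, t_m}$ and its self-inverse property can be checked with a few lines of Python. The helper names `substitution_matrix` and `matmul_gf2` are hypothetical; matrices are plain lists of 0/1 rows.

```python
def substitution_matrix(n, u, ts):
    """Build I_{u<-t1,...,tm}: the identity with column u replaced by the
    indicator vector of ts (1-based indices, u must be one of the ts)."""
    if u not in ts:
        raise ValueError("u must be one of the t_j")
    A = [[int(i == j) for j in range(n)] for i in range(n)]
    for i in range(n):
        A[i][u - 1] = int((i + 1) in ts)
    return A


def matmul_gf2(A, B):
    """Multiply two square 0/1 matrices over F_2."""
    n = len(A)
    return [[sum(A[i][k] & B[k][j] for k in range(n)) % 2
             for j in range(n)] for i in range(n)]
```

Multiplying `substitution_matrix(5, 1, [1, 3, 4])` by itself with `matmul_gf2` returns the identity, in line with the note above.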

Algorithm 1: Transform to disjoint quadratic form

Input: A quadratic form $f(x) = f(x_1, \dots, x_n)$
Output: An invertible matrix $M$ and a disjoint quadratic form $\hat f(x)$ such that $\hat f(x) = f(xM)$

    /* Initialization */
    M ← I                                /* I is the n × n identity matrix */
    f̂(x) ← f(x_1, ..., x_n)
    v ← PickIndex(f̂)
    /* Transformation */
    while σ(f̂, x_v) ≥ 2 do
        m ← σ(f̂, x_v)                    /* The number of quadratic terms involving x_v */
        Find all t_1 < t_2 < ... < t_m such that x_v x_{t_i} is a term of f̂.
        f̂ ← Substitute(f̂, I_{t_1←t_1,...,t_m})
        M ← I_{t_1←t_1,...,t_m} · M
        if σ(f̂, x_{t_1}) ≥ 2 then
            k ← σ(f̂, x_{t_1})
            Find all s_1 < s_2 < ... < s_k such that x_{t_1} x_{s_i} is a term of f̂.
            f̂ ← Substitute(f̂, I_{v←s_1,...,s_k})
            M ← I_{v←s_1,...,s_k} · M
        end
        v ← PickIndex(f̂)
    end
    return M and f̂
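The following Python sketch shows one way Algorithm 1 could be implemented. It is an illustration under assumed data structures, not the authors' code: the quadratic part is a set of two-element frozensets of 1-based indices, the linear part is a set of indices, and $M$ is a list of 0/1 rows. Left-multiplying $M$ by $I_{u \leftarrow t_1, \dots, t_m}$ is done by adding row $u$ to every row $t_j \ne u$. The final lines replay Example 6 and check $\hat f(x) = f(xM)$ on all inputs.

```python
from itertools import product


def sigma(Q, v):
    """Number of quadratic terms of Q that involve x_v."""
    return sum(1 for term in Q if v in term)


def pick_index(Q, n):
    """Smallest index whose variable occurs in the most quadratic terms."""
    return max(range(1, n + 1), key=lambda v: (sigma(Q, v), -v))


def neighbours(Q, v):
    """Sorted indices t such that x_v x_t is a term of Q."""
    return sorted(t for term in Q if v in term for t in term - {v})


def substitute(Q, L, u, ts):
    """Apply x_u <- sum of x_t for t in ts (u in ts) to the form Q + L."""
    new_q = set()
    new_l = set(L) ^ (set(ts) - {u} if u in L else set())
    for term in Q:
        if u not in term:
            new_q ^= {term}
            continue
        (j,) = term - {u}
        for t in ts:
            if t == j:
                new_l ^= {j}                    # x_j * x_j = x_j
            else:
                new_q ^= {frozenset((t, j))}
    return new_q, new_l


def to_disjoint(Q, L, n):
    """Return (M, Q_hat, L_hat) such that f_hat(x) = f(xM) is disjoint."""
    Q, L = {frozenset(t) for t in Q}, set(L)
    M = [[int(i == j) for j in range(n)] for i in range(n)]

    def step(u, ts):
        nonlocal Q, L
        Q, L = substitute(Q, L, u, ts)
        for t in ts:                            # M <- I_{u<-ts} * M
            if t != u:
                M[t - 1] = [a ^ b for a, b in zip(M[t - 1], M[u - 1])]

    v = pick_index(Q, n)
    while sigma(Q, v) >= 2:
        ts = neighbours(Q, v)
        step(ts[0], ts)
        if sigma(Q, ts[0]) >= 2:
            step(v, neighbours(Q, ts[0]))
        v = pick_index(Q, n)
    return M, Q, L


def evaluate(Q, L, x):
    """Evaluate the form at the 0/1 tuple x (x[0] is x_1)."""
    quad = sum(x[i - 1] & x[j - 1] for i, j in map(sorted, Q))
    return (quad + sum(x[k - 1] for k in L)) % 2


# Example 6 from the text: check f_hat(x) = f(xM) for all 32 inputs.
Q6, L6 = [(1, 2), (1, 5), (2, 3), (2, 4)], {1, 2}
M6, Qh6, Lh6 = to_disjoint(Q6, L6, 5)
assert all(
    evaluate(Qh6, Lh6, x)
    == evaluate(Q6, L6, tuple(sum(x[i] & M6[i][j] for i in range(5)) % 2
                              for j in range(5)))
    for x in product((0, 1), repeat=5)
)
```

This sketch stores the form as Python sets for readability, so it makes no attempt to match the complexity bounds stated in Theorem 1.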

Example 6. Let $\hat f \leftarrow f(x_1, x_2, x_3, x_4, x_5) = x_1x_2 + x_1x_5 + x_2x_3 + x_2x_4 + x_1 + x_2$. Then $\sigma(\hat f, x_1) = 2$, $\sigma(\hat f, x_2) = 3$, $\sigma(\hat f, x_3) = 1$, $\sigma(\hat f, x_4) = 1$, and $\sigma(\hat f, x_5) = 1$. Thus, $v \leftarrow \mathrm{PickIndex}(\hat f) = 2$. Now we extract the common factor $x_v = x_2$ in $Q_{\hat f}$:

$$\hat f(x) = x_2(x_1 + x_3 + x_4) + x_1x_5 + x_1 + x_2.$$

Then we apply the following substitution of variables:

$$x_1 \leftarrow x_1 + x_3 + x_4, \qquad x_j \leftarrow x_j,\ j \in \{1, \dots, 5\} \setminus \{1\}. \tag{1}$$

This variable substitution gives $\hat f \leftarrow x_2x_1 + (x_1 + x_3 + x_4)x_5 + (x_1 + x_3 + x_4) + x_2 = x_1x_2 + x_1x_5 + x_3x_5 + x_4x_5 + x_1 + x_2 + x_3 + x_4$. Then we need to check whether $x_1$ (the variable corresponding to a sum of the original variables rather than a single $x_j$) appears multiple times in $Q_{\hat f}$. Since $\sigma(\hat f, x_1) = 2$ ($x_1$ appears multiple times), we extract the common factor: $\hat f = x_1(x_2 + x_5) + x_3x_5 + x_4x_5 + x_1 + x_2 + x_3 + x_4$. Then we apply the variable substitution

$$x_2 \leftarrow x_2 + x_5, \qquad x_j \leftarrow x_j,\ j \in \{1, \dots, 5\} \setminus \{2\}. \tag{2}$$

This variable substitution gives $\hat f \leftarrow x_1x_2 + x_3x_5 + x_4x_5 + x_1 + (x_2 + x_5) + x_3 + x_4 = x_1x_2 + x_3x_5 + x_4x_5 + x_1 + x_2 + x_3 + x_4 + x_5$. At this point (one full iteration of the while loop is done), we can observe that $x_1x_2$ is a separated quadratic term of $\hat f$. In fact, as shown in Lemma 5, every iteration of the while loop makes at least one quadratic term separated. Then $\mathrm{PickIndex}(\hat f)$ returns 5, and we have $\hat f = x_1x_2 + (x_3 + x_4)x_5 + x_1 + x_2 + x_3 + x_4 + x_5$. Applying the substitution

$$x_3 \leftarrow x_3 + x_4, \qquad x_j \leftarrow x_j,\ j \in \{1, \dots, 5\} \setminus \{3\}, \tag{3}$$

gives $\hat f = x_1x_2 + x_3x_5 + x_1 + x_2 + x_3 + x_5$, which is a disjoint quadratic form. It follows from Equations (1) – (3) that

$$
M =
\begin{pmatrix} 1&0&0&0&0\\ 0&1&0&0&0\\ 0&0&1&0&0\\ 0&0&1&1&0\\ 0&0&0&0&1 \end{pmatrix}
\cdot
\begin{pmatrix} 1&0&0&0&0\\ 0&1&0&0&0\\ 0&0&1&0&0\\ 0&0&0&1&0\\ 0&1&0&0&1 \end{pmatrix}
\cdot
\begin{pmatrix} 1&0&0&0&0\\ 0&1&0&0&0\\ 1&0&1&0&0\\ 1&0&0&1&0\\ 0&0&0&0&1 \end{pmatrix}
=
\begin{pmatrix} 1&0&0&0&0\\ 0&1&0&0&0\\ 1&0&1&0&0\\ 0&0&1&1&0\\ 0&1&0&0&1 \end{pmatrix}.
$$

It is easy to verify that $\hat f = f(xM)$. Consequently, according to Lemma 3 and Lemma 4, the correlation of $f$ is $(-1)^{1 \cdot 1 + 1 \cdot 1} \cdot 2^{-2} = 2^{-2}$.

To show the validity of Algorithm 1, we present the following result.

Lemma 5. For any input quadratic form $f(x) = f(x_1, \dots, x_n)$ of Algorithm 1, each iteration of the while loop generates at least one separated quadratic term.


Proof. Let $\hat f = x_v(x_{t_1} + x_{t_2} + \cdots + x_{t_m}) + g$, where $v = \mathrm{PickIndex}(\hat f)$ and $t_1, t_2, \dots, t_m$ are all the indices such that $x_{t_i}x_v$ is a term of $\hat f$, with $t_1 < t_2 < \cdots < t_m$. Then we have $\sigma(g, x_v) = 0$ according to the way we choose the $t_i$'s. After the variable substitution $x \leftarrow x \cdot I_{t_1 \leftarrow t_1, \dots, t_m}$, we have

$$\hat f \leftarrow x_v x_{t_1} + g(x \cdot I_{t_1 \leftarrow t_1, \dots, t_m}).$$

Since $x_v$ is unchanged under $I_{t_1 \leftarrow t_1, \dots, t_m}$, we have $\sigma(\hat f, x_v) = 1 + \sigma(g, x_v) = 1$.

If $\sigma(\hat f, x_{t_1}) = 1$, then $x_v x_{t_1}$ is a separated quadratic term. Otherwise, we have $\sigma(\hat f, x_{t_1}) \ge 2$. Assume that the current $\hat f$ can be written as $\hat f = x_{t_1}(x_v + x_{s_1} + \cdots + x_{s_k}) + h$, where $s_1, s_2, \dots, s_k$ are all the indices such that $x_{t_1}x_{s_i}$ is a term of $\hat f$ and $s_1 < s_2 < \cdots < s_k$. It implies that $\sigma(h, x_{t_1}) = 0$. Further, we have $\sigma(h, x_v) = 0$ since $\sigma(h, x_v) \le \sigma(g, x_v) = 0$. Then the transformation $x \leftarrow x \cdot I_{v \leftarrow s_1, \dots, s_k}$ carries the function $\hat f$ into

$$\hat f \leftarrow x_v x_{t_1} + h(x \cdot I_{v \leftarrow s_1, \dots, s_k}).$$

Thus, we have $\sigma(\hat f, x_{t_1}) = 1$ and $\sigma(\hat f, x_v) = 1$. This means that $x_v x_{t_1}$ is a separated quadratic term.

Theorem 1. Given a quadratic form $f(x) = f(x_1, \dots, x_n)$, Algorithm 1 outputs a disjoint quadratic form $\hat f(x)$ and an invertible $n \times n$ matrix $M$ such that $\hat f(x) = f(xM)$. Moreover, Algorithm 1 has time complexity $O(n^{3.8})$ and memory complexity $\Omega(n^2)$.

Proof. According to Lemma 5, each iteration of the while loop generates at least one separated quadratic term. Hence, after at most $n/2$ iterations, all quadratic terms of the current $\hat f$ are separated quadratic terms.

Now we briefly analyze the complexity of Algorithm 1. From the above analysis, Algorithm 1 executes at most $n/2$ iterations of the while loop, each of which performs at most two matrix multiplications. This implies that the time complexity is upper bounded by that of $n$ matrix multiplications. Therefore, the time complexity of the algorithm can be estimated as $O(n^{1+2.8})$, where we take $O(n^{2.8})$ as the time complexity of multiplying two $n \times n$ matrices [34]. It is readily seen that the memory complexity is $\Omega(n^2)$.

To sum up, with Lemma 3, Lemma 4, and Algorithm 1, we can compute the correlation of any quadratic Boolean function with polynomial time complexity.

Remark. On 22-06-2019, we received an e-mail from Ryan Williams, which indicated that essentially the same theory concerning quadratic forms had been developed much earlier (despite some superficial differences in appearance). We refer the reader to [8,15,28] for more information.


4 Exploitable Linear Approximations of MORUS-like Key-Stream Generators

We consider a typical stream cipher construction shown in Fig. 6. A partially unknown state $S^U$ (initialized with a secret key and some public values) is processed by an initialization algorithm. Then a vectorial Boolean function $G$ is applied to the state $S^0$ to produce one key-stream word $Z^0$. For $0 \le i < k$, a state update function is employed to obtain a new state $S^{i+1} = F(S^i)$, from which a key-stream word $Z^{i+1} = G(S^{i+1})$ is extracted.

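[Figure: the state $S^U$ is processed by Init to give $S^0$; for $i = 0, \cdots, k$, each state $S^i$ is mapped by $G$ to the key-stream word $Z^i$ and by $F$ to $S^{i+1}$. At each state $S^i$ the mask $\beta_{i-1}$ enters, $\gamma_i$ leaves towards $G$, and $\alpha_i$ leaves towards $F$; $\lambda_i$ is the mask on $Z^i$.]

Fig. 6: Linear trails for MORUS-like key-stream generator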

For this kind of stream cipher, a generic attack based on linear cryptanalysis (e.g., [27]) can be applied, whose goal is to find a sequence of linear masks $(\lambda_0, \cdots, \lambda_k)$ for the key-stream blocks $Z^i$ such that the absolute value of the correlation $\mathrm{cor}\left(\sum_{i=0}^{k} \lambda_i Z^i\right)$ is maximized, where the number of ciphertext blocks involved in the linear approximation is called the span. In what follows, we establish a model in which finding $(\lambda_0, \cdots, \lambda_k)$ is conceptually the same as finding linear trails of a block cipher with additional constraints imposed on some linear masks at some special positions. With this model, existing tools [25,9,35,37,16] for finding good linear trails of block ciphers can be applied to search for $(\lambda_0, \cdots, \lambda_k)$.

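Definition 5. A linear trail of the key-stream generator shown in Fig. 6,

$$(\beta_{-1}, \gamma_0, \lambda_0, \alpha_0, \beta_0, \cdots, \alpha_{k-1}, \beta_{k-1}, \gamma_k, \lambda_k, \alpha_k),$$

is said to be exploitable if and only if $\beta_{-1} = 0$, $\alpha_k = 0$, and $\alpha_i + \gamma_i + \beta_{i-1} = 0$ for $0 \le i \le k$.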

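The motivation behind Definition 5 is that when the following equations

$$
\begin{cases}
\beta_{-1} = 0 \\
\alpha_k = 0 \\
\alpha_i + \gamma_i + \beta_{i-1} = 0, & 0 \le i \le k \\
\gamma_i S^i + \lambda_i Z^i = 0, & 0 \le i \le k \\
\alpha_i S^i + \beta_i S^{i+1} = 0, & 0 \le i \le k-1
\end{cases}
\tag{4}
$$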


hold simultaneously, we have

$$
\sum_{i=0}^{k} \lambda_i Z^i = \sum_{i=0}^{k} \gamma_i S^i = \beta_{-1} S^0 + \sum_{i=0}^{k-1} \left(\alpha_i S^i + \beta_i S^{i+1}\right) + \alpha_k S^k = 0. \tag{5}
$$

Although in Definition 5 we require $\beta_{-1} = 0$, in fact any characteristic starting with some $\beta_i = 0$ that follows the same pattern specified in Definition 5 across several consecutive ciphertext blocks can be exploited.

In this work, the MILP-based approach [35,37,16] is employed to search for linear trails of MORUS. A solution of the MILP model is a linear characteristic satisfying the additional constraints specified in Definition 5. The objective of the model is to minimize the number of active AND gates. The trails produced by the models are only locally consistent, and thus we cannot guarantee their global soundness with respect to optimality and validity, since the models are constructed under the assumption that all AND gates are independent.

Let us inspect a toy example where $f = f_1 + f_2 = x_1x_2 + x_1x_3 + x_2$ and

$$f_1(x_1, x_2, x_3) = x_1x_2 + x_2, \qquad f_2(x_1, x_2, x_3) = x_1x_3.$$

The reader can check that in this case $\mathrm{cor}(f_1) = \mathrm{cor}(f_2) = 2^{-1}$, but $\mathrm{cor}(f) = 0$, which shows that the sum of biased Boolean functions may be balanced. Therefore, global consistency of the full trail cannot be ensured by local consistency. To be more concrete, we show a real example. Table 4 presents an invalid linear trail with span 3 generated by our MILP model. Note that in this paper, we show our trails in their linear-mask representations. There is a correspondence between the linear-mask representation and the trail-equation representation used in [2]. The five linear masks between $\alpha_0$ and $\beta_0$ listed in Table 4 are the linear masks at the positions marked with dashed lines in Fig. 5. Each row of the linear masks determines which AND gates are activated, and each active AND gate produces one equation containing one product term. By adding up these equations, we can reproduce the trail-equation representation used in [2]. In this work, we always need to convert the linear-mask representation into the trail-equation representation, which is required to determine the overall correlation with the method proposed in Sect. 3.
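The toy example can be checked by exhaustive enumeration. The following is a minimal sketch, not part of the original work; the helper name `correlation` is ours, and correlation is taken as the average of $(-1)^{f(x)}$ over all inputs.

```python
from itertools import product

def correlation(f, n):
    """Correlation of an n-variable Boolean function: average of (-1)^f(x)."""
    total = 0
    for x in product((0, 1), repeat=n):
        total += 1 - 2 * f(*x)  # maps f(x)=0 to +1 and f(x)=1 to -1
    return total / 2 ** n

# The toy functions from the text.
f1 = lambda x1, x2, x3: (x1 & x2) ^ x2
f2 = lambda x1, x2, x3: x1 & x3
f = lambda x1, x2, x3: f1(x1, x2, x3) ^ f2(x1, x2, x3)

print(correlation(f1, 3), correlation(f2, 3), correlation(f, 3))
```

Both summands are biased while their sum is balanced, which is exactly why a trail that is consistent gate by gate can still have zero overall correlation.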

For the sake of completeness, we give a complete example of the conversion process based on the trail shown in Table 4. From the linear masks, we can get the following equations:

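$$
\begin{aligned}
C^0_{30} \oplus S^0_{0,30} \oplus S^0_{1,30} &= S^0_{2,30} \cdot S^0_{3,30} \\
C^0_{22} \oplus S^0_{0,22} \oplus S^0_{1,22} &= S^0_{2,22} \cdot S^0_{3,22} \\
S^0_{0,30} \oplus S^1_{0,3} \oplus S^0_{3,30} &= S^0_{1,30} \cdot S^0_{2,30} \\
S^0_{0,22} \oplus S^1_{0,27} \oplus S^0_{3,22} &= S^0_{1,22} \cdot S^0_{2,22} \\
S^0_{1,22} \oplus S^1_{1,21} \oplus S^0_{4,22} &= S^0_{2,22} \cdot S^0_{3,22} \\
S^0_{4,22} \oplus S^1_{4,3} \oplus S^1_{2,22} &= S^1_{0,22} \cdot S^1_{1,22} \\
C^1_{29} \oplus S^1_{0,29} \oplus S^1_{1,29} &= S^1_{2,29} \cdot S^1_{3,29} \\
C^1_{27} \oplus S^1_{0,27} \oplus S^1_{1,27} &= S^1_{2,27} \cdot S^1_{3,27} \\
C^1_{22} \oplus S^1_{0,22} \oplus S^1_{1,22} &= S^1_{2,22} \cdot S^1_{3,22} \\
C^1_{21} \oplus S^1_{0,21} \oplus S^1_{1,21} &= S^1_{2,21} \cdot S^1_{3,21} \\
C^1_{3} \oplus S^1_{0,3} \oplus S^1_{1,3} &= S^1_{2,3} \cdot S^1_{3,3} \\
S^1_{0,29} \oplus S^2_{0,2} \oplus S^1_{3,29} &= S^1_{1,29} \cdot S^1_{2,29} \\
S^1_{0,22} \oplus S^2_{0,27} \oplus S^1_{3,22} &= S^1_{1,22} \cdot S^1_{2,22} \\
S^1_{0,21} \oplus S^2_{0,26} \oplus S^1_{3,21} &= S^1_{1,21} \cdot S^1_{2,21} \\
S^1_{1,27} \oplus S^2_{1,26} \oplus S^1_{4,27} &= S^1_{2,27} \cdot S^1_{3,27} \\
S^1_{1,3} \oplus S^2_{1,2} \oplus S^1_{4,3} &= S^1_{2,3} \cdot S^1_{3,3} \\
S^1_{2,27} \oplus S^2_{2,2} \oplus S^2_{0,27} &= S^1_{3,27} \cdot S^1_{4,27} \\
C^2_{26} \oplus S^2_{0,26} \oplus S^2_{1,26} &= S^2_{2,26} \cdot S^2_{3,26} \\
C^2_{2} \oplus S^2_{0,2} \oplus S^2_{1,2} &= S^2_{2,2} \cdot S^2_{3,2}
\end{aligned}
$$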

Adding up the above equations gives the trail equation:

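$$
\begin{aligned}
& C^0_{30} \oplus C^0_{22} \oplus C^1_{29} \oplus C^1_{27} \oplus C^1_{22} \oplus C^1_{21} \oplus C^1_{3} \oplus C^2_{26} \oplus C^2_{2} \\
&= S^1_{2,22} \cdot S^1_{3,22} \oplus S^1_{1,22} \cdot S^1_{2,22} \oplus S^1_{2,22} \oplus S^1_{3,22} \oplus S^1_{1,22} \\
&\quad \oplus S^1_{2,21} \cdot S^1_{3,21} \oplus S^1_{1,21} \cdot S^1_{2,21} \oplus S^1_{3,21} \\
&\quad \oplus S^1_{2,29} \cdot S^1_{3,29} \oplus S^1_{1,29} \cdot S^1_{2,29} \oplus S^1_{3,29} \oplus S^1_{1,29} \\
&\quad \oplus S^0_{2,30} \cdot S^0_{3,30} \oplus S^0_{1,30} \cdot S^0_{2,30} \oplus S^0_{3,30} \oplus S^0_{1,30} \\
&\quad \oplus S^0_{1,22} \cdot S^0_{2,22} \\
&\quad \oplus S^1_{0,22} \cdot S^1_{1,22} \\
&\quad \oplus S^1_{3,27} \cdot S^1_{4,27} \oplus S^1_{4,27} \\
&\quad \oplus S^2_{2,26} \cdot S^2_{3,26} \\
&\quad \oplus S^2_{2,2} \cdot S^2_{3,2} \oplus S^2_{2,2} \\
&\quad \oplus S^0_{3,22} \\
&\quad \oplus S^1_{2,27}.
\end{aligned}
$$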

The right-hand side of the equation is a quadratic Boolean function. Thus, by applying the method shown in Sect. 3, we can obtain its correlation. However, for this special case, we know that its correlation is zero without converting it into the disjoint quadratic form, since the variable $S^1_{2,27}$ never appears in any other term of the quadratic Boolean function. Thus, according to Corollary 1, the correlation of $C^0_{30} \oplus C^0_{22} \oplus C^1_{29} \oplus C^1_{27} \oplus C^1_{22} \oplus C^1_{21} \oplus C^1_{3} \oplus C^2_{26} \oplus C^2_{2}$ is zero.
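The shortcut used here, spotting a variable that occurs only as a stand-alone linear term, is easy to mechanize. The sketch below is ours and only illustrative: a variable $S^t_{i,j}$ is encoded as the tuple `(t, i, j)`, a monomial as a tuple of variables, and the helper names `reduce_mod2` and `isolated_linear_variables` are hypothetical. The list `rhs` transcribes the right-hand side of the trail equation above.

```python
def reduce_mod2(monomials):
    """Cancel monomials that occur an even number of times (terms are XORed)."""
    kept = set()
    for m in monomials:
        kept ^= {frozenset(m)}
    return kept

def isolated_linear_variables(monomials):
    """Variables present as a degree-1 term and absent from every product term."""
    anf = reduce_mod2(monomials)
    linear = {v for m in anf if len(m) == 1 for v in m}
    in_products = {v for m in anf if len(m) == 2 for v in m}
    return sorted(linear - in_products)

# Right-hand side of the trail equation; (t, i, j) stands for S^t_{i,j}.
rhs = [
    ((1, 2, 22), (1, 3, 22)), ((1, 1, 22), (1, 2, 22)),
    ((1, 2, 22),), ((1, 3, 22),), ((1, 1, 22),),
    ((1, 2, 21), (1, 3, 21)), ((1, 1, 21), (1, 2, 21)), ((1, 3, 21),),
    ((1, 2, 29), (1, 3, 29)), ((1, 1, 29), (1, 2, 29)),
    ((1, 3, 29),), ((1, 1, 29),),
    ((0, 2, 30), (0, 3, 30)), ((0, 1, 30), (0, 2, 30)),
    ((0, 3, 30),), ((0, 1, 30),),
    ((0, 1, 22), (0, 2, 22)),
    ((1, 0, 22), (1, 1, 22)),
    ((1, 3, 27), (1, 4, 27)), ((1, 4, 27),),
    ((2, 2, 26), (2, 3, 26)),
    ((2, 2, 2), (2, 3, 2)), ((2, 2, 2),),
    ((0, 3, 22),),
    ((1, 2, 27),),
]

print(isolated_linear_variables(rhs))
```

Any non-empty result means the function is balanced, so the trail can be discarded before running the full transformation of Sect. 3.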


Table 4: An invalid trail of MiniMORUS-640 with span 3

Round  Linear masks
0      α0  40400000 40400000 00000000 40400000 00000000
           08000008 00400000 00000000 00000000 00000000
           08000008 00200000 00000000 00000000 00400000
           08000008 00200000 00000000 00000000 00400000
           08000008 00200000 00000000 00000000 00400000
       β0  08000008 00200000 00400000 00000000 00000008
       γ0  40400000 40400000 00000000 40400000 00000000
       λ0  40400000
1      α1  20600000 28400008 00400000 20600000 00000008
           0c000004 08000008 00000000 00000000 00000008
           0c000004 04000004 08000000 00000000 08000000
           04000004 04000004 00000004 00000000 00000000
           04000004 04000004 00000004 00000000 00000000
       β1  04000004 04000004 00000004 00000000 00000000
       γ1  28600008 28600008 00000000 20600000 00000000
       λ1  28600008
2      γ2  04000004 04000004 00000004 00000000 00000000
       λ2  04000004

At this point, we emphasize that Definition 5 is only used as a mental helper to identify potentially good trails. In practice, we apply search tools that produce “good” linear trails under the assumption that the rounds or components within $F$ and $G$ are independent. However, these assumptions are generally not true, as illustrated by the above example. Therefore, the outputs of the search tools are not reliable. We must recompute the correlation of the full trail by using dedicated methods suited to the target under consideration. For instance, using the method presented in Sect. 3, we automatically detect inconsistencies such as those shown in the above examples.

5 Searching for Linear Approximations of MORUS

By setting the plaintext to the all-zero message as in [2], MiniMORUS and MORUS fit exactly into the model established in Sect. 4. Hence, linear trails of MiniMORUS and MORUS can be searched for with any existing tool for finding linear approximations. In our work, we apply the MILP-based approach, where the constraints imposed on the linear trails are encoded into MILP models.

In practice, we must determine the number of ciphertext blocks involved in the final linear combination of the ciphertext bits before we can set up the MILP model. First, we theoretically show that there is no useful linear approximation for MORUS involving only one ciphertext block. Let $\lambda_0$ be a linear mask of the key-stream generator shown in Fig. 6 for one ciphertext block. Then we have

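$$
\begin{aligned}
\lambda_0 Z^0 &= \bigoplus_{j,\, \lambda_{0,j}=1} \left( S^0_{0,j} \oplus \left( S^0_{1,j+b'_2} \oplus S^0_{2,j} \cdot S^0_{3,j} \right) \right) \\
&= \bigoplus_{j,\, \lambda_{0,j}=1} S^0_{0,j} \;\oplus \bigoplus_{j,\, \lambda_{0,j}=1} \left( S^0_{1,j+b'_2} \oplus S^0_{2,j} \cdot S^0_{3,j} \right).
\end{aligned}
$$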


Since the variable $S^0_{0,j}$ does not appear in other terms, we have $\mathrm{cor}(\lambda_0 Z^0) = 0$ according to Corollary 1.

Since the linear trails used in [2] span 5 ciphertext blocks, we decide to search only for rotationally invariant trails with spans greater than 1 and less than 6 (models for larger spans have more variables and are difficult to solve). The best trails we found are of span 4, and the trails for MiniMORUS-640 and MORUS-640 are listed in Table 5 and Table 6, respectively.

Table 5: A linear trail of MiniMORUS-640 with correlation $-2^{-8}$

Round  Linear masks
0      α0  10000000 10000000 00000000 10000000 00000000
           00000002 00000000 00000000 00000000 00000000
           00000002 00000000 00000000 00000000 00000000
           00000002 00000000 00000000 00000000 00000000
           00000002 00000000 00000000 00000000 00000000
       β0  00000002 00000000 00000000 00000000 00000000
       γ0  10000000 10000000 00000000 10000000 00000000
       λ0  10000000
1      α1  08000200 08000202 00000002 08000200 00000000
           00004001 00000002 00000002 00000000 00000000
           00004001 00000001 00000000 00000000 00000002
           00004001 00000001 00000000 00000000 00000002
           00004001 00000001 00000000 00000000 00000002
       β1  00004003 00000003 00000002 00000000 00004000
       γ1  08000202 08000202 00000002 08000200 00000000
       λ1  08000202
2      α2  00000100 00004100 00000000 00000100 00004000
           00002000 00004000 00000000 00000000 00004000
           00002000 00002000 00000000 00000000 00000000
           00002000 00002000 00000000 00000000 00000000
           00002000 00002000 00000000 00000000 00000000
       β2  00002000 00002000 00000000 00000000 00000000
       γ2  00004103 00004103 00000002 00000100 00000000
       λ2  00004103
3      γ3  00002000 00002000 00000000 00000000 00000000
       λ3  00002000
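The masks of Table 5 can be checked against the conditions of Definition 5. The following is a minimal sketch under our own reading of the table: the first row of each round is taken as $\alpha_i$, the mask $\alpha_3$ (not listed) is assumed to be zero, and the function name `is_exploitable` is hypothetical.

```python
def is_exploitable(beta_prev, alpha, beta, gamma):
    """Check beta_{-1} = 0, alpha_k = 0 and alpha_i + gamma_i + beta_{i-1} = 0.

    alpha and gamma hold k+1 state masks, beta holds k state masks, and
    beta_prev is beta_{-1}. A state mask is a tuple of word masks (ints).
    """
    k = len(gamma) - 1
    zero = tuple(0 for _ in gamma[0])
    if beta_prev != zero or alpha[k] != zero:
        return False
    betas = [beta_prev] + list(beta)  # betas[i] holds beta_{i-1}
    for i in range(k + 1):
        for a, g, b in zip(alpha[i], gamma[i], betas[i]):
            if a ^ g ^ b:  # addition of masks is bitwise XOR
                return False
    return True

# Masks transcribed from Table 5 (alpha_3 assumed to be zero).
alpha = [
    (0x10000000, 0x10000000, 0x00000000, 0x10000000, 0x00000000),
    (0x08000200, 0x08000202, 0x00000002, 0x08000200, 0x00000000),
    (0x00000100, 0x00004100, 0x00000000, 0x00000100, 0x00004000),
    (0, 0, 0, 0, 0),
]
beta = [
    (0x00000002, 0x00000000, 0x00000000, 0x00000000, 0x00000000),
    (0x00004003, 0x00000003, 0x00000002, 0x00000000, 0x00004000),
    (0x00002000, 0x00002000, 0x00000000, 0x00000000, 0x00000000),
]
gamma = [
    (0x10000000, 0x10000000, 0x00000000, 0x10000000, 0x00000000),
    (0x08000202, 0x08000202, 0x00000002, 0x08000200, 0x00000000),
    (0x00004103, 0x00004103, 0x00000002, 0x00000100, 0x00000000),
    (0x00002000, 0x00002000, 0x00000000, 0x00000000, 0x00000000),
]

print(is_exploitable((0, 0, 0, 0, 0), alpha, beta, gamma))
```

Passing this check only establishes local consistency at the branch points; the correlation of the trail still has to be computed on the full trail equation.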

As an illustration, let us compute the correlation of the trail of MiniMORUS-640 shown in Table 5. Firstly, according to the linear masks shown in Table 5, we write down the following equations, which hold with probability 1.

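$$
\begin{aligned}
C^0_{28} \oplus S^0_{0,28} \oplus S^0_{1,28} &= S^0_{2,28} \cdot S^0_{3,28} \\
S^0_{0,28} \oplus S^1_{0,1} \oplus S^0_{3,28} &= S^0_{1,28} \cdot S^0_{2,28} \\
C^1_{27} \oplus S^1_{0,27} \oplus S^1_{1,27} &= S^1_{2,27} \cdot S^1_{3,27} \\
C^1_{9} \oplus S^1_{0,9} \oplus S^1_{1,9} &= S^1_{2,9} \cdot S^1_{3,9} \\
C^1_{1} \oplus S^1_{0,1} \oplus S^1_{1,1} &= S^1_{2,1} \cdot S^1_{3,1} \\
S^1_{0,27} \oplus S^2_{0,0} \oplus S^1_{3,27} &= S^1_{1,27} \cdot S^1_{2,27} \\
S^1_{0,9} \oplus S^2_{0,14} \oplus S^1_{3,9} &= S^1_{1,9} \cdot S^1_{2,9}
\end{aligned}
$$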


Table 6: A linear trail of MORUS-640 with correlation $2^{-38}$, where “*4” stands for 4 copies of the same bit string

Round  Linear masks
0      α0  10000000*4 10000000*4 00000000*4 10000000*4 00000000*4
           00000002*4 00000000*4 00000000*4 00000000*4 00000000*4
           00000002*4 00000000*4 00000000*4 00000000*4 00000000*4
           00000002*4 00000000*4 00000000*4 00000000*4 00000000*4
           00000002*4 00000000*4 00000000*4 00000000*4 00000000*4
       β0  00000002*4 00000000*4 00000000*4 00000000*4 00000000*4
       γ0  10000000*4 10000000*4 00000000*4 10000000*4 00000000*4
       λ0  10000000*4
1      α1  08000200*4 08000202*4 00000002*4 08000200*4 00000000*4
           00004001*4 00000002*4 00000002*4 00000000*4 00000000*4
           00004001*4 00000001*4 00000000*4 00000000*4 00000002*4
           00004001*4 00000001*4 00000000*4 00000000*4 00000002*4
           00004001*4 00000001*4 00000000*4 00000000*4 00000002*4
       β1  00004003*4 00000003*4 00000002*4 00000000*4 00004000*4
       γ1  08000202*4 08000202*4 00000002*4 08000200*4 00000000*4
       λ1  08000202*4
2      α2  00000100*4 00004100*4 00000000*4 00000100*4 00004000*4
           00002000*4 00004000*4 00000000*4 00000000*4 00004000*4
           00002000*4 00002000*4 00000000*4 00000000*4 00000000*4
           00002000*4 00002000*4 00000000*4 00000000*4 00000000*4
           00002000*4 00002000*4 00000000*4 00000000*4 00000000*4
       β2  00002000*4 00002000*4 00000000*4 00000000*4 00000000*4
       γ2  00004103*4 00004103*4 00000002*4 00000100*4 00000000*4
       λ2  00004103*4
3      γ3  00002000*4 00002000*4 00000000*4 00000000*4 00000000*4
       λ3  00002000*4

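$$
\begin{aligned}
S^1_{1,1} \oplus S^2_{1,0} \oplus S^1_{4,1} &= S^1_{2,1} \cdot S^1_{3,1} \\
S^1_{4,1} \oplus S^2_{4,14} \oplus S^2_{2,1} &= S^2_{0,1} \cdot S^2_{1,1} \\
C^2_{14} \oplus S^2_{0,14} \oplus S^2_{1,14} &= S^2_{2,14} \cdot S^2_{3,14} \\
C^2_{8} \oplus S^2_{0,8} \oplus S^2_{1,8} &= S^2_{2,8} \cdot S^2_{3,8} \\
C^2_{1} \oplus S^2_{0,1} \oplus S^2_{1,1} &= S^2_{2,1} \cdot S^2_{3,1} \\
C^2_{0} \oplus S^2_{0,0} \oplus S^2_{1,0} &= S^2_{2,0} \cdot S^2_{3,0} \\
S^2_{0,8} \oplus S^3_{0,13} \oplus S^2_{3,8} &= S^2_{1,8} \cdot S^2_{2,8} \\
S^2_{1,14} \oplus S^3_{1,13} \oplus S^2_{4,14} &= S^2_{2,14} \cdot S^2_{3,14} \\
C^3_{13} \oplus S^3_{0,13} \oplus S^3_{1,13} &= S^3_{2,13} \cdot S^3_{3,13}
\end{aligned}
$$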

Combining the above equations, we obtain an equation whose left-hand side involves only ciphertext bits, while the right-hand side can be regarded as a quadratic Boolean function.

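$$
\begin{aligned}
& C^0_{28} \oplus C^1_{27} \oplus C^1_{9} \oplus C^1_{1} \oplus C^2_{14} \oplus C^2_{8} \oplus C^2_{1} \oplus C^2_{0} \oplus C^3_{13} \\
&= S^0_{2,28} \cdot S^0_{3,28} \oplus S^0_{1,28} \cdot S^0_{2,28} \oplus S^0_{3,28} \oplus S^0_{1,28} \\
&\quad \oplus S^1_{2,9} \cdot S^1_{3,9} \oplus S^1_{1,9} \cdot S^1_{2,9} \oplus S^1_{3,9} \oplus S^1_{1,9} \\
&\quad \oplus S^1_{2,27} \cdot S^1_{3,27} \oplus S^1_{1,27} \cdot S^1_{2,27} \oplus S^1_{3,27} \oplus S^1_{1,27} \\
&\quad \oplus S^2_{2,8} \cdot S^2_{3,8} \oplus S^2_{1,8} \cdot S^2_{2,8} \oplus S^2_{3,8} \oplus S^2_{1,8} \\
&\quad \oplus S^2_{0,1} \cdot S^2_{1,1} \oplus S^2_{0,1} \oplus S^2_{1,1} \\
&\quad \oplus S^2_{2,1} \cdot S^2_{3,1} \oplus S^2_{2,1} \\
&\quad \oplus S^2_{2,0} \cdot S^2_{3,0} \\
&\quad \oplus S^3_{2,13} \cdot S^3_{3,13}
\end{aligned}
$$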

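The right-hand side of the above equation can be transformed into its disjoint quadratic form with the method presented in Sect. 3.

$$
\begin{aligned}
& S^0_{2,28} \cdot S^0_{3,28} \oplus S^0_{1,28} \cdot S^0_{2,28} \oplus S^0_{3,28} \oplus S^0_{1,28} \\
&\quad \oplus S^1_{2,9} \cdot S^1_{3,9} \oplus S^1_{1,9} \cdot S^1_{2,9} \oplus S^1_{3,9} \oplus S^1_{1,9} \\
&\quad \oplus S^1_{2,27} \cdot S^1_{3,27} \oplus S^1_{1,27} \cdot S^1_{2,27} \oplus S^1_{3,27} \oplus S^1_{1,27} \\
&\quad \oplus S^2_{2,8} \cdot S^2_{3,8} \oplus S^2_{1,8} \cdot S^2_{2,8} \oplus S^2_{3,8} \oplus S^2_{1,8} \\
&\quad \oplus S^2_{0,1} \cdot S^2_{1,1} \oplus S^2_{0,1} \oplus S^2_{1,1} \\
&\quad \oplus S^2_{2,1} \cdot S^2_{3,1} \oplus S^2_{2,1} \\
&\quad \oplus S^2_{2,0} \cdot S^2_{3,0} \\
&\quad \oplus S^3_{2,13} \cdot S^3_{3,13} \\
&= (S^0_{2,28} \oplus 1)(S^0_{1,28} \oplus S^0_{3,28}) \\
&\quad \oplus (S^1_{2,9} \oplus 1)(S^1_{1,9} \oplus S^1_{3,9}) \\
&\quad \oplus (S^1_{2,27} \oplus 1)(S^1_{1,27} \oplus S^1_{3,27}) \\
&\quad \oplus (S^2_{2,8} \oplus 1)(S^2_{1,8} \oplus S^2_{3,8}) \\
&\quad \oplus (S^2_{0,1} \oplus 1)(S^2_{1,1} \oplus 1) \\
&\quad \oplus S^2_{2,1}(S^2_{3,1} \oplus 1) \\
&\quad \oplus S^2_{2,0} \cdot S^2_{3,0} \\
&\quad \oplus S^3_{2,13} \cdot S^3_{3,13} \oplus 1
\end{aligned}
$$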

Therefore, the correlation of $C^0_{28} \oplus C^1_{27} \oplus C^1_{9} \oplus C^1_{1} \oplus C^2_{14} \oplus C^2_{8} \oplus C^2_{1} \oplus C^2_{0} \oplus C^3_{13}$ is $-2^{-8}$. Similarly, we can compute the correlations of the trails of MORUS-640, MiniMORUS-1280, and MORUS-1280.
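The value $-2^{-8}$ can be reproduced from the disjoint form: its eight product terms use pairwise disjoint variables, so their correlations multiply, and the constant 1 flips the sign. The sketch below is ours and only illustrative; each block is enumerated exhaustively, with generic argument names standing in for the state bits.

```python
from fractions import Fraction
from itertools import product

def correlation(f, n):
    """Exact correlation of an n-variable Boolean function."""
    s = sum(1 - 2 * f(*x) for x in product((0, 1), repeat=n))
    return Fraction(s, 2 ** n)

# One entry per product term of the disjoint form (function, number of variables).
blocks = [
    (lambda a, b, c: (a ^ 1) & (b ^ c), 3),  # (S2 + 1)(S1 + S3), round 0, bit 28
    (lambda a, b, c: (a ^ 1) & (b ^ c), 3),  # round 1, bit 9
    (lambda a, b, c: (a ^ 1) & (b ^ c), 3),  # round 1, bit 27
    (lambda a, b, c: (a ^ 1) & (b ^ c), 3),  # round 2, bit 8
    (lambda a, b: (a ^ 1) & (b ^ 1), 2),     # (S0 + 1)(S1 + 1), round 2, bit 1
    (lambda a, b: a & (b ^ 1), 2),           # S2 (S3 + 1), round 2, bit 1
    (lambda a, b: a & b, 2),                 # S2 S3, round 2, bit 0
    (lambda a, b: a & b, 2),                 # S2 S3, round 3, bit 13
]

cor = Fraction(-1)  # the trailing constant 1 contributes a factor of -1
for f, n in blocks:
    cor *= correlation(f, n)

print(cor)
```

Multiplying per-block correlations is only valid because the blocks share no variables, which is what the transformation to disjoint quadratic form guarantees.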

Before going any further, we would like to give some insight into the trails of MiniMORUS to show how the linear approximations covering different parts of the cipher eventually eliminate all internal variables, leading to approximations involving only ciphertext variables. The following discussion is similar to Section 4 of [2], and several fragments are common to [2] and our work. We recommend that the reader review Figure 2 of [2] before reading the following part.

We can use the variables of $C^t$ to approximate the variables of $S^{t+1}_0$, denoted by $C^t \to S^{t+1}_0$. At the same time, $C^{t+1}, S^{t+1}_0, S^{t+2}_1 \to S^{t+1}_4$. These approximations are visualized in Fig. 7a and Fig. 7b. Note that two AND operations are involved in Fig. 7a, in which one is approximated to $S_3$ and the other is approximated to $S_1$. This seems to require weight 2. This trail fragment is the same as one of the fragments in [2], and [2] explains that there is another way of approximating those two AND operations: one is approximated to $S_3 \oplus S_2$ and the other is approximated to $S_2 \oplus S_1$. The two ways of approximating form a hull effect, which makes the weight 1.

[Figure: ten panels, each drawn on the MiniMORUS step function with registers $S_0, \cdots, S_4$, message $M$, ciphertext $C$, and rotations ≪ $b_0$, ..., ≪ $b_4$.]

(a) $C^t \to S^{t+1}_0$ (weight is 1)
(b) $C^{t+1}, S^{t+1}_0, S^{t+2}_1 \to S^{t+1}_4$ (weight is 0)
(c) $C^{t+1} \to S^{t+2}_0$ (weight is 1)
(d) $C^{t+2}, S^{t+2}_0 \to S^{t+2}_1$ (weight is 1)
(e) $C^{t+1} \to S^{t+2}_0$ (weight is 1)
(f) $C^{t+2}, S^{t+2}_0, S^{t+3}_1 \to S^{t+2}_4$ (weight is 0)
(g) $C^{t+2} \to S^{t+3}_0$ (weight is 1)
(h) $C^{t+3}, S^{t+3}_0 \to S^{t+3}_1$ (weight is 1)
(i) $S^{t+2}_0, S^{t+2}_1, S^{t+2}_2 \to S^{t+1}_4, S^{t+2}_4$ (weight is 1)
(j) $C^{t+2} \to S^{t+2}_0, S^{t+2}_1, S^{t+2}_2$ (weight is 1)

Fig. 7: MiniMORUS linear trail fragments

Fig. 7b, which involves two AND operations, was also used in [2]. Those AND operations take the same input variables, $S_2$ and $S_3$. Hence the two deterministically cancel each other, which makes the weight of this fragment 0.

Basically, by combining the fragments in Fig. 7a to Fig. 7d, 1 bit of $S^{t+1}_4$ is approximated from the ciphertext bits. We do the same to approximate 1 bit of $S^{t+2}_4$ by sliding the steps by 1. Fig. 7e to Fig. 7h are for this approximation. Hence, after removing the step indices, Fig. 7e to Fig. 7h are exact copies of Fig. 7a to Fig. 7d.

Note that the linear trail up to here, which has weight 6, is identical to that of [2]. Ashur et al. [2] iterated this approximation twice and added 4 more approximations, which makes the weight of their trail $(6 \times 2) + 4 = 16$. The core of our improvement lies in the detection of a rather complicated new approximation that approximates $S^{t+1}_4$ and $S^{t+2}_4$ by ciphertext bits only with weight 2. The new approximations are shown in Fig. 7i and Fig. 7j, in which $S^{t+1}_4$ and $S^{t+2}_4$ are approximated to the 3-bit sum of $S^{t+2}_0$, $S^{t+2}_1$, and $S^{t+2}_2$, and $C^{t+2}$ is also approximated to the 3-bit sum of $S^{t+2}_0$, $S^{t+2}_1$, and $S^{t+2}_2$. The previous work [2] found the attack by hand, so most of its approximations are simple ones in which 2 internal state bits are approximated to 1 bit of another state. Thanks to the generic model in Sect. 4, we could detect this efficient approximation.

We stress that the trail fragments are only used to shed light on the full trails, and the verification of these trail fragments is only used to provide additional evidence of the validity of the analysis. We never use trail fragments to compute the correlation. The correlation must be computed on the full trail as a whole.

Remark. We would like to make a remark on the effect of the in-word rotation (≪w) offsets ($b_i$, $i \in \{0, \cdots, 4\}$) of MORUS on the linear trails we find. In [2], Ashur et al. assume that the trails work for any choice of $b_i$ without any concrete discussion of the actual effect. We randomly choose 50 different tuples $(b_0, b_1, b_2, b_3, b_4)$ and generate 50 MILP models to search for their trails. We do observe a slight variance in the correlations of the trails we find for different choices of $(b_0, b_1, b_2, b_3, b_4)$. For example, in the case of $(b_0, b_1, b_2, b_3, b_4) = (16, 31, 23, 3, 17)$, we identify a trail of MORUS-640 with correlation $2^{-34}$, meaning that under our current cryptanalysis technique, this version is weaker than the original design.

5.1 Distinguishing Attack and Message-recovery Attack on MORUS

So far, for the sake of simplicity, we have assumed that all message blocks are zero. As already pointed out in [2], message variables only contribute linearly to the trails.


Therefore, under the condition that the involved message bits are kept constant, the trails we identified can be employed to mount two types of attacks. The first one is a (partially) known-plaintext distinguishing attack, where a large number of partially known plaintexts are encrypted, and then we can detect the bias from the ciphertexts. The second one is a message-recovery attack, in which we can recover some unknown plaintext bits if the same plaintext is encrypted many times. The scenario in which the message-recovery attack can be applied does happen in practice. For example, the same message can be encrypted with different IVs and potentially different keys in the so-called broadcast setting [22,1].

For the message-recovery attacks, we rely on the approach proposed by Matsui [24]. For example, if the correlation of the trail employed in our message-recovery attack is $2\rho$, we would encrypt an (unknown) message approximately $n$ times with different nonces or keys. Let $T_b$ be the number of encryptions such that the linear combination (derived from the trail) of the ciphertext bits is equal to $b \in \{0, 1\}$. Then we guess the value of the linear combination $L(M)$ of the message $M$ according to the following rule:

$$
L(M) =
\begin{cases}
0, & \text{if } T_0 > T_1 \text{ and } \rho > 0, \\
1, & \text{if } T_0 > T_1 \text{ and } \rho < 0, \\
1, & \text{if } T_1 > T_0 \text{ and } \rho > 0, \\
0, & \text{if } T_1 > T_0 \text{ and } \rho < 0.
\end{cases}
$$

The success probability of the procedure can be estimated as

$$\int_{-2\sqrt{n}|\rho|}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx,$$

which would be greater than 84.1% if we set $n > \frac{1}{4}|\rho|^{-2}$ [24]. Therefore, if the correlation of the underlying approximation is $2^{-c}$, we need about $2^{2c}$ encryptions to mount the attack.
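The decision rule and the success estimate can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names are ours, $\rho$ is assumed to be the bias (half the correlation), which is the reading under which the figures in the text balance, and a tie $T_0 = T_1$ (not covered by the rule) is arbitrarily resolved as if $T_1 > T_0$.

```python
import math

def guess_parity(t0, t1, rho):
    """Guess L(M): the majority value if rho > 0, its complement if rho < 0."""
    majority = 0 if t0 > t1 else 1  # ties fall to 1 (our arbitrary choice)
    return majority if rho > 0 else 1 - majority

def success_probability(n, rho):
    """Standard normal mass on [-2*sqrt(n)*|rho|, infinity)."""
    z = 2 * math.sqrt(n) * abs(rho)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Correlation 2^-38 corresponds to bias rho = 2^-39 under our assumption.
rho = 2.0 ** -39
n = 0.25 * rho ** -2  # the threshold n = |rho|^-2 / 4
print(math.log2(n), success_probability(n, rho))
```

At the threshold the integral starts at $-1$, which gives the 84.1% quoted in the text, and $n$ evaluates to $2^{76} = 2^{2c}$ for $c = 38$.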

On the data complexity. As pointed out by Ashur et al. [2], the data complexities of the attacks could be slightly lowered by using multiple linear trails [18,17,6]. Actually, given any trail found in this paper, we can derive another trail with the same correlation by rotating the masks within words by a common offset. If we assume independence, we could run $q/4$ (the word size) copies of the trail in parallel on the same encrypted blocks, which would save a factor of $2^5$ on the data complexity for MORUS-640, and $2^6$ for MORUS-1280.

5.2 Verification of the Attacks

To confirm the validity of our analysis, we experimentally verify the trails or trail fragments. For MiniMORUS, we are able to fully verify the correlations. Experiments show that the weights of the correlations of

$$C^0_{28} \oplus C^1_{27} \oplus C^1_{9} \oplus C^1_{1} \oplus C^2_{14} \oplus C^2_{8} \oplus C^2_{1} \oplus C^2_{0} \oplus C^3_{13}$$

and

$$C^0_{16} \oplus C^1_{62} \oplus C^1_{29} \oplus C^1_{20} \oplus C^2_{33} \oplus C^2_{29} \oplus C^2_{11} \oplus C^2_{2} \oplus C^3_{15}$$

for MiniMORUS-640 and MiniMORUS-1280 are 7.7919 and 8.1528 respectively, which are quite close to 8, the theoretically predicted weight.

For MORUS, the correlation of the best trails we find is $2^{-38}$, indicating that about $2^{76}$ encryptions have to be performed to verify the full trail, which is out of our reach. Following the approach presented in [2], we decompose the full trail into trail fragments according to Fig. 7, and every fragment is verified independently.

For MORUS-640 and MORUS-1280, the full trails can be divided into the five trail fragments shown in Table 7 and Table 8, respectively. We independently verify these trail fragments, and the results are given in Fig. 8a and Fig. 8b. Again, the results fit the theoretical analysis very well.

Table 7: The five trail fragments of MORUS-640

     Trail fragment                                                                                          Weight
χ1   $C^0_{124,92,60,28} \oplus C^1_{97,65,33,1} = S^1_{4,97,65,33,1} \oplus S^2_{1,96,64,32,0}$              7
χ2   $C^1_{123,91,59,27} \oplus C^2_{96,64,32,0} = S^2_{1,96,64,32,0}$                                        8
χ3   $C^2_{104,72,40,8} \oplus C^3_{109,77,45,13} = S^3_{1,109,77,45,13}$                                     8
χ4   $C^1_{105,73,41,9} \oplus C^2_{110,78,46,14} = S^3_{1,109,77,45,13} \oplus S^2_{4,110,78,46,14}$         7
χ5   $C^2_{97,65,33,1} = S^1_{4,97,65,33,1} \oplus S^2_{4,110,78,46,14}$                                      8

Table 8: The five trail fragments of MORUS-1280

     Trail fragment                                                                                              Weight
χ1   $C^0_{208,144,80,16} \oplus C^1_{221,157,93,29} = S^1_{4,221,157,93,29} \oplus S^2_{1,203,139,75,11}$        7
χ2   $C^1_{254,190,126,62} \oplus C^2_{203,139,75,11} = S^2_{1,203,139,75,11}$                                    8
χ3   $C^2_{194,130,66,2} \oplus C^3_{207,143,79,15} = S^3_{1,207,143,79,15}$                                      8
χ4   $C^1_{212,148,84,20} \oplus C^2_{225,161,97,33} = S^3_{1,207,143,79,15} \oplus S^2_{4,225,161,97,33}$        7
χ5   $C^2_{221,157,93,29} = S^1_{4,221,157,93,29} \oplus S^2_{4,225,161,97,33}$                                   8

6 Searching for Trails with Smaller Spans

In [2], it is said that ciphertext correlations like those presented in previous sections can be exploited only when the same message is encrypted enough times:

“... they can be leveraged to mount an attack in the broadcast setting, where the same message is encrypted multiple times with different IVs and potentially different keys [23]. In particular, the broadcast setting appears in practice in man-in-the-browser attacks against HTTPS connections following the BEAST model [11].”

However, we find that this strong condition can be relaxed. Let us recall Fig. 4, and consider a trail with a 4-block span. If we encrypt a set of $n$-block ($n > 4$) messages sharing a common 4-block prefix $M_0 \| M_1 \| M_2 \| M_3$, then the analysis presented in previous sections is completely independent of the message blocks beyond this common prefix. In fact, if we encrypt $M_0 \| M_1 \| M_2 \| M_3$ and $M_0 \| M_1 \| M_2 \| M_3 \| \cdots \| M_{n-1}$ with the same key, nonce, and associated data, the same intermediate values and ciphertexts will be produced within the 4-block span. Therefore, we can draw the conclusion that correlations involving $k$ ciphertext blocks can be leveraged to mount an attack if enough messages with a $k$-block common prefix are encrypted with different IV ‖ key.

[Figure: two plots of the weight of the correlation (vertical axis, 0 to 8) for the trail fragments χ1 to χ5 (horizontal axis), comparing predicted and measured values. (a) MORUS-640. (b) MORUS-1280.]

Fig. 8: Experimental verification of the trail fragments of MORUS-640 and MORUS-1280

Note that the above condition is strictly weaker than that presented at ASIACRYPT 2018 [2], and this setting does occur in practice. For example, when ARP packets are encrypted in WPA2-AES enabled Wi-Fi networks, they share a 16-byte common prefix (8-byte LLC header and 8-byte ARP request header) [10]. This 16-byte common prefix extends to 22 bytes if the attacker is able to control the following 6-byte MAC address, which is not difficult to carry out [36]. Therefore, trails with smaller spans are preferable, which motivates us to search for linear trails with smaller spans. The best trail we find with respect to the number of ciphertext blocks involved (the span) is a trail of MORUS-640 with correlation $2^{-79}$, whose span is 3 (see Table 9). However, the correlation is too low to be used in an attack.

The discussion of this section also indicates that the trails we find are superior to the ones presented in [2] in terms of both correlation and span. Moreover, since given a trail found in this paper we can derive another trail with the same correlation by rotationally shifting the masks within words by a common offset, we can identify the shifting offset minimizing the number of trailing zeros in the masks of the last block, which may further reduce the size of the common prefix. For example, by shifting the trail of MORUS-640 shown in Table 6, we obtain the trail shown in Table 10, which requires only a 481-bit common prefix when used in an attack.
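The rotation relating the two trails can be recovered from the key-stream masks alone. The following sketch is ours: it compares only the $\lambda_i$ masks transcribed from Table 6 and Table 10, treats each mask as a 32-bit word with the usual integer bit numbering (an assumption about the convention), and searches for a common left-rotation offset.

```python
def rotl32(x, r):
    """Rotate a 32-bit word left by r positions."""
    r %= 32
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF

# Key-stream masks lambda_0..lambda_3 (one 32-bit word of each "*4" mask).
lam_table6 = [0x10000000, 0x08000202, 0x00004103, 0x00002000]
lam_table10 = [0x00004000, 0x08082000, 0x040C0001, 0x80000000]

# Offsets that map every Table 6 mask onto the corresponding Table 10 mask.
offsets = [
    r for r in range(32)
    if all(rotl32(a, r) == b for a, b in zip(lam_table6, lam_table10))
]
print(offsets)
```

Under this reading, a single offset maps all four key-stream masks of Table 6 onto those of Table 10 and places the only active bit of $\lambda_3$ at the top of each word.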


Table 9: A linear trail of MORUS-640 with correlation $2^{-79}$ whose span is 3

Round  Linear masks
0      α0  00002520*4 00002520*4 00002020*4 00002520*4 00000000*4
           0004a400*4 00000000*4 00002020*4 00000000*4 00000000*4
           0004a400*4 00000000*4 00002020*4 00000000*4 00000000*4
           00048420*4 00000000*4 00101000*4 00000020*4 00000020*4
           00048400*4 00000020*4 00101000*4 08000000*4 00000020*4
       β0  00048420*4 00000000*4 00101020*4 08000000*4 00040000*4
       γ0  00002520*4 00002520*4 00002020*4 00002520*4 00000000*4
       λ0  00002520*4
1      α1  00009420*4 00041000*4 00140020*4 08041000*4 00040000*4
           00128400*4 00040000*4 00140400*4 08048420*4 00040000*4
           00128400*4 00020000*4 00100400*4 08008420*4 00000000*4
           00028000*4 00020000*4 08020000*4 08008020*4 00000000*4
           08028020*4 08028020*4 08020000*4 08020020*4 00000000*4
       β1  08028020*4 08028020*4 08020000*4 08020020*4 00000000*4
       γ1  00041000*4 00041000*4 00041000*4 00041000*4 00000000*4
       λ1  00041000*4
2      γ2  08028020*4 08028020*4 08020000*4 08020020*4 00000000*4
       λ2  08028020*4

To take it one step further, the identical message blocks required in the attack do not need to be located at the beginning. A common suffix works as well as a common prefix, and any four consecutive common blocks work.

7 Conclusion and Open Problems

In this work, we propose a polynomial-time algorithm for computing the correlation of a quadratic Boolean function based on its disjoint quadratic form. This method is employed to determine the correlations of the linear trails of MiniMORUS and MORUS that we find by solving MILP problems derived from a generic helper model for MORUS-like key-stream generators.

As a result, a set of trails involving four blocks of ciphertext with correlation $2^{-38}$ is identified for all versions of full MORUS, which leads to the first distinguishing and message-recovery attacks on MORUS-640-128 and MORUS-1280-128. We also observe that the condition specified in [2] to launch the attacks can be relaxed, and this relaxation shows that our trails are superior to those presented in previous work not only in terms of correlation, but also in terms of the number of ciphertext blocks involved.

At this point, it is natural to ask some open questions. Firstly, is it possible to efficiently compute the correlation of Boolean functions with degrees higher than two? We believe that an efficient algorithm solving this problem would have a significant effect on cryptanalysis. Secondly, can we find good trails for MORUS which are not rotationally invariant?

Table 10: A linear trail of MORUS-640 with correlation $2^{-38}$

Round  Linear masks
0      α0  00004000*4 00004000*4 00000000*4 00004000*4 00000000*4
           00080000*4 00000000*4 00000000*4 00000000*4 00000000*4
           00080000*4 00000000*4 00000000*4 00000000*4 00000000*4
           00080000*4 00000000*4 00000000*4 00000000*4 00000000*4
           00080000*4 00000000*4 00000000*4 00000000*4 00000000*4
       β0  00080000*4 00000000*4 00000000*4 00000000*4 00000000*4
       γ0  00004000*4 00004000*4 00000000*4 00004000*4 00000000*4
       λ0  00004000*4
1      α1  08002000*4 08082000*4 00000000*4 08082000*4 00000000*4
           00040001*4 00080000*4 00000000*4 00080000*4 00000000*4
           00040001*4 00040000*4 00000000*4 00000000*4 00080000*4
           00040001*4 00040000*4 00000000*4 00000000*4 00080000*4
           00040001*4 00040000*4 00000000*4 00000000*4 00080000*4
       β1  000c0001*4 000c0000*4 00080000*4 00000000*4 00000001*4
       γ1  08082000*4 08082000*4 00000000*4 08082000*4 00000000*4
       λ1  08082000*4
2      α2  04000000*4 04000001*4 00000000*4 04000000*4 00000001*4
           80000000*4 00000001*4 00000000*4 00000000*4 00000001*4
           80000000*4 80000000*4 00000000*4 00000000*4 00000000*4
           80000000*4 80000000*4 00000000*4 00000000*4 00000000*4
           80000000*4 80000000*4 00000000*4 00000000*4 00000000*4
       β2  80000000*4 80000000*4 00000000*4 00000000*4 00000000*4
       γ2  040c0001*4 040c0001*4 00080000*4 04000000*4 00000000*4
       λ2  040c0001*4
3      γ3  80000000*4 80000000*4 00000000*4 00000000*4 00000000*4
       λ3  80000000*4

Acknowledgment. The authors thank the anonymous reviewers for many helpful comments. The work is supported by the National Key R&D Program of China (Grant No. 2018YFB0804402), the Chinese Major Program of National Cryptography Development Foundation (Grant No. MMJJ20180102), the National Natural Science Foundation of China (61732021, 61802400, 61772519, 61802399), and the Youth Innovation Promotion Association of Chinese Academy of Sciences. Chaoyun Li is supported by the Research Council KU Leuven: C16/15/058, OT/13/071, and by European Union’s Horizon 2020 research and innovation programme (No. H2020-MSCA-ITN-2014-643161 ECRYPT-NET).

References

1. AlFardan, N.J., Bernstein, D.J., Paterson, K.G., Poettering, B., Schuldt, J.C.N.: On the security of RC4 in TLS. In: Proceedings of the 22nd USENIX Security Symposium, Washington, DC, USA, August 14-16, 2013. pp. 305–320 (2013)

2. Ashur, T., Eichlseder, M., Lauridsen, M.M., Leurent, G., Minaud, B., Rotella, Y., Sasaki, Y., Viguier, B.: Cryptanalysis of MORUS. In: Advances in Cryptology - ASIACRYPT 2018 - 24th International Conference on the Theory and Application of Cryptology and Information Security, Brisbane, QLD, Australia, December 2-6, 2018, Proceedings, Part II. pp. 35–64 (2018)

3. Bar-On, A., Dunkelman, O., Keller, N., Weizman, A.: DLCT: A new tool for differential-linear cryptanalysis. IACR Cryptology ePrint Archive 2019, 256 (2019), https://eprint.iacr.org/2019/256, accepted to EUROCRYPT 2019

4. Bellare, M., Namprempre, C.: Authenticated encryption: Relations among notions and analysis of the generic composition paradigm. In: Advances in Cryptology - ASIACRYPT 2000, 6th International Conference on the Theory and Application of Cryptology and Information Security, Kyoto, Japan, December 3-7, 2000, Proceedings. pp. 531–545 (2000)

5. Bellare, M., Namprempre, C.: Authenticated encryption: Relations among notions and analysis of the generic composition paradigm. J. Cryptology 21(4), 469–491 (2008)

6. Biryukov, A., Canniere, C.D., Quisquater, M.: On multiple linear approximations. In: Advances in Cryptology - CRYPTO 2004, 24th Annual International Cryptology Conference, Santa Barbara, California, USA, August 15-19, 2004, Proceedings. pp. 1–22 (2004)

7. CAESAR: Call for Submission: http://competitions.cr.yp.to/

8. Carlitz, L.: Gauss sums over finite fields of order $2^n$. Acta Arithmetica 15(3), 247–265 (1969)

9. Dobraunig, C., Eichlseder, M., Mendel, F.: Heuristic tool for linear cryptanalysis with applications to CAESAR candidates. In: Advances in Cryptology - ASIACRYPT 2015 - 21st International Conference on the Theory and Application of Cryptology and Information Security, Auckland, New Zealand, November 29 - December 3, 2015, Proceedings, Part II. pp. 490–509 (2015)

10. Domonkos, T.P., Lueg, L.: Taking a different approach to attack WPA2-AES, or the born of the CCMP known-plain-text attack (2010), https://www.hwsw.hu/kepek/hirek/2011/05/wpa2aes_ccmp_known_plaintext.pdf

11. Duong, T., Rizzo, J.: Here come the ⊕ ninjas. Ekoparty (2011)

12. Dwivedi, A.D., Kloucek, M., Morawiecki, P., Nikolic, I., Pieprzyk, J., Wojtowicz, S.: SAT-based cryptanalysis of authenticated ciphers from the CAESAR competition. In: Proceedings of the 14th International Joint Conference on e-Business and Telecommunications (ICETE 2017) - Volume 4: SECRYPT, Madrid, Spain, July 24-26, 2017. pp. 237–246 (2017)

13. Dwivedi, A.D., Morawiecki, P., Wojtowicz, S.: Differential and rotational cryptanalysis of round-reduced MORUS. In: Proceedings of the 14th International Joint Conference on e-Business and Telecommunications (ICETE 2017) - Volume 4: SECRYPT, Madrid, Spain, July 24-26, 2017. pp. 275–284 (2017)

14. Early Symmetric Crypto workshop (ESC 2013): https://www.cryptolux.org/mediawiki-esc2013/index.php/ESC_2013

15. Ehrenfeucht, A., Karpinski, M.: The computational complexity of (XOR, AND)-counting problems. Tech. rep., International Computer Science Institute (1990)

16. Fu, K., Wang, M., Guo, Y., Sun, S., Hu, L.: MILP-based automatic search algorithms for differential and linear trails for Speck. In: Fast Software Encryption - 23rd International Conference, FSE 2016, Bochum, Germany, March 20-23, 2016, Revised Selected Papers. pp. 268–288 (2016)

17. Jr., B.S.K., Robshaw, M.J.B.: Linear cryptanalysis using multiple approximations. In: Advances in Cryptology - CRYPTO ’94, 14th Annual International Cryptology Conference, Santa Barbara, California, USA, August 21-25, 1994, Proceedings. pp. 26–39 (1994)

18. Jr., B.S.K., Robshaw, M.J.B.: Linear cryptanalysis using multiple approximations and FEAL. In: Fast Software Encryption: Second International Workshop. Leuven, Belgium, 14-16 December 1994, Proceedings. pp. 249–264 (1994)

19. Kales, D., Eichlseder, M., Mendel, F.: Note on the robustness of CAESAR candidates. IACR Cryptology ePrint Archive 2017, 1137 (2017), http://eprint.iacr.org/2017/1137

20. Langford, S.K., Hellman, M.E.: Differential-linear cryptanalysis. In: Advances in Cryptology - CRYPTO ’94, 14th Annual International Cryptology Conference, Santa Barbara, California, USA, August 21-25, 1994, Proceedings. pp. 17–25 (1994)

21. Li, Y., Wang, M.: Cryptanalysis of MORUS. Des. Codes Cryptography (2018), https://doi.org/10.1007/s10623-018-0501-6

22. Mantin, I., Shamir, A.: A practical attack on broadcast RC4. In: Fast Software Encryption, 8th International Workshop, FSE 2001 Yokohama, Japan, April 2-4, 2001, Revised Papers. pp. 152–164 (2001)

23. Mantin, I., Shamir, A.: A practical attack on broadcast RC4. In: Fast Software Encryption – FSE 2001. LNCS, vol. 2355, pp. 152–164. Springer (2001)

24. Matsui, M.: Linear cryptanalysis method for DES cipher. In: Helleseth, T. (ed.) Advances in Cryptology – EUROCRYPT 1993. LNCS, vol. 765, pp. 386–397. Springer (1993)

25. Matsui, M.: On correlation between the order of S-boxes and the strength of DES. In: Advances in Cryptology – EUROCRYPT 1994. pp. 366–375. Springer (1995)

26. Mileva, A., Dimitrova, V., Velichkov, V.: Analysis of the authenticated cipher MORUS (v1). In: Cryptography and Information Security in the Balkans - Second International Conference, BalkanCryptSec 2015, Koper, Slovenia, September 3-4, 2015, Revised Selected Papers. pp. 45–59 (2015)

27. Minaud, B.: Linear biases in AEGIS keystream. In: Joux, A., Youssef, A.M. (eds.) Selected Areas in Cryptography – SAC 2014. LNCS, vol. 8781, pp. 290–305. Springer (2014)

28. Mirwald, R., Schnorr, C.: The multiplicative complexity of quadratic boolean forms. Theor. Comput. Sci. 102(2), 307–328 (1992)

29. Rogaway, P.: Authenticated-encryption with associated-data. In: Proceedings of the 9th ACM Conference on Computer and Communications Security, CCS 2002, Washington, DC, USA, November 18-22, 2002. pp. 98–107 (2002)

30. Rogaway, P.: Nonce-based symmetric encryption. In: Fast Software Encryption, 11th International Workshop, FSE 2004, Delhi, India, February 5-7, 2004, Revised Papers. pp. 348–359 (2004)

31. Rogaway, P., Shrimpton, T.: A provable-security treatment of the key-wrap problem. In: Advances in Cryptology - EUROCRYPT 2006, 25th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28 - June 1, 2006, Proceedings. pp. 373–390 (2006)

32. Salam, M.I., Simpson, L., Bartlett, H., Dawson, E., Pieprzyk, J., Wong, K.K.: Investigating cube attacks on the authenticated encryption stream cipher MORUS. In: 2017 IEEE Trustcom/BigDataSE/ICESS, Sydney, Australia, August 1-4, 2017. pp. 961–966 (2017)

33. Shi, T., Guan, J., Li, J., Zhang, P.: Improved collision cryptanalysis of authenticated cipher MORUS. Artificial Intelligence and Industrial Engineering – AIIE pp. 429–432 (2016)

34. Strassen, V.: Gaussian elimination is not optimal. Numerische Mathematik 13(4), 354–356 (1969)

35. Sun, S., Hu, L., Wang, P., Qiao, K., Ma, X., Song, L.: Automatic security evaluation and (related-key) differential characteristic search: Application to SIMON, PRESENT, LBlock, DES(L) and other bit-oriented block ciphers. In: Advances in Cryptology - ASIACRYPT 2014 - 20th International Conference on the Theory and Application of Cryptology and Information Security, Kaoshiung, Taiwan, R.O.C., December 7-11, 2014. Proceedings, Part I. pp. 158–178 (2014)

36. Tews, E., Weinmann, R., Pyshkin, A.: Breaking 104-bit WEP in less than 60 seconds. In: Information Security Applications, 8th International Workshop, WISA 2007, Jeju Island, Korea, August 27-29, 2007, Revised Selected Papers. pp. 188–202 (2007)

37. Todo, Y., Isobe, T., Meier, W., Aoki, K., Zhang, B.: Fast correlation attack revisited - cryptanalysis on full Grain-128a, Grain-128, and Grain-v1. In: Advances in Cryptology - CRYPTO 2018 - 38th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 19-23, 2018, Proceedings, Part II. pp. 129–159 (2018)

38. Vaudenay, S., Vizar, D.: Under pressure: Security of CAESAR candidates beyond their guarantees. IACR Cryptology ePrint Archive 2017, 1147 (2017), http://eprint.iacr.org/2017/1147

39. Wu, H., Huang, T.: The authenticated cipher MORUS (v2). Submission to CAESAR: Competition for Authenticated Encryption. Security, Applicability, and Robustness (Round 3 and Finalist) (2016), https://competitions.cr.yp.to/round3/morusv2.pdf

Chapter 10

Improved Interpolation Attacks on Cryptographic Primitives of Low Algebraic Degree

Publication data

Chaoyun Li and Bart Preneel: Improved Interpolation Attacks on Cryptographic Primitives of Low Algebraic Degree. Selected Areas in Cryptography 2019: 171-193, 2019

Contributions

Principal author


Improved Interpolation Attacks on Cryptographic Primitives of Low Algebraic Degree

Chaoyun Li and Bart Preneel

imec-COSIC, Dept. Electrical Engineering (ESAT), KU Leuven, Leuven

Abstract. Symmetric cryptographic primitives with low multiplicative complexity have been proposed to improve the performance of emerging applications such as secure Multi-Party Computation. However, primitives composed of round functions with low algebraic degree require a careful evaluation to assess their security against algebraic cryptanalysis, and in particular interpolation attacks. This paper proposes new low-memory interpolation attacks on symmetric key primitives of low degree. Moreover, we present generic attacks on block ciphers with a simple key schedule; our attacks require either constant memory or constant data complexity. The improved attack is applied to the block cipher MiMC, which aims to minimize the number of multiplications in large finite fields. As a result, we can break MiMC-129/129 with 38 rounds with time and data complexity $2^{65.5}$ and $2^{60.2}$ respectively and with negligible memory; this attack invalidates one of the security claims of the designers. Our attack indicates that for MiMC-129/129 the full 82 rounds are necessary even with restrictions on the memory available to the attacker. For variants of MiMC with larger keys, we present new attacks with reduced complexity. Our results do not affect the security claims of the full round MiMC.

Keywords: Block cipher, Cryptanalysis, Interpolation attack, MiMC

1 Introduction

Symmetric cryptographic primitives have been widely employed to provide confidentiality and authenticity for communicated and stored data [24]. Recently, they have found new applications in advanced cryptographic protocols for computing on encrypted data, such as secure Multi-Party Computation (MPC), Zero-Knowledge proofs (ZK) and Fully Homomorphic Encryption (FHE). The adoption of dedicated symmetric key primitives turns out to be vital to improve the efficiency of these protocols. The main design goal is to minimize the multiplicative complexity (MC), i.e., to minimize the number of multiplications in a circuit and/or the multiplicative depth of the circuit. However, traditional block ciphers, stream ciphers and hash functions are typically not designed to minimize these parameters; on the contrary, having high multiplicative depth is seen as an important requirement to achieve strong security.

Many new symmetric primitives have been proposed in the context of MPC, ZK, or FHE schemes [5, 15, 8, 3]. The block cipher LowMC [5] is one of the earliest designs dedicated to FHE and MPC applications. With very small multiplicative size and depth, it outperforms AES-128 in computation and communication complexity for these applications. The stream ciphers Kreyvium [8] and FLIP [20] have been designed to minimize the AND-depth of the circuit. Indeed, they aim to provide practical solutions for efficient homomorphic-ciphertext compression [8, 20]. A new family of stream ciphers, Rasta [12], intends to achieve both minimum AND-depth and minimum number of AND gates per encrypted bit.

MiMC, proposed by Albrecht et al. in 2016 [3, 4], is dedicated to applications for which the total number of field multiplications in the underlying cryptographic primitive poses the largest performance bottleneck. More specifically, MiMC aims to minimize multiplications in the larger fields $\mathbb{F}_{2^n}$ and $\mathbb{F}_p$. Indeed, MiMC outperforms both AES and LowMC in applications such as MPC [15], Succinct Non-interactive Arguments of Knowledge (SNARKs) [7], and Scalable Transparent ARguments of Knowledge (STARKs) [6]. New variants of MiMC, such as GMiMC [2], have been constructed by inserting the original design into generalized Feistel structures.

However, the security of MiMC is not well understood. Due to the simple algebraic structure and the large number of rounds, the security evaluation of MiMC has been focusing on algebraic attacks such as interpolation attacks and Gröbner basis attacks [3, 1]. In the design paper, the authors first consider the classical interpolation attack. Moreover, the so-called GCD attack has been introduced. With this new technique, new lower bounds on the number of rounds have been derived. However, there is a need for further work to assess the security of round-reduced MiMC and to find tighter lower bounds on the number of rounds.

Our Contributions. This paper presents novel attacks against primitives with low algebraic degree. The first new attack is based on an observation from Sun et al. [27]. It introduces novel interpolation attacks with constant memory complexity: some key-dependent terms of the interpolated polynomial are determined directly, without constructing the complete polynomial. Then we propose an algorithm with constant memory for recovering the second highest order coefficient, resulting in an efficient key recovery attack.

The second new attack exploits a simple cyclic key schedule. The master key is $k_0 \| k_1 \| \cdots \| k_{\ell-1}$ and the round keys are given by $k_i = k_{i \bmod \ell} + c_i$, where the $c_i$'s are constants that are chosen independently. For this specific key schedule, we present generic attacks with either constant memory or constant data complexity. Our attacks follow a guess-and-determine strategy. After guessing $(\ell - 1)$ subkeys, we apply state-of-the-art key recovery attacks to the reduced cipher. The advantage of our strategy is that we can keep the data and memory complexity of the whole attack as low as those of the attack on the reduced cipher. The results of our attacks are summarized in Table 1.
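The cyclic key schedule can be made concrete with a minimal Python sketch. It assumes a toy prime field $\mathbb{F}_p$ with $p = 101$ standing in for $\mathbb{F}_q$; the function name `cyclic_round_keys` and the sample values are hypothetical and serve only as illustration.

```python
def cyclic_round_keys(master_key, constants, p):
    """Round keys k_i = k_{i mod ell} + c_i over F_p (illustrative sketch).

    master_key: list of ell subkeys (k_0, ..., k_{ell-1})
    constants:  list of round constants c_i, one per round key
    """
    ell = len(master_key)
    return [(master_key[i % ell] + c) % p for i, c in enumerate(constants)]


# Example with ell = 2 subkeys and five round keys over F_101.
keys = cyclic_round_keys([3, 5], [1, 2, 3, 4, 5], 101)
```

With $\ell = 2$, every even-indexed round key depends only on the first subkey and every odd-indexed one only on the second, which is the structure the guess-and-determine strategy relies on.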

As an illustration, we apply the new attacks to the block cipher MiMC. Specifically, we can break 38-round MiMC-129/129 with time complexity $2^{65.5}$, data complexity $2^{60.2}$ and negligible memory. Our results refute the claim of the MiMC designers who consider attacks with less than $2^{64}$ bytes memory and conclude [4, p. 17]: “38 rounds are sufficient to protect MiMC-129/129 against the interpolation, the GCD and the other attacks. Time-memory trade-offs might well be possible, and we leave this as a topic for future research.” Our attack simply reduces memory while keeping the time complexity at the same value, hence we show that there is no trade-off. Further, our attack indicates that for MiMC-n/n over $\mathbb{F}_q$ the number of rounds cannot be smaller than $\left\lceil \frac{\log_2(q)}{\log_2(3)} \right\rceil$ even if there is a restriction on the memory available to the attacker.

For a two-key version of MiMC-n/n, the best attack described by the designers has complexity $O(3^{3r})$. The designers further claimed that the bound can be improved by a Meet-In-The-Middle (MITM) attack [4, p. 18], but they offer no details. By applying our generic attack to the concrete design, the complexity can be reduced to $O(r3^r)$ if $r \le \left\lceil \frac{n}{\log_2(3)} \right\rceil - 1$ and $O(r3^{2r-1})$ if $r \ge \left\lceil \frac{n}{\log_2(3)} \right\rceil$. Our reduced bound is the first tighter bound based on specific attacks.

To the best of our knowledge, our analysis of MiMC is the first third party cryptanalysis of MiMC.

Related Work. MiMC has a very simple round function $F_i(x) := (x + k + c_i)^3$. This design is inspired by the KN cipher of Nyberg and Knudsen [22] and the PURE cipher of Jakobsen and Knudsen, which is a simplified variant of the KN cipher [16]. The KN cipher is a prototype cipher which is provably secure against linear and differential attacks. However, Jakobsen and Knudsen showed that the KN cipher is vulnerable to higher-order differential attacks [16]. The same authors introduced interpolation attacks and applied the new method to assess the security of PURE [16, 17].

However, neither the higher-order differential attack [19, 18] nor the classical interpolation attack is applicable to MiMC. In both attacks, one needs to guess the last round key, which is exactly the master key of MiMC. Thus, one already reaches the complexity of exhaustive key search. By contrast, our low-memory interpolation attack does not need to guess any round key; it is the first low-memory attack applicable to round-reduced MiMC.

Interpolation attacks are known to be efficient against primitives with operations over a large finite field. To improve the attack on bit-oriented primitives, Dinur et al. [10] proposed the optimized interpolation attack, which breaks the first version of LowMC. The optimized interpolation attacks exploit higher-order differential properties, building on Shimoyama et al. [25]. As pointed out by the designers of MiMC, the degree of any state bit rises quickly when the round function is viewed as a vectorial Boolean function. This makes it impossible to obtain higher-order differentials of MiMC after a few rounds. Hence, the optimized interpolation attacks on MiMC would be infeasible.

Recently, Rechberger et al. have introduced difference enumeration techniques to analyze the full LowMC v2 [23]. In order to counter this attack, a new version called LowMC v3 was proposed [5].

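Table 1. Attacks on $r$-round key-alternating and Feistel network ciphers with round function of degree $d$ over $\mathbb{F}_q$. For $\ell > 1$, we cyclically add $\ell$ independent subkeys in each round.

| Type | Key size | Time | Memory | Data | Ref. |
|---|---|---|---|---|---|
| Key-alternating | $q$ | $O(rd^r)$ | $O(rd^r)$ | $d^r + 1$ | [4] |
| Key-alternating | $q$ | $O(r^2 d^r)$ | $O(rd^r)$ | $3$ | [4] |
| Key-alternating | $q$ | $O(rd^r)$ ♯ | $O(1)$ | $d^r + 1$ | Sect. 3.3 |
| Key-alternating | $q^\ell$ | $O(rd^r)$ ♯ | $O(1)$ | $d^r + 1$ | Sect. 4.1 |
| Key-alternating | $q^\ell$ | $O(R_{\mathrm{KA}}(r,\ell)\, d^{R_{\mathrm{KA}}(r,\ell)}\, q^{\ell-1})$ † ♯ | $O(1)$ | $d^{R_{\mathrm{KA}}(r,\ell)} + 1$ | Sect. 4.2 |
| Key-alternating | $q^\ell$ | $O(R_{\mathrm{KA}}(r,\ell)^2\, d^{R_{\mathrm{KA}}(r,\ell)}\, q^{\ell-1})$ | $O(R_{\mathrm{KA}}(r,\ell)\, d^{R_{\mathrm{KA}}(r,\ell)})$ | $3$ | Sect. 4.2 |
| Feistel network | $q$ | $O(\lfloor r/2 \rfloor^2\, d^{\lfloor r/2 \rfloor})$ | $O(\lfloor r/2 \rfloor\, d^{\lfloor r/2 \rfloor})$ | $3$ | [4] |
| Feistel network | $q$ | $O(rd^{r-2})$ ¶ | $O(1)$ | $d^{r-2} + 1$ | Sect. 3.3 |
| Feistel network | $q^\ell$ | $O(rd^{r-2})$ ¶ | $O(1)$ | $d^{r-2} + 1$ | Sect. 4.1 |
| Feistel network | $q^\ell$ | $O(R_{\mathrm{FN}}(r,\ell)\, d^{R_{\mathrm{FN}}(r,\ell)-2}\, q^{\ell-1})$ ¶ ‡ | $O(1)$ | $d^{R_{\mathrm{FN}}(r,\ell)-2} + 1$ | Sect. 4.2 |
| Feistel network | $q^\ell$ | $O(\lfloor R_{\mathrm{FN}}(r,\ell)/2 \rfloor^2\, d^{\lfloor R_{\mathrm{FN}}(r,\ell)/2 \rfloor}\, q^{\ell-1})$ | $O(\lfloor R_{\mathrm{FN}}(r,\ell)/2 \rfloor\, d^{\lfloor R_{\mathrm{FN}}(r,\ell)/2 \rfloor})$ | $3$ | Sect. 4.2 |

♯ $r \le \lceil \log_d(q-1) \rceil + \ell - 2$
¶ $r \le \lceil \log_d(q-1) \rceil + \ell$
† $R_{\mathrm{KA}}(r,\ell) = \left(\left\lfloor \frac{r+1}{\ell} \right\rfloor - 1\right)\ell$
‡ $R_{\mathrm{FN}}(r,\ell) = 1 + \left(\left\lfloor \frac{r}{\ell} \right\rfloor - 1\right)\ell$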

We conclude the related work by briefly recalling some recent work on the dedicated low MC stream ciphers Kreyvium and FLIP. Cube attacks [11] and guess-and-determine attacks are common techniques for the cryptanalysis of stream ciphers. Cube attacks based on the division property have been introduced by Todo et al. [28] and further improved by Wang et al. [29]. They yield the current best key recovery attack on round-reduced Kreyvium. A preliminary version of the stream cipher FLIP [20] has been broken by guess-and-determine attacks [13]. This has resulted in more conservative parameters of the design.

The remainder of this paper is organized as follows. In Sect. 2, we introduce iterated ciphers and recall some classical polynomial algorithms. In Sect. 3, new low-memory interpolation attacks are presented. Section 4 proposes attacks on ciphers with simple key schedules. Applications of our attacks to MiMC are provided in Sect. 5. The final section concludes the paper.

2 Preliminaries

In this section, key-alternating ciphers and Feistel ciphers are presented. We also recall some polynomial algorithms, which will be used in the sequel.

Notation. We will use the following notation in the sequel.

– Let $\mathbb{F}_q$ be the finite field with $q$ elements, where $q$ is a prime power.

– The symbol “+” stands for addition in the finite field $\mathbb{F}_q$. It can also denote integer addition; we trust that the meaning will be clear from the context.

– $d$ is the degree of the round function $F(x)$, where $d > 1$

– $r$ represents the number of rounds of a block cipher

– $\kappa$ is the size of key space in bits

– $R(d, q) = \lceil \log_d(q - 1) \rceil$

– T/M/D represent time, memory and data complexities of an attack respectively
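As a quick sanity check of the notation, the quantity $R(d, q)$ can be computed with exact integer arithmetic. The sketch below is illustrative only; the helper names `ceil_log` and `R` are chosen here and do not come from the paper.

```python
def ceil_log(base, n):
    """Smallest integer e >= 0 with base**e >= n, i.e. ceil(log_base(n)).

    Uses exact integer arithmetic, so it is safe for very large n.
    """
    e, power = 0, 1
    while power < n:
        power *= base
        e += 1
    return e


def R(d, q):
    """R(d, q) = ceil(log_d(q - 1)) from the notation list."""
    return ceil_log(d, q - 1)


# Parameters of MiMC-129/129: q = 2^129 and d = 3.
rounds_mimc_129 = R(3, 2**129)
```

For $q = 2^{129}$ and $d = 3$ this evaluates to 82, which agrees with the 82 rounds of the full MiMC-129/129 mentioned in the abstract.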

2.1 Basic Constructions for Block Ciphers

An $r$-round key-alternating (KA) cipher is constructed by iterating a round function $r$ times, where each round consists of a key addition and the application of a nonlinear function $F$. The ciphertext is obtained by adding a final key $k_r$ to the output of the last round. Let the round function be $F_i(x) = F(x + k_i)$. Then the encryption process is given by

$$E_k(x) = (F_{r-1} \circ F_{r-2} \circ \cdots \circ F_0)(x) + k_r, \tag{1}$$

where $k$ is the master key, $k_i$ is the $i$-th round key derived from $k$ by a key schedule algorithm, and $x$ and $E_k(x)$ are plaintext and ciphertext, respectively. An $r$-round KA cipher is depicted in Fig. 1.

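[Figure: the key schedule algorithm derives the round keys $k_0, k_1, \ldots, k_{r-1}, k_r$ from the master key $k$; the input $x$ passes through $r$ applications of $F$, each preceded by a key addition, and $k_r$ is added to produce the output $y$.]

Fig. 1. A key-alternating cipher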
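Equation (1) can be sketched in a few lines of Python. This is a toy instance under stated assumptions: the field is the prime field $\mathbb{F}_{101}$ (so that $x^3$ is a permutation, since $\gcd(3, 100) = 1$), the round keys are passed in directly instead of being derived by a key schedule, and the names `P`, `F` and `ka_encrypt` are illustrative.

```python
P = 101  # toy prime field size; gcd(3, P - 1) = 1, so x -> x^3 is a permutation


def F(x):
    """Nonlinear round function F(x) = x^3 over F_P (MiMC-like toy choice)."""
    return pow(x, 3, P)


def ka_encrypt(x, round_keys):
    """Key-alternating encryption following Eq. (1).

    round_keys = [k_0, ..., k_{r-1}, k_r]: r keys added before each
    application of F, plus the final key k_r added at the end.
    """
    *inner_keys, final_key = round_keys
    for k in inner_keys:
        x = F((x + k) % P)
    return (x + final_key) % P
```

Because every step is a bijection of $\mathbb{F}_{101}$, the whole map is a permutation for any choice of round keys, which is the invertibility requirement discussed for KA ciphers.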

An $r$-round Feistel Network (FN) cipher consists of the $r$-round repetition of a round function $F$ and a swap:

$$x^L_i = x^R_{i-1}, \tag{2}$$

$$x^R_i = F(k_i + x^R_{i-1}) + x^L_{i-1}, \tag{3}$$

where $x = x^L_0 \| x^R_0$ is the plaintext, and the ciphertext is $x^R_r \| x^L_r$ since the swap operation is not applied in the last round. One round of an FN cipher is depicted in Fig. 2.

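[Figure: one Feistel round mapping $(x^L_{i-1}, x^R_{i-1})$ to $(x^L_i, x^R_i)$ using the round function $F$ and the round key $k_i$.]

Fig. 2. One round of a Feistel network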
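The Feistel recurrences (2) and (3) can be sketched as follows. The sketch assumes a toy prime field $\mathbb{F}_{101}$, takes the round keys as a plain list in the order they are applied, and deliberately uses $F(x) = x^2$, which is not a permutation, to illustrate that decryption works regardless; all names are illustrative.

```python
P = 101  # toy prime field size for each half of the state


def F(x):
    """Round function; x^2 is deliberately NOT a permutation of F_P."""
    return (x * x) % P


def fn_encrypt(left, right, round_keys):
    """Apply Eqs. (2)-(3) once per round key, then output x^R_r || x^L_r."""
    for k in round_keys:
        left, right = right, (F((k + right) % P) + left) % P
    return right, left  # the last swap is undone


def fn_decrypt(c_first, c_second, round_keys):
    """Invert fn_encrypt; only F itself (never its inverse) is evaluated."""
    right, left = c_first, c_second  # ciphertext is (x^R_r, x^L_r)
    for k in reversed(round_keys):
        prev_right = left                                   # Eq. (2)
        prev_left = (right - F((k + prev_right) % P)) % P   # Eq. (3) solved for x^L_{i-1}
        left, right = prev_left, prev_right
    return left, right
```

Decryption only subtracts $F(k_i + x^R_{i-1})$, so the inverse round has the same degree as the forward round.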

In this paper, we always assume that the round function $F$ is a monic polynomial of degree $d$ over $\mathbb{F}_q$, i.e.,

$$F(x) = x^d + \sum_{i=0}^{d-1} a_i x^i, \tag{4}$$

where $d$ is a positive integer and $a_i \in \mathbb{F}_q$.

We associate to the parameters $q, \kappa, d, r$ (cf. supra) a KA cipher KA[q, κ, d, r]. Similarly, we define the FN cipher FN[q, κ, d, r]. Here $q$ is the size of only half of the state, i.e., the whole state has size $q^2$. It should be pointed out that we ignore the details of the polynomials since our attacks work on the generic constructions regardless of the concrete choice of the components.

How to Choose the Polynomial F(x). Since a block cipher must have invertible round functions, $F(x)$ needs to be a permutation polynomial for KA ciphers. For FN ciphers, there is no such restriction. Since $F(x)$ is the only nonlinear component, it must, for instance, have high nonlinearity and low differential uniformity to provide resistance against differential and linear attacks [21, 9]. For FHE- and MPC-friendly ciphers, an additional requirement is to minimize the number of multiplications in the implementation of $F(x)$. This motivates the choice of $F(x)$ with very low algebraic degree, such as $x^3$ in MiMC.

The Number of Rounds. Since we focus on ciphers with low degree components, a large number of rounds is needed to protect against algebraic cryptanalysis. The design goal is to achieve a balance between security and performance. Thus we aim to deduce lower bounds on the number of rounds needed to preclude algebraic attacks.

2.2 Polynomial Algorithms

This paper measures the time complexity of polynomial algorithms in terms of field operations. Without loss of generality, we also assume that the underlying finite fields support Fast Fourier Transforms (FFTs). Similarly, the memory complexity is estimated with regard to field elements.

Polynomial Interpolation. Assume that $f(x) \in \mathbb{F}_q[x]$ has degree at most $n$, where $n$ is a positive integer. Consider $(n + 1)$ distinct points $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$ where $y_i = f(x_i)$ and $x_i \in \mathbb{F}_q$. Then $f(x)$ is uniquely determined by the following Lagrange interpolation formula

$$f(x) = \sum_{i=0}^{n} y_i \cdot \prod_{0 \le j \le n,\, j \ne i} \frac{x - x_j}{x_i - x_j}. \tag{5}$$

It has been shown in [26, 14] that the Lagrange interpolation polynomial can be constructed with time and memory complexity $O(n \log(n))$.
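For illustration, formula (5) can be evaluated directly over a prime field. The sketch below is the naive quadratic-time method, not the fast $O(n \log(n))$ algorithm referred to above, and it assumes a prime field (so that inverses can be computed with Python's three-argument `pow`, available with a negative exponent from Python 3.8). The function names are hypothetical.

```python
def mul_by_linear(poly, root, p):
    """Multiply poly (coefficients low -> high) by (x - root) modulo p."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % p
        out[i + 1] = (out[i + 1] + c) % p
    return out


def lagrange_interpolate(points, p):
    """Return the coefficients (low -> high) of the unique polynomial of
    degree <= n through the n + 1 given points (x_i, y_i) over F_p."""
    result = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = mul_by_linear(basis, xj, p)   # numerator of Eq. (5)
                denom = denom * (xi - xj) % p         # denominator of Eq. (5)
        scale = yi * pow(denom, -1, p) % p            # modular inverse (Python 3.8+)
        for k, c in enumerate(basis):
            result[k] = (result[k] + scale * c) % p
    return result
```

Note that this stores all $n + 1$ coefficients, which is exactly the memory cost that the low-memory attacks of Sect. 3 avoid.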

GCD Algorithms. Given two polynomials of degree $n$ with coefficients from $\mathbb{F}_q$, the straightforward Euclidean Algorithm computes the Greatest Common Divisor (GCD) with $O(n^2)$ field operations. The Fast Euclidean Algorithm computes the same GCD in $O(M(n) \log(n))$ field operations, where $M(n)$ is the time to multiply two $n$-degree polynomials [14]. In this paper, we take $M(n) = O(n \log(n))$. Hence, the time complexity of the GCD algorithm is $O(n \log^2(n))$, which is exactly the estimate used by the MiMC designers [3].
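A minimal sketch of the straightforward (quadratic) Euclidean Algorithm over a prime field is given below; it is not the Fast Euclidean Algorithm. Polynomials are lists of coefficients from low to high degree, the field is assumed to be prime, at least one input is assumed nonzero, and the helper names are illustrative.

```python
def trim(a):
    """Drop leading zero coefficients in place and return the list."""
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_mod(a, b, p):
    """Remainder of a divided by b over F_p (b nonzero, both trimmed)."""
    a = a[:]
    inv_lead = pow(b[-1], -1, p)  # modular inverse (Python 3.8+)
    while len(a) >= len(b):
        factor = a[-1] * inv_lead % p
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] = (a[shift + i] - factor * c) % p
        trim(a)  # the leading coefficient is now zero
    return a


def poly_gcd(a, b, p):
    """Monic GCD of a and b over F_p via the classical Euclidean Algorithm."""
    a = trim([c % p for c in a])
    b = trim([c % p for c in b])
    while b:
        a, b = b, poly_mod(a, b, p)
    inv = pow(a[-1], -1, p)
    return [c * inv % p for c in a]
```

For example, over $\mathbb{F}_{101}$ the polynomials $(x-2)(x-3)$ and $(x-2)(x-5)$ share the single root 2, so their monic GCD is $x - 2$, represented as `[99, 1]`.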

3 Low-Memory Interpolation Attacks

This section presents novel interpolation attacks on primitives with low algebraic degree. Compared with classical interpolation attacks, our new attacks have very low memory complexities. Before giving our attacks, we first recall the classical interpolation attacks.

3.1 Interpolation Attacks

Interpolation attacks were introduced by Jakobsen and Knudsen [16, 17]: one considers the (intermediate) ciphertext as a polynomial of the plaintext. With sufficiently many plaintext/ciphertext pairs, one can reconstruct this polynomial. Since the polynomial is key-dependent, it is possible to recover some round keys by employing a guess-and-determine strategy.

Assume that a block cipher $E$ has $r$ rounds. First, one finds an upper bound $N$ on the degree of the intermediate ciphertext after $(r-1)$ rounds, denoted by $y_{r-1}$. Next one guesses the last round key and obtains the corresponding value of $y_{r-1}$. With $(N + 1)$ distinct plaintext/ciphertext pairs, one can construct the polynomial representation of $y_{r-1}$ by Lagrange interpolation. Afterwards, the key guess can be confirmed with an additional plaintext/ciphertext pair. Specifically, one decrypts the last round and evaluates the polynomial in the corresponding plaintext. Then the key guess is considered as a valid key candidate if the decrypted and evaluated values match. Otherwise, the key guess is eliminated and we repeat the process until the correct key is found.

Let $L$ denote the number of all possible last round keys of the cipher $E$. Then the above attack has time complexity $O(N \log(N) \cdot L)$, memory complexity $O(N)$ and data complexity $N + 2$.

The meet-in-the-middle (MITM) approach has also been introduced in [16]. One considers $h(x)$ and $g(y)$ as two polynomials describing the same intermediate state, where $x$ and $y$ denote the plaintext and ciphertext respectively, hence $h(x) = g(y)$. If one substitutes the values of $x$ and $y$, this yields a linear equation in the unknown coefficients of $h$ and $g$. By collecting a sufficient number of plaintext/ciphertext pairs, one can solve the linear system to recover these coefficients. Then one can mount a key recovery attack with a similar guess-and-determine strategy as in the original interpolation attack. The only difference is that here we test the key guess by checking if the plaintext/ciphertext pair satisfies the equation $h(x) = g(y)$.

If both the encryption and decryption round functions have low degree, the MITM attack can cryptanalyze more rounds than the original interpolation attack. Let $\deg(h) = N_1$ and $\deg(g) = N_2$. Then one needs to establish $O(N_1 + N_2 + 2)$ linear equations with $O(N_1 + N_2 + 2)$ data and solve these for each key guess. The time and memory complexities are $O((N_1 + N_2 + 2)^2 \cdot L)$ and $O((N_1 + N_2 + 2)^3)$ respectively, where $L$ is the number of last round keys.

We briefly discuss the impact of the MITM attack on different constructions. For any permutation polynomial $g(x) \in \mathbb{F}_q[x]$ with $\deg(g) > 1$, let $g^{-1}(x)$ be the (compositional) inverse of $g(x)$, then

$$g^{-1}(g(x)) \equiv x \pmod{x^q - x}.$$

Hence we have $\deg(g) \cdot \deg(g^{-1}) \ge q$. Note that we always assume that $\deg(g)$ is small, so $\deg(g^{-1})$ can be quite large, i.e., close to $q$. Thus, for KA[q, κ, d, r], there is no benefit in considering MITM attacks. However, for Feistel networks, we need to take the MITM attack into account since the inverse of the round function has the same degree as the original one.
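The degree blow-up of the inverse can be observed on the monomial $x^3$. The sketch below assumes the toy prime field $\mathbb{F}_{101}$, where the inverse of $x \mapsto x^3$ is $x \mapsto x^e$ with $e = 3^{-1} \bmod (p-1)$, and then repeats the exponent computation for a field with $2^{129}$ elements (the MiMC-129/129 setting), where only the exponent arithmetic modulo $q - 1$ is carried out. Variable names are illustrative.

```python
# Toy case: F_101, g(x) = x^3.
p, d = 101, 3
e = pow(d, -1, p - 1)  # inverse exponent: d * e = 1 (mod p - 1), Python 3.8+

# x -> x^e undoes x -> x^3 on every element of F_p.
inverse_is_correct = all(pow(pow(x, d, p), e, p) == x for x in range(p))

# The product of the two degrees is at least the field size.
degree_product = d * e

# Exponent of the inverse of x^3 in a field with q = 2^129 elements:
# computed modulo q - 1, which is the order of the multiplicative group.
q = 2**129
e_large = pow(3, -1, q - 1)
```

Here $e = 67$ for the toy field, and for $q = 2^{129}$ the inverse exponent equals $(2^{130} - 1)/3$, which is of the same order of magnitude as $q$ itself.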


3.2 Leading Terms of the Output

We present some results on the leading terms, i.e., terms with the highest and the second highest degrees, of the output of KA and FN ciphers. These results will be used in the sequel.

Proposition 1. Let $f(x) = x^d + \sum_{i=0}^{d-1} a_i x^i$ be the round function of KA[q, κ, d, r], where $d$ is a positive integer with $d > 1$ and $a_i \in \mathbb{F}_q$. If $r \le R(d, q) - 1$, then for KA[q, κ, d, r], we have that (i) the algebraic degree of the output is $d^r$, and (ii) the leading terms of the output are

$$x^{d^r} + \left(d_p^{\,r} \cdot k_0 + d_p^{\,r-1} \cdot a_{d-1}\right) x^{d^r - 1},$$

where $d \equiv d_p \bmod p$, $0 \le d_p \le p - 1$ and $p$ is the characteristic of $\mathbb{F}_q$.

Proof. The claim (i) is a direct corollary of (ii), so it suffices to prove (ii). We will show the result by induction on $r$. If $r = 1$, then the output is $f(x + k_0) + k_1$. By the binomial theorem, the output can be written as

$$x^d + d \cdot k_0 x^{d-1} + g_1(x) + a_{d-1} x^{d-1} + g_2(x),$$

where $\deg(g_1 + g_2) \le d - 2$. Hence the leading terms are

$$x^d + (d_p \cdot k_0 + a_{d-1}) x^{d-1}.$$

Assume that the claim holds for $t - 1$, where $1 \le t - 1 \le R(d, q) - 2$. Then the leading terms of the output of round $t - 1$ are

$$x^{d^{t-1}} + \left(d_p^{\,t-1} \cdot k_0 + d_p^{\,t-2} \cdot a_{d-1}\right) x^{d^{t-1} - 1}.$$

Again, by the binomial theorem, the leading terms of the round $t$ output are

$$\left(x^{d^{t-1}}\right)^d + d \cdot \left(x^{d^{t-1}}\right)^{d-1} \left(d_p^{\,t-1} \cdot k_0 + d_p^{\,t-2} \cdot a_{d-1}\right) x^{d^{t-1} - 1} = x^{d^t} + \left(d_p^{\,t} \cdot k_0 + d_p^{\,t-1} \cdot a_{d-1}\right) x^{d^t - 1},$$

which implies that the claim is true for $t$. Therefore, the claim holds for any $r \le R(d, q) - 1$. □
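Proposition 1 can be checked numerically on a toy instance by expanding the encryption polynomial symbolically. The sketch below assumes the prime field $\mathbb{F}_{101}$ (so $p = q = 101$ and $d_p = d$), degree $d = 3$ and $r = 4 \le R(3, 101) - 1$; the round function coefficients and round keys are arbitrary illustrative values, and the helper names are hypothetical.

```python
def poly_mul(a, b, p):
    """Schoolbook product of two polynomials (coefficients low -> high) over F_p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        if x:
            for j, y in enumerate(b):
                out[i + j] = (out[i + j] + x * y) % p
    return out


def apply_round(state, f_coeffs, key, p):
    """Return the polynomial f(state + key), using Horner's rule on f."""
    shifted = state[:]
    shifted[0] = (shifted[0] + key) % p
    result = [f_coeffs[-1] % p]  # leading coefficient of f (1 for monic f)
    for c in reversed(f_coeffs[:-1]):
        result = poly_mul(result, shifted, p)
        result[0] = (result[0] + c) % p
    return result


def ka_polynomial(f_coeffs, round_keys, p):
    """Polynomial of x computed by the KA cipher of Eq. (1)."""
    *inner_keys, final_key = round_keys
    state = [0, 1]  # the polynomial x
    for k in inner_keys:
        state = apply_round(state, f_coeffs, k, p)
    state[0] = (state[0] + final_key) % p
    return state


p, d, r = 101, 3, 4
f_coeffs = [11, 7, 5, 1]          # f(x) = x^3 + 5x^2 + 7x + 11, so a_{d-1} = 5
round_keys = [17, 42, 5, 99, 63]  # k_0, ..., k_4
poly = ka_polynomial(f_coeffs, round_keys, p)
predicted = (pow(d, r, p) * round_keys[0] + pow(d, r - 1, p) * f_coeffs[d - 1]) % p
```

In this instance the expanded polynomial has degree $3^4 = 81$, is monic, and its coefficient of $x^{80}$ coincides with the value predicted by the proposition.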

For FN[q, κ, d, r], we consider plaintexts of the form $x \| C$, where $C$ is a constant in $\mathbb{F}_q$. As shown in the following proposition, we can achieve two more rounds compared with KA ciphers.

Proposition 2. Let $f(x) = x^d + \sum_{i=0}^{d-1} a_i x^i$ be the round function of FN[q, κ, d, r], where $d$ is a positive integer with $d > 1$ and $a_i \in \mathbb{F}_q$. Consider plaintexts of the form $x \| C$, where $C$ is a constant in $\mathbb{F}_q$. If $3 \le r \le R(d, q) + 1$, then the leading terms of the right part of the output are

$$x^{d^{r-2}} + \left(d_p^{\,r-2} \cdot (k_1 + f(C + k_0)) + d_p^{\,r-3} \cdot a_{d-1}\right) x^{d^{r-2} - 1},$$

where $d \equiv d_p \bmod p$, $0 \le d_p \le p - 1$ and $p$ is the characteristic of $\mathbb{F}_q$.


Proof. The output of the first round is $C \| (x + f(C + k_0))$. This leads to the output $(x + f(C + k_0)) \| (f(x + k_1 + f(C + k_0)) + C)$ after the second round and the output $(f(x + k_1 + f(C + k_0)) + C) \| (x + f(C + k_0) + f(f(x + k_1 + f(C + k_0)) + C + k_2))$ after the third round. Then similarly to Proposition 1, one can prove that the leading terms of the right part of the output are

$$x^{d^{r-2}} + \left(d_p^{\,r-2} \cdot (k_1 + f(C + k_0)) + d_p^{\,r-3} \cdot a_{d-1}\right) x^{d^{r-2} - 1}$$

when $3 \le r \le R(d, q) + 1$. □

Remark 1. Note that the special case $p = 2$, $d$ odd and $P = x \| C$ or $C \| x$ has been described by Sun et al. [27].

3.3 New Attacks

One of the bottlenecks of classical interpolation attacks is that the attacker always needs to store the whole interpolated polynomial. Thus, the memory complexity can be very high if the degree of the polynomial is high. Based on the result in Sect. 3.2, for certain KA and FN ciphers, the key can be deduced from the second highest term of the interpolated polynomial. Hence, to recover the key, we only need to store the coefficient of the specific term rather than the whole polynomial. In this way, we can present our new interpolation attack with constant memory complexity.

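Interpolating One Coefficient. Now we present the algorithm for recovering the coefficient of the second highest term of the interpolated polynomial.

Assume that $g(x) \in \mathbb{F}_q[x]$ has degree at most $\Delta$. Also assume that we know $(\Delta + 1)$ points $(x_0, y_0), (x_1, y_1), \ldots, (x_\Delta, y_\Delta)$, where $x_i = \alpha^i$ for some primitive element $\alpha \in \mathbb{F}_q$ and $y_i = g(x_i)$. Then by the Lagrange interpolation formula, $g(x)$ is uniquely determined by the formula

$$g(x) = \sum_{i=0}^{\Delta} g(\alpha^i) \cdot \prod_{0 \le j \le \Delta,\, j \ne i} \frac{x - \alpha^j}{\alpha^i - \alpha^j}.$$

Let $g(x) = \sum_{i=0}^{\Delta} a_i x^i$, then the coefficient of the second highest term is equal to

$$a_{\Delta-1} = \sum_{i=0}^{\Delta} \frac{-g(\alpha^i) \cdot \sum_{0 \le j \le \Delta,\, j \ne i} \alpha^j}{\prod_{0 \le j \le \Delta,\, j \ne i} (\alpha^i - \alpha^j)} = \sum_{i=0}^{\Delta} g(\alpha^i) \frac{\beta_i}{\gamma_i}, \tag{6}$$

where $\gamma_i = \prod_{0 \le j \le \Delta,\, j \ne i} (\alpha^i - \alpha^j)$ and $\beta_i = -\sum_{0 \le j \le \Delta,\, j \ne i} \alpha^j$. Note that

$$\gamma_{i+1} = \gamma_i \cdot \alpha^{\Delta} \cdot \frac{\alpha^i - \alpha^{-1}}{\alpha^i - \alpha^{\Delta}} \quad \text{and} \quad \beta_i = \alpha^i - \sum_{0 \le j \le \Delta} \alpha^j.$$

By combining these observations, we present the procedure for recovering only the coefficient of the second highest term in Algorithm 1.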

Proposition 3 describes the complexity of Algorithm 1.


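Algorithm 1 Recover the coefficient of the second highest term

Input: The algebraic degree $\Delta$ of the polynomial, a primitive element $\alpha \in \mathbb{F}_q$, and the polynomial evaluation oracle $\mathcal{O}$

Output: The coefficient $t$ of the second highest term

 1: $t \leftarrow 0$
 2: $s \leftarrow -\sum_{j=0}^{\Delta} \alpha^j$
 3: $a \leftarrow \prod_{j=1}^{\Delta} (1 - \alpha^j)$
 4: $b \leftarrow 1$
 5: for $i$ from $0$ to $\Delta$ do
 6:     $t \leftarrow t + \mathcal{O}(b) \cdot \frac{s + b}{a}$
 7:     if $i < \Delta$ then
 8:         $a \leftarrow a \cdot \alpha^{\Delta} \cdot \frac{b - \alpha^{-1}}{b - \alpha^{\Delta}}$
 9:         $b \leftarrow b \cdot \alpha$
10:     end if
11: end for
12: return $t$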

Proposition 3. Algorithm 1 has time complexity $O(\Delta \log(\Delta))$ and memory complexity $O(1)$.

Proof. The time complexity of Algorithm 1 is exactly the time complexity of interpolating one coefficient in Lagrange interpolation, which is shown to be $O(\Delta \log(\Delta))$ in [26]. Since the algorithm only keeps the variables $t$, $s$, $a$ and $b$, the memory complexity is $O(1)$. □
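The following is a minimal Python sketch of Algorithm 1, restricted to a prime field $\mathbb{F}_p$ so that plain integer arithmetic suffices. The function name `second_highest_coefficient`, the parameter `p`, and the toy values in the usage note are assumptions made for illustration; they are not part of the original text. The sketch requires $\Delta \le p - 2$ so that all divisions are well defined.

```python
def second_highest_coefficient(delta, alpha, p, oracle):
    """Return a_{delta-1} of the polynomial of degree <= delta evaluated by `oracle` over F_p.

    Follows Algorithm 1 line by line; only t, s, a, b are stored.
    """
    def inv(v):
        return pow(v % p, -1, p)  # modular inverse (Python 3.8+)

    alpha_delta = pow(alpha, delta, p)
    alpha_inv = inv(alpha)

    t = 0                                                      # line 1
    s = -sum(pow(alpha, j, p) for j in range(delta + 1)) % p   # line 2
    a = 1                                                      # line 3: gamma_0
    for j in range(1, delta + 1):
        a = a * (1 - pow(alpha, j, p)) % p
    b = 1                                                      # line 4: alpha^0
    for i in range(delta + 1):                                 # line 5
        t = (t + oracle(b) * (s + b) * inv(a)) % p             # line 6: + g(b) * beta_i / gamma_i
        if i < delta:                                          # line 7
            a = a * alpha_delta * (b - alpha_inv) * inv(b - alpha_delta) % p  # line 8
            b = b * alpha % p                                  # line 9
    return t


def poly_oracle(coeffs, p):
    """Helper for the example: evaluate a polynomial given by low-to-high coefficients."""
    return lambda x: sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
```

As a usage example, with $p = 101$, the primitive element $\alpha = 2$ and $g(x) = 9x^5 + 42x^4 + 3x^3 + 17x^2 + 5$, the call `second_highest_coefficient(5, 2, 101, poly_oracle([5, 0, 17, 3, 42, 9], 101))` recovers the coefficient 42 of $x^4$ without storing any other coefficient.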

New Attacks on KA. Assume that $d_p \ne 0$ and $r \le R(d, q) - 1$. Then the attack on $\mathrm{KA}[q, \kappa, d, r]$ is described below.

1. Let $\Delta = d^r$. Choose a primitive element $\alpha \in \mathbb{F}_q$, and the encryption oracle $E$ as input to Algorithm 1.
2. Run Algorithm 1. Let $t$ be the output.
3. By Proposition 1, we have $t = d_p^{r} \cdot k_0 + d_p^{r-1} \cdot a_{d-1}$. Therefore, $k_0$ can be determined from

\[ k_0 = \frac{t - d_p^{r-1} \cdot a_{d-1}}{d_p^{r}} . \]

In the above attack, we need to query the encryption oracle $d^r + 1$ times. The time and memory complexity are dominated by Algorithm 1, which is $O(r d^r)$ and $O(1)$ respectively according to Proposition 3. In summary, the time/memory/data complexities of the attack on $\mathrm{KA}[q, \kappa, d, r]$ are as follows:

\[ T = O(r d^r), \quad M = O(1), \quad D = d^r + 1 . \tag{7} \]
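The three steps above can be sketched on a toy cipher. The block below is a hedged illustration, not the authors' implementation: it assumes a MiMC-like single-key KA cipher over the prime field $\mathbb{F}_{101}$ with $F(x) = x^3$ (so $d = 3$, $d_p = 3$, $a_{d-1} = 0$), four rounds, and made-up round constants with $c_0 = 0$. The names `P`, `ALPHA`, `ROUNDS`, `CONSTS`, `encrypt` and `recover_key` are hypothetical. Under these assumptions the second highest coefficient is $t = 3^r \cdot k$.

```python
P = 101                    # toy prime; gcd(3, P - 1) = 1, so x -> x^3 is a permutation
ALPHA = 2                  # a primitive element of F_101
ROUNDS = 4                 # degree 3^4 = 81 <= P - 2, so all interpolation points differ
CONSTS = [0, 23, 58, 71]   # hypothetical round constants, c_0 = 0


def encrypt(x, k):
    """Toy KA cipher: r rounds of x -> (x + k + c_i)^3, then a final key addition."""
    for c in CONSTS:
        x = pow(x + k + c, 3, P)
    return (x + k) % P


def second_highest_coefficient(delta, oracle):
    """Compact restatement of Algorithm 1 over F_P with constant memory."""
    inv = lambda v: pow(v % P, -1, P)
    alpha_delta, alpha_inv = pow(ALPHA, delta, P), inv(ALPHA)
    s = -sum(pow(ALPHA, j, P) for j in range(delta + 1)) % P
    gamma = 1
    for j in range(1, delta + 1):
        gamma = gamma * (1 - pow(ALPHA, j, P)) % P
    t, b = 0, 1
    for i in range(delta + 1):
        t = (t + oracle(b) * (s + b) * inv(gamma)) % P
        if i < delta:
            gamma = gamma * alpha_delta * (b - alpha_inv) * inv(b - alpha_delta) % P
            b = b * ALPHA % P
    return t


def recover_key(oracle):
    """Steps 1-3: interpolate one coefficient, then solve t = d_p^r * k for k."""
    delta = 3 ** ROUNDS
    t = second_highest_coefficient(delta, oracle)
    return t * pow(3 ** ROUNDS, -1, P) % P
```

For example, `recover_key(lambda x: encrypt(x, 37))` returns 37 after $3^4 + 1 = 82$ chosen-plaintext queries, matching the data complexity in (7).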

New Attacks on FN. Assume that $d_p \ne 0$ and $3 \le r \le R(d, q) + 1$. The attack on $\mathrm{FN}[q, \kappa, d, r]$ is shown below:

1. Let $\Delta = d^{r-2}$ and let $C_0$ be a constant in $\mathbb{F}_q$. Take a primitive element $\alpha \in \mathbb{F}_q$ and the FN encryption oracle $E$ as input to Algorithm 1. Note that the input of $E$ is of the form $\alpha^i \| C_0$.
2. Run Algorithm 1. Let $t_0$ be the output.
3. By Proposition 1, we have $t_0 = d_p^{r-2} \cdot (k_1 + f(C_0 + k_0)) + d_p^{r-3} \cdot a_{d-1}$.
4. Pick two other distinct constants $C_1$ and $C_2$. Repeat Steps 1-3 and let the results be $t_1$ and $t_2$ respectively. Now we have the following system of equations with unknowns $k_0$ and $k_1$:

\[
\begin{cases}
t_0 = d_p^{r-2} \cdot (k_1 + f(C_0 + k_0)) + d_p^{r-3} \cdot a_{d-1} , \\
t_1 = d_p^{r-2} \cdot (k_1 + f(C_1 + k_0)) + d_p^{r-3} \cdot a_{d-1} , \\
t_2 = d_p^{r-2} \cdot (k_1 + f(C_2 + k_0)) + d_p^{r-3} \cdot a_{d-1} .
\end{cases}
\tag{8}
\]

5. From Eqn. (8), we obtain

\[ f(C_i + k_0) - f(C_j + k_0) - \frac{t_i - t_j}{d_p^{r-2}} = 0 , \]

where $0 \le i < j \le 2$. Then $k_0$ can be determined by computing the GCD of these three polynomials in $k_0$. Finally, one can obtain

\[ k_1 = \frac{t_0 - d_p^{r-3} \cdot a_{d-1}}{d_p^{r-2}} - f(C_0 + k_0) . \]

Note that $f(x)$ is assumed to have low degree in this paper. Thus, the complexity of Step 5 is negligible. Then, similarly to the analysis of attacks on KA ciphers, the complexity of the above attack on $\mathrm{FN}[q, \kappa, d, r]$ is:

\[ T = O(r d^{r-2}), \quad M = O(1), \quad D = d^{r-2} + 1 . \]

Discussion. It is worth pointing out that our new attacks are chosen-plaintext attacks since we need to choose plaintexts of a specific form in Algorithm 1. As classical interpolation attacks are known-plaintext attacks, the low-memory interpolation attacks require a stronger attack model.

Note that Sun et al. in [27] also present a low-memory higher-order integral attack which applies to both KA and FN ciphers. However, their attack needs to know the values of the interpolated polynomials over all elements in $\mathbb{F}_q$. That is, their attack has data complexity $q$. Under our assumption, we always have $d^r + 1 \le q$. Hence, our attack has smaller data complexity than the higher-order integral attack in [27].

An interesting research direction is to break the barrier of the assumption, i.e., to determine some key-dependent terms even when $r > R(d, q) + 1$.

4 Attacks on Block Ciphers with Simple Key Schedules

This section proposes attacks on block ciphers with simple key schedules. The first type of attack is a direct application of the low-memory attack proposed in Sect. 3. The second class of attacks is based on the same strategy and is divided into two groups in terms of memory and data complexities.

We consider ciphers with key space size $\kappa = q^{\ell}$, i.e., $\mathrm{KA}[q, q^{\ell}, d, r]$ and $\mathrm{FN}[q, q^{\ell}, d, r]$, where $2 \le \ell \le r + 1$. In this section, we assume a simple key schedule. The master key is $k_0 \| k_1 \| \cdots \| k_{\ell-1}$ and the $i$-th round key is $k_{i \bmod \ell} + c_i$, i.e., the $i$-th round function is

\[ F_i(x) := F(x + k_{i \bmod \ell} + c_i), \quad 0 \le i \le r , \]

where $F$ has the form as in (4) and the $c_i$'s are independently chosen constants.

4.1 Iterative Low-Memory Interpolation Attacks

This section presents attacks on block ciphers with simple key schedules by iteratively applying the low-memory attack proposed in Sect. 3.

Our attack is based on the observation that the low-memory interpolation attack is independent of the key schedule. Hence, one can recover the first subkey $k_0$ by the low-memory interpolation attack and then substitute the obtained subkey to peel off the first round. By repeating the process, the remaining subkeys can be determined, as illustrated in Algorithm 2. An interesting property of the attack is that the complexity of the whole attack is dominated by that of recovering the first subkey. Indeed, the complexities of recovering the $i$-th subkey $k_i$ are given by those of the low-memory interpolation attack on the $(r - i)$-round reduced cipher. Thus, by the analysis in Sect. 3.3, the $i$-th subkey $k_i$ can be determined with complexities:

\[ T_i = O(r d^{r-i}), \quad M_i = O(1), \quad D_i = d^{r-i} + 1, \quad i = 0, 1, \ldots, \ell - 1 . \]

Therefore, the total complexities of the full attack are dominated by those of recovering the subkey $k_0$, i.e.,

\[ T = O(r d^r), \quad M = O(1), \quad D = d^r + 1 . \]

Algorithm 2 Attacks on $\mathrm{KA}[q, q^{\ell}, d, r]$ with $r \le R(d, q) - 1$ and $\mathrm{FN}[q, q^{\ell}, d, r]$ with $r \le R(d, q) + 1$

1. Recover $k_0$ by the low-memory interpolation attack in Sect. 3.3.
2. Substitute $k_0$ in the cipher. Then repeat Step 1 to recover $k_1$.
3. Repeat Step 2 until the master key $k_0 \| k_1 \| \cdots \| k_{\ell-1}$ is obtained.

Now we elaborate the above attack for the cipher $\mathrm{KA}[q, q^2, d, r]$ with a simple key schedule. Assume that $d_p \ne 0$ and $r \le R(d, q) - 1$. Then the attack is described below.

1. Let $\Delta = d^r$. Choose a primitive element $\alpha \in \mathbb{F}_q$, and the encryption oracle $E$ as input to Algorithm 1.
2. Run Algorithm 1. Let $t$ be the output.
3. By Proposition 1, we have $t = d_p^{r} \cdot k_0 + d_p^{r-1} \cdot a_{d-1}$. Therefore, $k_0$ can be determined from

\[ k_0 = \frac{t - d_p^{r-1} \cdot a_{d-1}}{d_p^{r}} . \]

4. Substitute $k_0$ in the cipher. Then repeat Steps 1-3 with the minor change that $\Delta = d^{r-1}$. Finally, $k_1$ can be recovered.

Similar to the discussion in Sect. 3.3, it is readily seen that the time/memory/data complexities of the attack on $\mathrm{KA}[q, q^2, d, r]$ are as follows:

\[ T = O(r d^r), \quad M = O(1), \quad D = d^r + 1 . \]

The following proposition summarizes the results in this section.

Proposition 4. Assume that $r \le R(d, q) - 1$ for $\mathrm{KA}[q, q^{\ell}, d, r]$ and $r \le R(d, q) + 1$ for $\mathrm{FN}[q, q^{\ell}, d, r]$. Then there exists an attack on $\mathrm{KA}[q, q^{\ell}, d, r]$ with time complexity $O(r d^r)$, memory complexity $O(1)$, and data complexity $d^r + 1$, while there exists an attack on $\mathrm{FN}[q, q^{\ell}, d, r]$ with time complexity $O(r d^{r-2})$, memory complexity $O(1)$, and data complexity $d^{r-2} + 1$.

4.2 Attacks based on Guess-and-Determine Strategies

This section explores the guess-and-determine strategies in the analysis of block ciphers with simple key schedules. We propose a generic attack and then implement the attack with existing techniques.

Based on a guess-and-determine strategy, a generic attack on $\mathrm{KA}[q, q^{\ell}, d, r]$ and $\mathrm{FN}[q, q^{\ell}, d, r]$ is presented in Algorithm 3. The main idea is that after guessing the $(\ell - 1)$ subkeys $k_0, k_1, \ldots, k_{\ell-2}$, we can skip the first $(\ell - 1)$ rounds. If the last round key is not $k_{\ell-1}$, by decrypting with the guessed subkeys we can also skip several final rounds until we hit $k_{\ell-1}$. As a result, we only need to consider $R_{\mathrm{KA}}(r, \ell)$ and $R_{\mathrm{FN}}(r, \ell)$ rounds for $\mathrm{KA}[q, q^{\ell}, d, r]$ and $\mathrm{FN}[q, q^{\ell}, d, r]$ respectively, where

\[ R_{\mathrm{KA}}(r, \ell) = \left( \left\lfloor \frac{r + 1}{\ell} \right\rfloor - 1 \right) \ell \quad \text{and} \quad R_{\mathrm{FN}}(r, \ell) = 1 + \left( \left\lfloor \frac{r}{\ell} \right\rfloor - 1 \right) \ell . \tag{9} \]

Moreover, the reduced cipher can be regarded as a reduced-round cipher by replacing some key additions with constant additions. This fact allows us to extend the attack on the single key version to large key versions. The above observation is summarized in the following.
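The round counts in Eqn. (9) are simple integer expressions. The helper names `reduced_rounds_ka` and `reduced_rounds_fn` below are hypothetical and only serve to make the arithmetic concrete.

```python
def reduced_rounds_ka(r, ell):
    """R_KA(r, l) = (floor((r + 1) / l) - 1) * l from Eqn. (9)."""
    return ((r + 1) // ell - 1) * ell


def reduced_rounds_fn(r, ell):
    """R_FN(r, l) = 1 + (floor(r / l) - 1) * l from Eqn. (9)."""
    return 1 + (r // ell - 1) * ell


if __name__ == "__main__":
    for r, ell in [(10, 2), (7, 3)]:
        print(r, ell, reduced_rounds_ka(r, ell), reduced_rounds_fn(r, ell))
```

For instance, with $r = 10$ and $\ell = 2$ the reduced KA cipher has $R_{\mathrm{KA}} = 8 = r - 2$ rounds, which agrees with the choice $\Delta = d^{r-2}$ in the illustrative attack on $\mathrm{KA}[q, q^2, d, r]$ for even $r$.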

Proposition 5. Assume that there is an attack on $\mathrm{KA}[q, q, d, r]$ or $\mathrm{FN}[q, q, d, r]$ with time complexity $T(r)$, memory complexity $M(r)$, and data complexity $D(r)$. Then there exists an attack on $\mathrm{KA}[q, q^{\ell}, d, r]$ or $\mathrm{FN}[q, q^{\ell}, d, r]$ with time complexity $T(R_{\lambda}(r, \ell)) \cdot q^{\ell-1}$, memory complexity $M(R_{\lambda}(r, \ell))$, and data complexity $D(R_{\lambda}(r, \ell))$, where $\lambda \in \{\mathrm{KA}, \mathrm{FN}\}$.

We will implement Algorithm 3 with low-memory interpolation and GCD attacks.

Algorithm 3 Generic attacks on $\mathrm{KA}[q, q^{\ell}, d, r]$ and $\mathrm{FN}[q, q^{\ell}, d, r]$

1. Guess subkeys $k_0, k_1, \ldots, k_{\ell-2}$.
2. Mount a key recovery attack on the reduced cipher with $k_{\ell-1}$ the only unknown key. If it fails to recover the remaining $k_{\ell-1}$, then go back to Step 1. Otherwise, one obtains a candidate $k^*_{\ell-1}$.
3. Test the candidate master key $k_0 \| k_1 \| \cdots \| k^*_{\ell-1}$ with an additional random plaintext/ciphertext pair. If the test is passed, then $k_0 \| k_1 \| \cdots \| k^*_{\ell-1}$ is the right key. Otherwise, repeat Steps 1-3 until the right key is found.

Low-Memory Interpolation Attacks. Note that the attack in Sect. 3.3 can be directly applied to the reduced ciphers of $\mathrm{KA}[q, q^{\ell}, d, r]$ and $\mathrm{FN}[q, q^{\ell}, d, r]$. Then by Proposition 5, we have the following result.

Proposition 6. Assume that $r \le R(d, q) + \ell - 2$ for $\mathrm{KA}[q, q^{\ell}, d, r]$ and $r \le R(d, q) + \ell$ for $\mathrm{FN}[q, q^{\ell}, d, r]$. There exists an attack on $\mathrm{KA}[q, q^{\ell}, d, r]$ with time complexity $O(R_{\mathrm{KA}}(r, \ell) \, d^{R_{\mathrm{KA}}(r, \ell)} q^{\ell-1})$, memory complexity $O(1)$, and data complexity $d^{R_{\mathrm{KA}}(r, \ell)} + 1$, and there exists an attack on $\mathrm{FN}[q, q^{\ell}, d, r]$ with time complexity $O(R_{\mathrm{FN}}(r, \ell) \, d^{R_{\mathrm{FN}}(r, \ell) - 2} q^{\ell-1})$, memory complexity $O(1)$, and data complexity $d^{R_{\mathrm{FN}}(r, \ell) - 2} + 1$.

To illustrate the main procedure, we present an attack on $\mathrm{KA}[q, q^2, d, r]$. Assume that $d_p \ne 0$, $r \le R(d, q)$ and $r \equiv 0 \pmod 2$. Then the attack is given below.

1. Guess subkey $k_0$.
2. Let $\Delta = d^{r-2}$. Choose a primitive element $\alpha \in \mathbb{F}_q$, and the encryption oracle $E$ as input to Algorithm 1.
3. In Line 6 of Algorithm 1, the oracle returns

\[ F^{-1}\bigl( E(F^{-1}(b) + k_0 + c_0) + k_0 + c_r \bigr) , \]

where $F^{-1}$ is the compositional inverse of $F$. Thus, Algorithm 1 returns the coefficient of the second highest term of the polynomial representing the last $(r - 2)$ rounds of the cipher. Run Algorithm 1.
4. Let $t$ be the output of Algorithm 1. By Proposition 1, we have $t = d_p^{r-2} \cdot k_1 + d_p^{r-3} \cdot a_{d-1}$, where the notation is from Proposition 1. Therefore, $k_1^*$ can be determined from $k_1^* = (t - d_p^{r-3} \cdot a_{d-1}) / d_p^{r-2}$.
5. Test the candidate master key $k_0 \| k_1^*$. If the test is passed, then $k_0 \| k_1^*$ is the right key. Otherwise, repeat Steps 1-5 until the right key is found.

Step 1 needs $q$ guesses in the worst case; for each guess we execute Steps 2-4, which correspond to a low-memory interpolation attack on an $(r - 2)$-round reduced cipher. From Eqn. (7), the complexity of the above attack is given by

\[ T = O(r d^{r-2} q), \quad M = O(1), \quad D = d^{r-2} + 1 . \]

GCD Attacks. The GCD attack on MiMC was introduced by Albrecht et al. [3]: it deduces the key by computing the greatest common divisor of polynomials from known plaintext/ciphertext pairs. The GCD attack enjoys very low data complexity, which makes it appropriate in a low-data scenario. It is straightforward to plug the attack into the framework of Algorithm 3. We will present GCD attacks on $\mathrm{KA}[q, q^{\ell}, d, r]$ and $\mathrm{FN}[q, q^{\ell}, d, r]$.

Denote by $E(x)$ the encryption of plaintext $x$ under the key $k_0 \| k_1 \| \cdots \| k_{\ell-1}$. The GCD attack on $\mathrm{KA}[q, q^{\ell}, d, r]$ proceeds as follows:

1. Guess subkeys $k_0, k_1, \ldots, k_{\ell-2}$.

2. Denote by $E(k_{\ell-1}, x)$ the output of the $R_{\mathrm{KA}}(r, \ell)$-round reduced cipher with input $x$. For any two different plaintext/ciphertext pairs, one can obtain the corresponding input/output pairs $(x_i, y_i)$ for $i = 0, 1$.

3. Compute the univariate polynomial $E(K, x_i) - y_i$ explicitly for $i = 0, 1$. It is clear that these polynomials share $K - k_{\ell-1}$ as a factor if the key guess is correct. Indeed, in this case with high probability $\gcd(E(K, x_0) - y_0, E(K, x_1) - y_1) = K - k_{\ell-1}$.

4. Compute $\gcd(E(K, x_0) - y_0, E(K, x_1) - y_1)$. If the result is 1 or has only irreducible factors of degree at least two, then the key guess is wrong and we go back to Step 1. Otherwise, the constant parts of the linear factors of the result are candidates for $k_{\ell-1}$, denoted by $k^*_{\ell-1}$.

5. Test the candidate master key $k_0 \| k_1 \| \cdots \| k^*_{\ell-1}$. If the test is passed, then $k_0 \| k_1 \| \cdots \| k^*_{\ell-1}$ is the right key. Otherwise, repeat Steps 1-5 until the right key is found.
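Steps 3 and 4 can be sketched for the innermost case of a single unknown key. The block below is an illustration under stated assumptions, not the authors' code: it uses a toy three-round MiMC-like cipher over $\mathbb{F}_{101}$ with made-up round constants, builds $E(K, x)$ symbolically instead of by the interpolation of Appendix A, and finds linear factors of the GCD by exhaustive root search. All names (`P`, `CONSTS`, `cipher_poly_in_key`, `key_candidates`, and the polynomial helpers) are hypothetical. Polynomials are lists of coefficients from low to high degree.

```python
P = 101                 # toy prime field with gcd(3, P - 1) = 1
CONSTS = [0, 23, 58]    # hypothetical round constants, c_0 = 0


def trim(f):
    """Drop leading zero coefficients in place; the zero polynomial is []."""
    while f and f[-1] == 0:
        f.pop()
    return f


def add(f, g):
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return trim([(a + b) % P for a, b in zip(f, g)])


def mul(f, g):
    if not f or not g:
        return []
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % P
    return trim(h)


def rem(f, g):
    """Remainder of f modulo the nonzero polynomial g."""
    f, inv = f[:], pow(g[-1], -1, P)
    while len(f) >= len(g):
        c, shift = f[-1] * inv % P, len(f) - len(g)
        for i, b in enumerate(g):
            f[shift + i] = (f[shift + i] - c * b) % P
        trim(f)  # the leading coefficient is now zero
    return f


def gcd(f, g):
    """Monic greatest common divisor by the Euclidean algorithm."""
    while g:
        f, g = g, rem(f, g)
    inv = pow(f[-1], -1, P)
    return [c * inv % P for c in f]


def encrypt(x, k):
    for c in CONSTS:
        x = pow(x + k + c, 3, P)
    return (x + k) % P


def cipher_poly_in_key(x):
    """E(K, x) as a polynomial in the unknown key K for a fixed input x."""
    K, state = [0, 1], trim([x % P])
    for c in CONSTS:
        base = add(add(state, K), [c])
        state = mul(mul(base, base), base)
    return add(state, K)


def key_candidates(pairs):
    """Roots of gcd(E(K, x0) - y0, E(K, x1) - y1), i.e. its linear factors."""
    (x0, y0), (x1, y1) = pairs
    f0 = add(cipher_poly_in_key(x0), [-y0 % P])
    f1 = add(cipher_poly_in_key(x1), [-y1 % P])
    g = gcd(f0, f1)
    evaluate = lambda k: sum(c * pow(k, i, P) for i, c in enumerate(g)) % P
    return [k for k in range(P) if evaluate(k) == 0]
```

Given two pairs produced with `encrypt(x, 37)`, the true key 37 is always among the values returned by `key_candidates`, since $K - 37$ divides both polynomials; any further candidates would be removed by the test in Step 5.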

As we show in Appendix A, Step 3 can be implemented with both time and memory complexities $O(R_{\mathrm{KA}}(r, \ell) \, d^{R_{\mathrm{KA}}(r, \ell)})$. Note that both of the polynomials $E(K, x_0) - y_0$ and $E(K, x_1) - y_1$ have degree $d^{R_{\mathrm{KA}}(r, \ell)}$. Then, by the estimation in Sect. 2.2, the complexity of computing the greatest common divisor in Step 4 is $O(R_{\mathrm{KA}}(r, \ell)^2 \, d^{R_{\mathrm{KA}}(r, \ell)})$. Thus, the time complexity for each subkey guess is dominated by the computation of the greatest common divisor, i.e., $O(R_{\mathrm{KA}}(r, \ell)^2 \, d^{R_{\mathrm{KA}}(r, \ell)})$. Therefore, the total time complexity of the above attack is $O(R_{\mathrm{KA}}(r, \ell)^2 \, d^{R_{\mathrm{KA}}(r, \ell)} q^{\ell-1})$. Moreover, the memory consumption is around $O(R_{\mathrm{KA}}(r, \ell) \, d^{R_{\mathrm{KA}}(r, \ell)})$ since Step 3 dominates the memory complexity. Notably, we only need three plaintext/ciphertext pairs. To sum up, the time/memory/data complexities of the above attack are given by

\[ T = O(R_{\mathrm{KA}}(r, \ell)^2 \, d^{R_{\mathrm{KA}}(r, \ell)} q^{\ell-1}), \quad M = O(R_{\mathrm{KA}}(r, \ell) \, d^{R_{\mathrm{KA}}(r, \ell)}), \quad D = 3 . \]

With the MITM approach, one can mount an attack on $\mathrm{FN}[q, q^{\ell}, d, r]$ with similar complexity but double the number of rounds attainable. Now we sketch the main idea by the attack on $\mathrm{FN}[q, q, d, r]$. First, we construct two polynomials $G(K, x)$ and $H(K, y)$ representing the state after round $\lceil r/2 \rceil$ as a polynomial in the unknown key and the plaintext or ciphertext respectively. Then the key can be deduced by computing the greatest common divisor of the two polynomials $G(K, x_0) - H(K, y_0)$ and $G(K, x_1) - H(K, y_1)$, whose degrees are upper bounded by $d^{\lfloor r/2 \rfloor}$. Hence, the time/memory/data complexities of the above attack are given by

\[ T = O(\lfloor r/2 \rfloor^2 \, d^{\lfloor r/2 \rfloor}), \quad M = O(\lfloor r/2 \rfloor \, d^{\lfloor r/2 \rfloor}), \quad D = 3 . \]

We can generalize the above attack to $\mathrm{FN}[q, q^{\ell}, d, r]$ with slight modifications. Note that we only need to consider the $R_{\mathrm{FN}}(r, \ell)$-round cipher after the subkey guessing. Next, we compute two polynomials $G(K, x)$ and $H(K, y)$ representing the state after round $\lceil R_{\mathrm{FN}}(r, \ell)/2 \rceil$ of the reduced cipher as a polynomial in the unknown $k_{\ell-1}$ and the input or output of the reduced cipher respectively. Then consider the two polynomials $G(K, x_0) - H(K, y_0)$ and $G(K, x_1) - H(K, y_1)$. The remaining steps of the GCD computation and key filtering are the same as in the case $\mathrm{KA}[q, q^{\ell}, d, r]$. Hence, we omit the details. Similar to the attack on $\mathrm{FN}[q, q, d, r]$, the complexity of the attack on $\mathrm{FN}[q, q^{\ell}, d, r]$ equals

\[ T = O(\lfloor R_{\mathrm{FN}}(r, \ell)/2 \rfloor^2 \, d^{\lfloor R_{\mathrm{FN}}(r, \ell)/2 \rfloor} q^{\ell-1}), \quad M = O(\lfloor R_{\mathrm{FN}}(r, \ell)/2 \rfloor \, d^{\lfloor R_{\mathrm{FN}}(r, \ell)/2 \rfloor}), \quad D = 3 . \]

In this way, with similar complexities one can double the number of rounds attainable compared with the GCD attack on $\mathrm{KA}[q, q^{\ell}, d, r]$.

We summarize the discussion in the following result.

Proposition 7. There exists an attack on $\mathrm{KA}[q, q^{\ell}, d, r]$ with time complexity $O(R_{\mathrm{KA}}(r, \ell)^2 \, d^{R_{\mathrm{KA}}(r, \ell)} q^{\ell-1})$, memory complexity $O(R_{\mathrm{KA}}(r, \ell) \, d^{R_{\mathrm{KA}}(r, \ell)})$, and data complexity 3, and there exists an attack on $\mathrm{FN}[q, q^{\ell}, d, r]$ with time complexity $O(\lfloor R_{\mathrm{FN}}(r, \ell)/2 \rfloor^2 \, d^{\lfloor R_{\mathrm{FN}}(r, \ell)/2 \rfloor} q^{\ell-1})$, memory complexity $O(\lfloor R_{\mathrm{FN}}(r, \ell)/2 \rfloor \, d^{\lfloor R_{\mathrm{FN}}(r, \ell)/2 \rfloor})$, and data complexity 3.

Remark 2. GCD attacks enjoy very low data complexity, but they suffer from large memory complexity since one needs to compute and store the two polynomials. Thus, neither the low-memory interpolation attack nor the low-data GCD attack is superior to the other.

5 Applications to MiMC

In this section, we apply our new techniques to the block cipher MiMC. Using our new techniques, we can break a variant of MiMC with memory restriction on attacks and lower the attack complexity of the larger key versions.

5.1 Description of MiMC

MiMC is a family of block cipher designs operating entirely over the finite field $\mathbb{F}_q$; they can be seen as generalizations of the KN-cipher [22] and PURE [16]. The design aims to achieve an efficient implementation over a field $\mathbb{F}_q$ (especially the large prime field $\mathbb{F}_p$) by minimizing computationally expensive field operations, e.g. multiplications or exponentiations.

MiMC-n/n. Let $q$ be a prime or power of 2 such that $\gcd(3, q - 1) = 1$. For a message $x \in \mathbb{F}_q$ and a secret key $k \in \mathbb{F}_q$, the encryption process of MiMC-n/n is constructed by iterating a round function $r$ times. At round $i$, the round function is defined as

\[ F_i(x) := (x + k + c_i)^3 , \]

where the $c_i$'s are random constants in $\mathbb{F}_q$ and $c_0 = c_r = 0$. Then the encryption process is given by

\[ E_k(x) = (F_{r-1} \circ F_{r-2} \circ \cdots \circ F_0)(x) + k . \]

The number of rounds is given by $r = \left\lceil \frac{\log_2(q)}{\log_2(3)} \right\rceil$, which is the minimal number of rounds needed to thwart the best known attacks, i.e., interpolation attacks [4].
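The description of MiMC-n/n translates almost literally into code. The sketch below covers only the prime-field case (the binary-field case would need polynomial arithmetic in $\mathbb{F}_{2^n}$); the function names and the toy parameters in the usage note are assumptions for illustration. Decryption inverts the cube map with the exponent $3^{-1} \bmod (p - 1)$, which exists because $\gcd(3, p - 1) = 1$.

```python
import math


def mimc_rounds(q):
    """Number of rounds ceil(log2(q) / log2(3)) prescribed for MiMC-n/n over F_q."""
    return math.ceil(math.log2(q) / math.log2(3))


def mimc_encrypt(x, k, consts, p):
    """E_k(x) = (F_{r-1} o ... o F_0)(x) + k with F_i(x) = (x + k + c_i)^3 over F_p."""
    for c in consts:
        x = pow(x + k + c, 3, p)
    return (x + k) % p


def mimc_decrypt(y, k, consts, p):
    """Inverse of mimc_encrypt; requires gcd(3, p - 1) = 1."""
    e = pow(3, -1, p - 1)          # x -> x^e inverts x -> x^3 on F_p
    y = (y - k) % p
    for c in reversed(consts):
        y = (pow(y, e, p) - k - c) % p
    return y
```

As a usage example, `mimc_rounds(2**129)` gives the 82 rounds of MiMC-129/129, and over the toy field $\mathbb{F}_{101}$ with hypothetical constants `[0, 23, 58, 71, 12]`, `mimc_decrypt(mimc_encrypt(x, k, consts, 101), k, consts, 101)` returns `x` for every `x` and `k`.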

MiMC-2n/n (Feistel). By employing the same permutation polynomial in FN, one can process larger blocks and have the same circuit for encryption and decryption. The round function of MiMC-2n/n is defined by

\[ x_i^L \,\|\, x_i^R \leftarrow x_{i-1}^R + (x_{i-1}^L + k + c_i)^3 \,\|\, x_{i-1}^L , \]

where the $c_i$'s are random constants in $\mathbb{F}_q$ and $c_0 = c_r = 0$. The swap operation is not applied in the last round. The number of rounds is given by $r' = 2 \cdot \left\lceil \frac{\log_2(q)}{\log_2(3)} \right\rceil$, which is the minimal number of rounds needed to thwart the best known attacks, i.e., meet-in-the-middle GCD attacks [4].

5.2 Attacks on a Variant with Low Memory Complexity

This section presents an attack on an instantiation of MiMC where the memory available to the attacker is limited. Our results indicate that the number of rounds proposed by the designers is too optimistic.

In [3], the designers consider the case in which there is a restriction on the memory available to the attacker. In this setting, many memory-consuming attacks will be infeasible. According to the designers, this enables the reduction of the number of rounds to gain better performance. To be specific, the authors claim that this restriction has a great impact on interpolation attacks and GCD attacks. Indeed, the problem arises if the attacker is not able to store all the coefficients of the interpolation polynomial, and similarly for the GCD attack.

For MiMC-129/129, the number of rounds is $82 = \left\lceil \frac{129}{\log_2(3)} \right\rceil$ in the original design. A much more aggressive version with only 38 rounds is proposed under the assumption that the attacker is restricted to a memory of $2^{64}$ bytes.

The Attack. We note that the 38-round MiMC-129/129 fits into the model of $\mathrm{KA}[2^{129}, 2^{129}, 3, 38]$. Additionally, we have $d_p = 1$ in this case. Then we can adapt the attack on $\mathrm{KA}[q, \kappa, d, r]$ to this concrete cipher. The attack is given below.

1. Let $\Delta = 3^{38}$. Choose a primitive element $\alpha \in \mathbb{F}_{2^{129}}$, and the encryption oracle 38-round MiMC-129/129 as input to Algorithm 1.

2. Run Algorithm 1. Let $t$ be the output.

3. By Proposition 1, we have $k = t$ since $c_0 = 0$, $d_p = 1$ and $a_{d-1} = 0$.

Complexity Analysis. In this attack, we need to query the encryption oracle $3^{38} + 1$ times, i.e., around $2^{60.23}$ times. The time complexity is dominated by the running time of Algorithm 1, which is around $38 \cdot (3^{38} + 1)$, i.e., $2^{65.48}$. The data complexity is also $3^{38} + 1$, i.e., around $2^{60.23}$. Finally, as we can see, the memory complexity is negligible.
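The figures above are plain arithmetic and can be re-derived in a few lines. The variable names below are chosen here for illustration only.

```python
import math

rounds = 38
queries = 3 ** rounds + 1        # data complexity: 3^38 + 1 chosen plaintexts
time_cost = rounds * queries     # running-time estimate 38 * (3^38 + 1)

log2_data = round(math.log2(queries), 2)
log2_time = round(math.log2(time_cost), 2)

print(f"log2(D) = {log2_data}")  # 60.23
print(f"log2(T) = {log2_time}")  # 65.48
```

Both exponents stay well below the key size of 129 bits, while the attack itself stores only a constant number of field elements, far less than the assumed memory bound of $2^{64}$ bytes.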

Our low-memory interpolation attacks have the same time complexity as the classical interpolation attack, with negligible memory complexity. This implies that the number of rounds cannot be smaller than $\left\lceil \frac{\log_2(q)}{\log_2(3)} \right\rceil$ even if there is a restriction on the memory available to the attacker.

Discussion. It is worth pointing out that neither the classical interpolation attack nor the higher-order differential attack works on MiMC-n/n. In both attacks, one needs to guess the last round key, which is exactly the master key of MiMC. This leads to an attack with complexity worse than exhaustive key search. By contrast, our low-memory interpolation attack does not need to guess any round key. Therefore, our attack is the first low-memory attack against MiMC.

5.3 Attacks on Larger Key Versions

This section shows attacks on variants of MiMC with a larger key size. Our results indicate that the security margin is less than claimed by the designers.

Instead of adding the same key in each round, a variant of MiMC is proposed with a key length that is equal to $\ell$ times the block length. In this case, we cyclically add $\ell$ independent keys. That is, at round $i$, the round function is defined as

\[ F_i(x) := (x + k_{i \bmod \ell} + c_i)^3 , \]

where the $c_i$'s are random constants in $\mathbb{F}_q$ and $c_0 = c_r = 0$.

We note that MiMC-n/n and MiMC-2n/n with larger key size fit into the model of $\mathrm{KA}[q, q^{\ell}, 3, r]$ and $\mathrm{FN}[q, q^{\ell}, 3, r]$ respectively. Then by Propositions 4, 6 and 7, we have the following results.

Proposition 8. Let $R_{\mathrm{KA}}(r, \ell)$ and $R_{\mathrm{FN}}(r, \ell)$ be given as in Eqn. (9).

(i) Assume that $r \le \lceil \log_3(q - 1) \rceil - 1$ for $\mathrm{KA}[q, q^{\ell}, 3, r]$ and $r \le \lceil \log_3(q - 1) \rceil + 1$ for $\mathrm{FN}[q, q^{\ell}, 3, r]$. There exists an attack on $r$-round MiMC-n/n with key size $\ell n$ having complexity $T = O(r 3^r)$, $M = O(1)$, $D = 3^r + 1$, while for MiMC-2n/n with key size $\ell n$, there exists an attack with complexity $T = O(r 3^{r-2})$, $M = O(1)$, $D = 3^{r-2} + 1$.

(ii) Assume that $r \le \lceil \log_3(q - 1) \rceil + \ell - 2$ for $\mathrm{KA}[q, q^{\ell}, 3, r]$ and $r \le \lceil \log_3(q - 1) \rceil + \ell$ for $\mathrm{FN}[q, q^{\ell}, 3, r]$. There exists an attack on $r$-round MiMC-n/n with key size $\ell n$ having complexity $T = O(R_{\mathrm{KA}}(r, \ell) \, 3^{R_{\mathrm{KA}}(r, \ell)} q^{\ell-1})$, $M = O(1)$, $D = 3^{R_{\mathrm{KA}}(r, \ell)} + 1$, while for MiMC-2n/n with key size $\ell n$, there exists an attack with complexity $T = O(R_{\mathrm{FN}}(r, \ell) \, 3^{R_{\mathrm{FN}}(r, \ell) - 2} q^{\ell-1})$, $M = O(1)$, $D = 3^{R_{\mathrm{FN}}(r, \ell) - 2} + 1$.

(iii) There exists an attack on $r$-round MiMC-n/n having key size $\ell n$ with complexity $T = O(R_{\mathrm{KA}}(r, \ell)^2 \, 3^{R_{\mathrm{KA}}(r, \ell)} q^{\ell-1})$, $M = O(R_{\mathrm{KA}}(r, \ell) \, 3^{R_{\mathrm{KA}}(r, \ell)})$, $D = 3$, while for $r$-round MiMC-2n/n having key size $\ell n$, there exists an attack with complexity $T = O(\lfloor R_{\mathrm{FN}}(r, \ell)/2 \rfloor^2 \, 3^{\lfloor R_{\mathrm{FN}}(r, \ell)/2 \rfloor} q^{\ell-1})$, $M = O(\lfloor R_{\mathrm{FN}}(r, \ell)/2 \rfloor \, 3^{\lfloor R_{\mathrm{FN}}(r, \ell)/2 \rfloor})$, $D = 3$.

The designers of MiMC-n/n analyze the case $\ell = 2$ [3]. By computing the Gröbner basis, the time complexity equals $O(4 \cdot 3^{3r})$, while the resultant algorithms lead to a complexity of $O(3^{4.69r})$. By Proposition 8 (i), our attacks have time complexity $O(r 3^r)$ if $r \le \lceil \log_3(q - 1) \rceil - 1$. For $r \ge \lceil \log_3(q - 1) \rceil$, by Proposition 8 (ii), our attacks have asymptotic time complexity $O(r 3^{2r-1})$. Therefore, our analysis shows a smaller security margin of the MiMC-n/n instance with larger key size.

For MiMC-2n/n, by Proposition 8 (i) and (iii), our attacks have time complexity $O(r 3^{r-2})$ if $r \le \lceil \log_3(q - 1) \rceil - 1$ and $O(r^2 3^{\lfloor (3r-2)/2 \rfloor - 1})$ if $r \ge \lceil \log_3(q - 1) \rceil$.

The MiMC designers claimed that their security bounds $O(4 \cdot 3^{3r})$ and $O(3^{4.69r})$ can be improved by an MITM approach [4, p. 18]. However, no details were given to support the claim. Our reduced bound is the first tighter bound that is derived from a specific attack.

5.4 Verification on MiMC over Small Fields

We have verified our attack experimentally. For instance, we have implemented the low-memory interpolation attack on 10-round MiMC-17/17. As a result, we can recover the key in 1.3 seconds with Sage.

We have also implemented the GCD attacks on larger key versions. Taking $\ell = 2$, for finite fields of small size, one can recover the master key in practical time with Sage. For example, one can recover the key in less than one hour for 7-round MiMC-11/11.

We have also carried out experiments to evaluate the behavior of the GCD value obtained after guessing certain round keys when the GCD attack is applied to the larger key version. Again we take $\ell = 2$. The experiments are performed in fields $\mathbb{F}_q$ with $q \le 2^{17}$. We take random plaintext/ciphertext pairs to obtain the distribution of GCD values. Our experiments show the following results:

– When the key guess $k_0$ is correct, we always obtain the GCD value $K - k_1$.

– When the key guess $k_0$ is wrong, we mostly get GCD value 1, hence we can eliminate the wrong key guess immediately. With small probability, say less than 1%, we get nontrivial GCD values and even a linear factor $K - a$. In this case, $a$ is considered as a valid candidate that can be filtered out with an additional test.

6 Concluding Remarks

This paper has shown that the memory requirements for classical interpolation attacks can be reduced substantially, resulting in practical attacks on primitives with low algebraic degrees. For a simple key schedule, we present generic attacks that have either constant memory or constant data complexity. To illustrate our techniques, we have applied the new attacks to the block cipher MiMC. As a result, we can break a round-reduced version of MiMC with low memory complexity and we can reduce the attack complexity of the larger key versions. However, our results do not affect the security claims of the full round MiMC. To the best of our knowledge, our analysis of MiMC is the first third-party cryptanalysis of MiMC.

For future research, it is of interest to assess the security of MiMC with original key size, i.e., a single key addition in all rounds. It remains unclear if the approaches in this paper can be applied to the new proposal GMiMC. Moreover, it is an open problem to analyze the security of the MiMC-based hash function MiMCHash.

Acknowledgement. The authors thank the anonymous reviewers for many helpful comments. The work is supported by the Research Council KU Leuven under the grant C16/15/058 and by the European Union's Horizon 2020 research and innovation programme under grant agreement No. H2020-MSCA-ITN-2014-643161 ECRYPT-NET.

A Algorithm for Computing E(K,xi) − yi

This section describes the algorithm to obtain the explicit expression of $E(K, x_i) - y_i$, which is used in Step 3 of the GCD attacks in Sect. 4.2. Recall that here $K$ is the variable and $(x_i, y_i)$ is an input/output pair corresponding to some plaintext/ciphertext pair.

1. Select $d^{R_{\mathrm{KA}}(r, \ell)} + 1$ different values $\alpha_0, \ldots, \alpha_{d^{R_{\mathrm{KA}}(r, \ell)}} \in \mathbb{F}_q$.
2. Compute $\beta_j = E(\alpha_j, x_i) - y_i$ for $i = 0, 1$ and $0 \le j \le d^{R_{\mathrm{KA}}(r, \ell)}$.
3. Interpolate the polynomial $g_i(x)$ such that $g_i(\alpha_j) = \beta_j$ for $i = 0, 1$ and $0 \le j \le d^{R_{\mathrm{KA}}(r, \ell)}$.

First observe that the iterative structure of $E(K, x_i) - y_i$ enables us to evaluate $E(\alpha_j, x_i) - y_i$ round by round. In each round one needs to evaluate a polynomial with constant degree, which can be done in constant time. Hence, each $\beta_j$ is obtained with complexity only $O(R_{\mathrm{KA}}(r, \ell))$ although the degree is $d^{R_{\mathrm{KA}}(r, \ell)}$. It follows that the second step has time complexity $O(R_{\mathrm{KA}}(r, \ell) \, d^{R_{\mathrm{KA}}(r, \ell)})$. The third step is a standard polynomial interpolation with complexity $O(R_{\mathrm{KA}}(r, \ell) \, d^{R_{\mathrm{KA}}(r, \ell)})$. Hence, the total time complexity is $O(R_{\mathrm{KA}}(r, \ell) \, d^{R_{\mathrm{KA}}(r, \ell)})$. The memory complexity of the algorithm is $O(R_{\mathrm{KA}}(r, \ell) \, d^{R_{\mathrm{KA}}(r, \ell)})$ due to the polynomial interpolation in the third step [14].

References

1. Albrecht, M.R., Cid, C., Grassi, L., Khovratovich, D., Lüftenegger, R., Rechberger, C., Schofnegger, M.: Algebraic cryptanalysis of STARK-friendly designs: Application to MARVELlous and MiMC. Cryptology ePrint Archive, Report 2019/419 (2019), https://eprint.iacr.org/2019/419

2. Albrecht, M.R., Grassi, L., Perrin, L., Ramacher, S., Rechberger, C., Rotaru, D., Roy, A., Schofnegger, M.: Feistel structures for MPC, and more. Cryptology ePrint Archive, Report 2019/397 (2019), https://eprint.iacr.org/2019/397

3. Albrecht, M.R., Grassi, L., Rechberger, C., Roy, A., Tiessen, T.: MiMC: Efficient encryption and cryptographic hashing with minimal multiplicative complexity. In: Advances in Cryptology - ASIACRYPT 2016 - 22nd International Conference on the Theory and Application of Cryptology and Information Security, Hanoi, Vietnam, December 4-8, 2016, Proceedings, Part I. pp. 191–219 (2016)

4. Albrecht, M.R., Grassi, L., Rechberger, C., Roy, A., Tiessen, T.: MiMC: Efficient encryption and cryptographic hashing with minimal multiplicative complexity. Cryptology ePrint Archive, Report 2016/492 (2016), https://eprint.iacr.org/2016/492

5. Albrecht, M.R., Rechberger, C., Schneider, T., Tiessen, T., Zohner, M.: Ciphers for MPC and FHE. In: Advances in Cryptology - EUROCRYPT 2015 - 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, April 26-30, 2015, Proceedings, Part I. pp. 430–454 (2015)

6. Ben-Sasson, E., Bentov, I., Horesh, Y., Riabzev, M.: Scalable, transparent, and post-quantum secure computational integrity. Cryptology ePrint Archive, Report 2018/046 (2018), https://eprint.iacr.org/2018/046

7. Ben-Sasson, E., Chiesa, A., Genkin, D., Tromer, E., Virza, M.: SNARKs for C: verifying program executions succinctly and in zero knowledge. In: Advances in Cryptology - CRYPTO 2013 - 33rd Annual Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2013. Proceedings, Part II. pp. 90–108 (2013)

8. Canteaut, A., Carpov, S., Fontaine, C., Lepoint, T., Naya-Plasencia, M., Paillier, P., Sirdey, R.: Stream ciphers: A practical solution for efficient homomorphic-ciphertext compression. In: Fast Software Encryption - 23rd International Conference, FSE 2016, Bochum, Germany, March 20-23, 2016, Revised Selected Papers. pp. 313–333 (2016)

9. Daemen, J., Rijmen, V.: The Design of Rijndael: AES - The Advanced Encryption Standard. Information Security and Cryptography, Springer (2002)

10. Dinur, I., Liu, Y., Meier, W., Wang, Q.: Optimized interpolation attacks on LowMC. In: Advances in Cryptology - ASIACRYPT 2015 - 21st International Conference on the Theory and Application of Cryptology and Information Security, Auckland, New Zealand, November 29 - December 3, 2015, Proceedings, Part II. pp. 535–560 (2015)

11. Dinur, I., Shamir, A.: Cube attacks on tweakable black box polynomials. In: Advances in Cryptology - EUROCRYPT 2009, 28th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Cologne, Germany, April 26-30, 2009. Proceedings. pp. 278–299 (2009)

12. Dobraunig, C., Eichlseder, M., Grassi, L., Lallemand, V., Leander, G., List, E., Mendel, F., Rechberger, C.: Rasta: A cipher with low ANDdepth and few ANDs per bit. In: Advances in Cryptology - CRYPTO 2018 - 38th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 19-23, 2018, Proceedings, Part I. pp. 662–692 (2018)

13. Duval, S., Lallemand, V., Rotella, Y.: Cryptanalysis of the FLIP family of stream ciphers. In: Advances in Cryptology - CRYPTO 2016 - 36th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2016, Proceedings, Part I. pp. 457–475 (2016)

14. von zur Gathen, J., Gerhard, J.: Modern Computer Algebra (3. ed.). Cambridge University Press (2013)

15. Grassi, L., Rechberger, C., Rotaru, D., Scholl, P., Smart, N.P.: MPC-friendly symmetric key primitives. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016. pp. 430–443 (2016)

16. Jakobsen, T., Knudsen, L.R.: The interpolation attack on block ciphers. In: Fast Software Encryption, 4th International Workshop, FSE '97, Haifa, Israel, January 20-22, 1997, Proceedings. pp. 28–40 (1997)

17. Jakobsen, T., Knudsen, L.R.: Attacks on block ciphers of low algebraic degree. J. Cryptology 14(3), 197–210 (2001)

18. Knudsen, L.R.: Truncated and higher order differentials. In: Preneel, B. (ed.) FSE 1994. LNCS, vol. 1008, pp. 196–211. Springer (1994)

19. Lai, X.: Higher order derivatives and differential cryptanalysis. In: Communications and Cryptography. The Springer International Series in Engineering and Computer Science, vol. 276, pp. 227–233 (1994)

20. Meaux, P., Journault, A., Standaert, F., Carlet, C.: Towards stream ciphers for efficient FHE with low-noise ciphertexts. In: Advances in Cryptology - EUROCRYPT 2016 - 35th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Vienna, Austria, May 8-12, 2016, Proceedings, Part I. pp. 311–343 (2016)

21. Nyberg, K.: Differentially uniform mappings for cryptography. In: Advances in Cryptology - EUROCRYPT '93, Workshop on the Theory and Application of Cryptographic Techniques, Lofthus, Norway, May 23-27, 1993, Proceedings. pp. 55–64 (1993)

22. Nyberg, K., Knudsen, L.R.: Provable security against a differential attack. J. Cryptology 8(1), 27–37 (1995)

23. Rechberger, C., Soleimany, H., Tiessen, T.: Cryptanalysis of low-data instances of full LowMCv2. IACR Transactions on Symmetric Cryptology 2018(3), 163–181 (Sep 2018). https://doi.org/10.13154/tosc.v2018.i3.163-181, https://tosc.iacr.org/index.php/ToSC/article/view/7300

24. Shannon, C.E.: Communication theory of secrecy systems. Bell Systems Technical Journal 28(4), 656–715 (1949)

25. Shimoyama, T., Moriai, S., Kaneko, T.: Improving the higher order differential attack and cryptanalysis of the KN cipher. In: Information Security, First International Workshop, ISW '97, Tatsunokuchi, Japan, September 17-19, 1997, Proceedings. pp. 32–42 (1997)

26. Stoß, H.: The complexity of evaluating interpolation polynomials. Theor. Comput. Sci. 41, 319–323 (1985)

27. Sun, B., Qu, L., Li, C.: New cryptanalysis of block ciphers with low algebraic degree. In: Fast Software Encryption, 16th International Workshop, FSE 2009, Leuven, Belgium, February 22-25, 2009, Revised Selected Papers. pp. 180–192 (2009)

28. Todo, Y., Isobe, T., Hao, Y., Meier, W.: Cube attacks on non-blackbox polynomials based on division property. In: Advances in Cryptology - CRYPTO 2017 - 37th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 20-24, 2017, Proceedings, Part III. pp. 250–279 (2017)

29. Wang, Q., Hao, Y., Todo, Y., Li, C., Isobe, T., Meier, W.: Improved division property based cube attacks exploiting algebraic properties of superpoly. In: Advances in Cryptology - CRYPTO 2018 - 38th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 19-23, 2018, Proceedings, Part I. pp. 275–305 (2018)

Part IV

Protocol

Chapter 11

Towards Truly Practical Intrusion Detection System over Encrypted Traffic

Publication data

Sébastien Canard and Chaoyun Li: Towards Truly Practical Intrusion Detection System over Encrypted Traffic. Manuscript, 2019

Contributions

Principal author. The introduction and implementation are due to Sébastien Canard.

Towards Truly Practical Intrusion Detection System over Encrypted Traffic

Sébastien Canard^1 and Chaoyun Li^2,*

1 Orange Labs, France
2 imec-COSIC, Dept. Electrical Engineering (ESAT), KU Leuven, Belgium

[email protected],[email protected]

Abstract. Privacy and data confidentiality are today at the heart of many discussions. But such data protection should not come at the expense of other security aspects. In the context of network traffic, an intrusion detection system becomes totally blind when the traffic is encrypted, making clients again vulnerable to known threats and attacks. Reconciling security and privacy is therefore a major topic, for which we need relevant and scalable solutions that can be deployed as soon as possible. In this context, BlindBox and BlindIDS are two recent proposals that make it possible to perform Deep Packet Inspection over encrypted traffic, based on two different elegant cryptographic techniques. On the one hand, even if BlindBox is quite efficient at detecting anomalous encrypted traffic, it requires a very high setup time for clients and servers and does not protect the know-how of the Security Editors (SEs) working on detection rules. On the other hand, BlindIDS does protect the SE’s market and does not introduce any latency during setup, but is definitely not efficient enough for practical use. In this paper, we show that the design of a fully efficient and market-compliant intrusion detection system over encrypted traffic is possible. Our system uses only symmetric cryptography; it encrypts a packet of 1500 bytes in about 6 µs and tests such a packet against 3000 rules in less than 2 µs.

Keywords: Intrusion detection; deep packet inspection; symmetric key cryptography; security; privacy

1 Introduction

Once restricted to the most sensitive web traffic, such as online payment, encryption is now widely used. Each year, the share of encrypted web sessions is growing, reaching 68 percent of overall sessions in 2017 [21]. The surge of encrypted traffic gives cybercriminals an avenue to hide in plain sight. Encryption makes malicious traffic indistinguishable from harmless traffic, preventing, e.g., intrusion detection based on deep packet inspection (DPI). Since 2015, the Dell security report [10] has warned companies and individuals alike about the danger of non-inspected internet traffic. Its 2016 report [11] estimates the number of users affected by under-the-radar attacks at 900 million. The necessity of responsible encrypted traffic inspection is again stressed in the 2018 report [12]. To alleviate the consequences of such attacks, security editors propose to use proxies that establish a secure connection with the web server on behalf of the client. The proxy is then in a position to decrypt and inspect the whole traffic. However, this approach raises problems of confidentiality, security and privacy.

* Corresponding author

1.1 Main Existing Approaches

In this context, the use of encryption techniques that allow a third party, the proxy, to inspect the encrypted traffic without having to decrypt the contents is an approach that cannot be overlooked. BlindBox [20] is one of the first systems to have taken this step towards reconciling privacy and security. It relies on multi-party computation techniques, in particular garbled circuits, to enable a middlebox appliance to search for malicious patterns directly on the encrypted contents. As highlighted in [7], this seminal paper nevertheless suffers from two shortcomings.

First, the set of rules to be searched has to be encrypted for each sender/receiver pair, as the encryption key is derived from the session key. This induces a high setup time for every internet connection, incompatible with online detection. Moreover, the set of patterns has to be encrypted under a different key for each parallel connection, which results in a huge memory consumption. Second, in order to encrypt the patterns for each session, the inspecting proxy has to access the patterns in the clear. Those patterns are generated by a security editor and are the core of its added value. Hence, in a competitive market, it is unlikely that the editors will be willing to disclose those patterns to all possible proxies.

Those shortcomings were addressed by the BlindIDS solution [7]. In this proposal, the authors leverage decryptable searchable encryption (DSE), a cryptographic primitive where decryption keys are independent of the trapdoor keys used to perform the search. This independence allows security editors to encrypt their patterns once and for all. These encrypted patterns can be used for every parallel session, thus notably reducing both setup time and memory consumption. However, the search operation uses pairings, whose performance is not suitable for online inspection.

We show in this paper that it is feasible to obtain the best of both solutions: efficient setup, low memory consumption, confidentiality of the rules against proxies, and efficiency of the whole protocol among a sender, a receiver and a proxy.

1.2 Our Contributions

In this paper, we propose the first encrypted traffic inspection system that achieves all the desired properties: (i) no public key encryption is involved (except potentially for deriving the session key); (ii) unauthorized actors cannot learn any information about the traffic other than whether it is malicious or safe (traffic indistinguishability); (iii) detection patterns are encrypted by the security editor and never accessible to other parties (rule indistinguishability); (iv) encrypted patterns are valid for all connections; and (v) pattern matching is almost as simple as an equality test.

Solution Overview. Our first idea for obtaining a truly efficient system is to use symmetric encryption techniques. All steps, from the encryption/decryption part to the rule generation and the detection algorithm, are then based on symmetric cryptography.

The first step consists in having a secret key s shared between the Security Editor SE and the senders/receivers S/R. SE uses this key, for each rule $r_i$, to compute the corresponding blinded rule $B_i$ during the rule generation procedure (called RuleGen). A sender also uses this key to compute a blinded version $p_j$ of each token $t_j$ of the traffic. Since the blinding algorithm is deterministic, the detection procedure (named Detect) is then a simple match between the $B_i$’s and the $p_j$’s. Clearly, the Service Provider SP should not know the key s, so as not to break the traffic indistinguishability property. Moreover, as the receiver a priori has to recover the token from $p_j$, we can use a pseudorandom permutation (PRP) F here. Finally, one problem that may occur is that a sender/receiver having access to the set of blinded rules can perform alone (that is, without involving the Service Provider) a brute-force attack to break the rule indistinguishability property. We can reasonably assume that SE uses a specific channel providing confidentiality, authenticity and integrity to send the blinded rules to SP.

actor: action    inputs                            actions
SE: RuleGen      rules $r_i$, key $s$              $B_i = F(s, r_i)$
S: Send          tokens $t_j$, key $s$             $p_j = F(s, t_j)$
R: Receive       traffic $p_j$, key $s$            $t_j = F^{-1}(s, p_j)$
SP: Detect       rules $B_i$, traffic $p_j$        $B_i = p_j$?
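The four steps of this first attempt can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the PRP F is replaced by a toy 4-round Feistel network built from HMAC-SHA256, and the 16-byte token size, the zero padding and all function and variable names are hypothetical choices made for this example.

```python
import hashlib
import hmac

BLOCK = 16  # assumed token size in bytes
HALF = BLOCK // 2


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Feistel round function: HMAC-SHA256 of (round index, half block), truncated.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def prp(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for F(key, block): a 4-round Feistel permutation."""
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def prp_inv(key: bytes, block: bytes) -> bytes:
    """Inverse permutation, standing in for F^{-1}(key, block)."""
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def pad(token: bytes) -> bytes:
    # Assumed encoding: zero-pad every token to exactly one block.
    return token.ljust(BLOCK, b"\0")


s = b"key shared by SE, S and R"
rules = [pad(b"evil-pattern"), pad(b"malware-sig")]
traffic = [pad(b"hello"), pad(b"evil-pattern")]

blinded_rules = {prp(s, r) for r in rules}               # SE: RuleGen
blinded_traffic = [prp(s, t) for t in traffic]           # S: Send
matches = [p in blinded_rules for p in blinded_traffic]  # SP: Detect
recovered = [prp_inv(s, p) for p in blinded_traffic]     # R: Receive
```

Here SP only compares opaque 16-byte values, while the receiver inverts the permutation to get the tokens back. The sketch also makes the weakness discussed next visible: SP (and SE) see the blinded tokens directly.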

The main problem with the above attempt is that the matching between one $B_i$ and one $p_j$ is done by SP during the Detect procedure, so SP needs to obtain both. The blinded token $p_j$ cannot be sent as is, since SE could use it to break the traffic indistinguishability property. We therefore use an encapsulation technique where each $p_j$ is encrypted using a key K shared by SP on one side, and both S and R on the other side. This time, as we need decryption (at least by SP, for efficiency reasons), we use a pseudorandom function (PRF) G in counter mode.

actor: action    inputs                              actions
SE: RuleGen      rules $r_i$, key $s$                $B_i = F(s, r_i)$
S: Send          tokens $t_j$, keys $(s, K)$         $p_j = F(s, t_j)$
                                                     $q_j = G(K, j) \oplus p_j$
R: Receive       traffic $q_j$, keys $(s, K)$        $p_j = q_j \oplus G(K, j)$
                                                     $t_j = F^{-1}(s, p_j)$
SP: Detect       rules $B_i$, key $K$,               $p_j = G(K, j) \oplus q_j$
                 traffic $q_j$                       $B_i = p_j$?
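The encapsulation layer alone can be illustrated as follows. This is a minimal sketch, assuming HMAC-SHA256 as a stand-in for the PRF G and 16-byte blinded tokens; the helper names `encapsulate` and `decapsulate` are hypothetical and do not come from the paper.

```python
import hashlib
import hmac


def G(key: bytes, j: int, n: int = 16) -> bytes:
    """PRF stand-in: HMAC-SHA256 of the position j, truncated to n bytes."""
    return hmac.new(key, j.to_bytes(8, "big"), hashlib.sha256).digest()[:n]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encapsulate(K: bytes, blinded: list) -> list:
    # S: q_j = G(K, j) XOR p_j
    return [xor(G(K, j), p) for j, p in enumerate(blinded)]


def decapsulate(K: bytes, wrapped: list) -> list:
    # SP or R: p_j = G(K, j) XOR q_j (XOR is its own inverse)
    return [xor(G(K, j), q) for j, q in enumerate(wrapped)]


K = b"key shared by SP, S and R"
p = [b"A" * 16, b"B" * 16, b"A" * 16]  # placeholder blinded tokens
q = encapsulate(K, p)
```

Anyone without K (in particular SE) only sees the values in `q`, while SP removes the mask with a single PRF call per token before matching.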

With such a system, there is still the problem that the resulting encrypted traffic is computed deterministically, so that the same token $p_j$ results in the same ciphertext $q_j$. Using a true random value to compute the $p_j$’s raises the problem that matching is possible only if both parties (SE and S) use the same randomness. To solve this, we introduce a counter which is used with the PRP F. Let C be the maximum number of occurrences of distinct tokens in the traffic. During RuleGen, SE generates C blinded tokens for each rule $r_i$. During Send, the sender chooses at random a counter $c \in [0, C - 1]$ to compute the blinded tokens (the first token uses c, which is then incremented each time a new blinded token is computed). In this way, the frequencies of the tokens are hidden. Furthermore, we add a true random salt to G such that the $q_j$’s are indistinguishable from random values for any adversary without access to K.

actor: action    inputs                                actions
SE: RuleGen      rules $r_i$, key $s$                  $B_{i,k} = F(s, r_i \| c_k)$
S: Send          tokens $t_j$, keys $(s, K)$,          $p_j = F(s, t_j \| c)$
                 counter $c$, random salt              $q_j = G(K, \mathsf{salt} + j) \oplus p_j$
R: Receive       traffic $q_j$, keys $(s, K)$,         $p_j = q_j \oplus G(K, \mathsf{salt} + j)$
                 counter $c$, salt                     $t_j \| c = F^{-1}(s, p_j)$
SP: Detect       rules $B_i$, key $K$,                 $p_j = G(K, \mathsf{salt} + j) \oplus q_j$
                 traffic $q_j$, salt                   $B_i = p_j$?
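The effect of the counter on matching can be sketched as below. Several choices here are assumptions rather than details given in the text: HMAC-SHA256 stands in for F (only the forward direction is needed for matching), one running counter is kept per distinct token, and the counter wraps around modulo C. The G layer and the salt are omitted to keep the example short.

```python
import hashlib
import hmac
import secrets

C = 4  # assumed bound on the number of occurrences of a token


def F(s: bytes, data: bytes, counter: int) -> bytes:
    # Computes F(s, data || counter), with HMAC-SHA256 as a stand-in for F.
    msg = data + counter.to_bytes(4, "big")
    return hmac.new(s, msg, hashlib.sha256).digest()[:16]


def rule_gen(s: bytes, rules: list) -> set:
    # SE: B_{i,k} = F(s, r_i || c_k) for every counter value c_k in [0, C-1]
    return {F(s, r, k) for r in rules for k in range(C)}


def send(s: bytes, tokens: list) -> list:
    start = secrets.randbelow(C)  # random starting counter c in [0, C-1]
    next_counter = {}             # assumed: one running counter per distinct token
    blinded = []
    for t in tokens:
        c = next_counter.get(t, start)
        blinded.append(F(s, t, c))
        next_counter[t] = (c + 1) % C  # assumed wrap-around modulo C
    return blinded


s = b"key shared by SE, S and R"
B = rule_gen(s, [b"evil"])
p = send(s, [b"GET", b"evil", b"GET", b"GET"])
detected = [x in B for x in p]  # SP: Detect by set membership
```

The three occurrences of `GET` yield three different blinded values, so SP cannot count repetitions, yet the single occurrence of `evil` still matches one of the C blinded versions of the rule.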

One problem with the above description is that, F being a PRP, a fraudulent sender or receiver can invert the $B_{i,k}$’s to break the rule indistinguishability property. The solution is to replace F by a non-invertible pseudorandom function. But the receiver is then no longer able to decrypt the received traffic, so we need to send the traffic twice: once using the technique given above (with the PRP replaced by a PRF; this copy is used to detect unsafe traffic) and once using a classical TLS channel and a shared key k (this copy is used by the receiver to decrypt the traffic). If the traffic is safe, this moreover lets the receiver go faster by decrypting only the TLS part. But this also lets the sender send two different things: safe fake traffic that is tested by SP, and true corrupted traffic that will be executed by the receiver. Without any additional trick, this can be detected by the receiver, who has all the material to check the difference, but at the expense of efficiency. Our idea is to ask SP to hash the encrypted tokens $p_j$ it has obtained during the Detect phase; the receiver computes a similar hash, which lets it detect such fraud by the sender with a simple equality test.

actor: action    inputs                                 actions
SE: RuleGen      rules $r_i$, key $s$                   $B_{i,k} = F(s, r_i \| c_k)$
S: Send          tokens $t_j$, keys $(s, K, k)$,        $e = \mathsf{TLS}(k, \{t_j\}_j)$, $p_j = F(s, t_j \| c)$
                 counter $c$, random salt               $q_j = G(K, \mathsf{salt} + j) \oplus p_j$
R: Receive       traffic $e$, keys $(s, K, k)$,         $\{t_j\}_j = \mathsf{TLS}^{-1}(k, e)$, $p_j = F(s, t_j \| c)$
                 counter $c$, salt, hash $h_j$          $h_j = H(p_j)$?
SP: Detect       rules $B_i$, key $K$,                  $p_j = G(K, \mathsf{salt} + j) \oplus q_j$, $B_i = p_j$?
                 traffic $q_j$, salt                    $h_j = H(p_j)$
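The consistency check between the two copies of the traffic can be sketched as follows. This is an illustration only: HMAC-SHA256 stands in for F, SHA-256 for H, a single digest over the whole token sequence replaces the per-token values $h_j$, and the TLS channel is not modelled (the receiver is simply handed the tokens it would have decrypted). All names are hypothetical.

```python
import hashlib
import hmac


def F(s: bytes, token: bytes, c: int) -> bytes:
    # Stand-in for the PRF F(s, token || c).
    return hmac.new(s, token + c.to_bytes(4, "big"), hashlib.sha256).digest()[:16]


def H(blinded: list) -> bytes:
    # Hash of the sequence of fixed-length blinded tokens.
    h = hashlib.sha256()
    for p in blinded:
        h.update(p)
    return h.digest()


def sp_digest(blinded_seen_by_sp: list) -> bytes:
    # SP: hash the blinded tokens it actually inspected during Detect.
    return H(blinded_seen_by_sp)


def receiver_accepts(s: bytes, tokens: list, counters: list, digest: bytes) -> bool:
    # R: recompute the blinded tokens from the TLS-decrypted tokens and compare.
    recomputed = [F(s, t, c) for t, c in zip(tokens, counters)]
    return hmac.compare_digest(H(recomputed), digest)


s = b"key shared by SE, S and R"
counters = [0, 1]
real = [b"GET /", b"evil"]  # what the receiver gets over TLS
fake = [b"GET /", b"safe"]  # what a cheating sender shows to SP

honest = sp_digest([F(s, t, c) for t, c in zip(real, counters)])
cheat = sp_digest([F(s, t, c) for t, c in zip(fake, counters)])
```

With an honest sender the two digests agree; if the sender shows SP a sanitized copy, the receiver's recomputed digest differs and the fraud is caught by one comparison.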

There is still one important drawback in the current version of the protocol: the Security Editor has to compute a number of encrypted rules that is proportional to the number of S/R pairs times the constant C. To remove the proportionality to the number of S/R pairs, we use a broadcast encryption scheme BE (see Section 3.1) for the group of all senders and receivers (this group is denoted I in the sequel). SE knows the master secret key mk to manage the group, and each sender (resp. receiver) owns a membership key denoted $sk_n$ (resp. $sk_n$), which lets it retrieve the key s using some header Hdr output by SE during (broadcast) encryption. This way, SE can generate the same encrypted rules for several distinct users.

actor: action    inputs                                 actions
SE: RuleGen      rules $r_i$,                           $(s, \mathsf{Hdr}) = \mathsf{BE.Enc}(mk, I)$
                 master key $mk$                        $B_{i,k} = F(s, r_i \| c_k)$
S: Send          tokens $t_j$, keys $(s, K, k)$,        $s = \mathsf{BE.Dec}(sk_n, \mathsf{Hdr})$
                 counter $c$, random salt               $e = \mathsf{TLS}(k, \{t_j\}_j)$, $p_j = F(s, t_j \| c)$
                                                        $q_j = G(K, \mathsf{salt} + j) \oplus p_j$
R: Receive       traffic $e$,                           $s = \mathsf{BE.Dec}(sk_n, \mathsf{Hdr})$,
                 keys $(sk_n, K, k)$,                   $\{t_j\}_j = \mathsf{TLS}^{-1}(k, e)$,
                 counter $c$,                           $p_j = F(s, t_j \| c)$,
                 salt, hash $h_j$                       $h_j = H(p_j)$?
SP: Detect       rules $B_i$, key $K$,                  $p_j = G(K, \mathsf{salt} + j) \oplus q_j$, $B_i = p_j$?
                 traffic $q_j$, salt                    $h_j = H(p_j)$

Implementation. We implemented our system and provide a thorough evaluation over popular web pages, comparing its performance with BlindBox and BlindIDS, showing that our work is a significant step towards real-life deployment of privacy-preserving intrusion detection systems. More precisely, the encryption of a packet of 1500 bytes takes about 6 µs (compared to 90 µs for BlindBox and 27 ms for BlindIDS). The detection phase for 3000 rules and one packet takes less than 2 µs (compared to 33 µs for BlindBox and 74 s for BlindIDS).

Organization. The paper is organized as follows. In the next section, we first recall and slightly modify the security model proposed in [7] for an ideal intrusion detection system over encrypted traffic. We then give the details of our new protocol in Section 3 and describe our implementation and experimental results in Section 4. Related work is detailed in Section 5, before the conclusion.

2 System Architecture and Security

2.1 Actors and Architecture

We consider the following actors in the DPI system (following [7] and [20]):

– the Security Editor, denoted SE, who is responsible for generating and maintaining a list of malware signatures;

– the Service Provider, denoted SP, who searches for intrusions in the traffic, using the rules provided by the SE;

– a sender, denoted S, who sends messages over the Internet. The set of all senders is denoted $\mathcal{S}$;

– a receiver, denoted R, who receives the messages. The set of all receivers is denoted $\mathcal{R}$.

The SE role is performed by organizations such as McAfee, Symantec and Kaspersky. The detection signatures are the main assets of the SE. The SP, namely the middlebox in [20], provides both physical and cloud-based services such as proxies. In most deployments, the SE or the middlebox can read the plain traffic between S and R. This paper aims to propose an IDS using DPI over encrypted traffic that is as good as one operating over plain traffic.

2.2 Main Procedures

We here make only minor modifications to the initial model proposed in [7]. An intrusion detection system over encrypted traffic, denoted ∆, is composed of the following procedures. The main difference is that we define the different actors’ cryptographic keys more clearly and include them, by default, in the corresponding procedures.

– Setup, on input the security parameter λ, generates the public parameters param of the system and the keys of the different actors, that is, $sk_{SE}$ for the Security Editor, $sk_{SP}$ for the Service Provider, $sk_S$ for the Sender, and $sk_R$ for the Receiver.

– RuleGen, on input the parameters param, the SE secret key $sk_{SE}$ and a set M of rules for detecting malicious traffic, outputs a set B of blinded rules that are then sent to SP.

– Send takes as input the public parameters param, the secret key $sk_S$ of the sender, and a traffic $T = \{t_j\}_j$, where each $t_j$ is a token of fixed size. It outputs an encrypted traffic E for a receiver R.

– Detect, on input param, the Service Provider private key $sk_{SP}$, an encrypted traffic E and the set B of blinded rules from SE, outputs a bit $b \in \{0, 1\}$, stating that the underlying traffic T is malicious (b = 0) or safe (b = 1). It may also return some auxiliary information aux, such as the blinded rule that matched or some additional information for the receiver. If something goes wrong, it outputs an error message ⊥.

– Receive takes as input the parameters param, the receiver’s secret key $sk_R$, an encrypted traffic E and optionally some additional auxiliary information aux coming from SP. It outputs a plain traffic T, or an error message ⊥.

Correctness. Let param be the output of the Setup procedure. Let B be a set of blinded rules, with $B = \mathsf{RuleGen}(\mathsf{param}, sk_{SE}, M)$ where M is a set of rules, and let $E = \mathsf{Send}(\mathsf{param}, sk_S, T)$ where T is a traffic. An intrusion detection system over encrypted traffic is said to be correct iff

$$T = \mathsf{Receive}(\mathsf{param}, sk_R, E, \mathsf{aux}), \quad \text{and}$$

$$\mathsf{Detect}(T, M) = \mathsf{Detect}(\mathsf{param}, sk_{SP}, E, B) \quad (\text{incl. } \mathsf{aux}).$$

2.3 Security Requirements

We now define the expected security for such a system. Informally speaking, we consider the Service Provider to be honest-but-curious, since it applies the DPI honestly but may try to obtain information about either the users’ traffic or the SE’s rules. We also use the honest-but-curious paradigm for the Security Editor, as all the rules are considered to be true and authentic malicious patterns. But, similarly to the SP, the SE may try to acquire information about the clear-text content of the traffic. However, we do not consider the case where the SP and the SE collude, as they can in this case easily mount a dictionary attack. Finally, we also do not consider a coalition between a sender and a receiver since, as with non-encrypted traffic, they can easily agree on any shared secret key and any encryption algorithm to add an extra layer of encryption so that detection becomes infeasible. We now go into more formal details.

As shown in [7], there are mainly three security properties that such a system should satisfy: detection, traffic indistinguishability and rule indistinguishability. We slightly modify these properties here so that they better suit real needs. We think that these modifications can lead to more secure schemes in the future. The modification has already been sketched in the introduction, and will be detailed for each security property below. All security experiments are given in Figure 1. We consider, for each of them, that Setup has already been executed as:

$$(\mathsf{param}, sk_{SE}, sk_{SP}, sk_S, sk_R) \leftarrow \mathsf{Setup}(1^\lambda).$$

Detection. The detection property informally states that any malicious traffic must be detected by the Service Provider. This is close to the above correctness property, but considers that either the sender or the receiver tries to cheat. The related security experiment is given in Figure 1. On input the parameters, $\mathcal{A}$ outputs an encrypted traffic E such that it is declared safe (that is, $\mathsf{Detect}(\mathsf{param}, sk_{SP}, E, B) = 1$) while the decrypted version T is malicious (that is, $\mathsf{Detect}(T, M) = 0$).

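$\mathsf{Exp}^{\mathsf{det}}_{\Delta,\mathcal{A}}(\lambda)$
  $B \leftarrow \mathsf{RuleGen}(\mathsf{param}, sk_{SE}, M)$;
  $E \leftarrow \mathcal{A}(\mathsf{param}, sk_S, sk_R)$;
  if $\mathsf{Detect}(\mathsf{param}, sk_{SP}, E, B) = 1$, return 0;
  $T \leftarrow \mathsf{Receive}(\mathsf{param}, sk_R, E)$;
  if $\mathsf{Detect}(T, M) = 0$, return 0;
  return 1.

$\mathsf{Exp}^{\mathsf{sp\text{-}tr\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda)$
  $b \xleftarrow{\$} \{0, 1\}$;
  $(T_0, T_1, \mathsf{aux}_{\mathcal{A}}) \leftarrow \mathcal{A}^{\mathsf{Send},\mathsf{RuleGen}}(sk_{SP}, \mathsf{param})$;
  if $\mathsf{type}(T_0, T_1) = 0$, return 0;
  $E_b \leftarrow \mathsf{Send}(\mathsf{param}, T_b)$;
  $b' \leftarrow \mathcal{A}^{\mathsf{Send},\mathsf{RuleGen}}(E_b, \mathsf{aux}_{\mathcal{A}})$;
  return $(b = b')$.

$\mathsf{Exp}^{\mathsf{se\text{-}tr\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda)$
  $b \xleftarrow{\$} \{0, 1\}$;
  $(T_0, T_1, \mathsf{aux}_{\mathcal{A}}) \leftarrow \mathcal{A}^{\mathsf{Send},\mathsf{RuleGen}}(sk_{SE}, \mathsf{param})$;
  if $\mathsf{type}(T_0, T_1) = 0$, return 0;
  $E_b \leftarrow \mathsf{Send}(\mathsf{param}, T_b)$;
  $b' \leftarrow \mathcal{A}^{\mathsf{Send},\mathsf{RuleGen}}(E_b, \mathsf{aux}_{\mathcal{A}})$;
  return $(b = b')$.

$\mathsf{Exp}^{\mathsf{hme\text{-}rul\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda)$
  $b \xleftarrow{\$} \{0, 1\}$;
  $(M_0, M_1) \leftarrow \mathcal{A}_f(\mathsf{param}, sk_{SP}, sk_S, sk_R)$;
  $B_b \leftarrow \mathsf{RuleGen}(\mathsf{param}, sk_{SE}, M_b)$;
  $b' \leftarrow \mathcal{A}_g(B_b)$;
  return $(b = b')$.

$\mathsf{Exp}^{\mathsf{sp\text{-}rul\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda)$
  $b \xleftarrow{\$} \{0, 1\}$;
  $(M_0, M_1, \mathsf{aux}_{\mathcal{A}}) \leftarrow \mathcal{A}^{\mathsf{Send}}(\mathsf{param}, sk_{SP})$;
  $B_b \leftarrow \mathsf{RuleGen}(\mathsf{param}, sk_{SE}, M_b)$;
  $b' \leftarrow \mathcal{A}^{\mathsf{Send}}(B_b, \mathsf{aux}_{\mathcal{A}})$;
  return $(b = b')$.

Fig. 1. Security Experiments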

Definition 1 (Detection). An intrusion detection system over encrypted traffic ∆ is said to be detectable if for any probabilistic polynomial-time $\mathcal{A}$, there exists a negligible function ν(λ) such that:

$$\mathsf{Succ}^{\mathsf{det}}_{\Delta,\mathcal{A}}(\lambda) = \Pr\left[\mathsf{Exp}^{\mathsf{det}}_{\Delta,\mathcal{A}} = 1\right] \le \nu(\lambda).$$

As explained in [7], we do not consider the case where sender and receiver are both dishonest and collude. No TLS inspection system can handle this case, as the sender and the receiver may agree on some secret coding or encryption in order to hide malicious traffic in an undetectable way.

Traffic indistinguishability. The traffic indistinguishability property informally states that it is not feasible for non-authorized actors to learn any information about the traffic, other than whether it is malicious or safe. In particular, the SP is assumed not to learn any information about the traffic other than the matches between traffic and rules.

In fact, compared to [7], we consider that the Service Provider SP manages its own private key $sk_{SP}$, so that its role requires some knowledge that is not available to other actors. We thus consider two different traffic indistinguishability experiments, depending on the knowledge of the adversary: the secret key $sk_{SE}$ of the Security Editor or the secret key $sk_{SP}$ of the Service Provider. Clearly, having access to both keys easily breaks the traffic indistinguishability property.

In both cases, we must deal with the problem that the adversary may choose, in the indistinguishability experiment, one malicious traffic and one safe traffic, so that it can easily distinguish which one is used by the challenger, using the Detect algorithm or some auxiliary information. We therefore reuse the notion of type [7]: two traffics $T_0$ and $T_1$ are of the same type, denoted $\mathsf{type}(T_0, T_1) = 1$, iff

$$\mathsf{Detect}(\mathsf{param}, T_0, M) = \mathsf{Detect}(\mathsf{param}, T_1, M),$$

including the auxiliary information aux, and where M is a set of rules.

More formally, we give in Figure 1 two different experiments, $\mathsf{Exp}^{\mathsf{sp\text{-}tr\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda)$ and $\mathsf{Exp}^{\mathsf{se\text{-}tr\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda)$, for an adversary $\mathcal{A}$ having access to both a Send oracle (given a plain traffic T of its choice, $\mathcal{A}$ obtains the related encrypted traffic E) and a RuleGen oracle (given a set of rules M of its choice, the adversary gets back $B \leftarrow \mathsf{RuleGen}(\mathsf{param}, sk_{SE}, M)$). To emphasize the adversary’s power, it is denoted $\mathcal{A}^{\mathsf{Send},\mathsf{RuleGen}}$ in the two experiments. The adversary first chooses two traffics $T_0$ and $T_1$ and, if they have the same type, one of them, $T_b$, is encrypted and given to $\mathcal{A}$. Eventually, $\mathcal{A}$ has to guess the bit b.

We impose further restrictions on the adversary $\mathcal{A}$ in $\mathsf{Exp}^{\mathsf{sp\text{-}tr\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda)$. We assume that $\mathcal{A}$ does not query the Send oracle with traffic containing tokens in $T_0$ or $T_1$; otherwise, traffic indistinguishability is trivially broken, since Send encrypts traffic deterministically up to a counter which is not exponentially large. Also, $\mathcal{A}$ does not choose tokens in $T_0$ or $T_1$ as rules to query RuleGen; otherwise, the detection functionality allows $\mathcal{A}$ to trivially distinguish $T_0$ from $T_1$ by the pattern of matches. This is a common restriction in security definitions for searchable encryption (e.g., see the MBSE security in [20]).
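To see why the first restriction is needed, the following sketch plays the trivial attack that it rules out. It is an illustration under assumptions: blinding is modelled as HMAC-SHA256 over the token and a counter drawn from a small range of size C, the oracle returns the blinded token as seen by an adversary holding $sk_{SP}$, and the names `make_send_oracle` and `adversary` are hypothetical.

```python
import hashlib
import hmac
import secrets

C = 4  # assumed (small) size of the counter range


def make_send_oracle(s: bytes):
    """Returns a Send oracle that blinds one token under the hidden key s."""
    def send(token: bytes) -> bytes:
        c = secrets.randbelow(C)  # the only randomness in the blinding
        return hmac.new(s, token + bytes([c]), hashlib.sha256).digest()[:16]
    return send


def adversary(send, t0: bytes, challenge: bytes) -> int:
    # Forbidden move: query Send on a token of T_0 until all C blindings are known.
    seen = set()
    while len(seen) < C:
        seen.add(send(t0))
    return 0 if challenge in seen else 1


s = secrets.token_bytes(16)
send = make_send_oracle(s)
t0, t1 = b"token-from-T0", b"token-from-T1"
guesses = [adversary(send, t0, send(t)) for t in (t0, t1)]
```

Because the counter range is small, a handful of oracle queries enumerates every possible blinding of the token, and the adversary then identifies the challenge with certainty (up to a negligible collision probability).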

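Definition 2 (Traffic indistinguishability). An intrusion detection system over encrypted traffic ∆ is said to be traffic-indistinguishable if for any probabilistic polynomial-time $\mathcal{A}$, there exists a negligible function ν(λ) such that both

$$\mathsf{Adv}^{\mathsf{sp\text{-}tr\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda) = \left| 2 \cdot \Pr\left[\mathsf{Exp}^{\mathsf{sp\text{-}tr\text{-}ind}}_{\Delta,\mathcal{A}} = 1\right] - 1 \right| \le \nu(\lambda), \quad \text{and}$$

$$\mathsf{Adv}^{\mathsf{se\text{-}tr\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda) = \left| 2 \cdot \Pr\left[\mathsf{Exp}^{\mathsf{se\text{-}tr\text{-}ind}}_{\Delta,\mathcal{A}} = 1\right] - 1 \right| \le \nu(\lambda).$$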

Rule indistinguishability. The rule indistinguishability property states that it is not feasible to learn any information about the rules. In fact, contrary to [7], we consider two different kinds of rule indistinguishability.

High min-entropy rule indistinguishability. We remark that if the adversary is a Sender or a Receiver, then it can create any valid traffic of its choice and use the encrypted rules to test it and learn some information. In this case, we make use of the high min-entropy property [4]: a probabilistic adversary $\mathcal{A} = (\mathcal{A}_f, \mathcal{A}_g)$ has min-entropy µ if

$$\forall \lambda \in \mathbb{N}\ \ \forall r \in M : \ \Pr\left[r' \leftarrow \mathcal{A}_f(1^\lambda, b) : r' = r\right] \le 2^{-\mu(\lambda)}.$$

$\mathcal{A}$ is said to have high min-entropy if it has min-entropy µ with $\mu(\lambda) \in \omega(\log \lambda)$. This restriction may limit the number of rules we can manage since, for some of them, part of the information can be publicly known, as for example “bad” domain names for URL blacklists (see also Section 4.2).
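The need for the high min-entropy restriction can be illustrated by a dictionary attack mounted by a party that knows the key s. The sketch below is an assumption-laden example, not the paper's construction: HMAC-SHA256 stands in for F, C is set to 4, and the domain names are made-up placeholders.

```python
import hashlib
import hmac
import secrets

C = 4  # assumed number of counter values per rule


def F(s: bytes, data: bytes, counter: int) -> bytes:
    # Stand-in for the PRF F(s, data || counter).
    msg = data + counter.to_bytes(4, "big")
    return hmac.new(s, msg, hashlib.sha256).digest()[:16]


def dictionary_attack(s: bytes, blinded_rules: set, candidates: list) -> list:
    # A sender or receiver knows s, so it can blind any guess and test membership.
    return [w for w in candidates
            if any(F(s, w, k) in blinded_rules for k in range(C))]


s = b"key known to every sender and receiver"
secret_rules = [b"bad.example.com", secrets.token_bytes(32)]
B = {F(s, r, k) for r in secret_rules for k in range(C)}

guessable = [b"good.example.org", b"bad.example.com", b"shop.example.net"]
recovered = dictionary_attack(s, B, guessable)
```

The guessable rule is recovered after a few PRF evaluations, whereas the 32-byte random rule stays hidden because no realistic candidate list contains it. This is the gap that the min-entropy condition formalizes.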

Service Provider rule indistinguishability. If, however, the adversary has no access to such secrets (that is, $sk_S$ and $sk_R$), it may still want to obtain some information about the underlying rules, and there is no longer any restriction on the entropy. We argue that such an adversary is complementary to the previous one, and that an intrusion detection system over encrypted traffic should resist both kinds of attacks.

Both experiments are given in Figure 1, for (i) an adversary $\mathcal{A} = (\mathcal{A}_f, \mathcal{A}_g)$ able to create any traffic and with high min-entropy (see, e.g., [4] for details) for $\mathsf{Exp}^{\mathsf{hme\text{-}rul\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda)$, and (ii) a standard adversary $\mathcal{A}$ for $\mathsf{Exp}^{\mathsf{sp\text{-}rul\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda)$ that has access to a Send oracle, denoted by $\mathcal{A}^{\mathsf{Send}}$, which outputs an encrypted traffic from a plain payload. The adversary $\mathcal{A}$ ($\mathcal{A}_f$) chooses two sets of rules $M_0$ and $M_1$, and one of them is used in the RuleGen procedure. The output $B_b$ is then given to $\mathcal{A}$ ($\mathcal{A}_g$), which eventually outputs a guess for the bit b.

Notice that the relation between Send and RuleGen is symmetric if we compare $\mathsf{Exp}^{\mathsf{sp\text{-}rul\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda)$ to $\mathsf{Exp}^{\mathsf{sp\text{-}tr\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda)$. Thus, as in $\mathsf{Exp}^{\mathsf{sp\text{-}tr\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda)$, we assume that the adversary in $\mathsf{Exp}^{\mathsf{sp\text{-}rul\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda)$ does not query RuleGen with any rules in $M_0$ or $M_1$ and does not query Send with traffic containing rules in $M_0$ or $M_1$.

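Definition 3 (Rule indistinguishability). An intrusion detection system over encrypted traffic ∆ is said to be rule-indistinguishable if for any probabilistic polynomial-time $\mathcal{A} = (\mathcal{A}_f, \mathcal{A}_g)$ having high min-entropy, there exists a negligible function ν(λ) such that both

$$\mathsf{Adv}^{\mathsf{hme\text{-}rul\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda) = \left| 2 \cdot \Pr\left[\mathsf{Exp}^{\mathsf{hme\text{-}rul\text{-}ind}}_{\Delta,\mathcal{A}} = 1\right] - 1 \right| \le \nu(\lambda) \quad \text{and}$$

$$\mathsf{Adv}^{\mathsf{sp\text{-}rul\text{-}ind}}_{\Delta,\mathcal{A}}(\lambda) = \left| 2 \cdot \Pr\left[\mathsf{Exp}^{\mathsf{sp\text{-}rul\text{-}ind}}_{\Delta,\mathcal{A}} = 1\right] - 1 \right| \le \nu(\lambda).$$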

2.4 Detection Range

IDS rules are typically divided into two parts. The first part consists of patterns that are to be matched exactly against the traffic content. The second part consists of regular expressions that are evaluated using the content as input. Middlebox appliances are expected to provide the same quality of service over encrypted and clear-text traffic. However, intrusion detection systems over encrypted data mostly focus on pattern matching. Evaluating regular expressions over encrypted data is a delicate task, as even fully homomorphic encryption is not able to handle the whole range of possible expressions. BlindBox proposed a workaround, where the decryption key becomes accessible in case of a match on the patterns. The appliance then decrypts the content and evaluates the regular expression on the clear text. This does not reduce the false negative rate, which poses the higher threat, but only the false positive rate, and at the cost of privacy.

3 Details of Our Protocol

In this section, we first give the main cryptographic building blocks we need for our construction, before giving the details of the latter.

3.1 Cryptographic Building Blocks

We first present the main cryptographic building blocks we will need. We also give the related security requirements that will be useful in our security proofs. Let λ be a security parameter.

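$\mathsf{Exp}^{\mathsf{prf}}_{F,\mathcal{A}}(\lambda)$
  $b \xleftarrow{\$} \{0, 1\}$; $K \leftarrow \{0, 1\}^s$;
  For $i = 1, 2, \cdots, q$ do
    $(M_i, \mathsf{aux}_{\mathcal{A}}) \leftarrow \mathcal{A}(1^\lambda)$;
    if $b = 0$, then $R_i \leftarrow U(1^n)$;
    if $b = 1$, then $R_i = F(K, M_i)$;
  $b' \leftarrow \mathcal{A}(R_1, \cdots, R_q, \mathsf{aux}_{\mathcal{A}})$;
  return $(b = b')$.

$\mathsf{Exp}^{\mathsf{ow\text{-}prf}}_{F,\mathcal{A}}(\lambda)$
  $K \leftarrow \{0, 1\}^s$; $M \leftarrow \{0, 1\}^\ell$; $R \leftarrow F(K, M)$;
  $M' \leftarrow \mathcal{A}(K, R)$;
  return $(M' = M)$.

$\mathsf{Exp}^{\mathsf{be\text{-}ind}}_{\mathsf{BE},\mathcal{A}}(\lambda)$
  $b \xleftarrow{\$} \{0, 1\}$;
  $(\mathsf{param}, msk) \leftarrow \mathsf{Setup}(1^\lambda)$;
  $(I, \mathsf{aux}_{\mathcal{A}}) \leftarrow \mathcal{A}(\mathsf{param})$;
  if $b = 0$, then $K \leftarrow \mathcal{K}$;
  if $b = 1$, then $(\mathsf{Hdr}, K) = \mathsf{Enc}(\mathsf{param}, msk, I)$;
  $b' \leftarrow \mathcal{A}(K, \mathsf{aux}_{\mathcal{A}})$;
  return $(b = b')$.

Fig. 2. Building Blocks Security Experiments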

Pseudorandom function. A function $F : \{0, 1\}^s \times \{0, 1\}^\ell \to \{0, 1\}^n$ is a pseudorandom function (PRF) if

– given a key $K \in \{0, 1\}^s$ and an input $M \in \{0, 1\}^\ell$, one can efficiently compute $F(K, M)$;

– in a nutshell, an adversary against a PRF should not be able to distinguish the output of F from the uniform distribution U. More formally, for any probabilistic polynomial-time $\mathcal{A}$, there exists a negligible function ν(λ) such that

$$\mathsf{Adv}^{\mathsf{prf}}_{F,\mathcal{A}}(\lambda) = \left| 2 \cdot \Pr\left[\mathsf{Exp}^{\mathsf{prf}}_{F,\mathcal{A}} = 1\right] - 1 \right| \le \nu(\lambda),$$

where $\mathsf{Exp}^{\mathsf{prf}}_{F,\mathcal{A}}$ is given in Figure 2, in which U is the uniform distribution, and where $\mathcal{A}$ is given access to an oracle which, on input a message M, outputs $F(K, M)$.

Note that F can be implemented as a keyed hash function such as SHA-256.

In our construction, we need another property of the PRF. More precisely, in the rule indistinguishability experiment, the adversary knows the key K, so we cannot rely on the above pseudorandomness. We therefore consider the case of a fixed-key PRF and require the one-wayness of the resulting function against an adversary having access to the key K. More formally, for any probabilistic polynomial-time $\mathcal{A}$, there exists a negligible function ν(λ) such that

$$\mathsf{Adv}^{\mathsf{ow\text{-}prf}}_{F,\mathcal{A}}(\lambda) = \Pr\left[\mathsf{Exp}^{\mathsf{ow\text{-}prf}}_{F,\mathcal{A}} = 1\right] \le \nu(\lambda),$$

where $\mathsf{Exp}^{\mathsf{ow\text{-}prf}}_{F,\mathcal{A}}$ is given in Figure 2.

It is commonly believed that keyed SHA-256 satisfies this property (see, e.g., [19] for some comments on that point). Moreover, as a hash function, SHA-256 can also be treated as a random oracle. Using both the one-wayness of keyed SHA-256 and the random oracle model, we will be able to prove that our scheme provides rule indistinguishability against fraudulent senders and receivers (in the high min-entropy setting, see Section 2.3).
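As an illustration of a keyed hash used as F, the sketch below instantiates it with HMAC-SHA256 from the Python standard library. The text only says "keyed SHA-256"; choosing HMAC as the keying method, truncating the output to n bytes, and the parameter name `n_bytes` are assumptions of this example.

```python
import hashlib
import hmac
import secrets


def F(K: bytes, M: bytes, n_bytes: int = 16) -> bytes:
    """Keyed-hash stand-in for F: {0,1}^s x {0,1}^l -> {0,1}^n (n = 8 * n_bytes)."""
    if not 1 <= n_bytes <= 32:
        raise ValueError("SHA-256 yields at most 32 output bytes")
    return hmac.new(K, M, hashlib.sha256).digest()[:n_bytes]


K1 = secrets.token_bytes(32)
K2 = secrets.token_bytes(32)
M = b"some rule or token"
tag = F(K1, M)
```

The function is deterministic for a fixed key, which is what makes equality matching possible, and there is no inverse to call: recovering M from `tag` with K1 in hand is exactly the one-wayness assumption stated above.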

Hash function. We also need a cryptographically secure hash function H, thatis collision resistant, resistant to pre-image and resistant to second pre-image.

Broadcast encryption. As shown in [9, 22], Broadcast Encryption (BE) schemes [14] can be used to enforce access control in the multi-user setting. Most of the time, and this will be the case here, a classical symmetric-key encryption scheme is combined with the broadcast encryption scheme. A broadcast encryption scheme BE can be described by the following procedures.

– Setup, on input the security parameter $\lambda$, generates the public parameters param of the system and a master secret key msk.

– Extract, on input the parameters param, the key msk and a user's index $i$, outputs the key $sk_i$ of user $i$.

– Enc takes as input the public parameters param, the master key msk and a set of indices $I$. It outputs a header Hdr and a key $K \in \mathcal{K}$.

– Dec, on input param, a header Hdr and a secret key $sk_i$ for $i \in I$, outputs the key $K$.

Regarding security, a specific form of indistinguishability should be defined on the key $K$ [6]. The corresponding experiment $\mathsf{Exp}^{\mathrm{be-ind}}_{\mathrm{BE},\mathcal{A}}$ is given in Figure 2, where $\mathcal{A}$ has access to an Enc oracle on input $I$ (returning $(\mathrm{Hdr}, K)$) and to a Dec oracle on input $(\mathrm{Hdr}, i)$ (returning the key $K$). We say that a broadcast encryption scheme BE is indistinguishable if, for any probabilistic polynomial-time $\mathcal{A}$, there exists a negligible function $\nu(\lambda)$ such that

\[ \mathsf{Adv}^{\mathrm{be-ind}}_{\mathrm{BE},\mathcal{A}}(\lambda) = \left| 2 \cdot \Pr\left[ \mathsf{Exp}^{\mathrm{be-ind}}_{\mathrm{BE},\mathcal{A}} = 1 \right] - 1 \right| \le \nu(\lambda). \]
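To make the four procedures concrete, the following Python sketch shows a deliberately naive broadcast encryption scheme in which the session key is wrapped once per authorised user. It is only an interface illustration under our own assumptions: it is not the LSD scheme [16] used in the implementation, its header grows linearly with the size of $I$, it has no public parameters, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional, Set, Tuple

KEY_LEN = 16  # 128-bit keys


def _pad(user_key: bytes, nonce: bytes) -> bytes:
    # Pad derived from the user's key and a fresh per-header nonce.
    return hmac.new(user_key, nonce, hashlib.sha256).digest()[:KEY_LEN]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def setup() -> bytes:
    """Setup: return a master secret key msk."""
    return secrets.token_bytes(KEY_LEN)


def extract(msk: bytes, i: int) -> bytes:
    """Extract: derive the secret key sk_i of user i from msk."""
    return hmac.new(msk, i.to_bytes(4, "big"), hashlib.sha256).digest()[:KEY_LEN]


def enc(msk: bytes, indices: Set[int]) -> Tuple[Dict, bytes]:
    """Enc: return (Hdr, K), where Hdr wraps K once per index in the set."""
    session_key = secrets.token_bytes(KEY_LEN)
    nonce = secrets.token_bytes(KEY_LEN)
    wrapped = {i: _xor(session_key, _pad(extract(msk, i), nonce)) for i in indices}
    return {"nonce": nonce, "wrapped": wrapped}, session_key


def dec(hdr: Dict, i: int, sk_i: bytes) -> Optional[bytes]:
    """Dec: recover K if i is an authorised index, otherwise return None."""
    if i not in hdr["wrapped"]:
        return None
    return _xor(hdr["wrapped"][i], _pad(sk_i, hdr["nonce"]))
```

A user in the authorised set recovers the same key as the one output by `enc`, whereas a user outside the set finds no entry in the header.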

3.2 Description

Let $F, G$ be secure PRFs, both with parameters $s$ and $\ell$. Let $H$ be a cryptographically secure hash function. Finally, let BE be a secure broadcast encryption scheme. All details are given in Section 3.1. Let $\mathcal{S}$ (resp. $\mathcal{R}$) be the set of senders (resp. receivers) and let $I$ be a set of indices related to $\mathcal{S} \cup \mathcal{R}$.

We now give the details of each step of our intrusion detection system over encrypted traffic.

– Setup

• The Security Editor SE first executes $mk \leftarrow_{\$} \mathrm{BE.Setup}(1^\lambda)$. It then computes $sk_n \leftarrow \mathrm{BE.Extract}(mk, n)$ for each element of $\mathcal{S} \cup \mathcal{R}$ and sends the result (in a secure way) to the corresponding actor. After that, it computes $(\mathrm{Hdr}, s) \leftarrow \mathrm{BE.Enc}(mk, I)$. We assume that $s \in \{0,1\}^s$. Finally, SE defines the integer $C$ as the maximum number of occurrences of distinct tokens in the traffic.

During a particular session between a sender S (with index $n \in I$) and a receiver R (with index $n \in I$), a few more steps are executed by the actors.

• A key $K \leftarrow \{0,1\}^s$ for the PRF $G$ is generated and secretly shared by SP, S and R.

• A key $k$ for the TLS protocol is also generated and secretly shared by S and R.

At the end, we have

\[ \mathrm{param} = (C, \mathrm{Hdr}), \quad sk_{SE} = (mk), \quad sk_{SP} = (K), \quad sk_S = sk_R = (sk_n, K, k). \]

– RuleGen

• For each rule $r_i \in \mathcal{M}$, and for each $c_k \in [0, C-1]$, the Security Editor SE computes $B_{i,k} \leftarrow F(s, r_i \| c_k)$. Then SE sends the set $B = \{B_{i,k}\}_{i,k}$ to SP.
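The RuleGen step can be sketched as follows in Python. This is a minimal illustration, assuming HMAC-SHA-256 for $F$ and a 4-byte big-endian encoding of the counter appended to the rule; the encoding and the names `prf_f`, `encode` and `rule_gen` are our own assumptions for the example.

```python
import hashlib
import hmac


def prf_f(s: bytes, data: bytes) -> bytes:
    # F(s, .) instantiated as HMAC-SHA-256 (assumption for this sketch).
    return hmac.new(s, data, hashlib.sha256).digest()


def encode(item: bytes, counter: int) -> bytes:
    # Hypothetical encoding of item || counter: fixed 4-byte counter suffix.
    return item + counter.to_bytes(4, "big")


def rule_gen(s: bytes, rules: list, max_count: int) -> set:
    """Return B = { F(s, r_i || c_k) : r_i in rules, c_k in [0, C-1] }."""
    return {prf_f(s, encode(r, c)) for r in rules for c in range(max_count)}
```

With $|\mathcal{M}|$ rules and counter bound $C$, the set sent to SP contains $|\mathcal{M}| \cdot C$ blinded values, which is the cost of supporting repeated tokens.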


– Send

• On input Hdr and a packet payload $T$, S first computes $s \leftarrow \mathrm{BE.Dec}(\mathrm{Hdr}, sk_n)$.

• S then chooses $c \leftarrow_{\$} \mathbb{Z}_C$ and $salt \leftarrow_{\$} \{0,1\}^\ell$. Next, the packet payload is parsed into a set of unique tokens $\{t_j\}_j$. For each $t_j$, S computes

\[ p_j = F(s, t_j \| c) \tag{1} \]

\[ q_j = G(K, salt + j) \oplus p_j, \tag{2} \]

where each distinct token has a counter $c$, which is incremented by one modulo $C$ when the token repeats.

• Finally, S encrypts the whole packet payload with the TLS key $k$ and obtains $e$. It then sends $E = (\{q_j\}_j, e, c, salt)$ to SP.

– Detect

• After receiving the encrypted traffic $E = (\{q_j\}_j, e, c, salt)$ from S, SP computes $p_j = G(K, salt + j) \oplus q_j$ for each received $q_j$. If there is a match between one $p_j$ and one $B_{i,k} \in B$, then the encrypted token is marked as malicious and SP generates an alert. Otherwise, the token is marked as legitimate. If all tokens in the packet are legitimate, SP forwards to R the TLS-encrypted packet payload and the values $c$ and $salt$. SP additionally hashes the set $\{p_j\}_j$ to obtain the value $h$, and gives this auxiliary information aux to R.

– Receive

• On input $e$, $c$, $salt$ and its secret key $sk_R$, the receiver R decrypts the traffic $e$ using the TLS protocol and the key $k$. After that, it generates the $p_j$'s in the same way as S by using $sk_n$, $c$ and $salt$.

• R computes the hash value $h'$ of the obtained $p_j$'s for verification. If $h'$ matches the value $h$ received from SP, R accepts the traffic. Otherwise, S is considered malicious and R outputs $\bot$.
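The Send, Detect and Receive steps can be put together in a short Python sketch. It is an illustration under several assumptions of ours, not the evaluated C code: $F$ and $G$ are both HMAC-SHA-256, the counter is encoded on 4 bytes, `salt + j` is an integer addition modulo $2^{128}$, the set $\{p_j\}_j$ is hashed as an ordered concatenation, and the TLS layer is omitted (the receiver is simply handed the tokens it would obtain after decrypting $e$).

```python
import hashlib
import hmac
import secrets

SALT_BITS = 128  # hypothetical value for the parameter ell


def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def mask(key_k: bytes, salt: int, j: int) -> bytes:
    # G(K, salt + j), with the sum reduced modulo 2^SALT_BITS.
    return prf(key_k, ((salt + j) % (1 << SALT_BITS)).to_bytes(SALT_BITS // 8, "big"))


def blind_tokens(s: bytes, tokens: list, c: int, C: int) -> list:
    # p_j = F(s, t_j || counter); the counter of a token starts at c and is
    # incremented modulo C each time that token repeats.
    seen, out = {}, []
    for t in tokens:
        ctr = (c + seen.get(t, 0)) % C
        seen[t] = seen.get(t, 0) + 1
        out.append(prf(s, t + ctr.to_bytes(4, "big")))
    return out


def digest(ps: list) -> bytes:
    return hashlib.sha256(b"".join(ps)).digest()


def send(s: bytes, key_k: bytes, tokens: list, C: int):
    c = secrets.randbelow(C)
    salt = secrets.randbits(SALT_BITS)
    ps = blind_tokens(s, tokens, c, C)
    qs = [xor(mask(key_k, salt, j), p) for j, p in enumerate(ps)]
    return qs, c, salt  # the TLS ciphertext e is left out of this sketch


def detect(key_k: bytes, blinded_rules: set, qs: list, salt: int):
    ps = [xor(mask(key_k, salt, j), q) for j, q in enumerate(qs)]
    alert = any(p in blinded_rules for p in ps)
    return alert, digest(ps)


def receive(s: bytes, tokens: list, c: int, C: int, h: bytes) -> bool:
    # `tokens` stands for the payload recovered by TLS decryption of e.
    return digest(blind_tokens(s, tokens, c, C)) == h


# Example run with hypothetical keys and a single rule.
s, key_k, C = secrets.token_bytes(16), secrets.token_bytes(16), 4
rules = {prf(s, b"malware" + k.to_bytes(4, "big")) for k in range(C)}
qs, c, salt = send(s, key_k, [b"hello", b"world"], C)
alert, h = detect(key_k, rules, qs, salt)
accepted = receive(s, [b"hello", b"world"], c, C, h)
```

Note that SP never sees the tokens themselves: it only unmasks the values $p_j$ and tests them for membership in the set of blinded rules.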

3.3 Security Analysis

We show that the proposed scheme has the detection property as well as the traffic and rule indistinguishability properties.

Detection We first prove the detection property of the proposed scheme.

Theorem 1. Our scheme is detectable if $H$ is collision-resistant.

Proof. As explained in Section 2.3, we do not consider the case where S and R collude to break the detection property. The case where the sender is honest is obviously achieved, so we only consider here the case of a dishonest sender.

We then consider a successful adversary $\mathcal{A}$ against the detection property. According to the detection experiment $\mathsf{Exp}^{\mathrm{det}}_{\Delta,\mathcal{A}}$ in Figure 1, the adversary $\mathcal{A}$ sends $E = (\{q_j\}_j, e, c, salt)$ to SP such that:

1. $e$ is the ciphertext of a malicious traffic $T'$ under the TLS key $k$, which means that there exist $t'_{j_0} \in T'$ and $c_0$ such that $p'_{j_0} = F(s, t'_{j_0} \| c_0) \in B$;

2. the Receive procedure does not output $\bot$, which means that the honest receiver R can first retrieve the set $\{t'_j\}_j$ using $e$ and $k$, then compute, for all $j$, $p_j = F(s, t'_j \| c)$, and finally check that $H(p_j) = H(G(K, salt + j) \oplus q_j)$ in the hash verification (otherwise, the traffic is rejected, see Section 3.2); and

3. the detection procedure Detect, on input $q_j$, outputs 1 for all $j$, which means in particular that $G(K, salt + j) \oplus q_j \notin B$.

As $B$ and $s$ are honestly generated, the first and second points show that $p_{j_0} \in B$. This together with the third point implies that $p_{j_0} \neq G(K, salt + j_0) \oplus q_{j_0}$. It follows from the second point that $\mathcal{A}$ has obtained a collision of $H$, which happens with negligible probability provided that $H$ is collision-resistant. □

Traffic Indistinguishability We now prove that our scheme is traffic indistinguishable: it satisfies traffic indistinguishability both against a malicious service provider (sp-tr-ind) and against a malicious security editor (se-tr-ind). We thus prove the following two theorems. We assume in the sequel that the TLS protocol is secure and do not consider the value $e$ in our proofs. One can simply add the advantage of breaking TLS, which can be considered negligible.

Theorem 2. Our scheme is traffic-indistinguishable against a malicious service provider with

\[ \mathsf{Adv}^{\mathrm{sp-tr-ind}}_{\mathcal{A}}(\lambda) \le 2\left( \mathsf{Adv}^{\mathrm{be-ind}}_{\mathrm{BE},\mathcal{A}}(\lambda) + \mathsf{Adv}^{\mathrm{prf}}_{F,\mathcal{A}}(\lambda) \right). \]

Proof. Assume that the adversary $\mathcal{A}$ knows SP's secret key $sk_{SP} = K$ and has access to the Send and RuleGen oracles. The adversary $\mathcal{A}$ outputs two plain traffics $T_0$ and $T_1$. According to a bit $b$, an encrypted traffic is generated as $E_b = \mathrm{Send}(\mathrm{param}, T_b)$, where $T_b = \{t^{(b)}_j\}_j$.

Game 0. This is the original attack game, where the encrypted traffic $E_b$ is composed of (i) the set $\{q^{(b)}_j = G(K, salt + j) \oplus p^{(b)}_j\}_j$, where each $p^{(b)}_j = F(s, t^{(b)}_j \| c_j)$ and $s = \mathrm{BE.Dec}(\mathrm{Hdr}, sk_n)$, (ii) the used random counter $c_0$ and $salt$, and (iii) the TLS ciphertext $e$ (not considered). The adversary $\mathcal{A}$ eventually outputs a bit $b'$.

Since $\mathcal{A}$ knows $K$, it can obtain the $p^{(b)}_j$'s as $p^{(b)}_j = G(K, salt + j) \oplus q^{(b)}_j$.

Let $S_0$ be the event that $b = b'$. Then we have

\[ \mathsf{Adv}^{\mathrm{sp-tr-ind}}_{\mathcal{A}}(\lambda) = \left| 2 \Pr\left[ \mathsf{Exp}^{\mathrm{sp-tr-ind}}_{\mathcal{A}}(\lambda) = 1 \right] - 1 \right| = \left| 2 \Pr[S_0] - 1 \right|. \]

Game 1. We modify Game 0 by replacing the broadcast key $s$, output by $s = \mathrm{BE.Dec}(\mathrm{Hdr}, sk_n)$, by a random value in $\mathcal{K}$. Let $S_1$ be the event that $b = b'$ in Game 1. We can describe a distinguisher between Game 0 and Game 1, which exactly corresponds to the broadcast encryption indistinguishability experiment given in Figure 2 (as $\mathcal{A}$ has access neither to $mk$ nor to $s$). Then

\[ \left| \Pr[S_0] - \Pr[S_1] \right| = \mathsf{Adv}^{\mathrm{be-ind}}_{\mathrm{BE},\mathcal{A}}(\lambda). \tag{3} \]


Game 2. Recall that $\mathcal{A}$ is not allowed to query Send with any traffic which contains any $t^{(b)}_j$, or to query RuleGen with $t^{(b)}_j$. Further, the $p^{(b)}_j$'s are distinct due to the varying counter $c_j$ for identical $t^{(b)}_j$, as in Eq. (1). This enables us to transform Game 1 into Game 2, in which we replace the $p^{(b)}_j$'s with truly random values in $U(1^\lambda)$. Let $S_2$ be the event that $b = b'$ in Game 2. We can again describe a distinguisher between Game 1 and Game 2, which exactly corresponds to the pseudorandomness of a PRF, as described in Figure 2. Then

\[ \left| \Pr[S_1] - \Pr[S_2] \right| = \mathsf{Adv}^{\mathrm{prf}}_{F,\mathcal{A}}(\lambda). \tag{4} \]

Here, the counter $c$ does not change anything since, in the PRF security experiment, the input $M$ is known to the adversary, as are $c$ (and the plain traffic) in our game.

At the end of this game, the traffic is composed of (i) the set of random values $\{p^{(b)}_j\}_j$ and (ii) the used random counter $c_0$ and $salt$. That is, $\mathcal{A}$ can only obtain $b'$ by random guessing, so $\Pr[S_2] = 1/2$.

Using additionally the results given in (3) and (4) above, we finally have

\[ \mathsf{Adv}^{\mathrm{sp-tr-ind}}_{\mathcal{A}}(\lambda) \le 2\left( \mathsf{Adv}^{\mathrm{be-ind}}_{\mathrm{BE},\mathcal{A}}(\lambda) + \mathsf{Adv}^{\mathrm{prf}}_{F,\mathcal{A}}(\lambda) \right), \]

which concludes the proof. □
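For completeness, the last step of the proof of Theorem 2 can be spelled out as follows. The derivation only uses $\Pr[S_2] = 1/2$, the triangle inequality, and Eqs. (3) and (4).

```latex
\begin{align*}
\mathsf{Adv}^{\mathrm{sp-tr-ind}}_{\mathcal{A}}(\lambda)
  &= \left| 2\Pr[S_0] - 1 \right|
   = 2\left| \Pr[S_0] - \Pr[S_2] \right|
     && \text{since } \Pr[S_2] = 1/2 \\
  &\le 2\left( \left| \Pr[S_0] - \Pr[S_1] \right|
             + \left| \Pr[S_1] - \Pr[S_2] \right| \right)
     && \text{triangle inequality} \\
  &= 2\left( \mathsf{Adv}^{\mathrm{be-ind}}_{\mathrm{BE},\mathcal{A}}(\lambda)
           + \mathsf{Adv}^{\mathrm{prf}}_{F,\mathcal{A}}(\lambda) \right)
     && \text{by (3) and (4)}
\end{align*}
```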

Theorem 3. Our scheme is traffic-indistinguishable against any adversary without knowledge of $sk_{SP}$ with

\[ \mathsf{Adv}^{\mathrm{se-tr-ind}}_{\mathcal{A}}(\lambda) = 2\,\mathsf{Adv}^{\mathrm{prf}}_{G,\mathcal{A}}(\lambda). \]

Proof. We prove the result for adversaries who know SE's secret key $sk_{SE} = mk$ and have access to the Send and RuleGen oracles. Such an adversary $\mathcal{A}$ outputs two plain traffics $T_0$ and $T_1$. According to a bit $b$, an encrypted traffic is generated as $E_b = \mathrm{Send}(\mathrm{param}, T_b)$, where $T_b = \{t^{(b)}_j\}_j$.

Game 0. This is the original attack game, where the encrypted traffic $E_b$ is composed of (i) the set $\{q^{(b)}_j = G(K, salt + j) \oplus p^{(b)}_j\}_j$, where each $p^{(b)}_j = F(s, t^{(b)}_j \| c_j)$ and $s = \mathrm{BE.Dec}(\mathrm{Hdr}, sk_n)$, (ii) the used random counter $c_0$ and $salt$, and (iii) the TLS ciphertext $e$ (not considered). The adversary $\mathcal{A}$ eventually outputs a bit $b'$.

Since $\mathcal{A}$ knows $mk$, it knows the value $s$. Hence, $\mathcal{A}$ can compute $p^{(0)}_j$ and $p^{(1)}_j$.

Let $S_0$ be the event that $b = b'$. Then we have

\[ \mathsf{Adv}^{\mathrm{se-tr-ind}}_{\mathcal{A}}(\lambda) = \left| 2 \Pr\left[ \mathsf{Exp}^{\mathrm{se-tr-ind}}_{\mathcal{A}}(\lambda) = 1 \right] - 1 \right| = \left| 2 \Pr[S_0] - 1 \right|. \]

Game 1. We modify Game 0 by replacing $G(K, salt + j)$ by the output $f_\lambda(salt + j)$ for the given $salt$, where the function $f_\lambda$ is chosen uniformly at random in the set of all functions mapping $\ell$-bit strings to $n$-bit strings. Let $S_1$ be the event that $b = b'$ in Game 1. Since $K$ is unknown to $\mathcal{A}$ and $salt$ is randomly chosen for any new query, one can describe a distinguisher between Game 0 and Game 1, which exactly corresponds to the pseudorandomness of the PRF $G$, as described in Figure 2. Then

\[ \left| \Pr[S_0] - \Pr[S_1] \right| = \mathsf{Adv}^{\mathrm{prf}}_{G,\mathcal{A}}(\lambda). \]

At the end of this game, the traffic is (removing the TLS ciphertext $e$) composed of (i) the set $\{q^{(b)}_j = f_\lambda(salt + j) \oplus p^{(b)}_j\}_j$ given as above, and (ii) the used random counter $c_0$ and $salt$. That is, the traffic is encrypted with a one-time pad. Hence $\Pr[S_1] = 1/2$, and then

\[ \mathsf{Adv}^{\mathrm{se-tr-ind}}_{\mathcal{A}}(\lambda) = 2\,\mathsf{Adv}^{\mathrm{prf}}_{G,\mathcal{A}}(\lambda), \]

which concludes the proof. □

Rule Indistinguishability We now prove that our scheme is rule indistinguishable: it satisfies rule indistinguishability both in the basic setting (sp-rul-ind) and in the high min-entropy one (hme-rul-ind). We thus prove the following two theorems.

Theorem 4. Our scheme is rule-indistinguishable in the basic setting with

\[ \mathsf{Adv}^{\mathrm{sp-rul-ind}}_{\mathcal{A}}(\lambda) \le 2\left( \mathsf{Adv}^{\mathrm{be-ind}}_{\mathrm{BE},\mathcal{A}}(\lambda) + \mathsf{Adv}^{\mathrm{prf}}_{F,\mathcal{A}}(\lambda) \right). \]

Proof. Assume that the adversary $\mathcal{A}$ knows SP's secret key $sk_{SP} = K$ and has access to the RuleGen and Send oracles. It outputs two sets of rules $\mathcal{M}_0$ and $\mathcal{M}_1$. According to a bit $b$, a set $B_b$ of blinded rules is generated as $B_b = \mathrm{RuleGen}(\mathrm{param}, sk_{SE}, \mathcal{M}_b)$.

Game 0. This is the original attack game, where the blinded rules are given by the set $B_b = \{B^{(b)}_{i,k}\}_{i,k}$ with $B^{(b)}_{i,k} = F(s, r^{(b)}_i \| c_k)$, $r^{(b)}_i \in \mathcal{M}_b$ and $c_k \in [0, C-1]$.

The adversary $\mathcal{A}$ eventually outputs a bit $b'$. Let $S_0$ be the event that $b = b'$. Then we have

\[ \mathsf{Adv}^{\mathrm{sp-rul-ind}}_{\mathcal{A}}(\lambda) = \left| 2 \Pr\left[ \mathsf{Exp}^{\mathrm{sp-rul-ind}}_{\mathcal{A}}(\lambda) = 1 \right] - 1 \right| = \left| 2 \Pr[S_0] - 1 \right|. \]

As the adversary has chosen $\mathcal{M}_0$ and $\mathcal{M}_1$, and since it has access to the Send oracle on input a known payload $T = \{t_j\}_j$, it simply has to generate a payload that reveals some information about the rules actually related to $B_b$.

In fact, as for Theorem 2, we can first define Game 1, in which we replace the broadcast key $s$, output by $s = \mathrm{BE.Dec}(\mathrm{Hdr}, sk_n)$, by a random value in $\mathcal{K}$.

Notice that $\mathcal{A}$ does not query the RuleGen or Send oracle with the $r^{(b)}_i$'s. Moreover, the $B^{(b)}_{i,k}$'s are distinct due to the varying counter $c_k$. Then, we can transform Game 1 into Game 2, in which we replace the $B^{(b)}_{i,k}$'s with truly random values in $U(1^\lambda)$. As the blinded rules are no longer related to the input rules in $\mathcal{M}_b$, it is clear that $\Pr[S_2] = 1/2$.

We then have

\[ \mathsf{Adv}^{\mathrm{sp-rul-ind}}_{\mathcal{A}}(\lambda) \le 2\left( \mathsf{Adv}^{\mathrm{be-ind}}_{\mathrm{BE},\mathcal{A}}(\lambda) + \mathsf{Adv}^{\mathrm{prf}}_{F,\mathcal{A}}(\lambda) \right), \]

which concludes the proof. □

Theorem 5. In the random oracle model, our scheme is rule-indistinguishable in the high min-entropy setting with

\[ \mathsf{Adv}^{\mathrm{hme-rul-ind}}_{\mathcal{A}}(\lambda) \le \mathsf{Adv}^{\mathrm{ow-prf}}_{\mathrm{SHA\text{-}256},\mathcal{A}}(\lambda) + 2^{-\mu(\lambda)+1}. \]

Proof. Assume that the adversary $\mathcal{A}$ knows the sender's and receiver's secret keys $sk_S = sk_R = (sk_n, K, k)$. Hence, $\mathcal{A}$ knows the key $s$ to the PRF $F$. It outputs two sets of rules $\mathcal{M}_0$ and $\mathcal{M}_1$. According to a bit $b$, a set $B_b$ of blinded rules is generated as $B_b = \mathrm{RuleGen}(\mathrm{param}, sk_{SE}, \mathcal{M}_b)$.

Game 0. This is the original attack game, where the blinded rules are given by the set $B_b = \{B^{(b)}_{i,k}\}_{i,k}$ with $B^{(b)}_{i,k} = F(s, r^{(b)}_i \| c_k)$, $r^{(b)}_i \in \mathcal{M}_b$ and $c_k \in [0, C-1]$.

The adversary $\mathcal{A}$ eventually outputs a bit $b'$. Let $S_0$ be the event that $b = b'$. Then we have

\[ \mathsf{Adv}^{\mathrm{hme-rul-ind}}_{\mathcal{A}}(\lambda) = \left| 2 \Pr\left[ \mathsf{Exp}^{\mathrm{hme-rul-ind}}_{\mathcal{A}}(\lambda) = 1 \right] - 1 \right| = \left| 2 \Pr[S_0] - 1 \right|. \]

In this case, the adversary $\mathcal{A} = (\mathcal{A}_f, \mathcal{A}_g)$ can create any traffic of its choice, since it has access to the sender's key. But, as we are in the high min-entropy setting, $\mathcal{A}_f$ and $\mathcal{A}_g$ cannot communicate with each other, and $\mathcal{A}_g$ has only a negligible chance of obtaining an element of $\mathcal{M}_b$ "by chance".

Game 1. We modify Game 0 into Game 1 by adding an abort when the adversary makes use of a rule included in $\mathcal{M}_b$. Let $S_1$ be the event that $b = b'$ in Game 1. The difference between $S_1$ and $S_0$ is bounded by the high min-entropy (see Section 2.3), and thus we have

\[ \left| \Pr[S_0] - \Pr[S_1] \right| \le 2^{-\mu(\lambda)}, \]

where $\mu(\lambda) \in \omega(\log \lambda)$ according to the high min-entropy property of the rule set.

After Game 1, considering keyed SHA-256 for the PRF $F$, the aim of the adversary $\mathcal{A}$ is to distinguish, among the unknown sets $\mathcal{M}_0$ and $\mathcal{M}_1$, which of the two has been used to compute $B_b = \{\mathrm{SHA\text{-}256}(s, r^{(b)}_i \| c_k)\}_{i,k}$. By the high min-entropy property, $\mathcal{A}$ has no way to find one of the $r^{(b)}_i$ by chance. Using the technique introduced by Bellare and Rogaway [5], we can prove by contradiction, in the random oracle model, that we can use such a distinguisher $\mathcal{A}$ to break the one-wayness of keyed SHA-256 (see Section 3.1). For this purpose, we construct a machine that is given a key $K$ and a value $R = \mathrm{SHA\text{-}256}(K, M)$ for an unknown input $M$. We set $s$ as the key $K$, and embed the challenge $R$ in the set $B_b$ that is sent back to $\mathcal{A}$. Our machine then watches for the random oracle queries that $\mathcal{A}$ makes related to SHA-256. If there is one such query $(K, M)$ for which $R = \mathrm{SHA\text{-}256}(K, M)$, then the machine outputs $M$. As in [5], we argue that $\mathcal{A}$ has no advantage in distinguishing $\mathcal{M}_0$ and $\mathcal{M}_1$ in the case that $\mathcal{A}$ does not make such a query. So our machine wins with non-negligible probability, which permits us to conclude that

\[ \left| 2 \Pr[S_1] - 1 \right| = \mathsf{Adv}^{\mathrm{Game1}}_{\mathcal{A}}(\lambda) = \mathsf{Adv}^{\mathrm{ow-prf}}_{\mathrm{SHA\text{-}256},\mathcal{A}}(\lambda), \]

which concludes our proof. □
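The way the two game hops combine into the bound of Theorem 5 can be written out explicitly. The derivation below is a sketch that uses only the bound on $|\Pr[S_0] - \Pr[S_1]|$ from Game 1 and the final identity of the proof.

```latex
\begin{align*}
\mathsf{Adv}^{\mathrm{hme-rul-ind}}_{\mathcal{A}}(\lambda)
  &= \left| 2\Pr[S_0] - 1 \right| \\
  &\le \left| 2\Pr[S_1] - 1 \right|
     + 2\left| \Pr[S_0] - \Pr[S_1] \right|
     && \text{triangle inequality} \\
  &\le \mathsf{Adv}^{\mathrm{ow-prf}}_{\mathrm{SHA\text{-}256},\mathcal{A}}(\lambda)
     + 2 \cdot 2^{-\mu(\lambda)} \\
  &= \mathsf{Adv}^{\mathrm{ow-prf}}_{\mathrm{SHA\text{-}256},\mathcal{A}}(\lambda)
     + 2^{-\mu(\lambda)+1}
\end{align*}
```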

4 Implementation and Validation

In this section, we focus on the implementation and validation of our approach. We evaluate the performance of our protocol and compare the obtained results with classical HTTPS and the two main existing solutions: BlindBox and BlindIDS.

4.1 Implementation Details

Implementation environment. We implemented the protocol given in the previous section in C, on a 64-bit Linux OS running on an Intel(R) Xeon(R) E5-2637 processor with 4 cores at 3.70GHz.

Cryptographic choices. To implement our protocol, we have chosen the keyed hash function HMAC-SHA-256 as the pseudorandom function and SHA-256 as the hash function. This gives an output size of 256 bits for both $F$ and $G$ (which corresponds to the size of the $p_j$'s and the $q_j$'s). The size of the keys $s$ and $K$ is set to 128 bits, as prescribed by most government agencies. For all these functions, we have used the LibTomCrypt library.

Regarding the broadcast encryption, we have chosen to use the basic LSD scheme [16], which is an improvement of the SD scheme [18] in terms of key manipulation. We have considered a group of $2^{32}$ users (about 4.3 billion, which seems to be enough in practice). Following [16], the decryption procedure requires 31 executions of a pseudorandom function and each user has to store about 180 keys (of 128 bits each).
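The orders of magnitude quoted for the broadcast encryption parameters can be checked with a few lines of Python. The only inputs taken from the text are the group size, the figure of about 180 keys and the 128-bit key length. The growth rate $(\log_2 n)^{3/2}$ for the per-user key storage is an assumption made here for the sake of the check, not a statement of this paper.

```python
import math

users = 2 ** 32                       # group size considered in the text
log_n = int(math.log2(users))         # 32

# Assumption: per-user storage grows roughly as (log2 n)^(3/2).
keys_estimate = log_n ** 1.5

# Storage for the "about 180 keys" of 128 bits quoted in the text.
storage_bytes = 180 * 128 // 8
```

Under that assumption the estimate is close to the quoted figure of about 180 keys, and the corresponding storage stays below 3 KB per user.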

In our benchmarks, we only consider the decryption phase. Indeed, key generation, extraction and encryption are done by the Security Editor during an off-line phase, and are thus of less importance when comparing our solution with related work. Assuming that the group of users may have evolved since the last connection of a user, we moreover take into account the time needed to execute this decryption procedure at each Send procedure.

4.2 Functional Tests

We consider the same framework as BlindBox and BlindIDS. We thus refer to the same public datasets for detection functionalities related to malware [1, 23], parental control [3] and general rules [2]. For all these datasets, our solution can be applied using comparison and perfect matches with either the URL header field [1, 3] or hexadecimal strings or text keywords [2, 23].

In the first case, we support most of the proposed entries³. In the second case, as we cannot manage regular expressions, we can only manage 3/4 of the proposed entries (see [7, 20] for details).

Regarding the ability of our solution to detect attacks, as the structure of our solution is exactly the same as in the BlindBox one [7], and as we use the same tokenization strategy applied to the same dataset, we achieve the same accuracy as BlindBox and BlindIDS (again, see [7, 20] for details). More precisely, there are two tokenization algorithms. First, window-based tokenization produces fixed-length tokens: for every offset in the traffic, the sender creates a token of a fixed length. Second, delimiter-based tokenization gives variable-length tokens: each token starts and ends before or after a specific delimiter such as a punctuation mark, a space, or a special symbol.
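The two tokenization strategies can be illustrated with the following Python sketch. It is a simplified illustration, not the tokenizer used in the benchmarks: the window width of 8 bytes is a hypothetical default, and the delimiter-based variant simply splits on every non-alphanumeric byte and discards the delimiters.

```python
import re


def window_tokens(data: bytes, width: int = 8) -> list:
    """Window-based: one token of `width` bytes for every offset."""
    return [data[i:i + width] for i in range(len(data) - width + 1)]


def delimiter_tokens(data: bytes) -> list:
    """Delimiter-based: variable-length tokens between delimiters."""
    return [t for t in re.split(rb"[^A-Za-z0-9]+", data) if t]


payload = b"GET /a.php?id=1"
fixed = window_tokens(payload)        # 8 overlapping 8-byte tokens
variable = delimiter_tokens(payload)  # [b"GET", b"a", b"php", b"id", b"1"]
```

The window-based variant produces roughly one token per byte of payload, whereas the delimiter-based one produces far fewer tokens at the price of missing patterns that do not start at a delimiter.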

Using this framework, we compare our solution with standard HTTPS and with the two main existing solutions, namely BlindBox [20] and BlindIDS [7]. This comparison is based on the setup and encryption time on the sender/receiver side, the key size for the sender/receiver, the detection time on the Service Provider side and the RAM usage of the Service Provider. We provide several results depending on the size of the considered traffic and the number of rules that have been edited by the Security Editor.

4.3 Performance Comparison

We can now evaluate the performance of our solution. The results are given in Table 1, together with a comparison with related work. In Table 1, we use the figures from [20] for HTTPS and BlindBox and the figures from [7] for BlindIDS. We do not give figures for the detection overhead of HTTPS (n.a. in Table 1) since standard HTTPS cannot perform intrusion detection over encrypted traffic.

These figures show that our solution performs very well and is better than related work in all aspects.

Connection setup. As for BlindIDS, our solution does not impact the setup time of a connection, whereas that of BlindBox depends on the number of rules to be tested, since the garbled circuits must be generated.

Data encryption. On the sender/receiver side, our approach reduces the time to encrypt the traffic by 3 orders of magnitude compared to the BlindIDS solution. This is due to the fact that we only use symmetric cryptographic techniques. In comparison with BlindBox, one can see that the bigger the traffic size, the better our solution performs. Even if the broadcast decryption phase is expensive, it is done only once, whatever the size of the traffic. Finally, we obviously cannot match HTTPS, but we demonstrate here that we come close to the same order of magnitude.

3 Contrary to [20] and [7], we consider that 100% is not really possible since some rules contain URL blacklists that can be easily obtained. For example, if the adversary is both the sender and the receiver, this can be used to break the rule indistinguishability property, and such rules obviously do not fall into the “high min-entropy” setting.

Table 1. Performance and comparison of our solution with standard HTTPS, the BlindBox and the BlindIDS solutions

Description                          HTTPS    BlindBox   BlindIDS   Our solution

Connection time (Sender/Receiver)
  Setup (1 keyword)                  73ms     588ms      73ms       73ms
  Setup (3K Rules)                   73ms     97s        73ms       73ms
  Encrypt (128 bits)                 13ns     69ns       729µs      240ns
  Encrypt (1500 bytes)               3µs      90µs       27ms       6.5µs

Detection time (Service Provider)
  1 Rule, 1 Token                    n.a.     20ns       691µs      10ns
  1 Rule, 1 Packet                   n.a.     5µs        41.3ms     980ns
  3K Rules, 1 Token                  n.a.     137ns      700ms      16ns
  3K Rules, 1 Packet                 n.a.     33µs       74s        1.5µs

RAM usage (Service Provider)
  1 Rule, 1 Connection               n.a.     1.75MB     0.2KB      0.1KB
  3K Rules, 1 Connection             n.a.     5.12GB     0.58MB     0.3MB
  1 Rule, 100 Connections            n.a.     175MB      0.2KB      0.1KB
  3K Rules, 100 Connections          n.a.     512GB      0.58MB     0.3MB

As shown in [7], it takes about 97s to encrypt a Twitter page of 284KB using BlindBox, while BlindIDS can do so in about 5s. A CNN webpage (131KB) takes 2.3s using BlindIDS and again 97s using BlindBox. A Facebook page (74KB) is loaded in about 1s using BlindIDS and 97s using BlindBox. Using our solution, the resulting time for all these websites is less than 100ms, which is very close to the result of the current standard HTTPS protocol.

Detection. We also evaluate and compare the overhead for the Service Provider during detection. We measure the memory space and the time required to perform detection, according to the number of detection rules (from 1 rule to 3K rules) and the size of the network traffic (from 1 token of 128 bits to 1 packet of 1500 bytes). Again, our performance regarding the time needed to test all the rules is quite similar to that of BlindBox. Even if our figures seem slightly better, neither implementation has been optimized, and as the BlindBox one dates from 2015, several optimizations are certainly available today. However, compared to BlindIDS, we are far more efficient, by up to 7 orders of magnitude for 3K rules and a packet of 1500 bytes. Again, symmetric cryptography is definitely better than pairing-based asymmetric cryptography in terms of performance.

On memory usage, we are close to the BlindIDS solution, as we obtain similar results. Compared to BlindBox, we drastically decrease the memory space needed since we do not make use of garbled circuits.
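The orders of magnitude quoted in this comparison can be recomputed from the figures of Table 1. The short Python sketch below does so for the 1500-byte encryption time and the detection time with 3K rules on one packet; the dictionary names and the helper `orders_of_magnitude` are hypothetical and only serve this check.

```python
import math

# Timings copied from Table 1, converted to seconds.
encrypt_1500_bytes = {"BlindIDS": 27e-3, "ours": 6.5e-6}
detect_3k_rules_packet = {"BlindBox": 33e-6, "BlindIDS": 74.0, "ours": 1.5e-6}


def orders_of_magnitude(slow: float, fast: float) -> int:
    """Number of whole powers of ten separating two timings."""
    return math.floor(math.log10(slow / fast))


enc_gain = orders_of_magnitude(encrypt_1500_bytes["BlindIDS"],
                               encrypt_1500_bytes["ours"])
det_gain = orders_of_magnitude(detect_3k_rules_packet["BlindIDS"],
                               detect_3k_rules_packet["ours"])
box_ratio = detect_3k_rules_packet["BlindBox"] / detect_3k_rules_packet["ours"]
```

With these inputs, `enc_gain` is 3 and `det_gain` is 7, in line with the text, and the detection time ratio with respect to BlindBox for the same workload is about 22.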

Real-life deployment. In practice, there are two ways to deploy this kind of solution. Either a single entity manages both the client and the server (in the case of an Intranet, for example), and this entity can then oblige both sides to implement the traffic encryption system. Or the whole solution (and in particular the encryption and decryption algorithms) should be standardized so as to be integrated natively into browsers. Then, a server can e.g. refuse any traffic not implementing this traffic encryption solution, and the client can also refuse to connect to any server not accepting those algorithms.

5 Related Work

This section reviews some related work on intrusion detection over encrypted traffic. We focus on the works most relevant to ours: the multi-party computation based BlindBox, the public-key searchable encryption based BlindIDS, pattern matching on encrypted streams, and finally searchable symmetric encryption schemes. In order to compare with the two closest works, BlindBox and BlindIDS, we also give at the end of this section a comparison in terms of security and functionality.

BlindBox. The BlindBox paper [20] proposes three distinct detection protocols supporting DPI over encrypted traffic. They all support equality tests between the encrypted traffic and the rules defined by the Security Editor. In the third solution, the Service Provider can also retrieve the decryption key embedded into the trapdoor used for the equality test, permitting a full decryption of the traffic, and thus the possibility for SP to operate a full IDS (but with no more confidentiality).

Those solutions are all based on garbled circuits and oblivious transfers. The idea is to execute a garbled circuit evaluation for each TLS connection, and for each detection rule to be tested. The secret key used to encrypt the traffic is secretly embedded into the garbled circuit by the sender, and the SP deterministically encrypts each pattern to be tested using that key (but without knowing it, using oblivious transfer techniques). Then, the garbled circuit is executed in a 2-party protocol, and finally outputs the decision on the safeness of the traffic.

As shown in the previous section and in [7], this process has to be done at each TLS connection, and thus drastically increases the setup time (97 seconds according to [20]). Moreover, the memory space needed is proportional to the number of (i) unique receivers to be protected, (ii) unique TLS connections, and (iii) unique detection rules. Another drawback of BlindBox is that the Service Provider is required to have direct plain access to the SE rules⁴, which is definitely not market-compliant, since the SE will be very reluctant to share its detection rules with SPs.

Thus, even if the BlindBox authors show that such garbled circuits and oblivious transfer techniques can be implemented very efficiently, this is definitely not efficient enough for a practical privacy-friendly IDS over encrypted traffic. BlindBox is not scalable, and has serious limitations with respect to the market ecosystem for network security solutions.

4 The obfuscated rule encryption technique used in BlindBox only protects the set of rules w.r.t. users, not w.r.t. the Service Provider.

The Embark system [17], defined by its authors as an extension of BlindBox, does not address these limitations. It only provides a solution to securely outsource network middleboxes to the cloud.

BlindIDS. BlindIDS [7] takes a different approach. It proposes to encrypt the patterns only once for all the TLS connections, using public key cryptography, and more specifically decryptable searchable encryption [15] (DSE). The idea is for the sender to encrypt the traffic using the DSE, so that it can be decrypted by the receiver. In parallel, the Security Editor provides the Service Provider with one trapdoor for each pattern to be tested on the encrypted traffic, using the testing procedure of the DSE scheme.

BlindIDS thus improves on the BlindBox scheme in two respects. First, the connection setup time is constant since all trapdoors are computed only once for all TLS traffic. Second, the memory space required to perform DPI only depends on the number of detection rules, and no longer on the number of receivers or the number of concurrent TLS connections. Another positive consequence is that the Service Provider no longer knows the detection patterns it is searching for in the encrypted traffic, due to the properties of the DSE. However, the use of public-key cryptography, and especially pairings, comes with an increased decryption overhead on the receiver side. BlindIDS is therefore still not efficient enough for real-world use.

Pattern matching on encrypted streams. Recently, Desmoulins et al. [13] have proposed a pattern matching system over encrypted streams. They introduce a new kind of searchable encryption that manages so-called “shiftable trapdoors”. While BlindBox and BlindIDS only allow the detection of patterns that perfectly match one substring, this solution can detect a pattern even if it straddles two substrings. It can therefore manage many more rules than related work, but the detection procedure is about 10 times less efficient than that of BlindIDS.

Searchable symmetric encryption. Searchable symmetric encryption (SSE) enables a user to encrypt data in such a way that it can later generate search tokens to send as queries to the storage server [9]. An immediate application of SSE is the design of searchable cryptographic cloud storage systems. Considering the efficiency of the underlying symmetric-key primitives, SSE seems a promising alternative to the public-key searchable encryption schemes used in BlindIDS [7] and in [13]. However, directly employing existing SSE protocols [8, 9] cannot meet all of our design requirements. To be specific, SP needs to match the encrypted traffic from S against the encrypted rules from SE. To this end, S and SE should respectively encrypt data and generate search trapdoors for given keywords. In either single-user or multi-user SSE schemes, e.g., SSE-1 and MSSE in [9], the two actors S and SE share exactly the same secret keys for data encryption and trapdoor generation. Hence the SE can trivially break traffic indistinguishability. Therefore, existing SSE schemes cannot be directly used in our case.

5.1 Functionality and Security Comparison

Regarding functionalities and security requirements, there are some differencesbetween BlindBox, BlindIDS and our solution that we now detail.

– Privacy-friendly means that no access is possible to the plaintext related toencrypted traffic: this property is similarly verified by the three solutions,except with the third protocol proposed in BlindBox, for which the serviceprovider is allowed to decrypt the whole traffic when it detects suspicioustokens when executing the two first protocols.

– Security-aware means that the solution supports DPI over encrypted traffic.This property is satisfied by the three solutions, in exactly the same way forboth BlindIDS and our solution. For BlindBox, the authors claim that theirsolution provides a full IDS functionality, but (i) this is at the cost of breakingthe privacy-friendly property, as explained above and (ii) the claimed “full”characteristic is a little bit exaggerated as a regular expression is in BlindBoxevaluated only on suspicious traffic, that has not passed the first test. Wemay imagine a more astute attack that permits some malicious traffic topass the first round while it would not have passed an evaluation over aregular expression. Finally, as stated in [20], depending on the tokenizationtechnique, the three solutions may fail to detect an attack that a standardimplementation would have detected, especially in the delimiter-based case.In fact, most rules occur on the boundary of non-alphanumeric charactersand thus does not transmit all possible tokens.

– Market-compliant means that each party (security editor and service provider)preserves its own know-how without revealing it to the other parties. Thisproperty is not verified in BlindBox as the Security Editor should send tothe Service Provider the whole set of rules/patterns to perform detection. Incontrast, both BlindIDS and our solution succeed in verifying this property.Our solution naturally necessitates the Service Provider to manage a secretkey, while this is not the case for BlindIDS, but this can be added quiteeasily by managing an additional encryption layer with a Service Providerkey.

– Security level: all three solutions reach the same level of security. The only exception is that BlindBox does not achieve the rule indistinguishability against Service Provider property, as explained above.
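The tokenization caveat in the security-aware item above can be made concrete with a small sketch. It is a simplified illustration and not the tokenizer of BlindBox, BlindIDS or our solution: the 8-byte window size, the sample payload and the keyword `malwarez` are all assumptions chosen for the example. It compares a sliding-window tokenizer, which emits one token per byte offset, with a delimiter-based one, which only emits windows starting right after a non-alphanumeric character.

```python
import re

WINDOW = 8  # assumed fixed token length in bytes, for illustration only


def window_tokens(payload, size=WINDOW):
    """Sliding-window tokenization: one token per byte offset."""
    return {payload[i:i + size] for i in range(len(payload) - size + 1)}


def delimiter_tokens(payload, size=WINDOW):
    """Delimiter-based tokenization (simplified): only windows that start
    at offset 0 or right after a non-alphanumeric character."""
    starts = {0} | {m.end() for m in re.finditer(r"[^A-Za-z0-9]", payload)}
    return {payload[s:s + size] for s in starts if s + size <= len(payload)}


# Hypothetical payload in which the keyword does not start on a delimiter.
payload = "GET /index.php?q=xxmalwarezz HTTP/1.1"
keyword = "malwarez"  # hypothetical 8-byte rule keyword

found_by_window = keyword in window_tokens(payload)
found_by_delimiter = keyword in delimiter_tokens(payload)
print(found_by_window, found_by_delimiter)
```

In this example the sliding-window tokenizer produces the keyword as a token while the delimiter-based one does not, because the keyword is preceded by alphanumeric characters. When the keyword starts right after a delimiter, as in `q=malwarez&x=1`, both tokenizers produce it.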

6 Conclusion

We have provided a new approach to intrusion detection over encrypted traffic. While our general framework is close to BlindIDS, the fact that we make use of symmetric cryptography makes things far more efficient, and comparable to the state-of-the-art BlindBox in terms of encryption and detection time. It is today possible to obtain the best of all existing solutions in one system: efficient setup, low memory consumption, confidentiality of rules against service providers, and efficiency of the whole protocol. The current drawback of our solution is that we need to manage the counter c < C, which requires the Security Editor to provide C “trapdoors” for each rule. Finding a way to avoid this trick is an interesting direction for future work.
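To see why a counter bound C translates into C trapdoors per rule, consider the following minimal sketch. It is not the construction of this paper: it assumes, purely for illustration, that the i-th occurrence of a token is encrypted deterministically under a PRF keyed with a single shared key and fed the token together with its occurrence counter. HMAC-SHA256 stands in for the PRF, and all function names, the key and the sample tokens are hypothetical. Real key management between the parties is deliberately ignored.

```python
import hashlib
import hmac

C = 4  # assumed bound on the per-token occurrence counter (c < C)


def prf(key, token, counter):
    """Stand-in PRF: HMAC-SHA256 over the token and a 4-byte counter."""
    msg = token + counter.to_bytes(4, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()


def trapdoors_for_rule(key, rule, bound=C):
    """One trapdoor per admissible counter value, i.e. `bound` per rule."""
    return [prf(key, rule, c) for c in range(bound)]


def encrypt_tokens(key, tokens):
    """Sender side: the i-th occurrence of a token uses counter value i."""
    seen = {}
    out = []
    for t in tokens:
        c = seen.get(t, 0)
        out.append(prf(key, t, c))
        seen[t] = c + 1
    return out


def detect(encrypted, trapdoors):
    """Detector side: report positions whose ciphertext equals a trapdoor."""
    table = set(trapdoors)
    return [i for i, e in enumerate(encrypted) if e in table]


key = bytes(32)  # placeholder key, illustration only
rule = b"malwarez"  # hypothetical rule keyword
traffic = [b"GET /ind", rule, b"HTTP/1.1", rule]

encrypted = encrypt_tokens(key, traffic)
hits = detect(encrypted, trapdoors_for_rule(key, rule))
hits_small = detect(encrypted, trapdoors_for_rule(key, rule, bound=1))
print(hits, hits_small)
```

With C = 4 trapdoors both occurrences of the keyword are reported, whereas with a single trapdoor only the first one is. Under these assumptions, any occurrence whose counter reaches the bound goes undetected, which is why the number of trapdoors per rule grows with C.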

References

1. Malware domain list. https://www.malwaredomainlist.com/mdl.php, 2016.

2. Snort. https://www.snort.org/downloads/, 2016.

3. URL blacklist. http://www.urlblacklist.com/?sec=home, 2016.

4. Mihir Bellare, Marc Fischlin, Adam O’Neill and Thomas Ristenpart, “Deterministic Encryption: Definitional Equivalences and Constructions without Random Oracles”, CRYPTO 2008, pages 360–378, 2008.

5. Mihir Bellare and Phillip Rogaway, “Random Oracles are Practical: A Paradigm for Designing Efficient Protocols”, ACM Conference on Computer and Communications Security 1993, pages 62–73, 1993.

6. Dan Boneh, Craig Gentry and Brent Waters, “Collusion Resistant Broadcast Encryption with Short Ciphertexts and Private Keys”, CRYPTO 2005, LNCS 3621, pages 258–275, Springer, 2005.

7. Sébastien Canard, Aïda Diop, Nizar Kheir, Marie Paindavoine and Mohamed Sabt, “BlindIDS: Market-Compliant and Privacy-Friendly Intrusion Detection System over Encrypted Traffic”, AsiaCCS 2017, pages 561–574, 2017.

8. D. Cash, J. Jaeger, S. Jarecki, C. Jutla, H. Krawczyk, M.-C. Rosu and M. Steiner, “Dynamic searchable encryption in very large databases: Data structures and implementation”, in Proc. of NDSS, 2014.

9. Reza Curtmola, Juan A. Garay, Seny Kamara and Rafail Ostrovsky, “Searchable symmetric encryption: improved definitions and efficient constructions”, ACM CCS 2006, pages 79–88, 2006.

10. Dell Security, Annual Threat Report, https://www.bitpipe.com/detail/RES/1431453319_167.html, 2015.

11. Dell Security, Annual Threat Report, http://www.netthreat.co.uk/assets/assets/dell-security-annual-threat-report-2016-white-paper-197571.pdf, 2016.

12. Dell Security, Annual Threat Report, https://www.dell.com/learn/us/en/vn/press-releases/2017-04-20-dell-end-user-security-survey-highlights-unsafe-data-security-practices-in-the-workplace, 2018.

13. Nicolas Desmoulins, Pierre-Alain Fouque, Cristina Onete and Olivier Sanders, “Pattern Matching on Encrypted Streams”, ASIACRYPT (1) 2018, pages 121–148, 2018.

14. Amos Fiat and Moni Naor, “Broadcast Encryption”, CRYPTO ’93, pages 480–491, 1993.

15. Thomas Fuhr and Pascal Paillier, “Decryptable Searchable Encryption”, Provable Security 2007, pages 228–236, 2007.

16. Dani Halevy and Adi Shamir, “The LSD Broadcast Encryption Scheme”, CRYPTO 2002, pages 47–60, 2002.

17. Chang Lan, Justine Sherry, Raluca Ada Popa, Sylvia Ratnasamy and Zhi Liu, “Embark: Securely Outsourcing Middleboxes to the Cloud”, NSDI 2016, pages 255–273, 2016.

18. D. Naor, M. Naor and J. Lotspiech, “Revocation and Tracing Schemes for Stateless Receivers”, Electronic Colloquium on Computational Complexity (ECCC), number 043, 2002.

19. NIST Special Publication (SP) 800-107 Rev. 1, Recommendation for Applications Using Approved Hash Algorithms, 2012.

20. Justine Sherry, Chang Lan, Raluca Ada Popa and Sylvia Ratnasamy, “BlindBox: Deep Packet Inspection over Encrypted Traffic”, ACM SIGCOMM 2015, pages 213–226.

21. SonicWall Cyber threat report, https://www.sonicwall.com/fr-fr/news/sonicwall-cyber-threat-report-2018/, 2018.

22. Xingliang Yuan, Xinyu Wang, Jianxiong Lin and Cong Wang, “Privacy-preserving deep packet inspection in outsourced middleboxes”, IEEE INFOCOM 2016, pages 1–9, 2016.

23. Yara rules repository. https://github.com/Yara-Rules/rules, 2016.


Curriculum Vitae

Chaoyun Li was born in 1990 in Hubei, China. He received the B.Sc. and M.Sc. degrees in mathematics from Hubei University, Wuhan, China, in 2012 and 2015, respectively. In September 2015, he started working towards the PhD degree in the research group COSIC (COmputer Security and Industrial Cryptography) at the Department of Electrical Engineering (ESAT), KU Leuven. His research interests include the design and analysis of symmetric ciphers, applied cryptography and coding theory.


List of Publications

International Conferences

1. Chaoyun Li and Bart Preneel: Improved Interpolation Attacks on Cryptographic Primitives of Low Algebraic Degree. Selected Areas in Cryptography 2019: 171-193, 2019

2. Danping Shi, Siwei Sun, Yu Sasaki, Chaoyun Li and Lei Hu: Correlation of Quadratic Boolean Functions: Cryptanalysis of All Versions of Full MORUS. Advances in Cryptology – CRYPTO 2019 (II): 180-209, 2019

3. Qingju Wang, Yonglin Hao, Yosuke Todo, Chaoyun Li, Takanori Isobe and Willi Meier: Improved Division Property Based Cube Attacks Exploiting Algebraic Properties of Superpoly. Advances in Cryptology – CRYPTO 2018 (I): 275-305, 2018

International Journal

1. Shun Li, Siwei Sun, Danping Shi, Chaoyun Li and Lei Hu: Lightweight Iterative MDS Matrices: How Small Can We Go? To appear in IACR Transactions on Symmetric Cryptology 2019(4), 2019

2. Yonglin Hao, Takanori Isobe, Lin Jiao, Chaoyun Li, Willi Meier, Yosuke Todo and Qingju Wang: Improved Division Property Based Cube Attacks Exploiting Algebraic Properties of Superpoly. IEEE Transactions on Computers 68(10): 1470-1486, 2019

3. Shun Li, Siwei Sun, Chaoyun Li, Zihao Wei and Lei Hu: Constructing Low-latency Involutory MDS Matrices with Lightweight Circuits. IACR Transactions on Symmetric Cryptology 2019(1): 84-117, 2019


4. Zibi Xiao, Xiangyong Zeng, Chaoyun Li and Yupeng Jiang: Binary sequences with period N and nonlinear complexity N − 2. Cryptography and Communications 11(4): 735-757, 2019

5. Wei Li, Linfeng Liao, Dawu Gu, Chaoyun Li, Chenyu Ge, Zheng Guo, Ya Liu and Zhiqiang Liu: Ciphertext-only Fault Analysis on the LED Lightweight Cryptosystem in the Internet of Things. IEEE Transactions on Dependable and Secure Computing 16(3): 454-461, 2019

6. Lisha Li, Chaoyun Li, Chunlei Li and Xiangyong Zeng: New Classes of Complete Permutation Polynomials. Finite Fields and Their Applications 55: 177-201, 2019

7. Wei Li, Vincent Rijmen, Zhi Tao, Qingju Wang, Hua Chen, Yunwen Liu, Chaoyun Li and Ya Liu: Impossible Meet-in-the-middle Fault Analysis on the LED Lightweight Cipher in VANETs. SCIENCE CHINA Information Sciences, 61(3): 032110:1-032110:13, 2018

8. Lisha Li, Shi Wang, Chaoyun Li and Xiangyong Zeng: Permutation Polynomials $(x^{p^m} - x + \delta)^{s_1} + (x^{p^m} - x + \delta)^{s_2} + x$ over $\mathbb{F}_{p^n}$. Finite Fields and Their Applications 51: 31-61, 2018

9. Chaoyun Li and Qingju Wang: Design of Lightweight Linear Diffusion Layers from Near-MDS Matrices. IACR Transactions on Symmetric Cryptology 2017(1): 129-155, 2017

Manuscripts and Preprints

1. Sébastien Canard and Chaoyun Li: Towards Truly Practical Intrusion Detection System over Encrypted Traffic. Manuscript, 2019

2. Yonglin Hao, Lin Jiao, Chaoyun Li, Willi Meier, Yosuke Todo and Qingju Wang: Links between Division Property and Other Cube Attack Variants: Some Proofs and Disproofs. Manuscript, 2019

3. Yonglin Hao, Lin Jiao, Chaoyun Li, Willi Meier, Yosuke Todo and Qingju Wang: Observations on the Dynamic Cube Attack of 855-Round TRIVIUM from Crypto’18. IACR Cryptology ePrint Archive 2018: 972 (2018)

FACULTY OF ENGINEERING SCIENCE
DEPARTMENT OF ELECTRICAL ENGINEERING
COSIC
Kasteelpark Arenberg 10, bus 2452
B-3001 Leuven
[email protected]
http://www.esat.kuleuven.be/cosic/