Practical Homomorphic Encryption and Cryptanalysis
Dissertation
submitted in fulfilment of the requirements for the degree of
Doctor of Natural Sciences (Dr. rer. nat.)
at the Fakultät für Mathematik
of Ruhr-Universität Bochum
submitted by Dipl.-Ing. Matthias Minihold
under the supervision of Prof. Dr. Alexander May
Bochum
April 2019
First reviewer: Prof. Dr. Alexander May
Second reviewer: Prof. Dr. Gregor Leander
Date of oral examination (Defense): 3rd May 2019
Author’s declaration
The work presented in this thesis is the result of original research carried out by the candidate, partly
in collaboration with others, whilst enrolled at the Department of Mathematics of Ruhr-University
Bochum as a candidate for the degree of doctor rerum naturalium (Dr. rer. nat.), and carried out in
accordance with its requirements. Except where indicated by reference in the text, the work is the
candidate's own and has not been submitted for any other degree or award at any other university
or educational establishment. Views expressed in this dissertation are those of the author.
Place, Date Signature
Chapter 1
Abstract
My thesis, Practical Homomorphic Encryption and Cryptanalysis, is dedicated to efficient homomorphic
constructions, their underlying primitives, and their practical security as vetted by cryptanalytic methods.
The wide-spread RSA cryptosystem serves as an early (partially) homomorphic example of a public-
key encryption scheme; its security reduction leads to problems believed to have lower average-case
solution complexity than those underlying today's fully homomorphic encryption schemes.
The reader goes on a journey towards designing a practical fully homomorphic encryption scheme,
and one exemplary application of growing importance: privacy-preserving use of machine learning.
1.1 Cryptography Part: Synopsis
Fully homomorphic encryption empowers users to delegate arbitrary computations on their encrypted
data in a privacy-preserving way. Surprisingly, in many scenarios the executing party does not actually
need to see the private content in order to return a useful result. This part focuses on efficient ways
of transforming ubiquitously present machine learning models into privacy-friendly algorithmic cognitive
models, achieving strong security notions by returning encrypted results that only the user can later
decrypt with the secret key.
New algorithmic constructions, laying the foundation for CPU/GPU implementations, and the presented
adaptive parameterization are solutions for sensitive real-world applications like evaluating deep
neural networks on private inputs. We propose a practical FHE scheme, FHE–DiNN, tailored to
homomorphic inference, exhibiting performance that is independent of the number of layers of a given
neural network, and we give conclusive experimental results of our implementation.
Portions of the work presented in this part were previously published at CRYPTO 2018 [BMMP18].
1.2 Cryptanalysis Part: Synopsis
This part advances algorithms for variants of subset problems. Generalizations of the subset-sum problem
to sparse, multidimensional cases, and their reductions to the one-dimensional case, are given. Implications
for the Learning with Errors (LWE) problem relate the security of practical cryptographic schemes, as
studied in the previous part, to classical and quantum complexity-theoretic considerations.
We introduce the property of equiprobability, whereby probabilistic solvers return every subset solution
with roughly the same probability, and identify how well-known algorithms need to be modified to have it.
Portions of the work presented in this part were previously published at AQIS18 [BMR18].
Contents
1 Abstract 3
1.1 Cryptography Part: Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Cryptanalysis Part: Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
I Cryptology and Homomorphic Encryption 13
2 Introduction 15
2.1 Scope of this Thesis and Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.1 Community Value of provided Solutions . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Implementation: From Formulas to working Code . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 The Advantage of Open-Source Software . . . . . . . . . . . . . . . . . . . . . . . 16
3 Cryptography and Cryptology 17
3.1 Suitable Problems and Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.1 Definition of the Learning with Errors problem . . . . . . . . . . . . . . . . . . . 18
3.2 Complexity Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Boolean Gates, Circuits and Functional Completeness . . . . . . . . . . . . . . . . . . . 22
4 Cryptology 23
4.1 Threat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2 Security Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.3 Sources of Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.4 NIST’s Post-Quantum Cryptography Standardization . . . . . . . . . . . . . . . . . . . 26
5 Quantum Computing 27
5.1 Quantum Bits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.2 Quantum Computer and Quantum Algorithms . . . . . . . . . . . . . . . . . . . . . . . 28
5.2.1 Grover’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.2.2 Shor’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.2.3 Provable Security and the One-Time Pad . . . . . . . . . . . . . . . . . . . . . . 29
6 Homomorphic encryption (HE) 31
6.1 Definitions and Examples of Homomorphic encryption (HE) . . . . . . . . . . . . . . . . 31
6.1.1 The RSA Cryptosystem and the Factorization Problem . . . . . . . . . . . . . . 31
6.1.2 Paillier Cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
II Fully Homomorphic Encryption (FHE) & Artificial Intelligence (AI) 33
7 Cloud Computing 35
7.1 Cloud Computing: Promises, NSA, Chances, Markets . . . . . . . . . . . . . . . . . . . 35
7.2 Hardware Solution: Secure Computing Enclaves . . . . . . . . . . . . . . . . . . . . . . . 37
7.3 Software Solution: FHE and FHE–DiNN . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.3.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
8 Mathematical Foundations of FHE 39
8.1 Basic Concepts from Algebra and Probability Theory . . . . . . . . . . . . . . . . . . . 39
8.2 Lattice Problems for Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
8.2.1 Discrete Gaussian distribution on Lattices . . . . . . . . . . . . . . . . . . . . . . 41
8.3 Learning With Errors (LWE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
8.3.1 Equivalence between the decisional (dLWE) and the search-LWE (sLWE) . . . . . 44
8.4 Homomorphic Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
8.4.1 Standardization of Homomorphic Encryption . . . . . . . . . . . . . . . . . . . . 47
8.5 An Efficient FHE-scheme for Artificial Intelligence (AI) . . . . . . . . . . . . . . . . . . 47
9 FHE–DiNN 49
9.1 Localization of this Research within the Field . . . . . . . . . . . . . . . . . . . . . . . . 50
9.1.1 Prior Works and Known Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 50
9.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
9.2.1 Notation and Notions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
9.2.2 Fully Homomorphic Encryption over the Torus (TFHE) . . . . . . . . . . . . . . 51
9.2.3 TGSW: Gadget Matrix and Decomposition . . . . . . . . . . . . . . . . . . . . . 53
9.2.4 Homomorphic Ciphertext Addition and Multiplication . . . . . . . . . . . . . . . 54
9.3 Artificial Intelligence, Machine learning, Deep Learning . . . . . . . . . . . . . . . . . . 56
9.3.1 Task: Homomorphic Evaluation of Neural Networks . . . . . . . . . . . . . . . . 56
9.3.2 The MNIST Handwritten Digit Database . . . . . . . . . . . . . . . . . . . . . . 57
9.3.3 Cost Functions Measuring Neural Networks’ Performance . . . . . . . . . . . . . 59
9.3.4 Hyperparameters of a Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
9.3.5 Training: The Learning Phase of a Model . . . . . . . . . . . . . . . . . . . . . . 60
9.4 FHE–DiNN: Framework for Homomorphic Evaluation of Deep Neural Networks . . . . 61
9.4.1 Beyond the MNIST dataset: Medical Applications of Image Recognition . . . . . 61
9.4.2 Training Neural Networks: Back-Propagation and Stochastic Gradient Descent . 62
9.5 Discretized Neural Networks: Training and Evaluation . . . . . . . . . . . . . . . . . . . 62
9.5.1 Discretizing and Evaluation of NNs . . . . . . . . . . . . . . . . . . . . . . . . . . 63
9.5.2 Training a DiNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
9.6 Homomorphic Evaluation of a DiNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
9.6.1 Evaluating the linear Component: The Multisum . . . . . . . . . . . . . . . . . . 65
9.6.2 Homomorphic Computation of the non-linear sign-Function . . . . . . . . . . . . 66
9.6.3 Scale-invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
9.7 Optimizations within FHE–DiNN over TFHE . . . . . . . . . . . . . . . . . . . . . . . 67
9.7.1 Reducing Bandwidth: Packing Ciphertexts and FFT . . . . . . . . . . . . . . . . 67
9.7.2 Early KeySwitch Allows Faster Bootstrapping . . . . . . . . . . . . . . . . . . . . 69
9.7.3 Programming the Wheel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.7.4 Adaptively Changing the Message Space . . . . . . . . . . . . . . . . . . . . . . . 72
9.7.5 Reducing Communication Bandwidth: Hybrid Encryption . . . . . . . . . . . . . 72
9.7.6 Alternative BlindRotate Implementation: Trading-Off Run-Time with Space . . . 75
9.7.7 Support for various Layers of Unlimited Depth . . . . . . . . . . . . . . . . . . . 77
9.7.8 Neural Networks for Image Classification . . . . . . . . . . . . . . . . . . . . . . 78
9.7.9 Interactive Homomorphic Computation of the argmax Function . . . . . . . . . . 79
9.7.10 Beyond Artificial Neural Networks: CapsNets . . . . . . . . . . . . . . . . . . . . 80
9.8 Practical attack vectors against FHE–DiNN using fplll . . . . . . . . . . . . . . . . . 80
9.8.1 Security Reductions: TLWE to appSVPγ . . . . . . . . . . . . . . . . . . . . . . 80
9.8.2 Theoretical attack vectors against FHE–DiNN . . . . . . . . . . . . . . . . . . 81
9.8.3 Security Evaluation and Parameter Choices . . . . . . . . . . . . . . . . . . . . . 82
9.8.4 General Attacks on Variants of LWE . . . . . . . . . . . . . . . . . . . . . . . . . 83
9.9 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
9.10 Comparison with Cryptonets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
9.10.1 Performance of FHE–DiNN on (clear) inputs x . . . . . . . . . . . . . . . . . . 89
9.10.2 Performance of FHE–DiNN on (encrypted) inputs Enc (x) . . . . . . . . . . . 89
10 FHE & AI on GPUs (cuFHE–DiNN) 91
10.1 Practical FHE evaluation of neural networks using CUDA . . . . . . . . . . . . . . . . . 91
III Cryptanalysis of FHE schemes 93
11 Underlying primitives and the subset-sum problem (SSP) 97
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
11.1.1 Links between the subset-sum problem and Learning With Errors . . . . . . . . 97
11.2 Solving the subset-sum problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
11.2.1 Variants of the subset-sum problem . . . . . . . . . . . . . . . . . . . . . . . . . 99
11.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
11.4 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
11.4.1 Definitions of subset-sum problems (SSP) . . . . . . . . . . . . . . . . . . . . . . 101
11.4.2 Basic components for solving SSP . . . . . . . . . . . . . . . . . . . . . . . . . . 103
11.4.3 Number of solutions for hard SSP instances . . . . . . . . . . . . . . . . . . . . . 104
11.5 Solution Equiprobability and Equiprobable SSP solvers . . . . . . . . . . . . . . . . . . 106
11.5.1 Equiprobable quantum SSP solvers . . . . . . . . . . . . . . . . . . . . . . . . . . 106
11.5.2 Equiprobable classical solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
11.6 Multidimensional subset-sum problem (MSSP) . . . . . . . . . . . . . . . . . . . . . . . 112
11.6.1 Reducing SSP instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
11.6.2 Conversion into bordered block diagonal form (BBDF) . . . . . . . . . . . . . . . 113
11.6.3 Solving k-MSSP for one BBDF–reduced block . . . . . . . . . . . . . . . . . . . . 114
11.6.4 Assembling the blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
11.6.5 The subset-product problem (SPP) . . . . . . . . . . . . . . . . . . . . . . . . . . 117
11.6.6 Multiplicatively reduced instances and smoothness basis . . . . . . . . . . . . . . 118
11.6.7 Transforming an SPP instance into a k-MSSP instance . . . . . . . . . . . . . . . 118
11.6.8 Full Example for transforming an SPP into k-MSSP and solving the SSP . . . . . 119
11.6.9 The Modular SPP Assumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
11.7 Conclusion and Open Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
List of Figures
2.1 Cryptology = Interplay of Cryptanalysis and Cryptography . . . . . . . . . . . . . . . . . 15
3.1 Projection of an m-dimensional lattice with basis A = (a1, a2, . . . , am) ∈ Z^(m×n) . . . . 19
3.2 Complexity classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
7.1 NSA performing a ‘Happy Dance!!’ [sic] accessing private data . . . . . . . . . . . . . . . 35
7.2 Overview of today’s ubiquitous cloud computing services . . . . . . . . . . . . . . . . . . . 36
7.3 Four main challenges for FHE-based systems deployed in the Cloud. . . . . . . . . . . . . 38
8.1 Sum x+ = x1 + x2, and product x∗ = x1 · x2 under ring homomorphism f . . . . . . . . . 40
8.2 NearestPlane(s) Algorithms on good (left) vs. bad (right) bases of the lattice L. . . . . . 42
8.3 Three color channels interpreted as RGB-image . . . . . . . . . . . . . . . . . . . . . . . . 48
9.1 Popular neural network activation functions and our choice ϕ1, the sign-function . . . . . 51
9.2 Taxonomy of Deep Learning within Artificial Intelligence. . . . . . . . . . . . . . . . . . . 56
9.3 Neuron computing an activated inner-product in FHE–DiNN. . . . . . . . . . . . . . . . 58
9.4 A generic, dense feed-forward neural network of arbitrary depth d ∈ N. . . . . . . . . . . 59
9.5 Sample forward- and back-propagation through a deep NN, measuring the loss L. . . . . 60
9.6 FFT’s divide-and-conquer strategy for power-of-2 lengths; N = 16. . . . . . . . . . . . . . 68
9.7 Programming the wheel with anti-periodic functions . . . . . . . . . . . . . . . . . . . . . 70
9.8 Programming the wheel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
9.9 Hybrid Encryption: On-Line vs. Off-Line Processing . . . . . . . . . . . . . . . . . . . . . 74
9.10 Computing an encryption comes also at the cost of ciphertext expansion. . . . . . . . . . 74
9.11 A convolution layer applies a sliding kernel to the original image, recombining the weights. 78
9.12 Max-Pooling layers work by combining neighbouring values and taking their maximum. . 79
9.13 Model: Malicious Cloud and sources of leakage of FHE–DiNN ciphertexts and keys. . . 82
9.14 Attacking LWE: Lattice reduction and enumeration. . . . . . . . . . . . . . . . . . . . . . 83
9.15 Pre-processing of a Seven from the MNIST test set. . . . . . . . . . . . . . . . . . . . . . 85
9.16 Discretized MNIST image is fed into our neural network with 784:100:10–topology . . . . 86
9.17 Cloud-security model, the user domain is separated from the server . . . . . . . . . . . . 87
11.1 SSP = Knapsack packing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
11.2 Intuition: SSP instances with density D. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
11.3 Landscape of the (d, k, n, w,+)-subset problem family . . . . . . . . . . . . . . . . . . . . 100
11.4 Left/Right Split. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
11.5 Grover’s search algorithm on a list with N = 2n elements (on a high level). . . . . . . . . 107
11.6 Superposition of 2n3 qubits simulates L2 when searching for collisions, collected in Lout. . 107
11.7 Example of a Johnson graph, here J(5, 2), as used in Theorem 100. . . . . . . . . . . . . . 110
11.8 A (d, n, n,+)-subset problem solver. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
11.9 A heuristic for solving k-MSSP sparse instances given oracle access to an SSP solver O. . 116
11.10 Transforming a (1, n, n, ·)-subset problem to a (d, k, n,+)-subset problem instance. . . . . 119
11.11 Modular SSP decomposition and conversion to k-MSSP over G. . . . . . . . . . . . . . . 121
Notations
Z = {. . . , −2, −1, 0, +1, +2, . . .} . . . . . . . . . . set of all integers
N = {0, 1, 2, . . .} . . . . . . . . . . set of natural numbers, i. e. non-negative integers
∞ . . . . . . . . . . quantity larger than any n ∈ N
P ⊆ N . . . . . . . . . . prime numbers, i. e. p > 1 divisible only by 1 and p
Q, R, C . . . . . . . . . . sets of rational, real, resp. complex numbers
(a, b), [a, b), (a, b], [a, b] . . . . . . . . . . (half) open resp. (half) closed interval
Z ⊇ Zq = [−q/2 . . q/2) for q even, [−(q−1)/2 . . (q−1)/2] for q odd . . . . . . . . . . ring of integers mod q
|Zq| = q . . . . . . . . . . cardinality of a set
∑, ∏ . . . . . . . . . . quantifiers for summation resp. multiplication
{0, 1}^n, Zq^n, Q^n, R^n . . . . . . . . . . vector spaces of dimension n
b = (b1, b2, . . . , bn) . . . . . . . . . . (row) vectors
wt(b) = |{i : bi ≠ 0}| . . . . . . . . . . (Hamming) weight of vector b ∈ {0, 1}^n
|b|1 := ∑i |bi| . . . . . . . . . . 1-norm of vector b ∈ C^n
‖b‖ := ‖b‖2 = √(∑i bi^2) . . . . . . . . . . Euclidean length, or 2-norm, of vector b ∈ C^n
‖b‖∞ := maxi |bi| . . . . . . . . . . ℓ∞-, or infinity, norm of b ∈ C^n
~0, ~1 . . . . . . . . . . all-zero, all-one vector
A, B, · · · ∈ R^(m×n) . . . . . . . . . . real (m × n)-matrices comprised of (row) vectors
A = (ai)1≤i≤m = (ai,j)1≤i≤m,1≤j≤n . . . . . . . . . . element ai,j: ith row, jth column of matrix A
In . . . . . . . . . . n × n identity matrix
Im(f) . . . . . . . . . . the image of function f : X → Im(f) ⊆ Y
ker(f) . . . . . . . . . . the kernel of function f, i. e. the preimage of ~0
e. g., i. e., cf., et al., resp. . . . . . . . . . . commonly used (Latin) abbreviations
Greek letters with names
Letter Name Letter Name
α alpha ν nu
β beta ξ, Ξ xi
γ, Γ gamma o omicron
δ, ∆ delta π, Π pi
ε epsilon ρ rho
ζ zeta σ, Σ sigma
η eta τ tau
θ, Θ theta υ, Υ upsilon
ι iota φ, Φ phi
κ kappa χ chi
λ, Λ lambda ψ, Ψ psi
µ mu ω, Ω omega
Capital letters are only shown if they differ from the respective Roman letter.
Frequently used notation requiring more words is the set of binary strings of length n, defined as
{0, 1}^n := {(x0, . . . , xn−1) : x0, . . . , xn−1 ∈ {0, 1}}, n ∈ N,
and binary strings of arbitrary length, which can be written as the union of these sets,
{0, 1}∗ := {(x0, . . . , xn−1) : n ∈ N, x0, . . . , xn−1 ∈ {0, 1}} = ⋃n∈N {0, 1}^n.
Hence the Kleene star of the binary set is the set of all strings {0, 1}∗ = {(), (0), (1), (0, 0), (0, 1), . . .}.
Quite frequently, we will use the Landau notations O(·), Õ(·), Θ(·), Ω(·), ω(·), o(·), defined as follows. We
remark that the equality symbol '=' is historically overloaded in O-notation, so that f = O(g)
expresses that f is an element of the set of all functions growing at most as fast as g:
Definition 1 (Landau’s Big-O). Let f, g : N→ (0,∞). We define the abbreviations:
• f = O(g) :⇔ ∃ a, n0 ∈ N such that f(n) ≤ a · g(n) for every n > n0, n ∈ N,
• f = Ω(g) :⇔ g = O(f),
• f = o(g) :⇔ ∀ 0 < ε ∈ R,∃n0 ∈ N so f(n) < εg(n) holds for every n > n0,
• f = ω(g) :⇔ g = o(f),
• f = Θ(g) :⇔ f = O(g) ∧ g = O(f).
We alternatively use limits to define the asymptotic notation above, e. g. f = o(g) ⇔ g = ω(f) :⇔
lim n→∞ f(n)/g(n) = 0. Analogously, f = O(g) :⇔ ∃ lim n→∞ f(n)/g(n) < ∞. We write Ok(·)
when dealing with asymptotic statements that hold for fixed k.
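To make Definition 1 concrete, the witnesses a and n0 can be checked numerically. The following sketch uses an illustrative pair of functions, f(n) = 3n² + 5n and g(n) = n² (not taken from the text), and verifies f = O(g) with witnesses a = 4, n0 = 5:

```python
# Illustrative check of Definition 1: f = O(g) via explicit witnesses a, n0.
# Assumed example functions: f(n) = 3n^2 + 5n and g(n) = n^2.
def f(n):
    return 3 * n * n + 5 * n

def g(n):
    return n * n

a, n0 = 4, 5                    # witnesses: 3n^2 + 5n <= 4n^2 iff n >= 5

# f(n) <= a * g(n) holds for every n > n0 (spot-checked on a range) ...
assert all(f(n) <= a * g(n) for n in range(n0 + 1, 10_000))
# ... and the threshold matters: the bound fails below it, e.g. at n = 4.
assert f(4) > a * g(4)
```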
Chapter 2
Introduction
Crypto means Cryptocurrency.—The Internet (2018)
This thesis can be roughly divided into two parts, representing the constructive and the
destructive aspects of crypto, i. e. cryptology. Cryptology comprises cryptography, the design of these
algorithms, and cryptanalysis, the analysis of their security. The two terms, however, have been used
interchangeably for a long time [MVO96], and recently the meaning of the common abbreviation has been challenged.
Fig. 2.1: Cryptology = Interplay of Cryptanalysis and Cryptography
Nowadays, cryptology is heavily used to protect stored and transmitted data against malicious attacks,
by algorithmic means.
The first part of Practical Homomorphic Encryption and Cryptanalysis introduces wide-spread
homomorphic cryptosystems. Secondly, in the main part, we explain modern goals of crypto in the
cloud setting, and provide privacy-respecting, constructive solutions for real-world problems of the
future with today's tools. Finally, in the third part, we heed the destructive side by analyzing the
resilience of the constructions, on which the security in Part II is based, against cryptanalysis deploying
classical and quantum algorithms. We close the circle, and eventually give an outlook on future research
directions in modern cryptology.
The dissertation is organized as follows: Part I sets the stage for crypto. While Part III is dedicated
to the cryptanalysis of subset-sum problems, the main result, the analysis and practical instantiation of an
FHE scheme, is described in Part II. A discussion of open research problems concludes the dissertation.
2.1 Scope of this Thesis and Problem Statement
Everyone is a moon, and has a dark side which he never shows to anybody.—Mark Twain
Today we live in an information society, where choices are made based upon massive processing of
information. Huge amounts of text and data are transmitted over digital networks, filtered, and stored
every day. Topics like the protection of sensitive information and the need for mechanisms ensuring
the integrity of data arise. Companies' business success depends on how scalable their services are for
global deployment. People's fundamental interest in their privacy is systematically left behind; individual
rights are traded away with arguments about preventing crime. Chapter 4 focuses on the motivations of
modern cryptography and why data protection is necessary, and we illuminate which cryptographic
primitives are suitable for a cloud computing setting.
2.1.1 Community Value of provided Solutions
Cryptologic research, we believe, should return value to the community, by timely communication of
results and by making resources easily accessible to the broader public, reaching out widely to show
useful practical applications and to explain technology in the simplest possible terms. This is particularly
true if the research was funded by the public in the first place, which makes knowledge transfer key.
Hence, this is an argument for providing examples and open access to software source code alongside
an academic paper, demonstrating capabilities and guaranteeing reproducible results.
2.2 Implementation: From Formulas to working Code
Cryptographic schemes need to follow Kerckhoffs' principle, and secure implementations need to include
security margins. Moreover, it is considered good practice to publish a proof-of-concept open-source
implementation along with the mathematical description of a new scheme, for complete coverage. The
theoretical scheme might benefit from public scrutiny and discussion at an early stage, when flaws are
detected and improvements suggested. The time needed to arrive at working code based on a mathematical
description is often underestimated, so providing pseudo-code is beneficial for a later conversion into
production-ready code, and for a proof-of-concept implementation in the programming language
appropriate for the target platform, with automated build tools. A reference implementation eases
verification for reviewers and during security audits, simplifies the task of practitioners coming up
with optimized implementations, and provides comparable efficiency claims.
2.2.1 The Advantage of Open-Source Software
Following modern software development principles, which focus on manageable small parts and a clean
separation of functionality from data such that components can be quickly updated, exchanged or altered,
we provide an implementation alongside our presented research findings on github.com. Considerable effort
is needed to turn a cryptographic construction into working code. Security weaknesses can be spotted
more efficiently, and modularity allows replacing the concerned parts with reliable components. For example,
the repository around our CRT-RSA lattice attack was published in this spirit, for other researchers to
continue the development of fruitful add-ons. This and the other provided frameworks focus on modularity
for easy extension, documentation for a faster start and deeper comprehension, analysis provided in the
accompanying technical document, and applicability through examples. Naturally, implementations
of constructions building on promising, secure protocols that are to be deployed in practice need to
undergo more scrutiny than ones for academic proof-of-concept purposes, such as FHE–DiNN.
Chapter 3
Cryptography and Cryptology
A well-defined mathematical algorithm can encrypt something quickly, but to decrypt it
would take billions of years—or trillions of dollars’ worth of electricity to drive the computer.
So cryptography is the essential building block of independence for organizations on the
Internet, just like armies are the essential building blocks of states, because otherwise one
state just takes over another. There is no other way for our intellectual life to gain proper
independence from the security guards of the world, the people who control physical reality.
—Julian Assange (2012)
Since the early days of cryptography, the research focus has been on secure information transmission,
historically above all for military objectives and threat models. To securely transfer data over an
established, insecure channel, in addition to the coding-theoretic viewpoint, a step scrambling the
message blocks before transmission is needed. For completeness' sake, error-correcting procedures are
wrapped around the prepared message, since in practice every channel bears some sources of errors:
Abstractly, in an example where Alice wants to transmit some information I to Bob, encoded in
a language over an alphabet A, the flow is:
Information I ⇒ source (en)coding ⇒ message block x = (x1, x2, . . . , xk) ∈ A^k ⇒ encrypt
message x 7→ z ⇒ channel (en)coding ⇒ codeword c = (c1, c2, . . . , cn) ∈ C ⊆ A^n ⇒ transmission
via noisy channel (this is where errors might alter the message) ⇒ received message block y = c + e
⇒ decoding ⇒ ẑ ≈ z ⇒ decrypt message ẑ 7→ x̂ ⇒ inverse source coding ⇒ received information
Î ≈ I (information before transmission) eventually arrives at Bob.
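The flow above can be mimicked end to end with toy stand-ins; in this sketch (all components are illustrative and not taken from the text) the "encryption" is a one-time-pad XOR and the channel code is a 3-fold repetition code with majority decoding, which corrects the single bit error the noisy channel introduces:

```python
import random

def encrypt(x, key):            # toy cipher: XOR with a one-time pad
    return [xi ^ ki for xi, ki in zip(x, key)]

decrypt = encrypt               # XOR is its own inverse

def channel_encode(z):          # 3-fold repetition code
    return [b for bit in z for b in (bit, bit, bit)]

def channel_decode(y):          # majority vote per 3-bit group
    return [int(sum(y[i:i + 3]) >= 2) for i in range(0, len(y), 3)]

def noisy_channel(c, flips=1):  # the error e: flip a random bit
    c = c[:]
    for i in random.sample(range(len(c)), flips):
        c[i] ^= 1
    return c

x = [1, 0, 1, 1, 0, 0, 1, 0]             # message block over A = {0, 1}
key = [random.randint(0, 1) for _ in x]  # one-time pad
z = encrypt(x, key)                      # x -> z
c = channel_encode(z)                    # z -> codeword c
y = noisy_channel(c)                     # received block y = c + e
z_hat = channel_decode(y)                # decoding: z_hat ≈ z
x_hat = decrypt(z_hat, key)              # decrypt, then invert source coding
assert x_hat == x                        # Bob recovers the information
```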
We abstract these coding theoretic constructions when looking at the secure information transmission
process from a cryptographic point of view later on, and assume reliable, authenticated channels for
the purpose of this thesis.
The term cryptography stands for information security in a wide sense nowadays. It deals with
the protection of confidential information, with granting access to systems to authorized users only,
and with providing a framework for digital signatures. Cryptographic methods, and how to apply them,
should not be something only a handful of experts know about and only a slightly bigger group of people
is able to use. In contrast, cryptography should become an easy-to-use technology for various fields of
communication. It can be as easy as a lock symbol appearing in the Internet browser when logging
on to a bank's website, signaling a secure channel, for example. Generally speaking, any information
that needs to be electronically transmitted or stored secretly could profit from easy access to good
cryptographic methods. On the other hand, as an idiom puts it, this might come at a quality trade-off:
Security is [easy, cheap, good]. Choose two.
Cryptographic primitives can roughly be categorized into two types: symmetric (or private-key) and
asymmetric (or public-key) ones. Although we will discuss the role of asymmetric schemes, predominant
in the case when insecure channels connect unacquainted users over the Internet, we will also emphasize
the strengths of symmetric primitives and how the two can be blended, in Section 9.7.5.
For the security considerations, we keep the latest cryptanalytic results in mind and even assume a
powerful, fully-functional quantum computer (as in Chapter 5), and we regard the impact of such a device
on current state-of-the-art cryptosystems. In this sense, the discussion continues with future needs and
enhancements in the field, to prepare cryptographic primitives for a world where a powerful quantum
computer exists. The question of why and how certain systems based on lattices withstand this threat,
and are thus post-quantum, is dealt with for our main construction.
3.1 Suitable Problems and Algorithms
A random instance of an NP-hard problem, a notion we will define formally later on, is often the
starting point for a cryptographic public-key primitive, along with a secret and a transformed version
of the instance, which the secret makes computationally feasible to solve. One of the first problems in
computer science proven to be among the hardest, or NP-complete, is the subset-sum problem (SSP),
which over the years underwent thorough analysis and appears in numerous applications. Of cryptologic
interest were early candidate constructions for public-key encryption, cryptographic signature schemes,
and pseudorandom number generation based on the SSP, some of which did not withstand cryptanalysis
for long. In Part III, we introduce new algorithms for solving the sparse multiple SSP, from which an
efficient subset-product solver is constructed. Other sources of hard problems suitable for cryptographic
purposes are historically intricate number-theoretic problems, or, more recently, problems stemming
from learning theory. A problem of broad practical relevance still today is the factorization problem,
which is neither proven nor believed to be NP-complete. Nevertheless, it is widely used in the RSA
cryptosystem [RSA78], which we discuss in more detail in Part I, as classical attacks on reasonably-sized
instances are considered infeasible to date. At the core of numerous scientific discussions is the impact
of a different, emerging computer architecture, the quantum computer, and foremost Shor's famous
quantum algorithm for efficient integer factorization.
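The partially homomorphic character of textbook RSA, mentioned above and revisited in Part I, is easy to observe with toy parameters; the tiny primes below are purely illustrative and of course insecure:

```python
# Textbook RSA with toy parameters: Enc(m) = m^e mod N.
# Multiplicativity: Enc(m1) * Enc(m2) = Enc(m1 * m2) (mod N),
# which makes RSA a partially (multiplicatively) homomorphic scheme.

p, q = 61, 53                 # insecure toy primes
N = p * q                     # 3233
phi = (p - 1) * (q - 1)       # 3120
e = 17                        # public exponent, gcd(e, phi) = 1
d = pow(e, -1, phi)           # private exponent (Python 3.8+ modular inverse)

def enc(m):
    return pow(m, e, N)

def dec(c):
    return pow(c, d, N)

m1, m2 = 42, 55
c_prod = (enc(m1) * enc(m2)) % N      # multiply ciphertexts only
assert dec(c_prod) == (m1 * m2) % N   # decrypts to the product of plaintexts
```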
The Learning with Errors (LWE) problem, of prominent importance in Part II and detailed below, stems from the aforementioned field of computational learning theory and exhibits a powerful property: random self-reducibility. Randomly generated average-case instances of the LWE problem are provably as hard to solve as worst-case instances of certain lattice problems, assumed to be among the hardest problems in NP; together with its conjectured quantum resilience, this is key.
This property of permitting worst-case-to-average-case reductions serves as a promising basis for manifold modern cryptologic constructions, unless P = NP is counter-intuitively proven one day.
3.1.1 Definition of the Learning with Errors problem
We briefly list the relevant facts about lattices in this section and only provide the main definitions concerning lattices and the Learning with Errors problem, without reiterating the great surveys on lattices used in cryptography; we instead refer to the comprehensive introductions in Regev's lecture notes [Reg09a] and Peikert's survey [Pei16]. We take an easy problem and study ways to make it more difficult as we progress. Given A = (a_i)_{1≤i≤m} = (a_{i,j})_{1≤i≤m,1≤j≤n} ∈ Z^{m×n}, an m × n matrix with integer entries and m ≥ n, consider the linear system of m equations, see Figure 3.1:
CHAPTER 3. CRYPTOGRAPHY AND CRYPTOLOGY
    a_{1,1} s_1 + a_{1,2} s_2 + ... + a_{1,n} s_n = b_1
    a_{2,1} s_1 + a_{2,2} s_2 + ... + a_{2,n} s_n = b_2
        ...
    a_{m,1} s_1 + a_{m,2} s_2 + ... + a_{m,n} s_n = b_m        (3.1)
Given b = 0, the set of all solutions s = (s_1, s_2, ..., s_n) of b = As can be algebraically viewed as a lattice L, informally a repetitive, discrete structure embedded in n real dimensions. If A has n linearly independent rows, then L is a full-rank lattice with rank(L) = n. Computationally, it is easy to solve for s unless errors are added to every coordinate. Now, redefining b = As + e with
Fig. 3.1: Projection of an m-dimensional lattice L(A) with basis A = a_1, a_2, ..., a_m ∈ Z^{m×n}.
e ≠ 0 sampled from a probability error distribution χ makes the problem of solving Equation (3.1) for s quickly infeasible, e.g. for zero-centered Gaussian errors, even for moderate-sized dimension n, see Section 8.3. We will provide a formal definition of hard lattice problems in Section 8.2, as the Learning with Errors problem is widely used for many modern cryptographic constructions and comes in two versions:
Definition 2 (Search-LWE Problem sLWE (Informal)). Given independent LWE samples (a_i, 〈a_i, s〉 + e_i) ∈ Z_q^n × Z_q, for 1 ≤ i ≤ m ∈ N, i.e. approximate scalar products with small errors, the task is to find s with high probability.
Definition 3 (Decisional-LWE Problem dLWE (Informal)). Given independent samples (a_i, b_i) ∈ Z_q^n × Z_q, for 1 ≤ i ≤ m ∈ N, the task is to distinguish whether the b_i are sampled from the uniform distribution over Z_q, or conversely, whether they admit the form of approximate scalar products b_i = 〈a_i, s〉 + e_i with some s ∈ Z_q^n and small errors e_i.
Assumption 4 (LWE Conjecture (Informal)). It is computationally infeasible (as in there exists no ppt-algorithm), given independent samples (a_i, b_i) ∈ Z_q^n × Z_q, for 1 ≤ i ≤ m ∈ N, to decide dLWE.
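To make the two versions concrete, the following minimal sketch generates LWE samples in pure Python. All parameter choices (n = 10, q = 97, a rounded Gaussian of width sigma = 1) and all names are our own illustrative assumptions, far below any secure instantiation; the point is merely that knowing s makes the small errors visible, while the samples alone look uniform.

```python
import random

def lwe_samples(n=10, q=97, m=20, sigma=1.0, seed=1):
    """Generate m samples (a_i, <a_i, s> + e_i mod q) for a random secret s."""
    rng = random.Random(seed)
    s = [rng.randrange(q) for _ in range(n)]
    samples = []
    for _ in range(m):
        a = [rng.randrange(q) for _ in range(n)]
        e = round(rng.gauss(0, sigma))  # small zero-centered error
        b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
        samples.append((a, b))
    return s, samples

s, samples = lwe_samples()

# With knowledge of the secret s, the small errors are easy to recover:
q = 97
errors = [(b - sum(ai * si for ai, si in zip(a, s))) % q for a, b in samples]
centered = [e if e <= q // 2 else e - q for e in errors]  # lift to [-q/2, q/2]
```

Recovering s from the samples alone is exactly sLWE; deciding whether given pairs are of this form or uniform over Z_q^n × Z_q is dLWE.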
Perhaps interestingly, FHE–DiNN, presented in Chapter 9, bases the security of homomorphically performing machine learning (ML) inference, basically an artificial intelligence (AI) algorithm, on data in a privacy-preserving way on a problem stemming from the AI research literature itself, namely LWE.
3.2 Complexity Theory
The evidence in favor of P ≠ NP and its algebraic counterpart is so overwhelming, and the
consequences of their failure are so grotesque, that their status may perhaps be compared
to that of physical laws rather than that of ordinary mathematical conjectures.—Volker
Strassen (1986)
In this section, we informally recap the main concepts of complexity theory, the part of theoretical computer science underpinning all cryptologic research. It is a short introduction, just to the extent that it fixes the vocabulary needed to describe the appropriate problems later on in this work.
A key concept in any cryptologic discourse are algorithms: effective solution-finding specifications, attributed to the medieval Persian mathematician al-Khwarizmi. An algorithm A runs in polynomial time T(n), functionally dependent on the parameter n ∈ N, if T can be represented as a polynomial, i.e. ∃d ∈ N : T(n) = Σ_{i≤d} t_i n^i with coefficients t_i ∈ R. Polynomial dependence is commonly denoted in Landau's big-O notation (cf. Section 1.2) as poly(n) = n^{O(1)}, without further specifying the coefficients or degree, as in cryptology asymptotic, qualitative statements are often sufficient.
We now define two important complexity classes:
Definition 5 (Polynomial Time (P) and Exponential Time (EXP) Complexity Classes).
Let f : {0,1}* → {0,1} be a decision problem; a problem instance is formally a binary string x.
• f ∈ P if there exists a NAND circuit C, which on inputs x ∈ {0,1}*, outputs C(x) = f(x) in poly(|x|) steps, i.e. fewer than p(|x|) steps for some polynomial p : N → R in the input length.
• f ∈ EXP if there exists a NAND circuit C, which on inputs x ∈ {0,1}*, outputs C(x) = f(x) in 2^{poly(|x|)} steps, i.e. fewer than 2^{p(|x|)} steps for some polynomial p : N → R in the input length.
When we speak of exponential-time functions, we typically mean functions f ∈ EXP \ P.
Intuitively, P can be described as the class of problems whose difficulty grows moderately if we
increase the input size, i. e. the demand for time and memory resources to solve a given problem
instance of size n+ 1 is not too large compared to a size n problem instance.
A problem for which there exists a probabilistic polynomial-time (ppt) algorithm A solving it with high probability is called tractable, or feasible, and A is referred to as efficient, or fast. We introduce another complexity class, for which it is not known whether it in fact differs from P:
Definition 6 (Bounded-Error Probabilistic Polynomial Time (BPP)). Let f : {0,1}* → {0,1}. f ∈ BPP :⇔ ∃d ∈ N and a probabilistic algorithm A such that

    Pr[A(x) = f(x)] ≥ 2/3

within at most p(|x|) = |x|^d steps, polynomially in the input length |x|, for p : N → R.
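A classical textbook member of BPP, chosen here by us for illustration and not an example from this thesis, is Freivalds' algorithm: it verifies a claimed matrix product A·B = C in randomized O(n^2) time per trial, each trial erring with probability at most 1/2, so a handful of independent repetitions pushes the error probability far below the 1/3 threshold of Definition 6.

```python
import random

def freivalds(A, B, C, trials=20, seed=0):
    """Probabilistically check whether A*B == C in O(n^2) per trial.

    A wrong C survives one trial with probability <= 1/2, hence 'trials'
    repetitions leave an error probability of at most 2**-trials.
    """
    rng = random.Random(seed)
    n = len(A)
    for _ in range(trials):
        r = [rng.randrange(2) for _ in range(n)]  # random 0/1 vector
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False  # a witness vector proves C is certainly wrong
    return True           # C is correct with high probability

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]   # the true product A*B
C_bad = [[19, 22], [43, 51]]    # off by one in a single entry
```

Computing the product outright costs O(n^3) with the schoolbook method; the randomized check is asymptotically cheaper, which is exactly the kind of trade-off BPP captures.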
We use similar notations in the quantum setting, where A is permitted non-classical computations. Given a randomized algorithm A, we denote the action of running A on input(s) (x_1, x_2, ...) of a problem instance parametrized by n with uniform random coins R, and assigning the output(s) to (y_1, y_2, ...), by (y_1, y_2, ...) ←$ A(1^n, x_1, x_2, ...; R) if we require absolute explicitness. By negl = negl(n) we denote the set of negligible functions in n, i.e. all functions that are eventually upper-bounded by 1/p(n) for every polynomial p(n).
Definition 7 (Polynomial Problem Reductions). Let f, g : {0,1}* → {0,1}* be two problems. f reduces to g, i.e. f ≤_p g, if there exists a polynomial-time algorithm A : {0,1}* → {0,1}* that transforms every input x ∈ {0,1}* such that

    f(x) = g(A(x)).
If f ≤p g and g ≤p f , then f and g are equivalent, i. e. they are in the same complexity class.
Definition 8 (Non-Deterministic Polynomial Time NP). Let f : {0,1}* → {0,1}. f ∈ NP :⇔ ∃d ∈ N and a verification function v : {0,1}* → {0,1} with v ∈ P such that

    f(x) = 1 ⇔ ∃x′ ∈ {0,1}^{n^d} : v(xx′) = 1.

We remark that the verification certificate x′ exists, and is of length polynomial in n = |x|, if f(x) = 1, whereas if f(x) = 0 then v(xx′) = 0 holds for all x′ ∈ {0,1}^{n^d}.
Definition 9 (NP-hardness and NP-completeness). Let g : {0,1}* → {0,1}. g is NP-hard if we can reduce f ≤_p g for all f ∈ NP. g is NP-complete :⇔ g is NP-hard and g ∈ NP.
We remark that we extend Definition 9 to search problems whose associated decision problems fulfill the definition, too. Generally applicable algorithms to solve NP-complete problems are guessing a syntactically correct input and brute-force testing it, which we use synonymously with exhaustively searching for solutions.
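The brute-force strategy just described can be made concrete for the subset-sum problem (SSP) from Section 3.1. The following sketch, with function and instance names of our own choosing, exhaustively tests all 2^n subsets; this exponential-time baseline is exactly what the dedicated SSP algorithms of Part III improve upon.

```python
from itertools import combinations

def subset_sum_bruteforce(weights, target):
    """Exhaustively search all 2^n subsets; return a solving index tuple or None."""
    n = len(weights)
    for k in range(n + 1):                     # subset sizes 0, 1, ..., n
        for idx in combinations(range(n), k):  # all subsets of that size
            if sum(weights[i] for i in idx) == target:
                return idx
    return None

weights = [3, 34, 4, 12, 5, 2]
sol = subset_sum_bruteforce(weights, 9)        # finds some subset summing to 9
```

For n items the loop inspects up to 2^n candidates, so even this tiny solver becomes hopeless around n ≈ 60 on classical hardware, which is precisely why SSP was attractive as a hardness source.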
The inclusion P ⊆ NP holds, but the question whether the converse NP ⊆ P (and thus P = NP) is also true is one of the millennium problems listed in 2000 by the Clay Mathematics Institute in Cambridge. A solution to this question is worth 1 000 000 US$, which is an additional motivation for mathematicians and computer scientists. Several cryptographic assumptions are, for performance reasons, based on problems believed to lie between P and the NP-complete problems. Problems like factoring integers, discrete logarithm computation in large prime-order subgroups of Z_p^*, or the approximate version of finding the closest vector in lattices are believed to be NP-intermediate [Lad75], i.e. in NP \ (P ∪ NP-complete), a set which is non-empty if P ≠ NP.
Figure 3.2 depicts possible relations of a few complexity classes considered in this thesis, among them P ?= BPP, NP-hard, NP-complete, co-NP-hard, and EXP.
Fig. 3.2: Possible relations of complexity classes, and areas of feasible (blue) to hard (red) instances.
In cryptology, we often denote the security parameter by n ∈ N and assume it is passed to all algorithms implicitly in the unary representation 1^n unless stated differently. Throughout the thesis we regard an algorithm as modeled by a randomized, stateless Turing machine.
In Section 3.1 we study problems of specific interest in cryptography, based on the assumption:
Assumption 10 (P ≠ NP Conjecture). There exists a computational problem P ∈ NP-intermediate.
3.3 Boolean Gates, Circuits and Functional Completeness
There is no book so bad ... that it does not have something good in it.—Don Quixote (1604)
The binary NAND function, NAND : {0,1}^2 → {0,1}, ubiquitously used in digital electronics, is defined by Equation (3.2), or equivalently, can be computed using NAND(a, b) = NOT(AND(a, b)), as characterized by Theorem 11.

    NAND(a, b) = 0 if a = b = 1, and 1 else.        (3.2)

Its logic gate symbol is the negated conjunction of a and b, also written a|b, the Sheffer stroke, cf. Theorem 13. Conversely, NOT, AND, resp. OR can be computed by NAND function compositions only:
Theorem 11 (NAND can compute NOT, AND as well as OR gates.). Unary NOT, binary AND, resp. OR can be expressed as a composition of binary NAND functions arranged as a Boolean circuit.

Proof. Let a, b ∈ {0,1}, then AND(a, a) = a holds, hence NAND(a, a) = NOT(AND(a, a)) = NOT(a) ∈ {0,1}. As AND(a, b) = NOT(NOT(AND(a, b))) = NOT(NAND(a, b)) ∈ {0,1} holds, NAND alone is sufficient to compute AND. By De Morgan's law we combine AND with NOT to compute OR, writing OR(a, b) = NOT(AND(NOT(a), NOT(b))) ∈ {0,1}.
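The compositions in the proof above can be checked mechanically. A minimal Python sketch (the function names are ours) realizes each derived gate purely by calls to NAND:

```python
def NAND(a, b):
    """The only primitive gate: 0 iff a = b = 1 (Equation (3.2))."""
    return 0 if a == b == 1 else 1

def NOT(a):
    """NOT(a) = NAND(a, a), using AND(a, a) = a."""
    return NAND(a, a)

def AND(a, b):
    """AND(a, b) = NOT(NAND(a, b)), i.e. two NAND gates."""
    return NAND(NAND(a, b), NAND(a, b))

def OR(a, b):
    """De Morgan: OR(a, b) = NAND(NOT(a), NOT(b)), i.e. three NAND gates."""
    return NAND(NAND(a, a), NAND(b, b))
```

Enumerating all four input pairs reproduces the familiar truth tables, which is the whole content of Theorem 11.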
One could implement whole programs, even an operating system, via a huge composition of NAND logic gates, yet even with primitives such as additions, multiplications, and comparisons the circuits quickly become rather unwieldy. For instance, a binary neural network (BNN), introduced in Section 9.1.1, can be thought of as a Boolean circuit that uses threshold gates (Equation (9.1)) instead of NAND as its basic component. We state the following theorems without a proof:
Theorem 12 (Addition / Multiplication of two n-bit numbers using only NANDs, [She13]). Let n ∈ N and denote by x_1, x_2 ∈ {0,1}^n the binary representations of two n-bit numbers.
• ADD_n : {0,1}^{2n} → {0,1}^{n+1} computes the sum x_1 + x_2. The number of NANDs in the ADD_n circuit is polynomial in n, i.e. a small linear amount, say 100n.
• MULT_n : {0,1}^{2n} → {0,1}^{2n} computes the product x_1 · x_2. The number of NANDs in the MULT_n circuit is polynomial in n, i.e. a small quadratic amount, say 1000n^2.
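The linear gate count for ADD_n can be made plausible by a sketch. The ripple-carry construction below is our own illustrative decomposition (a standard layout, not taken from [She13]); it spends a constant number of NAND gates per bit position, built only from the NAND primitive:

```python
def NAND(a, b):
    return 0 if a == b == 1 else 1

def XOR(a, b):
    """XOR from four NANDs: NAND(NAND(a, t), NAND(b, t)) with t = NAND(a, b)."""
    t = NAND(a, b)
    return NAND(NAND(a, t), NAND(b, t))

def full_adder(a, b, cin):
    """Sum bit and carry-out of three bits, composed solely of NANDs."""
    s1 = XOR(a, b)
    s = XOR(s1, cin)
    # carry = (a AND b) OR (s1 AND cin) = NAND(NAND(a, b), NAND(s1, cin))
    cout = NAND(NAND(a, b), NAND(s1, cin))
    return s, cout

def add_n(x1, x2, n):
    """ADD_n: add two n-bit integers by chaining n full adders (ripple carry)."""
    carry, out = 0, 0
    for i in range(n):
        s, carry = full_adder((x1 >> i) & 1, (x2 >> i) & 1, carry)
        out |= s << i
    return out | (carry << n)  # the result has n + 1 bits
```

Each full adder here uses a fixed number of NAND gates, so the whole circuit uses O(n) gates, matching the "small linear amount" of Theorem 12.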
That NAND is universal is of importance for first implementations of fully homomorphic schemes.
Theorem 13 (NAND is universal, [She13]). Let n, m ∈ N and let f : {0,1}^n → {0,1}^m be a Boolean mapping of n bits to m bits. There exists a NAND-circuit computing f in O(m · 2^n) steps.
We will see later how early FHE schemes achieve universality by implementing the bare minimum: evaluating a particular function deemed necessary to work with encrypted inputs, and a subsequent NAND-gate for further compositions.
Chapter 4
Cryptology
4.1 Threat Model
They own your every secret, your life is in their files.
The grains of your every waking second sifted through and scrutinized.
They know your every right. They know your every wrong.
Each put in their due compartment - sins where sins belong.
They know you. They see all. They know all indiscretions.
Compiler of your dreams, your indignations.
Following your every single move.
They know you.— Tomas Haake (The Demon’s Name Is Surveillance, 2012)
A threat model defines the scenario and resources of the most potent adversary trying to break a system or a security design. In cryptology, in stark contrast to security through obscurity, Shannon's 1949 maxim that the enemy will immediately gain full familiarity with a system is held high [Sha49]. It is a reformulation of a much older desideratum that Kerckhoffs established in 1883, in a military tone [Ker83]; the terminology indicates the origin of early cryptologic research and its applications.
Definition 14 (Kerckhoffs' Desideratum). The security of a cryptosystem should not rely on its own secrecy: the system may be stolen by the enemy without causing trouble. Only the (small) key shall be kept secret.
Obviously, it is debatable to actively try to keep a wide-spread, or even publicly-scrutinized, cryptographic algorithm secret, as it might eventually be efficiently recovered from an implementation by reverse-engineering techniques. It is essential to understand that design choices are not necessarily confidential; solely the secret key material, which security ultimately reduces to, needs to be treated carefully over its life-time. The common scenario, assuming no knowledge of the key yet full knowledge about the cryptosystem, as counter-intuitive as it might seem to someone who has not deeply studied the science of secrets, cryptology, is known as the secret-key model. History has taught that cryptographic systems not adhering to Kerckhoffs' principle were broken sooner or later. Systems not coming with a detailed, comprehensive specification along with a security assessment are deemed to provide merely security-by-obscurity.
4.2 Security Definitions
We mainly follow the notation of [KL14] and May's lecture notes as the concepts of one-way function, trapdoor function, and the notion of provable security are briefly discussed in this section. Although cryptographers still look for a rigorous proof, it is conjectured that one-way functions exist, implying many useful cryptographic primitives: pseudorandom generators, pseudorandom functions, and secure private-key encryption [HILL99]:
Definition 15 (One-Way Function). A function f : {0,1}* → {0,1}* is called one-way, if:
1. y = f(x) is easy to compute, i.e. f is efficiently computable by a ppt algorithm A.
2. Conversely, given y = f(x), for any adversary A, the probability, taken over inputs x ←_R {0,1}^n, of finding some pre-image of y is negligible, i.e. formally Pr[f(A(f(x))) = f(x)] < negl(n).
Ideally for cryptographic tasks, there is no ppt algorithm for finding a pre-image x of y under f. Although there is an explicit function which has been demonstrated to be one-way if and only if one-way functions exist at all, the mere existence of one-way functions is still not known.
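A popular candidate one-way function, chosen here by us for illustration and not taken from this thesis, is modular exponentiation x ↦ g^x mod p: the forward direction runs in time polynomial in log p via square-and-multiply, while inverting it is the discrete logarithm problem, for which only super-polynomial classical algorithms are known. The parameters below are toy-sized and insecure by design:

```python
def f(x, g=5, p=2_147_483_647):
    """Forward direction: modular exponentiation, poly(log p) time.

    p = 2^31 - 1 is a Mersenne prime; toy-sized, purely illustrative.
    """
    return pow(g, x, p)

def invert_bruteforce(y, g=5, p=2_147_483_647, bound=1 << 20):
    """Generic inversion: exhaustive search, exponential in the bit length of x."""
    acc = 1                      # acc = g^x mod p for the current exponent x
    for x in range(bound):
        if acc == y:
            return x
        acc = (acc * g) % p
    return None

x = 123_456
y = f(x)                         # computing y is instantaneous
```

At cryptographic sizes (p of 2048 bits, say) the forward direction still costs only a few thousand multiplications, while the search space for inversion becomes astronomically large, which is exactly the easy/hard asymmetry Definition 15 formalizes.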
Definition 16 (Trapdoor Function). A triplet (S, f_s, f_s^{-1}) of algorithms gives a trapdoor function, if:
• (s, t) ← S(1^n) is a pair, drawing a random problem instance s and specifying the trapdoor t,
• f_s : {0,1}^n → {0,1}^n is a one-way function without knowledge of t, at security level λ(n),
• f_s is hard to invert as formalized in Definition 15, Item 2, but provided the additional, secret information t (the trapdoor), there is a ppt algorithm (see Section 3.2) to efficiently compute the inverse x := f_{(s,t)}^{-1}(y) for any y.
Typically this definition is extended to allow randomized ppt algorithms computing the inverse x of y under f, yet only with sufficiently small probability, i.e. some negligible function negl(n).
Definition 17 (Attack on a Cryptosystem). An attack A is an algorithm with defined input and output, interacting with a cryptosystem. Attacks are comparable through the underlying algorithm's complexity, which can be measured in time (number of atomic computation steps), memory (the overall resources that intermediate results take up and that need to be kept in storage), and data (associated resources that need to be available, e.g., the number of ciphertexts or known plaintext-ciphertext pairs).
To derive security statements within a threat model, an attacker A acts with (in-)finite resources.
Definition 18 (Security of a Cryptosystem (Informal)). Regarding the feasibility of the best known attack A, requiring the lowest possible resources as in Definition 17, as a break of a cryptosystem, the system's security is based on a trapdoor function, as in Definition 16, which links A's run-time to the desired security level λ(n) of a theoretical or practical attack, measured in bits.
Definition 19 (Adversary Scenarios (Overview)). Given an encryption resp. decryption function Enc_pk resp. Dec_sk with secret key sk and public key pk (we call the scheme symmetric if sk = pk = k):
1. A Ciphertext-Only Adversary (COA) A has a set of ciphertexts {c_i = Enc_pk(p_i)}_i and wants the p_i.
2. A Known-Plaintext Adversary (KPA) A has a set of pairs (p_1, Enc_pk(p_1)), ..., (p_n, Enc_pk(p_n)).
3. A Chosen-Plaintext Adversary (CPA) A chooses a set of plaintexts p_1, p_2, ..., p_n and obtains Enc_pk(p_1), Enc_pk(p_2), ..., Enc_pk(p_n) through oracle access.
4. A Chosen-Ciphertext Adversary (CCA) A chooses a set of ciphertexts c_1, c_2, ..., c_n and obtains Dec_sk(c_1), Dec_sk(c_2), ..., Dec_sk(c_n) through oracle access.
For the latter two, there are adaptive variants where the adversary interactively chooses oracle queries as the attack progresses. This applies to CPA and CCA, the latter of which is sometimes refined further by splitting oracle access into two stages (CCA1 and CCA2). What all these notions try to capture is that a ciphertext leaks no information beyond, obviously, its length counted in bits.
The passive ciphertext-only adversary model is the weakest form of an honest-but-curious adversary,
who tries to extract information merely by eavesdropping on messages on the channel. The active, and
arguably practically more relevant attacks, combine passive capabilities and try to reveal secrets, not
intended for their pair of eyes, by malicious interaction with the cryptographic communication protocol
or the implementation of the cryptographic system.
Typically this interaction is modeled by a security game, where an adversary of a cryptosystem is faced, e.g., with the task of determining, given a ciphertext and two different plaintexts, which one of them was encrypted to yield the given ciphertext. The two plaintexts are indistinguishable for the adversary in the strongest sense, sometimes denoted as IND-CCA2-secure, if the adversary can do no better than guessing which one of the two it was. We remark that at this point attackers with unlimited computational power are allowed; later we will consider an important metric, the run-times of algorithms.
Definition 20 (IND-CCA2 game). The two players in the CCA2 game protocol are a cryptosystem instance with security parameter 1^n, the challenger, and an attacking party with the following adversarial capabilities:
1. An adversary A gets as input a properly generated encryption key pk ∈ {0,1}^n from the challenger. A makes, at most polynomially many, queries T_1 = poly(n) to the cryptosystem instance and performs intermediate computations T′_1 = poly(n) before signifying the end of the pre-processing stage.
2. Then A sends an (ordered) message pair m_0, m_1 ∈ {0,1}^n, then receives a challenge c ∈ {0,1}^{ℓ(n)} from the challenger, and is asked to identify the index bit b ←_R {0,1} for which c decrypts to m_b. We are assured that c is properly constructed, i.e., c = Enc_pk(m_b).
3. The adversary A can make additional queries T_2, T′_2 = poly(n) before it outputs its guess b′.
4. The adversary wins the game if b = b′. If A is able to do that, with more than a negligible quantity in the security parameter better than mere guessing, the system is insecure.
IND-CCA2-security is classically the strongest security definition a public-key encryption system can fulfill. In fact, FHE cryptosystems, which we study in the next chapters, cannot satisfy that property. The circular security requirement, cf. Assumption 48, of today's FHE cryptosystems, cf. Definition 44, i.e. to publish the bootstrapping key bk (see Theorem 46), violates even the CCA1 security definition, as querying the decryption oracle with the bootstrapping key reveals the secret key to the attacker, resulting in a total break.
Since FHE schemes do not provide full chosen-ciphertext security, we settle for a weaker, chosen-plaintext-like notion for this setting. We thus present the IND-CPA game required in Part II, refining what an adversary knows and modeling the capabilities to interact with a cryptosystem:
Definition 21 (IND-CPA game). The two players in the CPA game protocol are a cryptosystem instance with security parameter 1^n, the challenger, and an attacking party with the following adversarial capabilities:
1. An adversary A gets as input a properly generated encryption key pk ∈ {0,1}^n from the challenger.
2. Then A provides an (ordered) pair of messages (m_0, m_1), m_0, m_1 ∈ {0,1}^n, then receives a challenge c ∈ {0,1}^{ℓ(n)}, and is asked to identify the index bit b ←_R {0,1} for which c decrypts to m_b. (The game assures that c is properly constructed by the challenger, i.e., c = Enc_pk(m_b).)
3. The adversary wins the game if its returned guess b′ matches b. If A is able to do that, with more than a negligible quantity in the security parameter better than mere guessing, the system is insecure.
Definition 22 (Indistinguishability of Ciphertexts under CPA). An encryption scheme (Gen, Enc, Dec) works with ciphertexts indistinguishable under CPA, if the advantage of a ppt adversary A winning the game in Definition 21 is at most negligible over guessing the index bit b:

    Adv_A^{IND-CPA}(1^n) := | 1/2 − Pr[A(1^n) wins] | ≤ negl(n).
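To see why the advantage definition has teeth, the following toy experiment, entirely our own construction and not a scheme from this thesis, plays the game of Definition 21 against a deterministic XOR "cipher". Because encryption is deterministic, the adversary can rebuild the relevant part of the codebook through its oracle access and wins every single run, i.e. with advantage 1/2 instead of negl(n):

```python
import random

def det_enc(key, m):
    """A toy deterministic 'encryption': XOR with a fixed key (insecure by design)."""
    return bytes(a ^ b for a, b in zip(m, key))

def cpa_game(adversary, n=16, rng=random.Random(42)):
    """One run of the IND-CPA game; returns True iff the adversary guesses b."""
    key = bytes(rng.randrange(256) for _ in range(n))
    oracle = lambda m: det_enc(key, m)          # the CPA encryption oracle
    m0, m1 = adversary.choose(oracle)
    b = rng.randrange(2)                        # challenger's secret bit
    challenge = det_enc(key, (m0, m1)[b])
    return adversary.guess(oracle, challenge) == b

class CodebookAdversary:
    """Wins always: deterministic encryption lets it compare oracle outputs."""
    def choose(self, oracle):
        self.m0, self.m1 = b"attack-at-dawn!!", b"retreat-at-dusk!"
        return self.m0, self.m1
    def guess(self, oracle, challenge):
        return 0 if oracle(self.m0) == challenge else 1

wins = sum(cpa_game(CodebookAdversary()) for _ in range(100))
```

The adversary never needs the key: it simply re-encrypts m_0 and compares, which is exactly the codebook attack that randomized encryption (Section 4.3) is designed to prevent.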
4.3 Sources of Entropy
Simple cryptographic attacks reveal that deterministic encryption inherently allows an observer to construct a codebook translating each message to its fixed ciphertext, or codeword. Randomized encryption, on the other hand, is a paradigm which instead produces a plethora of different ciphertexts for any fixed message, each of which decodes to that message (with overwhelming probability).
We will generally assume all encryption and decryption algorithms to be randomized. Hence, generated (pseudo-)randomness must be unpredictable, yet it requires seeds: bits of initial entropy. These seeds often originate from the computer or platform the cryptographic system itself runs on. Possible sources are different sensors that measure the environment and feed into the entropy pool. Sources of entropy for the required cryptographic randomness, with various degrees of reliability, include:
• time stamp when booting a device, Cloud server or starting a program or process,
• content of initially uninitialized memory of a platform,
• unrelated process IDs currently running on the host system,
• hardware input, e. g. from hard drive and network adapter timings, and
• auxiliary sensors' measurements, e.g. temperature, or from cameras and microphones.
4.4 NIST’s Post-Quantum Cryptography Standardization
A few years back, effectively all deployed public-key cryptography was threatened by an emerging technology, quantum computers; this threat can be mitigated by what is known as Post-Quantum Cryptography.
Unfortunately, Post-Quantum Cryptography repeatedly needs to be contrasted with the quite differently natured quantum cryptography, which aims to provide provable security in an information-theoretical sense assuming only the laws of physics. As promising as this line of research seems, all participating end-point devices and network links need to be capable of sending and receiving messages encoded as quantum states, requiring optical fiber infrastructures rather than conventional, wired networks. International standardization efforts [Nat16] were summarized in a Call for Post-Quantum Cryptography by the US-American National Institute of Standards and Technology (NIST). The purpose of the competition is to re-establish confidence in modern cryptographic primitives among the 69 accepted submissions, listed on NIST's website in late 2017. Since then, particularly those schemes are scrutinized by the cryptanalysis community, re-evaluating existing schemes and comparing them with novel proposals. In February 2019 the competition's mailing list displayed more than 700 comments, describing complete breaks of 13 proposals and discussing related security implications.
Even though the standardization process was designed to be a fair comparison and a combined international effort, the US government shutdown of 2018/2019 delayed the competition. Eventually, the announcement of 26 short-listed schemes, 17 public-key encryption and key-establishment candidates and 9 digital signatures, yet to be more closely evaluated in the second round, was published. A portfolio of final recommendations by NIST has since been awaited by followers of the discussion.1
1https://groups.google.com/a/list.nist.gov/forum/#!topic/pqc-forum/bBxcfFFUsxE
Chapter 5
Quantum Computing
To read our E-mail, how mean
of the spies and their quantum machine;
Be comforted though,
they do not yet know
how to factorize
twelve or fifteen1. —Volker Strassen (1998)
Since the invention of digital electronic devices for computing tasks [Wyn31], improvements were largely achieved due to miniaturization of the computer's components. Naturally, a considerable mathematical effort and the theoretic evolution of computer science predate these engineering feats. Algorithmic advancements made possible scenarios unthinkable before. In recent years, computing performance gains are mainly accomplished by deploying parallelism, as physical boundaries are almost hit. In [Wil11, Ch. 1.1] Williams extrapolates the trend in miniaturization and claims that the sonic barrier, one atom per bit of storage, will be reached around the year 2020. Soon it might not be possible to further push development by fine-tuning computer chips because of these physical limitations and the implied quantum effects appearing at this scale.
Classical physics is a model explaining macroscopic observations well, yet it is seemingly not appropriate to describe small-scale phenomena, which are the domain of quantum physics. New computer architectures use quantum effects in a new, beneficial way to carry out computations, rather than merely compensating for their derogatory implications for classical, digital electronic computers.
5.1 Quantum Bits
The fundamental difference between classical information and quantum information is their basic unit: the quantum bit (or qubit) is not restricted to merely two states, say 0 and 1, like the classical bit, but is a vector of length 1 in a 2-dimensional complex vector space. The theoretical model of a qubit is a generalization of the bit, based on systems with two distinguishable states that do not change uncontrollably. In order to read the value of a qubit, the quantum system needs to be measured, an irreversible step according to the laws of quantum mechanics. Therein lies the main difference between the macroscopic storage and manipulation of classical bits and the microscopic implementation of a qubit system.
Let |0〉 and |1〉 denote the two basis states of a fixed 2-valued quantum system in so-called Bra-ket or Dirac notation. The laws of quantum mechanics tell us that a qubit can be in more than just one of the two basis states |0〉 and |1〉: it can be simultaneously in each of them, yet each with a certain probability only. Upon measurement, a qubit collapses to precisely one of those two distinct states.
1 In fact, 15 long set the record for numbers factored using Shor's algorithm. Today it is 21 = 7 · 3 [MLL+12].
Formally, a qubit can be in a superposition of the basis states, say

    |ψ〉 = a_0 |0〉 + a_1 |1〉,    |a_0|^2 + |a_1|^2 = 1,

with complex coefficients a_0, a_1 ∈ C. Measurement of the qubit |ψ〉 cannot recover the coefficients a_0, a_1; it yields the state |0〉 with probability |a_0|^2 and hence |1〉 with probability |a_1|^2 = 1 − |a_0|^2. Mathematically, the canonical or computational basis is |0〉 = (1, 0)^T, |1〉 = (0, 1)^T, extendable to n-qubit quantum systems.
Let |0〉, |1〉, ..., |2^n − 1〉 denote the basis states of an n-qubit quantum system. A state can be represented by a vector in C^2 ⊗ C^2 ⊗ ... ⊗ C^2 (n times) = C^N, with N = 2^n.
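The single-qubit formalism above is easy to simulate classically. The sketch below, our own illustration with arbitrary names, stores |ψ〉 as a list of complex amplitudes and simulates a measurement by sampling basis state i with probability |a_i|^2, the Born rule:

```python
import math
import random

def measure(state, rng=random.Random(7)):
    """Sample a basis-state index with probability |amplitude|^2 (Born rule)."""
    probs = [abs(a) ** 2 for a in state]
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(state) - 1  # guard against floating-point round-off

# |psi> = (|0> + |1>)/sqrt(2): the equal superposition of the basis states
psi = [1 / math.sqrt(2), 1 / math.sqrt(2)]

counts = [0, 0]
for _ in range(10_000):
    counts[measure(psi)] += 1
```

Roughly half of the simulated measurements collapse to each basis state, while no sequence of measurements ever reveals the amplitudes a_0, a_1 themselves, mirroring the irreversibility discussed above.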
5.2 Quantum Computer and Quantum Algorithms
Physical realization of a universal quantum computer seems hard, yet first designs exist in lab environments, resulting from massive research efforts. A practical, universally functional quantum computer might exist in the not-so-distant future.
We briefly introduce two particularly noteworthy quantum algorithms: Grover's search algorithm, with an application to solving the subset-product problem in Part III, and Shor's integer factorization algorithm, which is a justification to move from the RSA cryptosystem, as discussed in Part I, towards lattice-based cryptosystems in Part II.
5.2.1 Grover’s Algorithm
Grover’s quantum searching technique is like cooking a souffle. You put the state obtained
by quantum parallelism in a “quantum oven” and let the desired answer rise slowly. Success
is almost guaranteed if you open the oven at just the right time. But the souffle is very
likely to fall – the amplitude of the correct answer drops to zero – if you open the oven too
early.—Kristen Fuchs (1997)
The significance of Grover's algorithm for this thesis lies in its applicability to the NP-complete SSP. Any NP-complete search problem instance over a space of N = 2^n elements can be reformulated as finding one particular input x = w of a function f, given as an oracle f : {0,1}^n → {0,1} with f(w) := 1 and f(x) := 0 for x ≠ w, where the search space has a priori unknown structure. In [BBBV97] general unstructured search problems are studied from a computational-complexity-theoretic viewpoint, and the lower bound, O(√N) run-time and O(log N) storage, achievable by a quantum computer, is proved. It is assumed that a call to the oracle encoding f takes polynomial time, i.e. run-time O(log^k N), for some k ∈ N. Grover's algorithm hence yields a quadratic speed-up in the generic case over classical exhaustive-search techniques with O(N) run-time and O(1) storage. The primordial application of Grover's algorithm to cryptography is simply searching for the secret key of a cryptosystem protecting sensitive data of interest. For symmetric primitives, this attack essentially forces doubling the secret's length in order to remain at the same level of security once a sufficiently powerful quantum computer architecture is ready. For asymmetric cryptography, quantum algorithms can mean a total break, as in the following Section 5.2.2, or provide generic quadratic speed-ups, as explored in Section 11.5.
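The amplitude dynamics of Grover's algorithm, the sign flip applied by the oracle followed by the "inversion about the mean" diffusion step, can be simulated classically at the level of the 2^n-dimensional state vector. This costs exponential memory, so the sketch below (our own illustrative choice of parameters and names) is conceptual only:

```python
import math

def grover(n, target):
    """Simulate Grover amplitudes over N = 2**n unstructured items."""
    N = 2 ** n
    iterations = int(round(math.pi / 4 * math.sqrt(N)))  # ~ (pi/4) sqrt(N)
    amp = [1 / math.sqrt(N)] * N       # uniform superposition over all items
    for _ in range(iterations):
        amp[target] = -amp[target]     # oracle: flip the sign of the marked state
        mean = sum(amp) / N
        amp = [2 * mean - a for a in amp]  # diffusion: inversion about the mean
    return [a * a for a in amp]        # measurement probabilities

probs = grover(n=8, target=42)         # N = 256 items, about 13 iterations
```

After roughly (π/4)√N iterations, almost all probability mass sits on the marked index; iterating further makes it fall again, matching the soufflé metaphor quoted above.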
5.2.2 Shor’s Algorithm
In this section we proceed with introducing one of the most famous quantum algorithms to date. In 1994, the first algorithm necessitating a quantum computer, consisting of a classical and a quantum
part, and of tremendous significance to cryptology, was formulated, and subsequently published in enhanced form [Sho99].
Let N = pq be a composite integer, a bi-prime with primes p, q. A quantum computer with a quantum register of size log Q ∈ O(log N) can deploy Shor's algorithm to efficiently factorize the modulus N, hence break the RSA cryptosystem, in O((log N)^3) quantum gate operations. The best known classical algorithms to solve the factorization problem, Coppersmith's modifications to the number field sieve [Cop93], have run-time sub-exponential in N, or super-polynomial in n := log N; compared to Shor's quantum algorithm, we have:

    O(e^{(C + o(1)) n^{1/3} (log n)^{2/3}})  --Shor-->  O(n^3).
In 2012, the record for the largest number factored using Shor’s algorithm on a quantum computer
was N := 21 = 3 · 7, which is still the largest reported success. Clearly, this is not actually threatening
deployed RSA or CRT-RSA cryptosystems with keys of typically several thousand bits length.
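The interplay of the two parts can be illustrated by the classical post-processing alone, with the quantum order-finding subroutine replaced by a brute-force loop (which is exactly the exponential step the quantum Fourier transform accelerates); function names are illustrative:

```python
from math import gcd

def multiplicative_order(a, N):
    """Order r of a mod N, found by brute force; Shor's quantum subroutine
    computes r efficiently via the quantum Fourier transform."""
    r, x = 1, a % N
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

def shor_classical_part(N, a):
    """Classical post-processing: turn the order r of a mod N into factors."""
    g = gcd(a, N)
    if g != 1:
        return g, N // g        # a lucky guess already reveals a factor
    r = multiplicative_order(a, N)
    if r % 2 == 1:
        return None             # odd order: retry with another base a
    y = pow(a, r // 2, N)
    if y == N - 1:
        return None             # trivial square root of 1: retry
    return gcd(y - 1, N), gcd(y + 1, N)

# Factoring the record value N = 21 with base a = 2: the order of 2 mod 21 is
# r = 6, and gcd(2^3 - 1, 21), gcd(2^3 + 1, 21) recover the factors 7 and 3.
```

For N = 21 and a = 2 the routine returns the pair of prime factors {3, 7}; on cryptographic moduli the order-finding loop is infeasible classically, which is the whole point.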
5.2.3 Provable Security and the One-Time Pad
Already in 1949, Shannon [Sha49] pioneered the field of information theory and defined the term
perfect secrecy. He concluded that perfect secrecy of a message m ∈ {0,1}^ℓ can only be achieved using a
key k ∈ {0,1}^L of (at least) equal length, ℓ ≤ L ∈ N. In the provably-secure private-key cryptosystem
known as the one-time pad (OTP), each bit m_i of the message m is added mod 2 (or XORed) with
the key bit k_i of k, which is assumed truly random and used one time only. The OTP satisfies perfect secrecy in the
information-theoretic sense, yet it is hard to use in practice: generating truly random bits is difficult (cf.
Section 4.3), the key distribution already requires a secure channel between the two communicating parties,
and, perhaps surprisingly, a flawless implementation of the cryptosystem is not so straightforward either.
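The scheme itself is a one-liner; a minimal sketch in Python, assuming nothing beyond the standard library (the function names are ours):

```python
import secrets

def otp_encrypt(message: bytes, key: bytes) -> bytes:
    """One-time pad: XOR each message byte with the corresponding key byte."""
    assert len(key) >= len(message), "perfect secrecy needs len(key) >= len(message)"
    return bytes(m ^ k for m, k in zip(message, key))

# Decryption is the identical XOR operation, since (m XOR k) XOR k = m.
otp_decrypt = otp_encrypt

message = b"attack at dawn"
key = secrets.token_bytes(len(message))   # uniformly random, used only once
ciphertext = otp_encrypt(message, key)
assert otp_decrypt(ciphertext, key) == message
```

All the practical difficulty hides in the two assumptions encoded here: `secrets.token_bytes` must be truly random, and `key` must never be reused or transmitted insecurely.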
Chapter 6
Homomorphic encryption (HE)
6.1 Definitions and Examples of Homomorphic encryption (HE)
6.1.1 The RSA Cryptosystem and the Factorization Problem
The famous RSA public-key cryptosystem, named after Rivest, Shamir and Adleman, is renowned
even in a broader community, and is based on the easily explained integer factorization problem: decomposing
an integer N = pq, the product of two big unknown prime numbers p, q, is hard, whereas multiplying
them is easy.
Definition 23 (RSA Cryptosystem). Let n denote a security parameter.
RSA.Gen Generates the public key pk = (N, e) and the private key part sk = d, where the following
relations hold:
– N = p · q with prime numbers p, q ∈ P ⊆ N of equal length such that |N| = n,

– e ∈ N co-prime to ϕ(N) := (p − 1)(q − 1), i.e., gcd(e, ϕ(N)) = 1,

– d ∈ N such that e · d ≡ 1 mod ϕ(N), i.e., d := e^{−1} mod ϕ(N), s.t. m = m^{e·d} mod N.

RSA.Enc Encryption computes c := m^e mod N.

RSA.Dec Decryption using d: m = c^d = m^{e·d} mod N.
We remark that although we present the version of the RSA cryptosystem with ϕ, the product
ϕ(N) := (p − 1)(q − 1) can be replaced by the least common multiple λ(N) := lcm(p − 1, q − 1) ≤ ϕ(N)
to save computational effort. The public key requires an integer e ≥ 2 with gcd(e, ϕ(N)) = 1 together
with the modulus N. The private key is the factorization of N, given by the prime numbers p, q,
and the multiplicative inverse of e, which is a positive integer d such that ed ≡ 1 mod ϕ(N) holds.
Using the factors p and q it is possible to calculate d with ease.
Given a plaintext encoded as a number m with 2 ≤ m ≤ N − 2, the encrypted plaintext is obtained
by computing c := m^e mod N. Receiving 1 ≤ c ≤ N − 1, the owner of the private key can calculate
c^d mod N, which of course equals m = m^{ed} = (m^e)^d mod N, the original message.
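The scheme of Definition 23 can be sketched with textbook-sized toy parameters (far too small to be secure, purely illustrative); the final assertion also previews the multiplicative malleability of RSA mentioned later:

```python
# Toy RSA with artificially small primes; real keys use thousands of bits.
p, q = 61, 53
N = p * q                      # modulus N = 3233
phi = (p - 1) * (q - 1)        # ϕ(N) = (p-1)(q-1) = 3120
e = 17                         # public exponent with gcd(e, ϕ(N)) = 1
d = pow(e, -1, phi)            # private exponent d = e^(-1) mod ϕ(N) (Python 3.8+)

def rsa_enc(m): return pow(m, e, N)
def rsa_dec(c): return pow(c, d, N)

m1, m2 = 123, 19
assert rsa_dec(rsa_enc(m1)) == m1
# Multiplicative homomorphism: Enc(m1) * Enc(m2) mod N = Enc(m1 * m2 mod N).
assert rsa_dec(rsa_enc(m1) * rsa_enc(m2) % N) == (m1 * m2) % N
```

The last line is exactly the (partially) homomorphic property of textbook RSA: products of ciphertexts decrypt to products of plaintexts, with no key involved in the operation itself.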
Although studying the secure choices of p, q, e themselves, or the algorithms for computing the
occurring modular powers efficiently, is an interesting topic, we refer to the huge
amount of literature on this important branch of cryptography [Bon99]. In Chapter 5, however, we
saw the weakness of RSA under the assumption of a powerful quantum computer: the speed-up
that can be achieved compared to the best classical algorithms ultimately justifies the move to
cryptosystems based on quantum-safe assumptions.
6.1.2 Paillier Cryptosystem
Paillier's public-key encryption scheme [Pai99] is a triplet of algorithms, and its CPA security is based
on the composite residuosity assumption: given an element, does it possess an N-th root modulo N², for
composite N? Using the following definitions we demonstrate how we identify the malleability property
of the system as a homomorphism between the groups (Z_N, +) → (Z_{N²}, ·).
Definition 24 (Paillier Cryptosystem). Let n denote a security parameter.
Paillier.Gen Generates the public key pk = (N, g) and the private key part sk = (λ, µ), where the
following relations hold:
– N = p · q with prime numbers p, q ∈ P ⊆ N of equal length n,

– g ∈ Z*_{N²} co-prime to N, for instance, g = N + 1,

– (λ, µ) such that λ = ϕ(N), µ = ϕ(N)^{−1} mod N.

Paillier.Enc Encryption computes c := g^m · r^N mod N², for random co-prime r.

Paillier.Dec Decryption using (λ, µ): m := ((c^λ mod N²) − 1)/N · µ mod N.
For messages 0 ≤ m_2 ≤ m_1 < N/2, we have

m_1 ± m_2 mod N = Dec_sk(Enc_pk(m_1) · Enc_pk(m_2)^{±1} mod N²) ∈ Z_N.

In case of the difference, the modular inverse Enc_pk(m_2)^{−1} ∈ Z*_{N²} is efficiently computed using the
extended Euclidean algorithm.
We remark that the cryptosystem can easily be generalized to the case 0 ≤ m_1, m_2 < N, e.g.
with unknown sign of the difference m_1 − m_2, or even generalized to the Damgard-Jurik cryptosystem.
For the purpose of introducing homomorphic encryption we omit an overly detailed discussion. Given
two admissible plaintexts, we first compute the ciphertexts under the public key, c_0 := Enc_pk(m_0), c_1 :=
Enc_pk(m_1) ∈ Z_{N²}. We then verify that an encryption of the sum m_0 + m_1 mod N can be computed
without any key, simply by computing the product Enc_pk(m_0) · Enc_pk(m_1) ∈ Z_{N²}:
c_0 · c_1 = (g^{m_0} · r_0^N) · (g^{m_1} · r_1^N) mod N²
         = g^{m_0+m_1} · (r_0 · r_1)^N mod N²
         = Enc_pk(m_0 + m_1) ∈ Z_{N²}.
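A toy sketch of Definition 24 with tiny primes (insecure, purely illustrative) that verifies this additive homomorphism; requires Python 3.8+ for the modular inverse via `pow`:

```python
import random
from math import gcd

def paillier_keygen(p, q):
    """Toy key generation following Definition 24, with tiny primes."""
    N = p * q
    lam = (p - 1) * (q - 1)        # λ = ϕ(N)
    mu = pow(lam, -1, N)           # µ = ϕ(N)^(-1) mod N
    g = N + 1                      # a valid generator choice co-prime to N
    return (N, g), (lam, mu)

def paillier_enc(pk, m):
    N, g = pk
    r = random.randrange(1, N)
    while gcd(r, N) != 1:          # the randomizer r must be co-prime to N
        r = random.randrange(1, N)
    return pow(g, m, N * N) * pow(r, N, N * N) % (N * N)

def paillier_dec(pk, sk, c):
    N, _ = pk
    lam, mu = sk
    # m = L(c^λ mod N²) · µ mod N  with  L(x) = (x - 1) / N
    return (pow(c, lam, N * N) - 1) // N * mu % N

pk, sk = paillier_keygen(11, 13)   # N = 143
c0, c1 = paillier_enc(pk, 40), paillier_enc(pk, 50)
# Additive homomorphism: multiplying ciphertexts adds the plaintexts mod N.
assert paillier_dec(pk, sk, c0 * c1 % (143 * 143)) == 90
```

Note that the randomizers r_0, r_1 make the two ciphertexts of equal plaintexts differ, yet the product still decrypts to the sum, exactly as in the derivation above.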
Obtaining homomorphisms in two operations simultaneously, a more complicated task, will be developed in what follows.
Chapter 7
Cloud Computing
7.1 Cloud Computing: Promises, NSA, Chances, Markets
Thanks to classified documents of the US-American National Security Agency (NSA), made public in a
perilous endeavour (or leaked) by Edward Snowden in 2013, light was shed on how data privately stored
at Microsoft's SkyDrive cloud service gave in to the pressure of nation-state adversaries. Another example,
in which data unprotected by cryptography and formerly meant for internal use only was passed
on at scale, is the NSA's top-secret Prism program [GMP+13]; the agency's reaction is sketched in Figure 7.1.
Fig. 7.1: NSA performing a 'Happy Dance!!' [sic] when accessing private data circumventing encryption.
Regulatory frameworks for asymmetric information monopolies
are being re-thought [And01, And14] and ideally cast into legislation.
Preceding the European General Data Protection Regulation
(GDPR), which now serves as a paragon internationally, a data-protection
regulation time-line can be reconstructed. The Hessian Data Protection Regulation [Hes] of 1970 is arguably
the first privacy-protection law with respect to the digital sphere
globally. In 1986, an overhauled second version, regulating data processing
by public authorities in Germany, was extended to apply
to companies operating on the free market. It served
as blueprint for the European Data Protection Directive (Directive
95/46/EC), a law for natural persons' data rights, in 1995. Adopted
in 2016, this legislation superseded the Directive as the GDPR, which is enforceable
European-wide since mid-2018, an important step towards the
recognition of the digital right to privacy for everybody. Companies
and organizations not adhering to the Privacy-by-Design concept
can be fined up to 4 % of their annual global turnover for breaching
the GDPR, proportional to the severity of the infringements. Data-processing entities in the Cloud are
compelled to take security and privacy issues and cryptographic implementations seriously, as cases
of negligence might be prosecuted. The Cloud hardware, unlike hand-held Internet of Things
devices, is off-site, anywhere on the globe, but under the control of service providers, who potentially
modify content, eavesdrop on communication and computations, or even maliciously tamper with data.
Despite the potential threats stemming from this power of large data companies such as Amazon to date,
the Cloud business is still a market with rising potential. Abstracted interfaces make it convenient
to use, with early, wide-spread e-Mail providers and a plethora of further examples of this technology
sketched in Figure 7.2. Several security requirements arise in the scenario of delegated data storage
and data-processing. Sharing data confidentially over the Internet requires an enforced access-control
policy, with customized definitions at upload time of which users may retrieve which data.
Data stored in the decentralized Cloud preferably remains illegible to anyone not explicitly autho-
rized, both at rest and during subsequent computations. Even though files that were encrypted under
Fig. 7.2: Overview of today’s ubiquitous cloud computing services [Joh09]
common symmetric or hybrid encryption schemes before upload ensure confidentiality, as the secret key
stays in the user's domain, more sophisticated actions and computations other than merely accessing the
data are not readily possible. A laborious, undesirable solution requires downloading the full data set,
decrypting it using the secret key, computing the desired function, and re-encrypting and re-uploading the whole
lot. The utility of out-sourced data is thus limited unless more powerful cryptographic primitives are deployed.
In a scenario where far-reaching decisions depend on inferences over aggregated data, verification,
meaning issuing cryptographic proofs of correctness, is of great importance: it prevents service
providers from lazily returning an arbitrary value, or maliciously altering the actual result to serve their own interests.
Finally, privacy-preserving computation protects anything deducible about the input from a computation's output,
beyond trivial or public observations. It concerns hiding the data-access patterns of queries and
the overall ratio of useful data transmitted between the user and the Cloud.
In the following chapter, we describe the theoretical and practical constructions leading to Fully Homomorphic
Encryption (FHE) schemes, and explain why FHE is a tool of extraordinary relevance to Cloud
computing. Applying FHE to practical scenarios, such as evaluating a neural network on encrypted
inputs, is a seemingly even more challenging task. Less of a client-side technology due to its high
computation and data-storage requirements, Machine Learning as a Service (MLaaS) promises versatile
utility in the Cloud setting. Moreover, time-consuming and resource-heavy processes,
such as training a cognitive model on large databases, can be delegated. Companies often consider their
fine-tuned prediction algorithms, e.g. condensed into a trained neural network, as intellectual property
which they are unwilling to share for competitive reasons. Proprietary tools are typically Cloud-based,
with an interface to most conveniently access the machine-learning algorithms, built from medical,
genomic, financial or other sensitive data. Neural networks are trained on observation/label pairs to
solve classification problems, abstracting observations into categories and assigning the most-likely class
to new samples. A class recovered from the input data potentially amounts to recognizing a particular
medical condition, such as a diagnosis based on a database of medical observations.
7.2 Hardware Solution: Secure Computing Enclaves
“Arguing that you don’t care about the right to privacy because you have nothing to hide
is no different than saying you don’t care about free speech because you have nothing to
say.”—Edward Snowden (2013)
For a long time it was unthinkable to delegate secure, private, and trustworthy computing to
the Cloud. While FHE can be seen as a software enclave that relies on neither tamper-protected nor
trusted hardware, Intel's Software Guard Extensions (SGX) promise a hardware solution. The goal is
large-scale Homomorphic Encryption (HE) by offering an assumedly trusted enclave at the Cloud
provider, with which one communicates in an authenticated and secure manner. This enables delegated operations f on the data,
yet the approach necessitates private keys residing in the Cloud, resulting in a weaker threat model.
7.3 Software Solution: FHE and FHE–DiNN
Cryptography rearranges power: It determines who can gather which data, and how such
data can be processed.—Philip Rogaway (distinguished lecture at Asiacrypt 2015)
Traditionally, apart from leaving the data unprotected in the first place, there were two unsatisfying solutions
for performing useful delegated computation, i.e. evaluating a function f on encrypted data: either
it was necessary to entrust the service provider with a key for on-site decryption, or one deployed the inefficient
approach of downloading, decrypting, processing, re-encrypting, and re-uploading the data; otherwise it was simply
deemed impossible.
This thesis will introduce a new and powerful FHE framework [BMMP18] that enables Cloud services
to offer privacy-preserving predictions on users' uploaded input data, mitigating concerns about
their privacy. We showcase how to efficiently solve server-side machine-learning tasks by computing
the prediction of a neural network on encrypted inputs, almost reaching the model's clear-text accuracy.
After supervised training on plaintext data, yet-unseen but likewise distributed input data is
encrypted, evaluated, and returned as an encrypted label, thus classifying the data homomorphically. Obviously,
the confidentiality goal is reached, as the private key under which the data is encrypted resides
in the legitimate owner's controlled domain, and the owner alone can decrypt the result. We address delegated
computation privately performed in the Cloud threat model, as in Figure 7.3, where an observer only
learns insubstantial information on top of what is needed for billing the user requesting a classification.
7.3.1 Limitations
We do not consider the problem of privacy-preserving data-mining, when neural networks are trained
on encrypted data as addressed in [AS00]. We assume an already trained neural network in the clear
and focus on the evaluation, or inference phase.
A related concern of service providers is that malicious users might try to recreate the network,
deemed intellectual property, by sending maliciously-formed requests in order to extract information.
We do not explore protecting against learning the neural network itself, when the information
encoded in the weights is considered sensitive. Statistical-database techniques in the training phase, as discussed in the
differential-privacy literature [Dwo06], can be used to mitigate such issues, as can enforcing an upper bound
on user requests.
Fig. 7.3: Four main challenges for FHE-based systems deployed in the Cloud, in a setting where a cloud-service provider (CSP) offers Fully Homomorphic Encrypted (FHE) data storage and computation services and a malicious attacker may sit inside or outside the CSP's network: construction (ideal lattice-based, integer-based, (R)LWE-based), optimization (shorter ciphertexts, key-generation speed, reduced dimensions), implementation (message encoding, batch mode, SIMD), and research (multi-key FHE, few bootstrappings, better performance).
Chapter 8
Mathematical Foundations of FHE
In order to describe the theory of Fully Homomorphic Encryption (FHE) from scratch, we provide the
most important notions and the mathematical background adapted to our object of study. We start
with a definition from the mathematical discipline of abstract algebra:
8.1 Basic Concepts from Algebra and Probability Theory
Definition 25 (Abelian Group). Let G be a finite set with a binary operation ◦ (often '+'):

◦ : G × G → G, (a, b) ↦ a ◦ b.

(G, ◦) is an Abelian group :⇔ ∀a, b, c ∈ G:

1. a ◦ (b ◦ c) = (a ◦ b) ◦ c ∈ G, associativity;

2. a ◦ b = b ◦ a ∈ G, commutativity;

3. ∃ 1_G ∈ G : a ◦ 1_G = 1_G ◦ a = a, for a neutral group element 1_G;

4. ∃ a^{−1} ∈ G : a ◦ a^{−1} = 1_G, an inverse exists within the group.
Next, we need to define an algebraic structure with two operations.
Definition 26 (Ring). Let R be a finite set with two binary operations (often ′+, ·′):
1. (R,+) is an additive Abelian group
2. (R, ·) is a monoid, i. e. ∀a, b, c ∈ R:
• a · (b · c) = (a · b) · c ∈ R, associativity;
• ∃1R ∈ R : a · 1R = 1R · a = a, for a neutral element 1R, the multiplicative unit;
3. · is (left- and right-) distributive over +: a·(b+c) = (a·b)+(a·c) ∈ R, (b+c)·a = (b·a)+(c·a) ∈ R.
This already lets us define fundamental structures underneath FHE. Finally, we look at R-modules
and their external operation, as we also want to study mappings between these structures.
Definition 27 (Ring Homomorphism). Let (R, +, ·) and (S, ⊕, ⊙) be two rings. A function f : R → S
is a ring homomorphism :⇔

• it preserves addition, i.e. f(a + b) = f(a) ⊕ f(b) ∈ S, ∀a, b ∈ R,

• it preserves multiplication, i.e. f(a · b) = f(a) ⊙ f(b) ∈ S, ∀a, b ∈ R,
Fig. 8.1: Sum x_+ = x_1 + x_2 and product x_* = x_1 · x_2 under a ring homomorphism f : (R, +, ·) → (S, ⊕, ⊙), mapping x_1, x_2, x_+, x_* ∈ X to f(x_1), f(x_2), f(x_+), f(x_*) ∈ Y; elements x ∈ X map into Y (x ∈ X ⇒ f(x) ∈ Y), while x ∉ X ⇒ f(x) ∉ Y.
• it relabels the multiplicative unit, i.e. f(1_R) = 1_S ∈ S.
A homomorphism f, according to Definition 27, is a structure-preserving transformation between
two algebraic structures, depicted in Figure 8.1, with f(x_+) = f(x_1) ⊕ f(x_2) ∈ S and f(x_*) =
f(x_1) ⊙ f(x_2) ∈ S. Although homomorphisms might leverage cryptanalytic attacks in some contexts,
e.g. exploiting the malleability of RSA, in the following they permit crafting cryptosystems that perform
arithmetic operations on encrypted data without decryption.
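A canonical concrete instance is the reduction map Z → Z_n, which can be checked exhaustively on a small range; the choice n = 12 below is arbitrary:

```python
# The reduction map f : Z -> Z_12, f(a) = a mod 12, is a ring homomorphism:
# it preserves addition, multiplication, and the multiplicative unit.
n = 12
f = lambda a: a % n

for a in range(-20, 21):
    for b in range(-20, 21):
        assert f(a + b) == (f(a) + f(b)) % n   # preserves addition
        assert f(a * b) == (f(a) * f(b)) % n   # preserves multiplication
assert f(1) == 1                               # maps 1_R to 1_S
```

This is exactly the shape of computation an encryption scheme must preserve if adding or multiplying ciphertexts is to correspond to adding or multiplying plaintexts.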
Definition 28 (R-Module). Let (R, +, ·) be a commutative ring. M is an R-module :⇔ (M, +)
is an Abelian group and the following mapping fulfills, ∀r, r_1, r_2 ∈ R and ∀m, m_1, m_2 ∈ M:

• : R × M → M, (r, m) ↦ r • m

1. 1_R • m = m ∈ M,

2. (r_1 · r_2) • m = r_1 • (r_2 • m) ∈ M (Associativity),

3. (r_1 + r_2) • m = r_1 • m + r_2 • m ∈ M (Distributivity),

4. r • (m_1 + m_2) = r • m_1 + r • m_2 ∈ M (Distributivity).
In Section 9.2.2, we deal with a module defined over the one-dimensional real torus, called TLWE
in the context of the implementation TFHE, for basic operations of an FHE scheme. First, we need to
formally introduce another preliminary ingredient: lattices, discrete sub-groups of R^n.
Definition 29 (q-ary Lattice). We can define a q-ary lattice explicitly as a set of vectors

L_q(A) := { y ∈ Z^n : y = f(z) := Σ_{i=1}^{m} z_i · a_i = Az mod q, z_i ∈ Z }   (8.1)
        = Im_f(A) ⊆ Z^n,   (8.2)

or see it as the image of the linear map f with matrix A ∈ Z_q^{n×m}, with columns a_i, applied to integer vectors z ∈ Z^m.
For m > n ∈ N and a rank-n matrix, the kernel of A under this map forms a q-ary lattice, too.
The scaled dual of L := L_q(A) is hence defined by

ker f = L⊥_q(A) = { x ∈ Z^m : Ax = 0 mod q } ⊆ Z^m,

and the algebraic dual of L is denoted as L* = { y ∈ Span(L) : ⟨x, y⟩ ∈ Z, ∀x ∈ L }. Its volume, a
basis-independent lattice invariant, is the non-negative quantity vol(L(A)) := |det(A)|.
8.2 Lattice Problems for Cryptography
The survey on lattices for cryptography [Pei16] identifies five suitable problems.

Definition 30 (NP-hard lattice problems). Let L = L(A) ⊆ R^n denote a lattice, specified by integral
linear combinations of the rows of the basis matrix A. We specify five exemplary hard problems:
• Closest Vector Problem (CVP). Given target t ∈ R^n, the task is to identify a lattice vector
v ∈ L closest to the target with respect to the Euclidean norm ‖·‖_2 on the normed space R^n,
v = argmin_{v′∈L} ‖t − v′‖_2. Figure 8.2 demonstrates two famous algorithmic solutions when given
targets and more (resp. less) orthogonal lattice bases of good (resp. bad) quality.
• Bounded Distance Decoding (BDD) problem. In this variant of CVP, we are promised that
‖t − v‖_2 < ε is upper-bounded with ε ≪ λ_1(L), the first successive minimum λ_1, which makes the
solution v unique, as λ_1(L) := min_{v≠0} ‖v‖_2.
• Shortest Vector Problem (SVP). Given a lattice L = L(A) ⊆ R^n, find a shortest vector v ∈ L,
i.e. with distance exactly ‖v‖_2 = λ_1(L) from the origin.
• approximate Shortest Vector Problem (appSVPγ). In this approximation variant of SVP, we
want to find an approximate shortest vector v ∈ L, which satisfies the relaxed distance condition
‖v‖2 ≤ γ(n) · λ1(L) that allows adjusting hardness by e. g. setting γ(n) = poly(n) [ADRS15].
• unique SVP (uSVPγ). In this approximation variant of SVP, we are promised that a unique shortest
vector exists (up to its sign), i.e. λ_2, the second successive minimum, is bounded from below by γ(n) · λ_1.
The closer the lattice gap γ(n) ≥ 1 is to 1, the harder it generally is to find a suitable vector; [LM09]
give hardness reductions between the important BDD, uSVPγ, and appSVPγ problems.
To formalize the two suitable lattice-based problems, from Definition 2 and Definition 3, as a basis
for cryptographic constructions, we first introduce useful probability distributions to model the noise
which we need and discuss as part of the phase, cf. Definition 53.
8.2.1 Discrete Gaussian distribution on Lattices
To most conveniently define the discrete Gaussian probability distribution, for our purposes, one does
not choose the mean, a vector v, and the standard deviation σ, but uses the parameter s, the so-called
scaled standard deviation, which asymptotically satisfies σ = s/√(2π) + o(s) as s → ∞.
Then, for real s > 0, we define the Gaussian function ρ_s(v) := exp(−π‖v‖²/s²) with width s. Using
the normalization factor ρ_s(V) = Σ_{v∈V} ρ_s(v), it serves as probability density function over countable
sets V ⊂ R^n; especially when V = L ⊂ Z^n is a lattice, we set

D_{L,s}(v) = ρ_s(v)/ρ_s(L) = e^{−π‖v‖²/s²} / Σ_{w∈L} ρ_s(w).   (8.3)
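For L = Z, equation (8.3) can be turned into a minimal sampler by truncating the support where the mass is negligible; the width s = 3.2 and the truncation at ten widths are illustrative choices of ours, not parameters from the thesis:

```python
import math
import random

def discrete_gaussian_pmf(s, tail=10):
    """Probability mass of D_{Z,s} on the truncated support [-tail*s, tail*s].

    rho_s(x) = exp(-pi * x^2 / s^2); beyond ~10 widths the remaining mass is
    negligible, cf. the tail bound exp(-Theta(y^2/s^2)) below."""
    bound = int(math.ceil(tail * s))
    support = range(-bound, bound + 1)
    weights = [math.exp(-math.pi * x * x / (s * s)) for x in support]
    total = sum(weights)                   # truncated normalization rho_s(Z)
    return {x: w / total for x, w in zip(support, weights)}

def sample_discrete_gaussian(s):
    """Draw one sample x ~ D_{Z,s} by table lookup over the truncated pmf."""
    pmf = discrete_gaussian_pmf(s)
    xs, ps = zip(*pmf.items())
    return random.choices(xs, weights=ps)[0]

pmf = discrete_gaussian_pmf(s=3.2)
assert abs(sum(pmf.values()) - 1.0) < 1e-9   # normalized
assert pmf[2] == pmf[-2]                     # symmetric around 0
```

Rejection- and table-based samplers like this are simple but slow; the performance concern raised next is precisely why one asks when the distribution may be swapped for a cheaper one.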
Fig. 8.2: NearestPlane(s) algorithms on good (left) vs. bad (right) bases of the lattice L.
(a) A good basis is preferable to solve the CVP in a lattice L with target point t (red) using Babai's NearestPlane algorithm in 2 dimensions [Bab85]. Here, recursively calling it intermediately projects t onto the closest 1-dimensional hyperplane H_1 = b_2 + Span(b_1), i.e. π(t), then onto the 0-dimensional subspace H_0 ⊆ L to obtain v_B (green), an integral linear combination of lattice basis points.
(b) A bad basis of the same lattice L might mislead Babai's algorithm into incorrectly identifying a closest vector. The target t (red) in Lindner-Peikert's algorithmic variant NearestPlanes [LP11] is instead projected onto multiple hyperplanes, e.g. H_1^{(1)} = b_2 + Span(b_1), H_2^{(1)} = 2b_2 + Span(b_1), to output a set of closest points v_LP (green).
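The good-vs-bad-basis effect of Figure 8.2 can be reproduced with Babai's even simpler round-off variant [Bab85] in two dimensions; the bases below are illustrative choices of ours, both generating the lattice Z²:

```python
def babai_round(basis, t):
    """Babai's round-off CVP heuristic in 2D: write t in basis coordinates,
    round each coordinate to the nearest integer, map back into the lattice."""
    (b11, b12), (b21, b22) = basis
    det = b11 * b22 - b12 * b21
    # Solve t = a*b1 + c*b2 for real coefficients a, c by Cramer's rule.
    a = (t[0] * b22 - t[1] * b21) / det
    c = (b11 * t[1] - b12 * t[0]) / det
    a, c = round(a), round(c)
    return (a * b11 + c * b21, a * b12 + c * b22)

def dist2(u, v):
    return (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2

# Two bases of the same lattice Z^2: an orthogonal (good) one and a skewed
# (bad) one, related by a unimodular transformation.
good = [(1, 0), (0, 1)]
bad = [(1, 0), (3, 1)]
t = (0.4, 0.4)
v_good = babai_round(good, t)   # finds (0, 0), the true closest vector
v_bad = babai_round(bad, t)     # misled to (-1, 0), much farther from t
```

Rounding in the skewed coordinate system distorts distances, so the bad basis returns a lattice point that is not closest, exactly the failure mode the figure illustrates.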
The situation modulo an integer q ≥ 1 is defined via the projection π_q : Z^n → Z_q^n. For v ∈ Z_q^n we have
D_{Z_q^n,s}(v) = D_{Z^n,s}(π_q^{−1}(v)).
As sampling random lattice points following a discrete Gaussian distribution is computationally
expensive [GPV08] for arbitrary lattices (and for L = Z_q^n), substituting the distribution is a performance
incentive. Justifying such an exchange of probability distributions without jeopardizing security requires a measure.
Definition 31 (Statistical Distance of Probability Distributions). The statistical distance (variational
distance) between two distributions, X and Y mapping onto Z, is defined by

Δ(X, Y) = (1/2) · Σ_{z∈Z} |P(X = z) − P(Y = z)| =: δ ∈ R.
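Definition 31 translates directly into code for finite distributions represented as dictionaries mapping outcomes to probabilities:

```python
def statistical_distance(X, Y):
    """Variational distance between two finite distributions given as dicts."""
    support = set(X) | set(Y)
    return sum(abs(X.get(z, 0.0) - Y.get(z, 0.0)) for z in support) / 2

fair = {0: 0.5, 1: 0.5}
biased = {0: 0.6, 1: 0.4}
assert abs(statistical_distance(fair, biased) - 0.1) < 1e-12
assert statistical_distance(fair, fair) == 0.0
```

The value 0.1 here is exactly the best possible advantage of any (even unbounded) distinguisher telling the two coins apart from a single sample.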
Definition 32 (Statistical/Computational Indistinguishability). Probability ensembles {X_n}_{n∈N}, {Y_n}_{n∈N}

• are statistically indistinguishable, if n ↦ Δ(X_n, Y_n) = δ(n) < negl(n),

• are computationally indistinguishable, if there exists no ppt-distinguisher D with advantage Adv_D(n) > negl(n).
We remark that statistical indistinguishability implies computational indistinguishability and introduce
an essential quantity, the smoothing parameter of a lattice L via its non-zero elements in the dual L∗.
Definition 33 (Smoothing Parameter η_ε(L) [MR07]). For a rank-n lattice L ⊆ R^n and real ε > 0, we
call the smallest real width s bounding the Gaussian function on the dual the smoothing parameter, η_ε(L) = inf{ s > 0 : ρ_{1/s}(L* \ {0}) ≤ ε }.
Theorem 34 (Smoothness bound for a rank-n lattice L ⊆ R^n [MR07]). For real ε > 0, we have that
η_ε(L) ≤ λ_n(L) · √(ln(2n(1 + 1/ε))/π), with λ_n the n-th successive minimum of L and the constant π.
Theorem 35 (Bound for L = Z [ZXZ18, Theorem 1]). For ε < 0.086435, η_ε(Z) ≤ √(ln(ε/44 + 2/ε)/π).
Example 36. In the self-dual case of L = Z, e.g. ε = 8/100 gives the smoothing parameter η_ε(Z) ≤ s = 3/2.
Example 37. For Z, e.g. ε < 2^{−160}, Theorem 35 gives the bound η_ε(Z) ≤ √(ln(2(1 + 1/2^{−160}))/π) ≤ s = 6.
If s exceeds the smoothness bound, applying Poisson's summation formula, Regev shows [Reg09b]:

Proposition 38. Let ε > 0, s ≥ η_ε(L), then for any c ∈ R^n, the Gaussian function can be bounded by:

(1 − ε) · s^n / vol(L(A)) ≤ ρ_s(L + c) ≤ (1 + ε) · s^n / vol(L(A)).
A sufficiently large Gaussian parameter s makes a continuous Gaussian indistinguishable from the
discrete Gaussian distribution D_{L,s} on L, see [LP11, MP12], meaning the statistical distance is negligible.
This justifies adapting known tail-bound formulas as approximations in the discrete case.
For fixed s > 0, we have in the one-dimensional case v = x ∈ R, as y → ∞, that the tails fulfill:
1 − ∫_{−y}^{y} ρ_s(x)/s dx = 1 − (1/s) ∫_{−y}^{y} exp(−π x²/s²) dx
                          = 1 − ( Σ_{x=−y}^{y} exp(−π x²/s²) ) / ( Σ_{x=−∞}^{∞} exp(−π x²/s²) )
                          = e^{−Θ(y²/s²)}.
This justifies the use of discrete Gaussian error distributions in LWE-based cryptosystems.
8.3 Learning With Errors (LWE)
Regev, Gödel-prize winner of the year 2018 for his work on the Learning with Errors problem (LWE),
introduced it to the field of cryptology in 2005 [Reg05]. In this work, the LWE problem is parameterized
by three variables: the dimension n ∈ N, a modulus q, and the noise parameter α < 1. The LWE
distribution is studied for moduli q = q(n) ∈ N functionally dependent on n, and the error distribution
χ_s : Z_q → R_+ is a discrete Gaussian distribution, as justified in Section 8.2.1, with s = αq.
Definition 39 (LWE Distribution lwe_{n,q,s,χ}). Let n, q = q(n) ∈ N, and let χ_s be an error distribution
with s = αq. Given the secret sk = s ∈ Z_q^n, the LWE distribution lwe_{n,q,s,χ} is a product distribution
over Z_q^n × Z_q. Generated by sampling a ∈ Z_q^n uniformly and independently, and
sampling e ← χ_s, an LWE sample is a pair of random variables (a, ⟨a, s⟩ + e mod q) ∈ Z_q^n × Z_q.
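Definition 39 can be sketched as follows; the tiny modulus and the small symmetric error set standing in for χ_s are illustrative choices only:

```python
import random

def lwe_sample(s, q, errors=(-2, -1, 0, 1, 2)):
    """One sample (a, <a,s> + e mod q) from a toy LWE distribution.

    The error e is drawn from a small symmetric set standing in for the
    discrete Gaussian chi_s of Section 8.2.1."""
    n = len(s)
    a = [random.randrange(q) for _ in range(n)]
    e = random.choice(errors)
    b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
    return a, b

q, n = 97, 8
s = [random.randrange(q) for _ in range(n)]   # the secret
for _ in range(100):
    a, b = lwe_sample(s, q)
    # Knowing s, the residual b - <a,s> mod q reveals the small error term;
    # without s, the pair (a, b) looks uniform over Z_q^n x Z_q.
    residual = (b - sum(ai * si for ai, si in zip(a, s))) % q
    assert residual <= 2 or residual >= q - 2
```

The asymmetry made explicit here, residuals small with the secret and uniform-looking without it, is exactly the gap the search and decisional problems below formalize.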
For m denoting the number of LWE samples drawn from the LWE distribution, they can be arranged
as an m × n matrix A and a right-hand-side vector b. [Reg05, MM11a] formally define the search-LWE
resp. decisional-LWE problem and present a search-to-decision reduction with poly(q) overhead.
Definition 40 (Search-LWE (sLWE)). An algorithm A is said to solve search-LWE, if given m independent
LWE samples (a_i, ⟨a_i, s⟩ + e_i) ∈ Z_q^n × Z_q, 1 ≤ i ≤ m, drawn from lwe_{n,q,s,χ}, it holds that

Pr[A_{sLWE}(1^n) = s′ ≠ s] ≤ negl(n).
Definition 41 (Decisional-LWE (dLWE)). An algorithm A is said to solve decisional-LWE, if given
m independent samples (a_i, ⟨a_i, s⟩ + e_i) ∈ Z_q^n × Z_q, for 1 ≤ i ≤ m, drawn from lwe_{n,q,s,χ}, it holds that

|Adv_A^{dLWE}(1^n, (A, As + e)) − Adv_A^{dLWE}(1^n, (A, b))| ≥ negl(n),

i.e. A distinguishes LWE samples from uniformly distributed samples b ← Z_q^m with non-negligible advantage.
8.3.1 Equivalence between the decisional- (dLWE) and search-LWE (sLWE)
Probably the most stressed property of LWE is that, from a complexity-theoretic viewpoint, there is
essentially no difference between Definition 41 and Definition 40: they are equivalent up to polynomial
reductions for primes q = poly(n) [Reg05] and even exponentially large composite q > poly(n) [MP12].
Clearly, if we have access to a solver for sLWE, we can let it compute s. If the algorithm fails, we
identify the dLWE samples to stem from a uniform distribution. If the algorithm efficiently recovered
a secret s, we can test whether the inner-product with various a closely resembles the right-hand sides
b, and decide dLWE by majority vote of our observations.
Conversely, oracle access to dLWE can be turned into an algorithm for sLWE with only polynomially
extra steps. Given a decisional-LWE oracle D that returns D(a, b) = 1 if b follows an LWE distribution
for suitable s with overwhelming success probability 1 − o(1), we can successively recover the secret
coordinate by coordinate when testing various right-hand sides b(s) as functions of s.
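This coordinate-by-coordinate recovery can be illustrated with a toy simulation. Note that the dLWE oracle below is simulated using knowledge of the secret, a stand-in for the distinguisher whose mere existence the reduction assumes, and all parameters are illustrative:

```python
import random

q, n = 17, 4                                    # small prime modulus
secret = [random.randrange(q) for _ in range(n)]

def lwe_samples(m):
    out = []
    for _ in range(m):
        a = [random.randrange(q) for _ in range(n)]
        e = random.choice((-1, 0, 1))           # small error
        out.append((a, (sum(x * y for x, y in zip(a, secret)) + e) % q))
    return out

def dlwe_oracle(samples):
    """Stand-in distinguisher: accepts iff every residual b - <a, secret> is
    small mod q; a real reduction only assumes such an oracle exists."""
    return all(min((b - sum(x * y for x, y in zip(a, secret))) % q,
                   (sum(x * y for x, y in zip(a, secret)) - b) % q) <= 1
               for a, b in samples)

def recover_coordinate(j):
    # Test every candidate value g for s_j: shifting a_j by random r and b by
    # r*g keeps the LWE shape iff g = s_j; otherwise b becomes uniform, since
    # the residual picks up the term r*(g - s_j), uniform for prime q.
    for g in range(q):
        transformed = []
        for a, b in lwe_samples(16):
            r = random.randrange(1, q)
            a2 = a[:]
            a2[j] = (a2[j] + r) % q
            transformed.append((a2, (b + r * g) % q))
        if dlwe_oracle(transformed):
            return g
    return None

assert [recover_coordinate(j) for j in range(n)] == secret
```

With q candidate values per coordinate and n coordinates, the sLWE solver makes only q·n = poly(n) oracle calls, matching the polynomially-many extra steps claimed above.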
We remark that [BLP+13] demonstrated the connection to lattices giving a classical reduction.
In Section 9.2.2, we introduce the notation for an LWE-based private-key encryption scheme on the
torus T := R/Z, the reals modulo 1, rather than Z_q, for our main application. Before we turn our
attention to fully homomorphic encryption later on, we define partially homomorphic encryption of
one input bit as a first step towards FHE.
8.4 Homomorphic Encryption
Definition 42 (Partially Homomorphic Encryption). Let F = ∪F_ℓ be a class of functions where every
f ∈ F_ℓ maps {0,1}^ℓ to {0,1}, ℓ = poly(n). An F-homomorphic public-key encryption scheme is
a CPA-secure public-key encryption scheme (FHE.Gen, FHE.Enc, FHE.Dec) together with an algorithm
FHE.Eval : {0,1}* → {0,1}* ∈ ppt(n):

FHE.Gen Generates the public and private key pair (pk, sk).

FHE.Enc Encryption computes a randomized ciphertext c := FHE.Enc_pk(m).

FHE.Dec Decryption m := FHE.Dec_sk(c) recovers the message, up to a negligible failure probability.

FHE.Eval For all x_1, . . . , x_ℓ ∈ {0,1}, and f ∈ F_ℓ of polynomially bounded size |f| = poly(ℓ), it
holds that c = FHE.Eval_f(FHE.Enc_pk(x_1), . . . , FHE.Enc_pk(x_ℓ)) with length upper-bounded
by n ≥ |c|, and FHE.Dec_sk(c) = f(x_1, . . . , x_ℓ).
Definition 43 (Bootstrap-able Evaluation). An evaluation FHE.Eval, defined for a class of functions
F, is bootstrap-able if FHE.Dec ∈ F (i.e. FHE.Eval is able to homomorphically evaluate the scheme's
own decryption function, and (at least) one subsequent NAND-gate).
As the NAND-operation by itself is functionally complete, cf. Theorem 13, an implementation of a
bootstrap-able evaluation routine is sufficient for realizing an FHE scheme.
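Functional completeness of NAND is easy to verify exhaustively; the derived gates below are the standard constructions, which is why homomorphically evaluating a single NAND (plus the decryption circuit) suffices for FHE:

```python
# NAND is functionally complete: NOT, AND, OR (and hence any Boolean circuit)
# can be built from NAND gates alone.
def NAND(a, b): return 1 - a * b

def NOT(a):    return NAND(a, a)
def AND(a, b): return NOT(NAND(a, b))
def OR(a, b):  return NAND(NOT(a), NOT(b))
def XOR(a, b): return AND(OR(a, b), NAND(a, b))

for a in (0, 1):
    for b in (0, 1):
        assert AND(a, b) == a & b
        assert OR(a, b) == a | b
        assert XOR(a, b) == a ^ b
```

Any circuit written with these derived gates unrolls into a pure-NAND circuit, exactly the normal form used in the bootstrapping argument below.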
Definition 44 (Fully Homomorphic Encryption (FHE) Scheme). Let F be the class of all computable
functions and let (FHE.Gen, FHE.Enc, FHE.Dec) together with an algorithm EVAL : {0,1}* → {0,1}* ∈
ppt(n) be a CPA-secure partially homomorphic encryption scheme with bootstrap-able evaluation function;
then it is a fully homomorphic encryption scheme, refreshing ciphertexts while computing any f ∈ F.
Ridiculously high bandwidth usage admits trivial constructions, analogous to the Cloud
provider sending back all encrypted data at every interaction. This is the reason why a good definition must
demand compactness of ciphertexts:
Definition 45 (Compactness of FHE Ciphertexts). Let (Gen, Enc_pk, Dec_sk) be a CPA-secure, partially
homomorphic encryption scheme. This scheme is compact iff there exists a polynomial p = poly(n) with
|c| ≤ p(n) for all ciphertexts c ∈ C, where p is required to be independent of the input length ℓ and the circuit size |f|.
Theorem 46 (Bootstrapping Theorem [Gen09]). Let (Gen, Enc_pk, Dec_sk) be an IND-CPA-secure, partially
homomorphic encryption scheme with an assumed circular-security property. If, for a function family F,
the mapping

f_{c_0,c_1} : sk ↦ NAND(Dec_sk(c_0), Dec_sk(c_1))

lies in F for all arbitrary but syntactically-correct input ciphertexts c_0, c_1, then this qualifies
(Gen, Enc_pk, Dec_sk) as a fully homomorphic encryption scheme.
Proof. Let f : {0, 1}^n → {0, 1} be an efficiently computable function, i. e. there exists an algorithm com-
puting the result in poly(n) steps, cf. Definition 5. Since NAND is a universal function, cf. Theorem 13,
f can be expressed as a Boolean circuit comprising only NAND gates. Let (sk, pk) = Gen(1^n) denote
the private resp. public key of the scheme. Assuming circular security, cf. Assumption 48, an encryption
of the secret key bk = Encpk(sk), the so-called bootstrapping key, is made public at setup time without
weakening the construction. Let c0, c1 be encryptions of bits, c0 = Encpk(m0), c1 = Encpk(m1), with m0, m1 ∈ {0, 1}.
We can use several functions (c, c′) 7→ fc,c′ to construct a sequence c0, c1, c2 = c0 ∧ c1, c3, c4 =
c2 ∧ c3, . . . , cr = F (c0, c1, c3, . . . ) until arriving at the resulting ciphertext cr, representing the result
of a circuit f ∈ F evaluated on its n inputs.
Hence, to define the function EVALbk(f) := FHE.Eval(f, bk), which homomorphically computes
the result of applying the circuit f to its inputs, it suffices to have, for arbitrary (c0, c1), functions
evaluating fc0,c1 using the refreshing operation. Given any two consecutive ciphertexts c and c′ in
the sequence c0, c1, . . . , cr, we hard-wire the Boolean function fc0,c1(sk) = (Decsk(c0) ∧ Decsk(c1)),
mapping any secret-key candidate to {0, 1}. We remark that the right-hand side, given the correct de-
cryption key sk, computes by first decrypting each operand and then evaluating
the NAND gate on the two intermediate results. The missing link is constructing the Boolean func-
tion EVAL∧(c0, c1) := EVALbk(fc0,c1) = FHE.Eval(fc0,c1 , bk) = c2. Decrypting c2 yields
Decsk(c2) = Decsk(EVAL∧(c0, c1)), hence

Decsk(FHE.Eval(fc0,c1 , bk)) = Decsk(FHE.Eval(fc0,c1 , Encpk(sk))) = fc0,c1(sk) = Decsk(c0) ∧ Decsk(c1),

by construction. Putting it all together, we are able to evaluate (Decsk(c0) ∧ Decsk(c1)) = (m0 ∧ m1) =
Decsk(c2) to some value m2 ∈ {0, 1} and continue the process for any gate sequence of a given circuit.
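The NAND-universality the proof relies on is easy to verify in the clear. The following sketch (purely illustrative, not part of any encryption scheme; all function names are ours) builds the standard Boolean gates from NAND and checks them exhaustively:

```python
def nand(a: int, b: int) -> int:
    """Universal gate: returns 1 unless both inputs are 1."""
    return 1 - (a & b)

def not_(a):    return nand(a, a)
def and_(a, b): return not_(nand(a, b))
def or_(a, b):  return nand(not_(a), not_(b))

def xor(a, b):
    # Classic 4-NAND construction of XOR.
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

# Exhaustive check against Python's built-in Boolean operators.
for a in (0, 1):
    for b in (0, 1):
        assert and_(a, b) == (a & b)
        assert or_(a, b) == (a | b)
        assert xor(a, b) == (a ^ b)
```

Every gate of the circuit for f would, in the encrypted domain, be realized by one refreshed homomorphic NAND as in the proof.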
Any function family containing and supporting evaluation of the HE scheme's decryption algorithm
as a function of the secret key can thus be transformed into one evaluating any computable function. To
date, this is the only known construction that yields FHE schemes at all.
Remark 47 (FHE means f ◦ Encpk = Encpk ◦ f commutes (Informal)). Let f ∈ F be a computable
function, e. g. f = f0 + f1 ·X + · · ·+ fd ·X^d : M² → M of degree d ∈ N, and m1, m2 two plaintexts. If

(f ◦ Encpk)(m1, m2) = f(c1, c2) = c3 ∈ C = f([m1]pk, [m2]pk) ≈ [f(m1, m2)]pk = (Encpk ◦ f)(m1, m2),

for any m1, m2 ∈ M with f(m1, m2) = m3 ∈ M and Encpk(M) = C, then an IND-CPA-secure encryption
scheme S with four ppt-algorithms (FHE.Gen, FHE.Enc, FHE.Dec, FHE.Eval) becomes an FHE scheme.
In fact, the informally expressed relation of Remark 47 only coincides with high probability for the
respective decryptions. In general, we have the situation depicted in the commutative diagram:

m1, m2, . . . , mn  --Encpk(·)-->  c1, c2, . . . , cn
      | f(·)                            | Eval(f, ·)
      v                                 v
f(m1, m2, . . . , mn)  --Encpk(·)-->  Encpk(f(m1, m2, . . . , mn)) ≈ Eval(f, (c1, c2, . . . , cn)).
Unlike classical, deterministic functions, and like almost all modern encryption schemes that fulfill
the goal of semantic security (cf. Definition 19), FHE schemes require probabilistic encryption, i. e.
equal values a = b can, but need not, be mapped to the same string: a = b ⇏ Encpk(a) = Encpk(b).
Defined in this probabilistic fashion, the decryption algorithm is allowed to fail with negligible probability.
As a result, it is currently not practical to build a circuit for every function to be evaluated.
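To make the commutation property of Remark 47 concrete, here is a toy, deliberately insecure integer scheme in the spirit of the DGHV construction [vGHV10] (parameter sizes chosen for illustration only): decrypting the sum resp. product of ciphertexts yields XOR resp. AND of the plaintext bits.

```python
import random

def keygen(bits=64):
    # Secret key: a random odd integer p with its top bit forced (toy parameters).
    return random.getrandbits(bits) | (1 << (bits - 1)) | 1

def encrypt(p, m, noise_bits=8, mult_bits=96):
    assert m in (0, 1)
    e = random.getrandbits(noise_bits)   # small noise, hidden in the even part
    q = random.getrandbits(mult_bits)    # large random multiple of the key
    return m + 2 * e + p * q

def decrypt(p, c):
    return (c % p) % 2

random.seed(0)
p = keygen()
m1, m2 = 1, 0
c1, c2 = encrypt(p, m1), encrypt(p, m2)
# Eval(f, Enc(m)) decrypts to f(m): f = addition mod 2 (XOR) resp. multiplication (AND).
assert decrypt(p, c1 + c2) == (m1 ^ m2)
assert decrypt(p, c1 * c2) == (m1 & m2)
```

The noise 2e grows with every homomorphic operation, which is exactly why bootstrapping (Theorem 46) is needed to refresh ciphertexts in a fully homomorphic scheme.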
Before we can define the approximate homomorphisms used in the currently most promising instan-
tiation of an FHE scheme [GSW13], we present some more facts about lattices, cf. Section 3.1.1.
Following [AP14], the 3rd-generation FHE schemes are based on the approximate-eigenvector problem,
meaning that ciphertext (matrices) satisfy s·C ≈ µ·s mod q, with secret s, modulus q, and message µ.
An FHE scheme provides a way to encrypt data while supporting computations through the en-
cryption envelope. Given an encryption of a plaintext m, one can compute an encryption of f(m) for
any computable function f . This operation does not require intermediate decryption or knowledge of
the decryption key and therefore can be performed based on public information only.
Metaphorically, FHE can be understood as knitting a scarf. When winter is approaching, a warming,
self-made scarf is nice, yet this skill is not widely taught, so we are willing to outsource the task: if
you hand your favourite ball of wool to your half-blind grandmother (or grandfather, for that matter),
they will turn it into a beautifully woven, hand-crafted scarf, muscle memory alone guiding the
movements. It is a non-interactive process, delivered when ready. This metaphor lacks an
interpretation of bootstrapping, a necessity in today's FHE, e. g. for refreshing the fragile structure
before adding a few new loops. Maybe we can imagine the yarn as being radioactive with a short half-
life, needing perpetual repair, or ... we simply define it abstractly and precisely as in
Chapter 8.
Applications of FHE are numerous but one particular use of interest is the privacy-preserving
delegation of computations to a remote service. The first construction of FHE dates back to 2009
and is due to Gentry [Gen09]. A number of improvements have followed [vGHV10,SS10,SV10,BV11a,
BV11b, BGV12, GHS12, GSW13, BV14, AP14], leading to a biodiversity of techniques, features and
complexity assumptions.
We now turn our attention to a concrete implementation of an FHE scheme and our construc-
tions on top of it; for this we require:
Assumption 48 (Circular Security Assumption). Giving bk = FHE.Enc(sk) to an adversary A ∈ ppt(n)
in the IND-CPA game improves the advantage of breaking the FHE cryptosystem only negligibly:

|Adv_A^{IND-CPA}(1^n) − Adv_A^{IND-CPA}(1^n, bk)| ≤ negl(n).
Assumption 49 (LWE Security Assumption). An average LWE instance is, even leveraging quantum
algorithms, as hard as solving appSVPγ in the worst case. Existing (public-/secret-key) LWE-based
cryptosystems improve the advantage of solving the appSVPγ instance only negligibly.
The appSVPγ problem is a standard lattice problem which has for years proven robust
against substantial cryptanalytic advances. In Section 9.8, we elaborate more on the presumed
practical hardness stated in this conjecture.
Remark 50 (Dependency: FHE Security from the LWE Assumption in this work). A partially ho-
momorphic public-key encryption scheme (FHE.Gen, FHE.Enc, FHE.Dec, FHE.Eval) is an FHE scheme
if
• the conditions of the bootstrapping theorem, Theorem 46, are fulfilled,
• the circular security assumption, Assumption 48, holds, and
• the (theoretical and practical) LWE-hardness conjecture, Assumption 49, holds.
We saw how to theoretically construct the holy grail of cryptography. Finally, we turn our attention
to the implementation side and the TFHE library in Section 9.2.2, which realizes a scale-invariant
version of an FHE scheme based on the hardness of LWE, Assumption 49.
8.4.1 Standardization of Homomorphic Encryption
To balance privacy with convenience, there are efforts to standardize HE, define common Application
Programming Interfaces (APIs), and recommend implementations, e. g. for FHE [ACC+18]. HE allows
editing of encrypted data to various degrees without sacrificing its confidentiality. The standardiza-
tion effort only considers fully homomorphic encryption, i. e. homomorphic with respect to two
operations, addition and multiplication, and does not include standardized primitives based on partially
homomorphic encryption (PHE), e. g. the RSA cryptosystem, which is multiplicatively homomorphic, or
the Paillier cryptosystem, with its additively homomorphic property.
Several open-source FHE implementations are freely available on the Internet today. At
this stage it is unclear which system among them will become established, as each one can be seen as
a collection of unbroken best practices based on relatively new cryptanalytic techniques, where
parameters were often chosen heuristically as a trade-off with good performance. The following are among
the most well-known and best-analyzed schemes for academic purposes.
HELib [HS14] is an early and widely used library for homomorphic computations, developed by
Halevi and Shoup at IBM soon after the BGV scheme [BGV12] was published. It allows amortizing
additions as well as multiplications by deploying parallelism when working with encrypted vectors, yet
a severe efficiency bottleneck is its bootstrapping routine, which needs to be applied coefficient-wise.
In the literature this vectorization is referred to as batching.
Microsoft’s SEAL [SEA18] implements the BFV cryptosystem [FV12] and comes with easy-to-
use code examples, recently released in version 3.1. It also implements a variant of BFV, the CKKS
scheme [CKKS17]. The CKKS scheme is central to the HeaAn library, which supports fixed-point
arithmetic that mimics approximate arithmetic.
Λ λ, or LOL, is a library generally targeted at ring-based lattice cryptography, implemented in
Haskell, a purely functional programming language.
In Section 9.2.2 we explore the TFHE library, which we used in FHE–DiNN.
8.5 An Efficient FHE-scheme for Artificial Intelligence (AI)
After the quest for the holy grail of cryptography ended in 2009, when FHE was first instantiated
(see Section 8.5), privacy-preserving machine learning became a desirable application. A framework
that evaluates neural networks (NNs) is a promising step towards bridging the gap between today’s
massive deployment of machine learning and its practical, efficient and, most importantly,
privacy-preserving establishment in everyday life.
A first observation towards a solution is represented by the fact that state-of-the-art FHE schemes
cannot support arbitrary operations over the message space R, whereas typically neural networks have
real-valued inputs, outputs, and internal states. This incompatibility motivated research into
alternatives for how to marry the two original architectures. It led to our introduction of FHE-
friendly Discretized Neural Networks (DiNNs), in which all values are carefully discretized and
hence compatible with current FHE schemes.
Obviously, a neural network, i. e. the set of all numbers assigned to the edges of its representing graph,
expresses how the machine learning algorithm sees an image, yet this is quite different from the way
humans do. For a neural network to recognize an object, the digital image is stored as a matrix of
gray-scale values, or a tensor filled with numbers (a 3-dimensional array, one slice for each color channel,
typically red, green, and blue as in Figure 8.3).
For the algorithms, colours as such do not matter. At first, samples of 784 = 28 · 28 independent
gray-scale values from the MNIST dataset [LCB98], resp. 12288 = 64 · 64 · 3 integer values in the case of
the catsanddogs [dsp13] colour-image classification challenge with resolution 64 × 64, are presented to
the artificial neural network.
The task can be seen as learning structures in a 784 · 256 = 200704- resp. 12288 · 256 = 3145728-
dimensional vector space over the binary alphabet, as in each channel the saturation is encoded as one
of 256 = 2^8 values. Ad hoc, there are 2^200704 resp. 2^3145728 possible states among which the machine learning
algorithm is trying to find associations. In Section 9.3.1 the learning phase tailored to FHE
is discussed when studying privacy-preserving neural network evaluation.
Fig. 8.3: Random 28 × 28 pixel black & white or gray-scale (e. g. from 0 to 255) element as possible FHE–DiNN input, and an RGB image, whose gray-scales serve as red, green, and blue color channels.
In the next sections we will show the main result, which informally reads:

Main Result 1 (Scale-Invariant Private Evaluation of Neural Networks (Informal)). Given an FHE-
friendly neural network, arranging neurons vertically, i. e. in one layer, or horizontally, i. e. in con-
secutive layers, comes at no added cost when evaluated with FHE–DiNN.
We remark that neural networks with many layers can model complex non-linearities and
reach state-of-the-art accuracy, while a single layer would require exponentially many neurons.
Chapter 9
FHE–DiNN
In this chapter, we present our work published in [BMMP18], which deals with a possible solution to the
privacy dilemma in the Cloud setting: either sensitive user data must be revealed to the party that
computes the prediction of a cognitive model on these inputs, or the model itself must be provided
to the user, who typically has considerably less computation power available locally, for the requested
evaluation to take place. The rise of machine learning, and most particularly the pervasive nature of
deep neural networks, increases the occurrence of these scenarios.
The use of powerful homomorphic encryption promises to reconcile conflicting interests when
outsourcing to the Cloud: the computation is performed remotely, yet homomorphically on an encrypted
input, hence indecipherable for the service provider. The user decrypts the returned result using their
key. A typical task, brilliantly mastered by neural networks, is that of recognizing and classifying
patterns in images or videos, but the algorithmic run-times to date quickly degrade with the number of
layers in the network. Certainly, deep neural networks comprising hundreds or possibly thousands of
layers were thought infeasible to evaluate with homomorphic computations. In fact, before we introduced
FHE–DiNN in [BMMP18], homomorphic solutions scaled exponentially with the neural network
depth; we brought this down to linear complexity, showing an approach that achieves
unprecedentedly fast, scale-invariant homomorphic evaluation of deep neural networks.
The scale-invariance property expresses that the computations carried out by each individual neuron in
the network are independent of the total number of neurons and layers. This was achieved by building on
top of the LWE-based Torus-FHE construction (ASIACRYPT 2016) and adapting its efficient bootstrap-
ping method to refresh ciphertexts as they propagate through the network. Discretizing neural networks
to be FHE-friendly meant rounding weights to signed integers in an interval and activating neurons with the
discrete sign function. For this purpose, we experimentally trained a network implementing a custom
backpropagation algorithm, and report experimental results of running the full MNIST dataset. This
showcased that it is possible to have a simple, discretized neural network that recognizes handwritten num-
bers with over 97% accuracy and is evaluated homomorphically in just 1.64 seconds on a single core
of an average-grade laptop CPU.
The following part of this thesis is organized as follows: in Section 9.2 we define our notation,
introduce our notions about FHE, its formulation over the torus, and artificial neural networks; in
Section 9.5 we introduce FHE-friendly Discretized Neural Networks (DiNN) and show how to train and
evaluate them on data in the clear for serving as an experimental baseline; in Section 9.6 we perform
fast homomorphic evaluation of a pre-trained DiNN and present how to obtain a DiNN representing
a suitable cognitive model; in Section 9.9 we contrast experimental results on clear data with the
evaluation of encrypted inputs, draw some conclusions, and identify several open directions for future
research.
We believe that this work bridges the gap in the privacy dilemma between machine learning’s
fascinating capabilities and its efficient, privacy-preserving implementation in practice. Eventually,
cuFHE–DiNN pushes the functionality presented in FHE–DiNN further by introducing algorithmic
enhancements, extending the set of possible models, and benefiting from implementation on a GPU.
9.1 Localization of this Research within the Field
We present a scale-invariant approach to the problem of privacy-preserving predictions on encrypted
data. In our framework, each neuron’s output is refreshed through bootstrapping, with the result that
arbitrarily deep networks can be homomorphically evaluated. Of course, the entire homomorphic
evaluation of the network will take time proportional to the number of its neurons or, if parallelism is
involved, to the number of its layers. But operating one neuron is now essentially independent of the
dimensions of the network: it just relies on system-wide parameters.
In order to optimize the overall efficiency of the evaluation, we adapt the bootstrapping
procedure of the recent construction by Chillotti et al., known as Torus-FHE [CGGI16], to our use case. TFHE
provides a toolkit of schemes and operations based on variants of Ring-LWE and GSW [LPR10, GSW13],
handling values on the torus T = R/Z = [0, 1) instead of modular integers. A single neuron compu-
tation is a combination of a homomorphic multi-addition followed by a fast bootstrapping procedure
which, besides regularizing the ciphertext’s noise level, applies an activation function for free.
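In the clear, the per-neuron computation just described reduces to a weighted multi-addition followed by the sign activation. The sketch below is a plaintext analog only (function and parameter names are ours, not the TFHE library's); in the encrypted domain the sign would be applied for free during bootstrapping:

```python
def sign(x: int) -> int:
    # Discretized activation; realized "for free" by bootstrapping under FHE.
    return 1 if x >= 0 else -1

def evaluate_neuron(weights, inputs, bias=0):
    """Plaintext analog of one FHE-DiNN neuron: a multi-addition
    (homomorphic in the encrypted setting) followed by sign()."""
    multisum = bias + sum(w * x for w, x in zip(weights, inputs))
    return sign(multisum)

assert evaluate_neuron([2, -1, 3], [1, 1, -1]) == -1   # 2 - 1 - 3 = -2
assert evaluate_neuron([2, -1, 3], [1, -1, 1]) == 1    # 2 + 1 + 3 = 6
```

Since the output is again a value in {−1, 1} with fresh noise, the same computation can be chained across arbitrarily many layers.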
9.1.1 Prior Works and Known Concepts
The Cryptonets paper [DGBL+16] was the first initiative to address the challenge of achieving homo-
morphic classification. The main idea consists in applying a leveled Somewhat Homomorphic Encryp-
tion (SHE) scheme such as BGV [BGV12] to the network inputs and propagating the signals across
the network homomorphically, thereby consuming levels of homomorphic evaluation whenever non-
linearities are met. In NNs, non-linearities come from activation functions which are usually picked
from a small set of non-linear functions of reference (logistic sigmoid, hyperbolic tangent, etc) chosen
for their mathematical convenience. To optimally accommodate the underlying SHE scheme, Cryp-
tonets replace their standard activation by the (depth 1) square function, which only consumes one
level. A number of subsequent works have followed the same approach and improved it, typically by
adopting higher degree polynomials as activation functions for more training stability [ZYC16], or by
renormalizing weighted sums prior to applying the approximate function so that its degree can be kept
as low as possible [CdWM+17]. Practical experiments have shown that training can accommodate
approximated activations and generate NNs with very good accuracy.
However, this approach suffers from an inherent limitation: the homomorphic computation, local
to a single neuron, depends on the total number of levels required to implement the network, which
is itself roughly proportional to the number of its activated layers. Therefore the overall performance
of the homomorphic classification heavily depends on the total multiplicative depth of the circuit and
rapidly becomes prohibitive as the number of layers increases. This approach does not “scale well”
and is not adapted to deep learning, where neural networks can contain tens, hundreds or sometimes
thousands of layers [HZRS15,ZK16].
In our FHE–DiNN framework, unlike in standard neural networks, the neuron inputs and outputs,
the weights and biases, as well as the domain and range of the activation function cannot be real-valued
and must be discretized. We found it optimal to represent signals as ±1 values, weights and biases as
signed integers and to use the sign function as activation function. We call such networks Discretized
Neural Networks or DiNNs. This particular form of neural networks is somehow inspired by a more
restrictive one, referred to as Binarized Neural Networks (BNNs) [CB16] in the literature, where signals
and weights are restricted to the set {−1, 1} instead of {−W, . . . ,W} in the case of DiNNs. Interestingly,
it has been empirically observed by [CB16] that BNNs can achieve accuracies close to the ones obtained
with state-of-the-art classical NNs, at the price of an overhead in the total network size, which is largely
compensated by the obtained performance gains. Since our DiNNs are an extension of BNNs and stand
in between BNNs and classical NNs, we expect the overhead in network size to be somewhat smaller.
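For illustration, a naive post-hoc discretization of real-valued weights into {−W, . . . , W} could look as follows (a simple uniform rounding sketch under our own naming; the actual DiNN training in Section 9.5 uses a custom backpropagation rather than rounding after the fact):

```python
def discretize(weights, W):
    """Map real weights to signed integers in [-W, W] by uniform
    scaling and rounding (illustrative scheme only)."""
    m = max(abs(w) for w in weights) or 1.0   # guard against all-zero weights
    return [max(-W, min(W, round(w * W / m))) for w in weights]

assert discretize([0.9, -0.3, 0.05], W=3) == [3, -1, 0]
```

The parameter W trades off fidelity to the trained real-valued network against the message-space size the FHE scheme must support.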
Let w = (w1, w2, . . . , wk) be a vector of integers and t ∈ Z. We define the threshold function
corresponding to weights w, resp. threshold t, as the Boolean function Tw,t : {0, 1}^k → {0, 1},

x 7→ 1 if ∑_i wi xi ≥ t, and 0 else, (9.1)

that maps x ∈ {0, 1}^k to 1 if and only if ∑_i wi xi ≥ t. For example, the threshold function Tw,t
corresponding to w = (1, 1, 1) and t = 2 is simply the majority function MAJ = MAJ3 : {0, 1}^3 →
{0, 1}; see Figure 9.1 for a sample of popular choices for activation functions.
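Equation (9.1) translates directly into code; the sketch below (names are ours) checks the MAJ3 example:

```python
def threshold(w, t, x):
    """T_{w,t}(x) = 1 iff sum_i w_i * x_i >= t, cf. Equation (9.1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0

# w = (1, 1, 1) and t = 2 yield the 3-input majority function MAJ3.
maj3 = lambda x: threshold((1, 1, 1), 2, x)
assert maj3((1, 1, 0)) == 1
assert maj3((1, 0, 0)) == 0
```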
Fig. 9.1: Several popular neural network activation functions and our choice ϕ1, the sign function:
ϕ0(x) = tanh(x); ϕ1(x) = sign(x); ϕ2(x) = 2/(1 + e^{−x}) − 1/2, ’scaled and shifted sigmoid’;
ϕ3(x) = max(0, x), ’ReLU’; ϕ4(x) = max(0, x) + min(0, e^x − 1), ’SELU’;
ϕ5(x) = max(0, min(1, x/5 + 1/2)), ’hard sigmoid’.
9.2 Preliminaries
In this section we recall definitions and constructions the rest of this chapter builds upon.
9.2.1 Notation and Notions
We denote the real numbers by R, the integers by Z, and use T to indicate R/Z, i. e., the torus of
real numbers modulo 1. We use B to denote the set {0, 1}, and we use R [X] for polynomials in the
variable X with coefficients in R, for any ring R. We use R_N [X] to denote R [X] /(X^N + 1) and
Z_N [X] to denote Z [X] /(X^N + 1), and we write their quotient as T_N [X] = R_N [X] /Z_N [X], i. e., the
ring of polynomials in X quotiented by (X^N + 1), with real coefficients modulo 1. Vectors are denoted
by lower-case bold letters, and we use ‖·‖1, ‖·‖2, and ‖·‖∞ to denote the L1, L2, and L∞ norm of a
vector, respectively. Given a vector a, we denote its i-th entry by ai. We use 〈a,b〉 to denote the inner
product between vectors a and b.
Given a set A, we write a ←$ A to indicate that a is sampled uniformly at random from A. If D is
a probability distribution, we will write d ← D to denote that d is sampled according to D.
9.2.2 Fully Homomorphic Encryption over the Torus (TFHE)
Anyone who considers arithmetical methods of producing random digits is, of course, in a
state of sin.—John von Neumann (1951)
We begin with some notation for TFHE, using an LWE-based private-key encryption scheme.
Regev’s encryption scheme [Reg05] is based on the hardness of the decisional Learning with Errors
(LWE) assumption, cf. Assumption 49, which states that it is computationally hard to distinguish between
pairs drawn from the distribution (a, b) ← lwe_{n,s,χ} and random torus elements (u, v) ←$ T^{n+1}. For n ∈ N,
a noise distribution χ over R, and any secret s ∈ {0, 1}^n, the LWE distribution is defined by lwe_{n,s,χ} =
(a, b = 〈s,a〉 + e), with a ←$ T^n, e ← χ, and b ∈ T = R/Z, the torus mod 1.
The anti-cyclic Module-LWE, called TLWE in [CGGI16], with positive k ∈ N, is a unified gener-
alization of LWE and the Ring-LWE introduced in [LPR10], and the basis for an existing fast fully
homomorphic encryption library over the torus (TFHE). Here, χ is an error distribution over the cy-
clotomic ring R_N = R [X] /(X^N + 1), whose degree N is a power of 2 so that it can be implemented
efficiently using fast Fourier transforms. The secret is a vector of k polynomials s ∈ R_N^k with binary coefficients.
Messages are now encoded as a torus polynomial of degree less than N, µ ∈ T_N [X], and fresh sam-
ples (a, b = a · s + µ + e) ∈ T_N [X]^k × T_N [X] are well-defined accordingly.
The extended LWE-based private-key encryption scheme defines, for a given positive I ∈ N, three functions:

Setup (n): for a security parameter n, fix N, k and χ; return s ←$ B_N [X]^k

Enc (s, µ): return (a, b), with a ←$ T_N [X]^k and b = a · s + e + µ/(2I + 1), where e ← χ

Dec (s, (a, b)): return ⌊(b − a · s) · (2I + 1)⌉

such that Dec (s, Enc (s, µ)) = µ with overwhelming probability at a chosen level λ, e. g.
providing at least 80 bits of security. The message polynomials µ(X) ∈ T_N [X] have coefficients
m_i ∈ [−I, I], 0 ≤ i < N, encoded as one of the torus’ 2I + 1 total slices.
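A scalar toy version of this scheme (degree N = 1, the torus represented as integers mod 2^32 in the style of TFHE's Torus32 representation, with deliberately insecure illustration parameters of our choosing) shows the encoding into 2I + 1 slices and the rounding decryption:

```python
import random

Q = 2 ** 32        # Torus32: represent T = R/Z as integers mod 2^32
n = 16             # toy LWE dimension, far below any secure choice
I = 1              # messages in [-I, I], i.e. 2I + 1 = 3 torus slices

def keygen():
    # Binary secret, as in the scheme above (scalar case).
    return [random.randrange(2) for _ in range(n)]

def encrypt(s, m, noise=2 ** 20):
    a = [random.randrange(Q) for _ in range(n)]
    e = random.randrange(-noise, noise)
    b = (sum(ai * si for ai, si in zip(a, s)) + e + m * Q // (2 * I + 1)) % Q
    return a, b

def decrypt(s, c):
    a, b = c
    phase = (b - sum(ai * si for ai, si in zip(a, s))) % Q
    # Round the phase to the nearest of the 2I + 1 slices of the torus.
    m = round(phase * (2 * I + 1) / Q) % (2 * I + 1)
    return m - (2 * I + 1) if m > I else m

random.seed(1)
s = keygen()
for m in (-1, 0, 1):
    assert decrypt(s, encrypt(s, m)) == m
```

Decryption succeeds as long as the noise e stays below half a slice width, which is exactly the correctness condition discussed below.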
In addition to the concepts in Section 8.2.1, we require the notion of sub-Gaussian distributions.
Definition 51 (Sub-Gaussians). Let σ > 0 be a real Gaussian parameter. We say that a distribution D
is sub-Gaussian with parameter σ if there exists M > 0 such that for all x ∈ R we can upper-bound

D(x) ≤ M · ρ_σ(x).

Lemma 52 plays a big role when multiple random variables are summed up in our Main Result 2.

Lemma 52 (Pythagorean additivity of sub-Gaussians). Let D1 and D2 be two sub-Gaussian distribu-
tions with parameters σ1 and σ2, respectively. Then D+, obtained by sampling D1 and D2 and summing
the results, is sub-Gaussian with parameter √(σ1² + σ2²).
We introduce the phase, indicating plaintexts, and notation for bounding the noise of ciphertexts.

Definition 53 (Phase). Let sk = s ∈ B_N [X]^k be the secret key and c = (a, b) ∈ T_N [X]^k × T_N [X] a
TLWE Sample. The phase is defined by ϕ_s(c) = b − s · a ∈ T_N [X].
The following content is adapted from [CGGI17] to justify why the inherently approximate homo-
morphic operations using TLWE can be used for correct computations as intended. The considerations
in [CGGI17, Section 2.1] allow us to deduce Corollary 54 and Remark 55.
Corollary 54 (Concentrated sub-Gaussian Distributions). Given a probability distribution χ over
T (or T^n, or T_N [X]^k, considering individual coefficients as random variables) that is σ-sub-Gaussian
with σ ≤ 1/√(32 ln(2)(λ + 1)), then there exists τ ∈ T such that the mass of the probability densities (p_x)_{x∈T} is
concentrated, i. e. Pr_{x∼χ}[|x − τ| > 1/4] < 2^{−λ}, negligible in λ for distances farther than 1/4 from τ.
Remark 55 (Phase, Variance, and Expectation). Let the phase (cf. Definition 53) of a ciphertext c be
ϕ_s((a, b)) = e + µ/S, a random variable encoding the message µ, scaled by S = 2I + 1, with e ∼ χ. Then

V[ϕ_s(c)] = | min_{τ∈W} ∫_{x∈T} p_x |x − τ|² dx | ≤ 1/4, and

E[ϕ_s(c)] = argmin_{τ∈W} ∫_{x∈T} p_x |x − τ|² dx = µ/S, which mathematically recovers the message,

as long as χ is a σ-sub-Gaussian distribution with σ ≤ 1/√(32 ln(2)(n + 1)).
With Err(c) = ϕ_s((a, b)) − µ/S, the expected error magnitude E[Err(c)] resp. noise variance V[Err(c)]
of the error Err(c) = e ← χ, a 0-centered Gaussian with standard deviation σ, is hence zero resp.
Var(ϕ_s(c)). From Remark 55 we have that the noise amplitude is bounded by 1/4 and that the phase is linear
for valid TLWE samples. Both facts are at the core of correctness considerations, whereas in practice
they are computed differently, namely by rounding. The phase can be seen as indicating the angle of its input
ciphertext on the torus, and the scaling S = 2I + 1 is dealt with later on, cf. Section 9.7.3.
Decryption can now be described as rounding the (appropriately scaled) phase from the torus T to
the closest torus tick τ on the wheel W. Hence, the exact same plaintext that was encrypted is retrieved,
i. e. decryption correctness is ensured, as long as the error bound on V[Err] is met.
9.2.3 TGSW: Gadget Matrix and Decomposition
TFHE uses TGSW, a torus variant of third-generation FHE, named after Gentry, Sahai, and Waters
(GSW), based on the approximate-eigenvector problem. We refer to [GSW13] for descriptions of first-
and second-generation FHE and recapitulate the main concepts of third-generation FHE in this
thesis. The block-diagonal matrix with its super-increasing components in Definition 56 serves as a
generating family of the TLWE sample space.
Definition 56 (Gadget Matrix). Let Bg ∈ N be an approximation parameter, e. g. Bg = 2^8, and ` ∈ N a
degree, e. g. ` = 2. We define the gadget matrix h ∈ T_N [X]^{`(k+1)×(k+1)} as the block-diagonal matrix
whose k + 1 diagonal blocks are each the column vector (1/Bg, 1/Bg², . . . , 1/Bg^`)^T, and whose
off-diagonal blocks are 0.
Typical values for the approximation parameter resp. degree are Bg = 2^8 and ` = 2, cf. Section 9.8.3.
Given h, Lemma 57 gives approximate decompositions in the ring T_N [X], such that Dec_h(c) · h ≈
c, ∀c ∈ T_N [X]^k × T_N [X], adjustable by setting the precision parameter `.
Lemma 57 (Gadget Decomposition Algorithm [CGGI17, Lemma 3.7]). Given a gadget matrix h, there
exists an efficient gadget decomposition algorithm u = Dec_{h,β,ε}(v) with quality β and approximation
precision ε which, given a TLWE Sample v ∈ T_N [X]^k × T_N [X], outputs a short vector u ∈ Z_N [X]^{`(k+1)} with:

‖u‖∞ ≤ β,
‖u · h − v‖∞ ≤ ε = 1/(2Bg^`),
E[u · h − v] = 0.
The gadget decomposition algorithm Dec can be used to round each coefficient of an element a_i of
a TLWE Sample (a, b) ∈ T_N [X]^{k+1} to its closest multiple of 1/Bg^` on the torus. This
is a necessary step to homomorphically perform costly bootstrapping, or more generally key-switching,
operations on ciphertexts. For completeness, we provide the gadget decomposition in Algorithm 9.1,
returning valid decompositions Dec_{h,β,ε} according to Definition 56 whenever β = Bg/2, ε = 1/(2Bg^`).
Algorithm 9.1: Gadget Decomposition of a TLWE Sample

Input : TLWE Sample (a, b) = ((a1, a2, . . . , ak), b = a_{k+1}) ∈ T_N [X]^k × T_N [X]
Data: h = h(k, Bg, `)
Result: u = (u_{1,1}, . . . , u_{k+1,`}) ∈ Z_N [X]^{(k+1)`}

For each a_i we are given the unique coefficients a_{i,j} ∈ T such that a_i = ∑_{j=0}^{N−1} a_{i,j} X^j. Find the
decomposition of each a_{i,j} by rounding it to the closest multiple of 1/Bg^`, such that a_{i,j} ≈ ∑_{p=1}^{`} a_{i,j,p} · 1/Bg^p,
under the constraint a_{i,j,p} ∈ Z ∩ [−Bg/2, Bg/2).
for i = 1 to k + 1 do
    for p = 1 to ` do
        u_{i,p} = ∑_{j=0}^{N−1} a_{i,j,p} · X^j ∈ Z_N [X], with |a_{i,j,p}| ≤ Bg/2.
return u := (u_{i,p})_{1≤i≤k+1, 1≤p≤`}
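A coefficient-level sketch of this rounding step, for a single torus element represented as an integer mod Q (all names and the Torus32-style representation are our own choices), shows the signed base-Bg decomposition and its approximation quality:

```python
def gadget_decompose(x, Bg=2 ** 8, ell=2, Q=2 ** 32):
    """Signed base-Bg decomposition of a torus element x (an integer mod Q):
    returns digits (u_1, ..., u_ell), each in [-Bg/2, Bg/2), such that
    sum_p u_p / Bg^p approximates x/Q up to 1/(2 * Bg**ell) on the torus."""
    # First round x to the nearest multiple of Q / Bg^ell.
    scaled = round(x * Bg ** ell / Q)
    digits = []
    for _ in range(ell):
        d = scaled % Bg
        if d >= Bg // 2:          # recenter the digit into [-Bg/2, Bg/2)
            d -= Bg
        digits.append(d)
        scaled = (scaled - d) // Bg
    return digits[::-1]           # most significant digit (1/Bg) first

Q, Bg, ell = 2 ** 32, 2 ** 8, 2
x = 123456789
u = gadget_decompose(x, Bg, ell, Q)
approx = sum(d / Bg ** (p + 1) for p, d in enumerate(u)) % 1.0
assert abs(approx - x / Q) <= 1 / (2 * Bg ** ell)
assert all(-Bg // 2 <= d < Bg // 2 for d in u)
```

Applying this per coefficient of each a_i yields exactly the short vector u of Algorithm 9.1.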
Definition 58 (TGSW Sample). Let N, α ≥ 0 denote TLWE parameters and s ←$ B_N [X]^k a secret
key. For `, k ≥ 1 and gadget h (cf. Definition 56), a matrix C is a TGSW Sample of a message µ ∈ Z_N [X]/h⊥
if it can be decomposed into a sum C = Z + µ · h, where the rows z_i of Z = (z_i)_{1≤i≤(k+1)`} are homogeneous
TLWE samples under s with parameter α. The expectation ϕ_s(C) ∈ T_N [X]^{(k+1)`} and the variance are defined component-wise.
Finally, this allows us to encode a message bit µ ∈ B as a TGSW Sample C_µ, the sum of a matrix
whose (k + 1)` rows are homogeneous samples TLWE_s(0) and the µ-multiple of the gadget matrix h of
Definition 56:

C_µ = (TLWE_s(0), . . . , TLWE_s(0))^T + µ · h ∈ T_N [X]^{`(k+1)×(k+1)}. (9.2)
The noise of c′ encrypting µ in Algorithm 9.2 is, moreover, independent of the input c encrypting µ.
9.2.4 Homomorphic Ciphertext Addition and Multiplication
With definitions for TLWE Sample and TGSW Sample we can define arithmetic operations addition
and multiplication of ciphertexts.
Algorithm 9.2: Bootstrapping Algorithm [Combining Algorithms 4 & 9 of [CGGI17]]

Input : c = (a, b) ∈ TLWE_s(µ/2), an encryption of an encoded message µ ∈ B
Data: bsk = BK_{s→s′}, ksk = KS_{s′→s}, and an arbitrary element µ1 ∈ T, e. g. µ1 := 1/4
Result: TLWE sample c′ = TLWE_s(µ · µ1) = TLWE_s(0) if µ = 0, and TLWE_s(µ1) else

Let µ′ = µ1/2 be the default state to initialize the testVector
Set b′ := ⌊2Nb⌉, a′_i := ⌊2Na_i⌉ ∈ [−N, N], 1 ≤ i ≤ n, with a = (a_i)_i
Set t(X) := testVector = (1 + X + · · · + X^{N−1}) X^{2N/4} · µ′ ∈ T_N [X]
ACC ← (X^{b′} · (~0, t)) ∈ T_N [X]^{k+1}   ▷ Rotation of the (trivial encryption of the) testVector by b′
for i = 1 to n do
    ACC ← [(X^{−a′_i} − 1) · bsk_i] ACC   ▷ Rotation by differences to fully BlindRotate by s
Let µ̄ := SampleExtract(ACC)   ▷ SampleExtract extracts the coefficient of X^0
return c′ ← KeySwitch_ksk((~0, µ′) + µ̄)   ▷ KeySwitch to a low-noise encryption under s
Given a secret key s and two ciphertexts c1 = (a1, b1) ← Enc_s(µ1), c2 = (a2, b2) ← Enc_s(µ2) ∈
TLWE, for a constant w ∈ Z, we require that

Dec_s(c3) = Dec_s((a1 + w · a2, b1 + w · b2)) = µ1 + w · µ2, (9.3)

with overwhelming probability. This is guaranteed as long as µ1 + w · µ2 ∈ [−B, B]. Given a secret
key s and two ciphertexts c1 = (a1, b1) ← Enc_s(µ1), c2 = (a2, b2) ← Enc_s(µ2) ∈ lwe_{n,q,s,χ}, we require
that their sum c3 decrypts to Dec_s(c3) = Dec_s((a1 + a2, b1 + b2)) = µ1 + µ2 ∈ B with overwhelming
probability, see Section 9.6.1.1. Correctness is guaranteed as long as the noise growth can be bounded:
‖b1 + b2 − (a1 + a2) · s‖∞ = ‖e1 + e2‖∞ < q/4.
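The linear-combination property (9.3) can be checked with a toy integer-LWE instantiation (insecure illustration parameters; all names and sizes are our own choices):

```python
import random

Q, n, B = 2 ** 32, 16, 8          # toy modulus, dimension, plaintext bound

def encrypt(s, m, noise=2 ** 16):
    a = [random.randrange(Q) for _ in range(n)]
    e = random.randrange(-noise, noise)
    b = (sum(x * y for x, y in zip(a, s)) + e + m * (Q // (4 * B))) % Q
    return a, b

def decrypt(s, c):
    a, b = c
    phase = (b - sum(x * y for x, y in zip(a, s))) % Q
    m = round(phase / (Q // (4 * B))) % (4 * B)
    return m - 4 * B if m >= 2 * B else m

def add_scaled(c1, c2, w):
    """Homomorphic c1 + w*c2, computed coordinate-wise on ciphertexts."""
    (a1, b1), (a2, b2) = c1, c2
    return [(x + w * y) % Q for x, y in zip(a1, a2)], (b1 + w * b2) % Q

random.seed(2)
s = [random.randrange(2) for _ in range(n)]
m1, m2, w = 3, -2, 2
c3 = add_scaled(encrypt(s, m1), encrypt(s, m2), w)
assert decrypt(s, c3) == m1 + w * m2   # valid while |m1 + w*m2| stays in bounds
```

Note that the noise terms add up as well, so only a bounded number of such combinations can precede the next bootstrapping.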
Furthermore, we need two operations which allow multiplications in the scheme and which we
re-arranged for more performant use in main result 2.
Definition 59 (External TLWE Sample product with TGSW Sample). Given a TGSW Sample as a
matrix C ∈ T_N [X]^{`(k+1)×(k+1)} and a TLWE Sample as a vector c, their external product ⊡ is defined by

⊡ : TGSW × TLWE → TLWE, (C, c) 7→ C ⊡ c = Dec_{h,β,ε}(c) · C,

a vector-matrix multiplication with u = Dec_{h,β,ε}(c) ∈ Z_N [X]^{(k+1)`} output by Algorithm 9.1.

This decomposition’s favorable noise propagation formula, encrypting µ_C · µ_c with messages µ_C resp.
µ_c encoded by C resp. c, yields a valid TLWE Sample if we can bound [CGGI17, Theorem 2.4]:

V[Err(C ⊡ c)] ≤ (k + 1)`Nβ² · V[Err(C)] + ‖µ_C‖²₂ · (1 + kN)ε² + ‖µ_C‖²₂ · V[Err(c)] ≤ 1/4.
Definition 60 (Inner TGSW product). With C′ = (c_i)_{1≤i≤(k+1)`} denoting the rows of C′, we define

⊠ : TGSW × TGSW → TGSW, (C, C′) 7→ C ⊠ C′ := (C ⊡ c1, C ⊡ c2, . . . , C ⊡ c_{(k+1)`})^T.

Similarly, the decomposition can be used for computing the multiplication of two TGSW Samples
encrypting µ_C, µ_{C′} to obtain µ_C · µ_{C′} as a vector of external products with noise variance:

V[Err(C ⊠ C′)] ≤ (k + 1)`Nβ² · V[Err(C)] + ‖µ_C‖²₂ · (1 + kN)ε² + ‖µ_C‖²₂ · V[Err(C′)].
CHAPTER 9. FHE–DINN
Replacing the slower inner product of Definition 60 with the external product of Definition 59 whenever the noise constraints allow, i.e. working with TLWE instead of TGSW, is key for a practical deployment with fast speeds.
9.3 Artificial Intelligence, Machine Learning, Deep Learning
At the core of artificial intelligence, or more precisely, of modern machine-learning research, Deep Learning (DL) approximates functions as compositions of simpler non-linear functions. Limitations of earlier models led to looking into more abstract, deep representations, with [Has86] showing theoretic lower bounds on Boolean circuit-depths.1 On the practical side, Deep Neural Networks (deep NNs) had a revolutionary impact in computer vision [KSH12].
One underlying heuristic assumption is in fact that the data under close scrutiny is distributed as if generated by a hierarchical composition of features, covering only a fraction of all possible distributions. Choosing one machine-learning algorithm over another and fixing hyperparameters (cf. Section 9.3.4), we are implicitly making similar structural assumptions.
Empirically, this machine-learning abstraction corresponds to how humans naturally approach (image) classification tasks. Figure 9.2 situates deep NNs within common categorizations. Next, we look at the main task, privacy-preserving inference of neural networks.
Fig. 9.2: Taxonomy of Deep Learning within Artificial Intelligence: Artificial Intelligence (AI), e.g. (un-)supervised learning tasks, encompasses Machine Learning (ML), e.g. the Multilayer Perceptron (MLP), which in turn encompasses Deep Learning (DL), e.g. deep neural networks, image recognition, . . .
9.3.1 Task: Homomorphic Evaluation of Neural Networks
An artificial neural network is a computing system from the field of machine learning inspired by
biological brains. Defining them rigorously is tricky as a satisfying formal treatment is only partially
available in the literature, and abbreviations are often used inconsistently. We follow [GBC16] to
introduce them tailored to our needs, so this aggregation serves as a synopsis of the vast material
available.
Definition 61 (Artificial Neural Networks (ANN)). Formally, any model M, written as a (non-)linear weighted composition of functions of depth 1 ≤ d ∈ N, is an artificial neural network

M : R^{n_I} → R^{n_O}, x ↦ y = ϕ(Σ_{ℓ=1}^{d} w_ℓ g_ℓ(x)),
1This research direction is closely related to fundamental questions of theoretical complexity theory, Section 3.2.
(probabilistically) mapping high-dimensional, real-valued inputs x = (x_1, x_2, . . . , x_{n_I}) ∈ R^{n_I}, n_I ∈ N, with weights w_ℓ ∈ R, local transformations g_ℓ : R → R, and (sigmoidal) activation functions ϕ : R → R, to outputs y = (y_1, y_2, . . . , y_{n_O}) ∈ R^{n_O}, n_O ∈ N, modeling a (randomized) classification process.
Topologically, an ANN, also referred to as deep neural network, can be represented by a dependency
graph between input variables with nodes specifying the transformation functions, and weighted edges
describing their connectivity arranged in logical layers.
Definition 62 (Feed-Forward Neural Networks (NN)). A feed-forward neural network (NN) is an
ANN, whose graph-representation is a directed acyclic graph, with nodes composed of artificial neurons
that are arranged in several independent layers and connected by edges associated to weight values.
The growing popularity of feed-forward neural networks is due to the fact that they have been used to solve typical machine-learning tasks, such as the important field of high-dimensional data classification, with exceptionally good performance where standard algorithms struggled. The analogy with a (human) brain is given by the fact that neural networks learn: exposed to sufficiently many well-distributed training samples, they are able to encode complex relations between inputs and outputs. Since learning essentially is mathematical optimization, neural networks that have learned to robustly match input-output pairs minimize some cost function (cf. Section 9.3.3), identifying or modeling the sample distribution over x in terms of estimation theory.
It is no surprise that there exists a large variety of layers used for modeling different kinds of data or aspects thereof. Each neuron of a fully-connected, dense layer (a strongly-connected component in the graph) accepts, say, n_I real-valued inputs x = (x_1, . . . , x_{n_I}), performs the following two computations with the assigned weights, and ultimately feeds the result forward to, say, n_O output neurons.
Definition 63 (Artificial Neuron). An artificial neuron g : R^{n_I} → R is a node in the graph-representation of a neural network, with associated weights w_i and bias β ∈ R, that computes the following:

1. y = Σ_{i=1}^{n_I} w_i x_i + β ∈ R, a weighted sum of the inputs, off-set by the bias of the neuron;
2. a non-linear function ϕ, the activation function, is applied, and g(x) := ϕ(y) is returned.
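Definition 63 can be sketched in a few lines of plain Python; the function name `neuron` and the default choice of tanh as an exemplary activation ϕ are our own, illustrative assumptions:

```python
import math

def neuron(x, w, beta, phi=math.tanh):
    """Artificial neuron per Definition 63."""
    # 1. weighted sum of the inputs, off-set by the bias beta
    y = sum(wi * xi for wi, xi in zip(w, x)) + beta
    # 2. apply the non-linear activation function phi
    return phi(y)
```

For example, `neuron([1.0, -2.0], [0.5, 0.25], 0.0)` computes ϕ(0.5 · 1 − 0.25 · 2 + 0).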
The following is an adapted introduction to machine learning [GBC16].
• A data-set is a collection of N high-dimensional pairs D_data = {(x^(i), y^(i)) : x^(i) ∈ {0, 1}^{d·(p×p)}, y^(i) ∈ {0, 1}^L}_{0≤i≤N}, with p×p images x of depth d, and assigned labels y from a set of categories.

• A representative amount m < N of training examples, D_training = {(x^(1), y^(1)), . . . , (x^(m), y^(m))}, is given to the model M in the (supervised) learning phase. Ideally, D_training comprises only independently and identically distributed sample points, which M can abstract from.

• The model is then used to infer labels for any image x in the test-set D_test = D_data \ D_training. Classification correctness can be checked against the according labels, and the accuracy of the model is the ratio of correct ones among the test-set: accuracy = #correct/|D_test| ≤ 1.

• Formally, M = argmin_f E_{x,y∼D_data} ‖y − f(x)‖ is the model encoding expected data-points D_data.
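The accuracy ratio from the third bullet is a one-liner; the names `accuracy`, `model`, and `test_set` below are our own, hypothetical choices:

```python
def accuracy(model, test_set):
    """Ratio of correctly classified samples among the test-set."""
    correct = sum(1 for x, y in test_set if model(x) == y)
    return correct / len(test_set)
```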
9.3.2 The MNIST Handwritten Digit Database
Elder statesman of deep learning and Turing award winner Yann LeCun is one of the creators of a widely used data-set.
The Modified National Institute of Standards and Technology database (MNIST) [LBBH98] serves
as benchmark when comparing neural networks and consists of representative images of handwritten
digits provided by more than 500 different writers. The MNIST database contains 60 000 training and 10 000 testing images, written largely by high-school students and meticulously aligned in post-processing. The format of the images is 28 × 28 pixels, and each pixel value represents an 8-bit gray level. Moreover, each image x is labeled with the digit y it is intended to depict.
A typical neural network for the MNIST data-set has a 28 · 28 = 784-neuron input layer (one per pixel), an arbitrary number of hidden layers with an arbitrary number of neurons per layer, and finally 10 output nodes (one per possible label 0, 1, . . . , 9). The output values of the NN can be interpreted as classification scores assigned to each digit, with the one achieving the highest score being the prediction M(x).
Over the years, the MNIST data-set has been a typical benchmark for various classifiers from
machine learning, and many approaches have been applied: linear classifiers, principal component
analysis, support vector machines, neural networks, convolutional neural networks, to name a few. For
example, architectures for neural networks which compare well with FHE–DiNN's achieved accuracy of over 96.3% obtain more than 97% correct classifications. Recent works even surpass 99% classification accuracy [CMS12], as the overview table in [LCB98] shows. We remark that a neuron's output can be written as an activated inner product

ϕ(⟨w, x⟩) = ϕ(Σ_{i=0}^{n_I} w_i x_i),

if one extends the inputs and the neuron's weight vector by setting w ← (β, w_1, . . . , w_{n_I}) and x ← (1, x_1, . . . , x_{n_I}), see Figure 9.3. The neurons of a neural network are organized in successive layers, which are named according to their activation function. In practice, NNs are usually composed of layers of various types, such as fully-connected (every neuron of the layer takes all incoming signals as inputs), convolutional (applying a convolution to its input), and pooling, which will be discussed in Section 9.7.8.1.
Fig. 9.3: Neuron computing an activated inner-product in FHE–DiNN: inputs x_1, x_2, . . . are weighted by w_1, w_2, . . ., summed, and activated, y = ϕ(Σ_i w_i x_i), with w_i, x_i, y ∈ [−B, . . . , B].
Figure 9.4 depicts a directed acyclic graph-representation of a model or perceptron M as a left-to-right sequence of dense layers; in the following we show, for more generic, adaptive feed-forward NNs, how to support the homomorphic evaluation of intermediate results. One input layer I, e. g. one (encrypted gray-scale) pixel per neuron, is followed by d hidden layers H_ℓ of varying size, e. g. computing (a bootstrapped, low-noise encryption of) an activated weighted sum per neuron, and one output layer O, e. g. representing a classification (ciphertext). The weighted edges of the matrix W^(1) of the first layer (ℓ = 1), of the intermediate layers (W^(ℓ))_ℓ, and of the last layer W^(d+1) are applied to an input vector and fed forward. Counting the total number of neurons in a dense, feed-forward network, we have the upper bound n_I + Σ_{ℓ=1}^{d} n_ℓ + n_O < n_I + d · n_H + n_O, where the number of inputs is n_I, the maximal hidden-neuron count is n_H = max_{1≤ℓ≤d} n_ℓ, and there are n_O output neurons in a total of d + 2 network layers.
We remark that neural networks could in principle be recurrent systems, as opposed to the purely feed-forward ones, where each neuron is only evaluated once. Beginning with works by Cybenko [Cyb89], Hornik's general theorem, stated in Theorem 64, shows the universal capabilities of neural networks: feed-forward networks with one hidden layer containing a finite number of neurons are universal approximators of a given smooth function, with constant Lipschitz bound.

Fig. 9.4: A generic, dense feed-forward neural network of arbitrary depth d ∈ N: the input layer I with nodes x_1, . . . , x_{n_I} is wired by the weight matrix W^(1) to d hidden layers H_ℓ with nodes f^(ℓ)_i, and finally by W^(d+1) to the output layer O with nodes o_1, . . . , o_{n_O}.
Theorem 64 (Universal Approximation Theorem [Hor91, Theorem 4]). Let k, m ∈ N. Any (non-constant) function Ψ : R^k → R whose m-th derivative is Lipschitz continuous can be used as activation function to approximate any function f with bounded derivatives up to m-th order by a neural network with only one hidden layer and sufficiently many available neurons.
The number of neurons in that layer can grow exponentially, a fact that can be mitigated using deep neural networks with several layers of non-linearities, which allow to extract increasingly complex features of the inputs. Hierarchical recognition particularly applies to neural networks for image recognition in computer vision, where the first layer is thought of as detecting edges, then collections of edges framing objects, and finally more complex structures being put together.
The FHE–DiNN framework presented in Section 9.4 is able to evaluate such NNs of arbitrary depth d, comprising successive layers.
9.3.3 Cost Functions Measuring Neural Networks' Performance

Cost- or loss-functions are defined for samples (x, y) of the data-set D_training as a real-valued measure of deviation between the ground-truth y, provided as labels during supervised training, and the actual output ŷ of the perceptron; ŷ is one of the output categories o_i, 1 ≤ i ≤ n_O.

L_0(ŷ, y) = 0 if ŷ = y, and 1 else,    (9.4)
L_1(ŷ, y) = |ŷ − y|,    (9.5)
L_2(ŷ, y) = (ŷ − y)²,    (9.6)
L_log(ŷ, y) = −y log ŷ.    (9.7)

If a sample matches its label, L_0 is 0, else 1, disregarding how close the mismatch was. The Mean Squared Error (MSE) can directly be derived from L_2(ŷ, y) by scaling with the reciprocal training-set size 1/m. Computing the loss of all m samples in the training data-set D_training, the outputs can then e. g. be summed, or analyzed coordinate-wise depending on the application, as L = Σ_{i=1}^{m} L_•(ŷ_i, y_i).
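The loss functions (9.4)–(9.7) and the derived MSE translate directly into code; the lowercase function names below are our own:

```python
import math

def l0(y_hat, y):   return 0 if y_hat == y else 1   # 0-1 loss, Eq. (9.4)
def l1(y_hat, y):   return abs(y_hat - y)           # absolute loss, Eq. (9.5)
def l2(y_hat, y):   return (y_hat - y) ** 2         # squared loss, Eq. (9.6)
def llog(y_hat, y): return -y * math.log(y_hat)     # log loss, Eq. (9.7)

def mse(preds, labels):
    """Mean Squared Error: L2 summed over the m samples, scaled by 1/m."""
    m = len(labels)
    return sum(l2(p, t) for p, t in zip(preds, labels)) / m
```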
9.3.4 Hyperparameters of a Model
Hyperparameters, defining the architecture and topology of a neural network, consist of:
• learning rate α and convergence criterion,
• neural network architecture,
• number of hidden neurons per hidden layer,
• choice of activation function, and finally
• learning parameters W^(ℓ), 1 ≤ ℓ ≤ d + 1.
After the hyperparameters have been fixed, the learning phase tries to find suitable model parameters. On a high level, training the neural network means assigning real values to the edges of a graph that links the inputs and connects them, in some way, with all principally possible outputs; we deal with this in the next section.
9.3.5 Training: The Learning Phase of a Model
In the learning phase, the loss function can be viewed as depending on the weight parameters w, β, which are adjusted for sufficiently good global prediction across the entire training data set. In order to approach optimal parameters, a gradient-descent algorithm iteratively updates the network parameters so that the loss function approaches a (local) minimum. In an experimental implementation of the devised training algorithm, we stepped in the direction of steepest descent, with the learning rate α adjusting the speed:

w ← w − α · ∂L/∂w,    β ← β − α · ∂L/∂β.
A building block of deep learning is to observe the propagation across the entire network during training. Given a pair (x, y), we can minimize the loss by observing intermediate values z and reporting differences (partial derivatives) to the learning algorithm.

Fig. 9.5: Sample forward- and back-propagation through a deep NN, measuring the loss L(ŷ, y): the input x = x^(0) is transformed layer by layer via z^(ℓ) = W^(ℓ)x^(ℓ−1) and x^(ℓ) = f^(ℓ)(z^(ℓ)) up to x^(d+1) = ŷ, while the differences ∆x^(ℓ), ∆W^(ℓ) flow backwards.

The weight matrix of the ℓ-th fully-connected layer is an n^(ℓ) × n^(ℓ−1)-dimensional matrix describing the linear part of the transformation of x^(ℓ−1) to x^(ℓ). The forward-propagation algorithm uses the input x^(ℓ−1) and the weight parameters W^(ℓ) to compute the intermediate outputs z^(ℓ), x^(ℓ). The backward-propagation algorithm outputs the necessary differences ∆x^(ℓ−1) and ∆W^(ℓ) = (∆x^(ℓ) ⊙ f^(ℓ)′(z^(ℓ))) · (x^(ℓ−1))ᵀ for the parameter update rule W^(ℓ) ← W^(ℓ) − α · ∆W^(ℓ). [GBC16, Algorithm 6.2] gives details on how forward- and back-propagation approach the optimal model M* = argmin_M E_{(x,y)∼D_data} ‖y − M(x)‖ for predicting the outcome, thus minimizing the term (cf. Section 9.3.3).
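For the smallest possible network, a single linear neuron under the L2 loss, forward pass, backward pass, and the update rule fit in one function. This is a minimal sketch under our own naming, not the training code used in the thesis:

```python
def sgd_step(w, beta, x, y, alpha=0.1):
    """One gradient-descent update for a linear neuron under the L2 loss."""
    # forward pass: y_hat = <w, x> + beta, loss L = (y_hat - y)^2
    y_hat = sum(wi * xi for wi, xi in zip(w, x)) + beta
    # backward pass: dL/dw_i = 2*(y_hat - y)*x_i, dL/dbeta = 2*(y_hat - y)
    delta = 2.0 * (y_hat - y)
    w = [wi - alpha * delta * xi for wi, xi in zip(w, x)]
    beta = beta - alpha * delta
    return w, beta
```

Iterating `sgd_step` on a fixed sample drives the neuron's output toward the label, illustrating convergence to a (local) minimum of the loss.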
9.4 FHE–DiNN: Framework for Homomorphic Evaluation of Deep Neural Networks

The presented FHE–DiNN framework, with implementations in C++ for CPU execution and in CUDA for GPU execution, builds on top of the tools presented in Section 9.2.2.
In [BMMP18], the research goal of the project was to give Cloud-based service providers the possibility to operate neural networks without forcing their customers to reveal sensitive data. Implementing our solution to the challenging task of encrypted inference of NNs, particular attention was given to the inevitable size-related issues when practically dealing with real-world use cases. In detail, as deep learning has become most popular in image recognition, the field provided prominent research topics in the machine-learning community, which, in a nutshell, are based on using neural networks composed of a high number of stacked layers (cf. Figure 9.4), in order to obtain a level of abstraction and an ability to extract relevant features from the input data that would be problematic to achieve with shallow networks. On the other hand, this poses serious problems for homomorphic evaluation, which we tackled by deploying cryptologic techniques. Roughly speaking, in previous solutions the complexity of the system for homomorphically evaluating some operations is connected to the number of computations required, with non-linear operations constituting the biggest hurdle. This directly impacts the overall efficiency of previous systems, as an accurate estimation of the complexity of the evaluation has to be known in advance to set appropriate parameters. Previously, higher complexity translated to prohibitively cumbersome solutions.
FHE–DiNN eliminates this issue by providing a depth-invariant solution to the problem of neural
network homomorphic evaluation, meaning that the parameters for the homomorphic system can be
set in advance. As they do not depend on the depth of the neural network that is to be evaluated
it is possible to transparently up-scale the back-bone in the Cloud. This opens the way to privacy-preserving evaluation of deep neural networks, with a level of efficiency that does not degrade with the depth of the network itself, yet naturally scales linearly with the number of neurons. The way
this goal was achieved is through a clever implementation of a procedure called bootstrapping, which
allows a party who only knows public parameters to refresh an intermediate ciphertext. The level of
noise contained therein is lowered, allowing for further homomorphic computations without potentially
losing correctness. To do so, FHE–DiNN takes advantage of a new and extremely fast bootstrapping
procedure introduced by [CGGI16] which allows to apply typically costly, non-linear functions for free
during the necessary bootstrapping operations. This feature is used to apply an activation function
to the multisum (cf. Definition 66), thus transforming a noisy encryption of a value x into a refreshed
encryption of f(x), for some programmed activation function f . By using bootstrapping, and ensuring
that the output of any layer lies in the same ciphertext-space as its input, FHE–DiNN achieves scale-
invariance, which is the key for an efficient evaluation of arbitrarily deep neural networks, allowing to
continue computations after each layer.
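The interplay of noise-accumulating linear operations and noise-refreshing, sign-extracting bootstrapping can be illustrated with a deliberately simplified toy model. Here a "ciphertext" is just a pair (phase, noise level), there is no actual encryption, and the names `multisum` and `bootstrap_sign` are our own:

```python
FRESH_NOISE = 0.01  # noise level of a freshly bootstrapped toy "ciphertext"

def multisum(cts, weights):
    """Homomorphic weighted sum: the payload adds up, and so does the noise."""
    phase = sum(w * c[0] for w, c in zip(weights, cts))
    noise = sum(abs(w) * c[1] for w, c in zip(weights, cts))
    return (phase, noise)

def bootstrap_sign(ct):
    """Refresh to a low-noise state while applying sign 'for free'."""
    phase, _ = ct
    return (1 if phase >= 0 else -1, FRESH_NOISE)
```

However deep the chain of alternating `multisum` and `bootstrap_sign` calls, the noise after each bootstrapping is back at the fresh level, mimicking the depth-invariance argued above.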
9.4.1 Beyond the MNIST dataset: Medical Applications of Image Recognition

One equally thrilling and thriving application avenue is that of medical imaging, when Dr. Data tries to give iatric consultation remotely. A data set comprising macula-centered retinal images [Det18], for example, was labeled by 54 experts (ophthalmologists) using an internationally standardized Clinical Diabetic Retinopathy scale to classify them as indicating retinopathy conditions of severity '0' (none), '1' (mild), '2' (moderate), '3' (severe), or '4' (proliferative). Instead of training a NN on the MNIST data-set for classifying digits, we can exchange the data for the retinal fundus. A model may then be able to recognize symptoms and infer the likelihood of diabetic retinopathy, possibly with super-human performance, which might lead to even better recognition rates than trained medical doctors achieve in the not too distant future.
As bio-medical feature-checks are becoming more pervasive, for instance at airports, our retinal images become another highly sensitive part of our private sphere, exposed to the all-seeing sensors of big-data systems with never-forgetting storage capabilities. Check-ups for the likelihood of a disease without sacrificing privacy, e. g. at retinal scanners, might become a future requirement.
9.4.2 Training Neural Networks: Back-Propagation and Stochastic Gradient Descent
Back-propagation is the standard algorithm for training a neural network, which means finding a
combination of weights that approximates the mapping between inputs and outputs.
Next, we give a high-level overview of the training algorithm and refer the curious reader to specialized literature (e. g., [LBOM98]) for more details. The output calculated by the network shall be in accordance with the expected output associated to the input. Upon a randomized start, an error function ε (expectedOutput − networkOutput), depending on these quantities, is used to introduce a measure. Incrementally, one modifies the weights of the network in order to minimize the error. Mathematically, this means following the direction of the error function's gradient with respect to the weights. Calculating the gradient is seemingly the particularly tricky part. In theory, all training pairs (input, expectedOutput) ∈ D_training should be considered. Due to very large data sets this approach may be impractical, hence applying stochastic gradient descent (SGD), where the gradient is calculated by taking only a subset D ⊆ D_training of the full training set into account, is preferable. Large-scale machine-learning problems verified SGD's effectiveness in practice, also mitigating over-fitting, where the network learns the training samples by heart rather than abstracting the underlying structures and their relationships. In order to prevent over-fitting, regularization techniques model extra terms added to the error function, updating weights only by some parameterized fraction and keeping the absolute value of the weights small. Among many others, the techniques we used in an ad-hoc training for experiments with FHE–DiNN are described in [KH91].
In the specific case of DiNNs, the details of our training procedure are presented in Section 9.5.
9.5 Discretized Neural Networks: Training and Evaluation
In this section we formally define the term discretized neural network (DiNN) and explain how to train
and evaluate them.
First of all, the training phase of DiNNs differs in several ways from that of classical NNs, as being limited to discrete-valued activation functions makes calculating derivatives impossible. Fully homomorphic encryption schemes implementing the state of the art so far cannot support unconstrained operations on real-valued messages. Traditionally, neural networks also have real-valued weights, which motivates investigating alternative architectures suitable for homomorphic operations and discretizing the layers' inputs.
Definition 65 (Discretized Neural Network (DiNN)). A Discretized Neural Network (DiNN) is a feed-forward artificial neural network whose inputs are integer values in {−I, . . . , I} and whose weights are integer values in {−W, . . . , W}, for some I, W ∈ N. For every neuron of the network, the activation function maps the inner product between the incoming inputs vector and the corresponding weights to integer values in {−I, . . . , I}.
Before generalizing, we first choose a restricted input space, {−1, 1}, and sign (·) as a discrete activation function for hidden layers, defined by:

sign (x) = −1 if x < 0, and +1 if x ≥ 0.    (9.8)
This unusual activation function has the same range {−1, 1} as the network's (binarized) input, which allows us to maintain the same semantics across different network layers. Generally, when performing homomorphic evaluations over encrypted input, choosing the smallest possible message space is inspired by an increased efficiency of the overall evaluation while ensuring correctness.
9.5.1 Discretizing and Evaluation of NNs
In order to make neural-network inference FHE-friendly, a sufficient discretization of the network's weights is done by rounding to the closest multiple of a precision parameter τ, the tick size.
In the following, we use daggers (†) to denote weights that have been discretized through the formula
in Equation (9.9).
w† := processWeight (w, τ) = τ · ⌊w/τ⌉    (9.9)
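Equation (9.9) amounts to one line of Python; the function name `process_weight` mirrors the notation above, and we rely on Python's built-in `round` (note it uses round-half-to-even on exact ties):

```python
def process_weight(w, tau):
    """Discretize w to the closest multiple of the tick size tau, Eq. (9.9)."""
    return tau * round(w / tau)
```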
The learning algorithm deployed ad hoc for the weights and biases of a DiNN is a customized back-propagation procedure with adapted gradient descent, checking for non-existent derivatives and substituting the values, cf. Section 9.5.2. Although more elaborate manipulation of the training routines, with hindsight to outputting a DiNN, seems a promising solution for creating FHE-friendly NNs, achieving the best network accuracies was not the scope of our research.
Converting an NN to a DiNN implies a loss of precision, which typically leads to a worse classification
accuracy. However, experiments that we conducted by applying our generic FHE–DiNN framework
to the MNIST dataset [LBBH98], detailed in Section 9.9, showed only limited accuracy drops, as
intermediate results seem to level out when using approximate homomorphic arithmetic.
The parameter τ from Equation (9.9) sensitively influences the message space that our encryption scheme must support for correctness, which directly impacts the efficiency of FHE–DiNN. The homomorphic evaluation of a neural network is correct if the decryption and assignment of its output scores to labels on encrypted inputs x corresponds to the scores of an evaluation on clear inputs x during an alternative execution.
Thus, we need to bound 〈w†,x〉 for every neuron, where w† and x are the discretized weights and
the inputs associated to the neuron, respectively, and define:
Definition 66 (Multisum). We refer to y := ⟨w†, x⟩ = Σ_i w†_i x_i as the multisum of a DiNN's neuron.
Evaluating a single neuron means computing its multisum followed by an activation, y = sign(⟨w†, x⟩), where the w†_i are the processed weights associated to the incoming wires and the x_i are the corresponding input values, depicted in Figure 9.3. Then, evaluating a whole discretized network essentially amounts to
processing the weights obtained in the training phase and propagating the inputs through all the layers according to Algorithm 9.3.

Algorithm 9.3: FHE–DiNN evaluation of a depth-d feed-forward network M.
  Input: x ∈ [−I, I]^{n_I} the input wires; the network M's topology and weights
  Data: τ for pre-processing the weight matrices
  Result: M(x), the model prediction
  for ℓ = 1 to d + 1 do
      W†_ℓ ← processWeight (w_{ℓ,i}, τ) = τ · ⌊w_{ℓ,i}/τ⌉, 1 ≤ i ≤ n_ℓ    ▷ Pre-processing, possibly off-line.
  for ℓ = 1 to d do
      x ← sign(W†_ℓ · x)    ▷ Update the current layer's input to the activated multisum.
  return W†_O · x    ▷ Final multiplication with the processed weights associated to the output layer.

The neural network classifies inputs by effectively sorting the most probable classes according to their obtained score. The monotonic nature of a typical activation function would not have an effect on the order. We remark that we do not apply any activation function to the neurons in the output layer, as we expect a consecutive, user-side decryption.
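A plaintext reference of Algorithm 9.3 (no encryption involved) can be sketched as follows; the helper names `matvec` and `evaluate_dinn` are our own, and we return the index of the highest score as the predicted class:

```python
def sign(v):
    return 1 if v >= 0 else -1

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def evaluate_dinn(x, hidden_layers, output_layer, tau=1):
    """Plaintext sketch of Algorithm 9.3: activated multisums through the
    hidden layers, plain multisum in the output layer, argmax to classify."""
    # pre-processing: round every weight to the closest multiple of tau
    disc = lambda W: [[tau * round(w / tau) for w in row] for row in W]
    for W in hidden_layers:
        x = [sign(v) for v in matvec(disc(W), x)]   # activated multisum
    scores = matvec(disc(output_layer), x)          # no output activation
    return scores.index(max(scores))
```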
9.5.2 Training a DiNN
Choosing act (·) := sign (·) with binary range {±1} as activation function for the hidden layers was done for efficiency reasons. However, this implies issues in the gradient-descent algorithm during training, as the gradient of this function is zero almost everywhere.
During our training (using the powerful tools keras [C+15] and Tensorflow [AAB+15] within a home-brewed Python script) we experimented with replacing the true gradient d/dx sign (x) with the rectangular function as in [CHS+16],

Π (x) = 1 if |x| ≤ 1, and 0 otherwise.    (9.10)
In order to account for small variations in the output of the last layer, we wanted to use a continuous activation function. We chose the linear approximation act (x) := Tensorflow.nnet.hard_sigmoid (x) = max(0, min(1, x/5 + 1/2)) as a continuous substitute for the discontinuous sign, which we denote as act (·). Other continuous or sigmoidal examples, such as arctan (·) or tanh (·), are regularly used in the machine-learning literature. In Figure 9.1 we plot the most relevant activation functions.
The training algorithm of our network performs a stochastic gradient descent (SGD) on a parameterizable subset of the training set, called minibatch, adapted to our binarization. Since we require the weights for the evaluation (using Algorithm 9.3) to be FHE-friendly, i. e. discrete values in a certain interval, we also discretize them during the forward-propagation phase. This gives a discrete approximation of the real-valued weights which minimizes the classification error.
We only sketch the main steps as depicted in Figure 9.5, and refer the reader to specialized literature (e. g., [LBOM98]) for more details. First, we initialize the weights to random values and partition the training set into sub-sets; then, for each such minibatch, we proceed with the following 6 steps:

1. Initialize the input wires x to the inputs, which we update through the layers;

2. For each but the last layer, feed-forward the input to the next layer by computing x := act(W†_ℓ · x), where W†_ℓ is the matrix that contains the processed weights associated to the ℓ-th layer;

3. Propagate the result to the output layer by computing y := W†_O · x, where W†_O is the matrix of processed weights associated to the output layer;

4. Calculate the error ε as a function of the difference between y and the expected label;

5. Calculate ∇_i := ∂ε/∂W_i for i ∈ {1, 2}, the error function's gradient with respect to the weights2;

6. Update the weights according to the formula W_i ← W_i − η (∇_i + ξW_i), where η is a parameter called learning rate and ξ is a parameter called regularization factor.
An indicator vector is constructed from the expected output in Item 4, such that a digit d ∈ [0, 9] is mapped to a vector in Z^10 whose entries are all 0, except the d-th, which is 1. This is the target the network actually tries to approximate.
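This one-hot target construction is a two-liner; the name `indicator` matches the text's terminology:

```python
def indicator(d, num_classes=10):
    """Map digit d to its target vector in Z^10: all 0 but the d-th entry."""
    v = [0] * num_classes
    v[d] = 1
    return v
```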
Training algorithms operate on real weights to achieve good results—seemingly a necessity [CB16].
2The derivative of the sign function is substituted by Equation (9.10).
9.6 Homomorphic Evaluation of a DiNN
We give a high-level description of our procedure to homomorphically evaluate a DiNN. With first thoughts on the scalability of a scheme evaluating an arbitrarily deep DiNN, with parameters that are independent of the number of layers, we chose TFHE, which performs a bootstrapping operation after every neuron in a hidden layer has been evaluated. This led to the choice of homomorphically extracting the sign of the result during bootstrapping. In order to compute the inner product between the encrypted inputs and the network's weights, we mind the bounds the multisum must not exceed to ensure correctness.
9.6.1 Evaluating the linear Component: The Multisum
In our FHE–DiNN framework, we assume that the weights of the network are available to the Cloud in clear. To compute the linear multisum from Definition 66, performing repeated homomorphic additions suffices, ensuring that the message space of the chosen encryption scheme is large enough to accommodate the largest possible intermediate result. The scheme's homomorphic capacity necessitates minding the noise level too, which might grow out of bounds and render the result undecipherable.
Extending the message space.
To correctly accommodate the multisum, we extend the TFHE scheme from [CGGI16] to allow a larger message space, an idea that already appeared in previous works [PW08, KTX08, ABDP15, ALS16].
Construction 67 (Extended LWE-based encryption scheme). Let B be a positive integer and let m ∈ {−B, . . . , B} be a message. Then we subdivide the torus into 2B + 1 slots, one for each possible message, and we encrypt and decrypt as follows:

Setup (κ): for a security parameter κ, fix n = n (κ), σ = σ (κ); return s ←$ T^n,
Enc (s, m): return (a, b), with a ←$ T^n, b = ⟨s, a⟩ + e + m/(2B + 1) encoding message m, where e ← χ_σ,
Dec (s, (a, b)): return ⌊(b − ⟨s, a⟩) · (2B + 1)⌉.

By scaling with 1/(2B + 1), input messages m ∈ {−B, . . . , B} are encoded to the center of their corresponding torus slot during encryption, which is reversed by scaling with 2B + 1 during decryption.
9.6.1.1 Correctness of homomorphically evaluating the multisum.
We remark that ciphertexts can be added and scaled by a known constant in the following way: let m1, m2 ∈ {−B, . . . , B} denote two messages and c1 = (a1, b1) ← Enc (s, m1), c2 = (a2, b2) ← Enc (s, m2) be encryptions under a secret key s. For any constant w ∈ Z, we have that

Dec (s, c1 + w · c2) = Dec (s, (a1 + w · a2, b1 + w · b2)) = m1 + w · m2

as long as (1) m1 + w · m2 ∈ {−B, . . . , B}, and (2) the noise did not outgrow the homomorphic capacity of the scheme.
The first condition is easily met by choosing B ≥ ‖M‖1 := max_w ‖w‖1, with ‖w‖1 := ∑_i |wi|, where the maximum is taken over all weight vectors w in the neural network at setup time.
Ensuring the first condition by careful parameterization prevents overflows that would cause sign
changes: in our example application of neural networks, values would jump from B to −B and vice
versa in a discontinuous way. By design, the sign activation function has a discontinuity around the
value 0. Interpreting the correctness of the classification procedure, the intuitive mitigation
is that the magnitude of a neuron’s output expresses the confidence that the network has in that
CHAPTER 9. FHE–DINN
specific value. In other terms, a sign change due to the neuron outputting +1 instead of −1 is less
serious than one overflowing from B to −B.
9.6.1.2 Fixing the noise
Increasing the message space has an impact on the parameter choice. Evaluating the multisum with
weights w means that, if the standard deviation of the initial noise is σ, then in the worst case the output
noise is as high as ‖w‖2 · σ (see Lemma 52). This can be compensated by adjusting the initial standard
deviation to be a factor max_w ‖w‖2 smaller than the one in [CGGI16]. Moreover, for correctness to
hold, we need the noise to remain smaller than half of a slot of the torus, and as we are splitting
the torus into 2B + 1 slices rather than 2, we need to further decrease the noise by a factor B. This
approach might trade off security: choosing smaller noise can compromise the scheme’s security, but we
mitigate this by increasing the dimension n of the LWE problem. Larger dimensions, however, induce
more rounding errors in the first part of the bootstrapping procedure. In conclusion, the best practical
approach seems to be to choose secure parameters first and then to tweak the parameter set towards
the highest accuracy.
9.6.2 Homomorphic Computation of the non-linear sign-Function
We take advantage of the flexibility of the bootstrapping technique introduced by Chillotti et al.
[CGGI16] in order to perform the sign extraction and the bootstrapping at the same time. This
only requires changing the testVector to program the values we want to recover after the bootstrapping.
The first step of the bootstrapping basically consists in mapping the torus T to an object that we will
refer to as the wheel. This wheel is split into 2N “ticks” that are associated to the possible values that
are encrypted in the bootstrapped ciphertext. Programming the testVector means choosing N of these
values; the remaining N are automatically set to the opposite values because of the anticyclic property
of TN [X]. In order to extract the sign, we thus want the top part of the wheel to be associated to the
value +1. The bottom part will then be automatically associated to −1.
Another interesting feature of the bootstrapping procedure of Chillotti et al. is that we can dynam-
ically change the message space during the bootstrapping. This allows us to set smaller parameters
and smartly use all the space in the torus to minimize the errors (by taking bigger slices).
From now on, we say that a bootstrapping is correct if, given a valid encryption of a message µ, its
output is a valid encryption of sign (µ) with overwhelming probability.
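The wheel metaphor can be made concrete with a small plaintext-level model (no encryption involved): round a torus value to one of 2N ticks, program the top half of the wheel with +1, and let the anticyclic property fix the bottom half to −1. The names and the value of N below are illustrative choices, not the TFHE interface.

```python
N = 512    # half the number of wheel ticks (the TLWE polynomial degree)

def to_tick(mu):
    # round a torus element mu in [0, 1) to the nearest of the 2N ticks
    return round((mu % 1.0) * 2 * N) % (2 * N)

test_vector = [1] * N    # program the top half of the wheel with +1

def wheel_value(tick):
    # the bottom half is forced to the negated values by the anticyclic
    # property of T_N[X]: the value at tick N+i is minus the value at tick i
    return test_vector[tick] if tick < N else -test_vector[tick - N]

def sign_bootstrap(mu):
    # plaintext model: torus values in [0, 1/2) count as positive (+1),
    # values in [1/2, 1) represent negatives (-1)
    return wheel_value(to_tick(mu))
```

In the real scheme the rotation amount is encrypted and the test vector is a polynomial, but the programmed-top-half/negated-bottom-half structure is the same.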
9.6.3 Scale-invariance
If the parameters are set correctly then, by using the two operations described above, we can homomor-
phically evaluate neural networks of any depth. In particular, the choice of parameters is independent of
the depth of the neural network. This result cannot be achieved with previous techniques of evaluation
relying on somewhat homomorphic evaluations of the network. In fact, they have to choose parameters
that accommodate for the whole computation, whereas our method only requires the parameters to
accommodate for the evaluation of a single neuron. The rest of the computation follows by induction.
More precisely, our choice of parameters only depends on bounds on the norms ‖·‖1 (resp. ‖·‖2) of the
input weights of a neuron. In the following, we denote these bounds by M1 (resp. M2).
Then, the scale-invariance is formally defined by the following theorem:
Main Result 2 (Scale-Invariance of FHE–DiNN’s Homomorphic Evaluation). For any DiNN with
sign-activation of arbitrary depth, let σ be a Gaussian parameter such that the noise of Bootstrap (bk, ksk, ·)
is sub-Gaussian with parameter σ. If bootstrapping yields a correct result on input ciphertexts, i. e. on an
element of a message space larger than 2M1 + 1 with sub-Gaussian noise parameter σ′ = σM2 for
M1, M2 ∈ N, then the result of the homomorphic evaluation of the DiNN is correct, too.
Proof. The proof is based on a simple induction argument on the structure of the neural network.
Initially, we remark that the correctness of the evaluation of the first layer on an encrypted input is
implied by the encryption scheme’s parameter choice. If the correctness of the input ciphertext cannot
be guaranteed, an initial bootstrapping operation can be performed in order to ensure this property.
Suppose the evaluation is correct for all neurons of the ℓ-th layer; then the correctness for all neurons
of the (ℓ + 1)-th layer follows from two previous results:
1. homomorphic evaluation of the multisum results in a valid encryption of the multisum, and
2. the result of the bootstrapping is a valid encryption of the sign of the multisum.
The first statement is implied by the choice of the message space since, according to Definition 66,
the multisum’s value is contained in [−M1, M1]; the expected value of the ciphertext output by the
homomorphic multisum is the actual multisum of the expected values of its input ciphertexts per layer.
The second one comes directly from the correctness of the bootstrapping algorithm, because the
homomorphic computation of the multisum on ciphertexts with sub-Gaussian noise of parameter σ
yields a ciphertext with sub-Gaussian noise of parameter at most σM2, according to Lemma 52. The
expected value of the ciphertext output by the bootstrapping algorithm is the actual sign of the expected
value of its input ciphertext in each layer.
Hence, the correctness of the encryption scheme ensures that the ciphertexts in the final DiNN layer
are valid encryptions of the scores, when running Algorithm 9.3.
We note that the output ciphertext’s noise distribution of the bootstrapping procedure is sub-
Gaussian; a bound on its Gaussian parameter is already provided in [CGGI16], together with how to
fix the parameters in order to ensure its correctness.
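The induction argument can be mirrored by a plaintext simulation that tracks the noise parameter through the layers: the multisum inflates it by at most a factor M2 (Lemma 52), and bootstrapping resets it to σ, so the bound after any number of layers is the same. The code below is a schematic model, not an FHE computation; SIGMA and the toy network are illustrative values of our own.

```python
import math

SIGMA = 1.0e-3    # illustrative sub-Gaussian parameter after a bootstrap

def sign(x):
    return 1 if x >= 0 else -1

def evaluate_dinn(layers, inputs):
    # layers: per layer, a list of weight vectors, one per neuron
    values, noise = list(inputs), SIGMA
    for layer in layers:
        M2 = max(math.sqrt(sum(w * w for w in ws)) for ws in layer)
        noise *= M2      # multisum growth: at most sigma * M2 (Lemma 52)
        values = [sign(sum(w * v for w, v in zip(ws, values))) for ws in layer]
        noise = SIGMA    # bootstrapping resets the noise parameter
    return values, noise
```

Whatever the depth, the returned noise parameter equals SIGMA, which is the plaintext shadow of the scale-invariance statement.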
9.7 Optimizations within FHE–DiNN over TFHE
In this section, we present several improvements that achieve better efficiency for the actual FHE–DiNN
implementation. We demonstrate the capability of the newly developed optimizations in detail using
the TFHE library in Section 9.9, but these techniques can also be applied in other FHE-based
applications.
9.7.1 Reducing Bandwidth: Packing Ciphertexts and FFT
One of the drawbacks of our evaluation process is that encrypting an individual value for each input
neuron yields a very large overall ciphertext, which is inconvenient from a user perspective, as a high
bandwidth requirement is the direct consequence. In order to mitigate this issue, we pack multiple values into one
ciphertext. We use the standard technique of encrypting a polynomial (using the TLWE scheme instead
of LWE) whose coefficients correspond to the different values we want to encrypt:
ct = TLWE.Encrypt (∑_i xi X^i),
where the xi’s represent the values of the input neurons to be encrypted3. This packing technique is
what made Ring-LWE an attractive variant of the standard LWE problem, as already presented
in [LPR10], and it is widely used in FHE applications to amortize the cost of operations [HS14, HS15].
3If the number of input neurons is bigger than the maximal degree N of the polynomials, we can pack the ciphertexts by groups of N, compute partial multisums with our technique, and aggregate them afterwards.
Then, we observe that for each neuron in the first hidden layer, we can compute the multisum with
coefficients wi by scaling the input TLWE ciphertext by ∑_i wi X^{−i}.
It follows directly from the distributivity of multiplication over addition that the constant
term of (∑_i xi X^i) · (∑_i wi X^{−i}) is the inner product ∑_i wi xi. We can obtain a private LWE encryption
of it by invoking the Extract-algorithm. To speed up this computation, we recall the definition and facts
about the Discrete Fourier Transform (DFT) as in [Bur12], follow [PTVF07, Chapter 12] for imple-
mentation aspects, and depict the data flow of the processing using its fast variant, the Fast Fourier
Transform (FFT), in Figure 9.6.
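The constant-term claim can be checked directly with integer coefficients in Z[X]/(X^N + 1), using X^{−i} = −X^{N−i}. The helper names below are our own, not part of any library.

```python
def negacyclic_mul(p, q):
    # multiply two coefficient lists in Z[X]/(X^N + 1)
    N = len(p)
    out = [0] * N
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j < N:
                out[i + j] += pi * qj
            else:
                out[i + j - N] -= pi * qj  # wrap-around picks up a sign: X^N = -1
    return out

def multisum_via_packing(xs, ws):
    # pack values as sum_i x_i X^i and weights as sum_i w_i X^{-i},
    # using X^{-i} = -X^{N-i}; the constant term of the product is <x, w>
    N = len(xs)
    q = [ws[0]] + [0] * (N - 1)
    for i in range(1, N):
        q[N - i] = -ws[i]
    return negacyclic_mul(xs, q)[0]
```

Only the i = j diagonal contributes to the constant coefficient, and the two sign flips (from X^{−i} = −X^{N−i} and from the wrap-around X^N = −1) cancel, leaving exactly ∑_i wi xi.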
Definition 68 (Discrete Fourier Transform (DFT)). Let N ∈ N. Then the Fourier transform x̂ =
(x̂0, x̂1, . . . , x̂N−1) of a real vector x = (x0, x1, . . . , xN−1) is defined by

x̂m = (FN (x))m := ∑_{j=0}^{N−1} xj e^{−2πi mj/N} = ∑_{j=0}^{N−1} xj ω_N^{−mj}, 0 ≤ m < N,

where i is the imaginary unit and ω_N^m := e^{2iπm/N} are the N-th roots of unity.
The implementation needs to mind precision issues preventing overflows, following [PTVF07]; the
transformation was computed using the well-known FFTW library [Fri99]. For example, the coefficients
of the Fourier transform of a length-16 sequence x can be expressed in terms of two transforms of
length-8 signals. These can be further subdivided into sums requiring length-4 inputs, which are in
turn expressed via length-2 transforms that terminate the recursion.
Fig. 9.6: FFT’s divide-and-conquer strategy for power-of-2 lengths; N = 16.
Clever computation and storage of intermediate roots of unity further increase the practical
performance. The justification stems from Corollary 69:
Corollary 69 (Convolution and Efficient (FFT) Multiplication). Let N, I ∈ N be powers of 2, for
instance N = 1024, I = 2^32. The input polynomial x ∈ TN [X] and the weights are embedded in the
first components of vectors as wj ∈ Z, xj ∈ WI ⊆ T, 0 ≤ j < N; then using the fast Fourier transform
allows efficient computation of the multisum (cf. Definition 66), exploiting the recursion

(FN (x))m = (F_{N/2} ((x_{2j})_{0≤j<N/2}))m + ω_N^{−m} · (F_{N/2} ((x_{2j+1})_{0≤j<N/2}))m,

FN (x ∗ w) = FN (x) · FN (w) ∈ C^N,

Multisum (x, w) = (x ∗ w)_0 ≡ (F_N^{−1} (FN (x) · FN (w)))_0 mod 1.
Proof. Following [PTVF07], we conceptually split the real input data vector of length N into the
Lagrange half-representation hj = x_{2j} + i · x_{2j+1} ∈ C, with x_{2j}, x_{2j+1} ∈ WI ⊆ T, for
j ∈ {0, 1, . . . , N/2 − 1}, stored in two double arrays of length N/2. The symmetry properties of the
transformation reveal the recursive relation handy for computation,

x̂m = (FN (x))m = ∑_{j=0}^{N−1} xj · ω_N^{−mj}

= ∑_{j=0}^{N/2−1} x_{2j} · ω_{N/2}^{−mj} + ω_N^{−m} · ∑_{j=0}^{N/2−1} x_{2j+1} · ω_{N/2}^{−mj}

= (F_{N/2} ((x_{2j})_{0≤j<N/2}))m + ω_N^{−m} · (F_{N/2} ((x_{2j+1})_{0≤j<N/2}))m.
This enables the efficient computation of the multisum of two vectors on the torus.
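The recursion can be exercised in a few lines of Python: a textbook radix-2 Cooley–Tukey FFT (not the FFTW code used in the implementation) splits the input into even and odd parts exactly as above, and multiplying the spectra and transforming back yields the multisum modulo 1. For simplicity this sketch uses the cyclic convolution on C^N; the library works in the negacyclic ring TN [X]. All names are ours.

```python
import cmath

def fft(x):
    # radix-2 Cooley-Tukey FFT, implementing the recursion from the proof:
    # (F_N(x))_m = (F_{N/2}(even))_m + w_N^{-m} * (F_{N/2}(odd))_m
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * N
    for m in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * m / N) * odd[m]
        out[m] = even[m] + t
        out[m + N // 2] = even[m] - t   # uses w_N^{-(m+N/2)} = -w_N^{-m}
    return out

def ifft(y):
    # inverse transform via the conjugation trick
    N = len(y)
    return [v.conjugate() / N for v in fft([c.conjugate() for c in y])]

def multisum_fft(xs, v):
    # torus elements xs, integer weights v; the constant term of the cyclic
    # convolution equals <x, v> if v is stored reversed: w_{-j mod N} = v_j
    N = len(xs)
    w = [v[0]] + [v[N - j] for j in range(1, N)]
    conv0 = ifft([a * b for a, b in zip(fft(xs), fft(w))])[0].real
    return conv0 % 1.0   # reduce modulo 1, i.e. back onto the torus
```

The final reduction modulo 1 reflects that the result is again a torus element, as in the corollary.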
Remark 70. With the noise growth of ciphertexts in mind, we note that this approach is equivalent to the
computation of the multisum of LWE ciphertexts; the resulting noise remains unchanged.
We think of x = Encpk(p) ∈ T as an LWE encryption of a pixel bit p (or of a whole picture packed into
one TLWE ciphertext x = Encpk(∑_i pi X^i) ∈ TN [X]), and of w as publicly (or company-) known weights
per neuron, so we pre-compute the Fourier transform ŵ of w off-line for increased efficiency.
What we end up with is saving bandwidth (by a factor of up to N, the degree of the polynomials)
basically for free. Furthermore, as the weights of a deployed FHE-friendly neural network stay the same,
we can pre-compute and store the FFT representation of the polynomials ∑_i wi X^{−i} ∈ TN [X], saving
execution time in the online classification phase.
Summarizing, we reduce the size of the ciphertexts for N plaintexts from N individual LWE ciphertexts
to 1 packed TLWE ciphertext. In terms of torus elements in T, the cost goes from N(n + 1) down to
N(k + 1). In our practical example, the numbers are 461824 and 2048, respectively, whence we gain a
factor of (n + 1)/(k + 1) = 225.5 with this optimization while the resulting noise remains unchanged.
The approach allows us to amortize the plaintext-to-ciphertext expansion factor without deploying
hybrid encryption techniques (cf. Section 9.7.5) and to reduce it by a factor of up to the degree of the
involved polynomials.
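The quoted figures pin down the remaining parameter: with N = 1024 and k = 1, the packed size N(k + 1) = 2048 and the gain (n + 1)/(k + 1) = 225.5 force n = 450; this value of n is an inference from the numbers above, not stated directly here.

```python
# parameters inferred from the figures quoted above (N, k as in TFHE; n derived)
N, n, k = 1024, 450, 1
unpacked = N * (n + 1)   # N individual LWE ciphertexts of n+1 torus elements each
packed = N * (k + 1)     # one packed TLWE ciphertext: k+1 polynomials of degree N
gain = unpacked / packed
```

Here gain equals (n + 1)/(k + 1) = 225.5, matching the text.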
We remark that the resulting ciphertext is an LWE ciphertext in dimension N, and not the original
n, thus requiring a subsequent key switching step to obtain the format of a legitimate ciphertext for
further layers in the network. However, this limitation is not a problem thanks to the algorithmic
tweak presented in the following subsection.
9.7.2 Early KeySwitch Allows Faster Bootstrapping
As the size of the bootstrapping key, the final noise level, and the overall run time of external products
(i. e. ExternMulToTLwe in the code, the most frequent and costly operation and hence a bottleneck,
dealt with in Chapter 10) all depend linearly on the ciphertext dimension, a reduction leads to benefits
in both memory usage and efficiency. We noticed that this goal can be achieved simply by introducing
a key switching step at the beginning of the bootstrapping procedure to reduce the LWE dimension.
Experiments with the key switching routine at the end of the bootstrapping procedure instead yielded
worse results; hence we mathematically re-defined the TLWE bootstrapping function as

Bootstrap = SampleExtract ◦ BlindRotate ◦ KeySwitch,

deviating from Algorithm 9.2. This method yields an encoding of the constant coefficient m0 =
SampleExtract(c) of a message polynomial m, with noise variance not larger than that of c. Intuitively,
69
CHAPTER 9. FHE–DINN
the noise produced by KeySwitch will not be multiplied by ‖w‖2 when calculating the multisum with
this technique, but will only be added at the end. This algorithmic refinement moves the noise of the
output ciphertext produced by KeySwitch into an overhead noise term and essentially reverses the usage
of the two underlying LWE schemes: we stay in the higher-dimensional N-LWE domain, and the lower-
dimensional n-LWE scheme is only used during the time-critical bootstrapping operation. The noise
term added by the key switching key is not critical, as the result is not used for homomorphic
computations afterwards. Having larger noise is thus a trade-off that effectively allows reduced dimensions,
lowering the time per bootstrapping at the same security level.
Downsides of working with higher-dimensional N-LWE samples are slightly more memory usage
on the server side, larger output ciphertexts (which could be circumvented by applying KeySwitch at
the end, if necessary), and slightly slower addition of ciphertexts. Profiling our code showed that this
addition is instantaneous in comparison to the heavy bootstrapping operation, so it is not an issue.
9.7.3 Programming the Wheel
Previously, in Section 9.6.2, we discussed how to homomorphically compute the sign-function for
free while bootstrapping an encrypted multisum. What bootstrapping an LWE ciphertext basically
achieves is returning a re-encryption of the encrypted value v with a lower noise level. In fact, it can be
seen as two steps: first, sampling fresh randomness, and second, encrypting a value that decrypts to
f(v) under the same key. Implementations often set f(v) := v, the identity, for obvious reasons.
Intuitively, the first step of bootstrapping a TLWE ciphertext in FHE–DiNN consists in mapping
any point of the (continuous) torus T to an object that we refer to as the wheel W; see Definition 71 for
a rigorous definition. The discrete wheel W ⊆ T is a set of N ticks on the torus with associated possible
values encrypted in the bootstrapped ciphertext. Figure 9.7 gives an intuition of ticks and slots on the
wheel. The half-wheel H ⊆ W is the programmable part of the wheel. There are 2I + 1 ≤ N slots S
on the wheel (of fortune) that represent placeholders that we can round ciphertexts to.

Fig. 9.7: Programming (the upper half of) the wheel with inputs encoding an arbitrary anti-periodic
function with period 1/2 (left) and accommodating output values on the torus, e. g. bootstrapping a
ciphertext with respect to the sign-function, depicting the discontinuity (right).
Considering implementation aspects, we need to dive deeper into the inner workings of numerical
calculations on a computer. The set of all floating-point numbers defined in IEEE’s 754-2008 standard,
in binary64 representation, denoted by F_{2^32} in the following, is commonly called double precision
or 64 bit floats. A 64 bit float z, which plays a prominent role in the GPU context, consists of a first
sign bit signifying whether sign(z) is negative, followed by 11 bits for the (shifted) exponent E(z), and
the mantissa M(z) = (b51 b50 . . . b0)2, specified to a precision of 53 significant bits, of which 52 are
explicitly stored, meaning z = (−1)^{sign(z)} (1.b51 b50 . . . b0)2 · 2^{E(z)−1023} ∈ F_{2^32}. For example, say

z = 0 | 10000000000 | 1001001000011111101101010100010001000010110100011000 ∈ F_{2^32},

with sign, exponent, and mantissa from left to right and the least significant bits at the right end of
this representation; then we have

z = (−1)^0 · (1 + 2570638124657944/2^52) · 2^{1024−1023}
= 1.5707963267948965579989817342720925807952880859375 · 2
= 3.141592653589793115997963468544185161590576171875 ≈ π.
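The bit-field decomposition above can be reproduced with Python's struct module; the function names are our own. The mantissa 2570638124657944 equals 0x921FB54442D18, the binary string shown above.

```python
import math
import struct

def decode_binary64(z):
    # reinterpret the 64 bits of a double and split the three bit-fields
    bits = struct.unpack('>Q', struct.pack('>d', z))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF      # 11 exponent bits, shifted by 1023
    mantissa = bits & ((1 << 52) - 1)    # 52 explicitly stored mantissa bits
    return sign, exponent, mantissa

def reassemble(sign, exponent, mantissa):
    # normal numbers: (-1)^sign * (1.mantissa) * 2^(exponent - 1023)
    return (-1.0) ** sign * (1 + mantissa / 2 ** 52) * 2.0 ** (exponent - 1023)

SIGN, EXP, MANT = decode_binary64(math.pi)
```

decode_binary64(math.pi) returns (0, 1024, 2570638124657944), matching the bit pattern above, and reassemble reproduces math.pi exactly (all intermediate quantities are exactly representable).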
In fact, on a classical computer we do not have a (continuous) torus T at any point, but only a
discrete subset representable by the computer’s architecture. In our case, these are all IEEE double
precision floating-point numbers in the interval [0, 1), so T32 := (R/Z) ∩ F_{2^32}, if we want to be explicit.
Definition 71. The wheel, denoted by WI , is a fixed subset of T32, where I ≤ N is upper-bounded by the
degree of the TLWE polynomials. More explicitly, WI := ((1/(2I)) Z)/Z ⊆ T32, containing only integer
multiples of 1/(2I). If I = N, we drop the index and write W := WN = ((1/(2N)) Z)/Z, for short.
The TLWE polynomial degree, and simultaneously the bootstrapping key’s dimension N in Defini-
tion 71, is determined at FHE.setup()-time as in Table 9.2. If we first specify values fi for each slot
i, possibly different from the identity, and assign them to the right coefficients of the testVector ∈ TN [X],
then our bootstrapping routine does not merely re-encrypt during the on-line phase, but assigns this
chosen value to the result. In more generality, we can consider anti-periodic functions f with period
1/2; using f(x + 1/2) = −f(x) for 0 ≤ x < 1/2, f can be encoded as testVector if we define point-wise
evaluations fi := f (i/(2I + 1)) at all available input slots.
The domain of f is scaled by the number of ticks, i. e. inputs come from WI , and f can be expressed as

f : WI → WO, i ↦ fi := f(i),

where I is the largest input slot label we can freely choose, and O is the maximal output slot label
that needs to be accommodated, e. g. in the current layer of the neural network, if we think of the
application implications.
Whenever dealing with the concrete implementation of our functions on a platform, numeric partic-
ularities need to be taken into account; hence we require the methods modSwitchToTorus32 (ν) and its inverse
modSwitchFromTorus32 (ν−1), respectively their equivalents when using the FFT (modSwitchToTorus32FFT
and modSwitchFromTorus32FFT), to fulfill a precision condition as with rounding from Equation (9.9), e. g.:

|ν−1 (ν (τp)) − τp| < 1/(2p), τ ∈ T, p > 0,

where ν (τp) ∈ W and ν−1 (ν (τp)) ∈ T.
By default, the testVector := t_id(X) = 1/(2O + 1) ∑_{i=0}^{N−1} i X^i is a polynomial encoding the (appropriately
scaled) identity fi := i. When bootstrapping to the sign of the input, we used the constant value fi := 1
as element in the scheme’s underlying ring [BMMP18]. It turns out that defining the first fi, 0 < i ≤ I,
instead of all 2I + 1 slots, already assigns f_{I+i} = −fi, 0 < i ≤ I, due to the negacyclic property of the
underlying ring, and f_{2I+2} = f0 holds. We can hence simply set t_f(X) = 1/(2O + 1) ∑_{i=0}^{N−1} fi X^i ∈ TN [X]
to program the wheel for suitable f as depicted in Figure 9.8.
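A plaintext model of programming the wheel: store fi/(2O + 1) as the i-th coefficient and read off the constant coefficient after a rotation, where a rotation past N ticks picks up a sign because X^N = −1. The function names, N, and O = 1 below are illustrative choices of ours, not the library interface.

```python
N = 1024    # TLWE polynomial degree: the wheel has 2N ticks
O = 1       # largest output slot label; outputs are scaled by 1/(2O + 1)

def program_test_vector(f):
    # encode f on the N top ticks; the bottom N ticks need not be stored,
    # as they are forced to -f_i by the negacyclic ring T_N[X]
    return [f(i) / (2 * O + 1) for i in range(N)]

def rotate_and_extract(tv, ticks):
    # plaintext model of BlindRotate + SampleExtract: multiply by X^{-ticks}
    # and read the constant coefficient; wrapping past N flips the sign
    ticks %= 2 * N
    return tv[ticks] if ticks < N else -tv[ticks - N]
```

With the constant function f(i) = 1 this reproduces the sign bootstrap; with f(i) = i it reproduces the (scaled) identity test vector t_id.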
Fig. 9.8: When programming the wheel, we encode and assign values to all ticks of the upper half-torus
H. The number of ticks on the torus T32 needs to accommodate the values that appear in intermediate
computations for correctness of all homomorphic operations.
We stress that the values we want to program the wheel with are encoded in the testVector, but
only for the ticks on the top part of the wheel. The bottom values are fixed by the negacyclic property
of the chosen ring-ideal in TN [X]. For example, in FHE–DiNN, if the value corresponding to tick i is
(an encryption of) v, then −v is assigned to (an encryption of) tick N + i.
In order to refresh a noisy LWE ciphertext c, the bootstrapping procedure, particularly BlindRotate,
rotates the wheel by the (unknown, encrypted) angle of ϕs(c) ticks clockwise, cf. Definition 53, and
rounds to the value assigned to the rightmost tick. We modified BlindRotate for efficiency reasons,
which we discuss in Section 9.7.6.
9.7.4 Adaptively Changing the Message Space
In Section 9.6, we showed how to evaluate the whole neural network by induction, using a message
space of 2B + 1 slots, where B is a bound on the values of the multisums across the whole evaluation.
However, in order to reduce the probability of errors along the way, we can use different message spaces
for each layer of the DiNN and adapt the number of slots to the values given by the local computations,
depending on the values of the weights w. In order to do so, we change the value of the testVector for
the current layer ℓ to

1/(2Iℓ + 1) ∑_{i=0}^{N−1} 1 · X^i,
where Iℓ is now indexed by the layer and is a bound on the values of the multisums for the next layer ℓ + 1,
instead of setting B := maxℓ Iℓ. The point of this manoeuvre is that if the number of slots is smaller, the
slices are bigger, and the noise would have to be bigger in order to change the plaintext message. This
trick might seem superfluous, because it decreases a probability that is already negligible. However,
in practical scenarios the correctness of the scheme is sometimes relaxed, and this trick allows us to
obtain results closer to the expected values without costing any extra computation or storage.
Using the same technique as the one that allowed us to evaluate the sign-function in Section 9.6.2,
we can choose to scale the testVector up or down, which will in turn scale the plaintext of the
bootstrapped ciphertext (without changing the noise) by the same factor. This allows us to choose a
different message space for each layer of the network, depending on the values of the weights w.
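A small helper makes the per-layer bookkeeping explicit: each layer ℓ gets its own bound Iℓ (the largest ‖·‖1-norm of a weight vector feeding that layer's multisums), hence 2Iℓ + 1 slots and correspondingly wider slices than with the single global bound B. The function below is our own illustration, not part of the implementation.

```python
def per_layer_message_spaces(layers):
    # layers: per layer, the list of weight vectors feeding its neurons
    bounds = [max(sum(abs(w) for w in ws) for ws in layer) for layer in layers]
    B = max(bounds)                      # the single global bound of Section 9.6
    slots = [2 * I + 1 for I in bounds]  # per-layer slot counts 2*I_l + 1
    return slots, 2 * B + 1
```

Every per-layer slot count is at most the global one, so each layer's slices are at least as wide as with the global bound, which is exactly the error-margin gain described above.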
9.7.5 Reducing Communication Bandwidth: Hybrid Encryption
In recent years, small connected devices with limited storage and computing power, the so-called
Internet of Things (IoT), have gained more and more popularity. This rise, together with today’s powerful
cloud services offering storage and computing power, has made outsourcing computations a central
scenario to study. A user with limited resources is likely willing to store data on a remote server and
to benefit from letting the server execute computations on their behalf, which raises security concerns
and privacy challenges.
For example, a user typically does not want any server that is not under their full control to have
access to potentially sensitive data, but still wants the server to be able to compute some function of
the uploaded data. Although this sounds like a paradox, a solution actually exists in theory and is
more and more becoming practical: Fully Homomorphic Encryption (FHE). In the simplest setting, say,
Alice encrypts her data under a homomorphic encryption scheme and sends the obtained ciphertexts
to the server. Next in this framework, the server computes some agreed function on the encrypted data
under Alice’s public key and sends back an encryption of the result, which can be decrypted by Alice
with the appropriate private key.
Despite its simplicity, this computational model still requires a lot of effort from the user. The most
natural approach, when trying to shift the burden of complex homomorphic computations from the
user to the powerful server, is to require the user only to perform the homomorphic scheme’s key
generation for a session key, to use a light-weight symmetric encryption scheme for the bulk of the data,
and to carry out the final decryption step. In fact, all FHE schemes we know of today have large keys
and comparably heavy (re-)encryption procedures, so a hybrid framework seems like the way to go.
Details of a hybrid framework.
In a hybrid FHE framework, Alice generates a secret key skSYM for a symmetric encryption scheme,
a pair (pkHOM, skHOM) for a homomorphic public key encryption scheme, and sends pkHOM to the server
together with an encryption HEpk = EncpkHOM(skSYM) of skSYM under pkHOM. Then, Alice encrypts her
data under the symmetric encryption scheme and sends the resulting ciphertexts to the server, which
homomorphically evaluates the decryption circuit of the symmetric encryption scheme, thus obtaining
ciphertexts under the homomorphic scheme. This previous operation is a sort of bootstrapping, or
rather a re-encryption from the SYM scheme to the HOM scheme, which does not involve any circular
security assumption as different keys are used. At this point, the server can homomorphically compute a certain
function f on the resulting ciphertexts and send back the result, encrypted under the homomorphic
encryption scheme and hence decryptable by Alice. Alice can finally recover the result thanks to her
skHOM using the decryption function.
Schematic view of a hybrid stream-cipher framework.
Figure 9.9 depicts a central trade-off: shifting on-line to off-line effort by pre-processing key-stream
material. This saves communication bandwidth at the user side in exchange for operations on the
server side before the usual processing can continue. The ciphertext expansion, as in Figure 9.10, can
be mitigated:

1. Ciphertexts c = FHE.Encpk(m) (of all known schemes) have a length-expansion factor 1 ≤ ℓ = |c|/|m|.

2. Encrypt the message symmetrically as c′ = Enck(m) and transfer the homomorphically encrypted
key K = FHE.Encpk(k) once along with the message(s).

3. The cloud decrypts by applying the key K homomorphically to c′ to obtain c.
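The symmetric half of step 2 can be sketched with a toy stream cipher: a keystream F(k, r) is derived from a PRF and XORed onto the message, so |c′| = |m| and the only overhead is the one-time transfer of FHE.Encpk(k). The SHA-256 counter-mode construction below is a stand-in for Trivium/Kreyvium, chosen only because it lives in the standard library; it is not a vetted cipher design.

```python
import hashlib

def prf_keystream(k, r, nbytes):
    # toy PRF "F(k, r)": SHA-256 in counter mode, a stand-in for a real
    # stream cipher such as Trivium/Kreyvium, NOT a vetted design
    out, counter = b'', 0
    while len(out) < nbytes:
        out += hashlib.sha256(k + r + counter.to_bytes(8, 'big')).digest()
        counter += 1
    return out[:nbytes]

def sym_encrypt(k, r, message):
    # XOR the keystream onto the message: |c'| = |m|, expansion rate 1
    ks = prf_keystream(k, r, len(message))
    return bytes(m ^ s for m, s in zip(message, ks))
```

Applying sym_encrypt twice with the same (k, r) recovers the message; in the hybrid framework the cloud performs the second application homomorphically, having synthesized the keystream from FHE.Encpk(k) and the public nonce r.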
Fig. 9.9: Hybrid Encryption: On-Line vs. Off-Line Processing. F is a public PRF (stream-cipher) keyed with k and random nonce r. An encrypted version of the homomorphic public key is provided along with r and the ciphertext c′i to the cloud for the off-line setup phase (green). The user symmetrically masks the message bit (yellow) so that the cloud can synthesize a key-stream such that we end up with a homomorphic encryption of the message bit mi ready for further processing, e. g. by homomorphically applying f (blue).
Fig. 9.10: Computing an encryption comes also at the cost of ciphertext expansion.
State-of-the-art ciphers for hybrid encryption.
The well-known AES block cipher is a first choice for the symmetric-key primitive of a hybrid FHE
implementation [GHS12]. However, homomorphic evaluations of lightweight block ciphers [LN14]
show that there is room for improvement, for instance by optimizing the multiplicative depth of the
circuits implementing the design. The rationale behind this is that multiplications of FHE ciphertexts
are typically more costly than additions. Following the paradigm of low multiplicative depth calling
for HE-dedicated symmetric-key ciphers, many designs with low-noise ciphertexts have been proposed,
such as:
• LowMC [ARS+15]: a flexible block cipher based on a substitution-permutation network (SPN)
structure with 80-, 128- and 256-bit key variants. It is designed to minimize the multiplicative
complexity and also has a very low multiplicative depth compared to existing block ciphers.
We stress that interpolation attacks are known against some instances of LowMC [DLMW15],
showing that these ciphers do not provide the expected security level: some 80-bit key instances
can be broken 2^23 times faster than claimed, and all instances at the 128-bit security level can be
broken about 2^10 times faster than exhaustively enumerating all 2^128 possibilities.
• Kreyvium [CCF+16], sometimes misspelled as Keyvrium: a stream cipher based on Trivium, an
80-bit security cipher, aiming at the 128-bit security level. Like Trivium, Kreyvium achieves a
multiplicative depth comparable to LowMC, but has smaller throughput and latency [CCF+16].
• FLIP [MJSC16]: a stream-cipher construction that claims to achieve the best of both worlds:
a constant yet small noise, which corresponds to a high homomorphic capacity.
The community gained confidence in Trivium, which was well analyzed during the eSTREAM
competition. The stream cipher Kreyvium can be seen as a variant with comparable security claims.
When vetting the efficiency of the hybrid construction between Alice and the cloud, the hybrid
encryption rate |c′|/|m| is a measure of the size compression of the necessary data transmissions. The
rate gets asymptotically close to 1 as the message size grows; for example, for a message size of
ℓm = 1 GB, the hybrid construction instantiated with Trivium (resp. Kreyvium) yields an expansion
rate of 1.08 (resp. 1.16).
There seems to be a gap between block ciphers and stream ciphers: the former achieve a constant
but small homomorphic capacity, while the latter guarantee a larger capacity for the initial ciphertext
that decreases with the number of blocks. In conclusion, blending symmetric ciphers and FHE is
definitely an interesting idea that could prove fundamental for bringing homomorphic encryption into
the real world. Traditionally, three metrics subject to minimization are considered for hybrid encryption:
1. total number of required ANDs per encrypted block measured in [AND/block],
2. number of ANDs per bit measured in [AND/bit],
3. multiplicative depth of the encryption circuit measured in AND-depth.
It turns out that for FHE–DiNN another metric might be more appropriate, as it uses the TFHE
library, which is suited to low-complexity circuits rather than just a low multiplication count, since
bootstrapping after every operation is the default. Since neural networks typically operate on floating-
point values, TFHE’s addition and multiplication cannot be readily used, and one needs to build circuits
of logic gates to evaluate functions on numbers encoded as sequences of bits. We therefore suggest
minimizing the number of required ANDs and XORs per encrypted bit, measured in [OP/bit], leading
to the metric of homomorphic operations per bit: [hops].
Investigating the security of hybrid encryption following the general KEM-DEM framework nota-
tion [AGKS05, HK07], a straightforward HE scheme achieves neither IND-CCA nor even IND-CCCA
security. While most of the considered HE schemes fulfill the IND-CPA security notion, they are often
vulnerable to key-recovery attacks [CT15] due to the lack of CCA1 security. In the cloud setting using
hybrid FHE, IND-CPA security is arguably a sufficient level of security, as in practice the protocol can
require Alice to append a signature (c′, σ(c′)) to ensure not only data confidentiality but also authenticity.
9.7.6 Alternative BlindRotate Implementation: Trading-Off Run-Time with Space
We demonstrate several newly developed techniques over the TFHE library for fast evaluation of neural networks. We start with some remarks on a trade-off that lets functions requiring bootstrapping be evaluated faster, dubbed 'windowed bootstrapping' in the following.
Unfolding the loop when computing X^⟨s,a⟩ in the BlindRotate algorithm increases the efficiency of the bootstrapping procedure by reducing the overall number of external products. The number of loop iterations halves, hence halving the number of external products, the most costly operation of the bootstrapping. The technique introduced in [ZYL+17] is to group the terms of a basic rotation two by two, using the following formula:

X^{a·s + a′·s′} = s·s′·X^{a+a′} + s·(1 − s′)·X^{a} + (1 − s)·s′·X^{a′} + (1 − s)·(1 − s′).
To compute this new function in one step, the bootstrapping key needs to be adapted to contain encryptions of the values s·s′, s·(1 − s′), (1 − s)·s′, and (1 − s)·(1 − s′), expanding its size by a factor of 2. The computation necessitates 4 scalings of TGSW ciphertext matrices by (constant) polynomials and 3 TGSW additions, whereas TFHE's original BlindRotate only needed 1 scaling of a TLWE ciphertext and 1 TLWE addition per iteration. In the homomorphic computation of the multisum ⟨s, a⟩, rounding errors might be induced on only n/2 terms instead of n, and the ciphertext's output noise also differs due to this approach.
CHAPTER 9. FHE–DINN

Comparing their technique, we see that [ZYL+17] reduce the noise due to precision-induced errors during the gadget decomposition by a factor of 2, while increasing the noise from the bootstrapping key by a factor of 2.
In this work, we suggest using another formula to compute each term of the unfolded sum. It is based on the observation that, for any two (secret-key) bits s_i, s_{i+1} ∈ B, exactly one of the following four expressions is true (equates to '1') while the others are all false ('0'). Working on the bit level, · = AND and + = − = XOR, just as arithmetic over the field (B, +, ·) is usually defined:

s_i, s_{i+1} ∈ B : |{e ∈ {s_i·s_{i+1}, s_i·(1 − s_{i+1}), s_{i+1}·(1 − s_i), (1 − s_i)·(1 − s_{i+1})} : e = 1}| = 1.

Hence the following sum is an invariant constant, regardless of the individual terms' truth values:

1 = s_i·s_{i+1} + s_i·(1 − s_{i+1}) + s_{i+1}·(1 − s_i) + (1 − s_i)·(1 − s_{i+1}),

which enables us to express one of the terms as one minus the sum of the others, e. g. the last summand:

(1 − s_i)·(1 − s_{i+1}) = 1 + s_i·s_{i+1} + s_i·(1 − s_{i+1}) + s_{i+1}·(1 − s_i) = 1 + f_2(s_i, s_{i+1}).
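As a sanity check, the selector identity can be verified exhaustively over all bit pairs; the following sketch (not from the thesis code base) instantiates X as the integer 2 so that distinct exponents yield distinct values:

```python
# Check the window-2 decomposition of X^(a*s + a'*s') over all bit pairs.
# Illustrative sketch only; X is instantiated as the integer 2.

def decomposition(s, s1, a, a1, X=2):
    # The four mutually exclusive selector terms; exactly one equals 1.
    return (s * s1 * X ** (a + a1)
            + s * (1 - s1) * X ** a
            + (1 - s) * s1 * X ** a1
            + (1 - s) * (1 - s1))

def check():
    for s in (0, 1):
        for s1 in (0, 1):
            for a, a1 in [(0, 0), (1, 3), (2, 5), (7, 4)]:
                # The selector sum is the invariant constant 1 ...
                assert (s * s1 + s * (1 - s1) + (1 - s) * s1
                        + (1 - s) * (1 - s1)) == 1
                # ... hence the decomposition equals X^(a*s + a'*s').
                assert decomposition(s, s1, a, a1) == 2 ** (a * s + a1 * s1)
    return True
```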
The resulting alternative BlindRotate algorithm is described in Algorithm 9.4.
Algorithm 9.4: Alternative BlindRotate algorithm with window-size w = 2.
Input: A TLWE sample c = (a, b).
Data: A (possibly noiseless) TLWE encryption ct of the testVector, the window-size w, and the appropriately extended bootstrapping key bk such that for all i in [n/2], bk_{3i}, bk_{3i+1}, and bk_{3i+2} are TGSW encryptions of s_{2i}·s_{2i+1}, s_{2i}·(1 − s_{2i+1}), and s_{2i+1}·(1 − s_{2i}), respectively.
Result: A TLWE encryption of X^{b−⟨s,a⟩} · testVector
  ACC ← X^b · ct
  for i = 1 to n/2 do
    ACC ← ((X^{a_{2i}+a_{2i+1}} − 1)·bk_{3i} + (X^{a_{2i}} − 1)·bk_{3i+1} + (X^{a_{2i+1}} − 1)·bk_{3i+2}) · ACC
  return ACC   (the accumulator of individual blind rotations)
Having a constant in the decomposition formula is a valuable advantage: it means that we can move it out of the external product. Thus, efficiency-wise, we halved the number of external products at the cost of only 3 scalings of TGSW ciphertexts by constant polynomials, 2 TGSW additions, and 1 TLWE addition.
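The effect of the windowed update can be illustrated on plaintext data. The sketch below is illustrative only (the real scheme works with negacyclic polynomials and TGSW-encrypted selector bits, and the rotation uses negated exponents); it again instantiates X as the integer 2 and checks that each window multiplies the accumulator by X^{a·s + a′·s′}:

```python
# Plaintext simulation of the windowed (w = 2) BlindRotate update:
# each loop iteration multiplies the accumulator by
#   s*s'*(X^(a+a') - 1) + s*(1-s')*(X^a - 1) + (1-s)*s'*(X^a' - 1) + 1,
# i.e. by X^(a*s + a'*s'), halving the number of external products.
# X is instantiated as the integer 2 for this illustration.

def windowed_blind_rotate(b, a, s, X=2):
    assert len(a) == len(s) and len(a) % 2 == 0
    acc = X ** b
    for i in range(0, len(a), 2):
        ai, aj, si, sj = a[i], a[i + 1], s[i], s[i + 1]
        mult = (si * sj * (X ** (ai + aj) - 1)
                + si * (1 - sj) * (X ** ai - 1)
                + (1 - si) * sj * (X ** aj - 1)
                + 1)  # the constant term moved out of the external product
        acc *= mult
    return acc

def check():
    b, a, s = 3, [1, 4, 0, 2, 5, 1], [1, 0, 1, 1, 0, 1]
    expected = 2 ** (b + sum(x * y for x, y in zip(a, s)))
    return windowed_blind_rotate(b, a, s) == expected
```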
For n = 450 and w = 2, but generally given appropriately prepared key-switching material, we express p_w(X) = X^{b−⟨a,s⟩} = X^b · ∏_{i=1}^{n} X^{a_i·s_i} explicitly, for w = 2:

p_w(X) = ∏_{i=1}^{n/w} [ s_{wi}·s_{wi+1}·(X^{a_{wi}+a_{wi+1}} − 1) + s_{wi}·(1 − s_{wi+1})·(X^{a_{wi}} − 1)
           + (1 − s_{wi})·s_{wi+1}·(X^{a_{wi+1}} − 1) + (1 − s_{wi})·(1 − s_{wi+1}) ] · X^b

       = ∏_{i=1}^{n/w} [ s_{wi}·s_{wi+1}·(X^{a_{wi}+a_{wi+1}} − 1) + s_{wi}·(1 − s_{wi+1})·(X^{a_{wi}} − 1)
           + (1 − s_{wi})·s_{wi+1}·(X^{a_{wi+1}} − 1) + (1 + f_w(s_i, s_{i+1})) ] · X^b,

where the last summand (1 − s_{wi})·(1 − s_{wi+1}) = 1 + f_w(s_i, s_{i+1}) was rewritten via the invariant. This is a promising time-memory trade-off for the online phase. It can be extended for arbitrary w | n to p_w(X) = ∏_{i=1}^{n/w} X^{∑_{j=wi}^{w(i+1)−1} a_j·s_j}, expressing an unrolled loop while extending the bootstrapping key bk accordingly.
                               TFHE                 ZYLZD17    cuFHE–DiNN
  Efficiency
    External products          n                    n/2        n/2
    Scaled TGSW additions      0                    4          3
    Scaled TLWE additions      1                    0          1
    rel. noise overhead        δ                    ·1/2       ·1/2
  rel. out noise (on average)
    roundings                  n(1 + kN)ε²          ·1/2       ·1/2
    from bk                    n(k + 1)ℓNβ²σ²_bk    ·2         ·3
  rel. out noise (worst case)
    roundings                  n(1 + kN)ε           ·1/2       ·1/2
    from bk                    n(k + 1)ℓNβA_bk      ·2         ·3
  rel. storage
    TGSW in the bk             n                    ·2         ·3/2

Tab. 9.1: Comparison of the three alternative BlindRotate algorithms. n denotes the LWE dimension after key switching; δ refers to the noise introduced by rounding the LWE samples into [N] before we can BlindRotate; N is the degree of the polynomials in the TLWE scheme; k is the dimension of the TLWE ciphertexts; ε is the precision (1/(2β))^ℓ/2 of the gadget matrix (the tensor product between the identity Id_{k+1} and the powers of 1/(2β) arranged as the ℓ-dimensional vector (1/(2β), …, (1/(2β))^ℓ)); σ_bk is the standard deviation of the noise of the TGSW encryptions in the bootstrapping key, and A_bk is a bound on this noise. These values were derived using the theorems for noise analysis in [CGGI17].
Multiplying naively by a monomial X^a might be faster than multiplying by a degree-2 polynomial. Our solution, the FHE–DiNN implementation, pre-computes and stores the FFT representation of the bootstrapping keys (or simply all possible values of X^a − 1) in order to speed up polynomial multiplication, a cost reduction when multiplying by a polynomial of any degree on-line. The size of the bootstrapping key bk for w = 2 is a factor of 3/2 larger than the bk that TFHE originally used, which is a compromise between the two previous methods. In [ZYL+17], the noise induced by precision errors and roundings is halved compared to TFHE's implementation. We instead increase the noise coming from the bootstrapping key by a factor of 3, but note that it is possible to reduce this noise without impacting efficiency via a different parameter setting for the bootstrapping key. This trades off security (often the security of bk is not the bottleneck, so this comes for free), a possible advantage over reducing the noise induced by the precision errors, where efficiency would directly be impacted. We recapitulate this in Table 9.1.
We remark that the idea could be generalized to larger windowed loop unfoldings consisting of more than two terms, yielding more possible trade-offs, but we did not explore this further because of the dissuasive exponential growth in the number of operands in the general formula. Windowed bootstrapping works with width-w windows, where we choose w = 2 for demonstration; unrolling operations increases online performance, but also increases the bk size by a factor of (2^w − 1)/w.
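The trade-off can be tabulated; the helper below (illustrative, not part of the implementation) computes, for a window width w dividing n, the resulting number of external products n/w and the bootstrapping-key growth factor (2^w − 1)/w:

```python
# For window width w | n, the unrolled loop needs n/w external products,
# while the bootstrapping key must hold the 2^w - 1 non-constant selector
# products per window, i.e. it grows by a factor (2^w - 1)/w.
# Illustrative sketch of the trade-off, not part of the FHE-DiNN code.

def tradeoff(n, w):
    assert n % w == 0
    external_products = n // w
    bk_growth = (2 ** w - 1) / w
    return external_products, bk_growth

def table(n=450):
    # Window widths dividing n = 450 (the post-key-switching dimension).
    return {w: tradeoff(n, w) for w in (1, 2, 3, 5) if n % w == 0}
```

For w = 2 and n = 450 this reproduces the numbers above: 225 external products and a bk larger by 3/2.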
9.7.7 Support for various Layers of Unlimited Depth
As experts alternate different layers in order to remove redundant information and reduce the dimensionality of the input data, using deeper topologies, supporting deep neural networks is key. Such Multilayer Perceptrons (MLPs) have a number of hidden layers with varying numbers of neurons and different activation functions each.
This layout, the topology of an MLP, is denoted for example by 784:100:10, expressing that the 784-neuron input layer feeds forward into a hidden layer of 100 neurons and finally into one output layer with 10 possible labels. To illustrate what happens if one trains a network with only one hidden layer and linear activations: the result is that the hidden layer encodes the principal components [DHS01], as if performing principal component analysis.
9.7.8 Neural Networks for Image Classification
A convolutional neural network (CNN) architecture is deemed suitable for image-classification tasks. Its layers typically repeat convolutional operations on the input image, followed by a pooling step to focus on the most prominent features, and finally a classical, fully connected layer in order to relate these features to each other. More specifically, these models first apply linear filters, represented as convolutional kernels: each value is updated by processing all suitable regions selected in the input, followed by a non-linear activation function. Secondly, pooling is applied to the previously recognized features, typically down-sampling and hence reducing the amount of data; a parameterized tiling of the input selects, for instance, only the maximum value, thereby focusing on the strongest features. Thirdly, a dense layer encodes the information content of the preceding layer.
The above-described modules can be executed in sequence, leading to deep networks, in order to increase the recognition rate. In the final step, either the most dominant predicted label of the classification is output as the result, or a softmax activation generates a probability distribution over all possible labels.
9.7.8.1 More Variety of Layers and the Maximum
Our scheme is not restricted to classifying MNIST's gray-scale images, but could be applied to more sophisticated standard image-recognition challenges, e. g., CIFAR, ImageNet, or the Clinical Diabetic Retinopathy database, where each pixel has 3 values for red, green, and blue. More generally, this framework can assist in securing a plethora of other settings where AI is used to categorize data. Convolution layers in particular have received much attention; cf. Figure 9.11.
Fig. 9.11: A convolution layer applies a sliding kernel to the original image, recombining the weights. (The figure shows an input image with 3 feature maps, e. g. RGB, of height h and width w, to which n filters of size k × k × 3 are applied, producing n feature maps.)
So far, no non-interactive algorithm to homomorphically compute the maximum of many inputs is known. ReLU(x) = max(0, x) as an activation function yields good results in practice. If we are able to compute max(x, y) for x, y ∈ T on the torus, we can use the formula ReLU(x − y) + y = max(x, y).
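The identity can be checked directly on plaintext values; plain floats stand in for torus elements in this illustrative sketch:

```python
# ReLU(x - y) + y = max(x, y): check the identity used to reduce the
# homomorphic maximum to a homomorphic ReLU. Plain floats stand in for
# torus elements here; this is an illustrative sketch only.

def relu(x):
    return max(0.0, x)

def hom_max(x, y):
    return relu(x - y) + y

def check():
    pairs = [(0.25, 0.5), (0.5, 0.25), (-0.1, 0.3), (0.4, 0.4)]
    return all(hom_max(x, y) == max(x, y) for x, y in pairs)
```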
Being able to calculate the maximum of two encrypted inputs without decrypting enables the use of a layer heavily used in the literature: max-pooling, depicted in Figure 9.12.
Fig. 9.12: Max-pooling layers work by combining neighbouring values and taking their maximum. (Example with k = 2, i. e. k × k tiles: the 4 × 4 input [[−6, 9, 3, 5], [0, 7, 0, −1], [5, 1, 8, −3], [7, −2, 8, 6]] is reduced to the 2 × 2 output [[9, 5], [7, 8]].)
Another useful tool is average pooling, avg-pooling, which is FHE-friendly due to its linearity in the publicly known coefficients. The approach to enable pooling is to calculate the maximum (resp. a weighted average) of the neighbors and replace the several considered inputs by it. sum-pooling is yet another alternative.
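On plaintext data, both pooling variants are a few lines; the sketch below reproduces the worked max-pooling example of Figure 9.12 and adds the FHE-friendly average variant (illustrative only):

```python
# 2x2 max- and average-pooling on a small matrix, matching the worked
# example of Figure 9.12. avg-pooling is FHE-friendly since it is linear
# in publicly known coefficients; max-pooling needs a homomorphic max.

def pool(mat, k, combine):
    rows, cols = len(mat), len(mat[0])
    return [[combine([mat[r + i][c + j] for i in range(k) for j in range(k)])
             for c in range(0, cols, k)]
            for r in range(0, rows, k)]

def avg(vals):
    return sum(vals) / len(vals)

image = [[-6, 9, 3, 5],
         [0, 7, 0, -1],
         [5, 1, 8, -3],
         [7, -2, 8, 6]]

max_pooled = pool(image, 2, max)   # the 2x2 output of Figure 9.12
avg_pooled = pool(image, 2, avg)
```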
Finally, the homomorphic maximum enables us to cleverly calculate the maximum argument of two encrypted inputs without decrypting. This can realize the output of a binary classification or, by appropriate repetition, the classification into one of multiple possible output labels:

argmax_{1 ≤ x ≤ N_I} P(I(x)), i. e. the index of max_{y ∈ I} P(y),

when we speak of perceptrons P more generally.
9.7.9 Interactive Homomorphic Computation of the argmax Function
In [BMMP18] a score vector is output, yet sometimes, instead of the distribution, only one output, the argmax, is desirable. The last layer of a CNN often consists of a fully-connected layer followed by a loss layer, which can be seen as a measure quantifying how the perceptron P mis-classifies inputs coming from the training set. When it is important that the classifier remains confidential, a way to compute the argmax in the Cloud is based on ideas in [BPTG14]. Basically, we can use the technique with FHE–DiNN's sign-computation as an interactive subroutine. We present an interactive way to compute the argmax function.
Say A, owner of the secret key, has encryptions of m, denoted [m], that are indecipherable for the cloud C, yet we wish to outsource the evaluation of the perceptron P on the encrypted inputs [m]. After the evaluation, C has encryptions of the N_O = 10 scores [s_i]_{1≤i≤10} and wants to compute and communicate their argmax, i. e. argmax_i s_i, to A. Suppose additionally that C does not want A to learn the individual scores, but only the index of their maximum, while assuring A that C does not learn whichever is the largest. C randomly shuffles the inputs according to some permutation π and asks A for successive comparisons in an interactive protocol.
Write [a′], [b′] for each consecutive pair of the new order [s′_i] = [s_{π(i)}], 1 ≤ i ≤ 10, under the random permutation π of indices. C sends [a′′] = [a′ + r_0], [b′′] = [b′ + r_1], and [c′′] = [sign(a′ − b′)], where r_0, r_1 is randomness added to blind the scores from A; sign(a′ − b′) can be seen as an encoding of 1 iff b′ ≤ a′ and −1 iff a′ < b′. A decrypts [a′′] = [a′ + r_0] and [b′′] = [b′ + r_1], yet does not compare them in the clear; she merely learns which one of the two is the smaller, thanks to [c′′]. A then re-encrypts (hence re-randomizes) the ciphertext at the position corresponding to the maximum of [a′] and [b′] and sends the triplet ([a], [b], [c]) back to C. The order needs to be preserved, as the third ciphertext, being an encryption of the sign of the first two, indicates their maximum. C then uses this information to remove r_0 resp. r_1, depending on which position (0 or 1) the sign indicates; C can remove the blinding factor because it has an encryption of the sign. In the next step, their maximum is compared with the next score, meaning the updated elements [a′] ← [a′] resp. [a′] ← [b′] and [b′] ← [s′_{i+1}] are set and passed to A. After N_O − 1 = 9 rounds, C knows, thanks to this interactive protocol, the index of the largest input in the permuted list without A learning the scores. The correct answer to the query with index x is argmax_{1≤x≤N_I} P(I(x)), with respect to the accessible perceptron P.
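The protocol logic can be simulated on plaintext stand-ins (encryption is modeled as the identity; the names, the blinding range, and the control flow are illustrative, not the thesis implementation):

```python
# Plaintext simulation of the interactive argmax protocol: the cloud C
# blinds each pair of (shuffled) scores with randomness r0, r1 and sends
# the sign of their difference; the client A learns only which position
# holds the larger element, and C unblinds the winner and continues.

import random

def sign(x):
    return 1 if x >= 0 else -1

def interactive_argmax(scores, seed=0):
    rng = random.Random(seed)
    perm = list(range(len(scores)))
    rng.shuffle(perm)                     # C shuffles with permutation pi
    best = perm[0]
    for idx in perm[1:]:
        a, b = scores[best], scores[idx]
        r0, r1 = rng.randrange(1000), rng.randrange(1000)
        blinded = (a + r0, b + r1, sign(a - b))   # sent to A
        # A sees only blinded values plus the sign; the sign tells which
        # position holds the maximum, which C then unblinds.
        if blinded[2] == 1:
            assert blinded[0] - r0 == a   # C removes r0 at position 0
        else:
            best = idx                    # position 1 was larger
    return best

def check():
    scores = [3, 17, 5, 9, 12, 1, 16, 8, 2, 11]
    return interactive_argmax(scores) == 1
```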
9.7.10 Beyond Artificial Neural Networks: CapsNets
Capsule Networks (CapsNets) are a modification of the model of artificial NNs, which, in contrast to biological brains, wire neurons rather rigidly. Following the Hebbian principle from neuroscience, supposedly units that fire together, wire together: a single neuron rarely becomes active alone, but rather along with k − 1 others for some fixed number k, hence modeled as the k-winners-take-all method. CapsNets mimic this learning approach architecturally by combining parts of a NN to form a capsule. Capsules have their outputs wired to those higher-level capsules whose activity vectors have maximal scalar product with the prediction stemming from the lower-level capsules [SFH17].
As soon as the homomorphic ReLU and softmax activation functions are available, the presented framework can be extended to support homomorphic evaluation of CapsNets. As an example, thinking of a way to divide the MNIST images into receptive fields, we see that each field is fully specified by 9 × 9 values, with (28 − 9) + 1 = 20 pixel-square positions per dimension densely covering the 28 × 28 image, effectively en-capsuling each receptive field of input pixels into only 1 output feature.
Ideally, when given a trained net in a popular exchange format, e. g. the .h5-format [Nat19], one can perform the network quantization while processing it on the fly, or save it in an intermediate format. It is then automatically transformed into a DiNN, which supports FHE-friendly evaluation, and, as a possible final step, a deployment net for inference only is derived, which can be used for efficient prediction, ready to be run on either CPUs or GPUs.

Our C++ implementation is able to iterate through FHE-friendly neural networks and homomorphically apply their layers to the encrypted input, passing sequentially through the layers.
9.8 Practical attack vectors against FHE–DiNN using fplll
Due to faster algorithms resulting from thorough cryptanalysis, an attacker on binary-LWE can solve instances in higher dimensions than for standard LWE.
To vet the validity of the LWE-hardness assumption from Assumption 49 in practice, several lattice challenges [Dar19] have been proposed, e. g. SVP-, Ideal-, LWE-, and Ring-Challenges, to meet all flavors used in cryptographic constructions. Researchers compete on these platforms, test their cryptanalytic methods on the instances, and set records, which aids a comparison of various algorithmic approaches.
9.8.1 Security Reductions: TLWE to appSVPγ
Although there are several ways to show security of TLWE, we follow the approach of seeing TLWE as a generalization of LWE and Ring-LWE, which are reduced to appSVP_γ [LPR10]: the security of LWE (cf. Assumption 49), in the cases specified below, is as hard as the NP-hard lattice problem appSVP_γ. For TLWE instantiations with N a power of 2 and k = 1, we readily have a reduction to the ring variant of LWE with the cyclotomic ring R := T_N[X]^k = (ℝ[X]/ℤ[X])^k/(X^N + 1) and binary secret, which can be reduced to appSVP_γ by applying the following theorem due to Lyubashevsky, Peikert, and Regev [LPR10]. An instantiation with N = 1 and large k makes TLWE the LWE problem with binary secret.
Theorem 72 (Quantum Reduction Ring-LWE to appSVP_γ). Let N be a power (hence multiple) of 2, R a ring, β(N) = q/ω(√(N log N)) for prime q = 1 mod N, and γ′(N) = N^{ω(1)}; then there exists a uniform probability distribution χ over R such that any ppt algorithm A solving RLWE_{N,q,χ} would imply an efficient quantum algorithm for solving appSVP_γ, when setting γ = γ′ · β.
There is a non-tight, classical reduction that incurs only a limited parameter loss [RSW18]. A recent
study predicts the crossover point of classical lattice reduction algorithms, with a quantum algorithm
tailored to the Ring-LWE case exploiting ideal lattice structures [DPW19]. They conclude that their
quantum algorithm provides shorter vectors than BKZ-300, e. g. using the open source version of fplll.
This is roughly meeting the weakest security level of NIST lattice-based candidates, cf. Section 4.4, for
cyclotomic rings of rank larger than about 24000.
9.8.1.1 Reduction ksk to TLWE
Let s, s′ ∈ B_N[X]^k be two TLWE keys; s′ ∈_R B_N[X]^k implies s′ · 2^{−i} ∈_R B_N[X]^k. As an array of TLWE samples, the hardness of breaking the security of the key-switching key ksk_{s→s′} = (LWE_{s,α}(s′_1 · 2^{−1}), LWE_{s,α}(s′_2 · 2^{−2}), …, LWE_{s,α}(s′_n · 2^{−n})), with encryptions of each individually scaled key bit as TLWE sample, reduces immediately to the setting in Theorem 72.
9.8.1.2 Reduction TGSW to TLWE
As TGSW is the matrix equivalent of TLWE (cf. Equation (9.2)), it is an array of TLWE samples.
9.8.1.3 Reduction bsk to TGSW
As an array of TGSW samples, the security of the bootstrapping key follows from Section 9.8.1.2.
Before taking the perspective of an attacker A_FHE–DiNN and studying which resources she can use within the model of Cloud security, we summarize the theoretical reductions:

bsk → TGSW → TLWE → appSVP_γ, and ksk → TLWE → appSVP_γ.
9.8.2 Theoretical attack vectors against FHE–DiNN
Threat models for the Cloud range from honest, semi-honest to actively malicious ones. Typically, it
is not safe to assume the Cloud as trusted domain, ruling out the honest model option right away.
In the FHE–DiNN use-case model, the Cloud does not deliberately forgo computational correctness, for instance by outputting nonsense instead of truthfully computing the desired prediction; correctness could, however, be enforced by deploying verifiable-computation techniques. The Cloud can actively try to subvert security by, for example, signaling transmission errors and requesting multiple ciphertexts associated with the same plaintext c_i = ([µ]_s)_i, which might naturally differ as the encryption algorithm is randomized. The client can detect this and prevent the Cloud from obtaining too many (say O(N log N)) fresh encryptions, which keeps the LWE-sample complexity at N, a small multiple thereof, or at most O(N), from one TLWE sample. So the Cloud is left with extracting information from one ciphertext, the bsk, or the ksk, leading to two different attacks as in Figure 9.13.
Firstly, suppose we have a TLWE sample c = (a, b) ∈ T_N[X]^k × T_N[X], encrypting the message µ ∈ T_N[X] for s ∈_R B_N[X]^k chosen uniformly at random with n = kN bits of entropy; we can write more explicitly: b(X) = Σ_{ℓ=1}^{k} a_ℓ(X) · s_ℓ(X) + e(X) + µ(X), as detailed in Algorithm 9.5.
Fig. 9.13: Model: malicious Cloud and sources of leakage of FHE–DiNN ciphertexts and keys. (The figure shows the client sending the encrypted input c = [µ] to the FHE–DiNN server, which also holds the DiNN and the key-dependent material bk and ksk; the question is where a security leak could occur.)
Secondly, a FHE–DiNN-attacker can try to perform key-recovery from bk and ksk, again having
only access to these ciphertexts.
Remark 73 (FHE–DiNN Ciphertext-Only Security). Algorithm 9.5 describes asymptotically the best-known FHE–DiNN ciphertext-only attacker A_COA(c).

Algorithm 9.5: FHE–DiNN Ciphertext-Only Attacker A_COA(c)
Input: c ∈ TLWE_{s,α}, with sk = s ∈_R B_N[X]^k
Data: N, α
Result: µ ∈ B with µ = Dec_sk(c)
Find µ by using the best known algorithms to extract the message from the ciphertext (cf. Section 9.8.2), running in time T_COA(N, α) = T_TLWE(N, α) = N · T_LWE(N, α) and requiring S_COA(N, α) = O(N log N) space.
Remark 74 (FHE–DiNN Full Key-Recovery). Algorithm 9.6 describes asymptotically the best-known FHE–DiNN key-recovery attacker A_bsk(·) resp. A_ksk(·), depending on which key material the attacker targets.

Algorithm 9.6: FHE–DiNN Key-Recovery Attacker (KRA) A_KRA(·)
Input: bk ∈ BK_{s→s′,α}, resp. ksk ∈ KSK_{s→s′,α,t}, with sk = s ∈_R B_N[X]^k, sk′ = s′ ∈_R B_N[X]^k
Data: N, α, n, l, k
Result: s ∈ B_N[X]^k with s_i = Dec_sk(bk_i) resp. s′_i = Dec_sk(ksk_i)
Find s resp. s′ by using the best known algorithms to extract the key from the ciphertext (cf. Section 9.8.2), running in time T_KRA = min{T_bsk, T_ksk}, with

T_bsk(N, α) = n · T_TGSW(N, n, l, k, α) = n · l · (k + 1) · T_LWE(N, α) = n · l · (k + 1) · 2^{(2·log(c_q/c_s))^{−1}·n},

resp. T_ksk(N, α) = n · T_LWE(N, α) = n · 2^{(2·log(c_q/c_s))^{−1}·n}, and requiring S_KRA(N, α) = O(N log N) space.
9.8.3 Security Evaluation and Parameter Choices
In Table 9.2 we highlight the main security parameters regarding our ciphertexts, together with an
estimate of the security level that this setting achieves. Other additional parameters, related to the
various operations we need to perform, are the following:
Ciphertext Dimension − log2(Noise) Estimated security
Input 1024 30 > 150 bits
Key switching key 450 17 > 80 bits
Bootstrapping key 1024 36 > 100 bits
Tab. 9.2: The security parameters we use for the different kinds of ciphertexts. The estimated security has been extracted from the plot in [CGGI16] and checked with the LWE-estimator, see Section 9.8.4.1.
• Degree of the polynomials in the ring: N = 1024;
• Dimension of each TLWE instance: k = 1;
• Basis for the decomposition of TGSW ciphertexts: Bg = 1024;
• Length of the decomposition of TGSW ciphertexts: ` = 3;
• Basis for the decomposition during key switching: 8;
• Length of the decomposition during key switching: t = 30;
9.8.4 General Attacks on Variants of LWE
No quantum algorithms are known to speed up solving the general LWE problem. In the classical setting [BLP+13], the hardness of LWE is reduced to the hardness of so-called binary-LWE, where the secret vector s is, in FHE–DiNN, chosen from B_N[X]^k, essentially {0, 1}^N. The binary version of LWE is in the same complexity class as classical LWE, yet a factor of log q needs to be accounted for in the dimension, i. e. essentially N ↦ N log q. That said, when using binary-LWE one has to moderately increase the lattice dimension to remain at the targeted LWE security level. When basing the security on binary-LWE, an attack using the BKW algorithm, achieving a running time slightly sub-exponential in N of order 2^{O(N/log log N)}, becomes more practical [KF15]. Experiments confirm that considerably larger dimensions of binary-LWE can be practically tackled compared with standard LWE, and they require a lower sample complexity [KMW16].
9.8.4.1 LWE-Estimator for Practical Security Evaluation of FHE–DiNN
Fig. 9.14: Step 1: Find a good basis for the lattice Λ_q(A), e. g. using fplll's LLL reduction to pass from a basis B of L(B) ⊆ Z^n to a reduced basis B′; Step 2: Sieve out invalid candidates.
Taking the Darmstadt LWE-Challenges announced in [Dar19] as a benchmark, the highest-ranked algorithms to tackle the hardest parameter settings used to be parallelized decoding approaches. Recently, records for challenges with pairs (n, α) ∈ {(90, 0.07), (120, 0.025)} have notably been solved using a sieving algorithm in less than a week on a powerful machine. An asymptotic attack on LWE is a decoding approach.
Attacks based on a transformation to a Bounded Distance Decoding (BDD) instance derived from the LWE instance benefit from secrets smaller than the errors [BG14]. When given a binary-secret LWE instance (L_q(A^t), b) with error e, an attacker can transform it into a BDD instance by writing

(L⊥_q(I_m | A^t), (b, ~0))    (9.11)

with the updated error vector (e, s) ensuring correctness. Solving the original instance is mathematically equivalent to solving:

(I_m | A^t) · (e, s)^t − b = ~0 mod q.    (9.12)

Which computational choice is best depends on the best available implementations and the parameterization.
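The correctness of this rewriting can be checked on a toy binary-secret instance; the sketch below (illustrative parameters, plain Python instead of lattice code) verifies that the extended vector (e, s) maps back to b under (I_m | A):

```python
# Check equation (9.12) on a toy binary-secret LWE instance: with
# b = A*s + e (mod q), the extended vector (e, s) satisfies
# (I_m | A) * (e, s)^t = b (mod q). Parameters are illustrative.

import random

def lwe_instance(m, n, q, seed=1):
    rng = random.Random(seed)
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    s = [rng.randrange(2) for _ in range(n)]        # binary secret
    e = [rng.randrange(-2, 3) for _ in range(m)]    # small error
    b = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q
         for i in range(m)]
    return A, s, e, b

def check(m=8, n=4, q=97):
    A, s, e, b = lwe_instance(m, n, q)
    # (I_m | A) applied to (e, s): the identity picks e, A multiplies s.
    lhs = [(e[i] + sum(A[i][j] * s[j] for j in range(n))) % q
           for i in range(m)]
    return lhs == b
```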
A BDD enumeration instance for LWE with parameters (n, α, q) requires a lattice-basis reduction algorithm, e. g. fplll's BKZ algorithm, a matrix A ∈ Z^{m×m}_q, and a target t ∈ Z^m_q, promised to be at most ||t − v|| ∈ Θ(αq√m) away from a lattice vector v. The LWE-estimator by Albrecht et al. [APS15] allows looking up values for the constants c_LWE or c_BKZ known to the public cryptographic community.
Remark 75 (Asymptotic Complexity of Attacking LWE). Let α, β ∈ ℝ, q = n^α, ‖e‖ = n^β ∈ O(poly(n)): T_LWE = 2^{c_LWE·n·log n/log(q/‖e‖)}, with c_LWE(c_BKZ), and poly(n)- or 2^n-space requirements. Attacking LWE in practice consists of two independent steps: lattice reduction and enumeration/sieving.
9.8.4.2 Attacking the Hybrid Approach of FHE–DiNN
FHE–DiNN, based on TFHE, uses a binary secret LWE. One idea of the hybrid scheme is not to
distribute key material as such, but to generate it on the fly using a suitably seeded pseudorandom
number generator (PRNG) adding an additional security assumption to the overall model.
As parameters for a security level of 128 [bit], we set n = m = 450 and assume that A is a uniformly random 450 × 450 integer matrix modulo q = 2^32, interpreted as elements on the torus. The secret and the error are both vectors of length 450 with independently identically distributed entries according to a discrete Gaussian distribution of parameter α. In our attack scenario we assume an honest-but-curious adversary, who observes all devices other than the users', obtaining access to full side-channel information through timing, . . . on the server side. From a high-level perspective, we obtain N LWE samples from each TLWE sample, arranged as a matrix (A, b = As + e). The goal is to exploit this information to mount a lattice-based attack, as parameters in FHE–DiNN were chosen aggressively in order to allow fast evaluation of neural networks. We chose the BDD enumeration algorithm, combined with pruning strategies to speed up the decoding, as the most powerful attack, more so as full enumeration in smaller dimensions can be embarrassingly parallelized in sub-routines [HKM18].
9.9 Experimental Results
You don’t know what you are talking about until you implemented it.—David Naccache
We tried out our proposed FHE–DiNN approach and list experimental results of our CPU implementation in terms of quality of the classification (accuracy) and run-time.

For our test networks, we show a comparison of accuracies when evaluating a trained 784:100:10 network on 10 000 MNIST test images, and contrast it with a network of only 30 hidden neurons. FHE–DiNN follows the outline presented in Figure 9.17 for our example use-case, given a network able to perform digit recognition.
Pre-processing the MNIST database.
Initially, we applied a pre-processing step to the MNIST data-set, decreasing the number of inputs from 784 to 16 · 16 = 256, possibly sacrificing accuracy of our network. Then we proceeded to train networks on the full 784-pixel input images.

Fig. 9.15: Pre-processing of a Seven from the MNIST test set.
Training a DiNN over Plaintext Data
In order to train the neural network, we first chose its topology, i. e., the number of hidden layers and neurons per hidden layer, which we denote separated by ':'. We experimented with several values and settled for a neural network with a single hidden layer composed of 100 neurons, demonstrating a good accuracy/run-time trade-off.
For training the network, we fixed the initial value of the learning rate to η = 0.006 and we used
an approach called learning rate decay : η is multiplied by a constant factor (smaller than 1) every few
epochs. In particular, we multiplied the learning rate by 0.95 every 5 epochs of training. The idea
behind this procedure is that of progressively reducing the “width of the step” in the optimization
process. This results in wider steps – which will hopefully set the cost minimization towards the right
direction – at the beginning of the training, followed by narrower steps to “refine” the set of weights
that has been found. The size of the batches has been fixed to 10 and kept constant throughout the
entire training phase.
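The decay schedule above can be sketched in a one-liner (values as stated: η₀ = 0.006, factor 0.95, applied every 5 epochs):

```python
# Learning-rate decay as described above: eta starts at 0.006 and is
# multiplied by 0.95 every 5 epochs of training.

def learning_rate(epoch, eta0=0.006, factor=0.95, every=5):
    """Return the learning rate in effect during the given epoch."""
    return eta0 * factor ** (epoch // every)

# Epochs 0-4 train with 0.006; epochs 5-9 with 0.006 * 0.95, and so on.
```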
The discretization of the weights happens according to Equation (9.9). For the final value of the tick size τ, we experimented and found that τ = 10 is a good compromise between accuracy (more possible values for the weights give the network more expressiveness) and the message space the homomorphic scheme has to support.
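Equation (9.9) is not reproduced in this excerpt; a plausible reading, shown here as an explicitly hypothetical sketch, is that each real weight is rounded to the nearest integer tick round(τ·w), so a larger τ admits more weight values at the cost of a larger message space:

```python
# Hypothetical discretization in the spirit of Equation (9.9), which is
# not reproduced in this excerpt: each real weight is mapped to the
# nearest integer "tick", round(tau * w). A larger tick size tau gives
# the network more expressiveness but enlarges the message space.

def discretize(w, tau=10):
    return round(tau * w)

def discretize_layer(weights, tau=10):
    return [discretize(w, tau) for w in weights]

# With tau = 10, weights in [-1, 1] map to the integers -10, ..., 10.
```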
In order to determine the noise parameters needed to parameterize the message space, we calculated
the maxima over all the L1-, and L2-norms of the weight vectors associated to each neuron per layer.
These values, once computed at pre-processing phase using the clear weights of the given network, define
the theoretical bounds our scheme is able to support. Evaluating the actual multisums over the complete training set in one inference run provides tighter, practical bounds. An aggressive parameter choice makes it possible that a specific input's multisum exceeds the bounds, leading to inaccuracies during bootstrapping. If the test-set distribution matches the training-set distribution closely enough, this should not be observable when evaluating the network on new input samples.
In Figure 9.16 we show the FHE-friendly model of a neural network that we considered in this work.
In Table 9.3 we report the theoretical message space we would need to support and the message space
we actually used for our implementation. Equation (9.13) describes the mathematical representation
Fig. 9.16: Depiction of our neural network with 784:100:10-topology. It takes as input the pixels that compose the image and outputs the scores s_i assigned to each digit. (The figure shows the 784 encrypted input pixels Enc(p_1), …, Enc(p_784) feeding a 100-neuron hidden layer, which feeds the 10 output scores Enc(S_0), …, Enc(S_9).)
                      FHE–DiNN (784:30:10)      FHE–DiNN (784:100:10)
                      ‖W‖     worst.   avg.     ‖W‖     worst.   avg.
  1st layer (L1)      2338    4676     2500     1372    2744     1800
  2nd layer (L1)      399     798      800      488     976      1000
  1st layer (L2)      ≈ 119                     ≈ 69
  2nd layer (L2)      ≈ 85                      ≈ 60

Tab. 9.3: The listed FHE–DiNN settings show ‖W‖ = max_w ‖w‖_1 in the first two rows, responsible for message-space adjustments, with theoretical worst-case and experimental average-case bounds, resp. ‖W‖ = max_w ‖w‖_2 in the last two rows.
of our perceptron M (cf. Definition 61) as a composition of linear and non-linear functions; compare the general setting in Figure 9.4.
FHE–DiNN computes a weighted composition of functions with one input TLWE sample c0 as
\[
  M_{\mathrm{FHE\text{--}DiNN}}\colon\; \mathbb{T}_N[X]^k \longrightarrow \bigl(\mathbb{T}_N[X]^k\bigr)^{10},\qquad
  x := c_0 \;\longmapsto\; y := \vec{c}_2 = \sum_{\ell_2=1}^{100} \underbrace{\varphi_1\Bigl(\,\sum_{\ell_1=1}^{784} (c_0)_{\ell_1} \cdot (w_{0\to 1})_{\ell_1}\Bigr)}_{[\vec{c}_1]_{\ell_2}} \cdot (w_{1\to 2})_{\ell_2},
  \tag{9.13}
\]
with ten TLWE samples ⃗c_O := ⃗c_2 ∈ (T_N[X]^k)^10 as output, encrypting the perceptron's predicted digit label likelihoods of the encrypted input c_0 := c_I = Enc(x) ∈ T_N[X]^k. Two consecutive, dense layers were adaptively shaped to form a feed-forward DiNN, trained on (MNIST) samples, mapping R^{n_I} → R^{n_O} and categorizing images of digits into 10 classes. The weights w_{0→1}, w_{1→2} were pre-computed, approximated, and Fourier transformed. The homomorphic evaluation required intermediate bootstrapped activations using ϕ1 = sign, yielding a low-noise encryption of a weighted
Fig. 9.17: Application of privacy-preserving neural network evaluation using FHE in the Cloud-security model; the user domain is separated from the server. The client (Alice) sends Enc_pk(∑_i p_i X^i); the server multiplies by ∑_i w_i X^{-i} (100 TLWE_N samples), extracts 100 N-LWE samples, key-switches to 100 n-LWE samples, bootstraps to the sign (100 N-LWE), and computes 10 N-LWE score ciphertexts; the client decrypts the 10 scores and takes the argmax.
sum per encrypted input image. Counting the total number of neurons in the example network with 784:100:10–topology, we have nI + nH + nO = 894, where the number of inputs needs to satisfy nI < N to fit a whole image. We remark that only the hidden neurons' outputs were activated with our bootstrapping routine (Algorithm 9.2). The evaluation proceeds as follows:
1. Encrypt the image as a TLWE ciphertext;
2. Multiply the TLWE ciphertext by the polynomial which encodes the weights associated to the
hidden layer. This operation takes advantage of FFT for speeding up the calculations;
3. From each of the so-computed ciphertexts, extract a 1024-LWE ciphertext, which encrypts the
constant term of the result;
4. Perform a key switching in order to move from a 1024-LWE ciphertext to a 450-LWE one;
5. Bootstrap to decrease the noise level. By setting the testVector, this operation also applies the
sign function and changes the message space of our encryption scheme for free.
6. Perform the multisum of the resulting ciphertexts and the weights leading to the output layer;4
7. Return the 10 ciphertexts corresponding to the 10 scores assigned by the neural network. These
ciphertext can be decrypted and the argmax can be computed to obtain the classification given
by the network.
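To make the seven steps concrete, here is a plain-Python sketch of the same computation in the clear. The FHE-specific steps 3–5 (extraction, key switching, bootstrapping) have no clear-text analogue beyond applying the sign, and the tiny 4:2:2 weights below are made up for illustration only.

```python
# Clear-text sketch of the DiNN inference that steps 1-7 evaluate
# homomorphically: multisum, sign activation, second multisum, argmax.

def sign(x):
    return 1 if x >= 0 else -1

def dinn_classify(pixels, w_hidden, w_output):
    # Step 2 analogue: multisum of input pixels with hidden-layer weights.
    multisums = [sum(p * w for p, w in zip(pixels, row)) for row in w_hidden]
    # Step 5 analogue: bootstrapping applies sign "for free" via the testVector.
    hidden = [sign(s) for s in multisums]
    # Step 6 analogue: multisum towards the output layer, no activation.
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in w_output]
    # Step 7 analogue: after decryption, argmax yields the classification.
    return scores.index(max(scores))

# Toy 4:2:2 instance instead of 784:100:10.
pixels = [1, -1, 1, 1]
w_hidden = [[1, 1, 1, 1], [-1, 1, -1, 1]]
w_output = [[2, 1], [1, 2]]
print(dinn_classify(pixels, w_hidden, w_output))
```

In the homomorphic setting each list comprehension over a layer corresponds to independent ciphertext operations, which is what makes the per-layer parallelization discussed below possible.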
Comparing our homomorphic evaluation to the classification in the clear, we observed [BMMP18]:
Observation 76. The accuracy achieved when classifying encrypted images is close to that obtained
when classifying images in the clear.
Our trained network with 30 hidden neurons, achieves a classification accuracy of 93.55% in the clear
(cf. Table 9.6) and of 93.71% homomorphically. In the case of the network with 100 hidden neurons,
we have 96.43% accuracy in the clear and 96.35% on encrypted inputs. These gaps are explained by
the following observations.
4 Note that we do not apply any activation function to the output neurons: we are only interested in being able to retrieve the scores and sorting them to recover the classification given by the network.
Observation 77. During the evaluation, some multisum-signs are flipped during the bootstrapping,
which does not significantly decrease the accuracy of the network.
We use aggressive internal parameters (e. g., N , α and, in general, all precision parameters such
as τ) for the homomorphic evaluation, knowing that this could lead the bootstrapping procedure to
return an incorrect result when extracting the sign of a message. In fact, we conjectured that the
neural network would be resilient to perturbations and experimental results proved that this is indeed
the case: when running our experiment over the full test set, we noticed that the number of wrong
bootstrappings is 3383 (respectively, 9088) but this did not change the outcome of the classification in
more than 196 (respectively, 105) cases (cf. Table 9.7).
Observation 78. The classification of an encrypted image might disagree with the classification of the
same image in the clear but this does not significantly worsen the overall accuracy.
This is a property that we expected during the implementation phase and our intuition to explain
this fact is the following: the network is assigning 10 scores to each image, one per digit, and when
two scores are close (i. e., the network is hesitating between two classes), it can happen that the
classification in the clear is correct and the one over the encrypted image is wrong. But the opposite
can also be true, thus leading to classifying correctly an encrypted sample that was misclassified in
the clear. We experimentally verified that disagreements between the evaluations do not automatically
imply that the homomorphic classification is worse than the one in the clear: out of 273 (respectively,
127) disagreements, the classification in the clear was correct 105 (respectively, 61) times, against 121
(respectively, 44) times in favor of the homomorphic one5 (cf. Table 9.7).
Observation 79. Using the modified version of the BlindRotate algorithm presented in Section 9.7.6
decreases the number of wrong bootstrappings.
Before stating some open problems, we conclude with the following note: using a bigger neural
network generally leads to a better classification accuracy, at the cost of performing more calculations
and, above all, more bootstrapping operations. However, the evaluation time will always grow linearly
with the number of neurons. Although it is true that evaluating a bigger network is computationally
more expensive, we stress that the bootstrapping operations are independent of each other and can
thus be performed in parallel. Ideally, parallelizing the execution across a number of cores equal to the number of neurons in a layer (30 or 100 in our work) would mean that evaluating the layer takes roughly the time of a single bootstrapping (i. e., around 15 ms).
9.10 Comparison with Cryptonets
Cryptonets [DGBL+16] propagate real signals encoded as compatible plaintext, encrypted as one huge
input ciphertext. Regarding classification accuracy, the NN used by Cryptonets achieves 98.95 % of
correctly classified samples, when evaluated on the MNIST dataset. In our case, a loss of accuracy occurs
due to the preliminary simplification of the MNIST images, and especially because of the discretization
of the network. We stress however that our prime goal was not accuracy but to achieve a qualitatively
better homomorphic evaluation at the neuron level.
With Cryptonets, a single image pixel takes 2 · 382 · 8192 bits (= 766 kB), so an entire image takes 28 · 28 · 766 kB ≈ 586 MB. However, with the same storage requirements, Cryptonets can batch 8192 images together, so that the amortized size of an encrypted image is reduced to 73.3 kB. In the case of FHE–DiNN, we are able to exploit the batching technique on a single image, so that each encrypted image takes ≈ 8.2 kB. Cryptonets has a 784:835:100:10–topology, meaning 945 inner, hidden neurons
5 In the remaining cases, the classifications were different but they were both wrong.
                          Overall                 per Image
              # Neurons   Accuracy   Eval [s]     |c| [B]    Enc [s]     Dec [s]
Cryptonets    945         98.95 %    570          586 M      122         5
Cryptonets⋆   945         98.95 %    0.07         73.3 k     0.015       0.0006
FHE–DiNN      30          93.71 %    0.49         ≈ 8.2 k    0.000168    0.0000106
FHE–DiNN      100         96.35 %    1.64         ≈ 8.2 k    0.000168    0.0000106

Tab. 9.5: Comparison with Cryptonets and its amortized version (denoted by Cryptonets⋆, accumulating 8192 inferences per batch). We refer to FHE–DiNN on DiNNs with one hidden layer composed of 30 and 100 neurons, respectively.
which we compare to FHE–DiNN's 100 neurons. Their complete homomorphic evaluation of the network takes 570 seconds, whereas in our case it takes 0.49 s, respectively 1.64 s, for the 30 resp. 100 hidden neuron network. We remark that the networks that we use for our experiments are considerably smaller than that used in Cryptonets, so we also compare the time per neuron; in this case, our solution is faster by roughly a factor 36. Moreover, Cryptonets supports image batching (8192 images can be, and need to be, classified together in 570 seconds), which results in only 0.07 s per image. In the simplest use-case, where the user wants only a single image classified at a time, Cryptonets' ability to batch images together is not useful; it potentially is in other applications, where the same user wants to classify a large number of samples at once.
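The per-neuron figure quoted above can be checked with a two-line computation, using the evaluation times reported in Table 9.5:

```python
# Per-neuron timing comparison between Cryptonets and FHE-DiNN,
# using the evaluation times from Table 9.5.
cryptonets_time, cryptonets_neurons = 570.0, 945  # seconds, inner neurons
fhe_dinn_time, fhe_dinn_neurons = 1.64, 100       # 784:100:10 network

ratio = (cryptonets_time / cryptonets_neurons) / (fhe_dinn_time / fhe_dinn_neurons)
print(f"per-neuron speed-up: {ratio:.1f}x")  # roughly a factor 36
```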
Finally, the most distinctive achievement is scale-invariance, meaning that FHE–DiNN can keep on
computing over the encrypted outputs of an arbitrarily deep network, whereas Cryptonets are bounded
by the initial choice of parameters. In Table 9.5 we present a detailed comparison with Cryptonets.
9.10.1 Performance of FHE–DiNN on (clear) inputs x
The first column of Table 9.6 refers to the evaluation of the trained real-valued model in the clear. In the second and third ones, all weights and biases have been discretized; the difference is applying a sigmoidal– respectively the sign(·)–activation function for every neuron in the hidden layer. These numbers are contrasted with the full inference on encrypted inputs in the last column.
Table 9.6 serves as a baseline when comparing our originally trained, real-valued network with
discretizations. Furthermore, the percentages in the third column are the numbers to be compared
with accuracy drops in the encrypted setting.
R-NN DiNN w. hard sigmoid DiNN w. sign FHE–DiNN
30 neurons 94.76 % 93.76 % (−1 %) 93.55 % (−1.21 %) 93.71 % (−1.05 %)
100 neurons 96.75 % 96.62 % (−0.13 %) 96.43 % (−0.32 %) 96.35 % (−0.4 %)
Tab. 9.6: Performance metrics on (clear) inputs x.
9.10.2 Performance of FHE–DiNN on (encrypted) inputs Enc(x)
Finally, in Table 9.7, the results of fully homomorphic evaluation using the sign activation and a window size of w = 2 are presented. An image ciphertext providing 80 bits of security takes 8.2 kB, and classification takes an average of 1.64
seconds per classification on an Intel Core i7-4720HQ CPU @ 2.60 GHz.
         Acc.                Disagreements    Wrong BS (total)   Wrong BS when dis.   Time
30       93.71 % (−0.16 %)   273 (105–121)    3383/300000        196/273              0.515 s
100      96.26 % (−0.17 %)   127 (61–44)      9088/1000000       105/127              1.679 s
30 w     93.46 % (−0.09 %)   270 (119–110)    2912/300000        164/270              0.491 s
100 w    96.35 % (−0.08 %)   150 (66–58)      7452/1000000       99/150               1.640 s
Tab. 9.7: Performance metrics on (encrypted) inputs Encpk (x), with window size w = 2.
Table 9.7 shows how individual wrong bootstrappings out of a total of 100 · 10000 do not necessarily lead to a disagreement when evaluating all 10000 images, and cause only small accuracy deviations from un-encrypted inputs. Interestingly, the trade-off between on-line and off-line computations when increasing the window size seems to mitigate accuracy losses. FHE–DiNN's speed of privacy-preserving inference and its accuracy demonstrate the practicality of homomorphic encryption systems for a broad range of machine learning algorithms.
The overall perspective from Table 9.6 shows the comparison results and leads to some open questions. The first column refers to the evaluation of our trained real-valued model in the clear, which serves as baseline. In the second and third columns all real weights and biases have been discretized; the difference is whether we applied TensorFlow's sigmoidal hard sigmoid or the simple sign(·)–activation function for every neuron in the hidden layer. Finally, the last column signifies the fully homomorphic evaluation using the sign with a window size of w = 2.
Sizes [kB] LWE TLWE TGSW ksk bsk
CPU 1.80 8.19 49.15 15820.30 22118.40
FFT 7.20 16.38 98.30 31640.60 44236.80
Tab. 9.8: FHE–DiNN ciphertext sizes [kB] for n = 450, N = 1024, k = 1, ` = 3, t = 5.
Table 9.8 lists the ciphertext sizes of FHE–DiNN's standard parameterization (n = 450, N = 1024, k = 1, ` = 3, t = 5), which we proposed alongside our new constructions for faster, practical FHE evaluation tailored to a wide range of AI applications in the cloud. We remark that the resource-heavy bsk (and ksk) can be generated off-line.
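The CPU sizes in Table 9.8 follow directly from the parameters, assuming 32-bit torus coefficients and decimal kilobytes; the ksk is omitted here since its size additionally depends on the key-switching base decomposition.

```python
# Reproducing the CPU ciphertext sizes of Table 9.8 from the parameters
# n = 450, N = 1024, k = 1, l = 3.
n, N, k, l = 450, 1024, 1, 3
word = 4  # bytes per 32-bit torus element

lwe  = (n + 1) * word      # mask a in Z^n plus body b
tlwe = (k + 1) * N * word  # k mask polynomials plus body, degree-N polynomials
tgsw = (k + 1) * l * tlwe  # (k+1)*l TLWE rows
bsk  = n * tgsw            # one TGSW ciphertext per LWE key bit

for name, size in (("LWE", lwe), ("TLWE", tlwe), ("TGSW", tgsw), ("bsk", bsk)):
    print(f"{name}: {size / 1000} kB")
```

The printed values match the CPU row of Table 9.8 (1.80, 8.19, 49.15, and 22118.40 kB); the FFT row doubles them since coefficients are kept in double-precision Fourier representation.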
Chapter 10
FHE & AI on GPUs (cuFHE–DiNN)
10.1 Practical FHE evaluation of neural networks using CUDA
Previous sections show that the external product, i. e. ExternMulToTLwe(), is the most frequent and most costly operation when evaluating our FHE-friendly neural nets, hence a bottleneck. One possibility is to speed this step up with parallelized processing on a GPU. Experimentally, in a GPU-focused follow-up implementation of FHE–DiNN using CUDA, dubbed cuFHE–DiNN, we report figures of FFT-based GPU versions over the plain CPU version of individual algorithms: with N = 1024, k = 1, averaging over 10 runs on random data, multiplication shows speed-ups of more than 740x on an NVIDIA RTX 2080 Ti compared to an Intel(R) Xeon(R) CPU E3-1230 v6.
With N = 512, k = 1, we achieved experimental improvements of more than a factor of 50x for polynomial multiplications, using the complete FFT transformation, point-wise multiplication, and the inverse FFT transformation.
Further, more complex routines show speed-ups of roughly 40x (MuxRotate), 38x (BlindRotate, BlindRotateAndExtract, and Bootstrap_woKS), and 8x (TGSW::ExternMulToTLwe()) for N = 512, k = 1, and of roughly 12x (MuxRotate), 12x (BlindRotate, BlindRotateAndExtract, and Bootstrap_woKS), and more than 14x (TGSW::ExternMulToTLwe()) for N = 1024, k = 1. Observing how these component speed-ups translate to full homomorphic neural network evaluations with cuFHE–DiNN strengthens our confidence that FHE-friendly neural networks will benefit in the near future from clever, yet trickier, GPU implementations as a next step towards the practicability of this technology.
Future directions and open problems.
This work reveals a number of future research directions and, raises several interesting open problems:
The first one is about directly training FHE-friendly DiNNs, rather than simply discretizing an
already-trained model: Is the loss in precision inherent, when converting to DiNNs or is there an alter-
native training algorithm or regularization method for FHE-friendly homomorphic neural networks?
Another natural question is whether it is possible to batch several bootstrappings together using
TGSW-based FHE schemes, in order to improve the overall efficiency of the evaluation: Are there
efficient packing, or batching, techniques for refreshing TGSW ciphertexts in a parallelized fashion?
The methodology and use-case presented in this work are by no means limited to image recognition, but can be applied to other machine learning problems as well: How far can the FHE–DiNN framework be pushed to evaluate increasingly more generic cognitive architectures?
The max function is needed to homomorphically evaluate the widely-used max pooling layers. To the
best of our knowledge, only the introduced interactive technique from Section 9.7.9 can solve this: For
an efficient homomorphic evaluation, meeting the non-interactiveness of FHE, is there a homomorphic
routine (based on TFHE) computing the max function?
Cryptanalysis sets the stage for security claims in cryptology.
In this part we introduce new algorithms to solve variants of the subset-sum problem (SSP), in
particular multi-dimensional versions. We also relate previous discussions on lattice-based primitives,
e. g. as in FHE-constructions, to the SSP.
Furthermore, we study the subset-product problem, a variant which seems not to have undergone much scrutiny despite appearing in some contexts of practical interest. Our technical novelties rely on a combination of maximal hypergraph partitioning and a polynomial reduction to the (one-dimensional) SSP. The resulting subset-product solver can take advantage of the problem's sparsity much more efficiently than the usual reduction. We study a property some SSP solvers can be endowed with that finds each solution with equal probability, which we call equiprobability. We advance evidence that there exist classical and quantum SSP solvers which are equiprobable.
Chapter 11
Underlying primitives and the
subset-sum problem (SSP)
11.1 Introduction
The subset-sum problem (SSP) is one of the most famous NP-complete problems from complexity theory [Kar72]. It is another link between the classical problems studied in Section 3.2 and modern security assumptions based on Learning With Errors. Informally, its computational variant is stated as:
Fig. 11.1: SSP = Knapsack packing.
Definition 80 (The subset-sum problem (Informal)). Given a set of n integers a_1, . . . , a_n and an integer target t, find and output I ⊆ [n], i. e. a subset I comprised of w < n indices, such that ∑_{i∈I} a_i = t; or return ∅, signaling failure, if no such set exists.
As usual, this computational statement has a decisional vari-
ant, where we merely ask if a solution exists. The SSP appears in
the cryptographic literature as a building block for a candidate
public-key cryptosystem [MH78], complexity theory [SE94b], and
combinatorial optimization [Sho86]. In many cases, there is a so-
lution by design, the decisional problem gets little attention, and the hard problem is finding a subset
that solves the SSP.
The integer SSP is also referred to as the knapsack problem, metaphorically packing items into a knapsack until it is full (cf. Figure 11.1), and can be formulated in a broader, more generic way: as a range problem, defined by a set of integers U, a range of permissible solutions given in terms of a left resp. right bound ` resp. r, and a cost functional. In the case we focus on, the set is solely U := {1, 2, . . . , N} = [N], where N = 2^n is exponential in n, the number of elements, with varying cost functionals, e. g. ∑_{i∈I} a_i, yet a fixed target t ∈ [` := t, r := t]. In this chapter, we denote by t ←$ U that t is sampled from a finite set U uniformly at random. Variables in bold capital letters stand for matrices (e.g., A) and bold lowercase letters represent vectors (e.g., a).
11.1.1 Links between the subset-sum problem and Learning With Errors
Theoretically studying the asymptotic complexity of lattice-based problems such as Learning With Er-
rors (LWE), and related hard problems like LPN and the integer subset-sum problem or other knapsack
variants is crucial as they are believed to withstand cryptanalysis using efficient quantum algorithms,
cf. Section 11.5. In order to give secure parameter recommendations of concrete instantiations, e. g. for
CHAPTER 11. UNDERLYING PRIMITIVES AND THE SUBSET-SUM PROBLEM (SSP)
HE-schemes as discussed in Part II for use in practice, cryptanalysis needs to consider all avenues of tackling a problem if necessary, i. e. exponential classical and quantum algorithms. Concrete instantiations like the earlier discussed FHE–DiNN, based on LWE ≤ SSP, would be jeopardized if fast, generic SSP solvers were discovered for average-case instances.
LWE, as defined in Part II of this thesis, is a flexible problem that can be considered with various error distributions that add bounded random noise to the system of equations. LWR, a deterministic version of the problem, has been studied too. There, the errors are carries that occur when adding numbers whose sum overflows the base, leading to deterministic noise.
In [MM11b] Micciancio and Mol analyze the duality of the LWE function family and the (more generally defined) knapsack function family, and discuss sample-preserving reductions between the conjectured hard decisional and computational problems they pose. Regarded from an even greater distance, both problems can be cast as average-case instances of the bounded distance decoding (BDD) problem.
Unfortunately, the cryptographic community is not aware of concrete theoretical connections between LWE and SSP in the sense of tight reductions between the two.
More specifically, for an integer m ∈ N and a finite, additive Abelian group (G, +) with an element a ∈ G^m, we define the functional
\[
  F_{\mathbf{a}}\colon\; \mathcal{Z} \longrightarrow G,\qquad x \mapsto F_{\mathbf{a}}(x) := \sum_i x_i a_i.
\]
The functional is conjectured to be a one-way function (cf. Definition 15), only inefficiently invertible for many choices of Z and G, e. g. for Z := Z^m, G := Z_N.
Given a list of integers a := (a_1, a_2, . . . , a_m) ∈ Z^m and a target sum t := F_a(x) = ∑_i x_i a_i ∈ Z for a uniformly chosen binary vector x ∈_R {0, 1}^m, the problem of recovering x given (a, t) ∈ G^{m+1} is known as the subset-sum problem. Often a restricted a ∈ G^m is chosen to create an instance of the subset-sum problem.
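A toy instantiation of F_a over G = Z_N, with made-up numbers far below cryptographic size, makes the SSP connection explicit:

```python
# Evaluating the knapsack functional F_a(x) = sum_i x_i * a_i in Z_N.
# Easy to compute forwards; inverting it, i.e. recovering the binary x
# from the pair (a, t), is exactly the subset-sum problem.
N = 101
a = [3, 14, 15, 92, 65, 35]   # public knapsack (toy values)
x = [1, 0, 1, 1, 0, 1]        # secret binary vector
t = sum(xi * ai for xi, ai in zip(x, a)) % N
print(t)
```

Here t = (3 + 15 + 92 + 35) mod 101 = 44, and publishing (a, t) while keeping x secret is the one-wayness conjecture in miniature.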
Knapsack problems in the group of vectors over Z, G = (Z_q^k, +), where the inputs x are sampled from the given LWE error distribution modulo q, can be interpreted as duals of the LWE function family.
A successful attack on the knapsack problem hence possibly means that parameter recommendations
of practical and secure LWE have to be re-assessed and adapted to meet the desired security level.
Any given LWE instance can be translated, with polynomial run-time overhead of the reduction, to
an appropriate knapsack instance, within the same sample-complexity of the error distribution. This
needs to be taken into account in order to derive concrete parameters.
11.2 Solving the subset-sum problem
When we ask ourselves how feasible it is to solve the subset-sum problem, a trivial lower bound is Ω(n),
i. e. the run-time of a solver is necessarily lower bounded by some function linear in n, as clearly we
need to at least read every input ai. Obtaining better, tighter lower bounds requires careful analysis
that often is not straight-forward. It is still an open question whether this lower bound can be improved
in full generality. A natural upper bound follows from performing an exhaustive search, i. e. computing the cost functional of all possible subsets and testing if the selected elements sum to the target element, which gives the bound O(2^n). We remark that we implicitly omit factors polynomial in n in the notation here, and write Õ(2^n) when we want to stress this fact and be more precise about the asymptotic behaviour.
However, we remark that at least some special cases can be tackled more efficiently. Naturally, any super-increasing sequence, e. g. the powers of 2, particularly familiar to every mathematician working in computer science, does not pose a real challenge, as a ppt algorithm A solving such instances is
straight-forward and known as binary decomposition.
Example 81. Let n = 10, a = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512], and w = 5, in combination with the target sum t = 617 < 2^10, form a (1, n, n, w, +)-subset problem instance (see Table 11.1).
As t = 512 · 1 + 256 · 0 + 128 · 0 + 64 · 1 + 32 · 1 + 16 · 0 + 8 · 1 + 4 · 0 + 2 · 0 + 1 · 1 = 617 = ∑_{i∈I} a_i, the index set I = {1, 4, 6, 7, 10}, readily visible from the binary decomposition t = (617)_10 = (1001101001)_2, is the sought-after subset of {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} of appropriate size.
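The ppt algorithm alluded to above can be sketched as a greedy scan from the largest element down, which on super-increasing sequences is exactly binary decomposition:

```python
# Greedy solver for super-increasing knapsacks (each a_i exceeds the sum
# of all previous elements), generalizing binary decomposition.

def solve_superincreasing(a, t):
    """Return the 1-based index set I with sum_{i in I} a_i = t, or None."""
    I = []
    for i in range(len(a), 0, -1):        # scan from largest to smallest
        if a[i - 1] <= t:
            I.append(i)
            t -= a[i - 1]
    return sorted(I) if t == 0 else None  # None signals failure

a = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
print(solve_superincreasing(a, 617))  # indices of 1 + 8 + 32 + 64 + 512
```

On Example 81 this returns [1, 4, 6, 7, 10], matching the bits of (617)_2.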
Asymptotically, the best generic, classical algorithm for solving the SSP is the meet-in-the-middle technique (MITM) due to Horowitz and Sahni [HS74], which achieves essentially O(2^{n/2}) complexity. It is a classical time-memory trade-off (TMTO) applied to the SSP, which we contrast with more sophisticated, heuristic algorithms in Table 11.2, and which we take to the quantum domain in Theorem 98.
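The MITM idea can be sketched in a few lines: enumerate all 2^{n/2} subset sums of each half of the input, then look the complement of each left-half sum up in a dictionary of right-half sums, trading memory for time. The instance below is a toy example.

```python
# Horowitz-Sahni meet-in-the-middle for the SSP: time and memory
# O(2^{n/2}) up to polynomial factors.
from itertools import combinations

def mitm_subset_sum(a, t):
    n = len(a)
    left, right = a[: n // 2], a[n // 2 :]

    def all_sums(part, offset):
        # Map every achievable subset sum of `part` to one index set.
        sums = {}
        for r in range(len(part) + 1):
            for idx in combinations(range(len(part)), r):
                sums.setdefault(sum(part[i] for i in idx),
                                [offset + i for i in idx])
        return sums

    right_sums = all_sums(right, n // 2)
    for s, idx in all_sums(left, 0).items():
        if t - s in right_sums:          # complement found in the other half
            return sorted(idx + right_sums[t - s])
    return None                          # no solution exists

a = [31, 26, 47, 13, 8, 55, 21, 4]
print(mitm_subset_sum(a, 68))
```

Any returned index set sums to the target (here, e.g., the indices of 47 and 21); multiple solutions may exist, in which case one is returned.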
An important metric is the subset-sum problem's density, defined by
\[
  D_{\mathrm{SSP}} := \frac{n}{\log_2\bigl(\max_i a_i\bigr)}\,.
\]
Fig. 11.2: Intuition: SSP instances with density D. For D < 0.9408 there are few 'large' a_i; for D ≈ 1, n and log_2(max_i a_i) are of the same extent; for 1 < D there are many 'small' a_i.
When this density is low, D_SSP < 0.9408 . . . , lattice embedding and reductions (using algorithms such as LLL [LLL82] or BKZ/BKZ 2.0 [CN11], which are proven ppt in that case) provide a polynomial-time solution [Sha84, LO85a, CJL+92]. Conversely, when the density is high, a result of Impagliazzo and Naor [IN96] states that distinguishing an induced D_SSP = 1 SSP-distribution from a uniform distribution is among the hardest. This qualifies the subset-sum problem as a base for pseudorandomness in cryptographic constructions, with the density as a good metric, cf. Figure 11.2 for an intuition. However, if D_SSP > 1, attacks based on the generalized birthday paradox might be an efficient choice [FP05]. As detailed in Definition 30, a reduction from the subset-sum problem to uSVP_γ was given in [LO85b], where the parameter γ is a function of the density: larger D_SSP means smaller γ, and hence the uSVP_γ problem becomes harder.
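The density gauge is one line of code; the two toy instances below land on opposite sides of the 0.9408 threshold (illustrative numbers only):

```python
# Density D = n / log2(max_i a_i): low-density instances fall to lattice
# reduction, high-density ones to birthday-type algorithms.
from math import log2

def ssp_density(a):
    return len(a) / log2(max(a))

low  = [2 ** (2 * i) for i in range(1, 11)]  # 10 huge, spread-out elements
high = list(range(1, 11))                    # 10 small elements
print(ssp_density(low), ssp_density(high))   # D = 0.5 vs. D > 1
```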
Another avenue is to focus on average-case complexity, rather than worst-case complexity, at least over some families of hard instances. Howgrave-Graham and Joux [HGJ10] provided a classical algorithm which can be shown to run in O(2^{0.337n}) and was later improved by Becker, Coron, and Joux [BCJ11] to O(2^{0.291n}). It is possible to speed up the algorithms of Howgrave-Graham–Joux and Becker–Coron–Joux using fine-grained data access in the quantum setting, as shown by Helm and May [HM18], who provided a heuristic quantum algorithm with O(2^{0.226n}) average-case complexity on hard instances, combining the best known classical algorithms and quantum walks.
11.2.1 Variants of the subset-sum problem
Various natural extensions of the SSP have appeared, some of which are of industrial interest (e.g., in bin packing problems). We remark that D stands for density and d for dimension in this work.
There are essentially three ways to extend the SSP:
1. The first direction is to replace numbers by vectors. For d-dimensional vectors of integers being
summed, Pan and Zhang [PZ16] show that the lattice techniques of SSP can be adapted provided
Fig. 11.3: Landscape of the (d, k, n, w, +)-subset problem family, with the best-known (classical C and quantum Q, average-case complexity) algorithms, as a function of dimension d and generalized density D^M: Coster et al. [CJL+92], C: Becker et al. [BCJ11], and Q: Helm–May [HM18] for d = 1; Pan and Zhang [PZ16] for d > 1; Section 11.6 and Figure 11.9 for d > 1, k < n (cf. Theorems 106, 112, and 120). Note that any (1, k, n, w, +)-subset problem instance is trivially embedded in a (1, n, n, w, +)-subset problem instance. Studying the (1, n, n, w, ·)-subset problem completes the picture.
that the problem has low generalised density
\[
  D^{M}_{\mathrm{SSP}} := \frac{n}{d \cdot \log_2\bigl(\max_{i,j} a_{i,j}\bigr)} < 0.9408\,.
\]
2. The second direction is to replace the '+' operation by another operation, such as the product '·' (cf. [Odl84]). Such problems cannot be harder to solve than an equivalent SSP, see Section 11.6.5.
3. A third way to generalize the SSP is to consider sparsity: in many cases of interest, the vectors
in the d-dimensional variant of subset-sum problem have only few non-zero entries. This sparsity
naturally arises in many concrete instances, and to the best of our knowledge has not been
investigated yet (see Section 11.6).
The listed definitions lead us to introduce a generic notation for the SSP and its variants; namely, we use n to indicate the number of elements, d for their dimension, ⋆ for the operation, and k < n for capturing sparsity (see Table 11.1).
Abbreviation   Problem name                   d   k       n   w   ⋆   Complexity T(n)
SSP            subset-sum                     1   n       n   w   +   O(2^{n·e+o(1)})
SPP            subset-product                 1   n       n   w   ·   O(r^ℓ · 2^{ℓ·e+o(1)} + h)
MSSP           multiple subset-sum            d   n       n   w   +   O(r · 2^{n·e+o(1)})
k-MSSP         k-sparse multiple subset-sum   d   k < n   n   w   +   O(r^ℓ · 2^{ℓ·e+o(1)} + h)

Tab. 11.1: The (d, k, n, w, ⋆)-subset problem instantiated on some specific choices of parameters gives as special cases the SSP, SPP, MSSP, and k-MSSP. We also summarize the time complexity of our algorithms on these problems, in terms of the complexity of solving the SSP with n inputs. The number of solutions of a problem is denoted by r, e ≤ 1/2 stands for the exponent appearing in the complexity of an equiprobable SSP oracle, while ℓ denotes the size of the largest hypergraph partition obtained for a matrix representing a sparse k-MSSP problem, and assuming a hypergraph partitioning solver running in time O(h).
11.3 Contributions
In this chapter we show how to leverage hypergraph partitioning techniques, and a property of SSP
solvers that we call equiprobability, to efficiently solve SSP variants and in particular sparse instances.
Complexity estimates for our algorithms are given in Table 11.1. As an application of our techniques,
we provide a quantum algorithm for the vertex cover problem (which can be seen as a variant of the
SSP) that is faster than the best-known classical algorithm, under the assumption that we have access
to an equiprobable SSP solver.
We prove that at least one simple SSP solver is equiprobable, and conjecture that more efficient
quantum solvers also inherently have this property. We also conjecture that classical solvers that are
not equiprobable can be easily randomized so as to become equiprobable.
Chapter organization. We introduce generic notations to refer to the different variants of SSP.
Section 11.5 defines equiprobability, and we prove that the meet-in-the-middle algorithm with Grover’s
search is equiprobable. We also discuss a variant of LLL giving us a classical equiprobable solver.
We then show how this property allows for efficient reductions for the multi-dimensional SSP, and in
particular the sparse case for which hypergraph decomposition techniques result in measurable speed
up. We provide two applications of our techniques: solving the subset product problem, and designing
a quantum algorithm for the vertex cover problem. For the latter, to the best of our knowledge the
algorithm is new and its complexity is lower than its best known classical counterpart.
11.4 Preliminaries
11.4.1 Definitions of subset-sum problems (SSP)
The existing literature defines the subset-sum problem in both decisional and computational ways, which tends to create confusion about which version is used. To eliminate any chance of a mistake, we will define both versions and clearly refer to them. As mentioned in the introduction, we introduce a notation to capture many variants of the SSP at once:
Definition 82 (Computational MSSP). Let a_1, a_2, . . . , a_n, t be d-dimensional vectors of integers. The multi-dimensional subset-sum problem (MSSP) asks to find an index set I such that ∑_{i∈I} a_i = t.
• When the dimension d is set to 1, we simply have the subset-sum problem (SSP).
• The elements a_1, a_2, . . . , a_n, t are sampled uniformly from a finite domain U, e. g. U = [2^n].
Definition 83 (Computational k-MSSP). Let a_1, a_2, . . . , a_n, t be d-dimensional vectors of elements uniformly sampled from a domain U such that at most k elements are non-zero in each row of the matrix whose columns are a_1, . . . , a_n. The k-MSSP asks to find I such that ∑_{i∈I} a_i = t.
The analogue of the subset-sum problem when the group operator becomes multiplication instead
of addition is the subset product problem, defined as follows:
Definition 84 (Computational SPP). Let a_1, a_2, . . . , a_n be a set of integers and t be a target product. The subset-product problem asks to find I such that ∏_{i∈I} a_i = t.
For convenience, and later referral, we provide a more explicit, generalized Meta-definition for computational SSP variants and a binary operation ⋆, intending mainly ⋆ = + and ⋆ = · here.

Definition 85 (Computational (d, k, n, ⋆)-Subset Problem). Let a_1, a_2, . . . , a_n, t ∈ (U^d)^{n+1} ⊆ (N^d)^{n+1}, such that for each j ∈ [d], |{a_{j,i} ≠ 0 | i ∈ [n]}| ≤ k, where 0 denotes the neutral element for the operation ⋆. Find a promised solution I ⊆ [n] such that ⋆_{i∈I} a_i = t.
Here k is the sparsity, while n measures the size of the problem. The relationship with classical
problems is presented in Table 11.1.
Remark 86. Not all choices of operations make sense on vectors. For instance, when d > 1 and ⋆ = ·, we should consider u⃗ · v⃗ to be the element-wise multiplication of the vector entries: u⃗ · v⃗ := (u_1·v_1, . . . , u_n·v_n).
A decisional version of this problem can easily be derived. As mentioned in the introduction, in most cases of interest the existence of a solution is guaranteed (e.g., in cryptographic schemes) and only the computational problem needs to be solved. However, from a complexity standpoint, every decisional SSP solver can be converted into a solver for the computational version, with only a polynomial loss in the reduction. Informally, the strategy is similar to a binary search: given a search space Σ, one first checks whether the decisional solver finds a solution globally. If this is not the case, the reduction returns ∅. If solutions are found, we locate the subset of Σ that generates the solution by “chopping” parts off the search space: iteratively, we eliminate one of the n original elements while querying the decisional solver to check whether we are still left with a solution. The loop repeats until convergence (no further element can be removed without affecting the subset-sum), and the output consists of the remaining elements.
Remark 87. When ⋆ = +, it is always possible to turn a (d, k, n, +)-subset problem into an equivalent (d, n, n, +)-subset problem by adding an appropriate constant to all entries. Similarly, in this setting, one may assume that all entries are positive. Such a technique is not available in the case ⋆ = ·.
Definition 88 (Decisional (d, k, n, ⋆)-Subset Problem). Let a⃗_1, a⃗_2, . . . , a⃗_n, t ∈ (N^d)^{n+1}, such that for each j ∈ [d], |{i ∈ [n] : a⃗_{j,i} ≠ 0}| ≤ k, where 0 denotes the neutral element for ⋆. Decide whether there exists a subset I ⊆ [n] such that ⋆_{i∈I} a⃗_i = t.
Remark 89. The reduction from decisional to computational SSP is polynomial. We briefly sketch the algorithm. Let I be a set, and denote by I_0 and I_1 disjoint partitions of I. The following algorithm terminates in at most log_2 |I| steps and O(log_2 |I|) calls to a decisional solver D: If D(I) is false, then output ∅. Otherwise, initialize a queue Q = {I}. While Q ≠ ∅, pop I_s ∈ Q and split it into I_s = I_{s0} ∪ I_{s1}, e. g. by consecutively singling out elements. If D(I_s) holds, then: if D(I_{si}) holds, add I_{si} to Q, for i = 0, 1; if neither happens, then output I_s as solution. Repeat the partitioning step until a solution is found.
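The “chopping” strategy described above can be sketched in a one-pass greedy form (names are ours; `decide` stands for the hypothetical decisional solver D and is queried at most n + 1 times):

```python
def computational_from_decisional(items, target, decide):
    """Computational SSP solver built from a decisional oracle, as sketched
    above.  `decide(candidates)` returns True iff some subset of the index
    set `candidates` sums to `target` over `items`.  We greedily discard
    every index whose removal still leaves a solution; the surviving set
    then sums to the target itself (at most n + 1 oracle calls)."""
    support = set(range(len(items)))
    if not decide(support):
        return set()                 # no solution at all
    for i in sorted(support):
        trial = support - {i}
        if decide(trial):            # index i is not needed: drop it
            support = trial
    return support                   # minimal: sums exactly to target
```

Correctness of the single pass: if a strict sub-solution S remained inside the final set, any surviving index outside S would have been removable when it was considered, a contradiction.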
Definition 90 (Reduced (d, k, n, ⋆)-Subset Problem). A (d, k, n, ⋆)-Subset Problem instance (A = (a_1, . . . , a_n), t⃗) is reduced if the following conditions are met:
1. There is no (column) index i such that a_i = 0⃗.
2. There is no (row) index j such that a_{j,i} = 0 for all i.
Reduced instances are particularly nice to work with, and we may assume that any problem is provided to us in reduced form. We will in fact make use of an even simpler form; to that end we introduce the following notions.
Definition 91 (Block and bordered block diagonal form). Let A be a sparse matrix. We say that A is in block diagonal form (BDF), respectively bordered block diagonal form (BBDF), if it can be written as

A_BDF = diag(A_1, A_2, . . . , A_l),   respectively   A_BBDF =

⎛ A_1  0  · · ·  0  ⎞
⎜  0  A_2 · · ·  0  ⎟
⎜  ⋮    ⋮    ⋱   ⋮  ⎟
⎜  0   0  · · · A_l ⎟
⎝ R_1 R_2 · · · R_l ⎠ .
We refer to these as BDF, resp. BBDF, for short from now on.
If we interpret A as a graph with labeled edges, then the following definitions make sense:
Definition 92 (Strongly connected instance). An instance is said to be strongly connected when the graph associated to its defining matrix B is strongly connected.
The partition of a graph into strongly connected components can be determined exactly in linear time, using for instance Tarjan's algorithm. Without loss of generality, we may assume that we work on a strongly connected instance. Otherwise, we can safely decompose the instance into independent, smaller, strongly connected instances. Altogether, we have the following:
Lemma 93 (R-form). Without loss of generality, we can work on strongly connected, reduced instances.
Such instances are said to be in R-form.
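As a sketch of this decomposition step (function name is ours; for the symmetric incidence relation used here, plain connected components already suffice, while directed instances would call for Tarjan's strongly-connected-components algorithm):

```python
def split_into_components(columns):
    """Decompose a sparse multi-dimensional instance into independent blocks.

    `columns` is a list of n column vectors (tuples of length d).  Two
    columns interact iff they are non-zero in a common row; the connected
    components of this relation correspond to the blocks A_1, ..., A_l of
    the BDF form and can be solved independently.  BFS sketch, favouring
    clarity over the linear-time bound mentioned above."""
    n = len(columns)
    cols_of_row = {}                     # row index -> columns touching it
    for j, col in enumerate(columns):
        for r, entry in enumerate(col):
            if entry != 0:
                cols_of_row.setdefault(r, []).append(j)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        comp, stack = [], [start]
        seen.add(start)
        while stack:
            j = stack.pop()
            comp.append(j)
            for r, entry in enumerate(columns[j]):
                if entry != 0:
                    for j2 in cols_of_row[r]:
                        if j2 not in seen:
                            seen.add(j2)
                            stack.append(j2)
        components.append(sorted(comp))
    return components
```

Each returned component, together with the rows it touches, forms one independent sub-instance.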
Finally, some instances can be put in a nicer form by reordering their columns:
Definition 94 (Normal form). A (d, k, n, ⋆)-subset problem instance ((a_1, . . . , a_n), t⃗) = (A, t⃗) is in normal form if we can write A = (S | D), where S is sparse and of minimal width, D is non-sparse (dense) and of positive width, and the rows of S are ordered lexicographically.
Instances that cannot be put in normal form are called maximally sparse.
11.4.2 Basic components for solving SSP
The most straightforward and memory-friendly, but least time-efficient, approach to solving an SSP instance is to sequentially enumerate all 2^n subsets; this takes time O(2^n). As pointed out earlier, low-density instances are in fact easier and can be dealt with efficiently by lattice-reduction techniques. The point of this section is to introduce tools to tackle hard SSP instances, below the O(2^n) classical and O(2^{n/2}) quantum (by Grover [Gro96]) complexity upper bounds.
At this point, the fastest algorithm is due to Helm–May [HM18] in the quantum setting and to Becker–Coron–Joux [BCJ11] in the classical case; see Table 11.2.
Next, we will discuss a basic technique for solving the SSP and how exactly its classical and quantum versions differ.
11.4.2.1 Left-Right splits of the search space
Classical split.

Left-Right Split((a_1, . . . , a_n), t):
0. pick π ∈ Perm([n])
1. P_1 ← {π_1, π_2, . . . , π_{n/2}}
2. P_2 ← {π_{n/2+1}, π_{n/2+2}, . . . , π_n}
3. S_1 ← {(I_1, ∑_{i∈I_1} a_i) : I_1 ⊆ P_1}
4. T ← Store(S_1)
5. for each I_2 ⊆ P_2:
6.   if s_2 := t − ∑_{i∈I_2} a_i ∈ T:
7.     return I = I_1 ∪ I_2

Fig. 11.4: Left/Right Split.
The natural extension of the naive exhaustive-search algorithm is a “split” of the index set I = {1, 2, . . . , n} in twain, obtaining disjoint partitions P_1 = {1, 2, . . . , n/2} and P_2 = {n/2 + 1, . . . , n} of the same size [HS74]. For the entries indexed by P_1, the algorithm computes all the possible subset sums over the power-set of P_1 and stores them in a table T of size O(2^{n/2}). More efficient data structures may be used, such as variations of Bloom filters. In a second phase, the algorithm in Section 11.4.2.1 computes all the subset sums indexed by P_2 and checks whether (t − F_a(I_2)) ∈ T, the table of all possible sums from the left part.
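A minimal Python sketch of the Horowitz–Sahni [HS74] split (names are ours; a plain dictionary plays the role of the table T):

```python
from itertools import combinations

def left_right_split(a, t):
    """Meet-in-the-middle split as in Fig. 11.4: O(2^{n/2}) time and memory
    instead of O(2^n).  Returns one solution index set, or None."""
    n = len(a)
    P1, P2 = list(range(n // 2)), list(range(n // 2, n))
    # Phase 1: table T of all 2^{n/2} left subset sums.
    T = {}
    for r in range(len(P1) + 1):
        for I1 in combinations(P1, r):
            T.setdefault(sum(a[i] for i in I1), I1)
    # Phase 2: for each right subset, look up the complement t - sum in T.
    for r in range(len(P2) + 1):
        for I2 in combinations(P2, r):
            s2 = t - sum(a[i] for i in I2)
            if s2 in T:
                return set(T[s2]) | set(I2)
    return None
```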
Algorithm | Reference | log_2 T(n)/n | Q/C | Split
Exhaustive Search | – | 1 | C | 1
Grover | [Gro96] | 0.5 | Q | 1
Left-Right split | [HS74] | 0.5 | C | 1/2
Left-Right split + Moduli | [SS81] | 0.5 | C | 1/4
Grover + Moduli | – | 0.375 | Q | 1/4
Howgrave-Graham–Joux | [HGJ10] | 0.337 | C | 1/16
Quantum Left-Right split | Theorem 98 | 0.333 | Q | 1/2
Quantum Walk | – | 0.333 | Q | 1/2
Quantum Walk + Moduli | – | 0.3 | Q | 1/4
Becker–Coron–Joux | [BCJ11] | 0.291 | C | 1/16
Bernstein–Jeffery–Lange–Meurer | [BJLM13] | 0.241 | Q | 1/16
Helm–May (BCJ + Quantum Walk) | [HM18] | 0.226 | Q | 1/16

Tab. 11.2: Comparison of the run-time dominating exponents of exponential-time algorithms for solving the SSP. Q stands for a quantum algorithm, C for a purely classical one. Domain splits only add a polynomial overhead that we omit in this table.
Quantum splits.
The above algorithm has a quantum counterpart [CRF+09] with time and memory complexity O(2^{n/3}). The speedup is offered by Grover's search. One proceeds as in the classical case, but takes two unbalanced splits P_1 = {1, 2, . . . , n/3} and P_2 = {n/3 + 1, . . . , n}. Storing all the possible subset sums indexed by P_1 requires O(2^{n/3}) storage space. Checking whether the subset sums obtained from P_2 occur in the stored table can be done in time O(2^{n/3}).
11.4.3 Number of solutions for hard SSP instances
Average-case SSP instances.
Throughout this work, we only take into consideration hard instances of the SSP in an average-case setting. Such instances can be obtained as follows: let d be the required density of the problem (say d = 1). The target is set to t ≈ 2^{n/d}, while (a_1, . . . , a_n) ←$ {0, 1, . . . , 2^{n/d} − 1}^n. Impagliazzo and Naor [IN96] showed that the hardest random instances are obtained when d = 1. Let a_i be an n-bit number. We define the mapping:

F_a : {0, 1}^n −→ N, x ↦ ∑_i a_i x_i.

Strictly speaking, the codomain is not N but rather {0, 1, . . . , N} for some N < n·2^n.
Upon generating t by randomly drawing x ∈_R {0, 1}^n and computing t := F_a(x), one has at least one solution. The probability of more is low; more precisely: Pr[t = F_a(x) : x ∈_R {0, 1}^n] < 2^{−n}.
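Sampling such an average-case instance with a planted solution might look as follows (function name and interface are ours, purely for illustration):

```python
import random

def random_instance(n, density=1.0, planted=True):
    """Sample an average-case SSP instance as described above: each a_i is
    uniform in [0, 2^{n/density}), and (optionally) the target is planted
    as t = F_a(x) for a random x in {0,1}^n, so at least one solution
    exists.  Returns (a, t, x), with x = None for unplanted instances."""
    bound = 2 ** round(n / density)
    a = [random.randrange(bound) for _ in range(n)]
    if planted:
        x = [random.randrange(2) for _ in range(n)]
        t = sum(ai * xi for ai, xi in zip(a, x))
        return a, t, x
    return a, random.randrange(n * bound), None
```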
Time complexity of the regarded algorithms, in particular the one depicted in Figure 11.8, depends on the number r of solutions to the SSP. A back-of-the-envelope calculation uses indicator variables that represent, dependent on the instance (a, t), whether the sum over a given set I evaluates to t,

X_{(a,t)}(I) = 1, if t = ∑_{i∈I} a_i,   and   X_{(a,t)}(I) = 0, otherwise,

and the number of solutions Y_{(a,t)} := ∑_{I⊆[n]} X_{(a,t)}(I): as the probability that an I with |I| = n/2 leads to exactly the one target t < 2^n is 1/2^n and there are 2^n sets I ⊆ [n], the expected number of solutions is a small constant. We analyze this more carefully and state the following:
Theorem 95. Let (a, t) be a (1, n, n, w, +)-subset problem instance with a_1, a_2, . . . , a_n sampled uniformly at random from the set {0, 1, . . . , N} = [N], with N := 2^n. The probability p_{(a,t)} that there exist multiple solutions for this (1, n, n, w, +)-subset problem instance is upper-bounded by

p_{(a,t)} ≤ (1/√(2π)) · (n/σ),   where σ² = ((N + 1)² − 1)/12.
Proof. Let X be a random variable taking its values uniformly in [N]. Let x_1, . . . , x_n be n realizations of X. Let Y_k = ∑_{i=1}^k x_i, for all k ≤ n. The distribution of Y_k (according to [CR07]) is given by

Pr[Y_k = t] = (1/(N + 1)^k) · C(k, t)_{N+1},    (11.1)

where C(k, y)_{N+1} stands for the y-th multinomial coefficient occurring in the expansion of the polynomial (1 + x + . . . + x^N)^k; such coefficients can be developed by the recursive formula

C(k, t)_{N+1} = ∑_{i=0}^{⌊t(N−1)/N⌋} binom(k, t − i) · C(t − i, i)_N.    (11.2)

As a consequence of Equation (11.1), the probability that there exists k such that Y_k = t is

p_{(a,t)} = Pr[∃k ≤ n : Y_k = t] = ∑_{k=1}^n Pr[Y_k = t] = ∑_{k=1}^n (1/(N + 1)^k) · C(k, t)_{N+1}.
Hence p_{(a,t)} is the probability that the given subset-sum problem instance has a solution x_1 = a_{I_1}, x_2 = a_{I_2}, . . . , x_w = a_{I_w}, when the x_i are independent and identically distributed in [N]. Assuming independence, which is reasonable when k ≪ N, the probability of having two solutions is p_{(a,t)}², that of three solutions p_{(a,t)}³, and so forth. When independence is not guaranteed, we do not have Pr[Y_k = t ∧ Y′_k = t] = Pr[Y_k = t] · Pr[Y′_k = t] in general, but Pr[Y_k = t] · Pr[Y′_k = t] serves as an upper bound of the joint probability. Hence, the expected number of solutions to randomly generated (1, n, n, w, +)-instances is

r := E[Y_{(a,t)}] = E[∑_{I⊆[n]} X_{(a,t)}(I)] = E[|{I : ∑_{i∈I} a_i = t, (a_1, . . . , a_n, t) ← [N]^{n+1}}|]    (11.3)
  = ∑_{j=1}^∞ j · p_{(a,t)}^j = p_{(a,t)}/(1 − p_{(a,t)})².    (11.4)

We use [Ege14], which estimates the probability Pr[Y_k = t] as follows:

Pr[Y_k = t] = (1/(kσ√(2π))) · e^{−(t − k·μ)²/(2σ²k)},   where μ = N/2 and σ² = ((N + 1)² − 1)/12.
In this way, we can approximate p_{(a,t)} as follows:

p_{(a,t)} = ∑_{k=1}^n Pr[Y_k = t] = ∑_{k=1}^n (1/(kσ√(2π))) · e^{−(t − k·μ)²/(2σ²k)}
          ≤ ∑_{k=1}^n 1/(kσ√(2π))    (as e^x ≤ 1 for x ≤ 0)
          ≤ (1/√(2π)) · (n/σ).    (as 1/(kσ√(2π)) ≤ 1/(σ√(2π)))

In particular, evaluating the last bound shows that r = 0 if p_{(a,t)} = 0, and

r < 2, if p_{(a,t)} < 1/2,    r < 1, if p_{(a,t)} < (3 − √5)/2 ≈ 0.3819.
Since n/σ → 0, the probability that there exists at least one solution vanishes for large, random instances. Therefore, we conclude that unless a solution has been “built into” the instance from the beginning, it becomes increasingly unlikely that a solution exists.
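A small experiment supports this count: for planted density-1 instances, the empirical mean number of solutions stays close to 1 (brute-force counting, so small n only; function names are ours):

```python
import random
from itertools import combinations

def count_solutions(a, t):
    """Number of index sets I with sum_{i in I} a_i = t (O(2^n), small n)."""
    n = len(a)
    return sum(1
               for r in range(n + 1)
               for I in combinations(range(n), r)
               if sum(a[i] for i in I) == t)

def average_solution_count(n, trials=200, seed=0):
    """Empirical mean of the number of solutions of planted density-1
    instances; by the theorem above it should stay a small constant
    (about 1 plus a vanishing term) as n grows."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(2 ** n) for _ in range(n)]
        x = [rng.randrange(2) for _ in range(n)]
        total += count_solutions(a, sum(ai * xi for ai, xi in zip(a, x)))
    return total / trials
```

Every planted instance contributes at least the built-in solution, so the mean is at least 1; extra collisions are rare, matching the bound above.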
Remark 96. This completes the argument for Theorem 106, as we can compute r = p_{(a,t)}/(1 − p_{(a,t)})², and this number turns out to be a small constant. Note that when n is large and given that σ = √(((N + 1)² − 1)/12) with N = 2^n, we have lim_{n→∞} n/σ = 0, which essentially drives p_{(a,t)} to 0.
11.5 Solution Equiprobability and Equiprobable SSP solvers
We can now turn our attention to the essential property used throughout this part: equiprobability.
If we consider an SSP instance that has several solutions, what would a given SSP solver return?
Naturally, a deterministic algorithm (such as the classical meet-in-the-middle) can only be expected to
find the same solution every time it is called; but randomized solvers may exhibit a preference for some
solutions.
As we show, at least in one simple case, there are solvers for which such a preference does not
happen. All solutions are found, with equal probability, and we will leverage this property to build
upon in the coming sections. We introduce the following definition:
Definition 97 (Solution equiprobability). A (d, k, n, ⋆)-subset problem solver A is called (solution) equiprobable if for all pairs (s_i, s_j) of solutions of a (d, k, n, ⋆)-subset problem instance x, we have that

|Pr[s_i ←$ A(x, 1^n)] − Pr[s_j ←$ A(x, 1^n)]| ≤ negl(n),

where negl(n) denotes a function that decreases faster than the inverse of any polynomial in n.
This captures that an algorithm A′, given an appropriately generated hard instance of length n, fulfills Pr[s_i ← A′(a_1, . . . , a_n, t)] ≈ Pr[s_j ← A′(a_1, . . . , a_n, t)] for any i ≠ j.
In the rest of this chapter we discuss solvers that rely on classical resp. quantum algorithms and have the property of being equiprobable.
11.5.1 Equiprobable quantum SSP solvers
We have the following result:
Theorem 98. The presented Grover-type algorithm A (combining Figure 11.5 with Figure 11.6) is a solution equiprobable (1, n, n, w, +)-subset problem solver, with w = n/2.
GroverSearch(f):
1. Initialize a qubit register with |φ_0⟩ = |0^n⟩
2. Apply H^{⊗n} to obtain a uniform superposition over all states
3. Repeat the Grover iteration step O(√(2^n)) times:
   3.1. Phase reflection: U_f
   3.2. Reflection about the mean: −I + 2A
4. Return measurement x = |φ⟩ with f(x) = 1.

Fig. 11.5: Grover's search algorithm on a list with N = 2^n elements (on a high level).
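The two reflections can be simulated classically on the amplitude vector; note how all marked positions always carry identical amplitudes, which is exactly the symmetry exploited in the equiprobability argument below (a sketch with our own names and a textbook iteration count, not from the text):

```python
import math

def grover_probabilities(n, marked, iterations=None):
    """Classical simulation of the two Grover reflections on a length-2^n
    real amplitude vector; returns the final measurement distribution.
    All marked entries evolve identically, so they end with equal
    probability of being measured."""
    N = 2 ** n
    amp = [1.0 / math.sqrt(N)] * N          # uniform superposition
    if iterations is None:                  # ~ (pi/4) * sqrt(N / #marked)
        iterations = int(round(math.pi / 4 * math.sqrt(N / len(marked))))
    for _ in range(iterations):
        for i in marked:                    # phase reflection U_f
            amp[i] = -amp[i]
        mean = sum(amp) / N                 # reflection about the mean
        amp = [2 * mean - v for v in amp]
    return [v * v for v in amp]
```

Running it with two marked elements shows both ending with the same, dominant probability.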
Fig. 11.6: Superposition of 2n/3 qubits simulates L_2 when searching for collisions, collected in L_out; here |L_1| = 2^{n/3}, |L_2| = 2^{2n/3}, |L_out| = const, |I_1 ∪ I_2| = n/2, with ∑_{i∈I_1} a_i = t′ and t − t′ = ∑_{i∈I_2} a_i.
Proof. Let (a_1, . . . , a_n, t) be an SSP instance with multiple solutions s_1, . . . , s_r, and suppose there exist i ≠ j with:

Pr[s_i ← A(a_1, . . . , a_n, t)] ≠ Pr[s_j ← A(a_1, . . . , a_n, t)].

First, the “search” function f is defined such that pairs ℓ_1 = (t_1 = F(I_1), I_1) ∈ L_1 := L_1(t) are encoded, and it takes input pairs ℓ_2 = (t_2 = t − F(I_2), I_2) ∈ L_2, where F denotes the evaluation of the addition over the respective subsets of indices I_1, I_2 ⊆ {1, 2, . . . , n}, and L_1 and L_2 are lists. It outputs 1 if t is attainable:

f((t_2, I_2)) := f_{L_1(t)}(ℓ_2) = 1, if t = t_1 + t_2 for any pair (t_1, I_1) ∈ L_1(t); and 0, otherwise.

The quantum operator U_f encoding f : L_2 → {0, 1} is obtained via¹ |ℓ_2, y⟩ −U_f→ |ℓ_2, f(ℓ_2) ⊕ y⟩.
The first part of the meet-in-the-middle algorithm with quantum search computes a list L_1 of size 2^{n/3}, storing all the subset-sums generated using subsets of {1, 2, . . . , n/3}. L_2 represents a list of size 2^{2n/3}, putting in superposition all the subset-sums generated using subsets of {n/3 + 1, n/3 + 2, . . . , n}. For simplicity, let us assume that n/3 ∈ N. L_1 is (classically) encoded and accessed by the unitary function U_f used in Grover's search algorithm [Gro96]. The constructed superposition of 2n/3 qubits allows one to omit storage of the hypothetical list L_2, yet search for collisions more efficiently than classical algorithms would permit. If the first part does not give an output (thus s_i ← ∅ ∧ s_j ← ∅), the probabilities are the
¹We remark that U_f is still a unitary matrix even when multiple roots of f exist.
same:

Pr[(s_i ≠ ∅) ← A(a_1, . . . , a_n, t)] = Pr[(s_j ≠ ∅) ← A(a_1, . . . , a_n, t)] = 0.
The second part consists of applying Grover's algorithm, as generally outlined in Figure 11.5. A Grover iteration itself consists of two steps, applied to a large initial state:

1. Phase reflection: Let

|φ_0⟩ = (1/√(2^{2n/3})) (1_{n/3+1}, 1_{n/3+2}, . . . , 1_n) = (1/2^{n/3}) (1_{n/3+1}, 1_{n/3+2}, . . . , 1_n)

be the initial state, where the subscripts indicate the position inside the state's vector representation. Let (i, j) be two indices such that f(x_i) = f(x_j) = 1 for i ≠ j. Phase reflection will negate the values in positions i and j in |φ_0⟩, thus obtaining

|φ_1⟩ = (1/2^{n/3}) (1_{n/3+1}, 1_{n/3+2}, . . . , −1_i, . . . , −1_j, . . . , 1_n),

or, in the general case, if the starting state is

|φ_0⟩ = (1/2^{n/3}) (v_{n/3+1}, v_{n/3+2}, . . . , v_n),

we end up with

|φ_1⟩ = (1/2^{n/3}) (v_{n/3+1}, v_{n/3+2}, . . . , −v_i, . . . , −v_j, . . . , v_n).
2. Reflection about the mean rotates the state towards the target state: This step computes the average a of the elements in |φ_1⟩ and applies the transform v → −v + 2a. Through induction, it is possible to show that the values of both elements in positions i and j will be the same in state |φ_2⟩, since the starting state is the same. Thus v_i = v_j, which means that, if the input state is

|φ_1⟩ = (1/2^{n/3}) (v_{n/3+1}, v_{n/3+2}, . . . , −v_i, . . . , −v_j, . . . , v_n),

then after the reflection about the mean we arrive at the state

|φ_2⟩ = (1/2^{n/3}) (−v_{n/3+1} + 2a, −v_{n/3+2} + 2a, . . . , v_i + 2a, . . . , v_j + 2a, . . . , −v_n + 2a).
Again, through induction, one can easily see that after repeating the previous steps √(2^{2n/3}) = O(2^{n/3}) times, the state |φ_2⟩ has the same values for the marked elements in positions i and j, since the original state has the values v_i and v_j set to 1, up to the normalization factor. Therefore, the probability of measuring the value corresponding to position i is the same as that of measuring the value corresponding to position j, identifying the algorithm as a solution equiprobable solver.
At this point, conjecturally, it seems that the very nature of quantum SSP algorithms makes them inherently equiprobable; at the very least we do not have an easy counterexample, which supports the view that equiprobability is a rather natural property of such algorithms. Therefore, we conjecture that further SSP solvers are equiprobable:
11.5.1.1 Equiprobability of the HGJ solver
Theorem 99. The Howgrave-Graham–Joux algorithm [HGJ10] is an equiprobable SSP solver.
Proof. Let (a_1, . . . , a_n, t) be a (1, n, n, w, +)-subset problem instance with multiple solution index-sets denoted J_1, . . . , J_r, and suppose that for some i ≠ j: Pr[J_i ← A(a_1, . . . , a_n, t)] ≠ Pr[J_j ← A(a_1, . . . , a_n, t)] holds.
The HGJ-algorithm computes sets S_ijk, for each (i, j, k) ∈ {1, 2} × {1, 2} × {1, 2}, as a random collection of exactly r weight-n/16 subsets I_ijk, where r ≤ binom(n/2, n/16), and where M, M_1, J_i, J_ij are random with M ≠ M_1. The computation results in the following lists:
1. For each i, j: L_ij1 = {(∑(I_ij1) mod M_1, I_ij1) : I_ij1 ∈ S_ij1};
2. For each i, j: L_ij2 = {(J_ij − ∑(I_ij2) mod M_1, I_ij2) : I_ij2 ∈ S_ij2};
3. For each i, j: S_ij consists of I_ij1 ∪ I_ij2 for all pairs (I_ij1, I_ij2) ∈ S_ij1 × S_ij2 such that ∑(I_ij1) ≡ J_ij − ∑(I_ij2) mod M_1;
4. For each i: L_i1 = {(∑(I_i1) mod M, I_i1) : I_i1 ∈ S_i1};
5. For each i: L_i2 = {(J_i − ∑(I_i2) mod M, I_i2) : I_i2 ∈ S_i2};
6. For each i: S_i consists of I_i1 ∪ I_i2 for all pairs (I_i1, I_i2) ∈ S_i1 × S_i2 such that ∑(I_i1) ≡ J_i − ∑(I_i2) mod M and I_i1 ∩ I_i2 = ∅;
7. L_1 = {(∑(I_1), I_1) : I_1 ∈ S_1};
8. L_2 = {(t − ∑(I_2), I_2) : I_2 ∈ S_2};
9. S consists of I_1 ∪ I_2 for all pairs (I_1, I_2) ∈ S_1 × S_2 such that ∑(I_1) = t − ∑(I_2) and I_1 ∩ I_2 = ∅.
We need to show that the algorithm assigns equal probability to all solutions. Since M_1 is chosen uniformly at random, every solution is found with the same probability. In [HGJ10], the authors assume that σ_M values correspond to the many solutions, well-distributed modulo M; since we choose the moduli uniformly at random, I_1 ∪ I_2 is uniformly distributed, and we obtain Pr[J_i ← A(a_1, . . . , a_n, t)] = Pr[J_j ← A(a_1, . . . , a_n, t)].
11.5.1.2 Equiprobability of the Helm-May solver
Theorem 100. The Helm–May algorithm [HM18] is an equiprobable SSP solver.
Proof. In [HM18], Helm and May tackle the problem with quantum walks combined with the algorithmic techniques presented in [BCJ11].
Let (a_1, . . . , a_n, t) be an SSP instance with multiple solution index sets denoted J_1, . . . , J_r. Equiprobability follows from an initial permutation, which provides equiprobability in each node of the Johnson graph (connecting nodes that differ by one element, see e. g. the example in Figure 11.7), on which BCJ is performed.
Each node in the product of Johnson graphs is selected uniformly at random over all nodes, so for any i ≠ j:

Pr[J_i ← A(a_1, . . . , a_n, t)] = Pr[J_j ← A(a_1, . . . , a_n, t)].

As presented in the proof of Theorem 98, the probability of measuring a state that represents a marked vertex, i. e. a solution J_i or J_j, is equal for the algorithm, identifying it as an equiprobable solver.
11.5.2 Equiprobable classical solvers
The case of classical algorithms is not as clear cut. Of course, classical SSP solvers that are deterministic
cannot be equiprobable.
We might, however, turn a deterministic solver into a randomized one, e. g. by initially shuffling the input at random, which does not change the solutions if reverted as a last step, among other techniques.
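Such a wrapper might look as follows (a sketch with our own names; `solver` stands for any deterministic SSP solver taking a list and a target and returning one index set or None):

```python
import random

def randomized(solver, a, t, rng=random):
    """Wrap a deterministic SSP solver: shuffle the input, solve, then map
    the answer back through the inverse permutation.  The solution set of
    the instance is unchanged, but which of several solutions is returned
    now depends on the random permutation."""
    n = len(a)
    perm = list(range(n))
    rng.shuffle(perm)                    # position j holds old index perm[j]
    shuffled = [a[perm[j]] for j in range(n)]
    sol = solver(shuffled, t)
    if sol is None:
        return None
    return {perm[j] for j in sol}        # revert the relabelling
```

Whether the resulting distribution over solutions is actually (near-)uniform depends on the wrapped solver; the shuffle only removes its dependence on the input ordering.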
Fig. 11.7: Example of a Johnson graph, here J(5, 2) with vertices (1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (2,5), (3,4), (3,5), (4,5), as used in Theorem 100.
Definition 101 (Randomized LLL Subset-Sum Solver). Consider a (1, n, n, +)-subset problem instance given by A = (a_1, a_2, . . . , a_n), having density D < 0.9408. . . . The randomized Lenstra–Lenstra–Lovász [LLL82] solver performs an initial permutation of the indices of A, which affects the instantiation of the basis matrix for LLL.
In this section, we require a refined definition, and identify the solution set I ⊆ [n] with a binary vector x of length n:
Definition 102 ((d, k, n, w, +)-subset problem). The notation of a (d, k, n, w, +)-subset problem instance extends Item 1 of the conditions and specifies the weight wt(x) = |I| = ∑_i x_i of a solution vector x.
Definition 103 (Randomized LLL SSP Solver). Let (a, t) denote a hard (1, n, n, w, +)-subset problem instance, where w := n/2 holds for a binary solution vector x, having low density D < 0.9408. . . , and let LLL, for short, denote the well-known Lenstra–Lenstra–Lovász [LLL82] algorithm. A randomized LLL solver performs an initial permutation of the indices of a = (a_1, a_2, . . . , a_n), which directly affects the basis matrix B.
Theorem 104. The randomized LLL variant is an equiprobable (1, k, n, w,+)-subset problem solver.
To prove this statement, we follow a line of work by Schnorr et al. [SE94a] and take the following approach to prove equiprobability of finding solutions with the randomized LLL solver defined in Definition 101.
Equivalence of solving the SSP and the associated SVP instance
Starting with the given (1, k, n, w, +)-subset problem instance, a lattice L, resulting from the basis defined in (11.5), is crafted, and we solve the instance by LLL- (or, more generally, β-BKZ-) reducing the matrix B ∈ Z^{(n+1)×(n+2)} in order to extract a short vector.
Starting with a given (d, k, n, w, +)-subset problem instance, we remark that finding the shortest non-zero vector in L is equivalent to solving SVP in the lattice defined as the intersection of lattices each embedding one instance, L_1 ∩ L_2 ∩ · · · ∩ L_k [PZ16].
We define a tweak τ ≥ √n for B, a uniformly generated instance of weight wt(x) = ∑_i x_i = n/2 =: w, which forms an embedding into the lattice basis of L := L(B) by setting
B := B_τ(a, t, w) = (b_0, b_1, . . . , b_n)^T =

⎛ 1  1  · · ·  1   τw   τt  ⎞
⎜                  τ   τa_1 ⎟
⎜       I_n        ⋮    ⋮   ⎟    (11.5)
⎝                  τ   τa_n ⎠ .
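The basis (11.5) is easy to instantiate and check numerically (a sketch with our own names; τ defaults to ⌊√n⌋ + 1 ≥ √n): the coefficient vector (−1, x) of a weight-w solution x yields a lattice vector with small first entries and two trailing zeros.

```python
import math

def embedding_basis(a, t, w, tau=None):
    """Row basis B_tau(a, t, w) from (11.5): an (n+1) x (n+2) integer
    matrix.  Row b_0 = (1, ..., 1, tau*w, tau*t); row b_i = (e_i, tau,
    tau*a_i) for i = 1, ..., n."""
    n = len(a)
    tau = tau or math.isqrt(n) + 1          # tweak tau >= sqrt(n)
    rows = [[1] * n + [tau * w, tau * t]]   # b_0
    for i in range(n):
        rows.append([int(i == j) for j in range(n)] + [tau, tau * a[i]])
    return rows

def short_vector_from_solution(B, x):
    """Multiply the coefficient vector (-1, x_1, ..., x_n) by the basis B."""
    coeff = [-1] + list(x)
    cols = len(B[0])
    return [sum(c * row[j] for c, row in zip(coeff, B)) for j in range(cols)]
```

For a valid weight-w solution, the last two coordinates of the resulting vector vanish, exactly as conditions 1 and 4 of the theorem below demand.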
Theorem 105 (Casting low-density instances as SVP). Finding a short vector b ∈ Z^{n+2} in the lattice L := L(B) solves any given low-density (d, k, n, w, +)-subset problem instance if the following hold:
1. b ∈ L,
2. ‖b‖ is small, i. e. the vector is short,
3. |b_1| = |b_2| = · · · = |b_n| = 1,
4. b_{n+1} = b_{n+2} = 0.
Proof. Suppose a vector b = (b_1, b_2, . . . , b_n, b_{n+1}, b_{n+2}) ∈ Z^{n+2}, returned by the randomized LLL solver from Definition 101, satisfies the four conditions.
Then it immediately gives a valid solution x to the associated SSP, by simply computing x := (x_1, x_2, . . . , x_n) as x_i := g^{−1}(b_i) = (b_i + 1)/2 ∈ {0, 1} for 1 ≤ i ≤ n, which is guaranteed by condition 3. Here, g is a simple affine transform over the reals, g(x) = 2x − 1, with inverse g^{−1}(x) = (x + 1)/2. First of all, ∑_i x_i = w ⇔ b_{n+1} = 0 holds for the penultimate column of B (cf. conditions 1, 4), and secondly the last column enforces ∑_i a_i x_i = t, as this is equivalent to b_{n+2} = 0 (cf. conditions 1, 4). Condition 2 and the lower bound on the tweak τ make the LLL algorithm output a reasonably-short non-zero vector in polynomial time.
On the other hand, suppose a vector x ∈ {0, 1}^n ⊆ Z^n is a solution of (a, t, w), the given (d, n, n, w, +)-subset problem instance. In particular, x satisfies ∑_i a_i x_i = t because of the last column, and wt(x) = ∑_i x_i = w as column n + 1 ensures, hence satisfying condition 4. It also corresponds to a lattice vector, easily seen via the specific coefficient vector c,

(c_0, c_1, c_2, . . . , c_n) · B := (−1, x_1, x_2, . . . , x_n) · B = (b_1, b_2, . . . , b_{n+2}) = b,

satisfying condition 1. Furthermore, the first n positions are b_i ∈ {−1, 0}, thus condition 3 is fulfilled, and finally condition 2 holds as b is indeed short.
This shows the equivalence of efficiently solving the SSP and finding the shortest vector in the constructed lattice L, which is guaranteed to happen in polynomial time as D < 0.9408. . . and by construction.
Details on randomizing the solver
The reduction above works with the classical LLL algorithm; to obtain an equiprobable solver based on randomized LLL, three small modifications randomize the algorithm:
1. Perform and remember an initial permutation P of the indices of a = (a_1, a_2, . . . , a_n), and initialize the basis matrix B_P := P · B_τ(a, t, w) to embed the SSP instance in a lattice. If there exists more than one solution x, permuting the basis matrix in an initial step ensures none of them is favored by the β-BKZ-algorithm internals, keeping in mind that LLL is β-BKZ with β = 2.
2. Fully β-BKZ-reduce B_P (cf. Equation (11.5)) using the presented Definition 101.
3. The first row vector is a desired solution to the (1, n, n, w, +)-subset problem instance.
During the call, the SVP oracle subroutine finds the shortest vector among the respective β rows of the basis by enumeration of all possibilities (i. e. by exhaustively searching for the shortest vector in a low-dimensional sub-lattice), selects the one with smallest norm, and inserts it into the updated matrix. This is done for consecutive blocks of size β. When random subsets of size β of the remaining index list are taken, the work-load, hence the run-time, does not increase. The algorithm merely projects onto sub-spaces as it proceeds, identifying short vectors in the lattice. A short vector is found without an algorithmically biased, preferred choice if this approach is combined with randomly choosing the sign, so that one of the two equally-short vectors, ‖b_1‖ = ‖−b_1‖, is chosen in the end.
For a given low-density instance with density as high as D < 0.9408. . . , the presented solver from Theorem 104 is a polynomial-time algorithm in the number of elements n.
11.6 Multidimensional subset-sum problem (MSSP)
We are now equipped to discuss the case of the multidimensional SSP. The general case is tackled by reducing it to the one-dimensional SSP, for which it suffices to show that increasing the dimension only causes a polynomial factor to appear in the complexity analysis.
When given access to an equiprobable SSP solver, the reduction is particularly simple to describe.
Theorem 106 (MSSP ≤_Cook SSP). Let MSSP denote an instance of the (d, n, n, +)-subset problem with a promised solution I. Let O_SSP be a solution equiprobable (1, n, n, +)-subset problem solver. Assuming at least one solution exists, there is a ppt algorithm A that solves MSSP by querying O_SSP a polynomial number of times, qualifying A as a ppt MSSP solver.
Proof. The main observation is that any solution I for MSSP is a solution to the (1, n, n, +)-subset problem SSP_ℓ defined by b_ℓ^T and t_ℓ. Thus, if SSP_ℓ has no solution for some ℓ, then we are certain that MSSP has no solution. However, SSP_ℓ may have several solutions, even if MSSP has only one. These observations lead to the algorithm detailed in Figure 11.8.
It remains to show that A terminates in polynomial time and makes polynomially-many requests to O_SSP. First observe that if O_SSP(b_ℓ^T, t_ℓ) fails on any ℓ (no solutions), then A terminates by returning ∅ in O(1) steps in the worst case.
We turn to the case where the solution returned by O_SSP(b_ℓ^T, t_ℓ) is valid, but does not satisfy the MSSP constraint, i.e. the loop fails at some row index ℓ. Let I_ℓ = (I_1, I_2, . . . , I_r) be the set of solutions to SSP_ℓ. Since O_SSP is solution equiprobable, it picks one solution I_k from I_ℓ uniformly at random. Thus, in the worst case, we make O(r) calls to the oracle and perform O(r) operations. We prove in Theorem 95 that r is small if the distribution of elements is uniform.
The general case can be specialized in two ways. The first is to consider low-density instances; the other is to consider sparse instances. The low-density case has already been discussed and can be dealt with using the algorithm of Pan and Zhang [PZ16], with a concrete lattice-reduction algorithm such as LLL as an SVP oracle. However, sparse instances have not been previously identified as a potentially interesting family; as we show, the situation k < n lends itself to several improvements that allow for much faster algorithms.
A first step will be to turn the problem instance into a reduced BBDF form, with blocks A_1, . . . , A_s (which may have different dimensions). Then, for each block (A_i, R_i), we can tackle the k-MSSP independently.
Algorithm A((b_1, b_2, . . . , b_d), t, O_SSP):
1. ℓ ←$ {1, . . . , d}
2. I ←$ O_SSP(b_ℓ^T, t_ℓ)
3. if I = ∅ then return ∅
4. for each row k ∈ {1, 2, . . . , d}:
5.   t′ ← ∑_{j∈I} b_{k,j}
6.   if t′ ≠ t_k then goto 2, at most O(r) times
7. return I

Fig. 11.8: A (d, n, n, +)-subset problem solver.
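A direct transcription of Figure 11.8 in Python (names and the brute-force oracle are ours; the oracle returns a uniformly random solution, making it solution equiprobable by construction):

```python
import random
from itertools import combinations

def mssp_solver(rows, t, ssp_oracle, max_retries=64, rng=random):
    """Algorithm A from Fig. 11.8: pick a random row ell, ask the
    (equiprobable) one-dimensional oracle for a solution, and accept it
    only if it satisfies every other row.  `ssp_oracle(a, t)` returns one
    index set or None; `max_retries` caps the O(r) repetitions."""
    d = len(rows)
    for _ in range(max_retries):
        ell = rng.randrange(d)
        I = ssp_oracle(rows[ell], t[ell])
        if I is None:
            return None                  # row ell unsolvable => MSSP unsolvable
        if all(sum(rows[k][j] for j in I) == t[k] for k in range(d)):
            return I
    return None

def equiprobable_oracle(a, t, rng=random):
    """Brute-force SSP oracle returning a uniformly random solution."""
    sols = [set(I)
            for r in range(len(a) + 1)
            for I in combinations(range(len(a)), r)
            if sum(a[i] for i in I) == t]
    return rng.choice(sols) if sols else None
```

With r small on average (Theorem 95), a small retry cap already succeeds with overwhelming probability.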
11.6.1 Reducing SSP instances
Consider a d-dimensional SSP instance with defining matrix A = (a_1, . . . , a_n). In this section we recall that transforming A into bordered block form is a special case of the hypergraph partitioning problem:
Definition 107 (Hypergraph Partitioning). Given a hypergraph H = (V,N ), where V is the set of
vertices and N is the set of hyper-edges (or nets) and an overall load imbalance tolerance c such that
c ≥ 1.0, the goal is to partition the set V into ` disjoint subsets, V1, . . . ,V` such that the number of
vertices in each set Vi is bounded by |V|/(c · `) ≤ |Vi| ≤ c · |V|/`, and a function defined over the
hyper-edges is optimized.
The hypergraph partitioning problem, while being hard in general (it is naturally NP-complete), can be efficiently approximated on sparse matrices when the number ℓ of partitions is small. Furthermore, modern algorithms can leverage several processors to speed up the computation: a parallel multilevel partitioning algorithm presented in [TK06] has an asymptotic parallel run-time of O(n/p) + O(p · ℓ² · log n), where p is the number of processors.
In the following, we show the conversion into the bordered block diagonal form and will assume,
without loss of generality, that the matrix A defining the (d, k, n,+)-subset problem instance is given
in reduced BBDF.
11.6.2 Conversion into bordered block diagonal form (BBDF)
We detail how to use hypergraph partitioning techniques to turn the defining matrix into a bordered
block diagonal form (BBDF). Note that each block in the matrix can be dealt with independently. If a
(d, k, n,+)-subset problem instance cannot be further decomposed into smaller independent instances,
we say that it forms a single block.
Tewarson [Tew67] shows a transform that extracts the block diagonal form from a sparse, non-
singular matrix A. Aykanat, Pinar and Catalyurek [APC04], show how to use hypergraph techniques
in order to convert any rectangular sparse matrix into a BBDF, a result we use. Concretely, given a
sparse matrix A, matrices P ,Q are constructed such that the resulting matrix is BBDF.
Theorem 108 (Aykanat et al. [APC04]). Let H_A = (V, N) be the hypergraph representation of a given
matrix A. An ℓ-way partition Π_HP = {V1, . . . , Vℓ}, inducing the net partition {N1, . . . , Nℓ, NS}, of H_A
gives a permutation of A to an ℓ-way BBDF form, where nodes in Vi, resp. internal nets in Ni, constitute
the columns and rows of the i-th diagonal block of A_BBDF, and external nets in NS constitute the
coupling rows of A_BBDF. Thus,

• minimizing the cutsize minimizes the number of coupling rows;

• balance among subhypergraphs infers balance among the diagonal submatrices.
We sketch the algorithmic approach described in [APC04]. The natural idea is to use hypergraph
partitioning to generate the blocks. A net ni is created for each row in A, while a vertex vj is set for
each column. Net ni links the vertices (columns) having non-zero entries in row i. The connectivity
λi of net ni is defined as the number of partitions the net connects (all internal nets, connecting vertices
inside a single partition, have connectivity λ = 1). Then a hypergraph partitioning is applied and the
matrix is reconstructed in a new form: the columns belonging to a partition Vi are placed adjacently,
while the rows are indicated by the nets in the partition. The cost function used for the partitioning is
based on the connectivity λi of each net and is simply set to cost(Π_A) := Σ_i (λi − 1).
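The row-net model just described can be made concrete; the helper below is our own illustration (not the algorithm of [APC04] or [TK06]) and merely evaluates the cost Σ(λi − 1) of a given column partition:

```python
def hypergraph_cut_cost(A, partition):
    """Connectivity cost Sigma_i (lambda_i - 1) of a column partition in
    the row-net model: one net per row of A, linking the columns with
    non-zero entries; lambda_i counts how many parts net i touches."""
    part_of = {}                              # column index -> part index
    for p, cols in enumerate(partition):
        for j in cols:
            part_of[j] = p
    cost = 0
    for row in A:                             # one net per row
        touched = {part_of[j] for j, v in enumerate(row) if v != 0}
        lam = len(touched)                    # connectivity lambda_i
        cost += max(lam - 1, 0)               # internal nets cost 0
    return cost
```

A partition that keeps every row's support inside one part has cost 0, which is exactly what the BBDF conversion aims for (up to the coupling rows).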
Remark 109. In the context of matrix multiplication, the purpose is to minimize the number of par-
titions. However, in our case, under the hypothesis that each partition defines an MSSP with few
solutions, we benefit from a larger number of partitions, since we feed the O_SSP solver a version of the
problem as described below.
11.6.3 Solving k-MSSP for one BBDF–reduced block
Henceforth we shall consider a single reduced block of size n, dimension d, and sparsity k. Note that
after reduction, the sparsity coefficient k of the block is not the same as the original problem’s (this is
naturally also true of n and d).
One of the key features of having a reduced instance is that it lends itself to the following: by
summing up all the rows, we get a reduced (1, n, n, +) instance, which we call S. The key point is to
observe that a solution to the original problem is also a solution to S. Thus, we leverage an equiprobable
solver and use the following result:
Lemma 110 (Shadow casting). Consider a (d, k, n, +)-subset problem instance, defined by the matrix
B and target t. Let a′ be the vector obtained as follows:

a′j = Σ_{i=1}^{d} Bi,j , ∀j ∈ [n].

Then a solution to the problem defined by (B, t) is a fortiori a solution to the (one-dimensional) problem
defined by (a′, t′) = ((a′1, a′2, . . . , a′n), t1 + t2 + · · · + td).
Remark 111. We remark that there may be solutions to the shadow of an SSP instance that are not
solutions to the original MSSP problem. These inconsistent solutions are related to, yet do not always
help with, the original problem; but if we assume access to an equiprobable solver O_SSP, one can simply
call O_SSP a few times until the correct answer for the original problem is found.

Also, by design, the shadow of a reduced instance does not contain any zero. It is thus indeed a
(1, n, n, +)-subset problem. Lemma 110 is not sufficient to decrease the SSP problem's density, as in the
shadow of an instance the maximal element may only be larger, not smaller, than before shadow-casting.
11.6.4 Assembling the blocks
Once every block has been solved, possibly in parallel, it is straightforward to reassemble a solution to
the original problem. As a result, the worst-case complexity for the complete algorithm, which takes
an unreduced d × n SSP matrix as input and returns a solution if it exists, can be analyzed:

Theorem 112. An instance of the (d, k, n, +)-subset problem can be solved in time O(r^ℓ · 2^{ℓ·e} + h)
using an equiprobable SSP solver O_SSP and a hypergraph partitioning solver O_HP running in time O(h).
Here, r stands for the maximal number of solutions in a block SSP instance, while ℓ stands for the width
of the largest block returned by O_HP.
Proof. There are two problematic points: one has to deal with the "false positives" resulting from
shadow casting (which is done by repeating the algorithm O(r) times), and one has to check that the
residual rows, denoted Ri in the BBDF, are correctly compensated by the solution. This
translates to the fact that if (I1, . . . , Iℓ) does not represent a solution set for (R1‖ . . . ‖Rℓ), one may
need to obtain a new set of solutions and check it (or store the current ones and check each new Ii).

Note that while it is possible, for low-density blocks, to use efficient techniques leveraging SVP
oracles, we require that such oracles be equiprobable for the above result to apply.
Lemma 113. Let O be an SSP oracle that is equiprobable according to Definition 97, running in time
O(2^{n·e}) for some positive sub-unitary constant e. Then the algorithm depicted in Figure 11.9 solves a
k-MSSP instance of dimension d in time at most O(d · 2^{n·e/k}).
Proof. In our proof, we use the following simplifying assumptions, and present our algorithm in Fig-
ure 11.9.

1. First, without loss of generality, we make the simplifying assumption that no column in (11.6)
contains only 0 entries. If this is the case, we can simply eliminate such columns from the matrix.

2. Find δ ≤ d rows such that they overlap (see the worst case depicted in Equation (11.6)) and produce
an SSP instance with n non-zero entries, say A = (a1, a2, . . . , aδ), i.e. Σ_{i=1}^{δ} Ai,j ≠ 0, ∀j ∈ [n].

3. Solve the SSPs independently defined by the integers in each of the δ rows. Using the currently
fastest algorithm, this can be done in O(2^{δ·0.226...}) time [HM18].

4. Produce the solution for the k-MSSP from the set of solutions obtained for each of the δ rows.
A more compact representation of the k-MSSP problem is given by the notation introduced in Def-
inition 83, which considers each row as an independent SSP problem denoted ai, ∀i ∈ [d]. We write
ki := size(ai) for the number of non-zero entries in ai to quantify sparsity. We make the simplifying
assumption that the matrix representation does not contain all-zero columns; one can simply remove
such columns. Also, we point out that the probability of having fewer than k elements set to 0 is
negligible under the condition that the elements are sampled uniformly. The core idea can be expressed
as a loop over the rows a1, . . . , ad, performing: (1) a selection of the minimally-sized ai (once row
i has been selected, it is marked as visited); (2) obtaining a solution for the SSP problem defined
by ai; (3) obtaining an equivalent k-MSSP problem of dimension d − 1 by removing ai and updating
the target sums accordingly. Visually, this last step can be seen in the left side of Figure 11.9.
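The three-step loop above can be sketched as follows. This is a simplified rendering without the backtracking of Figure 11.9, and `_brute_ssp` is a toy exponential stand-in for the oracle O_SSP; after solving a row, every column in its support is decided, so those columns are dropped and all targets updated:

```python
import itertools
import random

def _brute_ssp(row, target):
    """Toy exponential stand-in for the equiprobable oracle O_SSP."""
    sols = [set(I) for r in range(len(row) + 1)
            for I in itertools.combinations(range(len(row)), r)
            if sum(row[j] for j in I) == target]
    return random.choice(sols) if sols else set()

def kmssp_heuristic(A, t, ssp_oracle=_brute_ssp):
    """Simplified core loop of Fig. 11.9 (no backtracking): repeatedly
    pick the sparsest row, solve its 1-dimensional SSP restricted to the
    row's support, then decide all support columns and update targets."""
    A = [list(r) for r in A]
    t = list(t)
    cols = list(range(len(A[0])))          # surviving original column labels
    chosen = set()
    while A:
        # (1) row with the fewest non-zero entries
        i = min(range(len(A)), key=lambda r: sum(v != 0 for v in A[r]))
        support = [j for j in range(len(cols)) if A[i][j] != 0]
        # (2) solve the SSP defined by row i on its support
        I = ssp_oracle([A[i][j] for j in support], t[i])
        if not I and t[i] != 0:
            return None                    # dead end; the full version backtracks
        picked = {support[j] for j in I}
        chosen |= {cols[j] for j in picked}
        # (3) remove row i, drop the decided columns, update the targets
        keep = [j for j in range(len(cols)) if j not in support]
        A.pop(i)
        t.pop(i)
        for r in range(len(A)):
            t[r] -= sum(A[r][j] for j in picked)
            A[r] = [A[r][j] for j in keep]
        cols = [cols[j] for j in keep]
    return chosen
```

On a block-diagonal instance the rows decouple completely, which is exactly the situation the BBDF reduction tries to produce.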
Correctness. We prove that if the k-MSSP has a solution, the algorithm in Figure 11.9 finds it with
overwhelming probability.

We give an analysis of the algorithm, initialized with the matrix corresponding to the sparse k-MSSP
instance. We perform an iterative search over the rows of A, selecting a row ai such that the
number of non-zero entries in that row is minimized. In case the SSP oracle returns a solution Ii,
we apply row and column eliminations on the matrix with respect to row i, and continue. However,
if at some later iteration no solution is found, we return by backtracking. Thus, we implicitly use
k-MSSP–Solver(A, t):

Input: the matrix A with rows a1, . . . , ad and entries ai,j , and the target vector t = (t1, . . . , td):

(A | t) =
( a1,1 a1,2 . . . a1,n | t1 )
( a2,1 a2,2 . . . a2,n | t2 )
( . . . . . . . . . . . . | . . . )
( ad,1 ad,2 . . . ad,n | td )

1. For all j ∈ [d]:
2.   i ←$ {1, . . . , d} : ki = size(ai) is minimal among the unvisited rows
3.   Ij ←$ O_SSP(ai, ti)
4.   If Ij = ∅ then backtrack
5.   For all k ∈ Ij:
5.1.   Update (A, t): remove column k from A and set tr ← tr − ar,k for every remaining row r
6.   Update (A, t): remove the i-th row ai and its target ti
7. Return I1 ∪ · · · ∪ Id

Fig. 11.9: A heuristic for solving sparse k-MSSP instances given oracle access to an SSP solver O.
memory to store the previously found solutions. The memory complexity is dominated by that of the SSP
oracle O_SSP, assuming that both n and d are constants.
Worst-case. Our statement makes no assumption about the distribution of the elements among the
entries in the sparse matrix. We only make an assumption about the distribution from which the values
of the elements are sampled. First, we show a worst-case scenario in Equation (11.6) and argue that
for such a case, there is no strategy that avoids solving one SSP problem for each row.

( a1,1, a1,2, . . . , a1,k, 0, 0, . . . , 0, 0, 0, . . . , 0 )
( 0, 0, . . . , 0, a2,1, a2,2, . . . , a2,k, 0, 0, . . . , 0 )
( . . . )
( 0, 0, . . . , 0, 0, 0, . . . , 0, ad,1, ad,2, . . . , ad,k )        (11.6)
Remark 114 (Probabilistic analysis). We first assume that the quantum version of the algorithm
returns any solution with equal probability. Then, we assume the number of solutions for typical hard SSP
and MSSP instances is small (less than n [BJLM13]); formally: r_SSP < n and r_MSSP < n, pointing out
that r_{SSP,i} ≤ r_{i,k-MSSP}.
The probability to find the right solution within one oracle query is therefore r_SSP / r_MSSP. Given a desired
probability p of success, there are on average

⌈p · r_{1,k-MSSP} / r_{SSP,1}⌉ · ⌈p · r_{2,k-MSSP} / r_{SSP,2}⌉ · . . . · ⌈p · r_{ℓ,k-MSSP} / r_{SSP,ℓ}⌉

evaluations of the SSP instance, assuming the solutions are equiprobable.
11.6.5 The subset-product problem (SPP)
As an example application of our previously developed techniques, we consider the subset-product
problem. Until now we were discussing algorithms applicable only to the (d, k, n, w, ⋆)-subset problem for
the additive operation, i.e. ⋆ = +, whereas here we want to study algorithms for ⋆ = ·, the multiplication
of integers, i.e. all selected integers are multiplied instead of being added.

It seems that little is known about the SPP, although it is used in a multiplicative variant of the
Merkle-Hellman knapsack cryptosystem and the attacks by Odlyzko [Odl84]. The (1, n, n, ·)-subset problem
is a less prominent subset problem variant; however, it is used in the more recent knapsack-based cryptosys-
tem construction by Naccache-Stern [NS97], which relies on the assumption that solving given modular
multiplicative knapsack problem instances is hard. In any case, the SPP can be identified as being
weakly NP-complete, as shown in Lemma 116; however, in their construction the trapdoor can be
unlocked by the secret key, transforming the instance into a ppt-solvable one. So far, neither an attack nor
a proof of security is known for the Naccache-Stern knapsack cryptosystem, nor for its extensions to
polynomial rings, some of which achieve CCA1 semantic security, cf. Definition 19, Item 4.
Definition 115 (3-SAT problem). Given a finite set X of literals, the 3-satisfiability problem asks
whether there is a truth assignment (true, resp. false, for each literal) satisfying a formula C in
conjunctive normal form where each clause Ci is limited to at most three literals.

Lemma 116 (3-SAT ≤Karp SPP). The 3-satisfiability problem is reducible to the (1, n, n, ·)-subset
problem.
Proof. Given a set X = {x1, x2, . . . , xℓ} with ℓ := |X| appearing literals in a collection C = {C1, C2, . . . , CN}
of its 3-element subsets, we define two ephemeral matrices by setting εi,j = 1 if Cj contains xi, otherwise
εi,j = 0, and ε̄i,j = 1 if Cj contains ¬xi, else ε̄i,j = 0. We remark that 1 ≤ Σi εi,j , Σi ε̄i,j ,
Σi (εi,j + ε̄i,j) ≤ 3. Let p1 = 2, p2 = 3, . . . denote the first prime numbers; we now define
Pi := p_{N+i} · Π_{j=1}^{N} pj^{εi,j} and P̄i := p_{N+i} · Π_{j=1}^{N} pj^{ε̄i,j} for all i ∈ {1, . . . , ℓ}. Pi, resp. P̄i, uniquely
corresponds to assigning the truth value true, resp. false, to the literal xi across all clauses. We have
a (1, n, n, ·)-subset problem if we set n := 2(ℓ + N), and

t := Π_{i=1}^{ℓ} p_{N+i} · Π_{j=1}^{N} pj³, and a := ((Pi, P̄i)_{i=1,2,...,ℓ}, (pj, pj²)_{j=1,2,...,N}).

Assume I ⊆ [n] is an SPP solution; then exactly one of xi, resp. ¬xi, is true for every i, hence either
Pi or P̄i divides t. This can be read from every k ∈ I, k ≤ 2ℓ, by first setting k2 := (k − 1)/2, i := ⌊k2⌋ + 1,
and setting xi := false if k2 ≡ 0 mod 2, else true. What remains is to argue that the transformation
steps are polynomial in the inputs. The first n primes can be generated in time O(n) (e.g. using the
sieve of Eratosthenes) and fit into O(n² ln n) space by the prime number theorem, as pn ≈ n log n.
To detail why the SPP is only weakly NP-complete, we emphasize that the target product t is
exponential in n; in a strong reduction the input t would have to remain bounded by a polynomial in n,
and such a reduction from an NP-complete problem like 3-SAT is impossible unless P = NP.
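The reduction in the proof of Lemma 116 can be made concrete for the forward direction (a satisfying assignment yields a subset with product exactly t); the function names and the signed-literal clause encoding below are ours:

```python
def first_primes(m):
    """First m primes by trial division (fine for toy sizes)."""
    ps, c = [], 2
    while len(ps) < m:
        if all(c % p for p in ps):
            ps.append(c)
        c += 1
    return ps

def sat_to_spp(num_vars, clauses):
    """3-SAT -> SPP instance as in the proof sketch: clauses are lists of
    non-zero ints, +i for x_i and -i for its negation (1-indexed).
    Returns (a, t) with a = (P_1, Pbar_1, ..., p_j, p_j^2, ...)."""
    N, ell = len(clauses), num_vars
    ps = first_primes(N + ell)
    a = []
    for i in range(1, ell + 1):
        P = Pb = ps[N + i - 1]           # p_{N+i} forces one pick per variable
        for j, C in enumerate(clauses):
            if i in C:
                P *= ps[j]               # eps_{i,j} = 1
            if -i in C:
                Pb *= ps[j]              # eps-bar_{i,j} = 1
        a += [P, Pb]
    for j in range(N):
        a += [ps[j], ps[j] ** 2]         # per-clause slack elements
    t = 1
    for i in range(ell):
        t *= ps[N + i]
    for j in range(N):
        t *= ps[j] ** 3
    return a, t

def subset_for_assignment(assign, clauses):
    """Index set I for a satisfying assignment (forward direction only):
    each clause prime must reach exponent 3 via literals plus slack."""
    ell = len(assign)
    I = [2 * i + (0 if assign[i] else 1) for i in range(ell)]
    for j, C in enumerate(clauses):
        s = sum((lit > 0) == assign[abs(lit) - 1] for lit in C)
        assert s >= 1, "assignment does not satisfy clause %d" % j
        if s == 1:
            I.append(2 * ell + 2 * j + 1)    # take p_j^2
        if s == 2:
            I.append(2 * ell + 2 * j)        # take p_j
    return I
```

The element layout (Pi at index 2(i−1), P̄i at 2(i−1)+1, then the slack pairs) mirrors the decoding rule for k ≤ 2ℓ in the proof.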
11.6.6 Multiplicatively reduced instances and smoothness basis
The starting point is to consider a (1, n, n, ·) instance, reduced in the appropriate sense:
Definition 117 (Multiplicatively reduced instance). A (1, n, n, ·)-subset problem instance defined by
the sequence a1, . . . , an and target t is multiplicatively reduced if, for all i ∈ [n], gcd(ai, t) ≠ 1.
It is immediate that a multiplicatively reduced instance is reduced (in the sense of Definition 90).
Henceforth we will assume without loss of generality that we have a multiplicatively reduced instance.
We then use the following notion:
Definition 118 (Smoothness basis). Let a1, . . . , an; t be a multiplicatively reduced (1, n, n, ·)-subset
problem instance; its smoothness basis is the set

B(a, t) = {gcd(ai, t) | i ∈ [n]} ∪ {gcd(ai, aj) | i < j ∈ [n]}.

Computing B(a, t) can be done naively in time O(n²) using the Euclidean or Lehmer algorithm.
Also note that B(a, t) can be computed by distributing the task to independent computers working in
parallel. Finally, we insist that B(a, t) need not be composed only of prime numbers.
If one knows the prime factors composing every number a1, . . . , an and t, then the set of these
factors can be used as a smoothness basis instead of B(a, t).²
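A naive quadratic computation of B(a, t) might look as follows (our own helper; we drop trivial gcds equal to 1, which carry no smoothness information):

```python
from math import gcd

def smoothness_basis(a, t):
    """B(a, t) as in Definition 118: gcds with the target plus all
    pairwise gcds, computed naively in O(n^2) gcd calls."""
    B = {gcd(ai, t) for ai in a}
    B |= {gcd(a[i], a[j])
          for i in range(len(a)) for j in range(i + 1, len(a))}
    return B - {1}   # a gcd of 1 contributes nothing
```

The pairwise gcds are independent, so this computation parallelizes trivially, as noted above.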
11.6.7 Transforming an SPP instance into a k-MSSP instance
The key idea is that a multiplicatively reduced (1, n, n, ·)-subset problem instance can be transformed
into an equivalent (d, k, n, +)-subset problem instance, which can then be addressed by repeating the
method of solving one BBDF-reduced k-MSSP block at a time (cf. Section 11.6.3).

In this context, it is often convenient to write an integer as an infinite product over all the primes,
where all but a finite number have a zero exponent. We define the multiplicity νp(ai) as the exponent
of the highest power of the prime p that divides ai. Using the fundamental theorem of arithmetic,
stating that every positive integer has a unique representation as an infinite product of prime powers
with finitely many non-zero exponents, we have:

ai = Π_p p^{νp(ai)}, and we set ai,j := ν_{pj}(ai) for 1 ≤ j ≤ d.
Lemma 119. Let B(a, t) = {p1, . . . , pd}, and let ai,j be the multiplicity of pj as a divisor of ai. Similarly,
let tj be the multiplicity of pj as a divisor of t. Then A = (a1, a2, . . . , an) and t form a k-MSSP instance,
with vectors ai = (ai,j)_{1≤j≤d}, t = (tj)_{1≤j≤d}. A subset I ⊆ [n] is a solution to the k-MSSP instance iff it is a solution to the SPP instance.
This is illustrated in Figure 11.10. In order to give a sharp analysis, one has to investigate two
important metrics for this problem: the dimension d of the new MSSP instance, and its sparsity k.
Precisely, d ≤ n(n + 1)/2 is the number of distinct elements in the smoothness basis, and k is the
maximal number of factors for a number ai; an upper bound is given by considering the largest number
M in a1, . . . , an, and the average order ω(M) ∼ ln lnM [HR17,Har59].
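Assuming the smoothness basis consists of primes (cf. the footnote to Definition 118), the decomposition of Lemma 119 amounts to computing multiplicities; a minimal sketch, checked against the numbers of Example 121 below:

```python
def multiplicity(p, m):
    """nu_p(m): exponent of the highest power of p dividing m (m != 0)."""
    e = 0
    while m % p == 0:
        m //= p
        e += 1
    return e

def spp_to_mssp(a, t, prime_basis):
    """Lemma 119: row j of A holds the exponents of prime p_j in each
    element a_i; the target vector holds the exponents of p_j in t."""
    A = [[multiplicity(p, ai) for ai in a] for p in prime_basis]
    tv = [multiplicity(p, t) for p in prime_basis]
    return A, tv
```

A solution of the resulting additive instance selects exactly the indices whose exponent vectors sum to that of t, i.e. whose product is t.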
Theorem 120. Any instance of the (d, k, n, ·)-subset problem can be solved in time O(r^ℓ · 2^{ℓ·e} + h)
using an equiprobable SSP solver O_SSP and a hypergraph partitioning solver O_HP running in time
O(h). Here r stands for the maximal number of solutions of an SSP instance queried in Figure 11.9,
while ℓ stands for the width of the largest block returned by O_HP.
² Such may be the case in a quantum setting, using Shor's algorithm. For large values of n this may be advantageous over computing B(a, t).
SSP instance a1, . . . , an; t

−→ decomposition over B(a, t) = {p1, . . . , pd}:

a1 = p1^{a1,1} p2^{a1,2} · · · pd^{a1,d}
a2 = p1^{a2,1} p2^{a2,2} · · · pd^{a2,d}
...
an = p1^{an,1} p2^{an,2} · · · pd^{an,d}
t = p1^{t1} p2^{t2} · · · pd^{td}

−→ k-MSSP instance:

A =
( a1,1 a2,1 · · · an,1 )
( a1,2 a2,2 · · · an,2 )
( . . . )
( a1,d a2,d · · · an,d ),  t = (t1, t2, . . . , td)ᵀ

Fig. 11.10: Transforming a (1, n, n, ·)-subset problem to a (d, k, n, +)-subset problem instance.
Proof. Using the algorithm described in Figure 11.8, we obtain an MSSP instance by decomposing
the elements with respect to a smoothness basis. This yields a new instance of an MSSP problem,
representable through a matrix with at most (n² + n)/2 rows and n columns. This instance is solved via
Theorem 112. The correctness follows from Euclid's algorithm and the proof of Theorem 112.
11.6.8 Full Example for transforming an SPP into k-MSSP and solving the SSP
To get a grasp on the introduced transformations, consider the following toy example for illustration.
Example 121. Let (a ← (118, 22, 202, 52, 170, 33, 252, 68, 98, 31, 58, 0), t ← 1009780611840) be an in-
stance of the (1, 12, n = 12, ·)-subset problem with promised solution I ⊆ [12], say I = {3, 4, 5, 6, 7, 8},
of weight |I| = w = 6 = n/2. We solve it as follows, in 5 descriptive steps:
1. Reduce and sort a to a′ ← (22, 33, 52, 68, 170, 202, 252), since 118, 98, 31, 58 and 0 either do not
divide t or do not contribute to the solution and can be removed; a′ is thus multiplicatively reduced.
2. Compute the basis B(a′, t) ← {2, 3, 5, 7, 11, 13, 17, 101} using Euclid's algorithm, cf. Definition 118.
3. Write t and the matrix A as row-wise decomposition over the prime factors in the chosen smooth-
ness basis B(a′, t), where column vectors correspond to elements in a′ as follows:
A =
( 1 0 2 2 1 1 2 )
( 0 1 0 0 0 0 2 )
( 0 0 0 0 1 0 0 )
( 0 0 0 0 0 0 1 )
( 1 1 0 0 0 0 0 )
( 0 0 1 0 0 0 0 )
( 0 0 0 1 1 0 0 )
( 0 0 0 0 0 1 0 ),   t = (8, 3, 1, 1, 1, 1, 2, 1)ᵀ.
4. For conciseness, we skip the representation in bordered block diagonal form, based on hypergraph
partitioning techniques as defined in Definition 91, and apply shadow casting (Lemma 110) directly.
This step leads to an n = 7, (1, 7, 7, +)-subset problem instance a″ ← (2, 2, 3, 3, 3, 2, 5), t″ ← 18
with density D_SSP = 7/log₂ 252 ≈ 1, and a promised weight |I″| = 6 solution.
5. This particular problem has (3 choose 2) = 3 solutions, as there is one surplus element 2 in
a″ = (2, 2, 3, 3, 3, 2, 5); e.g. (2 + 2) + 3 + 3 + 3 + 5 = 18. By Lemma 110, one of these solutions is
a solution to the original problem. Using an equiprobable solver we hit the promised, hence correct,
solution with probability 1/3; or equivalently, we run the equiprobable SSP solver 3 times on aver-
age before successfully finding the sought-after subset I″ = {2, 3, . . . , 7}, which easily translates to the
above I = {3, 4, 5, 6, 7, 8}, relating to elements in a = (118, 22, 202, 52, 170, 33, 252, 68, 98, 31, 58, 0).
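Steps 3-5 of Example 121 can be checked mechanically; the sketch below recomputes the shadow of the exponent matrix (Lemma 110), the target t″ = 18, and the three solutions of the resulting (1, 7, 7, +) instance:

```python
from itertools import combinations

# Exponent matrix A and target vector t from step 3 of Example 121.
A = [[1, 0, 2, 2, 1, 1, 2],
     [0, 1, 0, 0, 0, 0, 2],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 1],
     [1, 1, 0, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0, 0],
     [0, 0, 0, 1, 1, 0, 0],
     [0, 0, 0, 0, 0, 1, 0]]
t = [8, 3, 1, 1, 1, 1, 2, 1]

shadow = [sum(col) for col in zip(*A)]   # column sums (Lemma 110)
target = sum(t)                          # t'' = t1 + ... + td
sols = [I for r in range(8) for I in combinations(range(7), r)
        if sum(shadow[j] for j in I) == target]
print(shadow, target, len(sols))         # prints: [2, 2, 3, 3, 3, 2, 5] 18 3
```

Each of the three solutions drops exactly one of the surplus elements of value 2, matching step 5.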
11.6.9 The Modular SPP Assumption
The cryptosystem by Naccache and Stern in [NS97] proposes an interesting cryptographic intractability
assumption based on the modular SPP problem, a variant of the group representation problem.
Definition 122 (Group Representation Problem). Let G be a group of prime order p. Given n + 1
group elements (a1, a2, . . . , an, t) ∈ G^{n+1}, find (x1, x2, . . . , xn) ∈ Z_p^n such that

a1^{x1} · . . . · an^{xn} = t ∈ G.
The security assumption underlying Naccache-Stern's construction depends on a large prime p and
n chosen such that Pn < p, where Pn := pn# is the n-th primorial number, i.e. the product of the first n
primes. Let s ←$ Z_p and set vi ← pi^{1/s} mod p, an s-th root of the i-th prime pi, for i ≤ n. This
assumption states that it is hard to find a string x ∈ {0, 1}^n such that Π_{i=1}^{n} vi^{xi} = t. This leads us
to investigate the modular SPP, where each vi is sampled uniformly at random.
Definition 123 (Modular SPP). Let G be a group of order p. We limit our consideration to the case
where p is prime. Given a1, a2, . . . , an, t ∈ G, find (x1, x2, . . . , xn) ∈ {0, 1}^n such that

Π_{i=1}^{n} ai^{xi} ≡ t mod p.
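An exhaustive search makes Definition 123 concrete; `modular_spp_bruteforce` is our own exponential toy helper, unrelated to the Naccache-Stern trapdoor:

```python
def modular_spp_bruteforce(a, t, p):
    """Search for x in {0,1}^n with prod a_i^{x_i} = t (mod p); runs in
    O(n * 2^n) multiplications, only to make the definition concrete."""
    n = len(a)
    for mask in range(1 << n):
        prod = 1
        for i in range(n):
            if (mask >> i) & 1:
                prod = prod * a[i] % p
        if prod == t % p:
            return [(mask >> i) & 1 for i in range(n)]
    return None
```

Any non-trivial solver must exploit structure, e.g. the decomposition of Lemma 125 below into an MSSP instance.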
Remark 124. The modular SPP problem instance underlying Naccache-Stern's construction can be
transformed such that it is sufficient to find the secret s with

Π_{i=1}^{n} pi^{xi} ≡ t^s mod p.
Lemma 125. The modular SPP ≤Cook MSSP.

Proof. We tackle the modular SPP by choosing an appropriate basis (g1, g2, . . . , gn) and decomposing
each ai as ai = g1^{ki,1} · . . . · gn^{ki,n}. Analogously, we do the same with t and end up with an instance (A, t)
of the MSSP problem, cf. Figure 11.11.
Remark 126. In general the obtained matrix A is dense, i.e. it is likely not sparse with k < n.
11.7 Conclusion and Open Problems
In this part we introduced new algorithms to solve variants of the subset-sum problem (SSP), in
particular multi-dimensional versions (MSSP) and the subset-product problem (SPP), a variant which
seems not to have undergone much scrutiny. We studied the equiprobability property, important
to tackle sparse MSSP instances, and proved it for, or endowed with it, state-of-the-art SSP solvers.
We advanced evidence that there exist classical and quantum SSP solvers which are equiprobable.
Decomposition in G:

a1 = g1^{k1,1} · . . . · gn^{kn,1}
a2 = g1^{k1,2} · . . . · gn^{kn,2}
...
an = g1^{k1,n} · . . . · gn^{kn,n}
t = g1^{k1,t} · . . . · gn^{kn,t}

Conversion into a k-MSSP instance:

(a1 = (k1,1, k2,1, . . . , kn,1), t1 = k1,t)
(a2 = (k1,2, k2,2, . . . , kn,2), t2 = k2,t)
...
(an = (k1,n, k2,n, . . . , kn,n), tn = kn,t)

Fig. 11.11: Modular SPP decomposition and conversion to k-MSSP over G.
As an open problem, we propose studying relations to other complexity-theoretic problems in order
to employ the k-MSSP technique to improve existing upper bounds on the running time of generic
solvers. For example, the exact cover problem is historically related to the SSP problem, being used
in the proof of its NP-completeness; a natural target is the vertex cover problem of covering a graph
G = (V, E) with |V| = n nodes using only a subset of cardinality k. The best known upper bound for
the vertex cover problem is due to the work of Chen et al. [CKX10], with time complexity
O(1.2738^k + kn), which could be linked to the sparse, multi-dimensional k-MSSP problem as a
promising avenue of research.
Bibliography
[AAB+15] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro,
Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian
Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz,
Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry
Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya
Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda
Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and
Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems,
2015. Software available from tensorflow.org. 64
[ABDP15] Michel Abdalla, Florian Bourse, Angelo De Caro, and David Pointcheval. Simple func-
tional encryption schemes for inner products. In Jonathan Katz, editor, PKC 2015,
volume 9020 of LNCS, pages 733–751, Gaithersburg, MD, USA, March 30 – April 1,
2015. Springer, Heidelberg, Germany. 65
[ACC+18] Martin Albrecht, Melissa Chase, Hao Chen, Jintai Ding, Shafi Goldwasser, Sergey Gor-
bunov, Shai Halevi, Jeffrey Hoffstein, Kim Laine, Kristin Lauter, Satya Lokam, Daniele
Micciancio, Dustin Moody, Travis Morrison, Amit Sahai, and Vinod Vaikuntanathan. Ho-
momorphic encryption security standard. Technical report, HomomorphicEncryption.org,
Toronto, Canada, November 2018. 47
[ADRS15] Divesh Aggarwal, Daniel Dadush, Oded Regev, and Noah Stephens-Davidowitz. Solv-
ing the shortest vector problem in 2n time using discrete Gaussian sampling: Extended
abstract. In Rocco A. Servedio and Ronitt Rubinfeld, editors, 47th ACM STOC, pages
733–742, Portland, OR, USA, June 14–17, 2015. ACM Press. 41
[AGKS05] Masayuki Abe, Rosario Gennaro, Kaoru Kurosawa, and Victor Shoup. Tag-KEM/DEM:
A New Framework for Hybrid Encryption and A New Analysis of Kurosawa-Desmedt
KEM. In EUROCRYPT, volume 3494 of LNCS, pages 128–146. Springer, 2005. 75
[ALS16] Shweta Agrawal, Benoît Libert, and Damien Stehlé. Fully secure functional encryption
for inner products, from standard assumptions. In Matthew Robshaw and Jonathan Katz,
editors, CRYPTO 2016, Part III, volume 9816 of LNCS, pages 333–362, Santa Barbara,
CA, USA, August 14–18, 2016. Springer, Heidelberg, Germany. 65
[And01] Ross J. Anderson. Why information security is hard-an economic perspective. In 17th An-
nual Computer Security Applications Conference (ACSAC 2001), 11-14 December 2001,
New Orleans, Louisiana, USA, pages 358–365. IEEE Computer Society, 2001. 35
[And14] Ross Anderson. Privacy versus government surveillance where network effects meet public
choice. In Proc. 13th Annual Workshop on the Economic of Information Security (WEIS
2014), 2014. 35
[AP14] Jacob Alperin-Sheriff and Chris Peikert. Faster bootstrapping with polynomial error. In
Juan A. Garay and Rosario Gennaro, editors, CRYPTO 2014, Part I, volume 8616 of
LNCS, pages 297–314, Santa Barbara, CA, USA, August 17–21, 2014. Springer, Heidel-
berg, Germany. 46
[APC04] Cevdet Aykanat, Ali Pınar, and Ümit V. Çatalyürek. Permuting sparse rectangular
matrices into block-diagonal form. SIAM Journal on Scientific Computing, 25(6):1860–1879,
2004. 113, 114
[APS15] Martin R. Albrecht, Rachel Player, and Sam Scott. On the concrete hardness of learning
with errors. J. Mathematical Cryptology, 9(3):169–203, 2015. 82, 84
[ARS+15] Martin R. Albrecht, Christian Rechberger, Thomas Schneider, Tyge Tiessen, and Michael
Zohner. Ciphers for MPC and FHE. In Elisabeth Oswald and Marc Fischlin, editors, EU-
ROCRYPT 2015, Part I, volume 9056 of LNCS, pages 430–454, Sofia, Bulgaria, April 26–
30, 2015. Springer, Heidelberg, Germany. 74
[AS00] Rakesh Agrawal and Ramakrishnan Srikant. Privacy-preserving data mining. SIGMOD
Rec., 29(2):439–450, May 2000. 37
[Bab85] László Babai. On Lovász' lattice reduction and the nearest lattice point problem (short-
ened version). In Proceedings of the 2nd Symposium of Theoretical Aspects of Computer
Science, STACS '85, pages 13–20, London, UK, 1985. Springer-Verlag. 42
[BBBV97] Charles H. Bennett, Ethan Bernstein, Gilles Brassard, and Umesh Vazirani. Strengths
and Weaknesses of Quantum Computing. SIAM J. Comput., 26(5):1510–1523, 1997.
http://dx.doi.org/10.1137/S0097539796300933, website accessed 2013-03-14. 28
[BCJ11] Anja Becker, Jean-Sébastien Coron, and Antoine Joux. Improved generic algorithms for
hard knapsacks. In Annual International Conference on the Theory and Applications of
Cryptographic Techniques, pages 364–385. Springer, 2011. 99, 100, 103, 104, 109
[BG14] Shi Bai and Steven D. Galbraith. Lattice decoding attacks on binary LWE. In Willy
Susilo and Yi Mu, editors, ACISP 14, volume 8544 of LNCS, pages 322–337, Wollongong,
NSW, Australia, July 7–9, 2014. Springer, Heidelberg, Germany. 83
[BGV12] Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan. (Leveled) fully homomorphic
encryption without bootstrapping. In Shafi Goldwasser, editor, ITCS 2012, pages 309–
325, Cambridge, MA, USA, January 8–10, 2012. ACM. 46, 47, 50
[BJLM13] Daniel J. Bernstein, Stacey Jeffery, Tanja Lange, and Alexander Meurer. Quantum algo-
rithms for the subset-sum problem. In International Workshop on Post-Quantum Cryp-
tography, pages 16–33. Springer, 2013. 104, 116
[BLP+13] Zvika Brakerski, Adeline Langlois, Chris Peikert, Oded Regev, and Damien Stehlé. Clas-
sical hardness of learning with errors. In Dan Boneh, Tim Roughgarden, and Joan Feigen-
baum, editors, 45th ACM STOC, pages 575–584, Palo Alto, CA, USA, June 1–4, 2013.
ACM Press. 44, 83
[BMMP18] Florian Bourse, Michele Minelli, Matthias Minihold, and Pascal Paillier. Fast homomor-
phic evaluation of deep discretized neural networks. pages 483–512, 2018. 3, 37, 49, 61,
71, 79, 87
[BMR18] Gustavo Banegas, Matthias Minihold, and Razvan Rosie. Equiprobability: Faster Algo-
rithms for Subset-Sum Variants and the Vertex Cover Problem. AQIS 2018, 2018. http:
//www.ngc.is.ritsumei.ac.jp/~ger/static/AQIS18/OnlineBooklet/208.pdf. 3
[Bon99] Dan Boneh. Twenty years of attacks on the RSA cryptosystem. Notices of the AMS,
46:203–213, 1999. 31
[BPTG14] Raphael Bost, Raluca Ada Popa, Stephen Tu, and Shafi Goldwasser. Machine learning
classification over encrypted data. Cryptology ePrint Archive, Report 2014/331, 2014.
https://eprint.iacr.org/2014/331. 79
[Bur12] C. Sidney Burrus. Fast Fourier Transforms. OpenStax CNX, 2012. https://cnx.org/
contents/[email protected]. 68
[BV11a] Zvika Brakerski and Vinod Vaikuntanathan. Efficient fully homomorphic encryption from
(standard) LWE. In Rafail Ostrovsky, editor, 52nd FOCS, pages 97–106, Palm Springs,
CA, USA, October 22–25, 2011. IEEE Computer Society Press. 46
[BV11b] Zvika Brakerski and Vinod Vaikuntanathan. Fully homomorphic encryption from ring-
LWE and security for key dependent messages. In Phillip Rogaway, editor, CRYPTO 2011,
volume 6841 of LNCS, pages 505–524, Santa Barbara, CA, USA, August 14–18, 2011.
Springer, Heidelberg, Germany. 46
[BV14] Zvika Brakerski and Vinod Vaikuntanathan. Lattice-based FHE as secure as PKE. In
Moni Naor, editor, ITCS 2014, pages 1–12, Princeton, NJ, USA, January 12–14, 2014.
ACM. 46
[C+15] François Chollet et al. Keras. https://github.com/keras-team/keras, 2015. 64
[CB16] Matthieu Courbariaux and Yoshua Bengio. Binarynet: Training deep neural networks
with weights and activations constrained to +1 or -1. CoRR, abs/1602.02830, 2016. 50,
64
[CCF+16] Anne Canteaut, Sergiu Carpov, Caroline Fontaine, Tancrède Lepoint, María Naya-
Plasencia, Pascal Paillier, and Renaud Sirdey. Stream ciphers: A practical solution for
efficient homomorphic-ciphertext compression. In Thomas Peyrin, editor, FSE 2016, vol-
ume 9783 of LNCS, pages 313–333, Bochum, Germany, March 20–23, 2016. Springer,
Heidelberg, Germany. 74
[CdWM+17] Hervé Chabanne, Amaury de Wargny, Jonathan Milgram, Constance Morel, and Em-
manuel Prouff. Privacy-preserving classification on deep neural network. IACR Cryptology
ePrint Archive, 2017:35, 2017. 50
[CGGI16] Ilaria Chillotti, Nicolas Gama, Mariya Georgieva, and Malika Izabachène. Faster fully
homomorphic encryption: Bootstrapping in less than 0.1 seconds. In Jung Hee Cheon
and Tsuyoshi Takagi, editors, ASIACRYPT 2016, Part I, volume 10031 of LNCS, pages
3–33, Hanoi, Vietnam, December 4–8, 2016. Springer, Heidelberg, Germany. 50, 52, 61,
65, 66, 67, 83
[CGGI17] Ilaria Chillotti, Nicolas Gama, Mariya Georgieva, and Malika Izabachène. Faster packed
homomorphic operations and efficient circuit bootstrapping for TFHE. In Tsuyoshi Takagi
and Thomas Peyrin, editors, ASIACRYPT 2017, Part I, volume 10624 of LNCS, pages
377–408, Hong Kong, China, December 3–7, 2017. Springer, Heidelberg, Germany. 52,
53, 55, 77, 82
BIBLIOGRAPHY
[CHS+16] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio. Binarized Neural
Networks: Training Deep Neural Networks with Weights and Activations Constrained to
+1 or -1. ArXiv e-prints, February 2016. 64
[CJL+92] Matthijs J. Coster, Antoine Joux, Brian A. LaMacchia, Andrew M. Odlyzko, Claus-Peter
Schnorr, and Jacques Stern. Improved low-density subset sum algorithms. Computational
Complexity, 2:111–128, 1992. 99, 100
[CKKS17] Jung Hee Cheon, Andrey Kim, Miran Kim, and Yong Soo Song. Homomorphic encryption
for arithmetic of approximate numbers. In Tsuyoshi Takagi and Thomas Peyrin, editors,
ASIACRYPT 2017, Part I, volume 10624 of LNCS, pages 409–437, Hong Kong, China,
December 3–7, 2017. Springer, Heidelberg, Germany. 47
[CKX10] Jianer Chen, Iyad A. Kanj, and Ge Xia. Improved upper bounds for vertex cover.
Theoretical Computer Science, 411(40-42):3736–3756, 2010. 121
[CMS12] D. Cireşan, U. Meier, and J. Schmidhuber. Multi-column Deep Neural Networks for Image
Classification. ArXiv e-prints, February 2012. 58
[CN11] Yuanmi Chen and Phong Q. Nguyen. BKZ 2.0: Better lattice security estimates. In
Dong Hoon Lee and Xiaoyun Wang, editors, Advances in Cryptology - ASIACRYPT
2011 - 17th International Conference on the Theory and Application of Cryptology and
Information Security, Seoul, South Korea, December 4-8, 2011. Proceedings, volume 7073
of Lecture Notes in Computer Science, pages 1–20. Springer, 2011. 99
[Cop93] Don Coppersmith. Modifications to the number field sieve. Journal of Cryptology,
6(3):169–180, March 1993. 29
[CR07] Camila C. S. Caiado and Pushpa N. Rathie. Polynomial coefficients and distribution of the
sum of discrete uniform variables. In Eighth Annual Conference of the Society of Special
Functions and their Applications, 2007. 105
[CRF+09] Weng-Long Chang, Ting-Ting Ren, Mang Feng, Lai Chin Lu, Kawuu Weicheng Lin, and
Minyi Guo. Quantum algorithms of the subset-sum problem on a quantum computer. In
2009 WASE International Conference on Information Engineering, 2009. 104
[CT15] Massimo Chenal and Qiang Tang. On Key Recovery Attacks Against Existing Somewhat
Homomorphic Encryption Schemes. In LATINCRYPT, volume 8895 of LNCS, pages
239–258. Springer, 2015. 75
[Cyb89] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of
Control, Signals and Systems, 2(4):303–314, Dec 1989. 58
[Dar19] TU Darmstadt. LWE challenges, 2019. http://latticechallenge.org/svp-challenge.
80, 83
[Det18] Diabetic Retinopathy Detection. Identify signs of diabetic retinopathy in eye images.
2018. 61
[DGBL+16] Nathan Dowlin, Ran Gilad-Bachrach, Kim Laine, Kristin Lauter, Michael Naehrig, and
John Wernsing. Cryptonets: Applying neural networks to encrypted data with high
throughput and accuracy. Technical report, Microsoft Research Group, February 2016.
50, 88
[DHS01] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification. Wiley,
second edition, 2001. 78
[DLMW15] Itai Dinur, Yunwen Liu, Willi Meier, and Qingju Wang. Optimized interpolation attacks
on LowMC. In Tetsu Iwata and Jung Hee Cheon, editors, ASIACRYPT 2015, Part II, vol-
ume 9453 of LNCS, pages 535–560, Auckland, New Zealand, November 30 – December 3,
2015. Springer, Heidelberg, Germany. 74
[DPW19] Léo Ducas, Maxime Plançon, and Benjamin Wesolowski. On the shortness of vectors to be
found by the Ideal-SVP quantum algorithm. Cryptology ePrint Archive, Report 2019/234,
2019. https://eprint.iacr.org/2019/234. 81
[dsp13] Kaggle data science projects. The cats vs dogs image recognition challenge. https:
//www.kaggle.com/c/dogs-vs-cats/data, 2013. 48
[Dwo06] Cynthia Dwork. Differential privacy (invited paper). In Michele Bugliesi, Bart Preneel,
Vladimiro Sassone, and Ingo Wegener, editors, ICALP 2006, Part II, volume 4052 of
LNCS, pages 1–12, Venice, Italy, July 10–14, 2006. Springer, Heidelberg, Germany. 37
[Ege14] Steffen Eger. Stirling’s approximation for central extended binomial coefficients. American
Mathematical Monthly, 121(4):344–349, 2014. 105
[FP05] Abraham D. Flaxman and Bartosz Przydatek. Solving medium-density subset sum prob-
lems in expected polynomial time. In Volker Diekert and Bruno Durand, editors, STACS
2005, pages 305–314, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg. 99
[Fri99] Matteo Frigo. A fast Fourier transform compiler. In Proceedings of the ACM SIGPLAN
1999 Conference on Programming Language Design and Implementation, PLDI ’99, pages
169–180, New York, NY, USA, 1999. ACM. 68
[FV12] Junfeng Fan and Frederik Vercauteren. Somewhat Practical Fully Homomorphic Encryp-
tion. IACR Cryptology ePrint Archive, 2012:144, 2012. 47
[GBC16] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
http://www.deeplearningbook.org. 56, 57, 60
[Gen09] Craig Gentry. A fully homomorphic encryption scheme. PhD thesis, Stanford University,
2009. crypto.stanford.edu/craig. 45, 46
[GHS12] Craig Gentry, Shai Halevi, and Nigel P. Smart. Homomorphic evaluation of the AES
circuit. In Reihaneh Safavi-Naini and Ran Canetti, editors, CRYPTO 2012, volume
7417 of LNCS, pages 850–867, Santa Barbara, CA, USA, August 19–23, 2012. Springer,
Heidelberg, Germany. 46, 74
[GMP+13] Glenn Greenwald, Ewen MacAskill, Laura Poitras, Spencer Ackerman, and Dominic
Rushe. Microsoft handed the NSA access to encrypted messages. The Guardian, 12 July 2013. 35
[GPV08] Craig Gentry, Chris Peikert, and Vinod Vaikuntanathan. Trapdoors for hard lattices and
new cryptographic constructions. In Richard E. Ladner and Cynthia Dwork, editors, 40th
ACM STOC, pages 197–206, Victoria, BC, Canada, May 17–20, 2008. ACM Press. 42
[Gro96] Lov K. Grover. A fast quantum mechanical algorithm for database search. In Proceedings
of the twenty-eighth annual ACM symposium on Theory of computing, pages 212–219.
ACM, 1996. 103, 104, 107
[GSW13] Craig Gentry, Amit Sahai, and Brent Waters. Homomorphic encryption from learning
with errors: Conceptually-simpler, asymptotically-faster, attribute-based. In Ran Canetti
and Juan A. Garay, editors, CRYPTO 2013, Part I, volume 8042 of LNCS, pages 75–92,
Santa Barbara, CA, USA, August 18–22, 2013. Springer, Heidelberg, Germany. 46, 50,
53
[Har59] Godfrey Harold Hardy. Ramanujan: Twelve lectures on subjects suggested by his life and
work. Chelsea Pub. Co., 1959. 118
[Has86] Johan Håstad. Almost optimal lower bounds for small depth circuits. In Proceedings of the
Eighteenth Annual ACM Symposium on Theory of Computing, STOC ’86, pages 6–20,
New York, NY, USA, 1986. ACM. 56
[Hes] Land Hessen. Datenschutzgesetz (GVBl. II 300-10). Gesetz- und Verordnungsblatt. 35
[HGJ10] Nick Howgrave-Graham and Antoine Joux. New generic algorithms for hard knapsacks.
In Annual International Conference on the Theory and Applications of Cryptographic
Techniques, pages 235–256. Springer Berlin Heidelberg, 2010. 99, 104, 108, 109
[HILL99] Johan Håstad, Russell Impagliazzo, Leonid A. Levin, and Michael Luby. A pseudorandom
generator from any one-way function. SIAM Journal on Computing, 28(4):1364–1396,
1999. 23
[HK07] Dennis Hofheinz and Eike Kiltz. Secure Hybrid Encryption from Weakened Key Encap-
sulation. In CRYPTO, volume 4622 of LNCS, pages 553–571. Springer, 2007. 75
[HKM18] Gottfried Herold, Elena Kirshanova, and Alexander May. On the asymptotic complexity
of solving LWE. Des. Codes Cryptography, 86(1):55–83, 2018. 84
[HM18] Alexander Helm and Alexander May. Subset Sum Quantumly in 1.17^n. In Stacey Jeffery,
editor, 13th Conference on the Theory of Quantum Computation, Communication and
Cryptography (TQC 2018), volume 111 of Leibniz International Proceedings in Informatics
(LIPIcs), pages 5:1–5:15, Dagstuhl, Germany, 2018. Schloss Dagstuhl–Leibniz-Zentrum
fuer Informatik. 99, 100, 103, 104, 109, 115
[Hor91] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural
Netw., 4(2):251–257, March 1991. 59
[HR17] Godfrey Harold Hardy and Srinivasa Ramanujan. The normal number of prime factors
of a number n. Quarterly Journal of Mathematics, XLVIII:76–92, 1917. 118
[HS74] Ellis Horowitz and Sartaj Sahni. Computing partitions with applications to the knapsack
problem. J. ACM, 21(2):277–292, 1974. 99, 103, 104
[HS14] Shai Halevi and Victor Shoup. Algorithms in HElib. In Juan A. Garay and Rosario
Gennaro, editors, CRYPTO 2014, Part I, volume 8616 of LNCS, pages 554–571, Santa
Barbara, CA, USA, August 17–21, 2014. Springer, Heidelberg, Germany. 47, 67
[HS15] Shai Halevi and Victor Shoup. Bootstrapping for HElib. In Elisabeth Oswald and Marc
Fischlin, editors, EUROCRYPT 2015, Part I, volume 9056 of LNCS, pages 641–670, Sofia,
Bulgaria, April 26–30, 2015. Springer, Heidelberg, Germany. 67
[HZRS15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for
image recognition. CoRR, abs/1512.03385, 2015. 50
[IN96] Russell Impagliazzo and Moni Naor. Efficient cryptographic schemes provably as secure
as subset sum. Journal of Cryptology, 9(4):199–216, September 1996. 99, 104
[Joh09] Sam Johnston. Cloud computing. via Wikimedia Commons, 2009. https://
creativecommons.org/licenses/by-sa/3.0. 36
[Kar72] Richard M. Karp. Reducibility among combinatorial problems. In Raymond E. Miller and
James W. Thatcher, editors, Proceedings of a symposium on the Complexity of Computer
Computations, held March 20-22, 1972, at the IBM Thomas J. Watson Research Center,
Yorktown Heights, New York., The IBM Research Symposia Series, pages 85–103. Plenum
Press, New York, 1972. 97
[Ker83] Auguste Kerckhoffs. La cryptographie militaire. Journal des sciences militaires, IX:5–83,
January 1883. http://www.petitcolas.net/fabien/kerckhoffs/. 23
[KF15] Paul Kirchner and Pierre-Alain Fouque. An improved BKW algorithm for LWE with ap-
plications to cryptography and lattices. In Rosario Gennaro and Matthew J. B. Robshaw,
editors, CRYPTO 2015, Part I, volume 9215 of LNCS, pages 43–62, Santa Barbara, CA,
USA, August 16–20, 2015. Springer, Heidelberg, Germany. 83
[KH91] Anders Krogh and John A. Hertz. A simple weight decay can improve generalization.
In Proceedings of the 4th International Conference on Neural Information Processing
Systems, NIPS’91, pages 950–957, San Francisco, CA, USA, 1991. Morgan Kaufmann
Publishers Inc. 62
[KL14] Jonathan Katz and Yehuda Lindell. Introduction to Modern Cryptography, Second Edi-
tion. Chapman & Hall/CRC, 2nd edition, 2014. 23
[KMW16] Elena Kirshanova, Alexander May, and Friedrich Wiemer. Parallel implementation of BDD
enumeration for LWE. In Applied Cryptography and Network Security: 14th International
Conference, ACNS 2016, Guildford, UK, June 19-22, 2016. Proceedings, pages 580–591.
Springer International Publishing, 2016. 83
[KSH12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with
deep convolutional neural networks. In Proceedings of the 25th International Conference
on Neural Information Processing Systems - Volume 1, NIPS’12, pages 1097–1105, USA,
2012. Curran Associates Inc. 56
[KTX08] Akinori Kawachi, Keisuke Tanaka, and Keita Xagawa. Concurrently secure identifica-
tion schemes based on the worst-case hardness of lattice problems. In Josef Pieprzyk,
editor, ASIACRYPT 2008, volume 5350 of LNCS, pages 372–389, Melbourne, Australia,
December 7–11, 2008. Springer, Heidelberg, Germany. 65
[Lad75] Richard E. Ladner. On the structure of polynomial time reducibility. J. ACM, 22(1):155–
171, January 1975. 21
[LBBH98] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to
document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998. 57,
63
[LBOM98] Yann LeCun, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. Efficient
BackProp, pages 9–50. Springer Berlin Heidelberg, Berlin, Heidelberg, 1998. 62, 64
[LCB98] Y. LeCun, C. Cortes, and C.J.C. Burges. The MNIST database of handwritten digits.
http://yann.lecun.com/exdb/mnist/, 1998. 48, 58
[LLL82] Arjen Klaas Lenstra, Hendrik Willem Lenstra, and László Lovász. Factoring polynomials
with rational coefficients. Mathematische Annalen, 261(4):515–534, 1982. 99, 110
[LM09] Vadim Lyubashevsky and Daniele Micciancio. On bounded distance decoding, unique
shortest vectors, and the minimum distance problem. In Shai Halevi, editor,
CRYPTO 2009, volume 5677 of LNCS, pages 577–594, Santa Barbara, CA, USA, Au-
gust 16–20, 2009. Springer, Heidelberg, Germany. 41
[LN14] Tancrède Lepoint and Michael Naehrig. A comparison of the homomorphic encryption
schemes FV and YASHE. In David Pointcheval and Damien Vergnaud, editors,
AFRICACRYPT 14, volume 8469 of LNCS, pages 318–335, Marrakesh, Morocco, May 28–
30, 2014. Springer, Heidelberg, Germany. 74
[LO85a] J. C. Lagarias and Andrew M. Odlyzko. Solving low-density subset sum problems. J.
ACM, 32(1):229–246, 1985. 99
[LO85b] Jeffrey C. Lagarias and Andrew M. Odlyzko. Solving low-density subset sum problems.
Journal of the ACM (JACM), 32(1):229–246, 1985. 99
[LP11] Richard Lindner and Chris Peikert. Better key sizes (and attacks) for LWE-based encryp-
tion. In Aggelos Kiayias, editor, CT-RSA 2011, volume 6558 of LNCS, pages 319–339,
San Francisco, CA, USA, February 14–18, 2011. Springer, Heidelberg, Germany. 42, 43
[LPR10] Vadim Lyubashevsky, Chris Peikert, and Oded Regev. On ideal lattices and learning with
errors over rings. In Henri Gilbert, editor, EUROCRYPT 2010, volume 6110 of LNCS,
pages 1–23, French Riviera, May 30 – June 3, 2010. Springer, Heidelberg, Germany. 50,
52, 67, 80, 81
[MH78] Ralph C. Merkle and Martin E. Hellman. Hiding information and signatures in trapdoor
knapsacks. IEEE Trans. Information Theory, 24(5):525–530, 1978. 97
[MJSC16] Pierrick Meaux, Anthony Journault, François-Xavier Standaert, and Claude Carlet.
Towards stream ciphers for efficient FHE with low-noise ciphertexts. In Marc Fischlin and
Jean-Sébastien Coron, editors, EUROCRYPT 2016, Part I, volume 9665 of LNCS, pages
311–343, Vienna, Austria, May 8–12, 2016. Springer, Heidelberg, Germany. 74
[MLL+12] Enrique Martín-López, Anthony Laing, Thomas Lawson, Roberto Alvarez, Xiao-Qi
Zhou, and Jeremy L. O'Brien. Experimental realization of Shor's quantum factoring
algorithm using qubit recycling. Nature Photonics, 6:773, October 2012. 27
[MM11a] Daniele Micciancio and Petros Mol. Pseudorandom knapsacks and the sample complexity
of LWE search-to-decision reductions. In Phillip Rogaway, editor, CRYPTO 2011, volume
6841 of LNCS, pages 465–484, Santa Barbara, CA, USA, August 14–18, 2011. Springer,
Heidelberg, Germany. 43
[MM11b] Daniele Micciancio and Petros Mol. Pseudorandom knapsacks and the sample complexity
of LWE search-to-decision reductions. Cryptology ePrint Archive, Report 2011/521, 2011.
http://eprint.iacr.org/2011/521. 98
[MP12] Daniele Micciancio and Chris Peikert. Trapdoors for lattices: Simpler, tighter, faster,
smaller. In David Pointcheval and Thomas Johansson, editors, EUROCRYPT 2012,
volume 7237 of LNCS, pages 700–718, Cambridge, UK, April 15–19, 2012. Springer,
Heidelberg, Germany. 43, 44
[MR07] Daniele Micciancio and Oded Regev. Worst-case to average-case reductions based on
gaussian measures. SIAM J. Comput., 37(1):267–302, 2007. 42
[MVO96] Alfred J. Menezes, Scott A. Vanstone, and Paul C. Van Oorschot. Handbook of Applied
Cryptography. CRC Press, Inc., Boca Raton, FL, USA, 1st edition, 1996. 15
[Nat16] National Institute of Standards and Technology. Post-quantum cryptogra-
phy standardization, December 2016. https://csrc.nist.gov/Projects/
Post-Quantum-Cryptography/Post-Quantum-Cryptography-Standardization. 26
[Nat19] National Center for Supercomputing Applications. Hierarchical Data Format, 2019.
https://www.hdfgroup.org/solutions/hdf5/. 80
[NS97] David Naccache and Jacques Stern. A new public-key cryptosystem. In Walter Fumy, edi-
tor, EUROCRYPT’97, volume 1233 of LNCS, pages 27–36, Konstanz, Germany, May 11–
15, 1997. Springer, Heidelberg, Germany. 117, 120
[Odl84] Andrew M. Odlyzko. Cryptanalytic attacks on the multiplicative knapsack cryptosystem
and on shamir’s fast signature scheme. IEEE Trans. Information Theory, 30(4):594–600,
1984. 100, 117
[Pai99] Pascal Paillier. Public-key cryptosystems based on composite degree residuosity classes. In
Jacques Stern, editor, EUROCRYPT’99, volume 1592 of LNCS, pages 223–238, Prague,
Czech Republic, May 2–6, 1999. Springer, Heidelberg, Germany. 32
[Pei16] Chris Peikert. A decade of lattice cryptography, 2016. Monograph. 18, 41
[PTVF07] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery.
Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University
Press, New York, NY, USA, 3 edition, 2007. 68
[PW08] Chris Peikert and Brent Waters. Lossy trapdoor functions and their applications. In
Richard E. Ladner and Cynthia Dwork, editors, 40th ACM STOC, pages 187–196, Vic-
toria, BC, Canada, May 17–20, 2008. ACM Press. 65
[PZ16] Yanbin Pan and Feng Zhang. Solving low-density multiple subset sum problems with
SVP oracle. J. Systems Science & Complexity, 29(1):228–242, 2016. 99, 100, 110, 112
[Reg05] Oded Regev. On lattices, learning with errors, random linear codes, and cryptography. In
Harold N. Gabow and Ronald Fagin, editors, 37th ACM STOC, pages 84–93, Baltimore,
MD, USA, May 22–24, 2005. ACM Press. 43, 44, 52
[Reg09a] Oded Regev. Lecture notes: Lattices in computer science, 2009. http://www.cims.nyu.
edu/~regev/teaching/lattices_fall_2009/index.html, as of May 7, 2019. 18
[Reg09b] Oded Regev. On lattices, learning with errors, random linear codes, and cryptography.
Journal of the ACM, 56(6):34, 2009. Preliminary version in STOC’05. 43
[RSA78] Ron L. Rivest, Adi Shamir, and Leonard Adleman. A method for obtaining digital
signatures and public-key cryptosystems. Commun. ACM, 21(2):120–126, February 1978.
18
[RSW18] Miruna Rosca, Damien Stehlé, and Alexandre Wallet. On the ring-LWE and polynomial-
LWE problems. In Jesper Buus Nielsen and Vincent Rijmen, editors, EUROCRYPT 2018,
Part I, volume 10820 of LNCS, pages 146–173, Tel Aviv, Israel, April 29 – May 3, 2018.
Springer, Heidelberg, Germany. 81
[SE94a] C. P. Schnorr and M. Euchner. Lattice basis reduction: Improved practical algorithms
and solving subset sum problems. Mathematical Programming, 66(1):181–199, Aug 1994.
110
[SE94b] Claus-Peter Schnorr and Martin Euchner. Lattice basis reduction: Improved practical
algorithms and solving subset sum problems. Mathematical programming, 66(1-3):181–
199, 1994. 97
[SEA18] Simple Encrypted Arithmetic Library (release 3.1.0). https://github.com/Microsoft/
SEAL, December 2018. Microsoft Research, Redmond, WA. 47
[SFH17] Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. Dynamic routing between capsules.
CoRR, abs/1710.09829, 2017. 80
[Sha49] Claude E. Shannon. Communication theory of secrecy systems. Bell System Technical
Journal, 28(4):656–715, 1949. 23, 29
[Sha84] Adi Shamir. A polynomial-time algorithm for breaking the basic merkle-hellman cryp-
tosystem. IEEE Trans. Information Theory, 30(5):699–704, 1984. 99
[She13] Henry Maurice Sheffer. A set of five independent postulates for boolean algebras, with
application to logical constants. Transactions of the American Mathematical Society,
14(4):481–488, 1913. 22
[Sho86] Peter W. Shor. The average-case analysis of some on-line algorithms for bin packing.
Combinatorica, 6(2):179–200, 1986. 97
[Sho99] Peter W. Shor. Polynomial-time algorithms for prime factorization and discrete logarithms
on a quantum computer. SIAM review, 41(2):303–332, 1999. 29
[SS81] Richard Schroeppel and Adi Shamir. A T = O(2^{n/2}), S = O(2^{n/4}) algorithm for certain
NP-complete problems. SIAM J. Comput., 10(3):456–464, 1981. 104
[SS10] Damien Stehlé and Ron Steinfeld. Faster fully homomorphic encryption. In Masayuki Abe,
editor, ASIACRYPT 2010, volume 6477 of LNCS, pages 377–394, Singapore, December 5–
9, 2010. Springer, Heidelberg, Germany. 46
[SV10] Nigel P. Smart and Frederik Vercauteren. Fully homomorphic encryption with relatively
small key and ciphertext sizes. In Phong Q. Nguyen and David Pointcheval, editors,
PKC 2010, volume 6056 of LNCS, pages 420–443, Paris, France, May 26–28, 2010.
Springer, Heidelberg, Germany. 46
[Tew67] R. P. Tewarson. Row-column permutation of sparse matrices. The Computer Journal,
10(3):300–305, 1967. 113
[TK06] Aleksandar Trifunovic and William Knottenbelt. A general graph model for representing
exact communication volume in parallel sparse matrix–vector multiplication. In Interna-
tional Symposium on Computer and Information Sciences, pages 813–824. Springer, 2006.
113
[vGHV10] Marten van Dijk, Craig Gentry, Shai Halevi, and Vinod Vaikuntanathan. Fully homomor-
phic encryption over the integers. In Henri Gilbert, editor, EUROCRYPT 2010, volume
6110 of LNCS, pages 24–43, French Riviera, May 30 – June 3, 2010. Springer, Heidelberg,
Germany. 46
[Wil11] Colin P. Williams. Explorations in Quantum Computing. Springer, 2nd edition, 2011. 27
[Wyn31] C. E. Wynn-Williams. The Use of Thyratrons for High Speed Automatic Counting of
Physical Phenomena. Proceedings of the Royal Society of London Series A, 132:295–310,
July 1931. 27
[ZK16] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. CoRR, abs/1605.07146,
2016. 50
[ZXZ18] Zhongxiang Zheng, Guangwu Xu, and Chunhuan Zhao. Discrete gaussian measures and
new bounds of the smoothing parameter for lattices. Cryptology ePrint Archive, Report
2018/786, 2018. https://eprint.iacr.org/2018/786. 42
[ZYC16] Qingchen Zhang, Laurence T. Yang, and Zhikui Chen. Privacy preserving deep compu-
tation model on cloud for big data feature learning. IEEE Transactions on Computers,
65(5):1351–1362, 2016. 50
[ZYL+17] Tanping Zhou, Xiaoyuan Yang, Longfei Liu, Wei Zhang, and Yitao Ding. Faster
bootstrapping with multiple addends. Cryptology ePrint Archive, Report 2017/735, 2017.
http://eprint.iacr.org/2017/735. 75, 77