A Short Tutorial on Mean-Field Spin Glass

Techniques for Non-Physicists

Andrea Montanari1 and Subhabrata Sen2

April 7, 2022

1 Department of Electrical Engineering and Department of Statistics, Stanford University
2 Department of Statistics, Harvard University


Contents

Introduction

1 The p-spin model and tensor PCA
1.1 Introduction
1.1.1 Statistical model and Gibbs measure
1.1.2 Estimators
1.1.3 Free energy, and its connections with estimation and information theory
1.2 Replica symmetric asymptotics
1.2.1 The replica symmetric calculation
1.2.2 Replica-symmetric ansatz
1.2.3 Special cases: I. The sequence model, k = 1
1.2.4 Special cases: II. The spiked matrix model, k = 2
1.2.5 Special cases: III. Bayes-optimal estimation
1.2.6 Special cases: IV. Pure noise, λ = 0
1.3 The replica symmetric phase diagram, h = 0
1.4 Replica-symmetry breaking
1.4.1 The need for replica-symmetry breaking
1.4.2 One-step replica symmetry breaking (1RSB) free energy
1.4.3 Interpretation of the 1RSB ansatz and choice of m
1.4.4 Back to the free energy
1.5 The number of critical points
1.5.1 The Kac-Rice formula
1.5.2 Applying the Kac-Rice formula
1.5.3 Asymptotics of the number of critical points
1.6 Back to the replica method
Bibliographic notes
1.7 Some omitted technical calculations
1.7.1 Proof of Lemma 1.1.2
1.7.2 Proof of Eq. (1.2.11)
Exercises

2 The Sherrington-Kirkpatrick model and Z2 synchronization
2.1 Introduction
2.1.1 The group synchronization problem
2.1.2 Extremal cuts of random graphs
2.2 The replica symmetric asymptotics
2.2.1 The replica calculation
2.2.2 The replica symmetric phase diagram
2.2.3 Bayes optimal and maximum likelihood estimation
2.3 The replica symmetric asymptotics: cavity approach
2.3.1 The distribution of cavity fields
2.3.2 The free energy density
2.4 Algorithmic questions
2.4.1 From the cavity method to message passing algorithms
2.4.2 Approximate Message Passing
2.5 Replica symmetry breaking
2.5.1 k-step replica symmetry breaking
2.5.2 Taking the limit of full RSB
2.5.3 On the physical interpretation of Parisi's PDE
2.6 Cavity method with replica symmetry breaking
2.6.1 A digression into Poisson processes
2.6.2 The 1RSB cavity recursion
2.6.3 Simplifying the cavity recursion
2.6.4 Computing the free energy density
2.7 Rigorous bounds via interpolation
2.7.1 Thermodynamic limit: proof outline
2.7.2 Proof of the replica symmetric upper bound
2.7.3 Ruelle Probability Cascades
2.7.4 The RSB upper bound: Proof of Theorem 10
Bibliographic notes

A Probability theory inequalities
A.1 Basic facts
A.2 Basic inequalities
A.3 Concentration inequalities

B Summary of notations
B.1 Matrix Norms
B.2 Asymptotics
B.3 Probability
B.4 Probability distributions

References

Introduction

I thought fit to write out for you and explain in detail in the same book the peculiarity of a certain method, by which it will be possible for you to get a start to enable you to investigate some of the problems in mathematics by means of mechanics. [. . . ] it is of course easier, when we have previously acquired, by the method, some knowledge of the questions, to supply the proof than it is to find it without any previous knowledge.

— Archimedes, On the method

This tutorial is based on lecture notes written for a class taught in the Statistics Department at Stanford in the Winter Quarter of 2017 and then again in Fall 2021. The class was called 'Methods From Statistical Physics (Stats 369)' and was addressed to students from Statistics, Mathematics, and Engineering Departments with a solid background in probability theory, but no previous knowledge of physics. The objective was to provide a working knowledge of some of the techniques developed over the last 40 years by theoretical physicists and mathematicians to study mean field spin glasses and their applications to high-dimensional statistics and statistical learning.

Why mean field spin glasses?

The history of the subject is truly remarkable1. Spin glass models were introduced by physicists in the 1970s to model the statistical properties of certain magnetic materials. Over the last half century, these models have motivated a blossoming line of mathematical work with applications to multiple fields, at first sight distant from physics.

From a mathematical point of view, spin glasses are high-dimensional probability distributions, i.e. probability distributions over R^n, n ≫ 1, which are usually written in the Gibbs-Boltzmann form

µ(dσ) ∝ e^{βH(σ)} ν0(dσ) .   (0.0.1)

Here ν0 is a 'simple' reference measure (for instance the uniform distribution over {+1, −1}^n or the uniform distribution over the sphere S^{n−1}), the exponential weight H : R^n → R is known as the Hamiltonian (the standard convention in physics is to refer to −H as the Hamiltonian), and β > 0 is known as the inverse temperature.

Of course, any probability measure in R^n can be written in the form (0.0.1). However in spin glass models, H(σ) is typically a sum of polynomially many (in n) 'simple' terms (e.g.

1 The reader interested in early historical overviews might wish to consult the sequence of seven expository articles written by Phil Anderson between 1988 [And88] and 1990 [And90], or of course the unsurpassed collection of articles (and accompanying introductions) in [MPV87]. For a personal account, see [Cha21].


low degree monomials in the coordinates σi), and hence the form (0.0.1) is meaningful. It is worth mentioning that, while we focus for simplicity on probability measures over R^n, spin glass models have been studied in other product spaces X^n as well.

In spin glass models, the Hamiltonian H(·) itself is random or, to be precise, {H(σ)}_{σ∈R^n} is a stochastic process indexed by σ ∈ R^n (or σ ∈ Σ, where Σ denotes the support of ν0). Therefore the measure µ(dσ) is a random probability measure. A specific spin glass model is defined by specifying the distribution of the process H (alongside the reference measure ν0).

At first sight, studying random probability measures might seem a somewhat exotic endeavour. However, a little thought reveals that random probability measures are both ubiquitous and useful:

1. Consider a statistical estimation problem: we want to estimate an unknown vector x ∈ R^n from some data Y. We will see several examples of this problem in this tutorial. A Bayesian approach postulates a prior distribution over x, and then forms a posterior, namely the conditional distribution of x given Y, under that prior. The posterior is a random probability distribution over R^n (because Y is random).

2. Consider an optimization problem of the form max_{σ∈Σ} H(σ). In many circumstances, we have a probabilistic model for H. For instance, this is the case in empirical risk minimization, which is the standard approach to statistical learning. Of course, in order to understand the properties of this optimization problem, it is important to understand the geometry of the (random) superlevel sets Σ_E := {σ ∈ Σ : H(σ) ≥ E}. It turns out that—in many cases—the distribution (0.0.1) is closely related to the uniform distribution over Σ_{E(β)} for a certain E(β). Therefore, the Gibbs measure (0.0.1) is a powerful tool to explore the random geometry of superlevel sets.

3. In physics, the Hamiltonian H is random because, for instance, the spins σi are magnetic moments associated to impurities at random locations in an otherwise non-magnetic material. More generally, H can be random because it is the Hamiltonian of a disordered system, whose 'disorder' degrees of freedom are not thermalized.

Mean field spin glass models are a special subfamily of spin glass models. Roughly speaking, they are characterized by the fact that the coordinates of σ (the 'spins' in physics language) are 'indistinguishable' from the point of view of the process2 H(·). The first model of this type, which we will study in Chapter 2, was introduced by David Sherrington and Scott Kirkpatrick in 1975 [SK75] to provide an idealized model that could be amenable to mathematical analysis.

In the near half-century since then, and starting with the invention of 'replica symmetry breaking' by Giorgio Parisi [Par79a], physicists have developed a number of sophisticated non-rigorous techniques to analyze mean field spin glasses and characterize their high-dimensional behavior. While several of the physicists' results and techniques are outstanding challenges for mathematicians, since the early 2000's there has been increasing success in making some of these ideas rigorous. These developments have given birth to a rich and rapidly evolving area of probability theory.

In parallel with these developments, it has become increasingly clear that understanding the behavior of random high-dimensional probability distributions µ(dσ) and of random high-dimensional objective functions H(σ) is crucial in a number of mathematical disciplines beyond theoretical physics. We mentioned above high-dimensional statistics and statistical learning: tools and intuitions from physics have found countless applications in these areas in the last 10-15 years. In the opposite direction, high-dimensional statistics and statistical learning have brought new questions and stimulated new developments in spin glass theory.

2 Formally, for any fixed number of vectors σ1, ..., σk, the joint distribution of (H(σ1), ..., H(σk)) depends on σ1, ..., σk only through the joint empirical distribution of their coordinates n^{−1} ∑_{i=1}^n δ_{(σ_{1,i},...,σ_{k,i})}.

This tutorial is mainly aimed at researchers in Statistics, Mathematics, and Computer Science, who want to learn some of the important tools and ideas in this area.

The style of this tutorial

This tutorial is deliberately written in a somewhat non-standard style, from several viewpoints:

Concrete problems. Rather than developing the theory in the most general setting, we focus on two concrete problems that are motivated by questions in statistical estimation. Each of the next two chapters is dedicated to one such problem.

We use each of these examples as a pretext for presenting a number of mathematical techniques. We believe it is best to learn these techniques on concrete applications.

Non-exhaustive. Our treatment is far from exhaustive, even for each of the specific problems that we treat. On the other hand, while we use these examples as motivation, we do not hesitate in pursuing detours that are interesting, but indirectly related to the original questions posed by those examples.

Rigorous vs. non-rigorous techniques. We present a mixture of non-rigorous and rigorous techniques. Whenever something is proven (or a proof in the literature is indicated, or sketched) we emphasize this by using the labels 'theorem', 'lemma', and so on. On the other hand, we explain non-rigorous techniques on examples for which rigorous alternatives (yielding the same conclusions) are available.

There are countless reasons for learning non-rigorous techniques in parallel with rigorous ones. Among others: (i) They have driven this research area; (ii) Properly used, they provide the correct answer with significantly less work; (iii) They apply more broadly; (iv) They can provide invaluable insights/conjectures for rigorous research.

As shown by the quote above, reason (iv) was already acknowledged by Archimedes more than two millennia ago.

Exercises. As explained above, this tutorial is based on a class taught at Stanford. We include the exercises developed for that class (often generalizations of the models treated in the main text). The importance of hands-on practice in mathematics cannot be overstated. Even more so when learning a completely different point of view (e.g. learning non-rigorous techniques for a mathematically minded person, or vice versa).

Acknowledgements

Teaching a class is always a dialogue between the teacher(s) and the student(s). Even more so for a small specialized class like this. Therefore, we are primarily indebted to the students who took Stats 369 at Stanford: Jie Jun Ang, Mona Azadkia, Erik Bates, Zhou Fan, Henry Friedlander, Yanjun Han, Sifan Liu, Song Mei, Aran Nayebi, Phan Minh Nguyen, Allan Raventos Knohr, David Ritzwoller, Christian Serio, Pragya Sur, Matteo Sesia, Javan Tahir, Andy Tsao, Atsushi Yamamura, Guanyang Wang, Jun Yan, Zhenyuan Zhang, Kangjie Zhou, Zhengqing Zhou.

We also acknowledge funding from the NSF grant CCF-2006489 and the ONR grant N00014-18-1-2729 (AM), and a Harvard Dean's Competitive Fund Award (SS).


Chapter 1

The p-spin model and tensor PCA

1.1 Introduction

In many modern data analysis problems, data take the form of a multi-dimensional array [Mør11]. Examples include collaborative filtering [BM16], applications of the moment method [HKZ12], image inpainting [LMWY13], hyperspectral imaging [LL10, SVdPDMS11], and geophysical imaging [KSS13].

Information is extracted from such data by positing a latent low-rank structure. Tensor principal component analysis (PCA) provides a simple formalization of this problem, which has attracted significant interest over the last few years.

It also provides an excellent sandbox for learning spin glass techniques, because the corresponding spin glass model (known as the p-spin spherical model) has been studied in physics for over 20 years, as discussed in the bibliographic notes at the end of this chapter.

1.1.1 Statistical model and Gibbs measure

Consider the following signal-plus-noise model for a tensor Y ∈ (R^n)^⊗k:

Y = λ v0^⊗k + W .   (1.1.1)

Here v0 ∈ R^n, ‖v0‖2 = 1, is an unknown vector, λ ∈ R≥0 is the signal-to-noise ratio, and W ∈ (R^n)^⊗k is a noise tensor. We want to estimate v0 given an observation of the tensor Y. We will assume W to be a Gaussian symmetric tensor with independent entries up to symmetries. More precisely, let G = (G_{i1,...,ik})_{i1,...,ik∈[n]} be a tensor with i.i.d. entries G_{i1,...,ik} ∼ N(0, 1). For a permutation over k elements, π ∈ S_k, we let G^π be the tensor obtained by permuting the indices of G, namely G^π_{i1,...,ik} = G_{iπ(1),...,iπ(k)}. We then let

W = (1/√(k! n)) ∑_{π∈S_k} G^π .   (1.1.2)

Note in particular that W = W^π and, for i1 < i2 < ··· < ik, we have W_{i1,...,ik} ∼ i.i.d. N(0, 1/n). Our analysis is fairly insensitive to the distribution of entries with coinciding indices, but the present choice is particularly convenient because the resulting distribution of W is invariant under rotations1 in R^n.

1 Formally this means that, for any n × n orthogonal matrix Q, defining the tensor W′ via W′_{i1...ik} := ∑_{j1,...,jk≤n} W_{j1...jk} Q_{j1 i1} ··· Q_{jk ik}, we have W′ =_d W (W′ is distributed as W).
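A minimal numerical sketch of the noise model (1.1.2), in Python with NumPy (the function names here are ours, not part of any library): it samples W for small n, k and checks that an entry with distinct indices has variance close to 1/n.

import itertools, math
import numpy as np

def sample_W(n, k, rng):
    # W = (k! n)^(-1/2) * sum_{pi in S_k} G^pi, cf. Eq. (1.1.2)
    G = rng.standard_normal((n,) * k)
    W = np.zeros((n,) * k)
    for pi in itertools.permutations(range(k)):
        W += np.transpose(G, pi)  # permute the k tensor indices
    return W / math.sqrt(math.factorial(k) * n)

rng = np.random.default_rng(0)
n, k = 8, 3
entries = [sample_W(n, k, rng)[0, 1, 2] for _ in range(20000)]
print(np.var(entries), 1 / n)  # a distinct-index entry has variance approx 1/n

By construction W is symmetric under index permutations, and a distinct-index entry is a sum of k! i.i.d. standard Gaussians divided by √(k! n), hence has variance 1/n.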


The maximum likelihood estimator for v0 minimizes the negative log-likelihood

− log p(Y | v0 = σ) = const. + (n/(2 k!)) ‖Y − λσ^⊗k‖_F²   (1.1.3)

over unit norm vectors σ (where const. is a constant independent of σ). Here ‖T‖_F is the Frobenius norm of a tensor T, namely ‖T‖_F² := ∑_{i1,...,ik} T²_{i1...ik}.

We will denote by 〈·, ·〉 the standard scalar product between tensors. Namely, given two tensors S, T ∈ (R^n)^⊗k, we let

〈S, T〉 = ∑_{i1,...,ik=1}^n S_{i1,...,ik} T_{i1,...,ik} .   (1.1.4)

In particular, ‖T‖_F² = 〈T, T〉. With this notation, minimizing the negative likelihood − log p(Y | v0 = σ) is equivalent to solving the following maximization problem

maximize 〈Y, σ^⊗k〉 ,   (1.1.5)
subject to σ ∈ S^{n−1} ,   (1.1.6)

where S^{n−1} ≡ {x ∈ R^n : ‖x‖2 = 1} is the unit sphere in n dimensions. Explicitly, the objective that is being maximized is 〈Y, σ^⊗k〉 = ∑_{i1,...,ik=1}^n Y_{i1,...,ik} σ_{i1} ··· σ_{ik}.

An alternative estimation procedure is obtained by introducing a prior distribution on v0. Note that the problem is invariant under orthogonal rotations of v0. Hence it is natural to use a prior that is also invariant under rotations. The only such prior is the uniform probability distribution over the unit sphere S^{n−1}, to be denoted by ν0(·). (We alert the reader that this is 'nu-zero', not 'v-zero', which we are using to denote the signal.) Applying Bayes rule, we obtain the following posterior

µ_Bayes(dσ) = (1/Z_Bayes(λ)) exp{ −(n/(2 k!)) ‖Y − λσ^⊗k‖_F² } ν0(dσ)   (1.1.7)
            = (1/Z_Bayes(λ)) exp{ (nλ/k!) 〈Y, σ^⊗k〉 } ν0(dσ) .   (1.1.8)

The constant Z_Bayes is completely determined by the normalization condition ∫ µ_Bayes(dσ) = 1. It depends on λ and Y, but we will often omit these dependencies whenever clear from the context.

It is worth opening a parenthesis on why it makes sense to study the uniform prior: the choice of a specific prior can lead to estimation methods that might perform poorly in general. In the present case (and considering a rotation invariant loss), the uniform prior is 'least favorable' in the sense of statistical minimax theory. This is a consequence of the invariance of the problem and is formalized by the Hunt-Stein theorem [LC06]. The least favorable property means two things:

1. The Bayes error for any other prior is not larger than the Bayes error for the uniform prior.

2. The Bayes estimator with respect to the uniform prior is minimax optimal, in the sense of performing at least as well on any distribution of v0 as on the uniform prior.

(In fact in the present case proving these properties on ν0 is a straightforward exercise.)


Motivated by the above two approaches to estimation (maximum likelihood and Bayes), we consider the following family of probability measures indexed by β, h ∈ R:

µ_{n,λ,β,h}(dσ) = (1/Z_n(β, λ, h)) exp{ (nβ/√(2 k!)) 〈Y, σ^⊗k〉 + nh 〈v0, σ〉 } ν0(dσ) .   (1.1.9)

In the following we will omit some of the arguments n, λ, β, h unless necessary. We will refer to µ_{n,λ,β,h}(dσ) as the 'Gibbs (or Boltzmann) measure'. We will also refer to the quantity H(σ) ≡ (nβ/√(2 k!)) 〈Y, σ^⊗k〉 + nh 〈v0, σ〉 appearing in the exponent as the 'Hamiltonian' (the standard definition in statistical physics differs from this by a constant factor). A few remarks:

1. The measure µ_{n,λ,β,h}(dσ) is in fact a random probability measure because it depends on the random tensor Y.

2. The maximum likelihood estimator is recovered as β → ∞, with h = 0. Indeed, in this limit the measure µ_{n,λ,β,0} concentrates on the (almost surely unique up to the symmetry σ → −σ) maximizer of 〈Y, σ^⊗k〉 over the sphere.

3. The Bayes posterior corresponds to β = λ√(2/k!), h = 0: µ_{n,λ,β=λ√(2/k!),0}(dσ) = µ_Bayes(dσ).

The line β = λ√(2/k!) in the (β, λ) plane is known in physics as the 'Nishimori line,' and some consequences of the tower property of conditional expectation are known as Nishimori identities. In fact, these identities are quite natural from a Bayes perspective.

4. For any β ∈ R≥0, h = 0 the above measure can be used to construct interesting estimators.

The term h〈v0, σ〉 is instead not accessible to a statistician, and is introduced as a device to analyze the properties of µ. Eventually it needs to be set to zero.

1.1.2 Estimators

As mentioned above, we are interested in estimating the unknown vector v0. An estimator is a map v : Y ↦ v(Y) ∈ R^n. We will, in general, not require v(Y) to have unit norm. We can evaluate the quality of an estimator through a risk function, which takes the form

R_n(v) := E ℓ(v(Y), v0) .   (1.1.10)

Note that for k even, it is only possible to estimate v0 up to an overall sign. We will therefore consider losses ℓ : R^n × R^n → R that are invariant to this sign change. Three well known examples are

• The overlap:

Q_n(v) ≡ E[ |〈v(Y), v0〉| / (‖v(Y)‖2 ‖v0‖2) ] .   (1.1.11)

Note that good estimation corresponds to large overlap. Also notice that for this metric we can restrict without loss of generality to normalized estimators, i.e. estimators satisfying ‖v(Y)‖2 = 1 almost surely.


• The mean square error:

mse_n(v) ≡ E[ min_{s∈{+1,−1}} ‖v(Y) − s v0‖2² ] .   (1.1.12)

The inner minimization over the global sign s is introduced to make the loss invariant under v0 → −v0.

• The matrix mean square error. In this case we attempt to estimate v0 v0^T, and it is therefore natural to consider a general matrix estimator M : Y ↦ M(Y):

MSE_n(M) ≡ E[ ‖M(Y) − v0 v0^T‖_F² ] .   (1.1.13)

Of course, given a vector estimator v, we can construct a matrix estimator via M(Y) ≡ v(Y) v(Y)^T.

We can use the measure µ_{n,λ,β,h} to construct estimators in a number of ways. The simplest one is by taking the expectation

v_{β,λ}(Y) := ∫_{S^{n−1}} σ µ_{n,λ,β,0}(dσ) .   (1.1.14)

It is an elementary fact that (for β = λ√(2/k!)) v_{β,λ}(Y) maximizes a version of the overlap, Q_n^0(v), in which the absolute value is dropped. Also, it minimizes a version of the mean square error, mse_n^0(v), in which the minimization over s is dropped. In formulas:

Q_n^0(v) ≡ E[ 〈v(Y), v0〉 / (‖v(Y)‖2 ‖v0‖2) ] ,   mse_n^0(v) ≡ E[ ‖v(Y) − v0‖2² ] .   (1.1.15)

A natural matrix estimator is

M_{β,λ}(Y) := ∫_{S^{n−1}} σ σ^T µ_{n,λ,β,0}(dσ) .   (1.1.16)

This minimizes the matrix mean square error of Eq. (1.1.13). Note that the conditional expectation estimator v_{β,λ}(Y) of Eq. (1.1.14) becomes identically zero for k even, because the Gibbs measure µ_{n,λ,β,0} is invariant under reflections (µ_{n,λ,β,0}(A) = µ_{n,λ,β,0}(−A) for any A ⊆ S^{n−1}). Indeed, no non-trivial estimation is possible under the risk metrics Q_n^0(v) and mse_n^0(v) in this case, because Y is left unchanged upon the change of v0 into −v0.

Nevertheless, the Gibbs measure µ_{n,λ,β,0} contains useful information despite the reflection symmetry. Namely, it is expected that µ_{n,λ,β,0} decomposes into a combination of components that are symmetric to each other:

µ_{n,λ,β,0} = (1/2) µ⁺_{n,λ,β,0} + (1/2) µ⁻_{n,λ,β,0} ,   (1.1.17)
µ⁻_{n,λ,β,0}(A) = µ⁺_{n,λ,β,0}(−A)  ∀A ⊆ S^{n−1} .   (1.1.18)

Further, each of the two components µ±_{n,λ,β,0} should be significantly more concentrated than the mixture, and µ⁺_{n,λ,β,0} is mostly supported on {〈σ, v0〉 > 0}. Setting h > 0 and letting h → 0 after n → ∞ is expected to be equivalent to computing expectations with respect to µ⁺_{n,λ,β,0}.


This picture suggests several estimators that are non-trivial, even if the conditional expectation (1.1.14) is 0. A simple idea would be to sample σ ∼ µ_{n,λ,β,0}, and use this vector as an estimate. This however introduces an additional sampling variance term in the estimation error.

An elegant alternative is to compute the matrix expectation M_{β,λ}(Y), together with its top eigenvector v1(M_{β,λ}(Y)) and corresponding eigenvalue λ1(M_{β,λ}(Y)). We then estimate v0 using

v⁺_{β,λ}(Y) ≡ √(λ1(M_{β,λ}(Y))) · v1(M_{β,λ}(Y)) .   (1.1.19)

The reason for this notation is that v⁺_{β,λ}(Y) should be asymptotically equivalent to the expectation of µ⁺_{n,λ,β,0}; a minimal numerical sketch is given below.

In the statistical mechanics analysis, the symmetry problem is addressed by introducing the symmetry breaking term +nh〈v0, σ〉, with h > 0, in the Gibbs measure. The behavior of various estimators is analyzed by taking n → ∞ first and h → 0 thereafter. Because of the decomposition (1.1.17), it is assumed that this is asymptotically equivalent to computing expectations with respect to µ⁺_{n,λ,β,0}.
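Here is a minimal sketch (Python with NumPy) of the estimator (1.1.19). The array sigma_samples is a stand-in for hypothetical samples from the Gibbs measure µ_{n,λ,β,0} (e.g. produced by MCMC, which we do not implement here); for demonstration we feed it placeholder samples drawn uniformly from the sphere:

import numpy as np

def v_plus(sigma_samples):
    # empirical version of M in Eq. (1.1.16), then the top eigenpair, Eq. (1.1.19)
    M = sigma_samples.T @ sigma_samples / sigma_samples.shape[0]
    eigvals, eigvecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    return np.sqrt(eigvals[-1]) * eigvecs[:, -1]

rng = np.random.default_rng(1)
x = rng.standard_normal((1000, 50))
sigma_samples = x / np.linalg.norm(x, axis=1, keepdims=True)  # uniform on the sphere
print(np.linalg.norm(v_plus(sigma_samples)))  # small for pure-noise samples

Note that v⁺ is well defined even when the Gibbs measure is reflection symmetric, since M is invariant under σ → −σ.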

1.1.3 Free energy, and its connections with estimation and information theory

In the statistical mechanics approach, the first crucial step in studying the properties of the Gibbs measure µ_{n,λ,β,h}(dσ) is to compute the asymptotic free energy density:

φ(β, λ, h) = lim_{n→∞} (1/n) log Z_n(β, λ, h) ,   (1.1.20)
Z_n(β, λ, h) = ∫_{S^{n−1}} exp{ (nβ/√(2 k!)) 〈Y, σ^⊗k〉 + nh 〈v0, σ〉 } ν0(dσ) ,   (1.1.21)

where the limit is assumed to exist almost surely and be non-random. In fact it is not hard to show that log Z_n is a Lipschitz continuous function of the standard Gaussian vector G (remember the definition of W in Eq. (1.1.2)) and use Gaussian concentration (see Theorem 14 in Appendix A) to prove that log Z_n concentrates exponentially around its expectation. This argument shows that the asymptotic free energy density is also equal to

φ(β, λ, h) = lim_{n→∞} Φ_n(β, λ, h) ,   Φ_n(β, λ, h) ≡ (1/n) E log Z_n(β, λ, h) ,   (1.1.22)

assuming that the limit on the right-hand side exists. Proving the existence of the limit for the expectation is a more difficult mathematical question, which can sometimes be addressed by showing that the sequence Φ_n(β, λ, h) is superadditive. We refer to the bibliographic notes at the end of this chapter.

A clarification on the terminology. The quantity φ defined above is a 'density' because of the normalization by n: it is the free energy 'per particle'. Also, the standard physics convention is to refer to the quantity −φ/β as the free energy. The quantity φ is sometimes referred to as 'pressure' or 'free entropy'. Here we will neglect the factor (−1/β) for simplicity.

The free energy density φ(β, λ, h) has interesting general properties and can be used in a number of ways to characterize the estimation problem. These properties are most easily stated in terms of the non-asymptotic quantity

Φ_n(β, λ, h) ≡ (1/n) E log Z_n(β, λ, h) .   (1.1.23)

We also define the noiseless tensor X0, and the expectations v_{β,λ,h}(Y), X_{β,λ,h}(Y):

X0 ≡ v0^⊗k ,   (1.1.24)
v_{β,λ,h}(Y) ≡ ∫_{S^{n−1}} σ µ_{n,β,λ,h}(dσ) ,   (1.1.25)
X_{β,λ,h}(Y) ≡ ∫_{S^{n−1}} σ^⊗k µ_{n,β,λ,h}(dσ) .   (1.1.26)

Note that v_{β,λ,h}(Y), X_{β,λ,h}(Y) generalize the definitions in the previous sections to h ≠ 0. However, these are not statistical estimators unless h = 0.

The following identities show that various expectations of interest can be estimated by computing derivatives of Φ_n(β, λ, h) with respect to its arguments.

Lemma 1.1.1. With the above definitions:

∂Φ_n/∂h (β, λ, h) = E[ ∫_{S^{n−1}} 〈v0, σ〉 µ_{n,β,λ,h}(dσ) ] = E[ 〈v0, v_{β,λ,h}(Y)〉 ] ,   (1.1.27)
∂Φ_n/∂λ (β, λ, h) = (β/√(2 k!)) E[ ∫_{S^{n−1}} 〈v0, σ〉^k µ_{n,β,λ,h}(dσ) ] = (β/√(2 k!)) E[ 〈X0, X_{β,λ,h}(Y)〉 ] ,   (1.1.28)
∂Φ_n/∂β (β, λ, h) = (1/√(2 k!)) E[ ∫_{S^{n−1}} 〈Y, σ^⊗k〉 µ_{n,β,λ,h}(dσ) ] = (1/√(2 k!)) E[ 〈Y, X_{β,λ,h}(Y)〉 ] .   (1.1.29)

Proof. Differentiating Φ_n in h, we obtain

∂Φ_n/∂h (β, λ, h) = E[ (1/Z_n(β, λ, h)) ∫_{S^{n−1}} 〈v0, σ〉 exp{ (nβ/√(2 k!)) 〈Y, σ^⊗k〉 + nh 〈v0, σ〉 } ν0(dσ) ]
                  = E[ ∫_{S^{n−1}} 〈v0, σ〉 µ_{n,β,λ,h}(dσ) ] = E[ 〈v0, v_{β,λ,h}(Y)〉 ] ,

where the final equality follows by the linearity of expectation under µ_{n,β,λ,h}(·). The other computations are similar.

Our objective is to determine the asymptotics of the estimation error under any of the metrics defined in the previous section, cf. Eqs. (1.1.11) to (1.1.13), for the estimator v⁺_{β,λ}(Y). As mentioned in the previous section, the effect of a strictly positive h > 0 should be to break the symmetry and select the µ⁺_{β,λ} component of the Gibbs measure, cf. Eq. (1.1.17). In particular, this suggests

lim_{n→∞} E[ 〈v⁺_{β,λ}(Y), v0〉 ] = lim_{h→0+} lim_{n→∞} E[ 〈v_{β,λ,h}(Y), v0〉 ] = lim_{h→0+} ∂φ/∂h (β, λ, h) ,   (1.1.30)
lim_{n→∞} E[ ‖v⁺_{β,λ}(Y)‖2² ] = lim_{h→0+} lim_{n→∞} E[ ‖v_{β,λ,h}(Y)‖2² ] .   (1.1.31)

Note that in order to determine the estimation error, we need to compute (the asymptotics of) ‖v_{β,λ,h}(Y)‖2². While Lemma 1.1.1 does not describe how to do this, it will result as a byproduct of the calculations in the next pages. The quantities in expectations on the right-hand side of Eqs. (1.1.30), (1.1.31) should concentrate around their mean, and therefore the above imply

lim_{n→∞} Q_n(v⁺_{β,λ}) = lim_{h→0} lim_{n→∞} Q_n(v_{β,λ,h}) ,   (1.1.32)
lim_{n→∞} mse_n(v⁺_{β,λ}) = lim_{h→0} lim_{n→∞} mse_n(v_{β,λ,h}) .   (1.1.33)

In other words, by computing the free energy, and taking derivatives of the result, we can access the estimation error. In the following, we will often omit mentioning explicitly the limit h → 0+.

The connection between free energy, information theory and estimation theory is particularly elegant in the case of Bayes estimation, which corresponds to taking h = 0, β = λ√(2/k!). We use the notation

Φ_{n,Bayes}(λ) = Φ_n(β = λ√(2/k!), λ, h = 0) .   (1.1.34)

The next lemma connects the free energy with mutual information and estimation error. Recall that the relative entropy (Kullback-Leibler divergence) between two probability distributions µ, ν, with µ absolutely continuous with respect to ν, is defined as

D(µ‖ν) ≡ E_µ[ log(dµ/dν) ]   (1.1.35)
       = E_ν[ (dµ/dν) log(dµ/dν) ] ,   (1.1.36)

where dµ/dν is the Radon-Nikodym derivative of µ with respect to ν. Given two random variables X, Y on the same probability space with laws µ_X, µ_Y and joint law µ_{X,Y}, their mutual information is defined as

I(X;Y) ≡ D(µ_{X,Y} ‖ µ_X × µ_Y) = E[ log (dµ_{X,Y}/d(µ_X × µ_Y))(X, Y) ]   (1.1.37)
       = E[ log (dµ_{X|Y}/dµ_X)(X|Y) ] ,   (1.1.38)

where in the last expression µ_{X|Y} is the conditional distribution of X given Y. For the reader who might not be familiar with information theory notation, we emphasize that I(X;Y) is not a function of the random variables X, Y or of the values they take, but of their joint law.

Since µ_Bayes is the conditional distribution of v0 given Y, the mutual information is also given by

I(v0;Y) = E[ log (dµ_Bayes/dν0)(v0) ] .   (1.1.39)

Note that in the above formula, the Radon-Nikodym derivative dµ_Bayes/dν0 is evaluated at v0 and depends implicitly on Y because µ_Bayes depends on Y.

Lemma 1.1.2. With the above definitions, and letting X_Bayes(Y) ≡ ∫ σ^⊗k µ_Bayes(dσ), we have the identities

(1/n) I(v0;Y) = λ²/k! − Φ_{n,Bayes}(λ) ,   (1.1.40)
(1/(nλ)) ∂/∂λ I(v0;Y) = (1/k!) E[ ‖X_Bayes(Y) − X0‖_F² ] .   (1.1.41)


A proof can be found in Section 1.7.1, and is an example of a general relationship between information and estimation.

1.2 Replica symmetric asymptotics

We next describe a first attempt at computing φ(β, λ) using the replica method. As we will see, the result is correct at small enough β but not in general2. In the next section we will see how to modify this prediction using the idea of 'replica symmetry breaking', and obtain the correct free energy density.

1.2.1 The replica symmetric calculation

We want to compute E log Z_n. The replica method tries to deduce this from the moments of Z_n, using the identity

E log Z_n = lim_{r→0} (1/r) log E Z_n^r .   (1.2.1)

This identity is perfectly correct, e.g. it follows from dominated convergence under very mild integrability conditions on Z_n (which are valid in our case) at fixed n. However, a few strange things will happen next:

1. We consider r integer.

2. We obtain a formal expression for E Z_n^r that is only valid to leading exponential order, i.e. we compute a formula for

lim_{n→∞} (1/(nr)) log E Z_n^r .   (1.2.2)

3. We finally take the limit r → 0.

The success of this procedure entirely lies in the choice of the 'formula' for lim_{n→∞} (1/nr) log E Z_n^r and its dependence on r. A naive hope would be that the integer moments E Z_n^r uniquely determine the distribution of Z_n. However: (i) In most cases of interest the integrability conditions for this to happen (e.g. as in Carleman's condition) are violated; (ii) Most importantly, we do not plan to compute these moments exactly, but only their exponential growth rate lim_{n→∞} (1/nr) log E Z_n^r.

It is easy to show3 that evaluating this exponential growth rate at integer r does not uniquely determine lim_{n→∞} E log Z_n. Hence the actual mathematical content of the replica method is not in the elementary identity (1.2.1), but rather in the specific construction of the formula for lim_{n→∞} (1/nr) log E Z_n^r, which surprisingly captures the probabilistic structure of the problem.

Let us start by computing E Z_n^r. Note that, by definition,

Z_n ≡ ∫_{S^{n−1}} exp{ nh 〈v0, σ〉 + (nβλ/√(2 k!)) 〈v0, σ〉^k + (nβ/√(2 k!)) 〈W, σ^⊗k〉 } ν0(dσ) .   (1.2.3)

2 However, it is correct in the important case of Bayes estimation, namely on the line β = λ√(2/k!).
3 For instance, let Z_n = e^{nX_n} where the random variables X_n satisfy a large deviation principle at rate n, with convex rate function I(x). Evaluating the exponential growth rate of E Z_n^r only determines I(x) on a set of points away from its minimum.


Using this formula r times, and writing the expectation over v0 explicitly as an integral over σ0, we get

E Z_n^r = ∫_{(S^{n−1})^{r+1}} e^{ nh ∑_{a=1}^r 〈σ0, σa〉 + (nβλ/√(2 k!)) ∑_{a=1}^r 〈σ0, σa〉^k } E exp{ (nβ/√(2 k!)) ∑_{a=1}^r 〈W, (σa)^⊗k〉 } ν_{0,r+1}(dσ) ,   (1.2.4)

where the integral is over σ = (σ0, σ1, ..., σr) ∈ (S^{n−1})^{r+1}, and ν_{0,r+1} ≡ ν0 × ··· × ν0. We can now take the expectation over W. Recalling the definition (1.1.2), we get

E exp{ (nβ/√(2 k!)) ∑_{a=1}^r 〈W, (σa)^⊗k〉 } = E exp{ √(nβ²/2) ∑_{a=1}^r 〈G, (σa)^⊗k〉 }   (1.2.5)
                                            = exp{ (nβ²/4) ∑_{a,b=1}^r 〈σa, σb〉^k } .   (1.2.6)
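The step (1.2.6) rests on the covariance identity Cov(〈G, (σa)^⊗k〉, 〈G, (σb)^⊗k〉) = 〈σa, σb〉^k. A quick Monte Carlo sanity check of this identity (Python with NumPy; the helper function below is ours):

import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3
sa = rng.standard_normal(n); sa /= np.linalg.norm(sa)
sb = sa + 0.5 * rng.standard_normal(n); sb /= np.linalg.norm(sb)  # correlated with sa

def inner_tensor_power(G, s, k):
    # <G, s^{tensor k}>: contract one index of G with s at a time
    T = G
    for _ in range(k):
        T = T @ s
    return T

X = np.array([[inner_tensor_power(G, s, k) for s in (sa, sb)]
              for G in (rng.standard_normal((n,) * k) for _ in range(50000))])
print(np.cov(X.T)[0, 1], np.dot(sa, sb) ** k)  # the two numbers agree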

We can substitute this in Eq. (1.2.4) to get

E Z_n^r = ∫_{(S^{n−1})^{r+1}} exp{ nh ∑_{a=1}^r 〈σ0, σa〉 + (nβλ/√(2 k!)) ∑_{a=1}^r 〈σ0, σa〉^k + (nβ²/4) ∑_{a,b=1}^r 〈σa, σb〉^k } ν_{0,r+1}(dσ) .   (1.2.7)

Consider the (r+1) × (r+1) matrix Q = (Q_{ab})_{0≤a,b≤r} defined by

Q_{ab} = 〈σa, σb〉  for a, b ∈ {0, 1, ..., r}.   (1.2.8)

Denote by f_{n,r}(Q) the joint density of the random variables (Q_{ab})_{0≤a<b≤r} when σ0, σ1, ..., σr are distributed according to ν_{0,r+1} = ν0 × ··· × ν0. We then have

E Z_n^r = ∫ exp{ nh ∑_{a=1}^r Q_{0,a} + (nβλ/√(2 k!)) ∑_{a=1}^r Q_{0a}^k + (nβ²/4) ∑_{a,b=1}^r Q_{ab}^k } f_{n,r}(Q) dQ .   (1.2.9)

We need to estimate f_{n,r}(Q) to leading exponential order. For Λ = (Λ_{a,b})_{0≤a,b≤r} a symmetric positive-definite matrix, consider the following Gaussian measure on (R^n)^{r+1}:

γ_Λ(dσ) = (n/(2π))^{n(r+1)/2} det(Λ)^{n/2} exp{ −(n/2) ∑_{a,b=0}^r Λ_{a,b} 〈σa, σb〉 } dσ .   (1.2.10)

We claim that (see Section 1.7.2)

f_{n,r}(Q) = e^{−n(r+1)/2 + n〈Λ,Q〉/2} det(Λ)^{−n/2} G_Λ(Q) / g_n(1)^{r+1} ,   (1.2.11)

where G_Λ is the density of (Q_{ab})_{0≤a≤b≤r}, Q_{a,b} = 〈σa, σb〉, when σ has distribution γ_Λ, and g_n(x) is the density of ‖z‖2² at x when z ∼ N(0, I_n/n). By the local central limit theorem, we have g_n(1) = Θ(n^{1/2}), and G_Λ(Q) = Θ(n^{r(r+1)/2}) if Λ = Q^{−1}. Hence

f_{n,r}(Q) ≐ inf_Λ e^{ n(〈Λ,Q〉 − Tr(I_{r+1}))/2 } det(Λ)^{−n/2}   (1.2.12)
          = det(Q)^{n/2} ,   (1.2.13)


where the last line follows by substituting Λ = Q^{−1}. (Here and below, ≐ denotes equality to the leading exponential order. Namely, we write f(n) ≐ g(n) if lim_{n→∞} n^{−1} log[f(n)/g(n)] = 0.)

Substituting in Eq. (1.2.9) and computing the integral over Q by the Laplace method, we get

E Z_n^r ≐ sup_Q exp{ nh ∑_{a=1}^r Q_{0,a} + (nβλ/√(2 k!)) ∑_{a=1}^r Q_{0a}^k + (nβ²/4) ∑_{a,b=1}^r Q_{ab}^k + (n/2) Tr log(Q) }   (1.2.14)
        = exp{ n sup_Q S(Q) } ,   (1.2.15)

where we introduced the function

S(Q) ≡ h ∑_{a=1}^r Q_{0,a} + (βλ/√(2 k!)) ∑_{a=1}^r Q_{0a}^k + (β²/4) ∑_{a,b=1}^r Q_{ab}^k + (1/2) Tr log(Q) .   (1.2.16)

Recalling (1.2.1), we get the following formal expression

φ(β, λ, h) = lim_{r→0} (1/r) sup_Q S(Q) .   (1.2.17)

What is the interpretation of the matrix Q? Recalling that v_β(Y) = ∫_{S^{n−1}} σ µ_n(dσ), we have

E[ ‖v_β(Y)‖2² ] = (2/(r(r−1))) ∑_{1≤a<b≤r} E[ ∫_{(S^{n−1})^r} Q_{a,b} µ_n(dσ1) ··· µ_n(dσr) ] .   (1.2.18)

Taking the limit n → ∞ (with the Laplace method) followed by r → 0, we find that, denoting by Q* the point realizing the supremum in Eq. (1.2.17),

lim_{n→∞} E[ ‖v_β(Y)‖2² ] = lim_{r→0} (2/(r(r−1))) ∑_{1≤a<b≤r} Q*_{a,b} .   (1.2.19)

Proceeding analogously for v0, which is represented as σ0, we get

lim_{n→∞} E[ 〈v_β(Y), v0〉 ] = lim_{r→0} (1/r) ∑_{a=1}^r Q*_{0,a} .   (1.2.20)

Let us emphasize that the expression (1.2.17) was the result of a formal manipulation, and giving mathematical sense to such formulas is still an open problem. Indeed the asymptotic formula (1.2.15) only holds for r ≥ 1 a positive integer.

The way one makes sense of this formula in the replica method is to proceed as follows:

• Make an ansatz for the saddle point Q, such that the resulting expression depends analytically on r.

• Take the limit r → 0.

The ansatz will be chosen to be a stationary point of S(·). However, this is not sufficient to fix the choice of Q. We will start from a particularly simple choice and proceed gradually towards the full form of the 'replica symmetry breaking' ansatz. This will be presented in the context of the Sherrington-Kirkpatrick model in the next chapter.


1.2.2 Replica-symmetric ansatz

Note that the function S(·) is invariant under permutations of the indices 1, ..., r of the columns/rows of Q. Therefore there must exist stationary points that are invariant under such permutations. These must have the form (the first constraint comes from the fact that ‖σa‖2 = 1):

Q_{aa} = 1 ,  for a ∈ {0, 1, ..., r},
Q_{0a} = b ,  for a ∈ {1, ..., r},   (1.2.21)
Q_{a1a2} = q ,  for a1 ≠ a2, a1, a2 ∈ {1, ..., r}.

This is known as a replica-symmetric ansatz, or replica-symmetric saddle point, and we will use the notation Q_RS whenever useful to specify that we are using this ansatz.

We next evaluate S(Q) for this ansatz. The tricky part is to compute Tr log(Q) = log det(Q). Let Q̄ ∈ R^{r×r} be the principal submatrix of Q corresponding to rows/columns with indices 1, ..., r. By a simple application of the matrix determinant lemma,

Tr log(Q) = log( 1 − b² 〈1, Q̄^{−1} 1〉 ) + Tr log(Q̄) .   (1.2.22)

Next note that the eigenvalues of Q̄ are λ1(Q̄) = 1 + (r−1)q (with eigenvector equal to 1) and λ2(Q̄) = ··· = λr(Q̄) = 1 − q (with eigenspace given by the orthogonal complement of 1). This also implies 〈1, Q̄^{−1} 1〉 = r/(1 + (r−1)q). It thus follows that

Tr log(Q) = log( 1 − b²r/(1 + (r−1)q) ) + log( 1 + (r−1)q ) + (r−1) log(1−q) .   (1.2.23)

We can then evaluate the exponential rate S(Q), cf. Eq. (1.2.16), on this ansatz:

S(Q_RS) = hrb + (βλ/√(2 k!)) r b^k + (β²r/4) + (β²r(r−1)/4) q^k
        + (1/2) log( 1 − rb²/(1 + (r−1)q) ) + (1/2) log( 1 + rq/(1−q) ) + (r/2) log(1−q) .   (1.2.24)

While we derived this formula for r an integer, the resulting expression makes sense for arbitrary r, and in particular is of order Θ(r) as r → 0. We therefore take this limit and define Ψ_RS(b, q; β, λ, h) = lim_{r→0} r^{−1} S(Q_RS). We get

Ψ_RS(b, q; β, λ, h) = hb + (βλ/√(2 k!)) b^k + (β²/4)(1 − q^k) − b²/(2(1−q)) + q/(2(1−q)) + (1/2) log(1−q) .   (1.2.25)

We will omit the arguments β, λ, h whenever clear from the context. According to (1.2.17), we should look for a critical point of Ψ_RS. The partial derivatives are

∂Ψ_RS/∂b (b, q) = h + βλ √(k/(2 (k−1)!)) b^{k−1} − b/(1−q) ,   (1.2.26)
∂Ψ_RS/∂q (b, q) = −(kβ²/4) q^{k−1} − b²/(2(1−q)²) + q/(2(1−q)²) .   (1.2.27)

We are then led to the following conjecture.


Conjecture 1.2.1. There exists a region R_RS ⊆ R³ such that, for (β, λ, h) ∈ R_RS, we have

φ(β, λ, h) = Ψ_RS(b*, q*; β, λ, h) ,   (1.2.28)

where (b*(β, λ, h), q*(β, λ, h)) is a critical point of Ψ_RS, i.e. satisfies ∇Ψ_RS(b, q) = 0. Further, the asymptotic estimation accuracy of the estimator v_β(Y), and its norm, are given by

lim_{n→∞} E[ |〈v0, v_β(Y)〉| ] = b*(β, λ, h = 0) ,   (1.2.29)
lim_{n→∞} E[ ‖v_β(Y)‖2² ] = q*(β, λ, h = 0) .   (1.2.30)

The last two limits are motivated by Eqs. (1.2.19) and (1.2.20).

Remark 1.2.1. Note that we did not write the RS ansatz as a supremum anymore. Indeed one puzzling feature of the replica trick is that the supremum is sometimes changed into an infimum or a saddle point. This can sometimes be understood better using the variational approach, which we will consider in the next chapter.

Remark 1.2.2. The above conjecture is somewhat vague in what concerns the region of validity of the replica symmetric ansatz, R_RS. Indeed our calculation gives no clue as to where the latter does hold. We will see in Section 1.2.6 that the conjecture cannot possibly be correct for all values of the parameters.

The replica-symmetry breaking calculation of Section 1.4 allows one to determine the precise region of validity R_RS.

Remark 1.2.3. The asymptotic free energy density of this model was characterized rigorously by Michel Talagrand (for even k) [Tal06a] and Wei-Kuo Chen (for general k) [Che13], in the case λ = 0. In fact these papers cover all values of β, beyond the regime within which the replica symmetric asymptotics is correct, and prove a more general, replica symmetry breaking, expression. For pointers to the mathematical literature in the case λ > 0 we refer to the bibliographical notes at the end of this chapter.

In the next sections we will consider some special choices of the parameters (β, λ, k) that lead to special simplifications. This will help to develop some intuition about the replica symmetric asymptotics and to understand some of its implications. Throughout, we will set h = 0 and omit the argument h.

1.2.3 Special cases: I. The sequence model, k = 1

In the cases k ∈ {1, 2} the replica symmetric ansatz is expected to be correct. There are several heuristic reasons for this, which should become clear once we discuss replica-symmetry breaking. One important remark is that in these cases the Hamiltonian H(σ) has a unique local maximum on S^{n−1} and hence the Gibbs distribution µ_{β,λ,n} is unimodal. As we will see, replica-symmetry breaking is related to a (quite dramatic) breakdown of unimodality.

Further, the model (1.1.1) can be treated by elementary means in these cases. It corresponds to the standard sequence model for k = 1 and to the symmetric spiked matrix model for k = 2 (for which classical tools from random matrix theory can be used).

For k = 1, the model (1.1.1) reduces to

y = λ v0 + w .   (1.2.31)

In words: we are observing a uniformly random unit vector v0 ∈ R^n, ‖v0‖2 = 1, corrupted by i.i.d. Gaussian noise.

Equation (1.2.25) yields

Ψ_RS(b, q; β, λ) = (βλ/√2) b + (β²/4)(1−q) − b²/(2(1−q)) + q/(2(1−q)) + (1/2) log(1−q) .   (1.2.32)

Solving the equations ∇Ψ_RS(b, q) = 0 yields4

q*(β, λ) = 1 + x*(β, λ) − √( 2x*(β, λ) + x*(β, λ)² ) ,   (1.2.33)
b*(β, λ) = (βλ/√2) ( −x*(β, λ) + √( 2x*(β, λ) + x*(β, λ)² ) ) ,   (1.2.34)
x*(β, λ) ≡ 1/(β²(1 + λ²)) .   (1.2.35)

Substituting in Eq. (1.2.32) we obtain the RS prediction for φ(β, λ). It is interesting to further specialize this result.

Consider Bayes estimation, corresponding to β = λ√2. Using Eqs. (1.2.33) to (1.2.35), it is immediate that

b_Bayes(λ) = q_Bayes(λ) = λ²/(1 + λ²) ,   (1.2.36)

where we used the notation b_Bayes(λ) = b*(β = λ√(2/k!), λ), q_Bayes(λ) = q*(β = λ√(2/k!), λ). Hence, Conjecture 1.2.1 implies

lim_{n→∞} E[ |〈v0, v_Bayes(y)〉| ] = b_Bayes(λ) = λ²/(1 + λ²) ,   (1.2.37)
lim_{n→∞} E[ ‖v_Bayes(y) − v0‖2² ] = 1 − 2 b_Bayes(λ) + q_Bayes(λ) = 1/(1 + λ²) .   (1.2.38)

It is easy to rederive Eq. (1.2.37) using classical tools. Here is a sketch. Recall that the signal v0 is uniformly distributed on the unit sphere S^{n−1}. For large n, this distribution is close to the isotropic Gaussian. We will therefore assume, without further justification, the simpler model v0 ∼ N(0, I_n/n), and defer to Exercise ?? the task of dealing with the uniform distribution over S^{n−1}. In this case, the Bayes posterior is also Gaussian. A simple calculation yields

µ_{Bayes,Gauss}(dσ) = (1/Z) exp{ −(n/2)(1 + λ²) ‖σ‖2² + nλ 〈y, σ〉 } dσ .   (1.2.39)

We therefore have

v_Bayes(y) = (λ/(1 + λ²)) y ,   (1.2.40)

whence Eqs. (1.2.37), (1.2.38) follow.

4There are in fact two solutions. We select the branch that has q∗(β, λ)→ 0 as β → 0.
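A Monte Carlo check of Eqs. (1.2.37)-(1.2.38) (Python with NumPy), using the Gaussian prior v0 ∼ N(0, I_n/n) and the estimator (1.2.40):

import numpy as np

rng = np.random.default_rng(3)
n, lam, trials = 2000, 1.5, 200
overlap, mse = [], []
for _ in range(trials):
    v0 = rng.standard_normal(n) / np.sqrt(n)
    y = lam * v0 + rng.standard_normal(n) / np.sqrt(n)   # model (1.2.31)
    vhat = lam / (1 + lam**2) * y                        # Eq. (1.2.40)
    overlap.append(np.dot(v0, vhat))
    mse.append(np.sum((vhat - v0) ** 2))
print(np.mean(overlap), lam**2 / (1 + lam**2))   # cf. Eq. (1.2.37)
print(np.mean(mse), 1 / (1 + lam**2))            # cf. Eq. (1.2.38)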


1.2.4 Special cases: II. The spiked matrix model, k = 2

For k = 2, the model (1.1.1) yields

Y = λ v0 v0^T + W ,   (1.2.41)

where W ∈ R^{n×n}, W = W^T, is a matrix sampled from the GOE(n) (Gaussian Orthogonal Ensemble) distribution. Namely, (W_{ij})_{i≤j} are independent random variables with W_{ii} ∼ N(0, 2/n) for 1 ≤ i ≤ n, and W_{ij} ∼ N(0, 1/n) for 1 ≤ i < j ≤ n.

Equation (1.2.25) yields

Ψ_RS(b, q; β, λ) = (βλ/2) b² + (β²/4)(1−q²) − b²/(2(1−q)) + q/(2(1−q)) + (1/2) log(1−q) .   (1.2.42)

The stationarity condition ∇Ψ_RS(b, q) = 0 has three solutions with q*, b* ≥ 0, corresponding to different 'phases' of the model.

1. Uninformative/paramagnetic phase:

q_P(β, λ) = b_P(β, λ) = 0 .   (1.2.43)

Remembering the interpretation of q*(β, λ), b*(β, λ) in Conjecture 1.2.1, we obtain

lim_{n→∞} E[ ‖v_β(Y)‖2² ] = lim_{n→∞} E[ |〈v0, v_β(Y)〉| ] = 0 .   (1.2.44)

In other words, under µ_n(dσ), σ has zero mean. Obviously the estimator v_β(Y) is not useful in this case. Further analysis reveals that µ_n(dσ) is roughly uniform, in the sense that the distribution of (√n σ_{i1}, ..., √n σ_{iℓ}) converges to N(0, I_ℓ) for any fixed ℓ as n → ∞. On the other hand, correlations are strong enough for σσ^T to be positively correlated with Y. Namely, using Lemma 1.1.1, the replica symmetric formula implies

lim_{n→∞} E[ ∫_{S^{n−1}} 〈Y, σσ^T〉 µ_n(dσ) ] = 2 ∂Ψ_RS/∂β (0, 0; β, λ) .   (1.2.45)

We expect the paramagnetic solution to give the correct asymptotics for β, λ small. Expanding around this stationary point, we get

Ψ_RS(b, q) = β²/4 − (1/2)(1 − βλ) b² + (1/4)(1 − β²) q² + o(b², q²) .   (1.2.46)

Note that for β < 1 and λ < 1/β, (b_P, q_P) is a maximum with respect to b and a minimum with respect to q. At these boundaries, new solutions appear.

2. Symmetric/spin glass phase. For β > 1 a distinct solution appears:

q_SG(β, λ) = 1 − 1/β ,   (1.2.47)
b_SG(β, λ) = 0 .   (1.2.48)

In this case the measure µ_n(dσ) has non-zero mean v_β(Y), with

lim_{n→∞} E[ ‖v_β(Y)‖2² ] = 1 − 1/β .   (1.2.49)


However this mean is random and asymptotically uncorrelated with v0: lim_{n→∞} E[〈v_β(Y), v0〉] = 0. Again, the estimator v_β(Y) is not useful in this case: the Gibbs measure is concentrated around a random direction, which is however uncorrelated with the signal. Notice that this is particularly dangerous from an inference perspective, since it might lead to overconfidence in a completely noisy estimate.

The value of the free energy is

Ψ_RS(b_SG, q_SG) = β − 3/4 − (1/2) log β .   (1.2.50)

Expanding around this point, we get

Ψ_RS(b, q) = Ψ_RS(b_SG, q_SG) + (1/2) β²(β−1) (q − q_SG)² − (1/2) β(1−λ) (b − b_SG)²   (1.2.51)
           + o( (b − b_SG)² + (q − q_SG)² ) .   (1.2.52)

Hence for β > 1 (which is required for this to be a solution) and λ < 1, (b_SG, q_SG) is a maximum with respect to b and a minimum with respect to q. At the boundary λ = 1, a new solution appears.

3. Recovery/ferromagnetic phase. This solution exists for β < 1, λ > 1/β, or β ≥ 1, λ > 1:

q_R(β, λ) = 1 − 1/(βλ) ,   (1.2.53)
b_R(β, λ) = √( (1 − 1/λ²)(1 − 1/(βλ)) ) .   (1.2.54)

Which of the three solutions above should we pick? Of course for large λ the recovery solution (q_R, b_R) is the one that makes sense, while for small λ we should switch to either (q_P, b_P) or (q_SG, b_SG). In particular, for λ small and β small, we expect µ_n(dσ) to be approximately uniform over the sphere and hence (q_P, b_P) should make sense. At first sight, the derivation in the previous section, and in particular Eq. (1.2.17), seems to suggest that we have to maximize Ψ_RS and hence select the stationary point that corresponds to the largest value of Ψ_RS(q, b). Somewhat surprisingly, this turns out to be incorrect. For instance, for λ = 0 and β small, (q_P, b_P) is the only stationary point, and it seems to make intuitive sense. However, it is a maximum with respect to b and a minimum with respect to q.

Indeed, the following two heuristic rules are normally used:

(A) The correct solution (q*, b*) should be a maximum with respect to b and a minimum with respect to q. Namely, Ψ_RS(q*, b*) = max_b Ψ_RS(q*, b) and Ψ_RS(q*, b*) = min_q Ψ_RS(q, b*).

(B) At a phase transition line between solutions (q1, b1) and (q2, b2) we must have Ψ_RS(q1, b1) = Ψ_RS(q2, b2).

Rule (B) follows from the observation that Φ_n(β, λ) is uniformly continuous in the parameters (β, λ). We will provide a justification of rule (A) when we study the interpolation method in the next chapter.

[Figure: phase diagram in the (β, λ) plane, β, λ ∈ [0, 2], showing the Uninformative, SG, and Recovery regions.]

Figure 1.1: The phase diagram of the spiked matrix model (case k = 2 of the spiked tensor).

Some terminology is useful at this point. The model parameters (β, λ) take values in R = R²≥0. In the present context, a phase diagram is a partition of the space of parameters R into regions R = R1 ∪ R2 ∪ ... such that, within region Ri, a specific solution qi(β, λ), bi(β, λ), depending analytically on β, λ, yields the correct prediction for the free energy. Each region typically corresponds to a qualitatively different behavior of the Gibbs measure µn, as in the present example.

Figure 1.1 depicts the phase diagram for the k = 2 model:

Uninformative : R_P = { (β, λ) : β < 1, λ < 1/β } ,   (1.2.55)
Spin glass : R_SG = { (β, λ) : β ≥ 1, λ < 1 } ,   (1.2.56)
Recovery : R_R = { (β, λ) : λ ≥ max(1, 1/β) } .   (1.2.57)

We can use Eqs. (1.2.53), (1.2.54) to compute the mean square error in the recovery phase:

MSE(β, λ) = lim_{n→∞} E[ min( ‖v_β(Y) − v0‖2², ‖v_β(Y) + v0‖2² ) ] .   (1.2.58)

We get, for (β, λ) ∈ R_R,

MSE(β, λ) = 1/λ² + [ √(1 − 1/λ²) − √(1 − 1/(βλ)) ]² .   (1.2.59)

It is interesting to consider two specific lines in the (β, λ) plane, corresponding to Bayes optimal estimation (and therefore minimax optimal estimation, since the uniform prior is least favorable), and maximum likelihood estimation.

On the Bayes line β = λ, the mean square error is minimized:

mse_Bayes(λ) = min( 1, 1/λ² ) .   (1.2.60)

The limit β → ∞ corresponds to the maximum likelihood estimator v_∞(Y) = v1(Y), and

mse_ML(λ) = 2 − 2 √( (1 − 1/λ²)₊ ) .   (1.2.61)


Figure 1.2: Spiked tensor model with k = 3, on the Bayes line. Left: Mutual information. Right:Order parameter.

As mentioned above, for k = 2 the replica results can be proved easily by random matrix theory, cf. Exercise 1.5.

1.2.5 Special cases: III. Bayes-optimal estimation

Bayes optimal estimation is recovered for β = λ√(2/k!), and the fact that in this case the Gibbs measure µn coincides with the posterior leads to additional simplifications. In this case (setting h = 0), the replica symmetric stationarity conditions ∇_{(b,q)} Ψ_RS = 0, cf. Eqs. (1.2.26), (1.2.27), admit a solution with q = b. Indeed, this identity is expected to hold as a consequence of the interpretation of the 'order parameters' b, q in Conjecture 1.2.1:

E[ 〈v0, v_Bayes(Y)〉 ] = E[ 〈E[v0|Y], v_Bayes(Y)〉 ] = E[ ‖v_Bayes(Y)‖2² ] .   (1.2.62)

Substituting this in the replica symmetric free energy (1.2.25), for β = λ√(2/k!), we obtain Ψ_Bayes(b; λ) ≡ Ψ_RS(b, q = b; β = λ√(2/k!), λ, h = 0),

Ψ_Bayes(b; λ) = (λ²/(2 k!)) (1 + b^k) + b/2 + (1/2) log(1−b) .   (1.2.63)

Lemma 1.1.2 suggests that Ψ_Bayes(b; λ) should determine the asymptotic behavior of the mutual information. This fact can be proved rigorously (we refer to the bibliographic notes for further references).

Theorem 1 ([LML+17]). With the above definitions,

lim_{n→∞} (1/n) I(v0;Y) = λ²/k! − sup_{b≥0} Ψ_Bayes(b; λ) .   (1.2.64)

Setting to 0 the derivative of Ψ_Bayes(b; λ) with respect to b (or, equivalently, using Eqs. (1.2.26), (1.2.27)), we obtain that the stationary point b* must satisfy

b = ξ b^{k−1} / (1 + ξ b^{k−1}) ,   ξ ≡ λ²/(k−1)! .   (1.2.65)

The behavior of the solution is illustrated in Figures 1.2, 1.3. The uninformative point b_P(λ) = 0 is always a solution of Eq. (1.2.65), and a local maximum of the free energy b ↦ Ψ_Bayes(b; λ).


Figure 1.3: Same as in Figure 1.2: zoom around the critical point.

For λ > λ_s(k), two new solutions appear: a second local maximum b_R(λ) > 0, corresponding to non-trivial recovery, and a local minimum 0 < b_unst < b_R. The point at which a pair of stationary points of the free energy functional appears is referred to as a 'spinodal point' in statistical physics. In this case, it is explicitly given by

λ_s(k)²/(k−1)! = (k−1)^{k−1} / (k−2)^{k−2} .   (1.2.66)

The boundary between the two phases, λ_c(k) > λ_s(k), is determined by the condition

Ψ_Bayes(b_P(λ); λ) = Ψ_Bayes(b_R(λ); λ) .   (1.2.67)

Solving this equation numerically, we get λ_c(k = 3) ≈ 2.955; a minimal numerical sketch is given below.
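This numerical computation is easily reproduced. The sketch below (Python with NumPy) locates λ_c(3) by bisection on the condition (1.2.67), using Ψ_Bayes from Eq. (1.2.63):

import math
import numpy as np

def psi_bayes(b, lam, k):
    # Eq. (1.2.63)
    return lam**2 / (2 * math.factorial(k)) * (1 + b**k) + b / 2 + 0.5 * np.log(1 - b)

def recovery_gap(lam, k):
    # sup_{b>0} Psi_Bayes minus the uninformative value Psi_Bayes(0)
    b = np.linspace(1e-6, 1 - 1e-6, 200001)
    return np.max(psi_bayes(b, lam, k)) - psi_bayes(0.0, lam, k)

lo, hi = 2.9, 3.0               # the gap is negative at 2.9 and positive at 3.0
for _ in range(50):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if recovery_gap(mid, 3) > 0 else (mid, hi)
print(0.5 * (lo + hi))          # approx 2.955, as quoted in the text
print(math.sqrt(8))             # the spinodal lambda_s(3), from Eq. (1.2.66)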

1.2.6 Special cases: IV. Pure noise, λ = 0

For λ = 0, the condition ∂Ψ_RS/∂b = 0, cf. Eq. (1.2.26), implies b = 0. Substituting in Eq. (1.2.25), and dropping the arguments b = 0, λ = 0, we get

Ψ_RS(q; β) = (β²/4)(1 − q^k) + q/(2(1−q)) + (1/2) log(1−q) .   (1.2.68)

The stationarity condition (1.2.27) is satisfied provided

q/(1−q)² = (kβ²/2) q^{k−1} .   (1.2.69)

For every β, this admits a paramagnetic solution q_P = 0, which is a local minimum of Ψ_RS. For β > β_{s,RS}(k), two new solutions appear, 0 < q_unst < q_SG, which are the solutions of

q^{k−2} (1−q)² = 2/(kβ²) .   (1.2.70)

The spinodal point is given explicitly by

β_{s,RS}(k)² = k^{k−1} / (2 (k−2)^{k−2}) .   (1.2.71)


The spin glass solution q_SG is a local minimum of Ψ_RS and becomes a global minimum for β > β_{c,RS}(k), where the critical point β_{c,RS}(k) > β_{s,RS}(k) is given by the condition Ψ_RS(q_P; β) = Ψ_RS(q_SG; β); see the numerical sketch below.

We append the subscript RS to emphasize that, in this case, the replica symmetric solution is only a good approximation but ultimately incorrect (i.e. not asymptotically exact as n → ∞). Within the replica method, this can be seen after we study replica-symmetry breaking.
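A minimal numerical sketch for the pure-noise case (Python with NumPy): solve Eq. (1.2.70) for q_SG on a grid and monitor the free energy difference Ψ_RS(q_SG; β) − Ψ_RS(0; β), whose sign change locates β_{c,RS}; here k = 3:

import numpy as np

def psi_rs(q, beta, k=3):
    # Eq. (1.2.68)
    return beta**2 / 4 * (1 - q**k) + q / (2 * (1 - q)) + 0.5 * np.log(1 - q)

def q_sg(beta, k=3):
    # largest root of q^{k-2} (1-q)^2 = 2/(k beta^2), when it exists
    q = np.linspace(1e-6, 1 - 1e-6, 200001)
    vals = q**(k - 2) * (1 - q)**2 - 2 / (k * beta**2)
    crossings = np.where(np.diff(np.sign(vals)) != 0)[0]
    return q[crossings[-1]] if len(crossings) else None

for beta in np.arange(2.1, 2.45, 0.05):
    q = q_sg(beta)
    if q is not None:
        print(round(beta, 2), q, psi_rs(q, beta) - psi_rs(0.0, beta))

For k = 3, the spin glass branch exists for β above β_{s,RS} ≈ 2.12 (cf. Eq. (1.2.71)), and the sign of the last column changes near β ≈ 2.2, locating β_{c,RS}.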

1.3 The replica symmetric phase diagram, h = 0

For h = 0, and general λ, β, the stationarity condition ∇Ψ_RS(b, q) = 0 reduces to (cf. Eqs. (1.2.26), (1.2.27)):

βλ √(k/(2 (k−1)!)) b^{k−1} = b/(1−q) ,   (1.3.1)
(kβ²/4) q^{k−1} + b²/(2(1−q)²) = q/(2(1−q)²) .   (1.3.2)

We note that q_P = b_P = 0 is always a stationary point for this problem. This corresponds to the paramagnetic/uninformative fixed point. To obtain the spin-glass stationary point, we set b_SG = 0 and solve for q_SG, obtaining the fixed point equation

q_SG^{k−2} (1 − q_SG)² = 2/(kβ²) .   (1.3.3)

Finally, the recovery fixed point can be obtained by solving for b_R, q_R. As usual, we expect the recovery fixed point to be the correct one for λ large, while the paramagnetic or the spin glass fixed point should be the correct one for small β, λ.

1.4 Replica-symmetry breaking

1.4.1 The need for replica-symmetry breaking

Throughout the previous section, we have implicitly assumed that the RS formula 1.2.25gives the correct formula for the limiting free energy. However, it is known that for β largeenough and λ sufficiently small, ΨRS no longer gives the correct free energy. Historically,this was determined first in the context of the Sherrington-Kirkpatrick (SK) model, whichwe will study in the next chapter. Here we will give a short preview about this model, tomotivate the need to go beyond replica symmetry.

The SK model is a Gibbs probability measure over σ ∈ ±1n. Let W ∼ GOE(n) bea random matrix from the Gaussian Orthogonal Ensemble (this coincides with the casek = 2 of the random tensor defined in the previous sections). At any inverse temperatureβ > 0, we consider the Gibbs measure and free energy

µβ(σ) ≡ 1

Zn(β)eH(σ) , H(σ) =

β

2

∑i,j

Wijσiσj , (1.4.1)

Φn(β) =1

nE log

∑σ∈±1n

eH(σ). (1.4.2)

23

A direct computation reveals that

Φn(β)− β∂Φn

∂β=

1

nEnt(µβ) ≥ 0.

Here Ent(ν) is the Shannon entropy of the discrete probability distribution ν: Ent(ν) :=−∑x ν(x) log ν(x). If the RS ansatz is correct, then limn→∞Φn(β) = ΨRS(β) and therefore

0 ≤ limn→∞

1

nEnt(µβ) = ΨRS(β)− β∂ΨRS

∂β(β) . (1.4.3)

(Note that the limit n→∞ and the derivative with respect to β can be exchanged becauseβ 7→ Φn(β) is convex.)

In their 1975 paper, Sherrington and Kirkpatrick [SK75] computed the RS predictionΨRS(β): we will repeat the same calculation in the next chapter. Among other things theycomputed the prediction for the asymptotic entropy density (1.4.3), and showed that thenon-negativity constraint is violated for β > β0, with β0 a sufficiently large constant.

This implies that the replica symmetric free energy fails to provide the correct answer atsufficiently low temperature. Surprisingly, the phyisicists’ reaction was not that the replicamethod (i.e. guessing the limit as the number of replicas r → 0 from the computationsfor integer r) was incorrect, but that the the assumption of a replica-symmetric stationarypoint Q, cf. Eq. (1.2.21) has to be replaced by a different one. Even if the functional S(Q)is invariant under permutations of the r replicas, the dominating stationary points hadto be non-symmetric. In physics language, the group of symmetries Sr is ‘spontaneouslybroken.’

1.4.2 One-step replica symmetry breaking (1RSB) free energy

The correct form the matrix Q (the correct way to break the replica symmetry) wasintroduced by Parisi in 1979 [Par79a]. The construction of the replica symmetry breaking(RSB) matrix Q is hierarchical and proceeds in rounds. Here we will describe the first stepof this construction (one-step replica symmetry breaking, or 1RSB), which is sufficient forthe model studied here. The full construction will be described in the next chapter5.

In 1RSB, we divide the indices [r] = 1, · · · , r into r/m blocks, each block having melements. We note that the functional S(Q) in (1.2.16) is invariant to the permutation ofthe indices in each block, as well to the permutation of the distinct blocks. These forma subgroup of the group of permutations Sr. We look for a stationary point Q that is

invariant under this subgroup. Namely, we fix a partition [r] = ∪r/m`=1 I`, |I`| = m and set

Qaa = 1 a ∈ 0, 1, · · · , r/m,Q0a = b a ∈ 1, · · · , r/m,Qab = q1 a, b ∈ I`, ` ∈ 1, · · · , r/m,Qab = q0 otherwise.

It is not difficult to check that this is indeed the most general matrix that is invariantunder the subgroup described above (under the additional condition Qaa = 1 for all a thatfollows from the intepretation of Q).

5An interesting question is the following: ‘Is there a simple criterion to decide how many steps of theRSB construction need to be taken to get the exact asymptotics of the free energy?’ While physicists havesome intuition about this, no good mathematical theory exists to provide guidance, and this question isaddressed on a case-by-case basis.

24

This is usually referred to as the 1RSB ansatz and we will refer to the stationary pointas Q1RSB when we use this ansatz. We next compute S(Q1RSB). The tricky part againis to compute Tr log(Q1RSB). To this end, we proceed similarly to the replica-symmetricanalysis. We partition the matrix and denote by Q as the r × r submatrix correspondingto the indices 1, · · · , r. Using the Schur-complement formula, we have,

Tr log(Q1RSB) = log(1− b2〈1, Q−11〉) + Tr log(Q). (1.4.4)

To evaluate the expression above, we need a detailed understanding of the eigenvaluesand eigenvectors of the matrix Q. It is not hard to see that λ1 = 1 + (m− 1)q1 + (r−m)q0

is an eigenvalue with eigenvector 1 with multiplicity one. The other distinct eigenvaluesare λ2 = 1 − q1 + m(q1 − q0) with multiplicity (r/m) − 1 (the corresponding eigenvectorsbeing vectors that are constant within blocks, and orthogonal to the all-ones vector), andλ3 = 1− q1 with multiplicity (r/m)(m−1) (for the orthogonal complement of the previoussubspaces). Now, we can compute

〈1, Q−11〉 =

r

1 + (m− 1)q1 + (r −m)q0. (1.4.5)

Therefore (1.4.4) immediately implies that

Tr log(Q1RSB) = log(

1− b2r

1 + (m− 1)q1 + (r −m)q0

)+ log(1 + (m− 1)q1 + (r −m)q0)

+( rm− 1)

log(1− q1 +m(q1 − q0)) +r

m(m− 1) log(1− q1). (1.4.6)

Recall the functional S(Q) from (1.2.16):

S(Q) =βλ√2(k!)

k∑a=1

Qk0a +β2

4

r∑a,b=1

Qkab +1

2Tr log(Q). (1.4.7)

To evaluate this functional under the 1RSB ansatz, we note that:

k∑a=1

Qk0a = rbk,

k∑a,b=1

Qkab = r +r

mm(m− 1)qk1 + (r2 − r

mm2)qk0 . (1.4.8)

We define the 1RSB free energy functional

Ψ1RSB(b, q0, q1,m) = limr→0

1

rS(Q1RSB). (1.4.9)

Plugging in the expressions obtained above and taking the limit as r → 0, we obtain

Ψ1RSB(b, q0, q1,m) =βλ√2(k!)

bk +β2

4[1− (1−m)qk1 −mqk0 ] +

1

2

q0 − b21− (1−m)q1 −mq0

+1

2mlog(1− (1−m)q1 −mq0)− 1−m

2mlog(1− q1). (1.4.10)

Notice that Ψ1RSB also depends on β, λ, but we will omit this dependency unless needed.Before analyzing the 1RSB free energy, we carry out some sanity checks to ensure the

consistency of this solution with the RS solution studied earlier. Notice that the space ofreplica symmetric matrices Q is a subset of the set of 1RSB matrices, and can be recoveredin at least three different ways:

25

1. q1 = q0 = q. In this case, it is easy to check that the free energy is independent of mand

Ψ1RSB(b, q, q,m) = ΨRS(b, q) .

2. m → 0. It is somewhat less obvious that we should recover the RS ansatz in thiscase, but we recall that r = 0, and therefore this limit coincides with m→ r. Indeed,we obtain

limm→0

Ψ1RSB(b, q0, q1,m) = ΨRS(b, q1) .

3. m = 1. In this case the 1RSB ansatz Q1RSB reduces to the RS one with q = q0. It iseasy to see that in this case Ψ1RSB(b, q0, q1,m)→ ΨRS(b, q0).

Now we analyze the stationary points of the 1RSB free energy functional Ψ1RSB. Tosimplify the analysis, throughout this discussion, we set

λ = 0 .

We will leave it as an exercise for the reader to generalize the discussion below to λ > 0.Recall, from equation (1.2.29) that b is the asymptotic scalar product between the signal

v0 and the estimate vβ(Y ). As a result, we expect b = 0 in for λ = h = 0, and indeed,this choice satisfies the stationarity condition ∂bΨ1RSB = 0.

In order to determine q0, q1, we need to solve the stationarity conditions:

∂Ψ1RSB

∂q0=∂Ψ1RSB

∂q1= 0.

It is easy to check that there are always stationary points of the form (q0 = 0, q1). We willrestrict ourselves to stationary points of this form. This assumption is in fact not required,and we could study the larger space of solutions with q0 ≥ 0, but eventually the correctsolutions (maximizing Ψ1RSB) are obtained by setting q0 = 0. Although the choice q0 = 0appears ad hoc at this point, its motivation will become clear when we discuss the purestates decomposition of the state space under the 1RSB heuristic in the rest of this section.

Setting q0 = b = 0 (and dropping for simplicity the dependence upon these arguments),the free energy functional reduces to the following simple form:

Ψ1RSB(q1,m) =β2

4[1− (1−m)qk1 ] +

1

2mlog(1− (1−m)q1)− 1−m

2mlog(1− q1).

(1.4.11)

∂Ψ1RSB

∂q1= −kβ

2

4(1−m)qk−1

1 +1−m

2

q1

(1− (1−m)q1)(1− q1).

For k ≥ 3, q1 = 0 is always a stationary point. This corresponds to the paramagneticreplica symmetric stationary point studied in the previous sections. Thus we look forconditions under which other non-vanishing stationary points appear. If we set q1 6= 0, anystationary point must satisfy the equation

f1(q1) := qk−21 (1− q1)(1− (1−m)q1) =

2

kβ2=

2

kT 2. (1.4.12)

26

0.00

0.05

0.10

0.15

0.20

0.25

0.00 0.25 0.50 0.75 1.00q1

f 1

(a)

0.00

0.25

0.50

0.75

1.00

0.00 0.05 0.10 0.15 0.20 0.25T

q 1*(β

, m)

(b)

Figure 1.4: On the left, we plot f1 as a function of q1, k = 3, m = 0.5. In the high-temperatureregime (shown in BROWN) (1.4.12) does not have any solution. New solutions appear at

T = Td(m) (shown in RED). For T < Td(m) (shown in BLUE), there are two roots—the largerroot is physically relevant. On the right, we

Here T = 1/β is the temperature.The behavior of the fixed points is shown in Fig 1.4. We note that for T > Td(m),

the fixed point equation has no roots and therefore the paramagnetic fixed point is theonly solution. Therefore, we expect that for T sufficiently large, the replica symmetric freeenergy gives the right solution. Two new solutions appear for T < Td(m). We will choosethe larger root q1,∗(β,m) as the physically meaningful stationary point. To motivate thischoice, recall the RS heuristic, where (1.2.30) motivates that q∗ → 1 as β →∞. Althoughwe are working with the 1RSB heuristic, this choice turns out to be the correct one.Further, analysis of the free energy functional reveals that for T > Td(m), q1,∗(β) is a localminimizer of the free energy functional, and Ψ1RSB(q1,∗(β,m),m) < Ψ1RSB(0,m). Theexplicit dependence of q1,∗(β,m) on T and effect of varying m are exhibited in Figure 1.4.

This clarifies the choice of q1 for a fixed m,β. We next address the choice of m.

1.4.3 Interpretation of the 1RSB ansatz and choice of m

Following the same heuristics as that for q0, q1, we will seek to minimize the functionalwith respect to m. However, we first need to specify the domain of the parameter m. Tothis end, we postpone this analysis for the moment, and go back to the replica method todevelop an interpretation of the parameter m.

Recall the Gibbs measure µn of Eq. (1.1.9), which we will write in compact form as

µn(dσ) =1

ZneHn(σ) dσ , (1.4.13)

with dσ the uniform probability measure on the sphere Sn−1. Let σ1,σ2 ∼i.i.d. µn (equiv-alently (σ1,σ2) ∼ µn ⊗ µn). We will compute the moments of 〈σ1,σ2〉, averaged withrespect to the law of the Gaussian tensor W .

For any integer `, we have

E[ ∫

(Sn−1)2〈σ1,σ2〉`µn(dσ1)µn(dσ2)

]= E

[Z−2n

∫(Sn−1)2

〈σ1,σ2〉`eHn(σ1)+Hn(σ2)dσ1dσ2

].

Using the replica trick, we have,

E[ ∫

(Sn−1)2〈σ1,σ2〉`µn(dσ1)µn(dσ2)

]= lim

r→0

1

E[Zrn]E[Zr−2n

∫(Sn−1)2

〈σ1,σ2〉`eHn(σ1)+Hn(σ2))dσ1dσ2

].

27

We evaluate the right-hand side for r a positive integer. Using the definition of the partitionfunction Zn, we obtain,

E[Zr−2n

∫(Sn−1)2

〈σ1,σ2〉`eHn(σ1)+Hn(σ2)dσ1dσ2

]= E

[ ∫(Sn−1)r

〈σ1,σ2〉`eHn(σ1)+···+Hn(σr)dσ1 · · · dσr]

=2

r(r − 1)

∑1≤a<b≤r

E[ ∫〈σa,σb〉`e

∑ra=1H(σa)dσ1 · · · dσr

]=

2

r(r − 1)

∑1≤a<b≤r

∫Q`ab exp (ng(Q))fn,r(Q)dQ. (1.4.14)

Proceeding analogously for the factor E[Zrn], and inverting as usual the limits r → 0 andn→∞ (and recalling the definition of S(Q)), we obtain

limn→∞

E[ ∫

(Sn−1)2〈σ1,σ2〉`µn(dσ1)µn(dσ2)

]= lim

r→∞

2

r(r − 1)

∑1≤a<b≤r

limn→∞

∫Q`ab exp (nS(Q))dQ∫

exp (nS(Q))dQ.

(1.4.15)

As n → ∞, the integral is dominated by the overlap matrix Q∗ which maximizes thefunctional S(Q). We will evaluate the functional at the 1RSB stationary point Q1RSB:

limn→∞

E[ ∫

(Sn−1)2〈σ1,σ2〉`µn(dσ1)µn(dσ2)

]= lim

r→0

2

r(r − 1)

∑1≤a<b≤r

(Q1RSBab )`.

Next, we compute the right-hand side for the 1RSB stationary point Q1RSB to obtain∑1≤a<b≤r

(Q1RSBab )` = −r

2(1−m)q`1 −

r

2(1− (1−m))q`0 +O(r2).

This implies

limn→∞

E[ ∫〈σ1,σ2〉`µn(dσ1)µn(dσ2)

]= mq`0 + (1−m)q`1.

If we set ρn to be the law of the overlap 〈σ1,σ2〉, then the heuristic computation abovesuggests that, as n→∞,

ρnw⇒ ρ := mδq0 + (1−m)δq1 . (1.4.16)

(Herew⇒ denotes weak convergence.)

Equation (1.4.16) suggests that we should minimize the free energy functional overm ∈ [0, 1]. Further, it implies the following picture for the behavior of the random Gibbsdistribution µn (We will focus for simplicity on the case β = λ = h = 0). For β sufficientlysmall, we expect the replica symmetric ansatz to be correct, and q0 = q1 = 0 or m = 1. Inthis case, two iid samples σ1,σ2 are typically approximately orthogonal, i.e., 〈σ1,σ2〉 ≈ 0.This behavior is observed for iid samples from the uniform distribution on Sn−1. In thisrespect, for β sufficiently small, the measure µβ is qualitatively similar to the uniformdistribution: it does not have strong correlation among its variables.

28

When β crosses a critical point, a genuine the 1RSB solution appears (i.e. a solutionwith q0 < q1 and m ∈ (0, 1)). Hence, the behavior of the overlap distribution changesdramatically. In this case, for σ1,σ2 ∼i.i.d. µβ, the overlap 〈σ1,σ2〉 concentrates aroundone of two values, either q0 = 0 or q1 > 0. The interpretation is as follows. The measure µnconcentrates around regions of sphere which correspond to high values of the HamiltonianHn( · ). For β sufficiently large, there are O(1) such regions on the sphere, that are well-separated, i.e. separated a distance of order one. These regions are often referred to as“pure states” or “clusters” in the physics parlance. The fact that q0 = 0 indicates thatthese regions are approximately orthogonal. For two iid samples drawn from the measureµn, either they both belong to the same cluster, in which case their overlap concentratesaround q1, or they belong to different clusters, in which case their overlap concentratesaround 0. Intuitively, we expect q1 → 1 as β → ∞, as the pure state concentrates veryclose to the configurations with the highest values of the Hamiltonian Hn( · ).

We can decompose the Gibbs measure into pure states as

µ(dσ) =N∑α=1

wαµn,α(dσ) , (1.4.17)

µn,α(dσ) =1

µn(Ωα)µn(dσ)1Ωα(σ) , wα = µn(Ωα) . (1.4.18)

Under this description of the measure µβ, the parameter m assumes added significance.Note that wα is the probability that σ belongs to state α, and therefore the probabilitythat two replicas belong to the same state is given by

∑αw

2α. Therefore we expect

1−m = limn→∞

E[ N∑α=1

w2a

]. (1.4.19)

As we will see below, the weights (wα) are random and not equal, even in the limit n→∞.If we nevertheless assumed there was a non random number of states N , all of equal size, theabove would imply N = 1/(1−m). While this last statement is —as we said— incorrect,and in fact we will correct it below, the intuition is still valid. The number m quantifiesthe number of pure states, with m close to 0 corresponding to one or a few pure states,while m close to one corresponding to a large number of pure states (still order one in n).

1.4.4 Back to the free energy

Next, we return to the study of the 1RSB free energy functional Ψ1RSB(q1,m). We wishto minimize this functional jointly with respect to q1,m. We have already studied theminimization problem with respect to q1, for any fixed value of m ∈ [0, 1] yielding theminimizer q1,∗(β,m).

We now study the minimization of Ψ1RSB(q1,∗(β,m),m) as a function of m. Recall that

Ψ1RSB(0,m) =β2

4= ΨRS(q0 = 0),

Ψ1RSB(q1, 1) =β2

4= ΨRS(q0 = 0).

Thus if we set q1 = 0, the value is independent of m. We further recall that q1 = 0 is alwaysa stationary point. Thus at sufficiently high temperature, q1 = 0 is the only stationarypoint, and we recover the replica symmetric solution.

29

0.000

0.005

0.010

0.015

0.0 0.5 1.0 1.5m

g(m

)

T > Td(1)

(a)

0.000

0.001

0.002

0.003

0.0 0.6 1.0 1.2m

g(m

)

Ts<T<Td(1)

(b)

0.000

0.005

0.010

0.015

0.020

0.0 0.6 1.0 1.2m

g(m

)

T<Ts

(c)

Figure 1.5: We plot g(m) vs. m for k = 3. In subplot (a), we consider T > Td(1); the onlyphysically relevant solution is q1 = 0 and we are back to the paramagnetic solution. In (b), we

consider Ts < T < Td(1); m(T ) is represented in BLUE. The function g restricted to the interval[m(T ), 1] is non-negative, and thus the replica symmetric free energy is correct in this regime.

Finally (c) represents the setting T < Ts. g(m) on the interval [m(T ), 1] (represented by the BLUEand RED points attains negative values). Thus the 1RSB free energy strictly improves on the

replica symmetric approximation.

At lower temperatures, we set

g(m) ≡ Ψ1RSB(q1,∗(β,m),m)− β2

4,

where for any m > 0, we recall that q1,∗(β,m) is the largest root of the equation (1.4.12),whenever such a root exists.

We encounter the following cases according to the temperature T :

1. T > Td(1). In this case, a non-trivial solution q1,∗(β,m) exists only for m ≥ m(T ) >1. The only physically relevant fixed point has q1 = 0 and we are back to theparamagnetic solution.

2. Td(1) > T > Ts. In this case, there exists m(T ) such that a non-trivial fixed pointq∗(β,m) exists for m ∈ [m(T ), 1]. However, in this regime, Ψ1RSB(q1,∗(β,m),m) >ΨRS(q0 = 0) for all m ∈ [m(T ), 1]. In this case, as before, the relevant fixed point ism = 1, a situation where q1 is undetermined. Thus the replica symmetric free energyis still correct in this temperature regime.

3. T < Ts, In this regime, there exists m(T ) such that a non-trivial fixed point q∗(β,m)exists for m ∈ [m(T ), 1]. Moreover, there exists m∗(T ) ∈ [m(T ), 1] such that

Ψ1RSB(q1,∗(β,m∗(T )),m∗(T )) < ΨRS(q0 = 0).

30

0.00

0.25

0.50

0.75

1.00

0.0 0.2 0.4 0.6T

m∗(

T)

Figure 1.6: The dependence of m∗(T ) on T . Note the critical static temperature Ts (in blue)

In this case, we choose the fixed point (m, q1) = (m∗(T ), q1,∗(β,m∗(T ))).

Finally, we exhibit the dependence of m∗(T ) on T in Figure 1.6.

One might wonder whether the transition temperature Td(1) has any physical signifi-cance, since the free energy density does not exhibit any non-analyticity at this tempera-ture. It turns out that his temperature corresponds to an important change in the structureof the Gibbs measure, which we will further explore below, but is useful to summarize here.

Imagine that we lower the temperature T of the system continuously. At sufficientlyhigh temperature, the measure µβ behaves roughly like the uniform measure. At Td(1),the system spontaneously breaks up into exponentially many pure states, each of whichhas negligible weight. Since the number of states that contribute to the Gibbs measure isexponentially large, two replicas have negligible probability to belong to the same state.Hence, the overlap concentrates around a single value, and the replica-symmetric freeenergy gives the correct asymptotics.

While this transition does not affect the free energy, it is important in determining thebehavior of various sampling algorithms. This splitting of the state space into many purestates should hinder the exploration of the state space by Glauber or Langevin dynamics.This is sometimes referred to as the dynamical phase transition.

As the temperature is further lowered, the pure states become smaller and a smallernumber of them dominates the Gibbs measure. At the point Ts, an O(1) number of purestates dominate, and this dramatically changes the nature of the measure µn, and theoverlap distribution. This is usually referred to as the static phase transition (or ‘randomfirst order’ phase transition). Below this temperature, the replica symmetric free energyfails to be correct, and must instead be replaced by the 1RSB free energy.

1.5 The number of critical points

In the previous sections, we have studied the Gibbs measure µn(dσ) = expH(σdσ/Znusing the replica method. In this section, we turn to the direct study of properties of thelandscape H(σ).

We will focus again on the case λ = 0. We rescale the Hamiltonian by a factor n to

31

define h(σ) ≡ H(σ)/n, that is

h(σ) =1√

2(k!)〈W,σ⊗k〉 , (1.5.1)

where the symmetric Gaussian tensor W is defined as per Eq. (1.1.2). (This is not to beconfused with the perturbation h〈v0,σ〉 that we previously added to the Hamiltonian andnow set to zero.)

Of course, the free energy density can be thought of as a way to explore the structureof the energy H(σ):

φ(β) = limn→∞

1

nlog

∫exp nβh(σ)ν0(dσ).

For instance, it is not hard to prove that

limβ→∞

φ(β)

β= lim

n→∞E[

maxσ∈Sn−1

h(σ)].

The quantity on the right-hand side is referred to as the ‘ground state energy’ in physics(the usual convention in physics is to call −h the Hamiltonian, and minimize the latter).

Here, we will take a different approach, and study the structure of the critical points ofthe function h( · ) directly. For B ⊂ R, we denote by Cn(B) the number of critical pointsσ of h with h(σ) ∈ B:

Cn(B) =∣∣σ : gradh(σ) = 0, h(σ) ∈ B

∣∣ . (1.5.2)

Here gradh denotes the Riemannian gradient on the sphere Sn−1 and |A| is the cardinalityof the set A. In suitable coordinates, we have gradh(σ) = P⊥σ∇h(σ), where P⊥σ = I−σσT

is the projector onto the linear space orthogonal to σ.In the rest of this section we will estimate the expectation of Cn(B).

1.5.1 The Kac-Rice formula

The Kac-Rice formula allows us to compute the expected number of critical points forGaussian processes, subject to certain regularity conditions.

Lemma 1.5.1 (Kac-Rice formula [AT07]). We have, for any Borel set B ⊂ R,

E[Cn(B)] =

∫Sn−1

E[D(σ)1h(σ)∈B|P⊥σ∇h(σ) = 0]φσ(0)dσ , (1.5.3)

D(σ) ≡∣∣detσ⊥ [P⊥σ∇2h(σ)P⊥σ − 〈σ,∇h(σ)〉I]|,

where φσ( · ) is the density of P⊥σ∇h(σ) with respect to the Lebesgue measure on V ⊥σ∼= Rn−1

(where V ⊥σ is the tangent space to Sn−1 at σ, which we identify with the linear spaceorthogonal to σ). Finally, the determinant is computed on V ⊥σ (namely, detσ⊥(M) isthe determinant of the matrix obtained by representing the operator M with respect to anarbitrary basis on V ⊥σ ).

A detailed proof of the above lemma may be found in the paper by Auffinger, Ben-Arous, Cerny [AAC13], derived using the differential geometric framework of [AT07]. Tokeep the discussion self-contained, we will sketch the proof of this result.

32

Proof. We will consider the special case B = R, i.e. computing the expectation of the totalnumber of critical points Cn = Cn(R). The generalization to other sets B uses the sameideas.

We note that σ is a critical point if and only if P⊥σ∇H(σ) = 0. For ε > 0, defineηε : Rn × Sn−1 → R,

ηε(v,σ) =

1

ωn−1εn−1 if ‖P⊥σ v‖2 ≤ ε,0 o.w.,

where ωn is the volume of the unit ball in Rn. We will need the following lemma.

Lemma 1.5.2. If all the critical points are non-degenerate, then limε→0

Cn,ε = Cn.

We delay the proof of Lemma 1.5.2 and complete the sketch. We have,

E[Cn] = E[

limε→0

∫Sn−1

D(σ)ηε(∇H(σ);σ)dσ]

(a)=

∫Sn−1

limε→0

E[D(σ)ηε(∇H(σ);σ)]dσ

=

∫Sn−1

limε→0

E[D(σ)|‖P⊥σ∇H(σ)‖2 ≤ ε]1

ωn−1εn−1P[‖P⊥σ∇H(σ)‖2 ≤ ε]dσ

=

∫Sn−1

E[D(σ)|P⊥σ∇H(σ) = 0]φσ(0)dσ.

Note that the inversion of limit and expectations in step (a) requires further justification,for which we defer to [AT07, AAC13].

We next sketch the proof of Lemma 1.5.2.

Proof sketch of Lemma 1.5.2. Note that ‖P⊥σ∇h(σ)‖2 = 0 if and only if σ is a criticalpoint. Further, in this case, it is possible to check that there are only finitely many criticalpoints almost surely. We will refer to these critical points as σ1, · · · ,σm. We define

Ωε = σ ∈ Sn−1 : ‖P⊥σ∇h(σ)‖2 ≤ ε.

For all ε > 0 small enough, Ωε = ∪mi=1Ω(i)ε where σi ∈ Ω

(i)ε and Ω

(i)ε ⊆ B(σi, δ(ε)) where

B(x0, r) is the ball of radius r around x0 and δ(ε) ↓ 0 as ε→ 0. In this case, we have,

Cn,ε =n∑i=1

∫Ω

(i)ε

D(σ)1

ωn−1εn−1dσ

=

n∑i=1

D(σi)

ωn−1εn−1VolSn−1(Ω(i)

ε )(1 + oε(1)).

Thus it remains to compute VolSn−1(Ω(i)ε ). For ε > 0 sufficiently small, we consider the

approximate decomposition σ =√

1− ‖x‖22 σi + x, where x ∈ V ⊥σ . We set

Ω(i)ε = x ∈ V ⊥σi :

√1− ‖x‖22 σi + x ∈ Ω(i)

ε .

For ε small, we have,

VolSn−1(Ω(i)ε ) = VolV ⊥σ (Ω

(i)ε )(1 + oε(1)).

33

To compute VolV ⊥σ (Ω(i)ε ), we note that for any vector σ(x) = σi + x, we have,

P⊥σ(x) = I − σ(x)σ(x)T = P⊥σi − xσTi − σixT +O(‖x‖22).

Thus we have,

∇h(σ(x)) = ∇h(σi) +∇2h(σi)x+O(‖x‖22).

P⊥σ∇h(σ(x)) = [P⊥σi∇2h(σi)P⊥σi − 〈σi,∇h(σi)〉I]x+O(‖x‖22) := M(σi)x+O(‖x‖22).

We define E(i)ε = x ∈ V ⊥σi : ‖M(σi)x‖2 ≤ ε. The above approximate computation

suggests that

Vol(Ω(i)ε ) = Vol(E(i)

ε )(1 + oε(1)).

We note that E(i)ε = x ∈ V ⊥σi : ‖M(σi)x‖2 ≤ ε is an ellipse. Therefore, the volume

follows upon changing coordinates and computing the volume of a sphere. We concludethat

Vol(Ω(i)ε ) = Vol(E(i)

ε )(1 + oε(1)) = D(σi)ωn−1εn−1.

This concludes the proof sketch.

1.5.2 Applying the Kac-Rice formula

We next apply the Kac-Rice Formula 1.5.1 to the spherical p-spin model, obtaining thefollowing exact expression for the expected number of critical points.

Lemma 1.5.3. For any Borel set B ⊂ R, we have,

E[Cn(B)] =[(k − 1)(n− 1)

2

]n−12 2√π

Γ(n2 )E[∣∣det(W n−1 − tZIn−1)

∣∣1Z∈√2nB], (1.5.4)

where W n−1 ∼ GOE(n − 1) (in particular has independent entries above the diagonal

Wij ∼ N(0, 1/(n− 1))) is independent of Z ∼ N(0, 1), and t ≡√

k(k−1)(n−1) .

Proof. For our specific Gaussian process h(σ), recalling the definition of the symmetricGaussian tensor, we have,

h(σ) =1√2n

1

k!

∑π,i1,··· ,ik

Gπi1,··· ,ikσi1σi2 · · ·σik .

(∇h(σ))i =k√2n

1

k!

∑π,l1,··· ,lk−1

Gπi,l1,··· ,lk−1σl1 · · ·σlk−1

.

(∇2h(σ))ij =k(k − 1)√

2n

1

k!

∑π,l1,··· ,lk−2

Gπi,j,l1,··· ,lk−2σl1 · · ·σlk−2

.

Thus (h(σ),∇h(σ),∇2h(σ)) is jointly Gaussian. We note that the covariances depend onlyon the inner products and are thus invariant under rotations. Therefore, to describe the

34

marginal distribution of (h(σ),∇h(σ),∇2h(σ)) for any fixed σ ∈ Sn−1, we will assume,without loss of generality that σ = e1 = (1, 0, · · · , 0). In this case, it is easy to see that

h(σ) =1√2nG1,··· ,1

(∇h(σ))i =1√2n

[Gi,1,··· ,1 +G1,i,1,··· ,1 + · · ·+G1,··· ,1,i].

(∇2h(σ))ij =1√2n

[Gi,j,1,··· ,1 + · · ·+G1,··· ,1,i,j ].

Thus in this case, (h(σ),P⊥σ∇h(σ),P⊥σ∇2h(σ)P⊥σ ) are mutually independent and themarginal distributions may be described as follows.

h(σ)d=

1√2nZ

P⊥σ∇h(σ)d=

√k

2ngn−1,

P⊥σ∇2h(σ)P⊥σd=

√k(k − 1)(n− 1)

2nW n−1,

where Z ∼ N(0, 1), gn−1 ∼ N(0, In−1) andW n−1 ∼ GOE(n−1). As stated earlier, Z, g,Ware independent.

We can now use Lemma 1.5.1 in our setup. We have, φσ(0) =(nπk

)n−12

. Also, we

note that the integrand in Lemma 1.5.1 is independent of σ in this case. Further, weobserve that 〈σ, h(σ)〉 = kh(σ) and therefore, using independence, we are left with theunconditional expectation. This implies that

E[Cn(B)] =( nπk

)n−12 2πn/2

Γ(n2 )E[∣∣∣det

(√k(k − 1)(n− 1)

2nW n−1 −

k√2nZIn−1

)∣∣∣1 1√2nZ∈B

],

where we have plugged in An = 2πn/2

Γ(n2

) , the surface measure of Sn−1. A final simplification

of the coefficients completes the proof.

1.5.3 Asymptotics of the number of critical points

Armed with Lemma 1.5.3, we can study the asymptotic behavior of the expected numberof critical points.

Lemma 1.5.4. We have, as n→∞,

E[Cn(B)].= exp

n supx∈B

S(x),

where

S(x) ≡ Ω(x

√2k

k − 1

)− x2 +

1

2log(k − 1) +

1

2, (1.5.5)

Ω(x) =

x2

4 − 12 for |x| ≤ 2,

x2

4 − 12 −

|x|√x2−44 + log

(|x|2 +

√x2

4 − 1)

for |x| ≥ 2.(1.5.6)

35

Proof. Lemma 1.5.3, along with independence of W n−1, Z implies that,

E[|det(W n−1 − tZIn−1)|1Z∈√2nB] =

∫BE[|det

(W n−1 −

√2k

k − 1xIn−1

)|]√n

πexp (−nx2)dx.

(1.5.7)

Further, we have,

E[|det(W n−1 − xIn−1)|] = E[exp[(n− 1)

∫log(λ− x)sn−1(dλ)

], (1.5.8)

where sn = 1n

∑ni=1 δλi is the empirical spectral measure of the matrix W n. The Wigner

semi-circle law governs that as n→∞, almost surely,

snw⇒ s∞(dx) =

√4− x2

2π1|x|≤2dx.

We define

Ω(x) ≡∫

log |λ− x|s∞(dλ) . (1.5.9)

An exercise in complex analysis reveals that Ω(x) is indeed given by Eq. (1.5.6).The following Lemma establishes the leading exponential behavior of the determinant.

Lemma 1.5.5. As n→∞,

E[|det(W n−1 − xIn−1)|] = exp (nΩ(x) + o(n)). (1.5.10)

We have, from (1.5.7) and Lemma 1.5.5, using Laplace method,

E[|det(W n−1 − tZIn−1)|1Z∈√2nB].= sup

x∈Bexp

[n(

Ω(x

√2k

k − 1

)− x2

)]Therefore the desired claim follows from Lemma 1.5.3.

Remark 1.5.1. Equations (1.5.5) and (1.5.6) imply that, for |x| ≥ εd ≡√

2(k−1)k the

exponential growth rate takes the form

S(x) =1

2log(k − 1)− k − 2

4(k − 1)x2 − 1

2log(k

2

)−x√x2 − ε2

d

ε2d

− log(x+√x2 − ε2

d) .

(1.5.11)

It remains to prove Lemma 1.5.5. We do not provide a complete proof and refer thereader to Auffinger, Ben-Arous, Cerny [AAC13] and Subag [Sub17] for formal arguments.The argument presented here only implies that the right-hand side of Eq. (1.5.10) is anupper bound on the desired expectation6.

Consider Eq. (1.5.8). The Wigner semi-circle law suggests that as n→∞,

E[

exp[(n− 1)

∫log(λ− z)sn−1(dλ)

]].= exp (nΩ(z) + o(n)), (1.5.12)

provided sn concentrates closely around s∞. This might be formalized using the followinglarge deviation result of empirical spectral distribution of GOE random matrices, due toBen Arous and Guionnet [AG97].

6Note that in many circumstances this is sufficient since E[Cn(B)] is anyway an upper bound on thetypical number of critical points.

36

Theorem 2 ([AG97]). For any two probability measures µ, ν, define the metric

d(µ, ν) = sup∫

f d(µ− ν)(x) : f 1− Lipschitz, ‖f‖∞ ≤ 1.

Then the sequence sn satisfies a large deviation principle (LDP) with respect to this topol-ogy. In particular, for any ε > 0, there exists a constant c(ε) > 0 such that

P[d(sn, s∞) > ε] = exp (−c(ε)n2 + o(n2)).

This theorem implies that deviations of averages∫f(λ)sn(dλ) from the asymptotic

value∫f(λ)s∞(dλ) are exponentially small in n2. It is then easy to deduce the following

lemma.

Lemma 1.5.6. For any ψ : R→ R such that 0 < ε ≤ ψ(x) ≤M <∞ for all x ∈ R,

limn→∞

1

nlogE

[∏ψ(λi)

]=

∫ψ(λ)s∞(dλ).

We note that, in order to prove Lemma 1.5.5, we would need to apply the last lemmato ψ(λ) = |λ − z| which is unbounded and not bounded away from 0. Hence the resultdoes not directly follow by an application of Lemma 1.5.5.

Nevertheless, Lemma 1.5.5 has a number of direct consequences that are nearly asuseful, and we outline here:

• A better upper bound on the typical number of critical points is obtained by com-puting a truncated first moment E[Cn(B)1G1 ] where Gn is a high-probability event(i.e. limn→∞ P(Gn) = 1).

Fixing η > 0, we can let Gn ≡ maxi≤n |λi(W n)| ≤ 2 + η which holds with high-probability by the Bai-Yin law. As a consequence, it is sufficient to have 0 < ε ≤ψ(x) ≤M <∞ for x ∈ [−2, 2], which holds as soon as z 6∈ [−2, 2].

• Using ψε(λ) = |λ − z| ∨ ε, and taking ε ↓ 0 at after n → ∞ yields an upper boundfor all z.

Auffinger, Ben-Arous, Cerny [AAC13] carry out a more general analysis for criticalpoints with fixed indices (recall that the index of a critical point is the number of negativeeigenvalues of the Hessian at that point). Their results can be roughly summarized asfollows.

Theorem 3 ([AAC13]). If Cn,k(B) denotes the number of critical points of h(σ) in B withindex n− k (i.e. k positive eigendirections), then

E[Cn,k(B)].= sup

x∈Bexp (nSk(x) + o(n)).

for some appropriate complexity function Sk. Further:

1. S0(x) = S(x) for x ≥ εd and S0(x) = −∞ for x < εd.

2. For any fixed k ≥ 1, Sk(x) < S0(x) for x > εd, Sk(εd) = S0(εd), and Sk(x) = −∞for x < εd.

37

In other words, the last result implies that local maxima dominate the total count ofcritical points with energy h(σ) > εd. For h(σ) < εd it turns out instead that the typicalindex of critical points is linear in n: the landscape is dominated by saddles with a largenumber of positive eigendirections.

The above analysis has immediate implications on the ground state energy. Define

ε∗ ≡ supx ∈ R : S(x) ≥ 0

. (1.5.13)

Corollary 1.5.7. The following holds almost surely

limn→∞

maxσ∈Sn−1

h(σ) ≤ ε∗ . (1.5.14)

Of course we have ε∗ > εd strictly, and therefore there is a large interval of energysuch that the energy landscape is dominated by local maxima. This is believed to have adirect impact on algorithms: both sampling and search algorithms are expected to fail atfinding configurations with h(σ) ∈ (εd, ε∗], and in fact we have rigorous evidence for thisexpectation.

Subag [Sub17] carries out a related second moment argument to establish that thetypical behavior of local maxima is governed by that of its expectation in an interval ofenergies. In particular, this result implies that Eq. (1.5.14) holds with equality. (The lastconclusion also follows from rigorous proofs of Parisi formula.)

1.6 Back to the replica method

We now go back to the non-rigorous replica method, and explore some connections to therigorous analysis of the last section. We use the replica method to zoom into the lowtemperature regime T → 0. Recall the 1 RSB free energy functional

Ψ1RSB(q1,m) =β2

4[1− (1−m)qk1 ] +

1

2mlog(1− (1−m)q1)− 1−m

2mlog(1− q1).

The 1-RSB free energy is given by Ψ1RSB(q1,∗(T,m∗(T )),m∗(T )), for any temperature T .For any m, the corresponding minimizer q1,∗(T,m) satisfies the fixed point equation

q1,∗(T,m)k−2(1− q1,∗(T,m))(1− (1−m∗)q1,∗(T,m)) =2

kT 2. (1.6.1)

We assume m = µT as T → 0 for some constant µ, and setting q1,∗(T,m) = 1−z∗T +o(T ),Taylor expansion of (1.6.1) around 1 yields that z∗ must satisfy the equation

z∗(µ+ z∗) =2

k=⇒ z∗ := z∗(µ) =

√2

k+µ2

4− µ

2.

Plugging in these values, we have that

limβ→∞

1

βΨ1RSB(q1,∗(T,m),m) = e1RSB(µ),

e1RSB(µ) =1

4[µ+ kz∗(µ)] +

1

2µlog(

1 +µ

z∗(µ)

).

38

We set e∗ = minµ e1RSB(µ). e∗ is interpreted as the limiting ground state energy, that is

e∗ = limn→∞

E[maxσ

h(σ)].

Here we discover the first connection to the rigorous study of the last section. We obtainthat e∗ = infx ≥ 0 : S(x) ≤ 0. This suggests that for any δ > 0 there are exponentiallymany critical points with energy at least maxσ h(σ)− δ. Further, we define φ(µ) = µe(µ)and evaluate the Fenchel-Legendre transform

Σ(ε) = supµφ(µ)− εµ.

To find the dual Σ(ε), we define µ∗ = argmin φ(µ) and set

ε =1

2µ+

1

4kz∗(µ

∗) +1

2

1

[µ∗ + z∗(µ∗)].

Here we run into the second surprising fact— the Fenchel- Legendre dual Σ(ε) turns outto be in fact exactly equal to the complexity function S(ε) derived in the last section !

This surprise was explained within the framework of the replica method by Remi Monas-son, and this argument is therefore usually referred to as Monasson’s argument. We sketchthis argument to explore the connections between the complexity and the free energy. Westart with the partition function

Zn =

∫Sn−1

exp [nβH(σ)]dν0(σ).

Recall the intuitive picture of the random measure µβ. We believe that Sn−1 = ∪Nα=1Γ(α)

may be decomposed into “pure” states and the measure decomposes naturally into a convex

combination of distributions over the pure states. Thus we have, µβ(dσ) =∑N

α=1w(α)µ

(α)β (dσ),

where µ(α)β (·) ∝ µβ(·)1Γ(α) is the restriction of the measure onto the pure state. We define

Z(α)n =

∫Γ(α)

exp [nβH(σ)]dν0(σ).

Thus the weight of a pure state w(α) = Z(α)nZ . We assume that Z(α) .

= exp [nf (α)] and let

N =∑

α δfα denote the counting measure corresponding to the limiting free energy of thepure states. Thus it suggests that

Zn.= exp[n sup

f :Σβ(f)≥0(f + Σβ(f))],

where Σβ(f) is the cluster complexity functional, which captures the number of pure stateswith free energy f . This captures the energy-entropy tradeoff in determining the freeenergy. Now, consider the free energy of a coupled system with m “real” replicas.

Zn(m) =

∫(Sn−1)m

exp[nβ

m∑a=1

H(σa) + nδg(σ1, · · · ,σm)] m∏i=1

dν0(σi).

Here g(σ1, · · · ,σm) is any function, for example, g(σ1, · · · ,σm) =∑

1≤a<b≤m〈σa,σb〉,which favors configurations that align together, while δ > 0 is a small perturbation. We

39

note that if we let δ → 0 for any fixed n, we obtain Zn(m) = (Zn)m and the systemde-couples into m independent systems. However, if δ > 0 is fixed while n→∞, we expectthat the replicas will “condense” on the same pure state. The perturbation with g shouldbe looked upon as a device to select one of the pure states. This is analogous to the choiceof a pure state in the Curie-Weiss model by introducing an external magnetic field. Thusif we then send δ → 0, we expect, (since the system is 1-RSB, once the pure state has beenselected, everything is “uniform”)

Zn(m).=

N∑α=1

(exp [nf (α)])m.

Thus we expect that as n→∞,

Zn(m).= exp

[n sup

f(mf + Σβ(f))

].

We compute the asymptotics of Zn(m) using the replica method as earlier. For r ≥ 1 amultiple of m, we obtain,

E[(Zn(m))r/m].= exp [nS(Q)],

where S(·) is the same action functional derived in (1.2.16). We evaluate the function onthe 1-RSB saddle point Q1−RSB. Finally, as before, we send r → 0 to obtain

limr→0

m1

n(r/m)logE[(Zn(m))r/m] = mΨ1RSB(m).

Now, this calculation suggests something remarkable! We derive, as a result, that form ∈ [0, 1],

mΨ1RSB(m) = supf

(mf + Σβ(f)). (1.6.2)

Thus the complexity functional is related to the 1-RSB free energy functional throughLegendre-Fenchel duality. Formally, we can therefore invert the free energy functionalΨ1RSB functional to obtain the complexity function Σβ(f). This explains the relationobserved between the ground state energy functional e1RSB and the complexity functionS(·) at zero temperature.

We continue to explore the connections between the free-energy, complexity and theRS to 1-RSB phase transition. We recall that intuitively, we expect

Zn.= exp [n max

f :Σβ(f)≥0(f + Σβ(f))].

When the maximum occurs at f = f∗, we intuitively believe that the free energy is domi-nated by exp[nΣβ(f∗)] pure states, each having energy f∗. We invert (1.6.2) to express Σβ

as a function of m and obtain the stationary conditions

Σβ = mΨ1RSB(m)−mf,

f =d

dm

[mΨ1RSB(m)

],

40

where implicitly, we set Σβ and f to be functions of m. Consider first the unconstrainedproblem maxf (f+Σβ(f)). The stationary condition in this case is Σ′β(f) = −1. Further, it

is easy to see that if Σβ and f are expressed as functions of m, thendΣβdf = −m. Therefore,

the unconstrained maximum corresponds to m = 1. In this case,

f =d

dm

[mΨ1RSB(m)

]∣∣∣m=1

= Ψ1RSB(1) + Ψ′1RSB(1), Σβ = −Ψ′1RSB(1).

We note that this implies that the unconstrained maximizer is a feasible solution to theproblem only if Ψ′1RSB(1) ≤ 0. Recall the static and dynamic critical temperatures Tsand Td := Td(1) defined earlier. Further, recall from Fig 1.5 the behavior of the functionΨ1RSB(m) as a function of m.

For Ts < T < Td, Ψ′1RSB(1) ≤ 0, and thus the unrestricted maximum is feasible. Inthis case, we set m = 1 and the free energy is the RS free energy. However, the complexityΣβ > 0. This validates the multiple pure states picture we had introduced earlier. Indeed,in this case, we expect to have exponentially many pure states, each corresponding to theenergy f∗ = f(1). In case T < Ts, the complexity Ψ′1RSB(1) > 0, and thus the unrestrictedmaximum is infeasible. At this point, recall m∗ = m∗(T ), the minimizer of m 7→ Ψ1RSB(m).In this case, using the stationary conditions obtained above, we have, Σβ = −m2Ψ′1RSB(m)and therefore, Σβ ≥ 0 if and only if Ψ′1RSB(m) ≤ 0, or equivalently, if m ≤ m∗. Further,d

df (f + Σ(f)) = (1−m) ≥ 0 and thus the optimal solution in this case is to chose m = m∗.For m = m∗, Σβ = 0 and f = Ψ1RSB(m∗). In this case, we expect that the free energy isgoverned by O(1) pure states, each corresponding to the energy fs := Ψ1RSB(m∗).

Further, physicists also predict the structure of the “extremal process” of the pure stateswith f = fs + c

n . For such states, using Taylor expansion around f = fs, physicists predictthat exp[nΣβ(f)] ≈ exp[−nm∗(f−fs)] and thus expect that the extremal process, suitablyscaled, should converge to a Poisson point process with intensity exp[−m∗x]. Such purestates are expected to have weight wα = exp [xα]/

∑α exp [xα], which is usually referred

to as the Poisson-Dirichlet distribution with parameter m∗. This provides a very detaileddescription of the dominant states and their complexities at all temperatures. As a finalsanity check, it is not hard to see that if states are distributed with weights given byPoisson-Dirichlet(m∗), then marginal probability of two iid draws to be from the samecluster is 1−m∗, which corroborates with our earlier interpretation of this parameter.

This completes our analysis of the p-spin model. We take up the Sherrington-Kirkpatrickmodel of spin glasses in the next chapter.

Bibliographic notes

We refer to [CT91] for a general definition of mutual information, and its properties.The replica method was first applied to the spherical p-spin model (corresponding to the

λ = 0 case of the model treated here) by Crisanti and Sommers [CS92]. The same authorsalso studied the energy landscape in [CS95], and the Langevin dynamics in [CHS93].

The extension to λ > 0 is treated in [GS00]. The spherical SK model (correspondingto k = 2 and λ = 0) was first studied by [KTJ76] in physics.

The celebrated Baik-Ben Arous-Peche phase transition [BBAP05] was first discoveredby Hoyle and Rattray using the replica method in [HR03, HR04].

For a rigorous treatment of the spherical model, see [Tal06a] which covers the case ofeven k, and [Che13]. which solves the general case. These papers do indeed consider themore general ‘mixed p-spin model’.

41

In the context of (1.1.1), the hypothesis testing problem H0 : λ = 0 vs. H1 : λ > 0has been examined extensively in the recent literature, under different assumptions on thespike v0. In the case of the uniform spherical prior on v0, [MR14, MRZ16] establishedthe existence of λ− < λ+ such that for λ > λ+, it is possible to detect the signal withvanishing errors as n → ∞. On the other hand, for λ < λ−, the Total Variation distancebetween these measures converges to zero. [PWB20] analyzed this problem for spherical, iidRademacher and sparse Rademacher prior on the spike v0, and derived analogous constantsλ−, λ+ for each prior. Along the way, they improved the estimates in [MR14, MRZ16]for the spherical prior. [LML+17] characterized the limiting mutual information betweenthe tensor Y and the latent vector v0 for any product prior on v0, and derived a sharpconstant λp such that consistent detection (i.e. with vanishing Type I and Type II errors)is possible whenever λ > λp. Finally, [Che19] established that this is the sharp detectionthreshold for this problem, in that the null and alternative distributions are asymptoticallyindistinguishable for any λ < λp.

1.7 Some omitted technical calculations

1.7.1 Proof of Lemma 1.1.2

Proof. Recall that

I(v0;Y ) = E

log(dµBayes

dν0(v0)

)= E

[nλk!〈Y ,v0

⊗k〉]− nΦn,Bayes(λ).

The first identity follows upon dividing both sides by n and noting that E[〈Y ,v0⊗k〉] = λ.

Therefore,

1

n

∂λI(v0;Y ) =

k!− ∂Φn,Bayes

∂λ(λ).

Recall from (1.1.34) and (1.1.1) that

Φn,Bayes(λ) =1

nE log

∫Sn−1

expnλk!〈Y ,σ⊗k〉

ν0(dσ)

=1

nE log

∫Sn−1

expnλ2

k!〈v0,σ〉k +

k!〈W ,σ⊗k〉

ν0(dσ).

Differentiating in λ, we obtain,

∂Φn,Bayes

∂λ(λ) = E

∫Sn−1

k!〈v0,σ〉k +

1

k!〈W ,σ⊗k〉

µBayes(dσ). (1.7.1)

Using Gaussian integration by parts and Bayes rule, we have,

E∫Sn−1

〈W ,σ⊗k〉µBayes(dσ) = λ(1− E〈XBayes(Y ),X0〉).

Plugging this back, we obtain

1

n

∂λI(v0;Y ) =

λ

k!(1− E〈XBayes(Y ),X0〉) =

λ

2(k!)E‖XBayes(Y )−X0‖2F

. (1.7.2)

This completes the proof.

42

1.7.2 Proof of Eq. (1.2.11)

We use the following change-of-measure trick.

Lemma 1.7.1. Let (Ω,F) be a measurable space and consider two sigma-finite measure µ,ν on it. Let X : Ω→ Rd be measurable and assume it has densty fµ(x), fν(x) with respectto Lebesgue measure, when (Ω,F) is endowed with µ or ν. Assume that ν is absolutelycontinuous with respect to µ, with Radon-Nikodym derivative which depends on ω onlythrough X(ω), and write (with an abuse of notation) dν

dµ(X(ω)) for this derivative. Then

fµ(x) = fν(x)dν

dµ(x) . (1.7.3)

Applying this to µ = γΛ, ν = γI , X = Q, we obtain

GΛ(Q) = det(Λ)n/2 e−n(〈Λ,Q〉−〈I,Q〉)/2GI(Q) . (1.7.4)

Substituting in Eq. (1.2.11), and recalling Qii = 1, we conclude that it is sufficient to provethe clam for Λ = I, namely

fn,r(Q) =GI(Q)

gn(1)r+1. (1.7.5)

This follows from the fact that the distributin of σa ∼ N(0, In/n), conditioned on ‖σa‖2 =1 (corresponding to the density on the right hand side) coincides with the uniform measureover σa ∈ Sn−1.

Exercises

Exercise 1.1. Prove Eqs. (1.1.27), (1.1.28).

Exercise 1.2. Assume that for each ` ∈ 1, . . . , k, we observe a tensor Y ` ∈ (Rn)⊗`,given by

Y ` = λ` v0⊗` +W ` (1.7.6)

where W ` ∈ (Rn)⊗` is a random Gaussian tensor distributed as in Section 1.1.1 (with `replacing k), and v0 is a unit vector. We use Y = (Y 1, . . . ,Y k) to denote collectively allthe observations, W = (W 1, . . . ,W k) to denote all the noise tensors, and λ = (λ1, . . . , λk)for all the model parameters.

(a) Write the maximum likelihood estimator for v0. Assuming a uniform prior for v0 ∈Sn−1, derive the Bayesian posterior distribution of v0 given Y .

(b) Introducing additional parameters β, h define an analogue of the random probabilitymeasure (1.1.9) for this case. In particular, it should include the Bayes measure andthe maximum likelihood problem as special cases.

(c) Derive the replica-symmetric expression for the free energy density in this model.

43

Exercise 1.3 (Concentration of the free energy). Consider the log-partition function as afunction of the noise G (where we use the construction (1.1.2)), namely, for fixed β, λ, h,let

Zn(G) =

∫Sn−1

exp nβλ√

2(k!)〈v0,σ〉k + β

√n

2〈G,σ⊗k〉+ nh〈v0,σ〉

ν0(dσ) , (1.7.7)

Fn(G) ≡ logZn(G) . (1.7.8)

Use Gaussian concentration (Theorem 14) to prove that, for any t ≥ 0

P(∣∣ logZn − E logZn

∣∣ ≥ t) ≤ 2 e−t2/(nβ2) . (1.7.9)

Exercise 1.4 (Second moment method). The second moment method is a simple rigoroustechnique that uses the first two moments of the partition function to characterize theasymptotic free energy density.

(a) Let Zn ≥ 0 be a sequence of random variables. Prove that, if

limn→∞

1

nlog

EZ2n

EZn2= 0 , (1.7.10)

limn→∞

1

nlogEZn = φ , (1.7.11)

then for any ε > 0, we have

P(e−εnEZn ≤ Zn ≤ eεnEZn

)≥ e−o(n) . (1.7.12)

[Hint: Use Markov and Paley-Ziegmund inequalities.]

(b) Let Zn be the partition function of the spherical model (1.1.9), for h = 0. UseGaussian concentration proved in the previous exercise, cf. Eq. (1.7.9) to prove that,under assumptions (1.7.10), (1.7.11), we have n−1 logZn → φ in probability.

(c) Use Eq. (1.2.15) to compute the limit on the left-hand side of (1.7.10), again for Znthe partition function of the spherical model (1.1.9), for h = 0.

(d) Following from previous point, assuming k ≥ 2, prove that there exists β0, λ0 > 0such that, if β ≤ β0, λ ≤ λ0, Eq. (1.7.10) holds.

(e) Show that there exist values of β, λ for which condition (1.7.10) does not hold.

[We do not require a completely rigorous proof for this point. It is acceptable toevaluate numerically the expression for the limit derived at point (c).]

Exercise 1.5. Consider the spherical model with k = 2 and λ = 0. In other words, weare interested in the partition function

Zn(β;W ) =

∫Sn−1

enβ〈σ,Wσ〉/2ν0(dσ) , (1.7.13)

where W = W T is a GOE(n) matrix, i.e. a matrix with entries (Wij)i≤j independent, withWij ∼ N(0, 1/n) for i < j and Wii ∼ N(0, 2/n). The objective of this exercise is to obtaina rigorous proof of the free energy formula, for β < 1.

It is useful to recall the following known facts about the eigenvalues ofW , to be denotedby λ1(W ) ≥ λ2(W ) ≥ · · · ≥ λn(W ).

44

Theorem 4. Let, for each n ∈ N, W n ∼ GOE(n). Then we have, almost surely,limn→∞ λ1(W n) = 2, limn→∞ λn(W n) = −2. Further, denoting by sn = n−1

∑ni=1 δλi(W )

the empirical spectral distribution , we have snw⇒ s∞ almost surely, where

w⇒ denotes weakconvergence and s∞ is the non-random probability distribution on R defined by

s∞(dx) =1

√4− x2 1|x|≤2 dx . (1.7.14)

(a) For M ∈ Rn×n a symmetric positive definite matrix, let fM ( · ) denote the density of‖x‖2 when x ∼ N(0, (nM)−1). Prove the following identity, valid for all z > λ1(W ),

Zn(β;W ) =(β−1eβz−1)n/2

det(zI −W )1/2

[fβ(zI−W )(1)

fI(1)

]. (1.7.15)

(b) Prove that this implies, for β < 1, almost surely

limn→∞

1

nlogZn(β;W ) = φ(β) ≡ −1

2log β +

1

2βz − 1

2− 1

2

∫log(z − λ) s∞(dλ) ,

(1.7.16)

where z > 2 solves the equation

β =

∫1

z − λ s∞(dλ) . (1.7.17)

Use the above expression for s∞ to compute explicitly the integrals in the last twodisplays.

(c) Compare the result for φ(β) obtained at the previous point, with the replica sym-metric formula in Section 1.2.4.

Exercise 1.6. Consider the one-step replica symmetry breaking expression for the freeenergy density, given in Section 1.4, and assume λ = 0, whence b = 0. Further con-sider stationary points with q0 = 0, and denote by Ψ1RSB(q1,m;β) = Ψ1RSB(b = 0, q0 =0, q1,m;β, λ = 0) the corresponding free energy functional.

(a) Write a program to find the correct stationary point q1,∗, m∗ for given β, k.

(b) Plot q1,∗ = q1,∗(β), m∗ = m∗(β) and the resulting free energy density Ψ1RSB(q1,∗(β),m∗(β)),as a function of T = 1/β for k ∈ 3, 4, 5.

(c) Compute numerically the critical temperature Ts for k ∈ 3, 4, . . . , 10.Exercise 1.7. As in the previous exercise, assume λ = b = q0 = 0, and denote byΨ1RSB(q1,m;β) = Ψ1RSB(b = 0, q0 = 0, q1,m;β, λ = 0) the corresponding free energyfunctional. We will consider the limit β →∞ (equivalently, T = 1/β → 0).

(a) Assuming m = µT for some µ ∈ R>0 as T → 0, and q1,∗(β,m) = 1− c(µ)T + o(T ),compute c(µ).

(b) Substituting in the free energy functional, show that

Ψ1RSB(q1,∗(β,m = µT ),m = µT ;β) = β e1RSB(µ) + o(β) , (1.7.18)

and compute the function e1RSB(µ).

45

(c) What is the meaning of e∗1RSB = minµ>0 e1RSB(µ)? Compute numerically this quan-tity for k ∈ 3, 4, . . . , 10.

Exercise 1.8. Consider the spherical model with h = λ = 0, cf. Eq. (1.1.9), with k = 3.

(a) Choose two temperatures T1, T2 with T1 ∈ (0, Ts) and T2 ∈ (Ts, Td). For eachof these temperatures, minimize the 1RSB expression of free energy Ψ1RSB(q1,m),cf. Eq. (1.4.11) to compute the thermodynamic order parameter q1,∗(m), and thecorresponding free energy Ψ(m) = Ψ1RSB(q1,∗(m),m). Plot q1,∗(m) and Ψ(m) as afunction of m.

(b) Recall that the internal energy of pure states corresponding to parameter m is givenby f(m) = d

dm [mΨ(m)]. Compute an analytic expression for this function by takingthe derivative of mΨ1RSB(q1,∗(m),m). Evaluate this expression numerically and plotf(m) as a function of m.

(c) By taking the Legendre-Fenchel dual of the function mΨ(m), compute the complexityfunction Σ(f) and plot it as a function of f , for each T ∈ T1, T2.[Concretely, you can obtain a plot of Σ(f) by a parametric plot of mΦ(m)−mf(m)versus f(m).]

(d) Denote by [fd, fs] the interval of free energies over which Σ(f) ≥ 0 (with fd corre-sponding to threshold states and fs to states with the largest free energy). Computefs and fd for T ∈ T1, T2. Check that Σ′(fs) > −1 for T = T1 and Σ′(fs) < −1 forT = T2. Compute also f∗, i.e. the value that maximizes f + Σ(f) over f ∈ [fs, fd].

How can you compute fs, fd, f∗ directly from Ψ(m)?

(e) Use the answer to the last question at the previous point to compute fs(T ), fd(T ),f∗(T ) on a grid of values of T and plot them together as a function of T . Identify Ts,Td on this plot.

(f) This question is optional. Recall that the picture behind 1RSB is that the measureµ decomposes in pure states

µ(dσ) =

N∑α=1

w(α)µ(α)(dσ) . (1.7.19)

The internal energy of state α can be defined by

u(α) =1√2n

∫Sn−1

〈G,σ⊗k〉µ(α)(dσ) . (1.7.20)

Can you use the replica method to compute the typical internal energy of states withinternal free energy f ∈ [fs, fd]. Denote this by u(f).

Use this expression for the temperatures T1, T2 considered above and plot u(f) asa function of f . Report the values of u(fs) and u(fd).

46

Chapter 2

The Sherrington-Kirkpatrickmodel and Z2 synchronization

2.1 Introduction

In this chapter, we study the Sherrington-Kirkpatrick (SK) model of spin glasses. Thisclassical model was introduced by David Sherrington and Scott Kirkpatrick in 1975 [SK75]as a simplified model for spin glasses, which are a class of magnetic alloys. Sherrington andKirkpatrick believed that their model was amenable to more straightforward mathematicalanalysis than the more realistic lattice model of Edwards and Anderson [EA75]. However, ittook several years until Giorgio Parisi came up with a heuristic derivation of the asymptoticfree energy density, introducing the idea of replica symmetry breaking [Par79a], and threedecades before Michel Talagrand proved Parisi’s formulas [Tal06b]. Efforts to analyze theSK model have enriched statistical physics and probability since its introduction, and stillmany interesting questions remain open.

As motivation for studying the SK model, we will use a statistical estimation problem,group synchronization, which we introduce in Section 2.1.1. As further motivation, wediscuss the connection with extremal cuts of random graphs in Section 2.1.2.

2.1.1 The group synchronization problem

An instance of the group synchronization problem is defined by a finite graph G = (V,E)and a group G. Without loss of generality, we assume V = [n] and assign a directionarbitrarily to each edge i, j ∈ E.

Let x0 = (x0,1, · · · , x0,n) ∈ GV denote an unknown assignment of group elements tothe vertices of the graph. We observe Yij : (i, j) ∈ E, where Yij is a noisy version ofx−1

0,ix0,j , and would like to estimate the unknown object x0. Of course this is possible onlyup to a global shift by a group element, since x′0 = (gx0,1, · · · , gx0,n), g ∈ G yields thesame observations as x0.

A special case of practical interest is G = SO(3), which arises in imaging applications,such as structure from motion [OVBS17]. In this case, the main question is to reconstructa 3D image of an object from numerous 2D images taken by multiple cameras placed inunknown positions. The first step is to determine the relative positions of the cameras,which leads to the synchronization problem over SO(3).

Arguably, the simplest version of this problem arises when G = Z2 = ±1, · andG = Kn, where Kn is the complete graph on n vertices. Given an unknown signal x0 =

47

(x01, · · · , x0n) ∈ ±1n we observe, for i < j,

Yij =λ

nx0ix0j +Wij , (2.1.1)

where Wij ∼ N(0, 1/n) are independent random variables and λ ≥ 0. We can form amatrix Y ∈ Rn×n by letting Yji = Yij and assigning Yii arbitrarily. A convenient choice isto take Yii = (λ/n) +Wii with Wii ∼ N(0, 2/n), whence the above model can be rewrittenin matrix notation as

Y =λ

nx0x

T0 +W , W ∼ GOE(n) . (2.1.2)

Of course, this is in no way the most general form of the Z2 synchronization problem. Inparticular, we could consider more general noise models defined by a transition probabilitykernel Q (a ‘noisy channel’), which in this case is uniquely defined by two probabilitymeasures Q(·|+ 1) and Q(·| − 1) on a measurable space X . In this more general setup, foreach edge (i, j) we observe

Yij ∼ Q( · |x0ix0j) , (2.1.3)

and these observations are conditionally independent given x0.We notice in passing that —if the transition kernel Q is known— there is no loss of

generality in assuming the observations to be real-valued, i.e. Yij ∈ X = R. Indeed,without loss of generality Q( · | + 1), Q( · | − 1) Q0 for some measure Q0 on X (takeQ0( · ) = Q( · |+ 1) +Q( · | − 1)). The likelihood of the observations is

L(Y |x0) =∏i<j

(dQ(Yij |+ 1)

dQ(Yij | − 1)

)x0ix0j2

√dQ(Yij |+ 1)

dQ0· dQ(Yij | − 1)

dQ0.

where we wrotedQ(Yij |+1)dQ(Yij |−1) for the ratio

dQ(Yij |+1)dQ0

/dQ(Yij |−1)

dQ0. Thus, by the Fisher-Neyman

factorization theorem [LC06], the variables

Yij = log(dQ(Yij |+ 1)

dQ(Yij | − 1)

), i < j ≤ n (2.1.4)

are sufficient statistics for x0 given Y . We can thus assume without loss of generality thatour observations are real valued.

We return to the study of the model (2.1.1), which observes the signal with additiveGaussian noise. For maximum likelihood estimation, we solve

maximize 〈Y ,σσT〉 ,subject to σ ∈ ±1n.

(2.1.5)

To carry out Bayesian inference, we assign a prior on x0 which is uniform on +1,−1n,and form the posterior distribution

µBayes(σ) =1

ZBayes(λ)exp

− n

4

∥∥∥Y − λ

nσσT

∥∥∥2

F

(2.1.6)

=1

ZBayes(λ)exp

λ2〈Y ,σσT〉

. (2.1.7)

48

As in the previous chapter, this model enjoys a large invariance group. Namely, it isequivariant under multiplications x0 7→ Dx0 with D a diagonal matrix with +1,−1entries on the diagonal. The uniform prior is the only invariant prior under this group andtherefore, by the Hunt-Stein’s theorem [LC06], the Bayes estimator with respect to thisprior is also minimax optimal.

As for the spiked tensor model, to study these estimation procedures in one unifiedframework, we introduce the following probability measure over σ ∈ +1,−1n:

µβ(σ) =1

Zn(β, h)exp

β2〈Y ,σσT〉+ h〈σ,x0〉

, (2.1.8)

where β > 0 is the inverse temperature and Zn(β, h) is the partition function. The caseh = 0, λ = β corresponds to Bayes estimation, while we recover maximum likelihood forh = 0, β →∞.

2.1.2 Extremal cuts of random graphs

The Sherrington-Kirkpatrick model is intimately connected to some canonical optimiza-tion problems on sparse random graphs. The most direct connection is with the max-cutproblem. Given a graph G = (V,E) max-cut seeks to partition the vertices V into twogroups (not necessarily of equal size) such that the number of edges connecting the twoparts is maximized. This problem is known to be NP-hard in the worst case, and indeedhard to approximate [PY91, ABE+05].

The analysis of this problem on random instances provides a concrete benchmark forpractical algorithms, and has directly motivated the development of novel algorithms. Twostandard random graph distributions are the Erdos-Renyi ensemble GER(n, d/n) and therandom regular graph GReg(n, d).

An Erdos-Renyi random graph G ∼ GER(n, d/n) has n vertices, and is generated byconnecting each pair of vertices i, j by an edge independently with probability d/n. Wewill consider the sparse case in which d > 0 is a constant independent of n. In this model,the degree of a randomly chosen vertex converges in distribution to Poisson(d) as n→∞.In particular, d is the average vertex degree.

On the other hand, a random regular graph G ∼ GReg(n, d) is constructed by a uniformdraw among all d-regular (simple) graphs on n vertices. (The set of such graphs is non-empty if nd is even.)

Let MaxCut(G) denote the number of edges in the max-cut of the graph G. The nextresult connects the typical value of the max-cut on Erdos-Renyi and random regular graphsto the “ground state energy” of the Sherrington-Kirkpatrick model.

We begin by defining a constant P∗ as follows:

P∗ = limn→∞

1

2nE[

maxσ∈±1n

〈σ,Wσ〉], W ∼ GOE(n) .

Note that the optimization problem on the right-hand side corresponds to finding the modeof the distribution (2.1.8) in the special case λ = h = 0.

The existence of the constant P∗ follows directly from the analyses presented in thesubsequent sections, and a variational characterization of this constant is given by theParisi formula (see Section 2.7 and references provided there).

49

Theorem 5 ([DMS17]). Let Gn be distributed as GER(n, d/n) or GReg(n, d). Then wehave,

MaxCut(Gn)

n=d

4+ P∗

√d

4+ od(

√d). (2.1.9)

(We say that a sequence of random variables satisfies Xn = od(√d) if there exists a

deterministic function f(d) = o(√d) such that P[|Xn| ≤ f(d)]→ 1 as n→∞.)

Before providing a heuristic argument to justify Theorem 5, let us interpret this result.Given any graph G, a naive randomized strategy is to assign the vertices into two groupsuniformly at random. This strategy cuts about half of the edges in expectation. AsE[|E|] = nd/2 + on(n), the expected cut size for this strategy is nd/4 + on(n), and theactual cut size is tightly concentrated around the expectation1. We thus obtain the lowerbound

MaxCut(Gn)

n≥ d

4+ on(1) . (2.1.10)

which explains the d/4 term appearing in (2.1.9). The√d-term captures the effect of

optimization. The theorem states that—to leading order—optimizing the cut size in arandom graph is equivalent (from the point of view of the value achieved) to optimizingthe random quadratic form 〈σ,Wσ〉 over the hypercube σ ∈ ±1n (this coincides withoptimizing the SK Hamiltonian).

Next, we provide a heuristic justification for this correspondence. We assume, to bedefinite, Gn ∼ GER(n, d/n). Let A = (Aij)i,j≤n denote the adjacency matrix of Gn;formally, Aij = 1(i, j ∈ E). We have

P(Aij = 1) = 1− P(Aij = 0) =d

n.

Observe that we can encode a vertex partition via a vector σ ∈ ±1n: the verticeswith σi = +1 form one group, while vertices with σi = −1 form the complementary group.Let cut(σ) denote the number of cut-edges in this partition, i.e. edges i, j ∈ E withσiσj = −1. We observe that

cut(σ) =1

2

∑i<j

Aij(1− σiσj) . (2.1.11)

This implies

MaxCut(Gn)

n=

1

2nmaxσ∈±1

∑i<j

Aij(1− σiσj)

= maxσ∈±1

[ d

2n2

(n2

2−(∑

i

σi

)2)+

1

2n

∑i<j

(Aij −

d

n

)(1− σiσj)

]=√d · max

σ∈±1

[√d2n2

(n2

2−(∑

i

σi

)2)− 1

2n√d

∑i<j

(Aij −

d

n

)σiσj

]+ on(1),

(2.1.12)

1This can be proven by computing the variance, or using a bounded difference martingale argument.

50

where the last display follows from Chebychev inequality. Indeed, for any ε > 0,

P[∣∣∣∑

i<j

(Aij −

d

n

)∣∣∣ > εn]≤

E[(∑

i<j

(Aij − d

n

))2]ε2n2

≤ d

2ε2n→ 0.

Returning to Eq (2.1.12), we make two observations: (i) The first term in the optimizationproblem is deterministic, while the second term is random; (ii) The random variablesAij− d

n : i < j are independent, with mean zero and E[(Aij− dn)2] = d

n +on(1). Theorem5 is proven by establishing the following:

(i) At the cost of a od(√d) term, we can replace the Aij − d

n variables by independentGaussians with mean zero and matching variances. Formally, one can show that

(recall thatd= denotes equality in distribution):

MaxCut(Gn)

n

d= maxσ∈±1

[ d

2n2

(n2

2−(∑

i

σi

)2)−√d

2n

∑i<j

Wijσiσj

]+ od(

√d).

(2.1.13)

(ii) In (2.1.13), the Gaussian optimization term will yield a term of order√d, whereas

the first deterministic term is of order d. Thus for large d > 0, the optimum value isapproximately obtained by first optimizing the deterministic part, and subsequentlyoptimizing the random part subject to this constraint. In this specific setting, notethat the deterministic part is optimized by setting

∑i≤n σi = 0; incorporating this

constraint, we wish to solve

maximize∑i<j

Wijσiσj , (2.1.14)

subject to σ ∈ ±1n,n∑i=1

σi = 0. , (2.1.15)

(iii) A direct application of Gaussian concentration (see Appendix A) establishes that theoptimal value above concentrates around its expectation. Finally, it is relativelystraightforward to see that the constraint

∑ni=1 σi = 0 can be dropped without

changing significantly the value of this optimization problem.

This completes a sketch of the proof of Theorem 5.The techniques outlined above can be used for analyzing several related graph cut

quantities on random graphs. Recall the definition of cut(σ) from (2.1.11). For a graph Gwith adjacency matrix A = (Aij)i,j≤n, we introduce,

MaxBis(G) = maxσ∈±1n:

∑i σi=0

cut(σ),

MinBis(G) = minσ∈±1n:

∑i σi=0

cut(σ).

We refer to these as the max-bisection and min-bisection of G: they are the extremalsizes of cuts that partition the graph into equal components. The next result collects theanalogue of Theorem 5 for these quantities.

51

Theorem 6 ([DMS17]). Let Gn be distributed as GER(n, d/n) or GReg(n, d). Then wehave,

MaxBis(Gn)

n=d

4+ P∗

√d

4+ od(

√d), (2.1.16)

MinBis(Gn)

n=d

4− P∗

√d

4+ od(

√d). (2.1.17)

Remark 2.1.1. Based on the cavity method, [ZB10] conjectured that

MinBis(GReg(n, d)) + MaxCut(GReg(n, d)) = on(n).

Combining (2.1.9) and (2.1.17), we have,

MinBis(GReg(n, d)) + MaxCut(GReg(n, d)) = nod(√d).

This is significantly weaker than the claim, but provides some supporting evidence to theveracity of this conjecture.

We finally note that the connection between optimization with random instances andspin glass models is significantly deeper than what is sketched here. In particular, gener-alizations of the techniques developed in the next sections exist for random sparse graphs.These allow, to cite just one example, to derive exact (albeit non-rigorous) predictions forlimn→∞MaxCut(Gn)/n in the above graph models, at fixed d. We refer to [MM09] for anentrypoint to that literature.

2.2 The replica symmetric asymptotics

2.2.1 The replica calculation

In this section, we initiate the study of the partition function using the replica method andstudy the replica symmetric asymptotics. For the sake of simplicity we will focus on thecase h = 0, and drop the arguments of the partition function Zn = Zn(β, h = 0). Writingexplicitly the dependence of Y on x0 and the random matrix W , cf. Eq. (2.1.2), we have

Zn =∑

σ∈±1nexp

βλ2n〈x0,σ〉2 +

β

2〈W ,σσT〉

. (2.2.1)

Recall that x0 is uniformly distributed on ±1n and W = (G +GT)/√

2n, where G =(Gij)n×n is a square matrix with i.i.d. N(0, 1) entries. We will compute E[Zn(β)r] forinteger r, then send n → ∞, followed by r → 0 to evaluate the free energy in the replicamethod.

Taking expectations with respect to the randomness of x0 and W , we have,

E[Zrn] =1

2n

∑σ0,σ1,··· ,σr

expβλ

2n

r∑a=1

〈σ0,σa〉2E exp

β√2n〈G,

r∑a=1

σa(σa)T〉

=1

2n

∑σ0,··· ,σr

expβλ

2n

r∑a=1

〈σ0,σa〉2 +β2

4n

∥∥∥ r∑a=1

σa(σa)T∥∥∥2

F

=

1

2n

∑σ0,··· ,σr

expβ2rn

4+βλ

2n

r∑a=1

〈σ0,σa〉2 +β2

2n

∑1≤a<b≤r

〈σa,σb〉2.

52

We note that the summands depend on the replicas σ0, · · · ,σr only via their overlapsQab = 1

n〈σa,σb〉. We are therefore in a situation similar to the one encountered whenapplying the replica method to the tensor PCA model. We need to estimate an integral(or sum) over a high-dimensional space (in this case +1,−1nr) and the integrand onlydepends on a low-dimensional function on that space (in the present case r(r − 1)/2-dimensional). This sum can be estimated by large deviations tools.

In the present case, the calculation is simplified by the following elementary observation.For x ∈ R,

ex2/2 =

∫R

exp(− q2

2+ xq

) dq√2π.

This is often referred to as the ‘Gaussian disintegration trick’ or as the ‘Hubbard-Stratonovichtransform’ in the physics literature. An application of this trick for each overlap linearizesthe expectation and we obtain,

E[Zrn] =

∫1

2n

∑σ0,··· ,σr

expnC(Q) + β2

∑1≤a<b≤r

Qa,b〈σa,σb〉+ βλr∑

a=1

Q0,a〈σ0,σa〉DQ,

C(Q) =β2r

4− β2

2

∑1≤a<b≤r

Q2a,b −

βλ

2

r∑a=1

Q20,a,

where we use DQ to denote the measure

DQ =r∏

a=1

√βλn

2πdQ0,a

∏1≤a<b≤r

√β2n

2πdQa,b.

We define

zr(Q) =1

2

∑σ0,··· ,σr∈±1

exp[βλ

r∑a=1

Q0,aσaσ0 + β2

∑1≤a<b≤r

Qa,bσaσb].

We define the “action” functional S(Q) = C(Q) + log zr(Q) and note that, by Laplacemethod:

E[Zrn] =

∫exp

nS(Q)

DQ = exp

nmax

QS(Q) + o(n)

. (2.2.2)

Thus, using the replica method, we expect the asymptotic free energy density to begiven by

φ = limr→0

1

rS(Q∗) ,

∂S

∂Q

∣∣∣∣Q∗

= 0 , (2.2.3)

S(Q) =β2r

4− β2

2

∑1≤a<b≤r

Q2a,b −

βλ

2

r∑a=1

Q20,a + log zr(Q) . (2.2.4)

Note that we replaced maxQ S(Q) which appears in Eq. (2.2.2) with S(Q∗) where Q∗ isonly required to be a stationary point of S(Q). Indeed, as we saw in the previous chapter,in the limit r → 0, the relevant stationary point is often not a maximum.

53

We will begin by evaluating the action functional on the replica symmetric saddle point.This saddle point is again motivated by symmetry considerations, in that the action S( · )is invariant to a permutation of the indices 1, · · · , r. Thus in this case, we consider

QRSa,b =

1 if a = b

b if a = 0, b ∈ 1, · · · , rq o.w.

Notice that the diagonal elements are once again set by the condition 〈σa,σa〉/n = 1. Wewish to evaluate S(QRS). To this end, we have,

zr(QRS) =

1

2

∑σ0,··· ,σr∈±1

exp[βλb

r∑a=1

σ0σa + β2q∑

1≤a<b≤rσaσb

]

=∑

σ1··· ,σr∈±1

exp[βλb

r∑a=1

σa +β2q

2

( r∑a=1

σa)2− β2qr

2

]

= E[ ∑σ1,··· ,σr

exp[βλb

r∑a=1

σa + β√qg

r∑a=1

σa − β2qr

2

]]= exp

[− β2qr

2

]E[(2coshβ(λb+

√qg))r

],

where g ∼ N(0, 1). Finally, as r → 0, we have,

1

rlog zr(QRS) = −1

2β2q + E log[2coshβ(λb+

√qg)] + o(r).

Further, we have, at the replica symmetric saddle point,∑r

a=1Q20,a = rb2 and

∑1≤a<b≤rQ

2a,b =

− r2q

2 +O(r2). Plugging these back into S(Q), we obtain the replica-symmetric free energyfunctional

limr→0

1

rS(QRS) = ΨRS(b, q) , (2.2.5)

ΨRS(b, q) =1

4β2(1− q)2 − 1

2βλb2 + E[log 2cosh(β(λb+

√qg))]. (2.2.6)

For sanity check, we note that in case we plug in b = q = 0, we obtain ΨRS(0, 0) =β2/4 + log 2. The stationary point b = q = 0 corresponds usually to the ‘annealed’ freeenergy density limn→∞ n

−1 logE[Zn(β)]. Indeed it is easy to check that this quantitycoincides with β2/4 + log 2.

As in the previous chapter, we look for stationary points of (b, q) 7→ ΨRS(b, q) to derivethe replica symmetric free energy density. To this end, we study the stationarity conditions

∂ΨRS(b, q)

∂q=β2q

2− β2

2E[tanh2(β(λb+

√qG))] = 0,

∂ΨRS(b, q)

∂b= −βλb+ βλE[tanh(β(λb+

√qG))] = 0,

where G ∼ N(0, 1) and the first equation is derived by an application of Gaussian integra-tion by parts. This implies the following system of fixed point equations for b, q:

b = E[tanh(β(λb+√qG))],

q = E[tanh2(β(λb+√qG))]. (2.2.7)

54

Before we start analyzing the phase diagram resulting from the replica symmetricansatz, we discuss the interpretation of the parameters b, q, ant their connection withthe marginal distribution of each spin σi under the measure µβ. Set xβ(Y ) =

∑σ σµβ(σ),

the expectation under the measure µβ. Note that, for any β, xβ(Y ) is a valid estimator ofx0, and it coincides with the Bayes estimator for β = λ, and with the maximum likelihoodestimator for β →∞. We will write xβ = xβ(Y ) for simplicity.

Recall from the previous chapter that b is interpreted as the asymptotic overlap of xand x0 (equivalently, this is the expected overlap between σ ∼ µβ and x0), while q shouldbe the overlap of x with itself (equivalently, this is the expected overlap between σ1 andσ2, where σ1,σ2 ∼iid µβ). In formulas

b = limn→∞

1

nE[〈xβ,x0〉] , (2.2.8)

q = limn→∞

1

nE[‖xβ‖22] . (2.2.9)

Without loss of generality, we can set x0 = 1. In this case, using the exchangeability ofthe coordinates of σ, we can rewrite b = limn→∞ E[xβ,1] and q = limn→∞ E[x2

β,1].

Note that µβ is a random distribution and consequently µβ,i( · ), the marginal distribu-tion of σi under µβ is a random distribution supported on +1,−1. We parametrize thedistribution as

µβ,i(σi) =eβhiσi

2 cosh(βhi). (2.2.10)

The quantity hi is referred to as the ‘effective field’ on the spin σi. Under this parametriza-tion, we have xβ,1 = tanh(βh1) and therefore

b = limn→∞

E[tanh(βh1)], q = limn→∞

E[tanh2(βh1)].

Comparing with the fixed point equation system (2.2.7), these equations suggest that thelaw of h1 converges to N(λb, q) as n→∞. With the replica method, this can be confirmedby computing other moments E[tanh`(βh1)]. This heuristic will be extremely useful in thediscussion of the replica symmetric cavity method in Section 2.3.

Summarizing, the replica symmetric ansatz corresponds to the following picture of themeasure µβ:

• Under σ ∼ µβ, the coordinates of σ = (σ1, . . . , σn) are roughly independent (in thesense of finite-dimensional distributions).

• The marginals of this measure can be parametrized as in Eq. (2.2.10).

• The effective fields hi are roughly i.i.d. (in the sense of finite-dimensional distribu-tions) with common law N(λb, q).

• The parameters b, q solve the fixed point equations (2.2.7).

We will see that the replica symmetric picture is correct at high temperature (and inparticular on the line β = λ), but in general it is incorrect. As in the previous chapter,replica symmetry breaking will be needed to get the correct behavior.

55

2.2.2 The replica symmetric phase diagram

We next study the solutions of the fixed point equations (2.2.7) and their dependence on(β, λ). When multiple solutions exist, the asymptotic free energy density, and the qualita-tive behavior of µβ are determined by the solutions that maximize the replica symmetricfree energy functional ΨRS with respect to b and minimize it with respect to q.

Qualitatively different solutions correspond to qualitatively different behaviors of µβ(phases):

1. Uninformative/paramagnetic phase. bP = qP = 0 is always a solution to the fixedpoint equations (2.2.7). For (λ, β) sufficiently small, this is the only solution, and thesystem is said to be in the paramagnetic phase. According to the interpretation of b, qin the previous section, we expect limn→∞ n

−1E[〈xβ,x0〉] = 0, limn→∞ n−1E[‖xβ‖2] =

0, and the effective fields are hi ≈ 0. In other words, the Gibbs measure µβ is roughlyuniform (in the sense of the finite-dimensional marginals), and uncorrelated with thesignal x0. In this phase, the free energy density coincides with the annealed one

ΨRS(bP , qP ) = β2

4 + log 2.

2. Spin glass phase. For λ sufficiently small, and β sufficiently large, another fixed pointbSG = 0, qSG > 0 emerges and the system is referred to be in the spin-glass phase. Inthis phase, limn→∞

1nE[〈xβ,x0〉] = 0 while limn→∞

1nE[‖xβ‖22] = qSG > 0. In words,

the Gibbs measure is far from uniform, since its barycenter xβ is far from zero. Onthe other hand, it is uncorrelated with the true signal x0.

The cavity fields are asymptotically i.i.d. hi ∼ N(0, q∗). Hence the finite dimensionalmarginals of the Gibbs measure have approximate product form but are not uniform,and carry no information about x0.

The free energy prediction is

ΨRS(0, qSG) =β2

4(1− qSG)2 + E[log 2 cosh(β

√qSGG)],

qSG = E[tanh2(β√qSGG)].

By studying the last equation, it is possible to see that such a solution exists forβ > 1, and indeed it is the dominant solution for all β > 1 and λ < λc(β), for someλc(β). We leave this calculation as an exercise for the reader.

3. Recovery/ Ferromagnetic phase. For λ sufficiently large, we have a fixed point bR >00, qR > 00. In this case, limn→∞ n

−1E[〈xβ,x0〉] = bR > 0 and thus the estimator ispositively correlated with the signal. The cavity fields have asymptotic distributionhi ∼ N(λbRx0,i, qR).

Remark 2.2.1. The reader has probably noticed that, strictly speaking, xβ(Y ) = 0identically because the Gibbs measure µβ of Eq. (2.1.8) is symmetric under σ → −σ forh = 0. We already encountered this problem in tensor PCA, see Section 1.1.2, and thesame considerations apply to the present case.

Summarizing, we can redefine xβ(Y ) by using the principal eigenvector v1(M) and

principal eigenvalue λ1(M) of the matrix of Mβ ≡∑σ σσ

Tµβ(σ), as x+β (Y ) =

√λ1(M)v1(M).

The estimation error of x+β (Y ) (for losses that are invariant under x+

β 7→ −x+β ) is expected

to be asymptotically the same as for xβ,h(Y ) (where the measure µβ is perturbed withh > 0), in the limit h→ 0 after n→∞.

56

0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00 2.250.0

0.2

0.4

0.6

0.8

1.0

(x,x

0/n

)2

Bayesc

Figure 2.1: Reconstruction accuracy in the Z2 synchronization model, as a function of thesignal-to-noise ratio λ. Blue line: asymptotic accuracy of the Bayes-optimal estimator as predicted

2.2.3 Bayes optimal and maximum likelihood estimation

For β = λ, the Gibbs measure coincides with the Bayes posterior. Using properties ofposterior expectation, it is easy to see that b∗ = q∗, where b∗ is determined as the solutionof the fixed point equation

b∗ = E[tanh(λ2b∗ +√λ2b∗G)]. (2.2.11)

The free energy density on this line is ΨBayes(b∗) where

ΨBayes(b) ≡λ2

4(1− 2b− b2) + E[log 2 cosh(λ2b+

√λ2bg)].

The asymptotic free energy in this case has been derived rigorously in [KM09] and thecorresponding estimation error was characterized in [DAM16].

By studying Eq. (2.2.11) the reader can show that a solution b∗ > 0 exists (and domi-nates the free energy) for λ > 1, while for λ ≤ 1, the only solution to this equation is b∗ = 0.Since we are studying the Bayes-optimal estimator, the phase transition at λBayes

c = 1 hasthe interpretation of a fundamental statistical barrier for any estimator:

• For λ ≤ λBayesc no estimator can achieve a positive correlation with the signal, i.e.

limn→∞ n−1E[〈x,x0〉] = 0 for any estimator x.

• For λ > λBayesc the optimal estimator achieves a positive correlation with the signal,

i.e. limn→∞ n−1E[〈xBayes,x0〉] > 0 (with xBayes = xβ=λ the Bayes estimator).

A phenomenon of this type is sometimes referred to as a ‘weak-recovery threshold.’ Wenote in passing that —in this specific model— the same threshold is achieved by a simplerprincipal component analysis (PCA) estimator that computes the leading eigenvector of Y .However, the PCA estimator does not achieve the optimal correlation above the thresholdλBayesc .

57

On the Bayes line, the vector of cavity fields is asymptotically distributed as h =λb∗x0 +

√b∗ g for g ∼ N(0, In). Letting y ≡ h/(λb∗) and τ ≡ (λ2b∗)

−1/2, the marginals ofthe posterior read

µBayes,i(σi) ≈eλhiσi

2 cosh(λhi)=

1

z(yi)exp

− 1

2τ2(yi − σi)2

. (2.2.12)

In other words, the Bayes posterior is asymptotically equivalent (in the sense of finite-dimensional marginals) to that of a much simpler model in which we want to estimate avector x0 ∈ ±1n, observed under additive Gaussian noise, y = x0 + τg, g ∼ N(0, In).

How does the maximum likelihood estimator (MLE) xML = xβ=∞ (which solves Eq. (2.1.5))compare to Bayes estimation xBayes = xβ=λ? It is easy to show that, for specific metrics,maximum likelihood is strictly sub-optimal to the Bayes estimator. The simplest case isthe one of (normalized) matrix mean square error. Letting MBayes(Y ) ≡ E[x0x

T0 |Y ] =

Mβ=λ(Y ), and MML(Y ) ≡ xMLxTML = Mβ=∞(Y ) (cf. Remark 2.2.1 for the definition of

Mβ(Y )), we have:

MSEn(MBayes) ≡1

n2E∥∥x0x

T0 − E[x0x

T0 |Y ]

∥∥2

F

=

1

n2E∥∥x0x

T0 − MML(Y )

∥∥2

F

− 1

n2E∥∥MML(Y )− MBayes(Y )

∥∥2

F

≤ MSEn(MML)− 1

n2

(E∥∥MML(Y )

∥∥2

F

1/2 − E∥∥MBayes(Y )

∥∥2

F

1/2)2

≤ MSEn(MML)− (1− q∗,M )2 + on(1) .

In the last step we used the fact that n−2E∥∥MBayes(Y )

∥∥2

F

= q2∗,M + on(1). The replica

method predicts q∗,M = q∗ < 1, but for obtaining a gap between maximum likelihood andBayes, it is sufficient to show q∗,M < 1, which relatively easy to argue.

A much more subtle question is whether the MLE achieves the optimal weak recoverythreshold. We denote the MLE threshold as

λMLc ≡ infλ > 0 : lim

n→∞

1

nE[|〈xML(Y ),x0〉|] > 0. (2.2.13)

Interestingly, the replica symmetric calculation predicts λMLc > 1. On the other hand,

taking into account replica symmetry breaking yields λMLc = 1. A complete proof that

λMLc = 1 is at present an open problem. (See [JMRT16] for further discussion.)

2.3 The replica symmetric asymptotics: cavity approach

In this section, we rederive the results of the previous section using a different approach:the cavity method. The cavity method offers an alternative route to derive the resultsof the replica method. While still non-rigorous— it does not make use of formal tricks(number of replicas going to zero) and its assumptions are more explicit. As for thereplica method, it provides a hierarchy of approximations, that are referred to as replicasymmetric (RS), one-step replica-symmetry breaking (1RSB), and so on, matching theanalogous hierarchy in the replica method. This terminology is sometimes misleading fora newcomer, since replicas do not really appear in the cavity method, and assumptions arein terms of probabilistic dependency structures, rather than RSB patters.

We will emphasize the key assumptions as items C1, C2, and so on.

58

We want to compute the free energy density, φ ≡ limn→∞1nE logZn(β, λ), where we

recall the partition function Zn(β, λ) from Eq. (2.2.1). The cavity method starts with thefollowing simple observation

1

nE logZn(β, λ) =

1

n

n−1∑i=0

Ai, Ai = E[

logZi+1(β, λ)

Zi(β, λ)

].

Now, elementary considerations about Cesaro averages imply that φ = limn→∞An, pro-vided the limit exists. The cavity method seeks to compute this limit. Physically, An isthe expected change in free energy of the system when a single spin is added to a modelwith n spins. The basic idea is to write this change in terms of the Gibbs measure of then-spins model. It is therefore crucial to characterize the marginals of this measure.

2.3.1 The distribution of cavity fields

Recall the Gibbs measure of the n-spins system, cf. Eq. (2.1.8), which we rewrite here inthe case h = 0, for the reader’s convenience

µ(σ) =1

Znexp

∑i<j≤n

Yijσiσj

. (2.3.1)

For any A ⊂ [n], we define σA = σi : i ∈ A. For any two subsets A,B of [n] withA∩B = ∅, we define νA→B(σA) to be the marginal of the σA spins in the modified modelwhere the σB variables have been “removed”. Formally:

νA→B(σA) ∝∑

σi:i∈Bc\A

expβ

∑i<j:i,j∈Bc

Yijσiσj

.

In physics language, we are creating a ‘cavity’ by removing the spins σB = σi : i ∈ Bfrom the system: this image is at the origin of the name of the method. In the specialcase in which B = j is a single vertex, we will write νA→j = νA→j, and analogouslyif A = i. We also denote by µB the marginal distribution of σB when σ ∼ µ. We willrefer to νA→B as a ‘cavity marginal’ and to µB as a ‘marginal’.

We can express ordinary marginals in terms of cavity marginals. For instance, consid-ering the case B = i, we have

µi(σi) =1

z′i

∑σ[n]\i

expβ

∑k<l:k,l∈[n]\i

Yklσkσl + βσi∑j∈[n]\i

Yijσj

=

1

z′′i

∑σ[n]\i

ν[n]\i→i(σ[n]\i) expβσi

∑j∈[n]\i

Yijσj

. (2.3.2)

(Here and in the following lines, we will introduce normalization constants denoted by z,z′, etc, which are implicitly defined by the normalization condition of quantities on theleft-hand side.)

We note that the pairwise interactions among the spin variables σk, σl under ν[n]\i→iare of order Ykl = O(1/

√n), and thus it is reasonable to assume that the entries are

approximately independent.

C1 We assume that a negligible error is made in replacing ν[n]\i→i(σ[n]\i) by the productof marginals

∏l∈[n]\i νl→i(σl) in Eq. (2.3.2).

59

The cavity method assumes this explicit independence and thus posits that

µi(σi) =1

zi

∑σ[n]\i

∏l∈[n]\i

νl→i(σl) exp[βσi

∑l∈[n]\i

Yilσl

]+ on(1). (2.3.3)

Now, we parametrize

µi(σi) =eβhiσi

2 cosh(βhi)νi→j(σi) =

eβhi→jσi

2 cosh(βhi→j), (2.3.4)

where hi→j is usually referred to as the cavity field. Using Eq. (2.3.3), we get

hi =1

2βlog[ ∑

σl:l∈[n]\i exp[βYilσl + βhl→iσl]∑σl:l∈[n]\i exp[−βYilσl + βhl→iσl]

]+ on(1)

=1

∑l∈[n]\i

log[ cosh[βYil + βhl→i]

cosh[−βYil + βhl→i]

]+ on(1) (2.3.5)

=∑l∈[n]\i

u(Yil, hl→i) + on(1),

where we defined

u(Y, h) =1

βatanh

[tanh(βY ) tanh(βh)

]. (2.3.6)

(We used the trigonometric identities cosh(x+ y) = cosh(x) cosh(y) + sinh(x) sinh(y) andatanh(x) = 1

2 log 1+x1−x . Further, we set Yii = 0 without loss of generality. )

We can further simplify the above formula by using the fact that Yij = O(1/√n). By

Taylor expansion u(Y, h) = Y tanh(βh) +O(Y 3) and therefore

hi =∑l∈[n]\i

Yil tanh(βhl→i) + on(1). (2.3.7)

Note that the cavity fields (hl→i)l∈[n]\i are random variables independent of (Yil)l∈[n]\i(since they are functions of (Ykl)k,l∈[n]\i). Further recalling the definition of Y , we get

hi =λ

nx0,i

∑l∈[n]\i

x0,l tanh(βhl→i) +∑l∈[n]\i

Wil tanh(βhl→i) + on(1). (2.3.8)

We will consider, without loss of generality, x0 to be fixed (the reader can consider x0 = 1).

Let P(n)hi|Y −i be the conditional distribution of hi given (Ykl)k,l∈[n]\i. Since the Wil are

independent Gaussian random variables2, we get

dist(P

(n)hi|Y −i ,N(λbix0,i, qi)

)= on(1) , (2.3.9)

bi ≡1

n

∑l∈[n]\i

x0,l tanh(βhl→i) , (2.3.10)

qi ≡1

n

∑l∈[n]\i

tanh2(βhl→i) (2.3.11)

2This conclusion does not actually rely on Gaussianity, and follows more generally using the central limittheorem.

60

In other words P(n)hi|Y −i is approximately Gaussian with mean λbix0,i and variance qi. (The

mathematically minded reader can think of dist as a distance metrizing weak topology.)

Note that both bi and qi are averages over n terms. It is reasonable to think that theseterms concentrate, which is our next assumption.

C2 We assume that bi and qi concentrate around their expectations b(n)

and q(n):

b(n)

= E[x0 tanh(βh→)] , q(n) = E[tanh2(βh→)]. (2.3.12)

Here (x0, h→) is a random variable distributed as any one of the (x0,l, hl→i).

Note that the latter assumption is implied by independence of (hl→i)l∈[n]\i, but much

weaker than it. Under this assumption, the conditional distribution P(n)hi|Y −i is essentially

independent of Y −i, and hence we conclude that the unconditional distribution P(n)hi

is alsoapproximately normal:

dist(P

(n)hi,N(λb

(n)x0,i, q

(n)))

= on(1) . (2.3.13)

At this point we need to ‘close’ the equations (2.3.12) for b(n)

, q(n) which parametrizethe distribution of the effective fields. To this hand, we begin by noting that the distributionof the cavity fields h→ in the system with n spins is essentially the same as the distributionof the effective fields in a system with n− 1 spins. Both quantities are parametrization ofsingle spin marginals as per Eq. (2.3.4), in a system with n − 1 spins. There is howeverone difference: for the system with n−1 spins, the mean and variance of the Yij are scaledby a factor 1/(n − 1) instead of 1/n. This amounts to a difference of order 1/n in λ andβ. It is reasonable to believe

C3 The distribution of the effective fields P(n)h has a limit as n→∞. Further, the limit

distribution depends continuously on β, λ.

Together, these assumptions imply that the distribution of h→ on the right-hand side

of Eq. (2.3.12) is approximately N(λb(n)x0, q

(n)). Therefore b(n)

, q(n) satisfy

b(n)

= E[x0 tanh(β(λx0b(n)

+

√q(n)G))] + on(1) ,

q(n) = E[tanh2(β(λx0b(n)

+

√q(n)G))] + on(1) .

where G ∼ N(0, 1). These equations imply that b(n) → b∗, q

(n) → q∗, where b∗, q∗ solve thelimit equations

b = E[tanh(β(λb+√qG))] , q = E[tanh2(β(λb+

√qG))] . (2.3.14)

Note that this is exactly the same set of equations derived from the stationary pointconditions of the replica symmetric free energy for this problem (2.2.7)! The presentderivation was longer, but the assumptions are transparent, and the Gaussian distributionappears more naturally.

61

2.3.2 The free energy density

Armed with the description of cavity fields derived in the previous section, we return tothe computation of the free energy as outlined in the beginning of the section. Recall thatwe wish to compute φ = limn→∞An, where An = E log[Zn+1(β, λ)/Zn(β, λ)]. We defineW =

√nW and without loss of generality, set x0i = 1. Thus we have,

Zn+1(β, λ) =∑

σ∈±1n+1

exp βλ

n+ 1

∑1≤i<j≤n+1

σiσj +β√n+ 1

∑1≤i<j≤n+1

Wijσiσj

.

Setting βn+1 = β√

n+1n , λn+1 = λ

√n+1n , direct computation yields

Zn+1(βn+1, λn+1) =∑σ

expβλn

∑1≤i<j≤n

σiσj +β√n

∑1≤i<j≤n

Wijσiσj + βσn+1

n∑i=1

Yi(n+1)σi

.

= Zn(β, λ)∑σ

ν[n]→(n+1)(σ[n]) expβσn+1

n∑i=1

Yi(n+1)σi

.

We assume that the distribution ν[n]→(n+1)( · ) factorizes as in point C1 above to get

Zn+1(βn+1, λn+1)

Zn(β, λ)=∑σ

n∏i=1

νi→(n+1)(σi) eβYi(n+1)σiσn+1 ·

(1 + on(1)

)=

∑σn+1

n∏i=1

coshβ(Yi(n+1)σn+1 + hi→(n+1))

cosh(βhi→(n+1))

· (1 + on(1)),

where we introduce the cavity fields as in (2.3.4). Next, using the trigonometric identitiescosh(x+ y) = cosh(x) cosh(y) + sinh(x) sinh(y) and sinh(σx) = σ sinh(x) for σ ∈ ±1, wehave,

Zn+1(βn+1, λn+1)

Zn(β, λ)=[ n∏i=1

cosh(βYi(n+1))] ∑σn+1

n∏i=1

(1 + σn+1 tanh(βYi(n+1)) tanh(βhi→(n+1)))

=[ n∏i=1

cosh(βYi(n+1))] ∑σn+1

exp[ n∑i=1

log(1 + σn+1 tanh(βYi(n+1)) tanh(βhi→(n+1))].

(2.3.15)

First, we note that each Yi(n+1) is O( 1√n

) and therefore, using Taylor expansion, we expect

n∏i=1

cosh(βYi(n+1)) =n∏i=1

exp[β2Y 2

i(n+1)

2+O

( 1

n3/2

)]= exp

[β2

2+ o(1)

].

Further, using Taylor expansion, we expect,

n∑i=1

log(1 + σn+1 tanh(βYi(n+1)) tanh(βhi→(n+1))) =

σn+1

n∑i=1

tanh(βYi(n+1)) tanh(βhi→(n+1))−1

2

n∑i=1

tanh2(βYi(n+1)) tanh2(βhi→(n+1)) + on(1).

= βσn+1

n∑i=1

Yi(n+1) tanh(βhi→(n+1))−β2

2E[tanh2(βhi→(n+1))] + on(1),

62

where we use the approximation that tanh(x) = x + O(|x|3) as x → 0 and E[Y 2i(n+1)] =

1/n + o(1/n). Recalling the equation for the effective field (2.3.7) and the fixed pointequations for q, b (2.3.14), we obtain

n∑i=1

log(1 + σn+1 tanh(βYi(n+1)) = βhn+1σn+1 −β2

2q + on(1) .

Plugging these back into (2.3.15), we obtain,

Zn+1(βn+1, λn+1)

Zn(β, λ)= exp

[β2

2(1− q)

] ∑σn+1

exp [βhn+1σn+1] ·(1 + on(1)

)(2.3.16)

= exp[β2

2(1− q)

]2 cosh(βhn+1→i) ·

(1 + on(1)

).

Taking expectations and using the fact that the law of hn+1 converges to N(λb∗, q∗), weobtain,

E logZn+1(βn+1, λn+1)

Zn(β, λ)=β2

2(1− q∗) + E[log 2 cosh(β(λb∗ +

√q∗G))] + on(1), (2.3.17)

where G ∼ N(0, 1).Finally, we need to evaluate log

[Zn(βn, λn)/Zn(β, λ)

]. To this end, we note that (for

µ = µn,β,λ)

Zn(βn, λn)

Zn(β, λ)=∑σ

µ(σ) exp

βnλn − βλn

∑i<j

σiσj +βn − β√

n

∑i<j

Wijσiσj

=∑σ

µ(σ)[1 +

βλ

n2

∑i<j

σiσj +β

2n3/2

∑i<j

Wijσiσj + on(1)].

Thus we have,

E logZn(βn, λn)

Zn(β, λ)= T1 + T2 + on(1),

T1 =βλ

n2

∑i<j

E[∑σ

µ(σ)σiσj

], T2 =

β

2n3/2

∑i<j

E[Wij

∑σ

µ(σ)σiσj ].

We analyze each term in turn. In order to estimate T1, we assume, as in point C1 above,that σi, σj are approximately independent under µ, and hence∑

σ

µ(σ)σiσj =(∑

σi

µi(σi)σi

)(∑σj

µj(σj)σj

)+ on(1)

= tanh(βhi) tanh(βhj) + on(1) .

Therefore

T1 =βλ

2· E( 1

n

n∑i=1

tanh(βhi))

+ on(1)

(a)=βλ

2· E

tanh(βh1)2

+ on(1)

(b)=βλ

2· E

tanh(β(λb∗ +√q∗G))

2+ on(1)

(c)=βλ

2· b2∗ + on(1) .

63

In step (a) we used the assumption that n−1∑n

i=1 tanh(βhi) concentrates around its expec-tation, as per point C2; in (b) we used the asymptotic characterization of the distributionof effective fields derived above; in (c) we used the fixed point equations (2.3.14).

Next, using Gaussian integration by parts, we obtain

T2 =β

2n3/2

∑i<j

E[∂∂Wij

∑σ

µ(σ)σiσj

]=

β2

2n2

∑i<j

E[1−

(∑σ

µβ(σ)σiσj

)2]. (2.3.18)

Again, assuming as per point C1 (and as in the calculation of T1) that σi, σj are approxi-mately independent under µ, and recalling the definition of effective fields, we get

T2 =β2

2n2

∑i<j

E[1−

(tanhβhi tanhβhj

)2]+ on(1)

=β2

2− β2

2E

(1

n

n∑i=1

(tanhβhi)2

)+ on(1)

(a)=β2

2− β2

2E(tanhβh1)22 + on(1)

(b)=β2

2

1− Etanh(β(λb∗ +

√q∗G))22+

+ on(1)

(c)=β2

2

(1− q2

∗)

+ on(1) .

Here we proceeded analogously to the computation of T1: in step (a) we used the assump-tion that n−1

∑ni=1 tanh2(βhi) concentrates around its expectation, as per point C2; in (b)

we used the asymptotic characterization of the distribution of effective fields derived above;in (c) we used the fixed point equations (2.3.14).

Finally, combining Eq. (2.3.17) with these results for T1, T2, we obtain

An = E logZn+1(β, λ)

Zn(β, λ)

= E logZn+1(βn+1, λn+1)

Zn(β, λ)− E log

Zn+1(βn+1, λn+1)

Zn+1(β, λ)

=β2

4(1− q∗)2 − βλ

2b2∗ + E[log 2 cosh(β(λb∗ +

√q∗G))] + on(1)

= ΨRS(b∗, q∗) + on(1) .

where ΨRS(b, q) is the replica symmetric free energy functional which we already derivedin Section 2.2.1.

Thus the cavity method successfully recovers the RS free energy density predictionφ = ΨRS(b∗, q∗), without relying on replicas, and instead making specific assumptions C1,C2, C3 about the structure of the Gibbs measure. As we will see in Section 2.6, theseassumptions can be modified to obtain RSB predictions.

2.4 Algorithmic questions

As we discussed above, one way to perform estimation is to compute the mean of themeasure µ, i.e.

xβ(Y ) ≡∑

σ∈+1,1nµn,β(σ)σ . (2.4.1)

64

We sweep under the carpet the technical nuisance that, in the present model, the meanon the right-hand side is zero for symmetry reasons. We assume throughout that thissymmetry is broken, e.g. by additional side information or by a spectral initialization.

Writing Eq. (2.4.1) does not solve the problem because evaluating the right-hand siderequires summing over exponentially many vectors σ. In this section we briefly discuss thequestion of computing this average via efficient algorithms.

2.4.1 From the cavity method to message passing algorithms

Computing the effective fields hii≤n is of course equivalent to computing the marginalsof the Gibbs measure µ, and is therefore sufficient to compute the mean vector xβ(Y ). Wesaw in turn that the effective fields are closely related to the cavity fields hi→j, and itis therefore natural to try to compute the latter. Indeed hi = h′i→0 where h′i→j are cavityfields in a fictitious model in which we added a variable σ0.

By repeating the argument to derive Eq. (2.3.5), we obtain a set of equations that—under similar assumptions— are approximately satisfied by the cavity fields, namely

hi→j =∑

l∈[n]\i,j

u(Yil, hl→i) + on(1) , (2.4.2)

u(Y, h) =1

βatanh

[tanh(βY ) tanh(βh)

]. (2.4.3)

Indeed note that Eq. (2.3.5), applied to the system from which σj has been removed,yields hi→j =

∑l∈[n]\i,j u(Yil, hl→i,j) + on(1), and the last display follows from the

approximation hl→i,j ≈ hi→j .It is useful to rewrite these equations for an Ising model on a general graph G = (V =

[n], E) (i.e., for the Gibbs measure (2.1.8) with Yij set to 0 whenever (i, j) 6∈ E). We have

xi→j =∑

l:(i,l)∈E,l 6=j

u(Yil, xl→i) , ∀(i, j) ∈ E . (2.4.4)

This provides a set of 2|E| equations for the 2|E| unknowns (xi→j : (i, j) ∈ E). Here wechanged our notation from hi→j to xi→j to emphasize the fact that we regard these asequations in the unknowns (xi→j : (i, j) ∈ E), whose solutions might (or might not) berelated to the actual cavity fields.

Two questions arise naturally: (i) What algorithms can be used to solve the systemof equations (2.4.4) in polynomial time? (ii) What is the relation between the solutions(or approximate solutions) found by such algorithms and the actual cavity fields (hi→j :(i, j) ∈ E)?

Given the form in which the equations (2.4.4) are written there is natural iterativealgorithm that we might hope to use:

x(t+1)i→j =

∑l:(i,l)∈E,l 6=j

u(Yil, x(t)l→i) , ∀(i, j) ∈ E . (2.4.5)

This is known as belief propagation (BP) or the sum-product algorithm [Gal62, WJ08,MM09]. Following [RU08], we refer to it as a ‘message passing algorithm’. This name

refers to the fact that we can think of each variable x(t)i→j as a message passed from vertex

i to vertex j.

65

Abstractly, we can think of it as defining a map FY ( · ) : R2|E| → R2|E|, for which theabove iteration can be written as

x(t+1) = FY (x(t)) , x(t) ≡ (x(t)i→j : (i, j) ∈ E) , (2.4.6)

Existence of fixed points of this recursion (and hence existence of solutions of Eqs. (2.4.4))

follows from Brouwer’s fixed point theorem (applied to the variables ui→j = u(Yij , x(t)i→j)).

Convergence and relation with the actual cavity fields are far more subtle questions.There is one case in which these questions have a simple answer, as stated below.

Proposition 2.4.1. If G = (V,E) is a finite tree, then the belief propagation iteration(2.4.5) converges to a unique fixed point x∗ = (x∗i→j : (i, j) ∈ E) in at most diam(G)iterations.

Further, at this fixed point, the variables x∗i→j coincide with the cavity field x∗i→j = hi→j,and the effective fields are given by hi =

∑l:(i,l)∈E u(Yil, x

∗l→i).

We leave the proof as an exercise for the reader [MM09]. If the graph G is not a tree,the last proposition can still be of use. One case of interest is the one of locally tree-likegraphs. We refer to the bibliography section for pointers to this literature.

2.4.2 Approximate Message Passing

We are interested in a setting that is seemingly opposite to the tree case described above.Namely, in our setting G = Kn is the complete graph over n vertices. We will see thatnevertheless, the iteration (2.4.5) (or its close relatives that we will develop here) producecorrect marginals.

We begin by noting that Yij = O(1/√n) and therefore we can linearize the function

u(Y, x) in Y , as we did in the previous sections. We thus replace the BP iteration by thefollowing simpler form:

x(t+1)i→j =

∑l:(i,l)∈E,l 6=j

Yil tanh(βx(t)l→i) .

Notice that the connection with the Gibbs measure (2.1.8) is encoded in the nonlinearfunction tanh(βx) appearing on the right-hand side. The following arguments do notdepend on this specific function and they are more transparent if we replace tanh(βx) bya generic function ft : R→ R (possibly dependent on the iteration number):

x(t+1)i→j =

∑l:(i,l)∈E,l 6=j

Yilft(x(t)l→i) .

If G is the complete graph, we can rewrite this iteration as (setting for simplicity Yii = 0for all i ∈ [n])

x(t+1)i→j =

n∑l=1

Yilft(x(t)l→i)− Yijft(x

(t)j→i) .

In other words, we can write x(t)i→j = x

(t)i + δ

(t)i→j , where δ

(t)i→j = O(1/

√n) and

x(t+1)i =

n∑l=1

Yilft(x(t)l→i) ,

δ(t+1)i→j = −Yijft(x(t)

j→i) .

66

Substituting x(t)i→j = x

(t)i + δ

(t)i→j on the right-hand side and Taylor expanding, we get

x(t+1)i =

n∑l=1

Yilft(x(t)l ) +

n∑l=1

Yilf′t(x

(t)l )δ

(t)l→i +O(1/

√n) ,

δ(t+1)i→j = −Yijft(x(t)

j ) +O(1/√n) .

Finally, using the second equation in the first one, we get

x(t+1)i =

n∑l=1

Yilft(x(t)l )−

(n∑l=1

Y 2il f′t(x

(t)l )

)ft−1(x

(t−1)i ) +O(1/

√n)

=n∑l=1

Yilft(x(t)l )− dtft−1(x

(t−1)i ) +O(1/

√n) ,

dt ≡1

n

n∑l=1

f ′t(x(t)l ) .

In the last step we used a central limit theorem heuristic to replace the sum∑n

l=1 Y2il f′t(x

(t)l )

by dt.If we drop the O(1/

√n) terms in the above iteration, and rewrite it in matrix notation,

we have derived the following iteration

x(t+1) = Y ft(x(t)l )− dtft−1(x(t−1)) , (2.4.7)

dt =1

n

n∑i=1

f ′t(x(t)i ) . (2.4.8)

Here it is understood that ft : R→ R acts entrywise on vectors, namely ft(x) ≡ (ft(x1), . . . , ft(xn)).We also note that the diagonal entries of Y have a negligible impact of this iteration (indeedYii ∼ N(λ/n, 2/n)) and therefore we include them. Also, by convention we set d0 = 0.

Regardless of the sequence of arguments that brought us to this iteration, it definesa perfectly reasonable class of algorithms. A specific algorithm in this class is defined bytwo sequence of functions ftt≥0 and gtt≥0 (see below). Spelling out the details, thealgorithm proceeds as follows:

1. Initialize x(0). The initialization can be taken to be a function of the data matrixY or of additional side information. An example of the first type is the spectralinitialization x(0) =

√nv1(Y ) (where v1(Y ) is the principal eigenvector of Y ). An

example of the second type arises if we observe y = εx0 + g for g ∼ N(0, In) and setx(0) = y.

2. Iterate the map (2.4.7) for a number t∗ of steps.

3. Output

x(t∗) = gt(x(t∗)) . (2.4.9)

Here again, it is understood that gt is applied to x(t∗) entrywise.

This family of algorithms is an example of an even broader class of algorithms known asApproximate Message Passing (AMP) algorithms.

Remarkably, AMP admits an exact characterization in the limit n → ∞, for any con-stant number of iterations.

67

Theorem 7 ([BM11]). Let F : R2 → R be any function that is locally Lipschitz and withat most polynomial growth |F (x)| ≤ C(1 + ‖x‖2)k for constants C, k. Further assume theinitial condition x(0) and true signal x0 to be deterministic and such that n−1

∑ni=1 δx0,i,x

(0)i

converges weakly and in k moments to the law of (X0, X(0)). Finally assume the functions

ft to be Lipschitz.Define the sequences at, qt, t ≥ 0, by letting, for t ≥ 0

at+1 = λE[X0ft(atX0 +√qtG)], qt+1 = E[ft(atX0 +

√qtG)2],

(X0, G) ∼ Unif(+1,−1)⊗ N(0, 1), (2.4.10)

with initialization a0 ≡ EX0f0(X(0)), q0 ≡ Ef0(X(0))2.Then we have, for each t ≥ 1, almost surely,

limn→∞

1

n

n∑i=1

F (x0,i,x(t)i ) = EF (X0, atX0 +

√qtG) , (2.4.11)

where expectation is with respect to G ∼ N(0, 1) independent of X0.

Proof idea. We will only sketch the main idea of the proof that uses a technique introducedby Bolthausen [Bol14], and subsequently generalized in [BM11] and follow up work.

For the sake of simplicity, we consider the case λ = 0, x(0) = 0, which implies at = 0for all t. For t = 1, we have, x(1) = W f0(0), which is a Gaussian vector with ap-proximately i.i.d. N(0, f0(0)2) entries. Indeed, for a fixed unit vector u ∈ Rn, we have

Wud=√

2/ng0u+ Pug/√n, with g0 ∼ N(0, 1) independent of g ∼ N(0, In).

If we try to use the same argument to characterizeW f1(x(1)) we run into some difficultyas x(1) is no-longer independent of W . Indeed it is a useful exercise to check that, for anon-constant function f1, the vector W f1(x(1)) is not well approximated by normal vectorwith i.i.d. entries. In general, at the t-th step, x(t) is not independent of W and thus thisproblem persists.

The difficulty is that x(t) is a complicated non-linear function ofW and hence there doesnot seem to be hope to obtain its distribution in closed form. Hower, things simplify if wetry to compute the conditional distribution ofW given x(1), · · · ,x(t). Note that conditionalon x(1), · · · ,x(t), f0(x(0)), · · · , ft(x(t−1)) are also given, and therefore, conditioning onx(1), · · · ,x(t) is equivalent to conditioning on

Et =W : x(1) = W f(x(0)), · · · ,x(t) + dt−1ft−2(h(t−2)) = W ft−1(h(t−1))

.

This is equivalent to conditioning on a set of linear measurements of a Gaussian vector W .Equivalently, we are interested in the conditional distribution of the Gaussian vector Wsubject to W ∈ Et an affine space. This is well known to be the conditional expectationplus the projection onto E0

t (the linear space corresponding to Et) of a fresh Gaussianvector. In formulas

W |Etd= E[W |x(0), · · · ,x(t)] + P⊥t W

newP⊥t , (2.4.12)

where W new is an independent copy of Y and P t is the orthogonal projector onto the linearspan of f0(x(0)), · · · , ft(x

(t−1)), and P⊥t = I − P t is its orthogonal complement. Thusconditioning on x(0), · · · ,x(t−1), we have

W ft(x(t))|Et

d= E[W |x(0), · · · ,x(t)] ft(x

(t)) +W newP⊥t ft(x(t))− P tW

newP⊥t f(x(t)) .

68

The term W newP⊥t ft(x(t)) is approximately Gaussian with independent entries of variance

‖P⊥t ft(x(t))‖22/n, sinceW new is independent of x(0), · · · ,x(t). The term P tWnewP⊥t f(x(t))

is small since it is a low-dimensional projection of a high-dimensional Gaussian.

It remains to track the E[W |x(0), · · · ,x(t)] ft(x(t)). This can be done by induction

showing that it can be approximately decomposed into a term that is canceled by thecorrection −dtft−1(x(t−1)) of Eq. (2.4.7), and a linear combination of x(1), . . . ,x(t), whichis Gaussian by induction hypothesis. We refer the reader to [BM11] for a formalization ofthese ideas and a rigorous proof.

Remark 2.4.1. The last theorem is an application of the more general theorem provedin [BM11], and substantial generalizations of the latter have been proved in the literature.These include non-symmetric matrices, non-Gaussian matrices, including matrices withindependent or non-identically distributed entries, algorithms with memory and so on. Werefer to the bibliography section for pointers to this literature.

The case of random initializations x(0) is covered by Theorem 7 as long as x(0) is inde-pendent of W , and the empirical distribution n−1

∑ni=1 δx0,i,x

(0)i

converges almost surely.

On the other hand, the case of a spectral initialization x(0) =√nv1(Y ) is not covered

by this theorem, since this initialization is correlated with the noise matrix W . On theother hand, as shown in [MV21], spectral initializations can be treated within the sameframework discussed here.

The iteration (2.4.10) for parameters at, qt is known as ‘state evolution,’ a term in-troduced in [DMM09]. Theorem 7 allows one to determine the accuracy of the estimatorxt produced by the algorithm, see Eq. (2.4.9). Choosing F (x0, x) = (x0 − gt(x))2 in thetheorem, we obtain,

limn→∞

1

n

∥∥xt − x0

∥∥2

2= E

(X0 − gt(atX0 +

√qtZ)

)2. (2.4.13)

The function gt that minimizes the right-hand side is gt(y) = EX0|atX0 +√qtZ = y.

Note that the resulting error only depends on the ‘signal-to-noise’ ratio bt = a2t /(λ

2qt), andtherefore we would like to choose functions f1, . . . , ft as to maximizes this quantity. (Thefactor λ2 is introduced for future convenience.)

A recursive argument shows that the optimal functions, and the corresponding quanti-ties (bs)s≥1 are given by

bt+1 = EX0E[X0|λbtX0 +

√btG]

,

ft(y) = E[X0|λbtX0 +√btG = y] .

Since X0 ∼ Unif(+1,−1), it is easy to compute these conditional expectations to get

bt+1 = E

tanh(λ2bt +√λ2btG)

, (2.4.14)

ft(y) = tanh(λy) . (2.4.15)

We therefore recover the nonlinearity ft(y) = tanh(βy) that was derived heuristically inthe last section, with the Bayes choice β = λ.

We proved that AMP with nonlinearity ft(y) = tanh(λy) is optimal among all thealgorithms of the form Eq. (2.4.7). We refer to this specific algorithm as ‘Bayes AMP,’because it appears to make, at each step, the Bayes-optimal choice. Note that the fixed

69

point condition corresponding to Eq. (2.1.7) coincides with the one that characterizes the(global) Bayes optimal estimator, Eq. (2.2.11). This points to a connection between thetwo.

This finding motivates two questions:

• Does Bayes AMP converge (in the large n and large t limits) to the Bayes optimalestimator?

• Fixing a number of iterations t, is Bayes AMP optimal (in the sense of statisticalaccuracy) among all algorithm with comparable computational complexity?

The answer is positive to both questions, provided a small amount of side information or aspectral initialization are used to break symmetry at the first iteration [DAM17, CMW20].Figure 2.1 illustrates these points by reporting the accuracy achieved by Bayes AMP andcomparing it with the optimal Bayes accuracy in the large n limit.

2.5 Replica symmetry breaking

We have derived the replica symmetric asymptotic free energy density of the model (2.1.8)using two alternative techniques —the replica method and the cavity method. In bothcases, we made specific assumptions that allowed us to complete the calculation. Withinthe replica method, we assumed the overlap matrix to take the replica symmetric form ofEq. (2.2.5). Within the cavity method, we assumed specific properties C1, C2, C3 of theGibbs measure.

On the other hand, we know that at low enough temperature (large β), the replica sym-metric prediction for the free energy density is necessarily incorrect. Indeed, as explainedin Section 1.4.1, if the prediction was correct, i.e. if we had φ(β, λ) = ΨRS(b∗, q∗;β, λ) atall temperatures (here we are restoring the dependence of these quantities upon β, λ), thiswould imply Ent(µβ,λ) < 0 which is impossible (Recall that Ent(µ) denotes the entropy ofprobability distribution µ).

In this section, we correct this inconsistency by introducing the full Replica SymmetryBreaking (fRSB) asymptotics for the free energy, originally due to Parisi [Par79a]. Parisiformula was eventually established rigorously thanks to the work of Talagrand [Tal06b],Panchenko [Pan14] and many other authors. In this section we will derive this formulausing the replica method. In Section 2.6, we will show how to modify the cavity methodto incorporate replica-symmetry breaking. Finally we will describe some rigorous resultsusing interpolation techniques in Section 2.7.

To simplify our treatment, we will focus on the case λ = 0, h = 0. This case is notvery interesting from the point of view of statistical estimation in the Z2-synchronizationmodel. Indeed, it corresponds to the case in which the data matrix Y = W is pure noise.However, the generalization to λ > 0 is relatively straightforward, and left as an exercisefor the reader.

2.5.1 k-step replica symmetry breaking

Recall the action functional derived by the replica method in Section 2.2.1. The replicamethod predicts the asymptotic free energy density to be given by φ = limr→0

1rS(Q∗),

70

where

S(Q) =β2r

4− β2

2

∑1≤a<b≤r

Q2ab + log zr(Q) , (2.5.1)

zr(Q) =∑

σ1,··· ,σr∈±1

expβ2

∑1≤a<b≤r

Qabσaσb,

and Q∗ is a suitable stationary point of S. Note that we slightly simplified this expressionwith respect to Eq. (2.2.4). Indeed, since we are assuming λ = 0, we can take Q0a = b = 0for all a ∈ 1, . . . , r and redefine Q to be an r × r matrix. For general λ, we would haveto keep track of b and optimize the action S with respect to b as well.

The RS stationary points QRS are the only ones that are symmetric with respect topermutations of the replicas 1, . . . , r. In the context of the p-spin model, we improvedon the RS free energy by introducing the 1RSB ansatz. The 1RSB subspace is defined by

partitioning 1, · · · , r = ∪r/m`=1 I` with |I`| = m and setting

Q1RSBab =

1 if a = b ∈ 0, 1, · · · , r,q1 if a 6= b ∈ I`, ` ∈ 1, · · · , r/m,q0 otherwise.

(2.5.2)

Substituting in S(Q), taking the limit r → 0 and optimizing over q0, q1 and m obtains the1RSB free energy. This calculation was originally carried out in [Par79b]. It yields a betterbehavior at low temperature, but predicted limβ→∞ limn→∞ n

−1Ent(µβ) to be approxi-mately −0.01, which is impossible. The same paper, however, suggested that iterating thesame construction infinitely many times would yield the correct stationary point3 of S.

Indeed the correct asymptotic free-energy is obtained by constructing a k-step RSBansatz: a subspace of overlap matrices with reduced symmetry, that can be obtained byiterating the 1RSB ansatz. The correct limiting free energy density can be obtained in thelimit k →∞. This more general calculation was carried out in [Par79a].

We construct a hierarchical partition of 1, · · · , r into blocks as follows. At the first

step, set 1, · · · , r = tr/m0

l1=1 Bl1 for some integer m0 ≥ 1, with |Bl1 | = m0. Next, partition

each Bl1 = tm0/m1

l2=1 Bl1l2 , with |Bl2 | = m1. We proceed iteratively, and finally partition

Bl1l2···lk−1= ∪mk−2/mk−1

lk=1 Bl1,··· ,lk with |Bl1,··· ,lk | = mk−1 for each l1, · · · , lk. Let J denotethe r× r matrix of all ones, and let JS denote the r× r matrix with ones on the |S| × |S|block indexed by S ⊂ 1, · · · , r and zero otherwise. Given the hierarchical partitionBl1,··· ,lm : m ≤ k and 0 ≤ q0 ≤ q1 ≤ · · · ≤ qk ≤ 1, the kRSB ansatz consists of matricesof the form

QkRSB =q0J + (q1 − q0)∑l1

JBl1 + (q2 − q1)∑l1l2

JBl1l2+

· · ·+ (qk − qk−1)∑

l1,··· ,lk

JBl1,··· ,lk + (1− qk)I .

A schematic representation of the kRSB saddle point is given in Figure 2.2.Before we attempt to study the kRSB free energy functional, it is useful to interpret

the kRSB ansatz in terms of the overlap distribution. This will also clarify the meaning of

3Apparently, this informal remark in [Par79b] was criticized by one of the referees of the manuscript.

71

q0q0

q0

q1 q1

q1

Figure 2.2: A schematic representation of the kRSB saddle point

parameters m0, · · · ,mk−1, q0, · · · , qk, and the range over which they should be optimized.Recall the replica analysis leading to (1.4.15), which predicts that if ρn is the distribution ofn−1|〈σ1,σ2〉| for σ1,σ2 i.i.d. samples from µn(·), then ρn ⇒ ρ, where, for any continuousbounded function f , ∫

[0,1]f(q) ρ(dq) = lim

r→0

1

r(r − 1)

∑a6=b

f(Q∗ab),

where Q∗ is the stationary point of S(Q) that achieves the correct free energy density.We next evaluate the last expectation on the kRSB stationary point. We have∑

a6=bf(Qab) =

(r2 − rm0

)f(q0) +

r

m0

(m2

0 −m0m1

)f(q1) + · · ·+ f(qk)

r

mk−1(m2

k−1 −mk−1).

(2.5.3)

Taking the limit r → 0, we obtain,∫[0,1]

f(q) ρ(dq) = m0f(q0) + (m1 −m0)f(q1) + · · ·+ (1−mk−1)f(qk),

which implies

ρ =

k∑l=0

(ml −ml−1)δql , mk = 1, m−1 = 0 . (2.5.4)

In words, the kRSB ansatz assumes that the asymptotic overlap distribution is formed byk + 1 point masses at ql, l ∈ 0, . . . , k, with probablities (ml −ml−1) respectively. Thisalso implies that we should take 0 ≤ m0 ≤ m1 ≤ · · · ≤ mk−1 ≤ 1 in the limit r → 0.

Viceversa, any kRSB saddle point QkRSB can be specified by a probability measureρ on the interval [0, 1] with k + 1 point masses. Equivalently, it can be specified by thedistribution function q 7→ ρ([0, q]). In physics, this is referred to as the “functional orderparameter.” We can therefore think of the RSB action as a function of the probabilitydistribution ρ.

72

Next, we turn to the computation of the kRSB free energy functional. We will encodethe parameters in vectors q = (q0, · · · , qk) and m = (m0, · · · ,mk−1). Note that usingf(q) = q2 in Eq. (2.5.3) above, we have,

limr→0

1

r

∑a<b

(QkRSBab )2 = −1

2

m0q

20 + · · ·+ (1−mk−1)q2

k

. (2.5.5)

Therefore, it remains to compute limr→0 r−1 log zr(Q

kRSB). To this end, we note that∑a6=b

QkRSBab σaσb =

∑a,b

QkRSBab σaσb − r

=q0

(∑a

σa)2

+ (q1 − q0)∑l1

( ∑a∈Bl1

σa)2

+ · · ·

+ (qk − qk−1)∑

l1,··· ,lk

( ∑a∈Bl1,··· ,lk

σa)2

+ (1− qk)r − r. (2.5.6)

Therefore, (2.5.1) implies,

zr(QkRSB) =

∑σ1,··· ,σr∈±1

expβ2

2

∑a6=b

QkRSBab σaσb

= e−β

2qkr/2 · T, (2.5.7)

T = E ∑σ1,··· ,σr

exp[β√q0g0

(∑a

σa)

+ βk∑

m=1

√qm − qm−1

∑l1,··· ,lm

gl1,··· ,lm

( ∑a∈Bl1,··· ,lm

σa)]

,

where expectation is with respect to g0, gl1,··· ,lm : l1, . . . , lm ∈ N,m ≤ k, a collection ofindependent N(0, 1) random variables. The above expression is obtained by applying the“Gaussian disintegration” trick to each term in the exponent (2.5.6), as to linearize theexponent with respect to the σa’s.

We next carry out the sum over σ1, . . . , σr. First, we fix a ∈ 1, · · · , r. Then thereexist unique l1, l2, · · · , lk such that a ∈ Bl1 , a ∈ Bl1l2 , · · · , a ∈ Bl1l2···lk . The terms involvingσa are ∑

σa

exp[βσa

(√q0g0 +

√q1 − q0gl1 + · · ·+

√qk − qk−1gl1l2···lk

)]= 2 cosh

[β(√

q0g0 +√q1 − q0gl1 + · · ·+

√qk − qk−1gl1l2···lk

)]. (2.5.8)

Further, for any fixed (l1, · · · , lk), there are exactly mk−1 many a ∈ 1, · · · , r such that σa

belongs to the same partitions Bl1 , · · · , Bl1l2···lk . These sums will give the same contributionas (2.5.8) and thus we obtain,

zr(QkRSB) = E

[ ∏l1,··· ,lk

(2 cosh

[β(√

q0g0 +√q1 − q0gl1 + · · ·+

√qk − qk−1gl1l2···lk

)])mk−1]

=: E[ ∏l1,··· ,lk

Zmk−1

l1,··· ,lk

].

Here E denotes expectation with respect to the random variables g0, gl1,··· ,lm : l1, . . . , lm ∈N, m ≤ k. This expectation can be computed recursively, by evaluating only k + 1 one-dimensional integrals as we next explain.

73

For any fixed l1, · · · , lk−1, (gl1l2···lk)lk≤mk−2/mk−1are i.i.d. random variables. Thus for

fixed l1, · · · , lk−1, we can first compute the expectation

Zmk−1

l1,··· ,lk−1:= Egl1···lk−1,lk

[Zmk−1

l1l2···lk−1,lk

]= Egl1···lk−1,1

[Zmk−1

l1l2···lk−1,1

],

where we note, that for each fixed l1, · · · , lk−1, this expectation is the same for all lk. Theexponent mk−1 was introduced on the left-hand side for future convenience. Indeed, thereare mk−2/mk−1 such terms, and thus we have

zr(QkRSB) = E

∏l1,··· ,lk−1

[Zl1,··· ,lk−1

]mk−2.

We repeat this operation iteratively to define Zmk−1

l1,··· ,lk−2= Egl1l2···lk−1

[[Zl1,··· ,lk−1

]mk−2]]

,

Zl1,··· ,lk−3and so on. Finally, we will obtain zr(Q

kRSB) = Eg0 [(Z0)r], where Z0 is only afunction of g0.

Summarizing, let g0, . . . , gk ∼ N(0, 1), and define the sequence of random variables(Zi)0≤i≤k via the recursion

Zk = 2 cosh[β(√

q0g0 +√q1 − q0g1 + · · ·+

√qk − qk−1gk

)],

Zi−1 = E[Zmi−1

i

∣∣g0, . . . , gi−1

],

F (q,m) = E logZ0 .

(2.5.9)

We note that each Zi is a function of g0, . . . , gi and the correspondence with the variablesdefined above is that Zl1...li is distributed as Zi for each i ≤ r. Thus we obtain

limr→0

1

rlog zr(Q

kRSB) = F (q,m) .

Finally, plugging everything back into (2.5.1), using (2.5.5) and (2.5.7), we obtain limr→0 r−1S(QkRSB) =

ΨkRSB(q,m), where

ΨkRSB(q,m) =β2

4(1− 2qk) +

β2

4

k∑l=0

(ml −ml−1)q2l + F (q,m). (2.5.10)

This completes the derivation of the kRSB free energy functional. As a sanity check, wecheck that in case k = 0, we recover the RS free energy. In this case, we set m0 = 1 andqk = q to get

ΨkRSB(q,m)∣∣∣k=0

=β2

4(1− 2q) +

β2

4q2 + E[log 2 cosh(β(

√qg))] = ΨRS(q) . (2.5.11)

Another case of interest is 1RSB. We set m = (m0,m1) with 0 ≤ m0 ≤ m1 = 1 andq = (q0, q1) with 0 ≤ q0 ≤ q1 ≤ 1. In this case, the function F reads:

F (q0, q1,m0) =1

m0Eg0[

log(Eg1[(

2 coshβ(√q0g0 +

√q1 − q0g1)

)m0])]

.

In the next sections, we will try to elucidate the meaning and origin of the kRSB freeenergy functional by describing other approaches towards its derivation (cavity and inter-polation methods). The fact that it yields the correct free energy density at all temperatureswas eventually proven by Talagrand [Tal06b] and Panchenko [Pan14].

74

Theorem 8 ([Tal06b]). Consider the SK model (2.1.1). Setting Zn(β) to be the partitionfunction of Eq. (2.2.1) at λ = 0, we have,

φ := limn→∞

1

nlogZn(β) = inf

k,q,mΨkRSB(q,m),

where the limit holds almost surely and in L1.

2.5.2 Taking the limit of full RSB

Given a kRSB order parameter q = (q0, · · · , qk) and m = (m0, · · · ,mk−1), it is easy to‘embed it’ in the (k + 1)RSB space, e.g.,

q′ = (q0, · · · , qk, qk+1 = 1), m′ = (m0, · · · ,mk−1,mk = 1) . (2.5.12)

We leave it to the reader to check that Ψ(k + 1)RSB(q′,m′) = ΨkRSB(q,m), whence

infq,m

Ψ(k + 1)RSB(q,m) ≤ infq,m

ΨkRSB(q,m) . (2.5.13)

In other words Ψ∗kRSB := infq,mΨkRSB(q,m) form a sequence of non-increasing upperbounds on the free energy density, and

φ = infk≥0

Ψ∗kRSB = limk→∞

Ψ∗kRSB . (2.5.14)

Two cases are possible:

• The infimum is achieved. Letting k denote the smallest integer such that Ψ∗kRSB =infk′≥0 Ψ∗k′RSB, the model (the Gibbs measure) is said be in a kRSB phase.

When this happens, most often k = 0 (RS) or k = 1, although models with any kcan be constructed.

• The infimum is not achieved. In this case the correct free energy density is onlyobtained by taking the limit k →∞. This scenario is known as full replica symmetrybreaking (fRSB).

Full RSB is believed to hold for the SK model at any β > 1, although a rigorousproof is missing.

Motivated by these remarks we will next derive a description of the k →∞ limit. Webegin by noting that we can assume without loss of generality that q0 = 0—indeed anykRSB order parameter can be expressed as an equivalent (k+1)RSB order parameter withq0 = 0, simply by taking m0 = 0. Equivalently, setting q0 = 0 amounts to adding an atomat 0 of weight m0 to the functional order parameter ρ.

Next, we reformulate the recursive construction of the Parisi functional in Eq. (2.5.9).We construct a sequence of functions Φi : R→ R, proceeding backwards for i ∈ 0, . . . , k.Letting E[ · ] denote expectation with respect to g ∼ N(0, 1)

Φk(x) = log 2 cosh(βx),

Φi(x) =1

milogE

[exp

(miΦi+1(x+

√qi+1 − qig)

)], ∀i ∈ 0, . . . , k − 1 . (2.5.15)

75

Using Eq. (2.5.9), it is not hard to see that F (q,m) = Φ0(0). More generally, the relationbetween the functions Φi, and the random variables Zi defined there is

logZi = Φi

( i∑`=1

√qi − qi−1gi

). (2.5.16)

We next claim that for each i ∈ 0, . . . k,

Φi(x) = Φ(x, qi)−β2

2(1− qk) , (2.5.17)

where Φ : [0, 1] × R → R, (q, x) 7→ Φ(q, x) is the unique solution of the following an-tiparabolic PDE (we denote by ∂q, ∂x the partial derivatives with respect to q and x andby ∂xx the second derivative with respect to x):

∂qΦ +1

2

[∂xxΦ + ρ(q)

(∂xΦ

)2]= 0,

Φ(x, 1) = log 2 cosh(βx) .

Here ρ(q) = ρ([0, q]) is the cumulative distribution function corresponding to the asymp-totic overlap distribution ρ(dq) and encodes the kRSB order parameter of Eq. (2.5.4). Inparticular (recall that q0 = 0, qk+1 = 1, mk = 1):

ρ(q) = mi for q ∈ [qi, qi+1), i ∈ 0, . . . , k . (2.5.18)

Finally, Eq. (2.5.18) is understood to be solved backwards in ‘time’ q starting from theboundary condition at q = 1.

The claim (2.5.17) can be proved by induction, proceeding backward from i = k. Indeednotice that, for q ∈ [qi, qi+1), the solution of the PDE (2.5.18) must satisfy

∂qΦ +1

2

[∂xxΦ +mi

(∂xΦ

)2]= 0 . (2.5.19)

In this interval, define W (x, q) ≡ exp(miΦ(x, q)), a change of variables known as ‘Cole-Hopftransformation’. Note that

∂qW = miemiΦ∂qΦ, ∂xW = mie

miΦ∂xΦ ,

∂xxW = miemiΦ∂xxΦ +mi(∂xΦ)2

,

and therefore, W satisfies the (time reversed) heat equation

∂qW +1

2∂xxW = 0 .

The solution of this equation is well known to admit a probabilistic representation: for anyq′ ≥ q, we have W (x, q) = E[W (x +

√q′ − q g, q′)], where expectation is with respect to

g ∼ N(0, 1). Hence the function Φ has the following representation for q ∈ [qi, qi+1):

Φ(x, q) =1

milogE

[exp

(miΦ(x+

√qi+1 − qg, qi+1)

)]. (2.5.20)

76

Using this representation, we can easily prove claim (2.5.17). For i = k, notice that mk = 1and therefore

Φ(x, qk) = logE[2 coshβ

(x+

√1− qk g

)]=

1

2β2(1− qk) + log 2 coshβx ,

as claimed. Next, we assume that the claim holds for Φi+1( · ) and Φ( · , qi+1), and prove itfor Φi( · ) and Φ( · , qi). By Eq. (2.5.20) and the induction hypothesis, we have

Φ(x, qi) =1

milogE

[exp

(miΦi+1(x+

√qi+1 − qig) + (β2/2)(1− qk)

)]= Φi(x) +

1

2β2(1− qk) .

We can now represent the kRSB free energy functional (2.5.10) in terms of the solutionthe Parisi PDE, which we denote by Φ(x, q; ρ) in order to emphasize its dependence on ρ:

ΨkRSB(q,m) = −1

4β2 +

β2

4

∫[0,1]

q2ρ(dq) + Φ(0, 0; ρ).

Note that the right-hand side makes sense for any probability measure ρ, not only a prob-ability measure formed by a finite number of atoms.

One mathematical difficulty here is that defining the functional for arbitrary ρ re-quires showing existence and uniqueness of solutions of the PDE (2.5.18) for ρ(q) a gen-eral distribution function (not just piecewise constant). While this is possible (see be-low), an alternative approach was originally developed by Guerra [Gue03]. This paperestablished that for two probability measures µ, ν on [0, 1] with finitely many atoms,|Φ(0, 0;µ) − Φ(0, 0; ν)| ≤

∫ 10 |µ([0, x]) − ν([0, x])|dx. Thus, if the space of all probability

measures on [0, 1] is metrized by d(µ, ν) =∫ 1

0 |µ([0, x]) − ν([0, x])|dx, then ρ 7→ Φ(0, 0; ρ)is Lipschitz and can thus be extended uniquely to the space of all probability measures on[0, 1].

Thus for any probability measure ρ on [0, 1], we define the free energy functional

P(ρ) = −β2

4+β2

4

∫q2ρ(dq) + Φ(0, 0; ρ). (2.5.21)

In terms of the free energy functional P, the asymptotics of the free energy density is

limn→∞

1

nlogZn(β) = inf

ρP(ρ). (2.5.22)

As a final sanity check, we note that for λ = h = 0 and β sufficiently small, we expect thefree energy functional to be minimized at ρ = δ0 (indeed this can be proved to be the case).In this case, Φ(0, 0; δ0) = logE[2 cosh(βg)] for g ∼ N(0, 1). Thus Φ(0, 0; δ0) = β2/2 + log 2and P(δ0) = β2/4+log 2 = limn→∞

1n logE[Zn(β)]. This agrees with the intuition that for

sufficiently high temperature, and in zero external field, the asymptotic free energy densitycoincides with the annealed free energy density.

At first sight, the formula (2.5.22) for the free energy density might appear difficultto use because requires to solve an infinite-dimensional optimization problem. On thecontrary, it turns out to be very useful.

77

0 1

1

0 1

1

0 1

1

qM

β < 1 β > 1 β → ∞

Figure 2.3: A schematic representation of the distribution function of ρ∗ for different values of theinverse temperature β

.

The minimization over ρ can be approximated numerically by restricting to distributionswith k atoms either at fixed of variable locations. A particular consequence of (2.5.22) isthe value of the ground state energy:

limn→∞

1

nmax

σ∈+1,−1nH(σ) = P∗ ≡ lim

β→∞

1

βinfρ

P(ρ) . (2.5.23)

An evaluation of the right-hand side yields P∗ ≈ 0.763166726567 [CR02, Sch08].Several rigorous results are available about the functional ρ 7→ P(ρ). Auffinger-Chen

[AC15b] and Jagannath-Tobasco [JT16] establish that P(ρ) is strictly convex in ρ andtherefore has a unique minimizer (recall that, as mentioned above, P(ρ) is continuous inρ). Recall that ρ∗ is expected to be give the asymptotic overlap distribution as n → ∞.Figure 2.3 illustrates the expected behavior of ρ∗ for different temperatures β. We referthe reader to [AC15a] for rigorous results about the minimizer.

2.5.3 On the physical interpretation of Parisi’s PDE

Let Φ(x, q) = Φ(x, q; ρ) be the solution of the PDE (2.5.18) for a certain probability dis-tribution ρ. Define s(x, q) = β−1∂xΦ(x, q). Differentiating the PDE (2.5.18) with respectto x, we note that s(·, ·) solves

∂qs+1

2∂xxs+ βρ(q)s(x, q)∂xs = 0 , (2.5.24)

s(x, 1) = tanh(x) .

For a general function s : R× [0, 1]→ R (under suitable regularity conditions), we can alsodefine the stochastic differential equation (SDE):

dXq = βρ(q)s(Xq, q)dq + dBq .

where Bq : q ≥ 0 is a standard Brownian motion. We first observe that if s solves (2.5.24),then s(Xq, q) is a martingale (i.e. E[s(Xq′ , q

′)|Xq] = s(Xq, q) for all q ≤ q′). This followsupon a direct application of Ito’s lemma. However, to keep the discussion elementary andself-contained, we sketch a proof below. We have,

E[s(Xq+ε, q + ε)]|Xq = x]− s(x, q)

= ∂xs(x, q)E[Xq+ε −Xq|Xq = x] + ε∂qs(x, q) +1

2∂xxs(x, q)E[(Xq+ε −Xq)

2|Xq = x] + o(ε).

= ε(∂qs+

1

2∂xxs+ βρ(q)s∂xs

)+ o(ε).

78

Upon differentiating E[s(Xq′ , q′)|Xq] with respect to q′, we obtain the desired conclusion.

Notice in particular that the martingale propery implies

s(x, q) = E[tanh(βX1)|Xq = x] . (2.5.25)

Given any probability distribution ρ, we can construct Φ, s and the stochastic processXq : q ≥ 0, so that s(Xq, q) is a martingale. However these quantities assume a specialphysical significance when ρ = ρ∗ is the unique minimizer of the Parisi functional P(ρ).We will refer to the corresponding diffusion as X∗q : q ≥ 0.

Among other properties, using the replica method, de Almeida-Lage [DAL83] predictedthat the law X∗1 is the asymptotic distribution of the cavity fields. Recall the Gibbs measureµ = µβ of Eq. (2.1.8), where we are here assuming λ = h = 0. We already introduced theparametrization of its one-dimensional marginals in terms of effective fields hi:

µi(σi) =eβhiσi

2 cosh(βhi).

Consider the probability distribution of the effective field at vertex i (with respect to

the random matrix W ) P(n)hi

. By exchangeability of the vertices, this is independent of

i ∈ 1, . . . , n, so we can focus on P(n)h1

. In the context of the replica symmetric cavity

method, we derived the behavior of P(n)h1

and argued that (as long as the replica symmetricassumption holds) it converges weakly to a Gaussian distribution, see Section (2.3.1).

As we discussed above, the replica symmetric ansatz only holds at high temperature

(small β) or large enough external field. In general, P(n)h1

converges weakly as n→∞ to thelaw of X∗qM , where qM ≡ supq : q ∈ supp(ρ∗) is the maximum overlap in the support ofρ∗. It is a useful exercise for the reader to check that the Gaussian limit of Section (2.3.1)is recovered in the replica symmetric phase (i.e. if ρ∗ = δq∗ is a point mass).

Finally, we note that explicit stationary conditions can be derived for the Parisi measureρ∗, which involve the diffusion (Xt)t∈[0,q]. To be definite, we ρ(q) = ρ([0, q]) as a distributionfunction, namely a non-negative, non-decreasing left continuous function on the interval[0, 1] , with ρ(1) = 1. Omitting technical details, the variation of P(ρ), cf. Eq. (2.5.21),with respect to a change in ρ can be shown to take the form:

P(ρ+ ε∆)−P(ρ) =1

2β2ε

∫ 1

0

E[s(Xq, q)

2]− q

∆(q) dq + o(ε) . (2.5.26)

Hence we obtain the following stationarity conditions:

Fρ(q) ≡∫ qM

q

E[s(Xt, t)

2]− t

dt , (2.5.27)

q ∈ supp(ρ∗) ⇒ Fρ(q) = 0, F ′ρ(q) = 0 , (2.5.28)

q ∈ [0, 1] ⇒ Fρ(q) ≥ 0 , (2.5.29)

The martingale condition (2.5.25) can also be used to express the above conditions in a

different form. Given Xq, let X(1)q : q ≥ q0 and X(2)

q : q ≥ q0 be two independent

copies the diffusion Xt started at X(1)q = X

(2)q = Xq. Then we have

E[s(Xt, t)

2]

= E[tanh(βX(1)1 ) tanh(βX

(2)1 )].

79

2.6 Cavity method with replica symmetry breaking

In Section 2.3, we studied the SK model using the cavity method. We made a certainnumber of assumptions and recovered the replica symmetric free energy. However, thereplica symmetric free energy is incorrect at sufficiently low temperature (i.e. β sufficientlylarge). This prompts the natural questions: Can we sharpen the replica symmetric cavitymethod to study the free energy of the system at low temperature? In particular, can wemodify/relax our assumptions as to derive the RSB free energy?

In this section, we will accomplish this goal, in the sense that we will recover the 1RSBfree energy. This approach can be extended to k-steps replica symmetry breaking in anatural way, but we will refrain from doing this.

The replica symmetric cavity method hinges on the absence of global correlations inthe model. In particular, two fixed coordinates σi, σj are approximately independent whenσ ∼ µβ. We stated a stronger form (or consequence) of this assumptions as C1 in Section2.3.

At low temperature, approximate independence does not hold any longer, and thereplica symmetric results are inaccurate. Note that the distribution of the overlap, andthe fact that it does not concentrate is intimately related to this emergence of globalcorrelations. It is perhaps useful to revisit this point in a simple way. We use the shorthand〈f(σ1, . . . ,σk)〉µ ≡

∫f(σ1, . . . ,σk)µ⊗k(dσ) for the expectation with respect to the Gibbs

measure (even if the expectation is actually a finite sum). Considering the mean squarecorrelation of two variables σi, σj , we have

Corr2n =1

n2

n∑i,j=1

E(〈σiσj〉µ − 〈σi〉µ〈σj〉µ

)2=

1

n2

n∑i,j=1

E〈σiσj〉2µ

− 2

n2

n∑i,j=1

E〈σiσj〉µ〈σi〉µ〈σj〉µ

+

1

n2

n∑i,j=1

E〈σi〉2µ〈σj〉2µ

=

1

n2

n∑i,j=1

E〈σ1i σ

1jσ

2i σ

2j 〉µ− 2

n2

n∑i,j=1

E〈σ1i σ

1jσ

2i σ

3j 〉µ

+1

n2

n∑i,j=1

E〈σ1i σ

2jσ

3i σ

4j 〉µ

= EQ212 − 2EQ12Q13+ EQ12Q34 .

Here in the last equation, we introduced the overlap Qab ≡ n−1∑n

i=1 σai σ

bi , and used

Ef(σ1, . . . ,σk) for E∫f(σ1, . . . ,σk)µ⊗k(dσ).

If the overlap concentrates, i.e. if the overlap distribution converges weakly to a pointmass ρn → ρ, ρ = δq∗ , then the above calculation shows that the mean square correlationvanishes asymptotically limn→∞ Corr2n = 0. We need therefore to give up the assump-tion that variables σ = (σ1, . . . , σn) are approximately pairwise independent under theGibbs measure. This seems hopeless since non-product form measures are very complex todescribe.

Remarkably, Mezard, Parisi and Virasoro [MPV86, MPV87] developed a set of assump-tions that allow one to carry out the cavity calculation, and recover kRSB asymptotics forany k. This assumption can be described as a specific form of conditional independenceof the coordinates of σ. This will be our topic of interest for the rest of the section (fork = 1), although we will use a somewhat more mathematical language than the originalphysics papers. We will pause in the next section to develop some of this language.

80

2.6.1 A digression into Poisson processes

Our starting point is a Poisson Point Process (PPP) φα : α ∈ N with intensity measureΛ(x)dx on R. This is the unique point process on R such that, letting N(B) =

∑α 1(φα ∈

B) denote the number of points in the Borel set B ⊂ R, the following two properties hold:

• For any Borel set B, N(B) ∼ Poisson(∫B Λ(x)dx).

• For any collection of disjoint Borel setsB1, · · · , Bk, the random variablesN(B1), · · · , N(Bk)are mutually independent.

This definition generalizes immediately to PPP’s on Rd for any d ≥ 1. We recall that animmediate consequence of the definition is that, for any (measurable) bounded functiong : R→ R:

E∑

α

g(φα)

=

∫g(x) Λ(x) dx . (2.6.1)

(This can be proved by first considering the case in which g is a simple function, i.e.g(x) =

∑mk=1 ck1Bk for some Borel sets Bk.)

In fact, the following characterization of PPP’s (see e.g. [DVJ98, Chapter 7.4]) will beuseful.

Lemma 2.6.1. xα : α ∈ N follows PPP with intensity ρ if and only if for all boundedg : R→ R,

E[

exp∑α

g(xα)]

= exp[ ∫ (

eg(x) − 1)ρ(dx)

].

Further, the above are equivalent to this identity holding for all g bounded continuous.

The reader can easily check that this identity holds when g is a simple function.We will be specifically interested in the PPP φα : α ∈ N with intensity function

Λ(x) = exp[−mx] for x ∈ R, m > 0. The following properties of the process will bespecifically relevant for us.

Lemma 2.6.2. The PPP φα : α ∈ N with intensity Λ(x)dx = e−mxdx has the followingproperties:

1. The process has a finite maximum almost surely, i.e. there exists a point φα0 suchthat φα0 > φα for every α 6= α0.

2. Almost surely, there exists a (measurable) reordering of the points φα : α ∈ N =φ(k) : k ∈ N such that φ(1) ≥ φ(2) ≥ · · · .

3. For 0 < m < 1,∑

α eφα <∞ almost surely.

Proof. To prove point 1, note that the expected number of points φα ≥ 0 is

EN(R≥0) =

∫ ∞0

e−mx dx =1

m<∞ . (2.6.2)

Hence, almost surely, the process contains a finite number of non-negative (distinct) points.Any time this happens, there is a maximum point as well, since a finite set always has amaximum.

81

Let us now consider point 3. We separate the contributions of positive and negativepoints: ∑

α

eφα

=∑α

eφα1φα≥0 +

∑α

eφα1φα<0

≡ S≥0 + S<0 .

By point 1, the first sum S≥0 only contains finitely many terms and therefore is finitealmost surely. As for the second sum, by Eq. (2.6.1) we have

ES<0 =

∫ 0

−∞exe−mx dx =

1

1−m <∞ .

Therefore S<0 < ∞ almost surely, which proves the claim. Note that here we used in acrucial way the fact that m ∈ (0, 1).

Finally, point 2 is both very easy to accept and somewhat subtle from a measure-theoretic perspective. We refer the reader to [Pan13, Chapter 2] for a rigorous proof.

Our next lemma introduces the crucial property that makes this PPP particularlyimportant: its behavior under random i.i.d. perturbations.

Lemma 2.6.3. Let ∆α : α ∈ N be a sequence of i.i.d. random variables (‘marks’)independent of the PPP φα : α ∈ N. Assume that the common distribution ν of the ∆α

is such that Cν =∫

exp(m∆) ν(d∆) <∞. Then we have,

φα + ∆α : α ∈ N ∼ PPP(CνΛ).

Proof. The proof follows from a direct application of Lemma 2.6.1. We first note that(φα,∆α) : α ∈ N is a PPP on R2 with intensity exp(−mx)dx ⊗ ν(d∆). Thus, for anyg : R→ R bounded

E

exp[∑

α

g(φα + ∆α)]

= exp∫

(eg(x+y) − 1)e−mx dx ν(dy)

= exp∫

(eg(x) − 1)e−m(x−y) dx ν(dy)

(2.6.3)

= expCν

∫(eg(x) − 1) e−mx dx

.

By Lemma 2.6.1, this completes the proof.

Lemma 2.6.3 establishes a stability property of this PPP. First note that the intensityCνΛ(x)ds = Cνe

−mxdx can be rewritten as CνΛ(x)dx = Λ(x−x0)dx for x0 = −m−1 logCν ,i.e. the change in intensity is just a translation. Hence the lemma can be restated as follows:if we perturb the PPP by i.i.d. random shifts, the resulting point process is distributedaccording to the original one, shifted by a deterministic amount x0.

An instructive image is proposed in [RA05]. Imagine that the φα are positions ofcars in a car race. In a given interval of time (say, one minute), each car moves by anindependent and identically distributed amount ∆α. The new car positions are thereforeφα + ∆α. The overall race moved forward by a distance x0 but—once this shift isaccounted for—the distribution of car position is unchanged apart from a reordering. Thisimage motivates the name Indy500 for this process.

82

Liggett [Lig78] studied this PPP in connection with interacting particle systems. Hisanalysis implies that—under certain technical conditions—it is the only point processesthat enjoys the above stability property. Of course, uniqueness holds up to a global shiftor, equivalently, multiplying the intensity e−mxdx by a constant. Notice that, as a conse-quence of stationarity, the point process of gaps φ(1) − φ(α) : α ∈ N remains unchangedafter random shifts. Ruzmaikina-Aizenman [RA05] showed that all the point processessatisfying this weaker condition are convex combinations of PPP’s with intensity e−mxdx(with varying m).

A strengthening of the above invariance property is established in the following result.

Lemma 2.6.4. Let (X ,F , µ) be a probability space and ∆ : X → R, h : X → R twomeasurable functions. Let (φα, xα) : α ∈ N be a PPP on R × X with intensity measureexp(−mφ)dφ ⊗ ν(dx). We assume Cν =

∫em∆(x)ν(dx) < ∞ for some m > 0 and define

the probability measure νm(dx) ≡ C−1ν em∆(x)ν(dx). Then we have:(

φα + ∆(xα), h(xα))

: α ∈ N∼ PPP

(Cνe

−mφdφ⊗ νm h−1).

(Here νm h−1 denotes the law of h(X) where X ∼ νm.)

Proof. The proof proceeds again by an application of Lemma 2.6.1. We have, for anybounded continuous g : R2 → R,

E

exp[∑

α

g(φα + ∆(xα), h(xα))]

= exp∫ [

eg(u+∆(x),h(x)) − 1]e−mudu ν(dx)

= exp

∫exp

[eg(u,h(x)) − 1

]em∆(x)e−mudu ν(dx)

= exp

∫ [eg(u,h(x)) − 1

]e−mudu νm(dx)

.

This completes the proof.

2.6.2 The 1RSB cavity recursion

Armed with this machinery, we return to the study of the cavity method. Recall that thecavity method adds the (n + 1)th spin to a system comprising of n spins (in our earlieranalysis, we deleted a node, but this is basically the same analysis), computes the effectivefield on the (n+ 1)th spin in terms of the effective fields in the system comprising n spins,and finally imposes the constraint that these fields should have the same law in the largesystem limit. The definition of cavity fields and effective fields was given in Eq. 2.3.4, andin Eq. (2.3.5) we derived the recursion

hn+1 =

n∑i=1

u(Wn+1,i, hi→n+1) + on(1), (2.6.4)

u(W,h) =1

βatanh

[tanh(βW ) tanh(βh)

]. (2.6.5)

As discused above, at low temperature, the approximate independence that was usedto derive the last equation no longer holds. As already discussed in the previous chapter,within a 1RSB phase, we expect the Gibbs measure µ to decompose (approximately) into

83

the convex combination of O(1) ‘pure states’ µα. Namely, letting +1,−1n = ]Mα=1Ωα]Ω0

a partition of the set of spin configurations, we have

µ(σ) =

M∑α=1

wα µα(σ) + ε(σ) ,

∑σ

|ε(σ)| = on(1) ,

µα(σ) ≡ 1

ZαeβH(σ)1σ∈Ωα , Zα ≡

∑σ∈Ωα

eβH(σ) , Z ≡∑

σ∈+1,−1neβH(σ) .

(Recall that H(σ) = 〈W ,σσT〉 is the Hamiltonian.) Each component µα is referred toas a ‘pure state’ under the 1RSB heuristic. Under this decomposition, we have, Z =∑M

α=1 Zα+on(1). Each pure state is expected to have the same energy on the macroscopic

scale, so that logZα− logZ = O(1) for all α ∈ 1, · · · ,M. We expect that as n→∞, thenumber of pure states M also diverges; further, we expect that a suitably re-scaled versionof the “pure-state” free energies φα = logZα − an converges to a PPP with intensityexp(−mx)dx, for some 0 ≤ m < 1. The PPP interpretation of the pure state free energiesimplies that wα ∝ exp[φα] is distributed as Poisson-Dirichlet distribution with parameterm.

Within a 1RSB heuristic we expect each µα to be approximately product form. Moreprecisey, we expect that average correlations between σi and σj for fixed i 6= j to be smallwhen σ ∼ µα. Notice that we can define effective fields for µα exactly as we did for µ.Namely, letting µαn+1 be the (n+ 1)-th marginal of µα, we introduce the parametrization

µαn+1(σn+1) =eβh

αn+1σn+1

2 cosh(βhαn+1σn+1).

Analogously, we denote the cavity fields within state α by hαi→n+1i≤n. We expect that thenumber of pure states M →∞ as n→∞, and we would like to to study the distributionof hαi→n+1 for a “random” pure state α. This is an important difference with respectto the replica symmetric cavity method at this point: even for fixed disorder variablesWij : 1 ≤ i, j ≤ n, hαi→n+1 has a different value across pure states α and we will describethis variability by a probability distribution on the real line which we denote by νi. Notethat the distribution νi will in general depend on i and the realization of Wij : 1 ≤ i, j ≤ nvariables: therefore ν1, . . . , νn are random probability distributions. Our objective will beto derive a distributional recursion for their law.

When the (n+ 1)th spin is added to a system with n spins, the free energy φα ≡ logZα

of each state α changes, and thus changes their relative weights wα. Eventually we wouldlike to focus on the effective fields in states with large free energy, and therefore we shouldkeep track of these changes as well.

We already computed the effective field on σn+1 in Section 2.3:

hαn+1 =n∑i=1

u(Wn+1,i, hαi→n+1) + on(1) ≡ h(hαi→n+1) + on(1) . (2.6.6)

A similar computation (under the same assumptions) allows one to compute the change in

84

free energy:

∆α = log[Zαn+1(βn+1)

Zαn (β)

]= ∆0 + log 2 cosh

n∑i=1

u(Wn+1,i, hαi→n+1)

]−

n∑i=1

log cosh[βu(Wn+1,i, h

αi→n+1)

]= ∆0 + ∆(hαi→n+1) + on(1) , (2.6.7)

where the constant ∆0 ≡∑

i≤n log coshβWi,n+1 is independent of the state α and hencewill be irrelevant in the cavity recursion.

In words, the addition of the new spin transforms the weights of the pure states byshifting their free energies φα to ∆α + φα . The shifts ∆α depend on the cavityfields. If we assume that these shifts are i.i.d. across α, and that the distribution of φαis independent of n up to a global shift, then it follows that φα must be a PPP withintensity Ce−mxdx (as we saw in the last section). Since a global shift in these free energiesis immaterial, we can set C = 1.

We denote by xα = (hαi→n+1 : 1 ≤ i ≤ n) the collection of cavity fields and think of thisquantity as a “mark” taking values in Rn. Following the above discussion, we obtain that(φα, xα) : α ∈ N (approximately for large n) is a PPP:

(φα, xα) : α ∈ N ∼ PPP(e−mxdx⊗

n∏i=1

νi(dhi)). (2.6.8)

Lemma 2.6.4 suggests that

φα + ∆α, hαn+1 ∼ PPP(Cνe

−mxdx⊗ νm h−1), (2.6.9)

νm(dx) ≡ 1

Cνem∆(x)

n∏i=1

νi(dxi) , Cν ≡∫em∆(x)

n∏i=1

νi(dxi) . (2.6.10)

Hence, in the system with n + 1 spins (φαn+1, hαn+1) is approximately distributed as a

PPP with intensity Cνe−mxdx⊗ νn+1, consistently with our assumption at size n. Further

νn+1 satisfies, for any x ∈ R,

νn+1(hn+1 ≤ x) ≈n1

Zνn+1

∫1( n∑i=1

u(Wn+1,i, hi→n+1) ≤ x)em∆(hi→n+1)

n∏i=1

νi(dhi→n+1)

(2.6.11)

(Here Zνn+1 is a normalization constant and we neglected errors due to finite n).

Note that this procedure specifies νn+1 as a (complicated) function of the probabilitymeasures νi : i ≤ n. In the next section we will see that this recursion can be furthersimplified for large n, but it is instructive to step back and have a look at the mathematicalstructure that emerges from the 1RSB cavity method. In settings that are more complicatedthan the SK model (e.g. models on sparse graphs), this structure survives, and furthersimplifications are not possible.

Denoting by Pr(Ω) the space of probability measures over a Polish space Ω, the aboverecursion (2.6.11) defines a map Un : (Pr(R))n → Pr(R). Now, assume that when viewedas a function of Wij : 1 ≤ i < j ≤ n + 1 variables, νi are i.i.d. random measures with

85

distribution Q. We expect therefore that the law of Un(ν1, · · · , νn) is approximately Q,when νi : i ≤ n are i.i.d. samples from Q. In other words:

νn+1d= Un(ν1, · · · , νn) + on(1) . (2.6.12)

Here on(1) means that a suitable distance4 between νn+1 and Un(ν1, · · · , νn) vanishes asn→∞.

Hence the 1RSB cavity method leads, in general, to a distributional recursion in thespace of probability measures Pr(R). Solving these recursion amounts to finding a distri-butional fixed point Q.

2.6.3 Simplifying the cavity recursion

We emphasize that our discussion of the 1RSB cavity method above is quite general, anddoes not depend too heavily on the features of the model. For instance, the same argumentcan be carried out for spin glasses on sparse random graphs, with very minor modifications.For the SK model, it is possible to simplify the distributional recursion (2.6.11), by using acentral limit theorem (CLT) heuristic. Throughout this calculation, we will write An ≈n Bnto mean that dist(An, Bn)→ 0 for a suitable notion of distance, which we will not specify.

We begin by simplifying the free energy shift ∆(hii≤n) of Eq. (2.6.7). Using Wn+1,j =O(1/

√n), and the fact that, for x small, log cosh(x) = x2/2 + O(x4), and that u(W,h) =

W tanh(βh) +O(n−3/2), we get

∆(hii≤n) = log 2 cosh[β

n∑i=1

u(Wn+1,i, hi)]−

n∑i=1

log cosh[βu(Wn+1,i, hi)

]= log 2 cosh

n∑i=1

Wn+1,i tanh(βhi)]− β2

2

n∑i=1

W 2n+1,i

(tanh(βhi)

)2+O(n−1)

= log 2 cosh[β

n∑i=1

u(Wn+1,i, hi)]

+ ∆n . (2.6.13)

The term ∆n is approximated by the sum of n independent sub-Gaussian quantities, each oforder 1/n, and hence will tightly concentrate around a constant. We can therefore neglectit in the cavity recursion (as it will be cancelled by the normalization of the measure νn+1).

We next approximate the cavity field update (2.6.6) to get

h(hi) =n∑i=1

Wn+1,i tanh(βhi) +O(n−1) . (2.6.14)

Using the last equation together with Eq. (2.6.13), we can replace Eq. (2.6.11) by thefollowing distributional recursion

νn+1(hn+1 ≤ x) ≈n1

Zνn+1

∫1( n∑i=1

ui(hi) ≤ x)(

2 cosh[β

n∑i=1

ui(hi)])m n∏

i=1

νi(dhi),

(2.6.15)

ui(hi) ≡Wn+1,i tanh(βhi) .

4As our derivation is heuristic, we will not make the distance notion precise, but the mathematicallyminded reader can think, for instance of bounded Lipschitz distance.

86

Next, we analyze the limit of this distributional recursion (it is important to remindourselves that the distributions νn+1 are themselves random). Notice that Eq. (2.6.15)takes the form of the composition of three steps: (i) a change of variables from hi toui(hi); (ii) a convolution; (iii) a reweighting by the factor (2 cosh(· · · ))m. To emphasizethis structure, we write

νn+1(dh) ≈n1

Zνn+1

(2 coshβh

)mPn+1(dh) , (2.6.16)

Pn+1 = Law(Yn+1) , Yn+1 ≡n∑i=1

Wn+1,i tanh(βhi) . (2.6.17)

Conditional on W (and therefore on the measures ν1, . . . , νn), Yn+1 is the sum of n inde-pendent terms, each of order 1/

√n. We can therefore hope to apply the CLT and conclude

that

Pn+1 ≈n N(hn+1, σ2n+1) , (2.6.18)

hn+1 =n∑i=1

Wn+1,iEνi

tanh(βhi), (2.6.19)

σ2n+1 =

1

n

n∑i=1

Varνi

tanh(βhi). (2.6.20)

We therefore conclude that, for large n, the measures νi, i ≤ n+ 1, are well approximatedby a two parameter family

νi(dhi) ≈n1

Zνi(2 coshβhi)φG(hi;hi, σ

2i ) dhi , (2.6.21)

where φG(x; a, v2) denotes the Gaussian density with mean a and variance v2. The param-eters hi and σ2

i satisfy the recursion of Eqs. (2.6.19), (2.6.20).

The above calculation derives a complete description of the variability of cavity fieldswith respect to the pure state, for a given vertex i and a given realization of W . We nextturn our attention to the distribution of the pair (hi, σ

2i ) with respect to randomness W .

We begin by considering Eq. (2.6.20). The measures νi : i ≤ n are i.i.d. with commonlaw Q. Equivalently, because of Eq. (2.6.21), we can think of Q as the common distributionof (hi, σ

2i ). Thus using a law of large numbers, we expect as n→∞,

σ2n → σ2 := Eν∼Q

[Varh∼ν [tanh(βh)]

].

Next consider Eq. (2.6.19). Again, since the measures νi : i ≤ n are i.i.d. withcommon law Q, and each term on right-hand side is of order 1/

√n (because of the factor

Wn+1,i), we can apply the central limit theorem, and conclude that hn+1 is approximately

Gaussian with mean 0 and variance τ2n = 1

n

∑ni=1

(Eνi [tanh(βhi)]

)2. We thus conclude

that Q (which we now think as a distribution over (hi, σ2)) has a particularly simple form:

Q(dhi,dσ2i ) ≈n φG(hi; 0, τ2)dhi ⊗ δσ2 , (2.6.22)

87

where τ2, σ2 are deterministic parameters that satisfy the following equations:

σ2 = Eh∼N(0,τ2)

[Varh∼νh [tanh(βh)]

], (2.6.23)

τ2 = Eh∼N(0,τ2)

[(Eh∼ν [tanh(βh)]

)2], (2.6.24)

νh(dh) ≡ 1

Z(h)(2 coshβh)φG(h;h, σ2) dh . (2.6.25)

In other words, the distribution of the cavity field h in these equations takes the formh = τ G0 + σZ1, where G0 ∼ N(0, 1), and conditional on G0, Z1 ∼ ντG0 . We can writeexpectation with respect to Z1 in terms of expectation with respect to G1 ∼ N(0, 1) inde-pendent of G0, via a change of measure

EZ1 [f(G0, Z1)] =EG1 [f(G0, G1)(2 cosh(β[τG0 + σG1]))m]

EG1 [(2 cosh(β[τG0 + σG1]))m].

Using this representation and defining q0 ≡ τ2, q1 ≡ τ2 + σ2, we can rewrite Eqs. (2.6.23),(2.6.24) as

q0 = EG0

[(EG1 [tanh(βh)(2 cosh(βh))m]

EG1 [(2 cosh(βh))m]

)2], (2.6.26)

q1 = EG0

[EG1 [(2 cosh(βh))m(tanh(βh))2]

EG1 [(2 cosh(βh))m]

], (2.6.27)

h =√q0G0 +

√q1 − q0G1 . (2.6.28)

These coincide exactly with the stationary point conditions for the 1RSB free energy func-tional derived in Section 2.5.1.

We note that a similar cavity analysis can be carried out for k-steps replica symmetrybreaking. The key idea is that now the decomposition in states has a hierarchical structurewith states clustered in ancestral states, and so on. We will describe this hierarchicalstructure in section 2.7.

2.6.4 Computing the free energy density

The free energy density may also be computed using the cavity method, under a 1RSBscenario. As in the replica symmetric case, the starting point is to write the asymptoticfree energy density as

φ(β) = limn→∞

An , An ≡ E[

logZn+1(β)

Zn(β)

], (2.6.29)

An = E[

logZn+1(βn+1)

Zn(β)

]− E

[log

Zn+1(βn+1)

Zn+1(β)

](2.6.30)

≡ EB(1)n − EB(2)

n . (2.6.31)

for βn+1 = β√n+ 1/n. The first term captures the effect of adding a spin going from n

to n + 1 spins, and the second is the effect of rescaling the temperature on the original nspins. (Note that the change is negligible for what concerns the couplings between σn+1

and σi, i ≤ n.)

88

Let us start from the first term, that is the effect of adding a spin. We already saw inthe previous section that this has the effect of shifting the free energy of pure states fromφα to φα + ∆α, where, by Eq. (2.6.7),

∆α = ∆0 + ∆(hαi→n+1) + on(1) (2.6.32)

Further, by the Taylor expansion log cosh(x) = x2/2+O(x4) and the law of large numbers:

∆0 ≡∑i≤n

log coshβWi,n+1 =β2

2+O(n−1) . (2.6.33)

Since the total partition function is the sum of partition functions for each pure state, wehave

B(1)n = log

[∑α

eφα+∆α

]− log

[∑α

eφα]. (2.6.34)

Under the 1RSB assumption, the shifts ∆α are independent of the free energies φα. Wesaw in Lemma 2.6.3 that (asymptotically) φα + ∆αα∈N are distributed as φα + x0α∈Nwhere φα is a copy of the process φα, and x0 ≡ m−1 logEs[em∆], where Es denotesexpectation with respect to the pure states (at given disorder.) Therefore

B(1)n ≈n log

[∑α

eφα]

+1

mlogEs[em∆]− log

[∑α

eφα]

≈nβ2

2+

1

mlogEs exp

m∆(hi→n+1)

+ F − F .

Here F and F are identically distributed random variables. Therefore

EB(1)n ≈n

β2

2+

1

mE logEs exp

m∆(hi→n+1)

, .

Recall that by the approximations in Eq. (2.6.13), we have

∆(hii≤n) = log 2 cosh[β

n∑i=1

Wn+1,i tanh(βhi)]− β2

2

n∑i=1

W 2n+1,i

(tanh(βhi)

)2+O(n−1)

= log 2 cosh[β

n∑i=1

Wn+1,i tanh(βhi)]− β2

2n

n∑i=1

Ehi∼νi(

tanh(βhi))2

+ on(1)

Further, by the CLT∑n

i=1Wn+1,i tanh(βhi) is approximately Gaussian, with variance (q1−q0) and mean hn+1 :=

∑ni=1Wn+1,iEνi tanh(βhi). We therefore conclude that

EB(1)n ≈n

β2

2− β2

2Eν∼QEh∼ν

(tanh(βh)

)2+

1

mEhn+1

logEG1∼N(0,1)

(2 coshβ(hn+1 +

√q1 − q0G1)

)m.

Finally,using the 1RSB fixed point equations for the first term, and recalling that hn+1 isapproximately N(0, q0), we get

EB(1)n ≈n

β2

2(1− q1) +

1

mEG0 logEG1∼N(0,1)

(2 coshβ(

√q0 +

√q1 − q0G1)

)m.

(2.6.35)

89

Here it is understood that G0, G1 ∼ N(0, 1) independent.

We next consider term B(2)n . By Eq. (2.3.18) and calculations thereafter we have

B(2)n =

β

2n3/2

∑i<j

E[ ∂

∂Wij

∑σ

µ(σ)σiσj

]=

β2

2n2

∑i<j

E[1−

(∑σ

µ(σ)σiσj

)2].

Introducing replicas and recalling the notation 〈 · 〉µ for expectation with respect to theGibbs measure µ⊗2, we have

1

n2

∑i,j

(〈σiσj〉µ

)2=

1

n2

∑i,j

〈σ1i σ

1jσ

2i σ

2j 〉µ = 〈Q2

12〉µ,

where, as usual Q12 ≡ 〈σ2,σ2〉/n. Therefore

EB(2)n ≈n

β2

4

(1− E〈Q2

12〉µ)

≈nβ2

4

(1−

∫q2 ρ(dq)

)(2.6.36)

≈nβ2

4

(1−mq2

0 − (1−m)q21

).

Putting together Eqs. (2.6.35) and (2.6.36), we obtain the 1RSB prediction φ = Ψ1RSB(q0, q1,m),where

Ψ1RSB(q0, q1,m) =β2

2(1− q1) +

β2

4

(1−mq2

0 − (1−m)q21

)+

1

mEG0 logEG1∼N(0,1)

(2 coshβ(

√q0 +

√q1 − q0G1)

)m.

This is easily seen to coincide with the formula derived in Section 2.5.1 using the replicamethod.

2.7 Rigorous bounds via interpolation

A complete proof of the Parisi formula for the Sherrington-Kirkpatrick model goes beyondthe scope of these notes, and we refer to the textbook by Panchenko [Pan13] for an in-depth treatment. We will limit ourselves to stating and proving two key results on the pathtowards a complete proof: the existence of the limit of the free energy density, and the factthat the kRSB free energy functional provides an upper bound on the actual free energydensity. Both results are proved via a surprising interpolation argument, introduced in[GT02, Gue03]. Our presentation will differ from the original one, in that it will be basedon the construction of Ruelle Probability Cascades, as first introduced in [ASS03]. Thisallows for a more transparent argument.

We next state these results, and will devote the rest of this section to their proof.

Theorem 9 (Thermodynamic limit [GT02]). Let Zn(β) denote the partition function(2.2.1), with λ = 0. As n → ∞, n−1 logZn(β) converges almost surely to a determin-istic constant.

90

Theorem 10 (Replica symmetry breaking upper bound [Gue03]). With the same notationsas in the previous theorem

φ := limn→∞

1

nlogZn(β) ≤ inf

k,q,mΨkRSB(q,m),

where ΨkRSB(·, ·) is the kRSB free energy functional (2.5.10).

In the next subsection we provide an outline of the proof of Theorem 9. The rest ofthis section is devoted to the proof of Theorem 10.

2.7.1 Thermodynamic limit: proof outline

First recalling W = (G + GT)/√

2n, for (Gij)i,j≤n ∼iid N(0, 1), we have (omitting forsimplicity the dependence on β)

Zn =∑

σ∈+1,−1nexp

β√2n〈σ,Gσ〉

. (2.7.1)

It is then easy to show that G 7→ n−1 logZn(β) is a Lipschitz function, with Lipschitzconstant c0β/

√n, and therefore by Gaussian concentration (Theorem 14), we have

P∣∣∣ 1n

logZn −1

nE logZn

∣∣∣ ≥ t ≤ 2 e−Cnt2/β . (2.7.2)

It is therefore sufficient to prove that n−1E logZn has a limit (by Borel-Cantelli).The existence of a limit follows from sub-additivity. Recall the following basic analysis

lemma [Fek23].

Lemma 2.7.1. Let (an)n∈N be a sequence of real numbers, satisfying, for all n1, n2:

an1+n2 ≥ an1 + an2 . (2.7.3)

Then the limit limn→∞ an/n exists (eventually equal to supn an/n, which could be ∞).

The key step is then to prove the following inequality

E logZn1+n2 ≥ E logZn1 + E logZn2 . (2.7.4)

Physically, this means that a system of n1 + n2 spins has (expected) free energy that issmaller or equal than the sums of free energies of a system of n1 spins and a system of n2

spins.The proof of this inequality is obtained by defining a HamiltonianHt(σ), σ ∈ +1,−1n

that continuously interpolates between the two cases (two separate systems and a singleone), and showing that the free energy is monotone along this path.

2.7.2 Proof of the replica symmetric upper bound

In this subsection, we prove the Guerra upper bound, Theorem 10, in the k = 0 case(replica symmetric free energy formula).

Recall that in the replica symmetric phase, we expect the coordinates of σ ∼ µ tobe approximately independent, with effective fields which are Gaussian with mean 0 andvariance q. Replica symmetric interpolation explicitly builds this intuition into the proof.

91

Let W = (G+GT)/√

2n, for (Gij)i,j≤n ∼iid N(0, 1), and g ∼ N(0, In) an independentstandard Gaussian vector. We introduce the interpolating partition function (we suppressas before the dependence on β)

Zn(t) =∑

σ∈+1,−1nexp

β√2n

√t〈σ,Gσ〉+ β

√q(1− t)〈g,σ〉

,

and define the interpolating free energy density φn(t) ≡ n−1E logZn(t), where E denotesthe expectation with respect to both G and g variables.

We note that φn(1) = φn is the free energy density of the SK model. For t = 0, thesystem de-couples, and the free energy can be computed directly. We have,

φn(0) =1

nE

log∑σ

exp[β√q∑i

giσi]

= E log 2 cosh(β√qg),

for g ∼ N(0, 1).

Now, we have, φn = φn(1) = φn(0) +∫ 1

0 φ′n(t)dt. Straightforward differentiation yields

dφn(t)

dt= E

[µt

(β〈σ,Gσ〉√8tn3

)− µt

( β√q

2n√

1− t〈g,σ〉)]

:= T1 + T2,

where µt(·) denotes the average with respect to the Gibbs measure corresponding to theinterpolated system. Namely

µt(σ) ≡ 1

Zn(t)exp

β√2n

√t〈σ,Gσ〉+ β

√q(1− t)〈g,σ〉

. (2.7.5)

Using Gaussian integration by parts, we have,

T1 =β2

4n2

n∑i,j=1

E[1−

(µt(σiσj)

)2].

Introducing the replicated Gibbs measure, we have,

T1 =β2

4n2

n∑i,j=1

E[1− µ⊗2

t (σ1i σ

1jσ

2i σ

2j )]

=β2

4

(1− E[µ⊗2

t (Q212)]),

where σ1,σ2 are two i.i.d. replicas drawn from µ and Q12 = 1n〈σ1,σ2〉 is the “overlap”

between the replicas. A similar computation yields

T2 =β2q

2

(1− E[µ⊗2

t (Q12)]).

Combining, we obtain,

dφn(t)

dt= T1 + T2 =

β2

4(1− q)2 − β2

4E[µ⊗2t

((Q12 − q)2

)]≤ β2

4(1− q)2.

92

Thus we obtain

φn = φn(0) +

∫ 1

0

dφn(t)

dtdt

= E log 2 cosh(β√qg) +

β2

4(1− q)2 − β2

4E[µ⊗2t

((Q12 − q)2

)]= ΨRS(q)− β2

4E[µ⊗2t

((Q12 − q)2

)].

where the expression ΨRS(q) coincides with the replica symmetric formula already derivedin Section 2.2. Note that the above sequence of equalities have two interesting consequences:

1. φn ≤ infq≥0 ΨRS(q) for any n. The replica symmetric formula provides an entirelynon-asymptotic upper bound.

2. The RS asymptotics is correct (i.e. limn→∞ φn = infq ΨRS(q)) if and only if E[µ⊗2t

((Q12−

q)2)]→ 0 (in probability over t), i.e. if and only if the overlap concentrates around

a t-independent value.

2.7.3 Ruelle Probability Cascades

To establish Guerra’s replica symmetry breaking upper bound, a family of random measurescalled the Ruelle Probability Cascades (henceforth denoted as RPCs) will prove to beextremely useful. We introduce these random measures in this section, and study some oftheir properties to familiarize ourselves with them.

Our starting point is the PPP xα : α ∈ N with intensity function Λ(x) = m exp(−mx),which we already studied in Section 2.6.1 (with a slightly different normalization). We firstexamine some integrability properties of these point processes.

Lemma 2.7.2. For 0 < a < m, E[(∑

α exα)a]

<∞.

Proof. We note that for x, y > 0 and a ≤ 1, (x+ y)a ≤ xa + ya and therefore,

E[(∑

α

exα)a]≤ E

[(∑α

exα1xα>0

)a]+ E

[(∑α

exα1xα<0

)a].

To control the first term, we use the same observation as above to obtain

E[(∑

α

exα ]1xα>0

)a]≤ E

[∑α

eaxα1(xα > 0)]

=

∫ ∞0

me−(m−a)x dx =m

m− a <∞.

We note that for z > 0 and a < 1, za ≤ 1 + z and thus, for the second term, we have,

E[(∑

α

exα1xα<0

)a]≤ 1 + E

[∑α

exα1xα<0

]= 1 +

∫ 0

−∞me−(1−m)x dx = 1 +

m

1−m.

This completes the proof.

93

Lemma 2.7.2 allows us to establish the following property of these Poisson processes.

Lemma 2.7.3. For xα : α ∈ N a PPP with intensity Λ(x) = m exp(−mx)dx. Then

log(∑

α exα)

is integrable.

Proof. It suffices to establish that[

log(∑

α exα)]

+and

[log(∑

α exα)]−

are integrable.

To this end, we note that for z ≥ 1, log z ≤ za/a for some a small enough, and thus[log(∑

α

exα)]

+≤ 1

a

(∑α

exα1xα>0

)a.

Lemma 2.7.2 immediately establishes the integrability of the positive part. The control ofthe negative part is more direct. We have

∑α exp[xα] ≥ exp[x(1)], where x(1) denotes the

largest point in the process. Thus the proof is complete if we can prove that E[(x(1))−] <∞.This follows from the classical observation that x(1) has the Gumbel distribution and thata Gumbel distribution is integrable. Indeed,

P[x(1) ≤ t] = P[Poisson

(∫ ∞t

me−mx dx)

= 0]

= exp(− e−t

).

This completes the proof.

Let ∆α : α ∈ N be a set of i.i.d. marks such that E[exp(m∆1)] < ∞. Lemma2.6.3 establishes that xα + ∆α : α ∈ N is another PPP with intensity CmΛ(x)dx, withCm = E[exp(m∆)].

This invariance property, together with Lemma 2.7.3, allows us to establish the followingfact, which we already used informally in the context of the 1RSB cavity method in Section2.6.4.

Lemma 2.7.4. Let xα : α ∈ N be a PPP with intensity Λ(x) = m exp[−mx]dx and let∆α : α ∈ N be i.i.d. marks with Cm = E[exp[m∆1]] <∞. Then we have,

E[

log(∑

α

exp[xα + ∆α])]

= E[

log(∑

α

exp[α])]

+1

mlogE[exp[m∆1]].

Proof. Using Lemma (2.6.3), we immediately have,

log(

exp[∑

α

exα+∆α

])d= log

(∑α

exα)

+1

mlogCm.

Both sides are integrable using Lemma 2.7.3, and taking an expectation completes theproof.

We will see that similar formulae will play a crucial role in the context of RPCs andthe proof of the Guerra replica symmetry breaking upper bound.

Note that the last statement can be rewritten in terms of the probability measureνα ∝ exp(xα):

E[

log(∑

α

ναe∆α

)]=

1

mlogE[em∆1 ] . (2.7.6)

94

We already saw in Section 2.6.4 that the right hand side gives rise to a term in the 1RSBfree energy functional.

To define the RPC, we fix k ≥ 1 and a set of values 0 < m1 < · · · < mk < 1. Themeasure constructed will be indexed by Nk. It is very helpful to think of Nk as the leaves ofa rooted infinite tree T = (V,E), with vertex set V = N0∪N1∪· · ·Nk. The root of the treewill be denoted by o. The vertices 1, 2, · · · will be the children of the root, and joinedto the root via edges in the tree T . The vertices (1, n) : n ≥ 1 will similarly be childrenof the vertex 1. In general, for any 1 ≤ l ≤ k − 1, consider the vertex α = (n1, · · · , nl).Then all vertices αn = (n1, · · · , nl, n) will be children of α and joined to α via edges.We denote by |α| the level of vertex α ∈ V . Further, for α = (n1, · · · , nl), we denote byP(α) = o, n1, n1n2, · · · , n1n2 · · ·nl the unique path in the tree joining the root o to thevertex α.

For each α ∈ V , |α| = l ≤ k, let Πα be a PPP with intensity measure Λl+1(x)dx =ml+1 exp(−ml+1x)dx. The point processes Πα : α ∈ V are mutually independent. Wenote that each Πα is almost surely countable, and therefore, for α ∈ V with |α| ≤ k − 1,the elements of Πα may be naturally associated to the children of α. Formally, for Πα =xα1, xα2, · · · , we associate xαn with the child αn of the vertex α. For |α| ≤ k, we definezα =

∑β∈P(α) xβ.

Finally, we define a (random) probability measure indexed by α ∈ Nk:

να =1

Zνezα , α ∈ Nk , Zν ≡

∑β∈Nk

ezβ . (2.7.7)

This is the RPC corresponding to the parameters m = (m1, · · · ,mk). The next lemmaestablishes that the RPC is well defined, in that the weights can be normalized almostsurely to yield a valid probability measure.

Lemma 2.7.5. For k ≥ 1 and m = (m1, · · · ,mk),∑

α∈Nk exp(zα) <∞ almost surely.

Proof. The proof proceeds by induction on k. For k = 1, zα : α ∈ N is the PPP withintensity Λ1(x)dx = m1 exp(−m1x)dx. This case was already established in Lemma 2.6.2,point 3.

Now we assume that the result has been verified for RPC’s with k − 1 levels. Then wehave, using the definition of RPC∑

α:|α|=k

ezα =∑

α:|α|=k−1

ezα(∑

n

exαn)

=∑

α:|α|=k−1

exp[zα]Uα,

where we define Uα =∑

n exp(xαn) for α ∈ V with |α| ≤ k − 1. Note that for any αwith |α| = k − 2, xαn : n ∈ N is a PPP with intensity mk−1 exp(−mk−1x)dx. FurtherUαn : n ∈ N are i.i.d. marks, independent of xαn : n ∈ N. Finally, using the definitionof Uαn and Lemma 2.7.2, we have, for |α| = k − 2,

E[Umk−1

α1 ] = E[(∑

n

exp[xα1n])mk−1

]<∞,

since xα1n : n ∈ N is a PPP with intensity mk exp(−mkx)dx and mk−1 < mk. The proofof Lemma 2.7.4 establishes that, always for |α| = k − 2,∑

n

exαnUαnd= Ck−2

∑n

exαn ,

95

for Ck−2 =(E[U

mk−1

α1 ])1/mk−1

. Thus we have,

∑α:|α|=k

ezαd= Ck−2

∑α:|α|=k−2

ezα∑n

exαn = Ck−2

∑α:|α|=k−1

ezα <∞ a.s.,

where the last assertion follows from the induction hypothesis. This completes the proof.

2.7.4 The RSB upper bound: Proof of Theorem 10

In this section, we establish Guerra’s replica symmetry breaking upper bound.

We start by reminding the form of the kRSB free energy functional of Section 2.5,albeit in a slightly different form. For k ≥ 1, fix 0 = m0 < m1 ≤ · · · ≤ mk−1 ≤ mk = 1and a function ψk : Rk → R. For g1, · · · , gk ∼iid N(0, 1), we define the following sequenceof random variables

ψk = ψk(g1, · · · , gk),...

ψl = ψl(g1, · · · , gl) =1

ml+1logE

[exp(ml+1ψl+1(g1, · · · , gl+1))

∣∣∣g1, . . . , gl

],

...

ψ0 =1

m1logE

[exp[m1ψ1(g1)]

]. (2.7.8)

Note that, as implied by the notation ψl is a measurable function of (g1, . . . , gl). Jensen’s in-equality implies that the construction above is well defined as soon as E exp(mkψk(g1, · · · , gk)) <∞. We recall that the kRSB free energy is recovered as follows:

ψk(x1, · · · , xk) = log 2 cosh(β

k∑i=1

√qi+1 − qi xi

),

ΨkRSB(q,m) = −1

4β2 +

1

4β2

k∑i=0

(mi+1 −mi)q2i + ψ0 , (2.7.9)

where 0 = q0 ≤ q1 ≤ · · · ≤ qk ≤ qk+1 = 1 and mk+1 = 1.

The next lemma establishes a connection between the recursive construction of therandom variable ψk, and the RPC’s introduced in the last section.

Lemma 2.7.6. Consider the RPC να : |α| = k corresponding to the parameters m =(m0, · · · ,mk−1,mk). Let gα : α ∈ N ∪ · · · ∪ Nk be a family of independent standardGaussian variables independent of the RPC. For any α = (n1, n1n2, · · · , n1n2 · · ·nk), wedefine Gα = (gn1 , gn1n2 , · · · , gn1n2···nk).

Then, for any function ψk : Rk → R such that E exp(ψk(g1, · · · , gk)) <∞, we have

ψ0 = E[

log(∑

α

να exp[ψk(Gα)])].

96

Proof. Recall that να ∝ exp(zα), zα =∑

β∈P(α) xβ, where xβ : β ∈ V \o is constructedusing i.i.d. PPP’s at each level in the k-level infinite tree. Thus it suffices to prove

E[

log(∑

α

exp(zα + ψk(Gα)))]

= E[

log(∑

α

exp(zα))]

+ ψ0.

We use induction on k. In fact, we prove the stronger statement: for any sequence hα : α ∈Nk satisfying E[exp(mkh1)] < ∞ and independent of να : α ∈ Nk and gα : α ∈ V \φ,we have,

E[

log(∑

α

exp(zα + ψk(Gα) + hα

))]= E

[log(∑

α

exp(zα + hα

))]+ ψ0. (2.7.10)

We note that the Lemma follows upon setting hα = 0.

The case k = 1 corresponds to a PPP with intensity m1 exp(−m1x)dx and followsimmediately using Lemma 2.7.4. Indeed,

E[

log(∑

α

exp(zα + ψk(Gα) + hα

))]= E

[log(∑

α

exp(zα))]

+1

m1logE[em1h1 ] + ψ0

= E[

log(∑

α

exp(zα + hα

))]+ ψ0.

Next, we assume that (2.7.10) holds up to k − 1 levels. Then we have,∑α∈Nk

exp[zα + ψk(Gα) + hα] =∑

α∈Nk−1

ezα∑n

exp[xαn + ψk(Gα, gαn) + hαn].

We set Qα =∑

n exp(xαn + ∆αn), with ∆αn = ψk(Gα, gαn) + hαn. Define the sigma-field Fk−1 = σ(xα, gα : |α| ≤ k − 1). Note that xαn : n ∈ N is a PPP with intensitymk exp(−mkx)dx and conditional on Fk−1, ∆αn are i.i.d. marks, independent of xαn :n ∈ N for each α ∈ Nk−1. Further, we have,

(E[

exp[mk∆αn]|Fk−1

]) 1mk =

(E[

exp[mkh]]) 1

mk

(Egαn

[exp[mkψk(Gαn, gαn)]

]) 1mk .

We set cmk = E[exp(mkh)] and define∑

n exp(xαn) = exp(uα). Using the recursive con-struction of the functions ψl, we note that

Qα∣∣Fk−1

d= ceψk−1(Gα)

∑n

exαn . (2.7.11)

(The two random variables have the same distribution conditional on Fk−1.)

Note that xαn : n ∈ N is independent of Gα : |α| ≤ k − 1 and thus the equality indistribution is true unconditionally. Thus we have established that∑

α∈Nkexp[zα + ψk(Gα) + hα]

d= c

∑α∈Nk−1

exp[zα + ψk−1(Gα) + uα],

∑α∈Nk

ezα+hα d= c

∑α∈Nk−1

ezα+uα ,

97

where the last relation follows upon setting ψk = 0. Finally, Lemma 2.7.2 implies thatE[exp[mk−1uα]] <∞ and thus

E[

log( ∑α∈Nk

exp(zα + ψk(Gα) + hα

))]= log c+ E

[log( ∑α∈Nk−1

exp(zα + ψk−1(Gα) + uα

))],

E[

log( ∑α∈Nk

exp(zα + hα

))]= log c+ E

[log( ∑α∈Nk−1

ezα+uα)].

Therefore Eq. (2.7.10) follows directly for k-level RPCs using the induction hypothesis.This completes the proof.

We now collect these results to express the kRSB free energy functional of Eq. (2.7.9)in terms of RPCs.

Lemma 2.7.7. Let να : α ∈ Nk be the weights of a RPC with parameters m =(m0, · · · ,mk−1,mk) with m0 = 0 ≤ m1 ≤ · · · ≤ mk−1 < mk = mk+1 = 1. Letgα : α ∈ N ∪ · · · ∪ Nk be i.i.d. standard Gaussian random variables, independent ofthe RPC. Define the following processes indexed by α ∈ Nk:

hα ≡ β∑

γ∈P(α)

√q|γ| − q|γ|−1 gγ , (2.7.12)

sα = β∑

γ∈P(α)

√q2|γ|

2−q2|γ|−1

2. (2.7.13)

where q0 = 0 ≤ q1 ≤ · · · ≤ qk−1 ≤ qk = 1. Then we have,

ΨkRSB(q,m) = E[

log(

2∑α

να cosh(hα))]− E

[log(∑

α

να esα)]. (2.7.14)

Proof. Equation (2.7.9) already implies that

ΨkRSB(q,m) = E[

log(

2∑α

να cosh(hα))]− 1

4β2 +

1

4β2

k∑i=0

(mi+1 −mi)q2i .

It is therefore sufficient to compute the second term of the formula (2.7.14).Define the function ψk : Rk → R by

ψk(x1, · · · , xk) = βk∑i=1

√q2i

2− q2

i−1

2xi .

We define the random variable ψ0 using the recursive construction (2.7.8), starting withψk. Note that

ψk−1(g1, . . . , gk−1) =1

mklogEgk

[exp

(βmk

k∑i=1

√q2i

2− q2

i−1

2gi

)]

= β

k−1∑i=1

√q2i

2− q2

i−1

2gi +

β2

4mk(q

2k − q2

k−1).

98

Continuing this recursion, we obtain

ψ0 =β2

4

k∑i=1

mi(q2i − q2

i−1)

=β2

4− β2

4

k∑i=0

(mi+1 −mi)q2i .

An application of Lemma 2.7.6 completes the proof.

Before proving the replica symmetry breaking upper bound, we need one final ingredient—a Gaussian integration by parts lemma. This result is a version of Stein’s lemma whoseproof is immediate with the only technical difficulty that we consider Gaussian processesindexed by countable sets (see, e.g., [Pan13]).

Lemma 2.7.8. Let Σ be a countable set and let x(σ), y(σ) : σ ∈ Σ be two centeredGaussian processes indexed by Σ. Define C(σ1, σ2) = E[x(σ1)y(σ2)]. For any measure µ0

on Σ, consider the probability measure µ(σ) ∝ exp[y(σ)]µ0(σ). Denoting the expectation ofa function f under µ as µ(f), we have,

E[µ(x(σ))] = E[µ⊗2(C(σ1, σ1)− C(σ1, σ2))],

where σ1, σ2 are i.i.d. samples drawn from µ.

We are finally in position to prove Theorem 10. The argument is similar to the one forreplica symmetric upper bound given in Section 2.7.2. The main difference is that insteadof interpolating between the Sherrington-Kirkpatrick measure and a product measure withi.i.d. gaussian effective fields, we interpolate to a system where the effective field on eachspin is given by a RPC. Indeed, this form for the effective field is crucial for the emergenceof the Parisi formula.

Proof of Theorem 10. Recall the SK HamiltonianH(σ) = (β/2)∑

i,j≤nWijσiσj = (β/2)〈σ,Wσ〉,where W = (G+GT)/

√2n, and G = (Gij)i,j≤n are i.i.d. standard Gaussian random vari-

ables. It is useful to recall that H(σ)σ∈+1,−1 is a centered Gaussian process withcovariance

E[H(σ1)H(σ2)] =β2

2nE[〈G, σ1 ⊗ σ1〉〈G, σ2 ⊗ σ2〉

]=β2

2n〈σ1,σ2〉2 . (2.7.15)

Fix parameters m = (m0, · · · ,mk−1,mk), q = (q0, · · · , qk−1, qk) as in the statementof Lemma 2.7.7, and let να : α ∈ Nk be an RPC corresponding to the parameters m,independent of the disorder variables Wij : i, j ≤ n. Recall also the Gaussian fieldshα : α ∈ N ∪ · · · ∪ Nk and sα : α ∈ N ∪ · · · ∪ Nk introduced in Eqs. (2.7.12), (2.7.13).Let hα,i : α ∈ Nk, 1 ≤ i ≤ n and sα,i : α ∈ Nk, 1 ≤ i ≤ n be i.i.d. copies of hα sαrespectively, independent of both Wij : i, j ≤ n and να : α ∈ Nk.

For t ∈ [0, 1] define the interpolating Hamiltonian

H_t(σ, α) = √t H(σ) + √t ∑_{i=1}^{n} s_{α,i} + √(1 − t) ∑_{i=1}^{n} h_{α,i} σ_i ,   (2.7.16)

and define the interpolating partition function and free energy density as

Z_n(t) ≡ ∑_{σ,α} ν_α exp[ H_t(σ, α) ] ,   (2.7.17)

φ_n(t) ≡ (1/n) E[ log Z_n(t) ] ,   (2.7.18)

where E denotes expectation over the joint distribution of all random variables. First, we evaluate the interpolating free energy at t = 0 and t = 1. At t = 0, we have

φ_n(0) = (1/n) E[ log( ∑_α ν_α ∏_{i=1}^{n} 2 cosh(h_{α,i}) ) ] .

To evaluate the right-hand side, let g_1, · · · , g_n ∼ N(0, I_k) be i.i.d. random vectors, with g_i = (g_{l,i} : 1 ≤ l ≤ k). Consider now ψ_k^{(n)} = log[ ∏_{i=1}^{n} 2 cosh( β ∑_{l=1}^{k} √(q_l − q_{l−1}) g_{l,i} ) ]. We define ψ_l^{(n)}, 0 ≤ l ≤ k, by the same recursion as (2.7.8), with the expectation over g_l replaced by the expectation with respect to the variables {g_{l,i} : 1 ≤ i ≤ n}. An obvious modification of Lemma 2.7.6 now implies that φ_n(0) = ψ_0^{(n)}/n. Finally, we note that

ψ_k^{(n)} = ∑_{i=1}^{n} log 2 cosh( β ∑_{l=1}^{k} √(q_l − q_{l−1}) g_{l,i} ) .

Thus, using the independence of the Gaussian variables, we have

φ_n(0) = (1/n) ψ_0^{(n)} = ψ_0 = E[ log( 2 ∑_α ν_α cosh(h_α) ) ] .

Next, consider the case t = 1:

φ_n(1) = (1/n) E[ log( ∑_σ e^{H(σ)} ) ] + (1/n) E[ log( ∑_α ν_α exp[ ∑_{i=1}^{n} s_{α,i} ] ) ]   (2.7.19)
   = φ_n + E[ log( ∑_α ν_α e^{s_α} ) ],

where φ_n ≡ n^{−1} E[ log( ∑_σ exp[H(σ)] ) ] is the SK free energy density, and the last equality

follows from the same considerations as for φ_n(0).

The claim of the theorem follows if we establish that φ_n(1) ≤ φ_n(0). To show this, we will proceed as in the replica symmetric upper bound; namely, we will prove that φ′_n(t) ≤ 0 for all t ∈ [0, 1]. It is useful to define the random Gibbs measure µ_t on the augmented space {+1,−1}^n × N^k associated with the Hamiltonian H_t(σ, α). Namely, for (σ, α) ∈ {±1}^n × N^k, we have

µ_t(σ, α) ≡ (1/Z_n(t)) ν_α e^{H_t(σ,α)} .

We express the derivative of the interpolating free energy in terms of this Gibbs measure. Denoting by µ_t(f) ≡ ∑_{σ,α} f(σ, α) µ_t(σ, α) the expectation of any function under the measure µ_t(·), and defining C((σ¹, α¹), (σ², α²)) ≡ E[ ∂_t H_t(σ¹, α¹) H_t(σ², α²) ], we have

φ′_n(t) = (1/n) E[ µ_t( ∂H_t(σ, α)/∂t ) ] = (1/n) E[ µ_t^{⊗2}( C((σ¹, α¹), (σ¹, α¹)) − C((σ¹, α¹), (σ², α²)) ) ] .   (2.7.20)


The last equality in the display above follows using Gaussian integration by parts, as presented in Lemma 2.7.8, with (σ¹, α¹), (σ², α²) two i.i.d. samples drawn from the measure µ_t(·). Now, (2.7.16) implies

∂_t H_t(σ, α) = (1/(2√t)) H(σ) + (1/(2√t)) ∑_{i=1}^{n} s_{α,i} − (1/(2√(1 − t))) ∑_{i=1}^{n} h_{α,i} σ_i .

The covariance C((σ¹, α¹), (σ², α²)) may now be obtained directly:

(1/n) C((σ¹, α¹), (σ², α²)) = (1/2n) E[H(σ¹)H(σ²)] + (1/2) E[s_{α¹} s_{α²}] − (1/2n) ∑_{i=1}^{n} E[h_{α¹} h_{α²}] σ_i¹ σ_i²
   =: T_1 + T_2 + T_3 .

Term T_1 was already computed in Eq. (2.7.15), yielding

T_1 = (β²/4) Q_{12}² ,

where we set, as usual, Q_{12} ≡ ⟨σ¹, σ²⟩/n. To evaluate T_2 and T_3 we recall that α¹, α² ∈ N^k may be thought of as leaf vertices in the infinite tree with k levels. We denote by α¹ ∧ α² the most recent common ancestor of α¹, α², and use |α¹ ∧ α²| to denote the level of this ancestor. Therefore, we have

T_2 = (1/2) E[ s_{α¹} s_{α²} ] = (β²/4) ∑_{γ∈P(α¹∧α²)} ( q_{|γ|}² − q_{|γ|−1}² ) = (β²/4) q_{|α¹∧α²|}² .

An analogous computation yields T_3 = −(β²/2) q_{|α¹∧α²|} Q_{12}. Combining, we obtain

(1/n) C((σ¹, α¹), (σ², α²)) = T_1 + T_2 + T_3 = (β²/4) ( Q_{12} − q_{|α¹∧α²|} )² .

We note that C((σ¹, α¹), (σ¹, α¹)) = 0, and thus Eq. (2.7.20) yields

φ′_n(t) = −(β²/4) E[ µ_t^{⊗2}( ( Q_{12} − q_{|α¹∧α²|} )² ) ] ≤ 0 .   (2.7.21)

This completes the proof.

Bibliographic notes

We refer the interested reader to [Pan13, Tal10, Tal11] for rigorous textbook introductions to mean-field spin glasses and related models arising in statistical physics. The book [Pan13] also contains an excellent account of the historical development of this area in mathematical probability.

The RS formula for the limiting mutual information of the Z2 synchronization problem (corresponding to the case β = λ) was established rigorously using several different approaches. Among others, [DAM16] derive the limit of the free energy by utilizing an algorithmic approach based on the Bayes optimal Approximate Message Passing scheme, and a related approach was developed in [BDM+18]. A result for general distributions of the spike x_0 (always with i.i.d. coordinates) was established in [LM19]. The upper bound is derived using Guerra interpolation, while the lower bound is established via the Aizenman-Sims-Starr scheme along with an "overlap concentration" formula. An alternative proof is obtained using the elegant "adaptive interpolation method" [BM19]. In this approach, one establishes first that the overlap concentrates at a fixed value, and then adapts the interpolation path according to the value of this overlap. A third independent proof was obtained in [EAK18]: they directly analyzed the partition function restricted to specific values of the overlap between the sample and the planted signal, and established that the global optimum is attained at the replica symmetric fixed point. Moving beyond the limiting log-partition function, [EAKJ20] characterized O(1) fluctuations of the log-partition function below the IT threshold. This establishes that below the IT threshold, the null model (i.e. λ = 0) and the planted model are mutually contiguous, and that detection and estimation have the same information theoretic threshold in these spiked matrix models.

The connections between extremal cuts of random graphs and ground states of mean-field spin glasses have been explored in depth. [Sen18] extended the results of [DMS17] to a general class of discrete optimization problems on sparse random graphs and hypergraphs. The special case of the random K-SAT problem was treated in [Pan18c], and unbalanced cuts in [JS20]. These results are intimately connected to the question of universality of the ground state of the Sherrington-Kirkpatrick model with respect to the law of the disorder [CH06, Cha05].

Approximate Message Passing algorithms have been used crucially in recent years to analyze diverse estimators in high-dimensional inference problems, e.g. the LASSO [BM12], M-estimators [DM16], and Maximum Likelihood Estimators [SC19]. We refer the interested reader to the recent survey [FVRS21] for an in-depth discussion of these results.

The Approximate Message Passing algorithm has been extended in several ways, and stronger state evolution guarantees are now known; see e.g. [JM13, Ran11, BMN20, RSF19, Fan20]. [CL21] extended the universality guarantees of the AMP algorithm to Lipschitz nonlinearities, and to AMP algorithms which depend on all the past iterates.

Analogues of the Parisi formula have been established for vector spin glasses [Pan18a], Potts spin glass models [Pan18b], and multispecies models with positive definite interactions [Pan15, BCMT15, BS21]. This line of work crucially exploits the notion of "synchronization" due to Panchenko, which facilitates the lower bound computation in these models.

Exercises

Exercise 2.1. The Ising p-spin model is the following probability distribution over σ ∈ {+1,−1}^n:

µ_{β,n}(σ) = (1/Z_n(β)) exp( (β/√(2 n^{k−2} (k!))) ⟨W, σ^{⊗k}⟩ ) ,   (2.7.22)

where W ∈ (R^n)^{⊗k} is a zero-mean Gaussian tensor, with the same distribution as in Section 1.1.1. In other words, this is the same as the spherical p-spin model studied in Chapter 1, except for the normalization of β, and the fact that σ ∈ {+1,−1}^n. For k = 2, it corresponds to the Sherrington-Kirkpatrick model studied in this chapter.

Use the replica method to derive the replica symmetric free energy and the corresponding stationarity conditions. It might be useful to recall the following consequence of Varadhan's lemma. Let {x_n} be a sequence of random vectors taking values in a fixed bounded convex set K ⊆ R^d, with law µ_n. Assume that µ_n satisfies a large deviations principle with convex rate function I(x). Let f : K → R be continuous. Then

lim_{n→∞} (1/n) log E[ e^{n f(x_n)} ] = sup_{x_0∈K} inf_λ A(λ, x_0) ,   (2.7.23)

A(λ, x_0) = f(x_0) − ⟨λ, x_0⟩ + lim_{n→∞} (1/n) log E[ e^{n⟨λ, x_n⟩} ] .   (2.7.24)

Exercise 2.2. Consider the Sherrington-Kirkpatrick model, namely the probability measure (2.1.8) with λ = 0. In this case the replica symmetric prediction is given by Eq. (2.2.6), with b = 0. We denote the free energy functional by Ψ_RS(q; β).

(a) Plot the function Ψ_RS(q; β) versus q for two values of the inverse temperature: β_1 < 1 and β_2 > 1.

(b) Write a program to solve the stationarity condition for q, and plot the resulting value q_∗(β) as a function of 1/β (a possible starting point is sketched after this exercise).

(c) Compute the replica symmetric prediction for the free energy density Ψ_RS(q_∗(β); β) and plot it as a function of 1/β.

(d) By taking suitable derivatives of the free energy, compute the replica symmetric prediction for the entropy density lim_{n→∞} H(µ_n)/n. Check that this becomes negative at low temperature.

(e) By considering the limit β → ∞, compute the replica symmetric prediction for the maximum energy lim_{n→∞} max_{σ∈{+1,−1}^n} ⟨σ, Wσ⟩/n.
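For parts (b)-(c), the following sketch may be a useful starting point. It is not the authors' solution; it assumes that the stationarity condition of Eq. (2.2.6) at λ = 0 reduces to the classical SK fixed point q = E[tanh²(β√q G)], G ∼ N(0,1), with Ψ_RS(q; β) = log 2 + E[log cosh(β√q G)] + (β²/4)(1 − q)².

```python
# Replica symmetric fixed point and free energy for SK (a sketch under the
# assumptions stated above, not the authors' solution).
import numpy as np

t, w = np.polynomial.hermite.hermgauss(200)
x, w = np.sqrt(2.0) * t, w / np.sqrt(np.pi)    # E[f(G)] ~ w . f(x), G ~ N(0,1)

def q_star(beta, iters=2000):
    """Solve q = E[tanh^2(beta sqrt(q) G)] by fixed-point iteration."""
    q = 0.5
    for _ in range(iters):
        q = w @ np.tanh(beta * np.sqrt(q) * x) ** 2
    return q

def psi_rs(q, beta):
    return np.log(2.0) + w @ np.log(np.cosh(beta * np.sqrt(q) * x)) \
        + beta ** 2 / 4 * (1.0 - q) ** 2

for beta in [0.5, 1.5, 3.0]:
    q = q_star(beta)
    print(f"beta = {beta:3.1f}   q* = {q:.4f}   Psi_RS = {psi_rs(q, beta):.4f}")
# For part (d): the entropy density is s(beta) = Psi_RS - beta * dPsi_RS/dbeta
# (evaluated at q*(beta)); computed numerically, it goes negative at large beta.
```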

Exercise 2.3. We consider again the Sherrington-Kirkpatrick model, with λ = 0.

(a) Write the 1RSB expression for the free energy Ψ_1RSB(q_0, q_1, m; β). This expression depends on the overlap parameters 0 ≤ q_0 ≤ q_1 ≤ 1 and the RSB parameter 0 ≤ m ≤ 1.

(b) Write a program to evaluate Ψ_1RSB(q_0, q_1, m; β). Fix a value of the inverse temperature β > 1 and plot Ψ_1RSB(q_0 = 0, q_1, m; β) as a function of (q_1, m) ∈ [0, 1] × [0, 1]. Minimize this function numerically over q_1 and m to determine the minimizer (q_{1,∗}, m_∗) as well as the corresponding free energy estimate Ψ_1RSB(q_0 = 0, q_{1,∗}, m_∗; β). (A crude grid-search sketch is given after this exercise.)

(c) Recall that we defined the effective field on spin σ_i as h_i = (1/β) atanh( ∑_σ µ_n(σ) σ_i ), and the asymptotic distribution of effective fields Q_h as follows (where the limit is understood in the weak sense):

Q_h = lim_{n→∞} E[ (1/n) ∑_{i=1}^{n} δ_{h_i} ] .   (2.7.25)

Assuming the 1RSB approximation, and hence overlap distribution ρ = (1 − m)δ_{q_1} + m δ_{q_0}, write an expression for Q_h.

Evaluate the expression for q_0 = 0, q_1 = q_{1,∗}, m = m_∗, and the inverse temperature β chosen for the previous part. Plot the resulting density Q_h. Compare this density with the Gaussian prediction obtained within the replica symmetric theory.
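For part (b), a possible sketch is below. It assumes that part (a) yields the standard SK 1RSB functional, which at q_0 = 0 reads Ψ_1RSB(0, q_1, m; β) = (1/m) log E[(2 cosh(β√q_1 G))^m] + (β²/4)[(1 − q_1)² − m q_1²], G ∼ N(0,1); this form is consistent with the kRSB representation derived earlier in this section (it reduces to Ψ_RS(q_1; β) as m → 0 and to the annealed free energy as m → 1). The grid search is a crude stand-in for a proper optimizer.

```python
# 1RSB free energy for SK at q0 = 0 (a sketch under the assumptions stated
# above), minimized by brute-force grid search over (q1, m).
import numpy as np

t, w = np.polynomial.hermite.hermgauss(200)
x, w = np.sqrt(2.0) * t, w / np.sqrt(np.pi)    # E[f(G)] ~ w . f(x), G ~ N(0,1)

def psi_1rsb(q1, m, beta):
    inner = w @ (2.0 * np.cosh(beta * np.sqrt(q1) * x)) ** m
    return np.log(inner) / m + beta ** 2 / 4 * ((1.0 - q1) ** 2 - m * q1 ** 2)

beta = 2.0
grid = np.linspace(0.01, 0.99, 99)
vals = np.array([[psi_1rsb(q1, m, beta) for m in grid] for q1 in grid])
i, j = np.unravel_index(vals.argmin(), vals.shape)
print(f"q1* ~ {grid[i]:.2f}, m* ~ {grid[j]:.2f}, Psi_1RSB ~ {vals[i, j]:.5f}")
```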


Exercise 2.4. Consider the model (2.1.8) with Y given by Eq. (2.1.1) with λ > 0 and h = 0. From the point of view of statistics, this is the Z2 synchronization problem with a non-vanishing signal-to-noise ratio. From the physics perspective, you can look at this as the Sherrington-Kirkpatrick model with a ferromagnetic interaction.

(a) Write the 1RSB expression for the free energy Ψ_1RSB(q_0, q_1, b; m). Notice that this will depend on the overlap parameters q_0, q_1, on the RSB parameter m, and on the parameter b that corresponds to the bias in the direction of the signal x_0.

(b) Write a program to find the stationary point of the 1RSB free energy, (q_{0,∗}, q_{1,∗}, m_∗, b_∗). How can you compute the asymptotic accuracy M(β, λ) ≡ lim_{n→∞} E|⟨x_0, σ̄(β)⟩|/n in terms of this stationary point? Here σ̄(β) ≡ ∑_σ µ_β(σ) σ (where the average is taken with respect to the measure obtained from µ_β by tilting it by an infinitesimal amount in the direction of x_0).

(c) Plot the 1RSB result for M(β, λ) as a function of λ ∈ [0, 3] for β ∈ {1, 2, 4, 8} (thus obtaining four curves).


Appendix A

Probability theory inequalities

Good references on this subject are [BLM13, Ver12].

A.1 Basic facts

Theorem 11 (Gaussian Integration by Parts). Let X ∼ N(0, 1) and f : R → R be weakly differentiable. Assume that E[Xf(X)] and E[f′(X)] are both well-defined. Then we have

E[Xf(X)] = E[f′(X)].
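A one-line numerical check (our own illustration, with f = tanh, so that f′(x) = 1/cosh²(x)):

```python
# Monte Carlo check of E[X f(X)] = E[f'(X)] for f = tanh (a sketch).
import numpy as np
X = np.random.default_rng(2).standard_normal(1_000_000)
print(np.mean(X * np.tanh(X)), np.mean(1.0 / np.cosh(X) ** 2))
```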

A.2 Basic inequalities

Lemma A.2.1 (Markov inequality). Let X be a non-negative random variable with E[X] < ∞. Then for any a > 0, P[X > a] ≤ E[X]/a.

Lemma A.2.2 (Chebychev inequality). Let X be a real-valued random variable with E[X²] < ∞. Then for any a > 0,

P[ |X − E[X]| > a ] ≤ Var(X)/a² .

Theorem 12 (Paley-Zygmund inequality). Let X be a non-negative random variable with finite second moment and E(X) > 0. Then, for any t ∈ (0, 1), we have

P( X ≥ t E X ) ≥ (1 − t)² (E X)² / E(X²) .   (A.2.1)

A.3 Concentration inequalities

Theorem 13 (McDiarmid inequality). Let F : X^n → R be any function satisfying

| F(x_1, . . . , x_{i−1}, x_i, x_{i+1}, . . . , x_n) − F(x_1, . . . , x_{i−1}, x′_i, x_{i+1}, . . . , x_n) | ≤ L_i   (A.3.1)

for all x = (x_1, . . . , x_n) ∈ X^n and x′_i ∈ X. If X = (X_1, . . . , X_n) is a random vector taking values in X^n, with independent coordinates, then, for any t ≥ 0,

P( F(X) ≥ E F(X) + t ) ≤ exp( − t² / ( 2 ∑_{i=1}^{n} L_i² ) ) .   (A.3.2)


Theorem 14 (Gaussian concentration). Let F : R^d → R be an L-Lipschitz function, i.e. a function such that, for all x, y ∈ R^d,

|F(x) − F(y)| ≤ L ‖x − y‖_2 .   (A.3.3)

If X ∼ N(0, I_d) is a standard normal vector, then, for any t ≥ 0,

P( F(X) ≥ E F(X) + t ) ≤ e^{−t²/(2L²)} .   (A.3.4)
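A quick simulation illustrating the bound (our own example; F(x) = max_i x_i is 1-Lipschitz):

```python
# Empirical tail of F(X) = max_i X_i versus the Gaussian concentration
# bound with L = 1 (a sketch).
import numpy as np

rng = np.random.default_rng(1)
d, T, t = 100, 100_000, 0.5
F = rng.standard_normal((T, d)).max(axis=1)   # T samples of F(X)
print(np.mean(F >= F.mean() + t),             # empirical P(F >= EF + t)
      np.exp(-t ** 2 / 2))                    # bound exp(-t^2 / (2 L^2))
```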


Appendix B

Summary of notations

B.1 Matrix Norms

For any matrix A ∈ R^{d×d}, define the operator norm ‖A‖_2 = sup_{‖x‖_2=1} ‖Ax‖_2. On the other hand, we define the Frobenius norm ‖A‖_F = ( ∑_{i,j} A_{ij}² )^{1/2}.

B.2 Asymptotics

Given two (strictly positive) functions f(n), g(n), we write f(n) ≐ g(n) if they are equivalent to leading exponential order, i.e. if lim_{n→∞} n^{−1} log[f(n)/g(n)] = 0. For any two sequences a_n, b_n, we say a_n = O_n(b_n) if there exists a constant C > 0 such that a_n/b_n ≤ C for all n ≥ 1. Similarly, we say a_n = o_n(b_n) if a_n/b_n → 0 as n → ∞.

B.3 Probability

(i) Convergence in probability: For a sequence of random variables {X_n : n ≥ 1} and X defined on the same probability space (Ω, F, P), we say that X_n →^P X if for every ε > 0, P[|X_n − X| > ε] → 0 as n → ∞. Given any positive sequence of real numbers {a_n : n ≥ 1}, we say that X_n = o_n(a_n) if X_n/a_n →^P 0.

(ii) Convergence almost surely: For a sequence of random variables {X_n : n ≥ 1} and X defined on the same probability space (Ω, F, P), we say that X_n →^{a.s.} X if P[{ω : X_n(ω) → X(ω)}] = 1.

(iii) Convergence in distribution: Let X be a complete, separable metric space. Let {X_n : n ≥ 1}, X be random variables taking values in X. We say that X_n ⇒ X if for all bounded continuous f : X → R, E[f(X_n)] → E[f(X)]. Equivalently, for any sequence of probability measures {µ_n : n ≥ 1} and µ on X, we say µ_n ⇒ µ if ∫ f dµ_n → ∫ f dµ for all bounded continuous f : X → R.

(iv) For any two real-valued random variables X, Y, we say X =^d Y if for all x ∈ R, P[X ≤ x] = P[Y ≤ x].


B.4 Probability distributions

(i) A real-valued random variable X ∼ N(0, 1) if X has the probability density function f_X(x) = (1/√(2π)) exp(−x²/2) on R. For any µ ∈ R and σ > 0, we say Y ∼ N(µ, σ²) if Y =^d µ + σX. For σ > 0, Y has probability density f_Y(x) = (1/(√(2π)σ)) exp(−(x − µ)²/(2σ²)). For any d ≥ 1, an R^d-valued random vector g ∼ N(0, I_d) if g = (g_1, · · · , g_d), where g_1, · · · , g_d ∼ N(0, 1) are i.i.d. random variables. For any µ ∈ R^d and Σ ∈ R^{d×d} non-negative definite, we say h ∼ N(µ, Σ) if h =^d µ + Σ^{1/2} g, where Σ^{1/2} is the matrix square-root of Σ. If Σ is positive definite, h has density

f_h(x) = (1/( (2π)^{d/2} √(det Σ) )) exp( −(1/2) (x − µ)ᵀ Σ^{−1} (x − µ) ) ,  x ∈ R^d.

(ii) We say X ∼ Poisson(λ) if P[X = k] = e^{−λ} λ^k / k! , k ∈ N := {0, 1, 2, · · · }.

(iii) Let X be a closed subset of R^d. We define a Poisson Point Process (PPP) N with intensity m : X → R_+ as follows:

– For any k ≥ 1 and B_1, · · · , B_k measurable disjoint subsets of X, N(B_1), · · · , N(B_k) are independent random variables taking values in N.

– For 1 ≤ i ≤ k, N(B_i) ∼ Poisson( ∫_{B_i} m(x) dx ).

(iv) Let G = (G_{ij}) ∈ R^{n×n} be a random matrix with i.i.d. N(0, 1) entries. Set W = (G + Gᵀ)/√(2n). Then W is a symmetric random matrix, with {W_{ij} : i ≤ j} independent, mean-zero normal random variables. Further, Var(W_{ij}) = (1 + 1_{i=j})/n. This distribution is referred to as the "Gaussian Orthogonal Ensemble" (GOE) in the literature. For notational compactness, we write W_n ∼ GOE(n) in these notes.
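This construction is immediate to implement; a minimal sketch follows (the eigenvalue print is only a familiar sanity check: with this normalization the spectrum concentrates on [−2, 2] for large n).

```python
# Sample W ~ GOE(n) as defined above and check the stated variances.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
G = rng.standard_normal((n, n))
W = (G + G.T) / np.sqrt(2 * n)                 # W_n ~ GOE(n)

off = W[np.triu_indices(n, k=1)]               # entries with i < j
print(n * off.var(), n * np.diag(W).var())     # ~ 1 and ~ 2, as stated
print(np.linalg.eigvalsh(W).max())             # ~ 2 for large n
```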


Bibliography

[AAC13] Antonio Auffinger, Gérard Ben Arous, and Jiří Černý, Random matrices and complexity of spin glasses, Communications on Pure and Applied Mathematics 66 (2013), no. 2, 165–201.

[ABE+05] Sanjeev Arora, Eli Berger, Elad Hazan, Guy Kindler, and Muli Safra, On non-approximability for quadratic programs, Foundations of Computer Science, 2005. FOCS 2005. 46th Annual IEEE Symposium on, IEEE, 2005, pp. 206–215.

[AC15a] Antonio Auffinger and Wei-Kuo Chen, On properties of Parisi measures, Probability Theory and Related Fields 161 (2015), no. 3-4, 817–850.

[AC15b] Antonio Auffinger and Wei-Kuo Chen, The Parisi formula has a unique minimizer, Communications in Mathematical Physics 335 (2015), no. 3, 1429–1444.

[AG97] Gérard Ben Arous and Alice Guionnet, Large deviations for Wigner's law and Voiculescu's non-commutative entropy, Probability Theory and Related Fields 108 (1997), no. 4, 517–542.

[And88] Philip W Anderson, Spin glass I: A scaling law rescued, Physics Today 41 (1988), no. 1, 9–11.

[And90] Philip W Anderson, Spin glass VII: Spin glass as a paradigm, Physics Today (1990).

[ASS03] Michael Aizenman, Robert Sims, and Shannon L Starr, Extended variational principle for the Sherrington-Kirkpatrick spin-glass model, Physical Review B 68 (2003), no. 21, 214403.

[AT07] Robert J Adler and Jonathan E Taylor, Random fields and geometry, vol. 80, Springer, 2007.

[BBAP05] Jinho Baik, Gérard Ben Arous, and Sandrine Péché, Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices, Annals of Probability (2005), 1643–1697.

[BCMT15] Adriano Barra, Pierluigi Contucci, Emanuele Mingione, and Daniele Tantari, Multi-species mean field spin glasses. Rigorous results, Annales Henri Poincaré, vol. 16, Springer, 2015, pp. 691–708.

[BDM+18] Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, and Lenka Zdeborová, Rank-one matrix estimation: analysis of algorithmic and information theoretic limits by the spatial coupling method, arXiv:1812.02537 (2018).


[BLM13] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart, Concentration inequalities: A nonasymptotic theory of independence, Oxford University Press, 2013.

[BM11] Mohsen Bayati and Andrea Montanari, The dynamics of message passing on dense graphs, with applications to compressed sensing, IEEE Trans. on Inform. Theory 57 (2011), 764–785.

[BM12] Mohsen Bayati and Andrea Montanari, The LASSO risk for Gaussian matrices, IEEE Trans. on Inform. Theory 58 (2012), 1997–2017.

[BM16] Boaz Barak and Ankur Moitra, Noisy tensor completion via the sum-of-squares hierarchy, Conference on Learning Theory, PMLR, 2016, pp. 417–445.

[BM19] Jean Barbier and Nicolas Macris, The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference, Probability Theory and Related Fields 174 (2019), no. 3-4, 1133–1185.

[BMN20] Raphaël Berthier, Andrea Montanari, and Phan-Minh Nguyen, State evolution for approximate message passing with non-separable functions, Information and Inference: A Journal of the IMA 9 (2020), no. 1, 33–79.

[Bol14] Erwin Bolthausen, An iterative construction of solutions of the TAP equations for the Sherrington–Kirkpatrick model, Communications in Mathematical Physics 325 (2014), no. 1, 333–366.

[BS21] Erik Bates and Youngtak Sohn, Free energy in multi-species mixed p-spin spherical models, arXiv preprint arXiv:2109.14790 (2021).

[CH06] Philippe Carmona and Yueyun Hu, Universality in Sherrington–Kirkpatrick's spin glass model, Annales de l'Institut Henri Poincaré (B) Probability and Statistics, vol. 42, Elsevier, 2006, pp. 215–222.

[Cha05] Sourav Chatterjee, A simple invariance theorem, arXiv preprint math/0508213 (2005).

[Cha21] Patrick Charbonneau, History of RSB Interview: Miguel Virasoro, transcript of an oral history conducted 2021 by Patrick Charbonneau and Francesco Zamponi, History of RSB Project, CAPHÈS, École normale supérieure, Paris.

[Che13] Wei-Kuo Chen, The Aizenman-Sims-Starr scheme and Parisi formula for mixed p-spin spherical models, Electronic Journal of Probability 18 (2013), 1–14.

[Che19] Wei-Kuo Chen, Phase transition in the spiked random tensor with Rademacher prior, The Annals of Statistics 47 (2019), no. 5, 2734–2756.

[CHS93] Andrea Crisanti, Heinz Horner, and H-J Sommers, The spherical p-spin interaction spin-glass model, Zeitschrift für Physik B Condensed Matter 92 (1993), no. 2, 257–271.


[CL21] Wei-Kuo Chen and Wai-Kit Lam, Universality of approximate message passing algorithms, Electronic Journal of Probability 26 (2021), 1–44.

[CMW20] Michael Celentano, Andrea Montanari, and Yuchen Wu, The estimation error of general first order methods, Conference on Learning Theory, PMLR, 2020, pp. 1078–1141.

[CR02] Andrea Crisanti and Tommaso Rizzo, Analysis of the ∞-replica symmetry breaking solution of the Sherrington-Kirkpatrick model, Physical Review E 65 (2002), no. 4, 046137.

[CS92] Andrea Crisanti and H-J Sommers, The spherical p-spin interaction spin glass model: the statics, Zeitschrift für Physik B Condensed Matter 87 (1992), no. 3, 341–354.

[CS95] Andrea Crisanti and H-J Sommers, Thouless-Anderson-Palmer approach to the spherical p-spin spin glass model, Journal de Physique I 5 (1995), no. 7, 805–813.

[CT91] Thomas M. Cover and Joy A. Thomas, Elements of information theory, 1991.

[DAL83] JRL De Almeida and EJS Lage, Internal field distribution in the infinite-range Ising spin glass, Journal of Physics C: Solid State Physics 16 (1983), no. 5, 939.

[DAM16] Yash Deshpande, Emmanuel Abbe, and Andrea Montanari, Asymptotic mutual information for the balanced binary stochastic block model, Information and Inference: A Journal of the IMA 6 (2016), no. 2, 125–170.

[DAM17] Yash Deshpande, Emmanuel Abbe, and Andrea Montanari, Asymptotic mutual information for the balanced binary stochastic block model, Information and Inference: A Journal of the IMA 6 (2017), no. 2, 125–170.

[DM16] David Donoho and Andrea Montanari, High dimensional robust M-estimation: Asymptotic variance via approximate message passing, Probability Theory and Related Fields 166 (2016), no. 3-4, 935–969.

[DMM09] David L. Donoho, Arian Maleki, and Andrea Montanari, Message Passing Algorithms for Compressed Sensing, Proceedings of the National Academy of Sciences 106 (2009), 18914–18919.

[DMS17] Amir Dembo, Andrea Montanari, and Subhabrata Sen, Extremal cuts of sparse random graphs, The Annals of Probability 45 (2017), no. 2, 1190–1217.

[DVJ98] DJ Daley and D Vere-Jones, Introduction to the general theory of point processes, An Introduction to the Theory of Point Processes, Springer, 1998, pp. 197–233.

[EA75] Samuel Frederick Edwards and Phil W Anderson, Theory of spin glasses, Journal of Physics F: Metal Physics 5 (1975), no. 5, 965.


[EAK18] Ahmed El Alaoui and Florent Krzakala, Estimation in the spiked Wigner model: A short proof of the replica formula, 2018 IEEE International Symposium on Information Theory (ISIT), IEEE, 2018, pp. 1874–1878.

[EAKJ20] Ahmed El Alaoui, Florent Krzakala, and Michael Jordan, Fundamental limits of detection in the spiked Wigner model, The Annals of Statistics 48 (2020), no. 2, 863–885.

[Fan20] Zhou Fan, Approximate message passing algorithms for rotationally invariant matrices, arXiv preprint arXiv:2008.11892 (2020).

[Fek23] Michael Fekete, Über die Verteilung der Wurzeln bei gewissen algebraischen Gleichungen mit ganzzahligen Koeffizienten, Mathematische Zeitschrift 17 (1923), no. 1, 228–249.

[FVRS21] Oliver Y Feng, Ramji Venkataramanan, Cynthia Rush, and Richard J Samworth, A unifying tutorial on approximate message passing, arXiv preprint arXiv:2105.02180 (2021).

[Gal62] Robert Gallager, Low-density parity-check codes, IRE Transactions on Information Theory 8 (1962), no. 1, 21–28.

[GS00] Peter Gillin and David Sherrington, p > 2 spin glasses with first-order ferromagnetic transitions, Journal of Physics A: Mathematical and General 33 (2000), no. 16, 3081.

[GT02] Francesco Guerra and Fabio Lucio Toninelli, The thermodynamic limit in mean field spin glass models, Communications in Mathematical Physics 230 (2002), no. 1, 71–79.

[Gue03] Francesco Guerra, Broken replica symmetry bounds in the mean field spin glass model, Communications in Mathematical Physics 233 (2003), no. 1, 1–12.

[HKZ12] Daniel Hsu, Sham M Kakade, and Tong Zhang, A spectral algorithm for learning hidden Markov models, Journal of Computer and System Sciences 78 (2012), no. 5, 1460–1480.

[HR03] DC Hoyle and M Rattray, PCA learning for sparse high-dimensional data, EPL (Europhysics Letters) 62 (2003), no. 1, 117.

[HR04] David C Hoyle and Magnus Rattray, Principal-component-analysis eigenvalue spectra from data with symmetry-breaking structure, Physical Review E 69 (2004), no. 2, 026124.

[JM13] Adel Javanmard and Andrea Montanari, State evolution for general approximate message passing algorithms, with applications to spatial coupling, Information and Inference: A Journal of the IMA 2 (2013), no. 2, 115–144.

[JMRT16] Adel Javanmard, Andrea Montanari, and Federico Ricci-Tersenghi, Phase transitions in semidefinite relaxations, Proceedings of the National Academy of Sciences 113 (2016), no. 16, E2218–E2223.


[JS20] Aukosh Jagannath and Subhabrata Sen, On the unbalanced cut problem and the generalized Sherrington–Kirkpatrick model, Annales de l'Institut Henri Poincaré D 8 (2020), no. 1, 35–88.

[JT16] Aukosh Jagannath and Ian Tobasco, A dynamic programming approach to the Parisi functional, Proceedings of the American Mathematical Society 144 (2016), no. 7, 3135–3150.

[KM09] Satish Babu Korada and Nicolas Macris, Exact solution of the gauge symmetric p-spin glass model on a complete graph, Journal of Statistical Physics 136 (2009), no. 2, 205–230.

[KSS13] Nadia Kreimer, Aaron Stanton, and Mauricio D Sacchi, Tensor completion based on nuclear norm minimization for 5D seismic data reconstruction, Geophysics 78 (2013), no. 6, V273–V284.

[KTJ76] John M Kosterlitz, David J Thouless, and Raymund C Jones, Spherical model of a spin-glass, Physical Review Letters 36 (1976), no. 20, 1217.

[LC06] Erich L Lehmann and George Casella, Theory of point estimation, Springer Science & Business Media, 2006.

[Lig78] Thomas M Liggett, Random invariant measures for Markov chains, and independent particle systems, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 45 (1978), no. 4, 297–313.

[LL10] Nan Li and Baoxin Li, Tensor completion for on-board compression of hyperspectral images, 2010 IEEE International Conference on Image Processing, IEEE, 2010, pp. 517–520.

[LM19] Marc Lelarge and Léo Miolane, Fundamental limits of symmetric low-rank matrix estimation, Probability Theory and Related Fields 173 (2019), no. 3-4, 859–929.

[LML+17] Thibault Lesieur, Léo Miolane, Marc Lelarge, Florent Krzakala, and Lenka Zdeborová, Statistical and computational phase transitions in spiked tensor estimation, 2017 IEEE International Symposium on Information Theory (ISIT), IEEE, 2017, pp. 511–515.

[LMWY13] Ji Liu, Przemyslaw Musialski, Peter Wonka, and Jieping Ye, Tensor completion for estimating missing values in visual data, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2013), no. 1, 208–220.

[MM09] Marc Mézard and Andrea Montanari, Information, Physics and Computation, Oxford, 2009.

[Mør11] Morten Mørup, Applications of tensor (multiway array) factorizations and decompositions in data mining, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1 (2011), no. 1, 24–40.

[MPV86] M Mézard, G Parisi, and MA Virasoro, SK model: The replica solution without replicas, EPL (Europhysics Letters) 1 (1986), no. 2, 77.


[MPV87] Marc Mézard, Giorgio Parisi, and Miguel A. Virasoro, Spin glass theory and beyond, World Scientific, 1987.

[MR14] Andrea Montanari and Emile Richard, A statistical model for tensor PCA, Advances in Neural Information Processing Systems 27 (2014).

[MRZ16] Andrea Montanari, Daniel Reichman, and Ofer Zeitouni, On the limitation of spectral methods: From the Gaussian hidden clique problem to rank one perturbations of Gaussian tensors, IEEE Transactions on Information Theory 63 (2016), no. 3, 1572–1579.

[MV21] Andrea Montanari and Ramji Venkataramanan, Estimation of low-rank matrices via approximate message passing, The Annals of Statistics 49 (2021), no. 1, 321–345.

[OVBS17] Onur Özyeşil, Vladislav Voroninski, Ronen Basri, and Amit Singer, A survey of structure from motion, Acta Numerica 26 (2017), 305–364.

[Pan13] Dmitry Panchenko, The Sherrington-Kirkpatrick model, Springer Science & Business Media, 2013.

[Pan14] Dmitry Panchenko, The Parisi formula for mixed p-spin models, The Annals of Probability 42 (2014), no. 3, 946–958.

[Pan15] Dmitry Panchenko, The free energy in a multi-species Sherrington–Kirkpatrick model, The Annals of Probability 43 (2015), no. 6, 3494–3513.

[Pan18a] Dmitry Panchenko, Free energy in the mixed p-spin models with vector spins, The Annals of Probability 46 (2018), no. 2, 865–896.

[Pan18b] Dmitry Panchenko, Free energy in the Potts spin glass, The Annals of Probability 46 (2018), no. 2, 829–864.

[Pan18c] Dmitry Panchenko, On the K-sat model with large number of clauses, Random Structures & Algorithms 52 (2018), no. 3, 536–542.

[Par79a] Giorgio Parisi, Infinite number of order parameters for spin-glasses, Physical Review Letters 43 (1979), no. 23, 1754.

[Par79b] Giorgio Parisi, Toward a mean field theory for spin glasses, Physics Letters A 73 (1979), no. 3, 203–205.

[PWB20] Amelia Perry, Alexander S Wein, and Afonso S Bandeira, Statistical limits of spiked tensor models, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, vol. 56, Institut Henri Poincaré, 2020, pp. 230–264.

[PY91] Christos H Papadimitriou and Mihalis Yannakakis, Optimization, approximation, and complexity classes, Journal of Computer and System Sciences 43 (1991), no. 3, 425–440.

[RA05] Anastasia Ruzmaikina and Michael Aizenman, Characterization of invariant measures at the leading edge for competing particle systems, The Annals of Probability 33 (2005), no. 1, 82–113.


[Ran11] S. Rangan, Generalized Approximate Message Passing for Estimation with Random Linear Mixing, IEEE Intl. Symp. on Inform. Theory (St. Petersburg), August 2011.

[RSF19] Sundeep Rangan, Philip Schniter, and Alyson K Fletcher, Vector approximate message passing, IEEE Transactions on Information Theory 65 (2019), no. 10, 6664–6684.

[RU08] Thomas J. Richardson and Rüdiger Urbanke, Modern Coding Theory, Cambridge University Press, Cambridge, 2008.

[SC19] Pragya Sur and Emmanuel J Candès, A modern maximum-likelihood theory for high-dimensional logistic regression, Proceedings of the National Academy of Sciences 116 (2019), no. 29, 14516–14525.

[Sch08] Manuel J Schmidt, Replica symmetry breaking at low temperatures, Ph.D. Thesis, 2008.

[Sen18] Subhabrata Sen, Optimization on sparse random hypergraphs and spin glasses, Random Structures & Algorithms 53 (2018), no. 3, 504–536.

[SK75] David Sherrington and Scott Kirkpatrick, Solvable model of a spin-glass, Physical Review Letters 35 (1975), no. 26, 1792.

[Sub17] Eliran Subag, The complexity of spherical p-spin models – a second moment approach, The Annals of Probability 45 (2017), no. 5, 3385–3450.

[SVdPDMS11] Marco Signoretto, Raf Van de Plas, Bart De Moor, and Johan AK Suykens, Tensor versus matrix completion: a comparison with application to spectral data, IEEE Signal Processing Letters 18 (2011), no. 7, 403–406.

[Tal06a] Michel Talagrand, Free energy of the spherical mean field model, Probability Theory and Related Fields 134 (2006), no. 3, 339–382.

[Tal06b] Michel Talagrand, The Parisi formula, Annals of Mathematics (2006), 221–263.

[Tal10] Michel Talagrand, Mean field models for spin glasses: Volume I: Basic examples, vol. 54, Springer Science & Business Media, 2010.

[Tal11] Michel Talagrand, Mean field models for spin glasses: Volume II: Advanced replica-symmetry and low temperature, vol. 55, Springer Science & Business Media, 2011.

[Ver12] R. Vershynin, Introduction to the non-asymptotic analysis of random matrices, Compressed Sensing: Theory and Applications (Y.C. Eldar and G. Kutyniok, eds.), Cambridge University Press, 2012, pp. 210–268.

[WJ08] Martin J Wainwright and Michael I Jordan, Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning 1 (2008), no. 1-2, 1–305.

[ZB10] Lenka Zdeborová and Stefan Boettcher, A conjecture on the maximum cut and bisection width in random regular graphs, Journal of Statistical Mechanics: Theory and Experiment 2010 (2010), no. 02, P02020.
