UNIFORM TEST OF ALGORITHMIC RANDOMNESS OVER A GENERAL SPACE

PÉTER GÁCS

ABSTRACT. The algorithmic theory of randomness is well developed when the underlying space is the set of finite or infinite sequences and the underlying probability distribution is the uniform distribution or a computable distribution. These restrictions seem artificial. Some progress has been made to extend the theory to arbitrary Bernoulli distributions (by Martin-Löf), and to arbitrary distributions (by Levin). We recall the main ideas and problems of Levin's theory, and report further progress in the same framework. The issues are the following:
– Allow non-compact spaces (like the space of continuous functions, underlying the Brownian motion).
– The uniform test (deficiency of randomness) dP(x) (depending both on the outcome x and the measure P) should be defined in a general and natural way.
– See which of the old results survive: existence of universal tests, conservation of randomness, expression of tests in terms of description complexity, existence of a universal measure, expression of mutual information as "deficiency of independence".
– The negative of the new randomness test is shown to be a generalization of complexity in continuous spaces; we show that the addition theorem survives.

The paper's main contribution is introducing an appropriate framework for studying these questions and related ones (like statistics for a general family of distributions).

1. INTRODUCTION

1.1. Problem statement. The algorithmic theory of randomness is well developed when the underlying space is the set of finite or infinite sequences and the underlying probability distribution is the uniform distribution or a computable distribution. These restrictions seem artificial. Some progress has been made to extend the theory to arbitrary Bernoulli distributions by Martin-Löf in [15], and to arbitrary distributions, by Levin in [11, 12, 13]. The paper [10] by Hertling and Weihrauch also works in general spaces, but it is restricted to computable measures. Similarly, Asarin's thesis [1] defines randomness for sample paths of the Brownian motion: a fixed random process with computable distribution.

The present paper has been inspired mainly by Levin's early paper [12] (and the much more elaborate [13] that uses different definitions): let us summarize part of the content of [12]. The notion of a constructive topological space X and the space of measures over X is introduced. Then the paper defines the notion of a uniform test. Each test is a lower semicomputable function (µ, x) ↦ fµ(x), satisfying

∫ fµ(x) µ(dx) ≤ 1 for each measure µ.

There are also some additional conditions. The main claims are the following.

(a) There is a universal test tµ(x), a test such that for each other test f there is a constant c > 0 with fµ(x) ≤ c · tµ(x). The deficiency of randomness is defined as dµ(x) = log tµ(x).

Date: September 20, 2006.
1991 Mathematics Subject Classification. 60A99; 68Q30.
Key words and phrases. algorithmic information theory, algorithmic entropy, randomness test, Kolmogorov complexity, description complexity.


(b) The universal test has some strong properties of "randomness conservation": these say, essentially, that a computable mapping or a computable randomized transition does not decrease randomness.

(c) There is a measure M with the property that for every outcome x we have tM(x) ≤ 1. In the present paper, we will call such measures neutral.

(d) Semimeasures (semi-additive measures) are introduced and it is shown that there is a lower semicomputable semimeasure that is neutral (so we can assume that the M introduced above is lower semicomputable).

(e) Mutual information I(x : y) is defined with the help of (an appropriate version of) Kolmogorov complexity, between outcomes x and y. It is shown that I(x : y) is essentially equal to dM×M(x, y). This interprets mutual information as a kind of "deficiency of independence".

This impressive theory leaves a number of issues unresolved:

1. The space of outcomes is restricted to be a compact topological space, moreover, a particular compact space: the set of sequences over a finite alphabet (or, implicitly in [13], a compactified infinite alphabet). However, a good deal of modern probability theory happens over spaces that are not even locally compact: for example, in case of the Brownian motion, over the space of continuous functions.

2. The definition of a uniform randomness test includes some conditions (different ones in [12] and in [13]) that seem somewhat arbitrary.

3. No simple expression is known for the general universal test in terms of description complexity. Such expressions are nice to have if they are available.

1.2. Content of the paper. The present paper intends to carry out as much of Levin's program as seems possible after removing the restrictions. It leaves a number of questions open, but we feel that they are at least worth formulating. A fairly large part of the paper is devoted to the necessary conceptual machinery. Eventually, this will also allow carrying further some other initiatives started in the works [15] and [11]: the study of tests that test nonrandomness with respect to a whole class of measures (like the Bernoulli measures).

Constructive analysis has been developed by several authors, converging approximately on the same concepts. We will make use of a simplified version of the theory introduced in [24]. As we have not found a constructive measure theory in the literature fitting our purposes, we will develop this theory here, over (constructive) complete separable metric spaces. This generality is well supported by standard results in measure theoretical probability, and is sufficient for constructivizing a large part of current probability theory.

The appendix recalls some of the needed topology, measure theory and constructive analysis. Constructive measure theory is introduced in Section 2.

Section 3 introduces uniform randomness tests. It proves the existence of universal uniform tests, under a reasonable assumption about the topology ("recognizable Boolean inclusions"). Then it proves conservation of randomness.

Section 4 explores the relation between description (Kolmogorov) complexity and uniform randomness tests. After extending randomness tests over non-normalized measures, their negative logarithm will be seen as a generalized description complexity.

The rest of the section explores the extent to which the old results characterizing a random infinite string by the description complexity of its segments can be extended to the new setting. We will see that the simple formula working for computable measures over infinite sequences does not generalize. However, still rather simple formulas are available


in some cases: namely, the discrete case with general measures, and a space allowing a certain natural cell decomposition, in case of computable measures.

Section 5 proves Levin's theorem about the existence of a neutral measure, for compact spaces. Then it shows that the result does not generalize to non-compact spaces, not even to the discrete space. It also shows that with our definition of tests, the neutral measure cannot be chosen semicomputable, even in the case of the discrete space with one-point compactification.

Section 6 takes up the idea of viewing the negative logarithm of a randomness test as generalized description complexity. Calling this notion algorithmic entropy, this section explores its information-theoretical properties. The main result is a (nontrivial) generalization of the addition theorem of prefix complexity (and, of course, classical entropy) to the new setting.

1.3. Some history. Attempts to define randomness rigorously have a long but rather sparse history starting with von Mises and continuing with Wald, Church, Ville. Kolmogorov's work in this area inspired Martin-Löf, whose paper [15] introduces the notion of randomness used here.

Description complexity has been introduced independently by Solomonoff, Kolmogorov and Chaitin. Prefix complexity has been introduced independently by Levin and Chaitin. See [14] for a discussion of priorities and contributions. The addition theorem (whose generalization is given here) has been proved first for Kolmogorov complexity, with a logarithmic error term, by Kolmogorov and Levin. For the prefix complexity its present form has been proved jointly by Levin and Gács in [6], and independently by Chaitin in [5].

In his PhD thesis, Martin-Löf also characterized randomness of finite sequences via their complexity. For infinite sequences, complete characterizations of their randomness via the complexity of their segments were given by Levin in [11], by Schnorr in [17] and in [5] (attributed). Of these, only Levin's result is formulated for general computable measures: the others apply only to coin-tossing. Each of these works uses a different variant of description complexity. Levin uses monotone complexity and the logarithm of the universal semicomputable measure (see [8] for the difficult proof that these two complexities are different). Schnorr uses "process complexity" (similar to monotone complexity) and prefix complexity. The work [7] by the present author gives characterizations using the original Kolmogorov complexity (for general computable measures).

Uniform tests over the space of infinite sequences, randomness conservation and neutral measures were introduced in Levin's work [12]. The present author could not verify every result in that paper (which contains no proofs); he reproduced most of them with a changed definition in [7]. A universal uniform test with yet another definition appeared in [13]. In this latter work, "information conservation" is a central tool used to derive several results in logic. In the constellation of Levin's concepts, information conservation becomes a special case of randomness conservation. We have not been able to reproduce this exact relation with our definition here.

The work [9] is based on the observation that Zurek's idea on "physical" entropy and the "cell volume" approach of physicists to the definition of entropy can be unified: Zurek's entropy can be seen as an approximation of the limit arising in a characterization of a randomness test by complexity. The author discovered in this same paper that the negative logarithm of a general randomness test can be seen as a generalization of complexity. He felt encouraged by the discovery of the generalized addition theorem presented here.

The appearance of other papers in the meantime (including [10]) convinced the author that there is no accessible and detailed reference work on algorithmic randomness


for general measures and general spaces, and a paper like the present one, developing the foundations, is needed. (Asarin's thesis [1] does develop the theory of randomness for the Brownian motion. It is a step in our direction in the sense that the space is not compact, but it is all done for a single explicitly given computable measure.)

We do not advocate the uniform randomness test proposed here as necessarily the "definitive" test concept. Perhaps a good argument can be found for some additional conditions, similar to the ones introduced by Levin, providing additional structure (like a semicomputable neutral measure) while preserving naturalness and the attractive properties presented here.

1.4. Notation for the paper. (Nothing to do with the formal concept of "notation", introduced later in the section on constructive analysis.) The sets of natural numbers, integers, rational numbers and real numbers will be denoted respectively by N, Z, Q, R. The set of nonnegative real numbers will be denoted by R+. The set of real numbers with −∞, ∞ added (with the appropriate topology making it compact) will be denoted by R̄. We use ∧ and ∨ to denote min and max; further

|x|+ = x ∨ 0, |x|− = | − x|+

for real numbers x. We partially follow [24], [3] and [10]. In particular, adopting the notation of [24], we denote intervals of the real line as follows (to avoid conflict with the notation of a pair (a, b) of objects):

[a; b] = { x : a ≤ x ≤ b },   (a; b) = { x : a < x < b },   [a; b) = { x : a ≤ x < b }.

If X is a set then X∗ is the set of all finite strings made up of elements of X, including the "empty string" Λ. We denote by Xω the set of all infinite sequences of elements of X. If A is a set then 1A(x) is its indicator function, defined to be 1 if x ∈ A and 0 otherwise. For a string x, its length is |x|, and

x≤n = (x(1), . . . , x(n)).

The relations

f <+ g,   f <∗ g

mean inequality to within an additive constant and a multiplicative constant respectively. The first is equivalent to f ≤ g + O(1), the second to f = O(g). The relation f =∗ g means f <∗ g and f >∗ g; similarly, f =+ g means f <+ g and f >+ g.

Borrowing from [16], for a function f and a measure µ, we will use the notation

µf = ∫ f(x) µ(dx),   µ^y f(x, y) = ∫ f(x, y) µ(dy).

2. CONSTRUCTIVE MEASURE THEORY

The basic concepts and results of measure theory are recalled in Section B. For the theory of measures over metric spaces, see Subsection B.6. We introduce a certain fixed, enumerated sequence of Lipschitz functions that will be used frequently. Let F0 be the set of functions of the form g_{u,r,1/n} where u ∈ D, r ∈ Q, n = 1, 2, . . . , and

g_{u,r,ε}(x) = |1 − |d(x, u) − r|+/ε|+

is a continuous function that is 1 in the ball B(u, r), 0 outside B(u, r + ε), and takes intermediate values in between. Let

E = {g1, g2, . . .}   (2.1)

be the smallest set of functions containing F0 and the constant 1, and closed under ∨, ∧ and rational linear combinations. The following construction will prove useful later.

Proposition 2.1. All bounded continuous functions can be obtained as the limit of an increasing sequence of functions from the enumerated countable set E of bounded computable Lipschitz functions introduced in (2.1).

The proof is routine.
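To make the construction concrete, here is a minimal Python sketch (ours, not from the paper) of the basic hat function g_{u,r,ε}, with the space taken to be the real line with its usual metric; these choices are assumptions made for the example.

```python
def pos(t):
    """|t|^+ = max(t, 0)."""
    return max(t, 0.0)

def g(u, r, eps, x):
    """g_{u,r,eps}(x) = |1 - |d(x,u) - r|^+ / eps|^+ :
    equals 1 on the ball B(u, r), 0 outside B(u, r + eps),
    and interpolates linearly in between (Lipschitz constant 1/eps)."""
    d = abs(x - u)   # the metric of the space; here the real line
    return pos(1.0 - pos(d - r) / eps)

# A hat around u = 0 with radius 1 and slope width 1/2:
assert g(0, 1.0, 0.5, 0.5) == 1.0    # inside B(0, 1)
assert g(0, 1.0, 0.5, 1.25) == 0.5   # halfway down the slope
assert g(0, 1.0, 0.5, 2.0) == 0.0    # outside B(0, 1.5)
```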

2.1. Space of measures. Let X = (X, d, D, α) be a computable metric space. In Subsection B.6, the space M(X) of measures over X is defined, along with a natural enumeration ν = νM for a subbase σ = σM of the weak topology. This is a constructive topological space M which can be metrized by introducing, as in B.6.2, the Prokhorov distance p(µ, ν): the infimum of all those ε for which, for all Borel sets A, we have µ(A) ≤ ν(A^ε) + ε, where A^ε = { x : ∃y ∈ A d(x, y) < ε }. Let DM be the set of those probability measures that are concentrated on finitely many points of D and assign rational values to them. Let αM be a natural enumeration of DM. Then

(M, p, DM, αM)   (2.2)

is a computable metric space whose constructive topology is equivalent to M. Let U = B(x, r) be one of the balls in X, where x ∈ DX, r ∈ Q. The function µ ↦ µ(U) is typically not computable, not even continuous. For example, if X = R and U is the open interval (0; 1), the sequence of probability measures δ1/n (concentrated on 1/n) converges to δ0, but δ1/n(U) = 1 and δ0(U) = 0. The following theorem shows that the situation is better with µ ↦ µf for computable f:

Proposition 2.2. Let X = (X, d, D, α) be a computable metric space, and let M = (M(X), σ, ν) be the effective topological space of probability measures over X. If a function f : X → R is bounded and computable then µ ↦ µf is computable.

Proof sketch. To prove the theorem for bounded Lipschitz functions, we can invoke the Strassen coupling theorem B.16.

The function f can be obtained as a limit of a computable monotone increasing sequence of computable Lipschitz functions f^<_n, and also as a limit of a computable monotone decreasing sequence of computable Lipschitz functions f^>_n. In step n of our computation of µf, we can approximate µf^>_n from above to within 1/n, and µf^<_n from below to within 1/n. Let these bounds be a^>_n and a^<_n. To approximate µf to within ε, find a stage n with a^>_n − a^<_n + 2/n < ε.
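In Python-like form, the computation described in the proof looks as follows; this is our sketch, and the oracle parameters (mu_upper, mu_lower, f_upper, f_lower) are hypothetical stand-ins for the computations the proof assumes to be available.

```python
def approximate_integral(mu_upper, mu_lower, f_upper, f_lower, eps):
    """Return a value within eps of mu(f), assuming:
    f_lower(n) <= f <= f_upper(n) are Lipschitz, converging to f;
    mu_upper(g, n) is an upper bound on mu(g) with error < 1/n;
    mu_lower(g, n) is a lower bound on mu(g) with error < 1/n.
    Termination relies on these convergence assumptions."""
    n = 1
    while True:
        a_hi = mu_upper(f_upper(n), n)   # a_hi >= mu(f)
        a_lo = mu_lower(f_lower(n), n)   # a_lo <= mu(f)
        if a_hi - a_lo + 2.0 / n < eps:  # the stage sought in the proof
            return (a_hi + a_lo) / 2.0
        n *= 2
```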

2.2. Computable measures and random transitions. A measure µ is called computable if it is a computable element of the space of measures. Let {gi} be the set of bounded Lipschitz functions over X introduced in (2.1).

Proposition 2.3. Measure µ is computable if and only if so is the function i ↦ µgi.

Proof. The "only if" part follows from Proposition 2.2. For the "if" part, note that in order to trap µ within some Prokhorov neighborhood of size ε, it is sufficient to compute µgi within a small enough δ, for all i ≤ n for a large enough n.

Example 2.4. Let our probability space be the set R of real numbers with its standard topology. Let a < b be two computable real numbers. Let µ be the probability distribution with density function f(x) = (1/(b − a)) 1_{[a;b]}(x) (the uniform distribution over the interval [a; b]). The function f(x) is not computable, since it is not even continuous. However, the measure µ is computable: indeed, µgi = (1/(b − a)) ∫_a^b gi(x) dx is a computable sequence, hence Proposition 2.3 implies that µ is computable. ♦
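Numerically, the computability of µgi in this example can be illustrated as follows (our sketch, not the paper's construction): for a Lipschitz function g, a midpoint Riemann sum with mesh chosen from the Lipschitz constant approximates µg with a guaranteed error bound.

```python
def mu_uniform(g, a, b, lip, eps):
    """Approximate (1/(b-a)) * integral_a^b g(x) dx to within eps,
    for a function g with Lipschitz constant lip."""
    # On each cell of mesh h, the midpoint value is within lip*h/2 of g,
    # so the normalized sum is off by at most lip*h/2 < eps.
    n = int(lip * (b - a) / eps) + 1
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h / (b - a)

# With the hat function g(u=0, r=1, eps=1/2) of the earlier sketch
# (Lipschitz constant 2), this approximates mu(g) for the uniform
# distribution on [-2, 2] to within 0.001:
# mu_uniform(lambda x: g(0, 1.0, 0.5, x), -2.0, 2.0, 2.0, 1e-3)
```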

The following theorem compensates somewhat for the fact mentioned earlier, that the function µ ↦ µ(U) is generally not computable.

Proposition 2.5. Let µ be a finite computable measure. Then there is a computable map h with the property that for every bounded computable function f with |f| ≤ 1 and with µ(f^{−1}(0)) = 0, if w is the name of f then h(w) is the name of a program computing the value µ{ x : f(x) < 0 }.

Proof. Straightforward.

Remark 2.6. Suppose that there is a computable function that for each i computes a Cauchy sequence j ↦ mi(j) with the property that for i < j1 < j2 we have |mi(j1) − mi(j2)| < 2^{−j1}, and that for all n, there is a measure ν with the property that for all i ≤ n, νgi = mi(n). Is there a measure µ with the property that for each i we have limj mi(j) = µgi? Not necessarily, if the space is not compact. For example, let X = {1, 2, 3, . . .} with the discrete topology. The sequences mi(j) = 0 for j > i satisfy these conditions, but they converge to the measure 0, not to a probability measure. To guarantee that the sequences mi(j) indeed define a probability measure, progress must be made, for example, in terms of the narrowing of Prokhorov neighborhoods. ♦

Let now X, Y be computable metric spaces. They give rise to measurable spaces with σ-algebras A, B respectively. Let Λ = { λx : x ∈ X } be a probability kernel from X to Y (as defined in Subsection B.5). Let {gi} be the set of bounded Lipschitz functions over Y introduced in (2.1). To each gi, the kernel assigns a (bounded) measurable function

fi(x) = (Λgi)(x) = λ^y_x gi(y).

We will call Λ computable if so is the assignment (i, x) ↦ fi(x). In this case, of course, each function fi(x) is continuous. The measure Λ∗µ is determined by the values (Λ∗µ)gi = µ(Λgi), which are computable from (i, µ), and so the function µ ↦ Λ∗µ is computable.

Example 2.7. A computable function h : X → Y defines an operator Λh with Λh g = g ∘ h (as in Example B.12). This is a deterministic computable transition, in which fi(x) = (Λh gi)(x) = gi(h(x)) is, of course, computable from (i, x). We define h∗µ = (Λh)∗µ. ♦
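For measures with finite support (such as the elements of DM), this transition is easy to exhibit in code; the following Python sketch is ours, with illustrative names.

```python
from collections import defaultdict

def pushforward(mu, h):
    """h_* mu for a finitely supported mu (dict: point -> weight):
    each atom of mu at x is transported to h(x)."""
    nu = defaultdict(float)
    for x, w in mu.items():
        nu[h(x)] += w
    return dict(nu)

# Defining property (h_* mu) g = mu(g o h):
mu = {0.0: 0.5, 1.0: 0.25, 2.0: 0.25}
nu = pushforward(mu, lambda x: x * x)
g = lambda y: min(y, 1.0)
lhs = sum(w * g(y) for y, w in nu.items())
rhs = sum(w * g(x * x) for x, w in mu.items())
assert abs(lhs - rhs) < 1e-12
```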

2.3. Cells. As pointed out earlier, it is not convenient to define a measure µ constructively starting from µ(Γ) for open cells Γ. The reason is that no matter how we fix Γ, the function µ ↦ µ(Γ) is typically not computable. It is better to work with bounded computable functions, since for such a function f, the correspondence µ ↦ µf is computable.

Under some special conditions, we will still get "sharp" cells. Let f be a bounded computable function over X, let α1 < · · · < αk be rational numbers, and let µ be a computable measure with the property that µf^{−1}(αj) = 0 for all j. In this case, we will say that the αj are regular points of f with respect to µ. Let α0 = −∞, αk+1 = ∞, and for j = 0, . . . , k, let Uj = f^{−1}((αj; αj+1)). The sequence of disjoint r.e. open sets (U0, . . . , Uk) will be called the partition generated by f, α1, . . . , αk. (Note that this sequence is not a partition in the sense of ⋃j Uj = X, since the boundaries of the sets are left out.) If we have several partitions (Ui0, . . . , Ui,ki), generated by different functions fi (i = 1, . . . , m) and different regular cutoff sequences (αij : j = 1, . . . , ki), then we can form a new partition generated by all possible intersections

Vj1,...,jm = U1,j1 ∩ · · · ∩ Um,jm.

A partition of this kind will be called a regular partition. The sets Vj1,...,jm will be called the cells of this partition.

Proposition 2.8. In a regular partition as given above, the values µVj1,...,jm are computable from the names of the functions fi and the cutoff points αij.

Proof. Straightforward.

Assume that a computable sequence of functions b1(x), b2(x), . . . over X is given with the property that for every pair x1, x2 ∈ X with x1 ≠ x2, there is a j with bj(x1) · bj(x2) < 0. Such a sequence will be called a separating sequence. Let us give the correspondence between the set Bω of infinite binary sequences and elements of the set

X0 = { x ∈ X : bj(x) ≠ 0, j = 1, 2, . . . }.

For a binary string s1 · · · sn = s ∈ B∗, let

Γs

be the set of elements of X with the property that for j = 1, . . . , n, if sj = 0 then bj(x) < 0, otherwise bj(x) > 0. This correspondence has the following properties.
(a) ΓΛ = X.
(b) For each s ∈ B∗, the sets Γs0 and Γs1 are disjoint and their union is contained in Γs.
(c) For x ∈ X0, we have {x} = ⋂_{x∈Γs} Γs.

If s has length n then Γs will be called a canonical n-cell, or simply canonical cell, or n-cell. From now on, whenever Γ denotes a subset of X, it means a canonical cell. We will also use the notation

l(Γs) = l(s).

The three properties above say that if we restrict ourselves to the set X0 then the canonical cells behave somewhat like binary subintervals: they divide X0 in half, then each half again in half, etc. Moreover, around each point, these canonical cells become "arbitrarily small", in some sense (though they may not be a basis of neighborhoods). It is easy to see that if Γs1, Γs2 are two canonical cells then they are either disjoint or one of them contains the other. If Γs1 ⊂ Γs2 then s2 is a prefix of s1. If, for a moment, we write Γ^0_s = Γs ∩ X0 then we have the disjoint union Γ^0_s = Γ^0_{s0} ∪ Γ^0_{s1}. For an n-element binary string s, for x ∈ Γs, we will write

µ(s) = µ(Γs).

Thus, for elements of X0, we can talk about the n-th bit xn of the description of x: it is uniquely determined. The 2^n cells (some of them possibly empty) of the form Γs for l(s) = n form a partition

Pn

of X0.

Examples 2.9.
1. If X is the set of infinite binary sequences with its usual topology, the functions bn(x) = xn − 1/2 generate the usual cells, and X0 = X.
2. If X is the interval [0; 1], let bn(x) = − sin(2^n πx). Then the cells are open intervals of the form (k · 2^{−n}; (k + 1) · 2^{−n}), and the correspondence between infinite binary strings and elements of X0 is just the usual representation of x as the binary string 0.x1x2 . . . .
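A short Python sketch (ours) of Example 2: recovering the first n bits of the description of a point x ∈ X0 from the signs of the separating functions.

```python
import math

def cell_bits(x, n):
    """First n bits of the canonical-cell description of x in (0, 1):
    bit j is 0 if b_j(x) < 0 and 1 if b_j(x) > 0, where
    b_j(x) = -sin(2^j * pi * x). Dyadic rationals are exactly the
    boundary points, where some b_j vanishes and x is not in X_0."""
    bits = []
    for j in range(1, n + 1):
        b = -math.sin((2 ** j) * math.pi * x)
        if b == 0.0:
            raise ValueError("x lies on a cell boundary, not in X_0")
        bits.append(0 if b < 0 else 1)
    return bits

# 1/3 = 0.010101..._2 :
assert cell_bits(1 / 3, 6) == [0, 1, 0, 1, 0, 1]
```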


When we fix canonical cells, we will generally assume that the partition chosen is also "natural". The bits x1, x2, . . . could contain information about the point x in decreasing order of importance from a macroscopic point of view. For example, for a container of gas, the first few bits may describe, to a reasonable degree of precision, the amount of gas in the left half of the container, the next few bits may describe the amounts in each quarter, the next few bits may describe the temperature in each half, the next few bits may describe again the amount of gas in each half, but now to more precision, etc.

The following observation will prove useful.

Proposition 2.10. Suppose that the space X is compact and we have a separating sequence bi(x) as given above. Then the cells Γs form a basis of the space X.

Proof. We need to prove that for every ball B(x, r), there is a cell x ∈ Γs ⊂ B(x, r). Let C be the complement of B(x, r). For each point y of C, there is an i such that bi(x) · bi(y) < 0. In this case, let J0 = { z : bi(z) < 0 }, J1 = { z : bi(z) > 0 }. Let J(y) = Jp be such that y ∈ Jp. Then C ⊂ ⋃_y J(y), and compactness implies that there is a finite sequence y1, . . . , yk with C ⊂ ⋃_{j=1}^k J(yj). Clearly, there is a cell x ∈ Γs ⊂ B(x, r) ∖ ⋃_{j=1}^k J(yj).

3. UNIFORM TESTS

3.1. Universal uniform test. Let X = (X, d, D, α) be a computable metric space, and let M = (M(X), σ, ν) be the constructive topological space of probability measures over X. A randomness test is a function f : M × X → R̄ with the following two properties.

Condition 3.1.
1. The function (µ, x) ↦ fµ(x) is lower semicomputable. (Then for each µ, the integral µfµ = µ^x fµ(x) exists.)
2. µfµ ≤ 1.

The value fµ(x) is intended to quantify the nonrandomness of the outcome x with respect to the probability measure µ. The larger the values the less random is x. Condition 3.1.2 guarantees that the probability of those outcomes whose randomness is ≥ m is at most 1/m. The definition of tests is in the spirit of Martin-Löf's tests. The important difference is in the semicomputability condition: instead of restricting the measure µ to be computable, we require the test to be lower semicomputable also in its argument µ.
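The 1/m bound mentioned above is just Markov's inequality applied to Condition 3.1.2:

```latex
\mu\{\, x : f_\mu(x) \ge m \,\}
  \;\le\; \frac{1}{m} \int f_\mu(x)\,\mu(dx)
  \;=\; \frac{1}{m}\,\mu f_\mu \;\le\; \frac{1}{m}.
```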

Just as with Martin-Löf's tests, we want to find a universal test; however, we seem to need a condition on the space X. Let us say that a sequence i ↦ Ui of sets has recognizable Boolean inclusions if the set

{ (S, T) : S, T are finite, ⋂_{i∈S} Ui ⊂ ⋃_{j∈T} Uj }

is recursively enumerable. We will say that a computable metric space has recognizable Boolean inclusions if this is true of the enumerated basis consisting of balls of the form B(x, r) where x ∈ D and r > 0 is a rational number.
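To make the definition concrete, here is a toy Python decision procedure (ours; the paper does not treat this special case) for Boolean inclusions among open intervals with rational endpoints on the real line.

```python
from fractions import Fraction

def inclusion_holds(S, T):
    """Decide whether the intersection of the open intervals in S is
    contained in the union of the open intervals in T.
    S, T: lists of (lo, hi) pairs of Fractions."""
    a = max(lo for lo, hi in S)   # the intersection of S is the
    b = min(hi for lo, hi in S)   # single open interval (a, b)
    if a >= b:
        return True               # empty intersection
    cur, first = a, True
    while cur < b:
        # At the start an interval may begin exactly at a (the point a is
        # excluded); later the point cur itself must be covered (lo < cur).
        ok = [hi for lo, hi in T
              if (lo <= cur if first else lo < cur) and hi > cur]
        if not ok:
            return False
        cur, first = max(ok), False
    return True

F = Fraction
assert inclusion_holds([(F(0), F(1))], [(F(0), F(1, 2)), (F(2, 5), F(1))])
assert not inclusion_holds([(F(0), F(1))], [(F(0), F(1, 2)), (F(1, 2), F(1))])
```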

It is our conviction that the important metric spaces studied in probability theory have recognizable Boolean inclusions, and that proving this in each individual case should not be too difficult. For example, it does not seem difficult to prove this for the space C[0; 1] of Example C.6, with the set of rational piecewise-linear functions chosen as D. But we have not carried out any of this work!

Theorem 1. Suppose that the metric space X has recognizable Boolean inclusions. Then there is a universal test, that is, a test tµ(x) with the property that for every other test fµ(x) there is a constant cf > 0 with cf fµ(x) ≤ tµ(x).

Proof.
1. We will show that there is a mapping that to each name u of a lower semicomputable function (µ, x) ↦ g(µ, x) assigns the name of a lower semicomputable function g′(µ, x) such that µ^x g′(µ, x) ≤ 1, and if g is a test then g′ = g.

To prove the statement, let us represent the space M rather as

M = (M(X), p, DM, αM),   (3.1)

as in (2.2). Since g(µ, x) is lower semicomputable, there is a computable sequence of basis elements Ui ⊂ M and Vi ⊂ X and rational bounds ri such that

g(µ, x) = sup_i ri 1_{Ui}(µ) 1_{Vi}(x).

Let hn(µ, x) = max_{i≤n} ri 1_{Ui}(µ) 1_{Vi}(x). Let us also set h0(µ, x) = 0. Our goal is to show that the condition ∀µ µ^x hn(µ, x) ≤ 1 is decidable. If this is the case then we will be done. Indeed, we can define h′n(µ, x) recursively as follows. Let h′0(µ, x) = 0. Assume that h′n(µ, x) has been defined already. If ∀µ µ^x h_{n+1}(µ, x) ≤ 1 then h′_{n+1}(µ, x) = h_{n+1}(µ, x); otherwise, it is h′n(µ, x). The function g′(µ, x) = sup_n h′n(µ, x) clearly satisfies our requirements.

We proceed to prove the decidability of the condition

∀µ µ^x hn(µ, x) ≤ 1.   (3.2)

The basis elements Vi can be taken as balls B(qi, δi) for a computable sequence qi ∈ D and a computable sequence of rational numbers δi > 0. Similarly, the basis element Ui is the set of measures that is a ball B(σi, εi), in the metric space (3.1). Here, using notation (B.7), σi is a measure concentrated on a finite set Si. According to Proposition B.17, the ball Ui is the set of measures µ satisfying the inequalities

µ(A^{εi}) > σi(A) − εi

for all A ⊂ Si. For each n, consider the finite set of balls

Bn = { B(qi, δi) : i ≤ n } ∪ { B(s, εi) : i ≤ n, s ∈ Si }.

Consider all sets of the form

U_{A,B} = ⋂_{U∈A} U ∖ ⋃_{U∈B} U

for all pairs of sets A, B ⊂ Bn. These sets are all finite intersections of balls or complements of balls from the finite set Bn of balls. The space X has recognizable Boolean inclusions, so it is decidable which of these sets U_{A,B} are nonempty. The condition (3.2) can be formulated as a Boolean formula involving linear inequalities with rational coefficients, for the variables µ_{A,B} = µ(U_{A,B}), for those A, B with U_{A,B} ≠ ∅. The solvability of such a Boolean condition can always be decided.

2. Let us enumerate all lower semicomputable functions gu(µ, x) for all the names u. Without loss of generality, assume these names to be natural numbers, and form the functions g′u(µ, x) according to assertion 1 above. The function t = Σ_u 2^{−u−1} g′u will be the desired universal test.
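The weighting argument in step 2 is elementary and easy to check numerically; the sketch below (ours, with stand-in test functions over a finite probability space) verifies that a 2^{−u−1}-weighted sum of tests is again a test.

```python
def universal_mixture(tests):
    """Combine tests g_0, g_1, ... (each with mu-mean <= 1 for all mu)
    into t = sum_u 2^{-u-1} g_u, which again has mu-mean <= 1 and
    dominates each g_u up to the constant factor 2^{u+1}."""
    def t(mu, x):
        return sum(g(mu, x) / 2 ** (u + 1) for u, g in enumerate(tests))
    return t

# Two stand-in tests over a finite space (mu: dict x -> probability):
g0 = lambda mu, x: 1.0                                # mu-mean is 1
g1 = lambda mu, x: 1.0 / mu[x] if x == 0 else 0.0     # mu-mean is 1
t = universal_mixture([g0, g1])
mu = {0: 0.25, 1: 0.75}
assert sum(mu[x] * t(mu, x) for x in mu) <= 1.0
```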


From now on, when referring to randomness tests, we will always assume that our space X has recognizable Boolean inclusions and hence has a universal test. We fix a universal test tµ(x), and call the function

dµ(x) = log tµ(x)

the deficiency of randomness of x with respect to µ. We call an element x ∈ X random with respect to µ if dµ(x) < ∞.

Remark 3.2. Tests can be generalized to include an arbitrary parameter y: we can talk about the universal test

tµ(x | y),

where y comes from some constructive topological space Y. This is a maximal (within a multiplicative constant) lower semicomputable function (x, y, µ) ↦ f(x, y, µ) with the property µ^x f(x, y, µ) ≤ 1. ♦

3.2. Conservation of randomness. For i = 1, 0, let Xi = (Xi, di, Di, αi) be computable metric spaces, and let Mi = (M(Xi), σi, νi) be the effective topological space of probability measures over Xi. Let Λ be a computable probability kernel from X1 to X0 as defined in Subsection 2.2. In the following theorem, the same notation dµ(x) will refer to the deficiency of randomness with respect to two different spaces, X1 and X0, but this should not cause confusion. Let us first spell out the conservation theorem before interpreting it.

Theorem 2. For a computable probability kernel Λ from X1 to X0, we have

λ^y_x tΛ∗µ(y) <∗ tµ(x).   (3.3)

Proof. Let tν(x) be the universal test over X0. The left-hand side of (3.3) can be written as

uµ = ΛtΛ∗µ.

According to (B.4), we have µuµ = (Λ∗µ)tΛ∗µ, which is ≤ 1 since t is a test. If we show that (µ, x) ↦ uµ(x) is lower semicomputable then the universality of tµ will imply uµ <∗ tµ.

According to Proposition C.7, as a lower semicomputable function, tν(y) can be written as sup_n gn(ν, y), where (gn(ν, y)) is a computable sequence of computable functions. We pointed out in Subsection 2.2 that the function µ ↦ Λ∗µ is computable. Therefore the function (n, µ, x) ↦ λ^y_x gn(Λ∗µ, y) is also computable. So, uµ(x) is the supremum of a computable sequence of computable functions and as such, lower semicomputable.

It is easier to interpret the theorem first in the special case when Λ = Λh for a computable function h : X1 → X0, as in Example 2.7. Then the theorem simplifies to the following.

Corollary 3.3. For a computable function h : X1 → X0, we have dh∗µ(h(x)) <+ dµ(x).

Informally, this says that if x is random with respect to µ in X1 then h(x) is essentially at least as random with respect to the output distribution h∗µ in X0. Decrease in randomness can only be caused by complexity in the definition of the function h. It is even easier to interpret the theorem when µ is defined over a product space X1 × X2, and h(x1, x2) = x1 is the projection. The theorem then says, informally, that if the pair (x1, x2) is random with respect to µ then x1 is random with respect to the marginal µ1 = h∗µ of µ. This is a very natural requirement: why would the throwing-away of the information about x2 affect the plausibility of the hypothesis that the outcome x1 arose from the distribution µ1?


In the general case of the theorem, concerning random transitions, we cannot bound the randomness of each outcome uniformly. The theorem asserts that the average nonrandomness, as measured by the universal test with respect to the output distribution, does not increase. In logarithmic notation:

log λ^y_x 2^{dΛ∗µ(y)} <+ dµ(x),   or equivalently,   log ∫ 2^{dΛ∗µ(y)} λx(dy) <+ dµ(x).

Corollary 3.4. Let Λ be a computable probability kernel from X1 to X0. There is a constant c such that for every x ∈ X1 and integer m > 0, we have

λx{ y : dΛ∗µ(y) > dµ(x) + m + c } ≤ 2^{−m}.

Thus, in a computable random transition, the probability of an increase of randomness deficiency by m units (plus a constant c) is less than 2^{−m}. The constant c comes from the description complexity of the transition Λ.

A randomness conservation result related to Corollary 3.3 was proved in [10]. There, the measure over the space X0 is not the output measure of the transformation, but is assumed to obey certain inequalities related to the transformation.

4. TESTS AND COMPLEXITY

4.1. Description complexity.

4.1.1. Complexity, semimeasures, algorithmic entropy. Let X = Σ∗ for some finite alphabet Σ. For x ∈ Σ∗, let H(x) denote the prefix-free description complexity of the finite sequence x as defined, for example, in [14] (where it is denoted by K(x)). For completeness, we give its definition here. Let A : {0, 1}∗ × Σ∗ → Σ∗ be a computable (possibly partial) function with the property that if A(p1, y) and A(p2, y) are defined for two different strings p1, p2, then p1 is not a prefix of p2. Such a function is called a (prefix-free) interpreter. We denote

H_A(x | y) = min_{A(p,y)=x} |p|.

One of the most important theorems of description complexity is the following:

Proposition 4.1 (Invariance Theorem, see for example [14]). There is an optimal interpreter T with the above property: with it, for every interpreter A there is a constant cA with

H_T(x | y) ≤ H_A(x | y) + cA.

We fix an optimal interpreter T and write H(x | y) = H_T(x | y), calling it the conditional complexity of a string x with respect to string y. We denote H(x) = H(x | Λ). Let

m(x) = 2^{−H(x)}.

The function m(x) is lower semicomputable with Σ_x m(x) ≤ 1. Let us call any real function f(x) ≥ 0 over Σ∗ with Σ_x f(x) ≤ 1 a semimeasure. The following theorem, known as the Coding Theorem, is an important tool.

Proposition 4.2 (Coding Theorem). For every lower semicomputable semimeasure f there is a constant c > 0 with m(x) ≥ c · f(x).
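One half of the picture behind the Coding Theorem is elementary: by the Kraft inequality, every prefix-free code yields a lower semicomputable semimeasure x ↦ 2^{−|code(x)|}. A small Python illustration with a simple, deliberately non-optimal prefix code for natural numbers (our example):

```python
def prefix_code(n):
    """Encode n >= 0 prefix-freely as 1^k 0 followed by the k-bit body,
    where the body is the binary expansion of n+1 without its leading 1."""
    body = bin(n + 1)[3:]
    return "1" * len(body) + "0" + body

# Kraft inequality: the sum of 2^{-codeword length} never exceeds 1,
# so f(n) = 2^{-|prefix_code(n)|} is a semimeasure.
kraft = sum(2.0 ** -len(prefix_code(n)) for n in range(10000))
assert kraft < 1.0
```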

Because of this theorem, we will say that m(x) is a universal lower semicomputable semimeasure. It is possible to turn m(x) into a measure, by compactifying the discrete space Σ∗ into

Σ̄∗ = Σ∗ ∪ {∞}

(as in part 1 of Example A.3; this process makes sense also for a constructive discrete space), and setting m(∞) = 1 − Σ_{x∈Σ∗} m(x). The extended measure m is not quite lower semicomputable, since the number m(Σ̄∗ ∖ {0}) is not necessarily lower semicomputable.

Remark 4.3. A measure µ is computable over Σ̄∗ if and only if the function x ↦ µ(x) is computable for x ∈ Σ∗. This property does not imply that the number

1 − µ(∞) = µ(Σ∗) = Σ_{x∈Σ∗} µ(x)

is computable. ♦

Let us allow, for a moment, measures µ that are not probability measures: they may not even be finite. Metric and computability can be extended to this case (see [22]), and the universal test tµ(x) can also be generalized. The Coding Theorem and other considerations suggest the introduction of the following notation, for an arbitrary measure µ:

Hµ(x) = −dµ(x) = − log tµ(x).   (4.1)

Then, with # defined as the counting measure over the discrete set Σ∗ (that is, #(S) = |S|), we have

H(x) =+ H#(x).

This allows viewing Hµ(x) as a generalization of description complexity: we will call this quantity the algorithmic entropy of x relative to the measure µ. Generalization to conditional complexity is done using Remark 3.2. A reformulation of the definition of tests says that Hµ(x) is minimal (within an additive constant) among the upper semicomputable functions (µ, x) ↦ fµ(x) with µ^x 2^{−fµ(x)} ≤ 1. The following identity is immediate from the definitions:

Hµ(x) = Hµ(x | µ).   (4.2)

4.1.2. Computable measures and complexity. It is known that for computable µ, the test dµ(x) can be expressed in terms of the description complexity of x (we will prove these expressions below). Assume that X is the (discrete) space of all binary strings. Then we have

dµ(x) = − log µ(x) − H(x) + O(H(µ)).   (4.3)

The meaning of this equation is the following. Due to the maximality property of the semimeasure m following from the Coding Theorem 4.2 above, the expression − log µ(x) is an upper bound (within O(H(µ))) of the complexity H(x), and the nonrandomness of x is measured by the difference between the complexity and this upper bound. See [26] for a first formulation of this general upper bound relation. As a simple example, consider the uniform distribution µ over the set of binary sequences of length n. Conditioning everything on n, we obtain

dµ(x | n) =+ n − H(x | n),

that is, the more the description complexity H(x | n) of a binary sequence of length n differs from its upper bound n, the less random is x.
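A crude numerical analogue of this example (ours): H(x | n) is uncomputable, so the sketch below uses the length of a general-purpose compression as a rough upper-bound stand-in; it only illustrates the direction of the relationship.

```python
import random
import zlib

def deficiency_proxy(bits):
    """n minus a compressed-length stand-in for H(x | n), in bits."""
    compressed_bits = 8 * len(zlib.compress(bits.encode(), 9))
    return len(bits) - compressed_bits

regular = "01" * 4000                                        # very regular
random.seed(0)
typical = "".join(random.choice("01") for _ in range(8000))  # "random"
# The regular string has a much larger deficiency estimate:
assert deficiency_proxy(regular) > deficiency_proxy(typical)
```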

Assume that X is the space of infinite binary sequences. Then equation (4.3) must be replaced with

dµ(x) = sup_n ( − log µ(x≤n) − H(x≤n) ) + O(H(µ)).   (4.4)

For the coin-tossing distribution µ, this characterization was first proved by Schnorr, and published in [5].


Remark 4.4. It is possible to obtain similar natural characterizations of randomness, using some other natural definitions of description complexity. A universal semicomputable semimeasure mΩ over the set Ω of infinite sequences was introduced, and a complexity KM(x) = − log mΩ(x) defined in [26]. A so-called "monotonic complexity" Km(x) was introduced, using Turing machines with one-way input and output, in [11], and a closely related quantity called "process complexity" was introduced in [17]. These quantities can also be used in a characterization of randomness similar to (4.3). The nontrivial fact that the complexities KM and Km differ by an unbounded amount was shown in [8]. ♦

For noncomputable measures, we cannot replace O(H(µ)) in these relations with anything finite, as shown in the following example. Therefore, however attractive and simple, exp(− log µ(x) − H(x)) is not a universal uniform test of randomness.

Proposition 4.5. There is a measure µ over the discrete space X of binary strings such that for each n, there is an x with dµ(x) >+ n − H(n) and − log µ(x) − H(x) <+ 0.

Proof. Let us treat the domain of our measure µ as a set of pairs (x, y). Let xn = 0^n, for n = 1, 2, . . . . For each n, let yn be some binary string of length n with the property H(xn, yn) > n. Let µ(xn, yn) = 2^{−n}. Then − log µ(xn, yn) − H(xn, yn) ≤ n − n = 0. On the other hand, let fµ(x, y) be the test that is nonzero only on strings x of the form xn:

fµ(xn, y) = m(n) / Σ_{z∈B^n} µ(xn, z).

The form of the definition ensures semicomputability and we also have

Σ_{x,y} µ(x, y) fµ(x, y) ≤ Σ_n m(n) < 1,

therefore fµ is indeed a test. Hence tµ(x, y) >∗ fµ(x, y). Taking logarithms, dµ(xn, yn) >+ n − H(n).

The same example implies that it is also not an option, even over discrete sets, to replace the definition of uniform tests with the ad hoc formula exp(− log µ(x) − H(x)):

Proposition 4.6. The test defined as fµ(x) = exp(− log µ(x) − H(x)) over discrete spaces X does not obey the conservation of randomness.

Proof. Let us use the example of Proposition 4.5. Consider the function π : (x, y) ↦ x. The image of the measure µ under the projection is (πµ)(x) = Σ_y µ(x, y). Thus, (πµ)(xn) = µ(xn, yn) = 2^{−n}. We have seen log fµ(xn, yn) ≤ 0. On the other hand,

log fπµ(π(xn, yn)) = − log(πµ)(xn) − H(xn) =+ n − H(n).

Thus, the projection π takes a random pair (xn, yn) into an object xn that is very nonrandom (when randomness is measured using the tests fµ).

In the example, we have the abnormal situation that a pair is random but one of its elements is nonrandom. Therefore, even if we did not insist on universality, the test exp(− log µ(x) − H(x)) would be unsatisfactory.

Looking into the reasons for the nonconservation in the example, we will notice that it could only have happened because the test fµ is too special. The fact that − log(πµ)(xn) − H(xn) is large should show that the pair (xn, yn) can be enclosed in the "simple" set {xn} × Y of small probability; unfortunately, this observation is not reflected in − log µ(x, y) − H(x, y) when the measure µ is non-computable (it is for computable µ).


4.1.3. Expressing the uniform test in terms of complexity. It is a natural idea to modify equation (4.3) in such a way that the complexity H(x) is replaced with H(x | µ). However, this expression must be understood properly. The measure µ (especially when it is not computable) cannot be described by a finite string; on the other hand, it can be described by infinite strings in many different ways. Clearly, irrelevant information in these infinite strings should be ignored. The notion of representation in computable analysis (see Subsection C.1) will solve the problem. An interpreter function should have the property that its output depends only on µ and not on the sequence representing it. Recall the topological space M of computable measures over our space X. An interpreter A : {0, 1}∗ × M → Σ∗ is a computable function that is prefix-free in its first argument. The complexity

H(x | µ)

can now be defined in terms of such interpreters, noting that the Invariance Theorem holds as before. To define this complexity in terms of representations, let γM be our chosen representation for the space M (thus, each measure µ is represented via all of its Cauchy sequences in the Prokhorov distance). Then we can say that A is an interpreter if it is (id, γM, id)-computable, that is, a certain computable function B : {0, 1}∗ × Σω → Σ∗ realizes A in the sense that for every p ∈ {0, 1}∗ and for every sequence z that is a γM-name of a measure µ, we have B(p, z) = A(p, µ).

Remark 4.7. The notion of oracle computation and reducibility in the new sense (where the result is required to be independent of which representation of an object is used) may be worth investigating in other settings as well. ♦

Let us mention the following easy fact:

Proposition 4.8. If µ is a computable measure then H(x | µ) =+ H(x). The constant in =+ depends on the description complexity of µ.

Theorem 3. If X is the discrete space Σ∗ then we have

dµ(x) =+ − log µ(x) − H(x | µ).   (4.5)

Note that in terms of the algorithmic entropy notation introduced in (4.1), this theorem can be expressed as

Hµ(x) =+ H(x | µ) + log µ(x).   (4.6)

Proof. In exponential notation, equation (4.5) can be written as tµ(x) =∗ m(x | µ)/µ(x). Let us prove >∗ first. We will show that the right-hand side of this inequality is a test, and hence <∗ tµ(x). However, the right-hand side is clearly lower semicomputable in (x, µ) and when we "integrate" it (multiply it by µ(x) and sum it), its sum is ≤ 1; thus, it is a test.

Let us prove <∗ now. The expression tµ(x)µ(x) is clearly lower semicomputable in (x, µ), and its sum is ≤ 1. Hence, it is <∗ m(x | µ).

Remark 4.9. As mentioned earlier, our theory generalizes to measures that are not probability measures. In this case, equation (4.6) has interesting relations to the quantity called "physical entropy" by Zurek in [25]; it justifies calling Hµ(x) "fine-grained algorithmic Boltzmann entropy" by this author in [9]. ♦

For non-discrete spaces, unfortunately, we can only provide less intuitive expressions.


Proposition 4.10. Let X = (X, d, D, α) be a complete computable metric space, and let E be the enumerated set of bounded Lipschitz functions introduced in (2.1), but for the space M(X) × X. The uniform test of randomness tµ(x) can be expressed as

tµ(x) =∗ Σ_{f∈E} f(µ, x) m(f | µ) / µ^y f(µ, y).   (4.7)

Proof. For >∗, we will show that the right-hand side of the inequality is a test, and hence <∗ tµ(x). For simplicity, we skip the notation for the enumeration of E and treat each element f as its own name. Each term of the sum is clearly lower semicomputable in (f, x, µ), hence the sum is lower semicomputable in (x, µ). It remains to show that the µ-integral of the sum is ≤ 1. But the µ-integral of the generic term is ≤ m(f | µ), and the sum of these bounds is ≤ 1 by the definition of the function m(· | ·). Thus, the sum is a test.

For <∗, note that (µ, x) ↦ tµ(x), as a lower semicomputable function, is the supremum of functions in E. Denoting their differences by fi(µ, x), we have tµ(x) = Σ_i fi(µ, x). The test property implies Σ_i µ^x fi(µ, x) ≤ 1. Since the function (µ, i) ↦ µ^x fi(µ, x) is lower semicomputable, this implies µ^x fi(µ, x) <∗ m(i | µ), and hence

fi(µ, x) <∗ fi(µ, x) m(i | µ) / µ^x fi(µ, x).

It is easy to see that for each f ∈ E we have

Σ_{i : fi = f} m(i | µ) <∗ m(f | µ),

which leads to (4.7).

Remark 4.11. If we only want the >∗ part of the result, then E can be replaced with any enumerated computable sequence of bounded computable functions. ♦

4.2. Infinite sequences. In this section, we get a nicer characterization of randomness tests in terms of complexity, in special cases. Let MR(X) be the set of measures µ with µ(X) = R.

Theorem 4. Let X = Nω be the set of infinite sequences of natural numbers, with the product topology. For all computable measures µ ∈ MR(X), for the deficiency of randomness dµ(x), we have

dµ(x) =+ sup_n ( − log µ(x≤n) − H(x≤n) ).   (4.8)

Here, the constant in =+ depends on the computable measure µ.

We will be able to prove the >+ part of the statement in a more general space, and without assuming computability. Assume that a separating sequence b1, b2, . . . is given as defined in Subsection 2.3, along with the set X0. For each x ∈ X0, the binary sequence x1, x2, . . . has been defined. Let

µ̲(Γs) = R − Σ{ µ(Γs′) : l(s′) = l(s), s′ ≠ s }.

Then (s, µ) ↦ µ̲(Γs) is lower semicomputable, and (s, µ) ↦ µ(Γs) is upper semicomputable. Moreover, whenever the functions bi(x) form a regular partition for µ, we have µ̲(Γs) = µ(Γs) for all s. Let M^0_R(X) be the set of those measures µ in MR(X) for which µ(X ∖ X0) = 0.


Theorem 5. Suppose that the space X is compact. Then for all computable measures µ ∈ M^0_R(X), for the deficiency of randomness dµ(x), the characterization (4.8) holds.

For arbitrary measures and spaces, we can say a little less:

Proposition 4.12. For all measures µ ∈ MR(X), for the deficiency of randomness dµ(x), we have

dµ(x) >+ sup_n ( − log µ(x≤n) − H(x≤n | µ) ).   (4.9)

Proof. Consider the function

fµ(x) = Σ_s 1_{Γs}(x) m(s | µ) / µ(Γs) = Σ_n m(x≤n | µ) / µ(x≤n) ≥ sup_n m(x≤n | µ) / µ(x≤n).

The function (µ, x) ↦ fµ(x) is clearly lower semicomputable and satisfies µ^x fµ(x) ≤ 1, and hence

dµ(x) >+ log fµ(x) >+ sup_n ( − log µ(x≤n) − H(x≤n | µ) ).

Proof of Theorem 4. For binary sequences instead of sequences of natural numbers, the >+ part of the inequality follows directly from Proposition 4.12: indeed, look at Examples 2.9. For sequences of natural numbers, the proof is completely analogous.

The proof of <+ reproduces the proof of Theorem 5.2 of [7]. The computability of µ implies that t(x) = tµ(x) is lower semicomputable. Let us first replace t(x) with a rougher version:

t′(x) = max{ 2^n : 2^n < tµ(x) }.

Then t′(x) =∗ t(x), and it takes only values of the form 2^n. It is also lower semicomputable. Let us abbreviate:

1_y(x) = 1_{yNω}(x),   µ(y) = µ(yNω).

For every lower semicomputable function f over Nω, there are computable sequences yi ∈ N∗ and ri ∈ Q with f(x) = sup_i ri 1_{yi}(x), with the additional property that if i < j and 1_{yi}(x) = 1_{yj}(x) = 1 then ri < rj. Since t′(x) only takes values of the form 2^n, there are computable sequences yi ∈ N∗ and ki ∈ N with

t′(x) = sup_i 2^{ki} 1_{yi}(x) =∗ Σ_i 2^{ki} 1_{yi}(x),

with the property that if i < j and 1_{yi}(x) = 1_{yj}(x) = 1 then ki < kj. The equality =∗ follows easily from the fact that for any finite sequence n1 < n2 < · · · we have Σ_j 2^{nj} ≤ 2 max_j 2^{nj}. Since µt′ <∗ 1, we have Σ_i 2^{ki} µ(yi) <∗ 1. Since the function i ↦ 2^{ki} µ(yi) is computable, this implies 2^{ki} µ(yi) <∗ m(i), that is 2^{ki} <∗ m(i)/µ(yi). Thus,

t(x) <∗ sup_i 1_{yi}(x) m(i)/µ(yi).

For y ∈ N∗ we certainly have H(y) <+ inf_{yi=y} H(i), which implies sup_{yi=y} m(i) <∗ m(y). It follows that

t(x) <∗ sup_{y∈N∗} 1_y(x) m(y)/µ(y) = sup_n m(x≤n)/µ(x≤n).

Taking logarithms, we obtain the <+ part of the theorem.


Proof of Theorem 5. The proof of the >+ part of the inequality follows directly from Proposition 4.12, just as in the proof of Theorem 4.

The proof of <+ is also similar to the proof of that theorem. The only part that needs to be reproved is the statement that for every lower semicomputable function f over X, there are computable sequences yi ∈ B∗ and qi ∈ Q with f(x) = sup_i qi 1_{yi}(x). This follows now, since according to Proposition 2.10, the cells Γy form a basis of the space X.

5. NEUTRAL MEASURE

Let tµ(x) be our universal uniform randomness test. We call a measure M neutral if tM(x) ≤ 1 for all x. If M is neutral then no experimental outcome x could refute the theory (hypothesis, model) that M is the underlying measure of our experiments. It can be used as "a priori probability", in a Bayesian approach to statistics. Levin's theorem says the following:

Theorem 6. If the space X is compact then there is a neutral measure over X.

The proof relies on a nontrivial combinatorial fact, Sperner's Lemma, which also underlies the proof of the Brouwer fixpoint theorem. Here is a version of Sperner's Lemma, spelled out in continuous form:

Proposition 5.1 (see for example [20]). Let p1, . . . , pk be points of some finite-dimensional space Rn. Suppose that there are closed sets F1, . . . , Fk with the property that for every subset 1 ≤ i1 < · · · < ij ≤ k of the indices, the simplex S(pi1, . . . , pij) spanned by pi1, . . . , pij is covered by the union Fi1 ∪ · · · ∪ Fij. Then the intersection ⋂_i Fi of all these sets is not empty.

The following lemma will also be needed.

Lemma 5.2. For every closed set A ⊂ X and measure µ, if µ(A) = 1 then there is a point x ∈ A with tµ(x) ≤ 1.

Proof. This follows easily from µtµ = µ^x 1A(x) tµ(x) ≤ 1.

Proof of Theorem 6. For every point x ∈ X, let Fx be the set of measures for which tµ(x) ≤ 1. If we show that for every finite set of points x1, . . . , xk, we have

Fx1 ∩ · · · ∩ Fxk ≠ ∅,   (5.1)

then we will be done. Indeed, according to Proposition B.18, the compactness of X implies the compactness of the space M(X) of measures. Therefore if every finite subset of the family { Fx : x ∈ X } of closed sets has a nonempty intersection, then the whole family has a nonempty intersection: this intersection consists of the neutral measures.

To show (5.1), let S(x1, . . . , xk) be the set of probability measures concentrated on x1, . . . , xk. Lemma 5.2 implies that each such measure belongs to one of the sets Fxi. Hence S(x1, . . . , xk) ⊂ Fx1 ∪ · · · ∪ Fxk, and the same holds for every subset of the indices {1, . . . , k}. Sperner's Lemma 5.1 implies Fx1 ∩ · · · ∩ Fxk ≠ ∅.

When the space is not compact, there are generally no neutral probability measures, as shown by the following example.

Proposition 5.3. Over the discrete space X = N of natural numbers, there is no neutral measure.


Proof. It is sufficient to construct a randomness test tµ(x) with the property that for every measure µ, we have sup_x tµ(x) = ∞. Let

tµ(x) = sup{ k ∈ N : Σ_{y<x} µ(y) > 1 − 2^{−k} }.   (5.2)

By its construction, this is a lower semicomputable function with sup_x tµ(x) = ∞. It is a test if Σ_x µ(x)tµ(x) ≤ 1. We have

Σ_x µ(x)tµ(x) = Σ_{k>0} Σ_{x : tµ(x)≥k} µ(x) < Σ_{k>0} 2^{−k} ≤ 1.
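In Python, the test (5.2) looks as follows for a measure given by a finite table (a simplification we make for illustration: in general µ(y) is only approximable from one side, so tµ is merely lower semicomputable).

```python
def t(mu, x):
    """sup { k in N : sum_{y < x} mu(y) > 1 - 2^{-k} }, as in (5.2).
    If the prefix mass were exactly 1 the sup would be infinite and the
    loop would diverge, matching sup_x t_mu(x) = infinity in the proof."""
    prefix = sum(mu.get(y, 0.0) for y in range(x))
    k = 0
    while prefix > 1.0 - 2.0 ** -(k + 1):   # k+1 also qualifies
        k += 1
    return k

# For mu(y) = 2^{-y-1}, the prefix mass below x is 1 - 2^{-x},
# so t(mu, x) = x - 1 grows without bound:
mu = {y: 2.0 ** -(y + 1) for y in range(50)}
assert [t(mu, x) for x in (1, 2, 3, 10)] == [0, 1, 2, 9]
```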

Using a similar construction over the space Nω of infinite sequences of natural numbers, we could show that for every measure µ there is a sequence x with tµ(x) = ∞.

Proposition 5.3 is a little misleading, since as a locally compact set, N can be compactified into N̄ = N ∪ {∞} (as in Part 1 of Example A.3). Theorem 6 implies that there is a neutral probability measure M over the compactified space N̄. Its restriction to N is, of course, not a probability measure, since it satisfies only Σ_{x<∞} M(x) ≤ 1. We called these functions semimeasures.

Remark 5.4.
1. It is easy to see that Theorem 3 characterizing randomness in terms of complexity holds also for the space N̄.
2. The topological space of semimeasures over N is not compact, and there is no neutral one among them. Its topology is not the same as the one we get when we restrict the topology of probability measures over N̄ to N. The difference is that over N, for example, the set of measures { µ : µ(N) ≥ 1/2 } is closed, since N (as the whole space) is a closed set. But over N̄, this set is not closed. ♦

Neutral measures are not too simple, even over N̄, as the following theorem shows.

Theorem 7. There is no neutral measure over N̄ that is upper semicomputable over N or lower semicomputable over N.

Proof. Let us assume that ν is a measure that is upper semicomputable over N. Then the set

{ (x, r) : x ∈ N, r ∈ Q, ν(x) < r }

is recursively enumerable: let (xi, ri) be a particular enumeration. For each n, let i(n) be the first i with ri < 2^{−n}, and let yn = x_{i(n)}. Then ν(yn) < 2^{−n}, and at the same time H(yn) <+ H(n). As mentioned in Remark 5.4, Theorem 3 characterizing randomness in terms of complexity holds also for the space N̄. Thus,

dν(yn) =+ − log ν(yn) − H(yn | ν) >+ n − H(n).

Suppose now that ν is lower semicomputable over N. The proof for this case is longer. We know that ν is the monotonic limit of a recursive sequence i ↦ νi(x) of recursive semimeasures with rational values νi(x). For every k = 0, . . . , 2^n − 2, let

Vn,k = { µ ∈ M(N̄) : k · 2^{−n} < µ({0, . . . , 2^n − 1}) < (k + 2) · 2^{−n} },
J = { (n, k) : k · 2^{−n} < ν({0, . . . , 2^n − 1}) }.


The set J is recursively enumerable. Let us define the functions j : J → N and x : J → {0, . . . , 2^n − 1} as follows: j(n, k) is the smallest i with νi({0, . . . , 2^n − 1}) > k · 2^{−n}, and

x_{n,k} = min{ y < 2^n : ν_{j(n,k)}(y) < 2^{−n+1} }.

Let us define the function fµ(x, n, k) as follows. We set fµ(x, n, k) = 2^{n−2} if the following conditions hold:
(a) µ ∈ Vn,k;
(b) µ(x) < 2^{−n+2};
(c) (n, k) ∈ J and x = x_{n,k}.
Otherwise, fµ(x, n, k) = 0. Clearly, the function (µ, x, n, k) 7→ fµ(x, n, k) is lower semicomputable. Condition (b) implies

∑_y µ(y)fµ(y, n, k) ≤ µ(x_{n,k})fµ(x_{n,k}, n, k) < 2^{−n+2} · 2^{n−2} = 1.   (5.3)

Let us show that ν ∈ Vn,k implies

fν(x_{n,k}, n, k) = 2^{n−2}.   (5.4)

Consider x = x_{n,k}. Conditions (a) and (c) are satisfied by definition. Let us show that condition (b) is also satisfied. Let j = j(n, k). By definition, we have νj(x) < 2^{−n+1}. Since by definition νj ∈ Vn,k and νj ≤ ν ∈ Vn,k, we have

ν(x) ≤ νj(x) + 2^{−n+1} < 2^{−n+1} + 2^{−n+1} = 2^{−n+2}.

Since all three conditions (a), (b) and (c) are satisfied, we have shown (5.4). Now we define

gµ(x) = ∑_{n>2} 1/(n(n+1)) ∑_k fµ(x, n, k).

Let us prove that gµ(x) is a uniform test. It is lower semicomputable by definition, so we only need to prove ∑_x µ(x)gµ(x) ≤ 1. For this, let I_{n,µ} = { k : µ ∈ Vn,k }. Clearly by definition, |I_{n,µ}| ≤ 2. We have, using this last fact and the test property (5.3):

∑_x µ(x)gµ(x) = ∑_{n>2} 1/(n(n+1)) ∑_{k∈I_{n,µ}} ∑_x µ(x)fµ(x, n, k) ≤ ∑_{n>2} 2/(n(n+1)) ≤ 1.

Thus, gµ(x) is a uniform test. If ν ∈ Vn,k then, using (5.4), we have

tν(x_{n,k}) ∗> gν(x_{n,k}) ≥ 1/(n(n+1)) · fν(x_{n,k}, n, k) = 2^{n−2}/(n(n+1)).

Hence ν is not neutral.

Remark 5.5. In [12] and [13], Levin imposed extra conditions on tests, which allow one to find a lower semicomputable neutral semimeasure. A typical (doubtless reasonable) consequence of these conditions would be that if outcome x is random with respect to measures µ and ν then it is also random with respect to (µ + ν)/2. ♦

Remark 5.6. The universal lower semicomputable semimeasure m(x) has a certain property similar to neutrality. According to Theorem 3, for every computable measure µ we have dµ(x) += −log µ(x) − H(x) (where the constant in += depends on µ). So, for computable measures, the expression

d̄µ(x) = −log µ(x) − H(x)   (5.5)

can serve as a reasonable deficiency of randomness. (We will also use the test t̄ = 2^{d̄}.) If we substitute m for µ in d̄µ(x), we get 0. This substitution is not justified, of course. The fact that m is not a probability measure can be helped, at least over N, using compactification as above, and extending the notion of randomness tests. But the test d̄µ can replace dµ only for computable µ, while m is not computable. Anyway, this is the sense in which all outcomes might be considered random with respect to m, and the heuristic sense in which m may still be considered "neutral". ♦
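For intuition only, the deficiency (5.5) can be mimicked numerically by replacing the uncomputable H(x) with the length of a compressed encoding; the sketch below (function names ours, zlib serving as an arbitrary stand-in compressor) merely illustrates the shape of the formula and is in no way the test itself:

    import zlib

    def H_proxy(x: bytes) -> float:
        # Crude stand-in for the description complexity H(x): length in bits
        # of a zlib encoding; this only upper-bounds an ideal description length.
        return 8 * len(zlib.compress(x, 9))

    def dbar(neg_log2_mu_x: float, x: bytes) -> float:
        # Heuristic version of (5.5): d(x) = -log mu(x) - H(x); we take
        # -log2 mu(x) as input directly to avoid floating-point underflow.
        return neg_log2_mu_x - H_proxy(x)

    x = b"ab" * 100           # a very regular 200-byte string
    print(dbar(1600.0, x))    # mu(x) = 2^-1600, uniform over 200-byte strings:
                              # a large positive deficiency, as expected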

Remark 5.7. Solomonoff proposed the use of a universal lower semicomputable semimeasure (actually, a closely related structure) for inductive inference in [18]. He proved in [19] that sequences emitted by any computable probability distribution can be predicted well by his scheme. It may be interesting to see whether the same prediction scheme has stronger properties when used with the truly neutral measure M of the present paper. ♦

6. RELATIVE ENTROPY

Some properties of description complexity make it a good expression of the idea of individual information content.

6.1. Entropy. The entropy of a discrete probability distribution µ is defined as

H(µ) = −∑_x µ(x) log µ(x).

To generalize entropy to continuous distributions, the relative entropy is defined as follows. Let µ, ν be two measures, where µ is taken (typically, but not always) to be a probability measure, and ν another measure, which can also be a probability measure but is most frequently not. We define the relative entropy Hν(µ) as follows. If µ is not absolutely continuous with respect to ν then Hν(µ) = −∞. Otherwise, writing

dµ/dν = µ(dx)/ν(dx) =: f(x)

for the (Radon-Nikodym) derivative (density) of µ with respect to ν, we define

Hν(µ) = −∫ log (dµ/dν) dµ = −µx log (µ(dx)/ν(dx)) = −νx f(x) log f(x).

Thus, H(µ) = H#(µ), with # the counting measure, is a special case.

Example 6.1. Let f(x) be a probability density function for the distribution µ over the real line, and let λ be the Lebesgue measure there. Then

Hλ(µ) = −∫ f(x) log f(x) dx.
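As a quick numerical check of this formula (our own illustration, not part of the paper), one can integrate −f log f on a grid for a Gaussian density and compare with the closed form (1/2) log(2πeσ²):

    import math

    def h_lambda(f, a, b, n=200000):
        # Riemann-sum approximation of -integral of f(x) log2 f(x) dx over [a, b].
        dx = (b - a) / n
        s = 0.0
        for i in range(n):
            y = f(a + (i + 0.5) * dx)
            if y > 0:
                s -= y * math.log2(y) * dx
        return s

    sigma = 1.0
    f = lambda x: math.exp(-x * x / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)
    print(h_lambda(f, -10, 10))                   # ~ 2.047
    print(0.5 * math.log2(2 * math.pi * math.e))  # closed form, ~ 2.047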

In information theory and statistics, when both µ and ν are probability measures, then −Hν(µ) is also denoted D(µ ‖ ν), and called (after Kullback) the information divergence of the two measures. It is frequently used in the role of a distance between µ and ν. It is not symmetric and does not obey the triangle inequality, but it can be shown to be nonnegative. Let us prove the latter property: in our terms, it says that relative entropy is nonpositive when both µ and ν are probability measures.


Proposition 6.2. Over a space X, we have

Hν(µ) ≤ −µ(X) log (µ(X)/ν(X)).   (6.1)

In particular, if µ(X) ≥ ν(X) then Hν(µ) ≤ 0.

Proof. The inequality −a ln a ≤ −a ln b + b − a expresses the concavity of the logarithm function. Substituting a = f(x) and b = µ(X)/ν(X) and integrating by ν:

(ln 2)Hν(µ) = −νx f(x) ln f(x) ≤ −µ(X) ln (µ(X)/ν(X)) + (µ(X)/ν(X)) · ν(X) − µ(X) = −µ(X) ln (µ(X)/ν(X)),

giving (6.1).
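The nonpositivity just proved is easy to test empirically; this small sketch (names ours) draws random pairs of probability vectors and checks that Hν(µ) = −∑x µ(x) log(µ(x)/ν(x)) never exceeds 0:

    import math, random

    def H_rel(mu, nu):
        # Relative entropy H_nu(mu) = -sum_x mu(x) log2(mu(x)/nu(x)).
        return -sum(p * math.log2(p / q) for p, q in zip(mu, nu) if p > 0)

    def rand_dist(n):
        w = [random.random() for _ in range(n)]
        s = sum(w)
        return [x / s for x in w]

    random.seed(0)
    for _ in range(1000):
        mu, nu = rand_dist(5), rand_dist(5)
        assert H_rel(mu, nu) <= 1e-12   # nonpositive, up to rounding error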

The following theorem generalizes an earlier known theorem stating that over a discrete space, for a computable measure, entropy is within an additive constant the same as "average complexity": H(µ) += µx H(x).

Theorem 8. Let µ be a probability measure. Then we have

Hν(µ) ≤ µx Hν(x | µ).   (6.2)

If X is a discrete space then the following estimate also holds:

Hν(µ) +> µx Hν(x | µ).   (6.3)

Proof. Let δ be the measure with density tν(x | µ) with respect to ν: tν(x | µ) = δ(dx)/ν(dx). Then δ(X) ≤ 1. It is easy to see from the maximality property of tν(x | µ) that tν(x | µ) > 0, therefore according to Proposition B.9, we have ν(dx)/δ(dx) = (δ(dx)/ν(dx))^{−1}. Using Proposition B.9 and Proposition 6.2:

Hν(µ) = −µx log (µ(dx)/ν(dx)),
−µx Hν(x | µ) = µx log (δ(dx)/ν(dx)) = −µx log (ν(dx)/δ(dx)),
Hν(µ) − µx Hν(x | µ) = −µx log (µ(dx)/δ(dx)) ≤ −µ(X) log (µ(X)/δ(X)) ≤ 0.

This proves (6.2).

Over a discrete space X, the function (x, µ, ν) 7→ µ(dx)/ν(dx) = µ(x)/ν(x) is computable, therefore by the maximality property of tν(x | µ) we have µ(dx)/ν(dx) ∗< tν(x | µ), hence

Hν(µ) = −µx log (µ(dx)/ν(dx)) +> µx Hν(x | µ).

6.2. Addition theorem. The most important information-theoretical property of description complexity is the following theorem (see for example [14]):

Proposition 6.3 (Addition Theorem). We have H(x, y) += H(x) + H(y | x, H(x)).
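The classical Shannon-entropy counterpart of this identity, H(X, Y) = H(X) + H(Y | X), holds exactly rather than up to an additive constant; the sketch below (our own illustration) verifies it on a small joint distribution:

    import math

    def H(dist):
        # Shannon entropy, in bits, of an {outcome: probability} table.
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
    px = {0: 0.5, 1: 0.5}          # marginal of the first coordinate
    # H(Y|X) = sum_x p(x) H(Y | X = x)
    H_y_given_x = sum(
        px[x] * H({y: joint[(x, y)] / px[x] for y in (0, 1)}) for x in (0, 1)
    )
    print(abs(H(joint) - (H(px) + H_y_given_x)) < 1e-12)   # True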

Mutual information is defined as I(x : y) = H(x) + H(y) − H(x, y). By the Addition Theorem, we have I(x : y) += H(y) − H(y | x, H(x)) += H(x) − H(x | y, H(y)). The two latter expressions show that in some sense, I(x : y) is the information held in x about y as well as the information held in y about x. (The terms H(x), H(y) in the conditions are logarithmic-sized corrections to this idea.) Using (5.5), it is interesting to view the mutual information I(x : y) as a deficiency of randomness of the pair (x, y) in terms of the expression d̄µ, with respect to m × m:

I(x : y) = H(x) + H(y) − H(x, y) = d̄_{m×m}(x, y).

Taking m as a kind of "neutral" probability, even if it is not quite such, allows us to view I(x : y) as a "deficiency of independence". Is it also true that I(x : y) += d_{m×m}(x, y)? This would allow us to deduce, as Levin did, "information conservation" laws from randomness conservation laws.¹

The expression d_{m×m}(x, y) must be understood again in the sense of compactification, as in Section 5. There seem to be two reasonable ways to compactify the space N × N: we either compactify it directly, by adding a symbol ∞, or we form the product N̄ × N̄. With either of them, preserving Theorem 3, we would have to check whether H(x, y | m × m) += H(x, y). But, knowing the function m(x) × m(y) we know the function x 7→ m(x) ∗= m(x) × m(0), hence also the function (x, y) 7→ m(x, y) = m(⟨x, y⟩), where ⟨x, y⟩ is any fixed computable pairing function. Using this knowledge, it is possible to develop an argument similar to the proof of Theorem 7, showing that H(x, y | m × m) += H(x, y) does not hold.

Question 1. Is there a neutral measure M with the property that I(x : y) = d_{M×M}(x, y)? Is this true maybe for all neutral measures M? If not, how far apart are the expressions d_{M×M}(x, y) and I(x : y) from each other?

The Addition Theorem (Proposition 6.3) can be generalized to the algorithmic entropy Hµ(x) introduced in (4.1) (a somewhat similar generalization appeared in [23]). The generalization, defining Hµ,ν = Hµ×ν, is

Hµ,ν(x, y) += Hµ(x | ν) + Hν(y | x, Hµ(x | ν), µ).   (6.4)

Before proving the general addition theorem, we establish a few useful facts.

Proposition 6.4. We have

Hµ(x | ν) +< −log νy 2^{−Hµ,ν(x,y)}.

Proof. The function f(x, µ, ν) that is the right-hand side is upper semicomputable by definition, and obeys µx 2^{−f(x,µ,ν)} ≤ 1. Therefore the inequality follows from the minimum property of Hµ(x).

Let us generalize the minimum property of Hµ(x).

Proposition 6.5. Let (x, y, ν) 7→ fν(x, y) be a nonnegative lower semicomputable function with Fν(x) = log νy fν(x, y). Then for all x with Fν(x) > −∞ we have

Hν(y | x, ⌊Fν(x)⌋) +< −log fν(x, y) + Fν(x).

Proof. Let us construct a lower semicomputable function (x, y, m, ν) 7→ gν(x, y, m) for integers m with the property that νy gν(x, y, m) ≤ 2^{−m}, and for all x with Fν(x) ≤ −m we have gν(x, y, m) = fν(x, y). Such a g can be constructed by watching the approximation of f grow and cutting it off as soon as it would give Fν(x) > −m. Now (x, y, m, ν) 7→ 2^m gν(x, y, m) is a uniform conditional test of y and hence it is ∗< 2^{−Hν(y|x,m)}. To finish the proof, substitute −⌊Fν(x)⌋ for m and rearrange.

¹We cannot use the test t̄µ for this, since, as can be shown easily, it does not obey randomness conservation.


For z ∈ N, the inequality

Hµ(x) +< H(z) + Hµ(x | z)   (6.5)

will be a simple consequence of the general addition theorem. The following lemma, needed in the proof of the theorem, generalizes this inequality somewhat:

Lemma 6.6. For a computable function (y, z) 7→ f(y, z) over N, we have

Hµ(x | y) +< H(z) + Hµ(x | f(y, z)).

Proof. The function

(x, y, µ) 7→ gµ(x, y) = ∑_z 2^{−Hµ(x|f(y,z))−H(z)}

is lower semicomputable, and µx gµ(x, y) ≤ ∑_z 2^{−H(z)} ≤ 1. Hence gµ(x, y) ∗< 2^{−Hµ(x|y)}. The left-hand side is a sum, hence the inequality holds for each element of the sum: just what we had to prove.

As mentioned above, the theory generalizes to measures that are not probability measures. Taking fν(x, y) = 1 in Proposition 6.5 gives the inequality

Hµ(x | ⌊log µ(X)⌋) +< log µ(X),

with a physical meaning when µ is the phase space measure. Using (6.5), this implies

Hµ(x) +< log µ(X) + H(⌊log µ(X)⌋).   (6.6)

The following simple monotonicity property will be needed:

Lemma 6.7. For i < j we have

i + Hµ(x | i) +< j + Hµ(x | j).

Proof. From Lemma 6.6, with f(i, n) = i + n, we have

Hµ(x | i) − Hµ(x | j) +< H(j − i) +< j − i.

Theorem 9 (General addition). The following inequality holds:

Hµ,ν(x, y) += Hµ(x | ν) + Hν(y | x, Hµ(x | ν), µ).

Proof. To prove the inequality +<, let us define

Gµ,ν(x, y, m) = min_{i≥m} ( i + Hν(y | x, i, µ) ).

The function Gµ,ν(x, y, m) is upper semicomputable and increasing in m. Therefore

Gµ,ν(x, y) = Gµ,ν(x, y, Hµ(x | ν))

is also upper semicomputable, since it is obtained by substituting an upper semicomputable function for m in Gµ,ν(x, y, m). Lemma 6.7 implies

Gµ,ν(x, y, m) += m + Hν(y | x, m, µ),
Gµ,ν(x, y) += Hµ(x | ν) + Hν(y | x, Hµ(x | ν), µ).

Now, we have

νy 2^{−m−Hν(y|x,m,µ)} ≤ 2^{−m},
νy 2^{−Gµ,ν(x,y)} ∗< 2^{−Hµ(x|ν)}.

Therefore µx νy 2^{−G} ∗< 1, implying Hµ,ν(x, y) +< Gµ,ν(x, y) by the minimality property of Hµ,ν(x, y). This proves the +< half of our theorem.

To prove the inequality +>, let

fν(x, y, µ) = 2^{−Hµ,ν(x,y)},
Fν(x, µ) = log νy fν(x, y, µ).

According to Proposition 6.5,

Hν(y | x, ⌊F⌋, µ) +< −log fν(x, y, µ) + Fν(x, µ),
Hµ,ν(x, y) +> −F + Hν(y | x, ⌈−F⌉, µ).

Proposition 6.4 implies −Fν(x, µ) +> Hµ(x | ν). The monotonicity Lemma 6.7 implies from here the +> half of the theorem.

6.3. Some special cases of the addition theorem; information. The function Hµ(·) behaves quite differently for different kinds of measures µ. Recall the following property of complexity:

H(f(x) | y) +< H(x | g(y)) +< H(x)   (6.7)

for any computable functions f, g. This implies H(y) +< H(x, y). In contrast, if µ is a probability measure then

Hν(y) +> Hµ,ν(x, y).

This comes from the fact that 2^{−Hν(y)} is a test for µ × ν.

Let us explore some of the consequences and meanings of the additivity property. As noted in (4.2), the subscript µ can always be added to the condition: Hµ(x) += Hµ(x | µ). Similarly, we have

Hµ,ν(x, y) := Hµ×ν(x, y) += Hµ×ν(x, y | µ × ν) += Hµ×ν(x, y | µ, ν) =: Hµ,ν(x, y | µ, ν),

where only the next-to-last relation requires new (easy) consideration.

Let us assume that X = Y = Σ∗, the discrete space of all strings. With general µ, ν such that µ(x), ν(x) ≠ 0 for all x, using (4.6), the addition theorem specializes to the ordinary addition theorem, conditioned on µ, ν:

H(x, y | µ, ν) += H(x | µ, ν) + H(y | x, H(x | µ, ν), µ, ν).

In particular, whenever µ, ν are computable, this is just the regular addition theorem. Just as above, where we defined mutual information as I(x : y) = H(x) + H(y) − H(x, y), the new addition theorem suggests a more general definition

Iµ,ν(x : y) = Hµ(x | ν) + Hν(y | µ) − Hµ,ν(x, y).

In the discrete case X = Y = Σ∗ with everywhere positive µ(x), ν(x), this simplifies to

Iµ,ν(x : y) = H(x | µ, ν) + H(y | µ, ν) − H(x, y | µ, ν),

which is += I(x : y) in case of computable µ, ν. How different can it be for non-computable µ, ν? In the general case, even for computable µ, ν, it seems worth finding out how much this expression depends on the choice of µ, ν. Can one arrive at a general, natural definition of mutual information along this path?

7. CONCLUSION

When uniform randomness tests are defined in as general a form as they were here, the theory of information conservation does not fit as nicely into the theory of randomness conservation as it did in [12] and [13]. Still, it is worth laying the theory on broad foundations that, we hope, can serve as a basis for further development.

APPENDIX A. TOPOLOGICAL SPACES

Given two sets X, Y, a partial function f from X to Y, defined on a subset of X, will be denoted as

f :⊆ X → Y.

A.1. Topology. A topology on a set X is defined by a class τ of its subsets called open sets. It is required that the empty set and X are open, and that arbitrary unions and finite intersections of open sets are open. The pair (X, τ) is called a topological space. A topology τ′ on X is called larger, or finer, than τ if τ′ ⊇ τ. A set is called closed if its complement is open. A set B is called a neighborhood of a set A if B contains an open set that contains A. We denote by Ā and A° the closure of A (the intersection of all closed sets containing A) and the interior of A (the union of all open sets contained in A), respectively. Let

∂A = Ā \ A°

denote the boundary of set A. A base is a subset β of τ such that every open set is the union of some elements of β. A neighborhood of a point is a base element containing it. A base of neighborhoods of a point x is a set N of neighborhoods of x with the property that each neighborhood of x contains an element of N. A subbase is a subset σ of τ such that every open set is the union of finite intersections from σ.

Examples A.1.
1. Let X be a set, and let β be the set of all singletons {x} of X. The topology with base β is the discrete topology of the set X. In this topology, every subset of X is open (and closed).
2. Let X be the real line R, and let βR be the set of all open intervals (a; b). The topology τR obtained from this base is the usual topology of the real line. When we refer to R as a topological space without qualification, this is the topology we will always have in mind.
3. Let X = R̄ = R ∪ {−∞, ∞}, and let βR̄ consist of all open intervals (a; b) and in addition all intervals of the forms [−∞; a) and (a; ∞]. It is clear how the space R̄+ is defined.
4. Let X be the real line R. Let β>R be the set of all open intervals (−∞; b). The topology with base β>R is also a topology of the real line, different from the usual one. Similarly, let β<R be the set of all open intervals (b; ∞).
5. On the space Σω, let τC = { AΣω : A ⊆ Σ∗ } be called the topology of the Cantor space (over Σ).


A set is called a Gδ set if it is a countable intersection of open sets, and it is an Fσ set if it is a countable union of closed sets.

For two topologies τ1, τ2 over the same set X, we define the topology τ1 ∨ τ2 = τ1 ∩ τ2, and τ1 ∧ τ2 as the smallest topology containing τ1 ∪ τ2. In the example topologies of the real numbers above, we have τR = τ<R ∧ τ>R.

We will always require the topology to have at least the T0 property: every point is determined by the class of open sets containing it. This is the weakest one of a number of possible separation properties: both topologies of the real line in the example above have it. A stronger such property would be the T2 property: a space is called a Hausdorff space, or T2 space, if for every pair of different points x, y there is a pair of disjoint open sets A, B with x ∈ A, y ∈ B. The real line with topology τ>R in Example A.1.4 above is not a Hausdorff space. A space is Hausdorff if and only if every open set is the union of closed neighborhoods.

Given two topological spaces (Xi, τi) (i = 1, 2), a function f :⊆ X1 → X2 is called continuous if for every open set G ⊂ X2 its inverse image f−1(G) is also open. If the topologies τ1, τ2 are not clear from the context then we will call the function (τ1, τ2)-continuous. Clearly, the property remains the same if we require it only for all elements G of a subbase of X2. If there are two continuous functions between X and Y that are inverses of each other then the two spaces are called homeomorphic. We say that f is continuous at point x if for every neighborhood V of f(x) there is a neighborhood U of x with f(U) ⊆ V. Clearly, f is continuous if and only if it is continuous at each point.

A subspace of a topological space (X, τ) is defined by a subset Y ⊆ X, and the topology τY = { G ∩ Y : G ∈ τ }, called the induced topology on Y. This is the smallest topology on Y making the identity mapping x 7→ x continuous. A partial function f :⊆ X → Z with dom(f) = Y is continuous iff f : Y → Z is continuous.

For two topological spaces (Xi, τi) (i = 1, 2), we define the product topology on their product X1 × X2: this is the topology defined by the subbase consisting of all sets G1 × X2 and all sets X1 × G2 with Gi ∈ τi. The product topology is the smallest topology making the projection functions (x, y) 7→ x, (x, y) 7→ y continuous. Given topological spaces X, Y, Z, we call a two-argument function f : X × Y → Z continuous if it is continuous as a function from X × Y to Z. The product topology is defined similarly over the product ∏_{i∈I} Xi of an arbitrary number of spaces, indexed by some index set I. We say that a function is (τ1, . . . , τn, µ)-continuous if it is (τ1 × · · · × τn, µ)-continuous.

Examples A.2.
1. The space R × R with the product topology has the usual topology of the Euclidean plane.
2. Let X be a set with the discrete topology, and Xω the set of infinite sequences with elements from X, with the product topology. A base of this topology is provided by all sets of the form uXω where u ∈ X∗. The elements of this base are closed as well as open. When X = {0, 1} then this topology is the usual topology of infinite binary sequences.

A real function f : X1 → R is called continuous if it is (τ1, τR)-continuous. It is called lower semicontinuous if it is (τ1, τ<R)-continuous. The definition of upper semicontinuity is similar. Clearly, f is continuous if and only if it is both lower and upper semicontinuous. The requirement of lower semicontinuity of f is that for each r ∈ R, the set { x : f(x) > r } is open. This can be seen to be equivalent to the requirement that the single set { (x, r) : f(x) > r } is open. It is easy to see that the supremum of any set of lower semicontinuous functions is lower semicontinuous.


Let (X, τ) be a topological space, and B a subset of X. An open cover of B is a family of open sets whose union contains B. A subset K of X is said to be compact if every open cover of K has a finite subcover. Compact sets have many important properties: for example, a continuous function over a compact set is bounded.

Example A.3.
1. Every finite discrete space is compact. An infinite discrete space X = (X, τ) is not compact, but it can be turned into a compact space X̄ by adding a new element called ∞: let X̄ = X ∪ {∞}, and τ̄ = τ ∪ { X̄ \ A : A ⊂ X compact }. More generally, this simple operation can be performed with every space that is locally compact, that is, each of whose points has a compact neighborhood.
2. In a finite-dimensional Euclidean space, every bounded closed set is compact.
3. It is known that if (Xi)i∈I is a family of compact spaces then their direct product is also compact.

A subset K of X is said to be sequentially compact if every sequence in K has a convergent subsequence with limit in K. The space is locally compact if every point has a compact neighborhood.

A.2. Metric spaces. In our examples for metric spaces, and later in our treatment of the space of probability measures, we refer to [2]. A metric space is given by a set X and a distance function d : X × X → R+ satisfying the triangle inequality d(x, z) ≤ d(x, y) + d(y, z) and also the property that d(x, y) = 0 implies x = y. For r ∈ R+, the sets

B(x, r) = { y : d(x, y) < r },   B̄(x, r) = { y : d(x, y) ≤ r }

are called the open and closed balls with radius r and center x. A metric space is also a topological space, with the base that is the set of all open balls. Over this space, the distance function d(x, y) is obviously continuous. Each metric space is a Hausdorff space; moreover, it has the following stronger property: for every pair of different points x, y there is a continuous function f : X → R with f(x) ≠ f(y). (To see this, take f(z) = d(x, z).) This is called the T3 property. A metric space is bounded when d(x, y) has an upper bound on X. A topological space is called metrizable if its topology can be derived from some metric space.

Notation. For an arbitrary set A and point x let

d(x, A) = inf_{y∈A} d(x, y),
Aε = { x : d(x, A) < ε }.   (A.1)

Examples A.4.
1. The real line with the distance d(x, y) = |x − y| is a metric space. The topology of this space is the usual topology τR of the real line.
2. The space R × R with the Euclidean distance gives the same topology as the product topology of R × R.
3. An arbitrary set X with the distance d(x, y) = 1 for all pairs x, y of different elements is a metric space that induces the discrete topology on X.
4. Let X be a bounded metric space, and let Y = Xω be the set of infinite sequences x = (x1, x2, . . . ) with distance function dω(x, y) = ∑_i 2^{−i} d(xi, yi). The topology of this space is the same as the product topology defined in Example A.2.2.
5. Let X be a metric space, and let Y = Xω be the set of infinite bounded sequences x = (x1, x2, . . . ) with distance function d(x, y) = sup_i d(xi, yi).
6. Let X be a metric space, and let C(X) be the set of bounded continuous functions over X with distance function d′(f, g) = sup_x d(f(x), g(x)). A special case is C[0; 1] where the interval [0; 1] of real numbers has the usual metric.
7. Let l2 be the set of infinite sequences x = (x1, x2, . . . ) of real numbers with the property that ∑_i x_i² < ∞. The metric is given by the distance d(x, y) = (∑_i |xi − yi|²)^{1/2}.

A topological space has the first countability property if each point has a countable base of neighborhoods. Every metric space has the first countability property, since we can restrict ourselves to balls with rational radius. Given a topological space (X, τ) and a sequence x = (x1, x2, . . . ) of elements of X, we say that x converges to a point y if for every neighborhood G of y there is a k such that for all m > k we have xm ∈ G. We will write y = limn→∞ xn. It is easy to show that if the spaces (Xi, τi) (i = 1, 2) have the first countability property then a function f : X1 → X2 is continuous if and only if for every convergent sequence (xn) we have f(limn xn) = limn f(xn). A topological space has the second countability property if the whole space has a countable base. For example, the space R has the second countability property for all three topologies τR, τ<R, τ>R. Indeed, we also get a base if, instead of taking all intervals, we only take intervals with rational endpoints. On the other hand, the metric space of Example A.4.5 does not have the second countability property. In a topological space (X, τ), a set B of points is called dense at a point x if it intersects every neighborhood of x. It is called everywhere dense, or dense, if it is dense at every point. A metric space is called separable if it has a countable everywhere dense subset. This property holds if and only if the space, as a topological space, has the second countability property.

Example A.5. In Example A.4.6, for X = [0; 1], we can choose as our everywhere dense set the set of all polynomials with rational coefficients, or alternatively, the set of all piecewise linear functions whose graph has finitely many nodes at rational points. ♦


Let (X, d) be a metric space, and a = (a1, a2, . . . ) an infinite sequence; a is a Cauchy sequence if for every ε > 0 there is an N such that d(ai, aj) < ε for all i, j > N. A metric space is called complete if every Cauchy sequence in it has a limit. It is well known that every metric space can be embedded (as an everywhere dense subspace) into a complete space.

It is easy to see that in a metric space, every closed set is a Gδ set (and every open set is an Fσ set).

Example A.6. Consider the set D[0; 1] of functions over [0; 1] that are right continuous and have left limits everywhere. The book [2] introduces two different metrics for them: the Skorohod metric d and the d0 metric. In both metrics, two functions are close if a slight monotonic continuous deformation of the coordinate makes them uniformly close. But in the d0 metric, the slope of the deformation must be close to 1. It is shown that the two metrics give rise to the same topology; however, the space with metric d is not complete, while the space with metric d0 is. ♦


Let (X, d) be a metric space. It can be shown that a subset K of X is compact if and only if it is sequentially compact. Also, K is compact if and only if it is closed and, for every ε, there is a finite set of ε-balls (balls of radius ε) covering it.

We will develop the theory of randomness over separable complete metric spaces. This is a wide class of spaces encompassing most spaces of practical interest. The theory would be simpler if we restricted it to compact or locally compact spaces; however, some important spaces, like C[0; 1] (the set of continuous functions over the interval [0; 1], with the maximum difference as their distance), are not locally compact.

Given a function f : X → Y between metric spaces and β > 0, let Lipβ(X, Y) denote the set of functions (called the Lipschitz(β) functions, or simply Lipschitz functions) satisfying

dY(f(x), f(y)) ≤ β dX(x, y).   (A.2)

All these functions are uniformly continuous. Let Lip(X) = Lip(X, R) = ⋃_β Lipβ be the set of real Lipschitz functions over X.

APPENDIX B. MEASURES

For a survey of measure theory, see for example [16].

B.1. Set algebras. A (Boolean set) algebra is a set of subsets of some set X closed under intersection and complement (and then, of course, under union). It is a σ-algebra if it is also closed under countable intersection (and then, of course, under countable union). A semialgebra is a set L of subsets of some set X closed under intersection, with the property that the complement of every element of L is the disjoint union of a finite number of elements of L. If L is a semialgebra then the set of finite unions of elements of L is an algebra.

Examples B.1.
1. The set L1 of left-closed intervals of the line (including intervals of the form (−∞; a)) is a semialgebra.
2. The set L2 of all intervals of the line (which can be open, closed, left-closed or right-closed) is a semialgebra.
3. In the set {0, 1}ω of infinite 0-1 sequences, the set L3 of all subsets of the form u{0, 1}ω with u ∈ {0, 1}∗ is a semialgebra.
4. The σ-algebra B generated by L1 is the same as the one generated by L2, and is also the same as the one generated by the set of all open sets: it is called the family of Borel sets of the line. The Borel sets of the extended real line R̄ are defined similarly.
5. Given σ-algebras A, B in sets X, Y, the product σ-algebra A × B in the space X × Y is the one generated by all elements A × Y and X × B for A ∈ A and B ∈ B.

♦

B.2. Measures. A measurable space is a pair (X, S) where S is a σ-algebra of sets of X. A measure on a measurable space (X, S) is a function µ : S → R̄+ that is σ-additive: this means that for every countable family A1, A2, . . . of disjoint elements of S we have µ(⋃_i Ai) = ∑_i µ(Ai). A measure µ is σ-finite if the whole space is the union of a countable set of subsets whose measure is finite. It is finite if µ(X) < ∞. It is a probability measure if µ(X) = 1.

It is important to understand how a measure can be defined in practice. Algebras are generally simpler to grasp constructively than σ-algebras; semialgebras are yet simpler. Suppose that µ is defined over a semialgebra L and is additive. Then it can always be uniquely extended to an additive function over the algebra generated by L. The following is an important theorem of measure theory.


Proposition B.2. Suppose that a nonnegative set function defined over a semialgebra L is σ-additive. Then it can be extended uniquely to the σ-algebra generated by L.

Examples B.3.
1. Let x be a point and let µ(A) = 1 if x ∈ A and 0 otherwise. In this case, we say that µ is concentrated on the point x.
2. Consider the line R, and the semialgebra L1 defined in Example B.1.1. Let f : R → R be a monotonic real function. We define a set function over L1 as follows. Let [ai; bi) (i = 1, . . . , n) be a set of disjoint left-closed intervals. Then µ(⋃_i [ai; bi)) = ∑_i f(bi) − f(ai). It is easy to see that µ is additive. It is σ-additive if and only if f is left-continuous.
3. Let B = {0, 1}, and consider the set Bω of infinite 0-1 sequences, and the semialgebra L3 of Example B.1.3. Let µ : B∗ → R+ be a function. Let us write µ(uBω) = µ(u) for all u ∈ B∗. Then it can be shown that the following conditions are equivalent: µ is σ-additive over L3; it is additive over L3; the equation µ(u) = µ(u0) + µ(u1) holds for all u ∈ B∗.
4. The nonnegative linear combination of any finite number of measures is also a measure. In this way, it is easy to construct arbitrary measures concentrated on a finite number of points.
5. Given two measure spaces (X, A, µ) and (Y, B, ν) it is possible to define the product measure µ × ν over the measurable space (X × Y, A × B). The definition is required to satisfy µ × ν(A × B) = µ(A) × ν(B), and is determined uniquely by this condition. If ν is a probability measure then, of course, µ(A) = µ × ν(A × Y).

Remark B.4. Example B.3.3 shows a particularly attractive way to define measures. Keep splitting the values µ(u) in an arbitrary way into µ(u0) and µ(u1), and the resulting values on the semialgebra define a measure. Example B.3.2 is less attractive, since in the process of defining µ on all intervals, keeping track only of finite additivity, we may end up with a monotonic function that is not left-continuous, and thus with a measure that is not σ-additive. In the subsection on probability measures over a metric space, we will find that even on the real line, there is a way to define measures in a step-by-step manner, checking only for consistency along the way. ♦
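A minimal sketch of this splitting procedure (function names ours): define µ on the cylinder sets u{0,1}ω by repeatedly dividing the mass of u between u0 and u1; the consistency condition of Example B.3.3 then holds by construction, so no further checking is needed.

    from fractions import Fraction

    def mu(u, split=Fraction(1, 3)):
        # mu(u) for the cylinder set u{0,1}^omega: start with mass 1 at the
        # root and pass the fraction `split` to child 0, the rest to child 1.
        m = Fraction(1)
        for bit in u:
            m *= split if bit == "0" else 1 - split
        return m

    # The condition mu(u) = mu(u0) + mu(u1) of Example B.3.3 holds everywhere:
    for u in ["", "0", "1", "01", "110"]:
        assert mu(u) == mu(u + "0") + mu(u + "1")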

A probability space is a triple (X, S, P) where (X, S) is a measurable space and P is a probability measure over it.

Let (Xi, Si) (i = 1, 2) be measurable spaces, and let f : X1 → X2 be a mapping. Then f is called measurable if and only if for each element B of S2, its inverse image f−1(B) is in S1. If µ1 is a measure over (X1, S1) then µ2 defined by µ2(A) = µ1(f−1(A)) is a measure over X2, called the measure induced by f.

B.3. Integral. A measurable function f : X → R is called a step function if its range is finite. The set of step functions is closed with respect to linear combinations and also with respect to the operations ∧, ∨. Such a set of functions is called a Riesz space.

Given a step function f which takes values xi on sets Ai, and a finite measure µ, we define

µ(f) = µf = ∫ f dµ = ∫ f(x) µ(dx) = ∑_i xi µ(Ai).

This is a linear positive functional on the set of step functions. Moreover, it can be shown that it is continuous on monotonic sequences: if fi ↘ 0 then µfi ↘ 0. The converse can also be shown: let µ be a linear positive functional on step functions that is continuous on monotonic sequences. Then the set function µ(A) = µ(1A) is a finite measure.

Proposition B.5. Let E be any Riesz space of functions with the property that 1 ∈ E. Let µ be a positive linear functional on E continuous on monotonic sequences, with µ1 = 1. The functional µ can be extended to the set E+ of monotonic limits of nonnegative elements of E, by continuity. In the case when E is the set of all step functions, the set E+ is the set of all nonnegative measurable functions.

Let us fix a finite measure µ over a measurable space (X, S). A measurable function f is called integrable with respect to µ if µf+ < ∞ and µf− < ∞, where f+ and f− are the positive and negative parts of f. In this case, we define µf = µf+ − µf−. The set of integrable functions is a Riesz space, and the positive linear functional µ on it is continuous with respect to monotonic sequences. The continuity over monotonic sequences also implies the following bounded convergence theorem.

Proposition B.6. Suppose that functions fn are integrable and |fn| < g for some integrable function g. Then f = limn fn is integrable and µf = limn µfn.

Two measurable functions f, g are called equivalent with respect to µ if µ|f − g| = 0. For two-dimensional integration, the following theorem holds.

Proposition B.7. Suppose that the function f(·, ·) is integrable over the space (X × Y, A × B, µ × ν). Then for µ-almost all x, the function f(x, ·) is integrable over (Y, B, ν), and the function x 7→ νy f(x, y) is integrable over (X, A, µ) with (µ × ν)f = µx νy f.

B.4. Density. Let µ, ν be two measures over the same measurable space. We say that ν is absolutely continuous with respect to µ, or that µ dominates ν, if for each set A, µ(A) = 0 implies ν(A) = 0. For finite ν, it can be proved that this condition is equivalent to the condition that for every ε > 0 there is a δ > 0 such that µ(A) < δ implies ν(A) < ε. Every nonnegative integrable function f defines a new measure ν via the formula ν(A) = µ(f · 1A). This measure ν is absolutely continuous with respect to µ. The Radon-Nikodym theorem says that the converse is also true.

Proposition B.8 (Radon-Nikodym theorem). If ν is dominated by µ then there is a nonnegative integrable function f such that ν(A) = µ(f · 1A) for all measurable sets A. The function f is defined uniquely to within equivalence with respect to µ.

The function f of the Radon-Nikodym theorem above is called the density of ν with respect to µ. We will denote it by

f(x) = ν(dx)/µ(dx) = dν/dµ.

The following theorem is also standard.

Proposition B.9.
(a) Let µ, ν, η be measures such that η is absolutely continuous with respect to µ and µ is absolutely continuous with respect to ν. Then the "chain rule" holds:

dη/dν = (dη/dµ)(dµ/dν).   (B.1)

(b) If ν(dx)/µ(dx) > 0 for all x then µ is also absolutely continuous with respect to ν and µ(dx)/ν(dx) = (ν(dx)/µ(dx))^{−1}.


Let µ, ν be two measures; then both are dominated by some measure η (for example by η = µ + ν). Let their densities with respect to η be f and g. Then we define the total variation distance of the two measures as

D(µ, ν) = η(|f − g|).

It is independent of the dominating measure η.
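Over a countable space one may take the dominating measure η to be the counting measure, so the densities are just the mass functions; a minimal sketch (names ours):

    def tv_distance(mu, nu):
        # D(mu, nu) = sum_x |mu(x) - nu(x)|, with the counting measure as eta.
        support = set(mu) | set(nu)
        return sum(abs(mu.get(x, 0) - nu.get(x, 0)) for x in support)

    # Supports overlap in one point; disjoint supports would give 2 (Example B.10).
    print(tv_distance({0: 0.5, 1: 0.5}, {1: 0.5, 2: 0.5}))   # 1.0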

Example B.10. Suppose that the space X can be partitioned into disjoint sets A, B such that ν(A) = µ(B) = 0. Then D(µ, ν) = µ(A) + ν(B) = µ(X) + ν(X). ♦

B.5. Random transitions. Let (X, A), (Y, B) be two measurable spaces (defined in Subsection B.2). We follow the definition given in [16]. Suppose that a family of probability measures Λ = { λx : x ∈ X } on B is given. We call it a probability kernel (or Markov kernel, or conditional distribution) if the map x 7→ λxB is measurable for each B ∈ B. When X, Y are finite sets then λ is a Markov transition matrix. The following theorem shows that λ assigns a joint distribution over the space (X × Y, A × B) to each input distribution µ.

Proposition B.11. For each nonnegative A × B-measurable function f over X × Y,
1. the function y 7→ f(x, y) is B-measurable for each fixed x;
2. x 7→ λ^y_x f(x, y) is A-measurable;
3. the integral f 7→ µx λ^y_x f(x, y) defines a measure on A × B.

According to this proposition, given a probability kernel Λ, to each measure µ over A there corresponds a measure over A × B. We will denote its marginal over B as

Λ∗µ.   (B.2)

For every measurable function g(y) over Y, we can define the measurable function f(x) = λxg = λ^y_x g(y); we write

f = Λg.   (B.3)

The operator Λ is linear, and monotone with Λ1 = 1. By these definitions, we have

µ(Λg) = (Λ∗µ)g.   (B.4)

Example B.12. Let h : X → Y be a measurable function, and let λx be the measure δh(x) concentrated on the point h(x). This kernel, denoted Λh, is, in fact, a deterministic transition, and we have Λhg = g ∘ h. In this case, we will simplify the notation as follows: h∗µ = Λ∗_h µ.

♦

B.6. Probability measures over a metric space. We follow the exposition of [2]. Whenever we deal with probability measures on a metric space, we will assume that our metric space is complete and separable (Polish). Let X = (X, d) be a complete separable metric space. It gives rise to a measurable space, where the measurable sets are the Borel sets of X. It can be shown that, if A is a Borel set and µ is a finite measure, then there are sets F ⊆ A ⊆ G where F is an Fσ set, G is a Gδ set, and µ(F) = µ(G). Let B be a base of open sets closed under intersections. Then it can be shown that µ is determined by its values on elements of B. The following proposition then follows essentially from Proposition B.2.

Proposition B.13. Let B∗ be the set algebra generated by the above base B, and let µ be any σ-additive set function on B∗ with µ(X) = 1. Then µ can be extended uniquely to a probability measure.

We say that a set A is a continuity set of measure µ if µ(∂A) = 0: the boundary of A has measure 0.


B.6.1. Weak topology. Let M(X) be the set of probability measures on the metric space X, and let δx be the probability measure concentrated on point x. Let xn be a sequence of points converging to a point x but with xn ≠ x. We would like to say that δxn converges to δx. But the total variation distance D(δxn, δx) is 2 for all n. This suggests that the total variation distance is not generally the best way to compare probability measures over a metric space. We say that a sequence of probability measures µn over a metric space (X, d) weakly converges to measure µ if for all bounded continuous real functions f over X we have µnf → µf. This topology of weak convergence (M, τw) can be defined using a number of different subbases. The one used in the original definition is the subbase consisting of all sets of the form

Af,c = { µ : µf < c }

for bounded continuous functions f and real numbers c. We also get a subbase (see for example [16]) if we restrict ourselves to the set Lip(X) of Lipschitz functions defined in (A.2). Another possible subbase giving rise to the same topology consists of all sets of the form

BG,c = { µ : µ(G) > c }   (B.5)

for open sets G and real numbers c. Let us find some countable subbases. Since the space X is separable, there is a sequence U1, U2, . . . of open sets that forms a base. We can restrict the subbase of the space of measures to those sets BG,c where G is the union of a finite number of base elements Ui and c is rational. Thus, the space (M, τw) itself has the second countability property.

It is more convenient to define a countable subbase using bounded continuous functions f, since µ 7→ µf is continuous on such functions, while µ 7→ µU is typically not continuous when U is an open set. Let F0 be the set of functions introduced before (2.1), and let F1 be the set of functions f with the property that f is the minimum of a finite number of elements of F0. Note that each element f of F1 is bounded between 0 and 1, and from its definition, we can compute a bound β such that f ∈ Lipβ.

Proposition B.14. The following conditions are equivalent:
1. µn weakly converges to µ.
2. µnf → µf for all f ∈ F1.
3. For every Borel set A that is a continuity set of µ, we have µn(A) → µ(A).
4. For every closed set F, lim supn µn(F) ≤ µ(F).
5. For every open set G, lim infn µn(G) ≥ µ(G).

As a subbase

σM   (B.6)

for M(X), we choose the sets { µ : µf < r } and { µ : µf > r } for all f ∈ F1 and r ∈ Q. Let E be the set of functions introduced in (2.1). It is a Riesz space as defined in Subsection B.3. A reasoning combining Propositions B.2 and B.5 gives the following.

Proposition B.15. Suppose that a positive linear functional µ with µ1 = 1 is defined on E that is continuous with respect to monotone convergence. Then µ can be extended uniquely to a probability measure over X with µf = ∫ f(x) µ(dx) for all f ∈ E.


B.6.2. Prokhorov distance. The definition of measures in the style of Proposition B.15 is not sufficiently constructive. Consider a gradual definition of the measure µ, extending it to more and more elements of E, while keeping the positivity and linearity property. It can happen that the function µ we end up with in the limit is not continuous with respect to monotone convergence. Let us therefore metrize the space of measures: then an arbitrary measure can be defined as the limit of a Cauchy sequence of simple measures.

One metric that generates the topology of weak convergence is the Prokhorov distance p(µ, ν): the infimum of all those ε for which, for all Borel sets A, we have (using the notation (A.1))

µ(A) ≤ ν(Aε) + ε.

It can be shown that this is a distance and that it generates the weak topology. The following result helps visualize this distance:

Proposition B.16 (Coupling Theorem, see [21]). Let µ, ν be two probability measures over a complete separable metric space X with p(µ, ν) ≤ ε. Then there is a probability measure P on the space X × X with marginals µ and ν such that for a pair of random variables (ξ, η) having joint distribution P we have

P{ d(ξ, η) > ε } ≤ ε.

Since this topology has the second countability property, the metric space defined by the distance p(·, ·) is separable. This can also be seen directly. Let S be a countable everywhere dense set of points in X. Consider the set M0(X) of those probability measures that are concentrated on finitely many points of S and assign rational values to them. It can be shown that M0(X) is everywhere dense in the metric space (M(X), p); so, this space is separable. It can also be shown that (M(X), p) is complete. Thus, a measure can be given as the limit of a sequence of elements µ1, µ2, . . . of M0(X), where p(µi, µi+1) < 2^{−i}.

The definition of the Prokhorov distance quantifies over all Borel sets. However, in an important simple case, it can be handled efficiently.

Proposition B.17. Assume that measure ν is concentrated on a finite set of points S ⊂ X. Then the condition p(ν, µ) < ε is equivalent to the finite set of conditions

µ(Aε) > ν(A) − ε   (B.7)

for all A ⊂ S.
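When µ is also finitely supported, the finitely many conditions (B.7) can be checked by brute force over all subsets A of S; the sketch below (all names ours) does exactly that:

    from itertools import combinations

    def prokhorov_leq(nu, mu, eps, d):
        # Check (B.7): mu(A^eps) > nu(A) - eps for every nonempty A contained
        # in the finite support S of nu; measures are {point: mass} dicts.
        S = list(nu)
        for k in range(1, len(S) + 1):
            for A in combinations(S, k):
                mu_A_eps = sum(m for y, m in mu.items()
                               if any(d(x, y) < eps for x in A))
                if not mu_A_eps > sum(nu[x] for x in A) - eps:
                    return False
        return True

    d = lambda a, b: abs(a - b)
    print(prokhorov_leq({0: 0.5, 1: 0.5}, {0.05: 0.5, 1.0: 0.5}, 0.1, d))  # True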

B.6.3. Relative compactness. A set Π of measures in (M(X), p) is called relatively compact if every sequence of elements of Π contains a convergent subsequence. Relative compactness is an important property for proving convergence of measures. It has a useful characterization. A set Π of measures is called tight if for every ε there is a compact set K such that µ(K) > 1 − ε for all µ in Π. Prokhorov's theorem states (under our assumptions of the separability and completeness of (X, d)) that a set of measures is relatively compact if and only if it is tight, and if and only if its closure is compact in (M(X), p). In particular, the following fact is known.

Proposition B.18. The space (M(X), p) of measures is compact if and only if the space (X, d) is compact.

So, if (X, d) is not compact then the set of measures is not compact. But still, each measure µ is "almost" concentrated on a compact set. Indeed, the one-element set {µ} is compact and therefore, by Prokhorov's theorem, tight. Tightness says that for each ε a mass of size 1 − ε of µ is concentrated on some compact set.


APPENDIX C. COMPUTABLE ANALYSIS

If for some finite or infinite sequences x, y, z, w we have z = wxy, then we write w ⊑ z (w is a prefix of z) and x ◁ z. For integers, we will use the tupling functions

⟨i, j⟩ = (1/2)(i + j)(i + j + 1) + j,   ⟨n1, . . . , nk+1⟩ = ⟨⟨n1, . . . , nk⟩, nk+1⟩,

with inverses πk_i(n). Unless said otherwise, the alphabet Σ is always assumed to contain the symbols 0 and 1. After [24], let us define the wrapping function ι : Σ∗ → Σ∗ by

ι(a1a2 · · · an) = 110a10a20 · · · an011.   (C.1)

Note that

|ι(x)| = (2|x| + 5) ∨ 6.   (C.2)

For strings x, xi ∈ Σ∗, p, pi ∈ Σω, and k ≥ 1, appropriate tupling functions ⟨x1, . . . , xk⟩, ⟨x, p⟩, ⟨p, x⟩, etc. can be defined with the help of ⟨·, ·⟩ and ι(·).

C.1. Notation and representation. The concepts of notation and representation, as defined in [24], allow us to transfer computability properties from some standard spaces to many others. Given a countable set C, a notation of C is a surjective partial mapping δ :⊆ N → C. Given some finite alphabet Σ ⊇ {0, 1} and an arbitrary set S, a representation of S is a surjective mapping χ :⊆ Σω → S. A naming system is a notation or a representation. Here are some standard naming systems:
1. id, the identity over Σ∗ or Σω.
2. νN, νZ, νQ for the set of natural numbers, integers and rational numbers.
3. Cf : Σω → 2^N, the characteristic function representation of sets of natural numbers, defined by Cf(p) = { i : p(i) = 1 }.
4. En : Σω → 2^N, the enumeration representation of sets of natural numbers, defined by En(p) = { n ∈ N : 110^{n+1}11 ◁ p }.
5. For ∆ ⊆ Σ, En∆ : Σω → 2^{∆∗}, the enumeration representation of subsets of ∆∗, defined by En∆(p) = { w ∈ ∆∗ : ι(w) ◁ p }.

One can define names for all computable functions between spaces that are Cartesian products of terms of the kind Σ∗ and Σω. Then, the notion of computability can be transferred to other spaces as follows. Let δi : Yi → Xi, i = 1, 0, be naming systems of the spaces Xi. Let f :⊆ X1 → X0, g :⊆ Y1 → Y0. We say that function g realizes function f if

f(δ1(y)) = δ0(g(y))   (C.3)

holds for all y for which the left-hand side is defined. Realization of multi-argument functions is defined similarly. We say that a function f : X1 × X2 → X0 is (δ1, δ2, δ0)-computable if there is a computable function g :⊆ Y1 × Y2 → Y0 realizing it. In this case, a name for f is naturally derived from a name of g.²

For representations ξ, η, we write ξ ≤ η if there is a computable function f :⊆ Σω → Σω with ξ(x) = η(f(x)). In words, we say that ξ is reducible to η, or that f reduces (translates) ξ to η. There is a similar definition of reduction for notations. We write ξ ≡ η if ξ ≤ η and η ≤ ξ.

C.2. Constructive topological space.

²Any function g realizing f via (C.3) automatically has a certain extensionality property: if δ1(y) = δ1(y′) then g(y) = g(y′).


C.2.1. Definitions. Section A gives a review of topological concepts. A constructive topological space X = (X, σ, ν) is a topological space over a set X with a subbase σ effectively given as a list σ = {ν(1), ν(2), . . . }, and having the T0 property (thus, every point is determined uniquely by the subset of elements of σ containing it). By definition, a constructive topological space satisfies the second countability axiom.³ We obtain a base

σ∩

of the space X by taking all possible finite intersections of elements of σ. It is easy to produce an effective enumeration for σ∩ from ν. We will denote this enumeration by ν∩.

The product operation is defined over constructive topological spaces in the natural way.

Examples C.1.
1. A discrete topological space, where the underlying set is finite or countably infinite, with a fixed enumeration.
2. The real line, choosing the base to be the open intervals with rational endpoints, with their natural enumeration. Product spaces can be formed to give the Euclidean plane a constructive topology.
3. The real line R, with the subbase σ>R defined as the set of all open intervals (−∞; b) with rational endpoints b. The subbase σ<R, defined similarly, leads to another topology. These two topologies differ from each other and from the usual one on the real line, and they are not Hausdorff.
4. Let X be a set with a constructive discrete topology, and Xω the set of infinite sequences with elements from X, with the product topology: a natural enumerated basis is also easy to define.

Due to the T0 property, every point in our space is determined uniquely by the set of open sets containing it. Thus, there is a representation γX of X defined as follows. We say that γX(p) = x if EnΣ(p) = { w : x ∈ ν(w) }. If γX(p) = x then we say that the infinite sequence p is a complete name of x: it encodes all names of all subbase elements containing x. From now on, we will call γX the complete standard representation of the space X.⁴

C.2.2. Constructive open sets, computable functions. In a constructive topological space X = (X, σ, ν), a set G ⊆ X is called r.e. open in a set B if there is an r.e. set E with G = ⋃_{w∈E} ν∩(w) ∩ B. It is r.e. open if it is r.e. open in X. In the special kind of spaces in which randomness has been developed until now, constructive open sets have a nice characterization:

Proposition C.2. Assume that the space X = (X, σ, ν) has the form Y1 × · · · × Yn where each Yi is either Σ∗ or Σω. Then a set G is r.e. open iff it is open and the set { (w1, . . . , wn) : ⋂_i ν(wi) ⊂ G } is recursively enumerable.

Proof. The proof is not difficult, but it relies on the discrete nature of the space Σ∗ and on the fact that the space Σω is compact and its base consists of sets that are open and closed at the same time.

³A constructive topological space is an effective topological space as defined in [24], but, for simplicity, we require the notation ν to be a total function.

⁴The book [24] denotes γX as δ′X instead. We use γX only, dispensing with the notion of a "computable" topological space.


It is easy to see that if two sets are r.e. open then so is their union. The above remark implies that in a space having the form Y1 × · · · × Yn, where each Yi is either Σ∗ or Σω, the intersection of two r.e. open sets is also r.e. open. We will see that this statement holds, more generally, in all computable metric spaces.

Let Xi = (Xi, σi, νi) be constructive topological spaces, and let f :⊆ X1 → X0 be a function. As we know, f is continuous iff the inverse image f−1(G) of each open set G is open. Computability is an effective version of continuity: it requires that the inverse image of subbase elements is uniformly constructively open. More precisely, f :⊆ X1 → X0 is computable if the set

⋃_{V ∈ σ∩0} f−1(V) × {V}

is an r.e. open subset of X1 × σ∩0. Here the base σ∩0 of X0 is treated as a discrete constructive topological space, with its natural enumeration. This definition depends on the enumerations ν1, ν0. The following theorem (taken from [24]) shows that this computability coincides with the one obtained by transfer via the representations γXi.

Proposition C.3. For i = 0, 1, let Xi = (Xi, σi, νi) be constructive topological spaces. Then a function f :⊆ X1 → X0 is computable iff it is (γX1, γX0)-computable for the representations γXi defined above.

As a name of a computable function, we can use the name of the enumeration algorithm derived from the definition of computability, or the name derivable using this representation theorem.

Remark C.4. As in Proposition C.2, it would be nice to have the following statement, at least for total functions: "Function f : X1 → X0 is computable iff the set { (v, w) : ν∩1(w) ⊂ f−1[ν0(v)] } is recursively enumerable." But such a characterization seems to require compactness and possibly more. ♦

Let us call two spaces X1 and X0 effectively homeomorphic if there are computable maps between them that are inverses of each other. In the special case when X0 = X1, we say that the enumerations of subbases ν0, ν1 are equivalent if the identity mapping is an effective homeomorphism. This means that there are recursively enumerable sets F, G such that

ν1(v) = ⋃_{(v,w)∈F} ν∩0(w) for all v,   ν0(w) = ⋃_{(w,v)∈G} ν∩1(v) for all w.

Lower semicomputability is a constructive version of lower semicontinuity. Let X = (X, σ, ν) be a constructive topological space. A function f :⊆ X → R̄+ is called lower semicomputable if the set { (x, r) : f(x) > r } is r.e. open. Let Y = (R̄+, σ<R, ν<R) be the effective topological space obtained as in Example C.1.3, in which ν<R is an enumeration of all open intervals of the form (r; ∞] with rational r. It can be seen that f is lower semicomputable iff it is (ν, ν<R)-computable.

C.2.3. Computable elements and sequences. Let U = ({0}, σ0, ν0) be the one-element space turned into a trivial constructive topological space, and let X = (X, σ, ν) be another constructive topological space. We say that an element x ∈ X is computable if the function 0 7→ x is computable. It is easy to see that this is equivalent to the requirement that the set { u : x ∈ ν(u) } is recursively enumerable. Let Xj = (Xj, σj, νj), for j = 0, 1, be constructive topological spaces. A sequence fi, i = 1, 2, . . . , of functions with fi : X1 → X0 is a computable sequence of computable functions if (i, x) 7→ fi(x) is a computable function. Using the s-m-n theorem of recursion theory, it is easy to see that this statement is equivalent to the statement that there is a recursive function computing from each i a name for the computable function fi. The proof of the following statement is not difficult.

Proposition C.5. Let Xi = (Xi, σi, νi) for i = 1, 2, 0 be constructive topological spaces, let f : X1 × X2 → X0, and assume that x1 ∈ X1 is a computable element.
1. If f is computable then x2 7→ f(x1, x2) is also computable.
2. If X0 = R̄+ and f is lower semicomputable then x2 7→ f(x1, x2) is also lower semicomputable.

Example C.6. Example A.5 is a computable metric space, with either of the two (equivalent)choices for an enumerated dense set. ♦

Similarly to the definition of a computable sequence of computable functions in C.2.3,we can define the notion of a computable sequence of bounded computable functions, orthe computable sequence fi of computable Lipschitz functions: the bound and the Lipschitzconstant of fi are required to be computable from i. The following statement shows, in aneffective form, that a function is lower semicomputable if and only if it is the supremum ofa computable sequence of computable functions.

Proposition C.7. Let X be a computable metric space. There is a computable mapping that toeach name of a nonnegative lower semicomputable function f assigns a name of a computablesequence of computable bounded Lipschitz functions fi whose supremum is f .

Proof sketch. Show that f is the supremum of a computable sequence of computable functions c_i 1_{B(u_i, r_i)} where u_i ∈ D and c_i, r_i > 0 are rational. Clearly, each indicator function 1_{B(u_i, r_i)} is the supremum of a computable sequence of computable functions g_{i,j}. We have f = sup_n f_n where f_n = max_{i≤n} c_i g_{i,n}. It is easy to see that the bounds on the functions f_n are computable from n and that they all are in Lip_{β_n} for a β_n that is computable from n.
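The functions g_{i,j} in this proof admit a direct rendering. The sketch below is a hedged illustration with invented names, under the assumption that d is the metric of the space; it uses the standard hat-function choice g_j(x) = min(1, max(0, j·(r − d(u, x)))), which is j-Lipschitz and increases in j to the indicator of the open ball B(u, r).

```python
def hat(d, u, r, j):
    """The j-Lipschitz function g_j(x) = min(1, max(0, j*(r - d(u, x)))),
    which increases in j to the indicator of the open ball B(u, r)."""
    return lambda x: min(1.0, max(0.0, j * (r - d(u, x))))

def f_n(d, balls, n):
    """f_n = max_{i <= n} c_i * g_{i,n}, where `balls` is a list of
    triples (u_i, r_i, c_i). The bound max_{i<=n} c_i and a Lipschitz
    constant (about n * max_{i<=n} c_i) are computable from n."""
    hats = [(c, hat(d, u, r, n)) for (u, r, c) in balls[: n + 1]]
    return lambda x: max((c * g(x) for c, g in hats), default=0.0)
```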

The following is also worth noting.

Proposition C.8. In a computable metric space, the intersection of two r.e. open sets is r.e. open.

Proof. Let β = { B(x, r) : x ∈ D, r ∈ Q } be a basis of our space. For a pair (x, r) with x ∈ D, r ∈ Q, let
$$\Gamma(x, r) = \{\, (y, s) : y \in D,\ s \in Q,\ d(x, y) + s < r \,\}.$$


If U is a r.e. open set, then there is a r.e. set $S_U \subset D \times Q$ with $U = \bigcup_{(x,r)\in S_U} B(x, r)$. Let $S'_U = \bigcup\{\, \Gamma(x, r) : (x, r) \in S_U \,\}$; then we have $U = \bigcup_{(x,r)\in S'_U} B(x, r)$. Now it is easy to see that
$$U \cap V = \bigcup_{(x,r)\in S'_U \cap S'_V} B(x, r).$$
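The proof is easy to animate. In the hedged sketch below, all names are mine and finite lists stand in for the r.e. enumerations (over which the real construction would dovetail): gamma shrinks each ball to the balls lying well inside it, and a pair occurring in both shrunken sets witnesses a basic ball contained in both U and V.

```python
from itertools import product

def gamma(d, D, rationals, x, r):
    """All pairs (y, s) with d(x, y) + s < r; each such ball B(y, s)
    is contained in B(x, r) by the triangle inequality."""
    return {(y, s) for y, s in product(D, rationals) if d(x, y) + s < r}

def intersection_cover(d, D, rationals, S_U, S_V):
    """Compute the shrunken index sets S'_U and S'_V and intersect them:
    the union of B(y, s) over the result equals U intersected with V
    (in the limit of full enumerations)."""
    def shrink(S):
        out = set()
        for x, r in S:
            out |= gamma(d, D, rationals, x, r)
        return out
    return shrink(S_U) & shrink(S_V)
```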

REFERENCES

[1] Yevgeniy A. Asarin, Individual random signals: a complexity approach, Ph.D. thesis, Moscow State University, Moscow, Russia, 1988. In Russian.
[2] Patrick Billingsley, Convergence of probability measures, Wiley, 1968. First edition; second edition 1999.
[3] Vasco Brattka, Computability over topological structures, in: Computability and Models (S. Barry Cooper and Sergey S. Goncharov, eds.), Kluwer Academic Publishers, New York, 2003, pp. 93–136.
[4] Vasco Brattka and Gero Presser, Computability on subsets of metric spaces, Theoretical Computer Science 305 (2003), 43–76.
[5] Gregory J. Chaitin, A theory of program size formally identical to information theory, J. Assoc. Comput. Mach. 22 (1975), 329–340.
[6] Péter Gács, On the symmetry of algorithmic information, Soviet Math. Dokl. 15 (1974), 1477–1480.
[7] Péter Gács, Exact expressions for some randomness tests, Z. Math. Log. Grdl. M. 26 (1980), 385–394. Short version: Springer Lecture Notes in Computer Science 67 (1979), 124–131.
[8] Péter Gács, On the relation between descriptional complexity and algorithmic probability, Theoretical Computer Science 22 (1983), 71–93. Short version: Proc. 22nd IEEE FOCS (1981), 296–303.
[9] Péter Gács, The Boltzmann entropy and randomness tests (extended abstract), in: Proceedings of the Workshop on Physics and Computation, IEEE Computer Society Press, 1994, pp. 209–216.
[10] Peter Hertling and Klaus Weihrauch, Randomness spaces, in: Proc. of ICALP'98, Lecture Notes in Computer Science, vol. 1443, Springer, 1998, pp. 796–807.
[11] Leonid A. Levin, On the notion of a random sequence, Soviet Math. Dokl. 14 (1973), no. 5, 1413–1416.
[12] Leonid A. Levin, Uniform tests of randomness, Soviet Math. Dokl. 17 (1976), no. 2, 337–340.
[13] Leonid A. Levin, Randomness conservation inequalities: Information and independence in mathematical theories, Information and Control 61 (1984), no. 1, 15–37.
[14] Ming Li and Paul M. B. Vitányi, Introduction to Kolmogorov complexity and its applications, second edition, Springer Verlag, New York, 1997.
[15] Per Martin-Löf, The definition of random sequences, Information and Control 9 (1966), 602–619.
[16] David Pollard, A user's guide to measure-theoretic probability, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, U.K., 2001.
[17] Claus Peter Schnorr, Process complexity and effective random tests, J. Comput. Syst. Sci. 7 (1973), 376–388.
[18] Raymond J. Solomonoff, A formal theory of inductive inference, Part I, Information and Control 7 (1964), 1–22.
[19] Raymond J. Solomonoff, Complexity-based induction systems: Comparisons and convergence theorems, IEEE Transactions on Information Theory IT-24 (1978), no. 4, 422–432.
[20] Edwin H. Spanier, Algebraic topology, McGraw-Hill, New York, 1971.
[21] Volker Strassen, The existence of probability measures with given marginals, Annals of Mathematical Statistics 36 (1965), 423–439.
[22] Flemming Topsøe, Topology and measure, Lecture Notes in Mathematics, vol. 133, Springer Verlag, Berlin, 1970.
[23] Volodimir G. Vovk and V. V. Vyugin, On the empirical validity of the Bayesian method, Journal of the Royal Statistical Society B 55 (1993), no. 1, 253–266.
[24] Klaus Weihrauch, Computable analysis, Springer, 2000.
[25] Wojciech H. Zurek, Algorithmic randomness and physical entropy, Physical Review A 40 (1989), no. 8, 4731–4751.
[26] Alexander K. Zvonkin and Leonid A. Levin, The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms, Russian Math. Surveys 25 (1970), no. 6, 83–124.

BOSTON UNIVERSITY

E-mail address: [email protected]