Perturbation method for determining the group of invariance of hierarchical models

On a perturbation method for determining the group of invariance of hierarchical models

Tomonari SEI (1), Satoshi AOKI (2), Akimichi TAKEMURA (1)

(1) University of Tokyo, (2) Kagoshima University

Dec. 16, 2008


Table of contents

1 Introduction

2 Group of invariance of hierarchical models

3 Wreath product

4 Main theorem

5 Perturbation method (for a proof of theorems)

6 Summary

For details: Sei, Aoki & Takemura (2008) arXiv:0808.2725v1

Introduction

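Consider a 2 × 3 contingency table:

\[ \begin{pmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \end{pmatrix} \]

Here $p_{ij}$ is a cell probability and $x_{ij}$ is the corresponding observation.

Consider the independence model $p_{ij} = a_i b_j$.

The set of sufficient statistics is $\{x_{i+}\}$ and $\{x_{+j}\}$, where $x_{i+} = \sum_j x_{ij}$, etc.

A minimal Markov basis for this model contains, for example, the moves

\[ M_1 = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \end{pmatrix}, \qquad M_2 = \begin{pmatrix} 1 & 0 & -1 \\ -1 & 0 & 1 \end{pmatrix}. \]

$M_2$ is obtained from $M_1$ by permuting columns 2 and 3.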
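
The two properties used above can be checked mechanically. The following is a minimal Python sketch: the function names and the observed table `x` are illustrative assumptions, not part of the slides. It confirms that the moves have zero row and column sums, that swapping columns 2 and 3 of M1 gives M2, and that adding a move leaves the sufficient statistics unchanged.

```python
# Two moves for the 2 x 3 independence model, written as lists of rows.
M1 = [[1, -1, 0], [-1, 1, 0]]
M2 = [[1, 0, -1], [-1, 0, 1]]

def row_sums(table):
    return [sum(row) for row in table]

def col_sums(table):
    return [sum(col) for col in zip(*table)]

def swap_columns(table, a, b):
    """Return a copy of `table` with columns a and b (0-based) exchanged."""
    out = [list(row) for row in table]
    for row in out:
        row[a], row[b] = row[b], row[a]
    return out

def add(table, move):
    """Entrywise sum of a table and a move."""
    return [[x + m for x, m in zip(r, s)] for r, s in zip(table, move)]

x = [[2, 1, 0], [1, 1, 3]]   # hypothetical observed counts
y = add(x, M1)               # another table in the same fiber
```

Here `y` has the same row sums and column sums as `x`, which is what makes M1 a valid move.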

Similarly, for an I × J table, a Markov basis is obtained from

\[ M_1 = \begin{pmatrix} 1 & -1 & 0 & \cdots \\ -1 & 1 & 0 & \cdots \\ 0 & 0 & 0 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} \]

and its images under permutations of rows and columns (i.e. the basis is generated by a single move if symmetry is taken into account).

However, current algorithms for computing Gröbner bases and Markov bases do not exploit this symmetry.

For a larger table, the algorithms simply take longer.

If I = J, we can also consider the permutation of axes $x_{ij} \leftrightarrow x_{ji}$.

Example: three-way tables with fixed two-dimensional marginals (no-three-factor-interaction model)

Table: Number of elements in the unique minimal Markov basis (MB) and the reduced Gröbner basis (GB) for 3 × 3 × K tables, K ≤ 7.

K                       3     4      5       6       7
# unique minimal MB    81   450   2670   10665   31815
# reduced GB          110   622   3240   12085   34790
# orbits in the MB      4     5      6       6       6

The number of orbits remains 6 for all K ≥ 5 (Aoki and Takemura (2003)): "Markov complexity".

Our problem is... [Aoki & Takemura (2008) J. Symb. Comp.]

Determine all permutations of cells that preserve a given configuration determining a toric ideal, in particular for the configurations associated with hierarchical models of contingency tables.

We call the set of allowable permutations the group of invariance.

An interesting example is the Sudoku invariance group (explained later).

Is our definition of the group of invariance useful?

Why look at the largest symmetry?

A configuration usually has an obvious symmetry. However, it is difficult to prove that it is indeed the largest one (i.e. that there is no more symmetry than the obvious symmetry).

Mathematically it makes sense to look at the largest symmetry.

The results of this talk show that our definition is useful, leading to mathematically meaningful questions and results.

Group of invariance of hierarchical models

Definition of hierarchical models (1/2)

m: the number of factors of the contingency table.

$\mathcal{I}_j = [I_j] = \{1, \ldots, I_j\}$: the set of levels of the j-th factor (j = 1, ..., m).

$\mathcal{I} = \prod_{j=1}^{m} \mathcal{I}_j$: the set of cells.

$\Delta$: an abstract simplicial complex on $[m] = \{1, \ldots, m\}$.

$\mathrm{red}\,\Delta$: the set of maximal simplices of $\Delta$ with respect to inclusion.

For example, if m = 3 and $\Delta = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1,2\}, \{2,3\}\}$, then $\mathrm{red}\,\Delta = \{\{1,2\}, \{2,3\}\}$.

[Figure: the simplicial complex on vertices 1, 2, 3.]

Definition of hierarchical models (2/2)

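Definition.

The hierarchical model specified by $\Delta$ is the set of probability distributions $(p_i)_{i \in \mathcal{I}}$ written in the log-linear form

\[ \log p_i = \sum_{D \in \mathrm{red}\,\Delta} \phi_D(i_D), \qquad i_D = (i_j)_{j \in D}. \]

For example, let m = 3 and $\mathrm{red}\,\Delta = \{\{1,2\}, \{2,3\}\}$.

[Figure: the simplicial complex on vertices 1, 2, 3.]

Then the hierarchical model is, putting $(i, j, k) = (i_1, i_2, i_3)$,

\[ \log p_{ijk} = \alpha_{ij} + \beta_{jk}. \]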

Definition of group of invariance (1/2)

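Let $S_{\mathcal{I}}$ be the set of all permutations of the cells $\mathcal{I}$.

Any $g \in S_{\mathcal{I}}$ acts on the linear space $\mathbb{Q}^{\mathcal{I}}$ (i.e. on tables) by

\[ (g\theta)_i = \theta_{g^{-1}(i)} \quad \text{for } \theta \in \mathbb{Q}^{\mathcal{I}}. \]

Now recall that the hierarchical model is $\log p_i = \sum_{D \in \mathrm{red}\,\Delta} \phi_D(i_D)$.

Define a linear subspace of $\mathbb{Q}^{\mathcal{I}}$, the range of parameters, by

\[ r(\Delta) = \Bigl\{ \sum_{D \in \mathrm{red}\,\Delta} \phi_D(i_D) \Bigm| \phi_D \in \mathbb{Q}^{\mathcal{I}_D} \Bigr\}. \]

(This is the same as the row space of the configuration.)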
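
To make the range of parameters concrete, the sketch below builds the spanning set of r(Δ) formed by the indicator tables of the events {i_D = v} and computes its rank over Q. It is a minimal illustration under assumptions not in the slides: factors and levels are 0-based, the level counts (2, 3, 2) are hypothetical, and `generators` and `rank` are names introduced here.

```python
from fractions import Fraction
from itertools import product

def generators(levels, facets):
    """Indicator tables of {i_D = v}, one per facet D and value v,
    written as vectors indexed by the cells. These span r(Delta)."""
    cells = list(product(*[range(n) for n in levels]))
    gens = []
    for D in facets:
        idx = sorted(D)
        for v in product(*[range(levels[j]) for j in idx]):
            gens.append([1 if tuple(c[j] for j in idx) == v else 0 for c in cells])
    return gens

def rank(rows):
    """Rank over Q by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                f = rows[r][col] / rows[rk][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

# red(Delta) = {{1,2},{2,3}} in 0-based form, with hypothetical levels 2 x 3 x 2.
dim_conditional = rank(generators((2, 3, 2), [{0, 1}, {1, 2}]))
# Two-way independence model on a 2 x 3 table.
dim_independence = rank(generators((2, 3), [{0}, {1}]))
```

For the model log p_ijk = α_ij + β_jk the rank is I_2 (I_1 + I_3 - 1) = 9, and for 2 × 3 independence it is 2 + 3 - 1 = 4.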

Definition of group of invariance (2/2)

Definition.

The group of invariance $G_{r(\Delta)}$ is the setwise stabilizer of $r(\Delta)$:

\[ G_{r(\Delta)} := \{ g \in S_{\mathcal{I}} \mid g(r(\Delta)) = r(\Delta) \}. \]

Remark that the subspace $r(\Delta)$ is dual to the kernel of the sufficient statistics in $\mathbb{Q}^{\mathcal{I}}$, which contains Markov bases.

The group of invariance of the kernel is the same as $G_{r(\Delta)}$.

It is a linear-algebraic notion.

It can be easily seen that $G_{r(\Delta)}$ also acts on the set of fibers.

Some known results

For the completely independent model ($\mathrm{red}\,\Delta = \{\{1\}, \ldots, \{m\}\}$), the group of invariance was derived by [Aoki & Takemura, JSC 2008].

In their paper, the cases of the I × J × K no-three-factor-interaction model and the Hardy-Weinberg model (not hierarchical) were also solved.

[Bailey et al. Proc. London Math. Soc. 1983] studied a related concept for design of experiments.

Example

Consider the two-way independence model

\[ \log p_{ij} = \alpha_i + \beta_j. \]

If $|\mathcal{I}_1| \neq |\mathcal{I}_2|$, then the group of invariance is known to be $G_{r(\Delta)} = S_{\mathcal{I}_1} \times S_{\mathcal{I}_2}$, the direct product of the row and column permutations.
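
For very small tables this statement can be confirmed by brute force. The sketch below is an illustration only, feasible just for a handful of cells; the function names are assumptions introduced here. It enumerates every cell permutation g and keeps those that map the row and column indicator tables, which span r(Δ), back into r(Δ). Since g is an invertible linear map, g(r(Δ)) ⊂ r(Δ) already implies equality.

```python
from itertools import permutations, product

def in_range(theta, I, J):
    """theta (dict cell -> value) lies in r(Delta) iff it is additive,
    i.e. theta_ij - theta_i0 - theta_0j + theta_00 = 0 for all i, j."""
    return all(theta[i, j] - theta[i, 0] - theta[0, j] + theta[0, 0] == 0
               for i in range(I) for j in range(J))

def invariance_group(I, J):
    """All cell permutations preserving r(Delta) for I x J independence."""
    cells = list(product(range(I), range(J)))
    # Row and column indicator tables span r(Delta).
    gens = [{c: int(c[0] == i) for c in cells} for i in range(I)]
    gens += [{c: int(c[1] == j) for c in cells} for j in range(J)]
    group = []
    for image in permutations(cells):
        g = dict(zip(cells, image))          # g sends cell c to g[c]
        # (g theta)_{g(c)} = theta_c, matching (g theta)_i = theta_{g^{-1}(i)}.
        if all(in_range({g[c]: th[c] for c in cells}, I, J) for th in gens):
            group.append(g)
    return group

size_2x3 = len(invariance_group(2, 3))   # |S_2 x S_3| = 12
size_2x2 = len(invariance_group(2, 2))   # 8: the transposition of axes is also allowed
```

The 2 × 2 case shows why the condition on the level counts matters: with equal numbers of rows and columns the group is strictly larger than the direct product, which has only 4 elements.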

Wreath product

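The wreath product arises naturally in our problem.

In general, the wreath product of two group actions (G, X) and (H, Y) is formally defined by

\[ H \,\mathrm{wr}\, G = G \times H^X \quad (\text{acting on } X \times Y), \]

where $H^X$ denotes the set of all functions from X to H.

Let us consider a table

\[ \begin{pmatrix} A & B & C \\ D & E & F \end{pmatrix}. \]

Let $S_{\mathcal{I}_1}$ and $S_{\mathcal{I}_2}$ be the permutation groups of the rows and the columns, respectively.

Then the wreath product $S_{\mathcal{I}_2} \,\mathrm{wr}\, S_{\mathcal{I}_1}$ is generated by

permutations of rows, and

permutations of columns within each row, for example

\[ \begin{pmatrix} A & B & C \\ F & D & E \end{pmatrix}. \]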
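
The action on the 2 × 3 table can be written out directly. In the minimal sketch below an element is a row permutation together with one column permutation per row; the encoding and the function names are assumptions made for illustration. It reproduces the example table and counts the distinct cell permutations obtained, 2! · (3!)^2 = 72.

```python
from itertools import permutations, product

rows, cols = range(2), range(3)
table = {(0, 0): "A", (0, 1): "B", (0, 2): "C",
         (1, 0): "D", (1, 1): "E", (1, 2): "F"}

def wreath_element(row_perm, col_perms):
    """Cell map (i, j) -> (row_perm[i], col_perms[i][j]): the column
    permutation may differ from row to row."""
    return {(i, j): (row_perm[i], col_perms[i][j]) for i in rows for j in cols}

def act(g, t):
    """The entry at cell c moves to cell g(c)."""
    return {g[c]: t[c] for c in t}

def as_rows(t):
    return [[t[i, j] for j in cols] for i in rows]

# Keep row 1 fixed and shift row 2 cyclically, as in the example above.
g = wreath_element((0, 1), [(0, 1, 2), (1, 2, 0)])
shifted = as_rows(act(g, table))     # [['A', 'B', 'C'], ['F', 'D', 'E']]

# All elements of S_{I_2} wr S_{I_1}, stored as sorted cell maps.
elements = {tuple(sorted(wreath_element(rp, cps).items()))
            for rp in permutations(rows)
            for cps in product(permutations(cols), repeat=len(rows))}
```

The group has 72 elements, compared with the 6! = 720 permutations of all six cells.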

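Why does the wreath product arise?

Now we explain why the wreath product arises in our problem.

Consider the row-effect-only model: $\log p_{ij} = \alpha_i$.

More visually, consider the table

\[ \begin{pmatrix} \alpha_1 & \alpha_1 & \alpha_1 \\ \alpha_2 & \alpha_2 & \alpha_2 \end{pmatrix}. \]

Then the wreath product $S_{\mathcal{I}_2} \,\mathrm{wr}\, S_{\mathcal{I}_1}$ preserves the range of the row-effect-only model.

Similarly, $S_{\mathcal{I}_1} \,\mathrm{wr}\, S_{\mathcal{I}_2}$ preserves the range of the column-effect-only model.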

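Since the range of the independence model is the vector sum of the ranges of the above two models,

\[ (S_{\mathcal{I}_1} \,\mathrm{wr}\, S_{\mathcal{I}_2}) \cap (S_{\mathcal{I}_2} \,\mathrm{wr}\, S_{\mathcal{I}_1}) \]

is a subgroup of $G_{r(\Delta)}$ for the two-way independence model.

Equality holds if $|\mathcal{I}_1| \neq |\mathcal{I}_2|$. Furthermore,

\[ (S_{\mathcal{I}_1} \,\mathrm{wr}\, S_{\mathcal{I}_2}) \cap (S_{\mathcal{I}_2} \,\mathrm{wr}\, S_{\mathcal{I}_1}) = S_{\mathcal{I}_1} \times S_{\mathcal{I}_2} \]

(even when $|\mathcal{I}_1| = |\mathcal{I}_2|$).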
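
The identity for the intersection can be checked by enumeration on small tables. The sketch below is a brute-force illustration; the function name and dictionary keys are assumptions. It uses the characterization that g lies in S_{I_2} wr S_{I_1} exactly when the row of g(i, j) depends on i only, and symmetrically for columns.

```python
from itertools import permutations, product

def classify(I, J):
    """Count the cell permutations of an I x J table lying in each wreath
    product and in their intersection."""
    cells = list(product(range(I), range(J)))
    counts = {"S_I2 wr S_I1": 0, "S_I1 wr S_I2": 0, "intersection": 0}
    for image in permutations(cells):
        g = dict(zip(cells, image))
        # Rows are mapped to rows: the row of g(i, j) depends on i only.
        keeps_rows = all(len({g[i, j][0] for j in range(J)}) == 1 for i in range(I))
        # Columns are mapped to columns: the column of g(i, j) depends on j only.
        keeps_cols = all(len({g[i, j][1] for i in range(I)}) == 1 for j in range(J))
        counts["S_I2 wr S_I1"] += keeps_rows
        counts["S_I1 wr S_I2"] += keeps_cols
        counts["intersection"] += keeps_rows and keeps_cols
    return counts

counts_2x3 = classify(2, 3)   # 72, 48 and 12 = |S_2 x S_3|
counts_2x2 = classify(2, 2)   # 8, 8 and 4 = |S_2 x S_2|
```

In the 2 × 2 case the intersection has 4 elements, so it is the direct product and does not contain the transposition of axes.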

Main theorem

Main Theorem 1

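Theorem (Main Theorem 1).

Under a weak assumption on $\mathcal{I}$, the group of invariance is given by

\[ G_{r(\Delta)} = \bigcap_{D \in \mathrm{red}\,\Delta} \bigl( S_{\mathcal{I}_{D^C}} \,\mathrm{wr}\, S_{\mathcal{I}_D} \bigr), \]

where $S_{\mathcal{I}_D}$ is the set of all permutations of $\mathcal{I}_D$.

Our assumption is that

the $|\mathcal{I}_D|$ are mutually distinct, and

$I_j = |\mathcal{I}_{\{j\}}| > 2$ except for at most one $j \in \{1, \ldots, m\}$.

Even if the assumption fails, the inclusion $G_{r(\Delta)} \supset \bigcap_{D} ( S_{\mathcal{I}_{D^C}} \,\mathrm{wr}\, S_{\mathcal{I}_D} )$ is always true.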

Generalized wreath product

We want a more explicit expression for the right-hand side.

The wreath product is generalized to an indexed set of group actions $(G_\rho, X_\rho)_{\rho \in P}$, where P is a poset.

The generalized wreath product is defined by

\[ \prod_{\rho \in P} (G_\rho)^{X_{A(\rho)}} \]

[Wells 1976], where $A(\rho)$ denotes the ancestor set of $\rho$.

The generalized version is useful for our problem, because it allows us to sample a random element of $G_{r(\Delta)}$.

Main Theorem 2

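Theorem (Main Theorem 2; can be deduced from Bailey et al. 1983).

The intersection is rewritten as a generalized wreath product:

\[ \bigcap_{D \in \mathrm{red}\,\Delta} \bigl( S_{\mathcal{I}_{D^C}} \,\mathrm{wr}\, S_{\mathcal{I}_D} \bigr) = \prod_{\rho \in P} (S_{\mathcal{I}_\rho})^{\mathcal{I}_{A(\rho)}}, \]

where the poset P is defined as follows.

For each $i \in [m]$, define $(\mathrm{red}\,\Delta)(i) := \{ D \in \mathrm{red}\,\Delta \mid D \ni i \}$.

We write $i \sim j$ if $(\mathrm{red}\,\Delta)(i) = (\mathrm{red}\,\Delta)(j)$.

$P = [m]/\!\sim$, the quotient space.

Define a partial order $\rho \le \rho'$ in P if $(\mathrm{red}\,\Delta)(\rho) \subset (\mathrm{red}\,\Delta)(\rho')$.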
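
The construction of the poset is purely combinatorial, so it can be sketched in a few lines of Python. In this sketch the function name is an assumption, and the ancestor set A(ρ) is read as the set of classes strictly above ρ, an interpretation that agrees with the group stated for the Sudoku model later in the talk.

```python
def wreath_poset(facets):
    """Map each class of P = [m]/~ (a tuple of factors) to the sorted list
    of factors in its ancestor set A(rho)."""
    factors = sorted(set().union(*facets))
    # (red Delta)(i): indices of the facets that contain factor i.
    star = {i: frozenset(k for k, D in enumerate(facets) if i in D)
            for i in factors}
    # i ~ j iff the two factors lie in exactly the same facets.
    classes = {}
    for i in factors:
        classes.setdefault(star[i], []).append(i)
    # rho < rho' iff (red Delta)(rho) is a proper subset of (red Delta)(rho').
    ancestors = {}
    for s, members in classes.items():
        above = [other for t, other in classes.items() if s < t]
        ancestors[tuple(members)] = sorted(j for cls in above for j in cls)
    return ancestors

example = wreath_poset([{1, 4, 5}, {2, 5, 6}, {3, 4, 6}])
sudoku = wreath_poset([{1, 2, 5}, {1, 3, 5}, {3, 4, 5}, {1, 2, 3, 4}])
```

For the Sudoku facets the only nonempty ancestor sets are A(2) = {1} and A(4) = {3}: rows are permuted within a band and columns within a stack.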

Example

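Let m = 6 and $\mathrm{red}\,\Delta = \{\{1,4,5\}, \{2,5,6\}, \{3,4,6\}\}$.

[Figure: the simplicial complex on vertices 1, ..., 6.]

From Theorems 1 and 2, the group of invariance is

\[ G_{r(\Delta)} = (S_{\mathcal{I}_1})^{\mathcal{I}_{\{4,5\}}} \times (S_{\mathcal{I}_2})^{\mathcal{I}_{\{5,6\}}} \times (S_{\mathcal{I}_3})^{\mathcal{I}_{\{4,6\}}} \times S_{\mathcal{I}_4} \times S_{\mathcal{I}_5} \times S_{\mathcal{I}_6}. \]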
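
The product form suggests how a random element might be drawn: for each factor ρ and each value of the ancestor coordinates, pick one permutation of the levels of ρ. The sketch below is a minimal illustration under assumptions not in the slides: the level counts are hypothetical, levels are 0-based, and the action on a cell follows the usual convention that coordinate ρ is permuted by the permutation selected by the ancestor coordinates. The helper `in_intersection` checks bijectivity and the wreath-product condition for every facet.

```python
import random
from itertools import product

levels = {1: 3, 2: 3, 3: 3, 4: 2, 5: 2, 6: 2}      # hypothetical level counts
facets = [{1, 4, 5}, {2, 5, 6}, {3, 4, 6}]
ancestors = {1: [4, 5], 2: [5, 6], 3: [4, 6], 4: [], 5: [], 6: []}

def random_element(rng):
    """One uniform permutation of I_rho for each rho and each value of i_{A(rho)}."""
    g = {}
    for rho, anc in ancestors.items():
        g[rho] = {}
        for key in product(*[range(levels[a]) for a in anc]):
            perm = list(range(levels[rho]))
            rng.shuffle(perm)
            g[rho][key] = perm
    return g

def act_on_cell(g, cell):
    """cell: dict factor -> level. Coordinate rho is permuted by the
    permutation chosen by the ancestor coordinates of the cell."""
    return {rho: g[rho][tuple(cell[a] for a in ancestors[rho])][cell[rho]]
            for rho in ancestors}

def in_intersection(g):
    """g is a bijection and, for each facet D, g(i)_D depends on i_D only."""
    factors = sorted(levels)
    cells = [dict(zip(factors, v))
             for v in product(*[range(levels[f]) for f in factors])]
    images = [act_on_cell(g, c) for c in cells]
    if len({tuple(im[f] for f in factors) for im in images}) != len(cells):
        return False
    for D in facets:
        idx, seen = sorted(D), {}
        for c, im in zip(cells, images):
            key = tuple(c[j] for j in idx)
            val = tuple(im[j] for j in idx)
            if seen.setdefault(key, val) != val:
                return False
    return True

g = random_element(random.Random(0))
```

Because the group is a product of independent components as a set, drawing each component uniformly gives a uniform element of the group.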

Sudoku contingency table (1/4)

Sudoku is a puzzle on a 9 × 9 table.

Each row, column and 3 × 3 block contains the 9 digits exactly once.

For a Sudoku solution, we define a $3^4 \times 9$ table $(x_{ijklc})$ by

$x_{ijklc} = 1$ if the cell in row j of band i and column l of stack k contains the digit c, and 0 otherwise.

i: band, j: row, k: stack, l: column, c: color.

Sudoku contingency table (2/4)

We add an additional factor (dimension) corresponding to the color.

For example, if c = 3 then we put a single frequency on the third level of the factor c.

Then Sudoku solutions are in one-to-one correspondence with the tables $(x_{ijklc})$.

Adding an additional dimension corresponds to "Lawrence lifting".

Ordinary Lawrence lifting has just two levels in the additional dimension. Our case is a higher-order Lawrence lifting with 9 levels.

The present treatment of Sudoku is different from the approach found in David A. Cox's tutorial (2007) on the Gröbner basis approach to Sudoku. (He does not consider Lawrence lifting.)

Sudoku contingency table (3/4)

A Sudoku solution satisfies $x_{ij++c} = x_{i+k+c} = x_{++klc} = x_{ijkl+} = 1$.

This is a fiber of the model $\Delta$, where

\[ \mathrm{red}\,\Delta = \{\{1,2,5\}, \{1,3,5\}, \{3,4,5\}, \{1,2,3,4\}\}. \]

[Figure: the simplicial complex on vertices 1, ..., 5.]

We call it the Sudoku model.

Sudoku contingency table (4/4)

Unfortunately, the Sudoku model does not satisfy the assumption in our main theorem because

\[ |\mathcal{I}_{\{1,2,5\}}| = |\mathcal{I}_{\{1,3,5\}}| = |\mathcal{I}_{\{3,4,5\}}| = |\mathcal{I}_{\{1,2,3,4\}}| = 81. \]

But, by the main theorem, we can deduce that $G_{r(\Delta)}$ contains

\[ S_{\mathcal{I}_1} \times (S_{\mathcal{I}_2})^{\mathcal{I}_1} \times S_{\mathcal{I}_3} \times (S_{\mathcal{I}_4})^{\mathcal{I}_3} \times S_{\mathcal{I}_5}. \]

This group consists of

permutations of bands,

permutations of rows within each band,

permutations of stacks,

permutations of columns within each stack,

permutations of digits.

But the transposition $(i, j, k, l) \leftrightarrow (k, l, i, j)$ is not included here.
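
These operations can be tried out on an actual grid. The following Python sketch is an illustration only: the starting grid comes from a standard shifting pattern chosen here, not from the slides, and all function names are assumptions. It applies a random element of the group listed above and checks that the result is still a valid Sudoku solution; the transposition also preserves validity although it lies outside this group.

```python
import random

def base_grid():
    """A valid Sudoku solution built from a shifting pattern (illustrative choice)."""
    return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
            for r in range(9)]

def is_valid(grid):
    digits = set(range(1, 10))
    rows = all(set(row) == digits for row in grid)
    cols = all({grid[r][c] for r in range(9)} == digits for c in range(9))
    blocks = all({grid[3 * i + j][3 * k + l] for j in range(3) for l in range(3)} == digits
                 for i in range(3) for k in range(3))
    return rows and cols and blocks

def random_line_order(rng):
    """Permute the 3 bands (or stacks), then the 3 lines inside each of them."""
    bands = rng.sample(range(3), 3)
    return [3 * b + j for b in bands for j in rng.sample(range(3), 3)]

def random_symmetry(grid, rng):
    """Random band/row, stack/column and digit permutations applied to grid."""
    row_order = random_line_order(rng)
    col_order = random_line_order(rng)
    relabel = dict(zip(range(1, 10), rng.sample(range(1, 10), 9)))
    return [[relabel[grid[r][c]] for c in col_order] for r in row_order]

def transpose(grid):
    """The transposition (i, j, k, l) <-> (k, l, i, j)."""
    return [list(col) for col in zip(*grid)]

rng = random.Random(1)
grid = base_grid()
samples = [random_symmetry(grid, rng) for _ in range(20)]
```

Every grid in `samples` passes `is_valid`, and so does `transpose(grid)`.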

How about classical Latin squares?

\[ \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 1 & 2 \end{pmatrix} \]

(Just one block of Sudoku.)

By Lawrence lifting it corresponds to a 3 × 3 × 3 table.

The model is the no-three-factor-interaction model.

A Latin square is an element of the fiber with all marginals equal to 1.

Researchers are interested in "non-isomorphic" Latin squares, i.e., in orbits of obvious group actions.

Perturbation method (for a proof of theorems)

Outline of the proof (simplest case)

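Consider the 2 × 3 independence model again.

The group of invariance is $S_{\mathcal{I}_1} \times S_{\mathcal{I}_2}$, i.e. permutations of rows and of columns, respectively.

[Proof] Let

\[ \theta = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix} \in r(\Delta) \]

and $g \in G_{r(\Delta)}$.

Since $g\theta$ must be

\[ \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \]

we have $g \in S_{\mathcal{I}_2} \,\mathrm{wr}\, S_{\mathcal{I}_1}$.

Similarly, let

\[ \theta = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 1 & 2 \end{pmatrix} \in r(\Delta) \]

and $g \in G_{r(\Delta)}$.

Then we can show that $g \in S_{\mathcal{I}_1} \,\mathrm{wr}\, S_{\mathcal{I}_2}$.

As already mentioned, $(S_{\mathcal{I}_2} \,\mathrm{wr}\, S_{\mathcal{I}_1}) \cap (S_{\mathcal{I}_1} \,\mathrm{wr}\, S_{\mathcal{I}_2}) = S_{\mathcal{I}_1} \times S_{\mathcal{I}_2}$.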

Outline of the proof (not trivial case 1/2)

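Now we proceed to the 2 × 4 independence table.

Let

\[ \theta = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \]

Then is $g\theta$ equal to

\[ \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix} \ ? \]

No! Because $g\theta$ can be

\[ \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{pmatrix} \quad (\in \text{model}). \]

This case needs a perturbation method, which we now explain.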

Outline of the proof (not trivial case 2/2)

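Now we prove the 2 × 4 case by using the perturbation method.

Let

\[ \theta = \begin{pmatrix} 0 & 1 & 2 & 4 \\ 100 & 101 & 102 & 104 \end{pmatrix}. \]

Then $\theta \in r(\Delta)$, and $\theta$ is "generic".

Put $\phi := g\theta$. Then $\phi \in r(\Delta)$.

We can write $\phi_{ij} = a_{ij} + b_{ij}$, where $a_{ij} \in \{0, 100\}$ and $b_{ij} \in \{0, 1, 2, 4\}$.

Since $\phi_{11} + \phi_{22} - \phi_{12} - \phi_{21} = 0$, we have $a_{11} + a_{22} - a_{12} - a_{21} = 0$ and $b_{11} + b_{22} - b_{12} - b_{21} = 0$.

By careful consideration, we obtain $a_{11} = a_{12} = a_{13} = a_{14} \in \{0, 100\}$ and $a_{21} = a_{22} = a_{23} = a_{24} \in \{0, 100\}$. Hence $g \in S_4 \,\mathrm{wr}\, S_2$.

Similarly, from $b_{1j} = b_{2j}$, we have $g \in S_2 \,\mathrm{wr}\, S_4$ and so $g \in S_2 \times S_4$.

These observations are the perturbation method!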
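
The conclusion for this particular generic table can also be confirmed by exhaustive search. The sketch below is a brute-force check in Python, not the argument of the proof, and the names are assumptions. It places the eight entries of θ in every possible arrangement of a 2 × 4 table and keeps the arrangements that lie in r(Δ), i.e. those for which the difference between the two rows is constant across columns.

```python
from itertools import permutations

# Entries of the generic table theta.
values = [0, 1, 2, 4, 100, 101, 102, 104]

def in_range(table):
    """A 2 x 4 table (top row, bottom row) is additive iff top - bottom
    is the same in every column."""
    top, bottom = table
    return len({t - b for t, b in zip(top, bottom)}) == 1

# Since the entries are distinct, arrangements correspond to permutations g.
solutions = [(p[:4], p[4:]) for p in permutations(values)
             if in_range((p[:4], p[4:]))]

# The non-generic 0/1 table from the previous slide is additive as well.
degenerate_ok = in_range(((1, 1, 0, 0), (1, 1, 0, 0)))
```

Exactly 2! · 4! = 48 arrangements survive, and in each of them every column is a pair {k, 100 + k}, so g permutes rows and columns only.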

Outline of the proof

Preliminary 1 (difference operator)

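Let $\partial_j$ be the j-th difference operator defined by

\[ (\partial_j \theta)_i = \theta_{(i_1, \ldots, i_j, \ldots, i_m)} - \theta_{(i_1, \ldots, 1, \ldots, i_m)}. \]

For any $E \subset [m]$, define $\partial_E = \prod_{j \in E} \partial_j$.

Fact: Let $\eta_F(i)$ depend only on $i_F$. Then $\partial_E \eta_F = 0$ if $E \not\subset F$.

Fact: Let $\Delta' \subset \Delta$ be another simplicial complex. Then

\[ r(\Delta') = r(\Delta) \cap \Bigl( \bigcap_{E \in \Delta \setminus \Delta'} \ker(\partial_E) \Bigr). \]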
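
The first fact is easy to check numerically. The sketch below is a minimal Python illustration: levels are 0-based, so the reference level 1 of the slides becomes 0, and the table `eta` and the function names are assumptions. It implements ∂_j and ∂_E on tables stored as dictionaries.

```python
from itertools import product

def diff(theta, j):
    """(d_j theta)_i = theta_i - theta_{i with coordinate j reset to level 0}."""
    return {i: v - theta[i[:j] + (0,) + i[j + 1:]] for i, v in theta.items()}

def diff_set(theta, E):
    """d_E = product of d_j over j in E (the operators commute)."""
    for j in E:
        theta = diff(theta, j)
    return theta

levels = (2, 3, 2)                       # hypothetical level counts
cells = list(product(*[range(n) for n in levels]))

# eta depends only on the coordinates in F = {0, 1}.
eta = {i: 10 * i[0] + i[1] ** 2 + i[0] * i[1] for i in cells}

vanishes_outside = all(v == 0 for v in diff_set(eta, [2]).values())      # E = {2}
vanishes_mixed = all(v == 0 for v in diff_set(eta, [1, 2]).values())     # E = {1, 2}
survives_inside = any(v != 0 for v in diff_set(eta, [0, 1]).values())    # E = F
```

Both sets E that are not contained in F annihilate `eta`, while E = F leaves the interaction term i_0 i_1.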

Outline of the proof

Preliminary 2 (perturbation method)

Construct a generic table $\theta = \{\theta_i\}$,

\[ \theta_i = \sum_{D \in \mathrm{red}\,\Delta} \phi_D(i), \quad \text{where } \phi_D \text{ depends only on } i_D, \]

in such a way that the following condition holds:

Fix a (sufficiently large) positive integer b. If a quantity z satisfies $z = \sum_D \sum_i c_{D,i}\, \phi_D(i)$ with $c_{D,i} \in \{-b, \ldots, b\}$, then the coefficients $\{c_{D,i}\}$ are uniquely determined from z.

Perturbation lemma: There exists such a table $\theta$. (See the next slide.)

Outline of the proof

Lemma (Perturbation lemma).

1. Let n, b be positive integers. There exist n positive integers $(Y_l)_{l=1}^{n}$ such that the map

\[ \{-b, \ldots, b\}^n \ni (c_l)_{l=1}^{n} \mapsto \sum_{l=1}^{n} c_l Y_l \in \mathbb{Z} \]

is injective.

2. Furthermore, we can choose n vectors $Y^{(j)} = (Y^{(j)}_l)$ such that they span $\mathbb{Q}^n$ and each of them satisfies the above condition.

Proof. Use the Vandermonde determinant.
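
One explicit choice satisfying the first part of the lemma is Y_l = (2b + 1)^(l - 1), since a sum with coefficients in {-b, ..., b} is then a balanced expansion in base 2b + 1. This choice is a swapped-in construction for illustration, not the Vandermonde argument of the slides, and it does not address the second part of the lemma. The Python sketch below checks injectivity by enumeration; the function names are assumptions.

```python
from itertools import product

def weights(n, b):
    """Y_l = (2b + 1)^(l - 1) for l = 1, ..., n."""
    return [(2 * b + 1) ** l for l in range(n)]

def is_injective(Y, b):
    """Check that (c_l) -> sum c_l Y_l is injective on {-b, ..., b}^n."""
    sums = [sum(c * y for c, y in zip(cs, Y))
            for cs in product(range(-b, b + 1), repeat=len(Y))]
    return len(set(sums)) == len(sums)

good = is_injective(weights(3, 2), 2)    # Y = [1, 5, 25]
bad = is_injective([1, 2, 3], 2)         # fails: 1 + 2 = 3
```

The table with entries 0, 1, 2, 4 and offsets 0, 100 used in the 2 × 4 example follows the same idea of separating scales.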

Outline of the proof

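Now we prove Theorem 1: $G_{r(\Delta)} = \bigcap_{D \in \mathrm{red}\,\Delta} ( S_{\mathcal{I}_{D^C}} \,\mathrm{wr}\, S_{\mathcal{I}_D} )$.

We employ induction on $K = |\mathrm{red}\,\Delta|$. The case K = 1 is essentially the same as the row-factor-only model $\log p_{ij} = \alpha_i$ for two-way tables, and therefore we have $G_{r(\Delta)} = S_{\mathcal{I}_{D^C}} \,\mathrm{wr}\, S_{\mathcal{I}_D}$.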

Outline of the proof

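We next consider K ≥ 2 and choose $D \in \mathrm{red}\,\Delta$ such that $|\mathcal{I}_D| = \min_{F \in \mathrm{red}\,\Delta} |\mathcal{I}_F|$. Let $\theta$ be a generic element in $r(\Delta)$.

Fix $g \in G_{r(\Delta)}$. Then $g\theta \in r(\Delta)$.

Define a simplicial complex $\Delta'$ by $\mathrm{red}\,\Delta' = (\mathrm{red}\,\Delta) \setminus \{D\}$.

Let $E \in \Delta \setminus \Delta'$. Then $(\partial_E(g\theta))_i$ depends only on $i_D$ because the other terms vanish.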

Outline of the proof

However, the quantity $(\partial_E(g\theta))_i$ is a linear combination of $\{\phi_F(j)\}$. Its coefficients are uniquely determined by $(\partial_E(g\theta))_i$ (perturbation lemma).

The above two facts imply that $(\partial_E(g\theta))_i$ is a linear combination only of $\{\phi_D(j)\}$.

Now by the second part of the perturbation lemma, any table $\theta$ in $r(\Delta')$ is spanned by generic tables in $r(\Delta)$.

Therefore $\partial_E(g\theta) = 0$ and

\[ g\theta \in r(\Delta) \cap \Bigl( \bigcap_{E \in \Delta \setminus \Delta'} \ker(\partial_E) \Bigr) = r(\Delta'). \]

Outline of the proof

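Hence g maps $r(\Delta')$ into itself.

From the assumption of induction, we have

\[ g \in \bigcap_{F \in \mathrm{red}\,\Delta'} ( S_{\mathcal{I}_{F^C}} \,\mathrm{wr}\, S_{\mathcal{I}_F} ) = \bigcap_{F \in \mathrm{red}\,\Delta,\ F \neq D} ( S_{\mathcal{I}_{F^C}} \,\mathrm{wr}\, S_{\mathcal{I}_F} ). \]

It remains to prove $g \in S_{\mathcal{I}_{D^C}} \,\mathrm{wr}\, S_{\mathcal{I}_D}$, which is not too difficult.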

Summary

Summary and future works

We have proved that

The group of invariance is determined (Theorem 1).

The group is rewritten by generalized wreath product (Theorem 2).

It enables us to draw a random sample from the group of invariance.

(A subset of) Sudoku invariance group is automatically obtained.

Our future works are

Remove the assumption of Theorem 1.

In particular, give a natural conjecture for this problem.

Summary and future works

Suggestions from sudoku example

Sudoku suggests a new connection between Markov bases (Diaconis-Sturmfels school) and experimental design (Pistone-Wynn school) in algebraic statistics.

In experimental design, Gröbner bases have been used to clarify confounding and identification problems given a particular (typically non-regular) design.

A large part of the literature on experimental design looks at finding a good design for a given set of constraints (number of runs, number of treatments, etc.).

It is very similar to finding a good “sudoku puzzle”.

Summary and future works

But we saw that a Sudoku solution is an element of a particular fiber (with all marginals = 1) of a hierarchical model.

If we know the MB for the fiber, we can construct a Markov chain over the set of Sudoku solutions (a paper by Fontana and Rogantin in the Pistone volume).

For 2 × 2 × 2 × 2 (× 4) Sudoku, Hisayuki Hara just obtained the MB.

We can now walk around 2 × 2 × 2 × 2 Sudoku solutions!

Standard 3 × 3 × 3 × 3 Sudoku is a big challenge.

Bibliography

S. Aoki & A. Takemura (2003). Minimal basis for a connected Markov chain over 3 × 3 × K contingency tables with fixed two-dimensional marginals. Aust. N. Z. J. Stat., 45 (2), 229–249.

S. Aoki & A. Takemura (2008). The largest group of invariance for Markov bases and toric ideals. J. Symbolic Comput., 43 (5), 342–358.

R. A. Bailey, C. E. Praeger, C. A. Rowley & T. P. Speed (1983). Generalized wreath products of permutation groups. Proc. London Math. Soc., 47 (3), 69–82.

D. A. Cox (2007). Gröbner basis tutorial. Part II. A sampler from recent developments. (Available from Cox's homepage.)

R. Fontana & M.-P. Rogantin (2008). Indicator function and sudoku designs, in press.

S. L. Lauritzen (1996). Graphical Models, Oxford University Press, Oxford.

H. Monod & R. A. Bailey (1992). Pseudofactors: normal use to improve design and facilitate analysis, Appl. Statist., 41 (2), 317–336.

E. Russell & F. Jarvis (2006). Mathematics of Sudoku II, Preprint.

C. Wells (1976). Some applications of the wreath product construction, Amer. Math. Monthly, 83 (5), 317–338.

Thank you for your attention!
