Yedidia.pdf - MIT Center for Bits and Atoms

23
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms Jonathan S. Yedidia, William T. Freeman, and Yair Weiss TR-2002-35 August 2002 Abstract Important inference problems in statistical physics, computer vision, error-correcting coding theory, and artificial intelligence can all be reformulated as the computation of marginal probabilities on factor graphs. The belief propagation (BP) algorithm is an efficient way to solve these problems that is exact when the factor graph is a tree, but only approximate when the factor graph has cycles. We show that BP fixed points correspond to the stationary points of the Bethe ap- proximation to the free energy for a factor graph. We explain how to obtain region- based free energy approximations that improve the Bethe approximation, and corre- sponding generalized belief propagation (GBP) algorithms. We emphasize the conditions a free energy approximation must satisfy in order to be a “valid” approximation. We describe the relationship between four different methods that can be used to generate valid approximations: the “Bethe method,” the “junction graph method,” the “cluster variation method,” and the “region graph method.” The region graph method is the most general of these methods, and it subsumes all the other methods. Region graphs also provide the natural graphical setting for GBP algorithms. We explain how to obtain three different versions of GBP algorithms and show that their fixed points will always correspond to stationary points of the region graph approximation to the free energy. We also show that the region graph approxi- mation is exact when the region graph has no cycles. This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Transcript of Yedidia.pdf - MIT Center for Bits and Atoms

MITSUBISHI ELECTRIC RESEARCH LABORATORIEShttp://www.merl.com

Constructing Free Energy Approximationsand Generalized Belief Propagation

Algorithms

Jonathan S. Yedidia, William T. Freeman, and Yair Weiss

TR-2002-35 August 2002

Abstract

Important inference problems in statistical physics, computer vision, error-correctingcoding theory, and artificial intelligence can all be reformulated as the computation ofmarginal probabilities on factor graphs. The belief propagation (BP) algorithm is anefficient way to solve these problems that is exact when the factor graph is a tree, butonly approximate when the factor graph has cycles.

We show that BP fixed points correspond to the stationary points of the Bethe ap-proximation to the free energy for a factor graph. We explain how to obtain region-based free energy approximations that improve the Bethe approximation, and corre-sponding generalized belief propagation (GBP) algorithms.

We emphasize the conditions a free energy approximation must satisfy in order to bea “valid” approximation. We describe the relationship between four different methodsthat can be used to generate valid approximations: the “Bethe method,” the “junctiongraph method,” the “cluster variation method,” and the “region graph method.”

The region graph method is the most general of these methods, and it subsumes allthe other methods. Region graphs also provide the natural graphical setting for GBPalgorithms. We explain how to obtain three different versions of GBP algorithms andshow that their fixed points will always correspond to stationary points of the regiongraph approximation to the free energy. We also show that the region graph approxi-mation is exact when the region graph has no cycles.

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy inwhole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all suchwhole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric ResearchLaboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portionsof the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with paymentof fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Copyright c© Mitsubishi Electric Research Laboratories, Inc., 2002201 Broadway, Cambridge, Massachusetts 02139

Publication History:–

1. First printing, TR-2002-35, August 2002

1

ConstructingFreeEnergy ApproximationsandGeneralizedBelief PropagationAlgorithms

JonathanS.Yedidia�, William T. Freeman� , andYair Weiss �

Abstract— Important inferenceproblemsin statistical physics,computer vision, error-correcting coding theory, and artificial in-telligencecan all be reformulated asthe computation of marginalprobabilities on factor graphs. The belief propagation(BP) algo-rithm is an efficient way to solve theseproblemsthat is exactwhenthe factor graph is a tr ee,but only approximate when the factorgraph hascycles.

We show that BP fixed points correspond to the stationarypoints of the Betheapproximation to the fr eeenergy for a factorgraph. Weexplainhow to obtain region-basedfr eeenergy approx-imations that impr ove the Betheapproximation, and correspond-ing generalizedbelief propagation(GBP) algorithms.

Weemphasizethe conditionsa fr eeenergy approximation mustsatisfy in order to be a “v alid” approximation. We describetherelationship betweenfour differ ent methods that can be used togeneratevalid approximations: the “Bethe method,” the “junctiongraph method,” the “cluster variation method,” and the “r egiongraph method.”

The regiongraph method is the most generalof thesemethods,and it subsumesall the other methods.Regiongraphsalsoprovidethe natural graphical settingfor GBP algorithms. Weexplain howto obtain thr eediffer entversionsof GBP algorithms and show thattheir fixed points will alwayscorrespondto stationary points of theregiongraph approximation to the fr eeenergy. Wealsoshow thatthe region graph approximation is exact when the region graphhasno cycles.

I . INTRODUCTION

Problemsinvolving probabilistic inferenceusing graphicalmodelsare importantin a wide variety of disciplines,includ-ing statisticalphysics,signalprocessing,artificial intelligence,and digital communications[1], [2]. Message-passingalgo-rithmsareapracticalandpowerful wayto solvesuchproblems.The centrality of such problemsand the utility of message-passingalgorithmsfor solving themis an explanationfor thefactthatequivalentor veryclosely-relatedmessage-passingal-gorithmshave now beenindependentlyinventedmany times.They arewell-known by nameslike the forward-backwardal-gorithmfor HiddenMarkov Models[3], theViterbi algorithm[4], [5], Gallager’s sum-productalgorithmfor decodinglow-densityparitycheckcodes[6], the“turbo-decoding”algorithm[7], [8], Pearl’s “belief propagation”algorithmfor inferenceonBayesiannetworks [9], the“Kalman filter” for signalprocess-ing [10], [11], andthe“transfermatrix” approachin statisticalmechanics[12].�

MERL CambridgeResearchLab, 201Broadway, 8th Floor, CambridgeMA [email protected]

ElectricalEngineeringandComputerScience,MIT Artificial IntelligenceLaboratory, NE43a,CambridgeMA [email protected]

Schoolof ComputerScienceandEngineering,TheHebrew UniversityofJerusalem,91904Jerusalem,[email protected]

In this list of “standard”belief propagation(BP) algorithms,we have blurreda distinctionbetweentwo differentobjectivesthat onemight have, andthe slightly differentalgorithmsthatresult.Sometimes,onemightbeinterestedin obtainingtheoneglobalstateof a systemthat is mostprobableor otherwiseop-timal. In othercases,one is interestedin obtainingmarginalprobabilitiesfor somesubsetof thenodesof thesystem,givenevidenceaboutothernodesin thesystem.In thispaper, wewillfocusexclusively on this latterproblem.

In all standardBP algorithms,messagesaresentfrom onenodein a graphicalmodel to a neighboringnode. The algo-rithms are exact when the graphicalmodel is free of cycles.Thus,a commonapproachfor dealingwith graphicalmodelsthatdohavecyclesis to try to convertthemto equivalentcycle-free graphicalmodels,and then to usethe standardBP algo-rithm [13]. In somecases,this is possible,but for many othercasesof practicalinterest,suchanapproachis intractable,andonemustsettlefor approximatemethods.

Fortunately, thestandardBPalgorithmsarewell-defined,andoftengive surprisinglygoodapproximateresults,for graphicalmodelswith cycles. Nevertheless,in suchcasestherearenoguarantees,andsometimestheresultsarequitepoor, or theal-gorithmfails to give any resultat all becauseit doesnot con-verge[14]. Two majorgoalsof thispaperareto explainwhy thestandardBP algorithmoften worksso well even for graphicalmodelswith cycles,andto usethat understandingto developimprovedalgorithmsfor caseswhenit doesnotwork well.

Theclassof algorithmsthatwe will describe,which we callgeneralizedbelief propagation (GBP)algorithms,all have thecharacteristicthatsetsor regionsof nodeswill sendmessagestootherregionsof nodes.Theregionsof nodesthatcommunicatewith eachothercanbe easilyvisualizedin termsof a regiongraph. The standardBP algorithmis a specialcaseof a GBPalgorithm,with aparticularchoiceof regions.Differentchoicesof region graphswill give differentGBP algorithms,and theusercanchooseto tradeoff complexity for accuracy.

In practice,GBP algorithmscanoften dramaticallyoutper-form BP algorithmsin termsof either their accuracy or theirconvergenceproperties,for minimalcomputationalcost,if onemakesan intelligentchoiceof regions. However, how to opti-mally chooseregionsfor aGBPalgorithmremainsat thispointmoreanart thana science.We hopethatthis papercontributesto thisproblemby delineatingwhatclassesof constructionsarelikely to givegoodresults.

We shall give a theoreticaljustificationof GBP algorithmsby showing thattheir fixedpointsareidenticalto thestationarypointsof a region-basedfreeenergy, whichis anapproximationto anotherfreeenergy thatcanbe justifiedby a rigorousvari-

ationalprinciple. The first specializedexamplesof suchfreeenergies� wereintroducedlong agoin thephysicsliteraturebyby Bethe[15] andKikuchi [16]. For theimportantspecialcaseof thestandardBP algorithm,we show that its fixedpointsarethesameasthestationarypointsof theBethefreeenergy, thusestablishingan importantbasiclink betweena classicalalgo-rithm anda classicalapproximationfrom physics.

One must be careful in constructinga region graphin or-der to ensurethat the resultingapproximationsare accurate.In our original work introducingGBPalgorithms[17], we fo-cusedon a sub-classof GBP algorithmsthat wereequivalentto freeenergy approximationsbasedon Kikuchi’sclustervari-ation method[16], [18], [19]. We shall show that this methodis only oneof a variety of methodsto generateregion graphsandtheir correspondingfreeenergiesandmessage-passingal-gorithms.

In our original work, we alsofocusedon graphicalmodelsdefinedin termsof pair-wise or higher-orderMarkov randomfields(MRFs). In this paper, we shall insteadfocuson graphi-calmodelsdefinedin termsof factorgraphs.All ourresultscanbe re-expressedfor othergraphicalmodelswithout difficulty.Usingfactorgraphshascertainpracticaladvantages–inpartic-ularwecanrefertheneophytereaderto theexcellentreview byKschischanget.al. [20]. That review explainstheequivalenceto factorgraphsof othergraphicalmodelssuchasBayesiannet-works, Tannergraphsfor error-correctingcodes,or pair-wiseMRFs, andexplains the standardBP algorithmin its variousguisesasanalgorithmthatoperateson factorgraphs.

Otherformulationsof thestandardBPalgorithmprovidedif-ferentinsights,andwe refer the interestedreaderto a numberof importantrecentpapersthatexploit alternative views of theBPalgorithm[21], [22], [23], [24], [25], [26].

After our original work which introducedregion-basedfreeenergies and GBP algorithmsbasedon the cluster variationmethod,Aji and McEliece introduceda classof free energyapproximationsandGBPalgorithmsbasedon junctiongraphs[27]. Oneof thegoalsof thispaperis to unify ourpreviousap-proachwith theonethatAji andMcEliecepresented.McElieceandYildirim have independentlydevelopeda unifiedapproachto belief propagationwhich is largely equivalentto our regiongraphapproach,and we recommendtheir elegant exposition[28]. PakzadandAnantharamhavealsorecentlypresentedpar-allel ideasin a brief paper[29].

Theoutlinefor therestof thepaperis asfollows. In sectionII, we review andintroduceour notationfor factorgraphsandthestandardBPalgorithm.In sectionsIII andIV, we introduceandexplain thephysicalintuition behindvariationalfreeener-giesandregion-basedapproximationsto them.In sectionV, weconsidertheBetheMethodwhichcanbeusedto obtainparticu-larly simpleregion-basedfreeenergy approximations.We alsoshow in that sectionthat the standardBP algorithmhasfixedpointsequivalentto thestationarypointsof theBetheapprox-imationto the freeenergy. In sectionVI, we developa theorythat canbe usedto determinewhich region-basedfree energyapproximationswill be likely to give accurateresults. In par-ticular, we describetheRegion GraphMethod, a very generalmethodfor generatingvalid region graphsandtheir associatedfreeenergies.In sectionVII, weintroduceGBPalgorithms,and

show thatthereareactuallyavarietyof waysto defineGBPal-gorithmsfor any givenregiongraph,all of whichhaveidenticalfixedpoints.Wefocusononeparticulartypeof GBPalgorithm,whichwecall theparent-to-child algorithm.Finally, in sectionVIII, we give a detailedexampleof the implementationof theparent-to-childGBPalgorithm.

We have chosento put an unusuallylarge amountof mate-rial in the appendicesof this paper. We did this in an attemptto help the readergraspthe fundamentalconceptsbehindourwork andnot losesight of the forestbecauseof all the trees.Theappendicesdescribea varietyof othermethodsto generateregiongraphsandGBPalgorithmswhichcouldeasilyprovetobeasimportantin practiceasthemethodsdescribedin themaintext.

I I . FACTOR GRAPHS AND BELIEF PROPAGATION

Let ��� ��������������������� be a set of � discrete-valued ran-dom variablesand let ��� representthe possiblerealizationsof random variable ��� . We considerthe joint probabilitymassfunction � �!� #" � ��� �#" � � ����������� �$" � �&% , whichwe shall write more succintly as � �!' % , where ' standsfor�� ��� � ���������� � � . We supposethat � �(' % factorsinto a productof functions.Thatis, wesupposethat � �!' % hastheverygeneralform �)�(' %*",+-/.1032 0 �!' 0 % � (1)

Here 4 is an index labeling 5 functions 2�6 � 2�7 � 2�8 �������� 2:9 ,wherethefunction 2 0 �(' 0 % hasarguments' 0 thataresomesub-setof �;�< �����=�����������>��� . - is a normalizationconstant.

A factor graph [20] is a bipartite graphthat expressesthefactorizationstructurein equation(1). A factor graphhasavariablenode(whichwedraw asa circle) for eachvariable� � ,a factornode(whichwedraw asasquare)for eachfunction 2 0 ,with anedgeconnectingvariablenode? to factornode 4 if andonly if � � is anargumentof 2 0 . (Weshallalwaysindex variablenodeswith lettersstartingwith ? , andfactornodeswith lettersstartingwith 4 .) As anexample,thefactorgraphcorrespondingto� �!�<:���>�:���>@=���BA %C" +- 2 6D�(�E������ %�2 7F�!���G���>@=���BA %�2 8H�!�BA % (2)

in shown in figure1.

Fig. 1. A small factor graphrepresentingthe joint probability distributionIKJ�LKM1N!L=ON!L:PNQL:R1S�T MUWVYX J�LKMZN!L=O1S V1[ J�L=ON!L:P;N\L:R1S V1] J�L:R^S`_Weshallfocusontheproblemof computingmarginalproba-

bility distributions.We call thepossiblevaluesof � � thestates

2

of variablenode ? . If a is a set of variablenodes,we use'cb to denoted

the statesof the correspondingvariablenodes.�<bc�!'cb % will denotethemarginal probability functionobtainedby marginalizing � �!' % ontothesetof variablenodesa , i.e.,�<bc�!'cb %*"feg=h�gGi � �(' % � (3)

Herethesumover ' j 'cb indicatesthatwesumoverthestatesofall thevariablenodesnot in theset a . We shallwrite �>���!��� % forthemarginalprobabilityfunctionwhentheset a consistsof thesinglenode ? . Oneshouldnotethat theproblemof computingmarginalprobabilityfunctionsis in generalhardbecauseit canrequiresumminganexponentiallylargenumberof terms.

Thebeliefpropagation(BP) algorithmis a methodfor com-puting marginal probability functions. We describethe algo-rithm in termsof operationson a factorgraph. As we alreadymentionedin the introduction,the computedmarginal proba-bility functionswill beexact if the factorgraphhasno cycles,but theBP algorithmis still well-definedandempiricallyoftengives good approximateanswerseven when the factor graphdoeshavecycles.

To definetheBP algorithm,we first introducemessagesbe-tweenvariablenodesand their neighboringfactor nodesandvice versa.The messagek 0^l �Y�(�>� % from the factornode 4 tothevariablenode? is avectoroverthepossiblestatesof ��� . Thismessagecanbeinterpretedasastatementfrom factornode4 tovariablenode? abouttherelativeprobabilitiesthat ? is in its dif-ferentstates,basedon thefunction 2 0 . Themessagem � l�0 �(� �`%from the variablenode ? to the factornode 4 may in turn beinterpretedasa statementaboutthe relative probabilitiesthatnode ? is in its differentstates,basedon all the information ?hasexceptfor thatbasedon thefunction 2 0 .

Themessagesareinitialized to k 0^l ���!��� %F" mn� lo0 �(�>� %p" +for all factornodes4 , variablenodes? , andstates� � . In fact,other initializationsarealsopossible,andthe overall normal-izationof themessagescanalsobechosenarbitrarily. Theonlyimportantnormalizationconditionis on thebeliefs,introducedbelow, which mustsumto one in order to properly representprobabilities. The messagesareupdatedaccordingto the fol-lowing rules: m � lo0 �(� �Q%Hqr" .sut �pv���w h 0 k s l � �!� �`% � (4)

and k 0^l �Y�(�>� %Dqx"yeg=z�h�{;| 2 0 �!' 0 % .} t �pv 0 w h � m } lo0 �!� } % (5)

Here, �~�(? % j:4 denotesall the nodesthat that areneighborsofnode? exceptfor node4 , and � g z hY{ | denotesasumoverall thevariables' 0 thatareargumentsof 2 0 except � � . Themessagesmay be normalizedin any way that is convenient,asonly theratios of the termsin a messageare relevant. This standardBPalgorithmis sometimescalledthe“sum-product”algorithmbecauseof thesumandproductthatoccurson the right-hand-sideof equation(5).

In somecases,it is convenientto eliminate the m � l�0 �(� �`%messagesand write the message-updateequationsentirely in

termsof the k 0�l � �!� �`% messages.Alternatively, of course,onecould chooseto eliminatethe k 0^l � �(� �Q% messagesin favor ofthe m � lo0 �(� �Q% messages.

These message-updaterules may initially appear quitemysterious–amajor goal of this paperwill be to explain, jus-tify, andultimately improve uponthem. First though,to com-pleteourpreliminarydescriptionof thestandardBPalgorithm,we introducethebelief �Z���(�>� % at a variablenode ? , which is theBP approximationto the exact marginal probability function�����!��� % . Thebelief �Z���!��� % canbecomputedfrom theequation� � �(� �`%C� .0 t �pv���w k 0^l � �(� �Q% � (6)

wherewe have usedthe proportionalitysymbol � to indicatethat one must normalizethe beliefs so that they sum to one.TheBPmessage-updateequationsareiterateduntil they (hope-fully) converge,at whichpoint thebeliefscanbereadoff fromequation(6).

We canalsousethe BP algorithmto computejoint beliefs�^bn�!'cb % over setsof variablenodes a that may containmorethan one node. Considerthe importantcasewhen the set aconsistsof all the variablenodesattachedto the 4 th function2 0 �(' 0 % . We will denotethe correspondingbelief by � 0 �(' 0 % ,whichwill begivenwithin theBPapproximationby� 0 �(' 0 %�� 2 0 �!' 0 %�.� t �Fv 0 w m � l�0 �(� �`%� 2 0 �!' 0 % .� t �Fv 0 w .s�t �Fv���w h 0 k s l � �!� �`% � (7)

We candirectly derivethemessageupdaterules(4) and(5)from thebelief equations(6) and(7), alongwith themarginal-izationcondition � � �!� ��%*"yeg=z^hY{;| � 0 �(' 0 % (8)

which holds when � � is one of the argumentsin the set ' 0 .Thus,thebelief equations(6) and(7) canbeconsideredto de-fine the BP algorithm,a point of view that will prove usefullater.

TheBP algorithmis normally justifiedasbeinganexactal-gorithm when the factor graphhasno cycles (i.e., it hasthetopologyof a tree.) We shallnot prove thatpropertyhere,butwill simply give a smallexample:considerthe joint probabil-ity distribution givenby equation(2) asillustratedin figure1.Supposethat we would like to compute� �(� �% , the marginalprobability distribution at variablenode + . RepeatedlyusingtheBPequations,wefind� �!� %�� k 6 l �!� %� e {;� 2 6D�(�E������ % mn� l 6D�(��� %� e { � 2�6 �(� ��� � % k 7 l � �(� �;%� e {;� e {� e {�� 2 6��!�<:����� %�2 7F�(�>�=����@G����A % mn@ l 7��(�>@ % mEA l 7p�(��A %

3

� e { � e { � e { � 2�6 �(� ��� �;%�2�7 �(� � ��� @ ��� A�% k 8 l A �(� A %� e {� e {;� e {� 2 6D�(�E������ %�2 7p�(���G���>@=���BA %�2 8H�!�BA % (9)

which is exactly thedesiredmarginal probabilityfunction. Wecould similarly demonstratethat equation(7) would give ex-actmulti-nodemarginalprobabilitiesfor graphswith nocycles.We canalreadyseefrom this examplethat for graphswith nocycles,theBPalgorithmis essentiallyadynamicprogrammingalgorithmthat organizesthe computationsnecessaryto com-putemarginal probabilitydistributionsin sucha way that theybecometractable.

The BP algorithmwasintroducedinto the codingliteratureby Gallagerasa sub-optimalprobabilisticdecodingalgorithmfor linearblock error-correctingcodes,andsomereadersmaybe most familiar with the BP algorithm in that context [6].Other readersmay be most familiar with the form of the BPalgorithmintroducedandpopularizedby Pearl[9] for proba-bilistic inferencewith Bayesiannetworks. Readerswho aremore familiar with the BP algorithmwritten on one of theseforms may want to consult the review by Kschischanget.al.[20], which explains the equivalencebetweentheseforms oftheBPalgorithmandtheonewehavechosento usehere.

I I I . FREE ENERGIES

In this section,we turn from simply describingtheBP algo-rithm to explaining its success.In sectionII, we saw that theBPalgorithmcanbedefinedin termsof thebeliefequations(6)and(7). We shall eventuallyshow that thesebelief equationscorrespondto thestationarityconditionsfor a functionalof thebeliefscalledthe Bethefreeenergy, � �����!�� �Q� � �1� 0 % . This factservesin somesenseto justify theBPalgorithmevenwhenthefactorgraphit operateson hascycles,becauseminimizing theBethefree energy is a sensibleapproximationprocedurethathasa long andsuccessfulhistoryin physics.It alsopointsto avarietyof waysto improveuponor generalizeBP, especiallybyimproving upontheapproximationsusedin theBethefreeen-ergy. In therestof thepaper, wewill discussall of theseissues,but wefirst turnto anexplanationof thenotionof a freeenergy.

Supposethatonehasa systemof � particles,eachof whichcanbe in oneof a discretenumberof states,wherethe statesof the ? th particle are labeledby ��� . (As an example, onemight make a variety of simplificationsand characterizethestatesof the atomsin a magneticcrystalby whethera givenelectronin eachatom has an “up” spin or a “down” spin.)The overall stateof the systemwill be denotedby the vector' " �� ��� � ����������� � � . Eachstateof the systemhasa corre-spondingenergy ���(' % . A fundamentalresultof statisticalme-chanicsis that,in thermalequilibrium,theprobabilityof astatewill begivenby Boltzmann’sLaw� �(' %�" +- �(� %=�=�<� v g w\��� � (10)

Here,� is thetemperature,and- �(� % is simplyanormalization

constant,known asthepartition function:- �(� %�"�eg t b � �<� v g w!�Y� (11)

wherea is thespaceof all possiblestates' of thesystem.A substantialpart of statisticalmechanicstheoryis devoted

to the justificationof Boltzmann’s Law. On theotherhand,ifonebegins with a joint probability distribution �)�(' % for somenon-physicalsystem,onecanview Boltzmann’s law asa pos-tulatethatservesto defineanenergy for thesystem,wherethetemperaturecanbe setarbitrarily, asit simply setsa scaleforthe units in which one measuresenergy. We shall take thispoint of view andset � " + throughoutthe restof this paper.For thecaseof a factorgraphprobabilitydistribution function� �(' %�" � + � - %B� 90Z� 2 0 �!' 0 % , we definethe energy ���(' % of astate' to be ���(' %*"�� 9e0Z� ¡ �¢ 2 0 �(' 0 % (12)

in orderto beconsistentwith Boltzmann’sLaw.TheHelmholtzfreeenergy �c£ �`¤ ¥W��¦Y¤ �!§ of a systemis� £ �`¤ ¥W�¦�¤ �!§ "��  �¢ - � (13)

Thisfreeenergy is afundamentallyimportantquantityin statis-tical mechanics,becauseif onecancalculatethefunctionalde-pendenceof � £ �`¤ ¥W�¦�¤ �!§ on quantitieslike a macroscopicmag-neticfield ¨ or temperature� , thenit is easyto computeex-perimentallymeasurablequantitieslike theresponseof thesys-temto a changein ¨ or � . Physicistshave thereforedevotedconsiderableenergy to developingtechniqueswhich give goodapproximationsto � £ �`¤ ¥W�¦�¤ �!§ .

Oneimportanttechniqueis basedon a variationalapproach.Supposeagainthat �)�(' % is the true probability distribution ofthesystem,which obeys Boltzmann’s Law � �!' %F" � �>� v g w � - .It may be that even if we know � �(' % exactly, it is of a formthat makesthe computationof � £ �`¤ ¥W��¦Y¤ �!§ difficult. We there-fore introducea “trial” probabilitydistribution ���!' % , anda cor-respondingvariational freeenergy (oftencalledtheGibbsfreeenergy) definedby ���\� %C"ª© �\� %W� ¨«�Q� % � (14)

where © �Q� % is thevariationalaverageenergy:© �\� %C"�eg t b �:�(' % ���(' % (15)

and ¨«�\� % is thevariationalentropy:¨«�Q� %C"¬�«eg t b �:�(' %  �¢ ���(' % � (16)

It followsdirectly from ourdefinitionsthat���\� %C" � £ �`¤ ¥W�¦�¤ �!§)­~® �Q�:¯�¯ � % (17)

where ® �Q�:¯�¯ � %C° eg t b ���(' %  �¢ ���(' %� �!' % (18)

is the Kullback-Leibler divergencebetween �:�(' % and �)�(' % .Sincethereexists a theorem[30] that ® �\�:¯�¯ � % is alwaysnon-negative and is zero if andonly if ���(' %±" � �!' % , we seethat���\� %³² � £ �`¤ ¥W��¦Y¤ �!§ , with equalitypreciselywhen ���!' %*" �)�(' % .

4

Minimizing theGibbsfreeenergy ���Q� % is thereforeanexactprocedure´ for computing � £ �`¤ ¥W��¦Y¤ �!§ andrecovering � �!' % . Ofcourse,as � becomeslarge, this procedureis alsototally in-tractable,as �:�(' % will take exponentiallylargememoryjust tostore.A morepracticalpossibilityis to upper-bound� £ �`¤ ¥W�¦�¤ �!§by minimizing ���Q� % overarestrictedclassof probabilitydistri-butions. This is the basicideaunderlyingthe meanfield ap-proach.Onevery popularmean-fieldform for ���(' % is the fac-torizedform: � 9¶µ �!' %*" �.� � � � �!� ��% � (19)

Usingthis �^9¶µH�!' % , andanenergy function ���(' % of thefactorgraphform given in equation(12), we caneasilycomputethemeanfield freeenergy � 9¶µ·"¸©C9¶µ¹� ¨ 9¶µ for anarbitraryfactorgraph:

© 9¶µH�`���� ��������1�Z�&� %C"º� 9e0Z� e g z  �¢ 2 0 �!'E» % .� t �pv 0 w �Z���(��� % �(20)¨ 9¼µ �u� � ���������Y� � � %C"¬� �e � � e {;| � � �(� ��%  �¢ � � �(� ��% � (21)

Minimizing ��9¶µ½�\������������Y�^� % over the �Z� will give us self-consistentequationsfor the �Z� , whichcanbesolvednumericallyto obtainamean-fieldapproximationfor thebeliefs � � �(� �`% .

Insteadof a factorizedform, onemight considerothermorecomplicatedformsfor ���(' % whichstill leadto tractableapprox-imations. This is the ideabehindthe “structuredmean-field”approach[31]. We will not follow that path,andwill insteaddescribea quite differentapproachto approximating���\� % inthenext section;onewhichunderliestheBP algorithm.

IV. REGION-BASED FREE ENERGY APPROXIMATIONS

Kikuchi andthe otherphysicistswho further developedtheso-calledclustervariation method[16], [18], [19] introduceda classof approximationsto the Gibbsfreeenergy ���Q� % . Theideabehindtheseapproximationsis similar, but slightly differ-entfrom themeanfield approximation.Whereasthefactorizedmean-fieldfreeenergy ��9¶µ is afunctionof single-nodebeliefs�Z���(�>� % , in a Kikuchi approximation,the approximatefree en-ergy �)¾ ��¿^À Á`Â;� will bea functionof beliefs �^b �('cb % over largersets a of variablenodes. Onedrawbackof the clustervaria-tion methodis that in contrastwith the mean-fieldapproach,onecannotnormallyexplicitly constructan overall “trial” be-lief vector ���!' % that is consistentwith the multi-nodebeliefs�^b �('cb % , andthereforeonedoesnot normallyobtainany upperboundon � [32]. On the otherhand,onecanmake approxi-mationsthataremuchmoreaccuratethanthefactorizedmean-field approximation,andthereis agreatdealof flexibility in theexactchoiceof approximation.As we shallalsoseein furtherdetail,theseapproximationscanbeexploitedto yield message-passingalgorithms,andaparticularlysimpleversion–theBetheapproximation–willgive resultsthatareequivalentto thestan-dardBP algorithm.

Weshallactuallydescribehereaclassof approximationsthatgeneralizethosegeneratedby theclustervariationmethodasit

hasbeendescribedin thephysicsliterature,andwill thereforerefer to suchapproximationsas region-basedapproximations.We refer to thesub-classof approximationsspecificallygener-atedusingtheclustervariationmethodasKikuchi approxima-tions.

Fig.2. An illustrationof thedefinitionof a region. Regionsaresetsof variableand factor nodesin a factor graphsuchthat all variablenodesconnectedtoany includedfactor nodesare included. Thus, the setsof nodes Ã�Ä N�Å�Æ andÃ1Ç N�È)NQÅ;NQÉ;N\ÊÆ couldberegions,but Ã1Ç N`É�Æ couldnotbearegion(sincefactornode Ç wasincluded,variablenodesÅ and Ê mustalsobeincluded.)

Webegin by assumingthat �)�(' % hasthefactorgraphform ofequation(1). We definea region Ë of a factorgraphto bea setÌBÍ

of variablenodesandaset � Í of factornodes,suchthatif afactornode4 belongsto � Í , all thevariablenodesneighboring4 arein

ÌBÍ. We give examplesof setsof nodesthat could or

could not be consideredregionsin figure 2. Note that the set� Í maybeempty, andthata factor 4 neednot be includedin� Í evenif all its neighboringvariablenodesareinÌ Í

.We definethestate' Í of a region Ë to bethecollective set

of variablenodestates�� � ¯ ?pÎ Ì Í � . Themarginal probabilityfunctionovera region Ë will bedenotedby � Í �!' Í % , by whichwe meana marginalizationof � �!' % onto thevariablenodesinÌ Í

. Thecorrespondingbelief � Í �!' Í % will beanapproximationto thetrue � Í �!' Í % .

We definetheregionenergy � Í �!' Í % to be� Í �(' Í %C"���e0 t µGÏ  �¢ 2 0 �!' 0 % � (22)

where,sinceall the variablenodesneighboringa factornode4ÐΪ� Í areguaranteedto be in the region Ë , we canalwaysdetermineany neededstate ' 0 from the state ' Í . We furtherdefinethe region average energy © Í �Q� Í % , the region entropy¨ Í �Q� Í % , andtheregion freeenergy � Í �\� Í % , by© Í �Q� Í %C" e g Ï � Í �!' Í % � Í �!' Í % (23)

¨ Í �\� Í %*"º�#e g Ï � Í �!' Í %  �¢ � Í �(' Í % (24)

and � Í �Q� Í %*"·© Í �Q� Í %)� ¨ Í �Q� Í % � (25)

Theintuitive ideabehindaregion-basedfreeenergy approx-imationis thatwewill try to breakupthefactorgraphinto asetof largeregionsthatincludeeveryfactorandvariablenode,andsaythattheoverallfreeenergy is thesumof thefreeenergiesofall theregions.Of course,if someof thelargeregionsoverlap,

5

thenwewill haveerredby countingthefreeenergy contributedby someÑ nodestwo or moretimes,sowe thenneedto subtractout thefreeenergiesof theseoverlapregionsin suchaway thateachfactorandvariablenodeis countedexactlyonce.

To make thesenotionsprecise,we say that a region-basedapproximation�cÒ for theGibbsfreeenergy will bedefinedintermsof a setof regions Ó , andan associatedsetof countingnumbers Ô Í . Ô Í will alwaysbeaninteger, but mightbezeroornegativefor someË .

Wesaythatasetof regionsÓ andcountingnumbersÔ Í giveavalid region-basedapproximationwhen,for everyfactornode4 andeveryvariablenode? in thefactorgraph,eÍ t Ò Õ 4�ÎÖ� Í)× Ô Í "$eÍ t Ò Õ ?CÎ Ì Í)× Ô Í " + � (26)

whereÕ Ø Î - × is the set-membershipindicatorfunction equal

to + ifØ Î - andequalto 0 otherwise.

Theseconditionsensurethatevery factorandvariablenodewill be countedexactly onetime in the approximationto thefreeenergy. If a givenfactoror variablenodeis addedinto thefreeenergy in two differentregions,thentheremustbeanotherregionwhereit is subtractedbackout.

Givenavalid setof regions Ó andcountingnumbersÔ Í , theregion-basedapproximationto theGibbsfreeenergy is simply� Ò �`��� Í � %C"$eÍ t Ò Ô Í � Í �Q� Í % � (27)

Notethattheregion-basedaverageenergy© Ò¼�`� � Í � %Ù" eÍ t Ò Ô Í © Í �\� Í %" �ÚeÍ t Ò Ô Í e g Ï � Í �(' Í %Ûe0 t µ Ï  �¢ 2 0 �(' 0 % (28)

will always be exact, provided that the beliefs ��� Í �!' Í % �are equal to the correspondingexact marginal probabilities�Y� Í �(' Í % � . We canseethis by comparingwith the exact av-erageenergy

©¬"Úeg t b � �!' % ���!' %�"º� 9e0^� e g z � 0 �!' 0 %  �¢ 2 0 �(' 0 % (29)

andnotingthat in theovercountingnumbersÔ Í guaranteethateachfactornodeis countedexactly oncein equation(28), andthatif all the � � Í � areexactin equation(28),they will properlymarginalizeto givethenecessaryfactorsof � 0 �(' 0 % in equation(29).

On theotherhand,theregion-basedentropy¨ Ò �`��� Í � %�" eÍ t Ò Ô Í ¨ Í �Q� Í %" � eÍ t Ò Ô Í e g Ï � Í �(' Í %  �¢ � Í �!' Í % (30)

will normally only be an approximationeven if the beliefs� Í �!' Í % wereexactly equalto the true marginal probabilities,althoughtheconditionthateachvariablenodeis countedonce

makesit a quite “reasonable”approximation,in thesensethatif theprobabilitydistribution �)�(' % wasflat, this entropy wouldat leastcountthenumberof degreesof freedomcorrectly. Theregion-basedentropy will alsobeexactin certaincasesthatwedescribelater.

How doesoneselecta valid setof regions Ó andcountingnumbersÔ Í for a givenfactorgraph?Therearein factaninfi-nite numberof waysto do that. In thenext sectionwe will de-scribeaverystraightforwardapproachwhichwecall theBethemethod, which is guaranteedto returnvalid setsof regionsandcountingnumbers.We thenprove that the fixed pointsof thestandardBP algorithmcorrespondto stationarypoints of theBetheapproximationto thefreeenergy.

In thefollowing section,we will introducethe region graphmethod, which is a very generalapproachto finding valid ap-proximations,basedon constructinga region graph. Regiongraphsplay a central role in the descriptionboth of the re-giongraphfreeenergy, andin theconstructionof correspondingGBPalgorithms,andprovide theclearway of visualizingandunderstandingaregion-basedapproximation.

The Bethemethodis a specialcaseof the muchmoregen-eral region graphmethod.In appendicesA andB, we discusstwo otherimportantmethodsthat arealsospecialcasesof theregion graphmethod:the junctiongraphmethodandtheclus-ter variation method. In appendixC, we discussin detail therelationshipbetweenthedifferentmethods.

Fig. 3. A factorgraphwhichweuseto illustrateavarietyof region-basedfreeenergy approximations.

V. THE BETHE METHOD

The origins of the Bethe method date back to 1935,andBethe’s famousapproximationmethodfor magnets[15].Kikuchi, in his 1951paperthatpioneeredtheclustervariationmethod[16], recognizedthat Bethe’s approximationwas thesimplestexampleof anapproximationthatcouldbegeneratedusingthatmethod.Of course,from themodernpoint of view,theseearly papersfocusedon very specialgraphicalmodels,andwe warn the readerwho wantsto readtheoriginal papersthatourdescriptionof Bethe’sandKikuchi’smethodswill bearlittle resemblanceto theirexpositions.

In theBethemethod, wetakethesetof regionsincludedin Óto beof two types.First,wehaveasetof largeregionsÓ�Ü suchthatthe 5 regionsin Ó�Ü eachcontainexactlyonefactornode

6

andall thevariablenodesneighboringthatfactornode.Second,we haveÑ a setof small regions ÓÛb , suchthat the � regionsinÓÛb eachcontaina singlevariablenode.

We take asan examplethe factorgraphshown in figure 3,which hassix factor nodeswhich we label Ý���Þ��1ß�� ® �Y�����and nine variable nodes which we label + �Yà�����������á . Forthis example, we would have the following large regionsin Ó Ü : � Ýo� + �Yà���âB�1ãK� , �;Þ��Yà���ä��1ãå�Yæå� , � ß���âB�1ãK� , � ® �Yã���æå� ,� ����â��Yãå�ZçK�Yèå� , and �;�D�Yã���æB��è��Yáå� , andthe following small re-gionsin ÓÛb : � + � , � àå� , � äå� , �âB� , � ãK� , � æå� , ��çK� , � èå� , and �;áå� .Thecompletesetof regions Ó������!�� includedin theBetheap-proximationis Ó������!�� " Ó�Ü�é�ÓÛb .

If Ë and Ë � aretwo regions,wesaythat Ë is a sub-regionor Ë � and Ë � is a super-region of Ë if thesetof variableandfactornodesin Ëê area subsetof thosein Ë�� .

In theBethemethod,the countingnumbersÔ Í for eachre-gion Ë�ÎÛÓ aregivenby

Ô Í " + � eb t:ë v Í w Ôb (31)

whereìp�\Ë % is thesetof regionsthataresuper-regionsof Ë .Using this definitionwe seethat for every region ËíÎ«Ó Ü ,Ô Í " + , while for everyregion Ë�Î±Ó b , Ô Í " + �±î � , whereî �

is thedegree(numberof neighboringfactornodes)of thevari-ablenode ? . It is easyto confirmthat theBetheapproximationwill alwaysbea valid approximation,aseachfactorandvari-ablenodewill clearlybecountedonceasrequiredin equation(26).

We canuseourexpressionsfor Ô Í in equation(28) to obtainthe Betheapproximation to the Gibbs free energy � �����!�� "© �����!��� � ¨ �����!�� , where

© �����!�� "º� 9e0Z� e g=z � 0 �(' 0 %  �¢ 2 0 �!' 0 % (32)

and

¨¶�����!�� " � 9e0Z� e g z � 0 �(' 0 %  �¢ � 0 �!' 0 %­ �e � � � î � � + % e { | �^���(�>� %  �¢ �^���(�>� % � (33)

NotethattheBetheentropy will beexactif thefactorgraphhasnocycles,becausein thatcasewehavetheexactformula[13]

� �(' %�" � 90Z� � 0 �!' 0 %� �� � �ï� � �!� ��%�%Yð | � � (34)

whichwecansubstituteinto theformulafor thevariationalen-tropy to recover ¨ �����!��� .

Weshallnow show thatminimizingtheBetheapproximationto thefreeenergy will alwaysgiveresultsthatareequivalenttothe standardBP algorithm,so the exactnessof the Betheap-proximationfor factorgraphswith nocyclesis nosurprise.

A. Equivalenceof theBetheApproximationandStandard BP

Wenow show thatthestandardBPalgorithmis equivalenttotheBetheapproximation,andexploresomeof theimplicationsof thatequivalence.In particular, weshow thatthe“messages”sentin BP areexponentiatedcombinationsof Lagrangemulti-pliers.

Theorem: Let �;k 0^l � �!� ��% ��m � lo0 �!� ��% � be a set of BP mes-sagesand let ��� 0 �(' 0 % �Y� � �(� ��% � be the beliefs calculatedfromthosemessages.Thenthebeliefsarefixedpointsof theBP al-gorithmif andonly if they arezerogradientpointsof theBethefreeenergy � �����\��� , subjectto theconstraintthatall thebeliefsarenormalizedandconsistent.

Proof: We wantto minimizetheBethefreeenergy, while in-sistingthatall thebeliefs �Z���(��� % and � 0 �!' 0 % areconsistent.Tothisend,weaddLagrangemultipliers ñ 0 �(�>� % whichenforcetheconstraintthat � g z hY{ |å� 0 �(' 0 %¼" �^���!��� % for every factornode4 andall its neighboringvariablenodes? . We alsoneedto addLagrangemultipliers to normalizethe beliefs, but we do notclutter our equationswith them,astheir effectsareautomati-cally takeninto accountif wesimplynormalizeourbeliefs.

Settingthederivativeof theresultingLagrangianò �����!�� withrespectto thebeliefs �Z���!��� % and � 0 �(' 0 % equalto zerogives:� 0 �!' 0 %C�·2 0 �(' 0 % .� t �pv 0 w ��ó zu| v { | w (35)

and � � �(� ��%C� ôõ .0 t �Fv���w ��ó zu| v { | w�ö÷ øù |�ú ø � (36)

If wemake theidentificationñ 0 �Y�(�>� %�"  �¢ mn� 0 �!��� %C"  �¢ .s�t �pv���wYû�c0 k 0�l ���!��� % (37)

thenwe find that we recover the standardBP belief equations(6) and (7), which meansthat the standardBP fixed pointscorrespondto stationarypoints of the constrainedBethefreeenergy. ü

The fact that òC�B���!��� is boundedbelow implies that the BPequationsalwayspossessa fixedpoint (obtainedat the globalminimumof òC�����\��� ). To our knowledge,this is thefirst proofof theexistenceof BP fixedpointsfor a generalgraphwith ar-bitrarypotentials.Of course,theexistenceof afixedpointdoesnot imply thattheBP algorithmwill convergestartingfrom ar-bitrary initial conditions.

Theconditionsfor theuniquenessof BPfixedpointsarealsoclarifiedby the equivalencewith theBetheapproximation.Ingraphswith no morethana singlecycle, it wasknown that ifall factorsarestrictly positive ( 2 0 �!' 0 %�ýÿþ for all 4 and ' 0 ),thentherewasauniqueBPfixedpoint.[33]For generalgraphs,wecanusetheequivalenceestablishedaboveto answeraques-tion abouttheuniquenessof stationarypointsfor theBethefreeenergy. The issueof the numberof stationarypoints of ap-proximatefreeenergiesis well studiedin physics.To bemoreprecise,wecanimaginedefininga sequenceof probabilitydis-tributions wheresomeor all of our original functionsare allraisedby a power: 2 0 �(' 0�� � % " 2 0 �(' % ��Y� . This is equiva-lent to changingthe temperaturein a physicalsystem,where

7

� is the temperature.Many systems,for exampleIsing fer-romagnets,� will have differentnumbersof solutionsabove orbelow a critical temperature � Á within the Betheapproxima-tion [34]. Above � Á , theconstrainedfreeenergy is convex andhasa uniquestationarypoint, while below � Á , therearemulti-plestationarypoints.Usingthisequivalenceit is easyto definesmall factorgraphsthatshow a similar behavior. Althoughthetopologydoesnot changeandthe factorsarealwayspositive,aswe smoothlychangethefactorswe go from a regimewith auniquefixedpoint to onewith multiplefixedpoints.

While we have shown that standardBP can only convergeto stationarypoints of the constrainedBethe free energy, itis importantto realizethat BP doesnot perform constrainedminimization of the Bethe free energy; i.e. it doesnot de-crease� �����!��� at every iteration. Indeed,the marginalizationconstraintsaretypically not satisfiedat intermediateiterationsof BP: it is only at a fixed point that the beliefsare in a fea-sible set. Basedon the equivalence,first notedin our earlierwork [17], othershaverecentlydevisedalgorithmsthatdirectlyminimize the free energy on the feasibleset [35], [36], [37].Suchfreeenergy minimizationsaresomewhatslower thantheBPalgorithm,but they areguaranteedto converge.

VI . THE REGION GRAPH METHOD

We now introduceregion graphs, which are central to theregion graphmethodfor generatingvalid freeenergy approxi-mations,andalsowill provide a graphicalframework for GBPalgorithms.

Let � be thesetof indicesfor the factorandvariablenodesin a factorgraph. A region graph is a labeled,directedgraph� " � Ì �����Yò % in which eachvertex �¹Î Ì

(correspondingtoa region) is labeledwith a subsetof � . We denotethe labelofvertex � by òp��� % . A directededge(or arc) mayexist pointingfrom vertex ��� to vertex � Á if òp��� Á1% is asubsetof òp����� % . If suchanarcexists,wesaythat � Á is achild of ��� , that ��� is aparentof� Á , andthatthey belongto differentgenerations.. If thereexistsadirectedpathfrom vertex � 0 to vertex � ð , wesaythat � 0 is anancestorof � ð , and � ð is a descendantof � 0 . Notethatbecauseof thetransitivity of thesubsetrelationship,aregiongraphmustbea directedacyclic graph,in thesensethatthearrows cannotlooparound.

A regiongraphis closelyrelatedto theHassediagram for apartially orderedset, or poset[38], if we considerour regionsto beorganizedinto a poset,with theorderingrelationshipbe-tweentheregionsto begivenby theancestor-descendantrela-tionship[28], [29]. Thereare,however, somedifferencesbe-tweenregion graphsandHassediagrams.First, region graphsare labeledgraphs,andwe will insist on some“region graphconditions,” describedbelow, that the labelsmustsatisfy. Sec-ond,regiongraphscanincludeanarcbetweentwo regionsthatarealsoconnectedby a pathof lengthtwo or greater, which isforbiddenfor Hassediagrams.

WedefineacountingnumberÔ for everyvertex in theregiongraph,by Ô� " + � eÀ t 6)v�À w Ô^À�� (38)

where ���� % is thesetof verticesthatareancestorsof � . Thus,the countingnumbersfor the regionsof a region graphcorre-spondto theMobiusfunctionof thecorrespondingpartiallyor-deredset[38].

For a graph�

to qualify asa region graph,we further insiston the region graph condition, which requiresthat for every?�Π� , thesubgraph

� �(? %&" � Ì �(? % �Y���(? % ��òF�!? %�% formedby justthoseverticeswhoselabelsinclude ? is a connectedgraphthatsatisfiesthecondition e t�� v���w Ô ê" + � (39)

Having definedregion graphs,it is almosttrivial to definea correspondingmethodfor generatingvalid region-basedfreeenergy approximations.We simply createa region graphsuchthattheverticescorrespondto regions,with labelscorrespond-ing to the factor and variablenodesin a region, and we re-quire that every factor and variablenode be containedin atleastone region. We associatethe countingnumbersÔ Í forregions directly with the countingnumbersÔ for the regiongraph,andtheregion graphfreeenergy � Í�� will begivenby� Í�� " � Í Ô Í � Í , where � Í is the freeenergy of the regionË .

Fig. 4. An exampleof aregiongraph.Wehave listedthecountingnumber���next to eachregion.

In figure4, wegiveanexampleof a regiongraphfor thefac-tor graphthat we alreadyintroducedin figure 3. This regiongraphwasconstructedto demonstratewhat is and is not per-mitted in a legal region graph,ratherthanwhat would likelygivegoodresults.Notethata regiongraphneednotobey somepropertiesthat onemight considerimportant(including somewhich are enforcedin the junction graph methoddescribedin appendixA and the clustervariation methoddescribedinappendixB). For examplethereneednot be any cleardelin-eationof “generations”(region �;èå� is a child of both regions� ß�������âB�1ãå�ZçK��è�� andregions �;�D�Yã���æB��è��Yáå� , while region � ãå�is a grand-childof region � ß��Y����â��Yãå�ZçK�Yèå� anda child of re-gion �;�D�Yã���æB��è��Yáå� .) Notealsothatregionsmayhave countingnumberequalto zero(e.g. region � ã���æ�� ), andthatthefactthata region is a sub-setof anotherregion neednot imply that it isalsoadescendantof thatregion(e.g.regions �;�D�Yã���æB��è��Yáå� and� ãå�Yæå� ).

What is essentialis that theregion graphconditionsthatwedescribedabove areobeyed. We insiston theseconditionsfor

8

thefollowing reasons.First,to reiteratethecommentswemadeaboutvalid� region-basedfreeenergy approximations,thecon-dition thateveryfactornodein thefactorgraphis countedoncewhenwe do theweightedsumoverall regionsensuresthattheregiongraphaverageenergy is exactif theregionbeliefsareex-act;andtheconditionthatevery variablenodeis countedonceensuresthat the region graphentropy is a reasonableapproxi-mation. Theconditionthat the regionscontaininga particularvariablenodeform a connectedsub-graphwill ensurethat themarginal probability at any nodeis consistentirrespective ofwhich region’sbeliefsoneusesto computeit. Empirically, wehave found in limited experimentsthat if oneattemptsto runa GBP algorithm(as describedbelow) on graphsthat do notsatisfyall theregiongraphconditions,theresultswill bepoor.

Fig.5. An exampleof agraphof regionsthatis notaregiongraphbecausethesumof thecountingnumbersof regionscontainingvariablenode5 is notone.

An exampleof a“f alseregiongraph”or graphof regionsthatdoesnotsatisfytheregiongraphconditionsis shown in figure5.Theproblemwith thisplausible-lookingconstructionis thatthesumof thecountingnumbersof theregionscontainingvariablenode5 is zero, ratherthan one. We could modify this falseregiongraphin a varietyof waysto obtaina realregiongraph.For example,we couldsimply remove node5 from theregion��àå�Yãå� . The resultingregion graphwould be an exampleof ajunctiongraph; seeappendixA. Alternatively, we couldaddaregion � ãå� which just containedvariablenode ã , andconnectthe regions ��àå�1ãK� , ��ß���â��YãK� , � ® �Yãå�Yæå� , and ��ãå��è�� to it (theresultof usingtheclustervariationmethod;seeappendixB).

JustastheBetheapproximationwill beexactwhenthe fac-tor graphis a tree,aregiongraphapproximationwill alwaysbeexactwhenthecorrespondingregion graphis a tree. This canbedemonstratedby recursivelyapplyingthefollowing junctiongraphformulafor theprobabilitydistribution of a factorgraphdividedinto largeregionsÓ Ü , andsmallregionsÓ b whichsep-aratethelargeregions(seeAppendixA for moredetails):� �!' %C" � Í t Ò�� � Í �(' Í %� Í t Ò i � Í �!' Í % ð Ï � � (40)

We illustrate the ideawith an example,that hasthe factorgraphgiven in figure 6, and the region graphgiven in figure7. We will recursively breakdown the full joint probabilitydistribution andshow that it is equalto a productof marginalprobabilitydistributionsoverregionsthathaspreciselytheformnecessarysothattheregiongraphfreeenergy is exact.

Fig. 6. A factorgraphthathasa treeregiongraphshown in figure7.

Fig. 7. A region graphwith no cyclesthathasa correspondingregion graphfreeenergy approximationwhich is exact.

Notethat for this region graph,theregion �;â�� separatestheleft part of the treeandthe right part of the tree. That meansthatwehave� �(� ������������ %C" �)�(�E�����@:���BA¡����� % � �(�>�=���BAG�����G��� � %�)�(� A % � (41)

The marginal probability distributions �)�(�<:���>@:���BA¡����� % and� �(�>�:���BA¡�����G��� � % can in turn be written in termsof marginalprobabilitiesof smallerregions. For example,we seethat theregion � ä���â�� separatestheregions �;Ýo� + �Yä���âB� and � ß��Yä���âB��æ�� ,sothat�)�(� ��� @ ��� A ��� �;%*" �)�(� ��� @ ��� A�% �)�(� @ ��� A ��� �;%� �!��@G����A % � (42)

Expandingeverythingout, we obtainthat the joint probabilitydistribution � �!� ������������� % equals� �!� ��� @ ��� A % � �!� @ ��� A ��� �;% � �(� � ��� A ��� � % �)�(� A ��� � ����� %� �!� @ ��� A;% �)�(� A ��� � % � �!� A�% � (43)

Substitutingthis result into the formula for the exact entropy,we recover the region graphentropy. Sincethe region graph

9

averageenergy is alwaysexactwhentheregionbeliefsare,thisdemonstrates� thattheapproximationis exactin thiscase.

We note that eachterm in the numeratorof the expression(43) hasa power of + , andeachterm in the denominatorhasa power of � + , correspondingexactly to thecountingnumberof the correspondingregion. In the generalcaseof a regiongraphwith no cycles,the recursive applicationof the junctiontreeseparatorformula(40) will alwayssimilarly reproducethecountingnumbersgivenby theregiongraphprescriptionequa-tion (38).

We have alreadyseenthat thestationarypointsof theBetheapproximationto the free energy are equivalent to the fixedpointsof thestandardBPalgorithm,whichoperatesona factorgraph.In thefollowingsections,weshallintroducegeneralizedbeliefpropagationalgorithmswhich operateon region graphs,anddemonstratethat their fixed pointscorrespondto the sta-tionarypointsof theregiongraphfreeenergy.

In appendicesA, B, wediscusstwo othermethods(thejunc-tion graphmethodandtheclustervariation method) thatgen-eratevalid freeenergy approximations.Both of thesemethodscanbeconsideredspecialcasesof theregiongraphmethod.InappendixC, wedescribetherelationshipbetweenall thediffer-entmethodsdescribedin thispaperin moredetail.

VI I . GENERALIZED BELIEF PROPAGATION ALGORITHMS

Justasthe standardBP algorithmcorrespondsto the Betheapproximation,onecanconstructgeneralizedbelief propaga-tion (GBP)algorithmscorrespondingto any region graphfreeenergyapproximation.In fact,therearemany waysto constructmessage-passingalgorithmswhosefixedpointsareequivalentto thestationarypointsof aregiongraphfreeenergy. In all thesealgorithms,messagesof somesortaresentbetweenregionsona regiongraph.

Oneshouldfirst notethatonecanobtaindifferentGBPalgo-rithmscorrespondingto thesamefreeenergy by usingdifferentregiongraphsthathavethesamefreeenergy. For example,onecould modify a region graphby connectinga grandparentre-gion directly to a grandchildregion. TheGBPalgorithmsthatwedescribebelow wouldbemodified,but theapproximatefreeenergy wouldnotbechanged.Makingsuchamodificationwillalterthedynamicsof a GBPalgorithm,but not its fixedpoints.

Evenif onefixesone’sattentiononaparticularregiongraph,thereare still a variety of different GBP algorithmsthat onecancreate.In themaintext of this paper, we will describeonepossibleapproach,whichwecall theparent-to-child algorithm.In appendicesD andE, we describetwo otherapproaches(thechild-to-parent algorithm and the two-wayalgorithm) whichgive algorithmswith equivalentfixed points,andwhich havetheir own advantages.An importantadvantageof the parent-to-childalgorithmis thatthemessage-passingalgorithmmakesnoreferenceto regioncountingnumbers,justasin thestandardBPalgorithm.

The standardBP algorithmis a specialcaseof all threeal-gorithmswhen the region graph is obtainedusing the Bethemethod.

A. TheParent-to-ChildAlgorithm

As we saw, thestandardBP message-passingequationscanbederivedusingthefactthatthebeliefatasinglevariablenodeis just theproductof all themessagesbearinginformationfromneighboringfactornodes,while thebelief at theregionof vari-ablenodesadjoiningasinglefactornodeis theproductof theirinternalfactors,multipliedby all themessagescominginto thegroupof nodesfrom factornodesoutsidetheregion.

The parent-to-childalgorithmgeneralizesthis idea. In thisalgorithm(which in previousexpositionswecalledthe“canon-ical” GBPalgorithm[17]) thebeliefatany region Ë will betheproductof thelocal factorsin thatregion,multiplied by all themessagescominginto region Ë from outsideregions.Thereisonecomplication,however: to make the algorithmequivalentto minimizing theregiongraphfreeenergy, weneedto includeadditionalmessagesinto regionswhich aredescendantsof Ëfrom otherparentregionsthatarenot themselvesdescendantsof region Ë .

To bemorespecific,in theparent-to-childalgorithm,weonlyhave onekind of messagek�� l Í �!' Í % from a parentregion toachild region. Eachregion Ë hasabelief � Í �!' Í % givenby

� Í �!' Í % � .0 t 6 Ï 2 0 �(' 0 %ôõ .� t�� v Í w k � l Í �(' Í % ö÷ �����

����� ôõ .!Dt�" v Í w .�$# t�� v ! w h&% v Í w k � # l ! �!' ! % ö÷ �(44)

Here '��!Ë % is the setof regionsthat areparentsto region Ë ,( �!Ë % is thesetof all regionsthataredescendantsof region Ë ,) �!Ë %C° Ë«é ( �\Ë % is thesetof all regionsthataredescendantsof Ë andalsoregion Ë itself, and '�� ® % j ) �!Ë % is thesetof allregionsthatareparentsof region ® exceptthosethatarealsodescendantsof region Ë or region Ë .

Fig. 8. A region graphusedto illustratethe parent-to-childGBPalgorithm.Notethatwe do not explicitly give thevariableandfactornodelabelsfor eachregion,asfor ourpurposes,weareonly interestedin thetopologyof theregiongraph.

An examplemayhelpmakethebeliefequationclearer. Con-sidertheexampleshown in figure8. Thebelief � Í �(' Í % at re-gion Ë is the productof its local factors � 0 t 6nÏ 2 0 �(' 0 % , themessagesfrom its parentsk±6 l Í �(' Í % and kÖ7 l Í �(' Í % , andthemessagesinto descendantsfrom otherparentswho arenotdescendants:k 8 l � �(' � % , k 8 l+* �!' * % , and k µ l,* �(' * % .

10

Oneobtainsself-consistentequationsfor themessagesby re-quiring consistency betweenthebeliefsbetweenevery pair ofparentandchild regions. Thusin figure8, we might focusontheregion Ë andits child � . Thebeliefat region Ë is givenby� Í � k 6 l Í k 7 l Í k 8 l � k 8 l,* k µ l,* .0 t 6 Ï 2 0 �(' 0 %

(45)(wherewe have lightenedthe notationby removing the obvi-ousfunctionaldependenciesof themessages)andthebelief atregion � is givenby� � � k Í l � kÖ8 l � k ! l � kÖ8 l+* kÖµ l,* .0 t 6$- 2 0 �(' 0 %

(46)Using the marginalization constraint, � Í �!' Í % "� g/.�h�g Ï �Z6D�('n6 % we obtain a relation between messagesthatwecaninterpretasthemessageupdaterulek Í l � �!' � % k ! l � �(' � %³qx"eg Ï h�g - k 6 l Í �!' Í % k 7 l Í �!' Í % .0 t 6 Ï h 6 - 2 0 �(' 0 % � (47)

Of course,similar messageupdateruleswould be obtainedfor all the pairsof parentandchildrenregions. Therewill beenoughconditionsto determineeverymessage.

In general,theparent-to-childmessage-updateruleswill bek � l Í �(� Í %³qr"� {�021 Ï � 0 t µ 021 Ï 2 0 �!� 0 % � v43�5 6=w t �pv � 5 Í w k 3 l 6 �!� 6B%� v43�5 6=w t/! v � 5 Í w k 3 l 6 �(� 6B% (48)

wherethesets�«�87D��Ë % and ® �97D�YË % canbecalculatedin ad-vance.Recallthat

) �\Ë %�" ˹é ( �\Ë % . Then �~�97D��Ë % is thesetof all connectedpairsof regions �9���: % suchthat : is in

) �97 %but not

) �!Ë % while � is not in ���97 % . ® �97D�YË % is thesetof allconnectedpairsof regions �8�B��: % suchthat : is in

) �\Ë % , while� is in) �87 % , but not

) �\Ë % .We now prove a central theoremaboutthe parent-to-child

GBPalgorithm,which is definedby the messageupdaterules(48)combinedwith thebeliefequations(44).

Theorem: A setof messagesandbeliefsdefinea fixedpointof theparent-to-childGBPalgorithmif andonly if thebeliefsareastationarypointof theregiongraphfreeenergy, wheretheregion graphfreeenergy is constrainedto have beliefsthatareconsistentandnormalized.

Proof: To simplify theproof, we will assumethatno regionË in theregiongraphhascountingnumberÔ Í "/þ . In appendixF, we discussthis technicallyusefulassumptionin detail. Inparticular, weshow thatit is easyto removeany Ô Í "/þ regionsto get an equivalentregion graph;andalsothat even if we dopermit them,theparent-to-childGBPalgorithmwill still workproperly, althoughthefollowing proofno longerholds.

Recallthattheregiongraphfreeenergy is simply� Ò �`��� Í � %C"$eÍ t Ò Ô Í � Í �Q� Í % � (49)

To derive thestationarityconditions,weneedto createa La-grangianò for the freeenergy which enforcesconsistency be-tweenthe beliefs in every pair of connectedregions. To that

end,weaddLagrangemultipliers ñ;� 8 �(' 8�% whichenforcethat�Z8½�!'n8 %C" eg 0 h�g/< � � �!' � % (50)

for everypairof parentandchild regions 7 and ß .Of course,we alsoneedto includeLagrangemultipliers = Í

whichenforcethenormalizationof thebeliefs: � g Ï � Í �(' Í %C"+ . Setting the derivatives of ò with respectto the beliefs� Í �(' Í % equalto zerogivesus the following stationaritycon-ditions: Ô Í  �¢ � Í �(' Í %�" = Í ­ Ô Í e0 t 6 Ï  �¢ 2 0 �(' 0 % ������ e� t�� v Í w ñ � Í �(' Í % ­ e8 t�> v Í w ñ Í 8F�('n8 % � (51)

where '��\Ë % is thesetof regionsthatareparentsof region Ë ,and ?C�!Ë % is thesetof regionsthatarechildrenof region Ë . Inthisexpression,' 0 and 'n8 areentirelydeterminedby thevalueof ' Í .

Our proof will now work backwardsfrom the belief equa-tionsthatwe wantto derive. We wantto show thatthereexistsa “rotation” from our Lagrangemultipliers ñ to anothersetofLagrangemultipliers @ suchthatthestationarypointconditionscanbere-writtenasÔ Í  �¢ � Í �(' Í %C" = Í ­ Ô Í e0 t 6 Ï  �¢ 2 0 �(' 0 % ����� (52)

­ e� t�� v Í w @$� Í �(' Í % ­ e!Dt�" v Í w e� # t�� v ! w h&% v Í w @$� # ! �!' ! % �Clearly, if we canshow this, thenby identifying the messagek � l Í �!' Í %#"BAC2D �9@ � Í �(' Í %�% , we will recover our desiredbeliefequations.

So what do the Lagrangemultipliers @$� Í �!' Í % constrain?Theansweris thatthey imposetheconstraintÔ Í � Í �(' Í % ­ e6 t/E v Í w h v �GF E v � w(w Ô�6 eg/.�h�g Ï �^6³�!'n6 %C"/þ � (53)

In words,theLagrangemultiplier @$� Í constrainstheweightedbelief in region Ë plusthesumof theweightedbeliefsin all theancestorregionsof region Ë , exceptfor regions 7 andall itsancestors,to beequalto zero. If we make a LagrangianusingtheseLagrangemultipliers,it is straightforwardtowork outthatits stationarypointsaregivenby equation(52).

Now we needto show thatthenew setof @ Lagrangemulti-pliersandtheir associatedconstraintsareequivalentto theoldsetof ñ Lagrangemultipliersandtheirconstraints.Wefirst notethatbecauseÔ Í ­ � 6 t/E v Í w Ô 6 " + , and Ô� ­ � 6 t/E v � w ß 6 "+ , wecansubtractthesetwo equationsandobtainÔ Í ­ e6 t/E v Í w h v �HF E v � w(w Ô^6 "3þ (54)

If we start with the ñ;� 8 �(' 8*% constraintsthat � 8 �(' 8�% "� g 0 h�g < �I�H�('$� % for everypairof parentandchild regions,we

11

canuseequation(54) asa basisfor deriving theconstraintsas-sociated

dwith the @ Lagrangemultipliers. It is alsoalwayspos-

sibleto go in theotherdirection:The @ constraintswill belin-earlyindependent,sothatif webegin with them,wecanderivethe ñ constraints[28]. (This is wherethe proof breaksdownif thereareregionswith countingnumberÔ Í "ÿþ ; the @ con-straintsmaybelinearlydependentin thatcase.)ü

Note that we have not given a generalformula relating thenew @ Lagrangemultipliers to the ñ Lagrangemultipliers, aswe only needto show the existenceof a rotationto a new setof Lagrangemultipliers,without constructingit explicitly. It isdifficult to derivea generalformularelatingthetwo setsof La-grangemultipliers,but for regiongraphswith only two “genera-tions” of regionslikethoseconstructedusingthejunctiongraphmethod(seeappendixA), we canin fact give the relationshipexplicitly: ñ � Í �(' Í %C" e�$# t�� v Í w h � @ � # Í �(' Í % � (55)

VI I I . DETAILED EXAMPLE OF A GBP ALGORITHM

Fig. 9. A factorgraphthat we will usefor our detailedexampleof how toconstructaGBPalgorithm.

We will now give a detailedexampleof how to constructa GBP algorithm. Considerthe factor graphdrawn in figure9, which hasseven variablenodesand ten factornodes. Forthis factorgraph,it is convenientto slightly alter our labelingconventionssothatsomeof thefactornodes(theonesattachedto asinglevariablenode)arelabeledwith anumberratherthana letter. This factorgraphcorrespondsto the joint probabilitydistribution

� �(�E �����G����������� � %C" +- J �.� � 2 ���(�>� %LK ����� (56)2�6 �!� ��� � ��� @ ��� � %�2�7 �(� ��� � ��� A ��� �;%�2�8 �(� ��� @ ��� A ����� %We will work out a GBPalgorithmmakingno assumptions

aboutthe actualforms of the functions,but we note that thisparticularfactorgraphcanbeusedto representtheprobabilitydistributionthatoccurswhendecodingablockerror-correctingcode[20]. In particular, if eachof thevariablenodesis binary,with possiblestates0 or1, andthefunctions2�6 , 2:7 , and 2�8 areparity-checkfunctions(equalto + if thesumof theirarguments

areeven,and þ otherwise),thenthis factorgraphcorrespondsto the linear block �Qçå��â���ä % Hammingcodewith parity checkmatrix ¨ " ôõ + + + þ + þ þ+ + þ + þ + þ+ þ + + þ þ + ö÷ � (57)

For the decodingproblem,the functions 2�� �!� �`% representthelikelihoodsof thepossiblestatesof thebits, in light of the re-ceivedblockfrom thechannelandtheassumedchannelmodel.

Fig. 10. A region graphobtainedfor the factorgraphof figure 9 usingtheclustervariationmethod.

To obtaina GBPalgorithm,we first needto createa regiongraph.Weusetheclustervariationmethod,with largestregions� 2 6�� 2 �� 2 �=� 2 @:� 2 �=� + �1àå�Yä��Yãå� , � 2 7H� 2 � 2 �:� 2 A=� 2 ��� + �Yà���â���æå� and� 2 8D� 2 � 2 @�� 2 A=� 2 � � + ��äB��âB�Zç¡� . Following the clustervariationmethodprescriptionfor finding intersectionregionsdetailedinappendixB, weobtaintheregiongraphshown in figure10.

Now that we have a region graph,we needto choosewhatkind of GBP algorithmwe want to useand then write downthe belief andmessageequationsfor the GBP algorithm. Wechooseto usetheparent-to-childalgorithm.

Notethatalthoughtheregiongraphfreeenergy is usefulfortheoreticallyjustifyingaGBPalgorithm,it will notbenecessaryfor constructingthe algorithm. Instead,we canwork directlywith thebeliefequations.

Recallthatin theparent-to-childalgorithm,weonlyhaveonekind of messagek�� l Í �(' Í % from a parentregion to a childregion. Eachregion Ë hasa belief � Í �(' Í % givenby equation(44)whichwe re-writehere:

� Í �!' Í % � .0 t 6 Ï 2 0 �(' 0 %ôõ .� t�� v Í w k � l Í �(' Í % ö÷ �����

����� ôõ .!Dt�" v Í w .�$# t�� v ! w h&% v Í w k � # l ! �!' ! % ö÷ �(58)

In words,this equationsaysthat the belief at eachregion is aproductof the local factorsin that region, the messagesfromparents,andthe messagesinto descendantregionsfrom otherparentswhoarenotalsodescendants.

In our region graph, we have seven regions that can begroupedinto threetypesof regions: the threeregions exem-plified by � 2�6 � 2= � 2�� � 2�@ � 2M� � + �1àå�Yä��Yãå� that containfive factor

12

nodesand four variablenodes;the threeregions exemplifiedby � 2G � 2�� � + �YàK� thatcontaintwo factornodesandtwo variablenodes;and the single region � 2= � + � that containsone factornodeandonevariablenode.

We will usean abbreviatednotation,droppingexplicit ' Ídependence,for beliefsandmessagesandfactorfunctions.Thenotationis bestexplainedwith someexamples:wewrite �����@N� ,���� and �� for the beliefs at the regions listed in the previ-ousparagraph;we write k @&� l �� for themessagefrom region� 2�6 � 2= � 2�� � 2�@ � 2M� � + �Yà���äB�YãK� to region � 2= � 2:� � + �Yàå� , k � l forthemessagefrom region � 2= � 2�� � + �1àK� to region � 2= � + � , andweabbreviate 2�6 �(� ��� � ��� @ ��� �;% as 2�6 .

In this abbreviated notation, the belief equationsfor thelargestregionswill be� ��Y@&�F�·2�6W2=^2��;2�@2�� k AN� l �� k A � l �@ k A l � (59)� ���AO�F�·2:7�2=�2���2 A 2M� k @&� l �� k A � l uA k @ l � (60)

and � �@�A � �ª2�8C2=^2:@2 A 2 �^k �&� l �@ k �&� l uA k � l � (61)

Note thatsincetheseregionsdo not have parents,all the rele-vant messagesare into descendantregionsfrom otherparentswhoarenotdescendants.

The belief equationsfor the intermediate-sizedregionswillbe ���� �·2 2 �^kÖ@N� l ��^k±AO� l ��k±@ l 1k±A l �� (62)���@ �/2 2 @�k±�N� l u@kÛA � l u@�k±� l ^kÛA l (63)

and ��uA �·2 2 A�kÖ�N� l uA�kÖ@ � l uA;k±� l 1kÖ@ l �� (64)

Finally, thebeliefequationfor theregion � 2G � + � will be�� �·2 1kÖ� l ZkÖ@ l 1k±A l � (65)

The message-updaterulesareobtainedby combiningthesebelief equationswith the marginalizationconditionsbetweenparentandchild regions:� 8 �(' 8�%C" eg 0 h�g/< �I�H�('$� % � (66)

For example,requiringconsistency betweenthe beliefsat theregion � 2= � + � andtheregion � 2= � 2�� � + �Yàå� tells usthat� �(� %*"ªe { � � u� �(� ��� �% (67)

from whichweobtainkÖ� l qx" e {;� 2 �^kÖ@N� l ��kÛAO� l u�G� (68)

The othermessage-updaterules,obtainedin the sameway(or equivalentlyby usingequation(48),will bekÖ@ l qx" e { � 2 @^kÖ�N� l �@kÛA � l u@G� (69)

k±A l qx"ªe {� 2 A�kÖ�N� l uA;k±@ � l `A¡� (70)

k @ l k @N� l ��&qx" e{ � 5 {�P 2�6W2�@;2M� k A � l �@ � (71)

k � l k �N� l �@&qx" e{ � 5 {�P 2�6W2��;2M� k AN� l �� � (72)

k A l k AN� l ��&qx" e{ � 5 {�Q 2�7�2�A;2M� k @ � l uA � (73)

kÖ� l ZkÖ�&� l uA qx" e{;� 5 { Q 2 7 2 � 2 �ZkÖ@N� l ��G� (74)

k±A l Zk±A � l �@ qx" e{� 5 {SR 2 8 2 A 2 � kÖ�N� l uA=� (75)

and k @ l k @ � l uA&qx" e{;� 5 {SR 2�8C2�@;2 �^k �N� l �@ � (76)

In practice,it oftenhelpsconvergenceto only stepthemes-sagespart-way to their newly computedvalues. This simpleheuristiccaneliminate“over-shooting”problems.

Wenotehereonepotentialpracticalpitfall to avoid whenus-ing inertia. Let ussupposethatwe have a setof old messages�k ¦�¤ T � , whichweusein theupdateequationsto calculateasetof messages�k U�V T�W��!� � , andthatwe want to setour new mes-sagesto behalf-waybetweentheold messagesandtheupdatedmessages:�k X �ZY � " � �;k ¦Y¤ T � ­ � �;k[U\V T�W��!� � . Werecommendwhenusinganupdateequationwith morethanonemessageonthe left handside, that all thosemessagesare k[U\V T\W��\� equa-tions. Mixing in k X �ZY or k ¦Y¤ T messageson theleft handsideempirically often resultsin poor convergenceproperties.Forexample,the updateequation(71) givenabove shouldexplic-itly be k U\V T\W��\�@ l k U�V T�W��!�@&� l �� qx" e{ � 5 {�P 2�6W2�@;2M� k ¦�¤ TA � l �@ � (77)

Fortunately, it is alwayspossibleto schedulethemessageup-datessothatonecomputestheupdatedmessagesinto thesmall-estregionsfirst (e.g. messageslike k U\V T\W��\�@ l ), so that they areavailablewhenneededto computethe updatedmessagesintolargerregions.

Therearemany otherdetailsthatcanbehandledin differentwaysin iteratingthemessageupdateequations.For example,themessagescanbeinitialized in any way onelikes;two pop-ular choicesarerandomor uniform messages.The algorithmtypically terminatesaftera fixednumberof iterations,or aftersomeconvergencecriterion is satisfied,but other terminationconditionsarepossible. In a decodingapplication,one typi-cally checksat eachiterationwhetherthe thresholdedbeliefscorrespondto a code-word, andterminatesthedecodingalgo-rithm if they do, stoppingotherwisewhensomefixednumberof iterationshaspassed.

IX. DISCUSSION

In this paper, we have presenteda generaltheory, basedonregion graphs,for constructinggeneralizedbelief propagation(GBP)algorithms.Region graphspermiteasyvisualizationofthestructureof GBPalgorithms–messagesarealwayssentbe-tweentheneighboringregionson thegraph.For region graphs

13

that have no cycles, the GBP algorithmsareexact. We havealsoseenÑ thatthefixedpointsof theGBPalgorithmalwayscor-respondto thestationarypointsof anapproximateregion-basedfreeenergy, sothatevenwhentheregiongraphhascycles,GBPalgorithmsseemto bedo somethingreasonable.ThestandardBPalgorithmturnedoutto beaspecialcaseof aGBPalgorithmobtainedwhentheregion graphis constructedusingtheBethemethod.

Given a factor graphand limited computationalresources,a key remainingproblemis how to choosean“optimal” regiongraph–i.e.onethatgivesthemostaccurateresultswith theleastcomputationaleffort. Welimit ourselveshereto suggestingtwosensibleheuristics.First, it is wiseto try to collecttheshortestcyclesin a factorgraphinto regions,sothatthey arehandledasaccuratelyaspossible.Second,in orderthat the region graphfreeenergy beasaccurateaspossible,oneshouldtry to maketheregiongraphresembleatree–thatis, oneshouldavoid shortcyclesin theregiongraph.

Wehavepreviouslyreportedpromisingnumericalresultsob-tainedusingGBPalgorithmsfor inferenceon randomMarkovRandomFields [17] and for decodingerror-correctingcodes[39].

ACKNOWLEDGEMENTS

We thankDave Forney andRobertMcEliecefor helpful andencouragingdiscussions,andDavid MacKayfor hiscommentsonadraft of thispaper.

APPENDIX A: THE JUNCTION GRAPH METHOD

A naturalideato generalizetheBetheMethodis to keepthenotionthat Ó shouldbetheunionof a setof largeregions Ó Üanda setof small regions Ó b , but to let the regionsin Ó Ü orÓ b containmorenodes.The junction graph method, that wedescribehere,exploits this idea,andis basedon a generaliza-tion of the “junction graphs”that wereintroducedby Aji andMcEliece[27].

We definea junction graph to be a labeledbipartitegraph� " � Ì Ü � Ì b �����Yò % in which thereare large vertices(corre-spondingto largeregions) �M]�Î Ì Ü , smallvertices(correspond-ing to small regions) �/^ Î Ì b , and directededges(or arcs)� Î � connectinglargeverticesto smallvertices.Theverticesin the junctiongrapharelabeled,andthe label of vertex � � isdenotedòF��� ��% . The labelswill besubsetsof a setof indices �representingfactoror variablenodesof a factorgraph.

For thegraph�

to beconsidereda junctiongraph,we insistupontwo conditions.First, if �/^ is a smallvertex neighboringthe _ largevertices� ] ø �&� ] � ��������&� ]a` , thenwerequirethat òF�9�/^ % isa subsetof eachof òF�9� ] ø % ��òp��� ] � % ������������òF�9� ]a`:% , or equivalently,that òF�9� ^ %�b òp���M] ø %�c òp���M] � %�c ����� c òF�9�M]a` % � (A-1)

Secondly, we requirethat for any index ?FÎd� , thesubgraphof�consistingonly of theverticeswhichcontain? in their labels,

is a connectedtree.The“junction graphs”introducedby Aji andMcEliece[27]

are a specialcaseof thosedescribedhere. In their junctiongraphs,small verticeswere restrictedto have preciselytwo

neighboringlargevertices,sothatthesmallverticescanbein-terpretedaslabeled“edges”betweenthe large vertices. Theyfurtherrequiredthatsmallregionlabelsnot includeany indicesrepresentingfactornodes.

Givena setof regions Ó 6 � " Ó�Ü�é ÓÛb thatareorganizedinto ajunctiongraph,wecanalwaysobtainavalid region-basedapproximationby definingasetof countingnumbersÔ Í asfol-lows. For all regions Ë Î/Ó�Ü , we let Ô Í " + , while for allregion Ë Î¸ÓÛb , we let Ô Í " + � î Í where î Í is the de-gree(numberingof neighboringlargeregions)of region Ë . Itis throughthis prescriptionthat thearcsthejunctiongraphbe-comerelevant–asmall region’s contribution to the freeenergyis subtractedout from thatof a largeregion only if thetwo re-gionsareconnectedby anarc. It is straightforwardto confirmthatthis prescriptionfor thecountingnumbersgivesusa validregion-basedfreeenergy approximation,asthe junctiongraphcondition that the sub-graphfor eachvariableor factor nodeis a treeguaranteesthat eachvariableandfactornodewill becountedonceasrequiredin equation(26).

Aji andMcElieceproveda theoremthat tells us that givenanysetof largeregions Ó Ü thatcontainall thefactorandvari-ablenodesin a factorgraph,wecanfind acorrespondingsetofsmallregionsÓ b andorganizetheregionsin Ó[6 � " Ó Ü éFÓ binto a junctiongraph.Their theoremgeneralizeswithout diffi-culty to ourversionof junctiongraphs.

As an example,considerthe factor graphwhich we intro-ducedin themaintext andre-draw in figure11. We couldtakeasoursetof largeregionsÓ Ü thefour regions �;Ýo�1ß�� + �Yàå��âB�1ãK� ,�;Þ�� ® �Yà���äB�Yãå�Yæå� , ��ß�������âB�1ãå�1çå��è�� , and � �D�Yã���æ��Yè��Yáå� . Anacceptableset of correspondingsmall regions ÓÛb would be� àå�1ãK� , ��ß���â��YãK� , � ãå�Yæå� , and �;è�� , with a junction graph asshown in figure11. Becausein this caseeachof thesmall re-gionsis connectedto two largeregions,they would eachhaveancountingnumberof � + .

Fig. 11. A junctiongraph(on theright) for thefactorgraphon theleft.

Thesetof regionsgivenby theBethemethodcanalsoalwaysbeorganizedinto a junctiongraph(thoughnot necessarilytherestrictedAji-McEliece versionof a junction graph);usingasanexamplethesamefactorgraph,theresultingjunctiongraphis shown in figure12. It is obviousfrom thisexamplethatthere

14

Fig. 12. A junction graphfor the factorgraphshown in figure 3 generatedusingthe Bethemethod. Note the isomorphismbetweenthis junction graphandtheoriginal factorgraph.

will always be a one-to-oneisomorphismbetweenthe origi-nal factorgraphandthecorrespondingjunctiongraphobtainedfrom theBethemethod.

Thejunctiongraphapproximationfor theGibbsfreeenergyis �e6 � �`��� Í � %C" © 6 � �`��� Í � %)� ¨f6 � �`��� Í � % � (A-2)

where©g6 � �u� � Í � %C" eÍ t Òh� © Í �Q� Í % ­ eÍ t Ò i � + ��î Í %Y© Í �\� Í % � (A-3)

and¨ 6 � �u� � Í � %C" eÍ t Ò�� ¨ Í �\� Í % ­ eÍ t Ò i � + � î Í % ¨ Í �Q� Í % �(A-4)

Junctiongraphsarea specialcaseof region graphs,wherethereare only two “generations”of regions. It follows thatminimizingthejunctiongraphfreeenergy � 6 � will onceagaingive beliefs � � Í � that areequivalentto thoseobtainedfrom amessage-passingBP algorithm. That algorithmis sometimesknownasthegeneralizeddistributivelaw [24]. Againit followsasacorrollaryof ourmoregeneralresultsfor regiongraphsthatthejunctiongraphapproximationto theGibbsfreeenergy willbeexact,andthegeneralizeddistributivelaw will giveexactre-sults,whenthejunctiongraphis atree.In thatcase,wecancallthejunctiongrapha junctiontree, andthegeneralizeddistribu-tive law reducesto thefamousjunctiontreealgorithm.

Ourjunctiontreesareactuallyaslightgeneralizationof whatis normallycalleda“junction tree,” in thatweallow separators(i.e., the small regions) to neighbormore than just two largeregions. We cangeneralizethewell-known result[13] for thejoint probabilityfunctionin junctiontreestoourcaseandobtain

� �!' %C" � Í t Ò � � Í �(' Í %� Í t Ò i � Í �!' Í % ð Ï � � (A-5)

To obtainthis result,we notethat while we have describedregion graphsandjunctiongraphsasdirectedgraphs,from the

pointof view of statisticalgrphicalmodels,they areequivalentto undirectedgraphs. In particular, one can re-write the fulljoint probabilitydistribution � �!' % for a factorgraphin theform�)�(' %*" +- .v Í b w/i Í b �(' Í ��' b % . Íkj Í �!' Í % (A-6)

where �\Ë&a % denotespairsof connectedregionsin a given re-gion graph for that factor graph. Specifically, when we setj Í �!' Í %¼"ml � 0 t 6 Ï 2 0 �(' 0 %&n Á Ï and i Í bc�(' Í ��'cb % equalto 1if ' Í is consistentwith 'cb andequalto 0 otherwise,this formof thejoint probabilitydistributionwill beequivalentto theonein the original factorgraphform. Sincethe formula (A-5) istruefor pairwiseMarkov RandomFieldswhenthesetof nodesin Ó Ü areseparatedby the setof nodesin Ó b , andwe haveshown how to convert a region graphinto an equivalentpair-wise Markov RandomField, we have justified using formula(A-5) for regiongraphsaswell.

APPENDIX B: THE CLUSTER VARIATION METHOD

Anothermethodfor selectinga valid setof regions Ó andcountingnumbersÔ Í is theclustervariationmethodintroducedby Kikuchi in 1951 and further developedin the physicslit-eraturesincethen [19]. The main featuredistinguishingthismethodfrom the junction graphmethodis that Ó may be theunionof morethanjust two generationsof regions.

In the clustervariationmethod,we begin with a setof dis-tinct largeregions Ópo suchthatevery factornode 4 andeveryvariablenode ? in our factorgraphis includedin at leastoneregion Ë Î Ópo . We alsorequirethat no region Ë ÎªÓpo bea subregion of any otherregion in Ópo . We thenconstructthesetof regions Ó by formingall possibleintersectionsbetweenregionsin Ópo , but discardingfrom Ó any intersectionregionsthat aresub-regionsof other intersectionregions. If possible,we thenconstructin thesameway thesetof regions Ó�� fromtheintersectionsbetweenregionsin Ó± . As long astherecon-tinue to be intersectionregions, we constructsetsof regionsÓ�@:�uÓ�A¡������� Ó ¾ in thesameway. Finally, thesetof regionsusedin theclustervariationmethodwill be Ó " ÓpoBéDÓ é&�����!éHÓ�¾ .

We define the counting numbersin the cluster variationmethodto be Ô Í " + � eb t:ë v Í w Ô b (B-1)

whereì��!Ë % is thesetof all regionswhicharesuper-regionsofregion Ë .

Returningto ourexamplefactorgraphdrawn in figure3, wecan choosethe baseset of regions Ó o to consistof the fourregions � Ýo�1ß�� + �1àå��âB�Yãå� , �;Þ�� ® �Yà���äB�Yãå�Yæå� , � ß��Y����â��Yãå�ZçK�Yèå� ,and � ® ���D�1ãå�Yæ���èB��á�� . Oncethesetof baseregions Ó o is cho-sen,thereis no furtherchoicein theclustervariationmethod.In our case,the set of intersectionregions Ó± would be theregions � à��YãK��� ß���âB�1ãK� , � ® �Yãå�Yæå� , � ã���è�� , andthesetof inter-sectionregions Ó � wouldbe � ãK� .

Eachof theregions ËºÎ±Ó o wouldhaveancountingnumberÔ Í " + . Becauseeachof theregions ˸Î#Ó± is thesubregionof two regionsin Ó o , they eachhave an countingnumberofÔ Í " + � à "�� + . Finally sinceevery region in Ópo and Ó isasuper-regionof � ãK� , its countingnumberis + � â ­ â " + .

15

We canrepresentthis setof regionsandcountingnumberswith theÑ regiongraphshown in figure13.

Fig. 13. A regiongraphgeneratedusingtheclustervariationmethod.

Note that theBetheapproximationwill bea specialcaseoftheclustervariationmethodif andonly if no factornodesharesmorethanonevariablenodewith anotherfactornode(or equiv-alently, thereareno cyclesof lengthfour in the factorgraph.)The factorgraphshown in figure 13 is thereforeoneexampleof a factorgraphfor which theBetheapproximationcannotbegeneratedby theclustervariationmethod.

We remarkthatin thephysicsliterature,theclustervariationmethodhasnormallybeenappliedto a restrictedclassof fac-tor graphsthatareparticularlyrelevantasmodelsof magneticmaterials.In particular, thefactorgraphnormallyrepresentsatranslationallyinvariantcrystallattice,andthefactornodesnor-mallyhavedegreetwo,correspondingto two-bodyinteractions.Translationalsymmetryin the factorgraphoften dramaticallysimplifiestheproblemof minimizing theKikuchi freeenergy,andwhenthefactornodeshave degreetwo, theBethemethodwill alwaysbea specialcaseof theclustervariationmethod.

APPENDIX C: RELATIONSHIPS BETWEEN DIFFERENT

METHODS

In this appendix,we summarizethe relationshipsbetweenthedifferentmethodsfor generatingvalid setsof regionsfor aregion-basedfreeenergy approximation.Firstof all, asis clearfrom its definition, a junction graphwill always be a regiongraph(thoughtheconverseis not true).Thesetsof regionsandcountingnumbersgeneratedby theclustervariationmethodcanalsoalwaysberepresentedby a region graph.We alreadysawoneexamplein figure13.

We emphasizethatonecanconstructregion graphapproxi-mationsthatcannotbegeneratedwith eitherthejunctiongraphor clustervariationmethods.We alreadysaw suchanexamplewhenwe introducedregion graphsin the main text in sectionVI. Constructionsthataremoregeneralthanthoseconstructedusingtheclustervariationmethodor thejunctiongraphmethodmaybeusefulfor a varietyof reasons,including reducingthecomputationalcomplexity of aGBPalgorithm.

Fig.14. For this factorgraph,thechoiceof regions ÃNq N Ä NQÅ;N\ÊÆ , Ã1Ç N Ä NQÉ;NZr�Æ ,à È)N\Å;NQÉ;NZs�Æ , and Ã�Ä N�Å;N\É�Æ , with correspondingcountingnumbersof Ä , Ä , Ä ,andt Ä , will giveavalid region-basedapproximationthatcannotberepresentedbya regiongraph.

Note,however, thatalthoughtheregiongraphmethodis themostgeneralmethodwe have introduced,theredo exist validregion-basedfreeenergy approximationsthatdo not have a re-gion graphrepresentation.We demonstrateanexamplein fig-ure14.

Fig.15. A Venndiagramillustratingtherelationshipsbetweendifferentmeth-odsof generatingvalid region-basedfree energy approximations.The Bethemethodis alwaysanexemplarof thejunctiongraphmethod,but is only a spe-cial caseof theclustervariationmethodif thefactorgraphhasnopairof factornodesthatsharemorethanonevariablenode,andis only a specialcaseof AjiandMcEliece’s junctiongraphmethodif therelevant factorgraphis a Forney“normal” graph(novariablenodeis connectedto morethantwo factornodes).

In summary, we have the following relationships,as illus-trated in the Venn diagramof figure 15. For a given factorgraph,the clustervariationmethodand the generalizedjunc-tion graphmethodeachgeneratevalid region-basedfree en-ergy approximationsthataresubclassesof all thepossiblevalidapproximations.Neither the clustervariationmethodnor thegeneralizedjunction graphmethodis more generalthan theother, andbotharesubsumedby themoregeneralregiongraphmethod. The setof regionsgeneratedby the Bethemethodisalwaysan examplarof thosegeneratedby the junction graphmethod,andwill beanexamplarof thosegeneratedby theclus-tervariationmethodif andonly if thefactorgraphhasnocyclesof lengthfour. In general,theBethemethodwill not bea spe-

16

cial caseof theAji-McEliece junctiongraphmethod,thoughitwill befor

ufactorgraphssuchthateachvariablenodeis adjacent

to no morethantwo factornodes(Forney’sso-called“normal”factorgraphs[21]).

In addition to being a more generalmethodthan the clus-ter variationmethodor thejunctiongraphmethod,we feel thattheregion graphmethodis easierto understandon anintuitivelevel. We simply selecta set of regions and countingnum-berssuchthateveryvariableandfactornodegetscountedonce,and suchthat we can enforceconsistency for the belief overany variablenode,no matterwhich region we choose.Regiongraphsalso have the importantadvantageof being a naturalgraphicalstructurefor describinggeneralizedbelief propaga-tion algorithms.

PakzadandAnantharamhavesuggestedstrengtheningthere-giongraphrequirementsdescribedin sectionVI sothatfor ev-ery sub-setof variablenodesin thefactorgraph,thesub-graphof regionscontainingthatsub-setmustbeconnectedandmusthave a sum of countingnumbersequal to one [29]. Suchastrengtheningwould ensurethat the beliefscomputedfor anysub-setof nodeswould beconsistent,no matterwhich regionswereusedto computeit. Theclustervariationmethodproducesregion graphsthat satisfythesestrongerrequirements,but wechosenot to insist on thesestrongerrequirementsin general,becauseregiongraphscreatedusingtheBetheMethodwill notnecessarilysatisfythem.

APPENDIX D: THE CHILD-TO-PARENT ALGORITHM

The observation underlyingthe “child-to-parentalgorithm”is that when we minimize the Bethe free energy, the La-grangemultipliers enforcing the marginalization constraintscorrespondexactly(afterexponentiation)to the m � lo0 �!� �`% mes-sagesfrom variablenodesto factornodesin theBP algorithm.Consideringthesemessagesasmessagesfrom child regionstoparentregionsin a region graph,we cantry to generalizetheapproachto arbitraryregiongraphs.Thus,weconstructa GBPalgorithmby simplyminimizingaregiongraphfreeenergy andidentifying Lagrangemultipliers that enforceconsistency be-tweenbeliefswith messagesfrom child regions to parentre-gions. Suchan approachwasconsideredin detail by KappenandWiegerinckfor regiongraphsconstructedusingtheclustervariationmethod[37].

We begin with thestationarypoint equationsobtainedfromdifferentiatinga Lagrangianò that representsa region graphfree energy � Ò �`� � Í � % with beliefs � � Í � that areconstrainedto beconsistentwith their neighborson the region graph. Weobtainedthis equationpreviously (seeequation(51)), andre-write it here:Ô Í  �¢ � Í �(' Í %C" = Í ­ Ô Í e0 t 6 Ï  �¢ 2 0 �!' 0 % ������ e� t�� v Í w ñ � Í �!' Í % ­ e8 t�> v Í w ñ Í 8p�('n8 % � (D-1)

where '��!Ë % is thesetof regionsthatareparentsof region Ë ,and ?C�!Ë % is thesetof regionsthatarechildrenof region Ë , andñ�� Í �(' Í % aretheLagrangemultipliersthatenforceconsistencybetweenthebeliefsin region 7 andthosein region Ë .

For Ô Íwv"/þ , wecanre-writethisequationas

� Í �!' Í %*� .0 t 6nÏ 2 0 �(' 0 %J � 8 t�> v Í w mn8 l Í �('n8 %� � t�� v Í w m Í l �H�!' Í % K ��1Á Ï �

(D-2)where m 8 l �H�(' 8³%Ö"xAC2D �\ñ�� 8 �(' 8*%�% is a “message”from achild region ß to a parentregion 7 , in analogywith themes-sagesm � lo0 �(� ��% in standardBP. If Ô Í "·þ , wedonotgetacon-dition on � Í �(' Í % ( � Í �!' Í % canstill bedeterminedfrom beliefsin super-regionsvia themarginalizationconditions);insteadweobtainthefollowing conditionon themessagesinto andout ofregion Ë : J � 8 t�> v Í w mn8 l Í �('n8 %� � t�� v Í w m Í l �½�!' Í % K " + � (D-3)

The messageupdaterules are then obtainedby applying themarginalizationconditions� 8 �!' 8�%C" � g 0 h�g/< �I�H�('$� % .

A small examplemight help clarify the meaningof theseequationsfor thereader. Considertheprobabilitydistribution� �!�<:�����G���>@ %*",+- 2 6��!�<:����� %�2 7F�(�>�=����@ % � (D-4)

We usetheBetheapproximation,whichshouldbeexactin thiscasebecausethe factorgraphis a tree. Thus,we obtainlargeregions �;Ý�� + �YàK� and �;Þ��Yà���äå� , with countingnumbers+ , andsmall regions � + � , � àå� , and �;ä�� , with countingnumbersþ , + ,and þ respectively. Weobtainthefollowing beliefequationsfortheregionswith Ô Íwv"/þ :� 6 �(� ��� �%��/2�6 �(� ��� � % m l 6 �(� �% m � l 6 �!� � % � (D-5)�Z7F�!���G����@ %*�/2 7F�(�>�=����@ % mn� l 7F�!��� % mn@ l 7��(��@ % � (D-6)� � �(� �;%C� m � l 6 �(� � % m � l 7 �(� � % � (D-7)

andthe following conditionson messagesfor theregionswithÔ Í "·þ : m l 6D�(�< %C" + � (D-8)

and m @ l 7 �(� @;%C" + � (D-9)

Using theseconditionsandthemarginalizationconditions,wefind that mn� l 6H�(��� %C" e {� 2 7p�(�>�=����@ % � (D-10)

and mn� l 7p�!��� %C" e { ø2 6D�(�E������ % � (D-11)

We cannow easilycheckthat in this example,the computedbeliefsgivebackthedesiredmarginalprobabilitiesexactly.

The child-to-parentalgorithm, by its construction,clearlygivesageneralizedBPalgorithmwhosefixedpointscorrespondto thestationarypointsof theregiongraphfreeenergy. On theotherhand,it mightbeconsideredinelegantbothbecauseit fo-cusesonly onthemessagesfrom child regionsto parentregionsand becausethe messageupdateequationswill inevitably becomplicatedand involve the countingnumbersÔ Í . The two-wayalgorithmdescribedin AppendixE andtheparent-to-childdescribedin the main text in sectionVII-A aredifferentGBPalgorithmsthatattemptto amelioratetheseflaws.

17

APPENDIX E: THE TWO-WAY ALGORITHM

To motivatethetwo-wayalgorithm,wereturnto thestandardBP algorithm,wherewe recall that thebelief equationscanbewritten in theform�Z���(��� %*" .0 t �pv���w k 0^l �Y�(�>� % (E-1)

and � 0 �(' 0 %C"ª2 0 �(' 0 % .� t �pv 0 w m � lo0 �!� ��% (E-2)

where mn� lo0 �!��� %C" .s�t �pv���w h 0 k s l �Y�(�>� % � (E-3)

Giventheseequations,it is naturalto aimfor ageneralizationwherethebeliefequationswill have theform� Í �(' Í %C"zy2 Í �(' Í % .8 t�> v Í w m 8 l Í �(' 8�% .� t�� v Í w k � l Í �('$� % �

(E-4)In otherwords,we aim to write thebelief equationsso that

the belief in a region is a productof local factorsand mes-sagesarriving from all theconnectedregions,whetherthey areparentsor children. It will turn out that we can do this, butin order that the GBP algorithm be correspondto the regiongraphfree energy, we will needto usemodifiedfactorsandarathercomplicatedrelationbetweenthe m 8 l �H�(' 8�% messagesand k�� l 8 �('$� % messagesgeneralizingthe relation for stan-dardBP givenin equation(E-3).

It will beconvenientto denotethenumberof parentsof re-gion Ë by � Í , anddefinethenumbers{ Í ° � + � Ô| % � ��| and} Í ° + � �\à � {�| % . Whena regionhasnoparentsothat � Í "·þand Ô Í " + , we take { Í " } Í " + . Notethatwithin theBetheapproximation,{ Í " } Í " + for all regions. We will assumethat { Íkv" à so that

} Íis well-defined(normally, if onehasa

regiongraphwith aregionsuchthat { Í " à , oneshouldbeableto changetheconnectivity of Ë to avoid thisproblem).

We first definethesetof pseudo-messagesfor all regions Ëandtheirparents7 andchildren ß :m o Í l � �!' Í %*" (E-5)y2 Í �(' Í % .� # t�� v Í w h � k � # l Í �!' Í % .8 t�> v Í w mn8 l Í �('n8 %andk o Í l 8 �(' 8³%C" (E-6)eg Ï h�g < y2 Í �(' Í % .� t�� v Í w k�� l Í �!' Í % .8 # t�> v Í w h 8 m 8 # l Í �(' 8 # % �where y2 Í �(' Í %C° l � 0 t 6�~ 2 0 �(' 0 % n Á`Ï .

Aside from the fact that we raisedthe productof the localfactorsto a power of Ô Í , thesepseudo-messagesarewhatonewould naively expectthemessageupdatesto look like. To ob-tain the truemessageupdates,however, oneneedsto combinethe pseudo-messagesgoing in the two directionsof a link asfollows:m Í l �H�(' Í %*" l!m o Í l � �(' Í % n�� Ï l!k o � l Í �!' Í % n�� Ï � (E-7)

andk � l Í �!' Í %C" l\m o Í l � �!' Í % n� Ï � l\k o � l Í �(' Í % n� Ï (E-8)

Notethatwhen} Í " + , themessagesarepreciselythesameas

thepseudo-messages.Thetwo-wayalgorithmis completedby thebeliefequations,

which have the form alreadygivenin equation(E-4). We nowclaim that the above setsof messagesand beliefs are fixedpointsof two-wayGBPif andonly if they arestationarypointsof theregiongraphfreeenergy.

Proof: We form a Lagrangianfrom the region graph en-ergy asalreadyindicatedin the previoussectionon the child-to-parentalgorithm. If we exponentiateequation(51) derivedthere,weobtaintheequation

� Í �(' Í % Á`Ï � y2 Í �(' Í % .8 t�> v Í w ��ó Ï < v g/< wôõ .� t�� v Í w ��ó 0 Ï<v g Ï�w�ö÷ �

�(E-9)

Supposethatwearegivenasetof ñ and � Í thatsatisfythesestationaryconditionsof theLagrangian.Now wedefinem Í l �½�(' Í %C" ��ó 0 ÏEv g Ïåw (E-10)

and k � l Í �!' Í %C" � Í �!' Í %�� Ï �G� ó 0 Ï<v g Ï�w (E-11)

Of course,we have one k messageand one m messageforevery Lagrangemultiplier ñ , so for thesedefinitionsto hold,we also need to have constraintsrelating the k ’s and m ’s.Theconstraintswill begivenby thedefinitionsof thepseudo-messagesand the relations betweenthe messagesand thepseudo-messagesthatwe definedabove. We wantto show thattheserelations,aswell as the two-way GBP belief equationspreviouslydefined,musthold.

First,weshow thatthebeliefequations(E-4)hold. We have� Í �(' Í % Á Ï � y2 Í �(' Í % .8 t�> v Í w � ó Ï < v g < w .� t�� v Í w � � ó 0 Ï v g Ï w� y2 Í �(' Í % .8 t�> v Í w mn8 l Í �!'n8 % .� t�� v Í w�� 2 Í �!' Í %� Í �(' Í %�� � Ï k � l Í �!' Í %� �\� Í �!' Í %�% � � Ï � Ï y2 Í �!' Í % .8 t�> v Í w mn8 l Í �('n8 % .� t�� v Í w k � l Í �!' Í %� �\� Í �!' Í %�% Á`Ï � y2 Í �(' Í % .8 t�> v Í w m 8 l Í �(' 8³% .� t�� v Í w k � l Í �(' Í %sothatindeed� Í �!' Í % is productof localpotentialsandincom-ing messages.

Turning to the constraints,we have from the definition ofm o Í l � �(' Í % , thatm o Í l � �(' Í % k�� l Í �!' Í %*" � Í �!' Í % (E-12)" eg 0 h�g Ï �I�H�!'$� % (E-13)

" m Í l �H�!' Í % k o � l Í �(' Í % � (E-14)

18

Equations(E-10)and(E-11)imply thatm Í l �½�!' Í % k�� l Í �!' Í %C" � Í �!' Í %�� Ï (E-15)"�l m o Í l � �!' Í % k�� l Í �!' Í %�n � Ï � (E-16)

Togethertheseequationsgive us two equationsfor the twounknowns k�� l Í �!' Í % and m Í l �F�(' Í % :k � l Í �(' Í %m Í l �H�(' Í % " k o � l Í �(' Í %m o Í l � �(' Í % 2 Í �(' Í % � � Ï (E-17)

andm Í l �½�(' Í % k � l Í �(' Í % � � Ï "�l m o Í l � �(' Í %&n � Ï (E-18)

Theuniquesolutionof theseequationsis givenby equations(E-7)and(E-8). Thus,wehaveshown thatthemessagepassingalgorithmpreviouslydefinedhasfixedpointsthatareequivalentto thestationarypointsof theregiongraphfreeenergy.

The two-way algorithm will be particularly elegant wheny2 Í �!' Í %o" 2 Í �!' Í % andwhen} Í " + for all regions. In that

case,eachregion will sendmessagesto all adjacentregions,andthemessageupdateruleswill bethenaturalgeneralizationof theordinaryBPruleswrittenwith two kindsof messages.Itis interestingto notethattheconditionthat y2 Í �!' Í %�" 2 Í �(' Í %canbe ensuredby requiringthat only regionswith no parentscontainfactornodes,while the conditionthat

} Í " + for allregionscanbeensuredby requiringthatthesub-graphobtainedby takingany regionandall of its ancestorregionsmustalwaysform a tree.

When} Í " + for all regions,the two-way GBP algorithm

is equivalentto Pearl’s methodof clustering[9]: we form newnodesfrom clustersof variablesin theoriginalgraph(thesearethe regions)and run an ordinary BP algorithmon the result-ing graph.It is importantto bearin mind that this equivalenceonly holdsfor a subsetof possibleregion graphs:if oneusesthis methodon a setof regionsthatdoesnot satisfytheregiongraphconditions,or on a region graphfor which

} | v" + forsomeregions,theresultingbeliefswill generallybea poorap-proximation.

APPENDIX F: REGION GRAPHS WITH Ô Í "3þ REGIONS

In our proof thatthefixedpointsof theparent-to-childGBPalgorithmareequivalentto the stationarypointsof the regiongraphfree energy (given in sectionVII-A), we assumedthatno region hascountingnumber Ô Í " þ . That is never diffi-cult to arrange:if onehasa region graphwith regionswhosecountingnumberequalszero,onecanremove them,andthenconnectdirectly any regionsthatwerepreviously ancestorsordescendantsof eachother, but areno longerafter the removalof the Ô Í "ÿþ regions. Theremainingregionswill have iden-tical countingnumbersby construction,andsincethe regionswith Ô Í "¬þ did not contributeto theregion graphfreeenergyin any case,it will be unchanged.In figure 16, we illustratethe “surgery” thatneedsto beperformedon a region graphtoremoveregionswith countingnumberzero.

In fact, however, the parent-to-childalgorithm is well-definedevenwhensomeof theregionshave countingnumbers

Fig. 16. An illustrationof how onecantake aregion graphwith someregionsthathave countingnumberzero,andobtainanotherregion graphwith no suchregions but with an identical free energy. One first removes regions with acountingnumberof zero,andthendirectly connectsany ancestor-descendantpairs that have becomedisconnected.In this example,we form new directconnectionsbetweenregions � and � andbetweenregions Ç and � .

equalto zero,andwhenoneimplementsit, onefinds that theresultsat its fixed pointsareidenticalto thoseobtainedwhenonesurgically removesthe Ô Í " þ regions. The reasonthatthealgorithmstill givesproperresults,even thoughtheaboveproof breaksdown, is that the ñ constraintsthatcannotbede-rivedfrom the @ constraintsareactuallynotnecessary–they allinvolve Ô Í "3þ regionsthatdonotcontributeto thefreeenergyin any case.

Fig. 17. A small illustrative region graph(seetext). Note that region � hascountingnumber��� T�� .

A small examplemay make this point more comprehensi-ble. Considerthesmallregiongraphshown in figure(17). Thecountingnumbersof the regions are Ô 6 " Ô 7 " Ô 8$" + ,Ô ! " Ô � "Ú� + , and Ô µ~" þ , sothatregion � couldclearlyberemovedto obtainanequivalentregion graph.For thepurposeof illustration,we leave it in. We have six ñ constraints,eachof which is very straightforward. For example,the constraintassociatedwith ñB6 ! �!' ! % is � ! �(' ! % " � g/.Bh�g�� �Z6D�('n6 % ,while the constraintassociatedwith ñ ! µ½�!'nµ % is �ZµH�('nµ % "� g/�ch�g�� � ! �!' ! .

Thesix @ constraintsaresomewhatlessstraightforward.Go-ing backto theprescriptiongivenin equation(53), we seefor

19

examplethattheconstraintassociatedwith @ 6 ! �(' ! % isÔ ! � ! �(' ! % ­ Ô 7 eg/�<h�g/� � 7 �(' 7D%C"/þ (F-1)

or equivalently, � ! �!' ! %*" eg/�<h�g/� � 7 �!' 7³% (F-2)

while theconstraintassociatedwith @ ! µH�('nµ % isÔ�µ*�ZµD�('nµ % ­ Ô � eg - h�g/� � � �!' � % ­ Ô^8 eg < h�g/� �Z8F�('n8 %*"/þ (F-3)

or equivalently eg < h�g/� � 8 �(' 8³%C" eg - h�g/� � � �(' � % � (F-4)

BecauseÔ µª" þ , therewill not beany @ constraintdirectlyinvolving �ZµH�!'nµ % , so we cannotderive someof the ñ con-straints.On theotherhand,theseconstraintsarenot necessary,becausetheregiongraphfreeenergy itself alsodoesnotdependdirectly on �ZµH�('nµ % . We alsoseethat the @ constraintsarestillsufficient to ensurethatall thebeliefsareconsistentwhentheyaremarginalizeddown to region � . Finally, if we do surgeryon this region graphandremove region � , we cantheneasilyverify that the ñ constraintsarethenentirelyequivalentto the@ constraints.

REFERENCES

[1] B. J.Frey. GraphicalModelsfor MachineLearningandDigital Commu-nication. MIT Press,1998.

[2] M. I. Jordan,editor. Learningin graphicalmodels. MIT Press,1998.[3] L. R. Rabiner. A tutorial on hiddenmarkov modelsandselectedapplica-

tions. Proceedingsof theIEEE, 77(2):257–286,1989.[4] A. J.Viterbi. Errorboundsfor convolutionalcodesandanasymptotically

optimumdecodingalgorithm. IEEETrans.Info. Theory, 13(2):260–269,1967.

[5] G. D. Forney. The Viterbi algorithm. Proceedingsof the IEEE,61(3):268–278,1973.

[6] R. G. Gallager. Low-densityparity check codes. MIT Press,1963.[7] C. Berrou,A. Glavieux,andP. Thitimajshima.NearShannonlimit error-

correctingcoding and decoding: Turbo-codes. In Proceedings1993IEEE InternationalConferenceon Communications, pages1064–1070,Geneva, Switzerland,1993.

[8] R. J.McEliece,D. J.C. MacKay, andJ.F. Cheng.Turbodecodingasaninstanceof Pearl’s ‘belief propagation’algorithm. IEEE J. on Sel.Areasin Comm., 16(2):140–152,1998.

[9] J.Pearl.Probabilisticreasoningin intelligent systems:networksof plau-sibleinference. MorganKaufmann,1988.

[10] R.E.Kalman.A new approachto linearfilteringandpredictionproblems.J. BasicEng., 82:34–45,1960.

[11] A. Gelb,editor. Appliedoptimalestimation. MIT Press,1974.[12] R. J.Baxter. ExactlySolvedModelsin StatisticalMechanics. Academic

Press,1982.[13] R. Cowell. Advancedinferencein Bayesiannetworks. In M.I. Jordan,

editor, Learningin GraphicalModels. MIT Press,1998.[14] K.P. Murphy, Y. Weiss,andM.I. Jordan. Loopy belief propagationfor

approximateinference:an empiricalstudy. In Proc. Uncertaintyin AI,1999.

[15] H. A. Bethe.Proc.Roy. Soc.LondonA, 150:552,1935.[16] R. Kikuchi. A theoryof cooperative phenomena.Phys.Rev., 81(6):988–

1003,1951.[17] J. S. Yedidia,W. T. Freeman,andY. Weiss. Generalizedbelief propa-

gation. In T. K. Leen,T. G. Dietterich,andV. Tresp,editors,Advancesin Neural InformationProcessingSystems, volume13, Cambridge,MA,2001.MIT Press.

[18] T. Morita. Clustervariationmethodfor non-uniformIsingandHeisenbergmodelsandspin-paircorrelationfunction.Prog. Theor. Phys., 85(2):243–255,1991.

[19] Eds.T. Morita, M. Suzuki,K. Wada,andM. Kaburagi. Foundationsandapplicationsof clustervariationmethodandpathprobabilitymethod.InProg. Theor. Phys.Supplement, volume115,1994.

[20] F. R. Kschischang,B. J.Frey, andH.-A. Loeliger. Factorgraphsandthesum-productalgorithm. IEEETrans.Info. Theory, 47:498–519,2001.

[21] G. D. Forney. Codeson graphs:normalrealizations.IEEE Trans.Info.Theory, 47(2):520–548,2001.

[22] T. P. Minka. Expectationpropagationfor approximateBayesianinfer-ence.UAI-01, 2001.

[23] M. J.Wainwright,T. Jaakkola, andA. S.Willsky. Tree-basedreparame-terizationfor approximateestimationon loopy graphs. In T. Dietterich,S. Becker, andZ. Ghahramani,editors,Advancesin Neural InformationProcessingSystems, volume14.MIT Press,2002.

[24] S. M. Aji andR. J. McEliece. The generalizeddistributive law. IEEETrans.Info. Theory, 46:325–343,2000.

[25] E. Offer and E. Soljanin. An algebraicdescriptionof iterative decod-ing schemes. In B. Marcusand J. Rosenthal,editors,Codes,Systemsand Graphical Models, Volumesin Mathematicsand its Applications.SpringerVerlag,2000.

[26] M. Mezard and R. Zecchina. The random k-satisfiability problem:from an analyticsolution to an efficient algorithm. Availableonline athttp://xxx.lanl.gov/pdf/cond-mat/0207194, 2002.

[27] S. M. Aji andR. J. McEliece. Thegeneralizeddistributive law andfreeenergy minimization. In Proceedingsof the39thAllertonConferenceonCommunication,Control andComputing, Allerton, Illinois, 2001.

[28] R. J. McElieceandM. Yildirim. Belief propagationon partially orderedsets.FifteenthInternationalSymposiumonMathematicalTheoryof Net-worksandSystems,2002.

[29] P. PakzadandV. Anantharam.Belief propagationandstatisticalphysics.In Proceedingsof theConferenceon InformationSciencesandSystems,2002.

[30] T. M. Cover andJ. A. Thomas. Elementsof InformationTheory. JohnWiley andSons,1991.

[31] M. I. Jordan,Z. Ghahramani,T. Jaakkola,andL. Saul.An introductiontovariationalmethodsfor graphicalmodels.In M.I. Jordan,editor, Learningin GraphicalModels. MIT Press,1998.

[32] D. J.C. Mackay, J.S.Yedidia,W. T. Freeman,andY. Weiss.A conversa-tion abouttheBethefreeenergy andsum-product(discussiondocument).Availableonlineat http://www.merl.com/reports/TR2001-18/index.html,2001.

[33] Y. Weiss.Correctnessof localprobabilitypropagationin graphicalmod-elswith loops.Neural Computation, 12:1–41,2000.

[34] K. Huang.StatisticalMechanics. JohnWiley andSons,1963.[35] A. L. Yuille. A double-loopalgorithmto minimizetheBetheandKikuchi

freeenergies.Unpublished,2001.[36] M. Welling andY. W. Teh. Belief optimization: A stablealternative to

beliefpropagation.UAI-01, 2001.[37] H. J.KappenandW. Wiegerinck.Novel iterationschemesfor thecluster

variationmethod.In T. Dietterich,S.Becker, andZ. Ghahramani,editors,Advancesin Neural Information ProcessingSystems, volume 14. MITPress,2002.

[38] R. P. Stanley. Enumerative Combinatorics, volume1. CambridgeUni-versityPress,1997.

[39] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Characterizingbelief propagation and its generalizations. Available online athttp://www.merl.com/reports/TR2001-15/index.html, 2001.

20