Restricted Boltzmann Machines - PolyU



1

Restricted Boltzmann Machines Supplementary Notes to EIE4105 (Out of Syllabus)

M.W. Mak [email protected]

http://www.eie.polyu.edu.hk/~mwmak

References (the equations in these notes are taken from):

1. Alex Krizhevsky, "Learning Multiple Layers of Features from Tiny Images," 2009.

2

Restricted Boltzmann Machines
• An RBM comprises visible nodes and hidden nodes.
• The operation of an RBM is governed by its energy function:

E(\mathbf{v},\mathbf{h}) = -\sum_{i=1}^{V} b_i^v v_i - \sum_{j=1}^{H} b_j^h h_j - \sum_{i=1}^{V}\sum_{j=1}^{H} v_i h_j w_{ij}

[Figure: bipartite graph of visible nodes v_i and hidden nodes h_j connected by weights w_ij, with visible biases b_i^v and hidden biases b_j^h.]

• In the above diagram, V = 2 and H = 3.
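As a concrete check, the Bernoulli-Bernoulli energy E(v,h) = -Σ_i b_i^v v_i - Σ_j b_j^h h_j - Σ_ij v_i h_j w_ij can be evaluated directly for a V = 2, H = 3 network; a minimal sketch, with arbitrary illustrative weights and biases:

```python
# Energy of a Bernoulli-Bernoulli RBM:
#   E(v, h) = -sum_i b_i^v v_i - sum_j b_j^h h_j - sum_ij v_i h_j w_ij
# V = 2 visible and H = 3 hidden units, as in the slide's diagram.

def rbm_energy(v, h, W, b_v, b_h):
    """W[i][j] is the weight between visible unit i and hidden unit j."""
    vis = sum(b_v[i] * v[i] for i in range(len(v)))
    hid = sum(b_h[j] * h[j] for j in range(len(h)))
    pair = sum(v[i] * h[j] * W[i][j]
               for i in range(len(v)) for j in range(len(h)))
    return -vis - hid - pair

W   = [[0.5, -0.2, 0.1],   # weights from v1 (arbitrary illustrative values)
       [0.3,  0.4, -0.6]]  # weights from v2
b_v = [0.1, -0.3]          # visible biases
b_h = [0.2, 0.0, -0.1]     # hidden biases

E = rbm_energy([1, 0], [1, 1, 0], W, b_v, b_h)
```

Only the units that are "on" contribute, so a configuration with all units off always has zero energy.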

3

Restricted Boltzmann Machines
• Given \mathbf{v} = [v_1, \ldots, v_V]^T and \mathbf{h} = [h_1, \ldots, h_H]^T, the joint probability of \mathbf{v} and \mathbf{h} is

p(\mathbf{v},\mathbf{h}) = \frac{e^{-E(\mathbf{v},\mathbf{h})}}{\sum_{\mathbf{u}}\sum_{\mathbf{g}} e^{-E(\mathbf{u},\mathbf{g})}}

• Intuitively, configurations (\mathbf{v},\mathbf{h}) leading to low (high) energy are assigned high (low) probability.
• Marginalizing over \mathbf{h} and over \mathbf{v}, respectively, we have

p(\mathbf{v}) = \sum_{\mathbf{g}} p(\mathbf{v},\mathbf{g}), \qquad p(\mathbf{h}) = \sum_{\mathbf{u}} p(\mathbf{u},\mathbf{h})
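For a model this small (V = 2, H = 3), the partition function in the denominator can be evaluated by brute-force enumeration of all 2^V · 2^H binary configurations, which makes the low-energy/high-probability relationship easy to verify; a sketch with arbitrary parameters, feasible only because the model is tiny:

```python
# Brute-force p(v, h) = exp(-E(v, h)) / Z for a tiny Bernoulli-Bernoulli RBM.
# Z sums exp(-E) over all 2^V * 2^H binary configurations.
from itertools import product
from math import exp

def energy(v, h, W, b_v, b_h):
    return -(sum(b_v[i] * v[i] for i in range(len(v)))
             + sum(b_h[j] * h[j] for j in range(len(h)))
             + sum(v[i] * h[j] * W[i][j]
                   for i in range(len(v)) for j in range(len(h))))

def joint_prob(v, h, W, b_v, b_h):
    V, H = len(b_v), len(b_h)
    Z = sum(exp(-energy(u, g, W, b_v, b_h))
            for u in product([0, 1], repeat=V)
            for g in product([0, 1], repeat=H))
    return exp(-energy(v, h, W, b_v, b_h)) / Z

W   = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]  # arbitrary illustrative parameters
b_v = [0.1, -0.3]
b_h = [0.2, 0.0, -0.1]

# The 32 joint probabilities form a valid distribution (they sum to 1).
total = sum(joint_prob(v, h, W, b_v, b_h)
            for v in product([0, 1], repeat=2)
            for h in product([0, 1], repeat=3))
```

A lower-energy configuration always receives a higher probability, since exp(-E) is decreasing in E while Z is shared.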

4

Restricted Boltzmann Machines
• Conditional probabilities:

p(\mathbf{v}|\mathbf{h}) = \frac{p(\mathbf{v},\mathbf{h})}{p(\mathbf{h})} = \frac{e^{-E(\mathbf{v},\mathbf{h})}}{\sum_{\mathbf{u}} e^{-E(\mathbf{u},\mathbf{h})}}, \qquad p(\mathbf{h}|\mathbf{v}) = \frac{p(\mathbf{v},\mathbf{h})}{p(\mathbf{v})} = \frac{e^{-E(\mathbf{v},\mathbf{h})}}{\sum_{\mathbf{g}} e^{-E(\mathbf{v},\mathbf{g})}}

• p(\mathbf{v}|\mathbf{h}) is difficult to evaluate because there are many different \mathbf{u}'s in the denominator.
• However, it is possible to derive a closed-form solution for p(v_i = 1|\mathbf{h}).

5

Restricted Boltzmann Machines
• Substituting the energy function into p(\mathbf{v}|\mathbf{h}) and summing out each visible unit shows that p(\mathbf{v}|\mathbf{h}) factorizes over the visible units, giving the closed form

p(v_i = 1|\mathbf{h}) = \frac{1}{1 + e^{-\left(\sum_{j=1}^{H} h_j w_{ij} + b_i^v\right)}}

6

Restricted Boltzmann Machines
• Therefore, the conditional probability that a visible unit is on is independent of the other visible units.

7

Restricted Boltzmann Machines
• Similarly:

p(h_j = 1|\mathbf{v}) = \frac{1}{1 + e^{-\left(\sum_{i=1}^{V} v_i w_{ij} + b_j^h\right)}}

• Therefore, the conditional probability that a hidden unit is on is independent of the other hidden units.
• This property makes RBM training very efficient.
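Both factorized conditionals reduce to one sigmoid per unit, e.g. p(h_j = 1|v) = sigmoid(Σ_i v_i w_ij + b_j^h), so inference in each direction is a single matrix-style pass; a minimal sketch with arbitrary parameters:

```python
# Factorized RBM conditionals:
#   p(h_j = 1 | v) = sigmoid(sum_i v_i * w_ij + b_j^h)
#   p(v_i = 1 | h) = sigmoid(sum_j h_j * w_ij + b_i^v)
# Each unit's probability depends only on the opposite layer, never on its own layer.
from math import exp

def sigmoid(a):
    return 1.0 / (1.0 + exp(-a))

def p_h_given_v(v, W, b_h):
    return [sigmoid(sum(v[i] * W[i][j] for i in range(len(v))) + b_h[j])
            for j in range(len(b_h))]

def p_v_given_h(h, W, b_v):
    return [sigmoid(sum(h[j] * W[i][j] for j in range(len(h))) + b_v[i])
            for i in range(len(b_v))]

W   = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]  # arbitrary illustrative parameters
b_v = [0.1, -0.3]
b_h = [0.2, 0.0, -0.1]

probs_h = p_h_given_v([1, 0], W, b_h)  # one probability per hidden unit
```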

8

Training of RBMs
• Given C training vectors \{\mathbf{v}^c\}_{c=1}^{C}, we aim to maximize the log probability

\mathcal{L} = \sum_{c=1}^{C} \log p(\mathbf{v}^c)

• Using gradient ascent with learning rate \epsilon_w:

w_{ij} \leftarrow w_{ij} + \epsilon_w \sum_{c=1}^{C} \frac{\partial \log p(\mathbf{v}^c)}{\partial w_{ij}}

where each derivative comprises a first (data-dependent) term and a second (model-dependent) term.

9

Training of RBMs
• The first term:

\sum_{\mathbf{g}} v_i^c g_j \left[ \frac{e^{-E(\mathbf{v}^c,\mathbf{g})}}{\sum_{\mathbf{g}'} e^{-E(\mathbf{v}^c,\mathbf{g}')}} \right] = \sum_{\mathbf{g}} v_i^c g_j\, p(\mathbf{g}|\mathbf{v}^c) = v_i^c \langle g_j | \mathbf{v}^c \rangle = v_i^c\, p(g_j = 1|\mathbf{v}^c) = v_i^c\, p(h_j = 1|\mathbf{v}^c)

10

Training of RBMs
• The second term:

\sum_{\mathbf{u}} \sum_{\mathbf{g}} u_i g_j \left[ \frac{e^{-E(\mathbf{u},\mathbf{g})}}{\sum_{\mathbf{u}'}\sum_{\mathbf{g}'} e^{-E(\mathbf{u}',\mathbf{g}')}} \right] = \langle u_i g_j \rangle_{p(\mathbf{u},\mathbf{g})} = \langle v_i h_j \rangle_{p(\mathbf{v},\mathbf{h})}

11

Training of RBMs
• The second term is difficult to compute because there is no closed-form solution for p(\mathbf{v},\mathbf{h}).
• In practice, the 2nd term can be approximated by 1-step contrastive divergence (CD-1):

12

Training of RBMs
1. Assign visible units: v_i \leftarrow v_i^c, \; i = 1, \ldots, V
2. Compute the hidden activations and the hidden-node pmf:

h_j = \sum_{i=1}^{V} v_i w_{ij} + b_j^h, \qquad p(h_j = 1|\mathbf{v}^c) = \frac{1}{1 + e^{-\left(\sum_i v_i w_{ij} + b_j^h\right)}}

3. Sample each hidden node from this pmf to obtain a binary sample \mathbf{h}
4. Reconstruct \mathbf{v} based on the sampled binary \mathbf{h}:

v_i^{\text{rec}} = \frac{1}{1 + e^{-\left(\sum_j h_j w_{ji} + b_i^v\right)}}

5. Compute the hidden-node pmf based on the reconstructed \mathbf{v}:

p(h_j = 1|\mathbf{v}^{\text{rec}}) = \frac{1}{1 + e^{-\left(\sum_i v_i^{\text{rec}} w_{ji} + b_j^h\right)}}

6. Compute the approximated expectation (2nd term):

\langle v_i h_j \rangle_{p(\mathbf{v},\mathbf{h})} \approx v_i^{\text{rec}}\, p(h_j = 1|\mathbf{v}^{\text{rec}})
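The six steps above can be sketched as one function that returns both expectation estimates for a single training vector; a minimal sketch with arbitrary parameters (the reconstruction in step 4 is kept real-valued, as in the slide's formula, and `random.random()` supplies the sampling in step 3):

```python
# One CD-1 step for a Bernoulli-Bernoulli RBM, following steps 1-6 above.
import random
from math import exp

def sigmoid(a):
    return 1.0 / (1.0 + exp(-a))

def cd1_step(v_c, W, b_v, b_h, rng=random.random):
    V, H = len(b_v), len(b_h)
    # Steps 1-2: clamp the training vector and compute p(h_j = 1 | v^c).
    p_h = [sigmoid(sum(v_c[i] * W[i][j] for i in range(V)) + b_h[j])
           for j in range(H)]
    # Step 3: sample binary hidden states from the pmf.
    h = [1 if rng() < p_h[j] else 0 for j in range(H)]
    # Step 4: reconstruct the visible layer from the binary sample.
    v_rec = [sigmoid(sum(h[j] * W[i][j] for j in range(H)) + b_v[i])
             for i in range(V)]
    # Step 5: hidden pmf from the reconstruction.
    p_h_rec = [sigmoid(sum(v_rec[i] * W[i][j] for i in range(V)) + b_h[j])
               for j in range(H)]
    # Step 6: <v_i h_j>_data and the CD-1 approximation of <v_i h_j>_model.
    pos = [[v_c[i] * p_h[j] for j in range(H)] for i in range(V)]
    neg = [[v_rec[i] * p_h_rec[j] for j in range(H)] for i in range(V)]
    return pos, neg

W   = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]  # arbitrary illustrative parameters
b_v = [0.1, -0.3]
b_h = [0.2, 0.0, -0.1]

pos, neg = cd1_step([1, 0], W, b_v, b_h)
```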

13

Training of RBMs

14

Training of RBMs
• Combining the 1st and 2nd terms, training an RBM using CD-1 amounts to:

\Delta w_{ij} = \epsilon_w \left[ \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right] = \epsilon_w \sum_{c=1}^{C} \left[ v_i^c\, p(h_j = 1|\mathbf{v}^c) - v_i^{\text{rec}}\, p(h_j = 1|\mathbf{v}^{\text{rec}}) \right]

where the superscript "rec" denotes the reconstruction of \mathbf{v} obtained using the c-th training vector as input.
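The update is simply the per-vector CD-1 statistics accumulated over the C training vectors; a minimal sketch of the accumulation (the per-vector statistics are assumed to come from the CD-1 procedure on the preceding slides, and the numbers and learning rate below are made-up placeholders):

```python
# CD-1 weight update:
#   delta_w_ij = eps * sum_c [ v_i^c p(h_j=1|v^c) - v_i^rec p(h_j=1|v^rec) ]
# pos_stats[c][i][j] and neg_stats[c][i][j] hold the data and model statistics
# for training vector c; the values here are placeholders for illustration.

def cd1_update(pos_stats, neg_stats, eps):
    """Accumulate the CD-1 gradient over all training vectors."""
    V, H = len(pos_stats[0]), len(pos_stats[0][0])
    delta = [[0.0] * H for _ in range(V)]
    for pos, neg in zip(pos_stats, neg_stats):
        for i in range(V):
            for j in range(H):
                delta[i][j] += eps * (pos[i][j] - neg[i][j])
    return delta

# C = 2 training vectors, V = 2 visible units, H = 3 hidden units.
pos_stats = [[[0.9, 0.4, 0.6], [0.0, 0.0, 0.0]],
             [[0.2, 0.7, 0.1], [0.8, 0.3, 0.5]]]
neg_stats = [[[0.7, 0.5, 0.4], [0.1, 0.1, 0.1]],
             [[0.3, 0.6, 0.2], [0.6, 0.4, 0.4]]]
delta_w = cd1_update(pos_stats, neg_stats, eps=0.05)  # eps chosen arbitrarily
```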

15

Gaussian-Bernoulli RBMs
• For real-valued rather than binary input, the visible nodes are assumed to follow Gaussian distributions.
• Energy function of the Gaussian-Bernoulli RBM (GB-RBM):

E(\mathbf{v},\mathbf{h}) = \sum_{i=1}^{V} \frac{(v_i - b_i^v)^2}{2\sigma_i^2} - \sum_{j=1}^{H} b_j^h h_j - \sum_{i=1}^{V}\sum_{j=1}^{H} \frac{v_i h_j}{\sigma_i} w_{ij}

• Energy function of the Bernoulli-Bernoulli RBM (BB-RBM):

E(\mathbf{v},\mathbf{h}) = -\sum_{i=1}^{V} b_i^v v_i - \sum_{j=1}^{H} b_j^h h_j - \sum_{i=1}^{V}\sum_{j=1}^{H} v_i h_j w_{ij}

• In a GB-RBM, each visible node adds a parabolic offset to the energy function, with the width of the parabola controlled by \sigma_i.

16

Gaussian-Bernoulli RBMs
• It can be shown that

p(\mathbf{v}|\mathbf{h}) = \mathcal{N}(\mathbf{v}; \boldsymbol{\mu}_{v|h}, \boldsymbol{\Sigma}_{v|h})

17

Gaussian-Bernoulli RBMs
• Training of GB-RBMs is similar to that of BB-RBMs.
• 1st term of the derivative of the log-probability:

\frac{1}{\sigma_i}\, v_i^c\, p(h_j = 1|\mathbf{v}^c)

18

Gaussian-Bernoulli RBMs
• 2nd term of the derivative of the log-probability:

\frac{1}{\sigma_i}\, v_i^{\text{rec}}\, p(h_j = 1|\mathbf{v}^{\text{rec}})

• Update formula:

\Delta w_{ij} = \frac{\epsilon_w}{\sigma_i} \left[ \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right] = \frac{\epsilon_w}{\sigma_i} \sum_{c=1}^{C} \left[ v_i^c\, p(h_j = 1|\mathbf{v}^c) - v_i^{\text{rec}}\, p(h_j = 1|\mathbf{v}^{\text{rec}}) \right]

19

Gaussian-Bernoulli RBMs

20

Bernoulli-Gaussian RBMs
• For real-valued rather than binary output, the hidden nodes are assumed to follow Gaussian distributions.
• Energy function of the Bernoulli-Gaussian RBM (BG-RBM):

E(\mathbf{v},\mathbf{h}) = \sum_{j=1}^{H} \frac{(h_j - b_j^h)^2}{2\sigma_j^2} - \sum_{i=1}^{V} b_i^v v_i - \sum_{i=1}^{V}\sum_{j=1}^{H} \frac{v_i h_j}{\sigma_j} w_{ij}

• Energy function of the Bernoulli-Bernoulli RBM (BB-RBM):

E(\mathbf{v},\mathbf{h}) = -\sum_{i=1}^{V} b_i^v v_i - \sum_{j=1}^{H} b_j^h h_j - \sum_{i=1}^{V}\sum_{j=1}^{H} v_i h_j w_{ij}

• In a BG-RBM, each hidden node adds a parabolic offset to the energy function, with the width of the parabola controlled by \sigma_j.

21

Bernoulli-Gaussian RBMs
• It can be shown that

p(v_k = 1|\mathbf{h}) = \frac{1}{1 + e^{-\left(\sum_{j=1}^{H} \frac{h_j w_{jk}}{\sigma_j} + b_k^v\right)}}

p(\mathbf{h}|\mathbf{v}) = \mathcal{N}(\mathbf{h}; \boldsymbol{\mu}_{h|v}, \boldsymbol{\Sigma}_{h|v})

where the j-th component of \boldsymbol{\mu}_{h|v} is b_j^h + \sigma_j \sum_{i=1}^{V} v_i w_{ji}.
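Given the mean expression above, sampling the Gaussian hidden layer of a BG-RBM reduces to computing each mean b_j^h + σ_j Σ_i v_i w_ji and adding Gaussian noise; a minimal sketch with arbitrary parameters, assuming a diagonal Σ_{h|v} with variances σ_j²:

```python
# BG-RBM hidden conditional p(h|v) = N(h; mu, Sigma):
#   mu_j = b_j^h + sigma_j * sum_i v_i * w_ji
# A diagonal covariance with variances sigma_j^2 is assumed here (illustrative).
import random

def hidden_means(v, W, b_h, sigma):
    """W[i][j] connects visible unit i to hidden unit j."""
    return [b_h[j] + sigma[j] * sum(v[i] * W[i][j] for i in range(len(v)))
            for j in range(len(b_h))]

def sample_hidden(v, W, b_h, sigma, rng=random.Random(0)):
    # Draw each Gaussian hidden unit independently around its conditional mean.
    return [rng.gauss(mu, s) for mu, s in zip(hidden_means(v, W, b_h, sigma), sigma)]

W     = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]  # arbitrary illustrative parameters
b_h   = [0.2, 0.0, -0.1]
sigma = [1.0, 0.5, 2.0]

mu = hidden_means([1, 0], W, b_h, sigma)
```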