Post on 04-Feb-2023
Restricted Boltzmann Machines
Supplementary Notes to EIE4105 (Out of Syllabus)
M.W. Mak, enmwmak@polyu.edu.hk
http://www.eie.polyu.edu.hk/~mwmak

References (the equations in this file are obtained from):
1. Alex Krizhevsky, "Learning Multiple Layers of Features from Tiny Images," 2009.
Restricted Boltzmann Machines

[Figure: bipartite graph of visible nodes v_i (biases b_i^v) and hidden nodes h_j (biases b_j^h), connected by weights w_{ij}; no within-layer connections.]

• An RBM comprises visible nodes and hidden nodes.
• The operation of an RBM is governed by its energy function; for a Bernoulli-Bernoulli RBM,

  E(v, h) = -\sum_{i=1}^{V} b_i^v v_i - \sum_{j=1}^{H} b_j^h h_j - \sum_{i=1}^{V} \sum_{j=1}^{H} v_i h_j w_{ij}

• In the above diagram, V = 2 and H = 3.
Restricted Boltzmann Machines

• Given v = [v_1, ..., v_V]^T and h = [h_1, ..., h_H]^T, the joint probability of v and h is

  p(v, h) = \frac{e^{-E(v,h)}}{\sum_{u} \sum_{g} e^{-E(u,g)}}

• Intuitively, configurations (v, h) leading to low (high) energy are assigned high (low) probability.
• Marginalizing over h and v, respectively, we have

  p(v) = \sum_{g} p(v, g)  and  p(h) = \sum_{u} p(u, h)
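The joint distribution above can be checked numerically for a tiny RBM. The sketch below (not from the notes; the standard BB-RBM energy and the parameter shapes are my assumptions) enumerates all 2^V · 2^H configurations to compute the normalizing denominator and verifies that the probabilities sum to one:

```python
import itertools
import numpy as np

V, H = 2, 3                                # V = 2 and H = 3, as in the diagram
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((V, H))      # weights w_ij
b_v = np.zeros(V)                          # visible biases b^v_i
b_h = np.zeros(H)                          # hidden biases b^h_j

def energy(v, h):
    # E(v,h) = -sum_i b^v_i v_i - sum_j b^h_j h_j - sum_{ij} v_i h_j w_ij
    return -b_v @ v - b_h @ h - v @ W @ h

# Denominator: sum over all 2^V * 2^H configurations (u, g)
configs = [(np.array(u, float), np.array(g, float))
           for u in itertools.product([0, 1], repeat=V)
           for g in itertools.product([0, 1], repeat=H)]
Z = sum(np.exp(-energy(u, g)) for u, g in configs)

def p_joint(v, h):
    return np.exp(-energy(v, h)) / Z

total = sum(p_joint(u, g) for u, g in configs)  # should be 1 up to rounding
```

Brute-force enumeration is only feasible for toy sizes; for realistic V and H the denominator is intractable, which is what motivates the contrastive-divergence approximation later in these notes.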
Restricted Boltzmann Machines

• Conditional probabilities:

  p(v | h) = \frac{e^{-E(v,h)}}{\sum_{u} e^{-E(u,h)}}

• p(v | h) is difficult to evaluate because there are many different u's in the denominator.
• However, it is possible to derive a closed-form solution for the single-unit conditional:

  p(h_j = 1 | v) = \frac{1}{1 + e^{-(\sum_i v_i w_{ij} + b_j^h)}}
Restricted Boltzmann Machines

• Similarly,

  p(v_i = 1 | h) = \frac{1}{1 + e^{-(\sum_j h_j w_{ij} + b_i^v)}}

• Therefore, the conditional probability that a hidden unit is on is independent of the other hidden units.
• This property makes RBM training very efficient.
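Because the units are conditionally independent, both conditionals vectorize into a single sigmoid per layer. A minimal NumPy sketch (the V×H weight-matrix layout is an assumption of mine):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, b_h):
    # p(h_j = 1 | v) = sigmoid(sum_i v_i w_ij + b^h_j), all j at once
    return sigmoid(v @ W + b_h)

def p_v_given_h(h, W, b_v):
    # p(v_i = 1 | h) = sigmoid(sum_j h_j w_ij + b^v_i), all i at once
    return sigmoid(W @ h + b_v)

v = np.array([1.0, 0.0])
W = np.array([[0.5, -0.2, 0.1],
              [0.3,  0.4, -0.6]])
b_h = np.zeros(3)
probs = p_h_given_v(v, W, b_h)  # three independent Bernoulli probabilities
```

Each entry of `probs` can be sampled independently, which is why a whole hidden layer can be updated in one parallel step.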
Training of RBMs

• The first term:

  \sum_{g} v_i^c g_j \left[ \frac{e^{-E(v^c, g)}}{\sum_{g'} e^{-E(v^c, g')}} \right]
  = \sum_{g} v_i^c g_j \, p(g | v^c)
  = v_i^c \langle g_j | v^c \rangle
  = v_i^c \, p(g_j = 1 | v^c)
  = v_i^c \, p(h_j = 1 | v^c)
Training of RBMs

• The second term:

  \sum_{u} \sum_{g} u_i g_j \left[ \frac{e^{-E(u, g)}}{\sum_{u'} \sum_{g'} e^{-E(u', g')}} \right]
  = \langle u_i g_j \rangle_{p(u,g)}
  = \langle v_i h_j \rangle_{p(v,h)}
Training of RBMs

• The second term is difficult to compute because there is no closed-form solution for p(v, h).
• In practice, the 2nd term can be approximated by 1-step contrastive divergence (CD-1):
Training of RBMs

1. Assign the visible units: v_i = v_i^c, i = 1, ..., V
2. Compute the hidden activations and the hidden-node pmf:

   h_j = \sum_{i=1}^{V} v_i w_{ij} + b_j^h,
   p(h_j = 1 | v^c) = \frac{1}{1 + e^{-(\sum_i v_i w_{ij} + b_j^h)}}

3. Sample the hidden nodes from the pmf to obtain a (binary) sample h.
4. Reconstruct v based on the sampled binary h:

   v_i^{rec} = \frac{1}{1 + e^{-(\sum_j h_j w_{ij} + b_i^v)}}

5. Compute the hidden-node pmf based on the reconstructed v:

   p(h_j = 1 | v^{rec}) = \frac{1}{1 + e^{-(\sum_i v_i^{rec} w_{ij} + b_j^h)}}

6. Compute the approximated expectation (2nd term):

   \langle v_i h_j \rangle_{p(v,h)} \approx v_i^{rec} \, p(h_j = 1 | v^{rec})
Training of RBMs

• Combining the 1st and 2nd terms, training an RBM using CD-1 amounts to:

  \Delta w_{ij} = \epsilon_w \left[ \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right]
               = \epsilon_w \sum_{c=1}^{C} \left[ v_i^c \, p(h_j = 1 | v^c) - v_i^{rec} \, p(h_j = 1 | v^{rec}) \right]

  where the superscript "rec" means reconstructing v using the c-th training vector as input.
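The CD-1 steps and the weight update can be sketched together as follows. This is a minimal NumPy illustration under assumed shapes (W is V×H, training vectors are rows of V_data), not the notes' reference code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(V_data, W, b_v, b_h, eps_w, rng):
    """One CD-1 weight update accumulated over a batch of C training vectors."""
    dW = np.zeros_like(W)
    for v_c in V_data:
        p_h = sigmoid(v_c @ W + b_h)              # steps 1-2: p(h_j = 1 | v^c)
        h = (rng.random(p_h.shape) < p_h) * 1.0   # step 3: binary sample of h
        v_rec = sigmoid(W @ h + b_v)              # step 4: reconstruct v
        p_h_rec = sigmoid(v_rec @ W + b_h)        # step 5: p(h_j = 1 | v^rec)
        # step 6 + update: v^c_i p(h_j=1|v^c) - v^rec_i p(h_j=1|v^rec)
        dW += np.outer(v_c, p_h) - np.outer(v_rec, p_h_rec)
    return eps_w * dW

rng = np.random.default_rng(0)
V_data = rng.integers(0, 2, size=(8, 4)).astype(float)  # C=8 binary vectors, V=4
W = 0.01 * rng.standard_normal((4, 3))                  # V=4 visible, H=3 hidden
dW = cd1_update(V_data, W, np.zeros(4), np.zeros(3), eps_w=0.1, rng=rng)
```

In practice the bias updates (driven by v and p(h|v) in the same way) would be accumulated alongside dW; they are omitted here to match the slide's weight-only formula.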
Gaussian-Bernoulli RBMs

• For real-valued rather than binary inputs, the visible nodes are assumed to follow Gaussian distributions.
• Energy function of a Gaussian-Bernoulli RBM (GB-RBM):

  E(v, h) = \sum_{i=1}^{V} \frac{(v_i - b_i^v)^2}{2\sigma_i^2} - \sum_{j=1}^{H} b_j^h h_j - \sum_{i=1}^{V} \sum_{j=1}^{H} \frac{v_i h_j}{\sigma_i} w_{ij}

• Energy function of a Bernoulli-Bernoulli RBM (BB-RBM):

  E(v, h) = -\sum_{i=1}^{V} b_i^v v_i - \sum_{j=1}^{H} b_j^h h_j - \sum_{i=1}^{V} \sum_{j=1}^{H} v_i h_j w_{ij}

• In a GB-RBM, each visible node adds a parabolic offset to the energy function, with the width of the parabola controlled by σ_i.
Gaussian-Bernoulli RBMs

• Training of GB-RBMs is similar to that of BB-RBMs.
• 1st term of the derivative of the log-probability:

  \frac{1}{\sigma_i} v_i^c \, p(h_j = 1 | v^c)
Gaussian-Bernoulli RBMs

• 2nd term of the derivative of the log-probability:

  \frac{1}{\sigma_i} v_i^{rec} \, p(h_j = 1 | v^{rec})

• Update formula:

  \Delta w_{ij} = \frac{\epsilon_w}{\sigma_i} \left[ \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right]
               = \frac{\epsilon_w}{\sigma_i} \sum_{c=1}^{C} \left[ v_i^c \, p(h_j = 1 | v^c) - v_i^{rec} \, p(h_j = 1 | v^{rec}) \right]
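Compared with the BB-RBM case, the only change in the weight update is the per-visible-unit 1/σ_i scaling. A short sketch (names `pos` and `neg` are my own, denoting the V×H matrices of accumulated data and reconstruction statistics):

```python
import numpy as np

def gb_rbm_weight_update(pos, neg, sigma, eps_w):
    # pos[i, j] = sum_c v^c_i p(h_j = 1 | v^c); neg likewise from reconstructions.
    # Row i is scaled by 1/sigma_i, matching Delta w_ij = (eps_w / sigma_i)[...]
    return (eps_w / sigma[:, None]) * (pos - neg)

# Toy usage: two visible units with sigma = 1 and sigma = 2
dW = gb_rbm_weight_update(np.ones((2, 3)), np.zeros((2, 3)),
                          sigma=np.array([1.0, 2.0]), eps_w=0.1)
```

Visible units with larger σ_i thus receive proportionally smaller weight updates.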
Bernoulli-Gaussian RBMs

• For real-valued rather than binary outputs, the hidden nodes are assumed to follow Gaussian distributions.
• Energy function of a Bernoulli-Gaussian RBM (BG-RBM):

  E(v, h) = \sum_{j=1}^{H} \frac{(h_j - b_j^h)^2}{2\sigma_j^2} - \sum_{i=1}^{V} b_i^v v_i - \sum_{i=1}^{V} \sum_{j=1}^{H} \frac{v_i h_j}{\sigma_j} w_{ij}

• Energy function of a Bernoulli-Bernoulli RBM (BB-RBM):

  E(v, h) = -\sum_{i=1}^{V} b_i^v v_i - \sum_{j=1}^{H} b_j^h h_j - \sum_{i=1}^{V} \sum_{j=1}^{H} v_i h_j w_{ij}

• In a BG-RBM, each hidden node adds a parabolic offset to the energy function, with the width of the parabola controlled by σ_j.
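The BG-RBM energy can be transcribed term by term into NumPy. The shapes below are assumptions (v of length V, h of length H, W of shape V×H, σ of length H):

```python
import numpy as np

def bg_rbm_energy(v, h, W, b_v, b_h, sigma):
    quad = np.sum((h - b_h) ** 2 / (2.0 * sigma ** 2))        # sum_j (h_j - b^h_j)^2 / (2 sigma_j^2)
    linear = np.dot(b_v, v)                                   # sum_i b^v_i v_i
    interact = np.sum((v[:, None] * h[None, :]) * W / sigma)  # sum_ij v_i h_j w_ij / sigma_j
    return quad - linear - interact
```

As a sanity check on the parabolic-offset remark above: when every h_j equals its bias b_j^h, the quadratic term vanishes and (with zero weights) the energy reduces to -Σ_i b_i^v v_i.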