FAU Thesis presentation
Transcript of FAU Thesis presentation
Introduction Theory Implementation Results Conclusion
Solving Stochastic PDEs with Approximate Gaussian Markov Random Fields Using Different Programming Environments
Kelvin Kwong Lam Loh
University of Erlangen-Nuremberg – System Simulation
August 13, 2014
Outline
1 Introduction
2 Theory
3 Implementation
4 Results
5 Conclusion
Introduction
Uncertainties are inherent in real-life systems.
Monte Carlo experiments are costly.
The cost comes from sample generation plus the slow convergence rate of the sampling.
Two recent methods to speed this up: GMRF approximation and MLMC.
Introduction - Objectives
Mathematics:
Validate the use of the two methods.
Explore the implementation of the two methods.
Determine potential performance enhancements.
Computational Engineering:
Implement the methods in different languages with varying abstraction levels.
Create an application for the ExaStencils project.
Theory - Problem Statement
Solve for U(x) given the random field a(x):

∇ · (e^{a(x)} ∇U(x)) = 0,  x ∈ Ω = [0, 1] × [0, 1]
Figure: Problem domain Ω = [0,1] × [0,1], corners (0,0), (1,0), (0,1), (1,1), with boundary values U_N = 1, U_S = 10, U_E = 5, U_W = 3
Theory - Example of a realization
Figure: Solution, U(x), and random coefficient, a(x), fields for a single realization, σ = 0.1, λ = 1
Theory - GMRF approximation
The covariance matrix of a Gaussian field is dense.
Approximate the Gaussian field by a Gaussian Markov random field via the SPDE:

(κ² − Δ)^{α/2} (τ a(x)) = W(x),  ∇a(x) · n = 0, x ∈ ∂Ω

τ² = Γ(ν) / (Γ(α) (4π)^{d/2} κ^{2ν} σ²),  κ = √(8ν) / λ,  ν = α − d/2
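These parameter relations can be evaluated numerically. A minimal sketch, assuming the standard Lindgren–Rue parameterization with α = 2 in d = 2 (so ν = 1), computing κ and τ from the Matérn standard deviation σ and correlation length λ:

```python
import math

def gmrf_params(sigma, lam, alpha=2, d=2):
    """Compute the SPDE scaling parameters (kappa, tau) from the
    Matern standard deviation sigma and correlation length lam."""
    nu = alpha - d / 2.0                      # nu = alpha - d/2
    kappa = math.sqrt(8.0 * nu) / lam         # kappa = sqrt(8*nu)/lambda
    tau2 = math.gamma(nu) / (math.gamma(alpha) * (4.0 * math.pi) ** (d / 2.0)
                             * kappa ** (2.0 * nu) * sigma ** 2)
    return kappa, math.sqrt(tau2)

kappa, tau = gmrf_params(sigma=0.1, lam=1.0)
```

With α = 2 and d = 2, the gamma functions drop out (Γ(1) = Γ(2) = 1), so τ² reduces to 1 / (4π κ² σ²).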
Theory - Multilevel Monte Carlo
For standard (single-level) Monte Carlo (SLMC):
The idea is to repeat independent trials until statistical convergence:

e(Q_{M,N}^{MC})² = V(Q_M)/N + (E[Q_M − Q])²

The two error terms are independent of each other.
Multilevel Monte Carlo (MLMC):
Use multiple levels!

E[Q_M] = E[Q_{M_0}] + Σ_{l=1}^{L} E[Q_{M_l} − Q_{M_{l−1}}] = Σ_{l=0}^{L} E[Y_l]

V(Q_M^{MLMC}) = Σ_{l=0}^{L} N_l^{−1} V(Y_l)
Theory - Multilevel Monte Carlo
e(Q_M^{MLMC})² := E[(Q_M^{MLMC} − E[Q])²] = Σ_{l=0}^{L} N_l^{−1} V(Y_l) + (E[Q_M − Q])²

If E[(Q_M − Q)²] → 0 as M → ∞, then V(Y_l) = V(Q_{M_l} − Q_{M_{l−1}}) → 0 as l → ∞. It is then possible to choose sample counts N_l that decrease as l → ∞.

For this thesis, Q_M = ‖U_M‖_2 / √M
Implementation - Finite Volume Discretization
Figure: FVM discretization of an interior cell, Ω_i, its neighbours Ω_{i±1} and Ω_{i±M_x}, and the faces λ_j^{(i)}, j = 1, …, 4, associated with the cell
Discretization example of the GMRF equation:

κ² a_i dΩ_i − (λ_1^{(i)} + λ_3^{(i)}) Δy − (λ_2^{(i)} + λ_4^{(i)}) Δx = Z_i √(dΩ_i)
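A sketch of how such a 5-point finite-volume system could be assembled for the shifted operator (κ² − Δ) on a uniform grid, with zero-flux (Neumann) boundaries handled by simply dropping the missing face fluxes. Grid sizes and parameters here are illustrative, not the thesis code:

```python
def assemble_gmrf_matrix(Mx, My, kappa, hx, hy):
    """Assemble the finite-volume matrix for (kappa^2 - Laplacian) on a
    uniform Mx-by-My cell grid: one row per cell, 5-point stencil,
    zero-flux boundaries by skipping missing neighbours."""
    n = Mx * My
    A = [[0.0] * n for _ in range(n)]
    cell = hx * hy
    for j in range(My):
        for i in range(Mx):
            k = j * Mx + i
            A[k][k] += kappa ** 2 * cell       # mass term: kappa^2 a_i dOmega_i
            for di, dj, t in ((1, 0, hy / hx), (-1, 0, hy / hx),
                              (0, 1, hx / hy), (0, -1, hx / hy)):
                ni, nj = i + di, j + dj
                if 0 <= ni < Mx and 0 <= nj < My:  # interior face: add flux
                    A[k][k] += t
                    A[k][nj * Mx + ni] -= t
    return A

A = assemble_gmrf_matrix(4, 3, 2.0, 0.1, 0.1)
```

Two quick sanity checks on the stencil: every row sums to κ²·dΩ (the flux part conserves), and the matrix is symmetric.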
Implementation - Domain Decomposition
Domain decomposition.
Static Master-Slave.
Dynamic Master-Slave.
Figure: Example domain decomposition of a matrix for PETSc using the mpiaij type. (Image taken from a PETSc presentation.)
Implementation - Static task assignment
Figure: The Static Master-Slave (MSS) strategy: the root assigns N_s/P samples up front to each of Slave Group 1 through Slave Group P [N_s = total number of samples, P = total number of worker groups]
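The static split can be sketched in one hypothetical helper, including the case where P does not divide N_s evenly:

```python
def static_partition(Ns, P):
    """Split Ns samples across P worker groups up front:
    each group gets Ns // P samples, the first Ns % P groups one extra."""
    base, extra = divmod(Ns, P)
    return [base + 1 if g < extra else base for g in range(P)]

# static_partition(1000, 4) -> [250, 250, 250, 250]
```

The split is fixed before any computation starts, which is exactly why a slow group can become the bottleneck.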
Implementation - Dynamic Work Pool
Figure: The Dynamic Master-Slave (MSD) strategy: the root keeps the N_s samples in a work pool and hands them out to Slave Group 1 through Slave Group P on demand [N_s = total number of samples, P = total number of worker groups]
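In contrast to the static split, the pool hands out work on demand, so faster groups automatically process more samples. A minimal thread-based sketch of the idea (threads standing in for the MPI slave groups; the function names are illustrative):

```python
import queue
import threading

def run_dynamic_pool(Ns, P, work):
    """Dynamic work pool: P workers repeatedly pull sample indices
    from a shared queue until it is empty."""
    tasks = queue.Queue()
    for s in range(Ns):
        tasks.put(s)
    results = [None] * Ns
    def worker():
        while True:
            try:
                s = tasks.get_nowait()   # grab the next sample, if any
            except queue.Empty:
                return                   # pool exhausted: worker retires
            results[s] = work(s)
    threads = [threading.Thread(target=worker) for _ in range(P)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

No worker ever idles while samples remain, which is why this strategy balances load well for Monte Carlo runs with uneven per-sample cost.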
Implementation - ExaStencils
Domain decomposition based.
GMRF approximation method with SLMC implemented in Layer 4.
Figure: Images taken from the ExaStencils paper (arXiv, 2014)
Results - Convergence Tests
Convergence tests for discretization and sampling errors.
Figure: Convergence test (grid error and SLMC sampling error)
Results - GMRF Validation
Figure: Covariance field at x = (0.51, 0.49) for the 10000 samples realized: exact Matérn vs. GMRF approximation
Results - GMRF Performance
Figure: Speedup of the GMRF approximation over the Cholesky decomposition approach
Results - GMRF Performance
Figure: Percentage of total wall-clock time, Cholesky decomposition vs. GMRF (4 procs, 1000 samples)
Results - MLMC Validation
For sampling error ε = 5 × 10⁻³:
SLMC gives E[Q_240] = 5.201.
MLMC gives E[Q_240^{MLMC}] = 5.209.
Absolute error 8 × 10⁻³ ≈ 5 × 10⁻³.
Figure: Number of samples
Figure: Number of samples
Results - MLMC Validation
Figure: MLMC sample case plots of the expectation, E[Y_l] = E[Q_l − Q_{l−1}], and variance, V(Y_l) = V(Q_l − Q_{l−1}), for σ = 0.1, λ = 0.3; finest grid size [240×240]; sampling error ε = 5 × 10⁻³
Results - MLMC Performance
Figure: Speedups for the MLMC, as a function of grid size and as a function of levels, using σ = 0.1, λ = 0.3, finest grid size [240×240], sampling error ε = 5 × 10⁻³
Results - Efficiency (PETSc)
Figure: Efficiency for different parallelization strategies, 1000 samples, [500×500] grid, 1 processor per slave group
Results - Processor bindings
Figure: Processor-bindings report from OpenMPI, balanced vs. unbalanced
Results - Jumpshot Visualization (PETSc)
Figure: Jumpshot analysis for 10 processors, balanced vs. unbalanced
Results - Jumpshot Visualization (PETSc)
Figure: Jumpshot visualization for MSD - 4 processors per slave group (1000×1000 grid size, 500 samples)
Results - ExaStencils
Recent results show E[Q_512] = 5.326; this needs further debugging.
Speedup is promising.
Figure: Expectation of U(x) for ExaStencils
Results - ExaStencils
Figure: Speedups for ExaStencils
Comparison between languages
Suppose we want to calculate res = b − Ax.

MATLAB example

residual = b - A*x;

PETSc example

ierr = MatMult(A,x,xtemp);CHKERRQ(ierr);
ierr = VecWAXPY(residual,-1,xtemp,b);CHKERRQ(ierr);

ExaStencils Layer 4 example

loop over inner on Residual@current {
  Residual@current = RHS@current - (Laplace@current * Solution[0]@current)
}
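The PETSc pair of calls computes the result in two steps: MatMult forms xtemp = A*x, and VecWAXPY(w, a, x, y) forms w = a*x + y, here residual = −1·xtemp + b. The same two-step structure in plain Python, purely as an illustration (not one of the thesis environments):

```python
def matvec(A, x):
    """xtemp = A*x for a dense matrix as list-of-rows (the MatMult step)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def waxpy(alpha, x, y):
    """w = alpha*x + y (the VecWAXPY step)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

A = [[4.0, -1.0], [-1.0, 4.0]]
x = [1.0, 2.0]
b = [2.0, 7.0]

xtemp = matvec(A, x)              # A*x = [2.0, 7.0]
residual = waxpy(-1.0, xtemp, b)  # b - A*x = [0.0, 0.0]
```

The MATLAB one-liner hides exactly these two operations; PETSc exposes them because it never materializes temporary expressions.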
Experiences With Different Languages
The Layer 4 language has a relatively shallow learning curve, provided documentation is available.
Comparable in user-friendliness to MATLAB, and much easier than the C/C++ of the PETSc implementation.
Still limited in application capability: no arbitrary grid sizes, and no custom parallelization strategies (e.g., Master-Slave for concurrent sample calculations).
Experiences With Different Languages
Performance of the target code is very good, and could be enhanced further if concurrent sample calculations were possible.
Program code management for the DSL Layer 4 version is easier than for the PETSc implementation (484 vs. 575 lines, and 1 source file vs. 5 source/header files).
Still more comfortable with PETSc or MATLAB, since their source code is easy to debug; not so for the generated target code, which comprises more than 130 source files.
Conclusion
The GMRF approximation and MLMC methods do speed up computation.
Both are consistent with the standard methods.
The Dynamic Master-Slave parallelization strategy works very well for Monte Carlo methods.
ExaStencils provides relatively "easy" coding for a significant performance gain.
Still more comfortable with C++ and/or MATLAB.