FAU Thesis presentation
Transcript of FAU Thesis presentation
Introduction Theory Implementation Results Conclusion
Solving Stochastic PDEs with Approximate Gaussian Markov Random Fields Using Different Programming Environments
Kelvin Kwong Lam Loh
University of Erlangen-Nuremberg – System Simulation
August 13, 2014
Outline
1 Introduction
2 Theory
3 Implementation
4 Results
5 Conclusion
Introduction
Uncertainties are inherent in real-life systems.
Monte Carlo experiments are costly.
The cost comes from sample generation plus the slow convergence rate of the sampling.
Two recent methods to speed this up: GMRF approximation and MLMC.
Introduction - Objectives
Mathematics:
Validate the use of the two methods.
Explore the implementation of the two methods.
Determine potential performance enhancements.
Computational Engineering:
Implement the methods in different languages with varying abstraction levels.
Create an application for the ExaStencils project.
Theory - Problem Statement
Solve for U(x) given the random field a(x):

∇ · (e^{a(x)} ∇U(x)) = 0,  x ∈ Ω = [0, 1] × [0, 1]
Figure: Problem domain Ω = [0,1] × [0,1], corners (0,0), (1,0), (0,1), (1,1), with boundary values U_N = 1, U_S = 10, U_E = 5, U_W = 3
Theory - Example of a realization
Figure: Solution, U(x), and random coefficient, a(x), fields for a single realization, σ = 0.1, λ = 1
Theory - GMRF approximation
The covariance matrix of a Gaussian field is dense.
Approximate the Gaussian field by a Gaussian Markov random field via the SPDE:

(κ² − Δ)^{α/2} (τ a(x)) = W(x),  ∇a(x) · n = 0, x ∈ ∂Ω

τ² = Γ(ν) / (Γ(α) (4π)^{d/2} κ^{2ν} σ²),  κ = √(8ν) / λ,  ν = α − d/2
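These parameter relations can be evaluated numerically. A minimal sketch, assuming the standard Lindgren–Rue parameterization with α = 2 in d = 2 (so ν = 1), computing κ and τ from the Matérn standard deviation σ and correlation length λ:

```python
import math

def gmrf_params(sigma, lam, alpha=2, d=2):
    """Compute the SPDE scaling parameters (kappa, tau) from the
    Matern standard deviation sigma and correlation length lam."""
    nu = alpha - d / 2.0                      # nu = alpha - d/2
    kappa = math.sqrt(8.0 * nu) / lam         # kappa = sqrt(8*nu)/lambda
    tau2 = math.gamma(nu) / (math.gamma(alpha) * (4.0 * math.pi) ** (d / 2.0)
                             * kappa ** (2.0 * nu) * sigma ** 2)
    return kappa, math.sqrt(tau2)

kappa, tau = gmrf_params(sigma=0.1, lam=1.0)
```

With α = 2 and d = 2, the gamma functions drop out (Γ(1) = Γ(2) = 1), so τ² reduces to 1 / (4π κ² σ²).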
Theory - Multilevel Monte Carlo
For standard (single-level) Monte Carlo (SLMC):
The idea is to repeat independent trials until statistical convergence:

e(Q_{M,N}^{MC})² = V(Q_M)/N + (E[Q_M − Q])²

The two error terms are independent of each other.
Multilevel Monte Carlo (MLMC):
Use multiple levels!

E[Q_M] = E[Q_{M_0}] + Σ_{l=1}^{L} E[Q_{M_l} − Q_{M_{l−1}}] = Σ_{l=0}^{L} E[Y_l]

V(Q_M^{MLMC}) = Σ_{l=0}^{L} N_l^{−1} V(Y_l)
Theory - Multilevel Monte Carlo
e(Q_M^{MLMC})² := E[(Q_M^{MLMC} − E[Q])²] = Σ_{l=0}^{L} N_l^{−1} V(Y_l) + (E[Q_M − Q])²

If E[(Q_M − Q)²] → 0 as M → ∞, then V(Y_l) = V(Q_{M_l} − Q_{M_{l−1}}) → 0 as l → ∞. It is then possible to choose sample counts N_l that decrease as l → ∞.

For this thesis, Q_M = ‖U_M‖_2 / √M
Implementation - Finite Volume Discretization
Figure: FVM discretization of an interior cell, Ω_i, its neighbours Ω_{i±1} and Ω_{i±M_x}, and the faces λ_j^{(i)}, j = 1, …, 4, associated with the cell
Discretization example of the GMRF equation:

κ² a_i dΩ_i − (λ_1^{(i)} + λ_3^{(i)}) Δy − (λ_2^{(i)} + λ_4^{(i)}) Δx = Z_i √(dΩ_i)
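A sketch of how such a 5-point finite-volume system could be assembled for the shifted operator (κ² − Δ) on a uniform grid, with zero-flux (Neumann) boundaries handled by simply dropping the missing face fluxes. Grid sizes and parameters here are illustrative, not the thesis code:

```python
def assemble_gmrf_matrix(Mx, My, kappa, hx, hy):
    """Assemble the finite-volume matrix for (kappa^2 - Laplacian) on a
    uniform Mx-by-My cell grid: one row per cell, 5-point stencil,
    zero-flux boundaries by skipping missing neighbours."""
    n = Mx * My
    A = [[0.0] * n for _ in range(n)]
    cell = hx * hy
    for j in range(My):
        for i in range(Mx):
            k = j * Mx + i
            A[k][k] += kappa ** 2 * cell       # mass term: kappa^2 a_i dOmega_i
            for di, dj, t in ((1, 0, hy / hx), (-1, 0, hy / hx),
                              (0, 1, hx / hy), (0, -1, hx / hy)):
                ni, nj = i + di, j + dj
                if 0 <= ni < Mx and 0 <= nj < My:  # interior face: add flux
                    A[k][k] += t
                    A[k][nj * Mx + ni] -= t
    return A

A = assemble_gmrf_matrix(4, 3, 2.0, 0.1, 0.1)
```

Two quick sanity checks on the stencil: every row sums to κ²·dΩ (the flux part conserves), and the matrix is symmetric.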
Implementation - Domain Decomposition
Domain decomposition.
Static Master-Slave.
Dynamic Master-Slave.
Figure: Example domain decomposition of a matrix for PETSc using the mpiaij type. (Image taken from a PETSc presentation.)
Implementation - Static task assignment
Figure: The Static Master-Slave (MSS) strategy: the root assigns N_s/P samples up front to each of Slave Group 1 through Slave Group P [N_s = total number of samples, P = total number of worker groups]
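The static split can be sketched in one hypothetical helper, including the case where P does not divide N_s evenly:

```python
def static_partition(Ns, P):
    """Split Ns samples across P worker groups up front:
    each group gets Ns // P samples, the first Ns % P groups one extra."""
    base, extra = divmod(Ns, P)
    return [base + 1 if g < extra else base for g in range(P)]

# static_partition(1000, 4) -> [250, 250, 250, 250]
```

The split is fixed before any computation starts, which is exactly why a slow group can become the bottleneck.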
Implementation - Dynamic Work Pool
Figure: The Dynamic Master-Slave (MSD) strategy: the root keeps the N_s samples in a work pool and hands them out to Slave Group 1 through Slave Group P on demand [N_s = total number of samples, P = total number of worker groups]
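In contrast to the static split, the pool hands out work on demand, so faster groups automatically process more samples. A minimal thread-based sketch of the idea (threads standing in for the MPI slave groups; the function names are illustrative):

```python
import queue
import threading

def run_dynamic_pool(Ns, P, work):
    """Dynamic work pool: P workers repeatedly pull sample indices
    from a shared queue until it is empty."""
    tasks = queue.Queue()
    for s in range(Ns):
        tasks.put(s)
    results = [None] * Ns
    def worker():
        while True:
            try:
                s = tasks.get_nowait()   # grab the next sample, if any
            except queue.Empty:
                return                   # pool exhausted: worker retires
            results[s] = work(s)
    threads = [threading.Thread(target=worker) for _ in range(P)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

No worker ever idles while samples remain, which is why this strategy balances load well for Monte Carlo runs with uneven per-sample cost.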
Implementation - ExaStencils
Domain decomposition based.
GMRF approximation method with SLMC implemented in Layer 4.
Figure: Images taken from the ExaStencils paper (arXiv, 2014)
Results - Convergence Tests
Convergence tests for discretization and sampling errors.
Figure: Convergence test (grid error and SLMC sampling error)
Results - GMRF Validation
Figure: Covariance field at x = (0.51, 0.49) for the 10000 samples realized: exact Matérn vs. GMRF approximation
Results - GMRF Performance
Figure: Speedup of the GMRF approximation over the Cholesky decomposition approach
Results - GMRF Performance
Figure: Percentage of total wall-clock time, Cholesky decomposition vs. GMRF (4 procs, 1000 samples)
Results - MLMC Validation
For sampling error ε = 5 × 10⁻³:
SLMC gives E[Q_240] = 5.201.
MLMC gives E[Q_240^{MLMC}] = 5.209.
Absolute error 8 × 10⁻³ ≈ 5 × 10⁻³.
Figure: Number of samples
Figure: Number of samples
Results - MLMC Validation
Figure: MLMC sample case plots of the expectation, E[Y_l] = E[Q_l − Q_{l−1}], and variance, V(Y_l) = V(Q_l − Q_{l−1}), for σ = 0.1, λ = 0.3; finest grid size [240×240]; sampling error ε = 5 × 10⁻³
Results - MLMC Performance
Figure: Speedups for the MLMC, as a function of grid size and as a function of levels, using σ = 0.1, λ = 0.3, finest grid size [240×240], sampling error ε = 5 × 10⁻³
Results - Efficiency (PETSc)
Figure: Efficiency for different parallelization strategies, 1000 samples, [500×500] grid, 1 processor per slave group
Results - Processor bindings
Figure: Processor-bindings report from OpenMPI, balanced vs. unbalanced
Results - Jumpshot Visualization (PETSc)
Figure: Jumpshot analysis for 10 processors, balanced vs. unbalanced
Results - Jumpshot Visualization (PETSc)
Figure: Jumpshot visualization for MSD - 4 processors per slave group (1000×1000 grid size, 500 samples)
Results - ExaStencils
Recent results show E[Q_512] = 5.326; this needs further debugging.
Speedup is promising.
Figure: Expectation of U(x) for ExaStencils
Results - ExaStencils
Figure: Speedups for ExaStencils
Comparison between languages
Suppose we want to calculate res = b − Ax.

MATLAB example

residual = b - A*x;

PETSc example

ierr = MatMult(A,x,xtemp);CHKERRQ(ierr);
ierr = VecWAXPY(residual,-1,xtemp,b);CHKERRQ(ierr);

ExaStencils Layer 4 example

loop over inner on Residual@current {
  Residual@current = RHS@current - (Laplace@current * Solution[0]@current)
}
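The PETSc pair of calls computes the result in two steps: MatMult forms xtemp = A*x, and VecWAXPY(w, a, x, y) forms w = a*x + y, here residual = −1·xtemp + b. The same two-step structure in plain Python, purely as an illustration (not one of the thesis environments):

```python
def matvec(A, x):
    """xtemp = A*x for a dense matrix as list-of-rows (the MatMult step)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def waxpy(alpha, x, y):
    """w = alpha*x + y (the VecWAXPY step)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

A = [[4.0, -1.0], [-1.0, 4.0]]
x = [1.0, 2.0]
b = [2.0, 7.0]

xtemp = matvec(A, x)              # A*x = [2.0, 7.0]
residual = waxpy(-1.0, xtemp, b)  # b - A*x = [0.0, 0.0]
```

The MATLAB one-liner hides exactly these two operations; PETSc exposes them because it never materializes temporary expressions.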
Experiences With Different Languages
The Layer 4 language has a relatively shallow learning curve, provided documentation is available.
Comparable in user-friendliness to MATLAB, and much easier than the C/C++ of the PETSc implementation.
Still limited in application capability: no arbitrary grid sizes, and no custom parallelization strategies (e.g., Master-Slave for concurrent sample calculations).
Experiences With Different Languages
Performance of the target code is very good, and could be enhanced further if concurrent sample calculations were possible.
Program code management for the DSL Layer 4 version is easier than for the PETSc implementation (484 vs. 575 lines, and 1 source file vs. 5 source/header files).
Still more comfortable with PETSc or MATLAB, since their source code is easy to debug; not so for the generated target code, which comprises more than 130 source files.
Conclusion
The GMRF approximation and MLMC methods do speed up computation.
Both are consistent with the standard methods.
The Dynamic Master-Slave parallelization strategy works very well for Monte Carlo methods.
ExaStencils provides relatively "easy" coding for a significant performance gain.
Still more comfortable with C++ and/or MATLAB.