FAU Thesis presentation


Transcript of FAU Thesis presentation


Solving Stochastic PDEs with Approximate Gaussian Markov Random Fields Using Different Programming Environments

Kelvin Kwong Lam Loh

University Erlangen-Nuremberg – System Simulation

August 13, 2014


Outline

1 Introduction

2 Theory

3 Implementation

4 Results

5 Conclusion


Introduction

Uncertainties are inherent in real-life systems.

Monte Carlo experiments to quantify them are costly.

The cost comes from sample generation and the slow convergence of the sampling.

Two recent methods to speed this up: the Gaussian Markov Random Field (GMRF) approximation and Multilevel Monte Carlo (MLMC).


Introduction - Objectives

Mathematics:

Validate the use of the two methods.

Computational Engineering:

Explore the implementation of the two methods.

Determine potential performance enhancements.

Implement the methods in different languages with varying abstraction levels.

Create an application for the ExaStencils project.


Theory - Problem Statement

Solve for U(x) given random field a(x).

\nabla \cdot \left( e^{a(x)} \, \nabla U(x) \right) = 0, \qquad x \in \Omega = [0,1] \times [0,1]

Figure: Problem domain, the unit square with corners (0,0), (1,0), (0,1), (1,1) and Dirichlet boundary values U_N = 1, U_S = 10, U_E = 5, U_W = 3
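To make the setup concrete, the following is a minimal MATLAB sketch of a 5-point finite-volume/finite-difference solve of this problem for one given coefficient field. The grid size m and the stand-in field a are illustrative assumptions, not the thesis implementation.

% Minimal sketch: solve div( exp(a) grad U ) = 0 on the unit square with the
% Dirichlet data from the figure, for one given coefficient field a.
m  = 64;                         % interior nodes per direction (assumption)
h  = 1/(m+1);
a  = 0.1*randn(m, m);            % stand-in for one realization of a(x)
k  = exp(a);                     % diffusion coefficient at the nodes

UW = 3; UE = 5; US = 10; UN = 1; % boundary values from the figure
idx  = @(i, j) (j-1)*m + i;      % node numbering, i = x-index, j = y-index
nbrs = [-1 0; 1 0; 0 -1; 0 1];   % W, E, S, N neighbour offsets
bval = [UW; UE; US; UN];         % matching Dirichlet values

A = sparse(m*m, m*m);  b = zeros(m*m, 1);     % slow but clear assembly
for j = 1:m
  for i = 1:m
    p = idx(i, j);
    for d = 1:4
      ii = i + nbrs(d, 1);  jj = j + nbrs(d, 2);
      if ii >= 1 && ii <= m && jj >= 1 && jj <= m
        kf = 0.5*(k(i, j) + k(ii, jj));       % face coefficient (average)
        A(p, idx(ii, jj)) = -kf/h^2;
      else
        kf = k(i, j);                         % one-sided value at the boundary
        b(p) = b(p) + kf/h^2 * bval(d);
      end
      A(p, p) = A(p, p) + kf/h^2;
    end
  end
end
U = reshape(A \ b, m, m);                     % solution at the interior nodes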


Theory - Example of a realization

Figure: Solution, U(x), and random coefficient, a(x), fields for a single realization, λ = 0.1, σ = 1


Theory - GMRF approximation

The covariance matrix of a Gaussian field is dense.

Approximate the Gaussian field by a Gaussian Markov Random Field (GMRF).

(\kappa^2 - \Delta)^{\alpha/2} \left( \tau \, a(x) \right) = W(x)

\nabla a(x) \cdot n = 0, \qquad x \in \partial\Omega

\tau^2 = \frac{\Gamma(\nu)}{\Gamma(\nu + d/2) \, (4\pi)^{d/2} \, \kappa^{2\nu} \, \sigma^2}, \qquad \kappa = \frac{\sqrt{8\nu}}{\lambda}, \qquad \nu = \alpha - \frac{d}{2}
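As a quick illustration of how the Matérn parameters map onto the SPDE parameters, a small MATLAB sketch is given below. Here d = 2 and α = 2 follow the thesis setting, while the values of σ (marginal standard deviation) and λ (correlation length) are merely illustrative assumptions.

% Sketch of the SPDE/Matern parameter link.
d      = 2;                      % spatial dimension
alpha  = 2;                      % operator exponent, so alpha/2 = 1
sigma  = 1;                      % marginal standard deviation (assumption)
lambda = 0.1;                    % correlation length (assumption)

nu    = alpha - d/2;             % smoothness parameter
kappa = sqrt(8*nu) / lambda;     % inverse correlation range
tau   = sqrt( gamma(nu) / ( gamma(nu + d/2) * (4*pi)^(d/2) ...
              * kappa^(2*nu) * sigma^2 ) );   % noise scaling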


Theory - Multilevel Monte Carlo

For standard single-level Monte Carlo (SLMC):

The idea is to perform repeated independent trials until statistical convergence.

e\left(Q^{MC}_{M,N}\right)^2 = \frac{V(Q_M)}{N} + \left( E[Q_M - Q] \right)^2

Both error terms are independent of each other.

Multilevel Monte Carlo (MLMC):

Use multiple levels!

E[Q_M] = E[Q_{M_0}] + \sum_{l=1}^{L} E\left[ Q_{M_l} - Q_{M_{l-1}} \right] = \sum_{l=0}^{L} E[Y_l]

V\left( Q^{MLMC}_{M} \right) = \sum_{l=0}^{L} N_l^{-1} \, V(Y_l)
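The SLMC estimator itself is just a sample mean; the MATLAB sketch below shows the estimator and the V(Q_M)/N part of the error above. The helper solve_one_sample (draw a(x), solve the PDE, evaluate Q_M) is hypothetical, not the thesis code.

% Minimal single-level Monte Carlo sketch for E[Q_M].
N = 1000;                        % number of samples (assumption)
Q = zeros(N, 1);
for n = 1:N
  Q(n) = solve_one_sample();     % hypothetical: sample a(x), solve, return Q_M
end
Q_hat     = mean(Q);             % Monte Carlo estimate of E[Q_M]
samp_err2 = var(Q) / N;          % estimate of the V(Q_M)/N term of the MSE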


Theory - Multilevel Monte Carlo

e\left(Q^{MLMC}_{M}\right)^2 := E\left[ \left( Q^{MLMC}_{M} - E[Q] \right)^2 \right] = \sum_{l=0}^{L} N_l^{-1} \, V(Y_l) + \left( E[Q_M - Q] \right)^2

If E[(Q_M − Q)^2] → 0 as M → ∞, then V(Y_l) = V(Q_{M_l} − Q_{M_{l−1}}) → 0 as l → ∞. It is then possible to choose N_l → 1 as l → ∞, i.e. only a few samples are needed on the finest levels.

For this thesis, Q_M = ‖U_M‖_2 / √M, a scaled discrete 2-norm of the solution.
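A MATLAB sketch of the resulting MLMC estimator is given below. solve_level(l) is a hypothetical helper returning one sample of Y_l (i.e. Q_{M_l} − Q_{M_{l−1}}, or simply Q_{M_0} for l = 0), and the per-level sample counts Nl are illustrative.

% Minimal MLMC sketch following the telescoping sum above.
L  = 4;
Nl = [4000 1000 250 60 15];      % per-level sample counts (illustrative)
Y_mean = zeros(L+1, 1);  Y_var = zeros(L+1, 1);
for l = 0:L
  Y = zeros(Nl(l+1), 1);
  for n = 1:Nl(l+1)
    Y(n) = solve_level(l);       % hypothetical: one sample of Y_l on level l
  end
  Y_mean(l+1) = mean(Y);
  Y_var(l+1)  = var(Y);
end
Q_mlmc   = sum(Y_mean);          % E[Q_M] approximated by sum_l E[Y_l]
var_mlmc = sum(Y_var ./ Nl(:));  % sum_l V(Y_l) / N_l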


Implementation - Finite Volume Discretization

Figure: FVM discretization of an interior cell, Ω_i, its neighbours Ω_{i±1} and Ω_{i±M_x}, and the faces, Γ^{(i)}_j, associated with the cell

Discretization example of the GMRF equation over an interior cell Ω_i:

\kappa^2 \, a_i \, d\Omega_i - \left( \phi^{(i)}_1 + \phi^{(i)}_3 \right) \Delta y - \left( \phi^{(i)}_2 + \phi^{(i)}_4 \right) \Delta x = z_i \sqrt{d\Omega_i}

where φ^{(i)}_j denotes the flux of a across face Γ^{(i)}_j and z_i is a standard normal sample.
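The cell equations can be collected into one sparse linear system per sample. Below is a minimal MATLAB sketch of one GMRF realization on a uniform m-by-m cell grid with zero-Neumann boundaries; the grid size and the values of kappa and tau are placeholders (in the thesis they follow from the parameter link shown earlier), not the thesis implementation.

% Sketch of one GMRF sample: assemble the FV operator for (kappa^2 - Laplacian)
% with zero-Neumann boundaries and solve against scaled white noise.
m = 64;  h = 1/m;  dOmega = h^2;             % cells per direction, cell area
kappa = 28;  tau = 0.01;                     % placeholder SPDE parameters

e = ones(m, 1);
T = spdiags([-e 2*e -e], -1:1, m, m);        % 1D second-difference stencil ...
T(1, 1) = 1;  T(m, m) = 1;                   % ... with zero-Neumann ends
I = speye(m);
S = kron(I, T) + kron(T, I);                 % FV flux sum: 4*a_i - neighbours
A = kappa^2 * dOmega * speye(m*m) + S;       % discrete (kappa^2 - Laplacian)

z = randn(m*m, 1);                           % independent standard normals
a = reshape((A \ (sqrt(dOmega) * z)) / tau, m, m);   % one realization of a(x)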


Implementation - Domain Decomposition

Domain decomposition.

Static Master-Slave.

Dynamic Master-Slave.

Figure: Example domain decomposition of a matrix for PETSc using the mpiaij type. (Image taken from a PETSc presentation)


Implementation - Static task assignment

Figure: The Static Master-Slave (MSS) strategy, in which the root assigns each slave group a fixed block of N_s/P samples [N_s = total number of samples, P = total number of worker groups]
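For reference, the static split is plain index arithmetic. The MATLAB sketch below stands in for the MPI logic of the thesis code and shows how N_s samples would be divided into P contiguous blocks; the values of Ns and P are assumptions.

% Sketch of the static (MSS) assignment: each slave group gets a fixed block.
Ns = 1000;  P = 8;                     % total samples and worker groups
base  = floor(Ns / P);
extra = mod(Ns, P);                    % leftover samples go to the first groups
for g = 1:P
  n_g   = base + (g <= extra);         % samples handled by group g
  first = (g-1)*base + min(g-1, extra) + 1;
  fprintf('group %d: samples %d..%d\n', g, first, first + n_g - 1);
end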


Implementation - Dynamic Work Pool

Figure: The Dynamic Master-Slave (MSD) strategy, in which the root keeps a work pool of N_s samples and hands work to slave groups on demand [N_s = total number of samples, P = total number of worker groups]


Implementation - ExaStencils

Domain decomposition based.

GMRF approximation method with SLMC implemented in Layer 4.

Figure: Images taken from the ExaStencils paper (arXiv, 2014)


Results - Convergence Tests

Convergence tests for discretization and sampling errors.

Figure: Convergence tests, grid (discretization) error and SLMC sampling error


Results - GMRF Validation

Figure: Covariance field at x = (0.51, 0.49) for the 10000 samples realized, exact Matérn versus GMRF


Results - GMRF Performance

Figure: Speedup of the GMRF approximation over the Cholesky decomposition approach


Results - GMRF Performance

Figure: Percentage of total wall-clock time, Cholesky decomposition versus GMRF (4 processors, 1000 samples)


Results - MLMC Validation

For a sampling error of ε = 5e−3:

SLMC gives E[Q_240] = 5.201.

MLMC gives E[Q^MLMC_240] = 5.209.

The absolute difference of 8e−3 is of the same order as the sampling error of 5e−3.

Figure: Number of samples


Results - MLMC Validation

Figure: MLMC sample case plots of the expectation, E[Y_l] = E[Q_l − Q_{l−1}], and the variance, V(Y_l) = V(Q_l − Q_{l−1}), for λ = 0.1, σ = 0.3; finest grid size [240×240], sampling error ε = 5e−3


Results - MLMC Performance

Figure: Speedups for the MLMC as a function of grid size and as a function of the number of levels, using λ = 0.1, σ = 0.3, finest grid size [240×240], sampling error ε = 5e−3


Results - Efficiency (PETSc)

Figure: Efficiency for different parallelization strategies, 1000 samples, [500×500] grid, 1 processor per slave group


Results - Processor bindings

Figure: Processor-binding reports from OpenMPI, balanced versus unbalanced


Results - Jumpshot Visualization (PETSc)

Figure: Jumpshot analysis for 10 processors, balanced versus unbalanced


Results - Jumpshot Visualization (PETSc)

Figure: Jumpshot visualization for MSD, 4 processors per slave group (1000×1000 grid size, 500 samples)


Results - ExaStencils

Recent results show E[Q_512] = 5.326; this needs further debugging.

The speedup is promising.

Figure: Expectation of U(x) for ExaStencils


Results - ExaStencils

Figure: Speedups for ExaStencils


Comparison between languages

Suppose we want to calculate res = b − Ax.

MATLAB example

residual = b - A*x;

PETSc example

ierr = MatMult(A,x,xtemp);CHKERRQ(ierr);

ierr = VecWAXPY(residual,-1,xtemp,b);CHKERRQ(ierr);

ExaStencils Layer 4 example

loop over inner on Residual@current {

Residual@current = RHS@current - (Laplace@current *

Solution[0]@current)

}


Experiences With Different Languages

The language at Layer 4 has a relatively low learning curve if documentation is available.

Similar in user-friendliness to MATLAB, and much easier than the C/C++ of the PETSc implementation.

Still limited in application capability (no arbitrary grid sizes, and no custom parallelization strategy, e.g. Master-Slave for concurrent sample calculations).


Experiences With Different Languages

Performance of the generated target code is very good and should improve further if concurrent sample calculations can be performed.

Managing the program code (DSL Layer 4) is easier than for the PETSc implementation (484 vs. 575 lines, and 1 source file vs. 5 source/header files).

Still more comfortable with PETSc or MATLAB, since their source code can be debugged easily; not so with the generated target code, which comprises more than 130 source files.


Conclusion

The GMRF approximation and MLMC methods do speed up the computation.

They are both consistent with the standard methods.

The Dynamic Master-Slave parallelization strategy works very well for Monte Carlo methods.

ExaStencils provides relatively "easy" coding for a significant performance gain.

Still more comfortable with C++ and/or MATLAB.


Covariance Function

The GMRF approximation yields a Matérn covariance function of the form:

C(\delta x) = \frac{1}{2^{\nu - 1} \Gamma(\nu)} \, (\kappa \, \delta x)^{\nu} \, K_{\nu}(\kappa \, \delta x)

The exponential covariance function is a special case of the Matérn covariance function (ν = 1/2).
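For completeness, a short MATLAB sketch of this covariance using besselk is given below; kappa and nu are illustrative values, and setting nu = 1/2 recovers the exponential covariance exp(−kappa·δx).

% Sketch: evaluate the Matern covariance above over a range of lags.
kappa = 28;  nu = 1;                         % illustrative parameters
dx = linspace(1e-6, 0.5, 200);               % lag distances (avoid dx = 0)
C  = (kappa*dx).^nu .* besselk(nu, kappa*dx) / (2^(nu-1) * gamma(nu));
plot(dx, C);  xlabel('lag \deltax');  ylabel('C(\deltax)');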