Coupling Lattice Boltzmann Gas and Level Set Method for Simulating Free Surface Flow in GPU/CUDA Environment

Tomir Kryza and Witold Dzwinel

AGH University of Science and Technology, Institute of Computer Science, Krakow, Poland

Abstract. We present a proof of concept of a novel, efficient method for modeling liquid/gas interface dynamics. Our approach couples the lattice Boltzmann gas (LBG) and the level set (LS) methods. The inherent parallelism of LBG, accelerated by level sets, is the principal advantage of our approach over similar particle-based solvers. This property allows for efficient use of our solver in the GPU/CUDA environment. We demonstrate preliminary results and GPU/CPU speedups for two standard free surface fluid scenarios: the falling droplet and the breaking dam problems.

Keywords: free surface flow, lattice Boltzmann gas, level sets, CUDA, GPGPU

1 Introduction

Computational fluid dynamics (CFD) often involves solving free surface problems such as river flows, floods and breaking waves. Important industrial problems include processes such as foaming and casting, inkjet droplet formation and various types of fluid-solid structure interactions. These free surface scenarios are always demanding in terms of computational complexity. Thus, real-time simulation of two-phase (liquid/gas) flows is still a challenging goal. Meanwhile, interactive visualization and simulation of the approximate dynamics of the liquid/gas interface is of great interest to game and simulator designers. Using GPU boards and CUDA technology for modeling free surface dynamics is a straightforward way of both speeding up the computations and increasing their precision.

When constructing approximate models of free surface fluid flow, the following basic aspects should be taken into account:

• dynamics of the liquid volume, i.e., calculation of velocity and pressure fields inside the liquid volume,

• interaction of liquid with container walls,

• representation of the free surface interface,

• dynamics of the liquid on the interface,

• liquid volume separation and merging, e.g., a droplet pinch-off or coalescence.

For the sake of efficiency, our model does not take into account the internal dynamics of the gaseous phase (i.e., we handle the free surface conditions only at the interface).

The dynamics of fluid flow can be simulated classically by solving the Navier-Stokes equations numerically or by means of discrete particle methods. However, the additional constraints that have to be imposed make these approaches computationally very demanding for simulating free surface flows. CFD problems can also be attacked by solving the discrete Boltzmann equation using the so-called lattice Boltzmann gas (LBG) method [14]. In many cases, especially for simulating complex fluids, LBG can be competitive in terms of efficiency with classical methods [17, 2], mainly due to its inherent parallelism. On the other hand, the LBG approach fails to resolve thin layers of liquid efficiently [15]. Meanwhile, the Level Set Method [11] is well suited to tracking interface dynamics in the course of simulation. It enables a realistic and smooth representation of the liquid/gas interface while diminishing the computational load.

The main purpose of this paper is to present a novel concept which consists in coupling the LBG and level set methods. We expect that such a coupling can increase the overall efficiency of the free surface simulation. In Section 2 we show that this hybrid approach can be efficiently implemented on the GPU architecture. In Section 3 we demonstrate preliminary modeling results and GPU/CPU speedups for two standard free surface fluid tests: the falling droplet and the breaking dam problems. We discuss the advantages of our approach in the Conclusions.

2 Algorithms and implementation

The main idea of this paper is the coupling of the lattice Boltzmann gas and Marker Level Set [10] methods, and the implementation of this integrated approach in the GPU/CUDA environment for approximate and fast simulation of free surface fluid dynamics.

The modeling process consists of the steps depicted in Fig. 1. We model the free surface flow of an incompressible fluid in a rectangular container. We assume that the friction between the fluid and the container walls is modeled with no-slip boundary conditions. For simplicity and to reduce the computational load without loss of generality, we assume a fixed cell size ∆x = 1 for the LBG lattice and a time step ∆t = 1 for the LBG and LS advection steps.

As shown in Fig. 1, the Marker Level Set mode consists of several steps processing the level set distance function, the cell types matrix and the marker particles set. The GPU kernels used for these computations can be divided into two groups: particle kernels and level set kernels. The level set kernels operate on a geometry of discrete lattice nodes, while the particle kernels perform computations on the array of particle positions. The geometry of the level set kernels is structured as follows.

Fig. 1. The block diagram of LBG and LS coupling strategy

Each block contains one full row of cells along the x-axis. Grid indices define the y and z row indices. Such a layout enables efficient access to values from neighboring cells thanks to the L1/L2 cache on the Fermi architecture, while keeping the kernel implementation simple and easy to maintain.

The particle kernels layout is based on a mapping of the grid/block hierarchy onto a one-dimensional array of structures. Each block contains a fixed number of threads (K = 512) in one row of the block. The grid contains one row of blocks. The number of blocks is defined by the total number of particles: $\mathrm{blockCount} = \mathrm{particleCount} / 512$. It is assumed that the number of particles is a multiple of the block size K. Separation of particles into blocks is dictated by the CUDA constraint on the maximum block size. The algorithms implemented in particle kernels do not use information from any particle other than the one being processed.
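
The particle-to-thread mapping described above can be illustrated with a few lines of host-side arithmetic. This is a minimal sketch in Python rather than CUDA; the function names `launch_layout` and `particle_index` are hypothetical and only mirror the usual `blockIdx.x * blockDim.x + threadIdx.x` indexing of a one-dimensional launch.

```python
BLOCK_SIZE = 512  # K in the text: threads per block


def launch_layout(particle_count, block_size=BLOCK_SIZE):
    """Return the number of blocks in a one-row grid of one-row blocks."""
    if particle_count % block_size != 0:
        # The text assumes the particle count is a multiple of K.
        raise ValueError("particle count must be a multiple of the block size")
    return particle_count // block_size


def particle_index(block_idx, thread_idx, block_size=BLOCK_SIZE):
    """Index of the particle handled by one thread of one block."""
    return block_idx * block_size + thread_idx


block_count = launch_layout(4096)
last_particle = particle_index(block_count - 1, BLOCK_SIZE - 1)
```

With 4096 particles the sketch yields 8 blocks, and the last thread of the last block handles particle 4095, so every particle is processed by exactly one thread.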

As shown in Fig. 1, the LBG mode is composed of the steps directly responsible for simulation of liquid dynamics. All GPU kernels used by this modeling mode operate on a setup similar to that of the level set kernels. Each block contains one full row of cells along the x-axis, while a plane of blocks maps onto the y-z plane. This layout was inspired by the GPU implementation of a 3D lattice Boltzmann gas method described in [16] and was chosen because it yields a high memory throughput of LBG lattice processing. The components of the two computational modes are discussed below.

Cells reinitialization

The zero level set isosurface defines the liquid/gas interface at the beginning of simulation. The following timesteps are based on the assumption that none of the liquid cells has a neighboring gas cell. Therefore, a reinitialization step is required that defines the interface cell type. Such a separation setup is presented in Fig. 2(a). The algorithm for defining this layer is as follows:

1. Initialize all boundary cells with B. For all other cells:

2. if Φ > 0, assign cell type A,

3. if Φ ≤ 0, assign type I when there exists a neighboring cell with Φ > 0, or L otherwise.

Here we assume that Φ is the level set implicit function.

Fig. 2. (a) Cell types: A - air phase, L - liquid phase, I - interface cells, B - boundary; (b) A cut-plane of a spherical velocity field initialized inside a sphere (left) and extended in a narrow band outside of the interface (right).

The dependence of the cell type on the value of Φ alone ensures that the assigned cell type does not rely upon the order in which cells are processed. This property is very important, especially for GPU algorithms, as race conditions that could appear in the simulation can be very hard to detect.

Defining interface cells at the beginning of simulation provides an additional benefit. There is no need to check whether neighbors of cells inside the volume exist during Φ evaluation. Therefore, the number of if branches is minimal, which implies high speed of kernel execution.
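
The cell classification rule can be sketched as follows. This is an illustrative serial version on a 2D lattice, assuming a four-cell (von Neumann) neighborhood; the text does not state which neighborhood is used, and the function name `classify_cells` is hypothetical. Each cell type is derived only from Φ, so the result does not depend on the order in which cells are visited.

```python
A, L, I, B = "A", "L", "I", "B"  # air, liquid, interface, boundary


def classify_cells(phi):
    """Assign a cell type to every cell of a 2D lattice of level set values."""
    ny, nx = len(phi), len(phi[0])
    # Step 1: start with every cell marked as boundary; interior cells are
    # overwritten below, so only the outermost layer keeps type B.
    types = [[B] * nx for _ in range(ny)]
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            if phi[y][x] > 0:
                types[y][x] = A  # Step 2: positive values are air
            else:
                # Step 3: a non-positive cell touching a positive one is
                # interface, otherwise it is liquid.
                neighbors = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                has_air = any(phi[j][i] > 0 for j, i in neighbors)
                types[y][x] = I if has_air else L
    return types


# Flat liquid surface: the level set value decreases with the row index y.
phi = [[2.5 - y for _ in range(5)] for y in range(6)]
cell_types = classify_cells(phi)
```

In this example rows 1 and 2 become air, row 3 becomes the interface layer, and row 4 is liquid, with a one-cell boundary frame around them.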

Velocity extension

The velocity field generated by the LBG stream-collide process is valid only inside the bulk of liquid. For a proper numerical advection of the level set and marker particles, the velocity field needs to be defined on both sides of the interface. That is why an iterative velocity extrapolation method is employed in the simulation. Calculation of the velocity field values takes place only in cells located in the region external to the liquid, because inside the liquid the velocity field is defined by the stream-collide step. The number of iterations is fixed in the simulation because numerically correct values are needed only inside a narrow band around the interface. An example of velocity field extension inside a narrow band outside the interface is presented in Fig. 2(b).
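
One possible form of such an iterative extension is sketched below. The extrapolation rule itself is an assumption, since the text does not give the formula: in each pass, a cell without a valid velocity takes the average velocity of its already valid four-cell neighbors. The function name `extend_velocity` and the `valid` mask are hypothetical. Each pass reads from the previous buffers and writes to new ones, so the result is independent of the processing order.

```python
def extend_velocity(vel, valid, iterations):
    """Extend a 2D velocity field outward from the cells flagged as valid.

    vel   -- 2D list of (vx, vy) tuples
    valid -- 2D list of bools, True where the stream-collide step defined vel
    """
    ny, nx = len(vel), len(vel[0])
    vel = [row[:] for row in vel]
    valid = [row[:] for row in valid]
    for _ in range(iterations):  # fixed count: only a narrow band is needed
        new_vel = [row[:] for row in vel]
        new_valid = [row[:] for row in valid]
        for y in range(ny):
            for x in range(nx):
                if valid[y][x]:
                    continue  # never overwrite values inside the liquid
                nbrs = [
                    (y + dy, x + dx)
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= y + dy < ny and 0 <= x + dx < nx
                    and valid[y + dy][x + dx]
                ]
                if nbrs:
                    n = len(nbrs)
                    new_vel[y][x] = (
                        sum(vel[j][i][0] for j, i in nbrs) / n,
                        sum(vel[j][i][1] for j, i in nbrs) / n,
                    )
                    new_valid[y][x] = True
        vel, valid = new_vel, new_valid
    return vel, valid


# One row of six cells; the two leftmost cells are liquid moving in +x.
field = [[(1.0, 0.0), (1.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]]
mask = [[True, True, False, False, False, False]]
extended, extended_mask = extend_velocity(field, mask, iterations=2)
```

Two iterations widen the valid region by two cells, which illustrates why a fixed iteration count directly sets the width of the narrow band.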

Level set advection

Advection of the level set function Φ is implemented using the first-order forward Euler method in time and upwind differencing of Φ values. Numerical errors induced by the first-order accuracy are reduced by the use of marker particles that correct the level set function values at each iteration.
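
For illustration, a one-dimensional version of this scheme with ∆x = 1 can be written as below. It is a sketch under simplifying assumptions (one dimension, end cells left unchanged), and the function name `advect_upwind` is hypothetical. The one-sided difference is always taken from the side the flow comes from.

```python
def advect_upwind(phi, u, dt=1.0):
    """One forward Euler step of d(phi)/dt + u * d(phi)/dx = 0 with dx = 1."""
    new_phi = phi[:]  # write to a copy so every cell reads old values only
    for i in range(1, len(phi) - 1):
        if u[i] > 0:
            grad = phi[i] - phi[i - 1]  # flow comes from the left
        else:
            grad = phi[i + 1] - phi[i]  # flow comes from the right
        new_phi[i] = phi[i] - dt * u[i] * grad
    return new_phi


# Signed distance to an interface at x = 3, advected to the right at u = 0.5.
phi0 = [i - 3.0 for i in range(8)]
velocity = [0.5] * 8
phi1 = advect_upwind(phi0, velocity)
```

After one step the interior values drop by 0.5, so the zero crossing moves from x = 3 to x = 3.5 as expected. The scheme is stable only while |u| ∆t / ∆x does not exceed 1.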

Particle advection

The particle advection equation is solved using the second-order Runge-Kutta method (the midpoint method). The velocity field has to be interpolated at particle positions because marker particles are not bound to the discrete lattice nodes and can move freely inside the lattice boundaries. We use linear interpolation in each dimension. After the first step of the midpoint method a check is performed to determine whether the particle has left the lattice boundaries. If so, the particle is marked to be excluded from all computations for the rest of the simulation.
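
The midpoint step with linear interpolation can be sketched in two dimensions as follows. The names `interpolate`, `inside` and `advect_particle` are hypothetical, and returning `None` stands in for marking the particle as excluded.

```python
import math


def interpolate(field, x, y):
    """Bilinear interpolation of a 2D field of (vx, vy) tuples at (x, y)."""
    i0 = max(0, min(int(math.floor(x)), len(field[0]) - 2))
    j0 = max(0, min(int(math.floor(y)), len(field) - 2))
    tx, ty = x - i0, y - j0
    result = []
    for c in range(2):
        low = field[j0][i0][c] + (field[j0][i0 + 1][c] - field[j0][i0][c]) * tx
        high = (field[j0 + 1][i0][c]
                + (field[j0 + 1][i0 + 1][c] - field[j0 + 1][i0][c]) * tx)
        result.append(low + (high - low) * ty)
    return tuple(result)


def inside(field, x, y):
    """True when (x, y) lies within the lattice node range."""
    return 0.0 <= x <= len(field[0]) - 1 and 0.0 <= y <= len(field) - 1


def advect_particle(field, pos, dt=1.0):
    """Midpoint (second-order Runge-Kutta) step; None if the particle is lost."""
    x, y = pos
    vx, vy = interpolate(field, x, y)
    mx, my = x + 0.5 * dt * vx, y + 0.5 * dt * vy  # first step: half step
    if not inside(field, mx, my):
        return None  # particle left the lattice after the first step
    vx, vy = interpolate(field, mx, my)  # velocity at the midpoint
    return (x + dt * vx, y + dt * vy)


# Velocity grows linearly with x: v = (0.5 * x, 0) on a 4 x 4 lattice.
shear = [[(0.5 * i, 0.0) for i in range(4)] for _ in range(4)]
moved = advect_particle(shear, (1.0, 1.0))
lost = advect_particle(shear, (2.9, 1.0))
```

For the particle starting at (1, 1) the midpoint velocity is 0.625, so it lands at x = 1.625, whereas a plain forward Euler step would give x = 1.5. The particle starting at x = 2.9 crosses the lattice edge during the half step and is dropped.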

Level set correction

After the level set advection, calculated by means of the low-order numerical scheme, the zero level set isosurface will be distorted. Therefore, information about the position of marker particles is used to correct the values close to the interface. The correction is performed with a Gaussian kernel basis function [10]. The use of a weight function with the Gaussian kernel leads to a simple correction equation. Choosing the kernel radius so that the weight function is nonzero only in close proximity to a lattice point x allows the summations to be restricted to particles located in the close vicinity of x. For each active particle, neighboring cells are selected and the particle impact on every cell is stored.

In the Marker Level Set implementation on a GPU proposed in [9], the correction of the level set values by marker particles was achieved by means of a shader and volume rendering. In this paper a different approach is proposed. The level set values correction is performed in two successive steps:

1. Every particle’s impact on the closest 27 cells (a 3x3x3 cube) is computed and stored in a temporary array. A problem can occur when two or more particles affect one cell. If these particles are processed in parallel on the GPU, a race condition may occur that would make the computations invalid. Therefore, the atomic addition CUDA function atomicAdd is used for storing the computed weight impact values.

2. For each cell of the level set grid, the total weight is taken from the temporary array and a correction value is subtracted from the original level set value. Once again the atomicAdd function is used to prevent possible race conditions.
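
The two-step structure can be sketched serially in two dimensions, with a 3x3 neighborhood in place of the 3x3x3 cube. The correction formula used here is an assumption, because the text does not spell it out: each particle carries the level set value sampled at its position (its error, since a marker particle should sit on the zero level set), and every cell subtracts the Gaussian-weighted average error of nearby particles. The name `correct_level_set` and the parameter `sigma` are hypothetical. The in-place additions of the first pass are the ones that need atomicAdd when particles are processed in parallel.

```python
import math


def correct_level_set(phi, particles, sigma=0.75):
    """Two-pass correction of a 2D level set from marker particles.

    particles -- list of (x, y, err), where err is the level set value
                 sampled at the particle position
    """
    ny, nx = len(phi), len(phi[0])
    weight_sum = [[0.0] * nx for _ in range(ny)]  # temporary arrays
    error_sum = [[0.0] * nx for _ in range(ny)]

    # Pass 1: every particle scatters its weighted impact to nearby cells.
    for px, py, err in particles:
        ci, cj = int(round(px)), int(round(py))
        for j in range(cj - 1, cj + 2):
            for i in range(ci - 1, ci + 2):
                if 0 <= i < nx and 0 <= j < ny:
                    d2 = (i - px) ** 2 + (j - py) ** 2
                    w = math.exp(-d2 / (2.0 * sigma ** 2))
                    weight_sum[j][i] += w  # atomicAdd in a parallel version
                    error_sum[j][i] += w * err

    # Pass 2: every cell reads its totals and subtracts the correction.
    corrected = [row[:] for row in phi]
    for j in range(ny):
        for i in range(nx):
            if weight_sum[j][i] > 0.0:
                corrected[j][i] = phi[j][i] - error_sum[j][i] / weight_sum[j][i]
    return corrected


# The true interface is at x = 2, but the level set is shifted by 0.2.
shifted = [[i - 2.0 + 0.2 for i in range(5)] for _ in range(5)]
markers = [(2.0, 2.0, 0.2)]  # particle on the true interface sees phi = 0.2
fixed = correct_level_set(shifted, markers)
```

In the example the single particle removes the shift from the nine cells around it and leaves the remaining cells untouched.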

Level set reinitialization

After the correction phase, level set values at the zero isosurface are correct. The level set values that are far from the interface are still distorted by the low-order advection scheme. For that reason, the level set reinitialization step is performed. The Fast Iterative Method [18] is used. The number of iterations is fixed because only level set values from the narrow band around the zero level set are required for computations.

Particle adjustment

It must be guaranteed that before the next iteration all active particles are located on the zero level set. As this constraint may have been violated during the level set reinitialization, an adjustment process is executed. The outlying particles are moved in the direction normal to the interface by an arbitrary fraction of the distance to the interface. A natural choice would be to set this fraction to 0, so that the particle advection always defines the interface. Unfortunately, this leads to numerical artifacts caused by large distances between the interface and particles for turbulent modes. Therefore, the parameter defining this fraction has to be determined experimentally for a specific simulation.
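
The adjustment can be written as x_new = x − α Φ(x) n, where n is the unit normal ∇Φ/|∇Φ| and α is the fraction discussed above. The sketch below uses an analytic signed distance function of a unit circle in place of the lattice level set; the names `adjust_particle`, `circle_phi` and `circle_grad` are hypothetical.

```python
import math


def adjust_particle(pos, phi, grad_phi, fraction):
    """Move a particle toward the zero level set along the interface normal."""
    distance = phi(pos)  # signed distance to the interface
    gradient = grad_phi(pos)
    norm = math.sqrt(sum(c * c for c in gradient))
    if norm == 0.0:
        return pos  # the normal is undefined here; leave the particle alone
    return tuple(p - fraction * distance * c / norm
                 for p, c in zip(pos, gradient))


def circle_phi(pos):
    """Signed distance to a unit circle centered at the origin."""
    return math.hypot(pos[0], pos[1]) - 1.0


def circle_grad(pos):
    """Gradient of circle_phi (points away from the center)."""
    r = math.hypot(pos[0], pos[1])
    return (pos[0] / r, pos[1] / r)


outlier = (1.5, 0.0)  # half a cell outside the interface
unchanged = adjust_particle(outlier, circle_phi, circle_grad, 0.0)
halfway = adjust_particle(outlier, circle_phi, circle_grad, 0.5)
snapped = adjust_particle(outlier, circle_phi, circle_grad, 1.0)
```

A fraction of 0 leaves the particle where advection put it, 1 snaps it onto the zero level set, and intermediate values blend the two behaviors.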

Cells refilling

Every change of a cell type may lead to a situation where an air cell becomes partially filled with fluid or a fluid cell becomes empty. The level set function automatically handles the occurrence of an empty cell. A partially filled cell, on the other hand, needs special handling. A change from an empty cell to an interface cell leads to a problem where the stream-collide process would operate on a cell for which the distribution function has no values. For that reason, after such an event, the cell’s distribution function is reinitialized. The new value is equal to the equilibrium distribution function with arguments averaged over all non-empty neighbors. Density ρ and velocity v are averaged and a distribution function is computed for each cell that changed type from A to I (Fig. 2(a)). It is assumed that a direct change from A to L is not possible.
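
A sketch of the refilling step is given below. It assumes the standard second-order D3Q19 equilibrium with lattice weights 1/3, 1/18 and 1/36 and a speed of sound squared of 1/3; the text does not state which form of the equilibrium is used, and the names `equilibrium` and `refill` are hypothetical.

```python
from itertools import product

# D3Q19: the rest vector, 6 axis vectors and 12 face diagonals.
VELOCITIES = [v for v in product((-1, 0, 1), repeat=3)
              if sum(c * c for c in v) <= 2]
WEIGHTS = [{0: 1.0 / 3.0, 1: 1.0 / 18.0, 2: 1.0 / 36.0}[sum(c * c for c in v)]
           for v in VELOCITIES]


def equilibrium(rho, u):
    """Second-order equilibrium distribution for density rho and velocity u."""
    uu = sum(c * c for c in u)
    feq = []
    for e, w in zip(VELOCITIES, WEIGHTS):
        eu = sum(a * b for a, b in zip(e, u))
        feq.append(w * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * uu))
    return feq


def refill(neighbors):
    """Distribution for a cell that changed from A to I.

    neighbors -- list of (rho, (ux, uy, uz)) taken from non-empty neighbors
    """
    if not neighbors:
        raise ValueError("a refilled cell needs at least one non-empty neighbor")
    n = len(neighbors)
    rho = sum(r for r, _ in neighbors) / n
    u = tuple(sum(v[k] for _, v in neighbors) / n for k in range(3))
    return equilibrium(rho, u)


f_new = refill([(1.0, (0.1, 0.0, 0.0)), (1.2, (0.3, 0.0, 0.0))])
mass = sum(f_new)
momentum_x = sum(f * e[0] for f, e in zip(f_new, VELOCITIES))
```

The refilled cell reproduces the averaged density 1.1 and momentum 1.1 × 0.2 = 0.22 of its neighbors, so the subsequent stream-collide step starts from a physically consistent state.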

Streaming

Copying distribution function values between cells is the basic procedure of the streaming step. The distribution function is double buffered and there is only one write operation for each of the distribution function values. Therefore, there is no possibility of a race condition. The 18 assignments (we use the D3Q19 velocity set [12]) are executed in a straightforward way. However, a different approach needs to be taken for liquid streamed from interface cells. An interface cell by definition has at least one empty neighboring cell. This means that the distribution function value for the direction opposite to the direction of the neighboring empty cell will not be assigned during the streaming process. To handle this case, the distribution function reconstruction procedure [6] is employed. Streaming from fluid and interface cells is separated from streaming from solid boundary cells to keep the branching cost at a minimum. Streaming from solid cells implements the no-slip boundary conditions and handles lattice boundary cases by means of if conditionals.
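
The double-buffered streaming pattern can be illustrated on a one-dimensional D1Q3 lattice, which is a simplification of the D3Q19 set used in the paper. Walls are handled here with the common bounce-back rule as one way of obtaining no-slip behavior; the free surface reconstruction of [6] is not included, and the function name `stream` is hypothetical. Every destination value is written exactly once, from a buffer that is never modified.

```python
E = (0, 1, -1)         # D1Q3 velocities: rest, +x, -x
OPPOSITE = (0, 2, 1)   # index of the reversed direction


def stream(src):
    """Pull-style streaming from src into a freshly allocated buffer."""
    n = len(src)
    dst = [[0.0] * len(E) for _ in range(n)]
    for x in range(n):
        for i, e in enumerate(E):
            upstream = x - e  # cell the value arrives from
            if 0 <= upstream < n:
                dst[x][i] = src[upstream][i]
            else:
                # The upstream cell would lie in the wall: reflect the value
                # that was heading into the wall (bounce-back).
                dst[x][i] = src[x][OPPOSITE[i]]
    return dst


before = [[1.0, 0.5, 0.25], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
after = stream(before)
mass_before = sum(sum(cell) for cell in before)
mass_after = sum(sum(cell) for cell in after)
```

In the example the right-moving value 0.5 travels one cell to the right, the left-moving value 0.25 is reflected by the left wall, and the total mass stays at 4.75.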

Collision

The collision step is an inherently local operation, i.e., it does not use information from neighboring cells. As a result, a very efficient implementation on a GPU is possible. The collision is performed only in cells that have non-empty distribution functions, that is, only liquid and interface cells are processed. To increase the stability of the simulation, the Smagorinsky sub-grid model [13] is implemented. Because the collision process uses only information from the current cell, a straightforward implementation is suggested.
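
The locality of the collision step can be seen in the following sketch. It assumes a single-relaxation-time (BGK) collision on a D1Q3 lattice, which the text does not name explicitly, and it omits the Smagorinsky model, which would replace the constant relaxation time `tau` by a locally computed one. The names `moments`, `equilibrium`, `collide` and `collide_lattice` are hypothetical.

```python
E = (0, 1, -1)                               # D1Q3 velocities
W = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)        # matching lattice weights


def moments(f):
    """Density and velocity of one cell."""
    rho = sum(f)
    u = sum(fi * e for fi, e in zip(f, E)) / rho
    return rho, u


def equilibrium(rho, u):
    """Second-order equilibrium for the D1Q3 lattice."""
    return [w * rho * (1.0 + 3.0 * e * u + 4.5 * (e * u) ** 2 - 1.5 * u * u)
            for e, w in zip(E, W)]


def collide(f, tau):
    """Relax one cell toward its own equilibrium; tau must exceed 0.5."""
    rho, u = moments(f)
    feq = equilibrium(rho, u)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]


def collide_lattice(cells, cell_types, tau):
    """Collide only liquid (L) and interface (I) cells; copy the rest."""
    return [collide(f, tau) if t in ("L", "I") else f[:]
            for f, t in zip(cells, cell_types)]


lattice = [[0.6, 0.3, 0.1], [0.0, 0.0, 0.0]]
types = ["L", "A"]
post = collide_lattice(lattice, types, tau=0.8)
```

Each cell is updated from its own values only, so one GPU thread per cell needs no synchronization. The liquid cell keeps its density and momentum, and the empty air cell is skipped.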

3 Results

We performed two test simulations for the common problems of a liquid drop falling onto a liquid surface and a breaking dam. The snapshots from the simulation of a liquid droplet are shown in Fig. 3. One can clearly see the moment of the topological change during the merge of the drop with the fluid surface. The initial splash and the secondary droplet are not resolved in the simulation due to the lack of surface tension in the model. This could, however, be easily incorporated by extending the model with a curvature-dependent force.

The comparison of performance for the GPU and simple CPU implementations is presented in Table 1. All of the tests were performed on an NVIDIA Fermi-based GeForce GTX 460 board and a standard AMD Athlon II X4 635 processor. The GPU-based engine outperforms the single-core CPU-based implementation by up to an order of magnitude: according to Table 1, by a factor of about 9.9 for the falling drop and about 5.8 for the breaking dam.

The snapshots from simulation of the breaking dam problem can be seen in Fig. 4. As shown in the figure, the container is partially filled with liquid. The liquid section is separated from the rest of the container volume by an obstacle. Initially the liquid is stationary; then, at a given moment, the obstacle is removed and the liquid collapses freely under the gravitational force. The impact of the no-slip boundary conditions is apparent in the first few snapshots, where the fluid is slowed down by the walls. Despite the large number of iterations, the loss of mass is negligible. In the last snapshots small liquid droplets can be seen falling. This effect is resolved thanks to the marker particles, which can preserve fine details of the fluid volume.

Table 1. Performance comparison of CPU and GPU based implementations. Values in the table present execution times of one iteration in milliseconds.

                CPU                     GPU
                Average    Std. Dev.    Average    Std. Dev.
Falling drop    4322.14    1445.86      437.86     39.33
Breaking dam    3765.66    1387.11      651.01     101.78

4 Related work

The application of the LBG approach for simulating free surface flow is motivated by the straightforward mapping of the lattice streaming and collisions onto the GPU computation model. Other discrete CFD methods, including molecular dynamics (MD) [3, 1], dissipative particle dynamics (DPD) [4] and smoothed particle hydrodynamics (SPH), can also be used for modeling free surface flow.

Fig. 3. The snapshots from simulation of a falling drop. Numbers represent the iteration of the simulation.

However, all of these methods are rather computationally demanding and require additional mechanisms to control the interface dynamics [3, 1, 4].

The approach of coupling LBG with level sets for free surface dynamics simulations, similar to that presented above, has been introduced in [15]. However, unlike our method, it suffers from nonphysical loss of mass. No implementation details about the hardware architecture are given in [15], but it seems that the method targets a serial CPU architecture without any parallelism. The LBG/LS simulation engine presented in this paper conserves mass due to the utilization of marker particles. Additionally, the GPU version of the engine performs far better than its single-core CPU counterpart.

Another similar method coupling LBG with the Particle Level Set method (PLSM), called the Hybrid lattice Boltzmann/level sets method (HLBM), is described in [7]. It reports an improved performance compared to the original PLSM [5]. However, it is still behind the performance of our method. The Marker Level Set method used in our approach involves fewer particles to capture fine interface details and, therefore, performs better in comparison with HLBM. The hybrid of the level set interface tracking and the Particle Level Set method with SPH is another successful approach used for simulating free surface flow. In [8] visually appealing results are presented, demonstrating free-surface simulations capturing fine details of the flow such as sprays. However, the integrated models of SPH and LS are computationally very demanding because particle motion constantly changes the nearest neighbors of SPH particles. Their efficient implementation on a GPU is very difficult and gives unconvincing benefits.

Fig. 4. The snapshots from breaking dam simulation. Numbers represent the iteration of the simulation.

5 Conclusions

In this paper we propose a new concept which can be applied for efficient simulation of free surface evolution. It integrates the lattice Boltzmann gas and the level set simulation methodologies. The inherent parallelism of LBG, which allows for efficient use of the GPU architecture, together with the control of the liquid/gas interface provided by the level sets, makes the method competitive with other known CFD approaches. The promising performance of its GPU/CUDA implementation demonstrates that the method could be successfully adopted in different areas of physical simulation such as multiple-phase flows, flame spreading, shock and detonation wave tracking and others. Due to its high efficiency, our approach could potentially be used by game and simulator designers. However, interactive visualization of free surface flows still needs faster GPU processors, more work on LBG parallelization and more efficient coupling schemes with level set methods. To be more physically correct, the model should be extended, e.g., by enabling surface tension simulation. In summary, the concept presented is very promising and deserves more attention in the future.

6 Acknowledgements

This research is supported by the AGH grant No. 11.11.120.777. We thank mgr inz. Maciej Kluczny for his essential contribution to this paper.

References

1. Alda, W., Dzwinel, W., Kitowski, J., Moscinski, J., Pogoda, M., Yuen, D.A.: Complex fluid-dynamical phenomena modeled by large-scale molecular-dynamics simulations. Comput. Phys. 12(6), 595–600 (Nov 1998)

2. Anderson, J.D.: Computational Fluid Dynamics: The Basics with Applications. McGraw-Hill, Inc. (1995)

3. Dzwinel, W., Alda, W., Pogoda, M., Yuen, D.A.: Turbulent Mixing in the Microscale. Physica D (137), 157–171 (2000)

4. Dzwinel, W., Yuen, D.A.: Rayleigh-Taylor Instability in the Mesoscale Modeled by Dissipative Particle Dynamics. Int. J. Mod. Phys. C (12/1), 91–118 (2001)

5. Enright, D., Losasso, F., Fedkiw, R.: A fast and accurate semi-Lagrangian particle level set method. Computers & Structures 83, 479–490 (2005)

6. Korner, C., Thies, M., Hofmann, T., Thurey, N., Rude, U.: Lattice Boltzmann Model for Free Surface Flow for Modeling Foaming. Journal of Statistical Physics 121, 179–196 (2005)

7. Kwak, Y., Nakano, A.: Hybrid Lattice-Boltzmann/Level-Set Method for Liquid Simulation and Visualization. International Journal of Computational Science 3(579), 1–14 (2009)

8. Losasso, F., Talton, J., Kwatra, N., Fedkiw, R.: Two-way coupled SPH and particle level set fluid simulation. IEEE Transactions on Visualization and Computer Graphics 14(4), 797–804 (2008)

9. Mei, X., Decaudin, P., Hu, B.G., Zhang, X.: Real-Time Marker Level Set on GPU. In: International Conference on Cyberworlds, CW ’08, September, 2008. pp. 209–216. IEEE, Hangzhou, China (Sep 2008)

10. Mihalef, Sussman, Metaxas: The Marker Level Set method: a new approach to computing accurate interfacial dynamics. Journal of Computational Physics (2007)

11. Osher, S., Fedkiw, R.: Level Set Methods and Dynamic Implicit Surfaces. Applied Mathematical Sciences, Springer (2003)

12. Rubinstein, R., Luo, L.S.: Theory of the Lattice Boltzmann equation: Symmetry properties of discrete velocity sets. Physical Review E 77(3), 036709 (2008)

13. Smagorinsky, J.: General circulation experiments with the primitive equations. Monthly Weather Review 91(3), 594–5 (1963)

14. Succi, S.: The Lattice Boltzmann Equation for Fluid Dynamics and Beyond. Clarendon Press, Oxford (2001)

15. Thuerey, N., Ruede, U.: Free Surface Lattice-Boltzmann fluid simulations with and without level sets. Proc. of Vision, Modelling, and Visualization VMV pp. 199–207 (2004)

16. Tolke, J., Krafczyk, M.: TeraFLOP computing on a desktop PC with GPUs for 3D CFD. International Journal of Computational Fluid Dynamics 22(7), 443–456 (2008)

17. Wesseling, P.: Principles of Computational Fluid Dynamics. Springer Series in Computational Mathematics, Springer (2009)

18. Jeong, W.K., Whitaker, R.T.: A fast iterative method for a class of Hamilton-Jacobi equations on parallel systems. University of Utah Technical Report UUCS07010 pp. 1–25 (2007)
