
Computers & Fluids 76 (2013) 170–177


Highly parallel particle-laden flow solver for turbulence research

http://dx.doi.org/10.1016/j.compfluid.2013.01.020

Peter J. Ireland, T. Vaithianathan 1, Parvez S. Sukheswalla, Baidurja Ray, Lance R. Collins *
Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY 14853, USA
International Collaboration for Turbulence Research

* Corresponding author at: Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY 14853, USA. Tel.: +1 607 255 9679; fax: +1 607 255 9606. E-mail address: [email protected] (L.R. Collins).
1 Present address: INVISTA S.a.r.l., PO Box 1003, Orange, TX 77631, USA.

Article history: Received 6 November 2012; Received in revised form 5 January 2013; Accepted 29 January 2013; Available online 16 February 2013

Keywords: Isotropic turbulence; Homogeneous turbulent shear flow; Inertial particles; Direct numerical simulation

Abstract

In this paper, we present a Highly Parallel Particle-laden flow Solver for Turbulence Research (HiPPSTR). HiPPSTR is designed to perform three-dimensional direct numerical simulations of homogeneous turbulent flows using a pseudospectral algorithm with Lagrangian tracking of inertial point and/or fluid particles with one-way coupling on massively parallel architectures, and is the most general and efficient multiphase flow solver of its kind. We discuss the governing equations, parallelization strategies, time integration techniques, and interpolation methods used by HiPPSTR. By quantifying the errors in the numerical solution, we obtain optimal parameters for a given domain size and Reynolds number, and thereby achieve good parallel scaling on $O(10^4)$ processors.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Turbulent flows laden with inertial particles (that is, particles denser than the carrier fluid) are ubiquitous in both industry and the environment. Natural phenomena such as atmospheric cloud formation [1–3], plankton distributions in the sea [4], and planetesimal formation in the early universe [5,6] are all influenced by particle–turbulence interactions. Inertial particle dynamics also impact engineered systems such as spray combustors [7], aerosol drug delivery systems [8], and powder manufacturing [9,10], among many other systems [11]. Despite extensive research, however, open questions remain about the distribution of these particles in the flow, their settling speed due to gravity, and their collision rates. This is due in part to our incomplete understanding of the effect of the broad spectrum of flow scales that exists at intermediate or high Reynolds numbers.

Since inertial particle dynamics are strongly sensitive to the smallest scales in the flow [11], large-eddy simulation, with its associated small-scale modeling, has difficulties representing sub-filter particle dynamics accurately, including particle clustering that is driven by the Kolmogorov scales [12,13]. Consequently, our investigations rely on the direct numerical simulation (DNS) of the three-dimensional Navier–Stokes equations and the Maxey and Riley equation for particle motion [14]. DNS has proven to be an extremely effective tool for investigating inertial particle dynamics, albeit at modest values of the Reynolds number due to the heavy computational demands of resolving all relevant temporal and spatial scales.

To extend the range of Reynolds numbers that can be simulated, we have developed a more advanced DNS code: the Highly Parallel, Particle-laden flow Solver for Turbulence Research (HiPPSTR). HiPPSTR is capable of simulating inertial particle motion in homogeneous turbulent flows on thousands of processors using a pseudospectral algorithm. The main objective of this paper is to document the solution strategy and numerical algorithms involved in solving the relevant governing equations.

This paper is organized as follows. In Section 2, we show the equations governing the fluid and particle motion and the underlying assumptions of the flow solver. We discuss the time integration and interpolation techniques in Sections 3 and 4, respectively, complete with an error analysis for optimizing code performance. Section 5 then describes the parallelization strategy, and Section 6 provides conclusions.

2. Governing equations

2.1. Fluid phase

The underlying flow solver is based on the algorithm presented in Ref. [15] and summarized below. It is capable of simulating both homogeneous isotropic turbulence (HIT) and homogeneous turbulent shear flow (HTSF) with a pseudospectral algorithm, while avoiding the troublesome remeshing step of earlier algorithms [16]. The governing equations for the flow of an incompressible fluid are the continuity and Navier–Stokes equations, presented here in rotational form:

$$\frac{\partial u_i}{\partial x_i} = 0, \quad (1)$$

$$\frac{\partial u_i}{\partial t} + \epsilon_{ijk}\,\omega_j u_k = -\frac{\partial\left(p/\rho + \tfrac{1}{2}u^2\right)}{\partial x_i} + \nu\frac{\partial^2 u_i}{\partial x_j \partial x_j} + f_i, \quad (2)$$

where $u_i$ is the velocity vector (with magnitude $u$), $\epsilon_{ijk}$ is the alternating unit symbol, $\omega_i$ is the vorticity, $p$ is the pressure, $\rho$ is the density, $\nu$ is the kinematic viscosity, and $f_i$ is a large-scale forcing function that is added to achieve stationary turbulence for HIT. Aside from the term $\tfrac{1}{2}u^2$, which ultimately will be subsumed into a modified pressure term (see Eq. (10) below), this form of the Navier–Stokes equation has only six nonlinear terms, as compared to nine in the standard form, which reduces the expense of computation and renders the solution method more stable.

Fig. 1. Shear-periodic boundary conditions due to the mean shear applied in the $x_3$ direction. The orthogonal domain is shown with the solid lines, while the shear-periodic boundary points are indicated by the black dots.

2.1.1. Reynolds decomposition

We apply the standard Reynolds decomposition on the velocity, vorticity, and pressure, yielding

$$u_i = U_i + u_i', \quad (3)$$
$$\omega_i = \Omega_i + \omega_i', \quad (4)$$
$$p = P + p', \quad (5)$$

where capitalized variables denote mean quantities and primed variables denote fluctuating quantities.

Subtracting the ensemble average of Eqs. (1) and (2) from Eqs. (1) and (2), respectively, and specializing for the case of homogeneous turbulence (homogeneous turbulence implies all single-point statistics are independent of position, with the exception of the mean velocity $U_i$, which can vary linearly while still preserving homogeneity for all higher-order velocity statistics of the flow field) yields

$$\frac{\partial u_i'}{\partial x_i} = 0, \quad (6)$$

$$\frac{\partial u_i'}{\partial t} + \epsilon_{ijk}\left(\Omega_j u_k' + \omega_j' U_k + \omega_j' u_k'\right) = -\frac{\partial\left[p'/\rho + u_j' U_j + \tfrac{1}{2}u'^2\right]}{\partial x_i} + \nu\frac{\partial^2 u_i'}{\partial x_j \partial x_j} + f_i'. \quad (7)$$

We consider HTSF with a uniform mean shear rate $S$, where coordinates $x_1$, $x_2$ and $x_3$ are the streamwise, spanwise and shear directions, respectively, without loss of generality. Under these conditions, the mean velocity and vorticity can be expressed as

$$\mathbf{U} = (Sx_3, 0, 0), \quad (8)$$
$$\boldsymbol{\Omega} = (0, S, 0), \quad (9)$$

thus reducing Eq. (7) to

$$\frac{\partial u_i'}{\partial t} + u_3' S \delta_{i1} + S x_3 \frac{\partial u_i'}{\partial x_1} + \epsilon_{ijk}\,\omega_j' u_k' = -\frac{\partial p^*}{\partial x_i} + \nu\frac{\partial^2 u_i'}{\partial x_j \partial x_j} + f_i', \quad (10)$$

where the modified pressure $p^* \equiv p'/\rho + \tfrac{1}{2}u'^2$. The above equations are written to encompass both HIT and HTSF; for HIT $S = 0$ and $f_i' \neq 0$, while for HTSF $S \neq 0$ and $f_i' = 0$.

2.1.2. Spectral transforms

Pseudospectral algorithms take advantage of the fact that the fluid velocity and pressure satisfy periodic boundary conditions. For HIT the period is simply the box domain, and so for a box of length $L_1$, $L_2$ and $L_3$ in the $x_1$, $x_2$ and $x_3$ directions, respectively, the wavevector for the transforms takes the conventional form

$$\mathbf{k} \equiv \left(\frac{2\pi}{L_1}n_1,\; \frac{2\pi}{L_2}n_2,\; \frac{2\pi}{L_3}n_3\right), \quad (11)$$

where the integers $n_i$ vary between $-N_i/2 + 1$ and $N_i/2$ for $N_i$ grid points in each direction [17]. Note that the code allows you to independently set the number of grid points $N_i$ and the box length $L_i$ in each direction.
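As a concrete illustration of Eq. (11), the following NumPy sketch builds the one-dimensional wavevectors for an anisotropic box with independently chosen $N_i$ and $L_i$; the function name and the FFT-style mode ordering are illustrative choices, not taken from HiPPSTR.

```python
import numpy as np

def wavevector_1d(N, L):
    """Integer modes n = -N/2+1, ..., N/2 mapped to k = 2*pi*n/L (Eq. 11),
    stored in the usual FFT ordering (0, 1, ..., N/2, -N/2+1, ..., -1)."""
    n = np.concatenate((np.arange(0, N // 2 + 1), np.arange(-N // 2 + 1, 0)))
    return 2.0 * np.pi * n / L

# Example: anisotropic box with independent N_i and L_i, as allowed by the solver
N = (64, 64, 128)
L = (2 * np.pi, 2 * np.pi, 4 * np.pi)
k1, k2, k3 = (wavevector_1d(Ni, Li) for Ni, Li in zip(N, L))
```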

For HTSF, the presence of the mean shear modifies this to a "shear-periodic boundary condition", as illustrated in Fig. 1. The classic Rogallo [16] algorithm eliminated this complication by solving the equations of motion in a frame of reference that deforms with the mean shear, enabling standard fast Fourier transforms (FFTs) to be applied; however, in order to relieve the distortion of the physical-space grid with time, the standard Rogallo algorithm maps the variables on the positively deformed mesh onto a mesh that is deformed in the opposite direction, a step known as "remeshing". Remeshing leads to sudden changes in the turbulent kinetic energy and dissipation rate as a consequence of zeroing out certain wavenumbers. The algorithm of Brucker et al. [15] eliminates this remeshing step by directly applying the spectral transform in a fixed frame of reference while incorporating the phase shift that results from the mean shear analytically. We begin by defining a modified wavevector as

$$k_i' \equiv k_i - S t k_1 \delta_{i3} \quad (12)$$

and the Fourier decomposition of a variable $\phi(\mathbf{x}, t)$ as

$$\hat{\phi}(\mathbf{k}, t) \equiv \sum_{\mathbf{x}} \phi(\mathbf{x}, t)\exp\left(-\mathrm{I}\, k_i' x_i\right), \quad (13)$$

where $\hat{\phi}(\mathbf{k}, t)$ is the forward spectral transform of $\phi(\mathbf{x}, t)$, denoted as $\mathcal{F}\{\phi(\mathbf{x}, t)\}$, and $\mathrm{I}$ is the imaginary unit. The inclusion of the cross term $S t k_1 \delta_{i3}$ in the modified wavevector accounts for the shear-periodic boundary condition. We can similarly define a modified discrete backward spectral transform of $\hat{\phi}(\mathbf{k}, t)$ as

$$\phi(\mathbf{x}, t) = \frac{1}{N_1 N_2 N_3}\sum_{\mathbf{k}} \hat{\phi}(\mathbf{k}, t)\exp\left(\mathrm{I}\, k_i' x_i\right). \quad (14)$$

With these definitions, we can transform (6) and (10) to spectral space to obtain

$$k_i' \hat{u}_i' = 0, \quad (15)$$

$$\left(\frac{\partial}{\partial t} + \nu k'^2\right)\hat{u}_i' = \left(-\delta_{im} + \frac{k_i' k_m'}{k'^2}\right)\epsilon_{mjn}\,\mathcal{F}\left\{\omega_j' u_n'\right\} + 2\frac{k_i'}{k'^2}k_1' S\,\hat{u}_3' - \delta_{i1} S\,\hat{u}_3' + \hat{f}_i', \quad (16)$$

where $k'^2 = k_i' k_i'$. Note that we have used the incompressibility of the flow, as specified by Eq. (15), to project the pressure out of Eq. (16) (refer to Ref. [15] for a detailed derivation).

The direct evaluation of the convolution $\mathcal{F}\{\omega_j' u_n'\}$ would be computationally prohibitively expensive. The "pseudospectral approximation" is obtained by first transforming the velocity and vorticity into physical space, computing the product, then transforming the result back into spectral space [18]. The transformations between physical and spectral spaces are accomplished using modified fast Fourier transforms (FFTs) that are described in Section 5. The pseudospectral algorithm introduces aliasing errors that are eliminated by a combination of spherical truncation and phase-shifting (refer to Appendix A of Ref. [19] for details).

2.1.3. Low wavenumber forcing

To achieve stationary turbulence with HIT, we must introduce forcing to compensate for the energy that is dissipated [20]. The forcing is usually restricted to low wavenumbers (large scales) with the hope that the intermediate and high wavenumbers behave in a natural way that is not overly influenced by the forcing scheme. The two most widely used forcing schemes in the literature are "deterministic" and "random" forcing schemes. Both are described below.

The deterministic forcing scheme [21] first computes the amount of turbulent kinetic energy dissipated in a given time step $h$, $\Delta E_{tot}(h)$. This energy is restored at the end of each time step by augmenting the velocity over the forcing range between $k_{f,min}$ and $k_{f,max}$ so as to precisely compensate for the dissipated energy

$$\hat{u}(\mathbf{k}, t_0 + h) = \hat{u}(\mathbf{k}, t_0 + h)\sqrt{1 + \frac{\Delta E_{tot}(h)}{\int_{k_{f,min}}^{k_{f,max}} E(k, t_0 + h)\,dk}}, \quad (17)$$

where $E(k, t_0 + h)$ represents the turbulent kinetic energy in a wavenumber shell with magnitude $k$ at time $t_0 + h$.
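A minimal sketch of the rescaling in Eq. (17) is shown below, assuming the spectral velocity, the wavenumber magnitudes, and the band-integrated energy are already available; the array layout and function name are hypothetical, not taken from the code.

```python
import numpy as np

def deterministic_forcing(u_hat, k_mag, E_band, dE_tot, kf_min=1.0, kf_max=2.0):
    """Sketch of Eq. (17): rescale the Fourier modes in the forcing band so that
    the energy dissipated during the step, dE_tot, is restored.
    u_hat : complex spectral velocity, shape (3, N1, N2, N3)
    k_mag : wavenumber magnitude |k'| on the same grid, shape (N1, N2, N3)
    E_band: kinetic energy contained in the forcing band (scalar)."""
    band = (k_mag >= kf_min) & (k_mag <= kf_max)
    u_hat[:, band] *= np.sqrt(1.0 + dE_tot / E_band)
    return u_hat
```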

The alternative forcing scheme is based on introducing a stochastic forcing function, $\hat{f}_i'(\mathbf{k}, t)$ [20]. This function is non-zero only over the forcing range between $k_{f,min}$ and $k_{f,max}$ and evolves according to a vector-valued complex Ornstein–Uhlenbeck process [20], as shown below

$$d\hat{f}_i'(\mathbf{k}, t) = -\frac{\hat{f}_i'(\mathbf{k}, t)}{T_f}\,dt + \sqrt{\frac{2\sigma_f^2}{T_f}}\,dW_i(\mathbf{k}, t), \quad \forall\, k \in (k_{f,min}, k_{f,max}), \quad (18)$$

where $T_f$ is the integral time-scale of the random forcing, $\sigma_f^2$ denotes its strength, and $W_i(\mathbf{k}, t)$ is the Wiener process whose increment $dW_i$ is defined to be joint-normal, with zero mean, and covariance given by

$$\langle dW_i\, dW_j^*\rangle = h\,\delta_{ij}. \quad (19)$$

In the computer implementation, we set the increment $dW_i = (a_i + \mathrm{I} b_i)\sqrt{h/2}$, where $a_i$ and $b_i$ are two independent random numbers drawn from a standard normal distribution.
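The sketch below illustrates one explicit (Euler–Maruyama style) discretization of Eqs. (18) and (19) for the modes in the forcing band, using the complex increment $dW_i = (a_i + \mathrm{I}b_i)\sqrt{h/2}$ described above; the exact discretization used in HiPPSTR is not stated here, so this is an assumption for illustration only.

```python
import numpy as np

def ou_forcing_step(f_hat, h, T_f, sigma_f, rng=None):
    """One explicit step of the Ornstein-Uhlenbeck forcing, Eqs. (18)-(19),
    applied to the complex forcing modes inside the forcing band (f_hat)."""
    rng = np.random.default_rng() if rng is None else rng
    a = rng.standard_normal(f_hat.shape)
    b = rng.standard_normal(f_hat.shape)
    dW = (a + 1j * b) * np.sqrt(h / 2.0)   # complex Wiener increment, <dW dW*> = h
    return f_hat - (f_hat / T_f) * h + np.sqrt(2.0 * sigma_f**2 / T_f) * dW
```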

2.2. Particle phase

The Maxey and Riley equation [14] is used for simulating spherical, non-deforming particles in the flow. We take the particles to be small (i.e., $d/\eta \ll 1$, where $d$ is the particle diameter, $\eta \equiv \nu^{3/4}/\epsilon^{1/4}$ is the Kolmogorov length scale, and $\epsilon$ is the average turbulent kinetic energy dissipation rate) and heavy (i.e., $\rho_p/\rho \gg 1$, where $\rho_p$ and $\rho$ are the densities of the particles and fluid, respectively). We also assume that the particles are subjected to only linear drag forces, which is valid when the particle Reynolds number $Re_p \equiv \|\mathbf{u}(\mathbf{x}) - \mathbf{v}\|\, d/\nu < 1$. Here, $\mathbf{u}(\mathbf{x})$ denotes the undisturbed fluid velocity at the particle position, and $\mathbf{v}$ denotes the particle velocity. For HTSF, we express $\mathbf{u}(\mathbf{x})$ as

$$u_i(\mathbf{x}) = u_i'(\mathbf{x}) + S x_3 \delta_{i1} \quad (20)$$

and for HIT, $u_i(\mathbf{x}) = u_i'(\mathbf{x})$.

Under these assumptions, the Maxey and Riley equation simplifies to a system of ordinary differential equations for the position and velocity of a given particle

$$\frac{d\mathbf{x}}{dt} = \mathbf{v}, \quad (21)$$

$$\frac{d\mathbf{v}}{dt} = \frac{\mathbf{u}(\mathbf{x}) - \mathbf{v}}{\tau_p} + \mathbf{g}, \quad (22)$$

where $\tau_p \equiv \frac{\rho_p}{\rho}\frac{d^2}{18\nu}$ is the particle response time and $\mathbf{g}$ is the gravitational acceleration vector. Note that the numerical solution of (21) and (22) requires an interpolation of grid values of the fluid velocity to the location at the center of each particle. The interpolation methods used are discussed in Section 4.
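For orientation, the following worked example evaluates the particle response time and the Stokes number $St = \tau_p/\tau_\eta$ directly from the definitions above; the numerical values are hypothetical and not taken from the paper.

```python
import numpy as np

# Illustrative calculation (hypothetical values):
# tau_p = (rho_p / rho) * d**2 / (18 * nu), St = tau_p / tau_eta, tau_eta = sqrt(nu/eps)
rho_p, rho = 1000.0, 1.2            # particle / fluid density [kg/m^3]
d, nu, eps = 20e-6, 1.5e-5, 1e-2    # diameter [m], viscosity [m^2/s], dissipation [m^2/s^3]

tau_p = (rho_p / rho) * d**2 / (18.0 * nu)
tau_eta = np.sqrt(nu / eps)
print(f"tau_p = {tau_p:.3e} s, St = {tau_p / tau_eta:.3f}")
```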

The influence of particles on the continuity and momentum equations is neglected due to the low volume ($O(10^{-6})$) and mass ($O(10^{-3})$) loadings [22,23], and therefore we consider only one-way coupling between the flow field and the particles. All particles are represented as point-particles, and collisions are neglected [24,25].

3. Time integration

3.1. Fluid update

To calculate the temporal evolution of the fluid, we introduce the integrating factor

$$A_{ij}(t_0) \equiv \exp\left[\int_0^{t_0}\left(\nu k'^2\delta_{ij} - 2\delta_{i3}\delta_{j3} S\frac{k_1' k_3'}{k'^2}\right)dt\right] \quad (23)$$

into (16), yielding

$$\frac{\partial}{\partial t}\left[A_{ij}(t)\,\hat{u}_j'\right] = A_{ij}(t)\,\mathrm{RHS}_j(t), \quad (24)$$

where for ease of notation, we have defined

$$\mathrm{RHS}_j \equiv \left(-\delta_{jm} + \frac{k_j' k_m'}{k'^2}\right)\epsilon_{m\ell n}\,\mathcal{F}\left\{\omega_\ell' u_n'\right\} + 2\frac{k_j'}{k'^2}k_1' S\,\hat{u}_3' - \delta_{j1} S\,\hat{u}_3'. \quad (25)$$

Integrating (24) from time $t_0$ to time $t_0 + h$, we obtain

$$A_{ij}(t_0 + h)\,\hat{u}_j'(t_0 + h) = \hat{u}_j'(t_0)\,A_{ij}(t_0) + \int_{t_0}^{t_0+h} \mathrm{RHS}_j(t)\,A_{ij}(t)\,dt. \quad (26)$$

The trapezoid rule is used to approximate the integral on the RHS of (26), yielding

$$\int_{t_0}^{t_0+h} \mathrm{RHS}_j(t)\,A_{ij}(t)\,dt \approx \frac{h}{2}\mathrm{RHS}_j(t_0)\,A_{ij}(t_0) + \frac{h}{2}\mathrm{RHS}_j(t_0 + h)\,A_{ij}(t_0 + h). \quad (27)$$

As the integrand in Eq. (23) is a rational function of time, we can evaluate $A_{ij}(t_0)$ analytically, substitute the result into Eq. (26), and apply the approximation in Eq. (27) to formulate a second-order Runge–Kutta method (RK2) for Eq. (26) as follows:

$$\hat{u}_i'(t_0 + h) = \hat{u}_j'(t_0)\exp\left[-\int_{t_0}^{t_0+h}\left(\nu k'^2\delta_{ij} - 2\delta_{i3}\delta_{j3}S\frac{k_1' k_3'}{k'^2}\right)dt\right] + \frac{h}{2}\mathrm{RHS}_j(t_0)\exp\left[-\int_{t_0}^{t_0+h}\left(\nu k'^2\delta_{ij} - 2\delta_{i3}\delta_{j3}S\frac{k_1' k_3'}{k'^2}\right)dt\right] + \frac{h}{2}\mathrm{RHS}_i(t_0 + h). \quad (28)$$

To avoid convective instabilities, the time step $h$ is chosen to satisfy the Courant condition [26]

$$\frac{u_{max}'\, h \sqrt{3}}{\min(\Delta x_1, \Delta x_2, \Delta x_3)} \lesssim 0.5, \quad (29)$$

where $u_{max}'$ is the maximum velocity fluctuation in the domain, and $\Delta x_i$ is the grid spacing in the $i$th coordinate direction.
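A minimal sketch of the time-step selection implied by Eq. (29) is given below; the function name and the sample values are illustrative only.

```python
import numpy as np

def courant_time_step(u_max, dx, courant=0.5):
    """Sketch of Eq. (29): choose h so that u_max * h * sqrt(3) / min(dx)
    stays at or below the target Courant number."""
    return courant * min(dx) / (u_max * np.sqrt(3.0))

h = courant_time_step(u_max=2.5, dx=(2 * np.pi / 512,) * 3)  # hypothetical values
```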

Fig. 2. Functional behavior of weights for the exponential integrator (denoted as solid lines with closed symbols) and the Runge–Kutta scheme (denoted as dashed lines with open symbols) in a Stokes number range of interest.


3.2. Particle update

Inertial particles introduce another time scale into the simulation, namely the particle response time $\tau_p$, which is a parameter that is independent of the fluid time step $h$. For the case where $\tau_p \ll h$, the system defined by (21) and (22) is stiff, and traditional explicit Runge–Kutta schemes require an extremely small time step for numerical accuracy and stability. Note that updating the particle equations implicitly would be prohibitively expensive, as it necessarily would involve multiple interpolations of the fluid velocity to the particle location. An alternative scheme we have used in the past is known as "subcycling", whereby the time step used to update the particle field is chosen so as to ensure $h/\tau_p \le 0.1$. The fluid velocity field is linearly interpolated between the values at consecutive fluid time steps. While this method improves the accuracy of the RK2 method, it is computationally expensive, particularly for particles with low values of $\tau_p$. Additionally, subcycling can degrade the parallel scaling of the code for simulations with a distribution of values of $\tau_p$. The distribution of $\tau_p$ may vary from processor to processor, causing the entire calculation to run at the speed of the slowest processor (i.e., the processor with the highest load).

We have overcome these limitations by formulating an alternative second-order scheme that is uniformly effective over the entire range of $\tau_p$, including the challenging case of $\tau_p \ll h$. The numerical scheme is based on "exponential integrators" [27]. Exponential integrators are a broad class of methods that treat the linear term in (22) exactly and the inhomogeneous part using an exponential quadrature.

The starting point for all of the updates is as follows:

$$\mathbf{v}(t_0 + h) = e^{-h/\tau_p}\,\mathbf{v}(t_0) + w_1\,\mathbf{u}\!\left[\mathbf{x}(t_0)\right] + w_2\,\mathbf{u}\!\left[\mathbf{x}(t_0) + \mathbf{v}(t_0)h\right] + \left(1 - e^{-h/\tau_p}\right)\tau_p\,\mathbf{g}, \quad (30)$$

where the weights $w_1$ and $w_2$ define the method. In our standard RK2 implementation, the weights are defined by

$$w_1^{RK} = 0.5\,(h/\tau_p)\,e^{-h/\tau_p}, \qquad w_2^{RK} = 0.5\,(h/\tau_p), \quad (31)$$

where the superscript "RK" indicates the standard RK method. In the new formulation based on the exponential quadrature, we redefine the weights as follows

$$w_1 \equiv \frac{h}{\tau_p}\left[\varphi_1\!\left(-\frac{h}{\tau_p}\right) - \varphi_2\!\left(-\frac{h}{\tau_p}\right)\right], \qquad w_2 \equiv \frac{h}{\tau_p}\,\varphi_2\!\left(-\frac{h}{\tau_p}\right) \quad (32)$$

and

$$\varphi_1(z) \equiv \frac{e^z - 1}{z}, \qquad \varphi_2(z) \equiv \frac{e^z - z - 1}{z^2}. \quad (33)$$
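To make the update concrete, here is a minimal NumPy sketch of the velocity update in Eqs. (30)–(33). The interpolation routine `u_at` is a placeholder for the B-spline interpolation of Section 4, the array shapes and monodisperse $\tau_p$ are assumptions, and the position update of Eq. (21) is omitted; this is an illustration, not the HiPPSTR implementation.

```python
import numpy as np

def phi1(z):
    # phi_1(z) = (e^z - 1) / z, Eq. (33); series value 1 + z/2 used near z = 0
    return np.where(np.abs(z) < 1e-6, 1.0 + 0.5 * z, np.expm1(z) / z)

def phi2(z):
    # phi_2(z) = (e^z - z - 1) / z^2, Eq. (33); series value 1/2 + z/6 near z = 0
    return np.where(np.abs(z) < 1e-6, 0.5 + z / 6.0, (np.expm1(z) - z) / z**2)

def exp_integrator_velocity(x, v, h, tau_p, g, u_at):
    """Velocity update of Eq. (30) with the exponential-quadrature weights of
    Eqs. (32)-(33). x, v are (n, 3) particle arrays, g is (3,), tau_p is a
    scalar response time, and u_at(x) returns the interpolated fluid velocity."""
    z = -h / tau_p
    w1 = (h / tau_p) * (phi1(z) - phi2(z))
    w2 = (h / tau_p) * phi2(z)
    decay = np.exp(-h / tau_p)
    return (decay * v + w1 * u_at(x) + w2 * u_at(x + v * h)
            + (1.0 - decay) * tau_p * g)
```

In the limit $h/\tau_p \to \infty$ the weights computed above give $w_1 \to 0$ and $w_2 \to 1$, reproducing the correct fluid-tracer limit discussed below.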

Fig. 2 compares the behavior of the RK weights with the newly defined exponential weights as a function of $h/\tau_p$ for particles with Stokes numbers $St \equiv \tau_p/\tau_\eta \le 0.1$, where $\tau_\eta \equiv (\nu/\epsilon)^{1/2}$ is the Kolmogorov time scale. In the limit $h/\tau_p \to 0$, we see all the weights converge, implying consistency between the RK and exponential-quadrature methods in the limit of a highly resolved simulation, and confirming the new method's second-order accuracy. In the other limit, that is $St \to 0$ (or equivalently $h/\tau_p \to \infty$), we see the standard RK weights diverge, which clearly will lead to large numerical errors. In contrast, the exponential-quadrature weights have limits $w_1 \to 0$ and $w_2 \to 1$, implying from Eq. (30) that the particle velocity smoothly approaches $\mathbf{u}[\mathbf{x}(t_0) + \mathbf{v}(t_0)h] + \tau_p\mathbf{g}$ in this limit, which is the fluid velocity (corrected for gravitational settling) at the anticipated particle position at time $t_0 + h$ based on an Euler time step, i.e., the correct limit. Furthermore, the truncation error per step (as $\tau_p \to 0$) for the RK2 scheme is $O\left(h^3\tau_p^{-3}u\right)$, while that of the new scheme is $O\left(h^3\tau_p^{-1}\frac{d^2u}{dt^2}\right)$, indicating that the error constant for the RK2 is $O\left(\tau_p^{-2}\right)$ greater than that of the exponential integrator.

We quantify the integration errors in the two schemes by performing a numerical experiment on a domain with $512^3$ grid points, 2,097,152 particles, and small-scale resolution $k_{max}\eta = 2$ (where $k_{max} = N\sqrt{2}/3$ is the maximum resolved wavenumber magnitude). We first use a Courant number of 0.5 to advance the particle field for one time step, with both the RK2 scheme and the exponential integrator scheme. To reduce time-stepping errors and attain a baseline case for quantifying these errors, we perform a set of low-Courant-number simulations, starting from the same initial condition as the simulations with a Courant number of 0.5. We advance the particle field for 1000 time steps using the exponential integrator scheme with a Courant number of 0.0005 (i.e., a time step equal to 1/1000th that of the other simulations) and $h/\tau_p \le 0.1$. The estimated root-mean-square (RMS) velocity errors incurred during numerical integration for the exponential integrator and RK2 schemes are plotted in Fig. 3a. The time to update the positions and velocities of the particles for each of the two schemes is plotted in Fig. 3b, normalized by the time to complete one step of the Navier–Stokes solver, which remains constant for all the cases. From Fig. 3a and b, it is evident that the exponential integrator outperforms the RK2 scheme, both in terms of numerical accuracy and computational expense, particularly for high values of $h/\tau_p$ (i.e., low values of $St$).

4. Interpolation

4.1. Interpolation methods

As discussed in Section 2.2, the solution of (21) and (22) requires an interpolation of grid values of the fluid velocity to the particle centers. In principle, this interpolation could be performed exactly (provided that the grid is sufficiently fine to capture all scales of motion) using a spectral interpolation [28]. For a typical simulation involving at least a million particles, however, such an interpolation technique is prohibitively expensive.

Fig. 3. (a) RMS velocity integration errors in the particle update (normalized by the RMS fluid velocity) as a function of $h/\tau_p$ and $St$ for both the exponential integrator and RK2 schemes. (b) Time to update the positions and velocities of 2,097,152 particles for both schemes, normalized by the time to complete one step of the Navier–Stokes solver.

To compute the fluid velocities at the particle centers, we have embedded several different interpolation methods into our code. They include linear, Lagrangian [29], Hermite [30], shape function method (SFM) [28], and B-spline interpolation [31]. For the Lagrangian and B-spline interpolations, we use 4, 6, 8, and 10 interpolation points (denoted as LagP and BSplineP, where P is the number of interpolation points). These interpolation methods are compared in Fig. 4, both in terms of accuracy and computational expense. All data in Fig. 4 is from a simulation with $512^3$ grid points, 2,097,152 particles, and $k_{max}\eta = 2$. In all cases, we determined the "exact" values of the particle velocities from a spectral interpolation, and defined the velocity error of a particular scheme as the RMS difference between the velocities obtained from that scheme and the spectral interpolation at a given time.

Based on Fig. 4, the B-spline interpolation scheme, which is optimized for spectral simulations [31], provides the best trade-off between computational expense and accuracy. To determine the optimal number of spline points, we compare the interpolation error to the local time-stepping error, as given in Fig. 3a. From these results, for a given run, we choose the number of spline points so that the interpolation and time-stepping errors are of the same order of magnitude.

4.2. Interpolation in shear flow

All our interpolation methods are designed for velocities which are stored on an orthogonal grid in physical space. As discussed in Section 2.1.2, the Brucker et al. [15] algorithm for the fluid velocity stores the physical-space velocity on an orthogonal mesh that is shear-periodic. Consequently, the interpolation methods must be adapted to accommodate particles that require data from shear-periodic grid points. As an example, consider linear interpolation for the particle shown in Fig. 5. The schematic is shown in two dimensions on a 4 × 4 grid for clarity.

Fig. 4. Interpolation error for different methods as a function of computation time, which is normalized by the time for one step of the Navier–Stokes solver. All errors are normalized by the RMS fluid velocity, $u'$.

Grid values of velocity are stored at the filled circles, and periodic points are shown as squares: standard periodic points are filled squares and shear-periodic points are open squares. The dotted lines in the figure illustrate the phase shift that occurs between points A, B, C and D on the bottom surface and the shear-periodic points on the top surface. To complete the interpolation, the data on the top surface must be phase shifted back to the orthogonal mesh points A′, B′, C′ and D′. This is accomplished by applying a one-dimensional spectral transform along the $x_1$-direction

$$\bar{u}_{boundary}'(k_1, x_2, x_3) \equiv \sum_{x_1} u_{boundary}'(x_1, x_2, x_3)\exp(-\mathrm{I}k_1 x_1), \quad (34)$$

multiplying by the following factor to shift the points in the $x_1$-direction

$$\bar{u}_{boundary}''(k_1, x_2, x_3) \equiv \bar{u}_{boundary}'(k_1, x_2, x_3)\exp(-\mathrm{I}k_1 S t L_3) \quad (35)$$

and converting back to physical space to recover the values at the open squares A′, B′, C′ and D′ (denoted $u_{boundary}''$ below)

$$u_{boundary}''(x_1, x_2, x_3) \equiv \frac{1}{N_1}\sum_{k_1}\bar{u}_{boundary}''(k_1, x_2, x_3)\exp(\mathrm{I}k_1 x_1). \quad (36)$$

Given these velocity values, we can complete the interpolation using the standard approaches for orthogonal grids.
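The following NumPy sketch illustrates the phase-shift sequence of Eqs. (34)–(36) for a single boundary plane; the array shape, the real-valued field, and the use of `numpy.fft` are assumptions made for illustration.

```python
import numpy as np

def shift_boundary_plane(u_boundary, L1, L3, S, t):
    """Phase-shift the fluid velocity on the shear-periodic boundary plane back
    onto the orthogonal grid points, following Eqs. (34)-(36).
    u_boundary has shape (N1, N2) and is periodic in x1 with box length L1."""
    N1 = u_boundary.shape[0]
    k1 = 2.0 * np.pi * np.fft.fftfreq(N1, d=L1 / N1)    # wavenumbers along x1
    u_hat = np.fft.fft(u_boundary, axis=0)               # Eq. (34)
    u_hat *= np.exp(-1j * k1 * S * t * L3)[:, None]      # Eq. (35)
    return np.real(np.fft.ifft(u_hat, axis=0))           # Eq. (36)
```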

Fig. 5. Example to demonstrate the interpolation approach for shear-periodic boundary conditions on a 4 × 4 grid. The particle is indicated by the × symbol, the stored grid values by filled circles, the standard periodic boundary points by filled squares, and the shear-periodic boundary points by open squares. Corresponding boundary points are labeled A, B, C, D to illustrate the shear-periodic boundary condition. We determine the velocity values at the open circles (A′, B′, C′, D′) to complete the interpolation.


5. Parallelization

5.1. Fluid update

As the Reynolds number increases, the range of spatial and temporal scales also increases, resulting in the approximate scaling $N_T \propto R_\lambda^{9/2}$, where $N_T$ is the total number of grid points, $R_\lambda \equiv 2K\sqrt{5/(3\nu\epsilon)}$ is the Reynolds number based on the Taylor microscale, and $K$ is the average turbulent kinetic energy. The rapid increase in the computational load with Reynolds number has limited DNS to relatively modest values of this parameter. Furthermore, with the slowing of processor speeds from Moore's Law over the past decade, the strategy for advancing supercomputer speeds has shifted toward increasing the number of accessible processors on the system. Exploiting these massively parallel architectures requires a new class of algorithms.
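As a rough worked example of the $N_T \propto R_\lambda^{9/2}$ scaling, the snippet below estimates the grid required to double the Taylor-microscale Reynolds number relative to a hypothetical reference case; the reference values are illustrative only.

```python
# Rough estimate using N_T ~ R_lambda^(9/2): scale up from a reference case.
N_ref, R_ref = 1024**3, 400          # hypothetical reference grid and R_lambda
R_target = 800
N_target = N_ref * (R_target / R_ref) ** 4.5
print(f"N_target ~ {N_target:.2e} grid points (~{N_target ** (1/3):.0f}^3)")
```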

The previous version of the code utilized one-dimensional ("plane") parallel domain decomposition. Using this parallelization strategy, a domain with $N^3$ grid points can be parallelized on at most $N$ processors, which realistically can accommodate a maximum simulation size of $1024^3$ grid points or Reynolds numbers in the range $R_\lambda \lesssim 400$.

To increase the granularity of the parallelization, we have modified the code to allow two-dimensional ("pencil") domain decomposition. With pencil decomposition, we are able to parallelize a domain with $N^3$ grid points on up to $N^2$ processors. Fig. 6 illustrates the difference between the plane and pencil domain decompositions.

While the Message Passing Interface (MPI) is used as the communication environment, the detailed bookkeeping for the FFTs with pencil decomposition is performed using the P3DFFT library [32]. P3DFFT uses standard FFT libraries (such as FFTW [33] or ESSL [34]) to compute the three-dimensional FFT of the velocity or pressure distributed over the two-dimensional array of processors. The FFTs are necessary for computing the nonlinear convolution sums described in Section 2.1.2 and constitute the bulk of the computational expense of the overall fluid solver. They are performed as a series of one-dimensional FFTs in each of the three coordinate dimensions. Starting from a spectral-space variable $\hat{\phi}(k_1, k_2, k_3)$, where the entire $k_3$-dimension is stored on each pencil, we first perform $N_1 \times N_2$ one-dimensional complex-to-complex pencil transforms in the $x_3$ direction to obtain

$$\bar{\phi}(k_1, k_2, x_3) \equiv \frac{1}{N_3}\sum_{k_3}\hat{\phi}(k_1, k_2, k_3)\exp(\mathrm{I}k_3 x_3). \quad (37)$$

For HTSF only, we phase shift $\bar{\phi}(k_1, k_2, x_3)$ to account for the shear-periodic boundary conditions, as discussed in Section 2.1.2

$$\tilde{\phi}(k_1, k_2, x_3) \equiv \bar{\phi}(k_1, k_2, x_3)\exp(-\mathrm{I}k_1 S t x_3). \quad (38)$$

The data in the second and third dimensions is then transposed using the P3DFFT column communicator, so that all the data from the second dimension is now contiguous. Next, $N_1 \times N_3$ one-dimensional complex-to-complex FFTs are performed on the second dimension

$$\bar{\phi}(k_1, x_2, x_3) \equiv \frac{1}{N_2}\sum_{k_2}\tilde{\phi}(k_1, k_2, x_3)\exp(\mathrm{I}k_2 x_2). \quad (39)$$

The data in the first and second dimensions is then transposed using the P3DFFT row communicator, after which $N_2 \times N_3$ complex-to-real one-dimensional transforms are performed on the first dimension, to obtain

$$\phi(x_1, x_2, x_3) = \frac{1}{N_1}\sum_{k_1}\bar{\phi}(k_1, x_2, x_3)\exp(\mathrm{I}k_1 x_1). \quad (40)$$

Fig. 6. A representative two-dimensional ('pencil') decomposition (left) and one-dimensional ('plane') decomposition (right).

The forward transform follows by analogy. Starting from a real-space variable $\phi(x_1, x_2, x_3)$, we first perform $N_2 \times N_3$ real-to-complex one-dimensional transforms on the first dimension

$$\bar{\phi}(k_1, x_2, x_3) = \sum_{x_1}\phi(x_1, x_2, x_3)\exp(-\mathrm{I}k_1 x_1). \quad (41)$$

The data in the first and second dimensions is transposed using the P3DFFT row communicator, and $N_1 \times N_3$ complex-to-complex one-dimensional transforms are performed on the second dimension

$$\tilde{\phi}(k_1, k_2, x_3) = \sum_{x_2}\bar{\phi}(k_1, x_2, x_3)\exp(-\mathrm{I}k_2 x_2). \quad (42)$$

We then perform a transpose of the second and third dimensions using the P3DFFT column communicator, and phase shift to account for the shear-periodic boundary conditions

$$\bar{\phi}(k_1, k_2, x_3) = \tilde{\phi}(k_1, k_2, x_3)\exp(\mathrm{I}k_1 S t x_3). \quad (43)$$

We complete the transformation by calling $N_1 \times N_2$ complex-to-complex one-dimensional transforms

$$\hat{\phi}(k_1, k_2, k_3) = \sum_{x_3}\bar{\phi}(k_1, k_2, x_3)\exp(-\mathrm{I}k_3 x_3). \quad (44)$$

5.2. Particle update

The scale separation between the largest and smallest turbulent flow features increases with the flow Reynolds number. Therefore, an increase in Reynolds number requires an increase in the number of Lagrangian particles, to ensure that the particles sample the flow with sufficient resolution. The flow Reynolds number we achieve is determined by both the number of grid points and the small-scale resolution $k_{max}\eta$. From experience, for $k_{max}\eta \approx 2$, we have found that a population of $(N/4)^3$ particles (where $N$ is the number of grid points in each direction) for a given $St$ provides a good balance between statistical convergence and computational expense.

At the start of a simulation, particles are placed randomly throughout the solution domain with a uniform distribution. These random initial positions are generated efficiently in parallel using version 2.0 of the SPRNG library [35]. Each particle is assigned to a processor based on its physical location in the flow. Some of the grid values of the fluid velocity needed for the interpolation discussed in Section 4 may reside outside of that processor's memory, however. We therefore pad the processor's velocity field with ghost cells, as illustrated in Fig. 7. The unshaded grid cells represent fluid velocities that are local to a given processor, and the shaded grid cells represent ghost cell values from adjacent processors. The cells are numbered to indicate the correspondence between standard grid cells and ghost cells. The ghost cell exchanges are performed using two-sided, nonblocking MPI. As the simulation progresses, particles travel throughout the simulation domain under the influence of the turbulent flow. Particles which cross processor boundaries are exchanged at the end of every time step, also using two-sided, nonblocking MPI.

Fig. 7. Ghost cell communication for interpolation for a representative two-dimensional parallel domain decomposition over four processors. The unshaded grid cells are fluid velocities which are local to each processor, and the shaded grid cells are ghost cells from adjacent processors. The cells are numbered to indicate the correspondence between standard grid cells and ghost cells.
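As an illustration of how particles might be assigned to processors by position, the sketch below maps particle coordinates to an owning rank for a pencil layout over $(x_2, x_3)$; the decomposition convention and the function itself are hypothetical and not taken from the paper.

```python
import numpy as np

def owner_rank(x, L, p_rows, p_cols):
    """Illustrative sketch (not the HiPPSTR routine): map particle positions x
    of shape (n, 3) to the MPI rank owning the containing pencil, assuming the
    box L = (L1, L2, L3) is split into p_rows x p_cols pencils over (x2, x3)."""
    i2 = np.floor(x[:, 1] / L[1] * p_rows).astype(int) % p_rows
    i3 = np.floor(x[:, 2] / L[2] * p_cols).astype(int) % p_cols
    return i2 * p_cols + i3
```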

Fig. 8. Parallel scaling on ORNL Jaguar for grids of size $N^3$ with $(N/4)^3$ particles on different numbers of cores. The wall-clock time per step $t$ is normalized by $N^3\log_2 N$, the expected scaling for a three-dimensional FFT.


5.3. Parallel scaling

In Fig. 8, we show timing data for simulations with a total of $N^3$ grid points and $(N/4)^3$ particles as a function of the number of cores. The wall-clock time per step $t$ is normalized by $N^3\log_2 N$, the expected scaling for a three-dimensional FFT. The ideal scaling case (a line of slope $-1$) is shown for comparison. All timings were performed on the computing cluster "Jaguar" at Oak Ridge National Laboratory (ORNL).

We achieve good scaling on ORNL Jaguar for the largest problem sizes. For a domain with $2048^3$ grid points, for example, we observe 85% strong scaling when moving from 1024 processors to 4096 processors, and nearly 60% strong scaling on up to 16,384 processors.

6. Conclusion

We have presented a highly parallel, pseudospectral code ideally suited for the direct numerical simulation of particle-laden turbulence. HiPPSTR, the most efficient and general multiphase flow code of its kind, utilizes two-dimensional parallel domain decomposition, Runge–Kutta and exponential-integrator time-stepping, and accurate and efficient B-spline velocity interpolation methods. All of these methods are selected and tuned for optimal performance on massively parallel architectures. HiPPSTR thus achieves good parallel scaling on $O(10^4)$ cores, making it ideal for high-Reynolds-number simulations of particle-laden turbulence.

Acknowledgments

This article is a revised and expanded version of a conference proceeding for the Seventh International Conference on Computational Fluid Dynamics (ICCFD7) [36]. We are grateful to M. van Hinsberg and Professor P.-K. Yeung for helpful discussions during the code development. The authors acknowledge financial support from the National Science Foundation through Grant CBET-0967349 and through the Graduate Research Fellowship awarded to PJI. Additionally, the authors are indebted to several organizations for computational time used in developing and optimizing HiPPSTR. We would like to acknowledge high-performance computing support provided by the Extreme Science and Engineering Discovery Environment (supported by National Science Foundation Grant No. OCI-1053575), NCAR's Computational and Information Systems Laboratory (sponsored by the National Science Foundation), and the resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory (supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725).

References

[1] Falkovich G, Fouxon A, Stepanov MG. Acceleration of rain initiation by cloud turbulence. Nature 2002;419:151–4.

[2] Shaw RA. Particle–turbulence interactions in atmospheric clouds. Annu Rev Fluid Mech 2003;35:183–227.

[3] Warhaft Z. Laboratory studies of droplets in turbulence: towards understanding the formation of clouds. Fluid Dyn Res 2009;41:011201.

[4] Malkiel E, Abras JN, Widder EA, Katz J. On the spatial distribution and nearest neighbor distance between particles in the water column determined from in situ holographic measurements. J Plankton Res 2006;28(2):149–70.

[5] Cuzzi JN, Hogan RC. Blowing in the wind I. Velocities of chondrule-sized particles in a turbulent protoplanetary nebula. Icarus 2003;164:127–38.

[6] Cuzzi JN, Davis SS, Dobrovolskis AR. Blowing in the wind II. Creation and redistribution of refractory inclusions in a turbulent protoplanetary nebula. Icarus 2003;166:385–402.

[7] Faeth GM. Spray combustion phenomena. Int Combust Symp 1996;26(1):1593–612.

[8] Li WI, Perzl M, Heyder J, Langer R, Brain JD, Englemeier KH, et al. Aerodynamics and aerosol particle deaggregation phenomena in model oral-pharyngeal cavities. J Aerosol Sci 1996;27(8):1269–86.

[9] Pratsinis SE, Zhu W, Vemury S. The role of gas mixing in flame synthesis of titania powders. Powder Tech 1996;86:87–93.

[10] Moody EG, Collins LR. Effect of mixing on nucleation and growth of titania particles. Aerosol Sci Tech 2003;37:403–24.

[11] Balachandar S, Eaton JK. Turbulent dispersed multiphase flow. Annu Rev Fluid Mech 2010;42:111–33.

[12] Fede P, Simonin O. Numerical study of the subgrid fluid turbulence effects on the statistics of heavy colliding particles. Phys Fluids 2006;18:045103.

[13] Ray B, Collins LR. Preferential concentration and relative velocity statistics of inertial particles in Navier–Stokes turbulence with and without filtering. J Fluid Mech 2011;680:488–510.

[14] Maxey MR, Riley JJ. Equation of motion for a small rigid sphere in a nonuniform flow. Phys Fluids 1983;26:883–9.

[15] Brucker KA, Isaza JC, Vaithianathan T, Collins LR. Efficient algorithm for simulating homogeneous turbulent shear flow without remeshing. J Comput Phys 2007;225:20–32.

[16] Rogallo RS. Numerical experiments in homogeneous turbulence. Tech. Rep. 81315; NASA; 1981.

[17] Pope SB. Turbulent flows. New York: Cambridge University Press; 2000.

[18] Orszag SA, Patterson GS. Numerical simulation of turbulence. New York: Springer-Verlag; 1972.

[19] Johnson RW. The handbook of fluid dynamics. CRC Press; 1998.

[20] Eswaran V, Pope SB. An examination of forcing in direct numerical simulations of turbulence. Comput Fluids 1988;16:257–78.

[21] Witkowska A, Brasseur JG, Juvé D. Numerical study of noise from isotropic turbulence. J Comput Acoust 1997;5:317–36.

[22] Elghobashi SE, Truesdell GC. On the two-way interaction between homogeneous turbulence and dispersed particles. I: Turbulence modification. Phys Fluids A 1993;5:1790–801.

[23] Sundaram S, Collins LR. A numerical study of the modulation of isotropic turbulence by suspended particles. J Fluid Mech 1999;379:105–43.


[24] Sundaram S, Collins LR. Collision statistics in an isotropic, particle-laden turbulent suspension I. Direct numerical simulations. J Fluid Mech 1997;335:75–109.

[25] Reade WC, Collins LR. Effect of preferential concentration on turbulent collision rates. Phys Fluids 2000;12:2530–40.

[26] Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical recipes in Fortran. Cambridge: Cambridge University Press; 1999.

[27] Hochbruck M, Ostermann A. Exponential integrators. Acta Numer 2010;19:209–86.

[28] Balachandar S, Maxey MR. Methods for evaluating fluid velocities in spectral simulations of turbulence. J Comput Phys 1989;83:96–125.

[29] Berrut JP, Trefethen LN. Barycentric Lagrange interpolation. SIAM Rev 2004;46:501–17.

[30] Lekien F, Marsden J. Tricubic interpolation in three dimensions. Int J Numer Methods Eng 2005;63(3):455–71.

[31] van Hinsberg MAT, Thije Boonkkamp JHM, Toschi F, Clercx HJH. On the efficiency and accuracy of interpolation methods for spectral codes. SIAM J Sci Comput 2012;34(4):B479–98.

[32] Pekurovsky D. P3DFFT: a framework for parallel computations of Fourier transforms in three dimensions. SIAM J Sci Comput 2012;34(4):C192–209.

[33] Frigo M, Johnson SG. The design and implementation of FFTW3. Proc IEEE 2005;93(2):216–31. http://dx.doi.org/10.1109/JPROC.2004.840301.

[34] Engineering and scientific subroutine library (ESSL) and parallel ESSL; 2012. <http://www-03.ibm.com/systems/software/essl/>.

[35] Mascagni M, Srinivasan A. Algorithm 806: SPRNG: a scalable library for pseudorandom number generation. ACM Trans Math Softw 2000;26(3):436–61.

[36] Ireland PJ, Vaithianathan T, Collins LR. Massively parallel simulation of inertial particles in high-Reynolds-number turbulence. In: Seventh international conference on computational fluid dynamics; 2012. p. 1–9.