
DOMAIN DECOMPOSITION TECHNIQUES AND

DISTRIBUTED PROGRAMMING IN

COMPUTATIONAL FLUID DYNAMICS

by

Rodrigo Rafael Paz

A dissertation submitted to the Postgraduate Department of the

FACULTAD DE INGENIERÍA Y CIENCIAS HÍDRICAS

for partial fulfillment of the requirements

for the degree of

DOCTOR IN ENGINEERING

Field of Computational Mechanics

of the

UNIVERSIDAD NACIONAL DEL LITORAL

2006

TÉCNICAS DE DESCOMPOSICIÓN DE DOMINIO Y PROGRAMACIÓN DISTRIBUIDA EN

MECÁNICA DE FLUIDOS COMPUTACIONAL

por

Rodrigo Rafael Paz

Tesis remitida a la Comisión de Posgrado de la

FACULTAD DE INGENIERÍA Y CIENCIAS HÍDRICAS

como parte de los requisitos para la obtención

del grado de

DOCTOR EN INGENIERÍA

Mención Mecánica Computacional

de la

UNIVERSIDAD NACIONAL DEL LITORAL

2006

A Eliana y Guadalupe,

a mis Padres Néstor y Susana,

a mi Hermana Lici

y a la memoria de mi Abuelo Agustín.

Acknowledgments

I will always be indebted to Mario Storti for his advice, support and kind guidance during the elaboration of this thesis at CIMEC. I have had the privilege of working and teaching with him over these years. I would also like to remark that Mario gives special and dedicated support to every PhD student and researcher at the CIMEC laboratory. He can stay by your side (stuck on a chair) for hours discussing an idea or debugging a (frequently unfriendly) piece of code as if he were its author.

I would like to express my deepest appreciation to Prof. Sergio Idelsohn for his constant encouragement. Prof. Idelsohn has given me special participation in one of the most important projects in which the CIMEC laboratory has been involved. Special thanks to Norberto Nigro for very insightful discussions and intense collaboration. Beto has always been interested in my work.

The research documented in this dissertation has been supported by the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), the national research council of Argentina.

I would like to thank Professor Vitoriano Ruas from the Laboratoire de Modélisation en Mécanique, Université Pierre et Marie Curie (Paris VI), and Professor Grigori Panasenko from the Équipe d'Analyse Numérique, Université de Saint-Étienne, who gave me special support during my stay in Paris and Saint-Étienne. It has been very fruitful to work with them. I am grateful to Carlos Mendez for revising the manuscript of this thesis, for his useful advice and for the amusing conversations on the river shore in Santa Fe while eating the well-known choris.

To all my friends at CIMEC. I have had wonderful days working with them in an

enlightening environment.

To my friends, always.

Finally, I would like to express my deepest thanks to Eliana and Guadalupe, for always being with me, for their love, forbearance and unconditional support. To my father and mother, Néstor and Susana, and to my sister Lici: they have always taught me the importance of studying and the freedom that a person needs to do what he believes in. To my grandfather Agustín, with whom I spent my happiest days. My family is the energy that moves me through life.

Deo Gratias.

Agradecimientos

Estoy inmensamente agradecido a Mario Storti por su dirección, guía, dedicación y ayuda; dejándome ir siempre en la dirección en que me sentía con mayor confianza. También por haber confiado en mí para dar clases junto a él en la facultad. A Sergio Idelsohn por creer en mí y darme participación dentro de los proyectos en que trabajé durante la tesis. A Norberto Nigro por las discusiones y charlas, y por haberse interesado siempre en lo que estaba trabajando.

Quiero agradecerle a CONICET por su programa de soporte para las carreras de

doctorado.

Un agradecimiento especial al Profesor Vitoriano Ruas del Laboratoire de Modélisation en Mécanique, Université Pierre et Marie Curie (Paris VI) y al Profesor Grigori Panasenko del Équipe d'Analyse Numérique, Université de Saint-Étienne, por el inmenso apoyo brindado durante mi estadía en París y Saint-Étienne. El trabajo con ellos fue muy enriquecedor.

Agradezco a Carlos Mendez por la revisión del manuscrito de esta tesis y por las excelentes discusiones que hemos tenido a lo largo de estos años.

A todos los amigos del CIMEC, con los que aprendí y me divertí en un ambiente muy grato.

A mis amigos, siempre.

Finalmente quiero agradecer a Eliana y a Guadalupe por estar a mi lado siempre, por el cariño y apoyo incondicional y la paciencia grande que tienen. A mis Padres Néstor y Susana y a mi Hermana Lici por haberme inculcado la importancia del estudio y la libertad que una persona necesita para hacer lo que cree. A mi abuelo Agustín, que siempre me hizo feliz. Ellos son el motor indispensable para seguir siempre adelante.

Deo Gratias.

Author’s Legal Declaration

This dissertation has been submitted to the Postgraduate Department of the Facultad de Ingeniería y Ciencias Hídricas in partial fulfillment of the requirements for the degree of Doctor in Engineering - Field of Computational Mechanics of the Universidad Nacional del Litoral. A copy of this document will be available at the University Library and will be subject to the Library's legal regulations.

Some parts of the work presented in this thesis have been (or are going to be) published in the following journals: International Journal for Numerical Methods in Engineering, International Journal for Numerical Methods in Fluids, Journal of Parallel and Distributed Computing, Journal of Sound and Vibration, Journal of Computational Methods in Science and Engineering, and Journal of Computational Physics.

Any comments about the ideas and topics discussed and developed throughout this document will be highly appreciated.

Rodrigo Rafael PAZ

© Copyright by Rodrigo Rafael PAZ – 2006

All Rights Reserved

Introduction

The large spread of length and time scales present in Computational Fluid Dynamics (CFD) problems and in their interaction with solid or elastic bodies (e.g., coupled surface-subsurface flows, high speed wind flows around complex bodies, non-linear fluid-structure interactions) requires a high degree of refinement in the finite element mesh and, therefore, very large computational resources.

The solution of ‘large scale’ CFD problems poses a particular challenge: the efficient use of the available computational resources [LTV97, SP96]. If no suitable numerical techniques are used to reduce, optimize and/or simplify the problem at hand, it may be necessary to increase the computational resources in order to handle it. Newer technologies and ever faster and more powerful (super-)computers make the problems to be solved even larger and more complex (i.e., larger domains, larger numbers of degrees of freedom (dof's), models with an increasing number of evolution variables, coupled interacting fields). For this reason the mathematical models used nowadays can be more complex and complete (from a physical point of view), making the simulations extensive and complicated. The constraint on the available computer resources is always present, and that is the reason for the urgent development and verification of solution techniques that efficiently exploit the potential of new computers and make it possible to obtain solutions of high quality [PNS06] in an affordable simulation time (i.e., CPU time). This thesis has been conceived on that basis.

Over the last decades a wide diversity of linear system ‘solvers’ has been developed and tested, and they have been applied to the resolution of ‘real world’ physics problems by means of the discretization of coupled (or uncoupled) sets of non-linear Partial Differential Equations (PDE's) via the Finite Element Method (FEM), the Finite Difference Method (FDM) and/or the Finite Volume Method (FVM). Until not long ago (and even at present), the direct solution of these systems of equations was preferred over iterative schemes due to its higher robustness and its predictable behavior. Nevertheless, the increasing number of iterative techniques and the proposed improvements, together with the need to solve larger problems in the Computational Mechanics area, have led to the use of this kind of schemes and to the development of newer ones.

This trend has been taking place since the early seventies, when two crucial developments marked an inflection point in the solution techniques for ‘large scale’ systems of equations. One of these was the exploitation of the ‘low density’ (i.e., sparsity) of the matrices that arise in FEM (as well as in FDM and FVM) when discretized PDE's are stated. The other was the development of iterative methods over the Krylov space (or subspace), such as Conjugate Gradients (CG) and Generalized Minimal Residuals (GMRes) [SS86, Saa00]. Gradually, iterative methods (and their variants, such as preconditioned ones) have attained popularity and have begun to be extensively used by the scientific community and software developers. In particular, there is a vast amount of written work on the use of CG methods for solving large scale coercive systems such as those resulting from the discretization of linear elasticity problems, potential flows and heat conduction, among others.

Nowadays, very large scale systems that arise in the context of the FEM treatment of non-linear transient governing equations are solved on high performance computers (parallel and vector architectures) by means of iterative methods, since they require much less communication between processors than direct methods (such as LU decomposition or multifrontal methods).

The iterative ‘Substructuring’ Method, or Domain Decomposition Method (DDM) with iteration over the ‘Schur Complement’ matrix on non-overlapping sub-domains, leads to a reduced system better suited (i.e., lower condition number κ(A) and better eigenvalue distribution) for Krylov-based iterative solution than the global system. In the general Schur complement domain decomposition method, the condition number is lowered (∝ 1/h vs. ∝ 1/h² for the global system, h being the mesh size) and the computational cost per iteration is not so high once the sub-domain matrices have been factorized.

Iterative substructuring methods rely on a non-overlapping partition into sub-domains

(substructures). The efficiency of these methods can be further improved by using pre-

conditioners [LTV97]. Once the degrees of freedom inside the substructures have been

eliminated by block Gaussian elimination (or another algorithm), a preconditioner for the

resulting Schur complement system is built with matrix blocks relative to a decompo-

sition of interface finite element functions into subspaces related to geometrical objects

(vertices, edges, faces, single substructures) or simply by the coefficients of sub-domain

matrices near the interface. Iterative methods like CG and GMRes are then employed.

Early works, such as [BPS86, BPS89], have influenced most of the later work in the field.

They proposed two spaces for the coarse problem. One of their coarse spaces is given in terms of the averages of the nodal values over the entire substructure boundaries ∂Ωi. The other space is defined by extending the wire basket values (we recall that the wire basket is the union of the boundaries of the faces which separate the substructures) as a two-dimensional discrete harmonic function onto the faces, and then as a discrete harmonic function into the interiors of the sub-domains.

For self-adjoint positive semidefinite problems, the Neumann-Neumann preconditioner is the most classical one. From a mathematical point of view, the preconditioner is defined by approximating the inverse of the global Schur complement matrix by a weighted sum of the inverses of the local Schur complement matrices. From a physical point of view, the Neumann-Neumann preconditioner is based on splitting the flux applied to the interface in the preconditioning step and solving local Neumann problems in each sub-domain. This strategy is good only for symmetric operators.

Another family of DDM, the overlapping Schwarz domain decomposition schemes, has also been extensively used in computational mechanics. A good introduction to these methods and their applications is presented by Smith and coworkers in Reference [SBrG96]. In the CFD area, Rachowicz [Rac97] successfully applied the GMRes solver with a Schwarz-type domain decomposition preconditioner to the solution of hypersonic, high Reynolds number flows with strong shock-boundary layer interaction.

The main purpose of the present thesis is the efficient solution of large scale challenge problems arising in Computational Fluid Dynamics, the proposal of new ideas in preconditioning techniques, the implementation of such ideas in a parallel multiphysics C++ code using the message passing paradigm via the MPI/PETSc libraries [GLS94, BGCMS04], and their evaluation on a Beowulf class cluster [SSBS99]. These topics are presented in the first part of this work. The second part is devoted to the application of the algorithms proposed in the first part (§I) to the solution of more general/complex problems, such as wave absorption at fictitious boundaries and the resolution of fluid-structure interaction problems in the supersonic regime of a compressible fluid flow.

Introducción

La diversidad de escalas de tiempo y de espacio presentes en problemas relacionados con la mecánica de fluidos y su interacción con cuerpos sólidos (e.g., problemas de la hidrología superficial y subterránea acoplados o no, flujo de viento alrededor de cuerpos, edificios o vehículos, etc.) requiere un alto grado de refinamiento en las mallas utilizadas en el método de elementos finitos y, por lo tanto, demanda grandes recursos computacionales.

La solución de problemas en ‘gran escala’ en la mecánica computacional tiene un desafío particular y es el de utilizar eficientemente los recursos disponibles [LTV97, SP96]. Si no se utilizan adecuadas técnicas numéricas para reducir, optimizar y/o simplificar el problema, es menester contar con grandes recursos computacionales para tratar el problema. Por otro lado, el auge de computadoras cada vez más rápidas y con mayor capacidad de cálculo hace que los problemas que se quieren resolver sean cada vez más grandes y complejos (i.e., mayores y más variadas escalas, acople de distintos campos, modelos que tengan en cuenta otras variables y su evolución e interacción con las demás, etc.). Es así que los modelos matemáticos son cada vez más complejos y sofisticados, haciendo que las simulaciones de los sistemas resultantes sean extensas y complicadas. La restricción sobre los recursos computacionales disponibles está siempre presente y por eso la urgencia en el desarrollo y verificación de técnicas de solución capaces de explotar eficientemente el potencial de las modernas computadoras y la posibilidad de obtener soluciones de buena calidad en un tiempo aceptable de simulación (tiempo de CPU). La presente tesis nace de esta necesidad.

Durante varias décadas se han desarrollado y probado técnicas concernientes a la solución de problemas lineales que son resultado de la aplicación del método de elementos finitos (MEF) a ecuaciones diferenciales en derivadas parciales (EDDP) que tratan de describir un conjunto de eventos de la física (e.g., mecánica de cuerpos sólidos, dinámica estructural, dinámica de fluidos, etc.). Hasta no hace mucho tiempo, la solución directa de estos sistemas era preferida a la solución iterativa debido a su mayor robustez y al carácter predictivo de su comportamiento. Sin embargo, la gran cantidad de técnicas iterativas que han sido desarrolladas, conjuntamente con la necesidad de resolver sistemas de ecuaciones cada vez más grandes en diferentes arquitecturas, han dado como resultado una inclinación al uso de este tipo de técnicas y al desarrollo de nuevas.

Esta tendencia se viene dando desde 1970, cuando dos importantes desarrollos marcaron un punto de inflexión en la solución de grandes sistemas de ecuaciones. Uno fue la explotación de la ‘baja densidad’ (por sparsity, matrices ralas, matrices con tasa de llenado baja) de los sistemas que resultan de la aplicación del MEF (como así también del método de diferencias finitas MDF) a las EDDP. El otro fue el desarrollo de métodos tales como los de Krylov (o métodos tipo gradientes conjugados precondicionados). Gradualmente los métodos iterativos (precondicionamiento e iteración en el espacio de Krylov) comenzaron a aproximarse en calidad a las soluciones provistas por métodos directos. Particularmente, mucho se ha escrito sobre el método de gradientes conjugados precondicionado para sistemas lineales simétricos que resultan de operadores simétricos (e.g., elasticidad lineal y no lineal, flujo potencial, etc.).

Hoy, los grandes sistemas de ecuaciones obtenidos de las EDDP no lineales mediante el MEF para problemas transitorios en dos y tres dimensiones, donde puede haber varias incógnitas por nodo, son resueltos con métodos iterativos en computadoras de alta performance (arquitecturas paralelas o vectoriales) debido a que requieren mucha menor comunicación entre los procesadores que la necesaria en métodos directos, donde la solución de cada una de las incógnitas está acoplada con las demás.

El método de subestructuración (o método de descomposición de dominios e iteración sobre la matriz complemento de Schur para dominios no solapados) conduce a sistemas reducidos mejor condicionados para la solución mediante métodos de Krylov. El número de condición de estos problemas se ve disminuido en un factor 1/h (∝ 1/h vs ∝ 1/h² para el sistema global, siendo h la dimensión característica de la malla) y el costo computacional por iteración no se ve encarecido debido a que las matrices correspondientes a los grados de libertad de los subdominios (grados de libertad interiores) ya han sido factorizadas. La eficiencia de estos métodos puede ser mejorada mediante el uso de precondicionadores [Meu99, Man93, BPS86, Cro02]. Diferentes técnicas de precondicionamiento han sido propuestas y la reducción del número de condición de las matrices ha sido demostrada en el marco de ecuaciones diferenciales lineales elípticas (e.g., precondicionadores del tipo wire basket, Neumann-Neumann y sus variantes para los problemas de elasticidad y flujo de Stokes).

En este trabajo se buscará solucionar eficientemente los sistemas de ecuaciones provenientes de la discretización mediante el MEF o MDF de ecuaciones diferenciales no lineales en derivadas parciales que representan modelos numéricos de problemas reales (como los descriptos arriba) considerados un desafío para los métodos computacionales actuales. El objetivo es también el desarrollo de un código de elementos finitos orientado a objetos (que reduce drásticamente las dependencias de implementación entre subsistemas y que conduce al principio de reusabilidad de diseños de interfaces) que resuelva problemas de la mecánica de fluidos computacional en gran escala en forma distribuida mediante la técnica de paso de mensajes (MPI/PETSc [GLS94, BGCMS04]). Esta técnica es ampliamente explotable en arquitecturas de computadoras paralelas, tales como la de ‘clusters’ Beowulf [SSBS99]. En la primera parte de esta tesis serán expuestos y desarrollados los tópicos relacionados con los métodos de descomposición de dominios y su desempeño en problemas clásicos de la mecánica de fluidos computacional. La segunda parte está dedicada a la aplicación del algoritmo propuesto en la primera parte (§I) a la solución de problemas más generales/complejos como lo es la absorción de ondas en fronteras ficticias y la solución de problemas de interacción fluido/estructura para el flujo supersónico de un fluido compresible.

Contents

I Domain Decomposition Methods 1

1 Preliminaries 3

1.1 Solution of Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.1.1 Perturbation Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.1.2 Condition Number . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.2 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.3 Basic Iterative Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3.1 Optimal Iteration Methods . . . . . . . . . . . . . . . . . . . . . . . 11

2 The ‘Interface Strip Preconditioner’ for Domain Decomposition Meth-

ods 13

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.2 Schur Complement Domain Decomposition Method . . . . . . . . . . . . . 18

2.2.1 The Steklov Operator . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.2.2 Eigenvalues of Steklov Operator . . . . . . . . . . . . . . . . . . . . 21

2.3 Preconditioners for the Schur Complement Matrix . . . . . . . . . . . . . . 24

2.3.1 The Neumann-Neumann Preconditioner . . . . . . . . . . . . . . . 25

2.3.2 The Interface Strip Preconditioner (ISP) . . . . . . . . . . . . . . . 27

2.4 The Advective-Diffusive Case . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.5 Implementation of the Neumann-Neumann Preconditioner . . . . . . . . . 36

2.5.1 The Balancing Neumann-Neumann Version . . . . . . . . . . . . . . 38

2.6 The Interface Strip Preconditioner: Solution of the Strip Problem . . . . . 42

2.6.1 Implementation Details of the IISD Solver . . . . . . . . . . . . . . 44

2.7 Classical Overlapping Domain Decomposition

Method: Alternating Schwarz Methods . . . . . . . . . . . . . . . . . . . . 46

2.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48


3 Numerical Tests 49

3.1 Numerical Examples in Sequential Environments . . . . . . . . . . . . . . . 50

3.1.1 The Poisson’s Problem . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.1.2 The Scalar Advective-Diffusive Problem . . . . . . . . . . . . . . . 51

3.1.3 The Hypersonic Flow Over a Flat Plate Test . . . . . . . . . . . . . 53

3.2 Numerical Examples in Parallel Environment . . . . . . . . . . . . . . . . . 60

3.2.1 The Poisson’s Problem . . . . . . . . . . . . . . . . . . . . . . . . . 62

3.2.2 The Scalar Advective-Diffusive Problem . . . . . . . . . . . . . . . 63

3.2.3 The Coupled Hydrological Flow Model . . . . . . . . . . . . . . . . 65

3.2.4 The Stokes Flow in a Long Horizontal Channel . . . . . . . . . . . 74

3.2.5 The Viscous Incompressible Navier-Stokes Flow Around an Infinite

Cylinder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

3.2.6 Navier-Stokes Flow Using the Fractional Step Scheme. The Lid

Driven Cavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

3.2.7 The Wind Flow Around a 3D Immersed Body.

The AHMED Model . . . . . . . . . . . . . . . . . . . . . . . . . . 92

3.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

II Applications and Usage 99

4 Dynamic Boundary Conditions in CFD 101

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

4.2 General Advective-Diffusive Systems of Equations . . . . . . . . . . . . . . 104

4.2.1 Linear Advection-Diffusion Model . . . . . . . . . . . . . . . . . . . 105

4.2.2 Gas Dynamic Equations . . . . . . . . . . . . . . . . . . . . . . . . 105

4.2.3 Shallow Water Equations . . . . . . . . . . . . . . . . . . . . . . . . 106

4.2.4 Channel Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

4.3 Variational Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

4.4 Absorbing Boundary Conditions . . . . . . . . . . . . . . . . . . . . . . . . 107

4.4.1 Advective-Diffusive Systems in 1D . . . . . . . . . . . . . . . . . . . 109

4.4.2 Linear 1D Absorbing Boundary Conditions . . . . . . . . . . . . . . 110

4.4.3 Multidimensional Problems . . . . . . . . . . . . . . . . . . . . . . 112

4.4.4 Absorbing Boundary Conditions for Nonlinear Problems . . . . . . 114

4.4.5 Riemann Based Absorbing Boundary Conditions . . . . . . . . . . . 114

4.4.6 Absorbing Boundary Conditions Based on Last State . . . . . . . . 116

4.4.7 Imposing Nonlinear Absorbing Boundary Conditions . . . . . . . . 117


4.4.8 Numerical Example. Viscous Compressible Subsonic Flow Over a

Parabolic Bump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

4.5 Dynamically Varying Boundary Conditions . . . . . . . . . . . . . . . . . . 121

4.5.1 Varying Boundary Conditions in External Aerodynamics . . . . . . 121

4.5.2 Aerodynamics of Falling Objects . . . . . . . . . . . . . . . . . . . 124

4.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

5 Strong Coupling Strategy for Fluid-Structure Interaction Problems in

Supersonic Regime Via Fixed Point Iteration 131

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

5.2 Strongly Coupled Partitioned Algorithm Via Fixed Point Iteration . . . . . 133

5.2.1 Notes on the Fluid/Structure Interaction (FSI) Algorithm . . . . . 135

5.3 Description of Test Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

5.3.1 Dimensionless Parameters . . . . . . . . . . . . . . . . . . . . . . . 138

5.3.2 Houbolt’s Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

5.3.3 FSI Code Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

5.4 Stability of the Weak/Strong Staged Coupling Outside the Flutter Region 151

5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

III Final Conclusions 157

6 Overview and Final Remarks 159

IV Appendix 161

A Functional Spaces 163

A.1 Some Used Sobolev Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

A.2 Extension to Vector-Valued Functions . . . . . . . . . . . . . . . . . . . . . 164

B Resumen extendido en castellano 167

B.1 El Metodo de Descomposicion de Dominios en Mecanica de Fluidos Com-

putacional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

B.2 Ecuaciones de Gobierno . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

B.2.1 Propiedades Continuas de los Fluidos . . . . . . . . . . . . . . . . . 170

B.2.2 Campos Lagrangianos y Eulerianos . . . . . . . . . . . . . . . . . . 171

B.2.3 La Ecuacion de Continuidad . . . . . . . . . . . . . . . . . . . . . . 174


B.2.4 La Ecuacion de Cantidad de Movimiento . . . . . . . . . . . . . . . 175

B.2.5 Las Ecuaciones de Navier-Stokes en Sistemas de Referencia No Iner-

ciales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176

B.2.6 Las Ecuaciones de Navier-Stokes Incompresibles . . . . . . . . . . . 177

B.3 Formulacion de otros Modelos Matematicos a Tratar . . . . . . . . . . . . 179

B.3.1 Problemas Hidrologicos . . . . . . . . . . . . . . . . . . . . . . . . . 179

B.4 Computacion de Alta Performance . . . . . . . . . . . . . . . . . . . . . . 182

B.4.1 Resolucion Numerica del Modelo de CFD/Hidrologıa Superficial y

Subterranea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

B.4.2 Solucion de Grandes Sistemas de Ecuaciones . . . . . . . . . . . . . 183

B.4.3 Metodos de Descomposicion de Dominio . . . . . . . . . . . . . . . 185

B.4.4 Precondicionamiento . . . . . . . . . . . . . . . . . . . . . . . . . . 188

B.4.5 Implementacion Operativa del Cluster . . . . . . . . . . . . . . . . 193

B.5 Algunas Definiciones Topologicas . . . . . . . . . . . . . . . . . . . . . . . 194

B.6 Dominio Lipschitz, Frontera Lipschitz . . . . . . . . . . . . . . . . . . . . . 194

B.7 Funcion Lipschitz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

B.8 Problemas Bien Planteados en el Sentido de Hadamard . . . . . . . . . . . 195

List of Tables

2.1 Condition number for the Steklov operator and several preconditioners

(mesh: 50× 50 elements, strip: 5 layers of nodes) . . . . . . . . . . . . . . 33

2.2 Condition number for the Steklov operator and several preconditioners

(mesh: 100× 100 elements, strip: 10 layers of nodes) . . . . . . . . . . . . 33

3.1 CPU time and memory requirements per proc. for Poisson problem (mesh 500×500 elements). Note: * in table means iteration failed to converge to a

specified tolerance in a maximum of 200 its. . . . . . . . . . . . . . . . . . 62

3.2 CPU time and memory requirements per proc. for advective-diffusive prob-

lem (mesh 1000× 1000 elements). Note: * in table means iteration failed

to converge to a specified tolerance in a maximum of 200 its. . . . . . . . . 65

3.3 CPU time and memory requirements for Saint-Venant equations (mesh 500×500 elements). Note: * in table means iteration failed to converge to a

specified tolerance in a maximum of 400 its. . . . . . . . . . . . . . . . . . 71

B.1 Algoritmo Gradiente Conjugado Precondicionado . . . . . . . . . . . . . . 189


List of Figures

1.1 Families of Solvers: Direct and Iterative Solvers . . . . . . . . . . . . . . . 4

1.2 Families of Solvers: Domain Decomposition Solvers . . . . . . . . . . . . . 5

1.3 Aleksei Nikolaevich Krylov (1863–1945) . . . . . . . . . . . . . . . . . . . . 5

1.4 Carl Gustav Jacob Jacobi (1804–1851) . . . . . . . . . . . . . . . . . . . . 6

1.5 Johann Carl Friedrich Gauß (1777–1855) . . . . . . . . . . . . . . . . . . . 7

1.6 Andre-Louis Cholesky (1875–1918) . . . . . . . . . . . . . . . . . . . . . . 10

1.7 Pafnuty Lvovich Chebyshev (1821–1894) . . . . . . . . . . . . . . . . . . . 10

1.8 Lewis Fry Richardson (1881–1953) . . . . . . . . . . . . . . . . . . . . . . . 11

1.9 Cornelius Lanczos (1893–1974) . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.1 Issai Schur (1875–1941) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.2 Carl Gottfried Neumann (1832–1925) . . . . . . . . . . . . . . . . . . . . . 15

2.3 Domain Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.4 Johann Peter Gustav Lejeune Dirichlet (1805–1859) . . . . . . . . . . . . . 17

2.5 Joseph–Louis Lagrange (1736–1813) . . . . . . . . . . . . . . . . . . . . . . 17

2.6 Simon-Denis Poisson (1781–1840) . . . . . . . . . . . . . . . . . . . . . . . 20

2.7 Vladimir Andreevich Steklov (1864–1926) . . . . . . . . . . . . . . . . . . . 21

2.8 Pierre–Simon Laplace (1749–1827) . . . . . . . . . . . . . . . . . . . . . . . 22

2.9 Eigenfunctions of Schur complement matrix with 2 sub-domains . . . . . . 24

2.10 Eigenfunctions of Schur complement matrix with 9 sub-domains . . . . . . 25

2.11 Eigenfunctions of Schur complement matrix with 2 sub-domains and ad-

vection (global Peclet 5) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.12 Eigenvalues of Steklov operators and preconditioners for the Laplace oper-

ator (Pe = 0) and symmetric partitions (L1 = L2 = L/2, b = 0.1L) . . . . 31

2.13 Eigenvalues of Steklov operators and preconditioners for the Laplace oper-

ator (Pe = 0) and non-symmetric partitions (L1 = 0.75L, L2 = 0.25L, b =

0.1L) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32


2.14 Eigenvalues of Steklov operators and preconditioners for the advection-

diffusion operator (Pe = 5) and symmetric partitions (L1 = L2 = L/2, b =

0.1L) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.15 Eigenvalues of Steklov operators and preconditioners for the advection-

diffusion operator (Pe = 50) and symmetric partitions (L1 = L2 = L/2, b =

0.1L) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.16 Robert Lee Moore (1882–1974) . . . . . . . . . . . . . . . . . . . . . . . . 40

2.17 Roger Penrose (1931–) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.18 Strip Interface problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

2.19 IISD decomposition by sub-domains. Actual decomposition . . . . . . . . . 44

2.20 Non local element contribution due to bad partitioning . . . . . . . . . . . 45

2.21 Hermann Amandus Schwarz (1843–1921) . . . . . . . . . . . . . . . . . . . 47

3.1 Leonhard Euler (1707–1783) . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.2 Solution of Poisson’s problem . . . . . . . . . . . . . . . . . . . . . . . . . 52

3.3 Solution of advective-diffusive problem . . . . . . . . . . . . . . . . . . . . 52

3.4 Problem definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.5 Claude Louis Marie Henri Navier (1785–1836) . . . . . . . . . . . . . . . . 54

3.6 George Gabriel Stokes (1819–1903) . . . . . . . . . . . . . . . . . . . . . . 54

3.7 Leopold Kronecker (1823–1891) . . . . . . . . . . . . . . . . . . . . . . . . 56

3.8 Osborne Reynolds (1842–1912) . . . . . . . . . . . . . . . . . . . . . . . . 57

3.9 Ernst Mach (1838–1916) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3.10 Skin friction coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3.11 Stanton number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3.12 Solution of Poisson’s problem (mesh 500× 500 elements) . . . . . . . . . . 63

3.13 Solution of advective-diffusive problem (mesh 500× 500 elements) . . . . . 64

3.14 Iteration counts for advective-diffusive problem (mesh 1000× 1000 elements) 64

3.15 Adhemar Jean Claude Barre de Saint-Venant (1797–1886) . . . . . . . . . 66

3.16 Stream/Aquifer coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

3.17 Iteration counts for Saint-Venant system of equations (mesh 500 × 500

elements) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

3.18 Solution of Saint-Venant system of equations (mesh 500× 500 elements) . 71

3.19 Iteration counts for the coupled flow . . . . . . . . . . . . . . . . . . . . . 72

3.20 Soybean location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

3.21 Difference in phreatic levels for both cases . . . . . . . . . . . . . . . . . . 73

3.22 Aquifer State at t=2 years . . . . . . . . . . . . . . . . . . . . . . . . . . . 74


3.23 Olga Alexandrovna Ladyzhenskaya (1922–2004) . . . . . . . . . . . . . . . 75

3.24 Phyllis Nicolson (1917–1968) . . . . . . . . . . . . . . . . . . . . . . . . . . 76

3.25 John Crank (1916–) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

3.26 Residual history . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

3.27 Velocity field in the channel height (nnwt=1) . . . . . . . . . . . . . . . . . 80

3.28 Pressure field along channel (nnwt=1) . . . . . . . . . . . . . . . . . . . . 81

3.29 Velocity field in the channel height (nnwt=100 for Global GMRes, nnwt=3

for IISD+ISP, nnwt=20 for additive Schwarz, nnwt=22 for block-Jacobi) . 81

3.30 Pressure field along channel (nnwt=100 for Global GMRes, nnwt=3 for

IISD+ISP, nnwt=20 for additive Schwarz, nnwt=22 for block-Jacobi) . . . 82

3.31 Theodore von Karman (1881–1963) . . . . . . . . . . . . . . . . . . . . . . 83

3.32 Re = 100. Residual history . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

3.33 Re = 100. viscous x-force coefficient . . . . . . . . . . . . . . . . . . . . . . 84

3.34 Re = 100. viscous y-force coefficient . . . . . . . . . . . . . . . . . . . . . . 84

3.35 Re = 100. viscous z-moment coefficient . . . . . . . . . . . . . . . . . . . . 85

3.36 3D LES flow at Re = 5 · 10^4. Top: initial state, bottom: pseudo-stationary

state . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

3.37 Residual history for Poisson Step . . . . . . . . . . . . . . . . . . . . . . . 88

3.38 Time-converged solution for IISD+ISP solver (Re = 1000) . . . . . . . . . 90

3.39 Scalability properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

3.40 Stokes Flow. Residual history (max. of 100 Newton iterations) . . . . . . . 93

3.41 Stokes Flow. Force and moment coefficients . . . . . . . . . . . . . . . . . 93

3.42 Stokes Flow. Force and moment coefficients . . . . . . . . . . . . . . . . . 94

3.43 Stokes Flow. Force and moment coefficients . . . . . . . . . . . . . . . . . 94

3.44 Re = 1000. Residual history (100 time steps, 10 seconds of simulation) . . 95

3.45 Re = 1000. Force and moment coefficients . . . . . . . . . . . . . . . . . . 95

3.46 Re = 1000. Force and moment coefficients . . . . . . . . . . . . . . . . . . 96

3.47 Re = 1000. Force and moment coefficients . . . . . . . . . . . . . . . . . . 96

3.48 Re = 4.25e6. Friction lines . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

4.1 Shallow water flow and wave absorption at artificial boundaries . . . . . . 108

4.2 Temporal evolution of axial velocity in 1D gas dynamics problem without

absorbing boundary condition at outlet . . . . . . . . . . . . . . . . . . . . 111

4.3 Temporal evolution of axial velocity in 1D gas dynamics problem with ab-

sorbing boundary condition at outlet . . . . . . . . . . . . . . . . . . . . . 112


4.4 Rate of convergence of 1D gas dynamics problem with and without absorbing

boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

4.5 Rate of convergence of 1D gas dynamics problem in full non-linear regime with

different kind of absorbing boundary conditions . . . . . . . . . . . . . . . 113

4.6 Georg Friedrich Bernhard Riemann (1826–1866) . . . . . . . . . . . . . . . 115

4.7 Riemann invariants at boundaries with ULSAR ABC’s . . . . . . . . . . . 117

4.8 Convergence history when using ULSAR ABC’s . . . . . . . . . . . . . . . . 118

4.9 Boris Grigorievich Galerkin (1871–1945) . . . . . . . . . . . . . . . . . . . 118

4.10 Problem geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

4.11 y-Force . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

4.12 y-Force evolution for absorbent conditions . . . . . . . . . . . . . . . . . . 123

4.13 Number of incoming/outgoing characteristics changing on an accelerating

body . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

4.14 Falling ellipse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

4.15 Gaspard-Gustave de Coriolis (1792–1843) . . . . . . . . . . . . . . . . . . . 125

4.16 Computed trajectory of falling ellipse . . . . . . . . . . . . . . . . . . . . . 127

4.17 Ellipse falling at supersonic speeds. Colormaps of |u|. Station A (t = 3.75),

station B (t = 6.25), station C (t = 10). Stations in the trajectory refer

to Figure 4.16. Results are shown in a non-inertial frame attached to the

ellipse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

4.18 Ellipse velocities for different external radius . . . . . . . . . . . . . . . . . 130

5.1 Thomas Simpson (1710–1761) . . . . . . . . . . . . . . . . . . . . . . . . . 136

5.2 Description of test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

5.3 Lowest frequency mode for test case . . . . . . . . . . . . . . . . . . . . . . 142

5.4 Mach 2.2, phase 0. Black= plate deflection, blue=pressure, green=power.

Quantities normalized (not to scale) . . . . . . . . . . . . . . . . . . . . . . 143

5.5 Mach 2.27, phase 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

5.6 Mach 2.35, phase 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

5.7 Plate deflection in distributed points along plate at M=1.8 . . . . . . . . . 145

5.8 Plate deflection in distributed points along plate at M=2.225 . . . . . . . . 146

5.9 Plate deflection in distributed points along plate at M=2.25 . . . . . . . . 146

5.10 Plate deflection in distributed points along plate at M=2.275 . . . . . . . . 147

5.11 Plate deflection in distributed points along plate at M=2.3 . . . . . . . . . 147

5.12 Plate deflection in distributed points along plate at M=3.2 . . . . . . . . . 148

5.13 Fluid and structure fields at M=3.2 . . . . . . . . . . . . . . . . . . . . . . 149


5.14 Fluid and structure fields at M=3.2 . . . . . . . . . . . . . . . . . . . . . . 150

5.15 Experimentally determined order of convergence with ∆t for the uncoupled

algorithm with fourth order predictor . . . . . . . . . . . . . . . . . . . . . 151

5.16 Convergence of fluid state in stage loop . . . . . . . . . . . . . . . . . . . . 152

5.17 Convergence of structure state in stage loop . . . . . . . . . . . . . . . . . 153

5.18 Stability analysis - Staged algorithm with nstage = 5. Vertical displacements

of the plate vs time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

5.19 Stability analysis - Non-staged algorithm. Vertical displacements of the

plate vs time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

5.20 Unstable weak coupling for m = 0.0135 and CFL = 0.5 . . . . . . . . . . . 154

5.21 Stable staged coupling for m = 0.0135, CFL = 1 and nstage = 2 . . . . . . 155

5.22 Strong partitioned scheme in a coarse mesh . . . . . . . . . . . . . . . . . . 155

5.23 Strong partitioned scheme in a fine mesh . . . . . . . . . . . . . . . . . . . 156

A.1 Sergei Lvovich Sobolev (1908–1989) . . . . . . . . . . . . . . . . . . . . . . 163

A.2 David Hilbert (1862–1943) . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

A.3 Augustin Louis Cauchy (1789–1857) . . . . . . . . . . . . . . . . . . . . . . 165

B.1 Trayectoria de una partıcula . . . . . . . . . . . . . . . . . . . . . . . . . . 172

B.2 Sistema de referencia no inercial . . . . . . . . . . . . . . . . . . . . . . . . 176

B.3 Descomposicion del Dominio . . . . . . . . . . . . . . . . . . . . . . . . . . 186

B.4 Rudolf Otto Sigismund Lipschitz (1832–1903) . . . . . . . . . . . . . . . . 194

Nomenclature

Greek Letters

~θ(t) force due to ~ω(t) change per mass unit

δij Kronecker’s tensor

κ(A) condition number of matrix A

λp mean path length between particles

λmax(A) maximum eigenvalue of matrix A

λmin(A) minimum eigenvalue of matrix A

µ dynamic viscosity

ν kinematic viscosity

Ψ gravitational potential

ρ(x, t) fluid density at point x and time t

ρ+ Lagrangian density

τij(x, t) stress tensor

~ω(t) angular velocity of the non-inertial frame of reference

Roman Letters

u+ Lagrangian velocity

fC Coriolis force per mass unit

fc centrifugal force per volume unit

fext external tractions

n unit vector normal to a surface

u(x, t) averaged particle velocity

X+(t,Y) particle position at time t and Y location

J(t,Y) Jacobian determinant of the mapping between referential frame and material

frame

det(A) determinant of matrix A

` smallest geometrical scale

`∗ medium length scale


inf infimum of a set

KSPdim Krylov subspace dimension

K the set of real or complex numbers

Ki+1(A; r0) Krylov subspace generated by A and r0

P static pressure

S(t) material surface that encloses V

S0 material surface that encloses V at t0

Vx ball-like region

V(x, t) specific fluid volume

nd number of space dimensions

Re Reynolds number

sup supremum of a set

~r material point as seen from rotational frame

g gravity acceleration

Kn Knudsen number

p modified pressure

t time [sec]

t0 reference time

v velocity due to the rotation of the frame of reference

w fluid velocity as seen from non-inertial frame of reference

Part I

Domain Decomposition Methods


Chapter 1

Preliminaries

At this very moment the

search is on,

every numerical analyst has a favorite preconditioner,

and you have a perfect chance to find a better one.

Gil Strang, 1986

Solving mathematical problems in computational mechanics is a major area of scientific computation. Many of these mathematical problems arise in the engineering disciplines when modeling the physical behavior of complex systems. The solution process for non-linear problems frequently consists of the iterated solution of linearized problems. Other problems involve the solution of a linear system of equations and the determination of eigenvalues and eigenvectors of a linear mapping. In other words, matrices play a starring role in numerical computations. There are three ways to solve problems of this type: direct approaches, iterative approaches, and a mixture of both. Figures 1.1 and 1.2 show these three groups of methods and their sub-groups. Even though the computation of eigenvalues has to be iterative, previous reductions to simpler forms are mostly based on direct approaches. Direct approaches are more natural and have been used for a long time. Most direct methods used nowadays are stable and robust. Large scale matrix computations are often based on iterative approaches. The overlapping and non-overlapping Domain Decomposition Methods are hybrid techniques that combine features from both approaches in order to obtain better suited systems when ill-conditioning is verified.

Figure 1.1: Families of Solvers: Direct and Iterative Solvers

subspace methods. Krylov subspace methods have a variety of favorable characteristics,

at least in exact arithmetic:

i) Krylov methods are direct methods. They are coordinate free variants of some well-

known matrix reduction and matrix decomposition algorithms.

ii) Krylov methods are optimal methods. They compute the optimal solution in a

subspace subject to method dependent constraints.

iii) Krylov methods are cheap methods. When considered as iterative methods, Krylov

methods tend to converge fast with mostly linear operation count and storage

amount per step to the solution.

iv) Krylov methods are at the heart of numerical analysis. Krylov methods are related to structured eigenvalue problems, to orthogonal polynomials, to rational approximation theory. Amongst others, this enables detailed convergence analysis.

v) Krylov methods are closely related to each other. In particular, a linear system solver can be used to extract eigenvalues, and vice versa.

Figure 1.2: Families of Solvers: Domain Decomposition Solvers (overlapping DDM: additive and multiplicative Schwarz methods; non-overlapping DDM: Schur complement/substructuring, Neumann-Neumann and balancing Neumann-Neumann, FETI and its variants, block-Jacobi, coarse correction variants; present research proposal)

Figure 1.3: Aleksei Nikolaevich Krylov (1863–1945)

On the other hand, in finite precision arithmetic (an original work on this topic can be

consulted in Reference [Zem03]), Krylov methods do not terminate after a finite number

of steps. The solutions are not optimal in the Krylov subspace constructed. Nevertheless,

Krylov methods compute useful results. Only a part of the matrix relations defining the

methods in infinite precision have a finite precision counterpart.

From a numerical analysis viewpoint there is an important difference between dense and sparse systems. For dense systems the state of the art almost seems to have reached its final destination via a variety of well-known and well understood algorithms. The situation is not as optimistic when looking at sparse systems. Most direct methods lead to storage problems due to fill-in and to numerical instabilities due to restrictions on the pivoting strategies. The classical iterative methods like Jacobi, Gauß-Seidel and SOR in general converge too slowly to the solution to be of practical use. Krylov subspace methods are direct methods (they terminate after a finite number of steps, at least in theory) and improve over the classical iterative methods in the sense of being optimal. Krylov methods are frequently the method of choice for large sparse problems. The first Krylov methods were developed in the early fifties, when the first papers, written by Lanczos and Arnoldi, appeared [Lan50, Arn51]. Due to a lack of better understanding (i.e., the methods were not competitive with the other direct methods in terms of accuracy and stability) they were abandoned, or only used in conjunction with complete reorthogonalization, which made them less competitive. Almost twenty years were necessary for the theoretical and practical recognition of Krylov methods. Krylov methods are only competitive when used with preconditioning. There is a vast amount of work in scientific journals such as Mathematics of Computation, SIAM Journal on Matrix Analysis and Applications, SIAM Journal on Numerical Analysis, Computer Methods in Applied Mechanics and Engineering, International Journal for Numerical Methods in Engineering and Journal of Computational Physics about preconditioning techniques and their application to solving continuum mechanics problems (most of them are referenced along this thesis). Throughout this work, Domain Decomposition Methods and their variants are treated as preconditioners for general Krylov methods like Conjugate Gradients and/or Generalized Minimal Residuals; part of this thesis is focused on this topic.

Figure 1.4: Carl Gustav Jacob Jacobi (1804–1851)

Figure 1.5: Johann Carl Friedrich Gauß (1777–1855)

1.1 Solution of Linear Systems

The linear system under consideration will be denoted in this section as Ax = b, where we assume that A ∈ K^{n×n} and b ∈ K^n are known and we look for a solution x ∈ K^n. This solution is unique when A is regular. For regular A, the entries x_i of the solution x depend analytically on the entries of A and b. This is known as Cramer's rule,

x_i = \frac{\det(a_1, \ldots, a_{i-1}, b, a_{i+1}, \ldots, a_n)}{\det(A)}.    (1.1)

This relation is merely of theoretical interest.

1.1.1 Perturbation Theory

The linear system and the related perturbed system will be denoted by Ax = b and \tilde{A}\tilde{x} = \tilde{b}, respectively. We suppose that A ∈ K^{n×n} and b ∈ K^n are given, and we seek x ∈ K^n. Also, we define the differences

\Delta A = \tilde{A} - A, \quad \Delta x = \tilde{x} - x \quad \text{and} \quad \Delta b = \tilde{b} - b.    (1.2)

We interpret \tilde{x} as an approximate solution of Ax = b and denote the corresponding residual by r = b - A\tilde{x}.

1.1.2 Condition Number

A condition number is a bound on the set of changes computed using perturbation theory. A variety of useful condition numbers can be defined. We can define the norm-wise condition number of the system Ax = b as

\kappa_{\alpha,\beta}(A, x) \equiv \inf_{\varepsilon > 0} \sup \left\{ \frac{\|\Delta x\|}{\varepsilon \, \|x\|} \, : \, \tilde{A}\tilde{x} = \tilde{b}, \ \|\Delta A\| \le \varepsilon\alpha, \ \|\Delta b\| \le \varepsilon\beta \right\}    (1.3)

= \frac{\|A^{-1}\|\,(\alpha\|x\| + \beta)}{\|x\|},    (1.4)

for the particular choice of α = ||A|| and β = ||b||. Generally, for a non-singular matrix A, the norm-wise condition number can be written as

κ(A) = ||A|| ||A^{-1}||,    (1.5)

or simply

κ(A) = λmax(A)/λmin(A)    (1.6)

if the matrix is symmetric and positive definite. The symbol ||·|| denotes any suitable norm.

When the condition number is small, backward stable algorithms are also forward

stable. A problem is termed ill-conditioned when the condition number is large, and

ill-posed when the condition number is infinite. Ill-posed problems can not be solved in

finite precision.
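As a simple illustration (a sketch added here, not part of the original development; the toy matrix and the Python/NumPy tooling are assumptions of the example), both expressions (1.5) and (1.6) can be evaluated directly for a small symmetric positive definite matrix:

import numpy as np

# Assumed toy SPD matrix, used only to illustrate Eqs. (1.5) and (1.6).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])

# Norm-wise condition number, Eq. (1.5): ||A|| ||A^{-1}|| in the 2-norm.
kappa_norm = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

# Eigenvalue ratio, Eq. (1.6), valid here because A is SPD.
eigvals = np.linalg.eigvalsh(A)        # real eigenvalues in ascending order
kappa_spd = eigvals[-1] / eigvals[0]

print(kappa_norm, kappa_spd)           # both expressions give the same value

For an SPD matrix the two values coincide; for a general non-singular matrix only Eq. (1.5) applies.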

1.2 Preconditioning

The main objective of preconditioning techniques is the lowering of the condition number of ill-conditioned systems. That is why domain decomposition methods are frequently considered preconditioners rather than solvers. Nevertheless, the use of some preconditioning is imperative in some domain decomposition schemes. This topic will be discussed in §2. Another important goal of preconditioning is the improvement of the spectral properties (i.e., the eigenvalue distribution) of the preconditioned system.


Basically, preconditioning consists in multiplying both sides of the original system (on the left or on the right) by a matrix P (i.e., the ‘preconditioner’) such that

PAx = Pb,    (1.7)

and such that κ(PA) ≪ κ(A) and the eigenvalues are grouped in clusters. In other words, the matrix PA has better properties for inversion than the original matrix A.

There exists a wide variety of preconditioning techniques. Some preconditioners are based on direct insight into the structure of the problem at hand. Other preconditioners, which in general depend on the topology on which the problems are defined, are multigrid and domain decomposition methods and alternating direction preconditioners. Nevertheless, there exist fully algebraic versions of such preconditioners/solvers. One commonly used and easy to implement preconditioner is Jacobi preconditioning, where P is the inverse of the diagonal part of A. One can also use other preconditioners based on the classical stationary iterative methods, such as the symmetric Gauß–Seidel preconditioner. For applications to PDE's, these preconditioners are not very effective. Another approach is to apply a sparse Cholesky factorization to the matrix A (thereby giving up a fully matrix-free formulation), discarding small elements of the factors and/or allowing only a fixed amount of storage for the factors. Such preconditioners are called incomplete factorization preconditioners. One could also attempt to estimate the spectrum of A, find a polynomial p such that 1 − zp(z) is small on the approximate spectrum, and use p(A) as a preconditioner. This is the so-called polynomial preconditioning. The preconditioned system is

p(A)Ax = p(A)b,    (1.8)

and one would expect the spectrum of p(A)A to be more clustered near z = 1 than that of A. If an interval containing the spectrum can be found, the residual polynomial q(z) = 1 − zp(z) of smallest L∞ norm on that interval can be expressed in terms of Chebyshev polynomials. Alternatively, q can be selected to solve a least squares minimization problem.
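The following sketch (an added illustration, not taken from the thesis; the badly scaled test matrix is an assumption of the example) shows the effect of the simplest of these options, Jacobi preconditioning, where P is the inverse of the diagonal of A:

import numpy as np

n = 100
T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian stencil (SPD)
s = np.logspace(-2, 2, n)                                 # strongly varying coefficients
A = np.diag(s) @ T @ np.diag(s)                           # SPD but badly scaled matrix

P = np.diag(1.0 / np.diag(A))                             # Jacobi preconditioner P = diag(A)^{-1}

print(np.linalg.cond(A))       # large condition number of the original system
print(np.linalg.cond(P @ A))   # much smaller condition number of the preconditioned system

Here the diagonal scaling removes the effect of the coefficient variation, while the remaining 1/h²-type conditioning of the underlying Laplacian is left untouched; stronger preconditioners, such as the domain decomposition based ones, are the subject of Chapter 2.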

1.3 Basic Iterative Method

Figure 1.6: Andre-Louis Cholesky (1875–1918)

Figure 1.7: Pafnuty Lvovich Chebyshev (1821–1894)

A basic algorithm that leads to many effective iterative solvers is to split the matrix of a given linear system into a sum of two matrices, one of which leads to a system that is easy to solve. The simplest splitting we can think of is A = I − (I − A). This splitting leads to the well-known Richardson iteration for the linear system:

x_{i+1} = (I − A) x_i + b = x_i + r_i.    (1.9)

Multiplying by −A and adding b gives

−A x_{i+1} + b = −A x_i − A r_i + b,    (1.10)

and

r_{i+1} = (I − A) r_i = (I − A)^{i+1} r_0 = P_{i+1}(A) r_0,    (1.11)

or, in terms of the error,

A(x − x_{i+1}) = P_{i+1}(A) A (x − x_0),    (1.12)

and then

x − x_{i+1} = P_{i+1}(A) (x − x_0).    (1.13)

In the last equations P_{i+1} is a polynomial of degree i + 1. Note that P_{i+1}(0) = 1. A more general splitting A = M − N = M − (M − A) can be rewritten as the standard splitting B = I − (I − B) for the preconditioned matrix B = M^{−1}A = PA.

Figure 1.8: Lewis Fry Richardson (1881–1953)

Now, assume that x_0 = 0 without loss of generality. Thus, the simple Richardson iteration is such that

x_{i+1} = r_0 + r_1 + r_2 + ... + r_i = \sum_{j=0}^{i} (I − A)^j r_0,    (1.14)

and x_{i+1} ∈ span{r_0, r_1, ..., r_i} ≡ span{r_0, A r_0, ..., A^i r_0} = K_{i+1}(A; r_0), K_{i+1}(A; r_0) being the Krylov subspace associated with A and r_0 (and hence, since x_0 = 0, with b).
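A minimal sketch of this iteration (added for illustration; the damping factor ω and the 2×2 test system are assumptions of the example, not part of the text) makes explicit that every iterate built from x_0 = 0 lies in the Krylov subspace generated by A and r_0:

import numpy as np

def richardson(A, b, omega=1.0, maxit=200, tol=1e-8):
    # Damped Richardson iteration x_{i+1} = x_i + omega * r_i, starting from x_0 = 0,
    # so that x_i lies in span{r_0, A r_0, A^2 r_0, ...}.
    x = np.zeros_like(b)
    for i in range(maxit):
        r = b - A @ x                        # residual r_i = b - A x_i
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + omega * r                    # Richardson update (omega = 1 gives Eq. (1.9))
    return x, i

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
b = np.array([1.0, 0.0])
x, its = richardson(A, b, omega=0.5)         # converges because rho(I - omega A) < 1
print(x, its)

The iteration converges only when the spectral radius of I − ωA is smaller than one, which is precisely the limitation that motivates the optimal (Krylov) methods of the next subsection.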

1.3.1 Optimal Iteration Methods

A difficulty associated with classical/stationary iterative methods (e.g., SOR, Chebyshev semi-iterative, etc.) is that they depend upon some parameters that are sometimes hard to choose in a proper manner. The natural question is how to get good approximations x_{i+1} from the Krylov subspace that is generated by the basic iterative method. A good choice seems to be those x_{i+1} for which ||x_{i+1} − x|| (for a well suited norm) is minimal. In general (see [Kel95]), an optimal choice of the search direction is such that ||x_i − x||_A is minimal for x_i ∈ K_i(A; r_0). That is,

x_i − x ⊥_A K_i(A; r_0)    (1.15)

or

r_i ⊥ K_i(A; r_0),    (1.16)

where ||·||_A denotes the ‘energy’ norm.

Figure 1.9: Cornelius Lanczos (1893–1974)

Building upon Lanczos's iteration, the Conjugate Gradients method proposed by Hestenes and Stiefel in [HS52] and the Generalized Minimal Residuals method by Saad and Schultz [SS86], for symmetric positive definite matrices and for non-singular diagonalizable matrices, respectively, exploit the considerations expressed above. These kinds of methods will be used in this thesis in the context of Domain Decomposition techniques.

Chapter 2

The ‘Interface Strip Preconditioner’

for Domain Decomposition Methods

Find a job you love

and you’ll never work a day in your life.

Kong Fuzi (Confucius)

In this chapter, a new preconditioner for iterative solution of the interface problem in

Schur Complement Domain Decomposition Methods is presented. Also, the efficiency of

this parallelizable preconditioner is studied in the context of the solution of non-symmetric

linear equations arising from the discretization of partial differential equations. The

proposed Interface Strip Preconditioner (ISP) is based on solving a global problem in a

narrow strip around the interface. It requires much less memory and computing time

than the classical Neumann-Neumann preconditioner, and correctly handles the flux splitting

among sub-domains that share the interface. The performance of this preconditioner is

assessed with an analytical study of Schur complement matrix eigenvalues and numerical

experiments conducted in parallel computational environment (consisting of a Beowulf

cluster of twenty nodes).

The aim of this chapter is to present a theoretical basis (regarding the behavior of

Schur complement matrix spectra) and some simple and complex numerical experiments


conducted in sequential and parallel platforms as a motivation for adopting the proposed

preconditioner. Efficiency, scalability, and implementation details on a production parallel

finite element code [SYNS02, SNPD06] will be presented in the next chapter (also, see

References [PS05, PNS06]).

2.1 Introduction

The large spread in length scales present in CFD problems (like viscous/inviscid com-

pressible/incompressible flows around bodies, river/aquifer interactions, open channels,

etc.) requires a high degree of refinement in the finite element mesh and, then, requires

very large computational resources. It is known that the number of grid points in a 3D

mesh for a turbulence DNS model grows with the Reynolds number as Re^{9/4}. Also, in a

2D coupled surface-subsurface flow problem, a typical multi-aquifer model, the number of

unknowns per surface node is, at least, equal to the number of aquifers and aquitards. Due

to this fact, it is expected to have a very high demand of CPU computation time, calling

for parallel processing techniques. Linear systems obtained from discretization of PDE’s

by means of Finite Difference or Finite Element Methods are normally solved in paral-

lel by iterative methods [Saa00, Meu99] because they require much less communication

compared to direct solvers.

The Schur complement domain decomposition method leads to a reduced system better

suited for iterative solution than the global system, since its condition number is lower

(∝ 1/h vs ∝ 1/h2 for the global system, h being the mesh size) and the computational

cost per iteration is not so high once the sub-domain matrices have been factorized.

Iterative substructuring methods rely on a non-overlapping partition into sub-domains

(substructures). The efficiency of these methods can be further improved by using pre-

conditioners [LTV97]. Once the degrees of freedom inside the substructures have been

eliminated by block Gaussian elimination (or other algorithm), a preconditioner for the

resulting Schur complement system is built with matrix blocks relative to a decompo-

sition of interface finite element functions into subspaces related to geometrical objects

(vertices, edges, faces, single substructures) or simply by the coefficients of sub-domain

matrices near the interface. Iterative methods like Conjugate Gradients and GMRes are

then employed. Early works, such as [BPS86, BPS89], have influenced most of the later

work in the field. They proposed two spaces for the coarse problem. One of their coarse

spaces is given in terms of the averages of the nodal values over the entire substructure

boundaries ∂Ωi. The other space is defined by extending the wire basket (recall that the

wire basket is the union of the boundaries of the faces that separate the substructures)


values as a two dimensional discrete harmonic function onto the faces, and then as a

discrete harmonic function into the interiors of the sub-domains. For self-adjoint posi-

Figure 2.1: Issai Schur (1875–1941)

Figure 2.2: Carl Gottfried Neumann (1832–1925)

tive semidefinite problems, the Neumann-Neumann preconditioner is the most classical one.

From a mathematical point of view, the preconditioner is defined by approximating the

inverse of the global Schur complement matrix by the weighted sum of local Schur com-

plement matrices. From a physical point of view, Neumann-Neumann preconditioner is

based on splitting the flux applied to the interface in the preconditioning step and solving

local Neumann problems in each sub-domain. This strategy is good only for symmetric

operators.

The preconditioner proposed here is based on solving a problem in a ‘strip’ of nodes

around the interface (see Figure 2.3). When the width of the strip is narrow, the com-

putational cost and memory requirements are low and the iteration count is relatively

high; when the strip is wide, the converse holds. This preconditioner performs better

Figure 2.3: Domain decomposition into sub-domains Ω1 and Ω2, showing the interface, the nodes in the interface strip, and the interior nodes

for non-symmetric operators and does not have rigid body modes for internal floating

sub-domains, as is the case for the Neumann-Neumann preconditioner. Recall that for

operators that involve only derivatives of the unknowns (as Laplace equation, steady

elasticity, steady advection-diffusion, for instance) a portion of the boundary should have

Dirichlet or mixed boundary conditions. Otherwise, the problem is ill-posed and the

matrix is singular. When using the Neumann-Neumann preconditioner, sub-domains in-

herit the boundary condition of the original problem in the external boundary, whereas

Neumann boundary conditions are imposed at the internal sub-domain interfaces. Sub-

domains that have a non-empty intersection with a portion of the Dirichlet part of the

external boundary do not have rigid modes. Sub-domains whose boundary has empty

intersection with the external Dirichlet or mixed portion of the boundary would have

Neumann condition imposed on their whole boundary and would have rigid modes for

the kind of operators described above. In contrast with the wire-basket algorithms, the

IS preconditioner is purely algebraic, i.e., it can be assembled from a subset of the matrix

coefficients. There are no requirements on the topology of the mesh, and it could even

be applied to sparse matrices coming from other kinds of problems, not necessarily from

PDE discretizations.

Linear systems obtained from the discretization of PDE's by means of FDM or FEM

are normally solved in parallel by iterative methods [Saa00] because they require much less

communication than direct solvers.


Figure 2.4: Johann Peter Gustav Lejeune Dirichlet (1805–1859)

The Schur complement domain decomposition method leads to a reduced system better

suited for iterative solution than the global system, since its condition number is lower

(∝ 1/h vs ∝ 1/h2 for the global system, h being the mesh size) and the computational cost

per iteration is not so high once the sub-domain matrices have been factorized. In addition,

it has other advantages over global iteration. It solves bad ‘inter-equation’ conditioning,

it can handle Lagrange multipliers, and in a sense it can be thought of as a mixture between

a global direct solver and a global iterative one. The efficiency of iterative methods

Figure 2.5: Joseph–Louis Lagrange (1736–1813)

can be further improved by using preconditioners [LTV97]. For mechanical problems,

Neumann-Neumann is the most classical one. From a mathematical point of view, the

preconditioner is defined by approximating the inverse of the global Schur complement

matrix by the weighted sum of local Schur complement matrices. From a physical point

of view, Neumann-Neumann preconditioner is based on splitting the flux applied to the


interface in the preconditioning step and solving local Neumann problems in each sub-

domain. This strategy is good only for symmetric operators.

A new preconditioner based on solving a global problem in a strip of nodes around the

interface is proposed. A similar idea has been already exploited in the context of FETI

methods [RMSB03] in order to construct an approximation of local Schur complement

matrices. In contrast, the preconditioning technique considered here approximates the

inverse of global Schur matrix. This preconditioner performs better for non-symmetric

operators; it does not suffer from the rigid body modes for internal floating sub-domains

as is the case for the Neumann-Neumann preconditioner, and it naturally leads to sub-

domain coupling (thus eliminating the need for a coarse problem). A detailed computation

of the eigenvalue spectra for simple cases and some numerical examples are presented.

2.2 Schur Complement Domain Decomposition Method

Consider solving in each time step a linearized form of system (i.e., Au = f) resulting

from finite element discretization as described in the next sections. Let Ω denote the

computational domain of the CFD problem, and {Ω_i}_{i=1}^{n} its decomposition into n non-

overlapping sub-domains. Now, reorder u and f as u = (u_L, u_I)^T and f = (f_L, f_I)^T,

numbering the global nodes such that the coefficient matrix of the variables assumes the block-ordered structure

A = [ A_LL  A_LI
      A_IL  A_II ],    (2.1)

where A_LL = diag[A_11, A_22, ..., A_{Ns Ns}] is block-diagonal, with each block A_ii, i =

1, 2, ..., Ns, being the matrix corresponding to the unknowns belonging to the interior ver-

tices of sub-domain Ω_i. Blocks A_LI and A_IL represent the connections between the sub-domain

interiors and the interfaces.

Block AII corresponds to the discretization of the differential operator restricted to

the interfaces and represents the coupling between local interface points.

The numerical solution of Au = f is equivalent to solving

SuI = g on interfaces Γ, (2.2)

and

ALLuL = fL −ALIuI in Ωi, (2.3)


being

S = A_II − Σ_{i=1}^{Ns} A_IL A_LL^{-1} A_LI,    (2.4)

and

g = f_I − Σ_{i=1}^{Ns} A_IL A_LL^{-1} f_L,    (2.5)

where S is the well-known Schur complement matrix. If S^i is the Schur complement

matrix associated with the i-th sub-domain, then equations (2.4) and (2.5) can be written as

S^i = A^i_II − A^i_IL (A^i_LL)^{-1} A^i_LI,    (2.6)

and

g^i = f^i_I − A^i_IL (A^i_LL)^{-1} f^i_L.    (2.7)

Also, if the restriction operator Ri, which extracts from a global vector u the entries

corresponding to the interface nodes such that uI = Riu, is introduced then

S = Σ_{i=1}^{Ns} R_i^T S^i R_i    (2.8)

and

g = f_I − Σ_{i=1}^{Ns} R_i^T A^i_IL (A^i_LL)^{-1} f^i_L.    (2.9)

The Schur domain decomposition method starts by first determining uI on the interfaces

between sub-domains by solving (2.2). Upon obtaining uI , the sub-domain problems (2.3)

decouple and may be solved in parallel. The main computational cost of the iterative

solution of (2.2) depends on the number of iterations needed to achieve convergence to a

given accuracy criterion, which in turn is governed by the condition number.
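The two-step procedure can be illustrated with a small dense sketch (Python/NumPy, for exposition only; the random SPD matrix and the toy block sizes are assumptions): the interface system (2.2) is solved first, and the interior unknowns are then recovered from (2.3).

    import numpy as np

    rng = np.random.default_rng(0)
    nL, nI = 8, 3                                  # interior and interface unknowns
    M = rng.standard_normal((nL + nI, nL + nI))
    A = M @ M.T + (nL + nI) * np.eye(nL + nI)      # SPD test matrix
    f = rng.standard_normal(nL + nI)

    ALL, ALI = A[:nL, :nL], A[:nL, nL:]
    AIL, AII = A[nL:, :nL], A[nL:, nL:]
    fL, fI = f[:nL], f[nL:]

    S = AII - AIL @ np.linalg.solve(ALL, ALI)      # Schur complement, eq (2.4)
    g = fI - AIL @ np.linalg.solve(ALL, fL)        # reduced right hand side, eq (2.5)
    uI = np.linalg.solve(S, g)                     # interface problem, eq (2.2)
    uL = np.linalg.solve(ALL, fL - ALI @ uI)       # decoupled interior solve, eq (2.3)

    print(np.linalg.norm(A @ np.concatenate([uL, uI]) - f))   # ~ machine precision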

It is clear that the knowledge of the eigenvalue spectrum of the Schur complement

matrix is one of the most important issues in order to develop suitable preconditioners.

To obtain analytical expressions for Schur complement matrix eigenvalues and also the

influence of several preconditioners, a simplified problem is considered, namely the solution

to the Poisson problem in a unit square

∆φ = g,   in Ω = {0 < x, y < 1},    (2.10)

with boundary conditions

φ = φ̄,   at Γ = {x, y = 0, 1},    (2.11)


Figure 2.6: Siméon Denis Poisson (1781–1840)

where φ is the unknown, g(x, y) is a given source term and Γ is the boundary. Consider

now the partition of Ω in Ns non-overlapping sub-domains Ω1, Ω2, . . . ,ΩNs , such that

Ω = Ω_1 ∪ Ω_2 ∪ ... ∪ Ω_{Ns}. For the sake of simplicity, let us assume that the sub-domains

are rectangles of unit height and width L_j. In practice this is not the best partition,

but it will allow us to compute the eigenvalues of the interface problem in closed form.

Let Γ_int = Γ_1 ∪ Γ_2 ∪ ... ∪ Γ_{Ns−1} be the interior interfaces among adjacent sub-domains.

Given a guess ψj for the trace of φ in the interior sub-domains φ|Γj, each interior problem

can be solved independently as

∆φ = g,   in Ω_j,

φ = ψ_{j−1},   at Γ_{j−1},

φ = ψ_j,   at Γ_j,

φ = φ̄,   at Γ_{up,j} + Γ_{down,j},    (2.12)

where ψ_0 = φ̄|_{x=0} and ψ_{Ns} = φ̄|_{x=1} are given.

2.2.1 The Steklov Operator

Not all combinations of trace values ψj give the solution of the original problem (2.10).

Indeed, the solution to (2.10) is obtained when the trace values are chosen in such a way

that the flux balance condition at the internal interfaces is satisfied,

f_j = ∂φ/∂x |_{Γ_j^−} − ∂φ/∂x |_{Γ_j^+} = 0,    (2.13)

where the ± superscripts stand for the derivative taken from the left and right sides of

the interface. We can think of the correspondence between the ensemble of interface

values ψ = {ψ_1, . . . , ψ_{Ns−1}} and the ensemble of flux imbalances f = {f_1, . . . , f_{Ns−1}} as

an interface operator S such that

Sψ = f − f_0,    (2.14)

where all inhomogeneities coming from the source term and Dirichlet boundary conditions

are concentrated in the constant term f0, and the homogeneous operator S is equivalent

to solving the equation set (2.12) with source term g = 0 and homogeneous Dirichlet

boundary conditions φ = 0 at the external boundary Γ. Here, S is the Steklov operator.

Figure 2.7: Vladimir Andreevich Steklov (1864–1926)

In a more general setting, it relates the unknown values and fluxes at boundaries when the

internal domain is in equilibrium. In the case of internal boundaries, it can be generalized

by replacing the fluxes by the flux imbalances. The Schur complement matrix is a discrete

version of the Steklov operator, and we will show that in this simplified case we can

compute the Steklov operator eigenvalues in closed form, and thereby obtain a good estimate of

the corresponding Schur complement matrix eigenvalues.

2.2.2 Eigenvalues of Steklov Operator

We will further assume that only two sub-domains are present, one of them at the left

of width L1 and the other at the right of width L2, so that L = L1 + L2 = 1 is the side

length.

We solve first the Laplace problem in each sub-domain with homogeneous Dirichlet


boundary condition at the external boundary and ψ at the interface,

∆φ = 0,   in Ω_{1,2},

φ = 0,   at Γ,

φ = ψ,   at Γ_1.    (2.15)

The solution of (2.15) can be expressed as a linear combination of functions of the form

Figure 2.8: Pierre-Simon Laplace (1749–1827)

φ_n(x, y) = [sinh(k_n x)/sinh(k_n L_1)] sin(k_n y),   0 ≤ x ≤ L_1,

φ_n(x, y) = [sinh(k_n (L − x))/sinh(k_n L_2)] sin(k_n y),   L_1 ≤ x ≤ L,    (2.16)

where the wave number kn and the wavelength λn are defined as

kn = 2π/λn, λn = 2L/n, n = 1, . . . ,∞. (2.17)

The flux imbalance for each function in (2.16) can be computed as

f_n = ∂φ_n/∂x |_{x=L_1^−} − ∂φ_n/∂x |_{x=L_1^+}

    = k_n [cosh(k_n L_1)/sinh(k_n L_1) + cosh(k_n L_2)/sinh(k_n L_2)] sin(k_n y)

    = k_n [coth(k_n L_1) + coth(k_n L_2)] sin(k_n y).    (2.18)

A given interface value function ψ is an eigenfunction of the Steklov operator if the

corresponding flux imbalance f = Sψ is proportional to ψ, i.e., Sψ = ωψ, ω being the


corresponding eigenvalue. We can see from (2.15) to (2.18) that the eigenfunctions of the

Steklov operator are

ψn(y) = sin(kny) (2.19)

with eigenvalues

ω_n = eig(S)_n = eig(S^−)_n + eig(S^+)_n = k_n [coth(k_n L_1) + coth(k_n L_2)],    (2.20)

where S∓ are the Steklov operators of the left and right sub-domains,

S^∓ ψ = ± ∂φ/∂x |_{L_1^∓},    (2.21)

and their eigenvalues are

eig(S^∓)_n = k_n coth(k_n L_{1,2}).    (2.22)

For large n, the hyperbolic cotangents in (2.22) both tend to unity. This shows that

the eigenvalues of the Steklov operator grow proportionally to n for large n, and then

its condition number is infinity. However, when considering the discrete case the wave

number kn is limited by the largest frequency that can be represented by the mesh, which

is k_max = π/h where h is the mesh spacing. The maximum eigenvalue is

ω_max = 2 k_max = 2π/h,    (2.23)

which grows proportionally to 1/h. As the lowest eigenvalue is independent of h, this

means that the condition number of the Schur complement matrix grows as 1/h. Note

that the condition number of the discrete Laplace operator typically grows as 1/h2. Of

course, this reduction in the condition number does not translate directly into total compu-

tation time, since we have to take into account the factorization of the sub-domain matrices

and the forward and backward substitutions involved in each iteration to solve the internal prob-

lems. However, the overall balance is positive, and the reduction in the condition number,

together with the inherent parallelism, turns out to be one of the main strengths of domain

decomposition methods.
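This growth can be checked directly from (2.20) and (2.23); the short sketch below (Python/NumPy, not part of the thesis computations) evaluates the analytical eigenvalues up to the largest wave number a mesh of spacing h can represent and prints the ratio ω_max/ω_min, which grows roughly like 1/h.

    import numpy as np

    def steklov_eigs(L1, L2, h, L=1.0):
        # Eigenvalues (2.20), with k_n = n*pi/L limited by k_max ~ pi/h, see (2.23).
        n = np.arange(1, int(L / h) + 1)
        k = n * np.pi / L
        return k * (1.0 / np.tanh(k * L1) + 1.0 / np.tanh(k * L2))

    for h in [1 / 25, 1 / 50, 1 / 100]:
        w = steklov_eigs(0.5, 0.5, h)
        print(h, w.max() / w.min())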

In Figure 2.9 we can see the first and tenth eigenfunctions computed directly from the

Schur complement matrix for a 2 sub-domain partition, whereas in Figure 2.10 we see

the first and twenty-fourth eigenfunction for a 9 sub-domain partition. The eigenvalue

magnitude is related to eigenfunction frequency along the inter-subdomain interface, and

the penetration of the eigenfunctions towards sub-domains interiors decays strongly for

higher modes∗.

∗I would like to thank Lisandro Dalcın for the preparation of Figures 2.9 and 2.10


(a) 1st eigenfunction

(b) 10th eigenfunction

Figure 2.9: Eigenfunctions of Schur complement matrix with 2 sub-domains

2.3 Preconditioners for the Schur Complement Matrix

In order to further improve the efficiency of iterative methods, a preconditioner has to

be added so that the condition number of the Schur complement matrix is lowered. The

most known preconditioners for mechanical problems are Neumann-Neumann and its

variants [Man93, Cro02, BR96] for Schur complements methods, and Dirichlet for FETI


(a) 1st eigenfunction

(b) 10th eigenfunction

Figure 2.10: Eigenfunctions of Schur complement matrix with 9 sub-domains

methods and its variants [FR91, FMR94, FM98, FLLT+01, RMSB03]. It can be proved

that they reduce the condition number of the preconditioned operator to O(1) (i.e., inde-

pendent of h) in some special cases.

2.3.1 The Neumann-Neumann Preconditioner

Consider the Neumann-Neumann preconditioner

PNNv = f, (2.24)

where

v(y) = 1/2[v1(L1, y) + v2(L1, y)], (2.25)


and vi, i = 1, 2, are defined through the following problems

∆v_i = 0   in Ω_i,

v_i = 0   at Γ_0 + Γ_{up,i} + Γ_{down,i},

(−1)^{i−1} ∂v_i/∂x = (1/2) f   at Γ_1.    (2.26)

The preconditioner consists in assuming that the flux imbalance f is applied on the

interface. Since the operator is symmetric and the domain properties are homogeneous,

this ‘load’ is equally split among the two sub-domains. Then, we have a problem in

each sub-domain with the same boundary conditions in the exterior boundaries, and a

non-homogeneous Neumann boundary condition at the inter-subdomain interface.

Again, we will show that the eigenfunctions of the Neumann-Neumann preconditioner

are (2.19). Indeed, we can propose for v_1 the form

v1 = C sinh(knx) sin(kny), (2.27)

where C is determined from the boundary condition at the interface in (2.26) and results

in

C = 1 / [2 k_n cosh(k_n L_1)],    (2.28)

and similarly for v2, so that

v_1(x, y) = [1/(2 k_n)] [sinh(k_n x)/cosh(k_n L_1)] sin(k_n y),

v_2(x, y) = [1/(2 k_n)] [sinh(k_n (L − x))/cosh(k_n L_2)] sin(k_n y).    (2.29)

Then, the value of v = P_NN^{−1} f can be obtained from (2.25)

v(y) = P_NN^{−1} f = [1/(4 k_n)] [tanh(k_n L_1) + tanh(k_n L_2)] sin(k_n y),    (2.30)

so that the eigenvalues of PNN are

eig(PNN)n = 4kn [tanh(knL1) + tanh(knL2)]−1 . (2.31)

As its definition suggests, it can be verified that

eig(P_NN)_n = 4 [eig(S^−)_n^{−1} + eig(S^+)_n^{−1}]^{−1}.    (2.32)

As the Neumann-Neumann preconditioner (2.24) and the Steklov operator (2.14) diago-

nalize in the same basis (2.19) (i.e., they ‘commute’ ), the eigenvalues of the preconditioned

operator are simply the quotients of the respective eigenvalues, i.e.,

eig(P−1NNS)n = 1/4[tanh(knL1) + tanh(knL2)] [coth(knL1) + coth(knL2)]. (2.33)


We see that all tanh(k_n L_j) and coth(k_n L_j) factors tend to unity for n → ∞, so we have

eig(P_NN^{−1} S)_n → 1 for n → ∞,    (2.34)

which means that the preconditioned operator P_NN^{−1} S has a condition number O(1),

i.e., it does not degrade with mesh refinement. This is optimal, and is a well-known feature

of the Neumann-Neumann preconditioner. In fact, for a symmetric decomposition of the

domain (i.e., L1 = L2 = 1/2), we have

eig(P_NN^{−1} S)_n = (1/4) · 2 tanh(k_n/2) · 2 coth(k_n/2) = 1,    (2.35)

so that the preconditioner is equal to the operator and convergence is achieved in one

iteration.

Note that comparing (2.20) and (2.32) we can see that the preconditioning is effective

as long as

eig(S−)n ≈ eig(S+)n. (2.36)

This is true for symmetric operators and symmetric domain partitions (i.e., L1 ≈ L2).

Even for L_1 ≠ L_2, if the operator is symmetric, then (2.36) is valid for the large eigenvalues.

However, this fails for non-symmetric operators as in the advection-diffusion case, and

also for irregular interfaces.

Another aspect of the Neumann-Neumann preconditioner is the occurrence of indef-

inite internal Neumann problems, which leads to the need of solving a coarse prob-

lem [Man93, Cro02] in order to solve the ‘rigid body modes’ for internal floating sub-

domains. The coarse problem couples the sub-domains and hence ensures scalability

when the number of sub-domains increases. However, this adds to the computational cost

of the preconditioner.

2.3.2 The Interface Strip Preconditioner (ISP)

A key point about the Steklov operator is that its high frequency eigenfunctions decay

very strongly far from the interface, so that a preconditioning that represents correctly the

high frequency modes can be constructed if we solve a problem on a narrow strip around

the interface. In fact, the n-th eigenfunction with wave number kn given by (2.16) decays

far from the interface as exp(−kn|s|) where s is the distance to the interface. Then, these

high frequency modes will be correctly represented if we solve a problem on a strip of

width b around the interface, provided that the interface width is large with respect to

the mode wave length λn.


The ‘Interface Strip Preconditioner’ (ISP) is defined as

PISv = f, (2.37)

where

f = ∂w/∂x |_{x=L_1^−} − ∂w/∂x |_{x=L_1^+}    (2.38)

and

∆w = 0   in 0 < |x − L_1| < b, 0 ≤ y ≤ 1,

w = 0   at |x − L_1| = b or y = 0, 1,

w = v   at x = L_1.    (2.39)

Please note that for high frequencies (i.e., knb large) the eigenfunctions of the Steklov

operator are negligible at the border of the strip, so that the boundary condition at

|x − L1| = b is justified. The eigenfunctions for this preconditioner are again given

by (2.19) and the eigenvalues can be taken from (2.20), replacing L1,2 by b, i.e.,

eig(PIS)n = 2 eig(Sb)n = 2kn coth(knb), (2.40)

where Sb is the Steklov operator corresponding to a strip of width b.

For the preconditioned Steklov operator, we have

eig(P_IS^{−1} S)_n = (1/2) tanh(k_n b) [coth(k_n L_1) + coth(k_n L_2)].    (2.41)

Note that eig(P−1IS S)n → 1 for n→∞, so that the preconditioner is optimal, independently

of b. Also, for b large enough we recover the original problem so that the preconditioner is

exact (convergence is achieved in one iteration). However, in this case the use of this pre-

conditioner is impractical, since it implies solving the whole problem. Note that in order

to solve the problem for v information from both sides of the interface is needed, while

the Neumann-Neumann preconditioner can be solved independently in each sub-domain.

This is a disadvantage in terms of efficiency, since we have to waste communication time

in sending the matrix coefficients in the strip from one side to the other or otherwise

compute them in both processors. However, we will see that efficient preconditioning can

be achieved with few node layers and negligible communication. Moreover, we can solve

the preconditioner problem by iteration, so that no migration of coefficients is needed.
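The preconditioned spectra (2.33) and (2.41) can be tabulated directly; the sketch below (Python/NumPy, illustrative only) evaluates both expressions for a symmetric partition with b = 0.1L and prints rough condition-number estimates. These analytical values follow the trend of Figures 2.12–2.15, but they are not expected to coincide with the condition numbers of the discrete Schur complement matrix reported later.

    import numpy as np

    L, L1, L2, b, h = 1.0, 0.5, 0.5, 0.1, 1 / 50
    coth = lambda x: 1.0 / np.tanh(x)
    k = np.arange(1, int(L / h) + 1) * np.pi / L        # wave numbers k_n up to ~pi/h

    nn = 0.25 * (np.tanh(k * L1) + np.tanh(k * L2)) * (coth(k * L1) + coth(k * L2))   # eq (2.33)
    isp = 0.5 * np.tanh(k * b) * (coth(k * L1) + coth(k * L2))                        # eq (2.41)

    print('cond, Neumann-Neumann:', nn.max() / nn.min())   # = 1 for L1 = L2, cf. (2.35)
    print('cond, Interface Strip:', isp.max() / isp.min())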

2.4 The Advective-Diffusive Case

Consider now the advective diffusive case,

κ∆φ− uφ,x = g in Ω, (2.42)


where κ is the thermal conductivity of the medium and u the advection velocity. The

problem can be treated in a similar way, and the Steklov operators are defined as

S∓ψ = ± φ,x|L∓1 , (2.43)

where

κ∆φ − u φ_{,x} = 0   in Ω_{1,2},

φ = 0   at Γ,

φ = ψ   at Γ_1.    (2.44)

The eigenfunctions are still given by (2.19). Looking for solutions of the form v ∝ exp(µx) sin(k_n y) we obtain a constant coefficient second order differential equation with

characteristic polynomial

κµ² − uµ − κ k_n² = 0,    (2.45)

whose roots are

µ_± = [u ± sqrt(u² + 4κ²k_n²)] / (2κ) = u/(2κ) ± δ_n.    (2.46)

After some algebra, the solution of (2.44) is

φ_n = e^{u(x−L_1)/2κ} [sinh(δ_n x)/sinh(δ_n L_1)] sin(k_n y)   for 0 ≤ x ≤ L_1, 0 ≤ y ≤ L,

φ_n = e^{u(x−L_1)/2κ} [sinh(δ_n (L−x))/sinh(δ_n L_2)] sin(k_n y)   for L_1 ≤ x ≤ L, 0 ≤ y ≤ L,    (2.47)

and the eigenvalues are then

eig(S^−)_n = u/(2κ) + δ_n coth(δ_n L_1),

eig(S^+)_n = −u/(2κ) + δ_n coth(δ_n L_2).    (2.48)

In Figure 2.11 we see the first and tenth eigenfunctions for a problem with an advection

term at a global Peclet number of Pe = uL/2κ = 2.5. For low frequency modes, advective

effects are more pronounced and the first eigenfunction (on the left) is notably biased

to the right. In contrast, for high frequency modes (like the tenth mode shown at the

right) the diffusive term prevails and the eigenfunction is more symmetric about the

interface, and (as in the pure diffusive case) concentrated around it. Note that now the

eigenvalues for the right and left part of the Steklov operator may be very different due

to the asymmetry introduced by the advective term. This difference in splitting is more

important for the lowest mode. In Figures 2.12 to 2.15 we see the eigenvalues as a function

of the wave number kn. Note that for a given side length L only a certain sequence of

wave numbers, given by (2.17) should be considered. However, it is perhaps easier to

consider the continuous dependence of the different eigenvalues upon the wave number k.


(a) 1st eigenfunction

(b) 10th eigenfunction

Figure 2.11: Eigenfunctions of Schur complement matrix with 2 sub-domains and advec-

tion (global Peclet 5)

For a symmetric operator and a symmetric partition (see Figure 2.12), the symmetric

flux splitting is exact and the Neumann-Neumann preconditioner is optimal. The largest

discrepancies between the ISP preconditioner and the Steklov operator occur at low fre-

quencies and yield a condition number less than two. If the partition is non-symmetric

(see Figure 2.13) then the Neumann-Neumann preconditioner is no longer exact, because

S+ 6= S−. However, its condition number is very low whereas the IS preconditioner con-

dition number is still under two. For a relatively important advection term, given by a


Figure 2.12: Eigenvalues of Steklov operators and preconditioners for the Laplace operator (Pe = 0) and symmetric partitions (L1 = L2 = L/2, b = 0.1L)

Figure 2.13: Eigenvalues of Steklov operators and preconditioners for the Laplace operator (Pe = 0) and non-symmetric partitions (L1 = 0.75L, L2 = 0.25L, b = 0.1L)


global Peclet number of 5 (see Figure 2.14), the asymmetry in the flux splitting is much

more evident, mainly for small wave numbers, and this results in a large discrepancy

between the Neumann-Neumann preconditioner and the Steklov operator. On the other

hand, the IS preconditioner is still very close to the Steklov operator. The difference

between the Neumann-Neumann preconditioner and the Steklov operator increases for

larger Pe (see Figure 2.15). This behavior can be directly verified by computing the con-

dition number of Schur complement matrix and preconditioned Schur complement matrix

for the different preconditioners (see Tables 2.1 and 2.2). We can see that, for low Pe, both

the Neumann-Neumann and ISP preconditioners give a similar preconditioned condition

number regardless of mesh refinement (it almost doesn’t change from a mesh of 50×50 to

a mesh of 100× 100), whereas the Schur complement matrix exhibits a condition number

roughly proportional to 1/h. However, the Neumann-Neumann preconditioner exhibits a

large condition number for high Peclet numbers whereas the IS preconditioner seems to

perform better for advection dominated problems.

Table 2.1: Condition number for the Steklov operator and several preconditioners (mesh: 50 × 50 elements, strip: 5 layers of nodes)

Pe     cond(S)   cond(P_NN^{-1} S)   cond(P_IS^{-1} S)
0      41.00     1.00                4.92
0.5    40.86     1.02                4.88
5      23.81     3.44                2.92
25     5.62      64.20               1.08

Table 2.2: Condition number for the Steklov operator and several preconditioners (mesh: 100 × 100 elements, strip: 10 layers of nodes)

u      cond(S)   cond(P_NN^{-1} S)   cond(P_IS^{-1} S)
0      88.50     1.00                4.92
0.5    81.80     1.02                4.88
5      47.63     3.44                2.92
25     11.23     64.20               1.08
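The same kind of analytical estimate can be made for the advective-diffusive case from (2.48) and (2.32); the sketch below (Python/NumPy) does so for several Peclet numbers. The strip eigenvalue used here, 2 δ_n coth(δ_n b), is the natural analogue of (2.40) and is an assumption of this sketch; the resulting numbers follow the trend of Tables 2.1 and 2.2 but do not reproduce the discrete values.

    import numpy as np

    L, L1, L2, b, h, kappa = 1.0, 0.5, 0.5, 0.1, 1 / 50, 1.0
    coth = lambda x: 1.0 / np.tanh(x)
    k = np.arange(1, int(L / h) + 1) * np.pi / L

    for Pe in [0.5, 5.0, 25.0]:
        u = 2.0 * kappa * Pe / L                                  # Pe = u L / (2 kappa)
        delta = np.sqrt(u**2 + 4.0 * kappa**2 * k**2) / (2.0 * kappa)
        sm = u / (2.0 * kappa) + delta * coth(delta * L1)         # eig(S^-)_n, eq (2.48)
        sp = -u / (2.0 * kappa) + delta * coth(delta * L2)        # eig(S^+)_n, eq (2.48)
        s = sm + sp                                               # Steklov eigenvalues
        nn = s * (1.0 / sm + 1.0 / sp) / 4.0                      # eig(P_NN^-1 S)_n via (2.32)
        isp = s / (2.0 * delta * coth(delta * b))                 # strip analogue of (2.40)
        print(Pe, s.max() / s.min(), nn.max() / nn.min(), isp.max() / isp.min())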

Figure 2.14: Eigenvalues of Steklov operators and preconditioners for the advection-diffusion operator (Pe = 5) and symmetric partitions (L1 = L2 = L/2, b = 0.1L)

Figure 2.15: Eigenvalues of Steklov operators and preconditioners for the advection-diffusion operator (Pe = 50) and symmetric partitions (L1 = L2 = L/2, b = 0.1L)


Algorithm 1: Preconditioned Conjugate Gradients (PCG)

1:  Initialize variables:
2:  x = initial guess
3:  r = b − Ax                 (matrix × vector + vector sum)
4:  solve Pz = r               (system solution)
5:  ρ = (r, z)                 (internal product)
6:  ρ0 = ρ
7:  p = z
8:  k = 1
9:  while k < kmax do          (iterate)
10:    Convergence test:
11:    if ρ < tol · ρ0 then
12:      end of iterations
13:    end if
14:    a = Ap                  (matrix × vector)
15:    m = (p, a)              (internal product)
16:    α = ρ/m
17:    x = x + αp              (AXPY operation)
18:    r = r − αa              (AXPY operation)
19:    solve Pz = r            (system solution)
20:    ρold = ρ
21:    ρ = (r, z)              (internal product)
22:    γ = ρ/ρold
23:    p = z + γp              (AXPY operation)
24:    k = k + 1
25:  end while
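A minimal Python transcription of Algorithm 1 is shown below for reference (illustrative only; the relative-tolerance convergence test and the Jacobi preconditioner of the usage example are assumptions of this sketch, not details of the production code). The operator and the preconditioner are passed as callables, so any of the preconditioners discussed in this chapter can be plugged into the same loop.

    import numpy as np

    def pcg(A, b, precond, tol=1e-7, kmax=500):
        # Preconditioned Conjugate Gradients following Algorithm 1.
        # A(x) returns the matrix-vector product; precond(r) solves P z = r.
        x = np.zeros_like(b)
        r = b - A(x)
        z = precond(r)
        rho = rho0 = r @ z
        p = z.copy()
        for k in range(kmax):
            if rho < tol * rho0:                  # convergence test
                break
            a = A(p)
            alpha = rho / (p @ a)
            x += alpha * p
            r -= alpha * a
            z = precond(r)
            rho_old, rho = rho, r @ z
            p = z + (rho / rho_old) * p
        return x, k

    # Toy usage: Jacobi-preconditioned 1D Laplacian.
    n = 100
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    x, its = pcg(lambda v: A @ v, np.ones(n), lambda r: r / np.diag(A))
    print(its, np.linalg.norm(A @ x - np.ones(n)))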

2.5 Implementation of the Neumann-Neumann Preconditioner

The most critical steps in terms of CPU time in the algorithm written above are those

appearing in lines 3 and 14, i.e., the matrix-vector products. Besides, the preconditioner

applications of lines 4 and 19 are also highly CPU time consuming.

The matrix-vector product can be written as

a = Sp, (2.49)


being S the Schur complement matrix, and due to the fact that (B.60),

a = Σ_{i=1}^{Ns} a^i = Σ_{i=1}^{Ns} S^i p,    (2.50)

the contributions to a are calculated separately on each sub-domain Ω_i. Then, from equa-

tion (2.4)

S^i p = [A^i_II − A^i_IL (A^i_LL)^{−1} A^i_LI] p,    (2.51)

and considering the problem restricted to the sub-domain

[ A^i_LL  A^i_LI ] [ v^i ]   [ 0   ]
[ A^i_IL  A^i_II ] [ p   ] = [ a^i ].    (2.52)

The sub-domain contribution to the vector (2.50) is

a^i = A^i_IL v^i + A^i_II p,    (2.53)

being v^i the solution of

A^i_LL v^i = −A^i_LI p.    (2.54)

Equations (2.52) to (2.54) show that in order to evaluate the matrix-vector product a

Dirichlet problem must be solved on each sub-domain, where the prescribed values of p

are imposed on the interface Γ_i; then the associated vector a^i is obtained. Finally, the

sub-domain contributions a^i are summed.
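A compact sketch of this matrix-vector product is shown below (Python/NumPy, dense and sequential for clarity). The per-sub-domain tuple of blocks and interface restriction operator is an assumption of the sketch, not the PETSc-FEM data layout; in the actual code each A^i_LL is factorized once and the factors are reused at every iteration.

    import numpy as np

    def schur_matvec(subdomains, p):
        # a = S p, following eqs (2.50)-(2.54): each sub-domain solves a Dirichlet
        # problem with p prescribed on its interface and adds its contribution.
        a = np.zeros_like(p)
        for ALL, ALI, AIL, AII, R in subdomains:
            pi = R @ p                                    # local interface values
            vi = np.linalg.solve(ALL, -ALI @ pi)          # Dirichlet solve, eq (2.54)
            a += R.T @ (AIL @ vi + AII @ pi)              # local contribution, eq (2.53)
        return a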

For the application of the preconditioner, on each sub-domain the matrix D^i_II is defined

such that

Σ_{i=1}^{Ns} D^i_II = I_II.    (2.55)

That means that if all D^i_II matrices are assembled, an identity matrix is obtained. This

matrix is projected into the global interface space. Let D^i_II be the diagonal matrix whose

entries are computed as follows: if x_k ∈ Ω_i, then (D^i_II)_kk^{−1} is the number of sub-domains

that share the node (or dof) x_k. Note that Σ_i D^i_II = I.

On each iteration of the PCG algorithm, the residual is projected onto the sub-domain

r^i = (D^i_II)^T r.    (2.56)

Then, the following system is solved on each sub-domain

S^i z^i = r^i.    (2.57)


Finally, sub-domain contributions to the vector z are averaged on the interface

z = Σ_{i=1}^{Ns} D^i z^i.    (2.58)

This is equivalent to preconditioning with

P^{−1} = Σ_{i=1}^{Ns} D^i (S^i)^{−1} (D^i)^T.    (2.59)

The solution of equation (2.57) is essentially equivalent to solving a Neumann problem on

each sub-domain Ω_i, where the solution vector z^i contains, in an elastic problem, the

displacements of the interface dof's. It is not necessary to assemble the global matrix S,

since

[ A^i_LL  A^i_LI ] [ v^i ]   [ 0   ]
[ A^i_IL  A^i_II ] [ z^i ] = [ r^i ].    (2.60)

Therefore, on each sub-domain, the solution of the Neumann problem is

v^i = −(A^i_LL)^{−1} A^i_LI z^i,    (2.61)

z^i = (A^i_II − A^i_IL (A^i_LL)^{−1} A^i_LI)^{−1} r^i = (S^i)^{−1} r^i.    (2.62)
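The preconditioner application (2.56)–(2.62) can be sketched as follows (Python/NumPy, illustrative only; the local Schur complements are formed explicitly here for clarity, whereas in practice the block system (2.60) is solved instead, and a pseudo-inverse would be required for floating sub-domains).

    import numpy as np

    def nn_precond(subdomains, r):
        # z = P^{-1} r with the Neumann-Neumann weighting, eqs (2.56)-(2.59).
        z = np.zeros_like(r)
        for ALL, ALI, AIL, AII, R, D in subdomains:       # D: diagonal weights, eq (2.55)
            ri = D.T @ (R @ r)                            # weighted local residual, eq (2.56)
            Si = AII - AIL @ np.linalg.solve(ALL, ALI)    # local Schur complement, eq (2.6)
            zi = np.linalg.solve(Si, ri)                  # local Neumann problem, eq (2.57)
            z += R.T @ (D @ zi)                           # weighted average, eq (2.58)
        return z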

When the interface problem is solved iteratively, the matrix-vector products of lines 3 and 14 in

the PCG algorithm and the system solutions of lines 4 and 19 are replaced by the solution of the

problem restricted to each sub-domain, with Dirichlet or Neumann conditions

respectively. These operations have good scalability properties, but in practice, as the global

Neumann-Neumann preconditioner has no coarse grid correction, the scheme is poorly

scalable (see Reference [Man93]). The Balancing Neumann-Neumann preconditioner pro-

posed by Mandel in the above article is an extension of the classical Neumann-Neumann

preconditioner with a global or coarse grid operator that improves its scalability properties.

2.5.1 The Balancing Neumann-Neumann Version

The Neumann-Neumann preconditioner with a coarse space correction was introduced

in [Man93] under the name Balancing Domain Decomposition and further studied in

numerous articles in the context of the solution of plate and shell problems.

First, being H(Ω) the space of compatible unknowns, the local restriction spaces are

introduced

H(Ω_i) = {v_i = v|_{Ω_i}, v ∈ H(Ω)},

H_0(Ω_i) = {v_i ∈ H(Ω), tr v_i = 0 on Ω \ Ω_i}

         = the space of functions of H(Ω_i) with zero trace on Γ_i.    (2.63)


If R_i denotes the restriction operator from H(Ω) to H(Ω_i), the matrix A in equation (2.1)

can be written as

Σ_{i=1}^{Ns} R_i^T A_i R_i u = Σ_{i=1}^{Ns} R_i^T f_i    (2.64)

and u_i = R_i u. In order to determine the trace u_I, the global trace space V = tr H(Ω_i) is

introduced

V_i = {v_i = tr v_i|_{Γ_i} : v_i ∈ H(Ω_i)} = {v_i = tr v|_{Γ_i} : v ∈ H(Ω)}.    (2.65)

Finally, the interface restriction operator is given by

R_i u_I = u_I|_{Γ_i},   ∀ u_I ∈ V,    (2.66)

the global Schur complement operator by

S = Σ_{i=1}^{Ns} R_i^T S_i R_i,    (2.67)

and the interface right hand side (see equation (2.5)) is

g = Σ_{i=1}^{Ns} R_i^T (A_IL A_LL^{−1}) f_L.    (2.68)

For the abstract problem given by equation (2.2), it seems natural to precondition the

sum S = Σ_i R_i^T S_i R_i by a weighted sum of the inverses, P^{−1} = Σ_i D_i S_i^{+} D_i^T, where

R_i is the restriction operator such that R_i u = u_I. When floating sub-domains occur,

the S_i are singular matrices, and then S_i^{+}, the Moore-Penrose pseudo-inverse, is used instead.

Mandel has proposed a two-level generalization of this algorithm in order to handle

the multidomain case and corner singularities. This framework is a generalization of the

Neumann-Neumann preconditioner described above with a coarse space. Suppose that S

and S_i are positive self-adjoint operators, and that S is coercive.

Some additional ingredients are needed:

• the choice of a local coarse space Z_i containing the potential local singularities, such that ker S_i ⊂ Z_i ⊂ V_i;

• a space V_i^o that contains a complement of Z_i, i.e., V_i = V_i^o ⊗ Z_i;

• a global coarse space V_o = Σ_{i=1}^{Ns} D_i Z_i on which S is coercive;

• the S-orthogonal projection M : V → V_o;

• the inverse S_io^{−1} : V_i^o → V_i^o defined by the solution S_io^{−1} g of the local variational problem

S_io^{−1} g ∈ V_i^o :  〈S_i (S_io^{−1} g), v〉 = 〈g, v〉  ∀ v ∈ V_i^o,    (2.69)


Figure 2.16: Robert Lee Moore (1882–1974)

Figure 2.17: Roger Penrose (1931–)

where 〈u, v〉 = u^T v. Also, for a symmetric positive semidefinite B, 〈u, v〉_B = 〈Bu, v〉 and ||u||_B = (〈u, u〉_B)^{1/2}.

• a local ‘fine’ space V_i^⊥ = (I − M) D_i V_i^o ⊂ V equipped with the scalar product b_i(u_I^i, v_I^i) = 〈S_i u_I^{io}, v_I^{io}〉, where u_I^{io}, v_I^{io} are defined (uniquely) by

u_I^{io}, v_I^{io} ∈ V_i^{oo},   (I − M) D_i u_I^{io} = u_I^i,   (I − M) D_i v_I^{io} = v_I^i,    (2.70)

with

V_i^{oo} = {v_I^i ∈ V_i^o : 〈S_i v_I^i, z_I^i〉 = 0, ∀ z_I^i ∈ V_i^o ∩ ker(I − M) D_i}.    (2.71)

Using the decomposition V = V_o ⊗ Σ_i V_i^⊥ with the scalar products b_i(·, ·), the preconditioner

is defined by

M^{−1} : V → V ;   M^{−1} : r_I ↦ u_I = u_I^o + Σ_i u_I^i,    (2.72)


where u_I^o, u_I^i are solutions of the variational problems

u_I^o ∈ V_o :  b_i(u_I^o, v_I^o) = 〈r_I, v_I^o〉  ∀ v_I^o ∈ V_o,

u_I^i ∈ V_i^⊥ :  b_i(u_I^i, v_I^i) = 〈r_I, v_I^i〉  ∀ v_I^i ∈ V_i^⊥.    (2.73)

The input vector r_I to the preconditioner has the meaning of a residual associated with

an error vector e_I ∈ V and is given by r_I = S e_I. From (2.73) and the definition of M,

the coarse component is u_I^o = P e_I = S^{−1} r_I. Then, substituting

u_I^i = (I − M) D_i u_I^{i,o},   u_I^{i,o} ∈ V_i^{oo},    (2.74)

and using the definition of b_i(·, ·) it can be seen that the second problem in (2.73) is

equivalent to finding u_I^{i,o} ∈ V_i^{oo} such that

〈S_i u_I^{i,o}, v_I^{i,o}〉 = 〈r_I, (I − M) D_i v_I^{i,o}〉,    (2.75)

for all v_I^{i,o} ∈ V_i^{oo}. By definition,

V_i^o = V_i^{oo} ⊗ (V_i ∩ ker(I − M) D_i),   V_i^{oo} ⊥_{S_i} (V_i ∩ ker(I − M) D_i).    (2.76)

Let v_I^{i,o} ∈ V_i ∩ ker(I − M) D_i. Then the right hand side of (2.75) is zero, and, by (2.76),

the left hand side of (2.75) is also zero since u_I^{i,o} ∈ V_i^{oo}. So (2.75) holds also for all

v_I^{i,o} ∈ V_i ∩ ker(I − M) D_i and, by (2.76), for all v_I^{i,o} ∈ V_i^o. Then, the conclusion is that

u_I^i = (I − M) D_i S_io^{−1} D_i^T (I − M)^T r_I = (I − M) D_i S_io^{−1} D_i^T S (I − M) S^{−1} r_I,    (2.77)

and the preconditioned operator is

P^{−1} S = M + Σ_i (I − M) D_i S_io^{−1} D_i^T S (I − M).    (2.78)

Remark 2.5.1. It follows from (2.78) that in the case when the spaces V_i^o are chosen

so that V_i = V_i^o ⊗ ker S_i, the abstract Balancing Domain Decomposition algorithm from

Reference [Man93] is recovered.

Remark 2.5.2. The space V_i^⊥ is independent of the choice of V_i^o, so it is only a function

of the coarse space Z_i. See [LTV97] for a detailed proof.

Remark 2.5.3. Although the space V_i does not depend on the choice of V_i^o, the scalar

product b_i, and hence the proposed preconditioner, does depend on the choice of V_i^o.


Figure 2.18: Strip interface problem (strip of nlay = 2 element layers between sub-domains Ω1 and Ω2, showing the interface (I), the internal strip layers (S), and the strip boundaries (SB))

2.6 The Interface Strip Preconditioner: Solution of the Strip Problem

Some hints are given for an efficient implementation of the ISP preconditioner in a parallel

environment. Consider a sub-domain interface with a strip of two element layers (nlay =

2), as shown in Figure 2.18. The preconditioning consists in, given a vector f_I defined on

the nodes at the interface (I in the figure), computing an approximate solution v_I given by

[ A_II    A_IS    A_I,SB  ] [ v_I  ]   [ f_I ]
[ A_SI    A_SS    A_S,SB  ] [ v_S  ] = [ 0   ]
[ A_SB,I  A_SB,S  A_SB,SB ] [ v_SB ]   [ 0   ],    (2.79)

with ‘Dirichlet boundary conditions’ at the strip boundary, v_SB = 0, so that it reduces to

[ A_II  A_IS ] [ v_I ]   [ f_I ]
[ A_SI  A_SS ] [ v_S ] = [ 0   ].    (2.80)

Once this equation is solved, vI is the value of the proposed preconditioner applied to fI ,

i.e.,

vI = P−1IS fI . (2.81)

A direct solution of this interface problem is not easily parallelizable. This approach

would involve transferring the whole interface matrix to a single processor and solving the

problem there. Thus, one possibility is to partition the strip problem among processors,


much in the same way as the global problem is, and solving the strip problem by an

iterative method. The idea of an iterative method is also suggested by the fact that

the preconditioning matrix (i.e., the matrix obtained by assembling on the strip domain

with Dirichlet boundary conditions at the strip boundary) is highly diagonally dominant

for narrow strips. Care must be taken to avoid nesting a non-stationary method like

CG or GMRes inside another outer non-stationary method [Kel95]. We recall that in

a stationary method the solution x at iteration k depends only on the solution at

the previous step (i.e., x_k = f(x_{k−1})), so we can find the iterate x_k after k successive

applications of the same operator to the initial value x0. The problem here is that a

non-stationary method executed a finite number of times is not a linear operator, unless

the inner iterative method is iterated enough and then approaches the inverse of the

preconditioner. In this respect, relaxed Richardson iteration is suitable. Nevertheless,

FGMRes can be used instead.

For the Richardson interface problem, a fixed predetermined number m of Richardson

iterations are performed. If m is too low, then the preconditioner has no effect, and if

it is too large the efficiency of the preconditioner tends to saturate, while the cost is

roughly proportional to m, so in general there is an optimal value for m. We have found

that adjusting m so that Richardson iteration converges one order of magnitude (relative

to the initial residual) is fine for most problems. Note that the number of iterations

may depend on the intrinsic conditioning of the interface problem and also on the strip

width. For small strip widths (nlay < 5) m was chosen in the range 5 ≤ m ≤ 10. A

subsequent possibility is preconditioning the Interface Strip preconditioner problem itself

with block-Jacobi.
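The following sketch (Python/NumPy, illustrative only) applies the strip preconditioner with a fixed number m of diagonally scaled, relaxed sweeps; the relaxation factor and the diagonal (Jacobi-type) scaling are assumptions of the sketch. Because m is fixed and the initial guess is zero, the resulting operator is linear in f_I, which is what allows it to be nested inside the outer CG/GMRes iteration.

    import numpy as np

    def isp_apply(Astrip, idx_I, fI, m=8, omega=0.7):
        # Approximate solution of the strip problem (2.80), returning v_I = P_IS^{-1} f_I
        # as in (2.81). Astrip already has the Dirichlet conditions v_SB = 0 applied;
        # idx_I gives the positions of the interface unknowns within the strip.
        f = np.zeros(Astrip.shape[0])
        f[idx_I] = fI
        d = np.diag(Astrip)
        v = np.zeros_like(f)
        for _ in range(m):
            v += omega * (f - Astrip @ v) / d     # one relaxed, diagonally scaled sweep
        return v[idx_I]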

In general, in a parallel implementation, each processor may have several sub-domains.

In this way, the memory and computing time requirements are reduced (smaller matrices

are factorized). If the number of dof's on the interfaces grows toward the

total number of dof's, the method tends to a fully iterative method.

Even if the preconditioner has been described through figures in terms of finite ele-

ment structured meshes, the implementation is purely algebraic (in contrast to previous

approaches, like, notably, the wire-basket one) based on the graph connectivity of the

matrix. The preconditioner has been implemented in a FEM production code [SNPD06]

and tested on large scale problems with unstructured tetrahedral meshes of up to one

million elements.


2.6.1 Implementation Details of the IISD Solver

Currently unknowns and elements are partitioned in the same way as for the PETSc

solver. The best partitioning criteria could be different for this solver than for the PETSc

iterative solver.

Figure 2.19: IISD decomposition by sub-domains (actual decomposition; legend: elements in processors 0 and 1, dof's in processors 0 and 1, and dof's in processor 0 connected to dof's in processor 1)

Selecting ‘interface’ and ‘local’ dof’s: One strategy could be to mark all dof’s that

are connected to a dof in another processor as 'interface'. However, this could lead to an

'interface' dof set twice as large on average as the minimum needed. As the number

of nodes in the 'interface' set determines the size of the interface problem (2.2), it is clear

that we should try to choose an interface set as small as possible.

Partitioning is done on the dual graph, i.e., on the elements. Nodes are then parti-

tioned in the following way: A node that is connected to elements in different processors is

assigned to the highest numbered processor. As shown in Figure 2.19, when partitioning,

all nodes in the interface would belong to processor 1. Then, if a dof i is connected to

a dof j on another processor, we mark as 'interface' the dof that belongs to the highest

numbered processor. In the mesh of Figure 2.19 all dof's in the interface between element

sub-domains are marked to belong to processor 1. The nodes in the shadowed strip belong

to processor 0 and are connected to nodes in processor 1 but they are not marked as ‘in-

terface’ since they belong to the lowest numbered processor. Note that this strategy leads

to an interface set of 4 nodes, whereas the simpler strategy mentioned first would lead to

an interface set of 8 (i.e., including the nodes in the shadowed strip), which is two times

larger. The IISDMat matrix object contains three MPI PETSc matrices for the ALI , AIL

Figure 2.20: Non-local element contribution due to bad partitioning (an isolated element e in processor 0 surrounded by elements in processor 1; nodes p, q and r are assigned to processor 1)

and AII blocks and a sequential PETSc matrix on each processor for the local part of the

ALL block. The ALL block must be defined as sequential because otherwise we couldn’t

factorize it with the LU solver of PETSc. However, this requires that MatSetValues be

called in each processor for the matrix that belongs to its block, i.e., elements in

a given processor shouldn’t contribute to ALL elements in other processors. Normally,

this is so, but for some reasons this condition may be violated. One reason could be the

imposition of periodic boundary conditions and constraints in general (they are not taken

into account for the partitioning). Another reason is that a very bad partitioning may


arise in some not so common situations. Consider for instance Figure 2.20. Due to bad

partitioning a rather isolated element e belongs to processor 0, while being surrounded by

elements in processor 1. Now, as nodes are assigned to the highest numbered processor

of the elements connected to the node, nodes p, q and r are assigned to processor 1. But

then, nodes q and r will belong to the local subset of processor 1 but will receive contribu-

tions from element e in processor 0. However, the solution is not to define these matrices

as distributed PETSc matrices because, so far, PETSc does not support a distributed LU factorization. The

solution is to store those A_LL contributions that belong to other processors in a temporary

buffer and afterwards send those contributions to the correct processors directly with MPI

messages.

2.7 Classical Overlapping Domain Decomposition

Method: Alternating Schwarz Methods

The original alternating procedure described by Schwarz in 1870 consisted of three parts:

alternating between two overlapping domains, solving the Dirichlet problem on one do-

main at each iteration, and taking boundary conditions based on the most recent solution

obtained from the other domain. This procedure is called the Multiplicative Schwarz pro-

cedure. In matrix terms, this is very reminiscent of the block Gauß–Seidel iteration with

overlap defined with the help of projector operators. The analogue of the block-Jacobi

procedure is known as the Additive Schwarz procedure. A procedure that alternates be-

tween solving an equation in one sub-domain and then in the other one does not seem to

be parallel at the highest level because if one processor contains all of first sub-domain and

another processor contains all of second one then each processor must wait for the solution

of the other processor before it can execute Such approaches are known as multiplicative

approaches because of the form of the operator applied to the error. Alternatively, ap-

proaches that allow for the solution of sub-problems simultaneously are known as additive

methods. The difference is akin to the difference between Jacobi and Gauß–Seidel.

The analysis of the Schwarz methods as preconditioners was presented by Dryja and

Widlund in Reference [DW87] for the additive symmetric case and by Cai and Widlund

in [CW92] for the additive and multiplicative algorithms used in some nonsymmetric

problems. The successful application of Schwarz methods for solving symmetric elliptic

problems on stretched meshes was the inspiration to use them to solve the nonsymmetric

equations corresponding to discretization of flow problems. The algorithm can be sum-

marized as follows:


Figure 2.21: Hermann Amandus Schwarz (1843–1921)

i) decompose the support mesh/grid into Ns overlapping sub-domains Ωi.

ii) each sub-domain Ωi is associated with the local space Vi. In addition, a coarse space

(often associated to a coarse mesh/grid) Vo ⊂ V (∪iVi) is defined. Subspaces Vi are used to

define the additive and multiplicative Schwarz methods, which can be identified with the

block-Jacobi method (if Ωi do not have common internal nodes) or with the Gauß–Seidel

method. The operators defining the two algorithms can be written as

M = I − Σ_{i=0}^{Ns} R_i^T A_i^{−1} R_i A   and   M_m = I − Π_{i=0}^{Ns} (I − R_i^T A_i^{−1} R_i A),    (2.82)

where A is the global matrix and A_i, i = 1, ..., Ns, are the matrices associated with the

sub-domains Ω_i. The restriction operators R_i extract from the global vector of unknowns the dof's

associated with Ω_i, while the prolongations R_i^T extend by zeros the dof's of Ω_i to the global vector.

Analogously, A_o is the matrix corresponding to V_o ⊂ V and R_o, R_o^T associate dof's of

V_o with the global ones. The operators M and M_m consist of a sequence of local projection

operators M_i represented by the matrices R_i^T A_i^{−1} R_i A.

iii) according to the idea of preconditioning by the standard iteration, Krylov-like methods

are used to solve the preconditioned systems P^{−1}(Au − f) = 0 with the preconditioner

P^{−1} = (I − M)A^{−1} or P^{−1} = (I − M_m)A^{−1}.
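A one-level additive Schwarz application (omitting the coarse space V_o) can be sketched as follows (Python/NumPy, dense and sequential, illustrative only): each overlapping sub-domain solves its local problem and the corrections are summed, cf. the additive operator in (2.82).

    import numpy as np

    def additive_schwarz(A, subsets, r):
        # z = sum_i R_i^T A_i^{-1} R_i r over overlapping index sets (one level, no coarse space).
        z = np.zeros_like(r)
        for idx in subsets:
            Ai = A[np.ix_(idx, idx)]                  # local matrix A_i = R_i A R_i^T
            z[idx] += np.linalg.solve(Ai, r[idx])
        return z

    # Toy usage: a 1D Laplacian split into two overlapping blocks.
    n = 40
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    z = additive_schwarz(A, [np.arange(0, 24), np.arange(16, 40)], np.ones(n))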


2.8 Conclusions

A new preconditioner for Schur complement domain decomposition methods was theo-

retically presented. This preconditioner is based on solving a global problem posed in a

narrow strip around the inter-subdomain interfaces. Some analytical results have been

derived to present its mathematical basis. Numerical experiments will be carried out in

next chapters to show its convergence properties and performance.

The IS preconditioner is easy to construct because it does not require any special

calculation (it can be assembled from a subset of the sub-domain matrix coefficients). It is

much less memory-consuming than classical optimal preconditioners such as Neumann-

Neumann in primal methods (or Dirichlet in FETI methods). Moreover, it allows one to

decide how much memory to assign for preconditioning purposes.

In real-life advective-diffusive problems, where the Peclet number can vary over the

domain between low and high values, the proposed preconditioner outperforms classical

ones in advection-dominated regions while it is capable of handling diffusion-dominated

regions reasonably well.

Chapter 3

Numerical Tests

There is a concept

which corrupts and upsets all others.

I refer not to Evil, whose limited realm is that of ethics;

I refer to the infinite.

‘The Avatars of the Tortoise’, Jorge Luis Borges

This chapter is dedicated to confirming the theoretical results developed in previous chapters

with numerical examples. The examples cover a vast number of physical problems and

applications in the computational fluid dynamics and mechanics areas, ranging from simple

scalar advective-diffusive models to viscous/inviscid compressible/incompressible Navier-

Stokes models at high Mach and Reynolds numbers, coupled surface/subsurface water

flow over complex large scale domains, etc. Also, the performance of the preconditioner

is studied in the context of monolithic and disaggregated time integration schemes. The

use of this new solver is extended to two major problems in CFD, namely the imposition of general

dynamic boundary conditions and weak/strong fluid/structure interaction, in chapters §4 and §5.

The problems presented in this thesis were solved using the PETSc-FEM code (see

Reference [SNPD06]), a general purpose, parallel, multi-physics FEM program for CFD

applications based on MPI and PETSc libraries (see [GLS94] and [BGCMS04], respec-

tively). PETSc-FEM comprises both a library that allows the user to develop FEM


(or FEM-like, i.e., unstructured mesh oriented) programs, and a suite of application

programs (e.g., compressible/incompressible Navier-Stokes, multi-phase flow, compress-

ible Euler equations, shallow water model, general advective-diffusive systems, coupled

surface/subsurface water flow over multi-aquifer systems, linear elasticity and Laplace

equation, weak/strong fluid-structure interaction, multiphase flow). Mesh partitioning is

performed by using METIS, and the library takes charge of passing the appropriate

elements to each processor, setting up the vectors, assembling the residuals and matrices,

and fixing the boundary conditions, for all the processors.

Figure 3.1: Leonhard Euler (1707–1783)

3.1 Numerical Examples in Sequential Environments

3.1.1 The Poisson’s Problem

The performance of the proposed preconditioner is compared in a sequential environment.

For this purpose, we consider two different problems. The domain Ω in both cases is the

unit square discretized on an unstructured mesh of 120×120 nodes, and decomposed in 6

rectangular sub-domains. We compare the residual norm versus iteration count by using

no preconditioner, Neumann-Neumann preconditioner, and the IS preconditioner (with

several node layers at each interface side).

The first example is the Poisson’s problem ∆φ = g, where g = 1 and φ = 0 on all the

boundary Γ. The iteration counts and the problem solution (obtained in a coarse mesh


for visualization purposes) are plotted in Figure 3.2. As it can be seen, the Neumann-

Neumann preconditioner has a very low iteration count, as it is expected for a symmetric

operator. The IS preconditioner has a larger iteration count for thin strip widths, but

it decreases as the strip is thickened. For a strip of five-layers width, we reach an iter-

ation count comparable to the Neumann-Neumann preconditioner with significantly less

computational effort. Regarding memory use, the required core memory for thin strip

is much less than for the Neumann-Neumann preconditioner. The strip width acts in

fact as a parameter that balances the required amount of memory and the preconditioner

efficiency.

3.1.2 The Scalar Advective-Diffusive Problem

The second example is an advective-diffusive problem (see equation (2.42)) at a global

Peclet number of Pe = uL/2κ = 25, g = δ(1/4, 7/8)+δ(3/4, 1/8), and φ(0, y) = 0. Therefore, the

problem is strongly advective. The iteration count and the problem solution (interpolated

in a coarse mesh for visualization purposes) are plotted in Figure 3.3. In this example, the

advective term introduces a strong asymmetry. The Neumann-Neumann preconditioner

is far from being optimal. It is outperformed by the IS preconditioner in iteration count (and

consequently in computing time) and memory demands, even for thin strips.

SUPG Variational Formulation

The stabilizing finite element formulation for the linear scalar advection-diffusion equation (2.42) is written as follows: find φh ∈ Sh such that ∀ wh ∈ Vh

∫Ω ∇wh · (κ∇φh) dΩ + ∫Ω wh (u · ∇φh) dΩ + Σ_{e=1}^{nel} ∫Ωe (u · ∇wh) τsupg [u · ∇φh − ∇ · (κ∇φh) − g] dΩ = ∫Ω wh g dΩ,    (3.1)

where

Sh = { φh | φh ∈ [H1h(Ω)]^ndof, φh|Ωe ∈ [P1(Ωe)]^ndof, φh = g on Γφ },
Vh = { wh | wh ∈ [H1h(Ω)]^ndof, wh|Ωe ∈ [P1(Ωe)]^ndof, wh = 0 on ∂Ωφ },    (3.2)

(for the sake of simplicity only Dirichlet boundary conditions are considered).

The stabilization tensor τsupg_ij can be defined as

τsupg_ij = βsupg ui uj hmesh / (2||u||²)    (3.3)


Figure 3.2: Solution of the Poisson problem: relative residual norm versus iteration number for no preconditioning, the Neumann-Neumann preconditioner and the ISP preconditioner with n = 1 to 5 layers.

Figure 3.3: Solution of the advective-diffusive problem: relative residual norm versus iteration number for no preconditioning, the Neumann-Neumann preconditioner and the ISP preconditioner with n = 1 to 5 layers.


and βsupg = coth(Pe)− 1/Pe.

For the advection-diffusion equation discretized with linear elements the stabilization term reduces to

Σ_{e=1}^{nel} ∫Ωe (u · ∇wh) τsupg (u · ∇φh − g) dΩ,    (3.4)

because the second derivatives of φh vanish inside each element. With τsupg = (h/2u)(coth Pe − 1/Pe) this formulation produces the exact nodal solution on a uniform 1D mesh. If we consider the case u = 0, the standard Galerkin variational formulation for the pure diffusion case is recovered.
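To make the last statement concrete, the following short Python sketch (illustrative only, not part of PETSc-FEM; the data u, κ and the number of elements are arbitrary assumptions) assembles the SUPG-stabilized linear-element scheme for the homogeneous 1D problem −κφ'' + uφ' = 0, φ(0) = 0, φ(1) = 1, and checks that the nodal values coincide with the exact solution.

    # Minimal 1D check (assumed data): with beta_supg = coth(Pe) - 1/Pe the
    # SUPG-stabilized linear-element scheme is nodally exact on a uniform mesh.
    import numpy as np

    u, kappa, nel = 1.0, 0.01, 10           # assumed velocity, diffusivity, elements
    h = 1.0 / nel
    Pe = u * h / (2.0 * kappa)              # element Peclet number
    beta = 1.0 / np.tanh(Pe) - 1.0 / Pe     # beta_supg = coth(Pe) - 1/Pe
    keff = kappa + beta * u * h / 2.0       # Galerkin + SUPG = added streamline diffusion

    n = nel + 1
    A = np.zeros((n, n)); b = np.zeros(n)
    for i in range(1, n - 1):               # interior rows: central advection + diffusion
        A[i, i - 1] = -u / (2 * h) - keff / h**2
        A[i, i]     =  2 * keff / h**2
        A[i, i + 1] =  u / (2 * h) - keff / h**2
    A[0, 0] = A[-1, -1] = 1.0; b[-1] = 1.0  # Dirichlet values phi(0)=0, phi(1)=1

    phi = np.linalg.solve(A, b)
    x = np.linspace(0.0, 1.0, n)
    exact = (np.exp(u * x / kappa) - 1.0) / (np.exp(u / kappa) - 1.0)
    print(np.max(np.abs(phi - exact)))      # near machine precision: nodally exact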

3.1.3 The Hypersonic Flow Over a Flat Plate Test

In this section the hypersonic flow over a flat plate is analyzed. This is a typical flow problem where the nonlinearities are so strong that any difficulty in the convergence of the linear system may degrade the non-linear convergence and finally make the solution blow up. The problem, thoroughly documented by Carter in Reference [Car72], exhibits a strong interaction between the boundary layer and the shock wave; in addition, a discontinuity is introduced at the flat plate leading edge, where the flow must stagnate from a very high free-stream velocity. Both are sources of numerical difficulties that make this test very challenging. Figure 3.4 shows the problem definition with a sketch of the physical structures present in the flow field and the boundary conditions applied.

Figure 3.4: Problem definition: free-stream conditions (M∞, u∞, p∞, T∞) at the inlet, wall conditions u = 0 and T = Tw or ∂T/∂y = 0 on the plate, unknown downstream conditions, and a sketch of the boundary-layer edge and the shock wave in the x/L–y/L plane.


Physical Model

This test focuses on the solution of the compressible Navier-Stokes equations with the SUPG/SC (“Streamline Upwind Petrov-Galerkin/Shock Capturing”) method proposed by Brooks et al. in Reference [BH82] and by Aliabadi et al. in Reference [ART93]. The differential form of the conservation equations of mass, momentum and total energy that governs the dynamics of compressible viscous fluid flow may be written in the following compact intrinsic (vector) form (Einstein summation convention is assumed, i, j = 1, 2, 3):

Figure 3.5: Claude Louis Marie Henri Navier (1785–1836)

Figure 3.6: George Gabriel Stokes (1819–1903)

∂U/∂t + ∂(Fa)i/∂xi = ∂(Fd)i/∂xi + G   in Ω × (0, t+],    (3.5)

where Ω is the model domain with boundary Γ. U = (ρ, ρu, ρe)^T is the unknown state vector expressed in conservative variables, e represents the specific total energy, Fa accounts for the (vector) advective fluxes, Fd for the (vector) diffusive fluxes and G is used for the external source terms (i.e., G = (0, ρfe, Wf + qH)^T, where Wf = ρfe · u is the work done by the external forces fe and n represents the boundary unit normal vector). Initial and boundary conditions must also be added (see [Hir90]). In this thesis, we treat the so-called absorbent boundary conditions (see chapter §4). The integral conservation form is

∂/∂t ∫Ω (ρ, ρu, ρE)^T dΩ + ∮Γ (ρu, ρu ⊗ u + pI − τ, ρuH − τ · u − k∇T)^T · n dΓ = ∫Ω (0, ρfe, Wf + qH)^T dΩ.    (3.6)

In (3.6), H is the total specific enthalpy, defined in terms of the specific internal energy e and the specific kinetic energy as H = e + p/ρ + (1/2)|u|² = E + p/ρ, with the specific enthalpy h = e + p/ρ.

The above mentioned advective and diffusive fluxes are defined as

(Fa)i = (ρui, ρu1ui + δi1 p, ρu2ui + δi2 p, ρu3ui + δi3 p, (ρE + p)ui)^T,
(Fd)i = (0, τi1, τi2, τi3, τik uk − qi)^T.    (3.7)

Here δij is the Kronecker isotropic tensor of rank 2 (also denoted as I), and τij are the components of the Newtonian viscous stress tensor, τij = 2µεij(u) − (2/3)µ(∇ · u)δij. The strain rate tensor is εij(u) = (1/2)(∂jui + ∂iuj) and qi is the heat flux, defined according to the Fourier law as qi = −κ ∂T/∂xi, with κ the thermal conductivity and T the absolute temperature. The coefficients of viscosity and thermal conductivity are assumed to be given by the Sutherland formula (i.e., the gas is considered in a standard atmosphere),

µ = µ0 (T/T0)^{3/2} (T0 + 110)/(T + 110),    κ = γ R µ / ((γ − 1) Pr),    (3.8)

where µ0 is the viscosity at the reference temperature T0 and Pr is the Prandtl number

(i.e., Pr = ν/ι, ι is the thermal diffusivity coefficient).
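As an illustration, the next Python lines (a standalone sketch, not thesis code; the reference values µ0 and T0 are assumptions, taken here as the usual air values) evaluate the Sutherland viscosity and the corresponding conductivity of equation (3.8).

    import numpy as np

    def sutherland_mu(T, mu0=1.716e-5, T0=273.15):
        """Dynamic viscosity mu(T) from Sutherland's law, eq. (3.8)."""
        return mu0 * (T / T0) ** 1.5 * (T0 + 110.0) / (T + 110.0)

    def conductivity(T, gamma=1.4, R=287.0, Pr=0.72):
        """kappa = gamma*R*mu / ((gamma-1)*Pr), i.e., kappa = cp*mu/Pr as in eq. (3.8)."""
        return gamma * R * sutherland_mu(T) / ((gamma - 1.0) * Pr)

    T = np.array([80.0, 288.0])          # free-stream and wall temperatures of this test
    print(sutherland_mu(T), conductivity(T))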

The physical model is closed by the definition of a constitutive law for the specific internal energy in terms of the thermodynamic state and some equation of state for the thermodynamic variables. Normally an ideal gas law is adopted; then ρe = p/(γ − 1) + (1/2)ρ||u||² and p = ρRT, where R = (γ − 1)Cv is the particular gas constant and γ = Cp/Cv is the ratio of the specific heat at constant pressure to that at constant volume.
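A small sketch of this closure (illustrative only; the state values below are assumptions, chosen to be of the order of the free-stream conditions of this test) recovers pressure and temperature from the conservative state vector U = (ρ, ρu, ρe):

    import numpy as np

    gamma, R = 1.4, 287.0                      # ideal-gas constants used in this chapter

    def primitives(U):
        """U = (rho, rho*u, rho*v, rho*e) -> (u, v, p, T) via the ideal-gas closure."""
        rho, rhou, rhov, rhoe = U
        u, v = rhou / rho, rhov / rho
        p = (gamma - 1.0) * (rhoe - 0.5 * rho * (u**2 + v**2))   # rho*e = p/(gamma-1) + rho|u|^2/2
        T = p / (rho * R)                                        # p = rho*R*T
        return u, v, p, T

    # assumed state: rho = 4.35 kg/m^3, |u| ~ 897 m/s (M ~ 5 at 80 K), T = 80 K
    rho, u0 = 4.35, 897.0
    e = R / (gamma - 1.0) * 80.0 + 0.5 * u0**2
    print(primitives(np.array([rho, rho * u0, 0.0, rho * e])))   # p comes out close to 1e5 Pa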


Figure 3.7: Leopold Kronecker (1823–1891)

Alternatively, equation (3.6) can be written in the quasi-linear form

∂U/∂t + Ai ∂U/∂xi = ∂/∂xi (Kij ∂U/∂xj) + G,    (3.9)

where the assumption is made that the flux vectors are functions only of the state variables, i.e., Fa = Fa(U) and Fd = Fd(U). Then, the divergence of the flux vector functions can be written as

∂(Fa)i/∂xi = (∂(Fa)i/∂U) ∂U/∂xi = Ai ∂U/∂xi    (3.10)

and

∂(Fd)i/∂xi = (∂(Fd)i/∂U) ∂U/∂xi = Kij ∂U/∂xj.    (3.11)

Inviscid Approximation

In some particular cases, when inertial forces are predominant over viscous effects and no heat conduction is considered, the fluid motion is described by the Euler equations, which are obtained from the Navier-Stokes equations by neglecting all shear stresses and heat conduction terms. This is a valid approximation for flows at high Reynolds numbers (Re = ||u||L/ν, where L is a characteristic length scale and ν the kinematic viscosity). This approach changes the mathematical behavior of the set of equations: the system becomes first order and hyperbolic, the boundary conditions must be reformulated, and discontinuous solutions are admitted. The imposition of non-reflecting boundary conditions is treated further on.


Figure 3.8: Osborne Reynolds (1842–1912)

Variational Formulation

In this section, the variational formulation of the compressible Navier-Stokes equations using the SUPG finite element method and a shock capturing operator is presented. Consider a finite element discretization of Ω into sub-domains Ωe, e = 1, 2, . . . , nel. Based on this discretization, the finite element function spaces for the trial solutions and for the weighting functions, Sh and Vh respectively, can be defined. These function spaces are selected as subsets of [H1h(Ω)]^ndof when Dirichlet boundary conditions are taken, where H1h(Ω) is the finite dimensional Sobolev functional space over Ω and ndof = nsd + 2 is the number of degrees of freedom of the continuum problem (nsd is the number of spatial dimensions).

The stabilized finite element formulation of the quasi-linear form of (3.5) is written as follows: find Uh ∈ Sh such that ∀ Wh ∈ Vh

∫Ω Wh · (∂Uh/∂t + ∂(Fa^h)i/∂xi) dΩ = ∫Ω Wh · (∂(Fd^h)i/∂xi + G) dΩ,

which, after integrating the diffusive term by parts and adding the stabilization terms, becomes

∫Ω Wh · (∂Uh/∂t + Ah_i ∂Uh/∂xi − G) dΩ + ∫Ω (∂Wh/∂xi) · (Kh_ij ∂Uh/∂xj) dΩ − ∫Γh Wh · Hh dΓ +
+ Σ_{e=1}^{nel} ∫Ωe τ (Ah_k)^T (∂Wh/∂xk) · [∂Uh/∂t + Ah_i ∂Uh/∂xi − ∂/∂xi (Kh_ij ∂Uh/∂xj) − G] dΩ +
+ Σ_{e=1}^{nel} ∫Ωe δshc (∂Wh/∂xi) · (∂Uh/∂xi) dΩ = 0,    (3.12)

where

Sh = { Uh | Uh ∈ [H1h(Ω)]^ndof, Uh|Ωe ∈ [P1(Ωe)]^ndof, Uh = g on Γg },
Vh = { Wh | Wh ∈ [H1h(Ω)]^ndof, Wh|Ωe ∈ [P1(Ωe)]^ndof, Wh = 0 on ∂Ωg },    (3.13)


and where matrices Ai and Kij are defined in section §3.1.3.

The first three terms inside the first two integrals in the variational formulation (3.12) constitute the Galerkin formulation of the problem, and the third integral accounts for the Neumann boundary conditions. The first series of element-level integrals in (3.12) are the SUPG stabilization terms, added to prevent spatial oscillations in the advection-dominated range. The second series of element-level integrals in (3.12) are the shock capturing terms, added to ensure stability at high Mach and Reynolds number flows, especially to suppress spurious overshoot and undershoot effects in the vicinity of discontinuities.

Figure 3.9: Ernst Mach (1838–1916)

Various options for calculating the stabilization parameters and defining the shock capturing terms in the context of the SUPG formulation were introduced in Reference [TMRS92]. In this section we describe some of these options. The first one is the standard SUPG intrinsic time tensor τ introduced by Aliabadi and Tezduyar in Reference [ART93]. In this case the matrix is defined as τ = max[0, τa − τd − τδ], with each term taking into account the advective and diffusive effects and avoiding the duplication of the shock capturing operator and the streamline upwind operator. These matrices are defined as

τa = h/(2(c + |u|)) I,    τd = (Σ_{j=1}^{nsd} βj² diag(Kjj)) / (c + |u|)² I,    τδ = δshc / (c + |u|)² I,    (3.14)

where c is the acoustic speed, h = 2 (Σ_{a=1}^{nen} |u · ∇Na|)^{-1} is the element size, computed here as the element length in the direction of the streamline using the multi-linear trial functions Na, and δshc is the shock capturing parameter defined in the next paragraph. The computation of the τ matrix is still an open problem, because the system of equations cannot be diagonalized. It follows heuristic arguments based on the maximum eigenvalue of the advective Jacobian matrices for the characteristic velocity, on some measure of the element size that may not be rigorously justified but is equivalent to any other element size, and on some mechanism able to remove stabilization when physical diffusion is present.

The design of the shock capturing operator is also an open problem. Two versions are presented here: an isotropic operator and an anisotropic one, both proposed by Tezduyar et al. in [TS04]. A unit vector oriented with the density gradient is defined as j = ∇ρh/|∇ρh| and a characteristic length as hJGN = 2 (Σ_{a=1}^{nen} |j · ∇Na|)^{-1}, where Na is the finite element shape function corresponding to node a. The isotropic shock capturing factor included in (3.12) is then defined as

δshc = (hJGN/2) uchar (|∇ρh| hJGN / ρref)^β,    (3.15)

where uchar = |u| + c is the characteristic velocity, defined as the sum of the flow velocity magnitude and the acoustic speed, ρref is the density interpolated at the Gauss point, and the parameter β may be taken as 1 or 2 according to the sharpness of the discontinuity to be captured, as suggested in Reference [TS04]. However, only β = 1 was successfully used in this study.

The anisotropic version of the shock capturing term in (3.12) is changed as follows

Σ_{e=1}^{nel} ∫Ωe (∂Wh/∂xi) ji δshc jk (∂Uh/∂xk) dΩ.    (3.16)

The anisotropic shock capturing term showed good behavior. Nevertheless, for some

applications, both terms may be needed, the isotropic one weighted by a factor close to

0.2 or lower.
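The following Python fragment (a minimal sketch with made-up element data, not extracted from PETSc-FEM; a single scalar stands in for the diffusivity contribution of (3.14)) shows how scalar counterparts of the stabilization parameters of equations (3.14)-(3.15) could be evaluated on one element.

    import numpy as np

    def stabilization(u, c, h, h_jgn, grad_rho, rho_ref, k_diag, beta=1.0):
        """Scalar sketch of tau = max(0, tau_a - tau_d - tau_delta) and delta_shc."""
        unorm = np.linalg.norm(u)
        delta_shc = 0.5 * h_jgn * (unorm + c) * (np.linalg.norm(grad_rho) * h_jgn / rho_ref) ** beta
        tau_a = h / (2.0 * (unorm + c))               # advective intrinsic time
        tau_d = k_diag / (unorm + c) ** 2             # diffusive correction (scalar stand-in)
        tau_delta = delta_shc / (unorm + c) ** 2      # removes duplicated shock capturing
        return max(0.0, tau_a - tau_d - tau_delta), delta_shc

    # assumed element values: |u| ~ 900 m/s, c ~ 180 m/s, h ~ 1e-3 m
    tau, d_shc = stabilization(u=np.array([900.0, 0.0]), c=180.0, h=1e-3,
                               h_jgn=1e-3, grad_rho=np.array([50.0, 0.0]),
                               rho_ref=4.35, k_diag=1e-2)
    print(tau, d_shc)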

Test and Results

For this test a constant viscosity µ = 2.5 · 10−5 kg/(m·s) is adopted and the Reynolds number based on the flat plate length and the free stream state is 10⁴. The test case is an isothermal flow at Mach M = 5 at the inlet. The thermal conductivity coefficient is κ = 3.47 · 10−5 W/(m·K) and the plate is located 0.02 m from the inflow wall. The characteristic length is the length of the plate, L = 0.25 m. The free-stream Prandtl number is 0.72, the gas constant is R = 287 J/(kg·K) and the specific heat ratio of the gas is γ = 1.4. The temperature and pressure of the free stream are T∞ = 80 K and p∞ = 10⁵ Pa, respectively, and the temperature of the flat plate surface is Twall = 288 K.

Experimental and theoretical data are available for the skin friction coefficient and the wall Stanton number (wall heat transfer). This problem was successfully solved


using the IISD+ISP solver and the overlapping additive Schwarz preconditioner, but it was not possible to obtain a solution with the Global GMRes solver with diagonal scaling (i.e., GMRes over the whole matrix with point Jacobi preconditioning), using in the three cases a Krylov subspace dimension of 200. In the latter case, the solution presented poor resolution of the strong shock wave after some time steps and finally crashed. It should be remarked that up to M = 2.5 the preconditioned Global GMRes iteration works fine, giving results in agreement with experimental results and

For the Schwarz scheme, 4 sub-blocks (an ILU(0) solver is used on each block) and an

overlapping of a single layer of nodes around the interface between the blocks are used.

This kind of example represents cases where the computational resources are limited to a single-processor architecture and it is not possible to obtain a solution using the preconditioned Global GMRes scheme. The mesh used was composed of 24150 quadrilateral elements and 24462 nodes. In order to capture the high thermal and flow gradients, the normal spacing close to the flat plate was chosen to be about 4 · 10−6 and the time step adopted was ∆t = 0.005. The initial state adopted is a stationary flow at Mach 2.5 at the inlet, previously obtained via the IISD+ISP method. Two Newton loops were used for the non-linear problem.

Figures 3.10 and 3.11 show the skin friction coefficient and the Stanton number against theoretical predictions based on analytical solutions of an approximate theory, the Eckert reference enthalpy method [GLD94]. They show good agreement between the numerical results and the analytical predictions. The test was conducted on a PC Pentium IV - 2.8 GHz (RAM DDR, 400 MHz). The CPU time per time step (less than 3 minutes on average) and the residual convergence rate (roughly 150-170 iterations to converge 7 orders of magnitude) were comparable for both domain decomposition methods. In the IISD+ISP solver, the sub-domain problems are solved with an LU decomposition with nested dissection reordering. If complete LU factorization is used in the Schwarz method, the memory requirements and CPU time per time step increase.

3.2 Numerical Examples in Parallel Environment

In this section, we present numerical results for diffusive and advective problems and some

discussions about these results. The tests were carried out on a Beowulf cluster of PC’s.

The cluster at CIMEC laboratory has twenty (uniprocessor) nodes, where 10 nodes are

Pentium IV - 2.4 GHz, 1 GB RAM (DDR, 333 MHz), 7 nodes Pentium IV - 1.7 GHz,

512 MB RAM (RIMM, 400/800 MHz) and 2 nodes Pentium IV 1.7 GHz, 256 MB RAM


Figure 3.10: Skin friction coefficient along the plate: IISD+ISP, additive Schwarz and theoretical prediction.

Figure 3.11: Stanton number along the plate: IISD+ISP, additive Schwarz and theoretical prediction.


Table 3.1: CPU time and memory requirements per processor for the Poisson problem (mesh 500 × 500 elements). A * means that the iteration failed to converge to the specified tolerance within a maximum of 200 iterations.

Preconditioner       none      Jacobi glob.   block-Jacobi    N-N      ISP (nlay=1)   ISP (nlay=5)
factoriz. [secs]      -             -             1.9          4.7         2.3            2.3
CG stage [secs]       *             *              *           1.51        5.4            4.9
tolerance           1.e-10        1.e-10         1.e-10       1.e-10      1.e-10         1.e-10
mem./proc. [Mb]       *             *              *            70          62            62.5

(RIMM, 400/800 MHz). Usually, the first node works as server. The nodes are connected

through a switch Fast Ethernet (100 Mbit/sec, latency=O(100) µsecs).

The performance of the proposed preconditioner is studied in a parallel environment. For this purpose, we consider two different problems. The domain Ω in both cases is the unit square, discretized on a structured mesh of 500 × 500 nodes and decomposed into 4 rectangular sub-domains. We compare the residual norm versus iteration count using no preconditioner, the Neumann-Neumann preconditioner, the block-Jacobi preconditioner, the Global Jacobi preconditioner and the IS preconditioner (with several strip widths at the interfaces). Global Jacobi is a diagonal scaling preconditioning algorithm. The block-Jacobi preconditioner is a block-diagonal preconditioner obtained by (approximately) inverting the local diagonal blocks on each processor (see [Saa00] for a detailed description of these preconditioners).

3.2.1 The Poisson’s Problem

As in the sequential run (section §3.1.1), the parallel version of this test shows good performance with respect to the classical preconditioners and the global solution. The iteration counts and the problem solution are plotted in Figure 3.12. We split the system solution into two stages, the factorization stage (for the local problems) and the GMRes iteration stage (including the Richardson iteration for the IS preconditioner), in order to measure the time consumed to achieve a given tolerance in the residual vector (see Table 3.1). CPU times for the iteration stage and memory requirements are not given in Table 3.1 for Jacobi preconditioning and for the unpreconditioned case because these methods failed to converge.
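A toy sequential analogue of this kind of comparison can be reproduced with SciPy (a sketch only, unrelated to the PETSc-FEM implementation: the grid size and drop tolerance are assumptions, and an ILU factorization plays the role of the local sub-domain solves).

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 60                                            # assumed grid: 60 x 60 interior nodes
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    A = (sp.kron(sp.identity(n), T) + sp.kron(T, sp.identity(n))).tocsc()   # 5-point Laplacian
    b = np.ones(A.shape[0])

    def solve(M=None):
        it = [0]
        cb = lambda *a: it.__setitem__(0, it[0] + 1)  # count Krylov iterations
        spla.gmres(A, b, restart=200, maxiter=50, M=M,
                   callback=cb, callback_type='pr_norm')
        return it[0]

    ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=20)
    M = spla.LinearOperator(A.shape, ilu.solve)
    print('no preconditioner :', solve())             # iterations to the default tolerance
    print('ILU preconditioner:', solve(M))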


Figure 3.12: Solution of the Poisson problem (mesh 500 × 500 elements): relative residual norm ||r(n)||/||r(0)|| versus iteration number for Neumann-Neumann, IS (n = 1 and 5 layers), block-Jacobi, global Jacobi and no preconditioning.

3.2.2 The Scalar Advective-Diffusive Problem

The second example is an advective-diffusive problem at a global Peclet number Pe = 25, with g = δ(1/4, 3/4) + δ(3/4, 1/4) and φ(−0.5, y) = 0, where δ is the Dirac delta function; the problem is therefore strongly advective. We compare the iteration counts on two different meshes and two different decompositions. The mesh of 500 × 500 nodes is decomposed into 4 rectangular domains, one per processor, and the mesh of 1000 × 1000 is partitioned into 7 sub-domains. The iteration count and the problem solution (interpolated on a coarse mesh for visualization purposes) are plotted in Figures 3.13 and 3.14. In this example, the advective term introduces a strong asymmetry. CPU times and memory requirements are not given in Table 3.2 for the N-N preconditioner because this method failed to converge. To give an idea, however, the memory required by the N-N preconditioner (coarse mesh) for 50/60 iterations (the IS preconditioner had converged at this point) is 73 Mb/proc (megabytes per processor), whereas for 200 iterations (the maximum allowed) the consumed memory was 120 Mb/proc. For the refined mesh, the memory used in 70/80 iterations is 210 Mb/proc and for 200 iterations (the maximum allowed) it was 320 Mb/proc. Clearly, the Neumann-Neumann preconditioner is outperformed by the IS preconditioner in iteration count (and consequently in computing time) and memory demands, even for thin strips. The CPU time and memory used (per processor) are shown in Table 3.2.


Figure 3.13: Solution of the advective-diffusive problem (mesh 500 × 500 elements): relative residual norm versus iteration number for IS (n = 1 and 5 layers), block-Jacobi, Neumann-Neumann, global Jacobi and no preconditioning.

Figure 3.14: Iteration counts for the advective-diffusive problem (mesh 1000 × 1000 elements): relative residual norm versus iteration number for IS (n = 1 and 5 layers), Neumann-Neumann, global Jacobi and no preconditioning.


Table 3.2: CPU time and memory requirements per processor for the advective-diffusive problem (mesh 1000 × 1000 elements). A * means that the iteration failed to converge to the specified tolerance within a maximum of 200 iterations.

Preconditioner         none       Jacobi glob.     N-N       ISP (nlay=1)   ISP (nlay=5)
factoriz. [secs]        -              -           4.0            8.0            7.8
GMRes stage [secs]      *              *            *            13.0           12.0
tolerance            0.25e-06       0.25e-06     0.25e-06      0.25e-06       0.25e-06
mem./proc. [Mb]         *              *            *            140            142

3.2.3 The Coupled Hydrological Flow Model

Subsurface Flow

The equation for the flow in a confined (phreatic) aquifer integrated in the vertical direction is

∂/∂t (S (φ − η) φ) = ∇ · (K (φ − η) ∇φ) + Σ Ga,   on Ωaq × (0, t],    (3.17)

where the per-node property η represents the height of the aquifer bottom above a given datum. The corresponding unknown at each node is the piezometric height, or the level of the phreatic surface at that point, φ; Ωaq is the aquifer domain, S the storativity, K the hydraulic conductivity and Ga a source term due to rain and losses from streams or other aquifers.
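As a purely illustrative sketch (not the PETSc-FEM discretization: 1D, explicit in time, with the storage coefficient S(φ − η) frozen over each step and assumed parameter values), equation (3.17) can be advanced as follows:

    import numpy as np

    S, K, dx, dt = 2.5e-2, 2e-3, 100.0, 21600.0   # storativity, conductivity [m/s], grid [m], 6 h
    eta = np.zeros(101)                            # flat aquifer bottom (assumed datum)
    phi = np.full(101, 30.0)                       # initial piezometric head [m]
    phi[0] = 28.0                                  # fixed-head boundary that drives some flow
    Ga = np.zeros(101)                             # source term (rain, stream losses), zero here

    for step in range(120):                        # one month of 6-hour explicit steps
        T = K * (phi - eta)                        # depth-integrated transmissivity K*(phi - eta)
        Tf = 0.5 * (T[:-1] + T[1:])                # transmissivity at cell faces
        flux = Tf * np.diff(phi) / dx              # flux between nodes
        div = np.diff(flux) / dx                   # divergence at interior nodes
        phi[1:-1] += dt * (div + Ga[1:-1]) / (S * (phi[1:-1] - eta[1:-1]))
    print(phi[:5])                                 # head drawdown near the fixed boundary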

Surface flow [Whi74, Hir90]

2D Saint-Venant Model. The equations for the 2D Saint-Venant open channel flow

are the well known mass and momentum conservation equations integrated in the ver-

tical direction. If we write these equations in the conservation matrix form (Einstein

summation convention is assumed), we have

∂U/∂t + ∂Fi(U)/∂xi = G(U),   i = 1, 2,   on Ωst × (0, t],    (3.18)

where Ωst is the stream domain, U = (h, hu, hv)^T is the state vector and the advective flux functions in (3.18) are

F1(U) = (hu, hu² + gh²/2, huv)^T,
F2(U) = (hv, huv, hv² + gh²/2)^T,    (3.19)


Figure 3.15: Adhemar Jean Claude Barre de Saint-Venant (1797–1886)

where h is the height of the water in the channel with respect to the channel bottom,

u = (u, v)^T is the velocity vector and g is the acceleration due to gravity. Here Gs represents the gain (or loss) of the river, and the source term is

G(U) = (Gs, gh(S0x − Sfx), gh(S0y − Sfy))^T,    (3.20)

where S0 is the bottom slope and Sf is the friction slope, given by

Sfx = u|u| / (Ch² h),   Sfy = v|u| / (Ch² h)    for the Chezy model,
Sfx = n² u|u| / h^{4/3},   Sfy = n² v|u| / h^{4/3}    for the Manning model,    (3.21)

where Ch and n (the Manning roughness) are model constants. In the case of great lakes, wide rivers and estuaries the effect of the Coriolis force should be taken into account (see [PSI+03]).
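The next few lines (a self-contained sketch with made-up flow data; the default constants follow the values quoted later in this chapter) evaluate the friction slopes of equation (3.21) for both closure models:

    import numpy as np

    def friction_slopes(u, v, h, Ch=110.0, n=3e-3, model='chezy'):
        """Friction slopes (Sfx, Sfy) of eq. (3.21) for the Chezy or Manning closures."""
        speed = np.hypot(u, v)
        if model == 'chezy':
            factor = 1.0 / (Ch**2 * h)
        else:                                      # Manning
            factor = n**2 / h**(4.0 / 3.0)
        return factor * u * speed, factor * v * speed

    # assumed state: 1.2 m/s flow aligned with x, 2 m depth
    print(friction_slopes(1.2, 0.0, 2.0, model='chezy'))
    print(friction_slopes(1.2, 0.0, 2.0, model='manning'))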

1D Saint-Venant Model. When velocity variations over the channel cross section are neglected, the flow can be treated as one dimensional. The equations of mass and momentum conservation on a variable cross-section stream (in conservation form) are

∂A(s, t)/∂t + ∂Q(A(s, t))/∂s = Gs(s, t),

(1/A) ∂Q/∂t + (1/A) ∂/∂s (β Q²/A) + g(S0 − Sf) + g ∂h/∂s = (qt/A)(v − vt),   on Ωst × (0, t],    (3.22)


where A is the cross-sectional area, Q is the discharge, Gs(s, t) represents the gain or loss of the stream (i.e., the lateral inflow per unit length of channel), s is the arc-length along the channel, v = Q/A the average velocity in the s-direction, vt the velocity component in the s-direction of the lateral flow from tributaries, and β = (1/(v²A)) ∫ u² dA is the Boussinesq coefficient (u being the flow velocity at a point). The bottom shear stresses are approximated by using

the Chezy or Manning equations,

Sf = (v²/Ch²) P(h)/A(h)    (Chezy model),
Sf = (n/a)² v² P^{4/3}(h) / A^{4/3}(h)    (Manning model),    (3.23)

where P is the wetted perimeter of the channel and a is a conversion factor (a = 1 for

metric units).

Boundary Conditions

Boundary Conditions to Simulate River-Aquifer Interactions/Coupling Term.

The stream/aquifer interaction process occurs between a stream and its adjacent flood-

plain aquifer. The coupling term is not explicitly included in equation (3.17) but it is

treated as a boundary flux integral. At a nodal point we can write the coupling as

Gs = (P/Rf)(φ − hb − h),    (3.24)

where Gs represents the gain or loss of the stream (its main component being the loss to the aquifer) and Rf is the resistivity factor per unit arc length of the perimeter. The

corresponding gain to the aquifer is

Ga = −Gs δΓs , (3.25)

where Γs represents the planar curve of the stream and δΓs is a Dirac delta distribution with unit intensity per unit length, i.e.,

∫ f(x) δΓs dΣ = ∫_0^L f(x(s)) ds.    (3.26)

The stream loss element set represents this loss, and a typical discretization is shown in

Figure 3.16. The stream loss element is connected to two nodes on the stream and two

on the aquifer. If the stream level is above the phreatic aquifer level (hb + h > φ), the stream loses water to the aquifer, and vice versa. Contrary to standard approaches, the

coupling term is incorporated through a boundary flux integral that arises naturally in

the weak form of the governing equations rather than through a source term.
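A small sketch of this exchange term (illustrative only; the nodal values are invented, and the actual implementation assembles the term as a boundary flux in the weak form rather than as a nodal source) is:

    def stream_aquifer_exchange(phi, hb, h, P, Rf):
        """Nodal coupling of eq. (3.24): gain of the stream Gs and of the aquifer Ga = -Gs."""
        Gs = (P / Rf) * (phi - hb - h)   # > 0: aquifer feeds the stream, < 0: stream loses water
        return Gs, -Gs

    # assumed nodal data: phreatic head 30 m, streambed at 18 m, 10 m of water in the stream,
    # wetted perimeter 12 m, resistivity factor 1e5 sec
    Gs, Ga = stream_aquifer_exchange(phi=30.0, hb=18.0, h=10.0, P=12.0, Rf=1e5)
    print(Gs, Ga)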


Figure 3.16: Stream/Aquifer coupling: a stream loss element connecting stream nodes and aquifer nodes in the x-y plane.

Initial Conditions. First, Second and Third Kind Boundary Conditions.

Groundwater flow. In the previous section, the equation that governs subsurface flow was established. In order to obtain a well-posed PDE problem, initial and boundary conditions must be imposed on the flow domain and on its boundary. The initial condition for the groundwater problem is a constant hydraulic head in the whole region, consistent with the levels observed in the basin history.

Now, consider a simply connected region Ω bounded by a closed curve ∂Ω such that

∂Ωφ ∪ ∂Ωσ ∪ ∂Ωφσ = ∂Ω. We consider the stream partially penetrating and connected, in a hydraulic sense, to the aquifer; hence, we set

φ = φ0   on ∂Ωφ × (0, t],
K(φ − η) ∂φ/∂n = σ0   on ∂Ωσ × (0, t],
K(φ − η) ∂φ/∂n = C(φ − h)   on ∂Ωφσ × (0, t],    (3.27)

where φ0 is a given water head, σ0 is a given flux normal to the flux boundary ∂Ωσ, and C is the conductance at the stream/aquifer interface.

Surface Flow - Fluid Boundary. We recall that the type of flow in a stream or in an open channel depends on the value of the Froude number Fr = |u|/c (where c = √(gh) is the wave celerity). A flow is said to be

• fluvial, for |u| < c;

• torrential, for |u| > c.

Saint-Venant Equations. Fluvial Boundary


• inflow boundary: u specified and the depth h is extrapolated from interior points,

or vice versa.

• outflow boundary: depth h specified and velocity field extrapolated from interior

points, or vice versa.

Torrential Boundary

• inflow boundary: u and the depth h are specified.

• outflow boundary: all variables are extrapolated from interior points.

Solid Wall Boundary Condition. We prescribe the simple slip condition over Γslip (⊂ Γst),

u · n = 0.    (3.28)

Upon using the SUPG Galerkin finite element discretization procedure similar to the

formulation described in §3.1.2 with linear triangles and/or bilinear rectangular elements

and the trapezoidal rule for time integration, we obtain the system to be solved at each

time step

R = K(U)[θ U^{k+1} + (1 − θ) U^k] + B(U) (U^{k+1} − U^k)/∆t − Q^{k+1},    (3.29)

where θ is the time-weighting factor satisfying 0 ≤ θ ≤ 1, ∆t is the time increment and k denotes the time step index. K and B are the (nonsymmetric) stiffness matrix and the (symmetric) mass matrix, respectively (K and B depend on U), Q is the source vector and R is the residual vector.
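A compact sketch of the residual evaluation (3.29) and of one Newton-type update (a toy dense version with invented matrices, only to fix ideas about the time-stepping loop) could read:

    import numpy as np

    def residual(U_new, U_old, K, B, Q_new, dt, theta=0.5):
        """R of eq. (3.29) for frozen K(U), B(U) (trapezoidal rule: theta = 1/2)."""
        U_theta = theta * U_new + (1.0 - theta) * U_old
        return K @ U_theta + B @ (U_new - U_old) / dt - Q_new

    rng = np.random.default_rng(0)              # toy data (assumed): 3 unknowns
    B = np.eye(3)
    K = rng.normal(size=(3, 3))
    Q = rng.normal(size=3)
    U_old = np.zeros(3)
    U_new = U_old.copy()
    dt, theta = 0.1, 0.5

    for _ in range(10):                         # simple Newton loop with the exact Jacobian
        R = residual(U_new, U_old, K, B, Q, dt, theta)
        J = theta * K + B / dt                  # dR/dU_new for frozen K, B
        U_new -= np.linalg.solve(J, R)
    print(np.linalg.norm(residual(U_new, U_old, K, B, Q, dt, theta)))   # ~ machine precision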

Saint-Venant Numerical Example

The example is a 2D Saint-Venant subcritical flow over an impermeable unit square chan-

nel with a parabolic bump in the bottom and a sinusoidal wave-train perturbation in

x-velocity at the inflow boundary. The parabolic variation of the bottom has the form η(x, y) = min{h1, h2 + (h1 − h2)(r/R)²}, where r is the distance to the center of the bump, located at (0, 0), h1 = 1, h2 = 0.5 and R = 0.3. The period of the incident plane wave is T = 0.1 sec; hence, roughly five wave-lengths enter in the diameter of the bump. The initial global Froude and Courant numbers (based on the longitudinal velocity u) are Fr = u/√(gh) = 0.3 and C = u∆t/∆x = 15. Null flux is considered at y = ±0.5 and fluvial

boundary conditions at the inflow/outflow sections. For the computations we use the


Chezy model with friction coefficient Ch = 110 m^{1/2}/sec. The mesh of 10^5 linear triangles was partitioned with METIS into five sub-domains (one per processor).

The iteration counts for the linear system corresponding to a typical Newton iteration at a given time step are plotted in Figure 3.17. Figure 3.18 shows the elevation for the periodic steady state. In this example, the system of conservation laws (3.18) introduces a strong asymmetry. As in the linear advection-diffusion problem, the IS preconditioner improves the iteration counts and memory demands. Although each iteration is more expensive for the IS preconditioner, the time needed to reach a given tolerance is smaller. The CPU time, tolerances and memory consumed are shown in Table 3.3.

Figure 3.17: Iteration counts for the Saint-Venant system of equations (mesh 500 × 500 elements): relative residual norm versus iteration number for IS (n = 1 and 5 layers), block-Jacobi, global Jacobi and no preconditioning.

Coupled Surface-Subsurface Flow Numerical Test

In this section two examples of surface/subsurface interaction flow for the Cululu basin are presented. Both cases have periodic rainfall, and species with different evapotranspiration rates have been planted. The first case is a random soybean plantation (50% of the total area, with an evapotranspiration 50% lower than that of a eucalyptus plantation). In the second case only eucalyptus is planted. A period of 12 months is simulated, in which the total precipitation is the annual average observed in recent years (1000 mm/year), divided into two wet seasons with a rainfall rate of 2000 mm/year (april-


Figure 3.18: Solution of Saint-Venant system of equations (mesh 500× 500 elements)

Table 3.3: CPU time and memory requirements for the Saint-Venant equations (mesh 500 × 500 elements). A * means that the iteration failed to converge to the specified tolerance within a maximum of 400 iterations.

Preconditioner          none      Jacobi glob.   block-Jacobi   ISP (nlay=1)   ISP (nlay=5)
factorization [secs]     -             -              8.1            9.0            9.2
GMRes stage [secs]       *             *              522             68             43
tolerance              1.e-05        1.e-05         1.e-05         1.e-05         1.e-05
memory/proc. [Mb]        *             *              605            548            550


march and september-october) and dry seasons of 500 mm/year (the rest of the year).

At time t = 0 the piezometric height in the phreatic aquifer is 30 meters above the aquifer bottom, while the water height in the stream is 10 meters above the streambed. The hydraulic conductivity and the storativity of the phreatic aquifer are 2 · 10−3 m/sec and 2.5 · 10−2, respectively. The Manning friction law is adopted for this case. The roughness of the stream channel is 3 · 10−3 and the river width is 10 meters. The stream loss resistivity average value is 10^5 sec. A mesh of 96131 triangular elements and 48452 nodal points is used to represent the aquifer domain. The average spacing between nodal river points is 100 meters. The time step adopted in both cases is ∆t = 1 day. In Figure 3.19 we can see the

Figure 3.19: Iteration counts for the coupled flow: relative residual norm versus iteration number for IS (n = 1 and 5 layers), global Jacobi and no preconditioning.

iteration counts for different preconditioners. Figure 3.21 shows the correlation between the presence of soybean and a higher phreatic elevation with respect to the levels observed in the case where only eucalyptus is present. Figure 3.22 shows the phreatic elevation after two years of simulation. The time needed to solve each time step of the non-linear coupled problem with six processors Pentium IV 1.4-1.7 GHz and 512 Mb RAM (Rambus), connected through a switch Fast Ethernet (100 Mbit/sec, latency = O(100) µsecs), was 13.2 seconds on average.


Figure 3.20: Soybean location

Figure 3.21: Difference in phreatic levels for both cases: water level difference in the phreatic aquifer [m] along the profile at y = 0 (north-south), with the soybean placement indicated (50% soybean - 50% eucalyptus versus only eucalyptus).


Figure 3.22: Aquifer State at t=2 years

3.2.4 The Stokes Flow in a Long Horizontal Channel

Note: The cluster architecture for the tests given hereafter is slightly different from that used for the above tests. These tests were carried out on sixteen (uniprocessor) nodes Pentium IV - 2.8 GHz, 2 GB RAM (DDR, 400 MHz). The nodes are connected through a switch Fast Ethernet (100 Mbit/sec, latency = O(100) µsecs); each node has a 3COM 3c509 (Vortex) NIC card.

Triggered by observed discrepancies between experimental results and computer simulations [KK03] using standard solvers [SB03] (i.e., GMRes with acceptable rates of convergence), this example shows the improvement in the solution of lubricated contacts by means of the ISP preconditioner. So far the lubricant flow in the narrow gap between two contacting elements has been described using the Reynolds equation. This equation follows from the Navier-Stokes equations at low Reynolds number (Re < 1) when a narrow gap is assumed (i.e., when e = H/L ≪ 1, H being the gap width and L a characteristic length scale). Nominally the assumption e ≪ 1 will generally hold; an accurate description of the flow, however, requires the use of the incompressible laminar Navier-Stokes model.


Incompressible Navier-Stokes Equations

The incompressible Navier-Stokes equations present two important difficulties for their solution with finite elements. First, the character of the equations becomes highly advection dominated when the Reynolds number increases. In addition, the incompressibility condition does not represent an evolution equation but a constraint on the equations. This is a drawback because only some combinations of interpolation spaces for velocity and pressure can be used with the Galerkin formulation, namely those that satisfy the so-called Ladyzhenskaya-Brezzi-Babuska condition. In the formulation of Tezduyar et al., advection is stabilized with the well known SUPG stabilization term and a similar term, called PSPG, is included in order to stabilize the incompressibility condition. In this way, it is possible to use stable equal-order interpolations. Once these equations are discretized in space, the resulting system of ODE's is discretized in time with the standard trapezoidal rule (backward Euler and Crank-Nicolson schemes may also be used). The resulting non-linear system of equations is solved iteratively at every time step. Viscous flow is

Figure 3.23: Olga Alexandrovna Ladyzhenskaya (1922–2004)

well represented by the Navier-Stokes equations. The incompressible version of this model includes the mass and momentum balances, which can be written as follows. Let Ω ⊂ R^nsd and (0, t+] be the spatial and temporal domains respectively, where nsd is the number of space dimensions, and let Γ be the boundary of Ω. The equations are

∇ · u = 0   in Ω × (0, t+],
ρ(∂u/∂t + u · ∇u) − ∇ · σ = 0   in Ω × (0, t+],    (3.30)


Figure 3.24: Phyllis Nicolson (1917–1968)

Figure 3.25: John Crank (1916–)

with ρ and u the density and velocity of the fluid and σ the stress tensor, given by

σ = −pI + 2µ* ε(u),
ε(u) = (1/2)(∇u + (∇u)^T),    (3.31)

where p is the pressure and µ* is the effective dynamic viscosity, defined as the sum of the dynamic (molecular) viscosity and the algebraic eddy viscosity of the LES model proposed by Smagorinsky [Sma63], i.e., µ* = µ + µSGS. Here I represents the identity tensor and ε the strain rate tensor.
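For reference, the standard Smagorinsky closure (the formula below is the usual one from the literature, not quoted from this thesis; the constant Cs, the filter width and the velocity gradient are assumptions) computes the eddy viscosity from the resolved strain rate:

    import numpy as np

    def smagorinsky_nu_t(grad_u, delta, Cs=0.18):
        """Standard Smagorinsky eddy viscosity: nu_t = (Cs*delta)^2 * sqrt(2 S:S)."""
        S = 0.5 * (grad_u + grad_u.T)               # resolved strain-rate tensor
        return (Cs * delta) ** 2 * np.sqrt(2.0 * np.sum(S * S))

    grad_u = np.array([[0.0, 10.0], [0.0, 0.0]])    # assumed resolved velocity gradient [1/s]
    nu_t = smagorinsky_nu_t(grad_u, delta=0.01)     # assumed filter/element width [m]
    mu_sgs = 1.0 * nu_t                             # mu_SGS = rho * nu_t (rho assumed 1 kg/m^3)
    print(nu_t, mu_sgs)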


The initial and boundary conditions are

Γ = Γg ∪ Γh,   Γg ∩ Γh = ∅,
u = g at Γg,   n · σ = h at Γh,
u(t = 0) = u0   ∀ x ∈ Ω,
p(t = 0) = p0   ∀ x ∈ Ω,    (3.32)

where Γg and Γh are the Dirichlet and Neumann boundaries, respectively. When the flow

velocity is very small (i.e., the fluid is very viscous) or the geometric dimensions are very small, that is, when the Reynolds number is very small, the inertial term in (3.30) plays a minor role and the flow is dominated by the viscous and pressure gradient terms.

This is the so-called ‘Stokes flow’.

Spatial Discretization. The spatial discretization has equal order for pressure and ve-

locity and is stabilized through the addition of two operators. Advection at high Reynolds

numbers is stabilized with the well known SUPG operator, while the PSPG operator pro-

posed by Tezduyar et al. [TMRS92] stabilizes the incompressibility condition, which is

responsible for the checkerboard pressure modes.

The computational domain Ω is divided into nel finite elements Ωe, e = 1, . . . , nel; let E be the set of these elements and H1h the finite dimensional space defined by

H1h = { φh | φh ∈ C0(Ω), φh|Ωe ∈ P1, ∀ Ωe ∈ E },    (3.33)

with P1 representing polynomials of first order. The functional spaces for the interpolation and weight functions are defined as

Shu = { uh | uh ∈ (H1h)^nsd, uh = gh on Γg },
Vhu = { wh | wh ∈ (H1h)^nsd, wh = 0 on Γg },
Shp = { qh | qh ∈ H1h }.    (3.34)


The SUPG-PSPG scheme is written as follows: find uh ∈ Shu and ph ∈ Shp such that

∫Ω wh · ρ(∂uh/∂t + uh · ∇uh) dΩ + ∫Ω ε(wh) : σh dΩ +
+ Σ_{e=1}^{nel} ∫Ωe δh · [ρ(∂uh/∂t + uh · ∇uh) − ∇ · σh] dΩ   (SUPG term)
+ Σ_{e=1}^{nel} ∫Ωe εh · [ρ(∂uh/∂t + uh · ∇uh) − ∇ · σh] dΩ   (PSPG term)
+ ∫Ω qh ∇ · uh dΩ = ∫Γh wh · hh dΓ,   ∀ wh ∈ Vhu, ∀ qh ∈ Shp,    (3.35)

where the stabilization parameters in equation (3.35) are defined as

δh = τSUPG (uh · ∇) wh,
εh = τPSPG (1/ρ) ∇qh,
τPSPG = τSUPG = (helem / (2||uh||)) z(Reu).    (3.36)

Note that the SUPG and the PSPG terms are defined on different functional spaces. These stabilization terms act, at the linear system level, by adding nonzero values to the diagonal entries associated with the pressure equations. The Reynolds number Reu based on the element parameters is

Reu = ||uh|| helem / (2ν),    (3.37)

and the element size helem is computed as

helem = 2 (Σ_{a=1}^{nn} |s · ∇Na|)^{-1},    (3.38)

Na being the shape function associated with node a, nn the number of nodes in the element, and s a unit vector in the streamline direction. The function z(Re) is defined as

z(Re) = Re/3   for 0 ≤ Re < 3,
z(Re) = 1      for 3 ≤ Re.    (3.39)
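As an illustration (a standalone sketch; the element data below are assumptions, roughly of the order of the channel test that follows), the stabilization parameter of equations (3.36)-(3.39) can be evaluated as:

    import numpy as np

    def tau_supg_pspg(u, grad_N, nu):
        """tau_SUPG = tau_PSPG = h_elem/(2|u|) * z(Re_u), eqs. (3.36)-(3.39)."""
        unorm = np.linalg.norm(u)
        s = u / unorm                                 # streamline direction
        h = 2.0 / np.sum(np.abs(grad_N @ s))          # h_elem, eq. (3.38)
        Re_u = unorm * h / (2.0 * nu)                 # element Reynolds number, eq. (3.37)
        z = Re_u / 3.0 if Re_u < 3.0 else 1.0         # eq. (3.39)
        return h / (2.0 * unorm) * z

    # assumed data: center-point shape-function gradients of a 1e-5 m bilinear quad
    grad_N = np.array([[-5e4, -5e4], [5e4, -5e4], [5e4, 5e4], [-5e4, 5e4]])
    print(tau_supg_pspg(np.array([0.05, 0.0]), grad_N, nu=5.33e-4))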

Test and Results

The channel is 8 · 10−5 m wide and 9 · 10−2 m long. The dynamic viscosity used is 5.33 · 10−4 m²/sec and no body forces are considered. The Reynolds number based on the


channel width is Re = 0.1 and the aspect ratio of the quadrilateral elements is 5, to assure non-stretched elements. This problem leads to an ill-conditioned matrix, due to the high aspect ratio of the channel dimensions, and a large number of residual vectors (iterations) in Krylov methods is needed to converge to an accurate solution. The non-linear steady simulation with a maximum of 100 Newton loops is considered. The normalized residuals in the solution step of the linear system are shown in Figure 3.26 for all Newton iterations (hereafter nnwt is the number of iterations in the non-linear loop). In the case of the Global GMRes solver (point Jacobi preconditioning is assumed hereafter for this method) two Krylov subspace dimensions are considered (400 and 800). The test was conducted on 16 nodes and, in the case of the IISD+ISP solver, each sub-domain was sub-partitioned into 7 interior sub-domains (2000 dof's per interior sub-domain on average). The interface strip width used is nlay = 1 (see Reference [PS05]). For the overlapping additive Schwarz and the block-Jacobi methods 7 sub-blocks per processor were chosen (an ILU(0) decomposition is used on each block). As in previous tests, the sub-blocks overlap each other (for the overlapping additive Schwarz) by a layer of nodes. In Stokes flow the convective terms are quite small; however, as these terms remain in the formulation, they lead to a weakly non-linear problem with non-symmetric matrices. For this reason GMRes iteration is employed. The core memory demanded by the IISD+ISP solver was 48.9 Mb per

Figure 3.26: Residual history: relative residual norm versus iteration count for IISD+ISP (nlay = 1, 7 sub-domains, nnwt = 2), additive Schwarz (7 sub-blocks, nnwt = 20), block-Jacobi (7 sub-blocks, nnwt = 22) and global GMRes with Jacobi preconditioning (Kdim = 400 and 800, nnwt = 100).

processor at each Newton iteration, including the LU factorization stage (solution of local

problems) and the GMRes iteration (solution of inter-subdomain problems). The CPU

time was 0.33 minutes per Newton loop.

The memory used in the Global GMRes stage was 126.9 Mb per processor for a Krylov


subspace dimension (KSPdim) of 800 and 63.2 Mb per processor for a KSPdim of 400. The

CPU time was 6.61 minutes and 1.87 minutes per Newton iteration, respectively.

For the overlapping additive Schwarz preconditioning the consumed memory and the

CPU time per Newton iteration were 107 Mb and 1.1 minutes, respectively. Block-Jacobi

scheme consumed 99.3 Mb and 0.97 minutes per non-linear iteration.

In Figures 3.27, 3.28, 3.29 and 3.30 the numerical solutions of the horizontal velocity and pressure fields are compared with the analytical solution. Figures 3.27 and 3.28 correspond to the solution of both fields after one loop of the Newton scheme, and Figures 3.29 and 3.30 correspond to one hundred iterations of the Newton loop for the preconditioned Global GMRes method (20 and 22 iterations for the additive Schwarz and block-Jacobi schemes, respectively). For the IISD+ISP solver the residual of the Newton loop after three iterations was 10−14 and we consider that there is no need to go further in this loop to converge to the solution. The same residual tolerance was obtained by the additive Schwarz and block-Jacobi preconditioners at the 20th and 22nd loops, respectively. Clearly, in this case IISD+ISP outperforms the other domain decomposition techniques not only in memory and CPU time demands but also in the number of non-linear iterations needed to achieve a given tolerance. Figures 3.27 and 3.29 show a slight loss in momentum due

Figure 3.27: Velocity field across the channel height (nnwt = 1): global GMRes (Kdim = 400 and 800, Jacobi prec.), IISD+ISP (nlay = 1), additive Schwarz (7 sub-blocks), block-Jacobi (7 sub-blocks) and analytical solution.

to the coarse discretization in the transversal direction, adopted in order to maintain the aspect ratio of the elements.

This example provided the first evidence that inspired the article published in [PNS06]. It shows that for high aspect ratio geometries the Global GMRes iteration suffers from a strong


Figure 3.28: Pressure field along the channel (nnwt = 1): global GMRes (Kdim = 400 and 800, Jacobi prec.), IISD+ISP (nlay = 1), additive Schwarz, block-Jacobi and analytical solution.

Figure 3.29: Velocity field across the channel height (nnwt = 100 for global GMRes, nnwt = 3 for IISD+ISP, nnwt = 20 for additive Schwarz, nnwt = 22 for block-Jacobi), compared with the analytical solution.

convergence deterioration and, even when an unusually large Krylov subspace dimension is used, the final solution is unacceptable.


Figure 3.30: Pressure field along the channel (nnwt = 100 for global GMRes, nnwt = 3 for IISD+ISP, nnwt = 20 for additive Schwarz, nnwt = 22 for block-Jacobi), compared with the analytical solution.

3.2.5 The Viscous Incompressible Navier-Stokes Flow Around

an Infinite Cylinder

The unsteady viscous external flows past objects have been extensively studied (experimentally and numerically, see References [BCHM86, Cho73, ST90, BLST90, Ros54, Wil85, Nor01]) because of their many practical applications. For example, airfoils have streamlined shapes in order to increase the lift and, at the same time, reduce the aerodynamic drag exerted on the wings. Another example is the flow past a blunt body, such as a circular cylinder (e.g., the wind forces acting on the cables of a suspension bridge), which usually experiences boundary layer separation and very strong flow oscillations in the wake region behind the body. This example is directly related to a great number of problems. In a certain Reynolds number range, a periodic flow motion will develop in the wake as a result of

tain Reynolds number range, a periodic flow motion will develop in the wake as a result of

the boundary layer vortex being shed alternatively from either side of the cylinder. This

regular pattern of vortices in the wake is called the von-Karman vortex street. It creates

an oscillating flow at a discrete frequency that is correlated to the Reynolds number of

the flow. The periodic nature of the vortex shedding phenomenon can sometimes lead

to unwanted structural vibrations, especially when the shedding frequency matches one

of the resonant frequencies of the structure. In order to generate vortex shedding, an

artificial perturbation may be imposed introducing for example a rotation of the cylinder

for a short time. The perturbations introduced correspond to a clockwise rotation of the

cylinder followed by a counterclockwise rotation (it is of the same nature as the pertur-


Figure 3.31: Theodore von Karman (1881–1963)

bation used by Braza et al. [BCHM86]). The problem is 2D and the cylinder radius is

1 m. On the inlet boundary a uniform free stream velocity (||u|| = 1) is imposed. On the

outlet section the pressure is equal to a reference value (zero in this test) and the velocity

vector has no component in the y-direction. On the top and bottom walls a slip condition

is adopted. These boundaries are located far enough to prevent any influence on the flow

development (see Reference [ST90]). The mesh of 138600 quadrilaterals (with homogeneous refinement near the cylinder wall) was partitioned into 15 sub-domains (processors) and sub-partitioned into 14 interior (local) sub-domains on average (2000 dof's per local sub-division). In Figure 3.32 the residual history for several time steps (one Newton

Figure 3.32: Re = 100. Residual history: relative residual norm versus iteration count for IISD+ISP (nlay = 1) and global GMRes with Jacobi preconditioning (Kdim = 200 and 400).


iteration is considered) is plotted for the unsteady simulation of this flow. Clearly, the IISD+ISP solver reaches lower residual tolerances (10−7 vs 10−3) with roughly 70% fewer iterations than the Global GMRes iteration with different KSPdim. The lower tolerances achieved with IISD+ISP are directly related to the accuracy of the solution, and the reduction in the iteration number influences the overall simulation time. In

Figure 3.33: Re = 100. Viscous x-force coefficient versus time for global GMRes (Kdim = 200 and 400, Jacobi prec.) and IISD+ISP (nlay = 1).

Figure 3.34: Re = 100. Viscous y-force coefficient versus time for global GMRes (Kdim = 200 and 400, Jacobi prec.) and IISD+ISP (nlay = 1).

In Figures 3.33, 3.34 and 3.35 the time evolution of the viscous forces and moment is shown. The solutions obtained with both the Global Iteration (KSPdim = 400) and IISD+ISP are in agreement with the experimental results reported by Braza et al. and with the numerical


Figure 3.35: Re = 100. Viscous z-moment coefficient versus time for global GMRes (Kdim = 200 and 400, Jacobi prec.) and IISD+ISP (nlay = 1).

results shown in References [BCHM86, ST90, BLST90]. However, if the Global Iteration is stopped prematurely (i.e., KSPdim = 200, see Figure 3.32) the solution is no longer accurate and deviations of 50% are observed (see Figure 3.35). Although the residuals are lowered by two to three orders of magnitude, the solution of the linear system is not accurate enough. Recall that to pass from 200 to 400 iterations in a GMRes scheme the computational resources (CPU and consumed memory) increase considerably due to the storage requirements for the Krylov subspace bases. The CPU time and core memory requirements for each time step were: 28 secs and 100 Mbytes for Global GMRes with KSPdim = 200, 107.5 secs and 152 Mbytes for Global GMRes with KSPdim = 400, and 18.5 secs and 98 Mbytes for IISD+ISP. The vorticity field for the LES flow around a 3D cylinder at Re = 5 · 10^4 is shown in Figure 3.36. This well known test case shows that a large Krylov subspace dimension is needed for Global GMRes to reach the accuracy of the IISD+ISP solver. Usually, the user is pushed to adopt a small Krylov subspace dimension in order to reduce the memory and CPU time consumed. The sensitivity of the results obtained with Global GMRes to the Krylov subspace dimension is high and, with no a priori knowledge of this dimension, the uncertainties in the results tend to be high. In summary, the Global GMRes iteration makes the simulation more user dependent.


Figure 3.36: 3D LES flow at Re = 5 · 10^4. Top: initial state; bottom: pseudo-stationary state.

3.2.6 Navier-Stokes Flow Using the Fractional Step Scheme.

The Lid Driven Cavity

A test of the disaggregated method was performed on two-dimensional unit cavity flow

at Re = 1000. This is a test that has been computed extensively in the past and it is well

understood (see Reference [GGS82] for a detailed description of this example).


Disaggregated Scheme

Fractional step methods for the incompressible Navier-Stokes equations have been popular over the last two decades. The reason for this lies in the computational efficiency of these methods, basically because of the uncoupling of the pressure from the velocity components. In Reference [Cod01] a study of the stability of the computed pressure for schemes that use a pressure Poisson equation was presented; those results are used in this section.

The results to be presented refer to a second-order algorithm based on the implicit (θ = 1) discretization of the viscous and convective terms and a second-order pressure splitting, leaving the pressure gradient at a given time level in the first step and computing its increment in the second one.

The time discretization of problem (3.30), written in compact matrix form, is

M (1/∆t)(U^{n+1} − U^n) + K(U^{n+θ}) U^{n+θ} + G P^{n+1} = F^{n+θ},    (3.40)
D U^{n+1} = 0,    (3.41)

where M is the mass matrix, U is the vector of velocity unknowns, K is the stiffness

matrix, G is the matrix form of the gradient operator, P is the vector of nodal pressures,

D is the matrix form of the divergence operator and F is the vector of source terms.

Superscripts n and n+1 denote variables at time t = n∆t and t = (n+1)∆t, respectively.

The fractional step scheme applied to the fully discrete problem (3.40)-(3.41) is exactly equivalent to

M (1/∆t)(Û^{n+1} − U^n) + K(U^{n+θ}) U^{n+θ} + γ G P^n = F^{n+θ},    (3.42)
M (1/∆t)(U^{n+1} − Û^{n+1}) + G(P^{n+1} − γ P^n) = 0,    (3.43)
D U^{n+1} = 0,    (3.44)

where Û^{n+1} is an auxiliary variable and γ is a numerical parameter whose values of interest are between 0 and 1. The essential approximation K(Û^{n+θ}) Û^{n+θ} ≈ K(U^{n+θ}) U^{n+θ} is made, where Û^{n+θ} = θ Û^{n+1} + (1 − θ) U^n. Writing Û^{n+1} in terms of U^{n+1} using (3.43) and inserting the result in (3.44), the equations to be solved are

M (1/∆t)(Û^{n+1} − U^n) + K(Û^{n+θ}) Û^{n+θ} + γ G P^n = F^{n+θ},    (3.45)
∆t D M^{-1} G (P^{n+1} − γ P^n) = D Û^{n+1},    (3.46)
M (1/∆t)(U^{n+1} − Û^{n+1}) + G(P^{n+1} − γ P^n) = 0.    (3.47)

The equations are ordered according to the sequence of solution, i.e., first for Û^{n+1}, then P^{n+1} and finally U^{n+1}. The operator D M^{-1} G in (3.46) can be approximated by the Laplace operator if the matrix M is approximated by a diagonal (lumped) matrix.
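A compact sketch of one fractional step (illustrative only, not the PETSc-FEM implementation: dense toy operators instead of FEM matrices, a lumped mass matrix, K frozen at the old velocity in a Picard-type fashion, and direct solves) could read:

    import numpy as np

    def fractional_step(U, P, M_lumped, K, G, D, F, dt, gamma=0.9, theta=1.0):
        """One step of (3.45)-(3.47) with K frozen at the old velocity (Picard-type sketch)."""
        Minv = 1.0 / M_lumped                              # lumped (diagonal) mass matrix
        # (3.45): predictor velocity U_hat (implicit, K frozen -> one linear solve)
        A = np.diag(M_lumped / dt) + theta * K
        rhs = F + M_lumped / dt * U - (1.0 - theta) * K @ U - gamma * G @ P
        U_hat = np.linalg.solve(A, rhs)
        # (3.46): pressure increment from the Poisson-like operator dt*D*Minv*G
        L = dt * D @ np.diag(Minv) @ G
        dP = np.linalg.solve(L, D @ U_hat)
        P_new = gamma * P + dP
        # (3.47): projection to a (discretely) divergence-free velocity
        U_new = U_hat - dt * Minv * (G @ dP)
        return U_new, P_new

    # tiny usage with random toy operators (assumed sizes: 6 velocity dof's, 3 pressure dof's)
    rng = np.random.default_rng(1)
    n, m = 6, 3
    K = rng.normal(size=(n, n)); G = rng.normal(size=(n, m)); D = G.T
    U, P = fractional_step(np.zeros(n), np.zeros(m), np.ones(n), K, G, D,
                           F=np.ones(n), dt=0.02)
    print(U, P)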

Test and Results

A structured mesh of 400 × 400 quadrilaterals was used for the calculations. For the Global GMRes iteration 14 processors were used; for the IISD+ISP solver, the 14 sub-domains were sub-divided into local partitions so as to have 1500 dof's per sub-division. Also, for the additive Schwarz method 14 sub-blocks per processor were used. The parameters ∆t = 0.02 and γ = 0.9 (see equation (3.42)) were chosen.

In Figure (3.37) the residual history for the Poisson step for different solvers is shown.

In the predictor (advection-diffusion equation) and projection steps a few iterations are

Figure 3.37: Residual history for the Poisson step: relative residual norm versus iteration count for IISD+ISP (14 procs., 14 sub-domains/proc.), global GMRes with Jacobi preconditioning (14 procs.) and additive Schwarz (14 procs., 10 sub-blocks).

needed to achieve relatively low tolerances for these schemes. Nevertheless, in the Poisson step, the mesh size used leads to a high condition number for the CG iteration (i.e., ∝ 1/h² without preconditioning), so it is necessary to increase the iteration number in order to avoid spurious oscillations in the solution. If the Global CG iteration is stopped at 1000-1200 iterations (Poisson step), where the residual history plot reaches a 'plateau', i.e., where the residuals may be considered acceptable, large spurious oscillations appear in the solution when the steady state is reached. Moreover, it is necessary to exceed 2400 iterations to avoid oscillations at all time steps. The required memory is not affected (recall that in CG iteration only the last two residual vectors are needed), but the CPU time grows linearly with the iterations. Using IISD+ISP, the amount of memory needed to solve the interior direct problem and the preconditioning can be chosen (1500 dof's were considered for each interior sub-division). The residuals reach low tolerances in few iterations and the CPU time required for each time step is reduced (although both the local and the interface problems are solved). Moreover, it is not necessary to go beyond 50 iterations to obtain accurate solutions. Figure 3.38 shows the steady-converged solution obtained by stopping when the iteration count reaches 50 using IISD+ISP. The total CPU time consumed on average for each time step (i.e., predictor, pressure and projection steps) was 23.61 seconds for Global Iteration (GMRes in the predictor step, CG in the Poisson and projection steps) and 2.13 secs for the IISD+ISP solver (with one layer around the interface). The additive Schwarz scheme shows poor convergence in the residuals, but it is not necessary to go beyond 200 iterations to have an acceptable solution; this last scheme uses 26 secs per time step. The reader can refer to Reference [PS05] for a study of the performance of several preconditioners (including IISD+ISP and Neumann-Neumann preconditioners) applied to a Poisson problem.

Although the residuals for the IISD+ISP simulation are higher than those of 1000 iterations of Global CG (see Figure 3.37), the solution obtained with the domain decomposition method and preconditioning (IISD+ISP) is accurate enough, whereas that of Global Iteration is oscillatory. This behavior can be explained through a study of the error in residuals as the iteration proceeds.

Let b, u_k, u_0 ∈ R^N and A a non-singular matrix such that u* = A^{-1}b. Here b is the load vector and u_k the solution at iteration k. Since

r_k = b − A u_k = A u* − A u_k = A(u* − u_k) = −A e_k,   (3.48)

where e_k = u_k − u* is the error in the solution at iteration k, then

‖e_k‖ = ‖A^{-1} A e_k‖ ≤ ‖A^{-1}‖ ‖A e_k‖ = ‖A^{-1}‖ ‖r_k‖,   (3.49)

and

‖r_0‖ ≤ ‖A‖ ‖e_0‖,   (3.50)

thus

‖e_k‖/‖e_0‖ ≤ ‖A^{-1}‖ ‖r_k‖ / (‖A‖^{-1} ‖r_0‖) = κ(A) ‖r_k‖/‖r_0‖,   (3.51)


where κ(A) is the condition number of A and ‖·‖ is any suitable norm. The division by ‖r_0‖ and ‖e_0‖ in equation (3.51) normalizes the residuals. In Reference [SDP+03] it is shown that the condition number is κ(A) ∝ O(1/h²) for Global Iteration and κ(A) ∝ O(1/h) for Schur complement domain decomposition methods; moreover, IISD+ISP further reduces the latter condition number. Though the error in residuals for IISD+ISP (due to earlier stopping) is higher than the error in residuals for CG at a high iteration count, the factor that determines the error in the solution u_k is the distribution of the eigenvalues of the global matrix and its condition number κ (see equation (3.51)). If the solution of the Poisson step is not accurate, the error is propagated to the other steps and oscillations may occur. This problem is exacerbated by mesh refinement (i.e., as h → 0).
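The bound (3.51) can be illustrated with a small numerical experiment (Python/NumPy, illustrative data only, not the meshes of this section): the relative error in the solution may lag the relative residual by a factor as large as κ(A), which is why a residual 'plateau' on a badly conditioned Poisson matrix does not guarantee an oscillation-free solution.

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = B @ B.T + 50.0 * np.eye(50)        # a small SPD matrix standing in for the Poisson matrix
b = rng.standard_normal(50)
u_exact = np.linalg.solve(A, b)
kappa = np.linalg.cond(A)

u = np.zeros(50)                        # u_0 = 0
r0 = b - A @ u
e0 = u - u_exact
for k in range(1, 6):
    r = b - A @ u
    u = u + (r @ r) / (r @ (A @ r)) * r      # one steepest-descent step (stand-in for CG/GMRes)
    r_k, e_k = b - A @ u, u - u_exact
    lhs = np.linalg.norm(e_k) / np.linalg.norm(e0)
    rhs = kappa * np.linalg.norm(r_k) / np.linalg.norm(r0)
    print(f"k={k}:  ||e_k||/||e_0|| = {lhs:.2e}  <=  kappa(A) ||r_k||/||r_0|| = {rhs:.2e}")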

[Figure 3.38: Time-converged solution for the IISD+ISP solver (Re = 1000): x-velocity [m/sec] along the vertical centerline (y-coordinate [m]), compared with the data of Ghia et al.; IISD+ISP with Krylov dim. = 50.]

The primary

vortex center was computed to be at (x, y) = (0.531, 0.562) with the coordinate reference

system placed at the bottom left corner of the cavity. The IISD+ISP solver compares well (for the 50-iteration run) with the values reported by Ghia et al. The core memory used was 45 Mbytes/processor for Global Iteration, 60 Mbytes/processor for IISD+ISP and 55 Mbytes/processor for the additive Schwarz preconditioner. Recall that for CG iteration only the last two residual vectors are needed, so the memory does not increase with the number of iterations.

This example highlights another interesting use of the IISD+ISP solver. In fractional step-like flow solvers the Poisson step normally has the highest CPU-time consumption, and for ill-conditioned problems this step demands a lot of resources to achieve a good solution. It is difficult to know right from the start how large the Krylov subspace dimension of the conjugate gradient method has to be, and this example shows its strong influence on the final solution. Moreover, even with an unusually high Krylov subspace dimension of 1000-1500, the solution of the lid-driven square cavity is still unacceptable, making its usage too limited. With IISD+ISP it is possible to strongly reduce these requirements, which drastically improves the solution and makes the system solution less user-dependent.

Some Comments on the Scalability of the IISD+ISP Preconditioner

In article [PS05] the condition number for the preconditioned Poisson and Advection-

Diffusion problems was calculated theoretically for the Interface Strip preconditioner. In

this section the scalability of the Interface Strip preconditioner is studied by showing how

the number of iterations grows with the global size of the problem while keeping the size

of the problem in each processor constant. Stokes flow at Re = 0.01 with monolithic time

integration (time step used is ∆t = 0.1) is considered. The square cavity is divided into

(20·nproc) × (20·nproc) bilinear elements, with nproc being the number of processors. Inside each processor, the problem is further sub-divided into 4 sub-domains and the number of element layers around the interface is kept constant at nlay = 1. In Figure 3.39 the number of iterations needed to achieve a relative tolerance of 10^-8 in the residuals in a given

time step is shown. It may be observed that the number of iterations saturates for an

increasing problem size, ensuring the scalability of the preconditioner.

[Figure 3.39: Scalability properties: number of iterations vs. number of processors.]


3.2.7 The Wind Flow Around a 3D Immersed Body.

The AHMED Model

Current vehicle design needs a strong background in aerodynamics to improve flow control

via mechanical devices. The complexity involved in automobile design, especially due to the great number of accessories that define its geometry, makes the validation tasks unaffordable. The Ahmed model is a simple geometric body that retains the main flow features, especially the vortex wake flow where most of the drag is concentrated, and it is a good candidate to be used as a benchmark test. The flow regime of interest for

car designers is fully turbulent. So, a large eddy simulation (LES) turbulence model is

employed (see References [KD03, Sma63]). The aerodynamic forces on road vehicles are

the result of complex interactions between flow separations and the dynamic behavior

of the released vortex wake. The results obtained with two solvers (i.e., IISD+ISP and

Jacobi preconditioned Global GMRes methods) are compared to the detailed flow patterns

previously published by Ahmed and coworkers [ARF84].

The body geometry is defined in [ARF84]. The flow domain chosen is one in which the body of length L is suspended 0.05 m above the ground in a domain of 10L × 2L × 1.5L in the stream-wise (y), span-wise (x) and stream-normal (z) directions. The boundary conditions for this problem are: uniform flow at the inlet (given by the Reynolds number), slip condition on both sides, no-slip condition on the surface of the body and no-slip condition at the floor. An imposed pressure (zero in this case) is used as the outflow boundary condition.

Ahmed Body: Numerical Results for Very Low Reynolds Number

First, consider the steady Stokes flow (solved using the incompressible Navier-Stokes model at Re = 0.1) around the Ahmed body. The inlet (free stream) condition is Re = 0.1 with no transversal velocity, and the pressure, p = 0 atm, is imposed at the outflow wall (located far enough from the body). In this test an unstructured tetrahedral mesh is used in the

whole flow domain and a three-layer structured mesh of wedge (prismatic) elements is

built for capturing details at the boundary layers. The body surface mesh contains 90606

nodes while the boundary layer mesh has 180600 elements and the tetrahedral mesh has

1322876 elements. The tests were carried out on 15 processors. For the IISD+ISP solver, 2000 dof's were considered for each local sub-division.

The residual history for several Newton loops is shown in Figure 3.40. The calculated forces and moments for both solvers are shown in Figures 3.41 to 3.43. Clearly, IISD+ISP converges to the steady solution in one Newton iteration, whereas Global Iteration needs many more iterations to achieve convergence.


[Figure 3.40: Stokes flow. Residual history (max. of 100 Newton iterations): residual norm ‖r(n)‖/‖r(0)‖ vs. iteration count, for global GMRes (Jacobi preconditioner, Kdim = 300) and IISD+ISP (nlay = 1).]

[Figure 3.41: Stokes flow. Force and moment coefficients (CD and CY) vs. Newton iterations, for global GMRes (Jacobi prec., Kdim = 350) and IISD+ISP (nlay = 1).]

This behavior is directly related to the CPU time needed to obtain a converged solution in a simulation. The consumed CPU time and core memory requirements (on average) for each Newton iteration were: 185.1 secs and 443 Mbytes for Global GMRes with KSPdim = 300, and 64.18 secs and 588 Mbytes for IISD+ISP.

Ahmed Body: Numerical Results for High Reynolds Number

In this section the unsteady incompressible Navier-Stokes simulation for the flow around

the Ahmed body for Re = 1000 is shown. The same architecture, mesh and partitions of

the previous example were used. The initial state used is the steady converged solution of


[Figure 3.42: Stokes flow. Force and moment coefficients (CL and x-moment coefficient) vs. Newton iterations, for global GMRes (Jacobi prec., Kdim = 350) and IISD+ISP (nlay = 1).]

[Figure 3.43: Stokes flow. Force and moment coefficients (y-moment and z-moment coefficients) vs. Newton iterations, for global GMRes (Jacobi prec., Kdim = 350) and IISD+ISP (nlay = 1).]

the previous example (IISD+ISP case). The Smagorinsky model (with the Smagorinsky

parameter equal to 0.18) is used for the LES prediction of the turbulent effects.

The residual history for several time steps is shown in Figure 3.44. The calculated force and moment coefficients for both solvers (Re = 1000) are shown in Figures 3.45 to 3.47. The scale of the force and moment coefficient plots is dominated by the poor approximation and the oscillations in the solution obtained with the preconditioned Global GMRes iterations. The calculated forces with the IISD+ISP solver converge very fast to the values reported in the literature [ARF84].

The consumed CPU time and core memory requirements (on average) per time step in this test were: 186 secs and 460 Mbytes for the Global GMRes iteration with KSPdim = 300, and 114.5 secs and 630 Mbytes for the IISD+ISP preconditioner.

The 'friction lines' for the Navier-Stokes flow at Re = 4.25 · 10^6 are shown in Figure 3.48 (IISD+ISP case).


[Figure 3.44: Re = 1000. Residual history (100 time steps, 10 seconds of simulation): residual norm ‖r(n)‖/‖r(0)‖ vs. iteration count, for global GMRes (Kdim = 300, Jacobi prec.) and IISD+ISP (nlay = 1).]

[Figure 3.45: Re = 1000. Force and moment coefficients (CD and CY) vs. time steps, for global GMRes (Kdim = 300, Jacobi prec.) and IISD+ISP (nlay = 1).]

This 3D example shows that even when the final solution of the Global GMRes solver seems to be similar to that of IISD+ISP, the former needs more time steps or more Newton iterations to reach the final solution. This fact is highlighted in the Stokes flow example (Re = 0.1). For medium Reynolds numbers the Global GMRes iteration has a strongly oscillatory behavior until it reaches the correct solution, making the scheme more unstable under external perturbations.


[Figure 3.46: Re = 1000. Force and moment coefficients (CL and x-moment coefficient) vs. time steps, for global GMRes (Kdim = 300, Jacobi prec.) and IISD+ISP (nlay = 1).]

[Figure 3.47: Re = 1000. Force and moment coefficients (y-moment and z-moment coefficients) vs. time steps, for global GMRes (Kdim = 300, Jacobi prec.) and IISD+ISP (nlay = 1).]

Figure 3.48: Re = 4.25e6. Friction lines


3.3 Conclusions

This section emphasizes the quality and the efficiency of solver schemes for CFD problems.

Both criteria should be evaluated together to analyze the performance of a simulation.

Reasonable efficiency might not be very significant if the solution is not accurate enough

for the final purpose. Several examples presented in this section shed light on the patholo-

gies that may appear when solving large scale CFD problems by means of fully iterative

solvers with limited computational resources.

Numerical experiments of several physical (real) problems were carried out to show

their convergence properties, the computation time and memory requirements using both

monolithic and disaggregated schemes. These tests showed that it is not always possible

to obtain an acceptable solution for the problem using classical Global Krylov methods. Moreover, for some problems the Krylov dimension and the number of Newton iterations need to be enlarged to obtain an accurate solution, making their usage more user-dependent.

Domain Decomposition techniques, especially the Schur Complement domain decomposition using the Interface Strip Preconditioner, are suitable for achieving accurate solutions efficiently. In all cases, the performance of IISD+ISP is decisive when it comes to assigning computational resources to solve a time step in the simulation of a problem. Also, the ISP preconditioner is easy to construct, as it does not require any special calculation (it can be assembled with a subset of sub-domain matrix coefficients). It is much less memory-consuming than classical preconditioners such as Neumann-Neumann, as shown in References [SDP+03] and [PS05]. Moreover, it permits deciding how much memory to assign for preconditioning purposes.

The ISP preconditioner is well suited for flows at high Reynolds numbers, where the contribution of the advective terms is predominant in the governing equations, while it is also capable of handling diffusion-dominated regions well. Furthermore, IISD+ISP is a good alternative for treating problems where the domain discretization presents strong refinement gradients.

Part II

Applications and Usage


Chapter 4

Dynamic Boundary Conditions in

CFD

No one is anyone,

one single immortal man is all men.

Like Cornelius Agrippa,

I am god, I am hero,

I am philosopher, I am demon

and I am world,

which is a tedious way of saying

that I do not exist.

‘The Immortal’, Jorge Luis Borges

The number and type of boundary conditions to be used in the numerical modeling

of fluid mechanics problems is normally chosen according to a simplified analysis of the

characteristics, and also from the experience of the modeler. The problem is harder at

input/output boundaries which are, in most cases, artificial boundaries, so that a bad de-

cision about the boundary conditions to be imposed may affect the precision and stability

of the whole computation. For inviscid flows, the analysis of the sense of propagation

in the normal direction to the boundaries gives the number of conditions to be imposed



and, in addition, the conditions that are ‘absorbing’ for the waves impinging normally to

the boundary. In practice, it amounts to counting the number of positive and negative

eigenvalues of the advective flux Jacobian projected onto the normal. The problem is

still harder when the number of incoming characteristics varies during the computation,

and the correct treatment of these cases poses both mathematical and practical problems.

One example considered here is a compressible flow where the flow regime at a certain

part of an inlet/outlet boundary can change from subsonic to supersonic and the flow

can revert. In this chapter the technique for dynamically imposing the correct number of

boundary conditions along the computation, using Lagrange multipliers and penalization,

is discussed and several numerical examples are presented.

4.1 Introduction

Deciding how many and which boundary conditions to impose at each part of an artificial

boundary is often a difficult problem. This decision is taken from the number of incoming

characteristics n+ and the quantities known for each problem. On one hand, if the number

of conditions imposed on the boundary is in excess they are absorbed through spurious

shocks at the boundary. On the other hand if fewer conditions are imposed, then the

problem is mathematically ill posed. Even if the number of imposed boundary conditions

is correct, this does not guarantee that the boundary conditions are non-reflective.

When dealing with models in infinite domains one has to introduce an artificial boundary placed as far as possible from the region of interest. The simplest choice is to impose

a boundary condition assuming that the flow far from the region of interest is undis-

turbed. However, one has the freedom of choosing the boundary condition so as to give

the best solution for a given position of the boundary. Boundary conditions that tend to

give the solution as if the domain were infinite are called generally ‘absorbing’ (ABC) or

‘non-reflective’ (NRBC). ABC’s tend to give a better solution for a given position of the

artificial boundary or, in other words, they allow one to put the artificial boundary closer to

the region of interest for a given admissible error. Of course, the advantage of putting the

artificial boundary closer to the region of interest is the reduction in computational cost.

However, in some cases, like for instance the solution of the Helmholtz equation on exte-

rior domains, using absorbing boundary conditions is required since using a non absorbing

boundary condition (like Dirichlet or Neumann) may lead to a lack of convergence of the

problem, because these conditions are completely reflective and therefore wave energy is

trapped in the domain, producing false resonance modes.

There are basically two approaches for the design of ABC’s, global and local. Global


boundary conditions are usually more accurate but expensive. In the limit, a global ABC

may reproduce the effect of the whole external problem onto the boundary, i.e., even

maintaining a fixed position of the artificial boundary the ABC may give a convergent

solution while refining the interior mesh. In general these ABC's are non-local, i.e., their discrete operator is a dense matrix. Global boundary conditions exist and are popular for

the simpler linear operators, like potential flow problems and frequency domain analysis of

wave problems, like the Helmholtz equations for acoustics or the Maxwell equations[GK90,

GK89, BR92, HH92, SDEI97, Hag87].

The discrete operator for local absorbing boundary conditions is usually sparse but has a lower order of accuracy and, in general, it is necessary to bring the artificial boundary to infinity while refining the mesh in order to make the whole algorithm convergent. These kinds of ABC's are popular for more complex non-linear fluid dynamic problems, like the compressible or incompressible Navier-Stokes equations or the inviscid Euler equations.

An excellent review has been written by Tsynkov [Tsy98].

In order to have an ABC, not just any n+ conditions must be imposed at the boundary, but exactly those n+ corresponding to the incoming characteristics. This can be deter-

mined through an eigenvalue decomposition problem of the advective flux Jacobian at the

boundary.

In many cases the number of incoming characteristics may change during the compu-

tation, for instance in compressible flow it is common that the flow goes from subsonic

to supersonic in certain parts of the outlet boundary. In 3D this means passing from one

imposed boundary condition to none.

In more complex problems one can go through the whole possible combinations of

regimes: subsonic inlet, supersonic inlet, subsonic outlet, supersonic outlet. A typical

case where this can happen is the free fall of a blunt symmetrical object like an ellipse,

for instance. If the body starts from rest, it will initially accelerate and, depending on

the size and relation between the densities of the body and the surrounding atmosphere

it may reach the supersonic regime. As the body falls, even at subsonic speeds, its angle

of attack tends to increase until eventually it stalls; the body then falls towards its rear part, repeating the process in a characteristic movement that recalls the fall of tree leaves. During the fall, the speed of the object varies periodically, accelerating when the angle of attack is small and the body experiences little drag, and decelerating when the angle of

attack is large. For a supersonic fall the regime may change from supersonic to subsonic

and back during the fall. In addition, if the problem is solved in a system of coordinates

attached to the body, the unperturbed flow may come from every direction relative to

the body’s axis. In this way the regime and direction of the flow at a given point of the


boundary may change through the whole possible combinations.

Another example is the modeling of the ignition of a rocket exhaust nozzle. In this

case the condition at the outlet boundary changes from rest to supersonic flow as the

shock produced at the throat reaches the exterior boundary.

For transport of scalars this behavior may happen if the transport velocity varies in

time and the flow gets reverted at the boundary. One such situation is when modeling the

transport of a scalar like smoke or contaminant concentration in a building with several

openings under an exterior wind. Assume that the concentration of solid particles or

contaminant is so low that its influence on the fluid is negligible so that we can solve

first the movement of the fluid inside the building and then a transport equation for the

scalar, taking the velocity of the fluid as the transport velocity. As the flow in the interior

fluctuates, the normal component of velocity at a given opening may reverse direction.

The change of the number of imposed boundary conditions at a given point of the

boundary is hard to implement from the computational point of view since it involves

the change of the structure of the Jacobian matrix. The solution proposed here is to

impose these conditions through Lagrange multipliers or penalization techniques. The

main objective of this section is to explain how these variable boundary conditions may

be implemented through Lagrange multipliers or penalization techniques, to discuss nu-

merical aspects related to the use of these techniques, to discuss specific issues relative to

the physical problems described above, and to show some numerical examples.

4.2 General Advective-Diffusive Systems of Equations

Consider an advective diffusive system of equations in conservative form

∂H(U)/∂t + ∂F_{c,j}(U)/∂x_j = ∂F_{d,j}(U,∇U)/∂x_j + G.   (4.1)

Here U ∈ IRn is the state vector, t is time, Fc,j,Fd,j are the advective and diffusive fluxes

respectively, G is a source term including, for instance, gravity acceleration or external

heat sources, and xj are the spatial coordinates.

The notation is standard, except perhaps for the ‘generic enthalpy function’ H(U).

The inclusion of the enthalpy function allows the inclusion of conservative equations in

terms of non-conservative variables. Some well-known advective diffusive systems of equa-

tions may be cast in this general setting as follows.


4.2.1 Linear Advection-Diffusion Model

For instance, the heat advection-diffusion equation in terms of temperature can be put in

this form through the definitions

U = T,
H(U) = ρ C_p T,
F_{c,j}(U) = ρ C_p T u_j,
F_{d,j}(U,∇U) = −q_j = k ∂T/∂x_j,   (4.2)

where ρ is the density, Cp the specific heat, u a given velocity field, T is the temperature

(the unknown field), q the heat flux vector and k the thermal conductivity of the medium.
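As a concrete illustration, the sketch below (Python/NumPy, with assumed water-like material data that are not part of the text) packs the definitions (4.2) into small helper functions.

import numpy as np

# Sketch (assumed, not thesis code) of the Sec. 4.2.1 specialization: the heat
# advection-diffusion equation written in the generic conservative form (4.1).
rho, Cp, k = 1000.0, 4186.0, 0.6           # illustrative, water-like material data
u = np.array([0.1, 0.0])                   # prescribed velocity field at a point

def H(T):                                  # 'generic enthalpy', H(U) = rho*Cp*T
    return rho * Cp * T

def F_c(T):                                # advective flux, rho*Cp*T*u_j
    return rho * Cp * T * u

def F_d(gradT):                            # diffusive flux, -q_j = k dT/dx_j
    return k * np.asarray(gradT)

print(H(20.0), F_c(20.0), F_d([1.0, 0.0]))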

4.2.2 Gas Dynamic Equations

Gas dynamic equations of a compressible flow can be put in conservative form with the

following definitions

U_p = [ρ, u, p]^T,
U = U_c = [ρ, ρu, ρe]^T,
H(U) = U,
F_{c,j} n_j = [ ρ(u·n),  ρu(u·n) + p n,  (ρe + p)(u·n) ]^T,
F_{d,j}(U,∇U) n_j = [ 0,  T·n,  T_{ik} u_k n_i − q_i n_i ]^T.   (4.3)

Note that even if the equations are put in terms of conservative variables, the diffusive and convective fluxes are expressed in terms of the primitive variables U_p = [ρ, u, p]^T. However, the fluxes can be thought of as implicitly depending on the conservative variables, since the relation U_c(U_p) is one to one. Now, the conservation equations can also be written in terms of any other set of variables, for instance the primitive variables, if we introduce the 'enthalpy function' H(U_p) = U_c(U_p).
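A minimal sketch of this one-to-one map, assuming a perfect gas with ρe = p/(γ − 1) + ρ|u|²/2 (a closure not spelled out in this section), could look as follows.

import numpy as np

gamma = 1.4   # assumed perfect-gas adiabatic index

def primitive_to_conservative(rho, u, p):
    """Map Up = [rho, u, p] to Uc = [rho, rho*u, rho*e] for a perfect gas."""
    u = np.asarray(u, dtype=float)
    rho_e = p / (gamma - 1.0) + 0.5 * rho * (u @ u)   # total energy per unit volume
    return np.concatenate(([rho], rho * u, [rho_e]))

def conservative_to_primitive(Uc):
    rho, rho_u, rho_e = Uc[0], Uc[1:-1], Uc[-1]
    u = rho_u / rho
    p = (gamma - 1.0) * (rho_e - 0.5 * rho * (u @ u))
    return rho, u, p

# round trip for the reference state used in the 1D example of Sec. 4.4.2
rho, u, p = conservative_to_primitive(primitive_to_conservative(1.0, [0.5], 0.714))
print(rho, u, p)   # 1.0 [0.5] 0.714 (up to round-off)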


4.2.3 Shallow Water Equations

The shallow water equations describe the open flow of fluids over regions whose characteristic

dimensions are much larger than the depth. In this case

U_p = [h, u]^T,
U = U_c = [h, hu]^T,
H(U) = U,
F_{c,j} n_j = [ h(u·n),  h(u·n)u + (1/2) g h² n ]^T,   (4.4)

where h is the fluid depth, u the velocity vector, Up,Uc the primitive and conservative

variables and g the gravity acceleration. We assume that the height of the bottom with

respect to a fixed datum is constant. If this is not so, additional terms must be included

in the source term G, but this is irrelevant for the absorbing boundary condition issue.

4.2.4 Channel Flow

Flow in a channel can be cast in advective form as follows

U_p = [h, u]^T,
U = U_c = [A, Q]^T,
H(U) = U,
F = [ Q,  Q²/A + F(h) ]^T,   (4.5)

where h and u are water depth and velocity (as in the shallow water equations). Here

A(h) is the section of the channel occupied by water for a given water height h and then

defines the geometry of the channel. For instance

• Rectangular channels: A(h) = wh, w=width.

• Triangular channels: A(h) = 2h2 tan θ/2; with θ=angle opening.

• Circular channels:

A(h) = ∫_{h′=0}^{h} 2√(2Rh′ − h′²) dh′ = θR² − w(h)(R − h)/2,   (4.6)

where R is the radius of the channel, w(h) = 2√(2Rh − h²) is the waterline width for a given water height and θ = atan[w/(2(R − h))] is the angular aperture.


The variable Q = Au is the water flow rate and F (h) is a function defined by

F(h) = ∫_{h′=0}^{h} A(h′) dh′.   (4.7)

Again, for the sake of simplicity, we restrict the analysis to the case of constant channel

section and channel depth. In more general situations additional terms appear, which can be included in the source and diffusive terms, but they are not needed for the discussion of absorbing boundary conditions. For rectangular channels the equations reduce to those of the one-dimensional shallow water equations. Channel flow is very interesting since it is in fact a

family of different 1D hyperbolic systems depending on the area function A(h).
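As an illustration of how the area function A(h) determines the channel model, the following sketch (Python with NumPy/SciPy, assumed helpers rather than thesis code) evaluates A(h) and F(h) for rectangular and circular sections, checking the closed form of (4.6) against a direct quadrature of the waterline width.

import numpy as np
from scipy.integrate import quad

def area_rectangular(h, w):
    """A(h) for a rectangular channel of width w."""
    return w * h

def waterline_circular(h, R):
    """Waterline width w(h) for a circular channel of radius R, 0 <= h <= 2R."""
    return 2.0 * np.sqrt(2.0 * R * h - h * h)

def area_circular(h, R):
    """Wetted area, equivalent to the closed form in (4.6) (the arccos form also covers h > R)."""
    return R * R * np.arccos((R - h) / R) - (R - h) * np.sqrt(2.0 * R * h - h * h)

def F_of_h(area, h, *args):
    """F(h) = int_0^h A(h') dh' of equation (4.7), by numerical quadrature."""
    return quad(lambda hp: area(hp, *args), 0.0, h)[0]

R, h = 1.0, 0.8
A_quad = quad(lambda hp: waterline_circular(hp, R), 0.0, h)[0]   # A(h) = int_0^h w(h') dh'
print(np.isclose(A_quad, area_circular(h, R)))                    # True
print(F_of_h(area_circular, h, R))                                # F(h) for the circular channel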

Figure 4.1 shows the absorption of an initial perturbation in the water height when

the two resulting impinging waves reach the artificial boundaries. The Figure shows (from

left-top to right-bottom) the water height at times 0.0, 2.0, 3.0, 5.0, 5.5 and 5.95 secs,

respectively.

4.3 Variational Formulation

The weighted variational form for this kind of systems is to find Uh ∈ Sh such that, for

every Wh ∈ Vh,∫Ω

Wh ·(∂H(Uh)

∂t+∂Fc,j

∂xj

−G

)dΩ +

∫Ω

∂Wh

∂xj

Fd,j dΩ−∫

Γh

Wh ·Hh dΓ

+

nelem∑e=1

∫Ω

τe ATk

∂Wh

∂xk

·(∂H(U)

∂t+∂Fc,j(U)

∂xj

− ∂Fd,j(U,∇U)

∂xj

−G

)dΩ = 0,

(4.8)

whereSh =

Uh|Uh ∈ [H1h(Ω)]m, Uh

∣∣Ωe ∈ [P 1(Ωe)]m, Uh = g at Γg

Vh =

Wh|Wh ∈ [H1h(Ω)]m, Uh

∣∣Ωe ∈ [P 1(Ωe)]m, Uh = 0 at Γg

(4.9)

are the spaces of interpolation and weight functions, respectively, τ_e are stabilization parameters (a.k.a. 'intrinsic times'), Γ_g is the Dirichlet part of the boundary where U = g is

imposed, and Γh is the Neumann part of the boundary where Fd,jnj = H is imposed.

4.4 Absorbing Boundary Conditions

For steady simulations using time-marching algorithms, it can be shown that the error

going towards the steady state propagates like waves, so that absorbing boundary con-

ditions help to eliminate the error from the computational domain. In fact, it can be


Figure 4.1: Shallow water flow and wave absorption at artificial boundaries


shown that for strongly advective problems, absorption at the boundaries is usually the

main mechanism of error reduction (the other mechanism is physical or numerical dis-

sipation in the interior of the computational domain). It has been shown that in such

cases the rate of convergence can be directly related to the ‘transparency’ of the boundary

condition[BSI92].

In general, absorbing boundary conditions are based on an analysis of the characteristic

waves. A key point is to determine which of them are incoming and which are outgoing.

Absorbing boundary conditions exist from the simplest first order ones based on a plane

wave analysis at a certain smooth portion of the boundary (as will be described below),

to the more complex ones that tend to match a full analytic solution of the problem in

the exterior region with the internal region.

In this part of the thesis we will concentrate on the usage of absorbing boundary conditions in situations where the conditions at the boundary change, so that the number of incoming and outgoing characteristic waves varies during the temporal evolution of the

problem, or even when the conditions at the boundary are not well known a priori.

4.4.1 Advective-Diffusive Systems in 1D

Consider a pure advective system of equations in 1D, i.e., Fd,j ≡ 0

∂H(U)/∂t + ∂F_{c,x}(U)/∂x = 0,   in [0, L].   (4.10)

If the system is ‘linear’, i.e., Fc,x(U) = AU, H(U) = CU (A and C do not depend on

U), then we obtain a first order linear system

C ∂U/∂t + A ∂U/∂x = 0.   (4.11)

The system is 'hyperbolic' if C is invertible and C^{-1}A is diagonalizable with real eigenvalues.

If this is the case, we can make the following eigenvalue decomposition for C−1A

C−1A = SΛS−1, (4.12)

where S is real and invertible and Λ is real and diagonal. If we define new variables

V = S−1U, then (4.11) becomes

∂V/∂t + Λ ∂V/∂x = 0.   (4.13)

Now, each equation is a linear scalar advection equation

∂v_k/∂t + λ_k ∂v_k/∂x = 0,   (no summation over k).   (4.14)

Here vk are the ‘characteristic components’ and λk are the ‘characteristic velocities’ of

propagation.


4.4.2 Linear 1D Absorbing Boundary Conditions

Assuming λk 6= 0, the absorbing boundary conditions are, depending on the sign of λk,

if λ_k > 0:  v_k(0) = v_{k0};  no boundary condition at x = L,
if λ_k < 0:  v_k(L) = v_{kL};  no boundary condition at x = 0.   (4.15)

This can be put in compact form as

Π^+_V (V − V_0) = 0,  at x = 0,
Π^−_V (V − V_L) = 0,  at x = L,   (4.16)

where Π±V are the projection matrices onto the right/left-going characteristic modes in the

V basis,

Π^+_{V,jk} = { 1  if j = k and λ_k > 0;  0  otherwise },    Π^+ + Π^− = I.   (4.17)

It can be easily shown that they are effectively projection matrices, i.e., Π±Π± = Π± and

Π+Π− = 0. Coming back to the boundary condition at x = L in the U basis, we have

Π^−_V S^{-1}(U − U_L) = 0,   (4.18)

or, multiplying by S at the left

Π^±_U (U − U_{0,L}) = 0,   at x = 0, L,   (4.19)

where

Π^±_U = S Π^±_V S^{-1},   (4.20)

are the projection matrices in the U basis. These conditions are completely absorbing for

1D linear advection system of equations (4.11).

The rank of Π+ is equal to the number n+ of positive eigenvalues, i.e., the number of

right-going waves. Recall that the right-going waves are incoming at the x = 0 boundary

and outgoing at the x = L boundary. Conversely, the rank of Π− is equal to the number

n− of negative eigenvalues, i.e., the number of left-going waves (incoming at x = L and

outgoing at the x = 0 boundary).
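The construction of Π^± is a direct eigendecomposition. The sketch below (Python/NumPy, an assumed illustration using the 1D shallow water flux Jacobian, which is not worked out explicitly in the text) builds both projectors at a subcritical state and applies the absorbing condition (4.19) at x = L.

import numpy as np

# 1D shallow water system, U = [h, hu]; flux Jacobian A = [[0, 1], [g*h - u**2, 2*u]]
# with eigenvalues u - c and u + c, c = sqrt(g*h).  Here C = I since H(U) = U.
g, h, u = 9.81, 1.0, 0.5        # subcritical state: one right-going and one left-going wave
A = np.array([[0.0, 1.0],
              [g * h - u * u, 2.0 * u]])

lam, S = np.linalg.eig(A)       # eigenvalues are real for this hyperbolic system
lam, S = lam.real, S.real
Sinv = np.linalg.inv(S)
Pi_plus  = S @ np.diag((lam > 0).astype(float)) @ Sinv   # right-going (incoming at x = 0)
Pi_minus = S @ np.diag((lam < 0).astype(float)) @ Sinv   # left-going  (incoming at x = L)

# they are complementary (oblique) projectors
print(np.allclose(Pi_plus @ Pi_plus, Pi_plus),
      np.allclose(Pi_plus @ Pi_minus, np.zeros((2, 2))),
      np.allclose(Pi_plus + Pi_minus, np.eye(2)))

# first-order absorbing condition (4.19) at x = L: impose Pi_minus (U - U_L) = 0
U, U_L = np.array([1.02, 0.53]), np.array([1.0, 0.5])
print(Pi_minus @ (U - U_L))     # the component that must be driven to zero at the boundary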

Numerical Example. 1D Compressible Flow

We consider the solution of 1D compressible flow in 0 ≤ x ≤ L = 4. The unperturbed

flow has a Mach number of 0.5 and at t = 0 there is a perturbation in the form of a

Gaussian as follows

U(x, t = 0) = U_ref + ∆U e^{−(x−x_0)²/σ²},   (4.21)


where ρref = 1, uref = 0.5, pref = 0.714, (Maref = 0.5) ∆ρ = ∆p = 0, ∆u = 0.1, R = 1,

x0 = 0.8 and σ = 0.3. The evolution of this perturbation is simulated using N = 50 equal-

spaced finite elements (h = L/N = 0.08) with SUPG stabilization and Crank-Nicolson

temporal scheme with ∆t = 0.05 (CFL number ≈ 0.84). As the flow is subsonic we

have to impose two conditions at inlet and one at outlet. We will compare the results

using standard and absorbing boundary conditions at outlet (x = L), while imposing

non-absorbing ρ = ρref and u = uref at inlet (x = 0). In Figure 4.2 we see the evolution in

time (in the form of an elevation view) of the velocity when using the condition p = pref

at outlet, while in Figure 4.4 we see the results when using first order linear absorbing

boundary conditions based on the unperturbed state. On one hand, we see that without

absorbing boundary condition the perturbation reflects at both boundaries. Even after

t = 40 a significant amount of perturbation is still inside the domain. At this point the

perturbation has reflected four times at the boundaries. On the other hand, when using

the absorbing boundary condition the perturbation is almost completely absorbed after

it hits the outlet boundary. Note that the absorption is performed in two steps. First the

perturbation splits in two components, one propagating downstream an another upstream.

The first hits the outlet boundary and is absorbed, the other travels backwards, reflects

at the inlet boundary and then travels to the outlet boundary, where it hits at t = 4.5.

This shows that in 1D a single absorbing boundary is enough to obtain a strong dissipation of energy.

Figure 4.2: Temporal evolution of axial velocity in 1D gas dynamics problem without

absorbing boundary condition at outlet


Figure 4.3: Temporal evolution of axial velocity in 1D gas dynamics problem with ab-

sorbing boundary condition at outlet

4.4.3 Multidimensional Problems

For multidimensional problems we can make a simplified 1D analysis in the normal direc-

tion to the local boundary and with the flux Jacobian A in equation (4.12) replaced with

its projection onto the exterior normal n, as follows

Π^−_n (U − Ū) = 0,
Π^−_n = S_n Π^−_{Vn} S_n^{-1},
(Π^−_{Vn})_{jk} = { 1  if j = k and λ_j < 0;  0  otherwise },
C^{-1} A_n = S_n Λ_n S_n^{-1},   (Λ_n diagonal),
A_n = A_l n_l.   (4.22)

These conditions are perfectly absorbing for perturbations reaching the boundary normal

to the surface. For perturbations not impinging normally, the condition is partially ab-


[Figure 4.4: Rate of convergence of the 1D gas dynamics problem with and without absorbing boundary conditions: ‖∆U‖ vs. time.]

[Figure 4.5: Rate of convergence of the 1D gas dynamics problem in the full non-linear regime with different kinds of absorbing boundary conditions (absorbing linear, absorbing non-linear, p = cnst non-absorbing): ‖∆U‖ vs. time step.]


sorbing, with a reflection coefficient that increases from 0 at normal incidence to 1 for

tangential incidence.

4.4.4 Absorbing Boundary Conditions for Nonlinear Problems

If the problem is non-linear, such as the gas dynamics or shallow water equations, then the flux Jacobian A is a function of the state of the fluid, and the same happens for the

projection matrices Π±. If we can assume that the flow is composed of small perturbations

around a reference state Uref , then we can compute the projection matrix at the state

Uref

Π^−_n(U_ref) (U − U_ref) = 0.   (4.23)

However, as long as the fluid state departs from the reference value the condition becomes

less and less absorbing.

Numerical Example. Varying Section Compressible 1D Flow

Consider a one-dimensional flow in a tube with a contraction of 2:1. The inlet Mach

number is 0.2 and the variation of area along the tube axis is

A(x) = A_0 ( 1 − C tanh((x − L_x/2)/L_c) ),   (4.24)

where A0 is some (irrelevant) reference area, C is a constant given by C = (α−1)/(α+1),

α = Ain/Aout is the area ratio and Lc = 0.136 is a parameter controlling the width of

the transition. We impose ρ and u at the inlet and consider different outlet conditions,

namely

• non-absorbing, p =cnst,

• absorbing linear (see (4.19)), and

• absorbing non-linear (see (4.23)).

In Figures 4.4 and 4.5 we see the evolution in time of the state vector increment ‖∆U‖ for different absorbing and non-absorbing boundary conditions.

4.4.5 Riemann Based Absorbing Boundary Conditions

Suppose that we take for a small interval t ≤ t′ ≤ t + ∆t the state U(t) as the reference

state, then, during this interval we can take Π−(U(t)) as the projection operator onto

the incoming characteristics and the absorbing boundary conditions are


Figure 4.6: Georg Friedrich Bernhard Riemann (1826–1866)

Π−(U(t)) (U(t′)−U(t)) = 0. (4.25)

But regarding the equivalent expression (4.18) we can see that it can be written as

lj(U) · dU = 0, if λj < 0, (4.26)

where l_j is the j-th left eigenvector of the normal flux Jacobian. Note that, as l_j is a

function of U, this is a differential form on the variable U. If it happens that this is a

exact differential, i.e.,

µ(U) lj(U) · dU = dwj(U), (4.27)

for some non-linear function wj and an ‘integration factor’ µ(U), then we could impose

wj(U) = wj(Uref), (for wj an incoming char.) (4.28)

which would be an absorbing boundary condition for the whole non-linear regime. The functions w_j are often referred to as 'Riemann invariants' (RI) for the flux function.

For the 2D shallow water equations the Riemann invariants are well known (see Ref-

erence [San01]). For 1D channel flow, Riemann invariants are known for a few channel

shapes (rectangular and triangular). For general channel sections they are not known and, in addition, there is no general numerical method for computing them. They could

be computed by numerical integration of (4.27) along a path in state space, but the

integration factor is not known.

Riemann invariants are known for the shallow water equations,

w_± = u·n ± 2√(gh),   (4.29)


and for channel flow they are known only for rectangular and triangular channel shapes.

For the triangular case the RI are

w_± = u·n ± 4√(gh).   (4.30)

For the gas dynamics equations, the well known Riemann invariants are invariant only

under isentropic conditions, so that they are not truly invariant. They are

w_± = u ± 2c/(γ − 1).   (4.31)
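The following small helpers (Python/NumPy, assumed rather than thesis code) evaluate (4.29) and (4.31); for the reference state used later in the ULSAR example of Section 4.4.6 they give w_± ≈ 0.2 ± 5.0, consistent with the plateau values plotted in Figure 4.7.

import numpy as np

def riemann_invariants_shallow_water(un, h, g=9.81):
    """w_± = u·n ± 2 sqrt(g h) for the shallow water equations, eq. (4.29)."""
    return un + 2.0 * np.sqrt(g * h), un - 2.0 * np.sqrt(g * h)

def riemann_invariants_gas(u, p, rho, gamma=1.4):
    """w_± = u ± 2c/(gamma - 1), eq. (4.31); exactly invariant only under isentropic conditions."""
    c = np.sqrt(gamma * p / rho)           # speed of sound
    return u + 2.0 * c / (gamma - 1.0), u - 2.0 * c / (gamma - 1.0)

# reference state rho = 1, u = 0.2, p = 0.714 (Ma_ref = 0.2) of Sec. 4.4.6
w_plus, w_minus = riemann_invariants_gas(0.2, 0.714, 1.0)
print(w_plus, w_minus)   # approx. 5.2 and -4.8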

4.4.6 Absorbing Boundary Conditions Based on Last State

While integrating the discrete equations in time, we can take the state of the fluid in the

previous state as the reference state

Π−(Un) (Un+1 −Un) = 0. (4.32)

It is clear that the assumption of linearization is well justified, since in the limit of ∆t→ 0

we should have Un+1 ≈ Un. In fact, (4.32) is equivalent, for ∆t → 0 to (4.26), so that

if Riemann invariants exist, then this scheme preserves them in the limit ∆t → 0 and

∆x→ 0. We call this strategy ULSAR (for Use Last State As Reference).

However, if this scheme is used on the whole boundary, then the flow in the domain is determined only by the initial condition, and it can drift in time due to numerical errors. Also, if we look for a steady state at a certain regime, there is no way to guarantee that that regime will be obtained. For instance, if we want to obtain the steady flow around an aerodynamic profile at a certain Mach number, then we can set the initial state to an unperturbed constant flow at those conditions, but we cannot assure that the final steady flow will preserve that Mach number. In practice we often use a mix of the strategies, with linear boundary conditions imposed at the inlet regions and absorbing boundary conditions based on the last state at the outlet regions.

Numerical Example. ULSAR Strategy Keeps RI Constant

Consider a 1D compressible flow example, as in §4.4.2, with ρref = 1, uref = 0.2, pref =

0.714, (Maref = 0.2), ∆ρ = ∆p = 0, ∆u = 0.6, R = 1, x0 = 0.5L = 2 and σ = 0.3. Note

that this represents a perturbation in velocity that goes from Ma =0.2 to 0.8, so that full

non-linear effects are evidenced. The evolution of this perturbation is simulated using

N = 200 equal-spaced finite elements (h = L/N = 0.08) with SUPG stabilization and

Crank-Nicolson temporal scheme with ∆t = 0.02 (CFL number ≈ 1.2). All values are


made nondimensional by selecting L, ρref and uref as reference values for length, density

and velocity. Absorbing boundary conditions based on the ULSAR strategy are applied at

both ends x = 0, L. The values of the Riemann invariants (4.31) are computed there and

they are plotted in Figure 4.7. It can be seen that the incoming RI (the right going w+) is

kept approximately constant at the left boundary x = 0 and the same happens, mutatis

mutandis, at the other boundary x = L. Convergence history is shown in Figure 4.8.

Note that absorption is very good, despite the full non-linear character of the flow.

[Figure 4.7: Riemann invariants at the boundaries with ULSAR ABC's: left-going RI (w−) and right-going RI (w+) vs. time, at x = 0 and x = L.]

4.4.7 Imposing Nonlinear Absorbing Boundary Conditions

In this section we discuss how the absorbing boundary conditions can be integrated in a

numerical code. For linear systems, the discrete version of equation (4.11) is of the form

C (U_0^{n+1} − U_0^n)/∆t + A (U_1^{n+1} − U_0^n)/h = 0,
C (U_k^{n+1} − U_k^n)/∆t + A (U_{k+1}^{n+1} − U_{k−1}^n)/(2h) = 0,   k ≥ 1,   (4.33)


[Figure 4.8: Convergence history when using ULSAR ABC's: ‖∆U‖ vs. time [sec].]

where U_k^n is the state at grid point k at time t^n = n∆t. We assume a constant mesh step size h, i.e., x_k = kh, and a boundary at the mesh node x_0 = 0. Many simplifications have been made here: there are no source or upwind terms, and a simple discretization based on centered finite differences was used. Alternatively, it can be thought of as a pure Galerkin FEM discretization with mass lumping. If the projector onto incoming waves Π^+_U has

Figure 4.9: Boris Grigorievich Galerkin (1871–1945)

rank n^+ = n, then Π^+_U = I and the absorbing boundary condition reduces to U = U_ref (U_ref being a given value, or U_0^n for ULSAR). This happens, for instance, at a supersonic inlet for gas dynamics or at an inlet boundary for linear advection. In this case we simply

replace the balance equation for the boundary node (the first equation in (4.33)) with the

absorbing condition U = Uref , keeping the balance between equations and unknowns.


Conversely, if the projector onto incoming waves Π^+_U has rank n^+ = 0, then Π^+_U = 0 and the absorbing boundary condition reduces to not imposing anything. This happens

for instance in a supersonic outlet for gas dynamics or an outlet boundary for linear

advection. In this case we simply discard the absorbing condition U = Uref . Again the

number of equations and unknowns is maintained.

The case is more complicated when 0 < n+ < n. We cannot simply add the absorbing

condition (either (4.19), (4.28) or (4.32)), because we can neither discard the boundary

balance equation nor keep it.

There are at least two strategies for imposing this non-linear boundary condition.

One is to replace the boundary balance equation for the outgoing waves with a null

first derivative condition. Then a discrete version can be generated with finite difference

approximations. (This requires, however, a structured mesh at least near the boundary).

The other is to resort to the use of Lagrange multipliers or penalization techniques. One

advantage of using Lagrange multipliers or penalization is that not only the boundary condition coefficients can easily be changed for non-linear problems, but also the number of imposed boundary conditions. This is important for problems where the number of incoming characteristics cannot be easily determined a priori, or for problems where the flow regime changes from subsonic to supersonic, or the flow reverses. In the rest of

this section we will describe in detail this second strategy.

In the base of the characteristic variables V, (4.33) can be written as

Vn+10 −Vn

0

∆t+ Λ

Vn+11 −Vn

0

h= 0,

Vn+1k −Vn

k

∆t+ Λ

Vn+1k+1 −Vn

k−1

2h= 0, k ≥ 1.

(4.34)

For the linear absorbing boundary conditions (4.19) we should impose

Π^+_V(V_ref) (V_0 − V_ref) = 0,   (4.35)

while discarding the equations corresponding to the incoming waves in the first rows

of (4.34). Here Uref/Vref is the state about which we make the linearization.

Using Lagrange Multipliers

This can be done via Lagrange multipliers in the following way:

Π^+_V(V_ref)(V_0 − V_ref) + Π^−_V(V_ref) V_lm = 0,
(V_0^{n+1} − V_0^n)/∆t + Λ (V_1^{n+1} − V_0^n)/h + Π^+_V(V_ref) V_lm = 0,
(V_k^{n+1} − V_k^n)/∆t + Λ (V_{k+1}^{n+1} − V_{k−1}^n)/(2h) = 0,   k ≥ 1,   (4.36)


where Vlm are the Lagrange multipliers for the imposition of the new conditions. Note

that, if j is an incoming wave (λj ≥ 0), then the equation is of the form

v_{j0} − v_{j,ref} = 0,
(v_{j0}^{n+1} − v_{j0}^n)/∆t + λ_j (v_{j1}^{n+1} − v_{j0}^n)/h + v_{j,lm} = 0,
(v_{jk}^{n+1} − v_{jk}^n)/∆t + λ_j (v_{j,k+1}^{n+1} − v_{j,k−1}^n)/(2h) = 0,   k ≥ 1.   (4.37)

Note that, due to the vj,lm Lagrange multiplier, we can solve for the vjk values from the

first and last rows, while the value of the multiplier vj,lm ‘adjusts’ itself in order to satisfy

the equations in the second row.

On the other hand, for the outgoing waves (λj < 0), we have

v_{j,lm} = 0,
(v_{j0}^{n+1} − v_{j0}^n)/∆t + λ_j (v_{j1}^{n+1} − v_{j0}^n)/h = 0,
(v_{jk}^{n+1} − v_{jk}^n)/∆t + λ_j (v_{j,k+1}^{n+1} − v_{j,k−1}^n)/(2h) = 0,   k ≥ 1,   (4.38)

so that the solution coincides with the unmodified original FEM equations, and the Lagrange multiplier is v_{j,lm} = 0.

Coming back to the U basis, we have

Π^+_U(U_ref)(U_0 − U_ref) + Π^−_U(U_ref) U_lm = 0,
C (U_0^{n+1} − U_0^n)/∆t + A (U_1^{n+1} − U_0^n)/h + C Π^+_U(U_ref) U_lm = 0,
C (U_k^{n+1} − U_k^n)/∆t + A (U_{k+1}^{n+1} − U_{k−1}^n)/(2h) = 0,   k ≥ 1.   (4.39)

Using Penalization

The corresponding formulas for penalization can be obtained by adding a diagonal term

scaled by a small regularization parameter ε to the first equation in (4.39)

−ε U_lm + Π^+_U (U_0 − U_ref) + Π^−_U U_lm = 0,
C (U_0^{n+1} − U_0^n)/∆t + A (U_1^{n+1} − U_0^n)/h + Π^+_U U_lm = 0;   (4.40)

where, for the moment, we dropped the dependence of the projectors on Uref . Eliminating

Ulm from the first and second rows we obtain

C (U_0^{n+1} − U_0^n)/∆t + A (U_1^{n+1} − U_0^n)/h + Π^+_U (Π^−_U + εI)^{-1} Π^+_U (U_0 − U_ref) = 0.   (4.41)


Now, using projection algebra we can show that

(Π^−_U + εI)^{-1} = (1/ε) Π^+_U + (1/(1 + ε)) Π^−_U,   (4.42)

so that the last term in (4.41) reduces to (1/ε) Π^+_U(U_0 − U_ref) and the whole equation is

C (U_0^{n+1} − U_0^n)/∆t + A (U_1^{n+1} − U_0^n)/h + (1/ε) C Π^+_U (U_0 − U_ref) = 0.   (4.43)

Here 1/ε can be thought of as a large penalization factor.
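The projection-algebra identity (4.42), and the collapse of the penalty term in (4.41) to (1/ε) Π^+_U(U_0 − U_ref), can be verified numerically. The sketch below (Python/NumPy, with an assumed illustrative eigenbasis, not taken from the text) builds a pair of complementary oblique projectors and checks both facts.

import numpy as np

S = np.array([[1.0, 1.0, 0.0],          # assumed eigenvector matrix of C^{-1}A_n
              [1.0, -1.0, 1.0],
              [0.0, 1.0, 1.0]])
lam = np.array([2.0, -1.0, 0.5])        # two incoming (positive) and one outgoing wave
Sinv = np.linalg.inv(S)
Pi_plus  = S @ np.diag((lam > 0).astype(float)) @ Sinv
Pi_minus = S @ np.diag((lam < 0).astype(float)) @ Sinv

eps = 1e-3
lhs = np.linalg.inv(Pi_minus + eps * np.eye(3))
rhs = Pi_plus / eps + Pi_minus / (1.0 + eps)
print(np.allclose(lhs, rhs))                          # identity (4.42)

# the term in (4.41) collapses to a plain penalty on the incoming components
term = Pi_plus @ lhs @ Pi_plus
print(np.allclose(term, Pi_plus / eps))               # i.e. (1/eps) * Pi_plus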

4.4.8 Numerical Example. Viscous Compressible Subsonic Flow

Over a Parabolic Bump

In order to evaluate the absorption of waves impinging at fictitious boundaries, a 2D

test consisting of a compressible subsonic flow over a parabolic bump at Maref = 0.5 is

considered (see Figure 4.10). The idea is to assess how the length from bump trailing

edge to the fictitious outflow boundary (Lout) affects the predicted forces and their time evolution.

Two sets of simulations were carried out. The first set considers non-absorbing boundary conditions, where variables are imposed as specified in Figure 4.10. At the inlet wall the imposed conditions are ρ = ρ_ref = 1, u = u_ref = Ma_ref √(γ p_ref/ρ_ref) = 0.5 and v = 0. At the outflow boundary the pressure is imposed, i.e., p = p_ref = 1/γ, where γ = 1.4. The second set of simulations considers ULSAR non-reflecting conditions at the channel inlet and outlet. The initial state for both sets of problems is U = (ρ_ref, u_ref, 0, p_ref). The parameters in Figure 4.10 are: L_in = 1.4, L_bump = 2, h_bump = 0.1 and L_out = 1, 2, 4, 8. The values are made nondimensional by selecting L, ρ_ref and u_ref as reference values for length, density and velocity. Figures 4.11 and 4.12 show how the ULSAR conditions produce the wave absorption at the fictitious boundaries.

at fictitious boundaries.

4.5 Dynamically Varying Boundary Conditions

4.5.1 Varying Boundary Conditions in External Aerodynamics

During a flow computation the number of incoming characteristics n+ may change. This

can occur due to a change of flow regime (i.e., from subsonic to supersonic) or due to a change of flow sense (flow reversal). A typical case is the external flow around an aerodynamic

body as shown in Figure 4.13. Consider first a steady subsonic flow. The flow is normally

subsonic at the whole infinite boundary, even if some supersonic pockets can develop at

transonic speeds. Then the only two possible regimes are subsonic inlet (n+ = nd + 1, nd


[Figure 4.10: Problem geometry of the parabolic-bump test: inlet length Lin, bump length Lbump, bump height hbump, outlet length Lout; ρ, u, v imposed at the inlet, p at the outlet, slip conditions on the upper and lower walls.]

[Figure 4.11: y-Force Fy [N] vs. t [secs] for Lout = 1, 2, 4, 8, with absorbing and non-absorbing boundary conditions.]

is the spatial dimension) and subsonic outlet (n+ = 1). We can determine whether the boundary is an inlet or an outlet by looking at the projection of the unperturbed flow velocity u∞ onto the local normal n. For the steady supersonic case the situation is very different.

A bow shock develops in front of the body and forms a subsonic region which propagates

downstream. Far downstream the envelope of the subsonic region approaches a cone with

an aperture angle equal to the Mach angle for the undisturbed flow. At the boundary we

have now a supersonic inlet region, and on the outlet region we have both subsonic and

supersonic parts. The point where the flow at outlet changes from subsonic to supersonic

may be estimated from the Mach angle, but it may be very inaccurate if the boundary is

close to the body. Having a boundary condition that can automatically adapt itself to the


[Figure 4.12: y-Force evolution Fy [N] vs. t [secs] for the absorbing conditions, Lout = 1 and Lout = 2, 4, 8.]

whole possibilities can be of great help in such a case. Now, consider the unsteady case,

for instance a body slowly accelerated from subsonic to supersonic speeds. The inlet part

will change at some point from subsonic to supersonic. At outlet, some parts will change

also from subsonic to supersonic, and the separation between both parts will change its

position, following approximately the instantaneous Mach angle.

[Figure 4.13: Number of incoming/outgoing characteristics changing on an accelerating body: subsonic flow (M∞ < 1) with subsonic incoming (ρ, u, v imposed) and subsonic outgoing (p imposed) regions; supersonic flow (M∞ > 1) with a bow shock, supersonic incoming (ρ, u, v, p imposed), supersonic outgoing (no field imposed) and subsonic outgoing (p imposed) regions.]


4.5.2 Aerodynamics of Falling Objects

An interesting case is the aerodynamics of a falling body [FKMN97, Bel99, Hua00, Hua01, Hua02]. Consider, for simplicity, the two-dimensional case of a homogeneous ellipse in free

fall. As the body accelerates, the pitching moments tend to increase the angle of attack

until it stalls (A). Then, the body starts to fall towards its other end, and accelerates while

its main axis aligns with gravity (B). As the body accelerates the pitching moment grows

until it eventually stalls again (C). The pattern is repeated during the downfall. This kind

of falling mechanism is typical of slender bodies with relatively small moment of inertia

like a sheet of paper and is called ‘flutter’. However, depending on several parameters,

but mainly depending on the moment of inertia of the body, if it has a large angular momentum at (B), it may happen that it rolls on itself, always keeping the same sense of rotation. This kind of falling mechanism is called 'tumble' and is a typical pattern for thicker and more massive objects. For massive objects (like a ballistic projectile, for instance) tumbling may convert a large amount of potential energy into rotation, causing

the object to rotate at very large speeds. As the body falls it accelerates and can reach

[Figure 4.14: Falling ellipse: 'flutter' and 'tumble' falling patterns (stations A, B, C, D).]

supersonic speeds. This depends on the density of the body relative to the surrounding


atmosphere, its dimensions and shape. As the weight of the body scales as L³, L being the characteristic length, while the drag force scales as L², larger bodies tend to reach larger limit speeds and may eventually reach the supersonic regime.

One can model a falling body in several ways. In order to avoid the use of deform-

ing meshes, a fixed mesh attached to the body can be used. Then one can choose to

perform the computation in a non-inertial frame moving with the body or to perform

the computation in an inertial frame using a moving but not deforming mesh. In the

first case ‘inertial forces’ (Coriolis, centrifugal) must be added, while in the second case

convective terms must take into account the mesh velocity as in the ‘Arbitrary Lagrangian

Eulerian (ALE)’ formulation. In this example we choose to use the first strategy. The

Figure 4.15: Gaspard-Gustave de Coriolis (1792–1843)

computation of the flow is linked to the dynamics of the falling object. The strategy is a typical staggered fluid/solid interaction process [Ceb96, LYC+98, LC96]. Basically, we solve the fluid problem in a non-inertial frame with inertial terms computed with the actual state of the body (linear acceleration a, angular rotation velocity ω and angular rotation acceleration ω̇). Also, boundary conditions in the non-inertial frame at infinity

must take into account the actual linear and angular velocity of the object. The fluid

solver updates the state of the fluid from tn to tn+1. Then, with the state of the fluid at

tn+1 the forces exerted by the fluid on the body are computed. With these forces, the

equations for the rigid motion of the body are solved (six dof's, accounting for the two linear positions and velocities, the rotation angle and its derivative).

Coming back to the boundary conditions issue, in addition to the fact that the body can accelerate and decelerate, going back and forth from subsonic to supersonic speeds, we now have to take into account that the angle from which the unperturbed flow impinges on the body varies with time. So, as the body can rotate arbitrarily, the flow can impinge from any direction relative to the non-inertial frame fixed to the body.

Numerical Example. Ellipse Falling at Supersonic Speed

As an example consider the fall of an ellipse with the following physical data

• a = 1, b = 0.6 (major and minor semi-axes, eccentricity e =√

1− b2/a2 = 0.8),

• m = 1, (mass),

• w = 2.5, (weight of body),

• r = 1, (radius of inertia),

• c.m. = (−0.15, 0.0), (center of mass),

• ρa = 1, (atmosphere density),

• p = 1, (atmosphere pressure),

• γ = 1.4, (gas adiabatic index γ = Cp/Cv),

• Rext = 10, (Radius of the fictitious boundary),

• uini = [0, 0, 1.39, 0, 1.3, 0], (ellipse initial position and velocity [x, y, θ, u, v, θ̇]).

These values are made nondimensional by selecting a, ρa and c0 as reference values for

length, density and velocity, so that the nondimensional quantities are ρ′a = 1, p′ = 1/γ,

u′ = 0.5, and in the following the prime indicating nondimensional quantities is dropped.

A coarse estimation of the limit speed v can be obtained by balancing the vertical forces

on the body, i.e., the drag on the body (Faero), the weight and the hydrostatic flotation

Faero + W + Ffloat = CD ρa v² A − ρs g V + ρa g V = 0, (4.44)

where V = πab is the volume of the body (the area in 2D) and A = 2b the area of the

section facing the fluid (length in 2D). CD = 0.2 is an estimation for the drag coefficient

of the body and ρs = m/V, ρa the densities of solid and atmosphere respectively. For the

data above this estimation gives a limit speed of v = 2.8 approximately. The speed of

sound of the atmosphere is c =√γp/ρa = 1.18, so that it is expected that the body will

reach supersonic speeds. Of course, if the body does reach supersonic speed, then the

drag coefficient will be higher and probably the average speed will be lower than the one estimated above.

The initial conditions are the ellipse starting at velocity (0, −1.39) with its major axis at an angle of 80° with respect to the vertical; the fluid is initially at rest. The computed trajectory until t = 50 time units is shown in Figure 4.16. The computed trajectory is

shown in a reference system falling at velocity v = (−0.5, 0.5) (this is done in order to

reduce the horizontal and vertical span of the plot). In Figure 4.17 we see colormaps of the Mach number at six instants, in the non-inertial frame fixed to the body. The instants are marked as A, B, C and identified in the trajectory. Note that as the ellipse rotates, each part of the boundary experiences all kinds of regimes and the absorbing boundary condition copes with all of them. Note also that the artificial boundary is located very near the body; the radius of the external circle is 3.25 times the major semi-axis of the ellipse (in the case simulated with the smaller external radius, i.e., Rext = 5).

Figure 4.16: Computed trajectory of the falling ellipse (stations A, B, C marked)

In Figure 4.18 the velocities of the ellipse are shown in order to evaluate the absorption of the ULSAR conditions when waves reach the boundaries as the ellipse falls and tumbles/flutters, with the fictitious boundary (exterior circle) located at Rext = 5 m and Rext = 10 m and the size of the finite elements kept constant.

4.6 Conclusions

Absorbing boundary conditions reduce computational cost by allowing the artificial exterior boundary to be placed closer to the region of interest. Extension to the non-linear cases can be done either by using Riemann invariants or by using the state at the previous time step as reference state for a linearized boundary condition. In complex simulations the number of incoming characteristic waves may vary during the computation or may not be known a priori. In those cases absorbing boundary conditions can be imposed with the help of Lagrange multipliers or penalization techniques.

Figure 4.17: Ellipse falling at supersonic speeds. Colormaps of |u|. Station A (t = 3.75),

station B (t = 6.25), station C (t = 10). Stations in the trajectory refer to Figure 4.16.

Results are shown in a non-inertial frame attached to the ellipse

Figure 4.18: Ellipse velocities u, v [m/s] vs. time t [s] for different external radii (Rext = 5 and Rext = 10)

Chapter 5

Strong Coupling Strategy for

Fluid-Structure Interaction

Problems in Supersonic Regime Via

Fixed Point Iteration

I am not sure that I exist,

actually.

I am all the writers that I have read,

all the people that I have met,

all the women that I have loved;

all the cities that I have visited,

all my ancestors...

Perhaps I would have liked to be my father,

who wrote and had the decency of not publishing.

Nothing, nothing, my friend;

what I have told you: I am not sure of anything,

I know nothing...

Can you imagine that I do not even know the date of my death?

Jorge Luis Borges, Attributed



In this chapter some results on the stability of the time integration when solving

fluid/structure interaction problems with strong coupling via fixed point iteration strategy

are presented. The flow-induced vibration of a flat plate aligned with the flow direction

at supersonic Mach number is studied. The precision of different predictor schemes and

the influence of the partitioned strong coupling on stability are discussed.

5.1 Introduction

Multidisciplinary and multiphysics coupled problems are nowadays a paradigm for studying and analyzing the increasingly complex phenomena that appear in nature and in new technologies. There exists a great number of problems where different physical processes

(or models) converge, interacting in a strong or weak fashion (e.g., Acoustics/Noise distur-

bances in flexible structures, Magneto-Hydrodynamics devices, Micro-Electro-Mechanical

devices, Thermo-Mechanical problems like continuous casting process, Fluid/Structure in-

teraction like wing flutter problem or flow-induced pipe vibrations). In the fluid/structure

interaction area, the dynamic interaction between an elastic structure and a compress-

ible fluid has been the subject of intensive investigations in the last years (see Refer-

ences [GR05, PF01, Lef05]). This part of the present thesis concerns with the numerical

integration of this type of problems when they are coupled in a loose or strong manner.

For simple structural problems (like hinged rigid rods with one or two vibrational dof’s)

it is possible to combine into a single (simple) formulation the fluid and the structural

governing equations (see Reference [DCC+95]). In those cases, a fully explicit or fully

implicit treatment of the coupled fluid/structure equations is attainable. Nevertheless,

for complex/large scale structural problems, the simultaneous solution of the fluid and

structure equations using a ‘monolithic’ scheme may be mathematically unmanageable

or its implementation can be a laborious task. Furthermore, the monolithic coupled

formulation would change significantly if different fluid and/or structure models were

considered.

An efficient alternative is to solve each sub-problem in a partitioned procedure where

time and space discretization methods could be different. Such a scheme simplifies explicit/implicit integration and favors the use of different codes specialized in each sub-area. In this application a staggered fluid/structure coupling algorithm is considered.

A detailed description of the ‘state of the art’ in the computational fluid/structure in-

teraction area can be found in works [PF01, FPF01, PF00, DP06] and the references

therein.

Beyond the physical and engineering importance, this problem is interesting from the


computational point of view as a paradigm of multiphysics code implementation that

reuses preexistent fluid and elastic solvers. The partitioned algorithm is implemented in

the PETSc-FEM code (www.cimec.org.ar/petscfem) which is a parallel multi-physics

finite element program based on the Message Passing Interface MPI and the Portable

Extensible Toolkit for Scientific Computations PETSc. Two instances of the PETSc-FEM

code simulate each sub-problem and communicate interface forces and displacements via

Standard C FIFO files or ‘pipes’. The key point in the implementation of this partitioned

scheme is the data exchange and synchronization between both parallel processes. These tasks are handled by a small external C++ routine.
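As an illustration of this exchange, the following minimal sketch (Python) shows how two processes can pass interface vectors through named pipes; the pipe names and the plain-text protocol are assumptions made here for illustration and are not the actual PETSc-FEM coupling files.

import os
import numpy as np

F2S, S2F = "fluid2struct.fifo", "struct2fluid.fifo"   # hypothetical pipe names
for name in (F2S, S2F):
    if not os.path.exists(name):
        os.mkfifo(name)

def send(pipe, vec):
    # Write one interface vector as a line of ASCII numbers; opening the FIFO
    # blocks until the peer opens the other end, which synchronizes the codes.
    with open(pipe, "w") as f:
        f.write(" ".join("%.16e" % x for x in vec) + "\n")

def receive(pipe):
    # Blocking read of one interface vector sent by the peer process.
    with open(pipe, "r") as f:
        return np.array([float(t) for t in f.readline().split()])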

5.2 Strongly Coupled Partitioned Algorithm Via Fixed

Point Iteration

In this section the temporal algorithm that performs the coupling between the structure

and the fluid codes is described. It is a fixed point iteration algorithm over the states

of both fluid and structure systems. Each iteration of the loop is called a ‘stage’, so if

the ‘stage loop’ converges, then a ‘strongly coupled’ algorithm is obtained. Hereafter, this

algorithm is called ‘staged algorithm’. The basic staggered algorithm considered in this

section proceeds as follows: (i) transfer the motion of the wet boundary of the solid to

the fluid problem, (ii) update the position of the fluid boundary and the bulk fluid mesh

accordingly, (iii) advance the fluid system and compute new pressures (and the stress field

if compressible Navier-Stokes model is adopted), (iv) convert the new fluid pressure (and

stress field) into a structural load, and (v) advance the structural system under the flow

loads. Such a staggered procedure, which can be treated as a weakly coupled solution

algorithm, can also be equipped with an outer loop in order to assure the convergence of

the interaction process. The algorithm can be stated as follows (see Algorithm 2 below),

where

w^n : is the fluid state (ρ, v, p) at time t^n,
u^n : is the structure state (displacements) at time t^n,
u̇^n : are the structure velocities at time t^n,
X^n : are the fluid mesh node positions at time t^n,
nstep : is the number of time steps in the simulation,
nstage : is the number of stages in the coupling scheme,
nnwt : is the number of Newton loops in the non-linear problem,
CMD : stands for Computational Mesh Dynamics,


Algorithm 2: Strong FSI coupling via fixed point iteration

1:  Initialize variables
2:  for n = 0 to nstep do                          (main time step loop)
3:      t^n = n ∆t
4:      CFD CODE:
5:          receive u^n from STRUCTURE
6:          X^n = CMD(u^n)                          (run CMD code)
7:          u_P^(n+1) = u^(n+1,0) = predictor(u^n, u^(n-1))   (compute predictor)
8:      STRUCTURE CODE:
9:          receive w^n from FLUID                  (fluid state)
10:     for i = 0 to nstage do                      (stage loop)
11:         CFD CODE:
12:             receive u^(n+1,i) from STRUCTURE
13:             X^(n+1,i+1) = CMD(u^(n+1,i))
14:             compute skin normals and velocities
15:             for k = 0 to nnwt do                (fluid Newton loop)
16:                 w^(n+1,i+1) = CFD(w^n, X^(n+1,i+1), X^n)
17:             end for
18:             send w^(n+1,i+1) to STRUCTURE
19:         FLUID CODE: (after each stage iteration)
20:         CSD CODE:
21:             receive w^(n+1,i+1) from FLUID
22:             compute structural loads (w^n, w^(n+1,i+1))
23:             for k = 0 to nnwt do                (structure Newton loop)
24:                 u^(n+1,i+1) = CSD(w^n, w^(n+1,i+1))
25:             end for
26:             send u^(n+1,i+1) to FLUID
27:         STRUCTURE CODE: (after each stage iteration)
28:     end for
29:     FLUID CODE: (after each time step)
30:         send u^n to FLUID
31:     STRUCTURE CODE: (after each time step)
32:         send w^n to STRUCTURE
33: end for


CSD : for Computational Structure Dynamics,

CFD : for Computational Fluid Dynamics.

5.2.1 Notes on the Fluid/Structure Interaction (FSI) Algorithm

• Two codes (CFD and CSD) are running simultaneously. For simplicity, the basic algorithm can be thought of as if there were no 'concurrence' between the codes, i.e., at a given time only one of them is running. This can be controlled using 'semaphores', which is done using MPI 'synchronization messages'.

• The most external loop is over the time steps. Internal to it is the ‘stage loop’.

‘Weak coupling’ is achieved if only one stage is performed (i.e., nstage = 1). In each

stage the fluid is first advanced using the previously computed structure state un

and the current estimate value un+1,i. In this way, a new estimate for the fluid state

wn+1,i+1 is computed. Next the structure is updated using the forces of the fluid

from states wn and wn+1,i+1. At the first stage, the state un+1,0 is predicted using

a second or higher order approximation (see equation (5.2)). Inside the stage loop

there are Newton loops for each code to solve the non-linearities. In this application

the Computational Structure Dynamics (CSD) is linear, so nnwt = 1.

• Once the coordinates of the structure are known, the coordinates of the fluid mesh

nodes are computed by a ‘Computational Mesh Dynamics’ code, which is symbolized

as

Xn = CMD(un). (5.1)

Even though the CMD may be performed with a general strategy using both nodal

reallocation or remeshing, in this chapter only the former is adopted, keeping the

topology unchanged. Relocation of mesh nodes can be done using an elastic or

pseudo-elastic model (see Reference [LNST06]) through a separate PETSc-FEM

parallel process (code named MESH-MOVE). For the simple geometry of the exam-

ple a simple spine strategy is used.

• The general form of the predictor for the structure state was taken from Refer-

ence [PF01] and can be written as

u_P^(n+1) = u^n + α0 ∆t u̇^n + α1 ∆t (u̇^n − u̇^(n−1)). (5.2)

It is at least first order accurate when no predictor is employed and it may be

improved to second order using the above predictor with some values for α0 and

α1 according to the problem at hand. To understand how to specify these two parameters, consider a simple two-dof wake oscillator model represented by two second order differential equations as follows

m_z z̈ + c_z ż + k_z z = f_z(y, ẏ, ÿ, t),
m_y ÿ + c_y ẏ + k_y y = f_y(z, ż, z̈, t), (5.3)

with (m, c, k) the mass, damping and stiffness parameters for each dof, and y, z representing the simple structure and fluid models. The forcing terms on the right hand side contain the coupling between the two blocks. This coupling may be formulated in terms of the main variables and their first two derivatives, generally velocities and accelerations. If the coupling contains only the main variables, i.e., fz(y, t) and fy(z, t), the predictor with α0 = 1 and α1 = 0 achieves second order accuracy in time. If the coupling contains velocities it is necessary to use α1 = 1/2 to recover second order in time. In fluid-structure interaction problems solved via ALE it is known that the mesh velocity, which depends on the fluid-solid interface velocity, is incorporated in the formulation; therefore, to guarantee second order accuracy in time it is necessary to use α0 = 1 and α1 = 1/2 for the predictor (a minimal sketch of this predictor is given after this list).

Note that, if the trapezoidal (Simpson’s) rule with α = 1/2 is used for both the

structure and the fluid and the predictor is chosen with at least second order preci-

sion, then the whole algorithm is second order, even if only one stage is performed.

• At the beginning of each fluid stage there is a computation of skin normals and velocities. This is necessary due to the time dependent slip boundary condition for the inviscid case, implemented as a constraint (see equation (5.5)), or also in the case of using a no-slip boundary condition for the viscous case, where the fluid at the interface has the velocity of the moving solid wall, i.e., v|Γg = vstr|Γg = u̇|Γg.

Figure 5.1: Thomas Simpson (1710–1761)
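A minimal sketch of the predictor (5.2) referred to in the list above is the following (Python); the function name and the toy data are assumptions made here for illustration.

import numpy as np

def predict_displacement(u_n, du_n, du_nm1, dt, alpha0=1.0, alpha1=0.5):
    # Structural predictor of eq. (5.2):
    #   u_P^{n+1} = u^n + alpha0*dt*du^n + alpha1*dt*(du^n - du^{n-1}).
    # alpha0 = 1, alpha1 = 0 is second order if the coupling contains only the
    # main variables; alpha0 = 1, alpha1 = 1/2 is needed in the ALE case.
    return u_n + alpha0 * dt * du_n + alpha1 * dt * (du_n - du_nm1)

# Usage on a toy 3-dof structure:
u = np.zeros(3)
du = np.array([0.10, 0.00, -0.10])        # velocities at t^n
du_prev = np.array([0.08, 0.00, -0.08])   # velocities at t^{n-1}
u_pred = predict_displacement(u, du, du_prev, dt=1e-3)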


5.3 Description of Test Case

The flutter of a flat solid plate aligned with a gas flow at supersonic Mach numbers (see

Figure 5.2) is studied. A uniform fluid at state (ρ∞, U∞, p∞) flows over a horizontal rigid wall y = 0, parallel to it. This test case has also been studied in [PF01]. In a certain

Figure 5.2: Description of the test: uniform flow (ρ∞, U∞, p∞) over a rigid wall with an elastic panel of length L and deflection u(x) (points A, B, C, D marked on the plate)

region of the wall (0 ≤ x ≤ L) the wall deforms elastically following thin plate theory,

i.e.,

m ü + D ∂⁴u/∂x⁴ = −(p − p∞) + f(x, t), (5.4)

where m is the mass of the plate per unit area in kg/m², D = Et³/[12(1 − ν²)] the bending rigidity of the plate in N·m, E the Young modulus in Pa, t the plate thickness in m, ν the Poisson ratio, u the normal deflection of the plate in m, defined on the region 0 ≤ x ≤ L and null outside this region, p the pressure exerted by the fluid on the plate in Pa, and f an external force in N, to be described later. The plate is clamped at both

ends, i.e., u = (∂u/∂x) = 0 at x = 0, L. For the sake of simplicity the fluid occupying

the region y > 0 is inviscid. The compressible Euler model with SUPG stabilization

and ‘anisotropic shock-capturing’ method is considered (see Reference [TS04]). A slip

condition is assumed

(v − vstr) · n = 0 (5.5)

on the (curved) wall y = u(x), where

vstr = (0, u̇),    n ∝ (−∂u/∂x, 1) (5.6)


are the velocity of the plate and its unit normal. Finally, initial conditions for both the

fluid and the plate are taken as

u(x, t = 0) = u0(x),
u̇(x, t = 0) = u̇0(x),
(ρ, v, p)|t=0 = (ρ, v, p)0, for y ≥ u0(x). (5.7)

Note that for the fluid pressure load on the plate the free stream fluid pressure is

subtracted so that in the absence of any external perturbation (f ≡ 0) the undisturbed

flow (ρ,v, p)x,t ≡ (ρ,v, p)∞ is a solution of the problem for the initial conditions

u ≡ 0,
u̇ ≡ 0,
(ρ, v, p)|t=0 ≡ (ρ, v, p)∞. (5.8)

5.3.1 Dimensionless Parameters

As the fluid is inviscid, it is determined by the ‘adiabatic index’ γ = Cp/Cv = 1.4 for air,

and the Mach number M∞ = U∞/c∞, where c∞ is the speed of sound c = √(γp/ρ) for the undisturbed state.

Another dimensionless parameter can be built by taking the ratio between the characteristic time of the structure, Tstr = √(mL⁴/D), and the characteristic time of the fluid, Tfl = L/U∞. Then, the dimensionless number NT is defined as the square of the ratio of both characteristic times

NT = (Tfl/Tstr)² = D/(m L² U∞²). (5.9)

Finally, a (dimensionless) number can be formed by taking the ratio between the mass of

the fluid being displaced by the structure and the structure mass

NM = ρ∞L³/(mL²) = ρ∞L/m. (5.10)

The same parameters as reported in Reference [PF01] are considered. In that reference, flutter was studied near the point M∞ = 2.27, NT = 4.3438×10⁻⁵ and NM = 0.054667.

The flutter region was studied by varying the M∞ value while keeping ρ∞ and the structure

parameters (m, L, D) constant (so that NM is constant and NT ∝ M∞−2), and the same

approach is taken here. The dimensionless parameters are obtained by choosing the


following dimensional values

ρ∞ = 1 kg/m³,
p∞ = 1/γ = 0.71429 Pa,
U∞ = M∞ (since c∞ = √(γp∞/ρ∞) = 1 m/s),
D = 0.031611 N·m,
m = 36.585 kg/m²,
L = 2 m. (5.11)
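A short sketch computing the dimensionless numbers of the test case from these dimensional values is shown below (Python; the variable names are assumptions made here for illustration).

import math

rho_inf, gamma = 1.0, 1.4
p_inf = 1.0 / gamma
c_inf = math.sqrt(gamma * p_inf / rho_inf)     # = 1 m/s by construction
D, m, L = 0.031611, 36.585, 2.0

def dimensionless_numbers(M_inf):
    # N_T and N_M as defined in eqs. (5.9) and (5.10).
    U_inf = M_inf * c_inf
    N_T = D / (m * L**2 * U_inf**2)
    N_M = rho_inf * L / m
    return N_T, N_M

N_T, N_M = dimensionless_numbers(2.27)
print(N_T, N_M)    # N_M ~ 0.0547, N_T of order 1e-5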

5.3.2 Houbolt’s Model

In this section the linear flutter instability is studied by means of modal analysis. First, the 'Houbolt approximation' (see Reference [Hou58]) is assumed for the fluid,

p − p∞ = Cx ∂u/∂x + Ct ∂u/∂t,
Cx = ρ∞ U∞² / √(M∞² − 1),
Ct = ρ∞ U∞ (M∞² − 2) / (M∞² − 1)^(3/2). (5.12)

With this approximation the governing equation for plate deflection (5.4) becomes

m ü + D ∂⁴u/∂x⁴ = −Cx ∂u/∂x − Ct ∂u/∂t. (5.13)

Plane Wave Analysis

If an infinite plate is considered, a plane wave analysis may shed some light on the

mechanism that leads to a flutter behavior. Then, let us consider plane waves of the

form

u(x, t) = Re{ û e^(i(kx−ωt)) }. (5.14)

Replacing (5.14) in (5.13), an implicit 'dispersion law' ω = ω(k) is obtained:

−ω²m + Dk⁴ = iCtω − ikCx. (5.15)

According to the last equation, instability (flutter) occurs whenever Im ω > 0. Note that lowering the mass ratio parameter NM while keeping NT and M∞ constant is equivalent to scaling the fluid terms on the right hand side of equation (5.12) by this factor. When there is no fluid (NM → 0) the dispersion law simply reduces to

ω0 = ±√(D/m) k². (5.16)


As expected, the eigenvalues are real, meaning neither damping nor amplification of the

waves. The positive (negative) sign corresponds to right-going (left-going) waves, i.e.,

waves that run in the same (opposed) direction as the fluid. The subscript 0 indicates

that this dispersion law is valid in the absence of fluid. Now assume that NM is small

enough so that the right hand side of equation (5.15) is a small perturbation to the terms

in the left hand side. Then a first order expansion of the left hand side with respect to ω

around ω0 can be done

−2mω0 δω = iCtω0 − ikCx, (5.17)

so that

ω ≈ ω0 − i Ct/(2m) ± i Cx/(2k√(mD)). (5.18)

From this equation it is clear that the temporal term (the second one in the Houbolt

approximation (5.12)) has a stabilizing effect (negative imaginary part), while the spatial

term has a damping effect for left-going waves and destabilizing effect for right-going

ones, i.e., for those that run in the same direction as the fluid. Flutter occurs when the

destabilizing term is strong enough so as to overcome the stabilizing temporal term.
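Since (5.15) is a quadratic polynomial in ω for each wavenumber k, the flutter criterion Im ω > 0 can also be checked directly; a minimal sketch of such a check is given below (Python; the routine is an illustration, not part of the thesis code).

import numpy as np

def omega_roots(k, m, D, Cx, Ct):
    # Roots of the dispersion law (5.15), rewritten as
    #   m*w**2 + 1j*Ct*w - (D*k**4 + 1j*k*Cx) = 0.
    return np.roots([m, 1j * Ct, -(D * k**4 + 1j * k * Cx)])

def has_flutter(k, m, D, Cx, Ct):
    # Flutter whenever some root has a positive imaginary part (growing wave).
    return bool(np.any(omega_roots(k, m, D, Cx, Ct).imag > 0.0))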

More physical insight is obtained analyzing the work that is done by the fluid onto the

plate. If a plane wave given by equation (5.14) is considered in its equivalent real form

u(x, t) = |û| cos(kx − ωt + ϕ), (5.19)

with û = |û| e^(iϕ), then the instantaneous vertical velocity and pressure are

v = ∂u/∂t = ω|û| sin(kx − ωt + ϕ),
p − p∞ = (−Cx k + Ct ω) |û| sin(kx − ωt + ϕ). (5.20)

The instantaneous work done by the fluid onto the plate, averaged over a wavelength λ = 2π/k, is

W = −∫₀^λ p v dx = ω(Cx k − Ct ω) |û|² λ/2. (5.21)

A positive work means that the structure is absorbing energy from the fluid, and thus has a destabilizing effect, while the opposite means dissipation of the structural wave energy into the fluid. It can be seen (again) from equation (5.21) that the temporal term always has a stabilizing effect, while the spatial term has a destabilizing effect when sign(ω) sign(k) > 0, i.e., for right-going waves (traveling in the same sense as the fluid).

Note that at the basis of the destabilizing effect is the fact that the spatial term in

the Houbolt approximation produces a pressure perturbation field that is non-symmetric


with respect to the crest of the waves, i.e., before the crest ((du/dx) > 0) p − pref > 0

whereas after the crest ((du/dx) < 0) p− pref < 0. In inviscid subsonic flow the pressure

perturbation field would be symmetric.

Galerkin Model for the Finite Length Plate

The plate normal displacement is expanded in a global basis using

u(x) = Σ_(k=1)^(N) a_k ψ_k(x),
ψ_k(x) = [4x(L − x)/L²] sin(kπx/L). (5.22)

These basis functions satisfy the essential boundary conditions for the plate equation

u = (∂u/∂x) = 0 at x = 0, L. Replacing the Houbolt approximation in equation (5.4),

using the Galerkin method and integrating by parts as needed, the following matrix equation is obtained

M ä + K a + Hx a + Ht ȧ = 0, (5.23)

where

M_jk = ∫₀^L m ψ_j(x) ψ_k(x) dx,
K_jk = ∫₀^L D ψ''_j(x) ψ''_k(x) dx,
H_x,jk = ∫₀^L Cx ψ_j(x) ψ'_k(x) dx,
H_t,jk = ∫₀^L Ct ψ_j(x) ψ_k(x) dx. (5.24)

The solution of this system of ODEs can be found by standard operational methods by replacing a(t) with the ansatz

a(t) = â e^(λt), (5.25)

leading to the eigenvalue equation

(λ²M + λHt + K + Hx) â = 0. (5.26)

Flutter is detected whenever some eigenvalue λ has a positive real part.

Numerical Solution Details

• The series (5.22) is truncated at a certain number of terms N, usually N = 10 or 20.


• Matrix entries for M, K, Ht and Hx are computed by approximating derivatives

with second order finite differences and integrating with a second order rule.

• The quadratic eigenvalue problem of size N is solved by converting it to a linear

eigenvalue problem of size 2N (and then finding eigenvalues and eigenvectors).
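A compact sketch of this procedure is shown below (Python). The quadrature and the numerical differentiation are simplified with respect to the computations reported here, so it is an illustration of the approach rather than the code actually used.

import numpy as np

def flutter_eigenvalues(M_inf, N=10, npts=2000,
                        rho_inf=1.0, gamma=1.4, D=0.031611, m=36.585, L=2.0):
    # Build the Galerkin matrices (5.24) for the basis (5.22) and solve the
    # quadratic eigenproblem (5.26) by linearization to a 2N x 2N problem.
    p_inf = 1.0 / gamma
    c_inf = np.sqrt(gamma * p_inf / rho_inf)
    U = M_inf * c_inf
    Cx = rho_inf * U**2 / np.sqrt(M_inf**2 - 1.0)
    Ct = rho_inf * U * (M_inf**2 - 2.0) / (M_inf**2 - 1.0)**1.5

    x = np.linspace(0.0, L, npts)
    dx = x[1] - x[0]
    k = np.arange(1, N + 1)[:, None]
    psi = 4.0 * x * (L - x) / L**2 * np.sin(k * np.pi * x / L)   # (N, npts)
    dpsi = np.gradient(psi, dx, axis=1)
    d2psi = np.gradient(dpsi, dx, axis=1)

    M_ = m * psi @ psi.T * dx            # mass matrix
    K_ = D * d2psi @ d2psi.T * dx        # stiffness matrix
    Hx = Cx * psi @ dpsi.T * dx          # spatial (destabilizing) fluid term
    Ht = Ct * psi @ psi.T * dx           # temporal (stabilizing) fluid term

    # Companion linearization of (lam^2 M + lam Ht + K + Hx) a = 0.
    A = np.block([[np.zeros((N, N)), np.eye(N)],
                  [-np.linalg.solve(M_, K_ + Hx), -np.linalg.solve(M_, Ht)]])
    return np.linalg.eigvals(A)

lam = flutter_eigenvalues(2.5)
print("flutter" if np.any(lam.real > 1e-8) else "no flutter")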

Results

Using for instance the values described in (5.11), with N = 20 terms in the series and 5000 intervals for computing the matrix coefficient integrals, and varying the Mach number from 1.8 to 3, the results shown in Figure 5.3 are obtained. For M∞ < Mcr = 2.265 all the eigenvalues have negative real part, i.e., the system is stable. For M∞ > Mcr = 2.265 there are two complex conjugate roots with positive real part. In Figure 5.3 the real and imaginary parts of the unstable mode are plotted. For M∞ < Mcr = 2.265 the eigenvalue with the lowest frequency was taken as a continuation of the flutter mode. It was checked that for M∞ < Mcr = 2.265 the plate delivers positive power to the fluid, whereas for M∞ > Mcr = 2.265 the converse is true. The instantaneous power done by the plate on the fluid is

P = ∫₀^L p u̇ dx. (5.27)

In Figures 5.4 to 5.6 the form of the plate deflection of the flutter mode for Mach 2.22,

Figure 5.3: Lowest frequency mode for the test case: real(λ) and imag(λ) vs. Mach number (Mach_cr = 2.265 marks the boundary between the no-flutter and flutter regions)

2.27 and 2.35 can be observed. For each Mach number, the plate deflection, the fluid pressure on the plate and the power delivered by the plate to the fluid (p u̇) are shown.


Figure 5.4: Mach 2.2, phase 0. Black = plate deflection, blue = pressure, green = power. Quantities normalized (not to scale)

Figure 5.5: Mach 2.27, phase 0

Flutter Region

A large number of flutter computations in the space NM , NT ,M∞ were performed in

order to determine the flutter region. A grid of 20 × 20 points in the region [0.001 ≤

Figure 5.6: Mach 2.35, phase 0


NM ≤ 0.1] × [10⁻⁵ ≤ NT M∞² ≤ 10⁻³] was scanned. For each point in the grid, instability is scanned in the Mach range 1.8 ≤ M∞ ≤ 3. The flutter region has the following characteristics:

NM/(NT M∞²) < 200 : no flutter for any Mach number,
NM/(NT M∞²) > 300 : flutter for the lowest Mach number considered (M∞ ≥ 1.8). (5.28)

In the intermediate region flutter appears at some Mach number within the scanned range. This suggests that flutter is highly correlated with the quantity

NM/(NT M∞²) = ρ∞ L³ c∞² / D, (5.29)

which happens to be independent of the density of the plate.

A simple model presented in Reference [DCC+95] draws a similar conclusion. The

explanation is as follows. In that reference, the term proportional to (∂u/∂t) is neglected. This is valid if the characteristic time of the fluid is much smaller than that of the structure, i.e., NT ≪ 1. This is a valid assumption because all points in the grid are located in the region NT < 10⁻³. But if the temporal term in the Houbolt approximation is neglected, the characteristic equation can be written in the form

det(λ̃² M̃ + K + Hx) = 0, (5.30)

where

λ̃ = √m λ,   M̃ = (1/m) M. (5.31)

As now the coefficients in M̃, K, Hx do not depend on m, neither do the eigenvalues λ̃ of equation (5.30), and then by (5.31) the λ eigenvalues are of the form

λ_j = λ̃_j / √m, (5.32)

with λ̃_j not depending on m. This means that the sign of the real part of λ is independent of m.

5.3.3 FSI Code Results

The aeroelastic problem defined above was modeled with the strongly coupled partitioned

algorithm described in section §5.2, with a mesh of 12800 quadrilateral elements for the fluid and 5120 for the plate. As the flow is supersonic, only a small entry section of 1/8L upstream of the plate and 1/3L downstream is considered. The vertical size of the computational domain was chosen as 0.8L. It is ensured that no reflection from the upper boundary affects the plate itself when these sizes are used for the fluid domain.


Determination of Flutter Region

This section presents some results obtained with the PETSc-FEM code using weak coupling between fluid and structure, i.e., nstage = 1. The physical characteristics of the plate are the same as in the previous section. In order to find (numerically) the critical Mach number for this problem, a sweep in the Mach number in the range 1.8 to 3.2 was done. Results for some Mach numbers can be seen in Figures 5.7 to 5.12. In these plots the time evolution of the displacements of several points distributed along the plate skin is shown. The fluid density field and the structure displacement at Mach = 3.2 (flutter region) for a given time step are shown in Figures 5.13 and 5.14. For Mach numbers below the

Figure 5.7: Plate deflection [m] at distributed points along the plate vs. time [s], at M = 1.8

Mcr, Figures 5.7 to 5.9, the maximum plate displacement grows until the forces exerted by the fluid damp the plate displacements. The time needed to reduce the response by a given factor (30% for instance) grows with the Mach number. For Mach numbers near Mcr, Figure 5.10, the maximum amplitude grows slightly. The flutter mode is triggered at this point. For Mach numbers above Mcr, Figures 5.11, 5.12, 5.13 and 5.14, the fluid forces cannot damp the structural response and the displacements grow without limit in an unstable fashion, in agreement with the theory.

Time Accuracy

If the stage loop converges, i.e., (u,w)n+1,i → (u,w)n+1,∗, then it can be shown that the

limit states (u,w)n+1,∗ satisfy the fully implicit, strongly coupled equations. The main effect


Figure 5.8: Plate deflection [m] at distributed points along the plate vs. time [s], at M = 2.225

Figure 5.9: Plate deflection [m] at distributed points along the plate vs. time [s], at M = 2.25

of the staged algorithm is to provide strong coupling and hence enhanced stability, regardless of the time accuracy, i.e., second or higher order temporal schemes can be achieved with a non-staged weak algorithm, while a strongly coupled staged algorithm does not necessarily have high order accuracy. In Figure 5.15 the error obtained after the simulation of a certain fixed amount of time t0 with increasing time refinement is shown. The exact solution is estimated through a Richardson extrapolation with the two most refined simulations for


Figure 5.10: Plate deflection [m] at distributed points along the plate vs. time [s], at M = 2.275

Figure 5.11: Plate deflection [m] at distributed points along the plate vs. time [s], at M = 2.3

the more accurate scheme (α = 0.5). The error at t0 is evaluated for a certain number of different ∆t values. It can be seen that for α = 0.6 (the parameter of the trapezoidal rule in both the structure and the fluid) the convergence curve initially has a second order slope, but for ∆t small enough this order is lost. This is typical when the error has mixed first and second order terms, for instance E ≈ c∆t + c′∆t². For large ∆t the second order term dominates and second order convergence is observed. However, as the time step is


Figure 5.12: Plate deflection [m] at distributed points along the plate vs. time [s], at M = 3.2

diminished, at a certain point the first order term dominates and the slope switches to first order. For α = 0.5 the curve is O(∆t²) over the whole studied range of ∆t. When using α = 0.5 with no predictor (equation (5.2)), second order convergence is still obtained, but the convergence slows down slightly in the very last segment. Note that in this case the scheme is second order, except for the fluid-structure interaction. As the interaction is weak, the first order convergence could perhaps be observed for smaller time steps.
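The error measure used for Figure 5.15 can be summarized by the following small sketch (Python): the 'exact' value is estimated by Richardson extrapolation from the two most refined runs and the observed order is the slope of log(error) vs. log(∆t). The function names are illustrative; the actual quantity of interest and the data of the figure are not reproduced here.

import numpy as np

def errors_vs_exact(u_of_dt, order=2):
    # Estimate the 'exact' value by Richardson extrapolation from the two most
    # refined runs, then return the error of every run against it.
    # u_of_dt: dict mapping time step -> scalar quantity of interest at t0.
    dts = sorted(u_of_dt)                        # smallest (finest) dt first
    u_fine, u_coarse = u_of_dt[dts[0]], u_of_dt[dts[1]]
    r = dts[1] / dts[0]
    u_exact = u_fine + (u_fine - u_coarse) / (r**order - 1.0)
    return {dt: abs(u_of_dt[dt] - u_exact) for dt in u_of_dt}

def observed_order(errs):
    # Least-squares slope of log(error) vs. log(dt): the observed order.
    dts = np.array(sorted(errs))
    e = np.array([errs[dt] for dt in dts])
    return np.polyfit(np.log(dts), np.log(e), 1)[0]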

Convergence of Stage Loop

The convergence of the stage loops has been assessed by running the test case over 20

time steps and performing 10 stages at each time step. In Figure 5.16 the convergence of

the fluid state (i.e., ‖un+1,i+1 − un+1,i‖) for all the time steps (convergence curves of the

time steps are concatenated) is shown. Analogously, the convergence of the structure is

plotted in Figure 5.17. The average convergence is one order of magnitude per stage or

higher, suggesting that for such a situation a small nstage (2 or 3) would be enough.
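In practice the number of stages can also be chosen adaptively instead of fixing nstage; a minimal sketch of such a stage loop is shown below (Python; advance_fluid and advance_structure are hypothetical stand-ins for the two solvers, and the tolerance-based stopping criterion is an illustrative choice).

import numpy as np

def coupled_time_step(w_n, u_n, u_pred, advance_fluid, advance_structure,
                      max_stages=10, tol=1e-6):
    # One time step of the staged algorithm: iterate fluid and structure until
    # the structural state stops changing (fixed point reached) or max_stages.
    u_i, w_i = u_pred, w_n
    for i in range(max_stages):
        w_new = advance_fluid(w_n, u_n, u_i)        # stage: advance the fluid
        u_new = advance_structure(u_n, w_n, w_new)  # then advance the structure
        du = np.linalg.norm(u_new - u_i)
        w_i, u_i = w_new, u_new
        if du < tol:          # roughly one order of magnitude gained per stage
            break
    return w_i, u_i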

Stability of the Staged Algorithm

The following numerical test allows evaluating the stability of the staged algorithm presented in section §5.2. The example is similar to the aeroelastic test case presented in section §5.3, with some different parameters for the plate in order to produce larger plate deformations and stronger instabilities. Some parameters are similar to those presented in


Figure 5.13: Fluid and structure fields at M=3.2

equation (5.11). Here, only the parameters that have been modified and the dimensionless

parameters that may be obtained with them are included.

U∞ = M∞ = 2,
t = 0.06,
ν = 0.33,
m = 0.002,
E = 39.6,
D = 8.0 × 10⁻⁴,
NT = D/(m L² U∞²) = 0.025,
NM = ρ∞ L / m = 1000.0. (5.33)

Therefore, according to section §5.3.2,

NM/(NT M∞²) = 10000 > 300, (5.34)

which implies that the flow is inside the flutter region.

Figure 5.14: Fluid and structure fields at M=3.2

The following figures show results obtained with both strategies, the staged and the non-staged algorithms. In order to compare both results in terms of computational cost, for the non-staged algorithm the time step was reduced by the number of stages used for the staged algorithm, so that the cost is similar for both.

The vertical displacements at some points of the plate for the staged algorithm using nstage = 5 after approximately 1300 time steps are shown in Figure 5.18. The results for the non-staged algorithm diverge at 40 time steps and are shown in Figure 5.19. Even though the staged algorithm shows extra stability compared with the non-staged one, the conclusions about this numerical experiment are not obvious because the flow regime is in a flutter condition. Further work needs to be done to understand how the staged algorithm improves the stability of the whole coupled problem.


Figure 5.15: Experimentally determined order of convergence with ∆t for the uncoupled algorithm with fourth order predictor. ‖U − Uest‖ vs. ∆t for α = 0.5 (final slope 2), α = 0.5 with no predictor (final slope 1.75) and α = 0.6, with reference slopes ∝ ∆t and ∝ ∆t²

5.4 Stability of the Weak/Strong Staged Coupling

Outside the Flutter Region

This section describes the stability properties of the weak coupling when the free stream conditions and plate parameters are such that the oscillations due to flutter do not appear. The flutter region can be characterized by the dimensionless number FL = NM/(NT M∞²) (see the previous section). Therefore, in order to study the stability behavior of the weak/strong algorithms (nstage = 1 and nstage > 1, respectively) intrinsic to the physical coupling, the region FL ≪ 200 is studied, and in particular FL = 12 is chosen. This non-dimensional number does not depend on the plate density m, so a sweep can be done on this variable over a wide range without triggering flutter. The idea is to find a value of this variable for which the weak coupling algorithm becomes unstable while the strong coupling and the uncoupled problems remain stable (i.e., in the uncoupled case fluid pressures are not transferred to the structure). To achieve convergence in the non-linear loop for the fluid problem, 2 Newton iterations are adopted (typically, for this problem, the residual is lowered by 3-5 orders of magnitude in 2 Newton loops). The mesh is the same as in the previous simulations and the Courant number is CFL = 0.5. The plate mass varies in


Figure 5.16: Convergence of the fluid state in the stage loop: ‖∆(U)‖ vs. Nstep × Nstage (20 time steps, 10 stages each)

the range m = 35 to m = 0.0001. It was found that the value of m where instabilities appear lies in the neighborhood of m = 0.65. Above this value the weakly coupled scheme is stable; below it, the weakly coupled algorithm is unstable, while each sub-problem (fluid and structure) is stable when no coupling is considered. Instabilities disappear when the strong coupling scheme with nstage = 2 is considered. Moreover, even when a lower value of m is used, i.e., m = 0.0135 and CFL = 1, only 2 stages are enough to achieve convergence of the strong coupling algorithm. Obviously, at this point each detached problem is still stable (see Figures 5.20 and 5.21). In the case of m = 0.0001 a smaller CFL number is needed (e.g., CFL = 0.5) in order to have at least 15 time steps in one period (recall that the plate frequency depends on the plate density). Even though the convergence of the coupled problem is not affected when considering a very small plate density and a strong partitioned scheme, it is necessary to refine the fluid mesh in the direction normal to the plate in order to have a better definition of the problem in that direction, i.e., a wave train is propagated in the fluid with the plate frequency (recall that the plate frequency grows as the plate density decreases, ωstr = 2π/Tstr = 2π(mL⁴/D)^(−1/2)). In the coupled simulation the interaction may produce an amplification of the pressures


Figure 5.17: Convergence of the structure state in the stage loop: ‖∆(U)‖ vs. Nstep × Nstage (20 time steps, 10 stages each)

Figure 5.18: Stability analysis - staged algorithm with nstage = 5. Vertical displacement uy of the plate vs. time (in number of time steps)


Figure 5.19: Stability analysis - non-staged algorithm. Vertical displacement uy of the plate vs. time (in number of time steps)

Figure 5.20: Unstable weak coupling for m = 0.0135 and CFL = 0.5

and plate displacements. This fact can be seen in Figures 5.22 and 5.23. Figure 5.22 shows the plate deflection when m = 0.00135 and the size of the fluid mesh near the plate is hy = 0.018808 m and hx = 0.035625 m. Figure 5.23 shows the same results when


Figure 5.21: Stable staged coupling for m = 0.0135, CFL = 1 and nstage = 2

considering a homogeneous refinement (i.e., hy = 0.00932 m and hx = 0.01781 m). The coarse mesh exhibits a spurious vibration mode, similar to a flutter mode, that is corrected in the finer mesh.

Figure 5.22: Strong partitioned scheme in a coarse mesh


Figure 5.23: Strong partitioned scheme in a fine mesh

5.5 Conclusions

Stability is enhanced through a strong coupling scheme, which proves to be necessary for situations where the structural response is fast. Partitioned schemes using staged strong coupling prove to be very efficient, avoiding the tedious and problem dependent task of building a monolithic coupling formulation. For the benchmark considered in this work, two stages were enough to obtain the same behavior as the monolithic scheme. Furthermore, the staged strategy provides a smooth blending between weak coupling and strong coupling, i.e., moderately coupled problems that cannot be treated with the pure weak coupling approach can be solved with the staged algorithm using a few stages per time step.

Time-accuracy is in agreement with the accuracy of the underlying fluid and structure

solvers, if an accurate enough predictor is used. Second order accuracy can be obtained

with second order fluid and structure solvers, and one stage coupling with a high order

predictor, as already reported in Reference [PF01].

The elastic flat plate problem is geometrically simple, but gives physical insight into the flutter phenomenon, and was very useful in testing the proposed algorithm over a wide range of non-dimensional parameters.

Part III

Final Conclusions


Chapter 6

Overview and Final Remarks

The main goal of this thesis was the proposition, description and testing of a new preconditioner for Schur complement-based Domain Decomposition Methods that performs better than classical ones in a wide range of flow problems, especially when advection terms are dominant. The performance of the new preconditioner has been tested and compared with other preconditioners/solvers that are extensively used in Computational Fluid Dynamics.

Details of the implementation and its testing were given.

Another goal of this thesis was the proposition of a set of local linear/non-linear dynamic boundary conditions for CFD applications. Absorbing boundary conditions have been tested on fictitious boundaries where (energy) wave reflection occurs when the conditions are imposed in a classical manner.

Also, in a later part of this thesis, fluid/structure interaction problems in the supersonic regime of a compressible fluid flowing over/around elastic bodies were treated. In that part, a new partitioned algorithm for the time integration of the coupled governing equations has been proposed and tested. The proposed 'staged' algorithm can be used to achieve either loose or strong coupling depending on the problem at hand.

During this thesis the following articles have been published in refereed journals:

1. Rodrigo R. Paz and Mario A. Storti. ‘An Interface Strip Preconditioner for

Domain Decomposition Methods: Application to Hydrology’. International Journal

for Numerical Methods in Engineering, 62(13):1873-1894, 2005.

2. Rodrigo R. Paz, Norberto M. Nigro and Mario A. Storti. ‘On the ef-

ficiency and quality of numerical solutions in CFD problems using the Interface

Strip Preconditioner for domain decomposition methods’. International Journal for

Numerical Methods in Fluids, 52(1):89-118, (2006).

3. Storti, M. ; Dalcın, L. ; Paz, R. ; Yommi, A. ; Sonzogni, V. ; Nigro, N. ‘A


Preconditioner for Schur Complement Matrix'. Advances in Engineering Software, (2006). In press.

4. Lisandro Dalcın, Rodrigo R. Paz and Mario A. Storti. ‘MPI for Python’.

Journal of Parallel and Distributed Computing, 65(9):1108-1115, 2005.

5. Storti, M. ; Dalcın, L. ; Paz, R. ; Yommi, A. ; Sonzogni, V. ; Nigro,

N. ‘Interface Strip Preconditioner for Domain Decomposition Methods’. Journal of

Computational Methods in Sciences and Engineering, (2003). In press.

6. Mario A. Storti, Norberto M. Nigro and Rodrigo R. Paz. ‘Strong coupling

strategy for fluid-structure interaction problems in supersonic regime via fixed point

iteration’. Journal of Sound and Vibration. Submitted.

7. Mario A. Storti, Norberto M. Nigro, Rodrigo R. Paz and Lisandro Dalcın.

‘Dynamics boundary conditions in CFD’. Journal of Computational Physics. Sub-

mitted.

8. S. A. Vera; M. Febbo; C. G. Mendez; R. R. Paz. ‘Vibrations of a plate

with an attached two degree of freedom system’. Journal of Sound and Vibration,

285(1-2):457-466, 2004.

In addition, several articles have been submitted to and presented at international conferences on numerical methods in computational mechanics.

Part IV

Appendix


Appendix A

Functional Spaces

A.1 Some Used Sobolev Spaces

Let us denote by L²(Ω) the space of functions that are square integrable over the domain Ω, equipped with the standard inner product and induced norm

(u, v) = ∫_Ω u v dΩ,   with the norm ||u||₀ = (u, u)^(1/2). (A.1)

Figure A.1: Sergei Lvovich Sobolev (1908–1989)

Assume that k is a non-negative integer; then the Sobolev space H^k(Ω) is defined, using multi-index notation, as

H^k(Ω) = { u ∈ L²(Ω) : ∂^(|α|)u / (∂x₁^(α₁) ∂x₂^(α₂) ··· ∂x_nd^(α_nd)) ∈ L²(Ω)  ∀ |α| ≤ k }, (A.2)

given an nd-tuple α = (α₁, α₂, ..., α_nd) of non-negative integers with |α| = α₁ + α₂ + ··· + α_nd.



Then, Hk(Ω) consists of square integrable functions all of whose derivatives of order

up to k are also square integrable. Hk(Ω) is equipped with the norm

||u||_k = ( Σ_(s=0)^(k) Σ_(|α|=s) || ∂^(|α|)u / (∂x₁^(α₁) ∂x₂^(α₂) ··· ∂x_nd^(α_nd)) ||₀² )^(1/2). (A.3)

For k = 0 we note in fact that the Sobolev space H0(Ω) is L2(Ω). In the case of k = 1

the Sobolev space is defined by

H¹(Ω) = { u ∈ L²(Ω) : ∂u/∂x_i ∈ L²(Ω), i = 1, 2, ..., nd }. (A.4)

The inner product associated with this space is

(u, v)₁ = ∫_Ω ( u v + Σ_(i=1)^(nd) (∂u/∂x_i)(∂v/∂x_i) ) dΩ, (A.5)

and the induced norm is

||u||₁ = √(u, u)₁. (A.6)

Frequently, we use the subspace

H¹₀(Ω) = { v ∈ H¹(Ω) : v = 0 on Γ }, (A.7)

whose elements have square integrable first derivatives over Ω and vanish on its boundary

Γ. Its inner product and norm coincide with those of H1(Ω).
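As a simple worked example (added here for illustration), take Ω = (0, 1) and u(x) = sin(πx), which vanishes at both endpoints and therefore belongs to H¹₀(Ω). Then

||u||₀² = ∫₀¹ sin²(πx) dx = 1/2,    ||∂u/∂x||₀² = π² ∫₀¹ cos²(πx) dx = π²/2,

so that ||u||₁ = ((1 + π²)/2)^(1/2) ≈ 2.33.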

Remark A.1.1. Note that the Sobolev spaces L2(Ω),H1(Ω) and H10(Ω) are Hilbert spaces

with corresponding inner products. Recall that a Hilbert space is a linear space with an

inner product in which all Cauchy sequences are convergent sequences.

Remark A.1.2. H¹₀(Ω) is usually defined as the closure of C∞₀(Ω) (the set of all infinitely differentiable functions whose support is a compact subset of Ω) with respect to the norm ||·||₁. That is, H¹₀(Ω) is the set of all functions u in H¹(Ω) such that u is the limit in H¹(Ω) of a sequence {u_s}, s = 0, 1, 2, ..., whose elements u_s belong to C∞₀(Ω).

A.2 Extension to Vector-Valued Functions

The finite element formulation deals not only with scalar-valued functions (such as pres-

sure and temperature) but also with vector-valued functions (e.g., velocity fields). For


Figure A.2: David Hilbert (1862–1943)

Figure A.3: Augustin Louis Cauchy (1789–1857)

vector-valued functions with m components, that is, u, v : Ω → R^m, the treatment is essentially the same as for scalar functions.

Consider a domain Ω ⊂ R^nd, nd ≥ 1, and denote by [H^k(Ω)]^m the space of vector functions with m components

u = (u₁, u₂, ..., u_m), (A.8)

for which each component u_i ∈ H^k(Ω), 1 ≤ i ≤ m. The space [H^k(Ω)]^m is equipped with an inner product inducing the norm

||u||_k = ( Σ_(i=1)^(m) ||u_i||_k² )^(1/2). (A.9)

For the particular case of functions belonging to [L²(Ω)]^m = [H⁰(Ω)]^m the inner product is given by

(u, v) = ∫_Ω u · v dΩ, (A.10)

and there should be no ambiguity in using the same notation to represent the inner

product of both scalar and vector-valued functions.

Appendix B

Extended Summary in Spanish (Resumen Extendido en Castellano)

Técnicas de Descomposición de Dominio y Programación Distribuida en Mecánica de Fluidos Computacional

B.1 The Domain Decomposition Method in Computational Fluid Dynamics

The diversity of time and space scales present in problems related to fluid mechanics and its interaction with solid bodies (e.g., surface and subsurface hydrology problems, coupled or not, air flow around bodies or vehicles, etc.) requires a high degree of refinement in the meshes used in the finite element method and, therefore, demands large computational resources.

The solution of 'large scale' problems in computational mechanics poses a particular challenge, namely to use the available resources efficiently [LTV97, SP96]. If adequate numerical techniques are not used to reduce, optimize and/or simplify the problem, large computational resources are needed to treat it. On the other hand, the advent of ever faster computers with greater computing capacity means that the problems one wants to solve become ever larger and more complex (i.e., larger and more varied scales, coupling of different fields, models that take into account additional variables and their evolution and interaction with the others, etc.). Thus, the mathematical models become increasingly complex and sophisticated, making the simulations of the resulting systems long and complicated. The restriction imposed by the available computational resources is always present, hence the urgency in developing and verifying solution techniques capable of efficiently exploiting the potential of modern computers and of obtaining good quality solutions in an acceptable simulation time (CPU time). The present thesis is born from this need.

For several decades, techniques have been developed and tested for the solution of the linear problems that result from the application of the finite element method (FEM) to partial differential equations (PDEs) describing a set of physical phenomena (e.g., solid mechanics, structural dynamics, fluid dynamics, etc.). Until not long ago, the direct solution of these systems was preferred to the iterative solution because of its greater robustness and the predictive character of its behavior. However, the large number of iterative techniques that have been developed, together with the need to solve ever larger systems of equations on different architectures, has resulted in an inclination towards the use of this type of technique and the development of new ones.

This trend has been under way since 1970, when two important developments marked a turning point in the solution of large systems of equations. One was the exploitation of the sparsity (sparse matrices, or matrices with a low fill ratio) of the systems that result from the application of the FEM (as well as of the finite difference method, FDM) to the PDEs. The other was the development of methods such as Krylov methods (or conjugate-gradient-like methods). Gradually, iterative methods (preconditioning plus iteration in the Krylov space) began to approach in quality the solutions provided by direct methods. In particular, much has been written about the preconditioned conjugate gradient method for the symmetric linear systems that result from symmetric operators (e.g., linear and non-linear elasticity, potential flow, etc.).

Today, the large systems of equations obtained from non-linear PDEs via the FEM for transient problems in two and three dimensions, where there may be several unknowns per node, are solved with iterative methods on high performance computers (parallel or vector architectures), since they require much less communication between processors than is needed in direct methods, where the solution of each unknown is coupled with all the others.

The substructuring method (or domain decomposition method with iteration on the Schur complement matrix for non-overlapping domains) leads to reduced systems that are better conditioned for solution by Krylov space iteration methods. The condition number of these problems is reduced by a factor 1/h (∝ 1/h vs. ∝ 1/h² for the global system, h being the characteristic mesh size), and the computational cost per iteration is not increased, since the matrices corresponding to the sub-domain degrees of freedom have already been factorized. The efficiency of these methods can be improved through the use of preconditioners [Man93, BPS86, Cro02]. Different preconditioning techniques have been proposed, and the reduction of the condition number of the matrices has been demonstrated in the framework of linear elliptic differential equations (e.g., wire-basket and Neumann-Neumann preconditioners and their variants for elasticity and Stokes flow problems).

In this work we seek to solve efficiently the systems of equations arising from the FEM or FDM discretization of non-linear partial differential equations that represent numerical models of real problems (such as those described above), considered a challenge for current computational methods, using advanced programming techniques that integrate several programming paradigms, i.e., object-oriented programming and distributed (parallel) computing. Another objective is the development of an object-oriented finite element code (which drastically reduces the implementation dependencies between subsystems and leads to the principle of reusability of interface designs) that solves large scale computational fluid dynamics problems in a distributed fashion by means of the message passing technique (MPI/PETSc [GLS94, BGCMS04]). This technique is widely exploitable on parallel computer architectures, such as Beowulf clusters [SSBS99].

B.2 Governing Equations

The work is focused on the solution of the Navier-Stokes equations for compressible and incompressible fluids, stabilized with the SUPG/PSPG method proposed by Tezduyar et al. [TMRS92]. The Navier-Stokes equations present two important difficulties for their solution by the finite element method. The first difficulty is that as the Reynolds number of the flow increases, their advective character becomes ever stronger (with the problems this entails regarding the stability of the solution [BH82]). In addition, in the incompressible case, the incompressibility condition represents a constraint on the equations rather than an evolution equation for the quantities involved. Because of this, only certain combinations of the interpolation spaces for velocities and pressures can be used (i.e., the Ladyzhenskaya-Brezzi-Babuska condition must be satisfied [Bre74], see appendix §B.5). In the work of Tezduyar et al. the advective term is stabilized with the well known SUPG (Streamline Upwind Petrov/Galerkin) method. A similar term, called PSPG (Pressure Stabilizing Petrov/Galerkin), is introduced to stabilize the pressure. Once the equations are discretized in space, the resulting system of ordinary differential equations is discretized in time by means of the trapezoidal rule.

Se abordara tambien la integracion temporal de las ecuaciones de Navier-Stokes con

esquemas de tipo desagregado como lo es el de Paso Fraccionado (por Fractional Step)

propuesto inicialmente por Chorin [Cho73] y Temam [Tem69]. A diferencia del esquema

monolıtico anteriormente senalado, los esquemas desagregados desacoplan el campo de

presiones del campo de velocidades solucionando por separado ambos problemas. Este

desacople permite para una dada potencia de calculo aumentar en gran medida el numero

de grados de libertad del problema (y por consiguiente refinar la malla o agrandar el

dominio de calculo si el problema lo requiriese). Una gran desventaja de este metodo es

que por razones de estabilidad (que seran contempladas en el contexto de esta tesis) el

paso de integracion esta acotado y es necesario un mayor tiempo de simulacion para

obtener soluciones estacionarias/periodicas.

B.2.1 Propiedades Continuas de los Fluidos

La idea de tratar a un medio fluido como un medio continuo es natural y ‘familiar’. Sin

embargo es necesario rever la hipotesis del continuo para evitar confusion cuando se habla

de ‘partıculas de fluido’ y ‘elementos materiales infinitesimales’. Las longitudes de escala y

tiempo del movimiento molecular son extremadamente pequenas en relacion a las escalas

que el hombre maneja habitualmente. Si consideramos al aire en condiciones normales

atmosfericas, por ejemplo, el espacio medio entre moleculas es 3 · 10⁻⁹ m, el camino libre medio, λp, es 6 · 10⁻⁸ m y el tiempo medio entre sucesivas colisiones de una

molecula es 10−10 s. En comparacion, la menor escala geometrica ` que se pueda considerar

en el flujo de un fluido es a menudo menor que 0,1 mm = 10−4 m, que para velocidades de

hasta 100 m/s supone una escala de tiempo mayor a 10−6 s. Por lo tanto, inclusive para

este ejemplo donde las escalas de tiempo y longitud del flujo son pequenas, estas exceden

las escalas moleculares en varios ordenes de magnitud.


La separacion de las escalas de longitud esta cuantificada por el numero adimensional

de Knudsen Kn = λp/`. En el ejemplo anterior, Kn es menor que 10−3, y en general para

la aproximacion del continuo es necesario un numero de Knudsen Kn ≪ 1.

Para un Kn muy pequeno, existen escalas de longitud intermedias `∗, tal que `∗ es

grande comparado con las escalas moleculares pero pequeno comparado con las escalas

del flujo (i.e., λp ≪ `∗ ≪ `).

Las propiedades continuas de los fluidos pueden ser tomadas como las propiedades

moleculares promediadas en un volumen de tamano V = (`∗)3. Sea Vx una region esferica

de volumen V centrado en el punto x. Entonces en el tiempo t la densidad del fluido

ρ(x, t) es la masa de las moleculas contenidas en Vx dividida por el volumen V .

De igual manera la velocidad del fluido u(x, t) es la velocidad promedio de las molecu-

las dentro de Vx. Debido a la separacion de escalas la dependencia de las propiedades

continuas con la eleccion de `∗ es despreciable.

Es importante apreciar que cuando usamos la hipotesis del continuo para obtener

campos continuos, como ρ(x, t) y u(x, t), podemos dejar de lado toda nocion de naturaleza

discreta (molecular) en las propiedades del fluido; todas las escalas moleculares dejan de

ser relevantes. Tambien podemos considerar diferencias en las propiedades a lo largo de

distancias menores a las escalas moleculares: por lo tanto se pueden definir gradientes de

esas propiedades bajo la hipotesis del continuo,

∂ρ/∂x₁ ≡ lım_{h→0} (1/h)[ρ(x₁ + h, x₂, x₃, t) − ρ(x₁, x₂, x₃, t)]. (B.1)

B.2.2 Campos Lagrangianos y Eulerianos

Los campos continuos densidad y velocidad, ρ(x, t) y u(x, t), son campos ‘Eulerianos’ en el

sentido de que son funcion de la posicion x en un marco inercial. El punto de partida para

la descripcion ‘Lagrangiana’ es la definicion de la ‘partıcula fluida’ que es un concepto o

propiedad continua. Por definicion, una partıcula fluida es un punto que se mueve con la

velocidad local del fluido: X+(t,Y) denota la posicion en el tiempo t de una partıcula

fluida que esta ubicada en Y con respecto a un sistema de referencia fijo en el tiempo t0,

ver Figura B.1.

Matematicamente, la posicion de la partıcula fluida X+(t,Y) esta definida por dos ecuaciones. Primero, la posicion en el tiempo de referencia t0 es definida como

X+(t0,Y) = Y. (B.2)

Segundo, la ecuacion

∂X+(t,Y)/∂t = u(X+(t,Y), t), (B.3)


Figura B.1: Trayectoria de una partıcula (posiciones Y en t0 y X+(t1,Y) en t1)

expresa el hecho que la partıcula de fluido se mueve con la velocidad local del fluido. Dado

el campo de velocidad Euleriano u(x, t), entonces, para cualquier Y, la ecuacion (B.3) puede ser integrada hacia adelante y hacia atras en el tiempo, con la condicion inicial (B.2), para obtener X+(t,Y) para todo t.
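A modo ilustrativo, el siguiente esquema minimal (no forma parte del codigo de la tesis) integra numericamente la ecuacion (B.3) con un metodo de Euler explıcito; el campo de velocidades u(x, t), el paso de tiempo y los valores numericos son supuestos elegidos solo para el ejemplo.

#include <array>
#include <cstdio>

// Campo de velocidades euleriano de ejemplo (rotacion rigida en el plano).
// En la practica u(x,t) proviene de la solucion numerica de las ecuaciones
// de gobierno; aqui es solo un supuesto ilustrativo.
static std::array<double,2> u(const std::array<double,2>& x, double /*t*/) {
  return { -x[1], x[0] };
}

int main() {
  std::array<double,2> X = {1.0, 0.0};      // posicion de referencia Y en t0
  const double t0 = 0.0, tf = 3.141592653589793, dt = 1.0e-4;
  // Integracion explicita (Euler hacia adelante) de dX+/dt = u(X+, t), ec. (B.3)
  for (double t = t0; t < tf; t += dt) {
    std::array<double,2> v = u(X, t);
    X[0] += dt * v[0];
    X[1] += dt * v[1];
  }
  // Para media vuelta de la rotacion rigida se espera X+ ~ (-1, 0)
  std::printf("X+(tf, Y) = (%.4f, %.4f)\n", X[0], X[1]);
  return 0;
}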

Los campos Lagrangianos como la velocidad y la densidad pueden ser definidos en

contrapartida por la forma Euleriana como,

ρ+(t,Y) ≡ ρ(X+(t,Y), t), (B.4)

u+(t,Y) ≡ u(X+(t,Y), t). (B.5)

Notar que los campos Lagrangianos ρ+ y u+ son escritos en funcion de la posicion

Y en el tiempo de referencia t0. Por lo tanto Y es llamada coordenada Lagrangiana o

coordenada material.

Para Y fijo, X+(t,Y) define una trayectoria (en el espacio (x, t)) que es el camino que recorre la partıcula y, de manera similar, ρ+(t,Y) es la densidad de la partıcula fluida. La derivada parcial ∂ρ+(t,Y)/∂t es la razon de cambio de la densidad en el punto Y

(fijo), por lo tanto, siguiendo a la partıcula fluida, de la ecuacion (B.4) se puede escribir


(se asume el uso de notacion indicial o de Einstein)

∂ρ+(t,Y)/∂t = ∂/∂t [ρ(X+(t,Y), t)]
            = (∂ρ(x, t)/∂t)_{x=X+(t,Y)} + (∂X+_i(t,Y)/∂t) (∂ρ(x, t)/∂xi)_{x=X+(t,Y)}
            = (∂ρ(x, t)/∂t + ui(x, t) ∂ρ(x, t)/∂xi)_{x=X+(t,Y)}
            = (Dρ(x, t)/Dt)_{x=X+(t,Y)} ,   (B.6)

donde la derivada material o substancial esta definida por

D/Dt ≡ ∂/∂t + ui ∂/∂xi = ∂/∂t + u · ∇. (B.7)

Por lo tanto la razon de cambio de la densidad siguiendo a la partıcula esta dada por la

derivada parcial del campo Lagrangiano (i.e., ∂ρ+/∂t) y la derivada substancial del campo

Euleriano (i.e., Dρ/Dt). Tambien, para Y fijo, u+(t,Y) es la velocidad de la partıcula

fluida, entonces

∂u+(t,Y)/∂t = (Du(x, t)/Dt)_{x=X+(t,Y)} (B.8)

es la razon de cambio de la velocidad de la partıcula fluida o aceleracion de la partıcula

fluida.

Una partıcula fluida es tambien llamada un punto material y es definido por el punto

Y en el tiempo t0 y por su movimiento con la velocidad local del fluido (ecuacion (B.3)).

Lıneas materiales, superficies y volumenes son definidos de la misma manera. Por ejemplo,

consideremos en el tiempo t0 una superficie simple cerrada S0 que rodea al volumen V0.

La correspondiente superficie material S(t) esta definida coincidiendo con S0 en el tiempo

t0, y por la propiedad de que todo punto en S(t) se mueve con la velocidad local del fluido.

Por lo tanto, S(t) esta compuesta por las partıculas fluidas X+(t,Y) que en t0 componen

a S0:

S(t) ≡ {X+(t,Y) : Y ∈ S0}. (B.9)

Debido a que la superficie material se mueve con el fluido, la velocidad relativa entre

la superficie y el fluido es cero. En consecuencia una partıcula fluida no puede atravesar

una superficie material ni puede haber flujo de masa a traves de ella.


B.2.3 La Ecuacion de Continuidad

La ecuacion de conservacion de masa es un postulado general de naturaleza cinematica,

es decir, independiente de la naturaleza del fluido o de las fuerzas actuando sobre el y

expresa el hecho de que la materia no se puede crear ni destruir de un dado sistema. Por

lo tanto no existen flujos difusivos en el transporte de masa, lo que quiere decir que la

masa solo se transporta por medio del mecanismo de conveccion (o adveccion). Vamos a

suponer en esta tesis que en el flujo de un fluido no se producen reacciones quımicas.

Entonces la ecuacion de balance de masa en forma integral es

∂/∂t ∫_{V(t)} ρ dV + ∫_{S(t)} ρ ui (dS)i = 0, (B.10)

que en su forma diferencial es

∂ρ/∂t + ∂(ρui)/∂xi = 0, ∀ (x, t) en V × (t0,∞), (B.11)

cuya forma intrınseca se escribe

∂ρ/∂t + ∇ · (ρu) = 0, ∀ (x, t) en V × (t0,∞), (B.12)

o tambien

Dρ/Dt + ρ ∇ · u = 0, ∀ (x, t) en V × (t0,∞). (B.13)

Teniendo en cuenta el volumen especıfico del fluido V(x, t) = 1/ρ(x, t) se puede escribir

D(ln V)/Dt = ∇ · u, ∀ (x, t) en V × (t0,∞). (B.14)

El termino izquierdo es la razon de cambio logarıtmica del volumen especıfico mientras que

la dilatacion ∇·u es la razon de cambio logarıtmica de un volumen material infinitesimal.

Por lo tanto la ecuacion de continuidad puede ser vista como una condicion de consistencia

entre el cambio del volumen especıfico siguiendo a la partıcula fluida y el cambio en el

volumen de un elemento material infinitesimal.

Si el fluido es incompresible, o si el flujo es tal que ρ no es una funcion de la posicion x

y del tiempo t, la ecuacion de evolucion (B.11) se transforma en una condicion cinematica

en la cual el campo de velocidades es solenoidal,

∇ · u = 0. (B.15)

Ademas, para este tipo de flujo (o fluido) se puede verificar facilmente que

J(t,Y) ≡ det(∂X+_i(t,Y)/∂Yj) = 1, (B.16)

y que el lado izquierdo de la ecuacion (B.14) se anula.


B.2.4 La Ecuacion de Cantidad de Movimiento

La ecuacion de momento o cantidad de movimiento basada en la segunda ley de Newton

relaciona la aceleracion de la partıcula fluida Du/Dt con las fuerzas de superficie (traccio-

nes) y las fuerzas de volumen que el fluido soporta. En general, las fuerzas de superficie

que son de origen molecular son descriptas por una ecuacion constitutiva que relaciona el

tensor de tensiones τij(x, t) con las fuerzas de presion y las velocidades de deformacion

que se dan en el flujo. La forma mas general para un fluido newtoniano es

τij = −P δij + µ (∂ui/∂xj + ∂uj/∂xi − (2/3) (∂uα/∂xα) δij), (B.17)

donde P es la presion, µ es el coeficiente (constante) de viscosidad dinamica y δij es el

tensor Delta de Kronecker.

Tambien hay que considerar dentro de las fuerzas de superficie a las fuerzas externas

fext. Las fuerzas de volumen que consideramos son las fuerzas de gravedad. Siendo Ψ

el potencial gravitatorio (i.e., la energıa potencial por unidad de masa asociada a la

gravedad), la fuerza de volumen por unidad de masa es,

g = −∇Ψ, (B.18)

que para un campo gravitacional constante resulta Ψ = gz, donde g es la aceleracion

de la gravedad y z es la coordenada vertical. Estas fuerzas hacen que el fluido se acelere

de acuerdo a la ecuacion integral de conservacion de la cantidad de movimiento,

∂/∂t ∫_{V(t)} ρui dV + ∫_{S(t)} ρuiuj (dS)j = ∫_{V(t)} ρ(fext)i dV + ∫_{S(t)} τij (dS)j − ∫_{V(t)} ρ (∂Ψ/∂xi) dV. (B.19)

Aplicando el teorema de Gauß,

∫_{V(t)} ( ∂(ρui)/∂t + ∂(ρuiuj)/∂xj − ρ(fext)i − ∂τij/∂xj + ρ ∂Ψ/∂xi ) dV = 0, (B.20)

por lo tanto la forma diferencial es

∂(ρui)/∂t + ∂(ρuiuj)/∂xj = ρ(fext)i + ∂τij/∂xj − ρ ∂Ψ/∂xi , ∀ (x, t) en V × (t0,∞). (B.21)

o tambien

D(ρui)/Dt = ρ(fext)i + ∂τij/∂xj − ρ ∂Ψ/∂xi , ∀ (x, t) en V × (t0,∞). (B.22)

Si escribimos la ecuacion (B.21) en forma intrınseca queda

∂(ρu)/∂t + ∇ · (ρ u ⊗ u) = ρ fext + ∇ · τ − ρ ∇Ψ. (B.23)


Figura B.2: Sistema de referencia no inercial (sistema inercial (x, y, z), sistema rotante (x′, y′, z′), velocidad angular ω, vectores posicion r y R)

B.2.5 Las Ecuaciones de Navier-Stokes en Sistemas de Referencia No Inerciales

En muchos problemas de la mecanica de fluidos es necesario adoptar un sistema de re-

ferencia no inercial. Tal es el caso de flujos oceanicos, en rıos, flujo en turbomaquinas,

y helices, donde pueden usarse sistemas de referencia rotantes.

Asumimos que el sistema esta rotando con velocidad angular ~ω(t) alineada con el eje

z (ver Figura B.2). La variable w es el campo de velocidades relativo al sistema que rota y v = ~ω(t) × ~r la velocidad del punto P debido a la rotacion (i.e., es normal a ~ω(t) y a ~r). Debido a que esta velocidad de rotacion no contribuye al balance de masa, la ecuacion de continuidad queda

invariante y puede ser escrita en el sistema rotante como,

∇ · ~w = 0. (B.24)

En relacion a la ecuacion de conservacion de cantidad de movimiento, dos observadores

ubicados en sendos sistemas de referencia (inercial y no inercial) verıan distintos campos

de fuerzas debido a que el termino inercial Du/Dt no es un invariante cuando se pasa de

un sistema a otro. Es necesario adicionar tres fuerzas debido a la rotacion del sistema de

referencia: La fuerza de Coriolis, la fuerza centrıfuga y la fuerza debida a la variacion de

~ω(t). La fuerza de Coriolis por unidad de masa es

fC = −2(~ω × ~w), (B.25)


la fuerza centrıfuga por unidad de masa es

fc = −~ω × (~ω × ~r) = |~ω|² ~R, (B.26)

y la fuerza debida al cambio de ~ω(t) por unidad de masa es

fr = −(d~ω(t)/dt) × ~r. (B.27)

La ecuacion de conservacion de momento queda ahora

∂wj/∂t + wi ∂wj/∂xi = −(1/ρ) ∂P/∂xj + ν ∂²wj/(∂xi∂xi) − ∂Ψ/∂xj + (fC + fc + fr)j , ∀ (x, t) en V × (t0,∞). (B.28)

B.2.6 Las Ecuaciones de Navier-Stokes Incompresibles

Si consideramos el flujo de un fluido newtoniano incompresible (con propiedades constan-

tes), la ecuacion constitutiva se escribe de la forma

τij = −P δij + µ (∂ui/∂xj + ∂uj/∂xi). (B.29)

Recordando que para fluidos incompresibles el campo de velocidades es solenoidal (i.e.,

∂ui/∂xi = 0), entonces la ecuacion constitutiva resulta de la suma de un termino isotropi-

co, −Pδij y de un termino deviatorico. La ecuacion de momento (B.22) se escribe

ρ Duj/Dt = µ ∂²uj/(∂xi∂xi) − ∂P/∂xj − ρ ∂Ψ/∂xj , ∀ (x, t) en V × (t0,∞), (B.30)

que es la ecuacion de Navier-Stokes (vectorial) para la conservacion de la cantidad de

movimiento. Si escribimos a la viscosidad cinematica como ν = µ/ρ, la ecuacion (B.30)

queda

Duj/Dt = ν ∂²uj/(∂xi∂xi) − (1/ρ) ∂P/∂xj − ∂Ψ/∂xj , ∀ (x, t) en V × (t0,∞). (B.31)

Reescribiendo el lado izquierdo de la ecuacion (B.31), la forma Euleriana no conservativa

de la ecuacion de conservacion de la cantidad de movimiento queda

∂uj/∂t + ui ∂uj/∂xi = −(1/ρ) ∂P/∂xj + ν ∂²uj/(∂xi∂xi) − ∂Ψ/∂xj , ∀ (x, t) en V × (t0,∞). (B.32)

Para que el problema quede bien planteado en el sentido de Hadamard ([Had02], ver

apendice §B.5) se deben definir condiciones para el vector velocidad en la frontera S(t)

del volumen V(t) e iniciales a las ecuaciones (B.31) y (B.32). Por ejemplo, en una pared

solida estacionaria cuyo vector normal unitario es n, las condiciones de borde que debe

satisfacer el campo de velocidades son la condicion de impermeabilidad

n · u = 0, ∀ (x, t) en S× (t0, tn), (B.33)


y la condicion de no deslizamiento

u− n(n · u) = 0, ∀ (x, t) en S× (t0, tn), (B.34)

(que de manera conjunta hacen u = 0 en la frontera). La condicion inicial es del tipo

u = u0, ∀ (x, t) en V × 0. (B.35)

Estas condiciones son del tipo mas simple que podemos plantear en este capıtulo, dejando

para posteriores capıtulos y secciones el tratamiento de condiciones de borde e iniciales

mas especiales, que se ajustan a distintos problemas y modelos (i.e., flujos turbulentos, pa-

redes moviles, condiciones de borde absorbentes, etc.). Tambien hay que agregar aquı que

se necesita la definicion de la presion P en un punto de V, debido a que esta variable debe satisfacer una ecuacion de tipo elıptico, como se vera mas adelante, y la solucion para este problema esta definida a menos de una constante.

En algunos casos puede ser interesante considerar el caso hipotetico de un fluido ideal

que no posee viscosidad (o es invıscido) para el cual el tensor de tensiones se define

isotropico (i.e., τij = −pδij). La ecuacion de conservacion de momento resultante es la

ecuacion de Euler incompresible

Duj/Dt = −(1/ρ) ∂p/∂xj − ∂Ψ/∂xj , ∀ (x, t) en V × (t0,∞). (B.36)

Debido a que estas ecuaciones no contienen derivadas segundas de la velocidad requieren

diferentes condiciones de borde que las ecuaciones de Navier-Stokes incompresibles. Por

ejemplo, en el caso de una frontera solida estacionaria solo la condicion de impermeabilidad puede imponerse, mientras que en general las velocidades tangenciales resultan no nulas.

El Rol de la Presion

El rol de la presion en las ecuaciones de Navier-Stokes incompresible (densidad constante)

requiere ser comentado. Primero hay que observar que las tensiones isotropicas y las fuer-

zas de volumen conservativas tienen el mismo efecto. Por lo tanto pueden ser agrupadas

dentro de un mismo termino (i.e., la presion modificada), p = P + ρΨ. Ası, las fuerzas de

volumen no tienen influencia sobre el campo de velocidades ni sobre el campo de presiones

modificadas (en contraste con los flujos de fluido de densidad variable donde las fuerzas de

boyancia pueden resultar significantes). Por lo tanto, de aquı en adelante nos referiremos

simplemente a p como la ‘presion’.

Es costumbre tomar a la presion como una variable termodinamica que es funcion de

la densidad y de la temperatura a traves de una ecuacion de estado. Sin embargo, para


fluidos con densidad constante no hay ninguna vinculacion con la densidad (ni con la

temperatura) y una vision distinta de esta variable es necesaria.

Si aplicamos el operador divergencia a las ecuaciones de Navier-Stokes, ecuacion (B.30),

sin asumir que el campo de velocidades es solenoidal pero denotando a ∆ como la dilata-

cion (o razon de dilatacion, ∆ = ∇ · u), el resultado es

(D/Dt − ν∇²) ∆ = R, (B.37)

siendo

R = −(1/ρ) ∇²p − (∂ui/∂xj)(∂uj/∂xi). (B.38)

Consideremos la solucion a la ecuacion (B.37) con condiciones iniciales y de contorno

∆ = 0. La solucion es ∆ = 0 si y solo si R es cero en todo V que a su vez implica (por la

ecuacion (B.38)) que p debe satisfacer la ecuacion de Poisson

∇²p = S ≡ −ρ (∂ui/∂xj)(∂uj/∂xi). (B.39)

La verificacion de la ecuacion de Poisson es una condicion necesaria y suficiente para que

un campo de velocidades inicialmente solenoidal permanezca solenoidal.

En una pared solida estacionaria la ecuacion (B.30) se reduce a

∂p/∂n = µ ∂²un/∂n², (B.40)

donde n es la coordenada en la direccion normal a la pared y un la proyeccion de la

velocidad sobre n. Esta ecuacion provee una condicion de borde del tipo Neumann para el problema de Poisson (ecuacion (B.39)). Dada esta condicion y la ecuacion (B.39), la presion p queda determinada a menos de una constante ([Fol76]).
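A modo ilustrativo, el siguiente esquema minimal (no es el codigo de la tesis) resuelve una ecuacion de Poisson como la (B.39) en una grilla regular con diferencias finitas de 5 puntos e iteraciones de Jacobi; el termino fuente S y las condiciones Dirichlet homogeneas son supuestos simplificadores (en el texto la condicion natural en paredes es de tipo Neumann, ec. (B.40), y p queda definida a menos de una constante).

#include <vector>
#include <cmath>
#include <cstdio>

int main() {
  const int N = 50;                       // nodos por lado (valor supuesto)
  const double h = 1.0 / (N - 1);
  const double PI = 3.141592653589793;
  std::vector<double> p(N*N, 0.0), pn(N*N, 0.0), S(N*N, 0.0);

  // Termino fuente de ejemplo; en el texto S = -rho (dui/dxj)(duj/dxi)
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      S[i*N + j] = -2.0 * PI * PI * std::sin(PI*i*h) * std::sin(PI*j*h);

  // Iteraciones de Jacobi para lap(p) = S con p = 0 en el borde
  for (int it = 0; it < 10000; ++it) {
    for (int i = 1; i < N-1; ++i)
      for (int j = 1; j < N-1; ++j)
        pn[i*N + j] = 0.25 * (p[(i+1)*N + j] + p[(i-1)*N + j] +
                              p[i*N + j+1] + p[i*N + j-1] - h*h*S[i*N + j]);
    p.swap(pn);
  }
  // Para este S de ejemplo la solucion exacta en el centro vale ~1
  std::printf("p(0.5, 0.5) ~ %.4f\n", p[(N/2)*N + N/2]);
  return 0;
}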

B.3 Formulacion de otros Modelos Matematicos a Tratar

B.3.1 Problemas Hidrologicos

Se abordara tambien en esta tesis el desarrollo de un modulo de calculo en paralelo orien-

tado a objetos que sea capaz de modelar problemas (en una gran diversidad de escalas)

de hidrologıa superficial y subterranea acoplada∗ (proyecto PID-99/74 FLAGS de la ANPCyT). Es necesario para esta parte del trabajo desarrollar/adaptar herramientas

que no son estandar en mecanica de fluidos computacional y que resultan de gran impor-

tancia para el tratamiento de este tipo de problemas, como por ejemplo lo es la generacion

de mallas de calidad a partir de los modelos digitales de terrenos y la interpolacion de pro-

piedades fısicas a partir de datos medidos en campo que son necesarias en la modelacion.

Este desarrollo constituye un aporte original de la tesis.

∗ La importancia de la escala y la complejidad asociada de un modelo en hidrologıa fue abordada por Dooge (1998), quien dijo que ‘en orden de predecir el comportamiento de una cuenca en forma confiable, o bien debemos resolver modelos extremadamente complejos basados en las leyes fısicas de los procesos involucrados, y que tengan en cuenta la variabilidad espacial de los diversos parametros, o bien debemos resolver modelos realistas a escala de cuenca en los que el efecto global de esas propiedades variables en el espacio este parametrizado de alguna manera’, enfatizando tambien la inexistencia de un principio de similitud en hidrologıa que describa el comportamiento de una cuenca.

Flujo Superficial

El flujo a superficie libre que escurre tanto sobre la superficie del terreno como concentrado

en canales abiertos, constituye la respuesta de dinamica mas rapida de una cuenca hıdrica

ante la solicitacion ocasionada por la precipitacion caıda en parte o en la totalidad de su

superficie.

Si L0 representa una longitud tıpica donde se observan variaciones apreciables en

la dinamica del escurrimiento, y h0 representa un espesor medio de la capa en dicha

distancia, la aproximacion de aguas poco profundas (u ondas largas) se basa en asumir

que h0/L0 1. Pueden derivarse las ecuaciones de Saint-Venant para el flujo en canales

abiertos a partir de las ecuaciones de Navier-Stokes integrando las variables en la direccion

vertical. Estas ecuaciones, en forma matricial conservativa, resultan (se usa la convencion de Einstein para la suma)

∂U/∂t + ∂Fi(U)/∂xi = G(U), i = 1, 2, sobre Ωst × (0, t], (B.41)

donde Ωst es el dominio de calculo (i.e., el rıo) y U = (h, hw, hv)T es el vector de estado.

Las funciones de flujo advectivo en la ecuacion (B.41) son

F1(U) = (hw, hw² + gh²/2, hwv)^T ,
F2(U) = (hv, hwv, hv² + gh²/2)^T , (B.42)

donde h es la altura de la superficie libre del canal con respecto al fondo del mismo,

u = (w, v)T es el vector velocidad y g es la aceleracion debida al campo gravitatorio. Si

Gs representa la recarga o las perdidas del rıo, el termino fuente es

G(U) = (Gs, gh(S0x − Sfx) + fc hv + Cf ϖx|ϖ|, gh(S0y − Sfy) − fc hw + Cf ϖy|ϖ|)^T (B.43)



donde S0 es la pendiente del fondo y Sf es la pendiente de friccion dada por

Sfx = w|u|/(Ch h), Sfy = v|u|/(Ch h), Modelo de Chezy,
Sfx = n² w|u|/h^{4/3}, Sfy = n² v|u|/h^{4/3}, Modelo de Manning, (B.44)

donde Ch y n (la rugosidad de Manning) son constantes del modelo. Generalmente el

efecto de la fuerza de Coriolis, referida al factor de Coriolis fc, debe ser tenido en cuenta

cuando se estudian casos en grandes lagos, anchos rıos o estuarios. El factor de Coriolis

esta dado por fc = 2ω sinψ, donde ω es la velocidad de rotacion de la tierra y ψ es la

latitud del area en estudio. Las tensiones en la superficie libre en la ecuacion (B.43) son

expresadas como el producto entre un coeficiente de friccion y una forma cuadratica de

la velocidad del viento, ϖ = (ϖx, ϖy), con

Cf = cϖ ρair/ρ, (B.45)

donde cϖ es funcion de la velocidad del viento.
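Como ejemplo ilustrativo (no es codigo de la tesis), la siguiente funcion evalua las pendientes de friccion de la ecuacion (B.44) segun los modelos de Chezy y de Manning; los nombres de variables y los valores numericos son supuestos.

#include <cmath>
#include <cstdio>

// h: tirante; (w, v): componentes de la velocidad; Ch y n: constantes de
// Chezy y de Manning respectivamente (valores de prueba, solo ilustrativos).
struct Sf { double x, y; };

Sf chezy(double h, double w, double v, double Ch) {
  double mod_u = std::sqrt(w*w + v*v);
  return { w*mod_u/(Ch*h), v*mod_u/(Ch*h) };
}

Sf manning(double h, double w, double v, double n) {
  double mod_u = std::sqrt(w*w + v*v);
  double f = n*n/std::pow(h, 4.0/3.0);
  return { f*w*mod_u, f*v*mod_u };
}

int main() {
  Sf a = chezy(2.0, 1.0, 0.5, 50.0);
  Sf b = manning(2.0, 1.0, 0.5, 0.03);
  std::printf("Chezy:   Sfx=%.3e Sfy=%.3e\n", a.x, a.y);
  std::printf("Manning: Sfx=%.3e Sfy=%.3e\n", b.x, b.y);
  return 0;
}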

Flujo Subterraneo Saturado

La ecuacion de movimiento para el medio saturado parte de considerar la conservacion de

la masa. Si φ representa el potencial piezometrico en un punto del acuıfero, suma de la

energıa potencial gravitatoria mas la contribucion de la presion, la ecuacion de movimiento

del agua en el medio poroso se reduce a

∂/∂t (S(φ − η) φ) = ∇ · (K(φ − η) ∇φ) + Σ Ga, sobre Ωaq × (0, t], (B.46)

donde Ωaq representa el dominio para el flujo subterraneo, η es una posicion de referencia

(datum), K representa el tensor de conductividades hidraulica y S el coeficiente de alma-

cenamiento especıfico que puede considerarse como la cantidad de agua almacenada que

se libera por unidad de volumen del acuıfero cuando el potencial disminuye una unidad.

Condiciones de Borde para Simular la Interaccion Rıo-Acuıfero/Termino de Acople

El proceso de interaccion Rıo-Acuıfero ocurre entre un rıo (o canal, en general) y el

acuıfero adyacente. El termino de acople no esta incluido explıcitamente en la ecuacion

(B.46) pero se trata como una integral de flujo en el borde. En un punto nodal podemos

escribir el acople como

Gs = P/Rf (φ− hb − h), (B.47)


donde Gs representa la recarga o perdida del rıo al acuıfero adyacente, y Rf es el factor de

resistencia por unidad de longitud de arco del perımetro de la seccion. La correspondiente

recarga del acuıfero es

Ga = −Gs δΓs , (B.48)

donde Γs representa la curva del rıo (en el plano del acuıfero) y δΓs es una distribucion

delta de Dirac con una intensidad unitaria por unidad de longitud, es decir

∫ f(x) δΓs dΣ = ∫_0^L f(x(s)) ds. (B.49)

Las ecuaciones (B.41) y (B.46) acopladas, con las condiciones de borde e iniciales apropia-

das, constituyen un sistema fuertemente acoplado no lineal de muy difıcil resolucion. El

acoplamiento se da a traves de las condiciones de borde que representan los mecanismos

de transferencia de masa entre cada subsistema. La solucion simultanea de las ecuaciones

no solo constituye un esfuerzo considerable de calculo sino que es computacionalmente

ineficiente puesto que ignora algunos rasgos fısicos esenciales de la interaccion entre los

subsistemas. El esquema de avance temporal adoptado constituye el algoritmo que mejor

refleja la competencia entre los mecanismos fısicos presentes en el sistema.

Por otra parte, la integracion vertical de la ecuacion de gobierno para el flujo sub-

terraneo tridimensional en cada una de las capas bajo la hipotesis de Dupuit lleva a una

ecuacion bidimensional promediada en la vertical, que es una de las aproximaciones que

se adoptaran en el trabajo.

B.4 Computacion de Alta Performance

B.4.1 Resolucion Numerica del Modelo de CFD/Hidrologıa Superficial y Subterranea

El modelo se implementara en PETSc-FEM, un codigo de Elementos Finitos desarrollado

en el CIMEC de uso general y orientado a problemas multi-fısica (http://www.cimec.

org.ar/petscfem), escrito en C++ y basado en PETSc (‘Portable, Extensible Toolkit for

Scientific Computation’, http://www.mcs.anl.gov/petsc). PETSc es una librerıa de ruti-

nas orientadas a metodos numericos para la resolucion de Ecuaciones en Derivadas Par-

ciales (EDDP’s) y requiere alguna implementacion de la librerıa de paso de mensajes MPI

(Message Passing Interface). En nuestro caso usamos MPICH (http://www.mcs.anl.gov/mpi/mpich)

desarrollado en ANL (Argonne National Laboratory). El particionamiento de la malla es

un punto muy importante para una implementacion eficiente y sera realizado con METIS

(http://www.cs.umn.edu/~metis).
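A modo de ilustracion del modelo de paso de mensajes sobre el que se apoyan PETSc y PETSc-FEM, el siguiente programa minimo (supuesto, no pertenece a PETSc-FEM) realiza una operacion colectiva con MPI; se asume disponible una implementacion de MPI (e.g., MPICH) y un compilador como mpicxx.

#include <mpi.h>
#include <cstdio>

// Cada proceso posee una porcion de los datos y las operaciones globales
// se realizan por comunicacion colectiva (aqui, una suma global).
int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  double local = 1.0 + rank, global = 0.0;   // dato local de cada proceso
  MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  if (rank == 0) std::printf("suma global = %g en %d procesos\n", global, size);
  MPI_Finalize();
  return 0;
}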


Consideraciones Generales

Si bien el problema matematico es bien conocido, las escalas de longitud y tiempo en juego

en problemas de hidrologıa hacen que los tiempos de calculo sean considerables y requieran

el uso de metodologıas especiales, como el empleo de calculo distribuido (paralelismo) y

metodos iterativos. De la misma manera en el tratamiento de flujos complejos (e.g., flujos

a altos Reynolds, a altos Mach y alrededor de cuerpos de variada geometrıa) mediante

las ecuaciones de Navier-Stokes es necesario utilizar modelos numericos que describan los

fenomenos turbulentos que se dan y para esto es necesario un alto grado de refinamiento

en las mallas (ademas de contar con la adaptacion de estas a las estructuras que se forman

en dichos regımenes) que se traduce en un problema de gran escala debido a la cantidad de

grados de libertad necesaria (en una simulacion DNS (por Direct Numerical Simulation)

de las ecuaciones de Navier-Stokes 3D incompresible son necesarios Re^{9/4} nodos con 4

grados de libertad por nodo para captar las estructuras turbulentas que se desarrollan).

El procesamiento distribuido (calculo en paralelo) usando clusters de PC’s de tipo

‘Beowulf’ permite abordar estos tipos de problemas a un costo bajo y con equipamiento

de gran accesibilidad (COTS = ‘Commodities-Off-The-Shelf’).

B.4.2 Solucion de Grandes Sistemas de Ecuaciones

La resolucion de grandes sistemas de ecuaciones algebraicas lineales subyace en la solucion

numerica de problemas de la mecanica del continuo y muchos otros problemas ingenie-

riles, llegando a constituir en muchos casos el principal factor de costo computacional.

Entre las plataformas de calculo, los clusters de microprocesadores han resultado ser una

alternativa muy eficiente y abordable para resolver grandes problemas numericos. Los

metodos clasicos de resolucion se suelen clasificar en directos o iterativos. Los primeros

proporcionan una solucion cerrada pero el principal inconveniente es el costo tanto en

tiempo de procesamiento como en almacenamiento en memoria. La cantidad de operacio-

nes requeridas para una matriz llena es del orden de n³, siendo n la cantidad de incognitas.

Los procedimientos iterativos son entonces preferidos para resolver sistemas grandes. Sin

embargo en grandes sistemas de ecuaciones el numero de condicion de la matriz empeora

y la solucion se dificulta. En la solucion iterativa se hace necesario precondicionar satis-

factoriamente la matriz del sistema de modo de poder obtener la solucion en un numero

razonable de iteraciones. Aun ası se han encontrado casos en que la solucion directamente

no es alcanzable. Hay tecnicas basadas en particionar el dominio y efectuar resoluciones

a niveles de las incognitas de cada subdominio y a nivel de aquellas en la interfaz entre

subdominios. Son las tecnicas de descomposicion de dominio que combinan resoluciones


directas e iterativas. Se requieren adecuados precondicionadores a fin de obtener resultados

satisfactorios.

Metodos Iterativos

Los metodos iterativos se basan en construir una secuencia de soluciones aproximadas

xk (k = 1, 2, . . .) tal que cuando el ındice de la iteracion k →∞ converja a la solucion

del sistema de ecuaciones.

Las formulas de recurrencia pueden presentarse de diferentes maneras. Por ejemplo se

puede plantear

xk = Gxk−1 + c (B.50)

que a partir de una estimacion inicial x0 permita obtener la solucion aproximada en

cada iteracion. Para que el metodo converja es suficiente que ||G|| < 1 (en general, se requiere que el radio espectral de G sea menor que uno). Clasicos metodos

iterativos como Jacobi, Gauss-Seidel o SOR pueden encuadrarse en esta tipologıa.

Procedimientos basados en optimizacion resultan eficientes para la resolucion iterativa

de sistemas de ecuaciones. Entre ellos, el metodo de gradientes conjugados (para matrices

simetricas) o GMRes (para matrices no simetricas) gozan de extendida popularidad. En

ellos las iteraciones se realizan en la forma

xk+1 = xk + αk pk (B.51)

a partir de una estimacion inicial x0. En cada paso es preciso calcular una direccion

de busqueda pk, con formulas propias de cada metodo, y un paso de avance αk en esa

direccion.

Estos metodos, basados en iteracion por subespacios, poseen buenas propiedades de

convergencia. No obstante, si el numero de condicion es alto (lo que tıpicamente sucede en

grandes problemas) es necesario introducir un precondicionamiento. Esto se discutira mas

adelante.

En el algoritmo del Cuadro B.1 se indica el tipo de operacion a realizar. Los calculos

mas pesados son los relativos a producto de matriz por vector. Tambien es necesario

realizar producto interno de vectores y actualizacion de vectores (AXPY).

Si las matrices se distribuyen adecuadamente entre los procesadores, por bloques de

filas, el producto matriz por vector puede realizarse con independencia en cada procesador,

lo mismo que la actualizacion de vectores. El producto interno, sin embargo, requiere

comunicacion para realizar la operacion de reduccion y el posterior envıo del escalar

resultante a todos los procesadores.
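El siguiente esquema ilustrativo (supuesto, no es el codigo de PETSc-FEM) muestra el producto matriz-vector por bloques de filas, realizado en forma independiente en cada proceso, y un producto interno que requiere la reduccion global mencionada (MPI_Allreduce); la matriz densa de ejemplo, las dimensiones y la replicacion del vector son simplificaciones del caso real.

#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int nloc = 4;                 // filas locales (valor supuesto)
  const int n = nloc * size;          // dimension global
  std::vector<double> x(n, 1.0);      // vector global replicado (simplificacion)
  std::vector<double> yloc(nloc, 0.0);

  // Bloque local de filas: matriz densa de ejemplo A_ij = 1/(i+j+1)
  for (int i = 0; i < nloc; ++i) {
    int ig = rank * nloc + i;         // indice global de la fila
    for (int j = 0; j < n; ++j) yloc[i] += x[j] / double(ig + j + 1);
  }

  // Producto interno (y, y): contribucion local + reduccion global
  double dloc = 0.0, dglob = 0.0;
  for (int i = 0; i < nloc; ++i) dloc += yloc[i] * yloc[i];
  MPI_Allreduce(&dloc, &dglob, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

  if (rank == 0) std::printf("||A x||^2 = %g\n", dglob);
  MPI_Finalize();
  return 0;
}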

La matriz del sistema de ecuaciones interviene en el proceso solamente al realizar

el producto matriz por vector. Es importante resaltar que no se precisa disponer de la


matriz global en ningun momento. Para efectuar el producto matriz por vector podrıan

utilizarse las matrices elementales del metodo de elementos finitos y ensamblarse el vector

resultante. Por otra parte la solucion iterativa contiene un error algorıtmico. El costo de

la solucion puede ajustarse segun la tolerancia especificada para el problema.

B.4.3 Metodos de Descomposicion de Dominio

Estos metodos se basan en descomponer el dominio de definicion del problema en sub-

dominios de modo que en cada uno de estos el problema sea mas facil de resolver (por

ejemplo si se lo resuelve analıticamente), o bien sea de un tamano adecuado para ser alo-

jado en un procesador. Estos subdominios pueden solaparse (metodo de Schwarz) o no.

Entre los que no se solapan, es decir el contacto se produce unicamente en las fronteras

inter-subdominio, se incluyen los procedimientos que utilizan el complemento de Schur.

El metodo de descomposicion de dominio puede mirarse como un buen procedimiento

para precondicionar el problema global. Como casos lımites se comporta como un proce-

dimiento directo cuando el tamano de los subdominios tiende a uno, o como un metodo

iterativo global cuando la frontera es extendida a todos los subdominios (no hay dominios

internos).

Matriz Complemento de Schur

Considerese una descomposicion como en la Figura B.3. Con Ωs se designa el subdominio

s (s = 1, Ns) y con Γsi (i = 1, 3) las fronteras del mismo. Aquı Γ1 se utilizara para

indicar el contorno con condiciones de tipo Dirichlet, Γ2 para aquel con condiciones de

tipo Neumann y Γ3 para las fronteras con otros subdominios (Figura B.3). Si se utiliza

el metodo de los elementos finitos, para resolver numericamente el problema en derivadas

parciales se llega a un sistema de ecuaciones

Ku = f , (B.52)

siendo u el vector de las variables nodales (velocidades y presiones en un problema de

fluidos, desplazamientos en un problema de solido deformable elastico), f es el vector de

variables discretas duales (fuerzas nodales), y K la matriz del sistema (rigidez).

Las matrices de la ecuacion (B.52) se construyen por subdominios y aquellas restrin-

gidas al subdominio s se designan por Ks, us y f s. Se pueden particionar en grupos de

incognitas (grados de libertad) internos al subdominio, u^s, y aquellos en la interfaz, u^s_I. La


Figura B.3: Descomposicion del dominio en subdominios: interfaz (I), fronteras de la franja (SB) y capas internas (S), con nlay = 2

matriz de rigidez se puede escribir

K^s = [ K^s        K̃^s_I
        K̃^{s,T}_I   K^s_I ]   (B.53)

y los vectores de desplazamientos y fuerzas nodales

u^s = [ u^s ; u^s_I ]   y   f^s = [ f^s ; f^s_I ], (B.54)

Aquı los bloques sin subındice corresponden a los grados de libertad internos del subdominio, el subındice I a aquellos en la interfaz, y la tilde (~) a la interaccion entre ambos.

Si, por otra parte, se ensambla la contribucion de todos los subdominios a los grados

de interfaz globales, se puede escribir

K_I = A_{s=1}^{Ns} K^s_I , (B.55)

siendo uI el vector de desplazamientos en grados de libertad de interfaz de todo el dominio.

Las matrices con subındice I tienen el tamano del problema de interfaz global.

La ecuacion de equilibrio (B.52) se puede particionar en los siguientes sistemas

K^s u^s + K̃^s_I u_I = f^s ,   s = 1, ..., Ns

Σ_{s=1}^{Ns} K̃^{s,T}_I u^s + K_I u_I = f_I   (B.56)


y efectuando la eliminacion (gaussiana) por bloques

K^s u^s = f^s − K̃^s_I u_I ,   s = 1, ..., Ns

[ K_I − Σ_{s=1}^{Ns} K̃^{s,T}_I (K^s)^{-1} K̃^s_I ] u_I = f_I − Σ_{s=1}^{Ns} K̃^{s,T}_I (K^s)^{-1} f^s   (B.57)

La matriz de la segunda ecuacion en (B.57):

S = K_I − Σ_{s=1}^{Ns} K̃^{s,T}_I (K^s)^{-1} K̃^s_I (B.58)

se conoce como matriz complemento de Schur o matriz de capacitancia.

El primer grupo de ecuaciones en (B.57) representa el sistema asociado a los grados de

libertad internos us a cada subdominio, resultantes de la caracterıstica de no penetracion

de la descomposicion. Esta parte de la solucion es perfectamente paralelizable. El tamano

de estos problemas esta dado por la granularidad de la descomposicion en subdominios.

La segunda parte de la ecuacion (B.57) representa el problema de interfaz. El tamano

de la matriz complemento de Schur S es menor que el de la matriz global K, pero S

resulta densa. El numero de condicion de S es tambien menor que el de K. Por otra parte

no es necesario el ensamble explıcito de la matriz S, pudiendo efectuarse las operaciones

para la resolucion en cada subdominio. Esta parte de la solucion es acoplada para todo

el problema requiriendo comunicacion entre los distintos procesadores para su resolucion

en paralelo.

La solucion del problema (B.57) puede verse como efectuada en dos partes: un pro-

blema de grados de libertad de interfaz y otro de grados de libertad internos a los subdo-

minios. Se suele resolver el problema interno a traves de metodos directos y el problema

en la interfaz mediante tecnicas iterativas. El uso de metodos directos para los proble-

mas internos evita errores algorıtmicos que se propaguen al problema de interfaz. Como

el tamano de los problemas internos es acotado al subdominio los metodos directos son

aplicables.

Para el problema de interfaz, sin embargo, un metodo directo no es atractivo, por

los requerimientos de almacenamiento. La matriz S es una matriz completa y cara para

construir. Por este motivo se suele recurrir a metodos iterativos (Gradiente conjugado,

GMRes) con un adecuado precondicionamiento. La descomposicion de dominio brinda un

adecuado precondicionamiento para el problema global.


Solucion Iterativa del Problema de Interfaz

La submatriz de rigidez asociada a los grados de libertad de interfaz KI puede escribirse

K_I = Σ_{s=1}^{Ns} K^s_I (B.59)

y la matriz complemento de Schur

S = Σ_{s=1}^{Ns} S^s, (B.60)

donde

S^s = K^s_I − K̃^{s,T}_I (K^s)^{-1} K̃^s_I . (B.61)

La ecuacion (B.60) muestra que la contribucion de cada subdominio a la matriz S puede

calcularse independientemente. La ecuacion (B.57-b) puede ser re-escrita

S uI = gI (B.62)

La solucion de la ecuacion (B.62) por un metodo iterativo (GMRes para el caso de ope-

radores no simetricos y/o no definidos positivos, gradiente conjugado para el caso de

operadores simetricos como ocurre en elasticidad lineal) se puede realizar con un algorit-

mo como el del Cuadro B.1. Allı se considera un sistema generico

Ax = b (B.63)

y se realiza ademas un precondicionamiento con la intencion de bajar el numero de con-

dicion de la matriz.

Puede observarse que las fases del proceso que resultan mas demandantes en tiempo

de procesamiento son el producto matriz-vector en los pasos I.2 y II.2, y la solucion del

sistema de ecuaciones implıcita en el precondicionamiento, en los pasos I.3 y II.7. Las

restantes operaciones sobre vectores, del tamano del problema de interfaz global, son

productos internos y actualizacion de vectores (AXPY).

B.4.4 Precondicionamiento

El precondicionamiento resulta indispensable para la resolucion de grandes sistemas de

ecuaciones por tecnicas iterativas. Existen varios procedimientos para precondicionamien-

to. Entre ellos pueden mencionarse:


Cuadro B.1: Algoritmo Gradiente Conjugado Precondicionado

I. Inicializacion

I.1 x estimacion inicial

I.2 r = b−Ax matriz × vector + suma vect.

I.3 solve Pz = r solucion sistema

I.4 ρ = (r, z) producto interno

I.5 ρ0 = ρ

I.6 p = z

I.7 k = 1

II. Iterar: mientras k < Kmax hacer

II.1 Test de Convergencia: si ρ < Tol ρ0 finaliza iteraciones

II.2 a = Ap matriz × vector

II.3 m = (p, a) producto interno

II.4 α = ρ/m

II.5 x = x + αp AXPY

II.6 r = r− αa AXPY

II.7 resolver Pz = r solucion sistema

II.8 ρold = ρ

II.9 ρ = (r, z) producto interno

II.10 γ = ρ/ρold

II.11 p = z + γp AXPY

II.12 k = k + 1, ir a II.1

1. Jacobi o escalado diagonal

Este es el mas simple y se basa en tomar como precondicionador la matriz

P = diag(A). (B.64)

No requiere resolver sistema alguno, pero solamente es eficiente si la matriz del

sistema es diagonal dominante.

2. Factorizacion incompleta. En este caso la factorizacion LU, o la factorizacion de Cholesky C^T C para el caso de matrices simetricas positivas definidas, se efectua


pero deteniendo el proceso al alcanzar la estructura rala de la matriz A. Se propone

entonces como precondicionador:

P = LU (B.65)

siendo L y U los factores incompletos. Este precondicionador podrıa llevar a te-

ner pivotes nulos. Se han introducido modificaciones tales como Shifted Incomplete

Cholesky Factorization para eliminar este problema.
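A modo ilustrativo, el siguiente esquema secuencial (supuesto, no es el codigo de la tesis) implementa el gradiente conjugado precondicionado del Cuadro B.1 para una matriz SPD pequena, usando como precondicionador el escalado diagonal (Jacobi) recien descripto; en la tesis estas operaciones se realizan en forma distribuida.

#include <vector>
#include <cstdio>

using Vec = std::vector<double>;

static double dot(const Vec& a, const Vec& b) {
  double s = 0; for (size_t i = 0; i < a.size(); ++i) s += a[i]*b[i]; return s;
}

int main() {
  const int n = 4;
  // A = tridiagonal(-1, 2, -1), simetrica y definida positiva (ejemplo)
  auto A = [n](const Vec& x) {
    Vec y(n, 0.0);
    for (int i = 0; i < n; ++i) {
      y[i] = 2*x[i];
      if (i > 0)   y[i] -= x[i-1];
      if (i < n-1) y[i] -= x[i+1];
    }
    return y;
  };
  // I.3 / II.7: resolver P z = r con P = diag(A) = 2I (precondicionador Jacobi)
  auto Pinv = [](const Vec& r) { Vec z(r); for (auto& v : z) v /= 2.0; return z; };

  Vec b(n, 1.0), x(n, 0.0);                 // I.1: estimacion inicial
  Vec r = b;                                 // I.2: r = b - A x (con x = 0)
  Vec z = Pinv(r);                           // I.3
  double rho = dot(r, z), rho0 = rho;        // I.4, I.5
  Vec p = z;                                 // I.6

  for (int k = 1; k <= 100 && rho > 1e-12*rho0; ++k) {   // II.1
    Vec a = A(p);                            // II.2
    double alpha = rho / dot(p, a);          // II.3, II.4
    for (int i = 0; i < n; ++i) { x[i] += alpha*p[i]; r[i] -= alpha*a[i]; } // II.5, II.6
    z = Pinv(r);                             // II.7
    double rho_old = rho; rho = dot(r, z);   // II.8, II.9
    double gamma = rho / rho_old;            // II.10
    for (int i = 0; i < n; ++i) p[i] = z[i] + gamma*p[i];  // II.11
  }
  std::printf("x = [%g %g %g %g]\n", x[0], x[1], x[2], x[3]);  // se espera [2 3 3 2]
  return 0;
}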

Metodo Neumann-Neumann

Si se observa el algoritmo del Cuadro B.1 se puede ver que las partes que demandan

mayor tiempo de procesamiento son, como se ha indicado, el producto matriz-vector en los

pasos I.2 y II.2, y la solucion del sistema de ecuaciones implıcita en el precondicionamiento,

en los pasos I.3 y II.7.

El producto matriz por vector (I.2 y II.2) puede escribirse

a = Sp, (B.66)

siendo S la matriz complemento de Schur y, debido a (B.60),

a = Σ_{s=1}^{Ns} a^s = Σ_{s=1}^{Ns} S^s p, (B.67)

Esto es, las contribuciones a a se calculan separadamente en cada subdominio. En el

subdominio s se tiene (B.61)

S^s p = [ K^s_I − K̃^{s,T}_I (K^s)^{-1} K̃^s_I ] p (B.68)

y considerando el problema restringido al subdominio

[ K^s        K̃^s_I
  K̃^{s,T}_I   K^s_I ] [ v^s
                         p   ] = [ 0
                                   a^s ]. (B.69)

La contribucion del subdominio s al vector (B.67) es

a^s = K̃^{s,T}_I v^s + K^s_I p, (B.70)

siendo vs solucion de

K^s v^s = −K̃^s_I p. (B.71)

Las ecuaciones (B.69) a (B.71) muestran que para evaluar el producto matriz-vector (B.67)

basta con resolver, en cada subdominio, un problema de Dirichlet donde valores prescriptos de p se imponen en la interfaz Γs3, y se obtiene el vector asociado a^s. Finalmente se suman

las contribuciones as de cada subdominio.
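Un ejemplo numerico minimo (supuesto, solo con fines ilustrativos): para el Laplaciano 1D de cinco nodos con condiciones Dirichlet en los extremos, un nodo de interfaz central y un nodo interno por subdominio, el producto S p se calcula por subdominios segun (B.67), (B.70) y (B.71).

#include <cstdio>

int main() {
  // Bloques por subdominio s = 1, 2 (escalares en este caso trivial):
  //   K^s   = 2   (nodo interno), Ktilde^s_I = -1 (acople interno-interfaz),
  //   K^s_I = 1   (aporte del subdominio a la diagonal de la interfaz).
  const double Kint = 2.0, Ktil = -1.0, KI_s = 1.0;
  const double p = 1.0;                  // vector de interfaz (dimension 1)

  double a = 0.0;                        // a = S p = sum_s S^s p
  for (int s = 0; s < 2; ++s) {
    // Problema de Dirichlet local (B.71): K^s v^s = -Ktilde^s_I p
    double v = -Ktil * p / Kint;
    // Contribucion (B.70): a^s = Ktilde^{s,T}_I v^s + K^s_I p
    a += Ktil * v + KI_s * p;
  }
  std::printf("S p = %g  (complemento de Schur S = %g)\n", a, a / p);
  return 0;
}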

Para realizar el precondicionamiento se efectua el siguiente procedimiento. En cada

subdominio se define una matriz DsI de modo que

Σ_{s=1}^{Ns} D^s_I = I_I . (B.72)

Esto significa que ensamblando las matrices DsI de todos los subdominios se obtiene la

matriz identidad en el espacio de interfaz global. La manera mas simple de construir DsI

es con una matriz diagonal cuyos terminos son la inversa de la cantidad de subdominios

que comparten el grado de libertad en cuestion.

En cada iteracion del gradiente conjugado, el residuo se proyecta sobre cada subdo-

minio

r^s = D^{s,T} r (B.73)

y en cada subdominio se resuelve el sistema

S^s z^s = r^s (B.74)

Finalmente las contribuciones de cada subdominio al vector z se promedian sobre la

interfaz

z = Σ_{s=1}^{Ns} D^s z^s (B.75)

Esto es equivalente a usar un precondicionamiento de la forma

P^{-1} = Σ_{s=1}^{Ns} D^s (S^s)^{-1} D^{s,T} . (B.76)

La solucion de (B.74), a su vez, es equivalente a resolver el problema de Neumann en

el subdominio s, donde el vector solucion zs contiene, en un problema elastico, desplaza-

mientos de los grados de libertad de interfaz. Puede ser realizado sin formar explıcitamente

la matriz del complemento de Schur S, escribiendo para cada subdominio

[ K^s        K̃^s_I
  K̃^{s,T}_I   K^s_I ] [ v^s
                         z^s ] = [ 0
                                   r^s ]. (B.77)

La solucion del problema de Neumann (B.77) en el subdominio s es

v^s = −(K^s)^{-1} K̃^s_I z^s (B.78)

z^s = ( K^s_I − K̃^{s,T}_I (K^s)^{-1} K̃^s_I )^{-1} r^s = (S^s)^{-1} r^s. (B.79)

Cuando se resuelve iterativamente el problema de interfaz el producto matriz-vector (I.2

y II.2) y la solucion del sistema de ecuaciones (I.3 y II.7) se reemplazan por resoluciones

del problema restricto a cada subdominio, alternativamente con condiciones de Dirichlet o

de Neumann, respectivamente. Estas soluciones por subdominio resultan bien escalables.

Para un problema de Laplace mientras el numero de condicion de la matriz global es

O(1/h²) (siendo h el tamano de los elementos), aquel para el complemento de Schur es O(1/h), y utilizando el precondicionador (B.76) se reduce a O(1).
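Continuando el ejemplo 1D anterior (valores supuestos, solo ilustrativos), la aplicacion del precondicionador Neumann-Neumann segun (B.73)-(B.76) se esquematiza a continuacion; con un unico grado de libertad de interfaz compartido por dos subdominios se tiene D^s = 1/2 y S^s = 1/2, de modo que en este caso trivial P^{-1} S = I.

#include <cstdio>

int main() {
  const double D = 0.5, Ss = 0.5;             // D^s y S^s del ejemplo 1D
  const double r = 1.0;                       // residuo en la interfaz
  double z = 0.0;
  for (int s = 0; s < 2; ++s) {
    double rs = D * r;                        // (B.73): proyeccion al subdominio
    double zs = rs / Ss;                      // (B.74): problema de Neumann local
    z += D * zs;                              // (B.75): promedio sobre la interfaz
  }
  // Como S = 1 en el ejemplo, z = P^{-1} r coincide con r
  std::printf("z = %g\n", z);
  return 0;
}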

En general, en un problema de elasticidad, la ecuacion (B.74), sobre un subdominio,

posee una matriz S^s singular a menos que se restrinjan suficientes grados de libertad

para impedir movimientos de cuerpo rıgido. Diversas tecnicas han sido desarrolladas para

eliminar este inconveniente en los subdominios ‘flotantes’.

Precondicionador de Franja alrededor de la Interfaz (‘Interface Strip Preconditioner’)

El precondicionador propuesto aquı (y que se describe en el capıtulo §2) esta basado en

resolver un problema de Dirichlet sobre una franja (strip) de nodos alrededor de

la interfaz entre subdominios. Cuando el ancho de la franja es pequeno, el costo compu-

tacional y la demanda de memoria para resolver el problema de la interfaz son bajos y

el numero de iteraciones para converger a una dada tolerancia es relativamente alto. Lo

contrario sucede cuando el ancho de la interfaz es aumentada. La formulacion matematica

de este nuevo precondicionador y su aplicacion en la solucion de problemas en CFD son

expuestas en la tesis.

Este precondicionador resulta tener mejor desempeno para matrices que son resultado

de la discretizacion de operadores simetricos y ademas carece del problema de dominios

flotantes (modos rıgidos) como por ejemplo ocurre con el clasico precondicionador de

Neumann-Neumann. Se hace hincapie que para operadores que implican derivadas de las

incognitas (e.g., ecuacion de Laplace, elasticidad estacionaria, adveccion-difusion estacio-

naria) una parte de la frontera deberıa tener condiciones de tipo Dirichlet o mixtas. De

otra manera el problema esta mal planteado y la matriz resulta singular. Para el precon-

dicionador de Neumann-Neumann los subdominios heredan las condiciones del problema

original en la frontera externa, mientras que condiciones de Neumann son impuestas en

las fronteras de los subdominios interiores. Ası, subdominios que tienen interseccion vacıa

con la porcion de la frontera externa con condiciones Dirichlet y/o mixtas tendrıan condi-


ciones Neumann en toda la frontera y consecuentemente aparecerıan modos rıgidos para

los operadores descriptos anteriormente.

En contraste con otros precondicionadores, e.g., precondicionadores del tipo ‘wire-

basket’ [BPS86], el precondicionador de interfaz es puramente algebraico y puede ser

ensamblado a partir de un subconjunto de los coeficientes de la matriz. Ademas no hay

restricciones en cuanto a la topologıa de la malla y mas aun puede ser aplicado a ma-

trices ralas que provienen de otro tipo de problemas, no necesariamente de ecuaciones

diferenciales en derivadas parciales.

Consideremos la interfaz que resulta de la descomposicion de la Figura B.3 con una

franja de dos capas de elementos a cada lado (nlay = 2). El precondicionamiento consiste

en, dado el vector fI definido en los nodos en la interfaz (I en la Figura B.3) calcular vI

dado por el siguiente problema

[ K_II     K_IS     K_I,SB
  K_SI     K_SS     K_S,SB
  K_SB,I   K_SB,S   K_SB,SB ] [ v_I
                                v_S
                                v_SB ] = [ f_I
                                           0
                                           0 ], (B.80)

con ‘condiciones Dirichlet’ en las fronteras de la franja vSB = 0, tal que el problema se

reduce a

[ K_II   K_IS
  K_SI   K_SS ] [ v_I
                  v_S ] = [ f_I
                            0 ]. (B.81)

Una vez que el sistema es resuelto, vI es el valor del precondicionador aplicado a fI , por

lo tanto

v_I = P^{-1}_{IS} f_I . (B.82)
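A modo ilustrativo (valores y dimensiones supuestos, no es el codigo de la tesis), la aplicacion del precondicionador de franja segun (B.80)-(B.82) puede esquematizarse ası: dado f_I en la interfaz se resuelve el problema reducido (B.81), con v_SB = 0 ya impuesto, por eliminacion en bloques.

#include <cstdio>

int main() {
  // Bloques del problema reducido (B.81): 1 gdl de interfaz y 2 gdl de franja
  const double KII = 2.0;
  const double KIS[2] = {-1.0, -1.0};
  const double KSSdiag[2] = {2.0, 2.0};     // K_SS diagonal para simplificar
  const double fI = 1.0;

  // Eliminacion por bloques: vI = (K_II - K_IS K_SS^{-1} K_SI)^{-1} fI
  double schur = KII;
  for (int j = 0; j < 2; ++j) schur -= KIS[j]*KIS[j]/KSSdiag[j];
  double vI = fI / schur;                   // (B.82): vI = P_IS^{-1} fI
  std::printf("vI = %g\n", vI);
  return 0;
}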

B.4.5 Implementacion Operativa del Cluster

El cluster responde a la filosofıa Beowulf (http://www.beowulf.org/) desarrollado en el

CESDIS del Goddard Space Flight Center (GSFC-NASA). Existen una gran cantidad de

clusters de este tipo, que van desde unos pocos nodos hasta 1000 o mas (500 P-II duales,

http://www.genetic-programming.com/).

En nuestro caso se ha construido un cluster de 20 procesadores Intel P-IV, 2.8 GHz, con 2 GB de RAM conectados entre sı por una red Fast Ethernet (100 Mbit/s) soportada por

un switch Encore ENH924-AUUT+. La configuracion responde a la metodologıa disk-less,

es decir que los nodos no cuentan con monitor, teclado ni disco duro. Al bootear el nodo

carga el Sistema Operativo de un diskette, envıa un ‘RARP request’ al server y monta el

root filesystem via NFS (Network File System) en el disco del server.


El CIMEC ha proyectado construir ademas un cluster de alrededor 100 nodos (proyecto

PME 209 ANPCyT) donde se debera prestar especial atencion a la escalabilidad del

codigo generado.

B.5 Algunas Definiciones Topologicas

B.6 Dominio Lipschitz, Frontera Lipschitz

Un dominio abierto Ω es llamado ‘dominio Lipschitz’ y su frontera, ‘frontera Lipschitz’,

si es conexo y si para cada punto de su frontera, x ∈ ∂Ω := Ω̄ \ Ω, existen una transformacion de coordenadas Φ : R^d → R^d, un δ > 0 y una funcion Lipschitz-continua η : [−δ, +δ]^{d−1} → R tales que

Ω ∩ B(x, δ) = {Φ(y1, ..., yd) ∈ B(x, δ) : η(y1, ..., y_{d−1}) > yd},
∂Ω ∩ B(x, δ) = {Φ(y1, ..., yd) ∈ B(x, δ) : η(y1, ..., y_{d−1}) = yd},
B(x, δ) \ Ω̄ = {Φ(y1, ..., yd) ∈ B(x, δ) : η(y1, ..., y_{d−1}) < yd}, (B.83)

donde B(x, δ) es una bola abierta de radio δ centrada en x. Un conjunto cerrado es llamado

‘dominio Lipschitz’ (cerrado) si es el cierre de un dominio Lipschitz (abierto). Un dominio

es un dominio Lipschitz si localmente su frontera puede ser representada como un grafo

de una funcion Lipschitz y si el dominio esta localmente a un lado de la frontera.

Figura B.4: Rudolf Otto Sigismund Lipschitz (1832–1903)


B.7 Funcion Lipschitz

Sea f(t, x) una funcion continua por tramos en t. Si f satisface

||f(t, x)− f(t, y)|| ≤ L||x− y|| (B.84)

∀x, y ∈ B(x, δ) y ∀t ∈ [t0, t0 + β], β > 0, se dice que f es Lipschitz-continua en x o que

satisface continuamente la condicion de Lipschitz en x y L es la constante de Lipschitz.

Se dice que f(x) es localmente Lipschitz en un dominio (conjunto abierto y conexo)

D ⊂ Rn si cada punto de D tiene un entorno B(x, δ) tal que f satisface (B.84) con alguna

constante de Lipschitz L0. Tambien f(x) es Lipschitz en un conjunto W si satisface (B.84)

en todos los puntos de W, con la misma constante de Lipschitz L. Toda funcion localmente Lipschitz en un dominio D es Lipschitz en todo subconjunto compacto (cerrado y acotado) de D. Decimos que f(x) es globalmente Lipschitz si es Lipschitz en R^n. Decimos que f(t, x)

es localmente Lipschitz en x en [a, b]×D ⊂ R×R^n si cada punto x ∈ D tiene un entorno

D0 tal que f satisface (B.84) en [a, b]×D0 con alguna constante de Lipschitz L0. Se dice

que f(t, x) es localmente Lipschitz en x en [t0,∞) × D si es localmente Lipschitz en x

en [a, b]×D para todo intervalo compacto [a, b] ⊂ [t0,∞). Tambien se dice que f(t, x) es

Lipschitz en [a, b]×W si satisface (B.84) para todo t ∈ [a, b] y todo punto en W, con la

misma constante de Lipschitz L.
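Un ejemplo ilustrativo: f(x) = x² es localmente Lipschitz en todo R, pues en la bola B(0, r) se tiene |x² − y²| = |x + y||x − y| ≤ 2r|x − y| (constante L = 2r), pero no es globalmente Lipschitz porque ninguna constante L sirve para todo R; en cambio, f(x) = |x| es globalmente Lipschitz con L = 1.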

B.8 Problemas Bien Planteados en el Sentido de Hadamard

El termino matematico ‘problema bien planteado’ es debido a la definicion que diera

Hadamard en su artıculo de 1902 [Had02]. Hadamard decıa que los modelos matematicos

de los fenomenos fısicos debıan tener las siguientes propiedades,

i) La solucion debe existir,

ii) ser unica

iii) y depender en forma continua de los datos en una topologıa razonable.

Ejemplos de este tipo de problemas bien planteados son el problema de Dirichlet para la

ecuacion de Laplace y la ecuacion de la transmision no estacionaria de calor con condicio-

nes iniciales especificadas. Estos deben ser vistos como problemas ’naturales’ ya que hay

procesos fısicos que son descriptos por esas ecuaciones.

En contraste la ecuacion del calor integrada temporalmente hacia atras para determi-

nar una distribucion de temperatura, en tiempos previos, a partir de la distribucion final,


es un problema mal planteado ya que la solucion es altamente dependiente del estado

final. Los problemas inversos a menudo son mal planteados.

Problemas que surgen en la mecanica del continuo son discretizados para obtener solu-

ciones numericas (muy a pesar de algunos cientıficos) y en terminos del analisis funcional

resultan continuos. Sin embargo, pueden ‘sufrir’ inestabilidades numericas cuando son resueltos con precision finita o con errores en los datos. Una medida del ‘buen planteamiento’ de un problema lineal discreto es el numero de condicion.

Si un problema esta bien planteado se espera que la solucion sea la adecuada cuan-

do es resuelto en una computadora con precision finita usando un algoritmo estable. Si

no esta bien planteado es necesario que sea re-formulado para su resolucion numerica.

Tıpicamente son necesarias otras condiciones como las de regularidad o suavidad de la

solucion.

Bibliography

[Ali94] S.K. Aliabadi. Parallel finite element computations in aerospace applications.

PhD thesis, Department of Aerospace Engineering and Mechanics, University

of Minnesota, 1994.

[ARF84] S.R. Ahmed, G. Ramm, and G. Faltin. Some salient features of the time-

averaged ground vehicle wake. SAE Society of Automotive Eng., Inc.,

1(840300):1–31, 1984. 92, 94

[Arn51] W.E. Arnoldi. The principle of minimized iterations in the solution of the

matrix eigenvalue problem. Quarterly of Applied Mathematics, 9:17–29,

1951. 6

[ART93] S. Aliabadi, S. Ray, and T. Tezduyar. SUPG finite element computation of

viscous compressible flows based on the conservation and entropy variables

formulations. Computational Mechanics, 11:300–312, 1993. 54, 58

[BCHM86] M. Braza, P. Chassaing, and H. Ha Minh. Numerical study and physical

analysis of the pressure and velocity fields in the near wake of a circular

cylinder. Journal of Fluid Mechanics, 164:79–130, 1986. 82, 83, 85

[Bel99] A. Belmonte. Flutter and tumble in fluids. Physics World, 1999. 124

[BEM98] A. Belmonte, H. Eisenberg, and E. Moses. From flutter to tumble: Iner-

tial drag and froude similarity in falling paper. Physical Review Letters,

81(2):345–348, 1998.

[BGCMS04] S. Balay, W.D. Gropp, L. Curfman McInnes, and B.F. Smith. PETSc 2.2.0

user’s manual. Argonne National Laboratory, 2004. xiii, xvii, 49, 169

[BH82] A.N. Brooks and T.J.R. Hughes. Streamline upwind/petrov-galerkin for-

mulations for convection dominated flows with particular emphasis on the



incompressible navier-stokes equations. Computer Methods in Applied Me-

chanics and Engineering, 32:199–259, 1982. 54, 170

[BLST90] M. Behr, J. Liou, R. Shih, and T.E. Tezduyar. Vorticity-stream function

formulation of unsteady incompressible flow past a cylinder: sensitivity of the

computed flow field to the location of the downstream boundary. University

of Minnesota Supercomputer Institute Research Report, UMSI 90/87, 1990.

82, 85

[BPS86] J.H. Bramble, J.E. Pasciak, and A.H. Schatz. The construction of precon-

ditioners for elliptic problems by substructuring, I. Mathematics of Compu-

tation, 47(175):103–134, 1986. xii, xvi, 14, 169, 193

[BPS89] J.H. Bramble, J.E. Pasciak, and A.H. Schatz. The construction of pre-

conditioners for elliptic problems by substructuring, IV. Mathematics of

Computation, 53(187):1–24, 1989. xii, 14

[BR92] J. Broeze and J.E. Romate. Absorbing boundary conditions for free surface

wave simulations with a panel method. Journal of Computational Physics,

99:146, 1992. 103

[BR96] F. Bourquin and N. Rabah. Decoupling and modal synthesis of vibrating

continuous systems. In 9th International Conference on Domain Decomposi-

tion Methods, 1996. 24

[Bre74] F. Brezzi. On the existence, uniqueness and approximation of saddle-point

problems arising from lagrangian multipliers. Rev. Francaise Automat.

Informt. Recherche Operationnelle Ser. Rouge Anal. Numer, R-2:129–151,

1974. 170

[BSI92] C. Baumann, M.A. Storti, and S.R. Idelsohn. Improving the convergence rate

of the petrov-galerkin techniques for the solution of transonic and supersonic

flows. International Journal for Numerical Methods in Engineering, 34:543–

568, 1992. 109

[Car72] J.E. Carter. Numerical solutions of the Navier-Stokes equations for the

supersonic laminar flow over two-dimensional compression corner. National

Aeronautics and Space Administration (NASA), Technical Report R-385,

1972. 53


[Ceb96] J. Cebral. Loose Coupling Algorithms for fluid structure interaction. PhD

thesis, Institute for Computational Sciences and Informatics, George Mason

University, 1996. 125

[Cho67] A.J. Chorin. Numerical method for solving incompressible viscous problems.

Journal of Computational Physics, 2(12), 1967.

[Cho73] A.J. Chorin. Numerical study of slightly viscous flow. Journal of Fluid

Mechanics, 57:785–796, 1973. 82, 170

[Cod01] R. Codina. Pressure stability in fractional step finite element methods for

incompressible flows. Journal of Computational Physics, 170:112–140, 2001.

87

[Cro02] J.M. Cros. A preconditioner for the Schur complement domain decomposi-

tion method. In 14th International Conference on Domain Decomposition

Methods, 2002. xvi, 24, 27, 169

[CW92] X.C. Cai and O.B. Widlund. Domain decomposition algorithms for indefinite

elliptic problems. SIAM Journal on Scientific Statistic Computing, 13:243–

258, 1992. 46

[DCC+95] E. Dowell, E. Crawley, H. Curtiss, D. Peters, R. Scanlan, and F. Sisto. A

Modern Course in Aeroelasticity. Kluwer Academic Publishers, Dordrecht,

1995. 132, 144

[DP06] W. Dettmer and D. Peric. A computational framework for fluid-rigid body

interaction: Finite element formulation and applications. Computer Methods

in Applied Mechanics and Engineering, 195:1633–1666, 2006. 132

[DW87] M. Dryja and O. Widlund. An additive variant of the Schwarz alternating

method for the case of many subregions. Technical Report 339, Courant

Institute of Mathematical Sciences, 1987. 46

[FKMN97] S. Field, M. Klaus, M. Moore, and F. Nori. Instabilities and chaos in falling objects. Nature, 387:252–254, 1997. 124

[FLLT+01] C. Farhat, M. Lesoinne, P. Le Tallec, K. Pierson, and D. Rixen. FETI-DP: a dual-primal unified FETI method. Part I: A faster alternative to the two-level FETI method. International Journal for Numerical Methods in Engineering, 50:1523–1544, 2001. 25


[FM98] C. Farhat and J. Mandel. The two-level FETI method for static and dynamic plate problems. Computer Methods in Applied Mechanics and Engineering, 155:129–152, 1998. 25

[FMR94] C. Farhat, J. Mandel, and F.X. Roux. Optimal convergence properties of the FETI domain decomposition method. Computer Methods in Applied Mechanics and Engineering, 115:365–385, 1994. 25

[Fol76] G.B. Folland. Introduction to Partial Differential Equations. Princeton University Press, 1976. 179

[FPF01] C.A. Felippa, K.C. Park, and C. Farhat. Partitioned analysis of coupled mechanical systems. Computer Methods in Applied Mechanics and Engineering, 190:3247–3270, 2001. 132

[FR91] C. Farhat and F.X. Roux. A method of finite element tearing and interconnecting and its parallel solution algorithm. International Journal for Numerical Methods in Engineering, 32:1205–1227, 1991. 25

[GGS82] U. Ghia, K.N. Ghia, and C.T. Shin. High-Re solutions for incompressible flow using the Navier-Stokes equations and a multigrid method. Journal of Computational Physics, 48:387–411, 1982. 86

[GK89] D. Givoli and J.B. Keller. A finite element method for large domains. Computer Methods in Applied Mechanics and Engineering, 76:41–66, 1989. 103

[GK90] D. Givoli and J.B. Keller. Non-reflecting boundary conditions for elastic waves. Wave Motion, 12:261–279, 1990. 103

[GLD94] F. Grasso, G. Leone, and J. Delery. Validation procedure for the analysis of shock-wave/boundary-layer interaction problems. AIAA Journal, 32(9):1820–1827, 1994. 60

[GLS94] W. Gropp, E. Lusk, and A. Skjellum. Using MPI: Portable Parallel Programming with the Message-Passing Interface. 2nd edition. The MIT Press, London, England, 1994. xiii, xvii, 49, 169

[GR05] V. Gnesin and R. Rzadkowski. A coupled fluid-structure analysis for 3-D inviscid flutter of IV standard configuration. Journal of Sound and Vibration, 49:349–369, 2005. 132


[Had02] J. Hadamard. Sur les problèmes aux dérivées partielles et leur signification physique. Princeton University Bulletin, pages 49–52, 1902. 177, 195

[Hag87] T. Hagstrom. Boundary conditions at outflow for a problem with transport and diffusion. Journal of Computational Physics, 69:69–80, 1987. 103

[HH92] I. Harari and T.J.R. Hughes. Galerkin least-squares finite element methods for the reduced wave equation with non-reflecting boundary conditions in unbounded domains. Computer Methods in Applied Mechanics and Engineering, 98:411–454, 1992. 103

[Hir90] C. Hirsch. Numerical Computation of Internal and External Flows - Vol. II. Wiley Series in Numerical Methods in Engineering, 1990. 55, 65

[Hou58] J.C. Houbolt. A study of several aerothermoelastic problems of aircraft structures. Mitteilung aus dem Institut für Flugzeugstatik und Leichtbau 5, E.T.H., Zürich, Switzerland, 1958. 139

[HS52] M.R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49:409–436, 1952. 12

[HT84] T. Hughes and T. Tezduyar. Finite element methods for first-order hyperbolic systems with particular emphasis on the compressible Euler equations. Computer Methods in Applied Mechanics and Engineering, 45:217–284, 1984.

[Hua00] J.Y. Huang. Trajectory of a moving curveball in viscid flow. In Proceedings of the Third International Conference: Dynamical Systems and Differential Equations, pages 191–198, 2000. 124

[Hua01] J.Y. Huang. Moving Boundaries VI, chapter Moving Coordinates Methods and Applications to the Oscillations of a Falling Slender Body, pages 73–82. WIT Press, 2001. 124

[Hua02] J.Y. Huang. Advances in Fluid Mechanics IV, chapter Aerodynamics of a Moving Curveball in Newtonian Flow, pages 597–608. WIT Press, 2002. 124

[KD03] S. Krajnovic and L. Davidson. Numerical study of the flow around the bus-shaped body. Journal of Fluids Engineering, ASME, 125:500–509, 2003. 92


[Kel95] C.T. Kelley. Iterative Methods for Linear and Nonlinear Equations. Frontiers in Applied Mathematics, Vol. 16, SIAM, 1995. 11, 43

[KK03] J. Koo and C. Kleinstreuer. Liquid flow in microchannels: experimental observations and computational analyses of microfluidics effects. Journal of Micromechanics and Microengineering, 13:568–579, 2003. 74

[Lan50] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 45(4):255–282, 1950. 6

[LC96] R. Lohner and J.R. Cebral. Fluid-structure interaction in industry: Issues and outlook. In Proc. World User Association in Applied Computational Fluid Dynamics, 3rd World Conference in Applied Computational Fluid Dynamics, Germany, May 19-23, 1996. 125

[LC98a] R. Lohner and J.R. Cebral. Fluid-structure-(thermal) interaction in industry: Issues and outlook. In Proc. 4th World Conference and Exhibition in Applied Fluid Dynamics, Freiburg i. Br., Germany, June 7-11, 1998.

[LC98b] R. Lohner and J.R. Cebral. Loads transfer for viscous fluid-structure interaction. In Proc. IV World Congress in Computational Mechanics, Buenos Aires, Argentina, June 29-July 2, 1998.

[Lef05] E. Lefrancois. Numerical validation of a stability model for a flexible over-expanded rocket nozzle. International Journal for Numerical Methods in Fluids, 49:349–369, 2005. 132

[LMPV87] R. Lohner, K. Morgan, J. Peraire, and M. Vahdati. Finite element flux-corrected transport (FEM-FCT) for the Euler and Navier-Stokes equations. International Journal for Numerical Methods in Engineering, pages 1093–1109, 1987.

[LNST06] E. Lopez, N.M. Nigro, M.A. Storti, and J. Toth. A minimal element distortion strategy for computational mesh dynamics. International Journal for Numerical Methods in Engineering, 2006. 135

[LTV97] P. Le Tallec and M. Vidrascu. Solving large scale structural problems on parallel computers using domain decomposition techniques. In M. Papadrakakis, editor, Parallel Solution Methods in Computational Mechanics, chapter 2, pages 49–85. John Wiley & Sons Ltd., 1997. xi, xii, xv, 14, 17, 41, 167


[LYC+98] R. Lohner, C. Yang, J.R. Cebral, J. Baum, H. Luo, D. Pelessone, and C. Charman. Fluid-structure interaction using a loose coupling algorithm and adaptive unstructured grids. AIAA paper AIAA-98-2419, 1998. 125

[LYC+00] R. Lohner, C. Yang, J.R. Cebral, J.D. Baum, H. Luo, E. Mestreau, and E. Pelessone Charman. Fluid-structure interaction algorithms for rupture and topology change. In Japan, 2000.

[Man93] J. Mandel. Balancing domain decomposition. Communications in Numerical Methods in Engineering, 9:233–241, 1993. xvi, 24, 27, 38, 41, 169

[Meu99] G. Meurant. Computer Solution of Large Linear Systems, volume 28. Studies in Mathematics and Its Applications. North-Holland, 1999. xvi, 14

[MRS99] L. Mahadevan, W.S. Ryu, and A.D.T. Samuel. Tumbling cards. Physics of Fluids, 11(1):1–3, 1999.

[Nor01] C. Norberg. Flow around a circular cylinder: Aspects of fluctuating lift. Journal of Fluids and Structures, 15:459–469, 2001. 82

[NSI97] N.M. Nigro, M.A. Storti, and S.R. Idelsohn. GMRes physics-based preconditioner for all Reynolds and Mach numbers. Numerical examples. International Journal for Numerical Methods in Fluids, 25:1–25, 1997.

[PF00] K.C. Park and C.A. Felippa. A variational principle for the formulation of partitioned structural systems. International Journal for Numerical Methods in Engineering, 47:395–418, 2000. 132

[PF01] S. Piperno and C. Farhat. Partitioned procedures for the transient solution of coupled aeroelastic problems. Part II: energy transfer analysis and three-dimensional applications. Computer Methods in Applied Mechanics and Engineering, 190:3147–3170, 2001. 132, 135, 137, 138, 156

[PNS06] R.R. Paz, N.M. Nigro, and M.A. Storti. On the efficiency and quality of numerical solutions in CFD problems using the interface strip preconditioner for domain decomposition methods. International Journal for Numerical Methods in Fluids, 52(1):89–118, 2006. xi, 14, 80

[PS05] R.R. Paz and M.A. Storti. An interface strip preconditioner for domain decomposition methods: Application to hydrology. International Journal for Numerical Methods in Engineering, 62(13):1873–1894, 2005. 14, 79, 89, 91, 97

[PSI+03] R.R. Paz, M.A. Storti, S.R. Idelsohn, L.B. Rodríguez, and C. Vionnet. Parallel finite element model for coupled surface and subsurface flow in hydrology: Province of Santa Fe basin, absorbent boundary condition. In XIII Argentine Congress on Computational Mechanics - ENIEF2003, 2003. 66

[Rac97] W. Rachowicz. An anisotropic h-adaptive finite element method for compressible Navier-Stokes equations. Computer Methods in Applied Mechanics and Engineering, 146:231–252, 1997. xiii

[RMSB03] F.X. Roux, F. Magoules, L. Series, and Y. Boubendir. Approximations of optimal interface boundary conditions for two-Lagrange multiplier FETI method. In 15th International Conference on Domain Decomposition Methods, 2003. 18, 25

[Ros54] A. Roshko. On the drag and shedding frequency of two dimensional bluff bodies. National Advisory Committee for Aeronautics (NACA), Technical Note 3169, 1954. 82

[Saa00] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing Co., 2000. xii, 14, 16, 62

[San01] B.F. Sanders. High-resolution and non-oscillatory solution of the St. Venant equations in non-rectangular and non-prismatic channels. Journal of Hydraulic Research, 39(3):321–330, 2001. 115

[SB03] M.S. Stay and V.H. Barocas. Coupled lubrication and Stokes flow finite elements. International Journal for Numerical Methods in Fluids, 42(2):129–146, 2003. 74

[SBrG96] B. Smith, P. Bjørstad, and W. Gropp. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, 1996. xiii

[SDEI97] M.A. Storti, J. D'Elía, and S.R. Idelsohn. Algebraic discrete non-local (DNL) absorbing boundary condition for the ship wave resistance problem. Journal of Computational Physics, 146:570–602, 1997. 103


[SDP+03] M.A. Storti, L. Dalcín, R.R. Paz, A. Yommi, V. Sonzogni, and N.M. Nigro. An interface strip preconditioner for domain decomposition methods. To appear in Journal of Computer Methods in Science and Engineering, 2003. 90, 97

[Sma63] J. Smagorinsky. General circulation experiments with the primitive equations. Monthly Weather Review, 91(3):99–165, 1963. 76, 92

[SNPD06] M.A. Storti, N.M. Nigro, R.R. Paz, and L. Dalcín. PETSc-FEM: A general purpose, parallel, multi-physics FEM program. 1999–2006. 14, 43, 49

[SOHL+01] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra. MPI, The Complete Reference. Vol. 1, The MPI Core. 2nd edition. The MIT Press, London, England, 2001.

[SP96] C. Succi and F. Papetti. An Introduction to Parallel Computational Fluid Dynamics. Nova Science Publishers, Inc., 1996. xi, xv, 167

[SS86] Y. Saad and M.H. Schultz. GMRes: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7(3):856–869, 1986. xii, 12

[SSBS99] T.L. Sterling, J. Salmon, D. Becker, and D.F. Savarese. How to Build a Beowulf. Scientific and Engineering Computation. MIT Press, Cambridge MA, 1999. xiii, xvii, 169

[ST90] R. Shih and T.E. Tezduyar. Numerical experiments with the location of the downstream boundary for flow past a cylinder. University of Minnesota Supercomputer Institute Research Report, UMSI 90/38, 1990. 82, 83, 85

[SYNS02] V. Sonzogni, A. Yommi, N.M. Nigro, and M.A. Storti. A parallel finite element program on a Beowulf cluster. Advances in Engineering Software, 33(7-10):427–443, 2002. 14

[Tem69] R. Temam. Sur l'approximation de la solution des équations de Navier-Stokes par la méthode des pas fractionnaires (I). Archive for Rational Mechanics and Analysis, 32(135), 1969. 170

[TMRS92] T. Tezduyar, S. Mittal, S. Ray, and R. Shih. Incompressible flow computations with stabilized bilinear and linear equal order interpolation velocity pressure elements. Computer Methods in Applied Mechanics and Engineering, 95(95):221–242, 1992. 58, 77, 169

[TS04] T. Tezduyar and M. Senga. Determination of the shock-capturing parameters in SUPG formulation of compressible flows. In Tsinghua University Press & Springer-Verlag, editor, Computational Mechanics WCCM VI, Beijing, China, 2004. 59, 137

[Tsy98] S.V. Tsynkov. Numerical solution of problems on unbounded domains. A review. Applied Numerical Mathematics, 27:465–532, 1998. 103

[VDGP91] Q. Vinh Dihn, R. Glowinski, and J. Periaux. Solving elliptic problems by domain decomposition methods with applications. International Journal for Numerical Methods in Engineering, 32:1205–1227, 1991.

[Whi74] G.B. Whitham. Linear and Nonlinear Waves. Pure and Applied Mathematics, A Wiley-Interscience Series of Texts, Monographs, and Tracts, 1974. 65

[Wil85] C.H.K. Williamson. Evolution of a single wake behind a pair of bluff bodies. Journal of Fluid Mechanics, 159:1–18, 1985. 82

[Zem03] J.P.M. Zemke. Krylov Subspace Methods in Finite Precision: A Unified Approach. PhD thesis, Technische Universität Hamburg, 2003. 6

Index

continuity equation, 172
density
    Lagrangian, 170
    of the fluid, 169
Knudsen number, 169
Navier-Stokes equations
    compressible, 168
    incompressible, 168
Reynolds number, 168
scales
    intermediate geometric scale, 169
    length, 169
    molecular scales, 169
    smallest representable geometric scale, 169
    spatial, 169
    time, 169
solenoidal, 172
stabilization
    SUPG-PSPG, 168
substantial derivative, 171
time integration
    Fractional Step scheme, 168
    monolithic, trapezoidal rule, 168