6th World Congress of Structural and Multidisciplinary Optimization, Rio de Janeiro, 30 May - 03 June 2005, Brazil

Parallel Optimality Criteria-based Topology Optimization for Minimum Compliance Design

Arash Mahdavi, Raghavan Balaji, Mary Frecker, Eric M. Mockensturm.

Department of Mechanical and Nuclear Engineering, Pennsylvania State University, University Park, PA 16802, USA.

1. Abstract

Topology optimization is often used in the conceptual design stage as a preprocessing tool to obtain the overall material distribution in the solution domain. The resulting topology is then used as an initial guess for shape optimization. It is always desirable to use a fine computational grid to obtain high-resolution layouts that minimize the need for shape optimization and post-processing [1], but this approach results in high computational cost and is prohibitive for large structures. To reduce the computation time of such problems, parallel computing in combination with domain decomposition is used. The power-law approach has been used as the material distribution method, and an optimality criteria-based optimizer is used to locate the optimum solution [2, 3]. The equilibrium equations are solved using a preconditioned conjugate gradient algorithm. These calculations have been done using a master-slave programming paradigm on a coarse-grain Multiple Instruction Multiple Data (MIMD) shared memory architecture. In this study, by avoiding assembly of the global stiffness matrix, the memory requirement as well as the computation time has been reduced. The results of the current study show that parallel computing is a valuable tool for solving computationally intensive topology optimization problems.

2. Keywords

Topology optimization, parallel computing, finite element analysis, MPI, SIMP, domain decomposition.

3. Introduction

Topology optimization has gained widespread popularity in academia and industry and is being applied to the design of materials, mechanisms, micro-electro-mechanical systems (MEMS), and many other complex structural design problems. In the literature, one can find a multitude of approaches for solving topology optimization problems. In 1988, Bendsoe and Kikuchi [4] introduced the so-called microstructure/homogenization approach. In 1989, an alternative approach to topology optimization was introduced by Bendsoe [5]. This method is called the "power-law" or Solid Isotropic Material with Penalization (SIMP) approach. A similar approach was advocated by Zhou and Rozvany in 1991 [6] and also by Mlejnek in 1992 [7]. In the power-law approach, material properties are assumed to be constant within each element, and the design variables are the elements' relative densities. The material properties are modeled as the relative material density raised to some power times the material properties of the solid material [2].

Despite much theoretical progress in this field, and considering the major benefits that industries can gain by using this method in the conceptual design stage, the application of topology optimization to industrial problems is not yet widespread. The main reason is that, as an iterative process, topology optimization of large real-world structures is computationally intensive. Thus, there is a need to find computationally efficient ways to perform the topology optimization of complex structures with a large number of degrees of freedom. One way to address this problem is to use efficient discretization techniques such as the boundary element method (BEM) and meshless techniques. DeRose Jr. and Diaz [8] developed a meshless fictitious domain method based on a wavelet basis and a Galerkin scheme to solve computationally intensive 3D topology optimization problems. Maar and Schulz [9] used a multigrid interior point approach for solving large topology optimization problems. Another approach is to use a faster finite element solver, either by using different element formulations, like p-version finite elements, or by using a faster equation solver, like different iterative solvers. Reducing the total number of analyses through heuristic techniques like re-analysis can reduce the solution time as well. All of the above approaches are based on reducing the number of floating point operations needed to solve the topology optimization problem.

Another approach is to increase the computational power of the system through parallel computing. Reviewing the literature, it seems that the application of parallel computing in topology optimization is rare. Borrvall and Petersson [10] considered topology optimization of 3D domains using parallel processing. In their work, they used the so-called regularized intermediate density control method to ensure the existence and uniqueness of the solution and to obtain black-and-white final layouts. In their method, in order to enforce a black-and-white solution, a penalty parameter must be calculated for each problem, which requires some numerical experimentation and adds to the complexity of the method. They used the MMA approach for locating the optimal solution. Their study showed the effectiveness of the preconditioned conjugate gradient solver for parallel topology optimization problems.

The main objective of the present work is to implement a simple and efficient parallel computing technique for solving large-scale topology optimization problems. To this end, a much simpler approach based on the power-law technique and a heuristic filter is adopted here [2]. An optimality criteria method has been employed for updating the design variables (relative densities) [2]. For the solution of the equilibrium equations, a preconditioned conjugate gradient method is used, and in order to speed up the computation, the FEA and sensitivity calculations have been parallelized.

The remainder of the paper is organized as follows. Section 3.1 describes the formulation of the topology optimization problem. In Section 3.2, the domain decomposition technique used in this paper is discussed. Section 3.3 explains the numerical methods for the iterative solution of the equilibrium equations. In Section 3.4, the overall structure of the program and the parallelization algorithm are outlined. Section 4 presents numerical results for different case studies. In Section 5, some conclusions based on these case studies are presented.

3.1. Problem Formulation

The objective of the optimization process is to determine the stiffest possible structure for a given domain, amount of material, load distribution, and support conditions by minimizing the compliance of the system. In order to solve this problem, a "power-law" or SIMP approach has been adopted [2]. A detailed description of the formulations in this section can be found in Sigmund [2]. The problem can be expressed as:

$$\min_{\mathbf{x}}:\; c(\mathbf{x}) = \mathbf{U}^T\mathbf{K}\mathbf{U} = \sum_{e=1}^{N}(x_e)^p\,\mathbf{u}_e^T\mathbf{k}_0\,\mathbf{u}_e = \sum_{e=1}^{N} c_e(\mathbf{x}) \qquad (1)$$

subject to:

$$\frac{V(\mathbf{x})}{V_0} = f \qquad (2)$$

$$\mathbf{K}\mathbf{U} = \mathbf{F} \qquad (3)$$

$$0 < x_{\min} \le \mathbf{x} \le 1 \qquad (4)$$

where U and F are the global displacement and force vectors, respectively, K is the global stiffness matrix, u_e and k_0 are the element displacement vector and element stiffness matrix, respectively, x is the vector of design variables, and x_min is a vector of minimum relative densities (nonzero to avoid singularity). N is the number of elements, p is the penalization power (typically p = 3), V(x) and V_0 are the material volume and design domain volume, respectively, and f is the prescribed volume fraction [2].

The above optimization problem can be solved using different techniques such as Optimality Criteria (OC), Sequential Linear Programming (SLP), and the Method of Moving Asymptotes (MMA) [2]. Here a standard optimality criteria method with a heuristic design-updating scheme is used:

$$x_e^{\text{new}} = \begin{cases} \max(x_{\min},\,x_e-\text{move}) & \text{if } x_e B_e^{\eta} \le \max(x_{\min},\,x_e-\text{move}) \\ x_e B_e^{\eta} & \text{if } \max(x_{\min},\,x_e-\text{move}) < x_e B_e^{\eta} < \min(1,\,x_e+\text{move}) \\ \min(1,\,x_e+\text{move}) & \text{if } \min(1,\,x_e+\text{move}) \le x_e B_e^{\eta} \end{cases} \qquad (5)$$

where move is a positive move limit, η = 0.5 is a damping coefficient, and B_e is found from the optimality condition [2]:

$$B_e = \frac{-\partial c/\partial x_e}{\lambda\,\partial V/\partial x_e} \qquad (6)$$

The Lagrange multiplier λ is obtained using an iterative bisection method. The numerator is the sensitivity of the objective function with respect to the design variable, and is found as [2]:

$$\frac{\partial c}{\partial x_e} = -p\,(x_e)^{p-1}\,\mathbf{u}_e^T\mathbf{k}_0\,\mathbf{u}_e \qquad (7)$$
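To make the update concrete, here is a minimal sketch of Eqs. (5)-(7) with the bisection search for λ, written in Python/NumPy rather than the paper's Fortran 90 and mirroring the structure of Sigmund's 99-line code [2]. Unit-volume elements are assumed (so ∂V/∂x_e = 1 and V(x)/V_0 equals the mean density), and the bracketing interval and tolerance are illustrative values, not taken from the paper.

```python
import numpy as np

def oc_update(x, dc, volfrac, move=0.2, eta=0.5, xmin=0.001):
    """Optimality-criteria update of Eq. (5) with a bisection search for the
    Lagrange multiplier lambda of Eq. (6), enforcing the volume constraint of
    Eq. (2). Mirrors the heuristic scheme of Sigmund's 99-line code [2].
    `dc` holds the (negative) compliance sensitivities of Eq. (7); unit-volume
    elements are assumed, so dV/dx_e = 1 and V(x)/V0 is the mean density."""
    l1, l2 = 0.0, 1.0e5                  # assumed bracketing interval for lambda
    while l2 - l1 > 1.0e-4:
        lmid = 0.5 * (l1 + l2)
        Be = -dc / lmid                  # B_e of Eq. (6) with dV/dx_e = 1
        xnew = np.clip(x * Be**eta,                    # damped update x_e * B_e^eta
                       np.maximum(xmin, x - move),     # lower move limit
                       np.minimum(1.0, x + move))      # upper move limit
        if xnew.mean() > volfrac:        # too much material -> raise lambda
            l1 = lmid
        else:
            l2 = lmid
    return xnew
```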

Since a power-law approach is used, to ensure the existence of solutions to the optimization problem, it must be combined with either a perimeter constraint, a gradient constraint, or a filtering technique. In this program a filtering technique is used, which works by modifying the element sensitivities as follows:

$$\left(\frac{\partial c}{\partial x_e}\right)_{\!\text{filtered}} = \frac{1}{x_e\displaystyle\sum_{f=1}^{N}\bigl(r_{\min}-\operatorname{dist}(e,f)\bigr)}\sum_{f=1}^{N}\bigl(r_{\min}-\operatorname{dist}(e,f)\bigr)\, x_f\,\frac{\partial c}{\partial x_f} \qquad (8)$$

The original sensitivities are thus averaged over a circular area centered at the center of the corresponding element e, with radius r_min. These filtered sensitivities are used in the optimality criteria updating process [2]. In this study, r_min = 1.2 and p = 3 have been used for all examples.
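A direct transcription of Eq. (8) might look like the following Python/NumPy sketch. Here `centers` (element-center coordinates) is an assumed input, the weights are truncated at zero outside the radius r_min, consistent with the circular averaging area described above, and the O(N²) distance loop is for clarity only; a production code would search only neighboring elements.

```python
import numpy as np

def filter_sensitivities(x, dc, centers, rmin=1.2):
    """Mesh-independency filter of Eq. (8): each sensitivity is replaced by
    a weighted average over elements whose centers lie within radius rmin,
    with weights (rmin - dist(e, f)) truncated at zero outside the radius."""
    dc_f = np.empty_like(dc)
    for e in range(len(x)):
        d = np.linalg.norm(centers - centers[e], axis=1)   # dist(e, f) for all f
        w = np.maximum(0.0, rmin - d)                      # zero beyond rmin
        dc_f[e] = (w * x * dc).sum() / (x[e] * w.sum())
    return dc_f
```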

3.2. Domain Decomposition

The key element in the application of parallel computing to discretized elliptic PDE problems is the domain decomposition method. Domain decomposition is a technique for dividing the computational load between different processors to speed up the computation. It can be classified into two categories: explicit domain decomposition and implicit domain decomposition [11, 12]. In explicit domain decomposition, the design domain is partitioned into several sub-domains (Fig. 1), one for each processor. All of the elements within each sub-domain are subject to the same instructions. In implicit domain decomposition, instead of a physical partitioning of the domain, the global system of equations is first assembled, and then the resulting matrices are partitioned and sent to separate processors.

Fig. 1 Explicit domain decomposition for 2D rectangular domain.

Traditionally, two major techniques are employed for domain decomposition: iterative techniques and sub-structuring. In the iterative methods, information concerning nodes along the common boundaries is communicated between processors at each iteration. In the sub-structuring method, each sub-domain is treated as a super-element: using static condensation, the internal degrees of freedom of each sub-domain are condensed, and the problem is formulated using only the degrees of freedom on the boundaries. After solving for these retained degrees of freedom, the internal degrees of freedom are evaluated (recovery) [12]. In the present work a third approach has been adopted. In this approach there is no need to transfer boundary data between neighboring sub-domains, and the calculations related to each sub-domain are independent of the neighboring sub-domains. The interaction of neighboring sub-domains is taken into account during assembly of the internal force vector. By adopting this strategy, inter-processor communication has been minimized. A sketch of this strip-wise partitioning is given below.
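As an illustration, the following Python sketch assigns contiguous bands of element columns to the slave processors, as in Fig. 1; the column-major element numbering is an assumption made for illustration, not a detail given in the paper.

```python
def strip_partition(nelx, nely, nprocs):
    """Explicit domain decomposition into vertical strips (Fig. 1): each slave
    processor owns a contiguous band of element columns. Elements are assumed
    numbered column by column (nely per column); returns one index list per
    processor, with the remainder columns spread over the first processors."""
    cols = [nelx // nprocs + (1 if p < nelx % nprocs else 0)
            for p in range(nprocs)]
    parts, start = [], 0
    for ncols in cols:
        first = start * nely
        parts.append(list(range(first, first + ncols * nely)))
        start += ncols
    return parts
```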

3.3. Solvers

In general, linear equation solvers may be classified into two types: direct solvers (e.g., Gauss elimination) and iterative solvers (e.g., conjugate gradient). Both direct and iterative solvers can be programmed for parallel machines. In the present work, parallel iterative solvers are preferred over parallel direct solvers, since for large problem sizes parallel direct solvers are less efficient due to greater inter-processor communication, have a much greater memory requirement than iterative solvers, and are generally not suited to large sparse systems of equations such as those arising in large-scale topology optimization.

The main criteria for selecting a parallel solver are the type of analysis and the bandwidth of the stiffness matrix of the structure. According to the literature, conjugate gradient methods become more efficient than direct methods when either three-dimensional elements are encountered or several thousand plane elements are used [12]. In other words, CG-based methods become more competitive when relatively large bandwidths are encountered [12]. For the current study, which considers meshes with more than 10,000 elements, iterative methods like conjugate gradient are superior to direct solvers. Thus, in this paper the conjugate gradient method has been used for solving the equilibrium equations.
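For reference, a minimal serial sketch of the Jacobi-preconditioned conjugate gradient iteration is given below in Python/NumPy (the tolerance and iteration cap are illustrative choices). The `matvec` callback applies K without assembling it, anticipating the element-level product of Section 3.4.1.

```python
import numpy as np

def pcg(matvec, F, Kdiag, U0, tol=1e-8, maxit=10000):
    """Jacobi-preconditioned conjugate gradient solver for K U = F, Eq. (3).
    `matvec` applies the stiffness matrix without assembling it, and `Kdiag`
    holds the diagonal of K, used as the (Jacobi) preconditioner."""
    U = U0.copy()
    r = F - matvec(U)                     # initial residual
    z = r / Kdiag                         # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Kp = matvec(p)
        alpha = rz / (p @ Kp)
        U += alpha * p
        r -= alpha * Kp
        if np.linalg.norm(r) <= tol * np.linalg.norm(F):
            break                         # converged to relative tolerance
        z = r / Kdiag
        rz_new = r @ z
        p = z + (rz_new / rz) * p         # new search direction
        rz = rz_new
    return U
```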

3.4. Program Structure

A parallel program has been written in Fortran 90 using the Message Passing Interface (MPI) for inter-processor communication. This program consists of the following modules:

1. Finite Element Analysis (FEA)
2. Sensitivity calculation
3. Mesh independency filter
4. Optimality criteria-based update

In this work, both the FEA and sensitivity calculation modules are parallelized. Parallelizing the FEA has the most prominent effect on performance, since it contains the bulk of the computation [1, 10]. In addition, the optimizer module can proceed only after the equilibrium solution has converged. In the program developed, the FEA block (module 1) is nested within the optimization block (modules 2, 3, and 4). Mesh independency filtering is not parallelized, since filtering is not local by nature and requires the averaging of the sensitivities over several elements. Also, the calculations related to sensitivity filtering are negligible compared to those required for the solution of the equilibrium equations and the calculation of the element sensitivities (more than 97% of the operations are related to solving the equilibrium equations and calculating the sensitivities). A master-slave paradigm is used for the parallelization of this program, with communication only between the slave processors and the master processor. Finally, to decrease the idleness of the master processor, sensitivity filtering and the optimality criteria-based update are assigned to the master processor.
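This organization can be summarized by the following skeleton, written with mpi4py as a hypothetical stand-in for the paper's Fortran/MPI code. `local_sensitivities` is an assumed helper, `filter_sensitivities` and `oc_update` are the sketches given earlier, and the parallel equilibrium solve nested inside the FEA step is omitted for brevity.

```python
from mpi4py import MPI        # stand-in for the paper's Fortran 90 + MPI layer
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()        # rank 0 plays the master

def optimization_loop(x, my_elems, volfrac, centers, maxiter=100):
    """Master-slave skeleton: each iteration the master broadcasts the
    densities, slaves contribute their strips' share of the sensitivities,
    and the master alone filters (Eq. 8) and updates (Eq. 5)."""
    for _ in range(maxiter):
        x = comm.bcast(x, root=0)                      # densities to all ranks
        dc_local = local_sensitivities(x, my_elems)    # assumed helper; Eq. (7)
        dc = np.zeros_like(dc_local)
        comm.Reduce(dc_local, dc, op=MPI.SUM, root=0)  # gather strip results
        if rank == 0:                                  # serial part, master only
            dc = filter_sensitivities(x, dc, centers)  # Eq. (8), sketched above
            x = oc_update(x, dc, volfrac)              # Eq. (5), sketched above
    return x
```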

3.4.1 Equilibrium Equation Solver Loop

In this module, explicit domain decomposition first partitions the domain into a number of vertical strips equal to the number of slave processors (Fig. 1); then a CG iterative solver with Jacobi preconditioning is used to solve the equilibrium equations. In this approach, the global stiffness matrix is neither assembled nor stored, and all computations requiring the stiffness are performed by the slave processors at the element level, thus replacing the time-consuming process of global assembly with assembly and elimination of constrained degrees of freedom at the sub-domain level. For the load cases considered, a simple geometry and hence a structured mesh has been assumed, where all elements are unit squares and the aspect ratio of the rectangular domain is maintained through different numbers of divisions in the x and y directions [2].
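The element-level stiffness product that replaces global assembly can be sketched as follows (Python/NumPy). The SIMP scaling x_e^p matches Eq. (1); `edof`, mapping each element to its global degree-of-freedom indices, is an assumed data structure. In the parallel code, each slave would run this loop only over the elements of its own strip, and the strip contributions would be summed across processors.

```python
import numpy as np

def matvec_unassembled(U, x, k0, edof, p=3):
    """Element-level product v = K U without forming K: each element's
    contribution x_e^p * (k0 @ u_e), consistent with the SIMP scaling of
    Eq. (1), is scattered back to that element's degrees of freedom.
    `edof[e]` lists the global dof indices of element e (assumed layout)."""
    v = np.zeros_like(U)
    for e in range(len(x)):
        dofs = edof[e]
        v[dofs] += (x[e] ** p) * (k0 @ U[dofs])
    return v
```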

3.4.2 Optimization Loop

The main feature of the optimization loop is that the structure is assumed to change only slightly between consecutive optimization iterations; hence the converged equilibrium solution of the previous iteration is used as the initial guess for the next iteration, which speeds up the overall process considerably. For the first iteration, the initial guess U_0 is obtained by solving D U_0 = F, where D is a diagonal matrix consisting of the diagonal elements of the stiffness matrix and F is the load vector. Finally, the contribution of elements with relative densities below a pre-defined threshold is neglected, to further speed up the solution process. A flowchart of the entire program is shown in Figure 2.
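A minimal sketch of these two acceleration devices follows (Python/NumPy; the cutoff value 0.01 is an assumed illustration, as the paper does not give the threshold):

```python
import numpy as np

def initial_guess(F, Kdiag, U_prev=None):
    """Warm start for the equilibrium solve: reuse the previous optimization
    iteration's converged displacements; for the very first iteration solve
    the diagonal system D U0 = F, i.e. U0 = F / diag(K)."""
    return U_prev.copy() if U_prev is not None else F / Kdiag

def active_elements(x, threshold=0.01):
    """Mask of elements kept in the element-level stiffness product; elements
    whose relative density falls below the threshold are neglected (the
    cutoff value is an assumption, not given in the paper)."""
    return x > threshold
```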

Fig. 2 Flowchart

4. Case Studies

In order to evaluate the efficiency of the program, four benchmark problems have been studied. In these problems, the effects of the number of processors, mesh size, loading configuration, and support conditions have been examined. In all of the problems, three mesh sizes have been studied: 40×20, 80×40, and 160×80, where the first number refers to the number of elements in the x-direction and the second to that in the y-direction. Depending on the size of the problem, 1, 2, 4, 8, 16, and 32 processors were used for parallel computation. The design problems are shown in Fig. 3. For all test cases, E = 1, ν = 0.3, and a volume fraction f = 0.5 have been used. In addition, all force components are unity.

Fig. 3 Test Cases

The analysis was performed on an SGI Origin 2000 at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign (UIUC). The hardware architecture was shared-memory MIMD with 64 MIPS R10000 processors with a clock speed of 195 MHz and 2 operations per clock cycle. Algorithmic speed-up and parallel efficiency, as defined in (9) and (10), were used as performance measures to compare the various test cases.

Algorithmic speed-up: $S_{PA} = T_1 / T_P$ (9)

Parallel efficiency: $E_{PA} = S_{PA} / N$ (10)

where T_P is the time it takes to run the parallel code on N parallel processors, and T_1 is the time it takes to run the parallel code on a single processor [14].

Fig. 4 Final layouts for problems 1 to 4

As expected, by using a fine mesh, fine details of the optimum shape can be captured, and the final layouts consist of smooth boundary curves. It is possible to use detailed FE meshes of the domain to obtain final layouts that are near-final designs and do not require extensive post-processing.

Reviewing the results, no direct correlation appears to exist between the number of elements and the number of optimization iterations. The number of optimization iterations is found to depend more on the load and support conditions and less on the number of elements.

As expected, the accuracy of the calculated compliance increases with mesh density. This increase in accuracy is the result of the better representation of the continuum; as the element size tends to zero, the compliance approaches the exact lower-bound solution.

4.1 Performance Charts

Figure 5 presents the variation of speed-up as a function of the number of processors for the four benchmark problems. As can be seen, for a given mesh size, increasing the number of processors initially reduces the computational time significantly, but the gain in processing speed levels off asymptotically, in agreement with Amdahl's law [14]. This law states that for a problem of fixed size, as the number of processors increases, the algorithmic speed-up asymptotically approaches a theoretical limit. The reason is that every code inherently has some serial portion that cannot be parallelized, and this section of the code determines the maximum speed-up that one can achieve through parallel computing. Amdahl's law reads:

$$S_{PA} = \frac{1}{(1-R) + R/N} \qquad (11)$$

where $S_{PA}$ is the algorithmic speed-up, $R = \dfrac{P}{S+P}$ is the ratio of the time spent in the parallel part of the code to the total execution time, S is the time spent in the serial portion of the code, P is the time spent in the parallel portion of the code, and N is the number of processors.
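As a numerical illustration of Eqs. (11) and (12), the short script below evaluates the predicted speed-up for an assumed parallel fraction R = 0.9 (an illustrative value, not one measured in the paper): by 8 processors, roughly half of the asymptotic limit 1/(1 − R) = 10 is already reached, consistent with the saturation seen in Fig. 5.

```python
def amdahl_speedup(R, N):
    """Algorithmic speed-up predicted by Amdahl's law, Eq. (11)."""
    return 1.0 / ((1.0 - R) + R / N)

# Assumed parallel fraction R = 0.9 for illustration; the limit is 1/(1-R) = 10.
for N in (2, 4, 8, 16, 32):
    print(N, round(amdahl_speedup(0.9, N), 2))
# -> 1.82, 3.08, 4.71, 6.4, 7.8
```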

Fig. 5 Performance charts for problems 1 to 4: speedup versus number of processors for the 40×20, 80×40, and 160×80 meshes.

As is clear from Eq. (11), for a problem of fixed size, there exists a theoretical limit for $S_{PA}$ as the number of processors increases:

$$(S_{PA})_{\max} = \lim_{N\to\infty} S_{PA} = \frac{1}{1-R} \qquad (12)$$

This partially explains the saturation observed in the performance plots beyond a certain number of processors. As the mesh size increases, this saturation occurs at a larger number of processors. In the benchmark examples documented here, saturation occurs at around 8 processors; the results for 16 and 32 processors are presented to show that the performance plateau has been reached, and that the computational gain from increasing the number of processors beyond a certain limit is not worth the cost of the additional processors. Note that Amdahl's law does not take into account the time needed for inter-processor communication. As the number of processors increases for a problem of fixed size, the ratio of communication to computation time increases for each processor. This increase in communication overhead is another reason for the degradation of the parallel efficiency of the system.

It is worth noting that, from a theoretical point of view, as the optimization iterations proceed the number of equilibrium iterations should increase, because more voids are introduced into the solution domain, making it more heterogeneous and degrading the condition number of the stiffness matrix. Our numerical experiments show, however, that the appearance of these holes does not greatly affect the number of equilibrium iterations. There are two reasons for this behavior. First, the Jacobi preconditioning reduces the effect of system heterogeneity. Second, as the optimization proceeds, the solution of the last optimization iteration is used as the initial guess for the next equilibrium solve, which dramatically reduces the total number of equilibrium iterations.

5. Conclusions

In this paper a parallel processing algorithm for the compliance topology optimization problem is proposed, in which the FEA and sensitivity calculations have been parallelized. The method is shown to significantly reduce computation time for problems with relatively fine meshes, and it is expected that problems with very large numbers of elements would also benefit from the proposed approach. The domain decomposition approach and master-slave programming paradigm are shown to be appropriate tools for solving the compliance problem in topology optimization.

Solution of the equilibrium equations is accomplished with a parallel Jacobi conjugate gradient solver, which is based on a simple diagonal preconditioner. The reason for using this is that the pure conjugate gradient solver is relatively slow and is not suitable for an iterative design process like topology optimization. The preconditioning speeds up the solver drastically and is shown to provide about a sixfold improvement in performance. It was also found that a good initial guess for the displacements improves the performance of the program; here, the displacement vector obtained in the previous optimization iteration has been used as the initial guess.

In the problems studied, no direct correlation was found between the number of elements and the number of optimization iterations. The number of optimization iterations was found to depend more on the load and support conditions and less on the number of elements. For a given mesh size, increasing the number of processors reduces the computational time significantly, but the gain in processing speed levels off asymptotically. For a particular problem, increasing the mesh density results in an increase in the maximum achievable speed-up; in this study, a speed-up of up to 6 has been observed. In addition, efficiency drops quite rapidly with an increase in the number of processors. Avoiding assembly of the global stiffness matrix decreases the memory requirement as well as the processing time.

6. References

1. Bendsoe M.P., Sigmund O., "Topology Optimization: Theory, Methods and Applications", Springer-Verlag, 2003.
2. Sigmund O., "A 99 line topology optimization code written in MATLAB", Struct. Multidisc. Optim., Vol. 21, pp. 120-127, 2001.
3. Rozvany G.I.N., Olhoff N., "Topology Optimization of Structures and Composites Continua", Kluwer Academic Publishers, 2000.
4. Bendsoe M.P., Kikuchi N., "Generating optimal topologies in structural design using a homogenization method", Comput. Methods Appl. Mech. Engrg., Vol. 71, pp. 197-224, 1988.
5. Bendsoe M.P., "Optimal shape design as a material distribution problem", Structural Optimization, Vol. 1, pp. 193-202, 1989.
6. Zhou M., Rozvany G.I.N., "The COC algorithm, part II: Topological, geometry and generalized shape optimization", Comput. Methods Appl. Mech. Engrg., Vol. 89, pp. 197-224, 1991.
7. Mlejnek H.P., "Some aspects of the genesis of structures", Struct. Optim., Vol. 5, pp. 64-69, 1992.
8. DeRose Jr. G.C.A., Diaz A.R., "Solving three-dimensional layout optimization problems using fixed-scale wavelets", Computational Mechanics, Vol. 25, pp. 274-285, 2000.
9. Maar B., Schulz V., "Interior point multigrid methods for topology optimization", Struct. Multidiscip. Optim., Vol. 19, pp. 214-224, 2000.
10. Borrvall T., Petersson J., "Large-scale topology optimization in 3D using parallel computing", Comput. Methods Appl. Mech. Engrg., Vol. 190, pp. 6201-6229, 2001.
11. Papadrakakis M., "Parallel Solution Methods in Computational Mechanics", John Wiley and Sons, 1997.
12. Topping B.H.V., Khan A.I., "Parallel Finite Element Computations", Saxe-Coburg, 1996.
13. Dongarra J., Duff I., Sorensen D., Van Der Vorst H., "Solving Linear Systems on Vector and Shared Memory Computers", SIAM, 1991.
14. Course website of Prof. Lyle N. Long, http://personal.psu.edu/lnl/424/
15. Pacheco P.S., "Parallel Programming with MPI", Morgan Kaufmann, 1997.