Parallel form factors computation for radiative transfers in vegetation

Second Eurographics Workshop on Parallel Graphics and Visualisation, pp 117-131 24-25 September 1998, Rennes, France


J. Hondermarck+, M. Chelle, C. Renaud+, B. Andrieu

+LIL, BP 719, 62228 Calais Cedex
INRA, Station de Bioclimatologie, 78850 Thiverval-Grignon

honder,[email protected] chelle, [email protected]

Abstract

Efficient computation of radiative transfers in complex environments is an important problem in computer graphics and in several other research areas. However, accuracy and complexity requirements together lead to very high computation times. In this paper we present a first study of the parallelisation of a vegetation-oriented radiosity method, called Nested Radiosity. It is used both in computer graphics (for plant illumination) and in environmental physics (for crop simulation). As the main part of the simulation time is spent in form factor calculations, we have centred our study on this part of the model. The constraints taken into account are, on the one hand, the large complexity of vegetation environments and, on the other hand, the heterogeneity of the target computing architecture. The first results we have obtained provide both a high speedup and good load balancing.

1 Introduction

Radiosity is a well-known method for computing global illumination in computer graphics (CG). Initially studied in heat transfer science, it was first applied to image synthesis by Goral [18] in the case of perfectly diffuse emitters and reflectors. With this method, 3D objects are cut into small planar facets (or elements) in order both to capture illumination variations across each object's surface and to reduce the complexity of the problem. Interactions between all the elements are processed through the evaluation of geometric quantities, known as form factors, between each pair of elements.

Given its origin, radiosity theory has applications in several research areas. Considering both the large body of research on this method in the image synthesis domain and, more especially, the powerful algorithmic solutions proposed by computer scientists, it seems interesting and mutually enriching to develop joint work with these other research areas.

Recent works in environmental physics [6, 17, 11] used a radiosity-based method for modeling radiative exchanges within plant canopies. Knowledge of these exchanges is an important topic of agronomical research: on the one hand it provides information about the amount of energy available for photosynthesis and thus for biomass production; on the other hand it affects the evolution of the 3D structure of plants by modifying the spectral composition of light inside the canopy. Furthermore, earth observation satellites make it possible to survey vegetation at a large scale using measurements of reflected and emitted radiative energy. Interpreting this information requires accurately modeling the interactions between radiation, plants and the soil. Note that specific radiation wavelengths are considered for canopies: the goal is not to display images of plant illumination but rather to compute the energy received by each individual plant organ for biologically important wavebands. These are mainly the Photosynthetically Active Radiation (PAR), the infrared and the ultraviolet. However, the sequential and parallel radiosity algorithms described in this paper are also suitable for realistic image synthesis of plant canopies.

As with architectural scenes simulated by CG radiosity, plants are explicitly described as a 3D set of small planar elements. This kind of description allows the energy exchanges to be computed precisely, but it produces very high computation times, mainly due to the large number of form factors that have to be processed.

Reducing these computation times is the main topic of the work this paper presents. Form factor computation being the most computationally demanding step of the method, we focused our attention on this point for parallelization. However, plant canopies and architectural environments differ considerably both in terms of geometry and reflectivity. These differences led to the design of a specific approach, the nested radiosity (NR) [11], which is first detailed in the second part of this paper.


We then describe the parallel approach we have implemented for the NR, taking into account the specificities of the canopy geometry and those of the sequential algorithm. Some results highlighting the performance of the parallel nested radiosity are presented in part 4 and future work is discussed in part 5.

2 The nested radiosity

2.1 Radiative transfer within plant canopies

For plants, light is (i) a source of energy which produces biomass by photosynthesis and has an effect on temperature, (ii) a signal which points out the presence of competitors (neighboring plants) or (iii) a factor of injury, such as the level of ultraviolet (see [22] for a review). Modeling these physiological processes requires computing the distribution of light over leaves and plant organs.

In environmental physics, modeling radiative transfer usually relies on idealizing the canopy as a turbid medium (see [27] for a review and [29, 25] for CG applications). In this approach, which we call volume-based, radiation propagation is described by the equation of radiative transfer [8]. This has led to the development of efficient models describing volume-averaged fluxes as a function of the spatial coordinates (x,y,z). However, because they use a statistical description of canopy structure, volume-based models cannot accurately compute fluxes distributed over individual organs.

The surface-based approaches use an explicit description of the geometry of a plant population by a set of primitives. Radiative transfers between primitives are described by the radiance equation [23]. This integral equation expresses the radiance scattered at a point of a surface as a function of the radiances reflected to this point by the other surfaces. The equation is solved either stochastically with Monte Carlo ray tracing (MCRT) [34] or deterministically with the radiosity method [6, 17].

With the Monte Carlo approach, results are statistical estimates, so it is necessary to simulate the paths of a large number of rays in order to get low associated variances. This requires very large simulation times on usual workstations [9]. The Monte Carlo method is therefore too slow to be an operational tool.

The radiosity method [18] is based on the assumptions that surfaces are lambertian and that radiative fluxes over a primitive are constant. This makes it possible to approximate the radiance equation as a system of linear equations, called the radiosity equation. To present this equation, we now introduce some notations. The domain $\Omega_i$ of solid angles starting from the center of $A_i$ is divided into two hemispheres $\Omega_{i+}$ and $\Omega_{i-}$. $\Omega_{i+}$ is the set of directions $\vec{\omega}$ such that $\vec{\omega} \cdot \vec{n}_i > 0$, where $\vec{n}_i$ is the normal to $A_i$. $\Omega_{i-}$ is the set of directions such that $\vec{\omega} \cdot \vec{n}_i < 0$. From now on, the subscripts $+$ and $-$ refer respectively to $\Omega_{i+}$ and $\Omega_{i-}$. The radiosity equation is:

$$B_i = B_i^0 + \rho_i \sum_{j \in \Omega_{i+}} B_j F_{ij} + \tau_i \sum_{j \in \Omega_{i-}} B_j F_{ij} \qquad (1)$$

where $\rho_i$ is the reflectance of the surface $A_i$ and $\tau_i$ is its transmittance, $B_i^0$ corresponds to the diffusion of the uncollided light from sun and sky and $F_{ij}$ is the form factor between $A_i$ and $A_j$.

The original radiosity was a 2-step method: first, computing and storing the full matrix of form factors; second, solving the resulting linear system. As storing the matrix is impossible for a large scene such as a canopy, we now present the constraints for developing a radiosity adapted to canopies and the ways to satisfy them.

2.2 Radiosity for canopy

Specifications The conditions of radiative simulations for crop modeling are different from those encountered in CG. First, the spectral domain of interest spreads over the PAR [0.4-0.7 µm], the near infrared (NIR) [0.7-1.35 µm] and the middle infrared [1.35-2.6 µm]. In this spectral domain, the light sources are only the sun and the sky, whose features vary during a crop simulation. Thus, for a given canopy, the main variable of interest, namely the energy absorbed by surface elements, should be computed for several radiative conditions. For reasons of efficiency, the geometry-based computations (form factors, etc.) must be performed only once for all the simulation conditions. Second, the geometry of a plant canopy differs strongly from that of architectural scenes usually found in image synthesis. The number of polygons required to describe even a simple canopy is large (40 000 triangles roughly describe 100 maize plants [1]). These polygons are small and distributed throughout the whole volume of the scene. Thus a leaf and its neighbor do not exchange energy with the same set of primitives, and


therefore the spatial coherence of a canopy is lower than that of usual CG scenes. Third, most of these polygons are translucent, like leaves. Finally, there is a high level of scattering for some wavelengths, as in the NIR: for each light interaction with a leaf, more than 85% of the energy is scattered.

To sum up, a radiosity model adapted to canopies must (i) handle large scenes, (ii) compute high-order scattering and (iii) be capable of reusing the results of geometry-based calculations.

Previous work The first requirement cannot be fulfilled with a full-matrix radiosity because of the excessively large memory needed, while the second and third ones slow down the convergence of progressive radiosity [12].

The low spatial coherence of the canopy implies a high level of occlusion between two distant polygons. These occlusions imply low or null form factors, which makes the form factor matrix sparse and consequently makes a full-matrix solution possible by using data structures adapted to sparse matrices. From now on, the method giving such a solution will be called sparse-matrix radiosity. Goel et al. [17] proposed such a method, limited to small canopies. Borel and Gerstl [6] developed a model able to deal with large scenes. However, their solution introduced a bias in the results.

Moreover, recent works in CG have aimed at applying radiosity to large architectural scenes by partitioning the scene into far and close polygons.

In the Domain Decomposition Method [14, 3], the whole domain (scene) is subdivided into smaller sub-domains. Within each sub-domain, radiative simulation is performed independently. The sub-domains exchange energy through virtual walls. This is possible only if the exchanged fluxes, as well as their directions, are stored at each virtual wall. This storage requires a very large memory, not available in most computers, which limits the method in practice.

Clustering consists in recursively grouping neighboring polygons. The result is a hierarchy of clusters. Clusters of different levels may directly exchange energy, which is one of the advantages of clustering [37, 35]. On the other hand, clustering raises several issues: (1) it is difficult to distribute the energy gathered by a cluster to its descendants, (2) due to the broad spatial distribution of polygons in a canopy, it is difficult to individualize clusters, and calculating visibility between clusters would be very expensive.

2.3 Nested Radiosity

Principle A polygon $A_j$ is considered to be close to $A_i$ if it belongs to the sphere $S_i$ of diameter $D_s$ centered at the center of gravity of $A_i$. The diameter $D_s$ is a parameter of the model. It follows from the definition of the form factor that, for a canopy described by polygons of similar size, the diameter $D_s$ defines an upper bound of the form factor between two polygons considered as distant. In our method, the radiosity over a primitive is approximated by the radiosity calculated at the center of this primitive.

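The equation describing the radiative equilibrium of a polygon's face is derived from the radiosity equation (eq. 1):

$$B_i = B_i^0 + \rho_i \sum_{j \in S_{i+}} B_j F_{ij} + \tau_i \sum_{j \in S_{i-}} B_j F_{ij} + B_i^{far} \qquad (2)$$

where $B_i^{far}$ is the radiosity due to the light scattered by polygons outside $S_i$.

$B_i^{far}$ is calculated as $B_i^{far} = \rho_i E_{i+}^{far} + \tau_i E_{i-}^{far}$. We note $\Omega_i^*$ the solid angle domain for which there is no occlusion between $A_i$ and $S_i$ (Fig. 1). $E_{i+}^{far}$ and $E_{i-}^{far}$ are the integrals of the radiance $L$ going into $S_i$ respectively over $\Omega_{i+}^*$ and $\Omega_{i-}^*$. Thus $B_i^{far}$ is given by:

$$B_i^{far} = \rho_i \int_{\Omega_{i+}^*} L(P, \vec{\omega}) \cos\theta_i \, d\omega + \tau_i \int_{\Omega_{i-}^*} L(P, \vec{\omega}) \cos\theta_i \, d\omega \qquad (3)$$

where $L(P, \vec{\omega})$ is the radiance at a point $P$ of the sphere in a direction $\vec{\omega}$ and $\theta_i$ is the angle between $\vec{n}_i$ and $\vec{\omega}$.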

The radiative equilibrium of all the faces describing the canopy is:

$$M \cdot B = B^{far} + B^0 \qquad (4)$$


Figure 1: Close and far contributions ($\vec{\omega}$ is a direction and $d\omega$ is a differential solid angle around $\vec{\omega}$, $P$ is the intersection between $\vec{\omega}$ and $S_i$)

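where $B$, $B^{far}$ and $B^0$ are vectors containing respectively the $B_i$, $B_i^{far}$ and $B_i^0$ of each face and

$$M = \begin{pmatrix} 1 - \chi_{11} F_{11} & -\chi_{12} F_{12} & \cdots & -\chi_{1N} F_{1N} \\ -\chi_{21} F_{21} & 1 - \chi_{22} F_{22} & \cdots & -\chi_{2N} F_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ -\chi_{N1} F_{N1} & -\chi_{N2} F_{N2} & \cdots & 1 - \chi_{NN} F_{NN} \end{pmatrix} \qquad (5)$$

where

$$\chi_{ij} = \begin{cases} \rho_i & \text{if } (\vec{n}_i \cdot \vec{n}_j) < 0 \\ \tau_i & \text{if } (\vec{n}_i \cdot \vec{n}_j) > 0 \end{cases} \qquad (6)$$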

The algorithm consists of the following steps: (i) computation of $E_i^0$ by projection, (ii) computation of the field $L(P, \vec{\omega})$ by a volume-based model, (iii) computation of the matrix of form factors and of the vector $B^{far}$ and (iv) solving the system. We now present the principles of these steps. Algorithmic details have been presented in [11].

Direct light In the solar spectrum, the light sources are only the sun and the sky, which can be considered as infinitely distant sources. We approximate the angular distribution of luminance by discretizing the sky hemisphere into solid angles $\Omega_n$. Each $\Omega_n$ is associated with a collimated source, whose power is calculated by integrating the sky luminance over $\Omega_n$. The irradiance over $A_i$ due to a collimated source is efficiently computed by projecting the polygons describing the canopy onto a discretized screen normal to the direction of light. A Z-buffer is used to take shadows into account.

Computation of the field of radiance To compute the field of radiance $L(P, \vec{\omega})$, we used a multi-layer version [28] of the SAIL model [39]. The main approximations are that the canopy is described as a set of homogeneous infinite layers and that diffuse fluxes are isotropic.

SAIL provides, for the boundaries of each layer $n$, the irradiances due to the direct light $E_s(z_n)$ and to the downward $E_\downarrow^x(z_n)$ and upward $E_\uparrow^x(z_n)$ scattered fluxes. Irradiance at any elevation is computed by linear interpolation:

$$E(z) = \frac{(z_{n+1} - z) \, E(z_n) + (z - z_n) \, E(z_{n+1})}{z_{n+1} - z_n} \qquad (7)$$

with $n$ such that $z_n < z < z_{n+1}$.

With the assumption of isotropic scattered fluxes, the radiance at a point $P(P_x, P_y, P_z)$ in a direction $\vec{\omega}$ is calculated as:

$$L(P, \vec{\omega}) = \begin{cases} E_\uparrow^x(P_z) / \pi & \text{if } \omega_z > 0 \\ E_\downarrow^x(P_z) / \pi & \text{else} \end{cases} \qquad (8)$$
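As an illustration, equations 7 and 8 can be sketched in a few lines of Python. This is a minimal sketch and not the SAIL implementation: the function names and the list-based layout of the layer boundaries are assumptions made for the example, and the interpolation weights are chosen so that the value at a boundary is the irradiance of that boundary.

```python
import math
from bisect import bisect_right

def interpolate_irradiance(z, levels, values):
    """Linear interpolation of irradiance between layer boundaries (eq. 7).

    levels: ascending boundary elevations z_n (hypothetical input layout)
    values: irradiance E(z_n) given at each boundary
    """
    if not levels[0] <= z <= levels[-1]:
        raise ValueError("z lies outside the layered canopy")
    # Index n such that levels[n] <= z <= levels[n + 1].
    n = min(bisect_right(levels, z) - 1, len(levels) - 2)
    z0, z1 = levels[n], levels[n + 1]
    return ((z1 - z) * values[n] + (z - z0) * values[n + 1]) / (z1 - z0)

def isotropic_radiance(pz, omega_z, levels, e_up, e_down):
    """Radiance at elevation pz for a direction of vertical component omega_z (eq. 8)."""
    # Isotropic flux: radiance is irradiance divided by pi.
    e = e_up if omega_z > 0 else e_down
    return interpolate_irradiance(pz, levels, e) / math.pi
```

Only the elevation of the point and the sign of the vertical component of the direction are needed, which is why the far field is cheap to evaluate.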


Figure 2: Use of a 3D grid to determine polygons within $S_i$

Determination of the list of inner polygons For each polygon $i$, the set of polygons $j$ partially or completely inside the sphere $S_i$ surrounding $i$ is determined. To make this determination fast, the scene is divided into a regular cubic grid, whose edge is equal to the diameter of the sphere (fig. 2).
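The grid lookup can be sketched as follows. This is a simplified illustration under stated assumptions: each polygon is reduced to its center (so the "partially inside" case is ignored), the grid is built with a cell edge equal to the sphere diameter, and the names `build_grid` and `inner_polygons` are hypothetical.

```python
from collections import defaultdict
from itertools import product
import math

def build_grid(centers, cell):
    """Map each integer cell index to the polygons whose center falls in it."""
    grid = defaultdict(list)
    for idx, c in enumerate(centers):
        grid[tuple(math.floor(x / cell) for x in c)].append(idx)
    return grid

def inner_polygons(i, centers, grid, ds):
    """Indices j != i whose center lies inside the sphere S_i of diameter ds.

    The grid is assumed to have been built with cell == ds.
    """
    ci = centers[i]
    home = tuple(math.floor(x / ds) for x in ci)
    found = []
    # With a cell edge equal to ds, S_i (radius ds / 2) cannot reach
    # beyond the home cell and its 26 adjacent cells.
    for offset in product((-1, 0, 1), repeat=3):
        cell = tuple(h + o for h, o in zip(home, offset))
        for j in grid.get(cell, ()):
            if j != i and math.dist(ci, centers[j]) <= ds / 2:
                found.append(j)
    return sorted(found)
```

The distance test is only run against polygons of neighbouring cells, so the cost per polygon no longer depends on the total size of the scene.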

Computation of form factors and far contributions We used a point-surface method [32] to compute the matrix of form factors and the vector $B^{far}$. Its principle is the following. We consider the hemisphere $H_i$ located above face $A_i$. All the polygons $A_j$ belonging to $S_i$ are successively projected onto $H_i$, then orthogonally onto the base of $H_i$, called the projection disk (Fig. 3). To take occlusions into account, Z-buffering is applied. All the pixels of the projection disk correspond to the same form factor $\Delta F$ [26] due to the Nusselt relations (fig. 4).

Figure 3: Projection disk. Black pixels correspond to free paths of light from the sphere boundary to the center of $A_i$ ($\Omega_i^*$) and white pixels correspond to occlusions by primitives belonging to $S_i$.

Figure 4: Nusselt equivalent. The form factor $F_{dA_i dA_j}$ is equal to $F_{dA_i dA'_j}$, where $dA'_j$ is the projection of $dA_j$ on the sphere $(dA_i, R)$. $F_{dA_i dA_j}$ is given by $F_{dA_i dA_j} = \frac{\cos\theta_i \cos\theta_j}{\pi d^2} dA_j = \frac{dA''_j}{\pi R^2}$

Given a face $A_i$, once all the polygons have been projected, the $F_{ij}$ and $B_i^{far}$ are computed by scanning the projection disk: for each pixel containing the label of a face $j$, the form factor $F_{ij}$ is incremented by $\Delta F$; for each empty pixel, $B_i^{far}$ is incremented following equation 3.
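The scan of the projection disk can be sketched as below. This is a minimal illustration, not the code of the model: the buffer encoding (a face label, `None` for a free path, the string `"out"` for pixels outside the disk) is an assumption, and the far term is reduced to the unoccluded fraction of the disk instead of the full integral of equation 3.

```python
def scan_projection_disk(item_buffer):
    """Accumulate form factors from a label buffer covering the projection disk.

    item_buffer: 2D list; each entry is a face label j, None for a free path,
    or "out" for a pixel outside the disk (hypothetical encoding).
    Returns (form_factors, free_fraction).
    """
    in_disk = [p for row in item_buffer for p in row if p != "out"]
    # All disk pixels have the same area, hence the same elementary form factor.
    delta_f = 1.0 / len(in_disk)
    form_factors = {}
    free = 0.0
    for label in in_disk:
        if label is None:
            free += delta_f  # unoccluded direction: feeds the far contribution
        else:
            form_factors[label] = form_factors.get(label, 0.0) + delta_f
    return form_factors, free
```

Because every pixel carries the same weight, the form factors of a row and the free fraction always sum to one, which gives a cheap sanity check on a computed row.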

Calculation of distributed fluxes Baranoski et al. [4] showed that the conjugate gradient method is well adapted to solving the symmetric systems of radiosity, especially in the case of highly diffusing materials. As the matrix of the nested radiosity system is non-symmetric (Eq. 5), we use the modified conjugate gradient algorithm of Leyk [24], adapted to this case.

Form factors and the geometric coefficients required to compute $B^{far}$ are stored after the first simulation on a given structure. Thus new radiative simulations on the same canopy structure require only quick computations: calculation of direct light, running the SAIL model and solving the linear system.
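To make the reuse of stored form factors concrete, the sketch below solves the system of equation 4 from sparse rows of form factors. It is an illustration only: a plain Gauss-Seidel iteration is swapped in because it fits in a few lines, it is not the modified conjugate gradient of Leyk [24]. The dictionary-based row storage, the function names and the assumption that the self form factor of a planar face is zero are all choices made for the example.

```python
def dot(a, b):
    """Scalar product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def scattering_coefficient(i, j, normals, rho, tau):
    """Eq. 6: reflectance when the normals oppose each other, else transmittance."""
    # A zero scalar product is not covered by eq. 6; it falls to tau here.
    return rho[i] if dot(normals[i], normals[j]) < 0 else tau[i]

def solve_radiosity(form_factors, normals, rho, tau, b0, b_far, sweeps=200, tol=1e-12):
    """Gauss-Seidel iteration on M.B = B_far + B0 with sparse rows.

    form_factors[i] is a dict {j: F_ij} holding only the non-zero entries.
    """
    n = len(b0)
    b = [b0[i] + b_far[i] for i in range(n)]
    for _ in range(sweeps):
        change = 0.0
        for i in range(n):
            gathered = sum(
                scattering_coefficient(i, j, normals, rho, tau) * f_ij * b[j]
                for j, f_ij in form_factors[i].items()
                if j != i  # assumes F_ii = 0 (planar faces)
            )
            new = b0[i] + b_far[i] + gathered
            change = max(change, abs(new - b[i]))
            b[i] = new
        if change < tol:
            break
    return b
```

For a new radiative condition only `b0` and `b_far` change; `form_factors` and `normals` are reused as they are.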

Infinite periodic canopy SAIL simulates the radiative transfers only within infinite canopies, whereas only a finite number of plants can be explicitly described in a computer. We use virtual replications to simulate a periodic canopy from a finite set of plants. This was previously used with the radiosity method [17] and with Monte Carlo ray tracing [34]. In the nested radiosity, it consists in taking into account occlusions due to polygons within the virtual replications of the finite set of plants when calculating direct light and form factors [9].
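One possible way to picture virtual replications is to translate the finite scene horizontally by multiples of its extent. The sketch below only generates these translations; how the model actually stores or traverses the replicas is not described here, so the function names and the `rings` parameter are assumptions.

```python
from itertools import product

def replication_offsets(extent_x, extent_y, rings=1):
    """Horizontal translations that tile copies of the scene around the original."""
    return [
        (ix * extent_x, iy * extent_y, 0.0)
        for ix, iy in product(range(-rings, rings + 1), repeat=2)
        if (ix, iy) != (0, 0)  # the original scene itself is not a replica
    ]

def replicate(point, offsets):
    """Images of a point in every virtual replica."""
    return [tuple(p + o for p, o in zip(point, off)) for off in offsets]
```

With one ring, a scene is surrounded by eight replicas, which is enough as long as the sphere diameter does not exceed the horizontal extent of the scene.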

2.4 Evaluation of the sequential NR

A comparison of distributed fluxes calculated by NR and estimated by Monte Carlo ray tracing [9] was published in [11, 10]. Simulations were made with two types of virtual canopies. The first type is a set of random media. Primitives are triangles randomly distributed in space (Fig. 5a). We selected different triangle sizes and different distributions of triangle normals. The second type is a virtual field of 100 maize plants generated with a 3D process-based model of development [15] (Fig. 5b).

(a) Random medium (60000 faces in a volume of 3x3x1 m)

(b) Virtual maize (56000 faces in a volume of 4x2.5x2.6 m)

Figure 5: 3D canopy mock-ups used for simulations

Simulations on turbid media validated the implementation and the principle of the NR [11]. The conclusions were that the calculation of form factors takes the main part of the simulation time and generates errors when two triangles are too close. The case of maize showed that a relatively small diameter of the surrounding sphere produces satisfactory results in a realistic case [10] (the diameter equals 0.5 m for a density of 10 plants per square meter and a maize height of 2.5 m).

For such a diameter, simulation time is acceptable (table 1), but is too large for our aim of coupling this model with a 3D crop model. Note that when $D_s$ is null, that is, when polygon irradiance is calculated only from $B^{far}$ (no form factor calculations), simulations are very quick. As the 3D grid used to determine inner polygons generates data locality for the computation of form factors, one way to speed up the model is to parallelize these calculations.


Ds (m)         0.0        0.1        0.5        1.0        2.0        4.0
Time (h:m:s)   00:00:20   00:14:20   00:23:40   00:50:33   02:51:00   13:36:00

Table 1: Effect of the parameter $D_s$ on the simulation time. Data are for the virtual maize canopy at full development (56000 diffusers); the computer is a Sun Ultra Enterprise (250 MHz)

3 Parallel Nested Radiosity

3.1 Parallel radiosity overview

Parallelising radiosity algorithms is an important area of research in CG: radiosity is a powerful tool for the simulation of global illumination, but it requires such large amounts of computation that it is really difficult to use. The main problem of this method lies in obtaining the form factors: before solving the radiosity system (either by a full-matrix solver or by a progressive refinement approach), it is necessary to compute the form factors between each pair of elements of the scene. Knowledge of these form factors is required in order to accurately distribute the light energy over each part of the objects' surfaces. Progressive radiosity does not require knowledge of the entire form factor matrix, but it leads to computing the same rows (or columns) several times. Progressive radiosity therefore has a higher computational cost when an accurate solution is needed.

For all these reasons, several studies have been performed on parallelising both the full-matrix and progressive radiosity methods. Considering mainly the form factor computation step, three levels of parallelism are usually recognised:

- computing several rows (or columns) of form factors in parallel ([7],[13],[20],[21],[30]): as the form factor computation between element i and the remainder of the scene and the same computation between element j and the remainder of the scene are independent, they can be performed in parallel. For full-matrix methods this leads to computing several matrix rows in parallel, while several shooting steps are performed simultaneously for progressive methods;

- computing a row (or column) of the matrix in parallel [36]: several processors compute the form factors between an element i and the remainder of the scene, according to the form factor computation algorithm in use;

- computing the form factor between two elements i and j in parallel [5]. This kind of parallelism exists when using projection methods for computing the form factors (hemicube, single plane, disk, ...).

All these levels have been studied on either MIMD or SIMD architectures. Because of the synchronism requirements of the third level, it has been mainly studied on SIMD architectures ([38],[33]). The first two levels have been intensively studied on MIMD architectures. The main differences between these studies (both for progressive and full-matrix radiosity) come from the form factor computation algorithms, the way in which data are managed (duplication of the entire data set in each processor memory, distribution of the elements across the processor network or sharing of the database) or the topology of the processor network used.

Results of these approaches are generally good for a low number of processors and small databases. But as both the number of processors and the number of scene elements grow, efficiency rapidly decreases. This comes from the constraints of radiosity:

- computing the form factors for an element i requires access to all the other elements, because we do not know a priori which elements will be visible from i and which will not. There is generally no locality property for the energy exchanges in radiosity;

- the form factor computation is an irregular problem: computing the form factors from two different elements often involves large differences in terms of computation time;

- radiosity databases designed in CG are getting larger and larger. This growth of the data sets prohibits copying all the elements into each processor memory. Computing a row of form factors thus involves a very large number of communications in order to fetch the parts of the data that are necessary for the computation.

These constraints increase the difficulty of parallelising radiosity and reduce both the speedup and the load balancing of these approaches.


Some recent works on progressive radiosity parallelisation attempted to create data locality properties inside the scenes ([2],[31]): these approaches cut the scene into several 3D sub-environments using virtual interfaces and distributed them across several processor memories. Energy exchanges within each sub-environment were then computed locally on each processor, and energy exchanges between sub-environments were processed through exchanges of visibility masks. By introducing such a locality property in the parallel process, this approach is a very interesting answer to the constraints of radiosity parallelisation. But it requires many visibility mask exchanges, which reduces the efficiency of the approach when the number of processors increases.

3.2 Parallelising the nested radiosity

The locality property of the energy exchanges calculated by the radiosity method is the basis of the nested radiosity, which has been specifically designed for plant canopies. In such environments the occlusion probability is very high for two elements that are far from each other. Thus only energy exchanges between close elements are accurately simulated (through form factor computation), while far energy exchanges are only approximated. Consequently a plant element only requires a small part of the entire data set. This property will be used for the parallelisation of the nested radiosity.

3.3 The target architecture

An important parameter which has to be taken into account is the target architecture for the parallel algorithm. The parallel nested radiosity is to be used by agronomy researchers for their studies of plant growth. Their laboratories are not equipped with large parallel machines, but rather with a network of workstations. This network is generally heterogeneous and built around a powerful server and several workstations and PCs. This heterogeneity is critical for the design of the parallel algorithm and its efficiency. Indeed, it seems difficult to distribute the work statically between the processors: the ratio of their processing powers is unknown and can change during the computation according to the use of the workstations by other users. For these reasons a dynamic allocation of the work seems to be a better approach.

3.4 The parallel implementation

In order to support this dynamic allocation of work, we have developed our parallel approach following a master-slave scheme: a master task manages the work, which is distributed to several slave tasks.

As workstation memory is generally small compared to the data set size, the entire canopy cannot be duplicated in each memory. The plant database is thus managed only by the master task, which is run by the server workstation (assuming this one has the large amount of memory required for this storage). During the computation process, the master replies to slave requests by sending them some work. This work is defined as the processing of several rows of form factors.

Processing a row of form factors from an element i requires receiving all the elements that lie within the sphere $S_i$ centred on element i. Remember that the other elements are not required because their contribution to the radiosity of element i is approximated through the $B_i^{far}$ quantity. However, even if only a small part of the element set has to be sent for each new form factor row computation, the large number of repetitions of this kind of communication (one per form factor matrix row) would induce poor performance.

Rather than sending only one element i at a time to a slave (and thus computing only one form factor row at a time), we chose to send several elements together (and their corresponding surrounding spheres $S_i$) to each slave. The slave then has to compute the corresponding form factor rows. These elements are chosen such that they are geometrically close to each other. The advantage of such a choice lies in the locality property of the nested radiosity: when some elements are neighbours, their surrounding spheres have a large number of elements in common. This allows the master to send, at the same time, a lot of data that will be useful for several form factor row computations.

Choosing the neighbouring elements for which form factor rows have to be processed must be done carefully, both to reduce the size of the union of the surrounding spheres (and thus the message size) and to easily determine the plant elements that are in this union. This choice is made in the same way as in the sequential algorithm: the entire canopy is divided into a regular 3D grid of voxels and, when a slave needs some work, an entire voxel (and its associated elements) is sent to it. This slave is then in charge of processing all the corresponding form factor rows. Note that plant canopies can be assumed to be homogeneous. This homogeneity ensures a regular distribution of the plant data through the regular 3D grid.

According to the size of the surrounding spheres, the contiguous voxels that intersect the union of spheres are quickly determined and sent to the slave with their inner elements. Strictly speaking, some elements of these contiguous voxels will not be in any surrounding sphere, but computing exactly the subset of vegetation elements lying in at least one sphere would be too computationally demanding.
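The selection of the contiguous voxels can be sketched as follows. This is a conservative illustration under stated assumptions: voxels are addressed by integer triples, the grid is finite (virtual replications are ignored), and the function name `voxels_to_send` is hypothetical.

```python
import math
from itertools import product

def voxels_to_send(voxel, grid_shape, voxel_edge, ds):
    """Voxel indices to bundle with `voxel` so that the sphere S_i (diameter ds)
    of every element inside `voxel` is covered."""
    # Number of voxel layers a sphere of radius ds / 2 can spill into.
    reach = math.ceil((ds / 2) / voxel_edge)
    bundle = []
    for offset in product(range(-reach, reach + 1), repeat=3):
        cand = tuple(v + o for v, o in zip(voxel, offset))
        if all(0 <= c < n for c, n in zip(cand, grid_shape)):
            bundle.append(cand)
    return bundle
```

The bundle is a box of voxels rather than the exact union of spheres, which matches the trade-off described above: a few useless elements are sent, but the master spends almost no time deciding what to send.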

For each new form factor row it has to process, the slave computes the surrounding sphere of the corresponding element, determines which elements of its local database are inside this sphere and applies the disk depth-buffer algorithm (§ 2.3). Once the processing of all its rows is completed, the slave sends them to the master and waits for new rows to compute. Note that only non-zero form factors are sent to the master, which stores them in the sparse matrix. The operating cycle of this master-slave scheme is shown in figure 6.

Figure 6: The operating cycle for parallel nested radiosity (the master task holds the vegetation canopy voxels and sends an elements voxel and its neighbours to slave tasks #1 to #N, which return some form factors rows)

This approach has several advantages:

- only a small part of the entire data set is stored in each slave memory during the computation of several form factor rows;

- by allocating several form factor rows to a slave and by taking care to choose the corresponding elements as close as possible to each other, the number of communications between the master and its slaves is considerably reduced (as compared to allocating a single row at a time);

- when a heterogeneous network has to be used for the run, this master-slave approach should allow good load balancing between the slaves: the faster a workstation is, the larger the number of rows it will process.
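The on-demand allocation behind this load balancing can be illustrated with a small work queue. This sketch is not the PVM implementation: Python threads and a `queue.Queue` stand in for the slave tasks and the master's messages, the sleep stands in for the computation of a voxel's rows, and all names are hypothetical.

```python
import queue
import threading
import time

def run_master_worker(voxels, worker_delays):
    """Hand out voxels on demand; return per-worker counts and the voxels done."""
    work = queue.Queue()
    for v in voxels:
        work.put(v)
    counts = [0] * len(worker_delays)
    results = []
    lock = threading.Lock()

    def worker(rank, delay):
        while True:
            try:
                v = work.get_nowait()  # ask the master for the next voxel
            except queue.Empty:
                return  # no work left
            time.sleep(delay)  # stands in for computing the voxel's rows
            with lock:
                counts[rank] += 1
                results.append(v)

    threads = [threading.Thread(target=worker, args=(r, d))
               for r, d in enumerate(worker_delays)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts, sorted(results)
```

A worker with a smaller delay comes back for work more often and therefore tends to process more voxels, without the master knowing anything about the relative speed of the machines.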

4 Experimental Results

4.1 Parallel Architecture

Our parallel algorithm has been implemented with the well-known PVM library [16], which provides efficient communication and parallel task management mechanisms. It has been run on a heterogeneous workstation network available in the LIL laboratory. This network is built from different SGI workstations. A more detailed description of these machines is given in table 2.

Machine type   Processor/Frequency   Memory   Ki     Nb of workstations
O2             R5000/180MHz          32 Mb    1      10
Indy           R4600/133MHz          32 Mb    0.66   1
Indy           R4600/100MHz          32 Mb    0.52   7

Table 2: Workstations Description


The fourth column of table 2 presents estimated comparison coefficients (noted $K_i$) for the computation power of each machine. They have been estimated by running the sequential NR algorithm on each machine type and computing a computation time ratio, an O2 at 180 MHz being the reference machine. Obviously these $K_i$ do not give a general comparison of the power of these machines, but they can be assumed to be relatively accurate for our needs.

4.2 Measures definitions

We will use different measurement criteria for benchmarking our parallel approach. Their definitions are given below:

Speedup $S_N$: It is defined as $S_N = \frac{T_1}{T_{max}}$, where $T_1$ is the best sequential computation time, $N$ the number of slave machines and $T_{max}$ the computation time of the slowest slave. Note that the best sequential time is the one we obtained on the most powerful O2 workstation.

Efficiency $E_N$: It is defined as $E_N = \frac{100\, S_N}{N+1}$. $E_N$ takes into account both the $N$ slave machines and the master workstation.

Corrected Efficiency $C_N$: It is defined as $C_N = \frac{100\, T_1}{\left(1 + \sum_{i=1}^{N} K_i\right) T_{max}}$. Its purpose is to correct the previous definition of the efficiency, which assumes that each machine has the same processing capability. Due to the high heterogeneity of the network used (Table 2), $C_N$ provides a better description of the efficiency of our approach than $E_N$ would.
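The three measures can be written as a few lines of code. The sketch below follows the definitions above; the function names are hypothetical, the timings are invented for illustration (they are not measurements from this paper), and the `master_k` parameter assumes that the constant 1 in the corrected efficiency stands for a master running on a reference O2.

```python
def speedup(t_seq, t_max):
    """Best sequential time divided by the time of the slowest slave."""
    return t_seq / t_max

def efficiency(t_seq, t_max, n_slaves):
    """Efficiency in percent, counting the N slaves plus the master."""
    return 100.0 * speedup(t_seq, t_max) / (n_slaves + 1)

def corrected_efficiency(t_seq, t_max, slave_k, master_k=1.0):
    """Efficiency in percent, weighting each slave by its coefficient Ki."""
    return 100.0 * t_seq / ((master_k + sum(slave_k)) * t_max)

# Hypothetical timings in minutes (NOT results from the paper).
t_seq, t_max = 120.0, 12.0
# 12 slaves: 9 O2, 1 Indy 133 MHz, 2 Indy 100 MHz (Ki from Table 2).
slave_k = [1.0] * 9 + [0.66] + [0.52] * 2

print(speedup(t_seq, t_max))
print(efficiency(t_seq, t_max, len(slave_k)))
print(corrected_efficiency(t_seq, t_max, slave_k))
```

When every slave has a unit coefficient, the sum of the Ki equals N and the two efficiencies coincide, which is why they only diverge once the slower Indy machines join the network.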

4.3 Results

Figure 7 presents some simulation results for the two environments shown in figure 5. Computation times are presented in figures 7a and 7b, for an increasing number of workstations. The first ten workstations are the 10 SGI O2, the SGI Indys being added to the network for configurations including more than 10 machines. These two figures include two different computation times for each configuration: the maximum time, which is the computation time of the slowest slave machine, and the minimum time, for the fastest slave machine. The differences between these two times are often very small, which highlights the good load balancing of our approach. Note however that when the number of slaves increases, the ratio of maximum to minimum time increases too, degrading the load balancing. Figures 7c and 7d present the two types of efficiency we have previously defined and the speedup for the two scenes. Note that $E_N$ and $C_N$ are equal for 2 to 10 machines, as each of those machines has a unit Ki. Both speedup and efficiency are good for the first scene (turbid medium), but they are less good for the maize environment. This difference comes from two sources:

Maize elements have been assumed to be homogeneously distributed inside the canopy volume (as in the turbid medium). However this hypothesis does not hold, because of the general 3D structure of plants.

The parallel computation is more computationally demanding than the sequential algorithm. Obviously, some communications occur. But as no change has been made to the sequential NR algorithm, each time a slave receives some elements, it first builds the corresponding voxels. In sequential NR, each voxel is built only once, but in our parallel NR, each voxel is built several times during the resolution process.

Even if the second problem appears for each scene, the heterogeneity of the maize canopy increases its impact for this environment. Indeed almost the same amount of computation is required to rebuild the voxel grid around a low density voxel (which lies in the plant interspace for example) as for a high density one. But the number of form factors rows the slave has to compute for a low density voxel is smaller than for a high density one. Consequently, more time is spent in useless computation (as compared to the sequential algorithm) for low density voxels than for the others. Turbid medium voxels all have the same density, so the ratio between the grid rebuild step and the strict form factors computation step is the same for each machine. Increasing both the voxel size and the surrounding sphere diameter provides better results for efficiency and for speedup (see figure 7d). In this case, the voxel content becomes more homogeneous for the maize environment, because the plant interspaces are distributed more evenly among the voxels. But it decreases load balancing between the machines. The effects of voxel and sphere sizes are currently under investigation.
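A simple cost model, offered as an illustration rather than as a measured result, makes this argument explicit. Assume (hypothetically) that rebuilding the voxel grid around a voxel $v$ costs a fixed time $t_b$ whatever its density, and that each of its $n_v$ form factors rows costs a time $t_r$. The fraction of the slave time spent on grid rebuilding for this voxel is then

$$ w_v = \frac{t_b}{t_b + n_v\, t_r} $$

This fraction approaches 1 when $n_v$ is small (low density voxels) and decreases towards 0 as $n_v$ grows. In a turbid medium $n_v$ is the same for every voxel, so $w_v$ is constant; in the maize canopy it varies from voxel to voxel, which is the imbalance described above.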


[Figure 7 panels: (a) computation times for the turbid medium and (b) computation times for the maize, each plotting the maximum and minimum slave times in minutes for the sequential run and for 2 to 18 slaves with 1 master; (c) speedup and efficiency (En, Cn) for the turbid medium and (d) speedup and efficiency for the maize, for 2 to 18 slaves, with additional curves in (d) for Diam = 0.5 and Diam = 1.]

Figure 7: Evaluation of the efficiency of the parallel nested radiosity


5 Conclusion and perspectives

This paper has presented a parallel approach for computing form factors in vegetation. Plants are described as 3D sets of elements and the geometrical relationships between each pair of elements allow the radiative exchanges in the canopy to be simulated. Those exchanges can then be used for canopy image display, or in environmental physics for earth observation or as a main parameter in the simulation of plant growth. Our approach takes advantage of specific features of the nested radiosity (mainly the data locality) for data and computation distribution. In this master-slave approach each slave processor computes form factors between neighboring elements, while the master processor both distributes the work between the slaves and collects the rows of the form factors matrix. The results we obtained highlight both the efficiency and the speedup provided by this approach.

However some further improvements have to be considered in order to increase the efficiency of the parallel computation. First, each time a slave processor has completed the computation for its elements voxel, it asks for a new voxel. In our approach the master sends this new voxel (if one is available) and its neighboring voxels. This simple approach however requires a lot of data to be sent each time the elements of a new voxel have to be processed. A way to decrease this quantity of data is to take into account the elements voxels that are already in each slave's memory when choosing a new voxel for this slave: if this new voxel is already in the slave memory (it was one of the neighbors of the previously computed voxel), then part of its own neighboring voxels are already in the slave memory too (they were neighbors of the previous voxel too). Taking this property into account in the master's voxel choice algorithm seems to be more difficult, but it should reduce the communication requirements.
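One possible form of such a voxel choice is sketched below. It is an assumption about how the idea could be realised, not an algorithm from this paper: voxels are indexed by integer grid coordinates, the neighbourhood is the surrounding 3 x 3 x 3 block, grid boundaries are ignored, and the names `neighbourhood` and `choose_next_voxel` are hypothetical.

```python
from itertools import product

def neighbourhood(voxel, radius=1):
    """Voxel plus its neighbours on a regular 3D grid (bounds ignored)."""
    x, y, z = voxel
    r = range(-radius, radius + 1)
    return {(x + dx, y + dy, z + dz) for dx, dy, dz in product(r, r, r)}

def choose_next_voxel(pending, cached, radius=1):
    """Pick the pending voxel whose neighbourhood needs the fewest transfers.

    pending -- set of voxels whose rows are still to be computed
    cached  -- set of voxels already held in the slave memory
    Returns (voxel, voxels to send), or (None, set()) if nothing is pending.
    """
    if not pending:
        return None, set()
    # sorted() makes the choice deterministic when several voxels tie.
    best = min(sorted(pending),
               key=lambda v: len(neighbourhood(v, radius) - cached))
    return best, neighbourhood(best, radius) - cached

# The slave has just processed voxel (5, 5, 5) and still holds its block.
cached = neighbourhood((5, 5, 5))
voxel, to_send = choose_next_voxel({(6, 5, 5), (9, 9, 9)}, cached)
print(voxel, len(to_send))
```

In this example the adjacent voxel is preferred because only one new layer of voxels has to be transferred instead of a full block. A real master would also have to keep this greedy rule from starving distant voxels.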

Another area of investigation is to remove the strict homogeneity hypothesis for the canopy medium. As a first approximation, canopies have been treated as a turbid medium, allowing the elements data to be distributed through a regular grid of voxels. But as for the maize scene, vegetation canopies are generally not strictly homogeneous. Distributing data through regular size voxels can produce unbalanced and useless work during form factors computation. These drawbacks should be reduced by subdividing the canopy with a more balanced binary space partition. Canopies with a greater heterogeneity will also require some investigation of the volume-based model: is the SAIL model accurate enough in this case, or which more complex models [27] would be better adapted?
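As an illustration of what a balanced partition could look like, the sketch below splits a set of element centres at the median along cycling axes, in the manner of a k-d tree. This is one particular kind of binary space partition chosen for the example, not a scheme described in this paper; the function name and the sample points are hypothetical.

```python
def balanced_partition(points, max_per_cell, depth=0):
    """Recursively split element centres at the median along cycling axes.

    points       -- list of (x, y, z) element centres
    max_per_cell -- maximum number of elements per cell (must be >= 1)
    Returns a list of cells, each cell being a list of points.
    """
    if len(points) <= max_per_cell:
        return [points]
    axis = depth % 3                      # x, then y, then z, then x again
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2               # median split: halves differ by <= 1
    return (balanced_partition(ordered[:mid], max_per_cell, depth + 1)
            + balanced_partition(ordered[mid:], max_per_cell, depth + 1))

# A dense cluster of 64 elements plus 8 isolated elements far away.
dense = [(0.1 * i, 0.1 * j, 0.1 * k)
         for i in range(4) for j in range(4) for k in range(4)]
sparse = [(5.0 + i, 5.0, 5.0) for i in range(8)]
cells = balanced_partition(dense + sparse, max_per_cell=10)
print([len(c) for c in cells])
```

Unlike a regular grid, every cell here holds about the same number of elements whatever their spatial distribution, at the cost of cells with very different sizes.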

Finally our approach requires the entire canopy to be represented in the master processor's memory. This requires the master to have a large memory space for large canopies. Removing this constraint requires distributing all the data across all the processors and ensuring some efficient collaboration mechanisms between them. As the basic operation of the linear solver is the multiplication of a vector by a row of the system matrix, the solver can easily be parallelized [19] while keeping the system matrix distributed over all the processors. Furthermore distribution schemes for the voxels should be carefully studied in order to reduce communication requirements and unbalanced work.
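The row-wise product mentioned above can be sketched as follows. The code runs sequentially and only mimics the distribution: each entry of `partitions` stands for the sparse rows owned by one processor. The storage layout (a dictionary of non-zero columns per row) and the function names are assumptions made for illustration.

```python
def local_matvec(local_rows, x):
    """Multiply the rows owned by one processor by the full vector x.

    local_rows -- {row index: {column index: value}}, non-zero entries only
    Returns {row index: dot product of that row with x}.
    """
    return {i: sum(v * x[j] for j, v in row.items())
            for i, row in local_rows.items()}

def distributed_matvec(partitions, x):
    """Gather the partial results of all processors into one vector."""
    y = [0.0] * len(x)
    for local_rows in partitions:         # one iteration per processor
        for i, value in local_matvec(local_rows, x).items():
            y[i] = value
    return y

# A 3 x 3 sparse matrix split between two hypothetical processors.
partitions = [
    {0: {0: 1.0, 2: 2.0}},                    # processor 0 owns row 0
    {1: {1: 3.0}, 2: {0: 4.0, 2: 5.0}},       # processor 1 owns rows 1 and 2
]
y = distributed_matvec(partitions, [1.0, 2.0, 3.0])
print(y)
```

Each processor needs the whole vector but only its own rows, so the matrix never has to be assembled in a single memory; the price is one exchange of the updated vector per solver iteration.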

References

[1] Franck Aries. Modélisation surfacique d'un couvert végétal pour l'étude du rayonnement. Thèse de doctorat en informatique, Université de Nantes, 1997. (in French).

[2] B. Arnaldi, T. Priol, L. Renambot, and X. Pueyo. Visibility masks for solving complex radiosity computations on multiprocessor. pages 219-232. Eurographics, September 1996.

[3] Bruno Arnaldi, Xavier Pueyo, and Josep Vilaplana. On the division of environments by virtual walls for radiosity computation. In P. Brunet and F. W. Jansen, editors, Photorealistic Rendering in Computer Graphics. Springer-Verlag, 1994.

[4] Gladimir Baranoski, Randall Bramley, and Peter Shirley. Fast radiosity solutions for high average reflectance environments. In P. Hanrahan and W. Purgathofer, editors, Rendering Techniques '95. Eurographics, Springer-Verlag, June 1995.

[5] D.R. Baum and J.M. Winget. Real time radiosity through parallel processing and hardware acceleration. In Proceedings of SIGGRAPH '91, Computer Graphics Proceedings, Annual Conference Series, pages 51-60. ACM SIGGRAPH, ACM Press, 1991.

[6] C.C. Borel, S.A.W. Gerstl, and B.J. Powers. The radiosity method in optical remote sensing of structured 3D surfaces. Remote Sens. Environ., 36(1):13-44, 1991.

[7] A.G. Chalmers and D.J. Paddon. Parallel processing of progressive refinement radiosity methods. Eurographics, May 1991.


[8] Subrahmanyan Chandrasekhar. Radiative Transfer. Clarendon Press, Oxford, UK, 1950.

[9] Michael Chelle. Développement d'un modèle de radiosité mixte pour simuler la distribution du rayonnement dans les couverts végétaux. Thèse 3e cycle, Université de Rennes I (Informatique), 1997. 160 p. (in French).

[10] Michael Chelle and B. Andrieu. The nested radiosity model for the distribution of light withinplant canopies. Ecological Modeling, to appear, 1998.

[11] Michaël Chelle, B. Andrieu, and K. Bouatouch. Nested radiosity for plant canopies. The VisualComputer, to appear, 1998.

[12] Michael F. Cohen, Shenchang E. Chen, John R. Wallace, and Donald P. Greenberg. A progressive refinement approach to fast radiosity image generation. Computer Graphics, 22(4):75-84, August 1988.

[13] M. Feda and W. Purgathofer. Progressive refinement radiosity on a transputer network. Eurographics, May 1991.

[14] Alain Fournier, Eugene Fiume, Marc Ouellette, and Chuan K. Chee. FIAT LUX: Light-Driven Global Illumination. Technical Report DGP89-1, Dynamic Graphics Project, University of Toronto, January 1989.

[15] Christian Fournier and Bruno Andrieu. A 3D architectural and process-based model of maize development. Annals of Botany, 81:233-250, 1998.

[16] A. Geist, J. Beguelin, W. Dongarra, R. Jiang, and S. Mancheck. PVM 3 user's guide and reference manual. Technical report, 1994.

[17] N. Goel, I. Rozehnal, and R. Thompson. A computer graphics based model for scattering from objects of arbitrary shapes in the optical region. Remote Sens. Environ., 36(2):73-194, 1991.

[18] Cindy M. Goral, Kenneth E. Torrance, Donald P. Greenberg, and Bennett Battaile. Modelling the interaction of light between diffuse surfaces. In Computer Graphics (SIGGRAPH '84 Proceedings), volume 18, pages 212-22, July 1984.

[19] C. Li A. Greenbaum and H. Z. Chao. Parallelizing preconditioned conjugate gradient algorithms. Computer Physics Communications, 53:295-309, 1989.

[20] P. Guitton, J. Roman, and C. Schlick. Two parallel approaches for parallel radiosity. Eurographics, May 1991.

[21] P. Guitton, J. Roman, and G. Subrenat. Implementation results and analysis of a parallel progressive radiosity. pages 31-38. IEEE/ACM, 1995.

[22] Hamlyn G. Jones. Plant and microclimate, a quantitative approach to environmental plant physiology. Cambridge University Press, 2nd edition, 1992.

[23] James T. Kajiya. The rendering equation. In David C. Evans and Russell J. Athay, editors, Computer Graphics (SIGGRAPH '86 Proceedings), volume 20, pages 143-150, August 1986.

[24] Zbigniew Leyk. Modified generalised conjugate residuals method for non-symmetric systems of linear equations. Research Report CMA-MR33-93, Austr. Nat. Univ. CMA/SMS, 1993.

[25] Nelson Max, Curtis Mobley, Brett Keating, and En-Hua Wu. Plane-Parallel Radiance Transport for Global Illumination in Vegetation. In J. Dorsey and P. Slusallek, editors, Rendering Techniques '97. Eurographics, Springer-Verlag (Wien), June 1997.

[26] Gregory M. Maxwell, Michael J. Bailey, and Victor Goldschmidt. Calculations of the radiation configuration factor using ray tracing. Computer-Aided Design, 18(7):371-379, September 1986.

[27] R. Myneni and J. Ross, editors. Photon-Vegetation Interactions - Applications in Optical Remote Sensing and Plant Ecology. Springer-Verlag, 565 p., 1991.

[28] Albert Olioso. The SAIL model and its extensions. Personal communication, [email protected], September 1995.

[29] Chris Patmore. Illumination of dense foliage models. In 4th Eurographics Workshop on Rendering, pages 63-70, Paris, 1993.

[30] R.J. Recker, D.W. George, and D.P. Greenberg. Acceleration techniques for progressive refinement radiosity. In Proceedings of SIGGRAPH '90, Computer Graphics Proceedings, Annual Conference Series, pages 59-66. ACM SIGGRAPH, ACM Press, 1990.


[31] L. Renambot, B. Arnaldi, T. Priol, and X. Pueyo. Towards efficient parallel radiosity for DSM-based parallel computers using virtual interfaces. IEEE/ACM, 1997.

[32] C. Renaud, F. Bricout, and E. Leprêtre. Massively parallel hemispherical projection for progressive radiosity. Computers & Graphics, 19(2):273-279, March 1995.

[33] C. Renaud and F. Rousselle. Fast massively parallel progressive radiosity on the MP-1. pages 201-218. Eurographics, September 1996.

[34] Juhan K. Ross and A. L. Marshak. Calculation of canopy bidirectional reflectance using the Monte Carlo method. Remote Sens. Environ., 24:213-225, 1988.

[35] François Sillion. Clustering and volume scattering for hierarchical radiosity calculations. In G. Sakas and P. Shirley, editors, Photorealistic Rendering Techniques, pages 105-115. Springer-Verlag, 1994.

[36] D.B. Singh, S.G. Abraham, and F.H. Westervelt. Computing radiosity solution on a high performance workstation LAN. pages 248-257, 1992.

[37] Brian Smits, James Arvo, and Donald Greenberg. A clustering algorithm for radiosity in complex environments. In Andrew Glassner, editor, Proceedings of SIGGRAPH '94 (Orlando, Florida, July 24-29, 1994), Computer Graphics Proceedings, Annual Conference Series, pages 435-442. ACM SIGGRAPH, ACM Press, July 1994. ISBN 0-89791-667-0.

[38] A. Varshney and J.F. Prins. An environment-projection approach to radiosity for mesh connected computers. Eurographics, 1992.

[39] W. Verhoef. Earth observation modeling based on layer scattering matrices. Remote Sens. Environ., 17:164-178, 1985.
