High Performance Parallel Computing for FDTD Numerical Technique in Electromagnetic Calculations for SAR Distribution Inside Human Head

HESHAM ELDEEB#, HALA ELSADEK#, MAHA DESSOKEY#, HAYTHAM ABDALLAH# and NADER BAGHERZADEH*

# Electronics Research Institute, Cairo, EGYPT
# [email protected]
* Electrical and Computer Engineering, The Henry Samueli School of Engineering, University of California, Irvine, Irvine, CA, USA

Abstract: - The interest in high performance computing (HPC) has increased the need for computational resources to solve large-scale problems. Technological improvements over the past few years in areas such as microprocessors, memory, networks, and software have made it possible to assemble groups of economical personal computers and/or workstations into a cost-effective system with high processing power. In this paper, three HPC platforms are utilized and their performance is compared in calculating the electromagnetic power absorbed by the human head due to radiation from antennas of handheld devices. The parallel processing performance comparison of the three platforms shows that the IBM BlueGene supercomputer still delivers the largest speedup and efficiency. However, cluster and grid computing offer far cheaper solutions for small to moderate size problems, besides making better use of already existing computing resources.

Key-Words: - High Performance Computing (HPC), cluster computing, grid computing, BlueGene supercomputer, Specific Absorption Rate (SAR), Finite Difference Time Domain (FDTD), Microstrip antenna

1. Introduction
Nowadays, electronic designs increasingly require electromagnetic characterization, due to the ever-increasing use of wireless handheld devices that radiate electromagnetic waves into the surrounding environment. To facilitate such analysis, numerical techniques have been developed. Among the most common computational techniques for lossy materials is the Finite Difference Time Domain (FDTD) method. The FDTD is one of the most common and robust numerical techniques trusted for computing scattering inside inhomogeneous materials such as the human body [3-8].

The main problem with this technique is the large time and memory consumption when solving real, practical scattering problems. The time consumption grows with the product of the three dimensions of the computational domain and the number of time steps. Run times on the order of hours, days, or even longer are common when solving electromagnetic wave problems of realistic size. Thus, parallel processing algorithms become a must, especially for on-the-spot/real-time applications.

In the FDTD, the electric field E and the magnetic field H are evaluated at each time step from the neighbouring fields at the previous time step; hence, no matrix inversion is needed. The near field is evaluated directly, and the far field can then be obtained using a near-field to far-field transformation. Equations 1 to 6 give the six field components at each discretized domain cell, as shown in figure 1, assuming a dielectric material with dielectric constant ε, permeability μ and conductivity σ.

E_x^{n+1}(i+1/2, j, k) = C_a E_x^{n}(i+1/2, j, k)
  + N_e { H_z^{n+1/2}(i+1/2, j+1/2, k) - H_z^{n+1/2}(i+1/2, j-1/2, k)
        - H_y^{n+1/2}(i+1/2, j, k+1/2) + H_y^{n+1/2}(i+1/2, j, k-1/2) }    (eq.1)

E_y^{n+1}(i, j+1/2, k) = C_a E_y^{n}(i, j+1/2, k)
  + N_e { H_x^{n+1/2}(i, j+1/2, k+1/2) - H_x^{n+1/2}(i, j+1/2, k-1/2)
        - H_z^{n+1/2}(i+1/2, j+1/2, k) + H_z^{n+1/2}(i-1/2, j+1/2, k) }    (eq.2)

LATEST TRENDS on COMPUTERS (Volume I)

ISSN: 1792-4251 114 ISBN: 978-960-474-201-1

E_z^{n+1}(i, j, k+1/2) = C_a E_z^{n}(i, j, k+1/2)
  + N_e { H_y^{n+1/2}(i+1/2, j, k+1/2) - H_y^{n+1/2}(i-1/2, j, k+1/2)
        - H_x^{n+1/2}(i, j+1/2, k+1/2) + H_x^{n+1/2}(i, j-1/2, k+1/2) }    (eq.3)

H_x^{n+1/2}(i, j+1/2, k+1/2) = H_x^{n-1/2}(i, j+1/2, k+1/2)
  + N_m { E_y^{n}(i, j+1/2, k+1) - E_y^{n}(i, j+1/2, k)
        - E_z^{n}(i, j+1, k+1/2) + E_z^{n}(i, j, k+1/2) }                  (eq.4)

H_y^{n+1/2}(i+1/2, j, k+1/2) = H_y^{n-1/2}(i+1/2, j, k+1/2)
  + N_m { E_z^{n}(i+1, j, k+1/2) - E_z^{n}(i, j, k+1/2)
        - E_x^{n}(i+1/2, j, k+1) + E_x^{n}(i+1/2, j, k) }                  (eq.5)

H_z^{n+1/2}(i+1/2, j+1/2, k) = H_z^{n-1/2}(i+1/2, j+1/2, k)
  + N_m { E_x^{n}(i+1/2, j+1, k) - E_x^{n}(i+1/2, j, k)
        - E_y^{n}(i+1, j+1/2, k) + E_y^{n}(i, j+1/2, k) }                  (eq.6)

Where,

C_a(i, j, k) = (1 - σΔt/2ε) / (1 + σΔt/2ε)

N_e(i, j, k) = (Δt/εh) / (1 + σΔt/2ε)

N_m(i, j, k) = Δt/μh

The superscript (n) of the fields means that their components are calculated at time t_n = nΔt, σ is the electrical conductivity, ε is the dielectric constant, μ is the magnetic permeability, h is the dielectric material thickness, and i, j and k are the indices in the X, Y and Z directions, respectively. Thus the Yee algorithm can be summarized by the system of difference equations 1 to 6 presented in [1,2,6,8,11].

Figure 1 shows a unit cell of the discretized domain with the positions of the field components.

Fig. 1 The E components are in the middle of the cell edges and the H components are in the centers of the cell faces
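The leapfrog update of equations 1 to 6 can be sketched in code. The following is a minimal, illustrative Python/NumPy version of one Yee time step for a lossless region (C_a = 1, scalar coefficients); the array names, grid size and coefficient values are our own assumptions, not the authors' implementation.

```python
import numpy as np

def yee_step(Ex, Ey, Ez, Hx, Hy, Hz, Ca, Ne, Nm):
    # H updates (eq. 4-6): H^{n+1/2} = H^{n-1/2} + Nm * (curl-E differences)
    Hx[:, :-1, :-1] += Nm * ((Ey[:, :-1, 1:] - Ey[:, :-1, :-1])
                             - (Ez[:, 1:, :-1] - Ez[:, :-1, :-1]))
    Hy[:-1, :, :-1] += Nm * ((Ez[1:, :, :-1] - Ez[:-1, :, :-1])
                             - (Ex[:-1, :, 1:] - Ex[:-1, :, :-1]))
    Hz[:-1, :-1, :] += Nm * ((Ex[:-1, 1:, :] - Ex[:-1, :-1, :])
                             - (Ey[1:, :-1, :] - Ey[:-1, :-1, :]))
    # E updates (eq. 1-3): E^{n+1} = Ca*E^n + Ne * (curl-H differences)
    Ex[:, 1:-1, 1:-1] = (Ca * Ex[:, 1:-1, 1:-1]
                         + Ne * ((Hz[:, 1:-1, 1:-1] - Hz[:, :-2, 1:-1])
                                 - (Hy[:, 1:-1, 1:-1] - Hy[:, 1:-1, :-2])))
    Ey[1:-1, :, 1:-1] = (Ca * Ey[1:-1, :, 1:-1]
                         + Ne * ((Hx[1:-1, :, 1:-1] - Hx[1:-1, :, :-2])
                                 - (Hz[1:-1, :, 1:-1] - Hz[:-2, :, 1:-1])))
    Ez[1:-1, 1:-1, :] = (Ca * Ez[1:-1, 1:-1, :]
                         + Ne * ((Hy[1:-1, 1:-1, :] - Hy[:-2, 1:-1, :])
                                 - (Hx[1:-1, 1:-1, :] - Hx[1:-1, :-2, :])))

# small demo: point excitation in a lossless 12^3 grid (illustrative values)
N = 12
Ex, Ey, Ez, Hx, Hy, Hz = (np.zeros((N, N, N)) for _ in range(6))
Ez[N // 2, N // 2, N // 2] = 1.0   # initial excitation
Ca, Ne, Nm = 1.0, 0.4, 0.4         # sigma = 0; sqrt(Ne*Nm) < 1/sqrt(3)
for _ in range(10):
    yee_step(Ex, Ey, Ez, Hx, Hy, Hz, Ca, Ne, Nm)
```

Each half-cell offset in the equations becomes a one-index slice shift; the signs of the slice differences follow the curl terms of eq. 1 to 6 directly.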

A parallel algorithm was first built to calculate scattering from a dielectric sphere that contains one inhomogeneous material property that simulates the average properties of the human head tissues and liquids (dielectric constant, conductivity and resistivity) due to an incident electromagnetic plane wave [1].

The algorithm is then applied to simulate Specific Absorption Rate (SAR) calculations inside a practical case: a human head model obtained from Magnetic Resonance Imaging (MRI). SAR is calculated as a function of the electric field, SAR = σ|E|² / (2ρ), where σ is the conductivity and ρ is the sample density. The widely used commercial dipole antenna is simulated in the same radiation domain box, located on average 5 cm from the human head [2]. Another, more complicated commercial antenna, a rectangular microstrip patch, is simulated under the same radiation conditions. Due to space limitations, only the results of the head model with the microstrip antenna are illustrated in the next sections.
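The SAR formula above is easy to check numerically. A small helper follows; the tissue values in the example are illustrative assumptions, not the paper's Cole-Cole data.

```python
def sar(sigma, e_peak, rho):
    """SAR = sigma * |E|^2 / (2 * rho), with |E| the peak electric field (V/m),
    sigma the conductivity (S/m) and rho the tissue density (kg/m^3)."""
    return sigma * e_peak ** 2 / (2.0 * rho)

# illustrative numbers only: sigma ~ 1 S/m, rho ~ 1000 kg/m^3, |E| = 10 V/m
print(sar(1.0, 10.0, 1000.0))  # -> 0.05 W/kg
```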

The electrical properties of the human tissues in this case are derived from the seven-material Cole-Cole model [6]. The human head model extracted from MRI images is shown in figure 2.

Fig. 2 Horizontal cross section of the human head MRI image passing through the brain and the eyes, with seven Cole-Cole-model-defined inhomogeneous materials

Results for the SAR calculations are first obtained serially. Parallel algorithms are then applied on three different high performance computing platforms: cluster computing, grid computing and the BlueGene supercomputer. The following sections describe the parallel algorithms and illustrate their results.

2. FDTD Calculations for SAR
The everyday use of wireless communication devices raises certain problems due to the interaction between the human body and the electromagnetic waves radiated from these devices. Most countries have adopted, or are in the process of adopting, one of the two prominent safety standards for human exposure to RF energy: the International Commission on Non-Ionizing Radiation Protection (ICNIRP) 1998 Guidelines for limiting exposure to time-varying electric, magnetic, and electromagnetic fields (up to 300 GHz), and the IEEE C95.1-1999 Standard for safety levels with respect to human exposure to radio frequency electromagnetic fields, 3 kHz to 300 GHz. Table 1 shows the basic SAR limits for both the ICNIRP and IEEE standards [12].

TABLE 1 BASIC SAR RESTRICTIONS (W/KG)

The calculation of the SAR distribution is done by computing the steady-state value of the scattered electric field inside the head model at the most distant point in the head. Figure 3 shows the flow chart for calculating the SAR values; it shows that the main bottleneck, which takes the long computational time, is the nested loops over the three spatial dimensions (X, Y, Z) and the time domain t. For example, modeling a structure with a typical number of cells requires the storage of large arrays. In a three-dimensional simulation, arrays for the 6 field components, ε and σ must be stored in memory. In addition, all 6 field components and the incident source must be computed at all grid points for each time step. The approximate amount of memory required for a simulation is

App. Mem. = Nx Ny Nz [(6x8) + 8]

where Nx, Ny, Nz are the array dimensions in the X, Y and Z directions, respectively. The equation assumes that the 6 field components and the permittivity are stored as double-precision floating point values requiring 8 bytes of memory each. A three-dimensional simulation of a volume 15 μm on a side (N = 200) would require about 450 MB of memory according to the above equation. The actual amount of memory will be greater, since this does not account for the variables needed for boundary conditions and the far-field transformation. All three dimensions are also iterated through a large number of time steps (e.g. 50,000 in the head model with the microstrip antenna at 1800 MHz operating frequency). To address these weaknesses of the FDTD method, parallel algorithms are presented in this paper.

Fig. 3 SAR Calculation Flow Chart
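The memory estimate from the previous section can be encoded in a few lines; the function name is ours.

```python
def fdtd_memory_bytes(nx, ny, nz):
    """App. Mem. = Nx*Ny*Nz*[(6*8) + 8]: six double-precision field
    components plus one 8-byte material array per cell."""
    return nx * ny * nz * ((6 * 8) + 8)

print(fdtd_memory_bytes(200, 200, 200))  # 448,000,000 bytes, i.e. ~450 MB
print(fdtd_memory_bytes(60, 60, 90))     # the head/antenna domain: ~18 MB
```

The second call uses the 60 x 60 x 90 cell domain from the microstrip experiment in section 5; as noted in the text, the real footprint is larger once boundary-condition and far-field arrays are added.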

3. Parallel calculations of 3D FDTD
3.1 Load distribution
One of the parallel processing strategies used is to divide the space among the available processors, such that each processor is responsible for calculating the electric and magnetic fields in its own space. The subdomain of each processor equals (Nz / number of processors), where Nz is the size of the Z dimension, and the remainder of the division is redistributed among the processors. For example, if the domain size is [8][8][15] and there are six processors, then Nz per processor = 15/6 = 2 with remainder 3, so the first three processors take a subdomain of [8][8][3] and the rest take a subdomain of [8][8][2].
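The remainder-redistribution rule above can be sketched as a short helper; the function name is ours.

```python
def subdomain_sizes(nz, nprocs):
    """Split the Z dimension among processors; the remainder of the
    integer division is redistributed one extra layer at a time."""
    base, rem = divmod(nz, nprocs)
    return [base + 1 if p < rem else base for p in range(nprocs)]

# the example from the text: Nz = 15 over six processors
print(subdomain_sizes(15, 6))  # -> [3, 3, 3, 2, 2, 2]
```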

3.2 Data dependencies
As shown previously in eq. 1 and eq. 2, the electric and magnetic fields at each time step are evaluated from the neighbouring fields at the previous time step.

Standard | Frequency Range  | Whole-Body SAR (W/kg) | Local SAR, head and trunk (W/kg)
ICNIRP   | 100 kHz - 10 GHz | 0.08                  | 2
IEEE     | 100 kHz - 6 GHz  | 0.08                  | 1.6


The processors therefore depend on each other: each one exchanges the boundary values (electric and magnetic fields) with its neighbours, as shown in figure 4 [7]. From the figure, the tangential magnetic fields Hx and Hy are updated in subdomain N and then forwarded to subdomain (N+1), where they are used as boundary conditions to calculate the electric fields Ey and Ex; these are in turn forwarded back to subdomain N. This procedure, repeated at each time step, requires that the electric field be exchanged between two subdomains after it is updated; likewise, the magnetic field is exchanged after its update is completed. Hence, the field-exchange procedure is carried out twice in each time step. Consequently, the sequence of the parallel algorithm over the iterations is as follows: in the first iteration, each processor initializes MPI (Message Passing Interface) using the MPI_Init command, and each processor computes the subdomain in which it will calculate the E and H fields, as described above. At the last iteration, the magnetic field is no longer sent to the neighbours and MPI is finalized using the MPI_Finalize command.

Fig.4 Data Dependencies
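The ghost-layer exchange described above can be emulated serially. The sketch below mimics one exchange-then-compute cycle on a 1D toy stencil (a simple averaging update stands in for the E/H curl updates, and Python lists stand in for MPI ranks); it is a conceptual illustration of the boundary exchange, not the authors' MPI code.

```python
def serial_update(u, steps):
    # reference sweep over the whole domain (endpoints held fixed)
    for _ in range(steps):
        u = [u[0]] + [0.5 * (u[i - 1] + u[i + 1])
                      for i in range(1, len(u) - 1)] + [u[-1]]
    return u

def parallel_update(u, nprocs, steps):
    # split the domain into equal chunks, one per simulated "rank"
    n = len(u) // nprocs
    chunks = [u[p * n:(p + 1) * n] for p in range(nprocs)]
    for _ in range(steps):
        # exchange phase: each rank receives its neighbours' boundary layers
        left = [chunks[p - 1][-1] if p > 0 else None for p in range(nprocs)]
        right = [chunks[p + 1][0] if p + 1 < nprocs else None
                 for p in range(nprocs)]
        # compute phase: update each subdomain using the received ghosts
        for p in range(nprocs):
            c, new = chunks[p], []
            for i in range(len(c)):
                li = c[i - 1] if i > 0 else left[p]
                ri = c[i + 1] if i + 1 < len(c) else right[p]
                # global domain edges (no neighbour) stay fixed
                new.append(c[i] if li is None or ri is None
                           else 0.5 * (li + ri))
            chunks[p] = new
    return [x for c in chunks for x in c]

u0 = [0.0] * 12
u0[5] = 1.0
print(parallel_update(u0, 3, 4) == serial_update(u0, 4))  # -> True
```

Because each subdomain holds a current copy of its neighbours' boundary layers before it computes, the decomposed run reproduces the serial result exactly; the same invariant is what the twice-per-step E/H exchange preserves in the 3D algorithm.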

3.3 Parallel Algorithm Performance Analysis
The performance of the algorithm is measured using two factors:
• Wall clock execution time
• Speedup factor "S", the ratio of the completion time on one processor to the completion time on the n-processor system: S = Sequential Time / Parallel Time

4. Parallel platform structure
In this paper, we utilize three parallel platforms.

ERI cluster:
It is a high-performance, private, dedicated, Linux, homogeneous group cluster, as shown in figure 5. It is formed of one master node and four slave nodes. All of its nodes have private IP addresses, which has the advantage of saving communication overhead; communication with the outside world goes through the server only. MPI is installed and configured as the parallel programming environment [9].

Fig. 5: ERI Cluster

EUMED grid:
ERI has participated as a partner in EUMEDGRID ("Empowering eScience across the Mediterranean"), a project co-funded by the European Union. There are 14 partners in the EUMED project, distributed across different Mediterranean countries. Through the project, ERI built the first Egyptian global grid site, EG-01-ERI, which is one of the EUMED VO (virtual organization) sites; it was built using the gLite middleware. Figure 6 shows the hardware architecture of the EG-01-ERI grid site, which contains a computing element connected to four worker nodes through a private network, a storage element, a resource broker and a user interface node. The gLite middleware is used to build the global EUMED grid. The Network Time Protocol (NTP) with a time server is used for node synchronization.

Fig. 6 EG-01-ERI Hardware Configurations

Blue Gene/L:
It is a massively-parallel computing system that consists of up to 131,072 central processing units (CPUs) in a Linux®-based programming environment. Blue Gene/L consists of multiple components, as listed and shown in figure 7 [10]:
• The racks containing compute nodes, I/O nodes, Link cards and Service cards
• A Cluster-Wide File System (CWFS)
• A front end node
• A service node
The IBM Blue Gene/L machine utilized in this paper is one cabinet of 1024 nodes, each with a PowerPC 440 (700 MHz) processor and 1 GB of SDRAM-DDR memory. It is installed in Egypt at NARSS (the National Authority for Remote Sensing and Space Sciences).

Fig. 7 Blue Gene Architecture

5. Experimental results on parallel platforms

5.1 Experiment of human head model with microstrip antenna
In this experiment, the effect of a rectangular microstrip antenna, the type most used now in mobile phones, on a real head model was studied. The antenna is 5 cm away from the side of the head. The space domain enclosing the human head and the antenna is 60 x 60 x 90 cells. The microstrip antenna has a substrate material of relative permittivity 2.2, a substrate thickness of 6.73 mm and a rectangular patch of 134.6 mm in the x direction and 111 mm in the z direction. The feed is performed via a microstrip line of 67.3 mm in the x direction and 37 mm in the z direction. The patch is adjusted to face the head, with the ground plane in the opposite direction, to obtain the maximum power that can be radiated from the antenna towards the human head and thus the worst case of the SAR distribution in the head model. Figure 8 (a, b) shows the SAR distribution through the middle cross-section plane of the head model at 900 MHz (GSM mobile phone band) and 1800 MHz (DCS band), respectively. In the case of the microstrip antenna, the most distant point reaches its steady-state value after 50,000 time iterations; steady state is reached after 11.66 running hours on a single processor [11]. From the figure, at 900 MHz the maximum observed SAR value is 0.32 W/kg, and at 1800 MHz the maximum observed SAR value is 0.22 W/kg; both are below the limits of the two standard guidelines shown in table 1. When an electromagnetic wave is incident on any medium, a portion of the power penetrates the medium while the rest is scattered outside. Because human tissues can be considered a highly lossy material, the scattered wave amplitude increases with frequency, so the penetrated power portion is reduced as the frequency increases. Hence, the SAR distribution within the human tissues decreases.

(a)

(b)

Fig. 8 The SAR distribution in the human head due to the scattering wave radiated from the microstrip antenna at (a) 900 MHz and (b) 1800 MHz

5.2 Performance Analysis

Figure 9 shows the execution time and the speedup gained for this experiment with different numbers of processors when running on the three platforms. Figure 9 (a) shows the ERI cluster performance: a speedup of 4 was gained when running on 5 processors, and more speedup was gained as the number of processors increased. When running on the EUMED grid, a speedup of 6.5 was gained on 16 processors, as shown in figure 9 (b). A speedup of 25 was gained when using 80 processors on Blue Gene/L, with a run time of 17.64 minutes, as shown in figure 9 (c). A comparison between the results is shown in figure 9 (d). The figure shows that the highest speedup is gained when using Blue Gene/L with a larger number of processors. It is also noticed that, for the same number of processors, Blue Gene/L gives more speedup than the EUMED Grid, due to the high-speed interconnection bus used in Blue Gene/L, which reduces the communication time between processors. From figure 9 it was found that the maximum useful number of processors is 45, as the speedup becomes almost constant beyond this number, which is related to the domain size. Figure 10 shows that, due to the dependencies between processors, the first and last layers of each subdomain must be exchanged every iteration. The smallest subdomain is therefore 2 points thick, and the maximum number of processors is limited to Nz/2, where Nz is the domain size in the Z direction. If the number of processors is larger than Nz/2, the communication time will exceed the computational time and no further speedup will be gained. Since the domain size in the Z direction in our experiment is 90, the best number of processors to use is 45.

(a)

(b)

(c)

(d)

Fig. 9 Performance analysis of the human head model with microstrip antenna on (a) ERI Cluster, (b) EUMED Grid, (c) Egypt NARSS Blue Gene and (d) the comparison between the results on the three platforms

Fig.10 Number of processors related to domain size
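The Nz/2 limit illustrated in figure 10 and the speedup definition from section 3.3 can be combined into two small helpers; the function names and the timing numbers in the example are ours, not measured values.

```python
def speedup(t_serial, t_parallel):
    """S = sequential time / parallel time (section 3.3)."""
    return t_serial / t_parallel

def max_useful_procs(nz):
    """Each subdomain needs at least 2 Z-layers (its first and last layers
    are exchanged with the neighbours), so at most Nz/2 processors help."""
    return nz // 2

print(max_useful_procs(90))   # -> 45, matching the experiment's domain
# illustrative timing numbers only:
print(speedup(100.0, 4.0))    # -> 25.0
```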

It is worth mentioning that ERI constructed a grid-enabled portal that lets researchers connect to the ERI cluster as well as the EUMED Grid through a web interface and exposes services that users can benefit from. The parallel FDTD algorithm is published on the grid portal as a service: the user selects the antenna type with the discussed head model and runs it on the ERI cluster, benefiting remotely from the HPC facilities. The work on the web portal focused on adding visualization capabilities as well as developing a Common Object


Request Broker Architecture (CORBA) bridge between the web pages and HPC platform [11].

6. Conclusion

Three HPC platforms were presented as a base for heavy computational applications: the ERI Cluster, the EUMED Grid and the Blue Gene/L supercomputer. One such application is calculating the Specific Absorption Rate distribution within the human head due to radiation from an antenna like those used in most wireless applications, using the FDTD method. We designed a parallel algorithm for the application and showed that parallel computing can speed up the runtime by dividing the application's domain among different processors and exchanging boundary data between them. By running the parallel algorithm on the three HPC platforms and comparing their performance, it was found that Blue Gene/L gave the highest speedup, due to the large number of processors used and the lowest communication time. However, cluster and grid computing provide low-cost, well-utilized HPC platforms when a small number of processors is enough (< 45 in our case), since the processors' communication time is then low.

The introduced grid portal is a good starting point for building a larger portal with more services that allows researchers to use HPC platforms remotely at low cost.

REFERENCES
[1] A. Elsherbeni, "A FDTD Scattered Field Formulation for Dispersive Media," APS-2000, Salt Lake City, pp. 248-251, July 2000.
[2] E. A. Hashish, F. M. El-Hefnawi, and H. H. Abdullah, "Formulation of the FDTD method for Cole-Cole model representation of dispersive media," Engineering Research Journal, Mataria, Cairo, Egypt, Vol. 8, 2002, pp. 85.
[3] C. C. Johnson, C. H. Durney, and H. Massoudi, "Electromagnetic power absorption in anisotropic tissue media," IEEE Trans. Microwave Theory and Techniques, Vol. 23, 1975, pp. 529-532.
[4] K. M. Chen and B. S. Guru, "Internal EM field and absorbed power density in human torsos induced by 1-500-MHz EM waves," IEEE Trans. Microwave Theory and Techniques, Vol. 25, 1977, pp. 746-755.
[5] M. J. Hagmann and O. P. Gandhi, "Numerical calculation of electromagnetic energy deposition in man with ground and reflector effects," Radio Science, Vol. 14, 1979, pp. 23-29.
[6] [http://www.ee.olemiss.edu/atef/index.asp]
[7] H. Eldeeb, H. Elsadek, H. Abdallah, M. Desouky and N. Bagherzadeh, "FDTD accelerator for SAR distribution in human head due to radiation from wireless devices," Conference proceedings of EMTS 2007, pp. 2310-2314.
[8] H. Elsadek, H. Eldeeb, H. Abdallah, M. Desouky and N. Bagherzadeh, "Specific Absorption Rate Calculation using Parallel 3D Finite Difference Time Domain Technique," WORLDCOMP'08, Las Vegas, July 2008.
[9] A. A. Elsamea, H. Eldeeb and S. Nassar, "PC Cluster as a Platform for Parallel Applications," WSEAS Transactions on Computers, pp. 1220-1226, Issue 5, Volume 3, November 2004.
[10] G. Lakner and G. L. Mullen-Schultz, "IBM System Blue Gene Solution: System Administration," IBM Redbooks publication, 2007.
[11] H. Eldeeb, H. Elsadek, M. Desouky, H. Abdallah, I. Talkhan and N. Bagherzadeh, "Parallel SAR computations inside human head due to radiation from microstrip antenna using grid portal," HPCNCS-09 conference, Orlando, FL, USA, July 2009.
[12] K. Fujimoto and J. James, "Mobile Antenna Systems Handbook," Artech House Publishers, 2nd ed., 2001.
