Introduction to the Message Passing Interface (MPI) - PRACE ...


Transcript of Introduction to the Message Passing Interface (MPI) - PRACE ...


Höchstleistungsrechenzentrum Stuttgart

Introduction to the Message Passing Interface (MPI)

Rolf Rabenseifner [email protected]

University of Stuttgart

High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de

(for MPI-2.1, MPI-2.2, MPI-3.0, and MPI-3.1)


Groups & Communicators, Environment management

SoHPC: Parallelization with MPI

trainers: Claudia Blaas-Schenner & Irene Reichl

slides: Rolf Rabenseifner (HLRS), Claudia Blaas-Schenner & Irene Reichl (VSC RC)


Goals

Support for libraries or application sub-spaces:

•  Safe communication context spaces
   –  e.g., for subsets of processes,
   –  or duplicated communicators for independent software layers (middleware)
•  Group scope for collective operations (→ course Chapter 6)
•  Re-numbering of the ranks of communicators
   –  e.g., for efficient communication on a cluster of shared-memory nodes
•  Naming of context spaces
•  Adding user-defined attributes to a communication context
   (→ Section (2) of this course chapter)


A library should always use a duplicate of MPI_COMM_WORLD, and never MPI_COMM_WORLD itself.
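A minimal sketch of this rule, assuming a hypothetical library with init/finalize entry points (the names are illustrative, not from the course): the library duplicates the caller's communicator once, so its internal messages can never match application messages.

   /* hypothetical library_init(): duplicate the caller's communicator
      so that library traffic lives in its own communication context */
   #include <mpi.h>

   static MPI_Comm lib_comm = MPI_COMM_NULL;

   void library_init(MPI_Comm app_comm)
   {
      MPI_Comm_dup(app_comm, &lib_comm);   /* collective over app_comm */
   }

   void library_finalize(void)
   {
      MPI_Comm_free(&lib_comm);
   }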


Methods – e.g., for coupled applications (ocean and weather)

•  Sub-communicators: collectively defined communication sub-spaces
•  Intra- and inter-communicators

[Figure: MPI_COMM_WORLD (ranks 0–9) is split into an "ocean" and a "weather" sub-communicator. Each process has a rank within MPI_COMM_WORLD and a rank within ocean or weather; an additional inter-communicator combines the process groups of ocean and weather, with the same ranks for inter- and intra-communication.]

Examples:
•  Message within MPI_COMM_WORLD from rank 7 to 3
•  Message within MPI_COMM_WORLD from 3 to 5, or within weather from 1 to 2
•  Message within MPI_COMM_WORLD from 6 to 1, or within the inter-communicator from 3 to 0
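Such an inter-communicator can be built with MPI_Intercomm_create. The following is a hedged sketch, not taken from the slides: the 1/3 vs. 2/3 split and the choice of leaders (sub-rank 0 of each group, addressed by its rank in the peer communicator MPI_COMM_WORLD) are assumptions modeled on the two-rings example later in this chapter.

   #include <mpi.h>

   int main(int argc, char *argv[])
   {
      int world_rank, world_size, mycolor, remote_leader;
      MPI_Comm sub_comm, inter_comm;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
      MPI_Comm_size(MPI_COMM_WORLD, &world_size);

      /* color 0 = "ocean" (first third), color 1 = "weather" (the rest) */
      mycolor = (world_rank > (world_size-1)/3);
      MPI_Comm_split(MPI_COMM_WORLD, mycolor, 0, &sub_comm);

      /* local leader: sub-rank 0 in each group; the remote leader is
         given as a rank in the peer communicator MPI_COMM_WORLD */
      remote_leader = (mycolor == 0) ? (world_size-1)/3 + 1 : 0;
      MPI_Intercomm_create(sub_comm, 0, MPI_COMM_WORLD, remote_leader,
                           99, &inter_comm);

      /* point-to-point in inter_comm addresses ranks of the remote group */

      MPI_Comm_free(&inter_comm);
      MPI_Comm_free(&sub_comm);
      MPI_Finalize();
      return 0;
   }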


Example: MPI_Comm_split()

All processes with the same color are grouped into separate sub-communicators.

   int my_rank, mycolor, key, my_newrank;
   MPI_Comm newcomm;
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
   mycolor = my_rank/4;
   key = 0;
   MPI_Comm_split(MPI_COMM_WORLD, mycolor, key, &newcomm);
   MPI_Comm_rank(newcomm, &my_newrank);

•  C/C++:  int MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm)
•  Fortran: MPI_Comm_split(comm, color, key, newcomm, ierror)
   mpi_f08:      TYPE(MPI_Comm) :: comm, newcomm
                 INTEGER :: color, key
                 INTEGER, OPTIONAL :: ierror
   mpi & mpif.h: INTEGER comm, color, key, newcomm, ierror

key == 0 → ranking in newcomm is sorted as in the old comm
key ≠ 0 → ranking in newcomm is sorted according to the key values

[Figure: MPI_COMM_WORLD with ranks 0, 1, 2, …, 15, …; always 4 processes get the same color and are grouped into their own newcomm: mycolor==0 → world ranks 0–3, mycolor==1 → 4–7, mycolor==2 → 8–11, mycolor==3 → 12–15, …; within each newcomm the new ranks are 0–3.]


Each process gets only its own sub-communicator.


Creation is collective in the old communicator.

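For reference, a complete runnable version of the fragment above; the wrapper (main, MPI_Init/MPI_Finalize, and the printf) is our addition, not part of the slide.

   #include <mpi.h>
   #include <stdio.h>

   int main(int argc, char *argv[])
   {
      int my_rank, mycolor, key, my_newrank;
      MPI_Comm newcomm;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      mycolor = my_rank/4;   /* groups of 4 consecutive world ranks */
      key = 0;               /* keep the old rank order in newcomm  */
      MPI_Comm_split(MPI_COMM_WORLD, mycolor, key, &newcomm);
      MPI_Comm_rank(newcomm, &my_newrank);

      printf("world rank %2d -> mycolor %d, new rank %d\n",
             my_rank, mycolor, my_newrank);

      MPI_Comm_free(&newcomm);
      MPI_Finalize();
      return 0;
   }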


Exercise 1+2 — Two independent rings

•  Exercise 1: Modify the allreduce or the pass-around-the-ring program:
   –  Split the communicator into 1/3 and 2/3, e.g., with color = (rank > ⌊(size−1)/3⌋) as input for MPI_Comm_split
   –  A: Use the allreduce or ring algorithm in both sub-communicators and calculate the sum of the ranks in MPI_COMM_WORLD
      •  E.g., with 12 processes → split into 4 & 8 with global ranks 0..3 & 4..11 and sums 6 & 60 → sumA
   –  B: Same, but with the sum of the local ranks in the sub-communicators
      •  E.g., with 12 processes → split into 4 & 8 with local ranks 0..3 & 0..7 and sums 6 & 28 → sumB

•  Exercise 2 – advanced:
   –  Same, but with MPI_Comm_group(), MPI_Group_range_incl(), and MPI_Comm_create() instead of MPI_Comm_split()
      •  Two different ranges for color 0 and 1!
      •  Same results in sumA/B as in Exercise 1

•  Use the results from course Chapter 6 Collectives or Chapter 4 Nonblocking Comm.:
   ~/MPI/course/F_30/Ch6/ring_allreduce_30.f90 OR ~/MPI/course/F_30/Ch4/ring_30.f90 (with mpi_f08 module)
   ~/MPI/course/F_20/Ch6/ring_allreduce_20.f90 OR ~/MPI/course/F_20/Ch4/ring_20.f90 (with mpi module)
   ~/MPI/course/C/Ch6/ring_allreduce.c OR ~/MPI/course/C/Ch4/ring.c


A: global ranks 0..3 (∑ = 6) and 4..11 (∑ = 60)
B: local ranks 0..3 (∑ = 6) and 0..7 (∑ = 28)

Expected results with 12 processes:
PE world: 0, color=0 sub:0  SumA= 6  SumB= 6
PE world: 1, color=0 sub:1  SumA= 6  SumB= 6
PE world: 2, color=0 sub:2  SumA= 6  SumB= 6
PE world: 3, color=0 sub:3  SumA= 6  SumB= 6
PE world: 4, color=1 sub:0  SumA=60  SumB=28
PE world: 5, color=1 sub:1  SumA=60  SumB=28
PE world: 6, color=1 sub:2  SumA=60  SumB=28
PE world: 7, color=1 sub:3  SumA=60  SumB=28
PE world: 8, color=1 sub:4  SumA=60  SumB=28
PE world: 9, color=1 sub:5  SumA=60  SumB=28
PE world:10, color=1 sub:6  SumA=60  SumB=28
PE world:11, color=1 sub:7  SumA=60  SumB=28


Sub-groups and sub-communicators

•  Two levels:
   –  Group of processes
      •  Without the ability to communicate
      •  Local routines to build groups & sub-sets
      •  Same ranks as in the related communicator
   –  Communicators
      •  Group of processes with the additional ability to communicate

Several ways to establish sub-communicators:
•  MPI_Comm_group, then many MPI_Group_… routines to establish sub-groups
   (caution: scalability problems when handling many processes in each process),
   followed by one of two methods to establish the new sub-communicator(s):
   –  MPI_Comm_create (collective over the original communicator;
      can establish many disjunctive sub-comms, several in one call since MPI-2.2)
   –  MPI_Comm_create_group (collective over the sub-group; new in MPI-3.0)
•  or MPI_Comm_split (same color value within each sub-communicator)
•  & MPI_Comm_split_type (new in MPI-3.0) → course Chapter 11
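A hedged sketch of the MPI-3.0 variant (not shown on the slide): the same groups of 4 as in the following example, but the sub-communicator is created with MPI_Comm_create_group, which is collective only over the processes of the sub-group. That the number of processes is a multiple of 4 is our assumption.

   int my_rank, ranges[1][3];
   MPI_Group world_group, sub_group;
   MPI_Comm newcomm;

   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
   MPI_Comm_group(MPI_COMM_WORLD, &world_group);

   ranges[0][0] = (my_rank/4)*4;        /* first rank of my group of 4 */
   ranges[0][1] = (my_rank/4)*4 + 3;    /* last rank of my group of 4  */
   ranges[0][2] = 1;                    /* stride                      */
   MPI_Group_range_incl(world_group, 1, ranges, &sub_group);

   /* collective only over the members of sub_group (MPI-3.0),
      unlike MPI_Comm_create, which is collective over MPI_COMM_WORLD */
   MPI_Comm_create_group(MPI_COMM_WORLD, sub_group, 0, &newcomm);

   MPI_Group_free(&sub_group);
   MPI_Group_free(&world_group);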


Example: MPI_Group_range_incl() + MPI_Comm_create()

   int my_rank, mycolor, my_newrank, ranges[1][3];
   MPI_Group world_group, sub_group;
   MPI_Comm newcomm;

   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
   /* group of the processes in MPI_COMM_WORLD: */
   MPI_Comm_group(MPI_COMM_WORLD, &world_group);

   mycolor = my_rank/4;
   ranges[0][0] = mycolor*4;           /* first rank of my range */
   ranges[0][1] = mycolor*4 + (4-1);   /* last rank of my range  */
   ranges[0][2] = 1;                   /* stride of ranks        */
   MPI_Group_range_incl(world_group, 1, ranges, &sub_group);

   MPI_Comm_create(MPI_COMM_WORLD, sub_group, &newcomm);
   MPI_Comm_rank(newcomm, &my_newrank);

Always 4 processes get the same color → grouped into their own sub_group → grouped into their own newcomm.

[Figure: as before, MPI_COMM_WORLD ranks 0, 1, 2, …, 15, … are grouped in fours into separate newcomm communicators (mycolor==0: world ranks 0–3, mycolor==1: 4–7, …), each with new ranks 0–3.]


Only one range here; three values per range: [0]: first rank, [1]: last rank, [2]: stride.


Group and sub-group creation is local (non-collective); (sub-)communicator creation is collective.
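As an aside not on the slides, the stride entry also permits non-contiguous selections; a hedged sketch (variable names are ours) that gathers all even world ranks into one sub-communicator:

   int world_size, ranges[1][3];
   MPI_Group world_group, even_group;
   MPI_Comm even_comm;

   MPI_Comm_size(MPI_COMM_WORLD, &world_size);
   MPI_Comm_group(MPI_COMM_WORLD, &world_group);

   ranges[0][0] = 0;              /* first rank                          */
   ranges[0][1] = world_size-1;   /* last rank (need not be hit exactly) */
   ranges[0][2] = 2;              /* stride: every second rank           */
   MPI_Group_range_incl(world_group, 1, ranges, &even_group);

   /* collective over MPI_COMM_WORLD; processes outside even_group
      (the odd ranks) get even_comm == MPI_COMM_NULL */
   MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm);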


Chapter 8-(1): Split into two rings & inter-communicator

   MPI_Comm_size(MPI_COMM_WORLD, &world_size);
   MPI_Comm_rank(MPI_COMM_WORLD, &my_world_rank);

   /* This definition of mycolor implies that the first color is 0
      → see the calculation of remote_leader on the next slide */
   mycolor = (my_world_rank > (world_size-1)/3);
   MPI_Comm_split(MPI_COMM_WORLD, mycolor, 0, &sub_comm);
   MPI_Comm_size(sub_comm, &sub_size);
   MPI_Comm_rank(sub_comm, &my_sub_rank);

   right = (my_sub_rank+1) % sub_size;
   left  = (my_sub_rank-1+sub_size) % sub_size;

   sumA = 0;
   snd_buf = my_world_rank;
   for (i = 0; i < sub_size; i++)
   {
      MPI_Issend(&snd_buf, 1, MPI_INT, right, to_right, sub_comm, &request);
      MPI_Recv(&rcv_buf, 1, MPI_INT, left, to_right, sub_comm, &status);
      MPI_Wait(&request, &status);
      snd_buf = rcv_buf;
      sumA += rcv_buf;
   }

   sumB = 0;
   snd_buf = my_sub_rank;
   …   /* → ring with my_sub_rank → result in sumB */
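The elided loop is identical to the sumA loop, only starting from my_sub_rank; a sketch of the omitted part, assuming the same variables:

   for (i = 0; i < sub_size; i++)
   {
      MPI_Issend(&snd_buf, 1, MPI_INT, right, to_right, sub_comm, &request);
      MPI_Recv(&rcv_buf, 1, MPI_INT, left, to_right, sub_comm, &status);
      MPI_Wait(&request, &status);
      snd_buf = rcv_buf;
      sumB += rcv_buf;   /* now summing the ranks within sub_comm */
   }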


MPI/course/C/Ch8/ring_tworings.c

Fortran: in principle, no difference to C.
