Hybrid Computing and Programming

+ Hybrid Computing @ KAUST
Many Cores and OpenACC

Alain Clo - KAUST Research Computing
Saber Feki - KAUST Supercomputing Lab
Florent Lebeau - CAPS


+ Agenda – Hybrid Computing

•  Hybrid Computing
   •  From Multi-Physics to Multi-Computing - The Needs
   •  Ecosystem – SW – HW
   •  Trends and Convergence - Market
•  Many Cores and OpenACC
•  Economics
•  KAUST Examples
   •  Acoustics
   •  Electromagnetics
•  From Academia to Industry – The Opportunity@KAUST
•  OpenACC Training on Jan 30th, B9-R2220, 9:30am

+ From Multi Physics – Multi Scales – to Hybrid-Computing

MultiPhysics

•  Fracture Simulation
•  Reservoir Modelling
•  Aerosol

Maths and Discretization

•  Partial Differential Equations
•  Volume Integral Equations

Programming Models

•  OpenACC – fine grain
•  OpenMP – coarse grain
•  MPI – large grain

Multi Computing

•  CPUs
•  GPGPU Accelerators
•  FPGA

Fracture simulations are compute intensive. Accelerators can absorb the peak demand; can OpenACC help to exploit them?

+ Hybrid Computing Platforms

•  Hybrid = Heterogeneous
•  Hybrid computing platforms are made of CPUs plus GPUs, accelerators, or FPGAs
•  Examples of vendors:
   •  NVIDIA: GPUs
   •  AMD: Accelerators and GPUs
   •  Intel: Xeon Phi accelerators
   •  FPGA: Convey, Maxeler, SRC
      •  http://www.conveycomputer.com/
      •  http://www.maxeler.com/
      •  http://www.srccomp.com/

+ Hybrid Computing - CPU+GPU – Many Cores

NVIDIA GPU   CUDA capability   DP Gflops   Cores   MHz    GB    SMP
S1070        1.3               345         192     1200   4     30
C2075        2.0               515         448     1600   5.3   14
K20X         3.5               1310        2688    730    6.1   14

+ Programming GPU Environment

•  CUDA
   •  CUDA 5 drivers
   •  CUDA SDK, CUDA compilers, debuggers, profilers
   •  CUDA Toolkit: samples
•  Libraries: cuFFT, cuBLAS, cuSPARSE
•  Applications (catalog of ~300 CUDA/GPU-enabled applications)
   •  Molecular Dynamics: Amber, Gromacs, Lammps, Namd, Vasp
   •  Computational Chemistry: NWChem
   •  Computational Structural Mechanics: Abaqus, Ansys
   •  Geophysics: CGG Veritas, Paradigm Echos, Schlumberger WesternGeco
   •  Maths: Matlab, Mathematica, Maple

http://www.nvidia.com/docs/IO/123576/nv-applications-catalog-lowres.pdf

+ Programming Environments Evolution - OpenACC

•  CUDA 2006 – OpenCL 2008 - GPGPU
•  OpenACC in 2011
   •  CAPS
   •  PGI
   •  Cray
•  Advantages of OpenACC
   •  Preserves legacy code
   •  Incremental optimization and porting to the GPU/accelerator
   •  Very simple to implement
   •  Looks like OpenMP
   •  Exploits a broad range of optimization opportunities (fine and coarse grain)
   •  http://www.openacc-standard.org/
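
To make the "looks like OpenMP" and "incremental porting" points concrete, here is a minimal sketch rather than code from the talk. The function names `saxpy_omp` and `saxpy_acc` are hypothetical, and the data clauses are one reasonable choice among several. The loop body is untouched legacy code; only the directive line differs between the two versions.

```c
#include <assert.h>
#include <stddef.h>

/* Legacy loop parallelized across CPU threads with OpenMP.
   (Hypothetical helper name, for illustration only.) */
void saxpy_omp(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Same loop offloaded with OpenACC. The data clauses state that x is
   only copied to the accelerator, while y is copied in and back out. */
void saxpy_acc(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A compiler without OpenMP or OpenACC support ignores the directives and runs both functions serially with the same result, which is what makes the porting incremental. With the PGI compiler, OpenACC is enabled with `pgcc -acc`.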

+ Type of Parallelism - Technology granularity

[Diagram: types of parallelism arranged by granularity]

•  Granularity levels: Fine Grain, Coarse Grain, Large Grain
•  Forms of parallelism shown: domain decomposition (example), message passing, task parallelism, data stream parallelism, instruction level parallelism, SIMD instructions (SSE, ...)
•  Characteristics: dynamic, load balancing oriented; data locality oriented
•  Scope labels: target accelerators / many-cores; compilers' target; application programmers' level
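
Domain decomposition is the example named on this slide. Assuming it sits at the large-grain, message-passing level (where the talk places MPI), the sketch below shows one common way to split a 1D index range into contiguous blocks, one per rank. The type `block_range` and the function `owned_block` are hypothetical names; no MPI calls are made, so the snippet stays self-contained.

```c
#include <assert.h>

/* Half-open index range [lo, hi) owned by one rank. */
typedef struct {
    int lo;
    int hi;
} block_range;

/* Split n cells over nranks ranks as evenly as possible:
   the first (n % nranks) ranks receive one extra cell. */
block_range owned_block(int n, int nranks, int rank)
{
    assert(n >= 0 && nranks > 0 && rank >= 0 && rank < nranks);
    int base = n / nranks;
    int rem = n % nranks;
    block_range r;
    r.lo = rank * base + (rank < rem ? rank : rem);
    r.hi = r.lo + base + (rank < rem ? 1 : 0);
    return r;
}
```

In a real MPI code, `rank` and `nranks` would come from `MPI_Comm_rank` and `MPI_Comm_size`. Each rank could then apply coarse-grain (OpenMP) or fine-grain (OpenACC) parallelism inside its own block.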

+ Programming Models OpenMP - OpenACC

Memory Model

•  OpenMP: Coherent; shared variables, private variables
•  CUDA: Global memory not coherent; shared local memory coherent inside blocks
•  OpenACC: Global memory not coherent; shared local memory coherent inside blocks

Parallel Constructs

•  OpenMP: SIMD - loops; SPMD - regions; MIMD - tasks
•  CUDA: Kernel with hierarchy of Grid, Block/Warp, Thread
•  OpenACC: Kernel or Parallel; hierarchy of Gang, Worker, Vector
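
The Gang / Worker / Vector hierarchy can be illustrated with a small matrix-vector product. This is a sketch under stated assumptions, not code from the talk: the function name `matvec` is hypothetical, the matrix is assumed row-major, and the worker level is left to the compiler. Mapping gangs to CUDA blocks and vector lanes to CUDA threads is a common choice, but it is implementation dependent.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x for a row-major n-by-m matrix A. */
void matvec(int n, int m, const float *A, const float *x, float *y)
{
    assert(n >= 0 && m >= 0 && A != NULL && x != NULL && y != NULL);
    /* Outer loop: rows are distributed across gangs. */
    #pragma acc parallel loop gang copyin(A[0:n*m], x[0:m]) copyout(y[0:n])
    for (int i = 0; i < n; ++i) {
        float sum = 0.0f;
        /* Inner loop: one row is spread over vector lanes and the
           partial sums are combined by the reduction clause. */
        #pragma acc loop vector reduction(+:sum)
        for (int j = 0; j < m; ++j) {
            sum += A[i * m + j] * x[j];
        }
        y[i] = sum;
    }
}
```

Because global memory is not coherent between host and accelerator, the `copyin` and `copyout` clauses make the transfers explicit. In OpenMP on a shared-memory CPU the same arrays would simply be shared variables.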

+ Trends and Convergence

+ Hardware and Programming Convergence

•  Many Cores Adoption
   •  Intel: Sandy Bridge, MIC
   •  AMD/ATI: Radeon, Fusion
   •  NVIDIA: Kepler, Maxwell
•  OpenACC
   •  Cray
   •  PGI
   •  NVIDIA
   •  CAPS

+ Market is growing

•  The global HPC economy is growing again (IDC 2011)
   •  2010 grew by 10%, to reach $9.5 billion
   •  Forecasting ~7% growth over the next 5 years
   •  30% of all HPC sites use accelerators, mostly GPGPUs (IDC)
•  Top500 list – Nov 2012
   •  #1 Titan@ORNL: 18 PetaFlops system with 261000 K20 cores
   •  3 of the first 10 are hybrid computers using accelerators, either Intel Xeon Phi or NVIDIA
•  Accelerators are being adopted by major mainstream vendors
•  Accelerators are part of the ExaScale race

+ Hybrid Computing @ KAUST 0.5 Petaflops on GPGPU

•  KAUST awarded CUDA Research Center
•  GPGPU Computing at KAUST > 0.5 Pflops
   •  Laptops ~ 100 Tflops
   •  Desktops ~ 300 Tflops
   •  Few Intel Xeon Phi MICs
   •  Extreme Computing: 50 Tflops
   •  Noor: 64 Tesla C1060, on 24 Fermi and 64 Kepler TBD > 100 Tflops
•  NEW OpenACC Compilers
   •  CAPS
   •  PGI
•  GPU Applications and Libraries
   •  Matlab, Maple, Mathematica, Abaqus, Ansys
   •  MAGMA, Fast Multipole Method (FMM)
•  Competences at KAUST KSL and Research Computing

+ Thank You

+ Hybrid Computing @ KAUST 0.65 Petaflops on GPGPU

•  KAUST awarded CUDA Research Center
•  GPGPU Computing at KAUST > 0.65 Pflops
   •  Laptops ~ 100 Tflops
   •  Desktops ~ 400 Tflops
   •  Few Intel Xeon Phi MICs
   •  Extreme Computing: 50 Tflops
   •  Noor: 32 Tesla C1060, on 24 Fermi and 64 Kepler TBD > 100 Tflops
•  NEW OpenACC Compilers
   •  CAPS
   •  PGI
•  GPU Applications and Libraries
   •  Matlab, Maple, Mathematica, Abaqus, Ansys
   •  MAGMA, Fast Multipole Method (FMM)
•  Competences at KAUST KSL and Research Computing

+ Academia to Industry

•  Hybrid Computing is a big opportunity for KAUST
   •  KAUST has the Critical Mass to create value in Research and Industry
   •  Develop New Algorithms
   •  Develop New Libraries and Applications
•  Develop New Knowledge, New Competences
•  Create Business through Economic Development
•  CAPS is a good example of transfer from Academia to Industry

+ OpenACC Training on Jan 30th in Building 9, R2220, 9:30am

•  Introduction to GPU computing
•  CUDA architecture and programming model
•  OpenACC Overview & compilers
•  OpenACC Programming Model
•  Managing data with OpenACC
•  OpenACC loop constructs
•  Asynchronism with OpenACC
•  OpenACC runtime API
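
As a preview of the last four training topics, the sketch below combines a data region, loop constructs, an asynchronous queue, and one runtime API call. It is an illustrative example under stated assumptions, not training material: the function names `accelerator_count` and `smooth` and the 3-point averaging stencil are hypothetical. The runtime call is guarded by `_OPENACC` so the code also builds with a compiler that has no OpenACC support.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Runtime API: number of attached accelerators,
   or 0 when compiled without OpenACC. */
int accelerator_count(void)
{
#ifdef _OPENACC
    return acc_get_num_devices(acc_device_not_host);
#else
    return 0;
#endif
}

/* Apply `steps` sweeps of a 3-point average to the interior of u.
   tmp is scratch storage of the same length as u. */
void smooth(int n, int steps, float *u, float *tmp)
{
    assert(n >= 0 && steps >= 0 && u != NULL && tmp != NULL);
    /* Data region: u is copied to the device once and back once;
       tmp only exists on the device for the duration of the region. */
    #pragma acc data copy(u[0:n]) create(tmp[0:n])
    {
        for (int s = 0; s < steps; ++s) {
            /* Both loops go to async queue 1, so the host does not
               block, and they still execute in the order enqueued. */
            #pragma acc parallel loop async(1) present(u[0:n], tmp[0:n])
            for (int i = 1; i < n - 1; ++i) {
                tmp[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0f;
            }
            #pragma acc parallel loop async(1) present(u[0:n], tmp[0:n])
            for (int i = 1; i < n - 1; ++i) {
                u[i] = tmp[i];
            }
        }
        /* Wait for queue 1 before the region ends and u is copied back. */
        #pragma acc wait(1)
    }
}
```

Keeping both arrays resident across all sweeps avoids a host-device transfer per loop, which is usually the first optimization after a straightforward port.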