Post on 06-Apr-2023
+ Hybrid Computing @ KAUST: Many Cores and OpenACC
Alain Clo - KAUST Research Computing
Saber Feki - KAUST Supercomputing Lab
Florent Lebeau - CAPS
+ Agenda – Hybrid Computing
• Hybrid Computing
  - From Multi-Physics to Multi-Computing - The Needs
  - Ecosystem - SW - HW
  - Trends and Convergence - Market
• Many Cores and OpenACC
• Economics
• KAUST Examples
  - Acoustics
  - Electromagnetics
• From Academia to Industry - The Opportunity @ KAUST
• OpenACC Training on Jan 30th, B9-R2220, 9:30am
+ From Multi Physics – Multi Scales – to Hybrid-Computing
MultiPhysics
• Fracture Simulation
• Reservoir Modelling
• Aerosol
Maths and Discretization
• Partial Differential Equations
• Volume Integral Equations
Programming Models
• OpenACC - fine grain
• OpenMP - coarse grain
• MPI - large grain
Multi Computing
• CPUs
• GPGPU Accelerators
• FPGA
Fracture simulations are compute intensive. Accelerators can absorb the peak demand; can OpenACC help exploit them?
+ Hybrid Computing Platforms
• Hybrid = Heterogeneous
• Hybrid computing platforms are made of CPUs plus GPUs, accelerators, or FPGAs
• Examples of vendors:
  - NVidia: GPUs
  - AMD: Accelerators and GPUs
  - Intel: Xeon Phi accelerators
  - FPGA: Convey, Maxeler, SRC
    http://www.conveycomputer.com/
    http://www.maxeler.com/
    http://www.srccomp.com/
+ Hybrid Computing - CPU+GPU – Many Cores
NVidia   CUDA   DP Gflops   Cores   MHz    GB    SMP
S1070    1.3    345         192     1200   4     30
C2075    2.0    515         448     1600   5.3   14
K20X     3.5    1310        2688    730    6.1   14
+ Programming GPU Environment
• CUDA
  - CUDA 5 drivers
  - CUDA SDK, CUDA compilers, debuggers, profilers
  - CUDA Toolkit: samples
  - Libraries: cuFFT, cuBLAS, cuSPARSE
• Applications (catalog of ~300 CUDA/GPU-enabled applications)
  - Molecular Dynamics: Amber, Gromacs, LAMMPS, NAMD, VASP
  - Computational Chemistry: NWChem
  - Computational Structural Mechanics: Abaqus, Ansys
  - Geophysics: CGGVeritas, Paradigm Echos, Schlumberger WesternGeco
  - Maths: Matlab, Mathematica, Maple
http://www.nvidia.com/docs/IO/123576/nv-applications-catalog-lowres.pdf
+ Programming Environments Evolution - OpenACC
• CUDA 2006 - OpenCL 2008 - GPGPU
• OpenACC in 2011
  - CAPS
  - PGI
  - Cray
• Advantages of OpenACC
  - Preserves legacy code
  - Incremental optimization and porting to the GPU/accelerator
  - Very simple to implement
  - Looks like OpenMP
  - Exploits broad optimization opportunities (fine and coarse grain)
• http://www.openacc-standard.org/
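To illustrate the "looks like OpenMP" and "preserves legacy code" points, here is a minimal sketch of a serial loop ported with one directive. The function name `saxpy` and its parameters are illustrative, not taken from the slides. A compiler without OpenACC support ignores the pragma and runs the loop serially.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative example (not from the slides): y[i] = a * x[i] + y[i].
 * The loop body is the unchanged serial code; only the pragma is added. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* copyin: x is only read on the device.
     * copy:   y is read and written, so it is transferred both ways. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Porting can then proceed incrementally, one loop at a time, while the CPU build keeps working.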
+Type of Parallelism - Technology granularity
Fine Grain
Coarse Grain
Large Grain
Ex. Domain decomposition
Message Passing
Task parallelism
Data stream parallelism
Instruction level parallelism
SIMD Instructions (SSE,..)
Dynamic, load balancing oriented
Data locality oriented
Target accelerators / many-cores
Compilers’ target
Application programmers’ level
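The granularity levels listed above can coexist in one code. The sketch below assumes a 1-D array split into contiguous blocks (large grain, as in domain decomposition), with each block updated by a fine-grain loop offloaded through OpenACC. The names `block_range` and `scale_block` are hypothetical, and `rank` stands in for what an MPI rank would be in a real message-passing code.

```c
#include <assert.h>

/* Large grain: assign part `rank` (0-based) of `nparts` a contiguous
 * block [*lo, *hi) out of n items. The first n % nparts parts receive
 * one extra item so the load stays balanced. */
void block_range(int n, int nparts, int rank, int *lo, int *hi)
{
    assert(n >= 0 && nparts > 0 && rank >= 0 && rank < nparts);
    int base = n / nparts;
    int rem  = n % nparts;
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}

/* Fine grain: multiply the local block u[lo..hi) by s. The iterations
 * are independent, so the loop can be spread over accelerator threads.
 * Without OpenACC support the pragma is ignored and the loop is serial. */
void scale_block(double *u, int lo, int hi, double s)
{
    assert(u != 0 && lo <= hi);

    #pragma acc parallel loop copy(u[lo:hi-lo])
    for (int i = lo; i < hi; ++i) {
        u[i] *= s;
    }
}
```

In a real hybrid code, each MPI process would call `block_range` with its own rank and exchange boundary data with its neighbours. That part is omitted here.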
+Programming Models OpenMP - OpenACC
Memory Model
• OpenMP: coherent; shared variables, private variables
• CUDA: global memory not coherent; shared local memory coherent inside blocks
• OpenACC: global memory not coherent; shared local memory coherent inside blocks
Parallel Constructs
• OpenMP: SIMD loops, SPMD regions, MIMD tasks
• CUDA: kernel with a hierarchy of grid, block/warp, thread
• OpenACC: kernel or parallel construct with a hierarchy of gang, worker, vector
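The gang/worker/vector hierarchy can be sketched on a matrix-vector product. The example assumes a row-major matrix, and the function name `matvec` is illustrative. Rows are distributed across gangs, and the dot product inside each row uses vector parallelism with a reduction. On NVIDIA GPUs, implementations typically map gangs to thread blocks and vector lanes to threads, but the exact mapping is up to the compiler.

```c
#include <assert.h>

/* Illustrative example (not from the slides): y = A * x, with A stored
 * row-major as an nrows x ncols array. */
void matvec(int nrows, int ncols, const double *A, const double *x, double *y)
{
    assert(nrows > 0 && ncols > 0);

    /* Outer loop: one row per gang iteration (coarse level). */
    #pragma acc parallel loop gang copyin(A[0:nrows*ncols], x[0:ncols]) copyout(y[0:nrows])
    for (int i = 0; i < nrows; ++i) {
        double sum = 0.0;

        /* Inner loop: vector lanes cooperate on one dot product,
         * combining their partial sums through the reduction clause. */
        #pragma acc loop vector reduction(+:sum)
        for (int j = 0; j < ncols; ++j) {
            sum += A[i * ncols + j] * x[j];
        }
        y[i] = sum;
    }
}
```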
+Hardware and Programming Convergence
• Many Cores Adoption
  - Intel: Sandy Bridge, MIC
  - AMD/ATI: Radeon, Fusion
  - Nvidia: Kepler, Maxwell
• OpenACC
  - Cray
  - PGI
  - NVidia
  - CAPS
+ Market is growing
• The global HPC economy is growing again (IDC 2011)
  - 2010 grew by 10%, to reach $9.5 billion
  - Forecast of ~7% growth over the next 5 years
• 30% of all HPC sites use accelerators, mostly GPGPUs (IDC)
• Top500 list - Nov 2012
  - #1 Titan@ORNL: 18 PetaFlops system with 261000 K20 cores
  - 3 of the top 10 are hybrid computers using accelerators, either Intel Xeon Phi or Nvidia
• Accelerators are being adopted by major mainstream vendors
• Accelerators are part of the ExaScale race
+ Hybrid Computing @ KAUST: 0.5 Petaflops on GPGPU
• KAUST awarded CUDA Research Center
• GPGPU Computing at KAUST > 0.5 Pflops
  - Laptops ~ 100 Tflops
  - Desktops ~ 300 Tflops
  - Few Intel Xeon Phi MICs
  - Extreme Computing: 50 Tflops
  - Noor: 64 Tesla C1060, on 24 Fermi and 64 Kepler TBD > 100 Tflops
• NEW OpenACC Compilers
  - CAPS
  - PGI
• GPU Applications and Libraries
  - Matlab, Maple, Mathematica, Abaqus, Ansys
  - MAGMA, Fast Multipole Method (FMM)
• Competences at KAUST: KSL and Research Computing
+ Hybrid Computing @ KAUST: 0.65 Petaflops on GPGPU
• KAUST awarded CUDA Research Center
• GPGPU Computing at KAUST > 0.65 Pflops
  - Laptops ~ 100 Tflops
  - Desktops ~ 400 Tflops
  - Few Intel Xeon Phi MICs
  - Extreme Computing: 50 Tflops
  - Noor: 32 Tesla C1060, on 24 Fermi and 64 Kepler TBD > 100 Tflops
• NEW OpenACC Compilers
  - CAPS
  - PGI
• GPU Applications and Libraries
  - Matlab, Maple, Mathematica, Abaqus, Ansys
  - MAGMA, Fast Multipole Method (FMM)
• Competences at KAUST: KSL and Research Computing
+Academia to Industry
• Hybrid Computing is a big opportunity for KAUST
  - KAUST has the critical mass to create value in research and industry
  - Develop new algorithms
  - Develop new libraries and applications
  - Develop new knowledge and new competences
  - Create business through economic development
• CAPS is a good example of transfer from academia to industry