DEPEI QIAN
Qian Depei, Professor at Sun Yat-sen University and Beihang University, Dean
of the School of Data and Computer Science of Sun Yat-sen University.
Since 1996 he has been a member of the expert group and expert
committee of the National High-tech Research & Development Program (the
863 Program) in information technology. Since 2002 he has been the chief
scientist of three 863 key projects on high performance computing. Currently,
he is the chief scientist of the 863 key project on high productivity computer
and application service environment.
His current research interests include high performance computer architecture
and implementation technologies, distributed computing, network
management, and network performance measurement. He has published over
300 papers in journals and conferences.
HPC Development in China: A Brief Review and Prospect
Depei Qian Beihang University/Sun Yat-sen University
ASC 2017, Wuxi, China, April 27, 2017
Outline
• A brief review
• Issues in exa-scale system development
• Prospect of HPC in China’s 13th Five-Year Plan
A Brief review
Three 863 key projects on HPC
• 2002-2005: High Performance Computer and Core Software
– Research on resource sharing and collaborative work
– Grid-enabled applications in multiple areas
– TFlops computers and the China National Grid (CNGrid) testbed
• 2006-2010: High Productivity Computer and Grid Service Environment
– High productivity
• Application performance
• Efficiency in program development
• Portability of programs
• Robustness of the system
– Emphasizing service features of the HPC environment
– Developing peta-scale computers
• 2010-2016: High Productivity Computer and Application Service Environment
– Developing 100PF computers
– Developing large-scale HPC applications
– Upgrading of CNGrid
High Performance Computers (1996-2016)
• 1996
– China: Dawning 1000, 2.5GF
– US: 1 TF
• 2016
– China: Sunway TaihuLight, 125PF, a 50-million-fold increase in 20 years
• Milestones
– 2001: Dawning 3000, 432GF, cluster
– 2003: DeepComp 6800, 5.32TF, cluster
– 2004: Dawning 4000, 11.2TF, cluster
– 2008: DeepComp 7000, 150TF, heterogeneous cluster; Dawning 5000A, 230TF, cluster
– 2010: TH-1A, 4.7PF, heterogeneous accelerated architecture; Dawning 6000, 3PF, heterogeneous accelerated architecture
– 2011: Sunway Bluelight, multicore-based, 1PF, home-grown processors
[Photos: Dawning 3000, DeepComp 6800, Dawning 4000, DeepComp 7000, Dawning 5000A, Dawning 6000, TH-1A, Sunway Bluelight]
High Performance Computers (1996-2016)
• 2013: Tianhe-2
– CPU+MIC heterogeneous accelerated architecture
– 54.9 PF peak, 33.9 PF Linpack; No. 1 in Top500 six times, from 2013 to 2015
– Installed at the National Supercomputing Center in Guangzhou
– Will be upgraded to 100PF this year
• 2016: Sunway TaihuLight
– Implemented with home-grown Shenwei many-core processors, 10 million cores in total
– 125 PF peak, 93 PF Linpack; No. 1 in Top500 in June and Nov. 2016
– Installed at the National Supercomputing Center in Wuxi
[Photos: Sunway Bluelight, Tianhe-2]
High Performance Computing Environment (1996-2016)
• 1996
– One national HPC center, in Hefei, equipped with Dawning-I, 640 MIPS
– US computing infrastructure: PACI supported by NSF and DoE centers; Grid technology emerging
• 2016
– China National Grid, composed of 17 national supercomputing centers and HPC centers, with world-leading computing resources
High Performance Computing Applications (1996-2016)
• 1996
– Limited HPC applications, in weather forecasting and oil exploration
– 16-32-way parallelism
– Reliance on imported application software
• 2016
– HPC applications in many domains
– 10-million-core parallelism reached; Gordon Bell Prize in 2016
– A number of application software packages developed and adopted in production systems:
• aircraft design
• high-speed train design
• oil & gas exploration
• new drug discovery
• ensemble weather forecasting
• bio-information
• car development
• design optimization of large fluid machinery
• electromagnetic computation
• …
Important experiences
• Coordinated effort of the national research programs and the regional development plans
– Providing matching funding for the development of leading-class HPC systems
– Joint effort by the MOST and local governments in establishing the national supercomputing centers
• Multi-lateral cooperation
– Supercomputing centers play an important role
• in determining the metrics of the HPC systems developed
• in selecting the team to develop the system
– Enterprises participate in the national R&D program
• Inspur, Sugon, and Lenovo were involved in HPC system development
• promoting development while improving the technical capability and competitiveness of the companies
– Application organizations lead the development of the application software
• Balanced and coordinated development of HPC machines, the HPC environment, and HPC applications
Problems identified
• Lack of a long-term national program for high performance computing
• Weakness in kernel HPC technologies
– processor/accelerator
– novel devices (new memory, storage, and network)
– large-scale parallel algorithms and program implementation
• Application software is the bottleneck
– applications rely on imported commercial software
• expensive
• small-scale parallelism
• restricted by export regulations
• Shortage of cross-disciplinary talent
– Not enough people with both domain and IT knowledge
• Lack of multi-disciplinary collaboration
Issues in exa-scale system development
Major challenges to exa-scale systems
• Power consumption
• Performance obtained by applications
• Programmability
• Resilience
• How to make tradeoffs between performance, power consumption, and programmability?
• How to achieve continuous non-stop operation?
• How to adapt to a wide range of applications with reasonable efficiency?
Architecture
• Novel architectures beyond the current heterogeneous accelerated/manycore-based designs are expected
• Co-processor or partitioned heterogeneous architecture?
– Low utilization of the co-processor in some applications, which use the CPU only
– Bottleneck in moving data between the CPU and the co-processor
• Application-aware architecture
– on-chip integration of special-purpose units (an idea from Prof. Andrew Chien)
– using the right tool to do the right things
– dynamically reconfigurable, but how to program it?
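The "right tool for the right things" idea can be sketched as a simple kernel-to-unit dispatcher. This is a hypothetical illustration, not any real architecture: the unit names, kernel classes, and `dispatch` function are all invented for the sketch.

```python
# Hypothetical sketch of application-aware dispatch: each kernel class
# is routed to the on-chip unit assumed to suit it best. All names here
# are illustrative, not from the talk or any real chip.
UNIT_FOR_KERNEL = {
    "dense_linear_algebra": "matrix_unit",  # e.g. a tensor/matrix engine
    "graph_traversal": "sparse_unit",       # irregular-access accelerator
    "control_heavy": "cpu_core",            # branchy code stays on the CPU
}

def dispatch(kernel_class):
    """Pick an execution unit; fall back to a general-purpose core."""
    return UNIT_FOR_KERNEL.get(kernel_class, "cpu_core")

print(dispatch("dense_linear_algebra"))  # matrix_unit
print(dispatch("io_bound"))              # cpu_core (no specialized unit)
```

The open question raised above remains: once the units are reconfigurable, the mapping itself must be programmable rather than a fixed table.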
Memory system
• Pursuing large capacity, low latency, and high bandwidth
• Increase capacity and lower power consumption by using DRAM and NVM together
– Data placement issue
– Handling the high write cost and limited write endurance of NVM
• Improve bandwidth and latency by using 3D stacking technology
• Reduce data movement by placing data closer to processing
– HBM/HMC near the processor
– On-chip DRAM
– Simple functions in memory
• Reduce data copy cost by using a unified memory space in heterogeneous architectures
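The DRAM/NVM data-placement issue can be illustrated with a toy heuristic: keep write-hot pages in the small DRAM (fast writes, no wear) and push read-mostly pages to the large NVM. A minimal sketch under invented names and thresholds (`place_pages`, `write_hot_threshold`); a real policy would act on hardware access counters and migrate pages dynamically.

```python
# Toy write-aware page placement for a hybrid DRAM/NVM memory.
# Write-hot pages go to DRAM (absorbing the costly, wear-inducing
# NVM writes); read-mostly pages go to NVM for cheap capacity.

def place_pages(pages, dram_capacity, write_hot_threshold=0.5):
    """pages: list of (page_id, reads, writes). Returns (dram, nvm) lists."""
    # Rank pages by write intensity so the most write-hot claim DRAM first.
    by_write_rate = sorted(
        pages,
        key=lambda p: p[2] / max(1, p[1] + p[2]),
        reverse=True,
    )
    dram, nvm = [], []
    for page_id, reads, writes in by_write_rate:
        write_rate = writes / max(1, reads + writes)
        if len(dram) < dram_capacity and write_rate >= write_hot_threshold:
            dram.append(page_id)   # absorb writes in DRAM
        else:
            nvm.append(page_id)    # read-mostly data tolerates NVM
    return dram, nvm

# Illustrative access profile: (page, reads, writes), DRAM holds 2 pages.
pages = [("a", 90, 10), ("b", 10, 90), ("c", 50, 50), ("d", 5, 95)]
dram, nvm = place_pages(pages, dram_capacity=2)
print(dram, nvm)  # ['d', 'b'] ['c', 'a']
```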
Interconnect
• Pursuing low latency, high bandwidth, and low energy consumption
• Adopting new technologies
– Silicon photonic communication between components
– Optical interconnect / communication
– Miniature optical devices
• High scalability to meet exa-scale interconnect requirements
– Connecting 10,000+ nodes
– Low-hop, low-latency topologies
– Reliable and intelligent routing
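To see why a low-hop topology matters at this scale, compare the worst-case hop counts (diameters) of two classic topologies for a 32K-node machine. A back-of-the-envelope sketch: the formulas are the standard diameter expressions, and the node count is an illustrative assumption.

```python
import math

def hypercube_diameter(n_nodes):
    # A hypercube with 2^d nodes has diameter d hops.
    return math.ceil(math.log2(n_nodes))

def torus3d_diameter(n_nodes):
    # A side x side x side 3D torus has diameter 3 * (side // 2) hops.
    side = round(n_nodes ** (1 / 3))
    return 3 * (side // 2)

n = 32768  # illustrative node count in the 10,000+ range
print(hypercube_diameter(n), torus3d_diameter(n))  # 15 48
```

The gap widens with scale, which is why exa-scale interconnect designs pursue low-diameter topologies even at higher per-node link cost.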
Programming heterogeneous systems
• Addressing the issues in programming heterogeneous parallel systems
– efficient expression of parallelism, dependence, data sharing, and execution semantics
– problem decomposition appropriate for heterogeneous systems
• Improving programming by means of a holistic approach
– New programming models
– Programming language extensions and compilers
– Parallel debugging
– Runtime support and optimization
– Architectural support
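One concrete face of the decomposition problem is deciding how much work each device gets. Below is a minimal static split proportional to per-device throughput; the function name and throughput figures are invented for illustration, and real runtimes rebalance such splits dynamically.

```python
# Hypothetical static work split between a CPU and an accelerator,
# proportional to their assumed sustained throughput (GFlops).
def split_work(total_items, cpu_gflops, acc_gflops):
    acc_share = acc_gflops / (cpu_gflops + acc_gflops)
    acc_items = round(total_items * acc_share)
    return total_items - acc_items, acc_items  # (cpu_items, acc_items)

# Illustrative: a 500 GFlops CPU paired with a 4500 GFlops accelerator.
print(split_work(1_000_000, 500, 4500))  # (100000, 900000)
```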
Computational models and algorithms
• Full-chain innovation
– mathematical methods
– computer algorithms
– algorithm implementation and optimization
• A good mathematical method often improves performance more than hardware improvements or algorithm optimization
• Architecture-aware algorithm implementation and optimization is necessary for heterogeneous systems
• Domain-specific libraries for improving software productivity and performance
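The claim that a better mathematical method beats hardware and tuning can be made concrete with a textbook case: the FFT versus a direct DFT. The sketch below counts only leading-order arithmetic operations, with constants dropped, so the ratio is an order-of-magnitude illustration.

```python
import math

# Leading-order operation counts: a direct DFT costs O(n^2), an FFT
# costs O(n log n). Constants are dropped; this is only a sketch.
def dft_ops(n):
    return n * n

def fft_ops(n):
    return n * math.log2(n)

n = 2 ** 20  # a roughly one-million-point transform
speedup = dft_ops(n) / fft_ops(n)
print(round(speedup))  # 52429
```

A four-orders-of-magnitude method-level gain dwarfs anything achievable by constant-factor tuning or a hardware generation.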
Resilience
• Resilience is one of the key issues for exa-scale systems
– Large system scale
• 50K to 100K nodes
• Huge number of components
– Very short MTBF
– Long non-stop operation required for solving large-scale problems
• Reliability measures at different levels, including the device, node, and system levels
• Software/hardware coordination
– fast context saving and recovery for checkpointing in case of short MTBF
– fault tolerance at the algorithm and application software levels
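The short-MTBF arithmetic can be sketched: assuming independent node failures, system MTBF is roughly the per-node MTBF divided by the node count, and a classic first-order rule (Young's approximation) then gives a checkpoint interval. The per-node MTBF and checkpoint cost below are illustrative assumptions, not figures from the talk.

```python
import math

def system_mtbf_hours(node_mtbf_hours, n_nodes):
    # Assuming independent node failures, system MTBF shrinks as 1/N.
    return node_mtbf_hours / n_nodes

def optimal_checkpoint_interval_s(checkpoint_cost_s, mtbf_s):
    # Young's first-order approximation: T_opt = sqrt(2 * C * MTBF).
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)

# Illustrative: 100K nodes, each with an assumed 50,000-hour MTBF.
mtbf_h = system_mtbf_hours(50_000, 100_000)   # 0.5 hours for the system
interval = optimal_checkpoint_interval_s(60, mtbf_h * 3600)
print(mtbf_h, round(interval))  # 0.5 465
```

A half-hour system MTBF forcing a checkpoint every ~8 minutes is exactly why fast context saving and algorithm-level fault tolerance are called for above.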
Importance of the tools
• Development and optimization of large-scale parallel software require scalable and efficient tools
• Tools are particularly important for systems implemented with home-grown processors
– current commercial and research tools do not support them
• Three kinds of tools are required by default
– Parallel debugger for correctness
– Performance tuner for performance
– Energy optimizer for energy efficiency
Urgent need for an eco-system
• An eco-system for exa-scale systems based on home-grown processors is urgently needed
– languages, compilers, OS, runtime
– tools
– application development support
– application software
• Need to attract hardware manufacturers and third-party software developers
– a product family instead of a single machine
• Collaboration between industry, academia, and end-users required
Prospect of HPC in China’s 13th Five-Year Plan
Reform of research system in China
• The national research and development system is being reformed
– Merging 100+ different national R&D programs/initiatives into 5 tracks of national programs
• Basic research program (NSFC)
• Mega-science and technology programs
• Key R&D program (former 863, 973, and enabling programs)
• Enterprise innovation program
• Facility/talent program
A new key project on HPC
• High performance computing has been identified as a priority subject under the key R&D program
• Strategic studies and planning have been conducted since 2013
• A proposal on HPC in the 13th Five-Year Plan was submitted in early 2015
• A key R&D project was approved in Oct. 2015 by a multi-government-agency committee led by the MOST
Motivations for the new key project
• The key value of exa-scale computers identified
– Addressing grand challenge problems
• energy shortage, pollution, climate change…
– Enabling industry transformation
• supporting development of important products
– high-speed trains, commercial aircraft, automobiles…
• promoting economic transformation
– For social development and people’s benefit
• new drug discovery, precision medicine, digital media…
– Enabling scientific discovery
• high energy physics, computational chemistry, new materials, astrophysics…
• Promoting the computer industry by technology transfer
• Developing HPC systems with self-controllable technologies
– a lesson learnt from the recent embargo regulations
Goals
• Strengthening R&D on kernel technologies and
pursuing the leading position in high performance
computer development
• Promoting HPC applications and establishing the
application eco-system
• Building up an HPC infrastructure with service
features and exploring the path to the HPC service
industry
Major tasks
• Exa-scale computer development
– R&D on novel architectures and key technologies of the exa-scale computer
– Developing the exa-scale computer based on home-grown processors
– Technology transfer to promote development of high-end servers
• HPC application development
– Basic research on exa-scale modeling methods and parallel algorithms
– Developing high performance application software
– Establishing the HPC application eco-system
• HPC environment development
– Developing software and platforms for the national HPC environment
– Upgrading the national HPC environment, CNGrid
– Developing service systems on the national HPC environment
• Each task will cover basic research, key technology development, and application demonstration
Task 1: Exa-scale computer development
• Basic research
– Novel high performance interconnects
• Theoretical work on novel interconnects, based on the enabling technologies of 3D chips, silicon photonics, and on-chip networks
– Programming & execution models for exa-scale systems
• new programming models for heterogeneous systems
• improving programming efficiency
Task 1: Exa-scale computer development
• Key technology
– Prototype systems for verifying the exa-scale system technologies
• possible architectures for exa-scale computers
• implementation strategies
• technologies for energy efficiency
• prototype system:
– 512 nodes
– 5-10 TFlops/node
– 10-20 GFlops/W
– point-to-point bandwidth > 200Gbps
– MPI latency < 1.5µs
– emphasis on self-controllable technologies
• system software for the prototypes
• 3 typical applications to verify the design
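The prototype targets above imply a machine in the low-petaflops range. A back-of-the-envelope check (the arithmetic is mine, not from the talk): 512 nodes at 5-10 TFlops/node, and the resulting power draw at the stated 10-20 GFlops/W efficiency.

```python
# Aggregate peak of the prototype: 512 nodes at 5-10 TFlops each.
nodes = 512
peak_pflops = [nodes * tflops_per_node / 1000 for tflops_per_node in (5, 10)]
print(peak_pflops)  # [2.56, 5.12] PFlops aggregate peak

# At the low end of 10-20 GFlops/W, a 5.12 PFlops prototype would draw
# roughly half a megawatt (5.12 PF = 5.12e6 GF).
power_mw = 5.12e6 / 10 / 1e6
print(power_mw)  # 0.512 MW
```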
Task 1: Exa-scale computer development
• Key technology
– exa-scale system technologies
• architecture optimized for multiple objectives
• highly efficient computing nodes
• high performance processor/accelerator design
• exa-scale system software
• scalable interconnect
• parallel I/O
• exa-scale infrastructure
• energy efficiency
• exa-scale system reliability
Task 1: Exa-scale computer development
• Exa-scale computer system development
– exaflops peak performance
– Linpack efficiency > 60%
– 10PB memory
– EB-scale storage
– 30 GF/W energy efficiency
– interconnect > 500Gbps
– large-scale system management and resource scheduling
– easy-to-use parallel programming environment
– system monitoring and fault tolerance
– support for large-scale applications
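The 30 GF/W target directly determines the machine's power budget. A quick check (my arithmetic, not a figure from the talk): an exaflops machine at that efficiency draws about 33 MW.

```python
# Power implied by the 30 GF/W energy-efficiency target at 1 exaflops.
peak_gflops = 1e9            # 1 exaflops expressed in GFlops
gflops_per_watt = 30
power_megawatts = peak_gflops / gflops_per_watt / 1e6
print(round(power_megawatts, 1))  # 33.3
```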
Task 2: HPC application development
• Basic research
– computable modeling and computational methods for exa-scale systems
– scalable, highly efficient parallel algorithms and parallel libraries for exa-scale systems
Task 2: HPC application development
• Key technology
– programming frameworks for exa-scale software development, including frameworks for
• structured mesh
• unstructured mesh
• mesh-free combinatory geometry
• finite element
• graph computing
– supporting development of at least 40 software packages with million-core parallelism
Task 2: HPC application development
• Key technology and demo applications
– Numerical devices and their applications
• numerical nuclear reactor
– four components: reactor core particle transport, thermal hydraulics, structural mechanics, and material optimization
– non-linear coupling of multi-physics processes
• numerical aircraft
– multi-disciplinary optimization covering aerodynamics, structural strength, and fluid-solid interaction
• numerical earth system
– earth system modeling for studying climate change
– non-linear coupling of multi-physical and chemical processes covering atmosphere, ocean, land, and sea ice
• numerical engine
– high-fidelity simulation system for numerical prototyping of commercial aircraft engines
– enabling fast and accurate virtual airworthiness experiments
Task 2: HPC application development
• Key technology and demo applications
– high performance application software for domain applications
• complex engineering projects and critical equipment
• numerical simulation of the ocean
• design of energy-efficient large fluid machineries
• drug discovery
• electromagnetic environment simulation
• ship design
• oil exploration
• digital media rendering
– high performance application software for scientific research
• material science
• high energy physics
• astrophysics
• life science
Task 2: HPC application development
• Eco-system for HPC application software development
– establishing a national-level R&D center for HPC application software
– building up a platform for HPC software development and optimization
– tools for performance/energy efficiency and pre-/post-processing
– building up a software resource repository
– developing typical domain application software
– a joint effort involving national supercomputing centers, universities, and institutes
Task 3: HPC environment development
• Basic research
– models and architecture of computational services
• service representation of distributed computing and storage resources
• architecture for computational services
• resource discovery and access
• unified management
• trading and accounting
– virtual data space
• architecture for a cross-domain virtual data space
• integration of distributed storage
• unified access and management of the virtual data space
• domain partitioning, separation, and security
Task 3: HPC environment development
• Key technology
– mechanism and platform for the national HPC environment
• technical support for service-mode operation
• upgrading the national HPC environment (CNGrid)
– >500PF computing resources
– >500PB storage
– >500 application software packages and tools
– >5000 users (team users)
Task 3: HPC environment development
• Demo applications
– service systems based on the national HPC environment
• integrated business platforms, e.g.
– complex product design
– HPC-enabled EDA platform
• application villages
– innovation and optimization of industrial products
– drug discovery
– computing and simulation platform for SMEs
• platform for HPC education
– providing computing resources and services to undergraduate and graduate students
Calls for proposals
• The first call for proposals was issued in Feb. 2016; 19 projects passed the evaluation and were launched in July 2016
• The second call (for 2017) was issued in Oct. 2016; the proposal evaluation has ended, and the final results are expected to be announced soon
• These two rounds of calls cover most of the subjects of the key project except exa-scale system development
• Exa-scale system development will start after completion of the three prototypes
Thank you!