DEPEI QIAN
Qian Depei, Professor at Sun Yat-sen University and Beihang University, Dean
of the School of Data and Computer Science of Sun Yat-sen University.
Since 1996 he has been a member of the expert group and expert
committee of the National High-tech Research & Development Program (the
863 Program) in information technology. Since 2002 he has been the chief
scientist of three 863 key projects on high performance computing. Currently,
he is the chief scientist of the 863 key project on high productivity computer
and application service environment.
His current research interests include high performance computer architecture
and implementation technologies, distributed computing, network
management, and network performance measurement. He has published over
300 papers in journals and conferences.
HPC Development in China: A Brief Review and Prospect
Depei Qian Beihang University/Sun Yat-sen University
ASC 2017, Wuxi, China, April 27, 2017
Outline
• A brief review
• Issues in exa-scale system development
• Prospect of HPC in China’s 13th Five-Year Plan
A Brief review
Three 863 key projects on HPC
• 2002-2005: High Performance Computer and Core Software
– Research on resource sharing and collaborative work
– Grid-enabled applications in multiple areas
– TFlops computers and the China National Grid (CNGrid) testbed
• 2006-2010: High Productivity Computer and Grid Service Environment
– High productivity
• Application performance
• Efficiency in program development
• Portability of programs
• Robustness of the system
– Emphasizing service features of the HPC environment
– Developing peta-scale computers
• 2010-2016: High Productivity Computer and Application Service Environment
– Developing 100PF computers
– Developing large-scale HPC applications
– Upgrading of CNGrid
High Performance Computers (1996-2016)
• 1996
– China: Dawning 1000, 2.5GF
– US: 1 TF
• 2016
– China: Sunway TaihuLight, 125PF, a 50-million-fold increase in 20 years
• Milestones
– 2001: Dawning 3000, 432GF, cluster
– 2003: DeepComp 6800, 5.32TF, cluster
– 2004: Dawning 4000, 11.2TF, cluster
– 2008: DeepComp 7000, 150TF, heterogeneous cluster; Dawning 5000A, 230TF, cluster
– 2010: TH-1A, 4.7PF, heterogeneous accelerated architecture; Dawning 6000, 3PF, heterogeneous accelerated architecture
– 2011: Sunway Bluelight, multicore-based, 1PF, home-grown processors
[Photos: Dawning 3000, DeepComp 6800, Dawning 4000, DeepComp 7000, Dawning 5000A, Dawning 6000, TH-1A, Sunway Bluelight]
High Performance Computers (1996-2016)
• 2013: Tianhe-2
– CPU+MIC heterogeneous accelerated architecture
– 54.9 PF peak, 33.9 PF Linpack; No. 1 in Top500 six times, from 2013 to 2015
– Installed at the National Supercomputing Center in Guangzhou
– Will be upgraded to 100PF this year
• 2016: Sunway TaihuLight
– Implemented with home-grown Shenwei many-core processors, 10 million cores in total
– 125 PF peak, 93 PF Linpack; No. 1 in Top500 in June and Nov. 2016
– Installed at the National Supercomputing Center in Wuxi
[Photos: Sunway Bluelight, Tianhe-2]
High Performance Computing Environment (1996-2016)
• 1996
– One national HPC center, in Hefei, equipped with Dawning-I, 640 MIPS
– US computing infrastructure: PACI supported by NSF and DoE centers; Grid technology emerging
• 2016
– China National Grid, composed of 17 national supercomputing centers and HPC centers, with world-leading computing resources
High Performance Computing Applications (1996-2016)
• 1996
– Limited HPC applications, in weather forecasting and oil exploration
– 16-32-way parallelism
– Reliance on imported application software
• 2016
– HPC applications in many domains
– 10-million-core parallelism reached; Gordon Bell Prize in 2016
– A number of application software packages developed and adopted in production systems:
• aircraft design
• high-speed train design
• oil & gas exploration
• new drug discovery
• ensemble weather forecasting
• bio-information
• car development
• design optimization of large fluid machinery
• electromagnetic computation
• …
Important experiences
• Coordinated effort of the national research programs and the regional development plans
– Providing matching funding for the development of leading-class HPC systems
– Joint effort by the MOST and local governments in establishing the national supercomputing centers
• Multi-lateral cooperation
– Supercomputing centers play an important role
• in determining the metrics of the HPC systems developed
• in selecting the team to develop the system
– Enterprises participate in the national R&D program
• Inspur, Sugon, and Lenovo were involved in HPC system development
• promoting development while improving the technical capability and competitiveness of the companies
– Application organizations lead the development of the application software
• Balanced and coordinated development of HPC machines, the HPC environment, and HPC applications
Problems identified
• Lack of a long-term national program for high performance computing
• Weakness in kernel HPC technologies
– processor/accelerator
– novel devices (new memory, storage, and network)
– large-scale parallel algorithms and program implementation
• Application software is the bottleneck
– applications rely on imported commercial software
• expensive
• small-scale parallelism
• restricted by export regulations
• Shortage of cross-disciplinary talent
– Not enough people with both domain and IT knowledge
• Lack of multi-disciplinary collaboration
Issues in exa-scale system development
Major challenges to exa-scale systems
• Power consumption
• Performance obtained by applications
• Programmability
• Resilience
• How to make tradeoffs between performance, power consumption, and programmability?
• How to achieve continuous non-stop operation?
• How to adapt to a wide range of applications with reasonable efficiency?
Architecture
• Novel architectures beyond the current heterogeneous accelerated/manycore-based designs are expected
• Co-processor or partitioned heterogeneous architecture?
– Low utilization of the co-processor in some applications, which use the CPU only
– Bottleneck in moving data between the CPU and the co-processor
• Application-aware architecture
– on-chip integration of special-purpose units (an idea from Prof. Andrew Chien)
– using the right tool to do the right things
– dynamically reconfigurable, but how to program it?
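The "right tool for the right things" idea can be sketched as a simple kernel-to-unit dispatcher. This is a hypothetical illustration, not any real architecture: the unit names, kernel classes, and `dispatch` function are all invented for the sketch.

```python
# Hypothetical sketch of application-aware dispatch: each kernel class
# is routed to the on-chip unit assumed to suit it best. All names here
# are illustrative, not from the talk or any real chip.
UNIT_FOR_KERNEL = {
    "dense_linear_algebra": "matrix_unit",  # e.g. a tensor/matrix engine
    "graph_traversal": "sparse_unit",       # irregular-access accelerator
    "control_heavy": "cpu_core",            # branchy code stays on the CPU
}

def dispatch(kernel_class):
    """Pick an execution unit; fall back to a general-purpose core."""
    return UNIT_FOR_KERNEL.get(kernel_class, "cpu_core")

print(dispatch("dense_linear_algebra"))  # matrix_unit
print(dispatch("io_bound"))              # cpu_core (no specialized unit)
```

The open question raised above remains: once the units are reconfigurable, the mapping itself must be programmable rather than a fixed table.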
Memory system
• Pursuing large capacity, low latency, and high bandwidth
• Increase capacity and lower power consumption by using DRAM and NVM together
– Data placement issue
– Handling the high write cost and limited write endurance of NVM
• Improve bandwidth and latency by using 3D stacking technology
• Reduce data movement by placing data closer to processing
– HBM/HMC near the processor
– On-chip DRAM
– Simple functions in memory
• Reduce data copy cost by using a unified memory space in heterogeneous architectures
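The DRAM/NVM data-placement issue can be illustrated with a toy heuristic: keep write-hot pages in the small DRAM (fast writes, no wear) and push read-mostly pages to the large NVM. A minimal sketch under invented names and thresholds (`place_pages`, `write_hot_threshold`); a real policy would act on hardware access counters and migrate pages dynamically.

```python
# Toy write-aware page placement for a hybrid DRAM/NVM memory.
# Write-hot pages go to DRAM (absorbing the costly, wear-inducing
# NVM writes); read-mostly pages go to NVM for cheap capacity.

def place_pages(pages, dram_capacity, write_hot_threshold=0.5):
    """pages: list of (page_id, reads, writes). Returns (dram, nvm) lists."""
    # Rank pages by write intensity so the most write-hot claim DRAM first.
    by_write_rate = sorted(
        pages,
        key=lambda p: p[2] / max(1, p[1] + p[2]),
        reverse=True,
    )
    dram, nvm = [], []
    for page_id, reads, writes in by_write_rate:
        write_rate = writes / max(1, reads + writes)
        if len(dram) < dram_capacity and write_rate >= write_hot_threshold:
            dram.append(page_id)   # absorb writes in DRAM
        else:
            nvm.append(page_id)    # read-mostly data tolerates NVM
    return dram, nvm

# Illustrative access profile: (page, reads, writes), DRAM holds 2 pages.
pages = [("a", 90, 10), ("b", 10, 90), ("c", 50, 50), ("d", 5, 95)]
dram, nvm = place_pages(pages, dram_capacity=2)
print(dram, nvm)  # ['d', 'b'] ['c', 'a']
```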
Interconnect
• Pursuing low latency, high bandwidth, and low energy consumption
• Adopting new technologies
– Silicon photonic communication between components
– Optical interconnect / communication
– Miniature optical devices
• High scalability to meet exa-scale interconnect requirements
– Connecting 10,000+ nodes
– Low-hop, low-latency topologies
– Reliable and intelligent routing
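To see why a low-hop topology matters at this scale, compare the worst-case hop counts (diameters) of two classic topologies for a 32K-node machine. A back-of-the-envelope sketch: the formulas are the standard diameter expressions, and the node count is an illustrative assumption.

```python
import math

def hypercube_diameter(n_nodes):
    # A hypercube with 2^d nodes has diameter d hops.
    return math.ceil(math.log2(n_nodes))

def torus3d_diameter(n_nodes):
    # A side x side x side 3D torus has diameter 3 * (side // 2) hops.
    side = round(n_nodes ** (1 / 3))
    return 3 * (side // 2)

n = 32768  # illustrative node count in the 10,000+ range
print(hypercube_diameter(n), torus3d_diameter(n))  # 15 48
```

The gap widens with scale, which is why exa-scale interconnect designs pursue low-diameter topologies even at higher per-node link cost.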
Programming heterogeneous systems
• Addressing the issues in programming heterogeneous parallel systems
– efficient expression of parallelism, dependence, data sharing, and execution semantics
– problem decomposition appropriate for heterogeneous systems
• Improving programming by means of a holistic approach
– New programming models
– Programming language extensions and compilers
– Parallel debugging
– Runtime support and optimization
– Architectural support
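One concrete face of the decomposition problem is deciding how much work each device gets. Below is a minimal static split proportional to per-device throughput; the function name and throughput figures are invented for illustration, and real runtimes rebalance such splits dynamically.

```python
# Hypothetical static work split between a CPU and an accelerator,
# proportional to their assumed sustained throughput (GFlops).
def split_work(total_items, cpu_gflops, acc_gflops):
    acc_share = acc_gflops / (cpu_gflops + acc_gflops)
    acc_items = round(total_items * acc_share)
    return total_items - acc_items, acc_items  # (cpu_items, acc_items)

# Illustrative: a 500 GFlops CPU paired with a 4500 GFlops accelerator.
print(split_work(1_000_000, 500, 4500))  # (100000, 900000)
```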
Computational models and algorithms
• Full-chain innovation
– mathematical methods
– computer algorithms
– algorithm implementation and optimization
• A good mathematical method often improves performance more than hardware improvements or algorithm optimization
• Architecture-aware algorithm implementation and optimization is necessary for heterogeneous systems
• Domain-specific libraries for improving software productivity and performance
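The claim that a better mathematical method beats hardware and tuning can be made concrete with a textbook case: the FFT versus a direct DFT. The sketch below counts only leading-order arithmetic operations, with constants dropped, so the ratio is an order-of-magnitude illustration.

```python
import math

# Leading-order operation counts: a direct DFT costs O(n^2), an FFT
# costs O(n log n). Constants are dropped; this is only a sketch.
def dft_ops(n):
    return n * n

def fft_ops(n):
    return n * math.log2(n)

n = 2 ** 20  # a roughly one-million-point transform
speedup = dft_ops(n) / fft_ops(n)
print(round(speedup))  # 52429
```

A four-orders-of-magnitude method-level gain dwarfs anything achievable by constant-factor tuning or a hardware generation.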
Resilience
• Resilience is one of the key issues for exa-scale systems
– Large system scale
• 50K to 100K nodes
• Huge number of components
– Very short MTBF
– Long non-stop operation required for solving large-scale problems
• Reliability measures at different levels, including the device, node, and system levels
• Software/hardware coordination
– fast context saving and recovery for checkpointing in case of short MTBF
– fault tolerance at the algorithm and application software levels
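The short-MTBF arithmetic can be sketched: assuming independent node failures, system MTBF is roughly the per-node MTBF divided by the node count, and a classic first-order rule (Young's approximation) then gives a checkpoint interval. The per-node MTBF and checkpoint cost below are illustrative assumptions, not figures from the talk.

```python
import math

def system_mtbf_hours(node_mtbf_hours, n_nodes):
    # Assuming independent node failures, system MTBF shrinks as 1/N.
    return node_mtbf_hours / n_nodes

def optimal_checkpoint_interval_s(checkpoint_cost_s, mtbf_s):
    # Young's first-order approximation: T_opt = sqrt(2 * C * MTBF).
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)

# Illustrative: 100K nodes, each with an assumed 50,000-hour MTBF.
mtbf_h = system_mtbf_hours(50_000, 100_000)   # 0.5 hours for the system
interval = optimal_checkpoint_interval_s(60, mtbf_h * 3600)
print(mtbf_h, round(interval))  # 0.5 465
```

A half-hour system MTBF forcing a checkpoint every ~8 minutes is exactly why fast context saving and algorithm-level fault tolerance are called for above.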
Importance of the tools
• Development and optimization of large-scale parallel software require scalable and efficient tools
• Tools are particularly important for systems implemented with home-grown processors
– current commercial and research tools do not support them
• Three kinds of tools are required by default
– Parallel debugger for correctness
– Performance tuner for performance
– Energy optimizer for energy efficiency
Urgent need for an eco-system
• An eco-system for exa-scale systems based on home-grown processors is urgently needed
– languages, compilers, OS, runtime
– tools
– application development support
– application software
• Need to attract hardware manufacturers and third-party software developers
– a product family instead of a single machine
• Collaboration between industry, academia, and end-users required
Prospect of HPC in China’s 13th Five-Year Plan
Reform of research system in China
• The national research and development system is being reformed
– Merging 100+ different national R&D programs/initiatives into 5 tracks of national programs
• Basic research program (NSFC)
• Mega-science and technology programs
• Key R&D program (former 863, 973, and enabling programs)
• Enterprise innovation program
• Facility/talent program
A new key project on HPC
• High performance computing has been identified as a priority subject under the key R&D program
• Strategic studies and planning have been conducted since 2013
• A proposal on HPC in the 13th Five-Year Plan was submitted in early 2015
• A key R&D project was approved in Oct. 2015 by a multi-government-agency committee led by the MOST
Motivations for the new key project
• The key value of exa-scale computers identified
– Addressing grand challenge problems
• energy shortage, pollution, climate change…
– Enabling industry transformation
• supporting development of important products
– high-speed trains, commercial aircraft, automobiles…
• promoting economic transformation
– For social development and people’s benefit
• new drug discovery, precision medicine, digital media…
– Enabling scientific discovery
• high energy physics, computational chemistry, new materials, astrophysics…
• Promoting the computer industry by technology transfer
• Developing HPC systems with self-controllable technologies
– a lesson learnt from the recent embargo regulations
Goals
• Strengthening R&D on kernel technologies and
pursuing the leading position in high performance
computer development
• Promoting HPC applications and establishing the
application eco-system
• Building up an HPC infrastructure with service
features and exploring the path to the HPC service
industry
Major tasks
• Exa-scale computer development
– R&D on novel architectures and key technologies of the exa-scale computer
– Developing the exa-scale computer based on home-grown processors
– Technology transfer to promote development of high-end servers
• HPC application development
– Basic research on exa-scale modeling methods and parallel algorithms
– Developing high performance application software
– Establishing the HPC application eco-system
• HPC environment development
– Developing software and platforms for the national HPC environment
– Upgrading the national HPC environment, CNGrid
– Developing service systems on the national HPC environment
• Each task will cover basic research, key technology development, and application demonstration
Task 1: Exa-scale computer development
• Basic research
– Novel high performance interconnects
• Theoretical work on novel interconnects, based on the enabling technologies of 3D chips, silicon photonics, and on-chip networks
– Programming & execution models for exa-scale systems
• new programming models for heterogeneous systems
• improving programming efficiency
Task 1: Exa-scale computer development
• Key technology
– Prototype systems for verifying the exa-scale system technologies
• possible architectures for exa-scale computers
• implementation strategies
• technologies for energy efficiency
• prototype system:
– 512 nodes
– 5-10 TFlops/node
– 10-20 GFlops/W
– point-to-point bandwidth > 200Gbps
– MPI latency < 1.5µs
– emphasis on self-controllable technologies
• system software for the prototypes
• 3 typical applications to verify the design
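The prototype targets above imply a machine in the low-petaflops range. A back-of-the-envelope check (the arithmetic is mine, not from the talk): 512 nodes at 5-10 TFlops/node, and the resulting power draw at the stated 10-20 GFlops/W efficiency.

```python
# Aggregate peak of the prototype: 512 nodes at 5-10 TFlops each.
nodes = 512
peak_pflops = [nodes * tflops_per_node / 1000 for tflops_per_node in (5, 10)]
print(peak_pflops)  # [2.56, 5.12] PFlops aggregate peak

# At the low end of 10-20 GFlops/W, a 5.12 PFlops prototype would draw
# roughly half a megawatt (5.12 PF = 5.12e6 GF).
power_mw = 5.12e6 / 10 / 1e6
print(power_mw)  # 0.512 MW
```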
Task 1: Exa-scale computer development
• Key technology
– exa-scale system technologies
• architecture optimized for multiple objectives
• highly efficient computing nodes
• high performance processor/accelerator design
• exa-scale system software
• scalable interconnect
• parallel I/O
• exa-scale infrastructure
• energy efficiency
• exa-scale system reliability
Task 1: Exa-scale computer development
• Exa-scale computer system development
– exaflops peak performance
– Linpack efficiency > 60%
– 10PB memory
– EB-scale storage
– 30 GF/W energy efficiency
– interconnect > 500Gbps
– large-scale system management and resource scheduling
– easy-to-use parallel programming environment
– system monitoring and fault tolerance
– support for large-scale applications
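The 30 GF/W target directly determines the machine's power budget. A quick check (my arithmetic, not a figure from the talk): an exaflops machine at that efficiency draws about 33 MW.

```python
# Power implied by the 30 GF/W energy-efficiency target at 1 exaflops.
peak_gflops = 1e9            # 1 exaflops expressed in GFlops
gflops_per_watt = 30
power_megawatts = peak_gflops / gflops_per_watt / 1e6
print(round(power_megawatts, 1))  # 33.3
```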
Task 2: HPC application development
• Basic research
– computable modeling and computational methods for exa-scale systems
– scalable, highly efficient parallel algorithms and parallel libraries for exa-scale systems
Task 2: HPC application development
• Key technology
– programming frameworks for exa-scale software development, including frameworks for
• structured mesh
• unstructured mesh
• mesh-free combinatory geometry
• finite element
• graph computing
– supporting development of at least 40 software packages with million-core parallelism
Task 2: HPC application development
• Key technology and demo applications
– Numerical devices and their applications
• numerical nuclear reactor
– four components: reactor core particle transport, thermal hydraulics, structural mechanics, and material optimization
– non-linear coupling of multi-physics processes
• numerical aircraft
– multi-disciplinary optimization covering aerodynamics, structural strength, and fluid-solid interaction
• numerical earth system
– earth system modeling for studying climate change
– non-linear coupling of multi-physical and chemical processes covering atmosphere, ocean, land, and sea ice
• numerical engine
– high-fidelity simulation system for numerical prototyping of commercial aircraft engines
– enabling fast and accurate virtual airworthiness experiments
Task 2: HPC application development
• Key technology and demo applications
– high performance application software for domain applications
• complex engineering projects and critical equipment
• numerical simulation of the ocean
• design of energy-efficient large fluid machineries
• drug discovery
• electromagnetic environment simulation
• ship design
• oil exploration
• digital media rendering
– high performance application software for scientific research
• material science
• high energy physics
• astrophysics
• life science
Task 2: HPC application development
• Eco-system for HPC application software development
– establishing a national-level R&D center for HPC application software
– building up a platform for HPC software development and optimization
– tools for performance/energy efficiency and pre-/post-processing
– building up a software resource repository
– developing typical domain application software
– a joint effort involving national supercomputing centers, universities, and institutes
Task 3: HPC environment development
• Basic research
– models and architecture of computational services
• service representation of distributed computing and storage resources
• architecture for computational services
• resource discovery and access
• unified management
• trading and accounting
– virtual data space
• architecture for a cross-domain virtual data space
• integration of distributed storage
• unified access and management of the virtual data space
• domain partitioning, separation, and security
Task 3: HPC environment development
• Key technology
– mechanism and platform for the national HPC environment
• technical support for service-mode operation
• upgrading the national HPC environment (CNGrid)
– >500PF computing resources
– >500PB storage
– >500 application software packages and tools
– >5000 users (team users)
Task 3: HPC environment development
• Demo applications
– service systems based on the national HPC environment
• integrated business platforms, e.g.
– complex product design
– HPC-enabled EDA platform
• application villages
– innovation and optimization of industrial products
– drug discovery
– computing and simulation platform for SMEs
• platform for HPC education
– providing computing resources and services to undergraduate and graduate students
Calls for proposals
• The first call for proposals was issued in Feb. 2016; 19 projects passed the evaluation and were launched in July 2016
• The second call (for 2017) was issued in Oct. 2016; the proposal evaluation has ended, and the final results are expected to be announced soon
• These two rounds of calls cover most of the subjects of the key project except exa-scale system development
• Exa-scale system development will start after completion of the three prototypes
Thank you!