Automatic Code Orchestration from Descriptive Implementations: Prototyping for Exascale
Professor Brian Vinter, Niels Bohr Institute, KU
Domain of interest
- Users are scientists, not programmers
- Written and maintained by a small group of people
- Changes often
We don't think of:
- Large community codes
- Codes where the programmer is not the scientist
- Code that will run millions of times
Prototyping today
- Idea (paper) -> days -> Prototype (Matlab) -> months -> Full version (C++)
- Prototyping: TFlops
- Target: PFlops
Prototyping tomorrow
- Now: TFlops => PFlops
- 2020: TFlops => PFlops => EFlops, or PFlops => EFlops
Observations
- Prototyping is a fundamental requirement
- Prototyping works on small datasets
  - 1,000x smaller, typically
  - 1,000,000x smaller is less realistic
- We need a new approach to prototyping for Exascale
Requirements for Exascale Prototyping
- Prototype must be high productivity, like Matlab
- Must run on Petascale machines
- Must have a speed of the same order as a naïve C++ implementation
Bohrium
- Built around a flexible n-dimensional tensor data structure
- Designed to support all NumPy operations
- Supported languages (for now): Python/NumPy, C/C++, .NET (C#, F#, ...)
- Supported hardware: multicores, GPGPUs, clusters/MPP, (FPGAs)
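
To illustrate what "descriptive" array code looks like, here is a minimal sketch of a Jacobi relaxation step written purely with whole-array NumPy slices. It uses plain NumPy; the function names `jacobi_step` and `solve` are hypothetical, and the assumption is that a NumPy-compatible backend such as Bohrium could take the place of the `numpy` import without changes to the numerical code.

```python
import numpy as np  # assumption: a NumPy-compatible backend could be swapped in here


def jacobi_step(grid):
    """One Jacobi relaxation step on the interior of a 2-D grid."""
    new = grid.copy()
    # Whole-array slice arithmetic, no explicit loops: the access pattern
    # (four shifted views of the same array) is stated directly in the code.
    new[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1]
                              + grid[1:-1, :-2] + grid[1:-1, 2:])
    return new


def solve(n=16, steps=50):
    """Relax an n x n grid with a fixed hot top edge for a number of steps."""
    grid = np.zeros((n, n))
    grid[0, :] = 100.0  # boundary condition: hot top edge
    for _ in range(steps):
        grid = jacobi_step(grid)
    return grid
```

Because the update is a single expression over array views, every element of the interior can be computed independently, which is the kind of parallelism a runtime can exploit without the scientist writing any threading code.
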
Implementation
- Bohrium is a Just-In-Time compiler
- Batches all nd-array operations until a flush is required
- Then analyzes the batched operations and compiles target-specific code, if it does not already exist
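
The batch-then-flush idea above can be sketched with a toy model. This is not Bohrium's implementation: `LazyArray`, `kernel_cache`, and `compile_count` are hypothetical names, only scalar add and multiply are supported, and "compiling" here just means building a Python closure. The sketch only shows the control flow: operations are recorded, nothing runs until `flush()`, and a kernel is built only when no kernel for the same sequence of operations exists yet.

```python
import numpy as np


class LazyArray:
    """Hypothetical toy: records element-wise ops instead of running them."""

    kernel_cache = {}   # maps a batch signature to an already built kernel
    compile_count = 0   # how many kernels have been built so far

    def __init__(self, data, ops=()):
        self.data = np.asarray(data, dtype=float)
        self.ops = tuple(ops)  # pending (name, scalar) operations

    def __add__(self, scalar):
        return LazyArray(self.data, self.ops + (("add", float(scalar)),))

    def __mul__(self, scalar):
        return LazyArray(self.data, self.ops + (("mul", float(scalar)),))

    def flush(self):
        """Run the whole batch at once, building a kernel only if needed."""
        signature = tuple(name for name, _ in self.ops)
        kernel = LazyArray.kernel_cache.get(signature)
        if kernel is None:
            kernel = self._compile(signature)
            LazyArray.kernel_cache[signature] = kernel
        return kernel(self.data, [value for _, value in self.ops])

    @staticmethod
    def _compile(signature):
        LazyArray.compile_count += 1
        table = {"add": np.add, "mul": np.multiply}
        funcs = [table[name] for name in signature]

        def kernel(data, scalars):
            out = data.copy()
            for func, scalar in zip(funcs, scalars):
                func(out, scalar, out=out)  # in place: one buffer for the batch
            return out

        return kernel
```

A short usage example: `((LazyArray([1, 2, 3]) + 1) * 2).flush()` evaluates both recorded operations in one pass, and a later batch with the same add-then-multiply shape reuses the cached kernel instead of building a new one.
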
Programming Approach
C++/OMP
Optimized C
C/OMP/MPI
OpenCL
Multicore
Multicore C and C++
GPU
Cluster
Black-Scholes
Multicore: speedup 32.9, scalability 25.5
GPU
Cluster
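
For readers unfamiliar with the Black-Scholes benchmark named above, the following is a minimal sketch of how European option pricing can be written as descriptive array code. It is not the benchmark code behind the reported numbers; the function name `black_scholes` and its parameter names are illustrative assumptions, and the normal CDF is built from `math.erf` because NumPy itself does not provide an error function.

```python
import math

import numpy as np

# Element-wise standard normal CDF built from math.erf.
_norm_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))


def black_scholes(spot, strike, rate, sigma, maturity):
    """Return (call, put) prices for an array of spot prices."""
    spot = np.asarray(spot, dtype=float)
    vol_sqrt_t = sigma * np.sqrt(maturity)
    d1 = (np.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    discount = np.exp(-rate * maturity)  # present value of one unit at maturity
    call = spot * _norm_cdf(d1) - strike * discount * _norm_cdf(d2)
    put = strike * discount * _norm_cdf(-d2) - spot * _norm_cdf(-d1)
    return call, put
```

Every line operates on the whole array of spot prices at once, so the same code prices one option or millions, and each option is independent of the others.
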
NICE
Multicore: speedup 15.5, scalability 16.4
GPU
Cluster
Shallow Water
Multicore: speedup 47.8, scalability 14.1
GPU
Cluster
Conclusions
- We believe that we need a new tool for prototyping in the Exascale age
- Bohrium is one possible solution to that challenge
- Using highly descriptive code, we can extract parallelism and memory access patterns from the user code
- Performance is actually very close to that of naïve C++