Performance Evaluation of C, Julia, Kokkos and Python/Numba in Exascale High Performance Computing

janrinok

from SoylentNews on 2023-03-27 13:04 (#6A799)

Julia and Kokkos perform comparably with C/OpenMP on CPUs, while Julia implementations are competitive with CUDA and HIP on GPUs:

High-level dynamic languages such as Python, Julia, and R have been at the forefront of artificial intelligence/machine learning (AI/ML), data analysis, and interactive computing workflows in the last decade. Traditional high-performance computing (HPC) frameworks that power the underlying low-level computations for performance and scalability are written in compiled languages: C, C++, and Fortran.
[...] We analyze single node scalability on two systems hosted at the Oak Ridge Leadership Computing Facility (OLCF): Wombat, which uses Arm Ampere Neoverse CPUs and 2 NVIDIA A100 GPUs, and Crusher, which is equipped with AMD EPYC 7A53 CPUs and 8 MI250X GPUs and serves as a test bed for Frontier, the first exascale system on the TOP500 list.
[...] We run hand-rolled general matrix multiplication (GEMM) code for dense matrices using Julia, Python/Numba and Kokkos implementations and compare the performance with C for multithreaded CPU (OpenMP) and single GPU (CUDA/HIP) systems. GEMM is an important kernel in the Basic Linear Algebra Subprograms (BLAS) used across several deep learning AI frameworks, for which modern GPU architectures have been heavily optimized via tensor cores.
[...] For CPUs, Julia performance was comparable to C/OpenMP combined with LLVM-based ArmClang and AMDClang vendor compilers. For the AMD GPUs, Julia AMDGPU.jl performance was comparable to HIP. Nevertheless, there is still a performance gap on NVIDIA A100 GPUs for single-precision floating point cases.
[...] We observe that Python/Numba implementations still lack the support needed to reach comparable CPU and GPU performance on these systems, and AMD GPU support is deprecated.
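
For readers unfamiliar with the term, a "hand-rolled" GEMM is a matrix multiplication written out as explicit loops rather than a call into a tuned BLAS library. The following is a minimal sketch in plain Python of what such a kernel looks like. It is not the paper's code: the function names, the list-of-lists layout, and the loop order are assumptions chosen for illustration, and the benchmarked versions would express the same loop nest in C/OpenMP, Julia, Kokkos, or Numba with parallel constructs.

```python
# Illustrative sketch only: a naive, hand-rolled GEMM (C = A * B) in plain Python.
# The function names and the row-major list-of-lists layout are assumptions made
# for this example; they are not taken from the paper's code.

def gemm(a, b):
    """Multiply an m x k matrix by a k x n matrix with a triple loop."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # The i-l-j loop order walks rows of b and c contiguously.
    for i in range(m):
        for l in range(k):
            a_il = a[i][l]
            for j in range(n):
                c[i][j] += a_il * b[l][j]
    return c


def gemm_flops(m, n, k):
    """Floating-point operations in a dense GEMM: one multiply and one add per term."""
    return 2 * m * n * k


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(gemm(a, b))           # product of two 2 x 2 matrices
    print(gemm_flops(2, 2, 2))  # operation count for the same sizes
```

Dividing the operation count from gemm_flops by the measured run time gives a throughput figure in FLOP/s, which is the usual way to compare GEMM kernels across implementations.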

Pre-print article:
William F. Godoy, Pedro Valero-Lara, T. Elise Dettling, Christian Trefftz, Ian Jorquera, Thomas Sheehy, Ross G. Miller, Marc Gonzalez-Tallada, Jeffrey S. Vetter, and Valentin Churavy. Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes. Accepted at the 28th HIPS workshop, held in conjunction with IPDPS 2023. arXiv:2303.06195, https://doi.org/10.48550/arXiv.2303.06195

