SYCLcon '22 - Celerity: How (Well) Does the SYCL API Translate to Distributed Clusters?
As the SYCL ecosystem matures, adoption across both research and industry projects steadily increases. By offering a modern, vendor-agnostic way of programming a wide array of accelerator hardware, SYCL has the potential to become an important player in high-performance computing (HPC) as well. In fact, existing pre-exascale and upcoming exascale machines already officially support SYCL or even recommend it as one of their preferred programming models. As such, the question of how well the SYCL programming model translates to distributed computing becomes more pressing.
While traditional approaches such as combining SYCL with the Message Passing Interface (MPI) will undoubtedly remain relevant for years to come, a more forward-thinking approach may be to try to extend SYCL’s ease of use for single-node systems to a distributed cluster. The first project to explore this in greater detail is Celerity, a distributed runtime system and API that leans heavily on SYCL in both its API design and its underlying execution engine. The validity of its design is currently being evaluated by porting two industry use cases for large-scale distributed execution as part of the LIGATE project. While Celerity is neither a true subset nor a superset of the SYCL API, experienced SYCL users will immediately recognize the familiar structure of its API.
In this talk, we will review the SYCL API from the perspective of Celerity and distributed memory programming in general. We will highlight challenges encountered and opportunities for future improvement of the SYCL API. We will begin our presentation by giving an overview of the Celerity programming model, highlighting its similarities to SYCL and introducing core additions to the API. We will showcase how a typical Celerity program is structured, and how an existing SYCL application can be converted to Celerity.
Additionally, we will give a brief overview of how Celerity itself uses SYCL internally to power its distributed execution semantics.
The main portion of this presentation will investigate important features of SYCL and how well they translate to distributed clusters. We will begin by examining core features such as the high-level data-driven APIs of queues, buffers, command groups and accessors in a distributed context. Next, we will highlight newer additions to SYCL such as host tasks and reductions. Finally, we will take a look at APIs that may be considered problematic from a distributed memory perspective, such as unified shared memory (USM).
We will conclude our presentation with an outlook on what future versions of SYCL could bring to the table to further improve compatibility with distributed memory clusters. We will review HPC use cases that may not yet be fully covered by SYCL and present several potential improvements that would enhance the experience both for us as library developers and for users of the traditional MPI + SYCL approach.
Speaker: Philip Salzmann (University of Innsbruck)
Co-Authors: Fabian Knorr and Peter Thoman (University of Innsbruck), and Biagio Cosenza (University of Salerno)