#NEXTGenIO2019
Advancements in memory technologies open the door to the development of novel domain-specific system architectures delivering not only improved performance but reduced energy consumption as well.
Gives an overview of the architecture of the NEXTGenIO prototype.
Outlines the different platform modes such as 1LM and 2LM and their use cases.
Provides an abstract overview of the basic NEXTGenIO software stack.
Shows the fundamental differences between the use of non-volatile main memory as storage and classic storage as we use it today.
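A minimal sketch of that last difference, under stated assumptions: an ordinary temporary file stands in for a file on a DAX-mounted persistent memory device, and the function names (`update_via_file_io`, `update_via_mapping`, `demo`) are hypothetical. Classic storage is reached through explicit read/write calls, while memory-style access maps the region and modifies bytes in place.

```python
import mmap
import os
import tempfile


def update_via_file_io(path, offset, payload):
    """Classic storage path: explicit seek/write calls, then fsync."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device


def update_via_mapping(path, offset, payload):
    """Memory-style path: map the file and store bytes in place."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as region:
            # Plain slice assignment; no read/write call per access.
            region[offset:offset + len(payload)] = payload
            region.flush()  # write the modified pages back to the file


def demo():
    """Apply both update styles to one file and return what was stored."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "region.bin")
        # Assumption: a zero-filled page in a temp file stands in for
        # a persistent memory region.
        with open(path, "wb") as f:
            f.write(b"\0" * mmap.PAGESIZE)
        update_via_file_io(path, 0, b"block")
        update_via_mapping(path, 16, b"bytes")
        with open(path, "rb") as f:
            data = f.read()
        return data[0:5], data[16:21]
```

On real persistent memory the mapped variant can bypass the page cache, which a regular file as used here does not do; the sketch only contrasts the two programming styles.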
Improving data-intensive workflows in modern supercomputers by any means possible continues to be a focus of both industry and academic research. While big steps have been made, most have served to move the I/O bottleneck somewhere else rather than actually solve it.
Putting NVRAM into computers, semi-shared burst buffers or even into storage servers to improve overall performance and reduce...
The SAGE storage system developed by the SAGE consortium provides a unique paradigm to store, access and process data in the realm of extreme-scale data-intensive computing. The storage system can incorporate multiple types of storage device technologies in a multi-tier I/O hierarchy, including flash, disk, and more importantly NVDIMMs.
The core software infrastructure driving the storage...
Scientific workflows are well established in parallel computing. A workflow represents a conceptual description of work items and their dependencies. Researchers can use workflows to abstract away implementation details or resources to focus on the high-level behaviour of their work items. Due to these abstractions and the complexity of scientific workflows, finding performance bottlenecks...
With an exponential growth of data, distributed storage systems have become not only the heart, but also the bottleneck of datacenters. High-latency data access, poor scalability, impracticability to manage large datasets, and lack of query capabilities are just a few examples of common hurdles. With ultra-low latency and fine-grained access to persistent storage, Intel Optane DC Persistent...
In this presentation, we will show how CFD simulations can be accelerated using DCPMM. We will show performance results from the NEXTGenIO cluster and explain the different methods employed to improve the end-to-end performance of a real-life CFD simulation.
The architecture of the NextGenIO prototype featuring Intel’s Optane DCPMMs presents both challenges and opportunities as a platform for ECMWF’s time-critical operational workflow.
This workflow generates massive amounts of I/O in short bursts, accumulating to tens of TiB in hourly windows. From this output, millions of user-defined daily products are generated and disseminated to member...
NVRAM technology used for storage and memory within compute nodes of an HPC system has the potential to bring large performance benefits and enable new functionality, especially for computing scientific workflows and processing scientific data. However, safely and efficiently utilising such technology on a large-scale, multi-user HPC system presents challenges using existing applications and...
The emergence of non-volatile memory technologies in high performance servers has resulted in multiple initiatives to exploit these new technologies for scientific computing. Traditionally, the challenges of adopting new technologies in HPC have centred on CPU technologies; however, the paradigm shift towards new ways of exploiting memory technologies presents new challenges.
As part of the NextGenIO...
Increased complexity and resolution of modeling systems and increased demand for environmental input to downstream products combine to pose significant challenges for increased volume and velocity of I/O as part of research or operational workflows, particularly when seeking to exploit distributed computing architectures. While many of the challenges faced are common in the earth system...
Earth system models (ESMs) have increased their spatial resolution to achieve more accurate solutions, producing an enormous amount of data. However, some ESMs use inefficient sequential I/O schemes that do not scale well when many parallel resources are used. This is the case for the OpenIFS model.
OpenIFS is a free and simplified version of the Integrated Forecasting System (IFS), available...
The Earth-System Data Middleware (ESDM) is a performance-aware middleware that builds upon a data model similar to NetCDF and utilises a self-describing on-disk data format for storing structured data. ESDM allows multiple (shared and local) storage systems to be employed concurrently and explicitly supports heterogeneous storage infrastructures. From the user/application perspective, the...
The I/O bottleneck is still a challenge to overcome when an HPC system is built. The NEXTGenIO project has investigated this issue over the past four years. A prototype hardware platform based around new non-volatile memory (NVRAM) technology has been designed and built. NVRAM is used to bridge the latency gap between memory and disk. In addition to the hardware, a full software stack has...