Computer Architecture and SysTems Research Lab (CASTL)

Address: 114 Milton Carothers Hall (MCH), Tallahassee, FL 32306; Contact: Dr. Weikuan Yu (yuw@cs.fsu.edu), 850-644-5442

High-Performance Burst Buffer Systems

Contact: Dr. Weikuan Yu

BurstFS: User-Level Burst Buffer File System

Burst buffers are becoming an indispensable hardware resource on large-scale supercomputers to buffer the bursty I/O from scientific applications. However, there is a lack of software support for burst buffers to be efficiently shared by applications within a batch-submitted job and recycled across different batch jobs. In addition, burst buffers need to cope with a variety of challenging I/O patterns from data-intensive scientific applications. In this study, we have designed an ephemeral Burst Buffer File System (BurstFS) that supports scalable and efficient aggregation of I/O bandwidth from burst buffers while having the same life cycle as a batch-submitted job. BurstFS features several techniques including scalable metadata indexing, co-located I/O delegation, and server-side read clustering and pipelining. Through extensive tuning and analysis, we have validated that BurstFS has accomplished our design objectives, with linear scalability in terms of aggregated I/O bandwidth for parallel writes and reads.

BurstMem: a Memcached-based Remote Shared Burst Buffer System

The growth of computing power on large-scale systems requires commensurate high-bandwidth I/O systems. Many parallel file systems are designed to provide fast sustainable I/O in response to applications’ soaring requirements. To meet this need, a novel system is imperative to temporarily buffer the bursty I/O and gradually flush datasets to long-term parallel file systems. In this paper, we introduce the design of BurstMem, a high-performance burst buffer system. BurstMem provides a storage framework with efficient storage and communication management strategies. Our experiments demonstrate that BurstMem is able to speed up the I/O performance of scientific applications by up to 8.5× on leadership computer systems.
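
The buffer-then-flush idea described above can be illustrated with a minimal Python sketch. It is not BurstMem's implementation: the class name `BurstBuffer`, its methods, and the plain dictionary standing in for the parallel file system are all assumptions made for illustration, and the Memcached-based storage and communication layers are omitted.

```python
from collections import deque


class BurstBuffer:
    """Hypothetical in-memory burst buffer (illustration only)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.pending = deque()  # (key, data) chunks in arrival order
        self.pfs = {}           # stand-in for the long-term parallel file system

    def write(self, key, data):
        """Absorb a write; return False if the buffer has no room for it."""
        if self.used + len(data) > self.capacity:
            return False
        self.pending.append((key, data))
        self.used += len(data)
        return True

    def flush(self, budget_bytes):
        """Drain whole chunks, oldest first, up to budget_bytes; return bytes flushed."""
        flushed = 0
        while self.pending and flushed + len(self.pending[0][1]) <= budget_bytes:
            key, data = self.pending.popleft()
            self.pfs[key] = data
            self.used -= len(data)
            flushed += len(data)
        return flushed


# Usage: absorb a burst, then drain part of it to free space.
bb = BurstBuffer(capacity_bytes=10)
bb.write("a", b"1234")
bb.write("b", b"5678")
bb.flush(budget_bytes=4)  # moves chunk "a" to the file-system stand-in
```

Calling `flush` with a small budget models the gradual draining: the application sees fast writes while space is reclaimed in the background.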

MetaKV: A Specialized Key-Value Store for Distributed Burst Buffer Systems

Distributed burst buffers are a promising storage architecture for handling I/O workloads for exascale computing. Their aggregate storage bandwidth grows linearly with system node count. However, although scientific applications can achieve scalable write bandwidth by having each process write to its node-local burst buffer, metadata challenges remain formidable, especially for files shared across many processes. This is due to the global index that needs to be created to organize the distributed file segments in the shared file and that needs to be accessed before any process can read the file segments. Because this global index can be accessed concurrently by thousands or more processes in a scientific application, the scalability of metadata management is a severe performance limiting factor.
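
To make the role of the global index concrete, here is a minimal Python sketch of how segments of a shared file, written to node-local burst buffers, might be recorded and later resolved for a read. The names `Segment`, `GlobalIndex`, `insert`, and `lookup` are hypothetical, the index lives in a single process, and segments are assumed not to overlap; this is not MetaKV's design.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    offset: int        # logical offset within the shared file
    length: int        # number of bytes in the segment
    node: int          # node whose local burst buffer holds the bytes
    local_offset: int  # position of the bytes inside that burst buffer


class GlobalIndex:
    """Hypothetical single-process index (illustration only)."""

    def __init__(self):
        self._segments = {}  # file path -> segments sorted by logical offset

    def insert(self, path, seg):
        """Record a segment, keeping the per-file list sorted by offset."""
        segs = self._segments.setdefault(path, [])
        starts = [s.offset for s in segs]
        segs.insert(bisect_right(starts, seg.offset), seg)

    def lookup(self, path, offset, length):
        """Return (node, local_offset, length) pieces covering the requested range."""
        end = offset + length
        pieces = []
        for s in self._segments.get(path, []):
            lo = max(offset, s.offset)
            hi = min(end, s.offset + s.length)
            if lo < hi:  # the segment overlaps the requested range
                pieces.append((s.node, s.local_offset + (lo - s.offset), hi - lo))
        return pieces


# Usage: two nodes write interleaved segments of one shared file.
index = GlobalIndex()
index.insert("/ckpt", Segment(offset=0, length=100, node=0, local_offset=0))
index.insert("/ckpt", Segment(offset=100, length=100, node=1, local_offset=0))
index.insert("/ckpt", Segment(offset=200, length=100, node=0, local_offset=100))
pieces = index.lookup("/ckpt", offset=50, length=200)
```

A single read of 200 bytes resolves to three pieces on two nodes, which shows why every reader must consult the index first and why concurrent lookups by thousands of processes stress metadata management.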

We have proposed MetaKV, a key-value store that provides fast, scalable metadata management for HPC workloads on distributed burst buffers. MetaKV complements the functionality of an existing key-value store with specialized metadata services that efficiently handle bursty and concurrent metadata workloads: compressed storage management, supervised block clustering, and log-ring based collective message reduction. Our experiments demonstrate that MetaKV outperforms state-of-the-art key-value stores by a significant margin: it improves put and get metadata operations by as much as 2.66× and 6.29×, respectively, and its benefits grow as metadata workload demand increases.

People

  • FACULTY
  • STUDENTS
  1. Teng Wang
  2. Yue Zhu

Publications

  1. T. Wang*, A. Moody, Y. Zhu*, K. Mohror, K. Sato, T. Islam, and W. Yu. MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffers. 31st IEEE International Parallel and Distributed Processing Symposium. Orlando, FL. (Acceptance rate: 22%). May 2017.
  2. T. Wang*, K. Mohror, A. Moody, K. Sato, and W. Yu. An Ephemeral Burst-Buffer File System for Scientific Applications. International Conference for High Performance Computing, Networking, Storage and Analysis. Salt Lake City, UT. (Acceptance rate: 18%). November 2016.
  3. T. Wang*, S. Oral, Y. Wang*, B. Settlemyer, S. Atchley, and W. Yu. BurstMem: A High-Performance Burst Buffer System for Scientific Applications. 2014 IEEE Conference on Big Data. Washington, DC. (Acceptance rate: 18.5%). October 2014.

Acknowledgements

This work is funded in part by Lawrence Livermore National Laboratory and Florida State University.

