Computer Architecture and SysTems Research Lab (CASTL)

Address: 114 Milton Carothers Hall (MCH), Tallahassee, FL 32306; Contact: Dr. Weikuan Yu (yuw@cs.fsu.edu), 850-644-5442

Tadoop: A Dual-Purpose Framework for Data Analytics and HPC

This project is supported by NSF award ACI-1432892, titled "EAGER: Tadoop: A Dual-Purpose Framework Taming the Bipolarity of Storage and Communication for High-Performance Computing and Data Analytics."

Project Mission

High-performance computing (HPC) providers and applications need next-generation solutions to process the big data produced by scientific simulations. Conventional HPC systems at national laboratories and universities are built on a compute-centric paradigm, whereas enterprise big data analytics applications favor a data-centric paradigm such as MapReduce. The sharp architectural differences between these two paradigms call for unconventional approaches. Accordingly, this project takes a radically different approach: it investigates the key architectural components of the compute-centric and data-centric paradigms, designs a transformative dual-purpose framework called Tadoop that resolves their bipolarity in storage and communication management, and unifies the two paradigms to serve both HPC and enterprise analytics applications.

Research Activities

The high-risk Tadoop framework can enable a transformative data infrastructure for both HPC and data analytics applications, with broader impact in several areas: demonstrating how existing HPC infrastructures can be transformed into dual-purpose systems for computing and analytics, improving computer science curricula and instructional effectiveness, strengthening multidisciplinary data analytics research, releasing open-source software, and transferring technologies to commercial services.

  1. Y. Wang, R. Goldstone, W. Yu, T. Wang. Characterization and Optimization of Memory-Resident MapReduce on HPC Systems. 28th IEEE International Parallel and Distributed Processing Symposium (Acceptance rate: 21%). Tucson, AZ. May 2014.
  2. C. Xu*, R. Goldstone, Z. Liu*, H. Chen*, B. Neitzel, W. Yu. Exploiting Analytics Shipping with Virtualized MapReduce on HPC Backend Storage Servers. IEEE Transactions on Parallel and Distributed Systems. DOI: 10.1109/TPDS.2015.2389262.


This work is funded in part by National Science Foundation awards ACI-1432892 (while the lab was at Auburn) and ACI-1561041 (at FSU).

Get Source Code

If you would like a copy of our source code that enables virtualized analytics shipping on Lustre, please submit a request via this form. You will then receive an email with a link to the code.

