
Tadoop: A Dual-Purpose Framework for Data Analytics and HPC

This project is supported by NSF award ACI-1432892, titled "EAGER: Tadoop: A Dual-Purpose Framework Taming the Bipolarity of Storage and Communication for High-Performance Computing and Data Analytics."

Contact: Dr. Weikuan Yu

Project Mission

High-performance computing (HPC) providers and applications need next-generation solutions to process big data from scientific simulations. Conventional HPC systems found in national laboratories and universities are built on the compute-centric paradigm, whereas enterprise big data analytics applications prefer a data-centric paradigm such as MapReduce. The architectural differences between these two paradigms demand unconventional approaches. This project takes a radically different approach: it investigates key architectural components in the compute-centric and data-centric paradigms, designs a transformative dual-purpose framework called Tadoop that addresses their bipolarity in storage and communication management, and unifies them for both HPC and enterprise analytics applications.

Research Activities

The Tadoop framework is a high-risk effort that can enable a transformative data infrastructure for both HPC and data analytics applications. Its broader impacts include demonstrating the transformation of existing HPC infrastructures into dual-purpose systems for computing and analytics, improving computer science curricula and instruction effectiveness, strengthening multidisciplinary data analytics research, releasing open-source software, and transferring technologies for commercial service.

Research Accomplishments

People

  1. Dr. Jianhui Yue
  2. Yandong Wang
  3. Cong Xu
  4. Zhuo Liu
  5. Fang Zhou
  6. Teng Wang
  7. Kevin Vasko
  8. Hai Pham

Publications while at FSU

  1. [SC'16] T. Wang*, K. Mohror, A. Moody, K. Sato, W. Yu. An Ephemeral Burst-Buffer File System for Scientific Applications. International Conference for High Performance Computing, Networking, Storage and Analysis. Salt Lake City, Utah. November 2016. (Acceptance rate: 18%).
  2. [MASCOTS'18] Yue Zhu, Fahim Chowdhury, Huansong Fu, Adam Moody, Kathryn Mohror, Kento Sato and Weikuan Yu. Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems. 26th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems. Milwaukee, WI. Sep 2018.
  3. [P2S2'18] W. Yu, Z. Liu, and X. Ding. Semantics-Aware Prediction for Analytic Queries in MapReduce Environment. 11th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2). Eugene, OR. August 2018.
  4. [FSU'18] Noah Nethery, Weikuan Yu. Classifying Mozart or Not-Mozart Using Deep Neural Networks with Notated Music. Undergraduate Research Symposium Poster. Florida State University. April 2018.
  5. [JCC'17] Zhuo Liu*, Bin Wang*, and W. Yu. HALO: a fast and durable disk write cache using phase change memory. Journal of Cluster Computing. 2017.
  6. [CCGrid'18] H. Fu*, M. Gorentla Venkata, Shaeke Salman*, N. Imam, and W. Yu. SHMEMGraph: Efficient and Balanced Graph Processing Using One-sided Communication. 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Washington, DC. May 2018. (Acceptance rate: 21%).
  7. [OpenSHMEM'17] Huansong Fu, Manjunath Gorentla Venkata, Neena Imam and Weikuan Yu. Portable SHMEMCache: A High-Performance Key-Value store on OpenSHMEM and MPI. Fourth workshop on OpenSHMEM and Related Technologies. Annapolis, Maryland. August 2017.
  8. [CCGrid'17] H. Fu*, M. Gorentla Venkata, A. Roy Choudhury*, N. Imam, and W. Yu. High-Performance Key-Value Store On OpenSHMEM. 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Madrid, Spain. May 2017. (Acceptance rate: 23%).
  9. [ParCo'16] Huansong Fu, Haiquan Chen, Yue Zhu and Weikuan Yu. FARMS: Efficient MapReduce Speculation for Failure Recovery in Short Jobs. Journal of Parallel Computing.
  10. [PACT'16] B. Wang*, Y. Zhu*, W. Yu. OAWS: Memory Occlusion Aware Warp Scheduling. International Conference on Parallel Architecture and Compilation Techniques (PACT 2016). Haifa, Israel. September 2016. (Acceptance rate: 26%).
  11. [TPDS'16] C. Xu*, R. Goldstone, Z. Liu*, H. Chen*, B. Neitzel, W. Yu. Exploiting Analytics Shipping with Virtualized MapReduce on HPC Backend Storage Servers. IEEE Transactions on Parallel and Distributed Systems.
  12. [NAS'15] Fang Zhou*, Hai Pham*, Jianhui Yue*, Hao Zou* and Weikuan Yu. SFMapReduce: An Optimized MapReduce Framework for Small Files. IEEE International Conference on Network, Architecture and Storage (NAS). Boston, MA. August 2015.
  13. [DISCS'15] Huansong Fu, Yue Zhu and Weikuan Yu. A Case Study of MapReduce Speculation Mechanism for Failure Recovery. International Workshop on Data-Intensive Scalable Computing Systems (DISCS'15) in conjunction with the ACM/IEEE Supercomputing Conference. Austin, TX. Nov 2015.
  14. [DISCS'15] L. Shi, Z. Wang, W. Yu, X. Meng. Performance Evaluation and Tuning of BioPig for Genomic Analysis. The 2015 International Workshop on Data-Intensive Scalable Computing Systems (DISCS'15).

Publications while at Auburn

  1. [TPDS'16] C. Xu*, R. Goldstone, Z. Liu*, H. Chen*, B. Neitzel, W. Yu. Exploiting Analytics Shipping with Virtualized MapReduce on HPC Backend Storage Servers. IEEE Transactions on Parallel and Distributed Systems. DOI: 10.1109/TPDS.2015.2389262.
  2. [IPDPS'15] Yandong Wang, Huansong Fu and Weikuan Yu. Cracking Down MapReduce Failure Amplification through Analytics Logging and Migration. 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS'15). Hyderabad, India. May 2015.
  3. [SIGMetrics'14] J. Tan, Y. Wang, W. Yu, L. Zhang. Non-work-conserving effects in MapReduce: Diffusion Limit and Criticality. ACM SigMetrics 2014. Austin, TX. June 2014. (Acceptance rate: 17%).
  4. [IPDPS'14] Yandong Wang, Robin Goldstone, Weikuan Yu, Teng Wang. Characterization and Optimization of Memory-Resident MapReduce on HPC Systems. 28th IEEE International Parallel and Distributed Processing Symposium. Tucson, AZ. May 2014. (Acceptance rate: 21%).

Acknowledgements

This work is funded in part by National Science Foundation awards ACI-1432892 while at Auburn and ACI-1561041 while at FSU.

Get Source Code

If you would like a copy of our source code that enables virtualized analytics shipping on Lustre, please submit a request via this form. We will email you a link to the code.