
Computer Architecture and SysTems Research Lab (CASTL)

Address: 114 Milton Carothers Hall (MCH), Tallahassee, FL 32306; Contact: Dr. Weikuan (Will) Yu (yuw@cs.fsu.edu), 850-644-5442

Tadoop: A Dual-Purpose Framework for Data Analytics and HPC

This project is supported by NSF award ACI-1432892, titled "EAGER: Tadoop: A Dual-Purpose Framework Taming the Bipolarity of Storage and Communication for High-Performance Computing and Data Analytics."

Contact: Dr. Weikuan Yu

Project Mission

High-performance computing (HPC) providers and applications need next-generation solutions to process big data from scientific simulations. Conventional HPC systems at national laboratories and universities are built on the compute-centric paradigm, while enterprise big data analytics applications prefer a data-centric paradigm such as MapReduce. The distinct architectural differences between these two paradigms demand unconventional approaches. This project takes a radically different approach: it investigates key architectural components in the compute-centric and data-centric paradigms, designs a transformative dual-purpose framework called Tadoop that addresses their bipolarity issues in storage and communication management, and unifies them for both HPC and enterprise analytics applications.

Research Activities

This high-risk Tadoop framework can enable a transformative data infrastructure for both HPC and data analytics applications and lead to broader impact in several aspects, such as demonstrating the transformation of existing HPC infrastructures into dual-purpose systems for computing and analytics, improving computer science curricula and instruction effectiveness, strengthening multidisciplinary data analytics research, releasing open-source software code, and transferring technologies for commercial service.

Research Accomplishments

People

  • FACULTY
  1. Dr. Jianhui Yue
  • STUDENTS
  1. Yandong Wang
  2. Cong Xu
  3. Zhuo Liu
  4. Fang Zhou
  5. Teng Wang
  6. Kevin Vasko
  7. Hai Pham

Publications while at FSU

  1. [SC'16] T. Wang*, K. Mohror, A. Moody, K. Sato, W. Yu. An Ephemeral Burst-Buffer File System for Scientific Applications. International Conference for High Performance Computing, Networking, Storage and Analysis. Salt Lake City, Utah. November 2016. (Acceptance rate: 18%).
  2. [MASCOTS'18] Yue Zhu, Fahim Chowdhury, Huansong Fu, Adam Moody, Kathryn Mohror, Kento Sato and Weikuan Yu. Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems. 26th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems. Milwaukee, WI, Sep 2018.
  3. [P2S2'18] W. Yu, Z. Liu, and X. Ding. Semantics-Aware Prediction for Analytic Queries in MapReduce Environment. 11th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2). Eugene, OR. August 2018.
  4. [FSU'18] Noah Nethery, Weikuan Yu. Classifying Mozart or Not-Mozart Using Deep Neural Networks with Notated Music. Undergraduate Research Symposium Poster. Florida State University. April 2018.
  5. [JCC'17] Zhuo Liu*, Bin Wang*, and W. Yu. HALO: a fast and durable disk write cache using phase change memory. Journal of Cluster Computing. 2017.
  6. [CCGrid'18] H. Fu*, M. Gorentla Venkata, Shaeke Salman*, N. Imam, and W. Yu. SHMEMGraph: Efficient and Balanced Graph Processing Using One-sided Communication. 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Washington, DC. (Acceptance rate: 21%). May 2018.
  7. [OpenSHMEM'17] Huansong Fu, Manjunath Gorentla Venkata, Neena Imam and Weikuan Yu. Portable SHMEMCache: A High-Performance Key-Value store on OpenSHMEM and MPI. Fourth workshop on OpenSHMEM and Related Technologies. Annapolis, Maryland. August 2017.
  8. [CCGrid'17] H. Fu*, M. Gorentla Venkata, A. Roy Choudhury*, N. Imam, and W. Yu. High-Performance Key-Value Store On OpenSHMEM. 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Madrid, Spain. (Acceptance rate: 23%). May 2017.
  9. [ParCo'16] Huansong Fu, Haiquan Chen, Yue Zhu and Weikuan Yu. FARMS: Efficient MapReduce Speculation for Failure Recovery in Short Jobs. Journal of Parallel Computing.
  10. [PACT'16] B. Wang*, Y. Zhu*, W. Yu. OAWS: Memory Occlusion Aware Warp Scheduling. International Conference on Parallel Architecture and Compilation Techniques (PACT 2016). September 2016. (Acceptance rate: 26%). Haifa, Israel.
  11. [TPDS'16] C. Xu*, R. Goldstone, Z. Liu*, H. Chen*, B. Neitzel, W. Yu. Exploiting Analytics Shipping with Virtualized MapReduce on HPC Backend Storage Servers. IEEE Transactions on Parallel and Distributed Systems.
  12. [DISCS'15] Huansong Fu, Yue Zhu and Weikuan Yu. A Case Study of MapReduce Speculation Mechanism for Failure Recovery. International Workshop on Data-Intensive Scalable Computing Systems (DISCS'15) in conjunction with the ACM/IEEE Supercomputing Conference. Austin, TX. Nov 2015.
  13. [DISCS'15] L. Shi, Z. Wang, W. Yu, X. Meng. Performance Evaluation and Tuning of BioPig for Genomic Analysis. The 2015 International Workshop on Data-Intensive Scalable Computing Systems (DISCS'15).

Publications while at Auburn

  1. C. Xu*, R. Goldstone, Z. Liu*, H. Chen*, B. Neitzel, W. Yu. Exploiting Analytics Shipping with Virtualized MapReduce on HPC Backend Storage Servers. IEEE Transactions on Parallel and Distributed Systems. DOI: 10.1109/TPDS.2015.2389262.
  2. [IPDPS'15] Yandong Wang, Huansong Fu and Weikuan Yu. Cracking Down MapReduce Failure Amplification through Analytics Logging and Migration. 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS'15). Hyderabad, India. May 2015.
  3. [SIGMetrics'14] J. Tan, Y. Wang, W. Yu, L. Zhang. Non-work-conserving effects in MapReduce: Diffusion Limit and Criticality. ACM SigMetrics 2014 (Acceptance rate: 17%). Austin, TX. June 2014.
  4. [IPDPS'14] Yandong Wang, Robin Goldstone, Weikuan Yu, Teng Wang. Characterization and Optimization of Memory-Resident MapReduce on HPC Systems. 28th IEEE International Parallel and Distributed Processing Symposium (Acceptance rate: 21%). Tucson, AZ. May 2014.

Acknowledgements

This work is funded in part by National Science Foundation awards ACI-1432892 while at Auburn and ACI-1561041 while at FSU.

Get Source Code

If you would like a copy of our source code that enables virtualized analytics shipping on Lustre, please file a request via this form. An email message with a link to the code will be sent to you.

