Table of Contents

XooMR: Cross-Layer and Cross-Phase Cooperation in MapReduce

This project is supported by an NSF award CNS-1320016 and CNS-1564647 titled as CSR: Small: XooMR: Cross-Layer and Cross-Phase Cooperation for Fair and Efficient MapReduce Cloud Computing, parallel and distributed systems.

Contact: Dr. Weikuan Yu

Project Mission

Many jobs can show up at the same time on a system with conflicting resource requirements on MapReduce systems. Such trend requires that (1) the data analytics programming models such as MapReduce be able to leverage system resources efficiently, and that (2) the schedulers on MapReduce systems be able to deliver fair services to all jobs. During the execution of concurrent jobs on a shared MapReduce system, a number of important issues can arise. First, there is a serious unfairness issue to small jobs. Because of their distinct execution behavior from short-lived map tasks, long-lasting reduce tasks typically occupy reduce slots until completion or failure. Thus reduce tasks from long jobs can block late-arriving jobs from getting reduce slots, particularly causing serious delays to small jobs. To the users on a shared cluster, this means that small jobs can be delayed for a long time depending on the current load of the cluster. Second, there is a periodic idleness during the execution of reduce tasks. This causes a degradation of overall system efficiency. Third, there is a lack of cooperation between different execution phases of MapReduce, different software layers of data management, and the scheduling of tasks among different jobs.

This project investigates cross-layer cooperation techniques to achieve system efficiency, cross-phase techniques to enhance job fairness and system throughput, and cross-job task co-scheduling techniques to exploit the temporal relationship among jobs for better throughput and services to analytic queries composed of multiple MapReduce jobs.

Research Activities

Cross-Task Coordination

Hadoop is a widely adopted open source implementation of MapReduce programming model for big data processing. It represents system resources as available map and reduce slots and assigns them to various tasks. This execution model gives little regard to the need of cross-task coordination on the use of shared system resources on a compute node, which results in task interference. In addition, the existing Hadoop merge algorithm can cause excessive I/O.

We have undertaken an effort to examine the lack of task coordination and its impact to the efficiency of data management in MapReduce programs. With an extensive analysis of Hadoop MapReduce framework, particularly cross-task interactions, we reveal that Hadoop programs face two main performance-critical issues to exploit the best capacity of system resources. Both issues, namely task interference and excessive I/O, can be attributed to the lack of task coordination. The former can cause prolonged execution time for MapTasks and ReduceTasks; the latter can cause dramatic degradation of disk I/O bandwidth. These problems prevent the system resources on a compute node from being effectively utilized, constraining the efficiency of MapReduce programs.

Based on these findings, we have designed a cross-task coordination framework called CooMR for efficient data management in MapReduce programs. CooMR is designed with three new techniques to enable close coordination among tasks while not complicating the task execution model of Hadoop. These techniques are Cross-taskOpportunisticMemory Sharing (COMS), LOg-structured I/O ConSolidation (LOCS), and Key-based In-Situ Merge (KISM). Our evaluation demonstrates that CooMR is able to increase task coordination, improve system resource utilization, and significantly speed up the execution time of MapReduce programs

Preemptive Reduce Task Scheduling

Hadoop MapReduce adopts a two-phase (map and reduce) scheme to schedule tasks among data-intensive applications. However, under this scheme, Hadoop schedulers do not work effectively for both phases. We reveal that there exists a serious fairness issue among jobs of different sizes, leading to prolonged execution for small jobs, which are starving for reduce slots held by large jobs.

To support many users and jobs (large batch jobs and small interactive queries), Hadoop MapReduce adopts a two-phase (map and reduce) scheme to schedule tasks for data-intensive applications. The Hadoop Fair Scheduler (HFS) [4] and Hadoop Capacity Scheduler (HCS) [3] have focused on fairness among MapTasks. These schedulers strive to maximize the use of system capacity and ensure fairness among different jobs. However, they do not work effectively for both phases. What complicates the matter is the distinct execution behaviors of MapTasks and ReduceTasks. Unlike MapTasks which are launched one group after the other to process data splits, ReduceTasks have a different execution pattern. Once a ReduceTask is launched, it occupies the reduce slot until completion or failure.

To address this fairness issue and ensure fast completion for jobs of various sizes, we design a combination of two techniques: the Preemptive ReduceTask mechanism and the Fair Completion Scheduler. Preemptive ReduceTask is a solution to correct the monopolizing behavior of long ReduceTasks. By enabling a lightweight working-conserving option to preempt ReduceTasks, Preemptive ReduceTask offers a mechanism to dynamically change the allocation of reduce slots. On top of this preemptive mechanism, the Fair Completion Scheduler is designed to allocate and balance the reduce slots among jobs of different sizes.

People

  1. Dr. Hui Chen
  1. Yandong Wang
  2. Cong Xu
  3. Xiaobing Li
  4. Zhuo Liu
  5. Cong Xu
  6. Hai Pham
  7. Xinning Wang
  8. Fang Zhou
  9. Huangsong Fu
  1. Huangsong Fu
  2. Yue Zhu
  3. Muhib Khan
  4. Amit Nath
  5. Ahad Alam

Publications while at FSU

  1. [MASCOTS’18] Yue Zhu, Fahim Chowdhury, Huansong Fu, Adam Moody, Kathryn Mohror, Kento Sato and Weikuan Yu. Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems. 26th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems. Milwaukee, WI, Sep 2018.
  2. [P2S2'18] W. Yu, Z. Liu, and X. Ding. Semantics-Aware Prediction for Analytic Queries in MapReduce Environment. 11th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2). Eugene, OR. August 2018.
  3. [FSU'18] Noah Nethery, Weikuan Yu. Classifying Mozart or Not-Mozart Using Deep Neural Networks with Notated Music. Undergraduate Research Symposium Poster. Florida State University. April 2018.
  4. [JCC'17] Zhuo Liu*, Bin Wang*, and W. Yu. HALO: a fast and durable disk write cache using phase change memory. Journal of Cluster Computing. 2017.
  5. [CCGrid'18] H. Fu*, M. Gorentla Venkata, Shaeke Salman*, N. Imam, and W. Yu. SHMEMGraph: Efficient and Balanced Graph Processing Using One-sided Communication. 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Washington, DC. (Acceptance rate: 21%). May 2018.
  6. [OpenSHMEM'17] Huansong Fu, Manjunath Gorentla Venkata, Neena Imam and Weikuan Yu. Portable SHMEMCache: A High-Performance Key-Value store on OpenSHMEM and MPI. Fourth workshop on OpenSHMEM and Related Technologies. Annapolis, Maryland. August 2017.
  7. [CCGrid'17] H. Fu*, M. Gorentla Venkata, A. Roy Choudhury*, N. Imam, and W. Yu. High-Performance Key-Value Store On OpenSHMEM. 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Madrid, Spain. (Acceptance rate: 23%). May 2017.
  8. [ParCo'16] Huansong Fu, Haiquan Chen, Yue Zhu and Weikuan Yu. FARMS: Efficient MapReduce Speculation for Failure Recovery in Short Jobs. Journal of Parallel Computing.
  9. [PACT'16] B. Wang*, Y. Zhu*, W. Yu. OAWS: Memory Occlusion Aware Warp Scheduling. International Conference on Parallel Architecture and Compilation Techniques (PACT 2016). September 2016. (Acceptance rate: 26%). Haifa, Israel.
  10. [TPDS'16] C. Xu*, R. Goldstone, Z. Liu*, H. Chen*, B. Neitzel, W. Yu. Exploiting Analytics Shipping with Virtualized MapReduce on HPC Backend Storage Servers. IEEE Transactions on Parallel and Distributed Systems.
  11. [DISCS'15] Huansong Fu, Yue Zhu and Weikuan Yu. A Case Study of MapReduce Speculation Mechanism for Failure Recovery. International Workshop on Data-Intensive Scalable Computing Systems (DISCS'15) in conjunction with the ACM/IEEE Supercomputing Conference. Austin, TX. Nov 2015.
  12. [DISCS'15]: L. Shi, Z. Wang, W. Yu, X. Meng. Performance Evaluation and Tuning of BioPig for Genomic Analysis. The 2015 International Workshop on Data-Intensive Scalable Computing Systems (DISCS'15). Paper.

Publications while at Auburn

  1. [MASCOTS'15]: X. Wang*, B. Wang*, Z. Liu*, W. Yu. Preserving Row Buffer Locality for PCM Wear-Leveling Under Massive Parallelism. 23rd International Conference on Modeling, Analysis and Simulation of Computer and Telecommunication Systems. October 2015, Atlanta, GA.
  2. [Cluster'15]: T. Wang*, H.S. Oral, H. Pritchard*, B. Wang* and W. Yu. TRIO: Burst Buffer Based I/O Orchestration. IEEE International Conference on Cluster Computing. September 2015, Chicago, IL.
  3. [ICS'15]: B. Wang*, W. Yu, X.H. Sun, X. Wang. DaCache: Memory Divergence-Aware GPU Cache Management. 29th International Conference on Supercomputing, June 2015, Newport Beach, CA.
  4. [DATE'15]: B. Wang*, Z. Liu*, X. Wang*, and W. Yu. Eliminating Intra-Warp Conflict Misses in GPU. The 18th Conference on Design Automation and Test in Europe. (Long paper. Acceptance rate: 22.4%). Grenoble, Fr. March 2015.
  5. [IPDPS'15] Yandong Wang, Huansong Fu and Weikuan Yu. Cracking Down MapReduce Failure Amplification through Analytics Logging and Migration. 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS'15). Hyderabad, India. May 2015.
  6. [SIGMetrics'14] J. Tan, Y. Wang, W. Yu, L. Zhang. Non-work-conserving effects in MapReduce: Diffusion Limit and Criticality. ACM SigMetrics 2014 (Acceptance rate: 17%). Austin, TX. June 2014.
  7. [BigData'14]: T. Wang*, S. Oral, Y. Wang*, B. Settlemyer, S. Atchley, W. Yu. BurstMem: A High-Performance Burst Buffer System for Scientific applications. 2014 IEEE Conference on Big Data (Acceptance rate: 18.5%). Washington, DC. October 2014. Paper.
  8. [IPDPS'14] Yandong Wang, Robin Goldstone, Weikuan Yu, Teng Wang. Characterization and Optimization of Memory-Resident MapReduce on HPC Systems. 28th IEEE International Parallel and Distributed Processing Symposium. Tucson, AZ. May 2014.
  9. [TC'15] Weikuan Yu, Yandong Wang, Xinyu Que, Cong Xu. Virtual Shuffling for Efficient Data Movement in MapReduce. IEEE Transactions on Computers.
  10. [SC'13] Xiaobing Li, Yandong Wang, Yizheng Jiao, Cong Xu. Weikuan Yu. Cross-Task Coordination for Efficient Data Management in MapReduce Programs. Li and Wang contributed equally to the paper. International Conference for High performance Computing Networking, Storage and Analysis. Denver, CO. November 2013.

Broader Imapcts

This project has profound impacts in several aspects. These include (1) strengthening computer science courses at Auburn University, and enhancing instruction effectiveness with student research projects on MapReduce; (2) recruiting and cultivating students of diverse backgrounds, particularly under-represented minority and female student groups for careers in computing; (3) disseminating research results as publications, presentations, conference tutorials and demonstrations, releasing open-source software codes, and eventually pushing them for integration into the official Hadoop code base; and (4) collaborating with industry, strengthening research partnerships, and cultivating opportunities for technology transfer to industry.

Acknowledgements

This work is funded in part by National Science Foundation awards CNS-1320016 and CNS-1564647, and by an Alabama Innovation Award.

Get Source Code

Due to transition of the project from Auburn to FSU, the code release is delayed until our project recovers with more participants and contributions.