Table of Contents

XooMR: Cross-Layer and Cross-Phase Cooperation in MapReduce

This project is supported by an NSF award CNS-1059376 titled as CSR: Small: XooMR: Cross-Layer and Cross-Phase Cooperation for Fair and Efficient MapReduce Cloud Computing, parallel and distributed systems.

Contact: Dr. Weikuan Yu

Project Mission

Many jobs can show up at the same time on a system with conflicting resource requirements on MapReduce systems. Such trend requires that (1) the data analytics programming models such as MapReduce be able to leverage system resources efficiently, and that (2) the schedulers on MapReduce systems be able to deliver fair services to all jobs. During the execution of concurrent jobs on a shared MapReduce system, a number of important issues can arise. First, there is a serious unfairness issue to small jobs. Because of their distinct execution behavior from short-lived map tasks, long-lasting reduce tasks typically occupy reduce slots until completion or failure. Thus reduce tasks from long jobs can block late-arriving jobs from getting reduce slots, particularly causing serious delays to small jobs. To the users on a shared cluster, this means that small jobs can be delayed for a long time depending on the current load of the cluster. Second, there is a periodic idleness during the execution of reduce tasks. This causes a degradation of overall system efficiency. Third, there is a lack of cooperation between different execution phases of MapReduce, different software layers of data management, and the scheduling of tasks among different jobs.

This project investigates cross-layer cooperation techniques to achieve system efficiency, cross-phase techniques to enhance job fairness and system throughput, and cross-job task co-scheduling techniques to exploit the temporal relationship among jobs for better throughput and services to analytic queries composed of multiple MapReduce jobs.

Research Activities

Cross-Task Coordination

Hadoop is a widely adopted open source implementation of MapReduce programming model for big data processing. It represents system resources as available map and reduce slots and assigns them to various tasks. This execution model gives little regard to the need of cross-task coordination on the use of shared system resources on a compute node, which results in task interference. In addition, the existing Hadoop merge algorithm can cause excessive I/O.

We have undertaken an effort to examine the lack of task coordination and its impact to the efficiency of data management in MapReduce programs. With an extensive analysis of Hadoop MapReduce framework, particularly cross-task interactions, we reveal that Hadoop programs face two main performance-critical issues to exploit the best capacity of system resources. Both issues, namely task interference and excessive I/O, can be attributed to the lack of task coordination. The former can cause prolonged execution time for MapTasks and ReduceTasks; the latter can cause dramatic degradation of disk I/O bandwidth. These problems prevent the system resources on a compute node from being effectively utilized, constraining the efficiency of MapReduce programs.

Based on these findings, we have designed a cross-task coordination framework called CooMR for efficient data management in MapReduce programs. CooMR is designed with three new techniques to enable close coordination among tasks while not complicating the task execution model of Hadoop. These techniques are Cross-taskOpportunisticMemory Sharing (COMS), LOg-structured I/O ConSolidation (LOCS), and Key-based In-Situ Merge (KISM). Our evaluation demonstrates that CooMR is able to increase task coordination, improve system resource utilization, and significantly speed up the execution time of MapReduce programs

Preemptive Reduce Task Scheduling

Hadoop MapReduce adopts a two-phase (map and reduce) scheme to schedule tasks among data-intensive applications. However, under this scheme, Hadoop schedulers do not work effectively for both phases. We reveal that there exists a serious fairness issue among jobs of different sizes, leading to prolonged execution for small jobs, which are starving for reduce slots held by large jobs.

To support many users and jobs (large batch jobs and small interactive queries), Hadoop MapReduce adopts a two-phase (map and reduce) scheme to schedule tasks for data-intensive applications. The Hadoop Fair Scheduler (HFS) [4] and Hadoop Capacity Scheduler (HCS) [3] have focused on fairness among MapTasks. These schedulers strive to maximize the use of system capacity and ensure fairness among different jobs. However, they do not work effectively for both phases. What complicates the matter is the distinct execution behaviors of MapTasks and ReduceTasks. Unlike MapTasks which are launched one group after the other to process data splits, ReduceTasks have a different execution pattern. Once a ReduceTask is launched, it occupies the reduce slot until completion or failure.

To address this fairness issue and ensure fast completion for jobs of various sizes, we design a combination of two techniques: the Preemptive ReduceTask mechanism and the Fair Completion Scheduler. Preemptive ReduceTask is a solution to correct the monopolizing behavior of long ReduceTasks. By enabling a lightweight working-conserving option to preempt ReduceTasks, Preemptive ReduceTask offers a mechanism to dynamically change the allocation of reduce slots. On top of this preemptive mechanism, the Fair Completion Scheduler is designed to allocate and balance the reduce slots among jobs of different sizes.

People

  1. Dr. Hui Chen
  1. Yandong Wang
  2. Cong Xu
  3. Xiaobing Li
  4. Zhuo Liu
  5. Cong Xu
  6. Hai Pham
  7. Xinning Wang
  8. Fang Zhou
  9. Huangsong Fu

Publications

  1. J. Tan, Y. Wang, W. Yu, L. Zhang. Non-work-conserving effects in MapReduce: Diffusion Limit and Criticality. ACM SigMetrics 2014 (Acceptance rate: 17%). Austin, TX. June 2014.
  2. Yandong Wang, Robin Goldstone, Weikuan Yu, Teng Wang. Characterization and Optimization of Memory-Resident MapReduce on HPC Systems. 28th IEEE International Parallel and Distributed Processing Symposium. Tucson, AZ. May 2014.
  3. Weikuan Yu, Yandong Wang, Xinyu Que, Cong Xu. Virtual Shuffling for Efficient Data Movement in MapReduce. IEEE Transactions on Computers.
  4. Xiaobing Li, Yandong Wang, Yizheng Jiao, Cong Xu. Weikuan Yu. Cross-Task Coordination for Efficient Data Management in MapReduce Programs. Li and Wang contributed equally to the paper. International Conference for High performance Computing Networking, Storage and Analysis. Denver, CO. November 2013.
  5. Yandong Wang, Jian Tan, Weikuan Yu, Li Zhang, Xiaoqiao Meng. Preemptive ReduceTask Scheduling for Fair and Fast Job Completion. 10th International Conference on Autonomic Computing (ICAC'13). (Acceptance Rate: 22%). June 2013.

Broader Imapcts

This project has profound impacts in several aspects. These include (1) strengthening computer science courses at Auburn University, and enhancing instruction effectiveness with student research projects on MapReduce; (2) recruiting and cultivating students of diverse backgrounds, particularly under-represented minority and female student groups for careers in computing; (3) disseminating research results as publications, presentations, conference tutorials and demonstrations, releasing open-source software codes, and eventually pushing them for integration into the official Hadoop code base; and (4) collaborating with industry, strengthening research partnerships, and cultivating opportunities for technology transfer to industry.

Acknowledgements

This work is funded in part by National Science Foundation award CNS-1320016, and by an Alabama Innovation Award.

Get Source Code

Due to transition of the project from Auburn to FSU, the code release is delayed until our project recovers with more participants and contributions.