Computer Architecture and SysTems Research Lab (CASTL)

Address: 114 Milton Carothers Hall (MCH), Tallahassee, FL 32306; Contact: Dr. Weikuan (Will) Yu (yuw@cs.fsu.edu), 850-644-5442

DirecMR: Reconciling the Dichotomy of MapReduce for Efficient Speculation and Resilience

This project is supported by NSF award CCF-1744336, titled DirecMR: Reconciling the Dichotomy of MapReduce for Efficient Speculation and Resilience.

Contact: Dr. Weikuan Yu

Project Mission

MapReduce systems are highly capable of processing large amounts of data and have become a research target for governmental, academic, and industrial organizations. However, their task management and fault handling policies do not recognize a tacit dichotomy between their two inherent phases (map and reduce). This results in a number of critical issues, such as resource underutilization, prolonged task execution, myopic speculation, and failure amplification.

This project adopts a transformative combination of theoretical analysis, simulation and modeling, and systems design and implementation to reconcile the dichotomy of MapReduce. The techniques from this project are potentially impactful to all organizations that deploy MapReduce systems and support Big Data applications in business analytics, social networks, and scientific computing research. Instead of relying on empirical analysis of system behaviors to pinpoint resource management and task scheduling abnormalities, this project takes a different perspective on MapReduce efficiency and resilience. It formulates a Markov chain for the state transitions of Hadoop MapReduce containers and a fork-join model for the queueing of map and reduce tasks. These formulations facilitate a theoretical analysis of the dichotomy of MapReduce and help shed light on its impact on the asymptotic behavior of large-scale workloads.
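To give a feel for the fork-join view of map and reduce tasks mentioned above, here is a minimal sketch in Python. It is not the project's model: the function names, the greedy slot assignment, the strict barrier between the two phases, and the task durations are all assumptions chosen only for illustration. It shows how a single slow map task delays the join and therefore the whole job.

```python
import heapq


def phase_makespan(task_times, slots):
    """Finish time of one phase (hypothetical helper).

    Assumption: tasks are assigned in order to whichever slot frees up first.
    """
    if not task_times:
        return 0.0
    free_at = [0.0] * slots  # time at which each slot becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for duration in task_times:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


def fork_join_job_time(map_times, reduce_times, slots):
    """Job time under a strict barrier (assumption): the reduce phase
    starts only after every map task has finished (the join)."""
    return phase_makespan(map_times, slots) + phase_makespan(reduce_times, slots)


# Illustrative durations only, not measured data.
balanced = fork_join_job_time([4, 4, 4, 4], [3, 3], slots=2)
straggler = fork_join_job_time([4, 4, 4, 12], [3, 3], slots=2)
print(balanced)   # 11.0
print(straggler)  # 19.0
```

In this toy setting, one map task that takes three times as long raises the job time from 11 to 19 time units, because no reduce task can start until the slowest map task has joined.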

This project aims to blend simulation and real system development. It addresses the myopic speculation caused by the dichotomy, broadens the scope of task speculation, and ensures task resilience without failure amplification. These techniques are developed to enhance MapReduce platforms such as YARN and Spark. Beyond MapReduce systems, the research from this project addresses a general issue in distributed analytics environments.


  1. Huansong Fu
  2. Yue Zhu
  3. Ahana Roy Choudhury
  4. Muhib Khan
  5. Amit Nath


  1. [MASCOTS’18] Yue Zhu, Fahim Chowdhury, Huansong Fu, Adam Moody, Kathryn Mohror, Kento Sato and Weikuan Yu. Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems. 26th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems. Milwaukee, WI, Sep 2018.
  2. [CCGrid'18] H. Fu*, M. Gorentla Venkata, Shaeke Salman*, N. Imam, and W. Yu. SHMEMGraph: Efficient and Balanced Graph Processing Using One-sided Communication. 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Washington, DC. (Acceptance rate: 21%). May 2018.
  3. [arXiv'19] H. Fu*, Y. Zhu*, Amit Kumar Nath*, Md. Muhib Khan* and Weikuan Yu. Enhancing MapReduce Fault Recovery Through Binocular Speculation. Published at arXiv.org (https://arxiv.org/pdf/1901.07715.pdf) in January 2019.

Broader Impacts

This project has broader impacts in several aspects. These include (1) enhancing computer science curricula at Illinois Institute of Technology and Florida State University, and improving instruction effectiveness with hands-on student projects on big data analytics, task management, speculation, and fault handling; (2) educating, recruiting, and cultivating students of diverse backgrounds, particularly under-represented minority and female student groups, for careers in computer science; and (3) releasing software from this project as open source and integrating the code into the official MapReduce releases. Our research addresses a general issue in distributed analytics environments and is applicable to other MapReduce systems such as Spark and Flink. The experience and lessons learned through this research will benefit future systems development for big data applications.


This work is funded in part by National Science Foundation award CCF-1744336.
