PreFail: Programmable and Efficient Failure Testing Framework
Authors:
Joshi, Pallavi
Gunawi, Haryadi
Sen, Koushik
Technical Report Identifier: EECS-2011-3
January 18, 2011
EECS-2011-3.pdf
Abstract: Hardware failures are no longer rare in the era of cloud computing, so reliability has become a first-class design goal of today's cloud software systems. To ensure that software fault tolerance "prevails" against hardware failures, cloud systems must be tested against the multiple, diverse failures that are likely to occur in the real world. Such failure testing poses several challenges, including the need to explore a large number of failure combinations and, by implication, to debug the large number of test runs that fail during testing. In this paper, we present PreFail, a programmable and efficient failure testing framework. With PreFail, a tester can express a variety of failure exploration policies, skip redundant fault-injection tests, run failure tests in parallel, and reduce the time needed to debug failed test runs.
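
Illustrative sketch (not taken from the report): the abstract's two central ideas, a tester-written exploration policy and the skipping of redundant fault-injection tests, can be pictured with a few lines of Python. Everything below is an assumption made for illustration: the failure points, the function names, the example policy, and the equivalence rule are hypothetical and do not reflect PreFail's actual interface.

```python
from itertools import combinations

# Hypothetical failure points, each a (node, I/O operation) pair.
# These are made up for illustration; they are not from the report.
FAILURE_POINTS = [
    ("node1", "write"), ("node1", "read"),
    ("node2", "write"), ("node2", "read"),
    ("node3", "write"),
]


def all_failure_combinations(points, depth):
    """Brute force: every unordered combination of `depth` failure points."""
    return list(combinations(points, depth))


def distinct_nodes_policy(combo):
    """Example policy (assumed): only inject failures on different nodes."""
    nodes = [node for node, _ in combo]
    return len(set(nodes)) == len(nodes)


def operation_signature(combo):
    """Assumed equivalence rule: combinations that fail the same
    multiset of operations are treated as redundant."""
    return tuple(sorted(op for _, op in combo))


def explore(points, depth, policy, equivalence_key):
    """Keep combinations the policy accepts, one per equivalence class."""
    seen = set()
    selected = []
    for combo in combinations(points, depth):
        if not policy(combo):
            continue  # rejected by the tester's policy
        key = equivalence_key(combo)
        if key in seen:
            continue  # redundant with a combination already selected
        seen.add(key)
        selected.append(combo)
    return selected


brute_force = all_failure_combinations(FAILURE_POINTS, 2)
pruned = explore(FAILURE_POINTS, 2, distinct_nodes_policy, operation_signature)
print(len(brute_force), "combinations without a policy")
print(len(pruned), "combinations after policy and redundancy pruning")
```

With these five made-up failure points and two failures per run, brute force yields 10 combinations, while the example policy and equivalence rule leave 3. Fewer test runs also means fewer failed runs to debug, which is the connection the abstract draws between exploration and debugging cost.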