NOTE: I didn't write this summary. I'm just posting it for Jake Sherin, because he doesn't have access to the blog at the moment.
D3S is a checker that uses predicates,
specified by a developer, and checks these predicates while a system
is running. The goal is to automate a lot of what would normally be a
manual debugging process. In addition, it allows run-time checking to
scale to large systems and makes the checking fault tolerant. D3S
addresses three main challenges. The first is that the developer must
be able to easily point out what properties to check but allow the
checking to use multiple machines so large systems can be checked at
run-time. To address this, checkers can be organized in a
directed-acyclic graph. Each vertex is a computational stage, and
sequential C++ functions can be written for checkers in this stage.
State tuples are outputted by the checkers and flow to the downstream
vertices in the graph. A vertex can be mapped to several verifier
processes that run the checkers in parallel, so that D3S can use
multiple machines to scale run-time checking. The second challenge is
that D3S must be able to deal with failures of checking machines. To
do this, D3S monitors verifier processes, so that when one fails, a
new one can be started with the input of the failed process. The
third challenge is that D3S should handle failures of processes being
checked. This is solved by removing the failed processes from
globally-consistent snapshots before checking them. A logical clock
is used to order the exposed state tuples, and there is knowledge of
which processes are in the snapshot at each timestamp.
D3S allows for a developer to write
predicates,which are compiled by D3S for a state exposer and a
checking logic module. These can be inserted while the system is
running, but violations that occurred prior to the insertion may be
missed. In addition, D3S run-time can partition computation across
multiple machines, without much effort from the developer. D3S
supports stream processing, where vertices only transmit difference
in their output compared to the last timestamp, and check the state
incrementally. This is useful because consecutive timestamps may only
have a slight difference in state. Sampling is also supported by D3S
in order to help developers further reduce the overhead.
Global snapshots are integral to D3S.
First, a global timestamp is acquired, using a logical clock. When a
process sends a message, it attaches its logical clock as well. A
receiving process sets its logical clock to the maximum of its local
logical clock and the attached clock, so that D3S run-time preserves
happens-before relationships, can order all tuples in a consistent
total order, and can construct snapshots.
A predicate is technically modeled as
a function defined over a finite number of consecutive snapshots. A
consecutive snapshot is a set of all snapshots of live processes at
time t. D3S needs a complete
snapshot, though, in order to verify a predicate correctly. This is
obtained by getting the state of all processes that constitute the
system at a timestamp.
D3S
was implemented on five different systems to check its effectiveness.
It was implemented on PacificA, a semi-structured distributed system,
on MPS Paxos Implementation, a Paxos-based replicated state machine
service, on a web search engine, on the i3
Chord implementation, and on a BitTorrent client. They all basically
have the same result, in that D3S detected bugs in a shorter period
of time than it would have taken developers otherwise. However, it
does show how D3S has a wide array of useful situations. The only
issue is that the predicates must be useful in order to get useful
results, so it is up to the developer to create those effectively.
D3S's
performance, according the data, varies depending on the number of
threads, but even in worst case it has just below 8% overhead. The
main thing that the data seems to stress is that the number of
threads is vital to performance. Related work is discussed and
compared with D3S. The other methods are replay-based predicate
checking, online monitoring, log analysis, large-scale parallel
applications, and model checking. They do mention that they see
replay as an essential complimentary technology to online predicate
checking. Ultimately, the goal would be to use online checking to
catch violations, then enable time-travel debug of a recent history
with bounded time replay. Future work focus is on combining online
predicate checking and offline replay, as well as exploring ways for
a developer to easily find bugs in data center applications that are
composed of many systems.