NOTE: I didn't write this summary. I'm just posting it for Jake Sherin, because he doesn't have access to the blog at the moment.
D3S is a checker that takes predicates specified by a developer and checks them while a system is running. The goal is to automate much of what would normally be a manual debugging process, while allowing run-time checking to scale to large systems and making the checking itself fault tolerant. D3S addresses three main challenges. The first is that the developer must be able to specify the properties to check easily, while the checking must be able to use multiple machines so that large systems can be checked at run-time. To address this, checkers are organized into a directed acyclic graph. Each vertex is a computational stage, and the checkers at each stage are written as sequential C++ functions. The checkers output state tuples that flow to the downstream vertices in the graph. A vertex can be mapped to several verifier processes that run its checkers in parallel, which lets D3S use multiple machines to scale run-time checking. The second challenge is that D3S must tolerate failures of the checking machines. To do this, D3S monitors verifier processes, so that when one fails, a new one can be started with the input of the failed process. The third challenge is that D3S should handle failures of the processes being checked. This is solved by removing failed processes from globally consistent snapshots before checking them. A logical clock is used to order the exposed state tuples, and D3S knows which processes belong to the snapshot at each timestamp.
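To make the vertex/checker structure concrete, here is a rough sketch of what a single checker stage might look like, assuming a hypothetical lock-service invariant (no lock held in exclusive mode by two different processes in the same snapshot); the tuple types and function names are illustrative, not the actual D3S API:

    // Hypothetical sketch of one checker stage: a sequential C++ function that
    // consumes the state tuples exposed at one timestamp and emits tuples
    // describing violations to the downstream vertex.
    #include <map>
    #include <string>
    #include <vector>

    struct LockTuple {        // state exposed by each process: (lock id, holder, mode)
        std::string lockId;
        int         holder;
        bool        exclusive;
    };

    struct Violation {        // tuple passed downstream when the invariant breaks
        std::string lockId;
        int         firstHolder;
        int         secondHolder;
    };

    // Invariant (assumed for illustration): no lock may be held in exclusive
    // mode by two different processes in the same snapshot.
    std::vector<Violation> CheckExclusiveLocks(const std::vector<LockTuple>& snapshot) {
        std::map<std::string, int> exclusiveHolder;
        std::vector<Violation> out;
        for (const LockTuple& t : snapshot) {
            if (!t.exclusive) continue;
            auto it = exclusiveHolder.find(t.lockId);
            if (it == exclusiveHolder.end()) {
                exclusiveHolder[t.lockId] = t.holder;
            } else if (it->second != t.holder) {
                out.push_back({t.lockId, it->second, t.holder});
            }
        }
        return out;
    }

In the real system, a function like this would be assigned to one vertex of the graph and fanned out across several verifier processes.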
D3S allows a developer to write predicates, which D3S compiles into a state exposer and a checking logic module. These can be inserted while the system is running, although violations that occurred before the insertion may be missed. In addition, the D3S run-time can partition the checking computation across multiple machines without much effort from the developer. D3S also supports stream processing, in which vertices transmit only the difference in their output relative to the previous timestamp and the state is checked incrementally. This is useful because consecutive timestamps often differ only slightly in state. Sampling is supported as well, to help developers further reduce the overhead.
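As a rough illustration of the incremental idea, the sketch below assumes a hypothetical delta format in which a vertex sends only the tuples added or removed since the last timestamp, and the checker keeps a running view instead of re-reading a full snapshot; none of these names come from D3S itself:

    // Sketch of incremental checking over deltas (assumed format, not D3S's).
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    struct Delta {
        std::string lockId;
        int         holder;
        bool        added;   // true = tuple appeared, false = tuple disappeared
    };

    class IncrementalLockChecker {
    public:
        // Apply one timestamp's worth of deltas, then re-evaluate the invariant
        // only for the locks that actually changed.
        std::vector<std::string> OnTimestamp(const std::vector<Delta>& deltas) {
            std::set<std::string> touched;
            for (const Delta& d : deltas) {
                auto& holders = holders_[d.lockId];
                if (d.added) holders.insert(d.holder);
                else         holders.erase(d.holder);
                touched.insert(d.lockId);
            }
            std::vector<std::string> violations;
            for (const std::string& lock : touched)
                if (holders_[lock].size() > 1)   // more than one exclusive holder
                    violations.push_back(lock);
            return violations;
        }
    private:
        std::map<std::string, std::set<int>> holders_;  // lock -> current exclusive holders
    };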
Global snapshots are integral to D3S. A global timestamp is obtained using a logical clock: when a process sends a message, it attaches its logical clock, and the receiving process sets its own clock to the maximum of its local clock and the attached one. This lets the D3S run-time preserve happens-before relationships, order all tuples in a consistent total order, and construct snapshots.
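The clock rule described above is essentially a Lamport logical clock; a minimal sketch of that rule (not the D3S run-time's actual implementation) might look like this:

    // Minimal Lamport-style logical clock implementing the send/receive rule
    // described above.
    #include <algorithm>
    #include <cstdint>

    class LogicalClock {
    public:
        // Called before exposing a state tuple or sending a message:
        // advance the local clock and stamp the outgoing data with it.
        uint64_t Tick() { return ++time_; }

        // Called when a message arrives carrying the sender's clock:
        // jump forward so the receive is ordered after the send,
        // preserving the happens-before relationship.
        uint64_t OnReceive(uint64_t senderClock) {
            time_ = std::max(time_, senderClock) + 1;
            return time_;
        }

        uint64_t Now() const { return time_; }

    private:
        uint64_t time_ = 0;
    };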
A predicate is formally modeled as a function defined over a finite number of consecutive snapshots, where a snapshot at time t is the set of states of all live processes at time t. To verify a predicate correctly, though, D3S needs a complete snapshot, which it obtains by collecting the state of every process that constitutes the system at that timestamp.
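For illustration only (these data structures are assumptions, not the paper's), a verifier might decide that the snapshot at timestamp t is complete by waiting until state has arrived from every process known to be alive at t:

    // Sketch of a completeness check before evaluating a predicate.
    #include <map>
    #include <set>
    #include <vector>

    struct ProcessState { int pid; std::vector<int> tuples; };

    using Snapshot = std::map<int, ProcessState>;   // pid -> state at one timestamp

    bool IsComplete(const Snapshot& snap, const std::set<int>& liveAtT) {
        for (int pid : liveAtT)
            if (snap.find(pid) == snap.end())
                return false;                       // still waiting on this process
        return true;
    }

    // A predicate is then a function over a window of consecutive complete snapshots.
    using Predicate = bool (*)(const std::vector<Snapshot>& window);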
D3S was applied to five different systems to check its effectiveness: PacificA, a semi-structured distributed system; the MPS Paxos implementation, a Paxos-based replicated state machine service; a web search engine; the i3 Chord implementation; and a BitTorrent client. The results were broadly the same: D3S detected bugs in a shorter period of time than it would have taken developers otherwise, which shows that D3S is useful in a wide array of situations. The only caveat is that the results are only as useful as the predicates, so it is up to the developer to write them effectively.
According to the data, D3S's performance varies with the number of threads, but even in the worst case the overhead stays just below 8%. The main point the data stresses is that the number of threads is vital to performance. Related work is discussed and compared with D3S; the other approaches are replay-based predicate checking, online monitoring, log analysis, tools for large-scale parallel applications, and model checking. The authors see replay as an essential complementary technology to online predicate checking: ultimately, the goal would be to use online checking to catch violations and then enable time-travel debugging of a recent history with bounded-time replay. Future work focuses on combining online predicate checking with offline replay, as well as exploring ways for a developer to easily find bugs in data center applications that are composed of many systems.
Debugging Deployed Distributed Systems (D3S) shows impressive results for debugging distributed systems efficiently. It addresses several important challenges, and its solutions are reflected in a strong evaluation: at most 8% slowdown, and mostly around 1%, across a range of circumstances (although this seems a bit vague, since the worst case could exceed 8% slowdown with 512/1024-byte packets at a frequency of 1000). However, the evaluation does not seem very realistic; under real large-scale distributed conditions, we cannot be sure whether it works well. Moreover, their effort to decrease overhead also diminishes the information that is useful for debugging, though that is undoubtedly a trade-off. To be more useful, D3S needs improvement in this area.
This concern is brought up in the paper: at the end of Section 5, the authors address concerns about realistic large-scale systems. This particular problem really epitomizes the difficulty of working with distributed systems: it is hard to pinpoint where errors are occurring when an application involves many systems, each of which is potentially changing constantly.
While the paper moves us one step closer to debugging large-scale systems, it also highlights the difficulties of doing so.
One of the things that I always find interesting is the number of monitoring nodes that need to be added for things like D3S, Codons, or Consensus Routing to work. With all of the issues being discovered in distributed systems, it's very possible that the number of monitoring systems needed could put a serious damper on productivity, because of all the bandwidth and data flow necessary to keep the system in good shape. This is less of a specific comment and more of a general observation, but it's something I've definitely seen in all of these papers.
I definitely agree. The test cases they run in most of these papers are far from realistic simulations, but they just wave that away with the assumption that their system "scales well". It would be nice to see better testing and more reliable results.
That's true, but I think the key thing to observe is that in the absence of such systems, an even bigger damper could be put on productivity: failure. So sure, the monitoring may cost you 2% (probably more) of the system's capacity, but a system with 2% less capacity is better than one that's broken.
If you think about the nature of a distributed system, their relative weight makes sense: if you want to provide a service (be it monitoring, DNS, whatever) to a large group, you can't really expect to do that in a way that's not as scalable as the system itself. So the only way you could ever hope to monitor a distributed system is... well, with a distributed system. And you're right, one naturally runs into the same problems we've been seeing the whole time (trying to understand system state, distribute information, etc.). But it's not clear to me that there is any alternative.
I think the thing to compare/look at here is the "baseline cost": what would the overhead be if you have something like D3S but don't use it? Whatever else you add to improve reliability comes at a cost, and here they try to give you an idea of that.
I had a problem with the testing methodology in this paper, since the authors used both (1) a very small network and (2) uniform machines. If they are trying to simulate a realistic scenario for, say, BitTorrent clients, then a mix of machines, bandwidths, networks, and users is necessary for a truly accurate simulation.
Beyond that, the idea of the monitoring system seems interesting, but I couldn't get past the distracting (and prevalent) spelling and grammatical mistakes. The paper seemed very unprofessional and unreliable as a result.
I really wish they had given examples of the rules they wrote for the five systems tested. It seems like you have to know a lot about what's happening under the hood in these systems, so this tool would be ideal for a developer of a single system. I just wonder if it could be used in a deployment where several distributed systems interact with each other.
True, but that's the case w/ any assertion you write, right?
Although I agree that there were some spelling/grammar issues, I think it's hard to discredit someone who built such a complicated system based solely on their English skills. This is a rather unfortunate consequence of the fact that we require research to be done in English, but don't do much to help authors with poor English skills. Some conferences have English review systems to help authors, but apparently this paper did not get that type of help.
As for their testing, it is unfortunate that they used similar machines, but they did say that they injected errors into the system to account for the similarities. Overall, it would not have been difficult to use varied types of machines to test their system, which does show some carelessness in their research.
Gotta loving the present participle - best verb form!