This short post is written for the paper Practical Byzantine Fault Tolerance by
Miguel Castro and Barbara Liskov from MIT. Before we read this paper, a very important terminology should be mentioned and understood. Byzantine fault: Previously, when we mention fault tolerance, we mean resilience to node failure, or non-responding. However, in real world, we also need to protect the system from malicious action, and mis-sent information or instruction. These problems are called Byzantine fault. Therefore byzantine fault tolerance means the mechanism to protect the system from these types of situations. The paper is named Practical because all papers related to byzantine fault tolerance system before are mostly devoted to the theoretical part of this topic, and they assumes a synchronous environment. While this paper tackles the topic in a practical point of view with an algorithm to solve this problem in asynchronous system settings.
Algorithm works as follows:
1. a client sends a request to invoke a service operation to the primary
2. the primary multi-casts the request to the backups
3. replicas execute the request and send a reply to the client
4. the client waits for one replies from different replicas with the same result;
this is the result of the operation
The algorithm guarantees liveness and safety because all non-faulty replicas agree on a total order for the execution of requests despite failures.
It's important to point out that this paper was written in 1999 (which I didn't realize until it started describing the test machines). I feel like this may have been related to the lack of discussion of scalability in this system. I think this would be the factor that it would be limited by in a modern distributed system.
ReplyDeleteAs Yaonan said, this is the first paper which exhibits Byzantine fault tolerance in a real system. Before this, people have only published theoretical work exhibiting it. Data on the replication groups which tolerated many faults is the only thing found missing in this paper. It exhibits clearly the superiority and robustness of the Byzantine Fault Tolerant File System over NFS without requiring any syncing between nodes.
ReplyDeleteThe work is one of the pioneering works in the area of Distributed Systems and one of the major reasons behind Prof. Barbara Liskov's late but well-deserved Turing Award in 2008
This is a minor point in the article, but it seemed very interesting
ReplyDeleteto me. Since this paper focuses on practical Byzantine tolerance, the
author notes takes the assumptions of the model seriously,
specifically, that Byzantine node failures (i.e., malicious behavior)
are independent and then give practical advice as to ensuring this.
Some strategies that they give (different operating systems, different
administrators) seem inconvenient but still practical but the one that
stood out to me was that different nodes should actually be using
different implementations of the protocol. I can see how this might
work in some very open systems (bittorrent I believe has many commonly
used implementations) but the sheer amount of extra work involved in
maintaining even two different implementations of one protocol seems
highly impractical in any real-world distributed systems.
Finally it seems to me that having multiple implementations might just
multiply the chances that any given implementation has a vulnerability
which would probably be a bigger problem than having independent
failures (unless you can somehow manage this uncertainty by ensuring
that at most floor((n-1)/2) nodes possibly contain vulnerabilities!)
Good paper indeed.
ReplyDeleteTwo things are unclear for me though;
1. What would happen if faulty replicas interfere in new primary selection during view-change protocol?
2. How is it possible that there is only %3 communication(performance) overhead, compared to a regular NFS? (this could be due to public-key signature of a regular NFS that I don't know.)
Regarding your first point, isn't the primary replica of a certain view statically known by some kind of round-robin like formula? That's why when timers expire in view v, and the view-change message is being circulated, the new-view message is multicasted by the primary of the new view (v+1) as soon as it receives >= 2f view-change messages for view v+1
DeleteIt seems like the also neither implement or test the view changes. Which to me would have been the most interesting part of the benchmarks...
DeleteAn interesting part of this paper is its creative algorithm pioneered Practical Byzantine fault tolerance as other students mentioned. That is why this paper was cited nearly 1000 times. Algorithm of Practical Byzantine fault tolerance uses 3-phases of pre-prepare, prepare, and commit. Even though both pre-prepare and prepare message are used to totally order requests sent in the same view even when the primary is faulty, the pre-prepare messages make this algorithm distinguished itself by giving itself an ability to cope with view change dealing with backup.
ReplyDeleteThis work also has several other interesting parts but one of the most important thing is optimizations for reducing communication. By these optimizations, it could guarantee good performance comparing to previous non-practical Byzantine fault tolerance algorithms.