Friday, January 25, 2013

better DHT, Scribe

In the first paper Designing a DHT for low latency and high throughput, the authors give us an idea of how to build a Distributed Hash Table (DHT for short) with lower latency and higher throughput. Because of the fast growing need for distributed system to handle large amount of data, a better distributed hash table for data fetching and storing becomes more and more desperately needed. When considering dictionary-structure, couple aspects should be consider. Latency, which indicates the time it takes to process one request, and throughput, which indicates the ability to handle large amount of requests simultaneously, are definitely among the most important ones. In this paper, the author presents several ways to reduce the latency, including changing of data layout, switching from iterative look up methods to recursive ones, choosing nearby nodes as routing table entries, and integrating routing and fetching. Personally, I like the idea of switching from iterative to recursive the most, because it might eliminate half of latency by letting intermediate nodes forward the lookup immediately before the acknowledgement of previous hop. To me, it seems like a way of thinking that might help reduce latency of many other structures. Regarding higher throughput, this paper introduces Striped Transport Protocol (STP) to solve the problems of low throughput coming with traditional TCP connection by allowing nodes to put and get data directly to other nodes, instead of routing data through multiple overlay hops.

The second paper describe SCRIBE: A large-scale and decentralized application-level multicast infrastructure, along with Pastry, a p2p location and routing substrate upon which Scribe was built, which form a robust, self-organizing overlay network. Scribe let individual node create a group, individual node join group created by other nodes, and most importantly multicast messages to all members of the group with appropriate credentials. However, one property must also be noticed: Scribe specifies no particularly delivery order,  by which the optimization of multicasting process can be achieved since there is no restriction on delivery order. In detail, applications in API of Scribe include create, join, leave, and multicast. With the help of Pastry, Scribe manage group creation, joining, and multicasting. One important property of Pastry and Scribe is that both of them are fully decentralized, which means all decisions are based on local information. This property guarantee the independency of individual. This property also fit in the system well because all connections are peer-tp-peer. It is also through this model that scalability and reliability are achieved.


3 comments:

  1. I also thought that the change from iterative to recursive lookups in the first paper was pretty logical. It eliminates most of the overlap in communication between the originating node and the eventual destination node. It seems like it would take a significant number of dropped or lost packets to erase this improvement in latency. The backup of an iterative lookup after a failed recursive lookup eliminates the possibility of a decrease in reliability. The only change is in performance which seems as if it would be consistently better.

    ReplyDelete
  2. Well, there's the not evaluated issue of churn; the option may be logical for this but what about, let's say, the BitTorrent DHT?

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete