EECS 345 2013 Distributed Systems: Spanner

Sunday, February 24, 2013

Spanner

Spanner: Google's Globally-Distributed Database

Spanner is a scalable, multi-version, globally-distributed, synchronously-replicated database, and the paper discusses parts of its design in varying levels of depth.

There is a lot in this paper, so I'll just try to hit the highs and lows, starting with the good parts. The TrueTime API solves a big problem when trying to order events in a distributed system, and as they mention several times, without it the central design of Spanner would be impossible. TrueTime is a sub-system (?) used by Spanner that provides a global time for the current moment along with a confidence interval based on the worst-case clock skew. The nodes providing these services rely on either system level atomic clocks, which have a known skew rate and must be synchronized periodically, or GPS clocks, having a greater accuracy but are susceptible communication failures. Given the interval nature this system provides, there are times when Spanner will have to wait when resolving two conflicting requests, or when a transaction has been committed, but this tends to be a short interval on the order of ~10ms, and two transactions are very unlikely to touch the same rows in the database, just given the scale of the system itself. TrueTime is a cool system, and one I hope they open source.

Spanner itself provides a semi-relational model to the client. This is really cool, typically distributed databases (in my limited experiences reading about them) provide only key-value store, and pushed the burden on the application to do any more advanced lookups. They use a trick involving the primary keys of a row to map to the other values, and use interleaving of table data in directories to allow certain features of a relational database while still remaining distributed.

I wished this paper had listed more metrics about its performance, and compare an application that uses spanner versus Bigtable or Megastore, both of which were mentioned a few times in the paper. Since this system was primary designed to be used with Google's F1 advertising backend, it would have been nice to see some quantities on throughput in the new system versus the old (they have the new numbers), as well as application level complexities that might have been resolved and system level upkeep. Seeing numbers on latencies on the user-end of an application when leaders are failing or a data-center get knocked offline would have also been interesting. I don't doubt that Google has the numbers on these scenarios, I just wonder why they included the numbers they did.

16 comments:

UnknownFebruary 24, 2013 at 5:54 PM
I thought the argument for the TrueTime API was pretty convincing. The more certain you are about how uncertain you are about your data, the more likely you are to react appropriately..
ReplyDelete
Replies
UnknownFebruary 24, 2013 at 8:16 PM
It's interesting to see how Google combines various technologies (like Colossus, Paxos, etc) to gain performance or usability benefits for each platform they create. They created BigTable to move away from a relational model, but have now moved back to that with Spanner. Apparently a replacement for Colossus is also in the works, that aims to attain much better latencies and fault-tolerance, so it will be cool to see how the next iteration of these products works.

Regarding TrueTime, while it would be kind of cool to see how it is implemented, it requires a good bit of specialized hardware, so I'm not sure how much effect open-sourcing it would have.

I agree that more evaluation results would have been nice to see. The paper talks about latency and throughput experiments that were carried out, but does not present this data.
ReplyDelete
Replies
AnonymousFebruary 24, 2013 at 9:07 PM
A directory is said to be the smallest granularity whose geographic replication properties can be application-specified. A point that I did not quite understand was about the sharding of extremely large directories into fragments, that can possibly be served from different Paxos groups. Does this not imply that, in essence, fragments should be the smallest granularity that should be application-configurable? Or is it that even though directories are split into fragments (and possibly served from different groups), the configuration properties are applied at the fragment level?
ReplyDelete
Replies
Besim AvciFebruary 24, 2013 at 9:23 PM
This paper initiates a new era in distributed databases in my opinion. That being said, the usage of atomic clocks and GPS receivers really push this work forward. In addition, having lock-free reads is one of the keys to reduce latency. Also, the scalability of this system is really marvelous.

One thing I was looking for in the evaluation section is that how Spooner perform against BigTable and likes in a small scale. Also, although it seems very implementable and easy to comprehend, I cannot find a very good reason why this kind of work came out this late.
ReplyDelete
Replies
JosiahFebruary 24, 2013 at 9:48 PM
The most surprising aspect of this paper, to me, was how bad things were in F1 before Google started looking into Spanner. They mention that manually re-sharding data took two years to fully complete the last time it was performed, coupled with other complexities. Google always seems to be a company on top of the latest technologies, so I was very surprised to read how bad things had gotten.

On the other hand, migration to a completely new backend technology is never easy, especially at the scale of the largest technological companies in the world. In this way, the Spanner system serves almost as a warning, and a lesson, to theorize solutions to big data well in advance of the necessary timeframe.

All-in-all, Spanner is an impressive feat of technology and theory. Great read.
ReplyDelete
Replies
UnknownFebruary 24, 2013 at 11:04 PM
I believe what stands out in this paper is the TrueTime API. By making Distributed Systems with stronger time semantics (almost near perfect), this is just what the Systems community were looking for (This was just like making something which was only probable before, near certain now).

I do not see the war between the Spanner (NewSQL) and BigTable (NoSQL) war ending soon especially as most NoSQL-ites do not believe that NewSQL would work for all large Databases though they seem to agree that it works for Google. I believe though that Google puts up a strong point strengthened by the fact that even Facebook and other OSNs are looking for a Spanner-like System.

I also was surprised like Besim to see this work coming up so late, and agree with Josiah that things seemed really bad before Spanner. This works for Google especially because the nature of the queries are by nature simple.

This needs to be seen how it scales up to more complex queries. Data coming from CDNs based on OSNs and Recommendation-Systems would generate more complex queries. Also, I see even search getting more complex in near future thanks to "dreams" like search based on data (mp3, jpegs and more) than mere text.
ReplyDelete
Replies
AnonymousFebruary 24, 2013 at 11:10 PM
The biggest innovation for Google spanner is its consistent global timestamp, TrueTime API. The philosophy is, the uncertainty is quantified and can be made small (with atomic clocks and GPS).
Google is migrating their database to Spanner, it will be interesting if Spanner is commercialized. The need for database system with high scalability and consistency has never been satisfied, NoSQL cannot fully replace traditional RDBMS. Google Spanner is an incredible innovation that provides unbelievable scalability and transaction support.
ReplyDelete
Replies
Jin (Byungjin Jun)February 25, 2013 at 12:18 AM
Google has been doing awesome works and I guess most of computer guys are agree with it. Google spanner is cool as well as their other works. A huge defect of CAP theory in NoSQL has been pointed out; none of its three features, capability, availability and partition tolerance, cannot accomplished simultaneously. However, Google just provides groundwork to overcome it in this paper as introducing TrueTime which is the key work of their research. Since NoSQL was introduced most recently, in 2009, though the terminology was used 1998 first, I think it's not late but quite fast that Google introduced spanner last year.
ReplyDelete
Replies
UnknownFebruary 25, 2013 at 7:58 AM
It definitely seems like the TrueTime API makes the biggest impression, and is incredibly effective. The strengthening of time semantics is something that the DS community can benefit from, but Google's TrueTime does seem to be almost too good, and so I'm a little skeptical that in practice it is as clean as they make it appear.
ReplyDelete
Replies
AlayaSEOOctober 14, 2020 at 3:19 PM
Took me time to read all the comments, but I really enjoyed the article. It proved to be Very helpful to me and I am sure to all the commenters here! It’s always nice when you can not only be informed, but also entertained! spanner set India
ReplyDelete
Replies

Add comment