Sunday, June 06, 2010

Steve Huffman talks about lessons learned at Reddit

A very interesting talk, and helpful to anyone who wants to build scalable distributed systems.

Video: http://vimeo.com/10506751
Transcript: http://carsonified.com/blog/dev/steve-huffman-on-lessons-learned-at-reddit/

One interesting lesson is lesson 3: open schema.
They combined the relational data model with a key-value store. The relational data model is powerful (in some sense) and can naturally represent real-world data. However, as the data set grows, it seems that relational databases cannot scale up easily. As a result, many large companies have developed their own storage systems, such as Amazon's Dynamo, Google's BigTable, and Facebook's Cassandra. Some of these use a key-value model, which seems to scale better than the relational model. With an open schema, changing the schema does NOT incur much overhead, because attributes are stored as key-value pairs rather than as fixed columns, so different rows/entities can have different numbers of columns/attributes.
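To make the open-schema idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not Reddit's actual code: the table names (entity, attribute) and the helper functions are assumptions made up for illustration. It only shows the general pattern of keeping a small fixed table per entity and storing everything else as key-value rows.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the two tables of a hypothetical open-schema store."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS entity (
            id   INTEGER PRIMARY KEY,
            kind TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS attribute (
            entity_id INTEGER NOT NULL REFERENCES entity(id),
            key       TEXT NOT NULL,
            value     TEXT,
            PRIMARY KEY (entity_id, key)
        );
    """)
    return conn


def set_attrs(conn, entity_id, **attrs):
    """Add or overwrite attributes; values are stored as text in this sketch."""
    conn.executemany(
        "INSERT OR REPLACE INTO attribute (entity_id, key, value) "
        "VALUES (?, ?, ?)",
        [(entity_id, key, str(value)) for key, value in attrs.items()],
    )
    conn.commit()


def create_entity(conn, kind, **attrs):
    """Insert one entity row plus one attribute row per keyword argument."""
    cur = conn.execute("INSERT INTO entity (kind) VALUES (?)", (kind,))
    entity_id = cur.lastrowid
    set_attrs(conn, entity_id, **attrs)
    return entity_id


def load_entity(conn, entity_id):
    """Return all attributes of an entity as a dict."""
    rows = conn.execute(
        "SELECT key, value FROM attribute WHERE entity_id = ?",
        (entity_id,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = open_store()
    link = create_entity(conn, "link", title="Hello", url="http://example.com")
    user = create_entity(conn, "user", name="alice")
    # A new attribute is just a new row; no ALTER TABLE is needed.
    set_attrs(conn, user, karma=10)
    print(load_entity(conn, link))
    print(load_entity(conn, user))
```

The trade-off in a layout like this is that every value is stored as text and the database no longer enforces column types or constraints, so that checking has to move into application code.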

Another post, on caching in web apps: http://www.mysqlperformanceblog.com/2010/05/19/beyond-great-cache-hit-ratio/