Comments on: Scalable Web Applications

By: Rizky Rukmana

Rizky Rukmana — Tue, 01 Dec 2009 04:17:58 +0000

Hi,

This is an interesting post.
Anyway, what is Application Level Partitioning? Is it database sharding?

tip 7: Use a shared storage system like NFS/SAN.
tip 8: Separate your static files, or serve them from a CDN.
tip 9: Do database archiving.
tip 10: Separate the OLTP database from OLAP usage.

By: Nick

Nick — Thu, 25 Jun 2009 18:42:31 +0000

Well there you have it, from the main man himself. Thank you so much for the clarifications Eli!

By: EliW

EliW — Thu, 25 Jun 2009 18:36:44 +0000

Hey Nick! Thanks for this writeup, it’s a great summary of my talk. A few things I want to clarify:

The references to ‘Facebook’ in this writeup are about caching discrete units of data. My example there was actually Digg.com (I used Facebook in an example about partitioning). In this case it’s using memcached, and the point is to avoid caching large sets of data that will have duplication in them (such as caching all the stories dugg by each of a user’s 100 friends as a whole unit). It’s about breaking that down into reusable pieces: separately putting in memcached the ‘stories dugg’ by each user. Then when someone logs in, you just request each of their friends’ information. Yes, you end up making 100 memcached queries in that case, but that’s still better than making 1 DB query. And by making the units small enough, you’ve gained other efficiencies as discussed in this article (when the second user logs on, you get to read even less from the database).
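A minimal sketch of that per-user caching idea, assuming a plain Python dict in place of memcached and a fake database loader. Every name here (`cache`, `load_dugg_from_db`, `dugg_by`, `friends_feed`) is hypothetical and only illustrates the shape of the approach:

```python
# Sketch only: a dict stands in for memcached, and all names are hypothetical.
cache = {}
db_reads = []  # records which users had to be fetched from the "database"


def load_dugg_from_db(user_id):
    """Pretend database query for the stories one user has dugg."""
    db_reads.append(user_id)
    return [f"story-{user_id}-{n}" for n in range(2)]


def dugg_by(user_id):
    """Cache one small unit per user, keyed by that user's id."""
    key = f"dugg:{user_id}"
    if key not in cache:
        cache[key] = load_dugg_from_db(user_id)
    return cache[key]


def friends_feed(friend_ids):
    """Assemble a feed from the per-user units, one cache lookup per friend."""
    return {fid: dugg_by(fid) for fid in friend_ids}


friends_feed([1, 2, 3])  # first user logs in: friends 1, 2 and 3 are loaded
friends_feed([2, 3, 4])  # second user shares friends 2 and 3: only 4 is loaded
```

Because the cached unit is one user's stories and not one viewer's whole feed, the second call reuses the entries for friends 2 and 3 and only touches the database for friend 4.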

On a second point: the example code you give above, Nick, for doing master/slave writing is good basic example code, but it should be clarified that it’s not a complete solution. Part of a complete solution would involve handling failover, for example: if one slave fails to connect, just try another one, gaining you not only scalability but uptime.

Also, as your code there is written, it always makes a connection to both the master and a slave. It would be preferable to make those connections on the fly. After all, that way, if you have a page that never needs a ‘master’ connection in the first place, you don’t waste resources connecting to it.
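The two points above (connect on demand, and fall back to another slave on failure) could be sketched roughly as follows. This is not the code from the writeup: `ConnectionPool`, `fake_connect`, and the host names are hypothetical, and `connect` is assumed to be any callable that takes a host and either returns a connection or raises `ConnectionError`.

```python
import random


class ConnectionPool:
    """Sketch only: opens the master or a slave connection on first use."""

    def __init__(self, master, slaves, connect):
        self.master_host = master
        self.slave_hosts = list(slaves)
        self.connect = connect  # hypothetical callable: host -> connection
        self._master = None
        self._slave = None

    def writer(self):
        # Connect to the master only when a write is actually requested.
        if self._master is None:
            self._master = self.connect(self.master_host)
        return self._master

    def reader(self):
        # Try the slaves in random order; move on if one refuses the connection.
        if self._slave is None:
            for host in random.sample(self.slave_hosts, len(self.slave_hosts)):
                try:
                    self._slave = self.connect(host)
                    break
                except ConnectionError:
                    continue
            else:
                raise ConnectionError("no slave reachable")
        return self._slave


def fake_connect(host):
    """Stand-in for a real driver call; pretends that slave-1 is down."""
    if host == "slave-1":
        raise ConnectionError(host)
    return f"conn:{host}"


pool = ConnectionPool("master", ["slave-1", "slave-2"], fake_connect)
read_conn = pool.reader()  # falls through to slave-2; the master is untouched
```

A read-only page in this sketch never opens a master connection at all, and a single dead slave costs one failed attempt, not a failed page.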

And in response to claylenhart: You are correct. For ‘small to medium to large-ish’ tables, just making indices is perfectly valid and was already assumed. What Nick’s writeup here doesn’t quite convey is that these were ‘steps’ I was going through, in order. Vertical and horizontal partitioning should only be broached well beyond the realm of issues you are talking about: where you have billions of rows (and potentially wide rows), and where you are literally seeing your indices fail to keep up. These are extreme measures at the outer limits of scalability.

Thanks again Nick!

By: » Zend Framework 1.8 Workshop at the Dutch PHP Conference 09 Programming the new world: Programming your life and the net, one day at a time

Thu, 25 Jun 2009 08:51:47 +0000

[…] every single model but only the DAOs. If you want to know more about scalability I will be doing an entry on this based on the incredible presentation done by Eli White, PHP Community Manager & […]

By: Amit

Amit — Thu, 25 Jun 2009 07:25:39 +0000

Your own examples and good closing notes made it a good read, even though it’s mostly a summary of the “Habits of Highly Scalable Web Applications” slides by Eli White. Thanks for sharing your thoughts.

By: claylenhart

claylenhart — Wed, 24 Jun 2009 20:52:53 +0000

I want to discourage the two partitioning tips, 3 and 4.

For vertical partitioning, a good alternative is covering indexes that would include the frequently used columns such as id, nickname, password and firstname. The database engine will just read the columns from the index and will not read the table, effectively giving vertical partitioning without the effort.
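A small way to see a covering index at work, using SQLite through Python's `sqlite3` module as a stand-in (the comment above does not say which database engine is meant, and other engines report their plans differently). The table layout, the `biography` column, and the index name are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT, password TEXT,"
    " firstname TEXT, biography TEXT)"
)
# The index holds every column the query below needs. In SQLite the integer
# primary key is stored in each index entry, so id is covered as well.
conn.execute(
    "CREATE INDEX idx_users_login ON users (nickname, password, firstname)"
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, password, firstname FROM users WHERE nickname = ?",
    ("clay",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column is the description
print(plan_text)
```

If the plan mentions a covering index, the query is answered from the index alone and the wide `biography` column is never read, which is the effect vertical partitioning is after.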

For horizontal partitioning, indexes are not as bad as you might think. For 8 million rows, the DB engine will only need to read about 4 pages of data to find a single record (it depends on the width of the clustered index). Splitting the table into multiple 1 million row tables will, at best, reduce this by one, to 3 pages, but it could still be 4 pages. In the best case, when you only need to examine one table, it will be better. But when you need to examine multiple tables, you’ll likely join these partitioned tables, and the query will likely look for each record in all the partitioned tables, making the query much slower.
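The page counts above can be checked with a back-of-the-envelope model. This sketch assumes an idealized, completely full B-tree where each page holds a fixed number of keys; the fan-out values 60 and 100 are illustrative guesses, since the real figure depends on key width and page size:

```python
def btree_levels(rows, keys_per_page):
    """Page reads needed to reach one row in an idealized, full B-tree."""
    levels, capacity = 0, 1
    while capacity < rows:
        capacity *= keys_per_page
        levels += 1
    return levels


for fanout in (60, 100):
    print(fanout, btree_levels(8_000_000, fanout), btree_levels(1_000_000, fanout))
# prints:
# 60 4 4
# 100 4 3
```

With 100 keys per page, going from 8 million rows to 1 million saves one page read (4 down to 3); with 60 keys per page it saves nothing, because both sizes need 4 levels.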

By: zerone

zerone — Wed, 24 Jun 2009 16:39:41 +0000

Hey, nice article!

Are these Facebook data units files? Plain files? Or a bunch of serialized data?

Or a file per user with the top online friends (for the chat, for example)? Can you elaborate a bit more? Thanks.