First, to all my fans out there (hello, is anyone really there, or have you all given up?): OK, OK, I get it; I cannot call myself a blogger and be absent for two months. Sorry! But Mr. Regularity is back on the scene (I know, the term "regularity" has been hijacked by some over-the-counter medicine folks, but so be it).
So what's been keeping me busy? Clouds and mashups, my twin passions. I will try to alternate between the two in my posts this year. Let me begin with the cloud.
I want to run a hypothesis by you all. I see two kinds of workloads in the cloud: a large number of small problems (for example, salesforce.com, where every query or transaction comes with a tenant-id attached) or a small number of large problems (for example, Google with its Bigtable usage). We sometimes get overexcited about the latter: infinite scalability, thousands of nodes, a computer science student's dream. But as salesforce.com has shown, one can make a handy billion dollars by efficiently solving a large number of small problems too. The nice thing about a large number of small problems is that one does not need to manage anything, anywhere up or down the stack, as one large server, one large storage system or one large database. The right "scale-out" model can be built independently at each layer of the stack, which gives one a lot of flexibility. That is why salesforce.com can run on Oracle, whereas Google is custom from top to bottom.
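To make the "tenant-id attached" point concrete, here is a minimal sketch, assuming a hypothetical router and shard names of my own invention (this is not how salesforce.com actually does it). Because every request names exactly one tenant, the router only has to pick one small shard per request; no single giant database is needed.

```python
import hashlib
from typing import Dict, List


class TenantRouter:
    """Hypothetical router: sends each request to a shard chosen by its tenant-id.

    Since every request names exactly one tenant, no shard needs to see another
    tenant's data, so shards can be added independently (scale-out).
    """

    def __init__(self, shards: List[str]) -> None:
        if not shards:
            raise ValueError("need at least one shard")
        self.shards = list(shards)

    def shard_for(self, tenant_id: str) -> str:
        # Stable hash: the same tenant-id always maps to the same shard.
        digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def route(self, request: Dict[str, str]) -> str:
        # Assumes each request carries a "tenant_id" field (hypothetical name).
        return self.shard_for(request["tenant_id"])


if __name__ == "__main__":
    router = TenantRouter(["db-0", "db-1", "db-2"])  # hypothetical shard names
    for tenant in ["acme", "globex", "initech"]:
        print(tenant, "->", router.route({"tenant_id": tenant, "sql": "SELECT 1"}))
```

Simple modulo hashing reshuffles most tenants whenever the shard count changes; a real system would more likely use consistent hashing or an explicit tenant-to-shard directory.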
If we think this way, we can immediately estimate the infrastructure need, which is (size of the problem) × (number of concurrent problems). For Google, this clearly translates to several hundred thousand nodes. One reason Google's problems are large is the sheer volume of data (on the order of petabytes); another is that the computation needed for analytics is elastic (the more, the better), so it can easily reach ~1,000 nodes per problem.
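The estimate is just a product, but a tiny sketch shows how differently the two workload types can come out. All input numbers below are hypothetical, chosen only for illustration; they are not measured figures for Google or salesforce.com.

```python
def infrastructure_need(problem_size: float, concurrent_problems: int) -> float:
    """Total infrastructure (in nodes) = size of one problem x number of concurrent problems."""
    return problem_size * concurrent_problems


# Hypothetical "few large problems": ~1,000 nodes per problem, 300 at once.
large = infrastructure_need(1000, 300)

# Hypothetical "many small problems": each tenant needs a quarter of a node,
# with 20,000 tenants active at once.
small = infrastructure_need(0.25, 20000)

print(large, small)
```

The point is the shape of the argument, not the numbers: a huge count of concurrent problems can still add up to modest infrastructure if each problem is small.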
For salesforce.com, I suspect the total infrastructure need is considerably smaller. Now you might say that salesforce.com is about transactional apps, and transactional apps are not very "infrastructure" intensive. Whatever the merits of that argument, I find this way of looking at cloud workloads quite worthwhile.