A few months back, Hamid Pirahesh and I were doing a roundtable on cloud and data with a customer of ours. We got into a set of standard issues, data security being the primary one (more on that in a later post). But when the dialog turned to Hadoop, one person raised his hand and asked, "What has Hadoop got to do with cloud?" I responded, somewhat quickly perhaps, "Nothing specific, and I am willing to have a dialog with you on Hadoop in and out of the cloud context." But it got me thinking: is there a relationship or not? Let's look at typical cloud characteristics...
1. Spinup/Spindown. I would be very surprised if this is the model of Hadoop adoption. I see the model being very close to the current "warehouse" deployment model, wherein a set of servers is dedicated to the warehouse environment, the data are prepped and ready, and queries come and go. Similarly, we expect Hadoop deployments to be dedicated environments, where data resides (semi-)permanently and Hadoop (analytic) jobs come and go. The spinup/spindown model makes my head hurt, given the size of the data involved. So a -1 on this cloud dimension.
2. Elasticity. A job can run on 100 nodes, but if that is not sufficient, it can use 200. Hadoop makes that happen nicely, because the data might be equally accessible from those 200 nodes (thanks to internal HDFS replication). So individual jobs can be elastic. The Hadoop installation as a whole is not, but this is a good cloud characteristic, so a +1 on this dimension.
3. Flexibility of Schema. To me, this has zero to do with cloud. Just because some web jobs require it (or a SimpleDB-like NoSQL database) does not make it cloud in any way.
4. Reliability in the model of "loose coupling". Yes, Hadoop has some good characteristics that let it scale resiliently in the presence of failures. However, as people begin to bet on the data that sits in Hadoop (not just the computation), what kind of reliability will satisfy them? A web crawl can always be redone, but what would make people comfortable about a set of clickstream data sitting in Hadoop? So a 0.5 in favor of the cloud dimension.
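As a rough tally of the four scores above, here is a minimal Python sketch. The dictionary keys and variable names are purely illustrative, and reading "zero to do with cloud" for schema flexibility as a score of 0 is an assumption, since that item is not given an explicit number.

```python
# Scores per cloud characteristic, as read from the list above.
scores = {
    "spinup_spindown": -1.0,            # dedicated, warehouse-like deployments
    "elasticity": 1.0,                  # individual jobs can grow from 100 to 200 nodes
    "schema_flexibility": 0.0,          # assumption: "zero to do with cloud" scored as 0
    "loose_coupling_reliability": 0.5,  # resilient compute, open question on data
}

# Net score across all four dimensions.
total = sum(scores.values())

print(f"net cloud affinity: {total:+.1f} out of a possible {len(scores)}")
```

A small positive net out of a possible four is in line with the "nothing specific" answer given at the roundtable.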
There you have it. On further analysis, it turns out I was right when I answered the customer :) :)
Comments?
Love it! The most concise and clear explanation of a most confusing topic. For some reason everyone seems to equate Hadoop with Cloud. To me, Hadoop has always been an interesting approach to solving tough computational challenges, and Cloud Computing is a great IT infrastructure provisioning model. Cloud Computing can be an excellent way of provisioning resources for Hadoop jobs, but for some reason people tend to put an equal sign between "Hadoop" and "cloud".
Posted by: twitter.com/katsnelson | February 04, 2010 at 01:02 PM
I still don't understand... can it be explained in layman's terms?
Posted by: apoorva | October 06, 2010 at 08:41 PM
I see Hadoop (MapReduce) as a PaaS (Platform as a Service).
IMO, those who doubt that Hadoop fits the cloud scenario are actually comparing it with IaaS and then saying no, it's not.
Posted by: Ashish Ranjan | December 20, 2010 at 02:45 PM