I have had a chance to speak about Watson in several fora -- the latest being the Economist Forum on Big Data, where I was the closing act. I am incredibly proud of what IBMers have done, and I think their work has many implications as we think through the data deluge...
I showed the inner workings of Watson, and made the following points:
- Interface matters -- understanding natural language and responding in speech sometimes makes a big difference, and Watson definitely blazed a new path there. Google's innovation, in my mind, has been as much in the interface as in the algorithms behind it. As we think through Big Data, let's not forget the interface into the systems -- for asking questions, for receiving analysis. Q&A might not be the only form, but Pig scripts, MapReduce programs, etc. are definitely not it.
- Machines, big machines, big data machines, are brawn, maybe brainy brawn, not brains. Watson uses tens of thousands of watts; the computer between the ears uses 20. Sheer brawn might take megawatts; IBM's researchers were able to bring it down to tens of thousands of watts, so I call it brainy brawn. But brainy brawn is not a replacement for the brain. I like to use Intelligence Amplification (from William Ross Ashby) as an analogy, though others such as Pattie Maes use other terms like Intelligence Augmentation.
- Size is not the only thing that matters :) Watson deals with 200 million pages, wow. That small? At 10 KB a page, we are talking about 2 TB of information. I routinely work with clients that have databases 100x that size. That is why I think we have done our industry a disservice by calling it "Big Data." The name tends to make people macho about size. I like to think of it as Big Analytics, and that is why our products in this space are called BigInsights -- deep insight matters, not size.
- The world is not perfect. Data tells us different things, often conflicting. We need to weigh alternative hypotheses through evidence-based reasoning. That is what Watson excels at -- it generates hypotheses (candidate answers), and then weighs different pieces of evidence ("popularity", "earlier categories"...) to determine the most likely answer. This is the transformative aspect of Watson -- do not fixate on the one right answer; focus on a range of answers, and reason through them.
- Hadoop matters -- the backend (the "boiler room", as I called it in the room) is used to crunch through the documents to prep for the interactive Jeopardy matches. No other system is flexible enough for the kind of knowledge extraction that we need.
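To make the hypothesis-weighing point above a bit more concrete, here is a minimal Python sketch. It is not how Watson is actually implemented: the weights, the bias, the logistic combination, and the candidate names are all my own illustrative assumptions. The only things borrowed from the discussion above are the idea of scoring several candidate answers and two evidence types, "popularity" and a match against earlier categories.

```python
import math

# Hypothetical weights for each evidence type (illustrative values only,
# not taken from Watson).
WEIGHTS = {"popularity": 1.0, "category_match": 2.0}
BIAS = -1.5  # hypothetical offset so that weak evidence yields low confidence


def confidence(evidence):
    """Combine evidence scores (each in [0, 1]) into a single confidence
    in (0, 1) using a weighted sum passed through a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * score for name, score in evidence.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(candidates):
    """Return (answer, confidence) pairs, most likely answer first."""
    scored = [(answer, confidence(ev)) for answer, ev in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up candidate answers with made-up evidence scores.
candidates = {
    "Candidate A": {"popularity": 0.9, "category_match": 0.1},
    "Candidate B": {"popularity": 0.6, "category_match": 0.9},
}

ranked = rank_candidates(candidates)
for answer, conf in ranked:
    print(f"{answer}: {conf:.2f}")
```

The output is a ranked range of answers with confidences rather than a single "right" answer. In this toy example the more popular candidate loses because the category evidence carries more weight.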
I had many other interesting observations, which I will share over time. For now, my hat is off to the Economist, and to the hosts, including Vijay Vaitheeswaran, for such a well-executed event.
I will be keynoting at the Yahoo Hadoop Summit next week, and will make some of the above points, but will also talk a lot about what we are seeing with Hadoop usage in enterprises. Hope to see you there.
PS: Edited to correct some spellings. I am not talking about household appliances ("Braun"); I am talking about muscle ("brawn").