On May 11th, I had the privilege to be part of a small gathering of analysts, partners and customers to discuss IBM's POV on Big Data. The gathering was at IBM's TJ Watson Research Center, where the Jeopardy event had been staged.
We talked about the fact that while volume (bigness) is important, there are two other V's that are equally, if not more, important: variety (dealing with all forms of semi-structured and unstructured data, or as Curt Monash told me, "poly-structured" data) and velocity (speed of analytics, not just batch). In the context of the former, we announced our Apache Hadoop-based IBM BigInsights Basic Edition, V1.1. In the context of the latter, we announced the availability of InfoSphere Streams, V2.
The analysts' tweets (follow #ibmbigdata on twitter.com) built upon that theme, adding "validity" (a point made by a customer speaker from Acxiom) as another V. Whether we have three V's, four, or even five, the point I made on a panel I was part of was that volume, which gave the area its Big Data moniker, is not the most important one, at least for the clients we are dealing with.
We also had Eric Baldeschwieler from Yahoo speak on what is happening at Yahoo with respect to Hadoop, and what is happening in the open source community. The reason we wanted Eric to speak was to emphasize for our clients that while Hadoop originated in the so-called web-facing companies, the problems these companies are tackling and the IT infrastructure they have (including the presence of warehouses) are not remarkably different from our own clients' landscape. While Yahoo might be at an extreme with respect to the number of nodes running Hadoop (over 40K), the problems they have solved to make Hadoop more robust and a good analytical infrastructure are equally applicable to our enterprise clients.
But perhaps the most important reason I wanted Eric to speak was for Yahoo and IBM together to emphasize that we are fully behind Apache Hadoop: we want to prevent it from forking, and we will build upon it and contribute back to it.
We had lots of questions on IBM differentiation. I will speak to that in a subsequent post, along with more details about our offerings and partnerships.
But as always, and here I must express my deep appreciation to the IBM organizers of these events, we had a client panel that was extremely well received. Just like the client panel in the cloud forum that I talked about in the past, this one spoke to various client use cases -- bioinformatics, healthcare, trading and credit rating -- and demonstrated that Big Data is making a difference, here and now, across a wide variety of enterprises.
Just last Monday, the NY Times had an article about a McKinsey report. I then read through the McKinsey report itself, which again points to what has been obvious to many of us for the last year or so: we are at the beginning of a new wave in the enterprise.
And IBM is going to be a very strong participant here, working deeply with the open source community and making adoption of these technologies in the enterprise safe for our clients, since we will stand behind them and innovate along with the community.