DB2 XML: Enabling the Right Tradeoff
In 2001, I had been at the Almaden labs as the head of computer science for a few months. Almaden database folks -- Hamid Pirahesh, Eugene Shekita, Bruce Lindsay, and countless others -- had started looking into what it means to do XML inside the database. New innovations were happening, and then we came to a fork in the road: should we build a new widget for XML processing, or should we extend DB2? There were arguments both ways. On the one hand, if we built this as a feature of DB2, we would get engineering efficiencies and be able to move our current customers to exploiting XML. On the other hand, what if some new pure play player came out of the chute and ran away with the XML DB market, and we were not nimble enough because we were thinking of this as an adjacent market to the relational market? We debated and debated, and I must say, I was pushing fairly hard for a pure play. But the arguments on the other side were convincing too, and extending DB2 is what we finally did. In our DB2 V9 (Viper) release, we came out with our pureXML capabilities. It took us some time, and perhaps a pure play would have been faster. After all, we were modifying a complex piece of code, and also competing with other priorities for our database evolution.
So did we make the right decision? While the market evolution is still in its infancy, I will say, unequivocally, yes. Most of our current customers need the flexibility of both relational and XML storage. Why? The key requirements that these customers have in their use of XML -- schema diversity and schema evolution (which Hamid, in his usual colorful way, calls schema chaos) -- all point to a mixed use. Consider the alternatives. Using only relational constructs leads to complex shredding and to relations that are either almost meaningless (such as <name, value> pairs) or number in the thousands, one for each schema (at the very least), with new tables added as the schema evolves and complex shredding logic still encoded in the application. Using only pure XML structures is very attractive when one is implementing new applications on new data. Our implementation in DB2 is really very nice for this, and many customers are choosing DB2 over the competition for exactly this reason. But in the presence of schema chaos, while application writers get great flexibility, the stability of applications that comes from a known schema is lost.
And that is where the hybrid approach comes into play. Most application designs require a "fixed" part of the schema and an evolving part (e.g., some variance across the 50 states, some variance in tax forms, some variance in customer records across different countries). Now the fixed part can also be implemented in XML structures, but many of our customers prefer relational structures for that, because it gives them access to a large library of existing tools for application development, business intelligence, and administration. So they take the fixed parts and build relational structures out of them, and they take the varying parts and put them in XML structures. Unlike blobs, the XML structures are very rich and deeply queryable, and I believe that IBM's implementation is the best there is. But the coexistence of relational and XML structures gives them the flexibility to decide which option to use for which part of their data needs, and they all seem to like that flexibility.
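To make the hybrid pattern concrete, here is a minimal sketch in Python. It is only an analogy, not DB2 code: it uses SQLite with the XML kept in a plain text column and parsed in the application, whereas DB2 pureXML stores XML natively and queries it inside the database. The `customer` table, its columns, and the helper `customers_with` are all made up for illustration. The fixed part of the record (name, country) lives in relational columns, and the part that varies by country lives in an XML document.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db():
    """Create a hypothetical customer table: fixed columns plus one XML document."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"      # fixed part: relational
        " country TEXT NOT NULL,"   # fixed part: relational
        " details TEXT NOT NULL)"   # varying part: XML stored as text (assumption)
    )
    rows = [
        (1, "Alice", "US",
         "<details><state>CA</state><taxForm>W-9</taxForm></details>"),
        (2, "Bjorn", "NO",
         "<details><orgNumber>123456789</orgNumber></details>"),
        (3, "Carol", "US",
         "<details><state>NY</state></details>"),
    ]
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", rows)
    return conn


def customers_with(conn, country, path):
    """Filter on the fixed column in SQL, then on the XML part by element path."""
    result = []
    cur = conn.execute(
        "SELECT name, details FROM customer WHERE country = ? ORDER BY id",
        (country,),
    )
    for name, details in cur:
        node = ET.fromstring(details).find(path)
        if node is not None:
            result.append((name, node.text))
    return result


conn = build_db()
print(customers_with(conn, "US", "state"))    # [('Alice', 'CA'), ('Carol', 'NY')]
print(customers_with(conn, "US", "taxForm"))  # [('Alice', 'W-9')]
```

Note that adding a new kind of country-specific field needs no new table and no schema change; only the XML documents differ, while tools that understand the relational columns keep working.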
I will, over time, give specific customer examples of each of these -- Storebrand is one such, but there are many, many others. Qi Jin, the senior manager of XML development in my division, has been walking me through many of these. The larger team -- Anjul Bhambhri, Susan Malaika, Bert Van Der Linden, Sriram Padmanabhan, and many others -- has been not only building these technologies, but also deeply engaging with our customers and understanding these usage patterns.
So we made the right decision 5 years before we released the product. Anyone who says that large companies cannot do revolutionary innovations, IMO, is wrong ("free" might be a business model innovation, but it is not one from a technology perspective). First, many innovations that are revolutionary (like XML) are adjacent to the current space (as our customers are demanding), and large companies can and do expand into these spaces. Second, enterprise software environments are complex enough that disruptions rush headlong into established middleware, so pure play disruptions are more a myth than a reality (not to say that they do not exist, but they are very rare). So, just as in the XML space, I feel the future of innovation in IBM, and in particular in Information Management, where I sit, is very bright.
Controversial enough?