Anant Jhingran's Musings

Future of Database Research is excellent, but what is the future of data?

Even though the esteemed group chose not to invite me :), it is hard for me to disagree with the conclusions in this report.  It captures exactly the right thoughts, and should be a must-read for everyone involved in the area of databases, and database research in particular.

But today I want to address a different topic -- the democratization of information.  Ever since I attended the Linked Data Planet conference, I have got religion.  Not one of strict linking -- that is fine, and see my other posts for my views there.  But really one of "open data".  Many, many forms of data are now available on the web.  US Census information.  Indian Census information.  Wikipedia's Infoboxes.  Country Facts.  They are available in linked data form, such as Linked Open Data, but they are also available in xls, in csv, in <tr> forms in html; they are everywhere.  There are brokers of this information -- StrikeIron, Melissa Data.  Some of it is free to use, and some is absolutely not free to use.

To me, this is the real transformation.  Not just the future of database research, but the future of data.
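To make the "data is everywhere, in every format" point concrete, here is a minimal sketch in Python of pulling the same kind of table out of csv and out of <tr> rows in html, and landing both in one common record shape. Everything in it is an assumption for illustration: the sample text, the column names and the figures are placeholders (not real census numbers), and the helper names are mine.

```python
import csv
import io
from html.parser import HTMLParser

# Placeholder inputs: the same kind of table published two different ways.
CSV_TEXT = "country,population\nIndia,1000\nUSA,300\n"
HTML_TEXT = (
    "<table>"
    "<tr><th>country</th><th>population</th></tr>"
    "<tr><td>India</td><td>1000</td></tr>"
    "</table>"
)


class RowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that sits inside an open cell.
        if self._in_cell and self._row:
            self._row[-1] += data


def records_from_csv(text):
    """Each csv line becomes a dict keyed by the header line."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def records_from_html(text):
    """Each <tr> becomes a dict keyed by the first (header) row."""
    parser = RowParser()
    parser.feed(text)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


csv_records = records_from_csv(CSV_TEXT)
html_records = records_from_html(HTML_TEXT)
```

Once both sources are plain lists of records, it no longer matters much which format the publisher happened to choose.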

August 25, 2008 | Permalink | Comments (1)

Privacy and Data Mining

So I was on a panel in Washington DC last week in a workshop on Privacy and Data Mining for the Department of Homeland Security.  The topic, of course, can be quite incendiary.  But the focus of the panel was on privacy-preserving data mining technology, so we kept the discussion at a technical level and avoided politics or policy.

The points I made on the panel (other panelists were Chris Clifton from Purdue, and Rebecca Wright from DIMACS and Rutgers) were the following:
  1. Commercial businesses are very much interested in this too, and this interest is driving technology innovation that can be leveraged in DHS situations.
  2. While data mining and privacy preserving aspects of it have the fiery appeal, the fact is that privacy preservation is critical across all aspects of a data lifecycle -- data at rest, data in motion, data graveyard, test, production, integration etc. -- many copies of data existing before and after the data mining. Consequently, having bullet-proof privacy preservation in the data mining "part" of this chain is not sufficient.
  3. IBM and other vendors have built up a repertoire of capabilities, primarily from a commercial perspective.  I highlighted the research on Hippocratic Databases and the privacy-preserving technologies in our Entity Analytics Solutions as two sets of examples of these.
It is always risky wading into such controversial topics, but I enjoyed the panel very much, learnt from the other panelists, and equally importantly, learnt from other panels.  I think such dialogs are critical. 

I want to especially thank Tyrone Grandison, Alexandre Evfimievski, Jeff Jonas, Katie Ignaszewski and our Chief Privacy Officer, Harriet Pearson, for helping me with the talk and the panel.

August 01, 2008 | Permalink | Comments (0)

IBM Mashup Center

In addition to the GA of our mashup offerings, in a package called IBM Mashup Center, we now have the same software available for everyone to play around with on Lotus Greenhouse.  Go get an account, try it out, and add to our wiki.

Fundamentally, our mashup offering is a mashup of two offerings --

  1. InfoSphere MashupHub, which allows you to catalog information, and mix and match feeds -- whether they are departmental, enterprise or external.  The mix and match happens by dragging and dropping feeds and wiring them together for "data operations" (select, join, merge, transform etc.).  On Greenhouse, InfoSphere MashupHub can be reached here.
  2. Lotus Mashups, which allows you to create pages (interactive views) by dragging and dropping widgets on to a palette and wiring them together.  The widgets display information (typically from InfoSphere MashupHub) and/or perform different behavior based on actions in another widget.  Pages or widgets created for or in Lotus Mashups get cataloged back in InfoSphere MashupHub.  On Greenhouse, Lotus Mashups can be reached here.
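For readers who think better in code than in drag-and-drop, the "data operations" mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea of wiring feeds together, not how MashupHub works internally: the feeds, field names and function names below are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Two hypothetical feeds, already parsed into lists of records.
departmental_feed: List[Record] = [
    {"customer_id": "c1", "region": "West"},
    {"customer_id": "c2", "region": "East"},
]
external_feed: List[Record] = [
    {"customer_id": "c1", "credit_rating": "A"},
    {"customer_id": "c3", "credit_rating": "B"},
]


def select(feed: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """Keep only the entries that satisfy the predicate."""
    return [r for r in feed if predicate(r)]


def join(left: List[Record], right: List[Record], key: str) -> List[Record]:
    """Inner join: combine entries from both feeds that share the same key value."""
    index: Dict[str, List[Record]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def merge(*feeds: List[Record]) -> List[Record]:
    """Concatenate several feeds into one, copying each entry."""
    return [dict(r) for feed in feeds for r in feed]


def transform(feed: List[Record], fn: Callable[[Record], Record]) -> List[Record]:
    """Apply a per-entry rewrite, leaving the input feed untouched."""
    return [fn(dict(r)) for r in feed]


# "Wiring" the operations together: join, then select, then transform.
joined = join(departmental_feed, external_feed, "customer_id")
west = select(joined, lambda r: r["region"] == "West")
result = transform(west, lambda r: {**r, "region": r["region"].upper()})
```

Each operation takes feeds in and hands a feed back, which is what makes it possible to chain them in any order.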

I will be periodically posting mashups on Greenhouse so that I can help explain the different use cases, but you will find many already there.

Let us know your feedback.

July 08, 2008 | Permalink | Comments (1)

Database Research: What's hot and what's not

Recently, two of my colleagues -- Sekar Krishnamurthy and Guy Lohman -- did a piece of informal analysis of the SIGMOD, VLDB, WWW and ICDE conferences, to understand what the academic and research community thinks are important areas.  While there are no chi-square tests or regression analyses here, I give it to you without commentary.  Enjoy.

What's Hot                               | What's Not
Privacy & Security, esp. Anonymization   | Engine internals, esp. concurrency control
Schema mapping & data integration        | New index methods
Meta-data, provenance, and annotation    | Big DBMS systems-building projects (due to open source)
Stream Databases                         | XML
Spatio-Temporal DB                       | Traditional Query Optimization
OLAP and data warehouse query processing | Autonomic computing
                                         | Pub-Sub
Experiments on PostgreSQL                | Experiments on MySQL

June 25, 2008 | Permalink | Comments (2)

Linked Data Planet

So I attended the Linked Data Planet conference in NYC last week.  Apart from the horrors of discovering that basic breakfast at the hotel can cost $35, it was quite an interesting experience.  Tim Berners-Lee spoke on day 1.  He has a fairly strict definition of what linked data means and it is at an instance level.  But I think his vision still lacks compelling use cases.  The philosophy seems to be, "build it and uses will emerge." 

Next day, I gave a keynote.  The points I made were roughly as follows

  1. While linking is important (1+1=11 in base 10!), it is not always the case that instance level linking is the only way to go.  Enterprises frequently link data at schema level, and people are also sometimes linking at some social level which might not get captured in the schema or instance level linking.
  2. There needs to be a virtuous cycle of "linked data -> value creation -> more linked data -> more value creation." Build it and they will come is not necessarily the best philosophy.
  3. For instance-based linking, one needs tools to automatically structure all the existing information, and correctly link it together.  If one is to depend on only new information being properly linked, then the whole movement will suffer.  I gave the example of two IBM technologies -- Avatar and Entity Analytics -- that can help in this area.  Of course, being a database person, I had to address the issue of persistence of linked (RDF) data.  While it is clear that databases were not naturally designed to be triple stores, the history of databases is littered with approaches that tried something different and went to /dev/null.
  4. I gave two examples -- Many Eyes and the mashup work that we and a FOAF (Joni Graves) are doing -- to illustrate that people are critical linkers, and often link at a schema level.
  5. Finally, for enterprises, I gave several examples of schema-based linking -- master data, information integration etc. -- which are often complemented by people-based and instance-based linking.
  6. And even if all data gets linked, one has to worry about the business issues like quality of information, copyright etc.
  7. The conclusion was -- important interplay between people, schema and instances, none more dominant than the other; and that business and value creation has to go with the technology.
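The instance vs. schema distinction in the points above is easier to see with a toy example. Below is a minimal sketch in Python, under loud assumptions: the two sources ("crm" and "billing"), their columns, the "sameAs" label and the function name are all made up for illustration, and this is not how Avatar or Entity Analytics work. A schema-level link says "this column means the same as that column"; instance-level links (triples, in the RDF spirit) can then be derived by matching values.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Triple = Tuple[str, str, str]

# Two hypothetical sources describing overlapping customers.
crm: List[Row] = [
    {"id": "42", "email": "a@example.com"},
    {"id": "43", "email": "b@example.com"},
]
billing: List[Row] = [
    {"acct": "9001", "contact_email": "a@example.com"},
]

# Schema-level link: one statement about columns, not about individual records.
schema_link = {"left_col": "email", "right_col": "contact_email"}


def derive_instance_links(left: List[Row], right: List[Row], link: Dict[str, str]) -> List[Triple]:
    """Turn one schema-level link into instance-level (subject, predicate, object) triples."""
    by_value: Dict[str, List[Row]] = {}
    for row in right:
        by_value.setdefault(row[link["right_col"]], []).append(row)

    triples: List[Triple] = []
    for row in left:
        for match in by_value.get(row[link["left_col"]], []):
            # "sameAs" is just an illustrative label for "these two records are the same entity".
            triples.append((f"crm:{row['id']}", "sameAs", f"billing:{match['acct']}"))
    return triples


instance_links = derive_instance_links(crm, billing, schema_link)
```

One schema-level statement produces as many instance-level links as the data supports, which is part of why enterprises so often start at the schema level.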

Many thanks to all the people whose work I referenced, and especially Shiv Vaithyanathan & co of the Avatar group for opening my eyes to the instance vs. schema differences, and to Bob Schloss for providing me with a pulse of the audience so that I did not completely blow it.

While my enterprise-y focus probably turned off some people who believe in everything open, I did find many kindred spirits.  And I enjoyed interactions with many people I met.  Kingsley Idehen was so fascinating to talk to that I am going to follow up with him as I think through everything I will do to further our joint linked data vision.

All in all, I learnt a lot and hopefully I helped the people in the audience also understand a slightly different perspective.  Thanks to Ken North for arranging my talk there.

[Edit of 6/23/08].  Here are my slides...

June 22, 2008 | Permalink | Comments (11)

DB2 turns 25

Well, a few years back (I had not yet heard the term "Database" then), DB2 came out.  Its 25th anniversary (June 08) is an appropriate time for reminiscing, again given the Jim Gray tribute we just had.  Here are some articles on our 25th anniversary -- eWeek and InformationWeek.  However, what makes me most excited is that far from being commodities, databases are hubs of innovation, with newer capabilities (some highlighted in the eWeek article above) continuously being added in and around databases.

June 17, 2008 | Permalink | Comments (2)

Jim Gray -- A Tribute

Yesterday I got a chance to visit my Alma Mater -- Berkeley -- to celebrate with other database folks the achievements (scientific, personal and others) of Jim Gray, who as all of you know, went missing more than a year back in the Pacific.  The tribute (not a memorial) was organized by many, but the hands of my advisor -- Mike Stonebraker -- were all over it, and I wanted to say personal thanks to him for such an informative, funny and poignant tribute.  I am so proud to be associated with a community that respects and recognizes great people.

Jim's technical achievements are many.  Let me name a few.  First, as Bruce Lindsay pointed out, he is the father of transaction processing -- the thing that makes businesses go around.  He is also a performance guru -- giving us the precursors of the TPC-A through H that we all love and hate :)  He is a person who transcends company boundaries (as one Microsoft exec said in the tribute, they were just paying his salary for his work for the whole community) -- exemplified by Terraserver and Sky Server.

But finally, and this came off in every speech -- he is a genuine human being -- a mentor for hundreds and a champion for science.  We all miss him.

My last meeting with him was at Stanford, where he and I talked about the newer programming languages -- PHP and Ruby -- and he told me that he too was playing with them, and felt that architects who did not code (architecting by PowerPoint) were of diminishing value.  So for all his humanity and gentleness, he could be brutally honest too.

Beyond remembering Jim Gray, this was an occasion to meet all the colleagues that I had not met in a long time.  I really love the database community.

June 01, 2008 | Permalink | Comments (1)

SDForum's 3rd Ruby on Rails Conference

So I gave a talk last Friday at SDForum's 3rd Ruby on Rails Conference, courtesy of a request by Michael "Max" Maximilien.  My talk, "RoR for the Enterprise: Ready or Not?  A Database Perspective," built on lots of discussions with Leon Katsnelson, Klaus Roder and Max, and basically had these two main points:

  1. For RoR applications that manage their own data, the scalability and flexibility of RoR app building need to be matched by the scalability and flexibility of the database.  DB2 is a great choice for this.  DB2 Express-C will get you started with no pain.  DB2 pureXML will allow you to manage XML (I gave an example of an in-house application that Leon is building).  And DB2 performance, reliability and disaster recovery will allow you to grow.
  2. For RoR applications that extend existing databases and applications, there are also some positives.  In particular, our Info 2.0 vision and the REST/Atom friendliness of Ruby go well together.  However, it is not a bed of roses for most cases.  Enterprise data and databases tend to be not as clean as what RoR wants.  An autoincrementing "id" field, really? Also, Ruby skills, separation of concerns etc. make the style of RoR development and that of enterprise-class applications not necessarily a great match.  So more work is needed there (I got comments which implied that the database vendors need to do all the things to come closer to the RoR community, which I think is extreme.  The database community has its own momentum, and it needs to be a partnership.)  What do you guys think?

At the end of the talk, one of the organizers caught me and taped me for a 5 min interview.  Highlights of the interview are here.

April 21, 2008 | Permalink | Comments (9)

Legacy Modernization and Skills

James Governor, with whom I had a very interesting unconference at IBM's Impact (more on it later), commented on a dialog he had with Yvonne Perkins in his blog -- on the whole aspect of CICS modernization.  Fundamentally, he is asserting that just wrappering is not enough: without skills growth, a platform can only ever be a cash cow...

CICS has an IM counterpart -- IMS (which is often known as a database, but really is two things in one -- transaction manager and database). We are finding, indeed, that skills are an important issue, so we want to make programming in and out of it easier, including SOA verbs in and out, plus a hundred other things. XQuery in and out. Mashups out.... I do not have the facts about our clients, but I am observing that within IMS development, this makes dev on IMS very exciting too. And it attracts a lot of new talent.  Many start off at the edges, where, thanks to the modernization effort, the work is almost no different from similar work on the other middleware products that abound here at the site I am at.  But many get comfortable enough that they migrate into the bowels of the IMS code, which, in many, many cases, is older than they are!!! A relatively large fraction of IMS developers satisfy this criterion!

So abstraction is not bad: it drives new usage, but for me, equally importantly, it drives new skills that keep the platform going and going and going...

April 18, 2008 | Permalink | Comments (0)

Wow, all this excitement!!!

Google announces its own App Exchange!!! What would Amazon do? Databases are dead, long live the cloud!

Interesting, but wait!

If, for an enterprise, data and information are king, then either data moves to the analytics and processing in the cloud, or analytics and processing move to the data. Which way will the world go? Can you really say?

And for a completely different perspective, see this.

April 09, 2008 | Permalink | Comments (0)
