Kellblog

Formerly The Mark Logic CEO Blog, this blog is written by Dave Kellogg, CEO of Mark Logic Corporation, covering next-generation database management, enterprise search, and content management technologies along with commentary on Silicon Valley, venture capital, and the business of software.


Expert Questions Utility of Oracle 11g XML

January 19th, 2007

Here’s a great piece on SearchOracle in which news editor Mark Brunelli interviews Oracle expert Brian Peasland about the upcoming Oracle 11g database.

Here’s the part of the article that covers the new XML functionality.

Mark Brunelli: Oracle says the next version of its flagship DBMS — Oracle Database 11g — includes new XML-related upgrades. They focus on XML DB and include a new XML binary data type and a new XML index. Are these types of upgrades important? Why or why not?

Brian Peasland: To me, most of the XML stuff inside the database is not important at all because XML is structured in a hierarchical format, and it was proven in the 1980s that hierarchical formats for data do not allow for fast, efficient retrieval of that data. I’ve always been confused when they start putting XML format into a relational database. We can’t simply query one [piece of] XML data in one table and compare it to another one quickly and easily. Doing so normally means that you have to break the data apart. I’ve never been a fan of putting XML inside the database, but I haven’t had a need to really do that. I do know some people who do need to do that and it does make some sense for their specific application. But they’re storing XML more like someone would store a Word document or a .wav file inside the database. They’re not storing it as corporate data that they want to query on later. They want to actually store XML as a complete file, not as pieces of data.

Brunelli: Why do those people you speak of store XML as a complete file?

Peasland: For instance, the new version of Microsoft Office is going to XML as its standard format. Instead of storing the old Word-type proprietary document format inside the database, maybe they want to store this new XML format inside the database as a document. And then they can run something like Oracle text against that to do rapid searches of documents. But that’s a different use than what I see some people trying to do, where they’re taking corporate data such as a list of employees and storing that as XML inside the database. To me, those attributes and those rows which represent an instance of that entity should be stored in a relational table.

I think Peasland does a great job of capturing the difference between content and data, from a traditional, database-person point of view. What he’s saying (imho) is:

  • If you have XML-wrapped data then I don’t understand why you wouldn’t un-XML-wrap it and store it in tables as you can easily and should rightfully and relationally do. I totally agree.
  • If you have XML documents (i.e., content), then perhaps there are some modest advantages to storing each document as a row in an Oracle table (in a column of type XML). If you do that then, well, you could search it using Oracle Text.
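To make the first bullet concrete, here is a minimal sketch of un-XML-wrapping data and storing it in a table. It uses Python's standard library with an in-memory SQLite database standing in for the RDBMS. The employee records, the element names, and the `load_employees` helper are all my own assumptions for illustration; nothing here comes from Oracle or from the interview.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML-wrapped employee data (element and attribute names are assumptions).
EMPLOYEES_XML = """
<employees>
  <employee id="1"><name>Ada</name><dept>Engineering</dept></employee>
  <employee id="2"><name>Grace</name><dept>Research</dept></employee>
  <employee id="3"><name>Edgar</name><dept>Research</dept></employee>
</employees>
""".strip()


def load_employees(conn, xml_text):
    """Un-wrap each <employee> element into one row of a relational table."""
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (int(e.get("id")), e.findtext("name"), e.findtext("dept"))
        for e in root.iter("employee")
    ]
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_employees(conn, EMPLOYEES_XML)

# Once the data is in a table, an ordinary SQL query does the work.
research = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE dept = ? ORDER BY name", ("Research",)
    )
]
print(loaded, research)
```

Once the rows are loaded, the XML wrapper adds nothing: filtering, sorting, and joining are plain SQL.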

I think he’s proven by exclusion that RDBMS native/binary XML support is nearly useless. Case 1: If you have data, then store it in tables as data (that’s what an RDBMS is built to do). Case 2: If you have content and don’t want to do anything with it — but want it in a database for search, tracking, and backup purposes — then stuff it in an XML column. But hey, couldn’t you do that with a CLOB?

QED.

The case he’s forgotten of course is Case 2B: you have content and you want to do something with it — open it up, leverage the XML tags, do fine-grained search and retrieval, integrate and/or repurpose content. In that case, you want an XML content server like MarkLogic. That’s what it’s designed to do.
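As a rough illustration of what "leverage the XML tags" means in Case 2B, here is a small sketch using only Python's standard library. It is not MarkLogic's API and not XQuery; the sample article, its element names, and the `sections_with_heading_term` helper are hypothetical. The point is the contrast: a search over the document as an opaque blob can only say that the document matches, while a tag-aware query can return just the section you care about.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML content (a document, not rows of data).
ARTICLE_XML = """
<article>
  <title>Storage options</title>
  <section>
    <heading>Relational tables</heading>
    <para>Rows and columns suit employee records.</para>
  </section>
  <section>
    <heading>XML documents</heading>
    <para>Tags let a query target one section of a document.</para>
    <para>Relational tables are mentioned here only in passing.</para>
  </section>
</article>
""".strip()


def sections_with_heading_term(xml_text, term):
    """Return only the sections whose <heading> contains the term."""
    root = ET.fromstring(xml_text)
    hits = []
    for section in root.iter("section"):
        heading = section.findtext("heading", default="")
        if term.lower() in heading.lower():
            hits.append(
                {
                    "heading": heading,
                    "paras": [p.text for p in section.findall("para")],
                }
            )
    return hits


# Blob-style search: the whole document either matches or it doesn't.
whole_doc_match = "relational" in ARTICLE_XML.lower()

# Tag-aware search: retrieve the one section that is actually about the term.
hits = sections_with_heading_term(ARTICLE_XML, "relational")
print(whole_doc_match, hits)
```

Both sections mention relational tables, so a blob search cannot tell them apart; the tag-aware query returns only the section whose heading is about them, ready to be pulled out and repurposed on its own.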

Tags: Oracle · XML · XML content server · relational database
