Kellblog

This blog is written by Dave Kellogg, CEO of MarkLogic Corporation, covering next-generation information management, enterprise search, and content management technologies along with commentary on Silicon Valley, venture capital, and the business of software.


The Database Tea Party: The NoSQL Movement

February 24th, 2010 · 35 Comments

Adam Smith’s invisible hand never rests.  Just five years ago, the database market looked like a static, three-player $10B/year oligopoly where the primary forces were inertia and profit-taking.  Today, we have two major forces disrupting the comfortable stasis that has developed over the past 30 years.

  • One force is DBMS specialization:  while the general-purpose RDBMS is useful for a broad range of applications, it is optimal for few of them.  The RDBMS has slowly become expensive bloatware that is functionally a jack of all trades, master of none.  MIT’s Michael Stonebraker calls the RDBMS a one size fits all solution.
  • The other force is NoSQL, an organic and rapidly-growing industry movement away from relational databases, driven by a number of factors including both technology and cost.

The purpose of this post is to share my thoughts on NoSQL.  Make no mistake, like the Tea Party Movement, NoSQL is a rebellion; just look at the name.  But like most demonstrations, not everyone is marching for the same reasons.  Here are some of the things I think various members of the NoSQL crowd are marching against:

  • Table-oriented, 1970s-era database technology:  RDBMSs were designed for handling data and short-text fields, necessitate mapping programmatic objects to tables (i.e., the impedance mismatch), and require the use of an increasingly stone-age query language, SQL.
  • Scalability:  relational databases were not designed to handle and do not generally cope well with Internet-scale, “big data” applications.  Most of the big Internet companies (e.g., Google, Yahoo, Facebook) do not rely on RDBMS technology for this reason.
  • High prices and the heavy-handed treatment of customers:  both stem from the underlying oligopoly and the lack of credible alternative suppliers.
  • Closed source:  the inability to customize the internals of the DBMS engine to meet specific needs.
  • Bloatware:  ironically, while RDBMSs are perceived as light in requirements that matter (e.g., scalability), they are also seen as over-engineered for features that don’t.  (ACID transactions are a favorite target in this department.)
  • DBA supremacy.  For years, corporate DBAs called the shots on where strategic data assets would be stored, and thus how they would be accessed.  This created headaches for the programmers of the world who, in response, have done as much as possible to abstract away the database (e.g., Ruby on Rails).
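
To make the impedance mismatch in the first bullet concrete, here is a minimal sketch in Python using the standard-library sqlite3 and json modules. The order object, the table names, and the plain dict standing in for a document store are all illustrative assumptions; none of this reflects any particular product's API.

```python
import json
import sqlite3

# Hypothetical application object: an order with nested line items.
order = {
    "id": 1,
    "customer": "Acme",
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

conn = sqlite3.connect(":memory:")

# Relational route: the nested object has to be split across two tables...
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
conn.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["id"], item["sku"], item["qty"]) for item in order["items"]],
)


def load_order_relational(conn, order_id):
    """Reassemble the object by hand from its two tables."""
    row = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    items = conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    ).fetchall()
    return {
        "id": row[0],
        "customer": row[1],
        "items": [{"sku": sku, "qty": qty} for sku, qty in items],
    }


# Document-style route: the object is stored and fetched whole, keyed by id.
# A dict of JSON strings stands in for a real document store here.
documents = {order["id"]: json.dumps(order)}


def load_order_document(store, order_id):
    """Fetch the whole object back in one step."""
    return json.loads(store[order_id])


relational_copy = load_order_relational(conn, 1)
document_copy = load_order_document(documents, 1)
```

Both routes return the same object. The difference is the mapping code the relational route needs on the way in and on the way out, which is the overhead the bullet describes.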

On the flip side, there are things the NoSQL crowd are fighting for:

  • Open source, implying control.  The ability that open source software provides to customize product functionality.
  • Open source, implying free.  The often-flawed notion that the absence of software license fees results in a reduced lifetime cost of ownership.
  • Coolness, or the “I want to be like Google” effect.  If Google’s got BigTable,  Yahoo’s got Hadoop, and Facebook’s got Cassandra, then we should build our own, too.  Our app is hard; we’re smart guys, too.
  • Vengeance, or the “I’m so mad at Oracle that I’ll do anything” effect.  Yes, some folks are just plain mad enough at Oracle to either go write their own DBMS, or take on the support of a very low-level infrastructure technology.

So, if you’re considering a NoSQL solution — a class in which I include MarkLogic — you need to figure out what you’re marching against, what you’re fighting for, and ultimately what will meet your needs at the lowest total cost of ownership.

My first recommendation is to detect and, where applicable, kill off the coolness effect.  Google is swimming in money and PhDs.  They can build anything they want regardless of whether they should and, right or wrong, for Google it just doesn’t matter.  So unless you have Google’s business model and talent pool, you probably shouldn’t copy their development tendencies.

Heck, I get the coolness attraction.  I think infrastructure software is cool, too.  That’s why I was an OS geek early on and have spent my career around databases.  But I surely don’t think that F1000 companies and government agencies should build their own DBMSs, nor fall into the trap of thinking that open source low-level stores are a free and easy way to avoid Oracle license fees.  Cool shouldn’t be in the equation.  Technology suitability and total cost should be.  Period.

My second recommendation is to orthogonalize the open source question, making it independent of functional requirements.  (This breaks if source customization is a requirement, but remember that requirement is often fictional:  most open source users don’t customize.)  If you’re struggling with an RDBMS on a given application problem you shouldn’t say:  we need an open source, NoSQL type thing.  You should say:  we need to look at relational database alternatives.  Those alternatives include open source database projects (e.g., MongoDB, CouchDB) and distributed computing frameworks (e.g., Hadoop), but they also include commercial software offerings such as specialized DBMSs like Streambase (for real-time streams), Aster (for analytics on big data), and MarkLogic (for semi-structured data).  Don’t throw out the commercial-software-benefits baby with the RDBMS bathwater.

My personal take on this issue is that:

  • Relational databases, like the mainframe in 1985, are entering the autumn of their lives.  They won’t die quickly, and the mainframe isn’t dead today, but their best days are behind them.
  • Our kids will see SQL the way we see COBOL.  Some people can’t stand when I say this, but I think they’re in denial.  There is no logical reason to assume that the relational database and the SQL language are the endpoints in database evolution.  Yes, Larry Ellison is powerful.  But Adam Smith is more so.
  • Our kids will see no data/document dichotomy.  They will just see digital information.  We need to understand and remember that the data/document dichotomy is an artifact of the limitations of the tools and technologies with which we grew up.
  • Some of the NoSQL hype is an over-reaction to the database oligopoly.  I believe there are organizations out there who should be using alternative commercial databases, but instead are using open source NoSQL-type projects due to coolness, anger, or a mistaken belief that open source always has a lower total cost of ownership.  I believe rationality will return to these people.  One day management will say:  “Holy cow!  Why in the world are we paying programmers to write and support software at this low a level?”  (This is potentially avoidable if you can mentally project yourself into the future now and imagine how you will look back at the coming three years.)
  • Some of the NoSQL hype is a valid reaction to the technological limits of relational databases and the impedance mismatch in programming on them.

In the end, I think it’s great that the NoSQL movement is happening.  It’s awakening people to traditional RDBMS alternatives.  It’s making people understand that they don’t have to write big checks for commodity software.  It’s helping people solve problems that they can’t solve, or solve efficiently, on relational technology.

My axe to grind is simple:  just because you’re throwing out Oracle, don’t throw out all DBMSs and all commercial software with it.  Take a breath.  Look at all your alternatives.  Study total costs and technology applicability.  And make your best decision.


Tags: Michael Stonebraker · NoSQL · RDBMS · relational database

35 responses so far ↓

  • 1 The NoSQL Movement: The Object – RDMBS Incompatability : HalWebGuy. Online Media Geek. // Feb 25, 2010 at 6:03 am

    [...] language.  And while I don’t have the gripes that others may have with the language itself [http://www.kellblog.com/2010/02/24/the-database-tea-party-the-nosql-movement/], relational databases do come with a bag full of limitations and challenges.  I’ll tell you [...]

  • 2 GregorDV // Feb 25, 2010 at 6:16 am

    A very interesting write-up with one little oversight: you’re wrong.

    I am part of a large program to write a NoSQL database for military applications. It’s not a backlash against paying Oracle (the DoD has a blanket license for Oracle installations) or a philosophical stance by the hippies in the defense arena; it’s the fact that RDBMSs are built in a different space in the CAP trades (see http://www.julianbrowne.com/article/viewer/brewers-cap-theorem).

    Google, Amazon, Facebook, and DARPA all recognized that when you scale systems large enough, you can never put enough iron in one place to get the job done (and you wouldn’t want to, to prevent a single point of failure). Once you accept that you have a distributed system, you need to give up consistency or availability, which the fundamental transactionality of traditional RDBMSs cannot abide. Based on the realization that something fundamentally different needed to be built, a lot of Very Smart People tackled the problem in a variety of different ways, making different trades along the way. Eventually, we all started getting together and trading ideas, and we realized that we needed some moniker to call all of these different databases that were not the traditional relational databases. The NoSQL name was coined more along the lines of “anything outside of the SQL part of the Venn diagram” rather than “opposed to SQL”.

    So – the NoSQL databases are a pragmatic response to growing scale of databases and the falling prices of commodity hardware. It’s not a noble counterculture movement (although it does attract the sort that have a great deal of mental flexibility), it’s just a way to get business done cheaper.

  • 3 Saying yes to NoSQL — Too much information // Feb 25, 2010 at 6:18 am

    [...] questions about the general applicability of the key-value/column table stores. As Dave Kellog notes, “unless you’ve got Google’s business model and talent pool, you probably shouldn’t [...]

  • 4 George James // Feb 25, 2010 at 6:48 am

    I think you are over-analyzing. There’s nothing wrong with relational theory; it’s good stuff. NoSQL is a kickback against SQL, plain and simple.

    The problem is that SQL is a poor way of interfacing a program to a database. What’s more it hasn’t evolved since about 1985, whereas programming languages have.

    The NoSQL movement has lumped together a host of disparate solutions that have just one common factor. No SQL. The reason they all exist is because the existing SQL based databases were not up to the job and the main thing holding back Oracle and the like is that they are wedded to SQL.

    People are now able to start thinking about the relationship between their code and its data in a fresh and innovative way. It’ll be interesting (and exciting) to see how this plays out.

    Fwiw, my money is on REST and JSON with who knows what behind that.

  • 5 Dominique Rabeuf // Feb 25, 2010 at 7:21 am

    About eight months ago, I attempted to formalize how XML documents could be represented in an SQL database using the schema definition (I named it: XSD to DDL).

    I summarized it here: http://www.web21th.com/schemas/xsd-ddl.htm
    I quickly stopped because the work became very complex.
    At the time I was trying to use XForms with a relational database.
    It was actually a little silly, but this experience convinced me that I had to give up on the RDBMS and adopt native XML.

    I have not devoted as much time to it as I wanted.
    But in the end, the new style comes to one's fingertips (and brain) very quickly, provided one finds the useful information and gets in touch with experts and advanced users.

    And regarding the autumn of SQL, remember Cullinet.

    I now use both eXist-db and Mark Logic, with XSLTForms in front.

    One must of course make an effort to deepen one's knowledge of XML, including XQuery / XForms / XML Schema.
    Also read the databases' technical documentation (administration, programming interfaces).

    One finds very quickly that there is not only a reduction in the volume of development, but that the application naturally follows the objectives and design, and that large savings in administration and database maintenance follow as well.

  • 6 jacobus cambria // Feb 25, 2010 at 8:29 am

    You may want to do some more investigation as to why SQL really has nothing to do with the Relational Model of data.

    Part of the reason that SQL DBMSs are having difficulty is that they don’t obey the principles of the Relational model.

    This just goes to show that everything old in the computer industry gets to be renamed and become new again. A lot of these “new” projects are based on database ideas that were rejected in the 1960s precisely because they don’t work.

  • 7 NoSQL explained correctly (finally) « Otaku, Cedric's weblog // Feb 25, 2010 at 8:44 am

    [...] This is a comment to this article. [...]

  • 8 Bradford // Feb 25, 2010 at 12:33 pm

    Wow. “Tea Party”. Really? Where do I even start…they’re idiot reactionaries. NoSQL folks are engineers who want to make life easier :)

    “NoSQL” is not about being anti-SQL, it’s about using the right tool for the job. And it’s not just about “big data”, it’s about ease of use and effortless scalability.

    That’s why we built the Drawn to Scale platform. (http://www.drawntoscalehq.com)

    The RDBMS was designed over 40 years ago for hardware and software that just doesn’t make sense for every use case out there. It’s difficult to scale, doesn’t work with modern data

    I agree: study *all* your technology, and make the best choice. Now, thanks to NoSQL, that’s a possibility.

  • 9 Bradford // Feb 25, 2010 at 12:54 pm

    Oh… and Hadoop isn’t a key-value store. It’s a distributed filesystem combined with a batch-processing framework.

    HBase, the database built on top of Hadoop, is modeled after BigTable. It’s a distributed, column-oriented DB where you store and retrieve data by a key. It’s also used by dozens of companies in production.

  • 10 Lonny // Feb 25, 2010 at 7:34 pm

    You left off the .com in the link to your own company on this post. Just thought you might like to know. You can delete my comment after you see this.

  • 11 SQL at 40: Ready for Retirement? : Beyond Search // Feb 25, 2010 at 9:04 pm

    [...] (formerly the Web log of the CEO of Mark Logic Corporation). The title caught my attention: “The Database Tea Party: The NoSQL Movement.” If you are struggling with your favorite 50-year old database technology, you will want to read [...]

  • 12 Mike // Feb 26, 2010 at 10:13 am

    I recently blogged about the rise of NoSQL in the cloud and the threat to the SQL world, with predictions on how it will play out. http://scaledb.blogspot.com/2010/02/will-nosql-movement-unseat-database.html

  • 13 PLM Platforms: Retirement Or noSQL Knock-Out? « Daily PLM Think Tank Blog // Mar 1, 2010 at 2:28 am

    [...] term, I’d recommend to take a look on this wikipedia article. Also, I found the following article – The noSQL movement, written by Dave Kellogg on his blog as a very interesting research in [...]

  • 14 NoSQL y el fin de las bases de datos como las conocíamos | Incognitosis // Mar 1, 2010 at 6:20 am

    [...] Dave Kellog explains very clearly in an article on his blog, in which he analyzes the current state of the database market, and which divides clearly into [...]

  • 15 rogerdv // Mar 1, 2010 at 8:10 am

    I don’t think the traditional RDBMS will die. How many people need to handle the amount of data that requires discarding Oracle? Just a few. There will be a market for Oracle, MySQL, etc., for a long time.

  • 16 Dave Kellogg // Mar 2, 2010 at 3:08 pm

    I don’t disagree. Things in IT take a very long time to die. The question is not literally whether something will go away; it’s whether its best days are ahead or behind. As I think I mentioned in some post, the mainframe started “dying” in the mid-1980s (and COBOL along with it). But yes, both are still here today. So it’s not binary. But my belief is that the general-purpose RDBMS is and will be challenged — which for the past 25 or so years it really hasn’t, save for OLAP servers — and the market will change from “which of the big three do I use” to a much broader set of alternatives.

  • 17 Dave Kellogg // Mar 2, 2010 at 3:17 pm

    mike,

    nice post, thanks for sharing.

  • 18 Dave Kellogg // Mar 2, 2010 at 3:20 pm

    Thanks Lonny, pretty funny. Helps to prove that this blog isn’t really all about marketing Mark Logic.

  • 19 Dave Kellogg // Mar 2, 2010 at 3:22 pm

    Bradford,

    On Hadoop, I think you’re right and Wikipedia largely agrees with you: Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license. More here.

  • 20 Dave Kellogg // Mar 2, 2010 at 3:25 pm

    Bradford, thanks. I wasn’t trying to extend the tea party metaphor to anything other than the generic idea of rebellion — beyond the basic notion of rebelling against the status quo I wasn’t trying to draw any other similarities. Plus, hey, a good headline is a good headline.

  • 21 Dave Kellogg // Mar 2, 2010 at 3:28 pm

    Jacobus,

    I’m aware that SQL and the relational model are *not* two sides of the same coin, but in practice SQL and today’s relational databases are. That is, as a matter of theory, you are correct, but as a matter of practice, SQL is the language in which one speaks to an RDBMS. People interested more in relational theory will enjoy this book by Chris Date.

  • 22 Dave Kellogg // Mar 2, 2010 at 3:29 pm

    Dominique, thanks for your comment. I do indeed remember Cullinet.

  • 23 Dave Kellogg // Mar 2, 2010 at 3:31 pm

    George,

    Thanks for your comment. Relational theory is fine (see prior comment response). I think for a combination of the reasons listed in the post many people are tired of RDBMSs. For some it’s the pricing. For others, it’s the programming language interfaces (but, btw, object databases had wonderful programming interfaces). To me the NoSQL alternatives are a mixed bag, as are the reasons that people are attracted to the NoSQL movement.

  • 24 Dave Kellogg // Mar 2, 2010 at 3:42 pm

    Gregor,

    Thanks for your opinion. I am happy to hear the story of why you and your project were drawn to a NoSQL solution. And thanks for your reference to Brewer’s CAP “theorem” though I think this reference is a little more straightforward (and less colorful) than the one you provided. For those interested, here’s the original talk.

    Based on what you commented, I’d classify you as someone who went NoSQL because of one of the reasons I cited: ACID transactions are overkill / inappropriate / must be traded off for what you are trying to do. That’s one of about 8 reasons I give for why people go NoSQL and I’m certainly not trying to suggest either [1] that every person joins for every reason or [2] that every person who joins considers themselves in a movement.

    By the way, for someone disagreeing with me, you’re agreeing a lot: you did exactly what I suggested which was: step away from the hype, consider your alternatives, and pick the best solution for you and at the best cost. So I think your primary beef is me calling it a movement, and here I’d say we simply disagree. I do believe NoSQL is a movement, though I didn’t mean to suggest anything “noble” about it.

    By the way, those DoD Oracle licenses may have been “free” for your project if you chose to use them, but they were certainly not “free” for the DoD. Oracle does roughly $5B/year in DBMS revenues generating >50% operating margins. And while your project may have had “free” access to Oracle, someone up the chain is indeed paying for those licenses, and those people are starting to issue directives to look at cheaper solutions/alternatives.

  • 25 Andy // Mar 2, 2010 at 5:01 pm

    I don’t think it makes sense for you to say the NoSQL crowd is fighting against closed source.

    There are plenty of open source choices for RDBMS: PostgreSQL, MySQL, Firebird. Yes, they are all free and have huge user bases (especially MySQL).

    The companies you cited as “not relying on RDBMS technology” – Google, Yahoo, Facebook – are all in reality heavy users of MySQL. Google and Facebook even have their own patches for MySQL.

    The use of NoSQL vs. SQL databases simply comes down to this: use the right tool for the right job.

  • 26 Dave Kellogg // Mar 2, 2010 at 5:27 pm

    Hi Andy,

    To be clear, I think the NoSQL thing is a movement/protest (which some commenters clearly disagree with), but like all protests, people “in the march” are there for different reasons.

    So when I made the list of things people were protesting against, I did not mean to imply that all protestors were protesting for all items on the list.

    I’m sure FB, Google et alia use RDBMSs in some places — hey, anyone who uses salesforce.com is indirectly using Oracle. But for core infrastructure for their web apps they are using a lot of in-house-built stuff that they’ve then open sourced.

    Agree on the right tool for the right job thing, but remember to consider all alternatives and total cost. To me, it’s not just Oracle vs. Hadoop. It’s a bunch of RDBMSs and a bunch of database alternatives.

  • 27 Chris Wallace // Mar 2, 2010 at 6:11 pm

    Dave,

    It’s been a few years (20+) since we closed the bar at the Redmond Marriott after many hours of solving the world’s problems. I’m delighted to see you’ve still got the enthusiasm for this generally cynical industry.

    Keep up the thought provoking dialogue.

  • 28 Dave Kellogg // Mar 3, 2010 at 6:54 am

    Chris,

    Great to hear from you and thanks for reading.

  • 29 Krishnan Parasuraman // Mar 4, 2010 at 12:15 pm

    Mark,
    Excellent post. As usual, you have captured all elements of the arguments succinctly. I would like to corroborate your conclusions with one additional point. Would be curious to know your take on that.

    One thing we cannot overlook is the deep penetration of SQL within the business/end user community. However sophisticated the BI tools an organization might have, you often find business users rolling up their sleeves and introspecting data using SQL.
    SQL is declarative and Turing complete and offers a data introspection paradigm that relates to an analyst’s thought process. It may not necessarily be elegant for many classes of problems but at least it bears familiarity.
    In order to drive No-SQL to the same level of adoption within the business user community, we will need a fairly sophisticated set of tools that abstract the language nitty-gritty. People have been trying to do that for the last 25 years for procedural languages and have not succeeded. We have simplified the programmatic access (e.g. Ruby on Rails) but it’s still a skill barrier that most end users find too daunting to overcome.

    In my opinion we might end up with a more hybrid approach towards the SQL limitations. Many of the specialized massively parallel data warehousing technologies are beginning to provide hooks and IDEs to write procedural routines that can be executed in parallel and invoked from SQL.
    So it’s likely that the next evolution of SQL might be “Mixed-SQL” where one preserves the simplicity of SQL but amplifies its capabilities through the salient features of a No-SQL like movement.

  • 30 Links 7/3/2010: Deutsche Börse and Red Hat | Boycott Novell // Mar 7, 2010 at 3:39 pm

    [...] The Database Tea Party: The NoSQL Movement In the end, I think it’s great that the NoSQL movement is happening. It’s awakening people to traditional RDBMS alternatives. It’s making people understand that they don’t have to write big checks for commodity software. It’s helping people solve problems that they can’t solve, or solve efficiently, on relational technology. [...]

  • 31 Search Facets » Let’s not let “NoSQL” go the way of “Web 2.0” // Mar 10, 2010 at 1:26 pm

    [...] Kellogg of Mark Logic recently addressed this question in relation to their XML database and reached a different answer. His post provides a nice overview [...]

  • 32 Dan Kubb // Mar 13, 2010 at 10:19 pm

    Dave, I think you might want to know that Date renamed the book Database in Depth and updated the information in: http://amzn.com/0596523068

    I read both those books back to back, and I can say that while a lot of the information is similar, the new one does go into a lot more detail in several areas.

  • 33 SQL or NoSQL? « TechLedger // Mar 28, 2010 at 10:56 pm

    [...] 29, 2010 in Database Dave Kellogg: We have two major forces disrupting the comfortable stasis that has developed over the past 30 [...]

  • 34 รับสร้างบ้าน // Apr 7, 2010 at 9:31 pm

    Hello everyone and blog owners that bring water to be useful information to share thanks.

  • 35 ตกแต่งบ้าน // Apr 27, 2010 at 1:01 am

    Hi, everyone, and blog owners can bring useful information to share knowledge really cute Thank you very much.ตกแต่งภายใน
