Why is MongoDB wildly popular? It’s a data structure thing.

Updated 11/7/14: Fixed typos

“Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t usually need your code; it’ll be obvious.” – Eric Raymond, in The Cathedral and the Bazaar, 1997

Linguistic innovation

The fundamental task of programming is telling a computer how to do something. Because of this, much of the innovation in software development has been linguistic innovation: innovation in the ease and effectiveness with which a programmer can instruct a computer system.

While machines operate in binary, we don’t talk to them that way. Every decade has introduced higher-level programming languages, and with each, an advancement in the ability of programmers to express themselves. These advancements include improvements in how we express data structures as well as how we express algorithms.

The Object-Relational impedance mismatch

Almost all modern programming languages support object-oriented programming, and when we model entities in our code, we usually model them using a composition of primitive types (ints, strings, etc…), arrays, and objects.

While each language might handle the details differently, the idea of nested object structures has become our universal language for describing ‘things’.

The data structures we use to persist data have not evolved at the same rate. For the past 30 years the primary data structure for persistent data has been the Table – a set of Rows composed of Columns containing scalar values (ints, strings, etc…). This is the world of the relational database, popularized in the 1980s by its transactionality, speedy queries, space efficiency over other contemporary database systems, and a meat-eating ORCL salesforce.

The difference between the way we model things in code, via objects, and the way they are represented in persistent storage, via tables, has been the source of much difficulty for programmers. Millennia of man-effort have been spent on the problem of changing the shape of data from the object form to the relational form and back.

Tools called Object-Relational Mapping systems (ORMs) exist for nearly every object-oriented language, and even with these tools, almost any programmer will complain that doing O/R mapping in any meaningful way is a time-consuming chore.

Ted Neward hit it spot on when he said:

“Object/Relational Mapping is the Vietnam of Computer Science”

There were attempts made at object databases in the 90s, but there was no technology that ever became a real alternative to the relational database. The document database, and in particular MongoDB, is the first successful Web-era object store, and because of that, represents the first big linguistic innovation in persistent data structures in a very long time. Instead of flat, two-dimensional tables of records, we have collections of rich, recursive, N-dimensional objects (a.k.a. documents) for records.

An Example: the Blog Post

Consider the blog post. Most likely you would have a class / object structure for modeling blog posts in your code, but if you are using a relational database to store your blog data, each entry would be spread across a handful of tables.

As a developer, you need to know how to convert each ‘BlogPost’ object to and from the set of tables that house them in the relational model.
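To make that mapping chore concrete, here is a minimal sketch in Python. It assumes a hypothetical three-table layout (posts, comments, post_tags); the table names, column names, and both helper functions are illustrative only and do not come from any particular ORM or schema.

```python
# Hypothetical relational layout: one row in "posts", many rows in
# "comments" and "post_tags". All names here are illustrative assumptions.

def post_to_rows(post):
    """Flatten one nested blog post into rows for three hypothetical tables."""
    post_row = {
        "id": post["_id"],
        "author_name": post["author"]["name"],
        "author_email": post["author"]["email"],
        "body": post["post"],
    }
    # Each comment becomes its own row, tied back to the post by a foreign key.
    comment_rows = [
        {"post_id": post["_id"], "position": i, "user": c["user"], "text": c["text"]}
        for i, c in enumerate(post["comments"])
    ]
    # Each tag becomes a row in a join-style table.
    tag_rows = [{"post_id": post["_id"], "tag": t} for t in post["tags"]]
    return post_row, comment_rows, tag_rows


def rows_to_post(post_row, comment_rows, tag_rows):
    """Reassemble the nested object from its rows (the join, done by hand)."""
    return {
        "_id": post_row["id"],
        "author": {
            "name": post_row["author_name"],
            "email": post_row["author_email"],
        },
        "post": post_row["body"],
        "comments": [
            {"user": r["user"], "text": r["text"]}
            for r in sorted(comment_rows, key=lambda r: r["position"])
        ],
        "tags": [r["tag"] for r in tag_rows],
    }


blog_post = {
    "_id": 1234,
    "author": {"name": "Bob Davis", "email": "bob@bob.com"},
    "post": "In these troubled times I like to ...",
    "comments": [
        {"user": "jgs32@hotmail.com", "text": "Great point! I agree"},
        {"user": "holly.davidson@gmail.com", "text": "You are a moron"},
    ],
    "tags": ["Politics", "Virginia"],
}

rows = post_to_rows(blog_post)
restored = rows_to_post(*rows)
```

Even for this small object, both directions of the conversion have to be written and kept in sync by hand, and every new nested field adds another table or column to both functions.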

A different approach

Using MongoDB, your blog posts can be stored in a single collection, with each entry looking like this:

{
    _id: 1234,
    author: { name: "Bob Davis", email: "bob@bob.com" },
    post: "In these troubled times I like to …",
    date: { $date: "2010-07-12T13:23:00Z" },
    location: [ -121.2322, 42.1223222 ],
    rating: 2.2,
    comments: [
        { user: "jgs32@hotmail.com",
          upVotes: 22,
          downVotes: 14,
          text: "Great point! I agree" },
        { user: "holly.davidson@gmail.com",
          upVotes: 421,
          downVotes: 22,
          text: "You are a moron" }
    ],
    tags: [ "Politics", "Virginia" ]
}

With a document database your data is stored almost exactly as it is represented in your program. There is no complex mapping exercise (although one often chooses to bind objects to instances of particular classes in code).
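As a rough illustration of how thin that binding can be, here is a minimal sketch in Python using only the standard library. The class names and the two helper functions are assumptions made for this example, the document uses a subset of the fields above, and no database driver is involved; the point is only that the in-program object and the stored document share one shape.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Author:
    name: str
    email: str


@dataclass
class Comment:
    user: str
    upVotes: int
    downVotes: int
    text: str


@dataclass
class BlogPost:
    _id: int
    author: Author
    post: str
    rating: float
    comments: List[Comment]
    tags: List[str]


def to_document(post: BlogPost) -> dict:
    """Turn the object graph into a nested dict with the same shape."""
    return asdict(post)


def from_document(doc: dict) -> BlogPost:
    """Bind a stored document back to class instances."""
    return BlogPost(
        _id=doc["_id"],
        author=Author(**doc["author"]),
        post=doc["post"],
        rating=doc["rating"],
        comments=[Comment(**c) for c in doc["comments"]],
        tags=list(doc["tags"]),
    )


doc = {
    "_id": 1234,
    "author": {"name": "Bob Davis", "email": "bob@bob.com"},
    "post": "In these troubled times I like to ...",
    "rating": 2.2,
    "comments": [
        {"user": "jgs32@hotmail.com", "upVotes": 22, "downVotes": 14,
         "text": "Great point! I agree"},
        {"user": "holly.davidson@gmail.com", "upVotes": 421, "downVotes": 22,
         "text": "You are a moron"},
    ],
    "tags": ["Politics", "Virginia"],
}

blog_post = from_document(doc)
round_tripped = to_document(blog_post)
```

The binding step names each nested class once, and the conversion back to a document is a single generic call; there are no extra tables, keys, or joins to keep in sync.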

What’s MongoDB good for?

MongoDB is great for modeling many of the entities that back most modern web apps, whether consumer or enterprise:

  • Account and user profiles: can store arrays of addresses with ease
  • CMS: the flexible schema of MongoDB is great for heterogeneous collections of content types
  • Form data: MongoDB makes it easy to evolve the structure of form data over time
  • Blogs / user-generated content: can keep data with complex relationships together in one object
  • Messaging: vary message metadata easily per message or message type without needing to maintain separate collections or schemas
  • System configuration: just a nice object graph of configuration values, which is very natural in MongoDB
  • Log data of any kind: structured log data is the future
  • Graphs: just objects and pointers – a perfect fit
  • Location-based data: MongoDB understands geo-spatial coordinates and natively supports geo-spatial indexing

Looking forward: the data is the interface

There is a famous quote by Eric Raymond, in The Cathedral and the Bazaar (rephrasing an earlier quote by Fred Brooks from the famous The Mythical Man-Month):

“Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t usually need your code; it’ll be obvious.”

Data structures embody the essence of our programs and our ideas. Therefore, as programmers, we are constantly inviting innovation in the ease with which we can define expressive data structures to model our application domain.

People often ask me why MongoDB is so wildly popular. I tell them it’s a data structure thing.

While MongoDB may have ridden onto the scene under the banner of scalability with the rest of the NoSQL database technologies, the disproportionate success of MongoDB is largely based on its innovation as a data structure store that lets us more easily and expressively model the ‘things’ at the heart of our applications. For this reason MongoDB, or something very like it, will become the dominant database paradigm for operational data storage, with relational databases filling the role of a specialized tool.

Having the same basic data model in our code and in the database is the superior method for most use-cases, as it dramatically simplifies the task of application development, and eliminates the layers of complex mapping code that are otherwise required. While a JSON-based document database may in retrospect seem obvious (if it doesn’t yet, it will), doing it right, as the folks at 10gen have, represents a major innovation.

will@mongolab

  • http://twitter.com/morningtime_com Morningtime

    Thanks, I’m moving my ecommerce application from MySQL to MongoDB to store product catalogs. Previously this was several dozen joined tables, now just 1 document – moving to Mongo is like going surfing in Hawaii.

    • http://twitter.com/benwen benwen

      Our Hawaii contingent agrees with you. 

    • http://collaborable.com/ Eric Ingram

      There’s a new open source e-commerce platform combined with MongoDB, and it’s available soon: http://getfwd.com

      • http://twitter.com/benwen benwen

        Cool stuff.  I saw getfwd.com a little while ago and thought it was cool.  Tnx for ping!  

  • ajung

    MongoDB is popular because it is used by people who think that it fits their brain but does not.
    MongoDB is popular because it attracts the former PHP/MySQL programmer swamp.

  • Christian

    Your example document structure (and real-life document structures) contains redundant information: author: { name: “Bob Davis”, email : “bob@bob.com” }. If you don’t want to update many documents when the author changes its email, and if you want to save space, you have to do something like joins too.

    • http://twitter.com/benwen benwen

      Flexible schema and document-orientedness don’t free a designer from making schema choices. Even with relational table-oriented solutions at larger scales there’s a normalization vs. partial denormalization design tension. Roughly, that’s a space/atomicity vs. access time trade-off. One must also consider the relative frequency and value of each access pattern. A human at a browser may have only a few seconds of patience, while a large batch analytics job may have overnight latencies yet be worth millions of dollars. YMMV.

  • http://javarevisited.blogspot.com/ Javin Paul

    I have been hearing about MongoDB for a long time, but your blog example makes it quite easy to understand. Thanks for this great post.

  • andrewvc

    The oft quoted impedance mismatch isn’t due to bad software design in RDBMSes, but due to two fundamentally different systems interacting. A database is fundamentally different than an in-memory data-structure. Modeling in mongo can be much harder than it is in SQL in many ways.

    For instance, there’s no way to atomically update the author in your example since it’s so denormalized. For a blog, this probably doesn’t matter, for many other applications (say ecommerce) it very much does. Additionally, more normalized data is frequently much easier to work with, as a single atomic update can do the job of a large crawl/update in an object/document DB.

    Lastly, RDBMSes let you execute extremely powerful ad-hoc queries with very little syntax. AFAIK there’s no equivalent to a PG window function in mongo (and no, just saying you can always use map reduce doesn’t count).

    I do, btw appreciate the list of stuff Mongo is good for (though I see you guys left out stuff like finance as that would definitely go in a list of things Mongo is bad for).

    I would say though, that mongo makes little sense for log data (which is present in that list) unless you plan on always having less log data than memory due to mongo’s use of MMAP for everything. Log data usually needs to be queried, and mongo’s poor handling of on-disk data makes it a bad choice for this.

    • http://twitter.com/benwen benwen

      The claim isn’t that a relational table design is bad. Far from it. Codd et al. are visionary. The tension is that forcing a fully normalized state at rest and then having to constantly move back to the denormalized form is unnecessarily contributing to the heat death of the universe.

      A flexibly schema’d denormalize-able data store lets a designer choose an intermediate point that may be more natural for the problem domain at hand.  As you note, that flexibility in modeling carries with it a burden of choosing the data representation in a well-fit manner.  

      For the blog/author example, choosing a partially normalized model, with an indirection to another collection (in relational-speak, “table”) of authors, is completely acceptable. Part of the art therein is to understand the space/atomicity and access time trade-offs for the most common and most valuable paths. Of course, a relational database can store denormalized data as well; choosing the denormalization point is a practice of the craft.

      What a core relational table physical model lacks here is the per-document (or row) flexibility of a document-orientation. Once denormalized, being able to optionally include, extend, nest, or exclude key/values (or columns) is powerful and liberating. There’s enough rope to hang oneself by overusing that flexibility too, creating an unmaintainable knot of mismatched schema.

      In MongoDB 2.2, the highly anticipated aggregation framework allows for some very interesting grouping and cross-document calculation. I’m not a PG guy, so if you’ll excuse the shallow interpretation, it seems similar to PG windows. More here, with an infographic that may help understanding: http://blog.mongolab.com/2012/07/aggregation-example/

      As for logging, I’d be curious as to your opinion of fluentd, a syslog-like daemon for JSON formatted logs. (Written by our friends over at Treasure Data.) I’d argue that having a key-indexed datastore is more powerful for querying than a straight flat log search or fragile regex mechanism. And in any case, fitting into memory wouldn’t work either for an ad hoc search. Yes, a streaming search or regex would make sense, but then it’s less of a storage / query problem domain. Thank you for the extended comment.

    • Jon

      Hi Andrewvc,

      Could you please speak further to MongoDB and Finance?

      Thanks

  • Kim

    MongoDB is excellent when you have _a lot_ of fairly similarly structured data (like logs..). It pretty much sucks for your normal “enterprise” system with a billion different entities and their relationships. 

    • Patrick Bohan

      I work on an “enterprise” MongoDB database that has lots of heterogeneous types of data, and it works beautifully, it is the exact opposite of “suck”.

      • Julian Reyes E

        Can you help by showing me a technical example? How do you handle relationships, or handle updates/deletes, when the information is denormalized?

  • http://www.facebook.com/benjamin.abbottscott Benjamin Abbott-Scott

    ORM unfortunately does not go away when using NoSQL.  It just bubbles up into the code, where it suffers the slings and errors of outrageous git pushes.

    Way back when, we had these discrete document collections, each referenced by a key, and you could perform inserts, updates, lock data, synchronize across multiple servers, add layers of caching, etc etc… It was called a filesystem.  NoSQL is little more than a directory tree of Storables, with more overhead, and less maintainability.

  • http://profiles.google.com/joshsled Josh Sled

    The quote at the top of your essay is unfairly attributed; ESR was re-phrasing and indeed quoting Brooks, from /The Mythical Man-Month/, Chapter 9.

    • http://mongolab.com MongoLab

      Well, yes and no. Technically correct. But imho there’s a fair bit of modern-day value-add in esr’s edits …

      (from http://www.free-soft.org/literature/papers/esr/cathedral-bazaar/cathedral-bazaar-5.html ):

      *9. Smart data structures and dumb code works a lot better than the other way around.*

      Brooks, Chapter 9: “Show me your [code] and conceal your [data structures], and I shall continue to be mystified. Show me your [data structures], and I won’t usually need your [code]; it’ll be obvious.”
      Actually, he said “flowcharts” and “tables”. But allowing for thirty years of terminological/cultural shift, it’s almost the same point.

  • http://profiles.google.com/bousquet.n Nicolas Bousquet

    Thinking that NoSQL and MongoDB can make you ignore the ORM problem or its equivalent is naive.

    There are several differences between, let’s say, a MongoDB document and a class hierarchy. There is no visibility and no inheritance, for example. Moreover, you won’t have only one entity storing the whole object graph but many of these. Maybe you won’t want to be fully normalized. That’s ok, but the same rules still apply.

    You still need to view your data structures as data structures and not as object models. You still need to understand consistency and concurrent updates, and keep in mind that, depending on how you use your data, you’ll need a totally different model.

    The problem of an ORM or its equivalent is not the objects, it is not the data. It is not the mapping. ORMs work just fine, if you understand them. That is you don’t try to map your complete data model to a complete object model. This will always fail for a non trivial example, even with a NoSQL flavor.

    You need to see your data as data, which may eventually be seen as objects if you are into that. But your objects contain only the data relevant to your current task, and the mapping is not universal but tied to this task. No lazy anything, no inheritance. And the mapping becomes easy and fast. Data problems are modeled in the database (relational or not), and that’s all.

    • Christopher Rueber

      Calling people naive, when you’re using examples of problems that have fairly simple solutions (visibility, inheritance), only makes you look like you’re just arguing to argue, without fully understanding how the solution works/can work.

  • http://www.facebook.com/volodymyr.bilyachat Volodymyr Bilyachat

    I have a short question. Let’s say we have many places where a user can comment and one place where we should show something like a feed. How is it possible to do that? What I am thinking of is just saving duplicates in a “feed” collection: when a user posts a comment, we save it in the original document (say, for a photo or article) and save a duplicate in the feed. Or is there a better way to do it?

  • http://www.facebook.com/skynet8 Akatsuki Sai

    I hope only one language only exist on the future… JAVASCRIPT! =p

    • Patrick Daures

      You must be kidding, right ?

    • Hermansyah Sofyan

      LOL!!! that never happens bro unless u r the only programmer left on earth

  • http://www.facebook.com/skynet8 Akatsuki Sai

    EXPRESS.IO, MONGODB, MONGOOSE, ANGULAR, BOOTSTRAP = FACEBOOK2! XD

    • Randy

      With that you cant be straight , you need to have backbone . : )

      • http://www.facebook.com/skynet8 Akatsuki Sai

        you mean backbonejs? yes, but i’m trying to make it simple and less code although i also think that backbone and angular can be combined.. I just want to experiment if it possible with just few recipe can make a cute mini facebook.. =)

        • sankalp singha

          Add ReactJs to it.. You definitely have the next FB :P

  • Ionut Manolache

    Ok, so do you many-to-many relations as long as there is no “join” in the NoSQL dbs?

    • Ionut Manolache

      *…how do you do….

  • obiwanginobli

    rails as ORM

  • rohan

    I have found a good tutorial to mongoDB.
    http://learnandsharetoall.blogspot.in/2014/01/what-is-mongodb.html

    Hopefully it helps.

  • Akatsuki Sai

    Performance, speed, and an easy-to-manipulate database are MongoDB’s advantages over MySQL. But relations from one table to another, or one collection to another, need a little extra work. In MySQL, deleting a record can automatically cascade to its dependent data, while in MongoDB you need to write code that checks for dependent data before deleting; for example, a user’s posts must be deleted before the account itself. But I’m positive some kind elite programmer out there will make a plugin to solve this problem.. =)

  • Pedro Figas

    I have started to use MongoDB recently, and I can feel that along with its benefits and features, it requires well-thought-out data models and is packed with configuration knobs that need to be understood very well before hitting production (as in, I took their course, read their manuals, and experimented, and I still feel like I don’t grasp it like a bauss).

    What I’d love is a hybrid system! A database system that would somehow join the best of both worlds efficiently, mainly to save roundtrips to separate relational and non-relational databases and provide querying like the Universe has never seen.

    I’m certain that someone has already built a demon of sorts, but I haven’t heard of it (at least without digging a few meters of Google soil). I’m sure the brain of the web will conjure up something rock-solid and community-strong soon.

  • Munier

    I am looking for an article that represents a general structure of object relational database

  • Shalin Siriwaradhana

    I got a question for you can object oriented concepts applied to javascripts?

    • http://dandascalescu.com/ Dan Dascalescu

      I got a question for you do you speak English?

      • http://shalinsiriwardana.asia/ Shalin Siriwaradhana

        Sorry if i make you confused, its a question I had for OOP