Polymorphism

2009-12-23 – 15:49

Yesterday, I asked if anyone was building something like Fossil using a NoSQL database.  In response, someone named Pete (who didn’t leave a real email address) asked, “Why NoSQL? This is a perfect application for an SQL database.”  Respectfully, I disagree, but my reason will take a little explaining.

Over the past 16 months, several waves of students have been working with me on a replacement for Trac called Basie. Like Trac, Basie is meant to be a minimalist software forge: it combines version control, ticketing, wikis, and the like into one package, but is much simpler than the open source and commercial forges that Jordi Cabot and I surveyed. Unlike Trac, Basie is built using modern web tools (Django and jQuery), and supports multiple projects per forge and per-project mailing lists out of the box.

We’ve dealt with quite a few design challenges while building Basie, and have a few more piled up to worry about in January; see, for example, Ian Lienert’s post about deleting vs. hiding, or Andrew Schurman’s look at why integrating with IRC is hard (short answer: channel management). Many of the hardest challenges, though, have a common root cause: relational databases don’t support polymorphism. Take tagging, for example: in order to find all items tagged with “upgrade”, we have to issue multiple queries and aggregate their results, because the entities that have been tagged are stored in separate tables. Not only does this hurt performance and make the code (much) harder to understand, it also makes plug-and-play extensibility a lot harder, since anyone who wants to add a new module to Basie has to either edit the tagging code to reference that new module, or wrestle with some not-yet-implemented registration and callback mechanism that moves the grief out of Django’s ORM (where it belongs) and into pure Python code.
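To make the tagging problem concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, columns, and the `TAGGABLE` list are all hypothetical; they are not Basie's actual schema, only an illustration of the one-query-per-table pattern described above.

```python
import sqlite3

# Hypothetical schema: each taggable entity lives in its own table,
# with its own separate tag table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE wiki_page (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE ticket_tag (ticket_id INTEGER, tag TEXT);
    CREATE TABLE wiki_page_tag (wiki_page_id INTEGER, tag TEXT);
    INSERT INTO ticket VALUES (1, 'Upgrade Django');
    INSERT INTO wiki_page VALUES (1, 'Upgrade notes');
    INSERT INTO wiki_page VALUES (2, 'Style guide');
    INSERT INTO ticket_tag VALUES (1, 'upgrade');
    INSERT INTO wiki_page_tag VALUES (1, 'upgrade');
""")

# Every taggable entity type has to be listed here by hand, so a new
# module cannot become taggable without editing this code.
TAGGABLE = [
    ("ticket", "ticket_tag", "ticket_id"),
    ("wiki_page", "wiki_page_tag", "wiki_page_id"),
]

def find_tagged(conn, tag):
    """Run one query per entity table and stitch the results together."""
    results = []
    for table, tag_table, fk in TAGGABLE:
        query = (
            f"SELECT e.id, e.title FROM {table} AS e "
            f"JOIN {tag_table} AS t ON t.{fk} = e.id "
            f"WHERE t.tag = ? ORDER BY e.id"
        )
        for row_id, title in conn.execute(query, (tag,)):
            results.append((table, row_id, title))
    return results

tagged = find_tagged(conn, "upgrade")
```

The aggregation happens in Python rather than in the database, and the hard-coded `TAGGABLE` list is exactly the piece a new module would have to edit or register with.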

We ran into the same problem building the status dashboard, where we wanted one module (the dashboard) to be able to ask questions of others without knowing exactly what those others were or how they were implemented. This is trivial in a programming language that provides polymorphism (as almost all modern languages do), but there’s no standard, straightforward way to do it with SQL.
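For contrast, this is roughly what the dashboard case looks like in plain Python. The class names and the `status_summary` method are assumptions made up for this sketch, not Basie's real modules; the point is only that the caller never needs to know which concrete types it is talking to.

```python
class Tickets:
    """Hypothetical ticketing module."""
    def __init__(self, open_count):
        self.open_count = open_count

    def status_summary(self):
        return f"{self.open_count} open tickets"


class Wiki:
    """Hypothetical wiki module."""
    def __init__(self, pages_edited):
        self.pages_edited = pages_edited

    def status_summary(self):
        return f"{self.pages_edited} pages edited this week"


def render_dashboard(modules):
    # The dashboard only relies on each module having status_summary();
    # it does not import or name any concrete module type.
    return [module.status_summary() for module in modules]


lines = render_dashboard([Tickets(3), Wiki(5)])
```

Adding a third module means writing one more class with a `status_summary` method; `render_dashboard` stays untouched.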

Hence my interest in NoSQL databases. What they’re explicitly doing is setting aside the “rows and columns” model in favor of—well, in favor of a bunch of different things, depending on which one we’re talking about. But in doing that, they’re sort-of-accidentally making a lot of other innovations possible. After all, if PostgreSQL came out with some kind of polymorphism extension, I probably wouldn’t use it, because I wouldn’t want to be tied to any one relational database. If I choose to use MongoDB or CouchDB, though, I’m committing to a single-source solution anyway, so why not make full use of everything it offers?  Simon Willison (who knows much more about all of this than I do) made a similar point in July when asked how hard it would be to get Django running on top of MongoDB:

I remain sceptical of projects that attempt to map Django’s extremely relational ORM to non-relational backends. Why would you want to do this in the first place? Presumably because you want to use parts of the Django ecosystem - in particular the admin, generic views and pagination - with a different persistent store.

I would argue that you don’t want a ORM backend for MongoDB - instead, you want the admin, generic views and pagination to work with alternative storage mechanisms. Instead of depending directly on the ORM, they should make use of an abstract interface which can be mapped to the ORM but can also map to other types of persistent store.
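One way to read that suggestion is sketched below: pagination written against a small abstract interface rather than against the ORM. Everything here (`RecordSource`, `ListSource`, `paginate`) is a hypothetical name invented for illustration; none of it is Django's actual API.

```python
from typing import Protocol, Sequence


class RecordSource(Protocol):
    """The only two operations pagination needs from any persistent store."""
    def count(self) -> int: ...
    def fetch(self, offset: int, limit: int) -> Sequence[dict]: ...


class ListSource:
    """Stand-in for a non-relational store: a plain in-memory list of dicts."""
    def __init__(self, records):
        self._records = list(records)

    def count(self):
        return len(self._records)

    def fetch(self, offset, limit):
        return self._records[offset:offset + limit]


def paginate(source: RecordSource, page: int, per_page: int = 2):
    """Return one page of results from any object satisfying RecordSource."""
    # Ceiling division, with a minimum of one (possibly empty) page.
    total_pages = max(1, -(-source.count() // per_page))
    if not 1 <= page <= total_pages:
        raise ValueError(f"page {page} out of range 1..{total_pages}")
    items = source.fetch((page - 1) * per_page, per_page)
    return {"page": page, "total_pages": total_pages, "items": list(items)}


docs = ListSource([{"title": "a"}, {"title": "b"}, {"title": "c"}])
page2 = paginate(docs, page=2)
```

An adapter over the ORM would implement the same two methods, so `paginate` would not need to know which kind of store sits behind it.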

See also this article from John Nunemaker.

  1. 10 Responses to “Polymorphism”

  2. I thought it would be worth pointing out that SQLAlchemy does polymorphism wonderfully. I’ve used it in several projects with great success:

    http://www.sqlalchemy.org/docs/05/mappers.html#mapping-class-inheritance-hierarchies

    By Travis Bradshaw on Dec 24, 2009

  3. This is a very interesting take on no sql dbs. I like the idea. I’ll also take a look at Basie.

    By Jorge Vargas on Dec 24, 2009

  4. Have you tried zodb?

    There are lots of tools that use zodb, and you can even use sql dbs as a backend if you like (with relstorage).

    Personally I’m using a very simple key/value store (see http://renesd.blogspot.com/2009/12/pywebsitesqlitepickle-sqlite-vs-pickle.html), since zodb does not support python3 and there seems to be no roadmap for doing so.

    By Rene Dudfield on Dec 24, 2009

  5. @Travis Thanks for the pointer — nice to have the hackery hidden under an ORM layer, but I’d still like a DB that made the hackery unnecessary :-)

    By Greg Wilson on Dec 24, 2009

  6. This post has been itching my brain all evening, to the point of losing sleep. There are three things here to me:

    1. NoSQL - exactly as Greg says: if you’re going to use this stuff, abstracting away the platform doesn’t make sense.

    2. *SQL - for things that SQL does well, one probably should continue to use it even in a NoSQL application, e.g. account management. And there’s no point here complaining that you can’t do joins from the *SQL to NoSQL world (e.g. again, to map say a user account into NoSQL rows), since normalization, joins, etc are what you gave up to gain the NoSQL features.

    3. Common stuff - independent of whether one is doing *SQL or NoSQL applications, or hybrids, there’s things that are common to both/all: paging through results (row level stuff), displaying results in templates (row data stuff), and so forth. Here’s where we’d like to see common code, no matter what the back end that’s providing the data. Unfortunately, there’s probably very few people with the synthesis knowledge and reputation to adapt Django to this right now.

    By David Janes on Dec 24, 2009

  7. Greg, Fossil *is* a NoSQL database - albeit one that is specialized to the particular task of version control, not a general purpose library designed to build other applications around. Fossil is not tied to SQL or SQLite. The current Fossil implementation uses SQLite as its local data store and as a high-level scripting language to simplify the implementation, but you could very easily reimplement Fossil using any old key/value pair or pile-of-files approach for the local data storage - you’d just need to work a little harder and write a bunch more code. SQLite is used in Fossil because it helps make the job easier and it was readily at hand. Nothing in the file format specification for Fossil says anything about SQL, relations, or tables. Fossil is in no way designed around SQL or SQLite.

    But your blog does bring out the fact that I need to do a better job of explaining the concepts behind Fossil and perhaps even use the word “NoSQL” someplace in the Fossil documentation. I’ll work on that after the Christmas break….

    By D. Richard Hipp on Dec 25, 2009

  8. I think there’s a conflating problem, which is the habit of ORMs to encourage programmers to use ORM classes to hold model logic. It’s currently a commonly accepted “good practice” but I disagree with it.

    I think apps requiring great flexibility should make more frequent use of the State pattern, delegating persistence to ORM objects but retaining model logic. I think the additional layer of indirection affords the flexibility needed to solve problems like the tagging problem, because a domain object like a bug report can delegate to more than one ORM object, and be easily decorated by things like “Taggable,” when you want to avoid having multiple tag namespaces.

    On a separate note, I’m not sure I follow your logic about not using e.g. Postgres’s full power. If Postgres had an extension that fixed your problem, why would it be worse to be tied to it than it is to be tied to one specific NoSQL program?

    By Aran on Dec 26, 2009

  9. Aran:

    I think part of the attraction of using SQL is that you can switch backends, so you don’t want to use Postgres-specific features because that would tie you down.

    On the other hand, once you’ve picked, say CouchDB, there’s really nothing else you could easily switch to, so you might as well take full advantage of CouchDB’s power. (And we’re not going to mention anything about not _actually_ being able to easily switch to a different SQL backend. ;)

    By Blake Winton on Jan 2, 2010

  10. As promised, additional thoughts on why Fossil is
    a NoSQL database:

    http://www.fossil-scm.org/fossil/doc/tip/www/theory1.wiki

    By D. Richard Hipp on Jan 5, 2010

  11. You might be interested in this project which wants to add support for non-relational DBs in Django’s ORM:

    http://bitbucket.org/wkornewald/django-nonrel-multidb/

    By Waldemar Kornewald on Jan 8, 2010
