Go On, Convince Me

2007-06-12 – 20:54

(Posted on behalf of Alex Krizhevsky.)

I and a few other students have spent the last four weeks working on DrProject, a project management tool derived from Trac, and written in Python, that has been under development at the University of Toronto since 2002. We are here for the entire summer under the supervision of Karen Reid and Greg Wilson, whom we like so much that a small movement (OK, just me) has emerged with the noble goal of making the stuffed likeness of Greg DrProject’s official mascot of.

One of the tasks with which I have not been particularly involved is porting the database code in DrProject from Trac’s built-in object-relational mapper (ORM) to one that is becoming somewhat of a standard for Python applications, called SQLAlchemy. So what have I learned? I’ve learned that there exist these things called object-relational mappers. But I should have known about them earlier, because my dad spent the last few years writing one in Java (he was unhappy with Hibernate, the most popular Java ORM/persistence framework).

I’m still having trouble convincing myself that these things offer a distinct advantage over plain-old SQL, at least when we have to write the object definitions by hand. That’s a pain. Why shouldn’t the objects be generated from the table definitions? The table definitions are right there. ORMs like SQLAlchemy make the code harder to modify, too, as any change in the database definition must be (manually) reflected in the object definitions. (Unless you use the ORM to define the database. But it seems like a database should be independent of the application accessing it. And defining the database with an ORM is redundant if you have more than one application accessing the database. Plus it forces you to learn a new database definition language.)

And another thing! SQL is much more portable than ORMs. ORMs are vastly different, and they each have their own syntax. But SQL is, for the most part, standardized. Switching to SQLAlchemy is taking us quite a bit of effort. But switching from MySQL to PostgreSQL, for example, would involve much less manual labour. It would be easier on our backs, and we’d live longer. But I concede that this is somewhat of an apples-to-oranges comparison.

And a third thing! I don’t trust the SQL generated by my ORM. Who knows what it’s doing? Is it running one query or 100? (see: N+1 selects problem.) Is the query efficient?

I get this much: the reason objects are nice is because they have types. You can access a field of an object mapped to a database table and it will be a date or an int or whatever it’s supposed to be. In Java this is a real benefit, because the only alternative is to run an SQL query and get back an Object[] with a value for each column. Then, you litter your code with casts. Such code quickly becomes very ugly and hard to read and love. But Python doesn’t have this problem. It’s untyped.

Here’s another pro-ORM argument I get: lots of very short, trivial SQL statements littered throughout the code make its purpose hard to understand. I agree with this, and here I see a clear advantage of ORMs. But often the queries get complicated, and then we’re back to writing complex SQL, except with objects. This is insanely stupid, and if I weren’t so in control of my emotions I’d be inclined to say something more emotional here. But since I’m not emotional, I don’t need to calm down. So on to the next sentence: it can be argued that such code is always the result of a poor mapping between objects and database tables. A mapping that doesn’t capture all of the relationships. Maybe. Sometimes. I still suspect that complex queries are at times unavoidable. Like I said, I haven’t been involved with a lot of ORM code. But anyone can blog. It’s a whole Internet universe of mediocrity, and I’m participating!

So here’s a final mediocre thought: ORMs that generate object definitions might be worthwhile, I’m not sure. But ORMs like SQLAlchemy that force us to write our own object definitions are stupid and should not be used. I’d give more mediocre reasons, but this blog entry is already long enough. But please, convince me that they are totally awesome.

  1. 4 Responses to “Go On, Convince Me”

  2. 1. http://erosson.com/migrate/trac/
    2.0. MySQL vs PostgreSQL, maybe. Try building a compatibility layer between MySQL and SQLite.
    2.1. Code reuse. Define your model once, and have all your favourite modules use it and the common codebase for accessing the database… If you’re going to say “I could do that anyways by writing a central library”, then you’re just reinventing what SQLAlchemy provides.
    2.2. Plus, it’s fun to tell it to join five different tables and having it know exactly what keys to join on. Or better yet, let it figure out on its own that some joins are required in order to get the data you need.
    3. engine.echo = True, or get familiar with what each method does and you’ll know how it’s going to build the query.

    There’s a ton of other reasons, many of which are listed on the front page: http://www.sqlalchemy.org/

    By Andrey on Jun 12, 2007

  3. Well, I’ll speak from the limited experience that I have: a couple (5 or 6) little PHP scripts that use SQL, three little Alchemy projects and then DrProject.

    When I think back and compare the ordeal I went through with my first couple PHP/MySQL scripts to the hack jobs that were my first couple Alchemy apps, the clear winner is raw SQL. It doesn’t take much time to learn enough SQL to get by, and it usually does about what you expect it to.

    Compare that with Alchemy, which has a steeper (although not altogether terrible) learning curve and makes you as a programmer feel terrible because you have no idea what on Earth is happening, things just magically work (most of the time).

    But, from my experience with Drproject, there is one huge advantage that ORMs have over SQL when you’re working with a large project: the database is one less thing you have to think about.

    When I’m coding, I don’t want to think about database schemas, table and row names, left joins and select unions. I just want an object with data. I want to know is that a Ticket object has an owner, and that owner is a Person, and if I change the owner, the database will change too.

    Now, I suppose you could write some methods in the object that let you create the object from the database, so the “end developer” doesn’t need to think in SQL… But isn’t that what Alchemy does?

    It also strikes me as a BadThing™ that the code knows about the inner workings of the database… But I can’t think of any good examples off the top of my tired head.

    Maybe ask your dad?

    By David Wolever on Jun 12, 2007

  4. Hi,
    so I won’t convince you that ORMs are awesome, I am just going to address some of the issues you mentioned here that made you think otherwise. I’ve only worked with SQLAlchemy and I heard that it is one of the best ones out there.

    So, to address your first point, you can load the tables from the db into an object without having to define an object to do that by using autoLoad=True when creating the Table object. The only problem with this is that you have to stick to the names of fields used in the db tables to access the object’s attributes.

    In regards to portability, I think one main advantage of using ORMs is portability. If you had used an ORM from the start, the switch from one DBMS to another is just a change in a few lines of code. I have used SQLAlchemy in the past and didn’t really find the need to switch to another ORM, so it’s highly unlikely you will need to switch to another ORM which took the most effort in your case.

    I can not agree with you more on not trusting the SQL generated by ORMs. I don’t think it is query efficient. I also hate the fact that you can’t express complex queries, which you will have to write! SQLAlchemy gives you the option of executing plain SQL queries (some execute() method) but I would have liked to stick to only one method.

    An advantage of ORMs is that they make your code more readable and for a developer who is not too experienced with SQL, generating SQL queries is like calling object methods so it relies on familiar concepts.

    By Lilian Hanna on Jun 13, 2007

  1. 1 Trackback(s)

  2. Jul 5, 2007: www.crankycoder.com » Blog Archive » Databases == PAIN

Post a Comment