Once More Unto the REST

2009-02-01 – 18:26

Two of my consulting course students (Mohammad Jalali and Rory Tulk) are looking at (semi-)automatically building REST APIs for web application frameworks like Django. The idea is to complement the metadata that object-relational mapping tools use to persist objects in databases with metadata that allows URL-mapped interaction over the web. One of their starting points is CherryPy’s “expose” decorator, which binds objects to elements in a URL tree. The students would like to be able to bind classes and constructors as well, so that (for example) /myapp/api/person/GregWilson gets data for the person whose ID is ‘GregWilson’.

But there are questions. First, how do REST web services represent foreign key relationships (including many-to-many) in the data they return to the client? Are dependent objects returned with the original request, or are references returned which must be fulfilled with future requests? ORMs address this problem by embedding lazy, fetch-on-demand references in objects; that works because an additional database query is (comparatively) fast, but an HTTP request is too slow for the same approach to be sensible.
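One way to see the trade-off is to serialize the same foreign key both ways: as a reference the client must follow with a later request, or with the dependent object embedded in the response. This is a minimal sketch; the URL layout and the `href`/`self` field names are assumptions made for illustration, not an established format.

```python
# Sketch: serialize a person's address either as a link (one more request
# later) or embedded (one bigger response now). URL layout and field names
# are illustrative assumptions.
from dataclasses import dataclass
from typing import Dict


@dataclass
class Address:
    address_id: str
    street: str


@dataclass
class Person:
    person_id: str
    name: str
    address: Address


def url(kind: str, object_id: str) -> str:
    return f"/myapp/api/{kind}/{object_id}"


def to_json(person: Person, embed: bool = False) -> Dict:
    """Represent person.address as a link, or embed it when embed=True."""
    body: Dict = {"self": url("person", person.person_id), "name": person.name}
    addr_url = url("address", person.address.address_id)
    if embed:
        # The dependent object travels with the original response.
        body["address"] = {"self": addr_url, "street": person.address.street}
    else:
        # Reference only: the client follows the link if and when it needs it.
        body["address"] = {"href": addr_url}
    return body
```

Embedding saves a round trip but grows every response; linking keeps responses small but costs one request per dereference, which is exactly where the ORM-style lazy approach becomes expensive over HTTP.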

Second, can we uniquely identify object instances based on some identifier (REST URL maybe)? If so, can we re-create the server-side object graph on the client to avoid replication of original server-side data? (For example, if ‘GregWilson’ and ‘AlanTuring’ share an address, we don’t want two copies of the Address object on the client, but just one with two references to it.) Ignoring concurrency for the moment, can we utilize object identity to cache the shared objects on the client side, and only make HTTP requests when absolutely required? If so, can we optimize data return via REST requests based on the clean/dirty state of objects in the client-side cache?
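Using the URL as object identity suggests a client-side identity map: keep at most one object per URL, and issue a GET only the first time a URL is seen. The sketch below assumes links arrive as `{"href": url}` and that `fetch` stands in for an HTTP GET; both are illustrative assumptions, and it ignores concurrency just as the question does.

```python
# Sketch of a client-side identity map: at most one object per resource URL,
# so two people who share an address end up sharing one address dict.
# `fetch` stands in for an HTTP GET; the {"href": url} link shape is assumed.
from typing import Callable, Dict


class IdentityMap:
    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch
        self._objects: Dict[str, dict] = {}
        self.requests = 0  # number of simulated GETs actually issued

    def get(self, url: str) -> dict:
        """Return the cached object for url, fetching it only on first use."""
        if url in self._objects:
            return self._objects[url]
        self.requests += 1
        obj = dict(self._fetch(url))
        self._objects[url] = obj  # register before resolving links (handles cycles)
        # Replace each {"href": ...} link with the shared cached object.
        for key, value in list(obj.items()):
            if isinstance(value, dict) and set(value) == {"href"}:
                obj[key] = self.get(value["href"])
        return obj
```

If ‘GregWilson’ and ‘AlanTuring’ both link to the same address URL, loading both people costs three GETs rather than four, and `greg["address"] is alan["address"]` holds. This version resolves links eagerly; a lazy variant would defer the nested `get` until the field is first read.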

And then there’s the question of bulk operations. If we want to get all of the tickets assigned to ‘GregWilson’, do we use one URL to identify the list of ticket IDs, and then fetch the bodies of the tickets one by one afterward (effectively abandoning atomicity)? The more we look at these issues, the more it looks like what we’re going to wind up with will really just be RPC under another name. Advice and examples would be welcome.
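The two options in the bulk question can be compared side by side: one URL whose response carries every ticket body (cacheable for a while, but possibly a little stale as a group), or one URL for the list of IDs followed by one GET per ticket. This is a sketch against an in-memory store; the URL shapes, the TTL, and the ticket data are all illustrative assumptions.

```python
# Sketch of two ways to read "all tickets assigned to X" from a fake store.
# Option A: one request returns every ticket body; the result is cached for
# `ttl` seconds, so it may lag behind individual tickets.
# Option B: one request for the ids, then one GET per ticket; each body is
# fresh, but they are fetched at different moments (no atomic snapshot).
# The URL shapes, TTL, and store contents are all illustrative assumptions.
import time
from typing import Any, Callable, Dict, List, Tuple

TICKETS: Dict[int, dict] = {
    1: {"id": 1, "owner": "GregWilson", "title": "Fix login"},
    2: {"id": 2, "owner": "AlanTuring", "title": "Review proof"},
    3: {"id": 3, "owner": "GregWilson", "title": "Write docs"},
}


class TicketClient:
    def __init__(self, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl, self.clock = ttl, clock
        self.requests = 0  # number of simulated HTTP GETs
        self._cache: Dict[str, Tuple[float, List[dict]]] = {}

    def _get(self, url: str) -> Any:
        """Stand-in for an HTTP GET against the in-memory store."""
        self.requests += 1
        if url.startswith("/tickets?owner="):
            owner = url.split("=", 1)[1]
            return [dict(t) for t in TICKETS.values() if t["owner"] == owner]
        if url.startswith("/ticket-ids?owner="):
            owner = url.split("=", 1)[1]
            return [t["id"] for t in TICKETS.values() if t["owner"] == owner]
        return dict(TICKETS[int(url.rsplit("/", 1)[1])])  # /ticket/<id>

    def tickets_whole(self, owner: str) -> List[dict]:
        """Option A: one request, reused until the TTL expires."""
        url = f"/tickets?owner={owner}"
        hit = self._cache.get(url)
        if hit is not None and self.clock() - hit[0] < self.ttl:
            return hit[1]
        bodies = self._get(url)
        self._cache[url] = (self.clock(), bodies)
        return bodies

    def tickets_by_id(self, owner: str) -> List[dict]:
        """Option B: 1 request for the ids + 1 request per ticket."""
        ids = self._get(f"/ticket-ids?owner={owner}")
        return [self._get(f"/ticket/{i}") for i in ids]
```

With two tickets assigned to ‘GregWilson’, option A costs one request per TTL window, while option B costs three requests every time.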

  1. 3 Responses to “Once More Unto the REST”

  2. Excellent questions, and I think REST offers no obvious/easy answers (not to say REST-based efforts can’t or shouldn’t be asking these exact questions). We’ve built a heavily Atom/AtomPub-oriented system here at UT Austin that addresses some of these issues.

    We represent foreign key relationships with atom:link: one entity (which we call the child) has an atom:link@rel=http://example.com/rel/parent with an href= of “parent”. Parent has an atom:link@rel="" and href="". That’s the syntax. On the back end, each “collection” of items defines a number of content types. Each item is declared to be of one item type, and types can have parent-child relationships (typically one-to-many, though two types can each be both parent and child of the other, thus creating many-to-many). So these “types” are essentially analogous to OO classes or RDBMS tables. An item can be requested as an entry, in which case you get only the links, or as a feed, in which case the links are resolved and you get everything as an atom:feed, with the primary object first in the feed. POST, PUT, and DELETE operations are hip to these relationships and will do the right thing à la AtomPub (e.g., do a GET, remove the link@rel=parent, PUT it back, and that item no longer has that relationship). It sounds pretty complicated in my summary, but now that we have it in place, building apps on top is not too difficult.

    As far as the object graph goes, I’d simply say server-side caching is your friend and HTTP requests are cheap (ha, easy for me to say). We tend to cache heavily (on the server) and simply request what we need when we need it. That may not be the best way. We have also had pretty good luck creating an object graph in the DOM (using JSON from the server), and I think more exploration there would be worth it. I have found XML remarkably difficult to deal with in the browser, so for ajaxy things we use a JSONified Atom format, which we translate back into application/atom+xml with JavaScript templates (pretty easy) before sending it to the server once updates have been made at the UI level.

    For bulk operations, Atom/AtomPub has feeds (for GET operations), and I have also found text/uri-list to be a handy media type when, for instance, I need to delete a series of things: do a GET to retrieve the list, then a DELETE on each.

    I don’t know if that addresses your exact questions, and it is, of course, just one approach. Another thing you are bound to run into is REST’s lack of (explicit) support for partial updates. There was an interesting thread on rest-discuss recently about that. In fact, I’ll bet if you post your questions to rest-discuss you’ll get a bunch of well-considered answers.

    In case you’re interested, our project is at dase.googlecode.com

    –peter keane

    By Peter Keane on Feb 1, 2009

  3. Looks like some text was stripped from my comment (I shouldn’t have used angle brackets). Should read:

    We represent foreign key relationships with atom:link: one entity (which we call the child) has an atom:link@rel=http://example.com/rel/parent with an href={atom:id or dereferenceable URL} of “parent”. Parent has an atom:link@rel=“{type of the child}” and href=“{query that returns all items of child type, filtered by has-this-id-as-parent}”.

    By Peter Keane on Feb 1, 2009

  4. Q: First, how do REST web services represent foreign key relationships (including many-to-many) in the data they return to the client?

    A: Hyperlinks.

    Q: Second, can we uniquely identify object instances based on some identifier (REST URL maybe)?

    A: Yes. You must.

    Q: If so, can we re-create the server-side object graph on the client to avoid replication of original server-side data? (For example, if ‘GregWilson’ and ‘AlanTuring’ share an address, we don’t want two copies of the Address object on the client, but just one with two references to it.)

    A: Hyperlinks. And caching.

    Q: Ignoring concurrency for the moment, can we utilize object identity to cache the shared objects on the client side, and only make HTTP requests when absolutely required?

    A: If you don’t, you’re losing the biggest benefit of a REST architecture.

    Q: If so, can we optimize data return via REST requests based on the clean/dirty state of objects in the client-side cache?

    A: Yes. Follow the uniform interface and use PUT to update resource state, which will invalidate intermediate caches.

    Q: And then there’s the question of bulk operations. If we want to get all of the tickets assigned to ‘GregWilson’, do we use one URL to identify the list of ticket IDs, and then fetch the bodies of the tickets one by one afterward (effectively abandoning atomicity)? The more we look at these issues, the more it looks like what we’re going to wind up with will really just be RPC under another name. Advice and examples would be welcome.

    A: Depends. Read about Eric Brewer’s CAP conjecture (Consistency, Availability, Partition tolerance: pick any two) for background. If the tickets-assigned-to-Greg-Wilson resource can be a little bit out of sync with individual ticket details, then return the whole shebang in one request and use timeouts to invalidate it from caches. If it cannot, then return a list of IDs and fetch each one independently. Note that both approaches have consistency issues: inter-group in the former and intra-group in the latter. As Pat Helland says, in networked architectures, “System-B can only see what some of System-A’s data used to look like.”

    By Robert Brewer on Feb 2, 2009
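Peter Keane’s corrected link syntax (child points at parent with rel=http://example.com/rel/parent; parent points at a query over its children, with rel naming the child type) can be sketched with Python’s standard-library ElementTree. The rel URI comes from his comment; the IDs, the child type name “ticket”, and the query URL are placeholders. Real Atom entries also need atom:title and atom:updated (RFC 4287), which are omitted here for brevity.

```python
# Sketch of the atom:link scheme from Peter Keane's corrected comment, built
# with the standard library's ElementTree. The rel URI is from the comment;
# the IDs, the child type name "ticket", and the query URL are placeholders.
# Real Atom entries also need atom:title and atom:updated (RFC 4287).
import xml.etree.ElementTree as ET
from typing import List, Tuple

ATOM = "http://www.w3.org/2005/Atom"
PARENT_REL = "http://example.com/rel/parent"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def entry(atom_id: str, links: List[Tuple[str, str]]) -> ET.Element:
    """Build an atom:entry with an atom:id and one atom:link per (rel, href)."""
    e = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(e, f"{{{ATOM}}}id").text = atom_id
    for rel, href in links:
        ET.SubElement(e, f"{{{ATOM}}}link", rel=rel, href=href)
    return e


# Child: rel=parent, href = the parent's atom:id (or a dereferenceable URL).
child = entry("http://example.com/ticket/42",
              [(PARENT_REL, "http://example.com/person/GregWilson")])

# Parent: rel = the child's type, href = a query for all items of that type
# whose parent is this item (the query syntax here is hypothetical).
parent = entry("http://example.com/person/GregWilson",
               [("ticket", "http://example.com/ticket?parent=GregWilson")])

print(ET.tostring(child, encoding="unicode"))
```

Removing a relationship in this scheme is ordinary AtomPub editing: GET the child entry, drop its rel=parent link, and PUT the entry back.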
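Peter Keane’s bulk-delete pattern (GET a text/uri-list, then DELETE each entry) is easy to sketch with the standard library. Per RFC 2483, a text/uri-list body holds one URI per line, CRLF-separated, and lines beginning with “#” are comments. The requests below are built with `urllib.request.Request` but not sent; the example URLs are placeholders.

```python
# Sketch of "GET a text/uri-list, then DELETE each entry". Per RFC 2483 the
# body holds one URI per line (CRLF-separated); lines starting with "#" are
# comments. Requests are built but not sent; urlopen(req) would send one.
from typing import List
from urllib.request import Request


def parse_uri_list(body: str) -> List[str]:
    """Return the URIs in a text/uri-list body, skipping comments and blanks."""
    uris = []
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


def delete_requests(body: str) -> List[Request]:
    """One DELETE per listed URI: N separate requests, with no atomicity."""
    return [Request(uri, method="DELETE") for uri in parse_uri_list(body)]
```

Because each DELETE is a separate request, a failure partway through leaves some items deleted and others not, which is the atomicity concern raised in the original post.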
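The clean/dirty question and Robert Brewer’s advice to update state with PUT fit together through HTTP validators: a clean cached object can be revalidated cheaply with If-None-Match (a 304 carries no body), and only dirty objects are PUT back, with If-Match so that a stale write fails with 412 instead of overwriting someone else’s change. This is an in-memory sketch; `FakeServer` stands in for real HTTP, and hashing the JSON body to make an ETag is an assumption, not a requirement of HTTP.

```python
# In-memory sketch of HTTP validators. FakeServer stands in for real HTTP;
# its status codes follow HTTP semantics: 304 Not Modified when the client's
# If-None-Match ETag is current, 412 Precondition Failed when an If-Match
# ETag on a PUT is stale. ETags here are hashes of the JSON body (assumption).
import hashlib
import json
from typing import Dict, Optional, Tuple


def etag_of(body: dict) -> str:
    data = json.dumps(body, sort_keys=True).encode()
    return '"' + hashlib.sha1(data).hexdigest() + '"'


class FakeServer:
    def __init__(self) -> None:
        self.store: Dict[str, dict] = {}

    def get(self, url: str, if_none_match: Optional[str] = None
            ) -> Tuple[int, Optional[dict], str]:
        tag = etag_of(self.store[url])
        if if_none_match == tag:
            return 304, None, tag  # client copy is current: send no body
        return 200, dict(self.store[url]), tag

    def put(self, url: str, body: dict, if_match: Optional[str] = None
            ) -> Tuple[int, Optional[str]]:
        current = self.store.get(url)
        if if_match is not None and (current is None or etag_of(current) != if_match):
            return 412, None  # someone else changed it since we read it
        self.store[url] = dict(body)
        return 200, etag_of(body)


class CachedResource:
    """Client-side copy of one resource, with a clean/dirty flag."""

    def __init__(self, server: FakeServer, url: str) -> None:
        self.server, self.url = server, url
        self.body: Optional[dict] = None
        self.etag: Optional[str] = None
        self.dirty = False

    def refresh(self) -> int:
        """Conditional GET; a 200 replaces the body (discarding unsaved edits)."""
        status, body, tag = self.server.get(self.url, if_none_match=self.etag)
        if status == 200:
            self.body, self.etag, self.dirty = body, tag, False
        return status

    def set(self, key: str, value: object) -> None:
        if self.body is None:
            raise RuntimeError("call refresh() before modifying")
        self.body[key] = value
        self.dirty = True

    def save(self) -> Optional[int]:
        """PUT only when dirty; return the status, or None if nothing was sent."""
        if not self.dirty:
            return None
        status, tag = self.server.put(self.url, self.body, if_match=self.etag)
        if status == 200:
            self.etag, self.dirty = tag, False
        return status
```

A second `refresh()` on an unchanged resource returns 304, `save()` on a clean object sends nothing, and a `save()` after another client has changed the resource returns 412.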
