Old Dogs Are Suspicious of New Tricks
March 30, 2008 – 6:47 amThere was a brief flurry of email on the DrProject list this week about using JSON instead of XML for communication between clients (in our case, browsers and the occasional client-side script) and servers. The younger members of the team were excited about the advantages: less typing, easier to read, faster to convert to/from data structures, and shiny newness. I said “no” because of the disadvantages: every implementation works slightly differently (XML may be broken, but at least it’s broken the same way everywhere), and it introduces a serious security risk [1] — I might not be able to think of a way to exploit it if all connections are HTTPS, but I’m just one guy, and I’d rather not require DrP’s users to bet that I’m smarter than all the villains in the world.
The whole discussion was an interesting reminder of how my priorities have shifted over the last ten years. I would have been on the juniors’ side in the mid-1990s; hell, back then, I still toyed with the idea of teaching people Scheme as a first language. I guess part of growing old as a technologist is caring more about not stepping on traps than about missed opportunities…
[1] There are two ways to process JSON: use eval(), or parse it. If you’re using eval(), you are taking the risk that someone has embedded function calls in the “data” they’re sending you, in which case you just handed them the keys. If you’re parsing, then what’s the advantage over XML, for which there are many well-tested out-of-the-box parsers?
18 Responses to “Old Dogs Are Suspicious of New Tricks”
I’ve never been that convinced by the security argument against JSON. If you’re talking to your own server (which the same-origin restriction enforces) then someone needs to be able to modify the data you are sending out in order to introduce malicious code. If they can mess with your JSON stream they can certainly mess with the HTML or JavaScript you are already serving up, so there’s no additional threat.
By Simon Willison on Mar 30, 2008
I think your argument for well-tested xml parsers is a bit moot since there are oodles of well-tested json parsers available as well. Unless you’re wiring in your own XMLHttpRequests and doing all your AJAX manually (in which case I’d be screaming why!?) whatever javascript library you’re using surely has a safe JSON parser. I generally use mootools but I’m sure dojo and prototype have something very similar.
When doing an ajax request in mootools you do new Request and pass the parameters. That lets you return random stuff (such as a blog of html for example) but if you’re working with JSON you can do new Request.JSON which will parse the returned structure for you on the client side before passing it along to the rest of your program. If it’s broken or unsafe you get nothing back from your request.
Writing a parser for JSON is almost impossible to get wrong too since what makes JSON so great is how incredibly simple and consistent it is.
By Guillaume Theoret on Mar 30, 2008
I guess the simplest way to sum up my reasoning is that, “If you manage to find a way past the outer wall, I’ve left the doors unlocked” is less secure than the alternatives.
By Greg Wilson on Mar 30, 2008
I should also mention that Yahoo! (my former employer) switched to JSON as their client-to-server transport pretty much exclusively several years ago - and they take web security seriously enough that there’s an entire arm of the company called “the paranoids” dedicated to researching and preventing exploits.
By Simon Willison on Mar 30, 2008
My biggest argument against XML is the ambiguity.
When you send data over XML, you’re not just sending data but you’re sending an entire culture along with it. And you need to be familiar with said culture to parse it.
What elements did they put in individual nodes? What elements are attributes? What elements are unique subnodes? What elements are redundant subnodes? What types are expected in each node type? Are the types defined in the attributes?
At that point, you get into having to use DTD’s and WSDL, with some sort of SOAP-ish container.
As you can see, “parsing XML” isn’t just parsing XML. There’s a whole slew of assumptions about conventions you have to make, or parse half a dozen other formats to determine, before you can parse any XML.
I’m sure you’ll agree with me, there is some *terrifying* XML out there. In fact, most of it is.
JSON is much simpler and less prone to this kind of ambiguity. Its ambiguity is captured in the parsers instead of the language itself. I think that’s a better trade off than the weeks of developer time it takes to put up with XML over the lifespan of a project.
By Andrey Petrov on Mar 30, 2008
It’s also worth mentioning that JSON is convenient. In XML, everything is a string, which is not amazingly useful. JSON at least gives you ints, strings, floats, objects, and arrays out of the box.
JSON is useful even if you’re not using eval as your parser. As long as you don’t use eval, security is not problematic.
By Kevin Dangoor on Mar 30, 2008
Greg,
I am in the old dog camp myself. Experience has taught me that given a gameplan, stick with it. There is always time to experiment later.
But I do have a query. Your assessment of JSON and the eval() statement though correct I think points you on the short track. My bigger observation would be is JSON the real issue? Why aren’t you assuring yourself that you are using trusted sources first? If you are not then don’t assume that XML is going to save your bacon.
Just a thought.
By JohnMc on Mar 30, 2008
> If you’re parsing, then what’s the advantage [of JSON] over XML
Simplicity, both of the specification and of the data stream.
As a programmer examining or editing code that uses/produces JSON, I can fit the entire specification in my head, which is very valuable while developing. The same is not the case for XML.
As a debugger or tester, I can examine or manually modify a JSON data stream and see quite easily how it should be parsed. The same is not true of XML.
By bignose on Mar 30, 2008
@Everyone: OK, maybe I’m wrong about this one—it’s happened before :-). That said:
@Andrey: JSON isn’t really less ambiguous—you still have to know what that list of lists of dictionaries *means*. As Blake Winton pointed out in email, parsing JSON still depends who’s doing the parsing.
@Kevin: I agree, if you don’t eval, it’s just as secure as XML. In that case, we’re arguing over the runtime and coding costs of one parsed representation vs. another.
@Bignose: I agree, JSON is a lot easier to read. I suspect this is the main reason it caught on, and other justifications are just rationalizations. (FWIW, I think it’s a pretty good reason…)
@Simon: Interesting to know that Yahoo has switched. Are they using eval() client-side, or parsing?
By Greg Wilson on Mar 31, 2008
Yahoo! use eval() to parse, but they run the JSON string through a regular expression to check that it matches JSON syntax first:
http://developer.yahoo.com/yui/docs/JSON.js.html
The same technique is described with more detailed inline comments here:
http://www.json.org/json2.js
By Simon Willison on Mar 31, 2008
Great conversation… I must acknowledge that I am not a programmer… Though, one comment caught my attention.
“When you send data over XML, you’re not just sending data but you’re sending an entire culture along with it. And you need to be familiar with said culture to parse it.” Andrey Petrov
My question here is: Is this not a fundamental part of developing the ’semantic web’: to create well defined definitions (DTD’s) - and standards (where possible) - of the complexities (the culture) involved in contribution & integration??
Another reasons that I like the concept of XML is the ability to utilize non-exponentiated parsing (PureXML in DB2 Viper for example) reducing the need for big math and encouraging mid-tier processing: granted this may come at a cost of greater demands on the I/O stream… But, this is a balance that can be managed through software design.
SO, it seems to me that the ambiguity referred to in XML goes back to the age old debate of simplicity versus flexibility. Thoughts??
All corrections are welcomed.
By Shawn Berney on Mar 31, 2008
Greg said “@Kevin: I agree, if you don’t eval, it’s just as secure as XML.”
Well, not quite…
My malicious page can include and then whenever someone who has logged in to your site views my page, Shazam! I get access to that data. Since it’s a script tag, it’s not subject to the same-origin-policy, and since I can override the constructor of Object, you don’t even need to assign the result to a variable. I am a little surprised that no-one else has mentioned this, but I guess normal people don’t think like Adam. (And I mean that as a compliment.
Since XML isn’t executable by anyone, you can’t pull the same trick. At least, I don’t think you can. Perhaps if there was some way to get at the data of an image tag…
By Blake Winton on Mar 31, 2008
@Greg: Not necessarily less ambiguous, but at least less *prone* to ambiguity. The way Python is less prone to ugly code than other languages starting with P.
foo
bar
baz
Can you see anything you would have done differently?
This is a real example I’ve had to deal with (element names changed). I’ve seen other examples like that before. Have a look at any AJAX XML and ask yourself if there’s anything you would have done different — 95% of the time, there is. Not so much because there is no right/wrong, but moreso because there is no “simplest” way to do things most of the time. You can always justify the additional complexity. You can always change elements to attributes, attributes to subnodes, etc.
{”request”: [{”url”: “foo”}, {”path”: “baz”}]}
Doesn’t get much simpler.
@Shawn: Yes, it’s more constrained, less flexible… but it’s important to use the right tool for the job. You wouldn’t use JSON to represent HTML — that’s where XML is king.
Raw data transport is more often simple than not. If it isn’t, then it probably should be simplified.
By Andrey Petrov on Mar 31, 2008
Looks like my fancy XML got stripped out. Another downside to using XML?
Here’s a pastebin of it: http://pastebin.ca/965646
By Andrey Petrov on Mar 31, 2008
@Andrey: I agree it is all about the right tool for the job…
To me, XML is merely a tool that provides context to text - and by extension, basic reasoning. Example, XSLTs extends XML by applying contextual transformation based on defined reasoning (screen size, ect.).
If you only require data interchange - and are using regular expressions anyway - the only(?) reason to use XML over .txt or .csv formats is communication. Here structure is only necessary to make apparent the data’s value (instead of the old fashioned syntax comments)…
If you’re using JSON to execute functions, then we are really no longer talking about a data model per se. XML has it’s roots in publishing, not scripting. To me, these are two very different tools.
Granted, these ideas are based strictly on theoretical ‘book’ learning. Please correct me if these manuals have steered me in the wrong direction…
By Shawn Berney on Apr 1, 2008
Let me post a slight correction… I suppose a very valuable reason to use XML is to take advantage of the many libraries to manipulate data… Here, JSON may compete nicely (especially with built in functions).
By Shawn Berney on Apr 1, 2008
The very next item in my feed reader is http://www.dehora.net/journal/2008/03/30/no-free-lunch-for-programming-libraries/ .
By Dmitri on Apr 1, 2008