A Meme I’d Like To Crush
December 15, 2007 – 10:47 amOver on O’Reilly Radar, Nat Torkington recently posted:
According to a blog post by (update: a friend of) one of the developers, Amazon’s SimpleDB is built on Erlang. Cool! Another datapoint for the trend we see towards parallel-capable languages like Erlang and Haskell.
“Parallel-capable”? I know the arguments—values in pure functional languages (PFLs) are immutable once created, so there can’t be race conditions, so parallel programming is easier, and parallel programs will scale better—but I have yet to see any data to substantiate that oft-repeated meme. I wrote a book about parallel programming in the early 90s, and have maintained a more-than-casual interest in the topic ever since (mostly through work with computational scientists). I don’t believe that PFLs make non-trivial parallel programs easier to write. I don’t believe they make parallel programming harder, either, and the reason I don’t is that I haven’t seen any empirical studies of real programmers writing real programs that point either way. I hereby offer a very nice bottle of wine and/or all seven seasons of Buffy the Vampire Slayer to the first person to conduct a study rigorous enough to be accepted as a senior undergraduate project in psychology—a standard which, sadly, most discussion of the merits of various programming languages, tools, and paradigms signally fails to meet.
Note: if you’d like to know what those standards are—i.e., what kind of evidence you should require someone who’s pushing the software equivalent of a cure for baldness to give you—please have a look at:
B.A. Kitchenham, S.L. Pfleeger, L.M. Pickard, P.W. Jones, D.C. Hoaglin, K. El Emam, and J. Rosenberg: “Preliminary guidelines for empirical research in software engineering?. IEEE Transactions on Software Engineering, 28(8), August 2002.
Barbara A. Kitchenham, Tore Dyba, and Magne Jorgensen: “Evidence-Based Software Engineering”. Proc. 26th International Conference on Software Engineering (ICSE’04), 2004.
11 Responses to “A Meme I’d Like To Crush”
Erlang’s strength in parallel programming doesn’t derive from being a pure functional language, because it’s NOT a pure functional language. Haskell is, though.
Purity certainly does make a program easier to parallelize; but arguably it makes the program harder to write in the first place, even for the simple non-parallel case. This is because pure functional languages are quite unfamiliar for most programmers who have learnt the imperative or object-oriented style.
Torkington’s phrase “parallel-capable” is problematic. I hope he does not mean to imply that languages like C++, Java etc are “parallel-incapable”, which is clearly false. However there is a spectrum of difficulty, with C++ etc lying further towards the hard end and Erlang and Haskell lying towards the easier end.
By Neil Bartlett on Dec 17, 2007
I suppose we’ll have that study as soon as we have *any* study showing the benefits of one approach vs. any other in programming. I’ve seen attempts at such studies but it seems to me that all they do is expose the assumptions and prejudices of their authors.
By steve h on Dec 19, 2007
Yeah, I agree, using the term “parallel-capable” in that context is absurd. But what exactly what exactly is *your* point? Is it that we shouldn’t pay attention to Erlang or Haskell because there hasn’t been psychological study on how easy their concurrency models are? I really hope you academics can find something more meaningful to do. Such research would be about as valuable as a study to decide which is more intense: Jet Skying or Snowmobiling.
So, how are the book sales going? Sorry, I’ve already blown my budget for the holidays, so I won’t be able to order a copy.
By Joseph Kepler on Dec 19, 2007
Wow, we get to produce a super complicated, totally biased study that has no real conclusions (other than which of the sampled programmers is more experience), and we get a Buffy DVD out of it? This sounds like a great opportunity.
I can understand your issue with the obvious hyperbole in Nat’s statement, but I think it is short-sighted to disregard the testimonial of the hundreds of people that have recently moved their applications to Erlang. They all state the same thing: it is much easier to produce a reliable, maintainable application in Erlang than with other parallel frameworks.
By Thomas Lackner on Dec 19, 2007
Steve H: yes, badly designed studies say more about experimenter bias than they do about the subject supposedly being investigated. That doesn’t mean all studies have to be badly designed
Joseph K: Not sure what you mean by “you academics”, and no, I didn’t say that we shouldn’t pay attention to Erlang/Haskell/whatever — I said we should try to find out if the claims being made for them are true or not. If they are, that’s interesting; if they’re not, that’s interesting too.
Thomas L: have you included testimonials from people who tried Erlang and gave up in your sample? If not, perhaps “totally biased” cuts both ways
See also http://pyre.third-bit.com/blog/archives/1271.html.
By Greg Wilson on Dec 20, 2007
You might want to check out the Wide Finder Project: http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder
By Tim on Dec 26, 2007
One of the underlying tenets of Erlang is that you need to use the right kind of concurrency support for your particular problem. For example, Parlog (parallel version of Prolog) was considered during the early experiments that led up to Erlang, but it was considered to have the wrong granularity of concurrency for telecoms applications. When I read “parallel-capable”, I think more of Parlog-style programming than Erlang. While I can testify that Erlang’s concurrency model is just right for telecoms control systems, I know of no serious projects using it for parallel processing. If the chunks of work are large enough that some 10-50 usecs of cost, creating a process and coordinating with it, is acceptable overhead, then Erlang is an interesting alternative, but for many problems, this is is obviously unacceptable.
Haskell does have some interesting work in progress on data parallelism (http://haskell.org/haskellwiki/GHC/Data_Parallel_Haskell), but I’m sure they’re a long way away from any form of proof yet.
By Ulf Wiger on Dec 28, 2007
If I explain why, will you promise not to send me Buffy?
I believe the reason is easiest to understand if you code in an impure language but in an almost entirely purely functional style. This limits mutation to comparatively tiny parts of the code base and massively reduces the number of mutexes required to add parallelism safely.
For example, our visualization software used to be written in (highly imperative) C++ and, consequently, was practically impossible to parallelize. However, switching to OCaml (an impure functional programming language) made it easy to rewrite the software more concisely, more efficiently and with only a tiny amount of mutation (in performance-critical parts). Consequently, parallelizing the OCaml code required only 4 mutexes in 250kLOC of code!
So purely functional programming helps because it makes the number of locks in your code scale up very slowly with the size of the code base itself.
However, if you’re asking how/if the elimination of all mutation and the replacement of locks with other concurrency constructs solves the parallelism problem, I agree. I have seen a lot of evidence to suggest that it helps with (even cures) massive concurrency but never in the context of parallel programming.
The Haskell community keep making such claims and I keep responding by asking them to parallelize some of the example programs from my books but all I’ve ever seen is Fibonacci number generators and other such programs that have absolutely no bearing on real parallel programming.
By Jon Harrop on Jan 1, 2008