Persistence of Memory


Salvador Dali’s masterpiece, The Persistence of Memory, is one of the most recognizable works of art in the world.  The critic Dawn Ades described it as “a Surrealist meditation on the collapse of our notions of a fixed cosmic order” induced by Einstein’s penetrating analysis of the concepts of time and space in the physical world.  Just as Dali’s Persistence of Memory demarcated the transition from age-old conceptions of time and space in physics, so does the computational concept of persistence of memory mark the transition from sequential time and mutable space to parallel time and immutable space.

A short while ago I described the distinction between parallelism and concurrency in terms of a cost model that assigns a parallel, as well as sequential, time complexity to a program.  Parallelism is all about efficiency, not semantics; the meaning of a program is not affected by whether it is executed on one processor or many.  Functional languages expose parallelism by limiting sequential dependencies among the parts of a computation; imperative languages introduce inessential dependencies that impede parallelism.

Another critical ingredient for parallelism is the concept of a persistent data structure, one whose associated operations transform it rather than alter it.  A persistent dictionary, for example, has the characteristic that inserting an element results in a new dictionary with an expanded domain; the old dictionary persists, and is still available for further transformation.  When calculating the sum of 2 and 2, resulting in 4, no one imagines that the 2’s are “used up” in the process!  Nor does one worry whether the sum of 1 and 3 is the “same” 4 or a “different” 4!  The very question is absurd (or, more precisely, trivial).  So why do we worry about whether the result of inserting 2 into dict “uses up” the old dict?  And why do we worry about whether inserting 2 into the empty dictionary twice results in the “same” dictionary or a “different” one?
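
For concreteness, here is the behavior in code: a minimal sketch in OCaml (OCaml is just my choice of vehicle here), using the standard library’s Map module:

(* A persistent dictionary with integer keys: insertion yields a new
   dictionary, and the old one remains available, unchanged. *)
module IntMap = Map.Make (Int)

let d0 = IntMap.empty
let d1 = IntMap.add 2 "two" d0     (* a new dictionary whose domain includes 2 *)
let d2 = IntMap.add 3 "three" d1   (* d1 persists; d2 extends it *)

let () =
  assert (IntMap.mem 2 d1);        (* the old dictionary is not "used up" *)
  assert (not (IntMap.mem 3 d1));  (* building d2 did not alter d1 *)
  assert (IntMap.mem 3 d2)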

Yet both academic and practical computing have all but confined themselves to ephemeral data structures, which exhibit precisely such behavior.  Operations on an ephemeral data structure “use up” the data structure, making it unavailable for further use without going to some trouble to get it back.  The pathologies resulting from this abound.  Standard compiler texts, for example, devote a chapter to concepts like “block structured symbol tables”, which are, in fact, nothing more than persistent dictionaries done the hard way.  More generally, whenever a data structure has multiple futures, such as when backtracking or exploiting parallelism, ephemeral data structures get in the way.  Indeed, the bulk of object-oriented programming, with its absurd over-emphasis on the “message passing” metaphor, stresses the alteration of objects as the central organizing principle, confounding parallelism and complicating simple algorithms.
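
The symbol-table example bears this out: with a persistent dictionary, block structure comes for free.  Entering a block extends the table, and leaving it is simply a matter of resuming with the outer table, which was never altered.  A small sketch, again in OCaml (the bindings are invented for illustration):

(* A "block structured symbol table" the easy way: a persistent dictionary.
   No explicit undo is needed when a scope is exited. *)
module Env = Map.Make (String)

let outer = Env.(empty |> add "x" 1 |> add "y" 2)

(* Enter a nested block that shadows x and declares z. *)
let inner = Env.(outer |> add "x" 10 |> add "z" 3)

let () =
  assert (Env.find "x" inner = 10);  (* the inner scope sees the shadowing binding *)
  assert (Env.find "x" outer = 1);   (* the outer table is untouched *)
  assert (not (Env.mem "z" outer))   (* exiting the block requires no bookkeeping *)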

A prime virtue of functional languages is that persistence is the default case, but they can as readily support ephemeral data structures as any imperative (including object-oriented) language.  All functional languages include types of mutable cells and mutable arrays, and provide support for conventional, sequential, imperative programming with semicolons and even curly braces!  (Some do this better than others; Haskell is, in my view, the world’s best imperative programming language, and second-best functional language, but that’s a subject for another post.)  But why would you want to? Why deprive yourself of the benefits of persistence, and insist instead on an ephemeral data structure?
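
For the record, here is what the conventional style looks like when you do want it; a minimal OCaml sketch (the function and data are my own example):

(* Ephemeral programming in a functional language: references, mutable
   arrays, loops, and semicolons are all available when you insist. *)
let sum_imperative (a : int array) : int =
  let total = ref 0 in
  for i = 0 to Array.length a - 1 do
    total := !total + a.(i)
  done;
  !total

let () =
  let a = [| 1; 2; 3; 4 |] in
  a.(0) <- 10;                    (* in-place update: the old contents are gone *)
  assert (sum_imperative a = 19)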

This question came up recently in our planning for the Functional Programming class that we are teaching this semester for freshmen at Carnegie Mellon.  All semester we have been using functional programming techniques to build clean, verifiable, modular, parallel programs.  The students routinely prove theorems about their code, structure programs using abstract types, and exploit parallelism to improve the asymptotic performance of their programs.  Recent homework assignments include the implementation of the parallel Barnes-Hut algorithm for the n-body problem in physics, and the parallel Jamboree algorithm for game-tree search in perfect information games.  Persistent data structures are the key to making this possible; just try to code Barnes-Hut in an imperative language, and you will find yourself in a morass worrying about concurrency when you should instead be envisioning a recursive tree decomposition of space, and the computation of forces using formulas from high school physics.
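
To give a flavor of the tree decomposition (a much-simplified, sequential sketch of my own in OCaml, not the assignment code): a region of space is either empty, a single body, or a subdivided cell summarized by its total mass and center of mass, and the force on a body is computed by recurring into nearby cells while approximating distant ones.

(* A simplified Barnes-Hut force approximation over a prebuilt tree. *)
type body = { mass : float; x : float; y : float }

type tree =
  | Empty
  | Body of body
  | Cell of { mass : float; cx : float; cy : float;  (* summary of the region *)
              size : float;                          (* side length of the region *)
              children : tree list }

let theta = 0.5        (* opening criterion *)
let g     = 6.674e-11  (* gravitational constant *)

let dist (x1, y1) (x2, y2) =
  sqrt ((x1 -. x2) ** 2. +. (y1 -. y2) ** 2.)

(* Force exerted on body b by a point mass m located at (x, y): the
   high-school formula G m1 m2 / d^2, resolved into components. *)
let point_force b m (x, y) =
  let d = dist (b.x, b.y) (x, y) in
  if d = 0. then (0., 0.)
  else
    let f = g *. b.mass *. m /. (d *. d) in
    (f *. (x -. b.x) /. d, f *. (y -. b.y) /. d)

(* Net force on b: recur into nearby cells, approximate distant ones by
   their center of mass.  The tree is persistent, so many such traversals
   may proceed in parallel without interfering with one another. *)
let rec force b = function
  | Empty -> (0., 0.)
  | Body b' when b' == b -> (0., 0.)
  | Body b' -> point_force b b'.mass (b'.x, b'.y)
  | Cell c ->
      if c.size /. dist (b.x, b.y) (c.cx, c.cy) < theta
      then point_force b c.mass (c.cx, c.cy)
      else
        List.fold_left
          (fun (fx, fy) t ->
             let fx', fy' = force b t in
             (fx +. fx', fy +. fy'))
          (0., 0.) c.children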

We tried hard to find good motivations for using an ephemeral data structure when you can just as easily (actually, much more easily) use a persistent one.  As we went through them, we realized that all of the standard arguments are questionable or false.  The usual one is some vague notion of “efficiency” in either time or space.  While I concede that one can, in principle, solve a particular, confined problem more efficiently by doing absolutely everything by hand (memory management, scheduling, arithmetic), in the overwhelming majority of cases the demands of evolving code far outweigh the potential advantages of such hand-crafting.  Modularity matters most when it comes to building and evolving large systems; functional languages, with persistent data structures, support modularity best.  (I’ll have more to say about this in a future post.)

Most of the arguments about efficiency, though, ignore questions of functionality.  It is senseless to compare the “efficiency” of two data structures that provide different functionality.  A persistent data structure does more for you than an ephemeral one does.  It allows you to have multiple futures, including futures that evolve in parallel with one another.  It makes no sense to insist that some ephemeral approximation of such a data structure is “more efficient” if it does not provide those capabilities!  And making it do so is, invariably, a bitch!  Conventional ephemeral data structures are not readily parallelizable; it’s often a publishable result to get a decent degree of parallelism using imperative methods.  By contrast, even freshmen (admittedly, Carnegie Mellon freshmen) can implement a parallel game-tree search or a tree-based solution to the n-body problem in a one-week homework assignment.

So what’s the deal?  Why should we care about ephemeral data structures at all?  I have no idea.  Mostly, it’s a legacy cost imposed on us by the overzealous emphasis on imperative methods and machine-based, rather than language-based, models of computation.  But this will change.  Starting this fall, introductory data structures and algorithms will be liberated from the limitations of imperative, object-oriented programming, and will instead stress persistent (as well as ephemeral) data structures, and parallel (including as a special case sequential) algorithms.  The future of computing depends on parallelism (for efficiency), distribution (for scale), and verification (for quality).  Only functional languages support all three naturally and conveniently; the old ones just don’t cut it.

Update: Here’s a chart summarizing the situation as I see it:

\displaystyle
\begin{array}{|c|c|c|}
\hline
 & \text{Ephemeral} & \text{Persistent} \\
\hline
\text{Sequential} & \textit{imperative} & \textit{benign effects} \\
\hline
\text{Parallel} & \textit{hard to get right} & \textit{functional} \\
\hline
\end{array}

Conventional imperative programming works well for the ephemeral, sequential case; it is notoriously hard to use imperative methods for parallelism.  Benign effects, as exemplified by self-adjusting data structures, can be used to give rise to persistent behavior in the sequential setting, but the use of effects impedes parallelism.
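
A simpler classic instance of a benign effect, sketched in OCaml (my own example; far more modest than a self-adjusting structure), is a memoizing suspension: a cell is updated in place, but no client can observe the mutation.

(* A memoizing suspension: the internal mutation is benign, because the
   observable behavior is that of a pure value computed at most once. *)
type 'a state = Thunk of (unit -> 'a) | Value of 'a
type 'a susp = { mutable state : 'a state }

let delay f = { state = Thunk f }

let force s =
  match s.state with
  | Value v -> v
  | Thunk f ->
      let v = f () in
      s.state <- Value v;  (* invisible to clients of force *)
      v

let () =
  let s = delay (fun () -> print_endline "computing"; 2 + 2) in
  assert (force s = 4);  (* prints "computing" exactly once *)
  assert (force s = 4)   (* cached: no recomputation *)

Note that the unsynchronized update inside force is exactly the sort of thing that becomes delicate in a parallel setting, which is the point of the bottom row of the chart.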

21 Responses to Persistence of Memory

  1. […] over Lisp, like the RLisp that the REDUCE algebra system was largely written in. Robert Harper once described Haskell as “the world’s best imperative programming language” and I can see where that comes […]

  2. tksfz says:

    Can you elaborate on how the intro classes will change at CMU? Will they use a different language?

  3. johnzabroski says:

    Standard compiler texts, for example, devote a chapter to concepts like “block structured symbol tables”, which are, in fact, nothing more than persistent dictionaries done the hard way. More generally, whenever a data structure has multiple futures, such as when backtracking or exploiting parallelism, ephemeral data structures get in the way.

    Is a General Ledger a counter-example? In other words, the GoF Memento design pattern specifically addresses the fact that behaviors like the Command pattern only perform operations on a data structure, since it would damage the cohesiveness of the solution if the Command had to know how to undo itself, which might require exposing the object’s representation, among other problems. A General Ledger is an object that allows checking against a balance sheet, including providing enough structure for tracing into purchase orders and sales receipts.

    What I am saying is that you are pointing out the need for structure by saying ephemeral data structures shouldn’t be used to solve persistence problems… Come to that, DUH! ;-)

    Pedagogically, then, I would assert it is more interesting to compare two structurally valid solutions and then debate which is easier to comprehend, which is anti-parallel, and so on.

    FRP solves the Memento/Command collaboration problem in a structural fashion by lifting (sequences of) values to types. Journaling is delineated by bounded causal relationships, which is how effects like repeated animation can be modeled. Since the Context is well-defined, unit testing is easy pz.

    • Abstract Type says:

      Mike Spivey’s formulation says it well: design patterns = functional programming done badly. Most of that stuff is catch-up, trying desperately to get where we’ve already been for decades, done the hard way.

  4. aleccolocco says:

    Looking from the bottom up, the hash table (HT) has two weak spots:

    1- Data is sparse (so the HT should fit within cache)
    2- Lookup+update transactions must be sequential

    Current processors have a load latency of about 3 cycles and a reciprocal throughput of 1. If 3 or more load (MOV) operations can execute in parallel, the latency of 3 is “hidden”. But since HT lookup+update are interdependent, the full latency of 3 is suffered.

    Also in current processors, it costs the same to load 2 bytes or 32 bytes. HT can’t take advantage of that without swelling and growing out of the cache.

    If the algorithm using the HT is parallelizable, you should consider using simpler structures and process in batch.

    I’m working on a blog post about this issue with some test numbers for pattern matching.

  5. glorkspangle says:

    ccshan: neither. I mean “that is the way that our computers are designed to work, and have been since before I was born, and quite likely still will be after I am dead.”
    The implementation accidents – what RAM happens to look like in any given decade, and what theories of physics are used in building it – are irrelevant to this. I was at the Science Museum in London last weekend, where they have a great display of different memory units, including mercury delay lines, wire delay lines, tube storage, drum storage, core store, and silicon chips. It’s all ephemeral, in Bob’s terminology.

    • ccshan says:

      glorkspangle, I don’t know why to classify “what RAM happens to look like in any given decade, and what theories of physics are used in building it” as “implementation accidents”, but not what abstractions are presented by microcodes and operating systems of the century. Is it perhaps about the different time scales of evolution?

  6. bowaggoner says:

    Very intriguing post, but I think you are glossing over some of the difficulties of thinking about persistent data structures.

    For example, modifying an element in an ephemeral array is O(1). What’s the complexity for using a persistent array? This is certainly non-obvious, non-trivial, and requires a great deal of thinking about the underlying implementation. Not that there’s anything wrong with doing a great deal of thinking, but I think it’s worth considering situations in which persistent data structures complicate the question instead of simplifying it.

    • abstract type says:

      If you think about it, you’ll realize that you cannot access an arbitrary element of an ephemeral array in constant time either. Suppose the indices are 64-bit words. Inside your operating system there is address-translation software that is based on a splay tree, imposing a log cost to manage the access, the index space being far too large to handle otherwise.

    • comex says:

      “Array” usually refers to a structure where each element is numbered sequentially. As long as the operating system isn’t swapping in memory when you access it, that concern does not apply.

    • bowaggoner says:

      Good point, and thanks for the response. Sure, implementation is often messy in practice and complicates our neat, tidy notions of complexity.

      But I am not convinced that this completely addresses the analysis of persistent data structures. I’ve heard it said that the design of algorithms is mainly about the design and use of data structures, so it must be important, at some point, to have a grasp of the general implementation, or at least the complexity, of persistent versus ephemeral data structures.

      So, when do you consider these issues important versus irrelevant to the topic at hand, and in what context is it a good subject to teach? How important do you think it is, when using persistent data structures, to understand the implementation? I would be very interested to hear more if you have time for a response.

    • Abstract Type says:

      My general principle is to associate a cost semantics to a language that accounts for both time and space complexity so that we can reason about efficiency at the level of the language we actually write, rather than in terms of how it is (allegedly) compiled.

      I would also stress that it is often a mistake to emphasize the asymptotic complexity of a single operation of an abstract type (such as the cost of lookup in a mapping structure like a sequence or vector). The reason is that, while in the persistent case there may be a “local cost” of a log factor on an individual operation, this may well be hidden inside the “global cost” of the overall application. So, for example, if the overall algorithm is O(n lg n), and there are O(n) calls to some operation o of an adt, then there is no problem if there is a lg n penalty for persistence on the operation o alone; the “latency” is hidden in the overall cost anyway. So this is yet another situation in which conventional wisdom is misleading.
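
      To spell out the arithmetic in a typical case (purely illustrative figures of mine): if the overall algorithm does O(n lg n) work and makes O(n) calls to the operation o, then a lg n penalty per call contributes

      \displaystyle  O(n) \times O(\lg n) = O(n \lg n),

      which is absorbed into the asymptotic bound of the algorithm as a whole.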

  7. legreenfield says:

    Persistent data structures can map well to flash storage devices, since flash doesn’t allow in-place modification. But instead of using natural high-level algorithms, we have low-level software (or hardware!) that emulates mutable block devices. Funny!

  8. I have found that even in JavaScript persistent data structures perform better than ephemeral ones, simply because they give you more opportunities for caching/memoization.

  9. Mihai Balint says:

    Let me start by saying that I have read most of your posts on this blog; some I’ve liked, and with some I have agreed. However, I don’t understand the constant bashing of object modularity. Some commenters have mentioned a problem of “open recursion”, for which I have failed to find the research papers that would warrant the level of hand-waving in at least some of the comments I have seen here. But let’s leave that aside for now.

    The other modularity-improving feature of objects is encapsulation, about which you say nothing, and from quotes like the one from the “What is a functional language?” post one may think you also want to throw that proverbial child out with the bathwater.

    [quote]
    the “objective” part notwithstanding
    [/quote]

    Here is a simple scenario: assume that we are developing a program that does some graphical processing in two dimensions. We have developed some functions to draw points, line segments, and circles on a computer monitor. We have also developed functions that perform all kinds of useful calculations: the distance between two points, the angle between the supporting lines of two segments, the projection of a point onto the supporting line of a segment, and so on. All these functions receive appropriate parameters in the following functional form: points are passed as two numbers (first the X, then the Y), line segments as four numbers, circles as three numbers. Wonderful!

    What if I notice that my program has a few critically tight loops that spend too much time converting points from cartesian x,y to radial coordinates? I may find it appropriate to change the representation of points in all my program. However, without object encapsulation this means that I will have to change all of the existing functions for distance, projection and the rest which in industrial terms is quite a bitch. With encapsulated objects, all I have to do is simply change the internals of the Point object, replacing getX() and getY() with their new implementation. Perhaps not even that if I choose to have both representations hidden within Point objects.

    My point is that I would appreciate it if you would deliver the promised modularity related post earlier rather than later.

    • abstract type says:

      Please see the chapter “Classes and Methods” in PFPL.

    • You seem to equate encapsulation (aka data abstraction) with objects, although there are other ways to achieve it, especially abstract data types (typically created using modules). These have slightly different modularity trade-offs, but actually seem a much better fit for your specific problem, given that it naturally involves binary or n-ary operators on points, which are all but natural with objects.
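
      For concreteness, a minimal sketch of such an abstract type in OCaml (the module and operations are my own illustration, not anything from the thread):

      (* An abstract type of points: clients see only the signature, so the
         representation can later switch from Cartesian to polar (or carry
         both) without touching any client code. *)
      module Point : sig
        type t
        val cartesian : float -> float -> t
        val x : t -> float
        val y : t -> float
        val dist : t -> t -> float  (* a binary operation, natural on an ADT *)
      end = struct
        type t = { x : float; y : float }
        let cartesian x y = { x; y }
        let x p = p.x
        let y p = p.y
        let dist p q = sqrt ((p.x -. q.x) ** 2. +. (p.y -. q.y) ** 2.)
      end

      let five = Point.(dist (cartesian 0. 0.) (cartesian 3. 4.))  (* = 5. *)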

    • Mihai Balint says:

      Fine: you have encapsulation with modules that implement an ADT, and you have method types that you implement either for all ADT-implementing modules or just for some, but then you convert from one implementation to another.

      Beyond the fact that your objects are immutable, which has its benefits (and can also be done with OO), I’m still not convinced to go all functional, although I really like closures, continuations, and lambdas.

  10. glorkspangle says:

    The rhetoric is fun, but there isn’t really any mystery about why ‘traditional’ languages are sequential and ephemeral. It’s because the machines work that way: the semantics of the hardware are explicitly load, store, modify-in-place. Programming languages were first devised as a way to abstract the capabilities of the machine, so they followed the same model. Even Lisp, principal fore-runner of functional languages, is burdened with far too much rplacd nonsense. Algorithms were developed to work that way, and doing any efficient computation in any other way had to wait until the 1980s, by which time there was a sizeable software industry, and almost all of the field of computer science, doing things the wrong way. The rest is simple momentum.

    Of course, making the languages and implementations to allow us to do things the right way was no small matter. I have happy memories of bumming fractions of instruction cycles out of every allocation sequence, so that MLWorks would run 2% faster….

    • ccshan says:

      When you write “the machines work that way”, do you mean string theory or quantum field theory?
