eigenclass logo
MAIN  Index  Search  Changes  PageRank  Login

Reexamining qsort, eager vs. lazy algorithm analysis and Ruby's (and other's) GC

Yesterday's post on The Comonad.Reader referred to my analysis of the list-based "quicksort" and described a faster function based on difference lists. It gave an example that troubled me for a while, and added that "the author [referring to me] was counting conses, unfortunately that isn't a valid metric":

   reverse (a:as) = reverse as ++ [a]
   reverse [] = []

This function exhibits O( n sup 2 ) performance, even though there seems to be one cons per level; this would show that just counting conses isn't enough, and invalidate my analysis. The catch is that there is not only one cons per level, and one must be careful and not forget the conses implied by the list concatenation operation. In my analysis of qsort, the number of conses verified

C(n) = p + C(p) + 1 + q + C(q) + p + 1

with n = p + q + 1, where the final p + 1 term represents the conses needed to concatenate the sorted sublists.

If the above function is analyzed the same way I did with qsort, the number of conses satisfies

C(n) = C(n - 1) + n - 1 + 1

which clearly implies C(n) = O ( n sup 2 ).

The key term is n + 1, which represents the conses associated to the concatenation, the same way p + 1 did in qsort's expression; had it not been there, the function would be O(n). So, while this example doesn't invalidate my proof, it serves as a good reminder of the cost of list concatenation and how easily it turns linear-time algorithms into O( n sup 2 ).

(Incidentally, it turns out that qsort would be O(n log n) even without the extra term caused by list concatenation, so the proof, albeit incorrect, would have reached the correct conclusion.)

Strict vs. non-strict, eager vs. lazy analysis

While it seems the original analysis was sound, a new question was raised indirectly when Edward Kmett wrote that

it's not the number of times you cons that matters, but the number of thunks that have to be forced to get to the answer

I've shown above that this is only true if you aren't careful when counting conses, but the problem it suggests deserves to be considered by itself: can the analysis under a strict semantics be taken as an upper bound of the complexity in the non-strict or lazy cases?

I always assumed that the answer was a straight "yes". Intuitively, under lazy evaluation you have thunks with the data necessary to evaluate their associated expressions only when needed, so surely at most as much work as if they were always evaluated is performed, right?

After pondering about this longly, I've found a case where this wouldn't be true, suggested directly by an alternative definition of non-strict semantics:

Non-strict semantics means that a function can have a definite value although its argument is undefined.

What if an algorithm depended on an undefined value for termination? In practice, this would mean that it relies on an expression raising an exception to stop the computation and return some result after recovery. Whereas in the strict case the expression would be evaluated unconditionally and the computation would terminate, with a non-strict semantics it would be left unevaluated if not needed, thus failing to raise the exception and finish in a timely manner.

Edit (2008-07-18): There's an example below for an impure language. Anyway impurity + laziness is a nonsensical combination and the example represents 2 different programs in one that happen to give the same answer, so I'll be happy to analyze pure programs and assume that "lazy evaluation always needs less reduction steps than eager evaluation" to obtain upper bounds (thanks apfelmus).

The above case is contrived, and such a program would be needlessly complex. In fact, in OCaml it would be expressed awkwardly with a ref, and in Haskell pattern matching over an Either value (implicit in bind) would normally force evaluation, thus behaving as the strict counterpart.

Barring this exception, it seems to me that analyzing under a strict semantics remains a good way to get an upper bound for the non-strict case, but lacking a formal proof (and even with one) you can never be too sure. Any pointers?

Something relevant might be found by following the trail given by this paper by P. Wadler, whose abstract states the following:

In a strict functional language, analysis of time complexity is straightforward, because of the following compositional rule: The time to evaluate (ƒ(g x)) equals the time to evaluate (g x) plus the time to evaluate (ƒ y), where y is the value of (g x). However, in a lazy language, this rule only gives an upper bound, possibly a crude one.

I haven't read the paper yet nor followed its references/citations to see if there's a proof for that. The problem I can foresee is that the counterexample I came up with might not happen in the simplified languages/computational models used in most papers.

At any rate, it seems that, for all but a few contrived programs (relying on "useless" expressions being evaluated to raise an exception and terminate), simple "eager" analysis remains a good way to obtain an upper bound for the non-strict case. If it is low enough, we can stop there and enjoy.

Superlinear GC work: why Ruby scales badly, and a possible explanation of qsort's behavior?

The Comonad.Reader article attempted to reconcile the theory with the observation that the "quicksort" based on difference lists scales better than the original one. We came up with the following conjecture: what if the amount of GC work scales superlinearly? The improved function uses less memory and would thus be less affected, if GHC's GC had for instance a O(n log n) term.

Ruby's GC, for instance, does O( n sup 2 ) work (n being the amount of allocated data), because a GC run is launched after a fixed amount of memory has been allocated (in addition to other conditions). Doing it only when a given fraction (e.g. 25%) of the previous heap size has been allocated would make it O(n), but then the interpreter would use much more memory (making most programs slower or unusable in practice).

GHC's GC clearly isn't O( n sup 2 ), but I can imagine the implementors tolerating a O(n log n) factor.

I don't know the internals of GHC's GC, but I can reason about OCaml's relatively simple yet fast garbage collector (I don't remember reading that GHC had anything approaching the system used in the JVM in terms of complexity; ony thing is certain: GHC's GC is not concurrent). It uses a two generations, with a copying collector for the minor heap which migrates data to the major heap, managed with an incremental mark and sweep algorithm. Each time a minor GC is run, the system estimates how much (incremental) work is needed in the major heap and performs it. Clearly, how the system scales will depend on that estimate. For instance, if it indicated that all the major heap has got to be processed, O( n sup 2 ) work would have to be done. On the other hand, if only a constant amount of work were performed, it'd be O(n), but the amount of memory required would increase substantially.

I'll have to read about GHC's runtime and measure the relative performance of the original and improved "quicksorts" in OCaml to see if this conjecture is supported by the data.



nonlinearities in GC time - edwardk (2008-07-17 (Thr) 11:57:11)

It would appear that the larger than n log n behavior I thought I was seeing was an artifact of not running large enough experiments to cause both approaches to reach the same amount of memory pressure.

When run long enough to thoroughly thrash my machine, over an appropriately large amount of time the difference in performance appears to be the merely constant factor you'd expect from theory.

This I doesn't resolve the question of why non-linearities in the gc overhead _don't_ appear as a long term factor, because the differences there appeared to be on a constant scale as well.

I suppose a more interesting challenge to that would be to generate an algorithm that generates a non-constant polynomial amount of space worth of data as an intermediate result and one that generates a constant amount of data as an intermediate result that both have the same theoretical running time and benchmark their actual relative performance.

mfp 2008-07-17 (Thr) 16:32:19

Interesting indeed.

What about this: adding up the numbers up to N. The first variant generates a list and folds over it, the second is but a loop.

Both are O(n), but while the first uses O(n) space, the second only takes O(1), so if there's any non-linearity in the GC it would be easy to spot in the program that uses lists (timing the second program is hardly worth it).

juraCluro 2008-10-06 (Mon) 12:34:58

How i may contact admin this site? I have a question. iijiivei


Run-time analysis - apfelmus (2008-07-17 (Don) 12:32:03)

Analyzing the run-time of functional programs is based on graph reduction, not on "counting conses". The Haskell wikibook has some preliminary material. Currently, the best introduction is probably found in Bird's book "Introduction to Functional Programming using Haskell". In particular, Bird also analyzes the reverse function.

Concerning the question whether lazy evaluation always needs less reduction steps than eager evaluation, the answer is "yes". Admittedly, this folklore theorem is usually taken for granted, I don't know a good reference for a proof. I have a sketch of proof, but this comment box is too small to contain it. ;)

You claim to think to have a counterexample, but I don't see a concrete construction. In particular, the second sentence in

What if an algorithm depended on an undefined value for termination? In practice, this would mean that it relies on an expression raising an exception to stop the computation and return some result after recovery.

is not correct. A computation that terminates while "depending" on an undefined value does not, in fact, depend on that value at all. Example:

 const 1 undefined

Catching the "I don't terminate" exception means solving the halting problem, which is impossible.

mfp 2008-07-17 (Thr) 13:05:04

Analyzing the run-time of functional programs is based on graph reduction, not on "counting conses".

Again, I'm using that to analyze the eager case and get an upper bound for the non-strict one --- not to obtain the complexity directly.

My wording is probably unfortunate. Here's a concrete example:

   let ignorearg x = ()
   
   let zero n =
     let optimized_stop_criterion n = if n > 0 then raise Exit in
     let a = ref 0 in
     let rec aux = function
         n when n <= 0 -> 0
       | n ->
             ignorearg (optimized_stop_criterion n);
             a := n;
             aux (n-1)
     in try aux n with Exit -> !a

zero(n) takes O(n) time with lazy evaluation (because the optimized_stop_criterion is never actually evaluated and the exception is not raised), and is O(1) with eager evaluation.

Obviously, this construction requires an impure language.

Catching the "I don't terminate" exception means solving the halting problem, which is impossible.

I don't see how this relates to any of the above.

Thanks for the links.

mfp 2008-07-17 (Thr) 13:18:01

Catching the "I don't terminate" exception means solving the halting problem, which is impossible.

Ah, I get it now. Think of this as an "early exit" mechanism, not something to guarantee termination (so that the algorithm also terminates with a non-strict semantics, only slower).

apfelmus 2008-07-18 (Fre) 03:20:33

Analyzing the run-time of functional programs is based on graph reduction, not on "counting conses".

Again, I'm using that to analyze the eager case and get an upper bound for the non-strict one --- not to obtain the complexity directly.

What I want to say is that counting conses is wrong regardless of eager or lazy evaluation. Instead, you have to count the number of reductions, i.e. replacements of functions with their definition.

Obviously, this construction requires an impure language.

Ah, ok. Well, lazy evaluation and impurity don't go well together anyway. In your example, a lazy version would get penalzed for not spending the time to execute the side-effect, i.e. for trying to save time in the first place.

mfp 2008-07-18 (Fri) 12:03:40

What I want to say is that counting conses is wrong regardless of eager or lazy evaluation. Instead, you have to count the number of reductions, i.e. replacements of functions with their definition.

I'm doing a pretty standard analysis here, similar to the stuff one would find in TAOCP or any text on algorithms: I'm just getting an upper bound for the number of conses (space complexity) and the number of comparisons of a sort at once.

I think I see what's happened: we’ve been talking past each other :). Whereas you were making a general statement (complexity analysis involves counting the number of reductions, which one cannot disagree with :-), I was just referring to this particular example, where the number of reductions is O(conses). I didn't claim that counting conses is the general way to analyze an algorithm, just that, for this particular one, both the space and the time complexity can be bounded by solving the recurrence for the number of conses.

Ah, ok. Well, lazy evaluation and impurity don't go well together anyway. In your example, a lazy version would get penalzed for not spending the time to execute the side-effect, i.e. for trying to save time in the first place.

Yeah, it's clear now that the above construction is irrelevant. While it does indicate that in an impure setting lazy evaluation could require more reductions, this would be the least of the problems, for the behavior of the program would differ completely, to the point that it is in fact a different function under each semantics. It is only by deliberate construction that these 2 functions return the same result in this case.

Thanks for clearing the doubt about the pure case (ultimately, the only one of interest for this technique :-).

apfelmus 2008-07-20 (Son) 02:41:09

I was just referring to this particular example, where the number of reductions is O(conses). I didn't claim that counting conses is the general way to analyze an algorithm, just that, for this particular one, both the space and the time complexity can be bounded by solving the recurrence for the number of conses.

Well, in this particular case, the recurrence relations for reductions are exactly the same. Just replace the word "cons" in your analysis with "reduction", and you get a valid analysis, so, why use "cons" at all?

In fact, you are burdened with the obligation of proof that "counting conses" does the right thing. I don't think that it does for space complexity.

And you'd have to define it first. Given a term like

 map (+1) [10,9..0] ++ [0]

how exactly do I count conses? I'm looking for a definition similar to the one that Wadler gives for reductions in section 3 of his paper.

mfp 2008-07-21 (Mon) 15:47:26

Coming up with an equivalent definition for space complexity doesn't seem too hard. Let's try.

Take the language Wadler uses:

  e ::= x                        variables
      | k                        constants
      | f e1 ... en              function applications
      | if e0 then e1 else e2    conditionals

Similarly to f_T, (I'm changing the notation because f_T looks better than f^T) which indicates the number of reductions required to evaluate f x1 ... xN, define f_C and f_TC (which give the amount of memory required to evaluate f x1 ... xN) as follows:

If f is cons,

 f_C x1 x2 = 2 + 1 + 1
 f_TC x1 x2 = 2 + 1

otherwise, f x1 .. xN = e and

 f_C x1 ... xN = N + 1 + e_C
 f_TC x1 ... xN = N + e_TC

where e_C is the space required to evaluate e.

The first terms from the above expression stand for the space in the environment to gather the parameters. f_C includes + 1 for the continuation (return address in an usual implementation).

e_C and e_TC are defined as follows:

 x_C = x_TC = 0
 k_C = k_TC = 0
 (f e1 ... eN)_C  = (f_C e1 ... eN) + e1_C + ... + eN_C
 (f e1 ... eN)_TC = (f_TC e1 ... eN) + e1_C + ... + eN_C
 (if e0 then e1 else e2)_C = e0_C + (if e0 then e1_TC else e2_TC)

(TC stands for "tail call")

This seems serviceable albeit wearisome. With a couple extensions (such as pattern matching for easier reasoning on lists and some syntactic sugar), it should work for a term like the one you gave.

If applied to qsort, the recurrence relation would differ by a constant term at each level (the space needed in the stack in a typical embodiment). I omitted it without much thought from the original space analysis because it was clearly dominated by the other terms.

Of course, I haven't proved that the above corresponds to the actual space requirements any more than Wadler's paper proves that his definitions of f_T and e_T lead to meaningful time complexity bounds. It seems pretty reasonable, but a couple constant terms here and there might be off. What would you recommend for reasoning about the space complexity?


SrLWIZQVJIzolHRPxQn - 56856dfh (2008-10-30 (Thr) 13:31:42)

Real call to me and i send you big health and you read it in this blog,


NxWtBBebpPV - Dgimis (2008-11-18 (Tue) 07:43:14)

good guest page. thank you.


OLkqhYOaWyHp - John Malic (2008-11-26 (Wed) 11:49:16)

http://rwiofff.1sweethost.com/index.html Joseph Foster company by died http://kiqwmoo.action-links.net/index.html July battle 1780 soldiers. In however in Smith http://uofrnue.1freewebspace.com/index.html West Parish Eliphalet and Nathan Long- the a poor a 3 it to Capt. A disgust connected the http://uragwey.angelcities.com/index.html Capt by of grown to such an on 2d June to person procure all respecting any that suspected being to the http://oevaeid.angelcities.com/index.html rights and liberties agreeable to span.ing the.


XIDAqsOqaEKlu - John (2008-11-26 (Wed) 13:12:25)

http://oevaeid.angelcities.com/sex-video-full.html||sex video full||clear sex videos||sex part video http://uragwey.angelcities.com/hemaphrodite-porn-videos.html||hemaphrodite porn videos||farm video sex||lesbian porn videoz http://rwiofff.1sweethost.com/porn-library-flash-videos.html||porn library flash videos||porn on ipod video||porn role play videos http://rwiofff.1sweethost.com/porn-star-video-gallery.html||porn star video gallery||porn super x video||porn model sex videos http://kiqwmoo.action-links.net/ceiebrities-sex-video.html||ceiebrities sex video||broacast sex videos||clebs sex videos


LolvxPibhFVARTLEtav - nilin (2008-12-02 (Tue) 13:00:16)

what oyu mean about real health? read thiss!,


KEfMTMdQOT - freevideo319 (2008-12-04 (Thr) 11:25:11)

http://sawq.iespana.es/se/free-gay-cum-videos.html


Last modified:2008/07/17 13:38:35
Keyword(s):[blog] [frontpage] [haskell] [ocaml] [ruby] [quicksort] [algorithm] [analysis] [eager] [lazy] [strict]
References: