Quicksort erratum
A few days ago, I called the following Haskell function, reminiscent of
Quicksort and considered the epitome of beautiful code by many, unusable for
taking "always
space and time":
qsort [] = [] qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
Moreover, I referred to this version (using a C-like DSEL in Haskell)
due to Lennart Augustsson
as a usable one (as opposed to a
algorithm):
qsortM :: forall i a m r arr .
(MonadRef m r, Num i, Ix i, MArray arr a m, Ord a Bool) =>
arr i a -> m (arr i a)
qsortM ma = runE $ do
(lb, ub) <- embed $ getBounds ma
let mlb = pure0 lb
mub = pure0 ub
a <- liftArray ma
let qsort' l r =
if1 (r > l) $ do
i <- auto l
j <- auto (r+1)
let v = a[l] :: E m a
iLTj = i < (j :: E m i)
while iLTj $ do
while ((i += 1) < mub && a[i] < v)
skip
while (a[j -= 1] > v)
skip
if1 iLTj $ do
a[i] =:= a[j]
a[l] =:= a[j]
qsort' l (j-1)
qsort' i r
qsort' mlb mub
return ma
I was wrong. Instead of analyzing it with some care, I believed the
claims I read long ago in some now forgotten blog. I just saw the list
concatenations and reacted without thinking.
Finding an upper bound for the number of comparisons in the best case is easy. In fact, while I'am at it, I'll find a bound for the number of cons operations, which is clearly greater than that of comparisons.
Noting the number of conses C(n) for a list of size n, for n > 3 (there are a
few irregularities for n = 1 to 3 because no cons is needed when appending or
prepending [] to another list),
, where the problem for a list
of length n has been divided into two halves (lists of length p and q) plus a
lone pivot element. p (resp. q) conses are needed to build the list passed to the
recursive call to qsort, C(p) (C(q)) conses are needed for the subproblem, and
finally p + 1 conses are needed to concatenate the resulting sorted sublists.
The problem is how to find p and q. For the best case analysis,
. The n list can be divided into
the desired three parts as follows:


Therefore,

(There are two choices for the number of conses in the final list concatenation, differing by 1. The above is a conservative one, which will give an upper bound.)
Even after referring to Graham & Knuth & Patashnik, I saw no immediate way to solve this recurrence (please drop a comment if you have a closed-form solution). Fortunately, I just need an upper bound so I can use

which is easily proved by induction, since all I've done is to remove the floor operations.
Then, using the master theorem


(Note the use of
for C(n) and
for
D(n).)
The above program thus uses at most
space and
time. The upper bound is satisfactory enough; in fact, it's the same as for
the real quicksort, and, furthermore, the lowest achievable bound for a
comparison sort.
So far, I've assumed that the program evaluates everything eagerly, which might not be the case in Haskell, as it has a non-strict semantics (which is not exactly the same as being lazy). Let's see if GHC is really smart and can recognize that qsort(l) is just a permutation of l to skip the sort altogether when some operation that doesn't depend on the order is performed (this is quite close to proving that the qsort function actually sorts the list, and would be an amazing exploit indeed).
To that end, I first rewrite the function in OCaml; if GHC is able to do less work thanks to its non-strict semantics, the solution will scale better than with OCaml:
let rec qsort = function [] -> [] | x::xs -> qsort (List.filter ((>) x) xs) @ [x] @ qsort (List.filter ((<=) x) xs)
It's very similar to the Haskell one; the main difference is in the signs used with List.filter: ((>) x) means "x is greater than _", as opposed to "_ is greater than x" in Haskell.
The main function just creates a list of random integers, sums them up, and finally adds the numbers from the sorted list --- obtaining the same number, I hope (the sums are printed to stdout to make sure that computation is not omitted).
Here are the times I get:
The GHC binary takes nearly 800MB to sort a 3 million list, which explains why it suffers as n increases.
Well, this isn't... what was it, GHC 11? Being lazy does pay off if you try to get the nth (and in particular the first) element, though; using
print (head $ qsort xs)
instead of sum,
$ time ./min_sort 3000000 4498748904877 1 1.252078 real 0m3.542s user 0m3.296s sys 0m0.224s
(the real and user time include the time needed to build the list and perform
the first sum). The theory says this can be
.
GHC seems to be evaluating only the first half of the list, as it should,
resulting in quite an improvement :)
open Printf open ExtList let rec qsort = function [] -> [] | x::xs -> qsort (List.filter ((>) x) xs) @ [x] @ qsort (List.filter ((<=) x) xs) let () = let n = int_of_string (Array.get Sys.argv 1) in let a = Array.to_list (Array.init n (fun _ -> Random.int n)) in let t0 = Sys.time () in printf "%d\n" (List.fold_left (+) 0 (qsort a)); printf "Needed %8.5fs.\n" (Sys.time () -. t0); Gc.print_stat stdout
module Main where
import System.CPUTime
import System.IO
import System.Environment
import System.Random
qsort [] = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
main = do
strn:_ <- getArgs
let n = (read strn) :: Int
gen <- getStdGen
let xs' = randomRs (1::Int, n) gen
xs <- return (take n xs')
print (sum xs) -- make sure xs is evaluated and stuff
hFlush stdout
t1 <- getCPUTime
print (head $ qsort xs)
t2 <- getCPUTime
print (fromIntegral (t2 - t1) * 1e-12)
Comment - Chris (2008-07-15 (Tue) 06:54:11)
Very clear and well-written. This is the first time I've stumbled on eigenclass (thanks to reddit); and buried within the article here is:
"[O(nlogn) is] the lowest achievable bound for a comparison sort."
I don't doubt it, but do you know of an elegant proof of the above statement?
Sean 2008-07-15 (Tue) 07:14:01
I'm not sure there are proofs, but there are many experimental confirmations that O(n log n) is the lower-bound for comparison sorts. Other sorts can achieve O(n + k) or O(n) with limitations on the type of data sorted.
mfp 2008-07-15 (Tue) 07:29:20
It's an information-theoretical argument.
Here's the gist of it: there are n! permutations of a sequence of n elements, so the amount of information needed to pick a given permutation out of the n! possibilities is log n! bits (with log in base 2). Each comparison yields at most 1 bit (if both results, < and >=, are equiprobable), so we need log n! comparisons at least. Using Stirling's approximation, log n! =~ 1/ln(2) * (n ln n - n + 1), showing that a comparison sort is OMEGA(n log n) (bounded below asymptotically by Cn*log n for some C).
RegalCynicalSunrise 2008-07-23 (Wed) 15:46:37
A few comments on the last quicksort-related article made the same mistake of assuming the naive sort is O(n^2). It seems the real issue with the naive Haskell quicksort is space explosion because it's not an in-place sort. Also, if it were more strict (as the OCaml version is) it'd be faster, as this duder shows.
No Title - Anonymous (2008-07-15 (Tue) 07:19:29)
@Chris:
see http://en.wikipedia.org/wiki/Comparison_sort#Number_of_comparisons_required_to_sort_a_list
mfp 2008-07-15 (Tue) 07:33:33
argh, should have looked for this before writing :)
Thanks - Chris (2008-07-15 (Tue) 07:24:16)
Right, I could have just Wikied it x_x;
Thanks for the link, which demonstrates yet another great use of Stirling's Approximation :-D
But that's still only the average case... - Reid (2008-07-15 (Tue) 09:20:11)
I think you realize this, but I don't think you make it clear that you are only proving this for the best case. Naive quicksort is still worst case n^2, because p!=q!=n/2. But that's okay, because that quicksort is totally slick.
mfp 2008-07-15 (Tue) 13:09:00
It's average-case analysis indeed, which is why mergesort is preferred for lists.
Perl switched to mergesort some time ago (to avoid quicksort's O(N^2) worst case), but now there's a pragma to choose the algorithm (it also shuffles the array before quicksort to avoid the O(N^2)).
quicksort erratum erratum - Edward Kmett (2008-07-16 (Wed) 15:17:22)
The list concatenations really do impose an undue cost in the Haskell version. You can see the difference in the average case analysis by the pronounced upturn in the runtime of the Haskell quicksort in your performance chart. Counting conses is not sufficient, because you have to expand a non-constant number of function calls to reach each cons.
A succinct version of Haskell quicksort with actual quicksort average case performance and an explanation of why the naive quicksort fails to have it can be found here:
http://comonad.com/reader/2008/a-sort-of-difference/
mfp 2008-07-16 (Wed) 17:08:23
quicksort erratum erratum erratum? ;) O(n^2) just doesn't fit the observed behavior. (Edward is now trying to prove O(n log^2 n).)
.GT. and .LT. - BMeph (2008-07-16 (Wed) 19:16:24)
Just as an aside, ((>) x) means the same thing in Haskell and OCaml - "x is greater than _". Haskell just lets you write it as a section.
Qsort ocaml - Vijay Chakravarthy (2008-07-17 (Thr) 09:28:34)
Just a quick question:
Would'nt let rec qsort = function
[x] -> [x] | x::xs -> qsort (List.filter ((>) x) xs) @ [x] @ qsort (List.filter ((<=) x) xs)
be faster, since it avoids creation of a bunch of empty lists, and calls to cons?
mfp 2008-07-18 (Fri) 12:29:14
Don't forget you need the [] case to make sure the function works with empty lists :) (the compiler will complain otherwise, since the pattern wouldn't be exhaustive). Also, [x] -> [x] does cons (note that [] doesn't, it's a constant constructor), but this would cons less indeed (whether it's faster or not depends on whether the extra work in the pattern matching is compensated by the reduced allocation rate):
let rec qsort = function
[] -> []
| [x] as x -> x
| x::xs -> qsort (List.filter ((>) x) xs) @ (x :: qsort (List.filter ((<=) x) xs)
where I've also replaced the intermediate [x] @ ... with x ::, saving one cons per level (not that it matters at all really :).
The obvious way to speed this up further would be to use List.partition instead of two List.filter applications, hence scanning the list only once.
Keyword(s):[blog] [frontpage] [quicksort] [haskell] [ocaml]
References:[Reexamining qsort, eager vs. lazy algorithm analysis and Ruby's (and other's) GC]